The Best Way to Read PDF in Perl

PDF is not a plain-text format: page content is usually stored in compressed streams, so you cannot read a PDF file line by line with Perl's basic file-reading commands. The CAM::PDF module, available from CPAN, provides methods you can use to read the text of a PDF file.

1. Install the CAM::PDF Module

Open a command shell with Start > All Programs > Accessories > Command Prompt.

Type “cpan” and press Enter to get the cpan prompt.

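Type “install CAM::PDF” and press Enter. A fresh install ends with a line reporting that the install step is OK. If the module is already installed and current, the transcript instead reads “CAM::PDF is up to date (1.52).”, where the version number depends on the release you have.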

Exit the cpan prompt by typing “exit” and pressing Enter.
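
To confirm that Perl can actually find the module after the install, a short check along the following lines may help. This is only a sketch: the helper name module_version is made up for this illustration and is not part of CAM::PDF. It relies on Perl's built-in require and on the VERSION method that every module inherits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of CAM::PDF): returns the installed
# version of a module, or undef if the module cannot be loaded.
sub module_version {
    my ($module) = @_;

    # Turn Some::Module into Some/Module.pm so require can load it by path
    (my $path = "$module.pm") =~ s{::}{/}g;
    return undef unless eval { require $path; 1 };

    # Some modules do not declare a version, so fall back to a label
    my $version = $module->VERSION();
    return defined($version) ? $version : 'unknown';
}

# Check the module named on the command line, or CAM::PDF by default
my $name    = @ARGV ? $ARGV[0] : 'CAM::PDF';
my $version = module_version($name);

if (defined $version) {
    print "$name is installed (version $version)\n";
}
else {
    print "$name is not installed\n";
}
```

Saved under a name of your choice, for example check_module.pl, the script can be run with “perl check_module.pl” to check CAM::PDF, or with another module name as its argument.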

2. Write the Perl Script

Open a text editor. If you are not sure which one to use, use Notepad. Open Notepad with Start > All Programs > Accessories > Notepad.

Enter the following code in the text editor:

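#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;   # CPAN module for reading PDF files

# The PDF file name is the first command-line argument
my $file_name = shift
    or die "Usage: perl pdf.pl <file_name_of_PDF_to_read>\n";

my $pdf = CAM::PDF->new($file_name)
    or die "Cannot open $file_name: $CAM::PDF::errstr\n";

for my $page (1 .. $pdf->numPages()) {
    # Extract the text of this page and split it into lines
    my $text = $pdf->getPageText($page);
    next unless defined $text;
    my @lines = split /\n/, $text;

    foreach (@lines) {
        print "$_\n";
    }
}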

Save the script as pdf.pl, or another name that is meaningful to you.

At the command prompt, change to the folder where you saved the script and type “perl pdf.pl <file_name_of_PDF_to_read>”, replacing the placeholder with the file name of the PDF you want to read.

Example: perl pdf.pl farewell_to_arms.pdf

If you want to manipulate the text as the script reads the PDF file, do so in the following loop:

foreach (@lines) {
    print "$_\n";
}
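
As one illustration of such manipulation, the sketch below trims surrounding whitespace from each line, drops lines that end up empty, and prefixes the rest with a running line number. The helper name clean_lines and the sample input are invented for this example. In the script above you would pass it the @lines array built from getPageText instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tidy up the lines extracted from one page.
sub clean_lines {
    my @lines = @_;     # work on a copy so the caller's array is untouched
    my @cleaned;
    my $count = 0;

    foreach my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        next if $line eq '';        # skip lines that are now empty
        $count++;
        push @cleaned, "$count: $line";
    }
    return @cleaned;
}

# Sample input standing in for the text of a PDF page
my @lines = ("  Chapter One  ", "", "\tFirst line of text");
print "$_\n" foreach clean_lines(@lines);
```

Keeping the manipulation in its own subroutine leaves the page loop short, and the same cleanup can be reused for every page.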

Copyright © 2010 - 2011  DoremiSoft Co., Ltd. All Rights Reserved.