PDF files are stored in a binary format rather than as plain text, so you cannot read their contents directly with Perl's basic file commands. However, the CAM::PDF module provides methods you can use to read a PDF file.
1. Install the CAM::PDF Module
Open a command shell with Start > All Programs > Accessories > Command Prompt.
Type “cpan” and press Enter to get the cpan prompt.
Type “install CAM::PDF” and press Enter. Once the module is installed, the transcript reads “CAM::PDF is up to date (1.52)”; the version number shown may differ.
Exit the cpan prompt by typing “exit” and pressing Enter.
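To confirm that Perl can find the newly installed module, you can run a short check script such as the one below. This is only a minimal sketch: the helper name module_version is hypothetical, and it relies on the standard VERSION method that every Perl package inherits rather than on anything specific to CAM::PDF.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the version string of a module if it can be
# loaded, or undef if the module is not installed.
sub module_version {
    my ($module) = @_;
    # Turn "CAM::PDF" into "CAM/PDF.pm" so require can load it by file name.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    # Fall back to "unknown" if the module defines no $VERSION.
    return $module->VERSION // 'unknown';
}

my $version = module_version('CAM::PDF');
print defined $version
    ? "CAM::PDF $version is installed\n"
    : "CAM::PDF is not installed\n";
```

If the script reports that the module is not installed, repeat the cpan steps above and check the transcript for errors.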
2. Write the Perl Script
Open a text editor. If you are not sure which one to use, use Notepad. Open Notepad with Start > All Programs > Accessories > Notepad.
Enter the following code in the text editor:
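#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF; # Perl module for reading PDF files

my $file_name = shift or die "Usage: pdf.pl <file_name_of_PDF_to_read>\n";
my $pdf = CAM::PDF->new($file_name) or die "Cannot open $file_name\n";

for my $page (1 .. $pdf->numPages()) {
    my $text = $pdf->getPageText($page); # text of one page as a single string
    my @lines = split(/\n/, $text);
    foreach (@lines) {
        print "$_\n";
    }
}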
Save the script as pdf.pl, or another name that is meaningful to you.
At the command prompt, type “pdf.pl <file_name_of_PDF_to_read>”, substituting the file name of the PDF you want to read. If Windows does not run the script directly, put “perl” in front of the command.
Example: pdf.pl farewell_to_arms.pdf
If you want to manipulate the text as the script reads the PDF file, do so in the following loop:
foreach (@lines) {
    print "$_\n";
}
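As one illustration of manipulating the text in that loop, the sketch below trims each line and keeps only the lines that match a pattern. The helper name matching_lines and the sample text are hypothetical; the sample string simply stands in for what getPageText returns, so the example runs without a PDF file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split page text into lines, trim surrounding
# whitespace, and return only the lines that match the given pattern.
sub matching_lines {
    my ($text, $pattern) = @_;
    my @kept;
    foreach my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;        # trim leading and trailing whitespace
        next unless $line =~ $pattern;  # skip lines that do not match
        push @kept, $line;
    }
    return @kept;
}

# Sample text standing in for the string returned by getPageText.
my $sample = "  Chapter 1\nSome body text\n Chapter 2 \n";
print "$_\n" for matching_lines($sample, qr/^Chapter/);
```

In the PDF script, you would pass the $text value for each page to the helper in place of the sample string.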