Perl – How to do a full-text search of PDF files from Perl


I have a bunch of PDF files and my Perl program needs to do a full-text search of them to return which ones contain a specific string.
To date I have been using this:

my @search_results = `grep -i -l \"$string\" *.pdf`;

where $string is the text to look for.
However, this fails for most PDFs because the format is not plain text: the page content is usually stored in compressed streams, so grep never sees the string in the raw bytes.

What can I do that's easiest?

There are about 300 PDFs whose names I do not know in advance. PDF::Core is probably overkill. I am trying to get pdftotext and grep to play nicely with each other, but since I don't know the names of the PDFs, I haven't found the right syntax yet.

Final solution using Adam Bellaire's suggestion below:

@search_results = `for i in *.pdf; do pdftotext "\$i" - | grep --label="\$i" -i -l "$search_string"; done`;

Best Solution

A PerlMonks thread discusses this problem.

For your situation, it is probably simplest to install pdftotext (the command-line tool). Then you can do something like:

my @search_results = `pdftotext myfile.pdf - | grep -i -l \"$string\"`;
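
If you would rather not build a shell command string at all, the same idea can be sketched in plain Perl. This is only a rough sketch, assuming pdftotext is on your PATH; the sub names text_matches, pdf_text and search_pdfs are made up for illustration. It runs pdftotext once per file through the list form of open, which bypasses the shell, and treats the search string as literal text rather than a pattern.

```perl
use strict;
use warnings;

# Return 1 if $text contains $needle (ignoring case), else 0.
# \Q...\E makes the search string literal, so regex metacharacters are harmless.
sub text_matches {
    my ($text, $needle) = @_;
    return $text =~ /\Q$needle\E/i ? 1 : 0;
}

# Extract the text of one PDF by running "pdftotext FILE -".
# The list form of open bypasses the shell, so file names need no quoting.
# Returns undef if pdftotext cannot be started or exits with an error.
sub pdf_text {
    my ($file) = @_;
    open(my $fh, '-|', 'pdftotext', $file, '-') or return;
    local $/;                 # slurp mode: read all output at once
    my $text = <$fh>;
    close($fh) or return;     # non-zero exit status from pdftotext
    return $text;
}

# Return the names of the given files whose extracted text contains $needle.
sub search_pdfs {
    my ($needle, @files) = @_;
    my @hits;
    for my $file (@files) {
        my $text = pdf_text($file);
        next unless defined $text;    # skip files that could not be converted
        push @hits, $file if text_matches($text, $needle);
    }
    return @hits;
}

# Example: search every PDF in the current directory.
# my @search_results = search_pdfs($string, glob('*.pdf'));
```

Files that pdftotext cannot convert are silently skipped here; you may prefer to warn about them instead.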