How to search contents of multiple pdf files?

How could I search the contents of PDF files in a directory/subdirectory? I am looking for some command line tools. It seems that grep can't search PDF files.


Your distribution should provide a utility called pdftotext :

find /path -name '*.pdf' -exec sh -c 'pdftotext "{}" - | grep --with-filename --label="{}" --color "your pattern"' ;

The "-" is necessary to have pdftotext output to stdout, not to files. The --with-filename and --label= options will put the file name in the output of grep. The optional --color flag is nice and tells grep to output using colors on the terminal.

(In Ubuntu, pdftotext is provided by the package xpdf-utils or poppler-utils .)

This method, using pdftotext and grep , has an advantage over pdfgrep if you want to use features of GNU grep that pdfgrep doesn't support. Note : pdfgrep-1.3.x supports -C option for printing line of context.


There is pdfgrep, which does exactly what its name suggests.

pdfgrep -R 'a pattern to search recursively from path' /some/path

I've used it for simple searches and it worked fine.

(There are packages in Debian, Ubuntu and Fedora.)

Since version 1.3.0 pdfgrep supports recursive search. This version is available in Ubuntu since Ubuntu 12.10 (Quantal).


Recoll is a fantastic full-text GUI search application for Unix/Linux that supports dozens of different formats, including PDF. It can even pass the exact page number and search term of a query to the document viewer and thus allows you to jump to the result right from its GUI.

Recoll also comes with a viable command-line interface and a web-browser interface.

链接地址: http://www.djcxy.com/p/65446.html

上一篇: 我如何加载现有的PDF

下一篇: 如何搜索多个pdf文件的内容?