If you translate PDF files from time to time, you may find yourself asking the following questions:
- Is this a text (editable) PDF or an image PDF?
- Will I be able to translate text inside images if my PDF converter or PDF editor does not support Optical Character Recognition (OCR)? (This also applies to users of the TransPDF online PDF translation platform.)
- How much additional effort will be required to translate text on the images in this PDF?
PDF Image Extractor is a free application for the Windows operating system that extracts all images from any number of PDF files and saves them to a subfolder in the same folder as the original PDF. You can then look through the extracted images and determine:
- Whether this is an image PDF, i.e. each extracted image represents one page.
- Whether this is a mixed PDF in which only some pages are scanned.
- Whether any extracted images contain text that needs to be translated.
- How much effort will be required to translate the text in the images.
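The first two checks above boil down to a simple rule of thumb. The sketch below is a hypothetical helper, not part of PDF Image Extractor: it assumes you have counted the pages in the PDF yourself and, after looking through the extracted images, noted how many of them are whole scanned pages (the application does not report this).

```python
def classify_pdf(page_count: int, full_page_images: int) -> str:
    """Rough classification of a PDF based on a manual review of extracted images.

    page_count: number of pages in the PDF.
    full_page_images: number of extracted images that are whole scanned pages
        (counted by eye; hypothetical input, not produced by the application).
    """
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    if not 0 <= full_page_images <= page_count:
        raise ValueError("full_page_images must be between 0 and page_count")
    if full_page_images == page_count:
        return "image PDF"  # every page is a scan
    if full_page_images > 0:
        return "mixed PDF"  # only some pages are scans
    # No scanned pages, but smaller images may still contain text to translate.
    return "text PDF"
```

For example, a 10-page document whose extracted images include 3 full-page scans would be classified as a mixed PDF; the remaining images still need to be checked for embedded text.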
Obtaining and opening PDF Image Extractor
Download PDF Image Extractor (current version: 1.0.0, 15 April 2020). There are two options:
- Installation program. It installs the application under Program Files, so you will need administrator privileges if you are working within a corporate network.
- Portable archive. This is a ZIP file that can be extracted to any folder. After unzipping the archive, run the PdfImageExtractor executable file from within the extracted folder.
Using PDF Image Extractor
The main application dialogue allows you to add any number of PDF documents to be processed.
To select individual PDF files, use the Add files... button. To process all PDF files located in a specific folder, use the Add folder... button.
To remove PDF files from the list, use the Remove button to remove the selected files or the Clear button to remove all files in the list.
To process the selected files, click the Extract images... button.
When processing is complete, a report is displayed.
The report contains the following information: the full path to the original PDF file, whether any images were found in the PDF, the full path to the folder where the images were saved, and a description of the error if a problem occurred during processing.
The program saves the images to a folder called ‘PDF_file_name_images’, where ‘PDF_file_name’ is the name of the processed PDF file. You will be prompted if such a folder already exists for any selected PDF file.
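The folder naming rule above can be sketched as follows, for example to check in advance which PDFs will trigger the "folder already exists" prompt. This is a minimal sketch, not the application's own code; it assumes that ‘PDF_file_name’ means the file name without its .pdf extension, which the description does not state explicitly. The function names are hypothetical.

```python
from pathlib import Path


def images_folder_for(pdf_path: str) -> Path:
    """Return the folder the images would be saved to, next to the PDF.

    Assumption: the .pdf extension is dropped, so 'manual.pdf' -> 'manual_images'.
    """
    pdf = Path(pdf_path)
    return pdf.parent / f"{pdf.stem}_images"


def folder_already_exists(pdf_path: str) -> bool:
    """True if the output folder exists, i.e. the application would prompt you."""
    return images_folder_for(pdf_path).is_dir()
```

Running `folder_already_exists` over a batch of files before clicking Extract images... shows which of them already have an images folder from a previous run.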
PDF Image Extractor has the following limitations:
- Password-protected PDF files are not processed; an error is displayed for such files. You will need to remove the protection before processing them.
- Some PDFs cannot be processed because of the color space of their images or the way the images are stored inside the PDF. In such cases, you will see an error in the processing report. For these PDFs, it is recommended to use an external cloud service for extracting images: Extract PDF.
PDF Image Extractor is free software distributed under the AGPL license.