PDF to text from the Windows command line

I need to use command line switches to execute the ‘Save as Text’ command. Ideally, I want to:

use a command line switch to open a PDF
use a command line switch to convert the PDF to a text file by mimicking the ‘Save as Text’ command.
use a command line to close the PDF.

Is this possible? If so, then does anyone know how to do this?

Mark


asked Jul 28, 2009 at 19:09

Maybe you can try this: https://github.com/luochen1990/nodejs-easy-pdf-parser

It is an npm package, so you need to install Node.js (and npm) to use it.

It can be used as a command line tool:

npm install -g easy-pdf-parser
pdf2text test.pdf > test.txt

The tool sorts text lines by their y coordinates, so it works well in most cases. It also handles Unicode and is cross-platform.
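The idea of ordering text by y coordinate can be illustrated with a small Python sketch. This is not the code of easy-pdf-parser; the (x, y, text) fragment layout, the `tolerance` parameter, and the function name are assumptions made for illustration, and y is assumed to grow downward.

```python
from typing import List, Tuple

# Hypothetical fragment layout: (x, y, text), with y growing downward.
Fragment = Tuple[float, float, str]


def fragments_to_lines(fragments: List[Fragment], tolerance: float = 2.0) -> List[str]:
    """Group fragments with nearly equal y into lines, top to bottom."""
    lines: List[List[Fragment]] = []
    # Sort by y first (then x) so fragments on the same line end up adjacent.
    for frag in sorted(fragments, key=lambda f: (f[1], f[0])):
        if lines and abs(lines[-1][0][1] - frag[1]) <= tolerance:
            lines[-1].append(frag)   # same visual line as the previous fragment
        else:
            lines.append([frag])     # start a new line
    # Within each line, order fragments left to right by x.
    return [" ".join(text for _, _, text in sorted(line)) for line in lines]


if __name__ == "__main__":
    sample = [(50, 100.5, "world"), (10, 100, "Hello"), (10, 120, "Second"), (60, 120, "line")]
    print(fragments_to_lines(sample))
```

The tolerance matters because fragments on one visual line rarely share exactly the same y value.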

answered Jul 14, 2018 at 9:14

luochen1990


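Don’t use CMD; use AutoIt. It is very easy to do and takes only a few lines:

Opt("WinTitleMatchMode", 2)  ; match "Adobe" anywhere in the window title
ShellExecute("file.pdf")     ; open the PDF in its default viewer (Run() expects an executable)
WinWaitActive("Adobe")
; Send(...)                  ; whatever keystrokes are necessary to save as text
Send("{ENTER}")
Send("!{F4}")                ; Alt+F4 closes the viewer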

Matthias


answered Jun 25, 2012 at 8:34

I don’t understand why you wouldn’t want to use free software (as opposed to freeware); pdftotext is the ideal solution.
However, if you really want to open and save the PDF in an automated fashion through the Windows GUI, you could use VBScript and the SendKeys command.

Just use pdftotext though; it will be much more reliable and won’t cost you a whole box.
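For anyone who wants to script pdftotext, here is a minimal Python sketch. It assumes the pdftotext executable is on the PATH; the helper names are made up for illustration. The only option used is -layout, which asks pdftotext to preserve the page layout.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def pdftotext_command(pdf: str, txt: Optional[str] = None, layout: bool = False) -> List[str]:
    """Build the argument list for: pdftotext [-layout] input.pdf output.txt"""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # keep the original physical layout
    cmd.append(pdf)
    # Default output name: same base name with a .txt extension.
    cmd.append(txt if txt is not None else str(Path(pdf).with_suffix(".txt")))
    return cmd


def convert(pdf: str, txt: Optional[str] = None, layout: bool = False) -> None:
    """Run pdftotext; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(pdftotext_command(pdf, txt, layout), check=True)


if __name__ == "__main__":
    print(pdftotext_command("report.pdf", layout=True))
```

Building the argument list separately from running it makes the command easy to inspect or log before anything is executed.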

answered Jul 28, 2009 at 20:55

Gareth Davidson


I think the VBScript below should do the trick. It takes all .pdf files in a given folder and saves them as .txt files. One major bummer is that it only works if your machine is not locked, since it uses the SendKeys command. If anyone has a solution that works while a computer is locked, please send it my way!

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set WshShell = WScript.CreateObject("WScript.Shell")
objStartFolder = "PATH_OF_ALL_PDFS_YOU_WANT_TO_CONVERT_HERE"
Set objFolder = objFSO.GetFolder(objStartFolder)

Set colFiles = objFolder.Files
For Each objFile In colFiles
  ' GetExtensionName/GetBaseName also cope with names shorter than four characters
  extension = LCase(objFSO.GetExtensionName(objFile.Name))
  file = objFSO.GetBaseName(objFile.Name)
  fullname = objFSO.BuildPath(objStartFolder, objFile.Name)
  fullname_txt = objFSO.BuildPath(objStartFolder, file & ".txt")

  If extension = "pdf" And Not objFSO.FileExists(fullname_txt) Then
    WScript.Echo fullname
    ' Open the PDF in its default viewer
    WshShell.Run """" & fullname & """"
    WScript.Sleep 1000
    ' Drive the viewer's menu with keystrokes to trigger "Save as Text";
    ' the exact keys may differ between viewer versions
    WshShell.SendKeys "%"
    WScript.Sleep 100
    WshShell.SendKeys "f"
    WScript.Sleep 100
    WshShell.SendKeys "h"
    WScript.Sleep 100
    WshShell.SendKeys "x"
    WScript.Sleep 300
    WshShell.SendKeys "{ENTER}"

    count = 0
    i = 0 ' reset for every file; otherwise the wait is skipped after the first success
    ' This step prevents the loop from moving on to the next .pdf before the conversion to .txt is complete
    On Error Resume Next
    Do While i = 0 And count < 100
      Err.Clear
      Set MyFile = objFSO.OpenTextFile(fullname_txt, 8)
      If Err.Number = 0 Then
        MyFile.Close
        i = 1
      End If
      count = count + 1
      WScript.Sleep 20000
    Loop
    On Error GoTo 0
  End If
Next
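The wait step in the script above (poll until the .txt file can be opened, give up after a limit) can be written compactly in Python. This is only a sketch of the polling idea: the function name, timeout, and poll interval are assumptions, and being able to open the file for reading does not strictly prove that the viewer has finished writing it.

```python
import time


def wait_for_file(path: str, timeout: float = 60.0, poll: float = 0.5) -> bool:
    """Return True once `path` can be opened, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path, "rb"):
                return True          # the file exists and is readable
        except OSError:
            pass                     # missing or still locked; try again
        if time.monotonic() >= deadline:
            return False             # give up instead of waiting forever
        time.sleep(poll)


if __name__ == "__main__":
    # Hypothetical file name, used only to demonstrate the call.
    print(wait_for_file("converted.txt", timeout=2.0))
```

A short poll interval with an overall deadline avoids the fixed 20-second pause per check while still bounding the total wait.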

answered Dec 22, 2015 at 22:03

Acie



PDF2TXT

Version 4.0
November 20, 2015
Copyright 2005 — 2015 by Jamal Mazrui
GNU Lesser General Public License (LGPL)

Description
Installation
Choosing PDF Source and TXT Target
Text Extraction Settings
Viewing Area
Toggling between a File and Folder List
Configuration Check Boxes
Action Buttons
URL Source
Hot Keys
The Log File
Command Line Operation
File Association
Change Log
Development Notes

Description

PDF to TXT (also written as PDF2TXT) is a free program for converting files in Portable Document Format (.pdf extension) to plain text (.txt extension). The program lets you convert multiple files in a single, batch operation, either from a GUI dialog or a console-mode command line. The resulting text files can be read in almost any editing or viewing program. PDF2TXT itself also includes a plain text view for reading PDF files. The program should work on any version of Windows.

Installation

The installation program for PDF2TXT is called PDF2TXT_setup.exe. When executed, it prompts for an installation folder for the program. The default folder is c:\PDF2TXT. Although this is not a standard location for programs on a Windows computer, benefits include fewer keystrokes to type when entering paths to .pdf source or .txt target files, as well as the ability to put files in subfolders of the program without needing administrative rights. If you want a standard installation folder, however, respond to the prompt by entering c:\Program Files (x86)\PDF2TXT.

The installation process creates a program group for PDF2TXT on the Windows start menu, containing choices to launch PDF2TXT, read Documentation for PDF2TXT, and uninstall PDF2TXT. Also created is a desktop shortcut with an associated hot key, enabling PDF2TXT to be conveniently launched by pressing Control+Alt+Shift+P. Another shortcut is placed in the Send To folder so that a PDF may be viewed in PDF2TXT via the context menu in Windows Explorer.

Choosing PDF Source and TXT Target

After PDF2TXT is installed, launching it activates a main dialog with several capabilities and settings. First, it prompts you to select a PDF source. This can be either a single PDF file or a folder containing multiple PDF files (another section explains how it can also be an Internet URL). In the initial edit box, you can type the full path to the file or folder desired. Alternatively, you can tab to buttons that invoke different sub dialogs depending on whether you want to choose a file or folder as the PDF source. (Yet another option, described later, is to pass the path to the PDF source as a parameter on the command line when pdf2txt.exe is launched.)

By default, the PDF source is the folder c:\PDF2TXT\pdf. Any source may be chosen, however, and the program remembers the last one used.

Similarly, an edit box and associated button let you specify the target folder for converted files. These will have the same base name, but an extension of .txt instead of .pdf. The default target folder is c:\PDF2TXT\txt. Note that the PDF source may be either a file or folder, but the TXT target is always a folder.

Text Extraction Settings

Two settings fundamentally affect how text is extracted from a PDF. If the PDF requires a password to unlock its content, type it in the edit box provided. If the PDF is an image format without textual characters — e.g., the result of a scan — mark the checkbox so that optical character recognition (OCR) is performed instead of the usual techniques of extracting text. This OCR technique is also separately available at https://github.com/JamalMazrui/pdf2ocr.

OCR is a much slower and more error-prone process, but it may be the best option when the usual methods do not work. This technique uses Google Tesseract, the best open source OCR available, which is not as good as commercial OCR packages. Due to technical issues, there is not a simple way of aborting an OCR process that has already started — you have to close the PDF2TXT program.

Another checkbox lets you produce several more text files as output, corresponding to the following where test.pdf is the input:

test_meta.txt = metadata about the PDF such as the authoring tool, page count, image-only status, and Tagged status for accessibility
test_urls.txt = URLs extracted from the PDF, listed one per line
test_gettext.txt = text version produced by the gettext.exe utility
test_xpdf.txt = text version produced by the pdftotext.exe utility
test_miner.txt = text version produced by the PDFMiner library in the pdf2tag.exe utility
test_tag.txt = text version with markup of accessibility tags, produced by the pdf2tag.exe utility
test.htm = HTML version produced by the pdf2tag.exe utility

The Extra Outputs choice is primarily intended for diagnostic purposes, e.g., determining whether a PDF was produced with accessibility in mind or determining which text version is the most readable if the default test.txt result is unsatisfactory. A webmaster could post an HTML alternative to a PDF. The conversion translates visual aspects of the PDF such as fonts, but not structural elements such as headings, unfortunately.

Viewing Area

Within the main dialog, a read-only, multi-line edit control serves as a viewing area between the source and target controls just discussed. This scrollable view can show one of three kinds of information: (1) the text of a PDF, (2) a list of PDF files, or (3) the results of a batch conversion. The label for the viewing area changes to indicate the kind of information being shown: "View file," "View folder," or "View results."

You can navigate the viewing area with standard Windows keystrokes, e.g., Control+Home or Control+End to go to the top or bottom of text. Control+F lets you search forward for a string of characters, and Control+Shift+F lets you search backward. F3 searches for the same string again in the forward direction, and Shift+F3 searches again backward. Control+G lets you go to a percent completion point through the file being viewed. Control+K sets a bookmark for the file, Control+Shift+K clears it, and Alt+K goes to it.

You can press Shift with arrow keys to select text or Control+A to select all. Alternatively, you can press F8 to set the starting point of a selection, navigate to the ending point desired, and then press Shift+F8 to select the text between these points.

Press Control+C to copy selected text to the clipboard. Alternatively, press Control+Shift+C, or Alt+F8, to copy and append to the clipboard, adding to rather than replacing its existing text. A form feed or page break character (ASCII code 12) will separate each clip copied there. Control+F8 is a shortcut that copies all text in the viewing area without having to select it first, equivalent to Control+A followed by Control+C.

If you invoke the Open button and choose a PDF from its sub dialog, the text of the PDF will be placed in the viewing area, and keyboard focus will go there. If you invoke the Select button to choose a PDF folder instead of a file, its list of PDFs will be shown. A status bar at the bottom of the dialog indicates the current position in the viewing area.

Toggling between a File and Folder List

The Look button behaves in a special way when the viewing area has focus. If you press Alt+L when in the viewing area, PDF2TXT will toggle between a folder and file view. If viewing a folder, PDF2TXT will switch to a view of the file that was on the line containing the caret. If viewing a file, PDF2TXT will switch to a view of the folder that contained the file. In addition, PDF2TXT will automatically search for the name of the file last viewed and place the caret just after it if found.

This feature lets you easily explore the PDFs in a folder, one after another. Initially, you might display a list of files by pressing Alt+L when the PDF source is a folder. You can then arrow down through the list until you find a PDF you want to view. At that point, press Alt+L to view the file. When you want to continue exploring the folder list again, press Alt+L to return to it at the position of the file you last viewed.

Configuration Check Boxes

Four check boxes let you configure PDF2TXT. The one labeled "Include subfolders" will look for PDF files not only in the specified folder, but in subfolders under it. For example, you could probably convert many PDF files on your computer by checking this setting and specifying the c:\ root folder as the PDF source! This setting is unchecked by default.

The check box labeled "Move PDF when done" will transfer a PDF to a subfolder called "Done" after a successful conversion. This is a subfolder of the PDF2TXT program folder, with a default location of c:\PDF2TXT\done. The benefit of this check box is that PDF files are stored away for backup after they have been converted to text. This setting is unchecked by default.

The checkbox labeled "Replace TXT if found" determines whether to skip a conversion if a corresponding target file already exists. If you do not check the setting to move source files when done, you may want to uncheck this setting so that unnecessary time is not spent on repeatedly converting PDF files left in the source folder, since they then will be skipped if corresponding target files already exist. This setting is checked by default.

The Append check box determines whether a detailed conversion log file is newly created each time a conversion is run. This setting is checked by default so that previous information is not lost. A section below further describes the log file.
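The skip-or-replace rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not PDF2TXT's own code; the function and parameter names are assumptions.

```python
from pathlib import PurePath
from typing import Iterable, List


def files_to_convert(pdf_names: Iterable[str],
                     existing_txt_names: Iterable[str],
                     replace_if_found: bool) -> List[str]:
    """Return the PDFs that a batch run would actually convert."""
    existing = {name.lower() for name in existing_txt_names}
    selected = []
    for pdf in pdf_names:
        # The target has the same base name with a .txt extension.
        target = PurePath(pdf).with_suffix(".txt").name
        # Convert when replacing is allowed or when no target exists yet.
        if replace_if_found or target.lower() not in existing:
            selected.append(pdf)
    return selected


if __name__ == "__main__":
    print(files_to_convert(["a.pdf", "b.pdf"], ["a.txt"], replace_if_found=False))
```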

Action Buttons

The remaining controls of the main dialog are buttons that perform various actions. The Convert button is the default: the one that will be activated by pressing Enter on any control except another button. The viewing area will show the results of a batch conversion. This information includes the number of pages in each PDF converted. It also indicates when a conversion was either not possible or was skipped because the target file already existed and you chose not to replace files.

Press Escape if you need to abort a batch conversion of many files that is taking too long! Note that this program is relatively quick, however, compared to other available methods of converting PDF files to text. Moreover, its batch mode feature lets you run conversions unattended.

The source for a conversion is treated differently if the viewing area has focus. If viewing a list of PDFs in a folder or on a web page, then PDF2TXT regards the source as the file name on the current line (the one containing the caret). Thus, you can cursor to a PDF of interest and press Enter to convert it to text. If successfully converted, PDF2TXT assumes you may also want to examine its content in the viewing area, so a Look command is automatically performed as well (see below). If there is a conversion error, however, PDF2TXT leaves the error message in the viewing area. If you have been examining a list of PDFs and decide you want to convert them all rather than a single file, navigate to the top line of the viewing area that lists the number of PDFs in the list, and then press Enter.

If the source edit box already specifies what you want to view, or a path is easy to type into it, then the Look button is quicker to use than the Open or Select sub dialog. Activating the Look button takes the current source specification and goes to a view of either the text of a source file or the list of a source folder, putting focus in the view area so you can read the information. If viewing a PDF, its metadata is displayed before the body.

The Defaults button restores the default configuration settings of PDF2TXT. You can use it to return to the initial folders and checkbox settings.

The Explorer button lets you browse the source, target, or done folder with Windows Explorer. It allows you to examine files that either have been converted or would not convert—thus needing other approaches to access their content.

The Quit button closes PDF2TXT. Alt+F4 does the same thing.

The Help button displays this complete documentation in the default web browser. For context-sensitive help on a particular control, press F1 when it has focus. Hence, you can tab through the dialog and press F1 on each control to learn how to use it.

URL Source

If you are connected to the Internet, you can specify a URL as a PDF source instead of a file or folder on your local computer. The URL can be the complete download path to a PDF on the Internet. Alternatively, the URL can be the path to a web page containing one or more links to PDF files. You can use Internet Explorer to navigate to such a web page and then invoke the "Grab URL" button to put its URL into the source edit box of PDF2TXT.

The Look button works with a URL source similarly to a local file or folder. For example, you can press Alt+L to view a list of PDFs on a web page. The toggling feature, described above, is also supported, allowing you to consecutively examine the PDFs linked to a web page. If you view a PDF on the Internet, PDF2TXT will automatically download a copy to the PDF subfolder of the program folder, e.g., to c:\PDF2TXT\pdf.

The Convert button also works with a URL source. Thus, you can easily convert all PDFs on a web page with a single batch operation.

Hot Keys

Almost all controls of PDF2TXT are directly usable with unique, mnemonic Alt key combinations based on the initial letter of the control’s label. Thus, as you become familiar with the controls, you can operate them more quickly with hot keys rather than navigating to them with the tab key or mouse. For example, press Alt+P to go to the edit box for typing a PDF source, or Alt+S to select a source folder from a tree view of your computer. Press Alt+L to look at a file or folder, or Alt+V to read what is already in the viewing area. Press Alt+I to toggle the "Include subfolders" setting, or Alt+D to restore all defaults. The text extraction settings in the second row of controls use a letter corresponding to the second syllable or word, i.e., Alt+W for the Password edit box and Alt+F for the Image Format checkbox.

The Log File

The conversion log file is named log.txt and located in the Done subfolder of the PDF2TXT program folder. It records information about each attempt to convert a PDF file. It indicates whether the conversion succeeded (meaning any resulting text), and then lists metadata of the PDF, including security settings that could explain a failed conversion.

There is a choice to view the log file in the PDF2TXT program group off the Start Menu. You can also get to the file via the Explorer button of the PDF2TXT program, choosing the Done folder to navigate with Windows Explorer. Additionally, you can open the file in another application through its direct path (default settings):
c:\PDF2TXT\done\log.txt

If the log file grows larger than you want, simply delete it or uncheck the setting that configures PDF2TXT to append to an existing log file. Each use of the Convert button would then generate a new log file. This information is more detailed than the results placed in the viewing area.

Command Line Operation

The pdf2txt.exe executable may be run with various command line parameters. The parameters can set values for controls in the main dialog. Parameters can also cause PDF2TXT to run in an automatic, console mode—without a dialog box or further user intervention involved.

When the .pdf extension is associated with the PDF2TXT program (explained in another section), Windows Explorer or Internet Explorer will open a PDF file by launching PDF2TXT with the name of the PDF passed as a parameter on the command line. If PDF2TXT is launched with more than one command line parameter, however, the program will assume you want to run it in console rather than GUI mode. The syntax for parameters is described as follows. If a PDF source file, folder, or URL is specified, it must be the first parameter. If a TXT target folder is specified, it must be the second parameter. The source or target must be enclosed in quotes if its name contains spaces.

All parameters besides source and target names begin with a space and forward slash (/), followed by the hot key letter in the dialog corresponding to the setting affected. A trailing plus (+) sign in the parameter indicates a status of On, and a minus (-) sign indicates Off. The plus sign can also be omitted to indicate On. Capitalization does not matter. Here is a list of parameters:

a = Automatic, console mode (use /a- to force GUI mode with multiple parameters)
i = Include subfolders
m = Move PDF when done
r = Replace TXT if found
d = Default settings (no /d- is defined)
g = Grab URL as source from Internet Explorer (no /g- is defined)

For example, to convert all files using default settings except for the Move setting, you could enter:
PDF2TXT /d /m

To use current settings except grab a URL as source, enter:
PDF2TXT /a /g

To convert files from a temporary folder to the current folder, enter:
PDF2TXT "c:\temp files" .

To do the same, but in GUI rather than console mode, enter:
PDF2TXT "c:\temp files" . /a-

For greater console mode convenience, another version of PDF2TXT, having the abbreviated name p2t.exe, is also available in the program folder. This version only runs in console mode, whether zero, one, or more parameters are specified. It uses "standard output" to display conversion results. The shorter executable name means fewer characters to type on the command line. For example, to run a batch conversion in console mode using the current settings of PDF2TXT, you could simply enter
p2t

Like DOS commands generally, the above assumes that you have either made c:\PDF2TXT the current directory or included it in a PATH statement.
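When these command lines are generated from a script, the syntax rules in this section (source first, target second, quotes around names containing spaces, slash switches with an optional trailing minus) can be captured in a small helper. The following Python sketch is illustrative only: the function names are hypothetical, and rejecting a target without a source is an assumption drawn from the rule that the target must be the second parameter.

```python
from typing import Dict, Optional


def quote_if_needed(path: str) -> str:
    """Enclose a name in quotes when it contains spaces."""
    return f'"{path}"' if " " in path else path


def pdf2txt_command_line(source: Optional[str] = None,
                         target: Optional[str] = None,
                         switches: Optional[Dict[str, bool]] = None,
                         exe: str = "PDF2TXT") -> str:
    """Assemble a command line: exe [source [target]] [/x | /x-] ..."""
    parts = [exe]
    if source is not None:
        parts.append(quote_if_needed(source))      # source must come first
        if target is not None:
            parts.append(quote_if_needed(target))  # target must come second
    elif target is not None:
        # Assumption: without a source, a lone name would be read as the source.
        raise ValueError("a target requires a source")
    for letter, on in (switches or {}).items():
        # "/x" means On (the plus sign may be omitted); "/x-" means Off.
        parts.append(f"/{letter}" if on else f"/{letter}-")
    return " ".join(parts)


if __name__ == "__main__":
    print(pdf2txt_command_line(switches={"d": True, "m": True}))
    print(pdf2txt_command_line(r"c:\temp files", ".", {"a": False}))
```

Passing `exe="p2t"` would build the equivalent line for the console-only executable.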

File Association

The PDF2TXT group on the Start Menu contains shortcuts for changing what program automatically opens a file with a .pdf extension in Windows Explorer. If you decide that you like the interface of PDF2TXT enough to make it the default program for PDF files, you can set the file association accordingly. Later, if you decide you want to return to the conventional association, you can do that, too.

When the .pdf extension is associated with PDF2TXT, an application such as Windows Explorer when opening a file, or Internet Explorer after downloading a file, will pass the name of the PDF as a command-line parameter to pdf2txt.exe. When the program is launched in this way, it automatically invokes the Look button, placing text of the PDF in the viewing area and putting keyboard focus there.

Change Log

Version 3.6 on November 15, 2015
Recompiled with later versions of PowerBASIC, QuickPDF, and Tesseract. Changed source documentation to Markdown.

Version 3.5 on February 6, 2012
Updated Tesseract utility for OCR. Updated QuickPDF library. Used that library rather than GhostScript to convert from PDF to .tif files for Tesseract. The result is considerably better OCR quality.

Development Notes

I welcome comments and suggestions on PDF2TXT. For the technically curious, I developed it with the PowerBASIC programming language from http://PowerBASIC.com and a couple of third party libraries: EZGUI from http://cwsof.com and QuickPDF from http://QuickPDF.com.

Some free, third-party utilities are included in the PDF2TXT program folder:

pdftotext.exe in the xpdf package
gettext.exe from kryltech.com. Since that website may be down, the license for the program is included in the docs subfolder of the PDF2TXT program folder.
pdf2tag.exe, an adaptation of the pdf2txt.py script included with the PDFMiner library

Some status messages are spoken with the Windows (SAPI) speech engine or with the JAWS, System Access, or Window-Eyes screen reader if active. These direct speech messages are produced with APIs via a component of the SayIt distribution, which is also available separately at https://github.com/JamalMazrui/SayIt.

The PowerBASIC code to PDF2TXT, itself (but not commercial libraries used), is open source under the GNU Lesser General Public License (LGPL), documented at http://gnu.org. Source code files are located in the source subfolder of the PDF2TXT program folder.

Ideas and feedback from the discussion list program-l@FreeLists.org aided the original design and testing of PDF2TXT. The latest installer is available at http://EmpowermentZone.com/PDF2TXT_setup.exe.

You can download it with the Elevate Version hotkey, F11. This checks whether a newer version is available, and offers to install it.


VeryDOC PDF to Text Converter can convert PDF to text either through its software interface or from the command line. This article shows how to use the command line version.

First, download PDF to Text Converter

The download is an .exe file. Double-click it and follow the installation prompts to install it on your computer; the installation may take a few seconds.
When the installation finishes, go to the installation folder and find pdf2txt.exe.

Second, find parameter list and usage

Run pdf2txt.exe in an MS-DOS (Command Prompt) window with the -h or -? switch to see the parameter list.

PDF2TXT <input PDF file> [output TXT file] 
	[-logfile] [-open] [-space] [-html] [-format] [-silent] [-blankline] 
	[-summary] [-zoom <num>] [-?] [-h]
	<input PDF file>: Open an existing PDF file to convert.
	[output TXT file]: Write to TEXT file, the default is same filename of 
	input PDF file.
	[-first <page number>]: Specify the first page number.
	[-last  <page number>]: Specify the last page number.
	[-logfile]: Write log to "C:\pdf2txt.log" file.
	[-open]: Auto open the text file after it be created.
	[-space]: Auto insert spaces into text file.
	[-html]: Output to a HTML file, not a text file.
	[-format]: Keep the page layout in the generated TXT file.
	[-silent]: Disable error and warning messages.
	[-blankline]: Auto delete blank line in the generated TXT file.
	[-summary]: Get PDF document summary.
	[-zoom <num>]: Set zoom ratio, the range is from 50 to 200.
	[-unicode]: Create unicode (UTF-8) encoding text file.
	[-?]: Help.
	[-h]: Help.

Third, convert PDF to text by command line

Run the conversion according to the usage above and the examples below.
Examples
To convert a single PDF to text, pass the input PDF file; you do not need to specify an output file or folder, and the conversion completes in a few seconds. The conversion is done even if no log information is shown.
The command line tool supports wildcard characters for batch conversion, as in the command lines below. To specify the output location, give the output file path at the end of the command line.

C:\>PDF2TXT C:\*.pdf
C:\>PDF2TXT C:\*.pdf C:\*.txt
C:\>PDF2TXT C:\test\*.pdf C:\test\*.txt
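If the converter is called from a script, the options from the parameter list can be assembled programmatically. The Python sketch below only builds the argument list; the function and parameter names are hypothetical, the executable is assumed to be reachable as pdf2txt.exe, and only switches that appear in the help text above (-first, -last, -format, -unicode) are used.

```python
import subprocess
from typing import List, Optional


def verydoc_args(pdf: str,
                 txt: Optional[str] = None,
                 first: Optional[int] = None,
                 last: Optional[int] = None,
                 keep_layout: bool = False,
                 unicode_out: bool = False,
                 exe: str = "pdf2txt.exe") -> List[str]:
    """Build: pdf2txt.exe input.pdf [output.txt] [-first N] [-last N] [-format] [-unicode]"""
    args = [exe, pdf]
    if txt:
        args.append(txt)                 # optional output file
    if first is not None:
        args += ["-first", str(first)]   # first page to convert
    if last is not None:
        args += ["-last", str(last)]     # last page to convert
    if keep_layout:
        args.append("-format")           # keep the page layout
    if unicode_out:
        args.append("-unicode")          # write UTF-8 text
    return args


def run(args: List[str]) -> int:
    """Execute the command and return its exit code."""
    return subprocess.run(args).returncode


if __name__ == "__main__":
    print(verydoc_args(r"C:\test.pdf", first=1, last=3, unicode_out=True))
```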

The tool has more functions than can be listed here; the VeryDOC knowledge base will publish further articles about them. If you have any questions while using it, please contact VeryDOC.



PDF2TXT is an easy-to-use software tool. This user manual will help you get started with the product and gives an overview of the user interface and the command line interface.

Run PDF2TXT to convert PDF to plain text

Run PDF2TXT by double-clicking its icon on the desktop.
Add some files to convert: click the “Add File(s)” button and select a .pdf file.
Note: the destination folder is initially the folder of your source file. You can change it to another one if you want.
Click the “Convert to text” button.
Once the file has been converted, you can open the destination folder by clicking the button with the folder icon (to the right of the destination folder).

Interface Overview

File menu

New conversion. This option clears the list of PDF files to start new conversion.
Exit. Quit the program.

Edit menu

Add File(s). Use this option to add some PDF file to the list;
Remove. Remove selected PDF file from the list;

Convert menu

Convert to text. This command converts PDF files from the list to the text.

Help

Manual – open PDF2TXT manual;
Home Page – visit PDF2TXT home page;
Obtain Support – contact AKS-Labs with support query;
Order Online – order PDF2TXT on-line;
Enter registration code – enter registration code here to unlock the software;
About – open the About window

Toolbar

New conversion. This option clears the list of PDF files to start new conversion;
Add File(s). Use this option to add some PDF file to the list;
Remove. Remove selected PDF file from the list;
Order Online – order PDF2TXT on-line;

File list

The file list contains the PDF files to be converted. It shows each file’s name, size, path, and modification date.

Bottom pane

Save to folder. This text box is to specify the destination folder;
Convert to text. This button converts the PDFs from the list to plain text files;
Folder button. Click this button to open the destination folder;

Command Line Interface for batch conversion

To use PDF2TXT in command line mode, use the following syntax:

pdf2txt.exe source.pdf destination.txt
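Because this syntax converts one file per call, a batch conversion needs a loop around it. The Python sketch below shows one way to do that under stated assumptions: pdf2txt.exe is assumed to be on the PATH, each text file is written next to its PDF, and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def conversion_jobs(folder: str, exe: str = "pdf2txt.exe") -> List[List[str]]:
    """One command per PDF in `folder`: exe source.pdf destination.txt"""
    jobs = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # The destination keeps the base name and swaps the extension.
        jobs.append([exe, str(pdf), str(pdf.with_suffix(".txt"))])
    return jobs


def run_all(folder: str) -> None:
    """Run every job in turn; stops at the first command that fails."""
    for job in conversion_jobs(folder):
        subprocess.run(job, check=True)


if __name__ == "__main__":
    for job in conversion_jobs("."):
        print(" ".join(job))
```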


Protect your privacy and data security (online converter needs uploading).
Convert PDFs in BATCH.
Quickly select source: just drag-and-drop your files.
NO downloading needed, store result in your local PC directly.
Handy converter: run at any time, even no network is okay.
Support Command Line Interface: improve the productivity if you’re good at programming.
Support system level context menu.
Reasonable & affordable license fee, and enjoy FREE lifetime support.
Intuitive, practical and compact interface, genuine and familiar PDF RED.
100% CLEAN: NO Ad, NO Bundle, NO Virus, NO Spyware, just for BETTER.

PDF to Text is used to quickly convert PDF documents to plain txt format files in batch mode. It works without Adobe Acrobat or Adobe Reader, and has Command Line Interface (CLI), fast and accurate conversion ability, friendly interface, small size.

It retains the original text, format and layout (as much as possible) in the output text files.

Also, PDF to Text supports converting PDF files that have restrictions, such as "Content Copying" or "Saving as Text" not being allowed.

It may be useful if you want your document management system to support PDF search, or if you want to extract the text from PDF files.

Supports Deutsch, English, Español, Français, Italiano, Magyar, Nederlands, Português (Brasil), Slovenščina, Türkçe, Русский, 简体中文, 繁體中文, 日本語, 한국어, ไทย.
Get Free License via Translation!

German

PDF to Text supports fast conversion of PDF documents to plain text files. It works without Adobe Acrobat or Adobe Reader and offers a Command Line Interface (CLI), fast and accurate conversion, a friendly GUI, and a small footprint. It leaves the original text, format, and layout unchanged (as far as possible) in the output text file. PDF to Text supports converting PDF files with restrictions such as "Content Copying", "Saving as Text", and "Page Extraction". If your document management system supports PDF search, or you want to extract text from PDF files, this program is helpful.

Main features

Supports a Command Line Interface (CLI).
Fast, accurate, and small, with an appealing user interface.
Works without Adobe Acrobat or Adobe Reader.
Supports fast conversion of PDF documents to plain text files.
Leaves the original text, format, and layout unchanged (as far as possible) in the output text file.
Supports converting PDF to text files in batch mode.
Converts all pages of a PDF file into a single plain text file.
Supports converting PDF files with restrictions such as "Content Copying" or "Saving as Text" not being allowed.

Spanish

PDF to Text is used to quickly convert PDF documents to text files in batches. It works without Adobe Acrobat/Reader and has a command line interface (CLI), fast and accurate conversion, a friendly GUI, and a small size. In addition, PDF to Text supports converting PDF files that have restrictions, such as "Content Copying", "Saving as Text", or "Page Extraction" not being allowed. It may be useful if you want your document management system to support PDF search, or if you want to extract text from PDF files.

Main features

Supports a command line interface (CLI).
Fast, accurate, and small, with a pleasant interface.
Works without Adobe Acrobat or Adobe Reader.
Supports converting PDF documents to plain text files.
Preserves the original text, format, and layout (as far as possible) in the saved files.
Supports converting PDF to TXT in batches.
Converts all pages of a PDF file into one plain text file.
Supports converting PDF files that have restrictions, such as "Content Copying" or "Saving as Text" not being allowed.

Français

PDF to texte est utilisé pour convertir rapidement des documents PDF vers des fichiers de texte brut en mode batch. Il fonctionne sans Adobe Acrobat ni Adobe Reader et a une interface graphique conviviale et une interface en ligne de commande (CLI), capacité de conversion rapide et précise. Il conserve le format et la dispostion du texte original (autant que possible) dans les fichiers convertis en texte brut. En outre, PDF to Text prend en charge la conversion des fichiers PDF qui présentent des restrictions, telles que «Copie du contenu», «Enregistrer comme texte», «Extraction de Page» lorsqu’elles ne sont pas autorisées. Que vous vouliez laisser votre système de gestion de vos documents supporter la recherche de PDF ou que vous vouliez extraire du texte à partir de fichiers PDF, il peut vous être utile.

Principales caractéristiques

Prend en charge l’Interface en ligne de commande (CLI).
Rapide, précis, petit avec une Interface graphique agréable.
Fonctionne sans Adobe Acrobat ni Adobe Reader.
Supporte la conversion de documents PDF en fichiers de texte brut.
Conserve le format et la disposition du texte original (autant que possible) dans les fichiers texte de sortie.
Supporte la conversion de PDF en TXT en mode batch.
Convertit toutes les pages d’un PDF en un seul fichier de texte brut.
Prend en charge la conversion des fichiers PDF qui présentent des restrictions, telles que la «Copie de contenu» ou «Enregistrer comme texte», non autorisées.

Italiano

PDF to Text è in grado di convertire rapidamente i documenti PDF in file di testo normale in gruppi. Senza usare Adobe Acrobat o Adobe Reader, anche con interfaccia da riga di comando (CLI), conversione veloce e precisa, dimensioni ridotte. Mantiene il formato e la struttura del testo originale durante la conversione (per quanto possibile. Inoltre, è possibile convertire i file PDF che hanno restrizioni, come «Copia Contenuto», «Salvataggio come Testo», «Estrazione Pagina» non sono consentiti. Il programma gestisce i documenti PDF, supporta la ricerca ed estrae il testo dai file PDF.

Caratteristiche principali

Supporta interfaccia da Riga di Comando (CLI).
Interfaccia veloce, preciso, piccolo.
Funziona senza Adobe Acrobat o Adobe Reader.
Supporta la conversione dei documenti PDF in file di testo normale.
Mantiene il testo originale, formato e struttura (per quanto possibile) nei file di testo di uscita.
Supporta la conversione da PDF in TXT in gruppi.
Converte tutte le pagine di un file PDF in un unico file di testo semplice.
Supporta la conversione dei file PDF che hanno alcune limitazioni, come ad esempio «Copia dei Contenuti», «Salvataggio come Testo» non sono consentiti.

Magyar

A PDF to Text PDF-dokumentum egyszerű szövegfájlba konvertálására való kötegelt módon. Adobe Acrobat vagy Adobe Reader nélkül és parancssorból (CLI) is használható. Gyors és alapos konvertálást végez. Barátságos GUI, kis méret. Megőrzi a szöveges célfájlban az eredeti szövegformátumot és elrendezést (amennyire lehetséges). Átkonvertálja a korlátozásokat tartalmazó PDF-fájlt is — «Tartalom-másolás», «Mentés szövegként», «Lapkibontás. Ha szeretne a dokumentumkezelő-rendszerrel PDF-keresést végezni vagy zeretné kibontani a szöveget a PDF-fájlból, ez hasznos lehet.

Kulcstulajdonságok

Parancssoros felület (CLI) támogatása.
Gyors, pontos, apró és barátságos felület.
Adobe Acrobat vagy Adobe Reader nélkül használható.
A PDF-dokumentumot egyszerű szövegfájlba konvertálja.
Megőrzi a cél szövegfájlban az eredeti szöveget, formátumot és elrendezést (amennyire lehetséges).
Kötegelten is átkonvertálja a PDF-et szöveggé.
A PDF összes lapját EGYETLEN egyszerű szövegfájllá alakítja.
Átkonvertálja a korlátozásokat tartalmazó PDF-fájlokat — «Tartalom másolása», «Mentés szövegként».

Nederlands

PDF naar tekst gebruik om PDF-documenten te converteren naar lege tekst bestanden in groep mode. Het werkt zonder Adobe Acrobat of Adobe Reader, en heeft Command Line Interface (CLI), snel en nauwkeurig conversievermogen, GUI, klein formaat. Het heeft de oorspronkelijke tekst formaat en weergave (zoveel kan) in de tekstbestanden. Steunt het omzetten van de PDF-bestanden met beperkingen zoals inhoud kopiëren en opslaan als tekst zijn onmogelijk. Als u wilt dat uw documentmanagementsysteem ondersteunt PDF zoeken of wilt u de tekst uit PDF-bestanden uit te pakken, kan het nuttig zijn.

Belangrijkste kenmerken

Ondersteunt Command Line Interface (CLI).
Snel, nauwkeurig, klein, en gebruiksvriendelijke interface.
Werken zonder Adobe Acrobat of Adobe Reader.
Ondersteunt het converteren van PDF-documenten naar platte tekst bestanden.
Behoudt de oorspronkelijke tekst, het formaat en de lay-out (zoveel mogelijk) in de output tekstbestanden.
Ondersteunt het omzetten van PDF naar TXT in groepmode.
Converteert alle pagina’s van een PDF in EEN leeg tekst bestand.
Ondersteunt converteren van PDF-bestanden die restricties hebben zoals “inhoud kopiëren”, “opslaan als tekst” niet toegestaan.

Português (Brasil)

PDF para Texto converte rapidamente documentos PDF para arquivos de texto simples em lote. Ele funciona sem o Adobe Acrobat ou Adobe Reader, e tem uma Interface de Linha de Comandos (CLI), conversão rápida e precisa, interface amigável e tamanho pequeno. Ele mantém o texto original, formato e layout (o máximo possível) nos arquivos convertidos. Além disso, suporta a conversão de arquivos PDF que tenham restrições, quando «Copiar conteúdo», «Salvar como texto» e «Extração de páginas» não são permitidos. Se você quiser deixar o seu sistema de gerenciamento de documentos suporta pesquisa PDF ou deseja extrair o texto de arquivos PDF, ele pode ser útil.

Principais Características

Suporta Interface de Linha de Comandos (CLI).
Rápido, preciso, pequeno e amigável interface.
Trabalha sem o Adobe Acrobat ou Adobe Reader.
Suporta a conversão de documentos PDF para arquivos de texto simples.
Mantém o texto original, formato e layout (o máximo possível) nos arquivos de texto de saída.
Suporta conversão de PDF para TXT no modo de lote.
Converte todas as páginas de um PDF em um arquivo de texto simples.
Suporta a conversão de arquivos PDF que têm algumas restrições, quando «Copiar conteúdo», «Salvar como texto» não são permitidos.

Slovenščina

‘PDF to Text’ se uporablja za hitro, serijsko pretvorbo PDF dokumentov v datoteke z golim besedilom. Dela brez programa Adobe Acrobat/Reader, ima vmesnik ukazne vrstice, hitro in natančno pretvorbo, prijazen GUI, je majhen. Ohrani izvorno besedilo, obliko in postavitev (kolikor je mogoče) v izhodnih TXT datotekah. Podpira tudi pretvorbo PDF datotek z omejitvami za ‘Kopiranje vsebine’, ‘Shrani kot besedilo’ in «Razširjanje strani». Če želite, da vaš sistem za upravljanje dokumentov podpira iskanje PDF ali želite izvleči besedilo iz PDF datotek vam bo to morda koristno.

Ključne funkcije

Podpira vmesnik ukazne vrstice (CLI).
Hiter, natančen, majhen in prijazen vmesnik.
Delo brez programa Adobe Acrobat/Adobe Reader.
Podpira prtvorbo PDF dokumentov v datoteke z golim besedilom.
Ohrani izvorno besedilo, obliko in postavitev (kolikor je to mogoče) v izhodnih besedilnih datotekah.
Podpira serijsko pretvorbo PDF v besedilo (TXT).
Pretvori vse strani v PDF v eno datoteko z navadnim besedilom.
Podpira tudi pretvorbo datotek PDF, ki imajo omejitve kot sta nedovoljeni ‘Kopiranje vsebine’ in ‘Shrani kot besedilo’.

Türkçe

PDF to Text toplu iş modunda PDF dosyalarını hızla düz metin dosyasına çevirmek için kullanılır. Adobe Acrobat veya Adobe Reader olmadan çalışır ve Komut Satırı Arabirimi (CLI), hızlı ve doğru dönüşüm yeteneği, GUI dostu, küçük boyutu vardır. Bu çıktı metin dosyalarında (mümkün olduğunca) orijinal metin, biçimini ve düzenini korur. Ayrıca, Metin PDF böyle «İçerik Kopyalama», «metin olarak kaydetme» gibi bazı kısıtlamalar var PDF dosyalarını dönüştürme destekler, «Sayfa Ekstraksiyon» izin verilmez. Siz belgelerin yönetim sistemi için PDF aramalarını destekler izin verir veya PDF dosyalarından metin ayıklamak istiyorsanız, bu yararlı olabilir.

Temel özellikleri

Komut Satırı Arabirimi (CLI) destekler.
Hızlı, doğru, küçük ve arayüz dostu.
Adobe Acrobat veya Adobe Reader olmadan çalışır.
PDF belgelerini düz metin dosyalarına dönüştürmeyi destekler.
Çıktı metin dosyalarında (mümkün olduğunca) orijinal metin, biçimini ve düzenini korur.
PDF to TXT, toplu modunda dönüştürmeyi destekler.
PDF tüm sayfalarını düz bir metin dosyası içine çevirir.
Bazı kısıtlamalar var PDF dosyalarını dönüştürmeyi destekler, örneğin «metin olarak kaydetme» «İçerik Kopyalama» olarak izin verilmez.

Русский

PDF to Text используется для быстрого преобразования PDF-файлов в текстовые файлы в пакетном режиме. Приложение не требует наличия Adobe Acrobat или Adobe Reader, обладает простым интерфейсом, малым размером и поддерживает работу с командной строкой. В ходе преобразования из PDF в текст, приложение сохраняет исходную структуру текста (по возможности) и поддерживает PDF-файлы с определенной степенью защиты (кроме файлов с защитой содержимого). Данное приложение будет полезно, если вы захотите использовать индексирование PDF-документов в вашей системе или вам требуется получить тексты из PDF-файлов.

Ключевые особенности

Поддерживает работу с командной строкой (CLI).
Поддержка командной строки, дружелюбный интерфейс, малый размер, точность и быстрота.
Работает без необходимости установки Adobe Acrobat или Adobe Reader.
Поддерживает преобразование PDF-файлов в текстовый формат.
Сохраняет исходную структуру текста (по возможности).
Поддерживает преобразование PDF-файлов в текст в пакетном режиме.
Трансформирует все содержимое PDF-документов в простой текстовый файл.
Поддерживает Поддерживает преобразование PDF-файлов с определенной степенью защиты, при этом файлы с защитой содержимого и запретом на копирование текста не поддерживаются.

简体中文

PDF to Text 用来以批量方式快速转换 PDF 文档为纯文本文件。不必安装 Adobe Acrobat 或 Adobe Reader 即可工作，且拥有命令行接口（CLI）、快速而精确的转换能力、友好的图形用户界面（GUI）、小巧等特点。它会在输出文本文件中保留 PDF 文件的原始文本、格式及布局（尽可能多）。此外，支持转换有所限制的 PDF 文件，例如不允许拷贝内容、保存为文本及页面提取的 PDF。如果您打算让您的文档管理系统支持 PDF 查询或提取 PDF 文件中的文字，本产品将会很有用。

主要特色

支持命令行接口（CLI）。
快速、精确、小巧、界面友好。
无须安装 Adobe Acrobat 或 Adobe Reader 即可工作。
支持转换 PDF 文档为纯文本文件。
在输出文本文件中保留 PDF 文件的原始文本、格式及布局（尽可能多）。
支持批量转换 PDF 为文本。
转换 PDF 所有页到一个纯文本文件。
支持转换有所限制的 PDF 文件，例如不允许拷贝内容及保存为文本的 PDF。

繁體中文

PDF to Text 用來以批量方式快速轉換 PDF 文檔為純文字檔。不必安裝 Adobe Acrobat 或 Adobe Reader 即可工作，且擁有命令列介面（CLI）、快速而精確的轉換能力、友好的圖形化使用者介面（GUI）、小巧等特點。它會在輸出文字檔中保留 PDF 檔的原始文本、格式及佈局（盡可能多）。此外，支援轉換有所限制的 PDF 檔，例如不允許拷貝內容、存儲為文本及頁面提取的 PDF。如果您打算讓您的文檔管理系統支援 PDF 搜詢或提取 PDF 檔中的文字，本產品將會很有用。

主要特色

支援命令列介面（CLI）。
快速、精確、小巧、介面友好。
無須安裝 Adobe Acrobat 或 Adobe Reader 即可工作。
支援轉換 PDF 文檔為純文字檔。
在輸出文字檔中保留 PDF 檔的原始文本、格式及佈局（盡可能多）。
支援批量轉換 PDF 為文本。
轉換 PDF 所有頁到一個純文字檔。
支援轉換有所限制的 PDF 檔，例如不允許拷貝內容及存儲為文本的 PDF。

日本語

PDF to Textは、PDFファイルをプレーンテキストファイルに一括変換するために使用されます。 Adobe AcrobatやAdobe Readerがなくても動作し、CLI（コマンドラインインターフェース）、迅速・正確な変換、フレンドリーなGUI、小さいサイズなどの特徴を持っています。オリジナルのテキスト、フォーマット、レイアウトは、出力テキストファイルで可能な限り維持されます。しかし、「コンテンツのコピー」、「テキスト形式で保存」などを制限するPDFファイルには対応していません。文書管理システムが、PDFの検索をサポートするようにしたり、PDFファイルからテキストを抽出したい場合に使用できます。

基本機能

CLI（コマンドラインインターフェース）。
迅速・正確な小さなサイズのフレンドリーなインターフェース。
Adobe AcrobatやAdobe Readerがなくても動作。
PDF文書をテキストファイルに変換。
出力テキストファイルで可能な限りオリジナルのテキストとフォーマット、レイアウトを維持。
PDF -> TXT一括変換をサポート。
PDFのすべてのページを一つのテキストファイルに変換。
「コンテンツのコピー」、「テキスト形式で保存」などを制限するPDFファイルは変換不可。

한국어

PDF to Text는 PDF 파일을 일반 텍스트 파일로 일괄변환하는 데 사용됩니다. Adobe Acrobat이나 Adobe Reader가 없어도 작동하며 CLI(명령 줄 인터페이스), 신속정확한 변환, 친화적인 GUI, 작은 크기 등의 특징을 가지고 있습니다. 원본텍스트와 형식, 레이아웃은 출력 텍스트 파일에서 될수록 유지됩니다. 그러나 «콘텐츠의 복사», «텍스트형식으로 저장» 등을 제한하는 PDF 파일은 지원하지 않습니다. 문서관리시스템이 PDF 검색을 지원하도록 하거나 PDF 파일로부터 텍스트를 추출하려는 경우에 사용할 수 있습니다.

주요 기능

CLI(명령 줄 인터페이스).
신속정확한 작은 크기의 친화적인 인터페이스.
Adobe Acrobat이나 Adobe Reader가 없어도 작동.
PDF 문서를 일반 텍스트 파일로 변환.
출력텍스트 파일에서 가능한 한 원본텍스트와 형식, 레이아웃을 유지.
PDF ->TXT 일괄변환을 지원.
PDF의 모든 페이지를 하나의 일반 텍스트 파일로 변환.
«콘텐츠의 복사», «텍스트형식으로 저장» 등을 제한하는 PDF 파일은 변환 불가능.

ไทย

โปรแกรม PDF to Text ใช้เพื่อแปลงเอกสาร PDF เป็นไฟล์ข้อความธรรมดาในโหมดชุดงาน ทำงานได้โดยไม่ต้องใช้ Adobe Acrobat หรือ Adobe Reader มีอินเตอร์เฟซบรรทัดคำสั่ง รวดเร็ว แม่นยำ ส่วนติดต่อผู้ใช้แบบกราฟิกที่เป็นมิตร และ ขนาดเล็ก มันใช้รูปแบบและเค้าโครงของไฟล์ต้นฉบับมากที่สุดเท่าที่จะเป็นไปได้ นอกจากนี้ยังสนับสนุนการแปลงไฟล์ PDF ที่มีข้อจำกัดย่างเช่นการคัดลอกเนื้อหา ห้ามการบันทึกเป็นข้อความ และ การดึงหน้าออก ถ้าคุณต้องการให้ระบบการจัดการเอกสารของคุณสนับสนุนการค้นหาคำไหนไฟล์ PDF หรือต้องการถอนข้อความออกจากไฟล์ PDF อาจเป็นประโยชน์ คุณลักษณะสำคัญ: รองรับ Command Line Interface (CLI) อินเทอร์เฟซที่รวดเร็ว แม่นยำ ขนาดเล็ก และ ใจง่าย ทำงานโดยไม่ใช้ Adobe Acrobat หรือ Adobe Reader รองรับการแปลงเอกสาร PDF เป็นไฟล์ข้อความธรรมดา รักษารูปแบบและเค้าโครงจากต้นฉบับในไฟล์เอาต์พุต รองรับการแปลงไฟล์ PDF เป็น TXT แบบชุดงาน แปลงไฟล์ PDF ที่มีหลายหน้าเป็นไฟล์ TXT ไฟล์เดี่ยว สนับสนุนการแปลงไฟล์ PDF ที่มีข้อจำกัดบางอย่างเช่น ห้ามคัดลอกข้อความ หรือ ห้ามบันทึกเป็นข้อความ

Version Release / Update Date	Features / Improvements	Bug Fixes
16.0 Jul 24, 2021	Important (I): greatly optimized the source code to improve conversion efficiency. Optimized internal efficiency.	I: when scrolling the vertical/horizontal scroll bar, the «Open PDF/Result» buttons associated with the selected row were badly misplaced. If the computer used a non-Gregorian calendar (such as the Buddhist calendar, Islamic calendar, etc.), a valid license could be judged as expired.
15.1 May 09, 2021	The <Add PDF files…> button now gets focus when the program starts (friendlier). Optimized internal efficiency.	Important (I): the controls associated with the selected row could be misplaced when resizing the main window. I: piracy was falsely reported on a very small number of computers.
15.0 Apr 25, 2021	Important (I): greatly improved the converting speed. I: can sort the [Size] column accurately. I: greatly improved the opening speed for each interface. Can display program logo in system «Apps & features» list. Optimized Turkish texts. Optimized the official website. Optimized internal efficiency.	I: the list no longer flickers when adding/removing PDF files in batch.
14.0 Dec 22, 2020	Important (I): can display the converting progress, previous versions only show you a prompt — «Converting now, please wait…». I: can display the up/down arrows on the sorted column header. Can open the PDF location(s) by context menu. Can open the output location(s) by context menu.
13.1 Nov 15, 2020	Important (I): made the main interface resizable. Optimized the self-service feature «Lost license key?». Optimized internal efficiency.	I: the system context menu was not completely removed after uninstalling.
13.0 Nov 10, 2020	Important (I): can now handle Unicode PDF paths/names and output paths in any language, such as Hindi, Korean, Russian, etc. Previously, it could only process ASCII paths/names and your local Unicode characters, and processing local Unicode characters required setting «system locale» to your local language. Optimized the data access algorithm for the main list. Now shows the update history directly if a new version is available and you choose to view the updates (previous versions only jumped to the ‘Update History’ segment without expanding it, so you had to expand it manually). Removed the «Add to Quick Launch Bar» option from the installer, because it has had no meaning since Windows Vista.	I: the licensed copy could be judged as a trial version in the CLI if you called the CLI before the verification process had finished.
12.1 Nov 04, 2020	Optimized the license verification module. Optimized internal efficiency.	Important: the file-viewing icons were not displayed properly when adjusting the corresponding column size.
12.0 May 11, 2020	Important (I): plays a sound when finished, and you can set your favorite sound. I: can automatically create the custom output path if it does not exist. I: optimized the interface: neater and more unified. I: optimized the software texts for all 16 languages. Added an «Update history» function to the «Help» menu. Center-aligned the icons of the 3 picture buttons in the upper right corner. Can display the update history for Traditional and Simplified Chinese. Optimized internal efficiency.
11.0 Jan 21, 2019	Important (I): supports a system-level context menu. I: fully supports native ไทย. Uses the exact Golden Ratio for the size of the main interface. Optimized «Request support…» related features. Can remove the whole PTT program folder after uninstalling. Changed the «Get/retrieve your license (free)» menu item to «Lost license key?» to eliminate ambiguity. Changed the «Microsoft Internet Explorer» string to «Microsoft Web Browser» for consistency, because Windows 10 uses Microsoft Edge. Optimized internal efficiency.
10.0 Aug 18, 2017	Important (I): fully supports native Slovenščina. I: no longer asks you to view the result(s) if no conversion succeeded. Optimized the Redistributable CLI. Optimized the appearance of the help menu. Optimized software texts.	I: on a non-English OS, you might see unreadable non-English text during installation. I: in a few languages, clicking the «Help -> Command line interface» menu item went to a non-existent CLI documentation segment. Fixed a few text errors for the JA, KO and NL languages.
9.0 Jan 22, 2017	Important: can remember the output settings. Important: you can add PDF files by double-clicking a blank area of the listbox, a new, friendlier design. Important: added icons to the context menu of the listbox. Added the «Video tutorial» menu item to the «Help» button. Optimized official webpages. Uses a dedicated purchase page for PTT, simpler and clearer.	Important: if the PDF's text contained fewer than 800 characters, you might get an error message during conversion (even though it was converted successfully). In the trial version, clicking the <Convert all> button showed the limitation and asked you to buy a license, but the conversion still ran if you clicked <Cancel> in the prompt dialog box.
8.0 Oct 08, 2016	Added the «Information» icon to the «About…» interfaces of the CLI and Redistributable CLI, making them friendlier. Optimized the appearance of the main interface: it no longer displays surplus blank space. Optimized the «view output/PDF» icons (they sat lower than the others in previous versions). Optimized the official website. Optimized the -? argument for the CLI: it now points to an existing segment of the help document. Optimized the installer and simplified the installation steps. Changed the trial limitation from «16 days» to «800 characters».	Fixed a latent bug: pressing <Enter> in system dialog boxes could open the selected PDF file.
7.0 May 22, 2016	Important: the list on the main interface supports sorting data. Important: fully supports native 日本語. Important: fully supports native 한국어. Optimized the English texts of the software (interfaces and message boxes), in particular standardizing initial capitals. Added the proper icon to all system message boxes, all of which now support pressing <Esc> to close. Optimized some texts for the Italiano and Deutsch languages. Optimized the web pages of PDF to Text.	Important: if you selected Español during installation, the interface switched to 繁體中文 on first run.
6.0 May 22, 2016	Important: fully supports native Русский. Important: fully supports native Nederlands. Important: uses the Segoe UI font, which is used in Windows Vista through Windows 10 and recommended by Microsoft, for all interfaces, making PTT more attractive and friendlier. Optimized the color of the license type and «license to…» labels in the About interface. Changed the splitter color in the About interface from black to gray to make it friendlier. Changed the format of all screenshots on the website from GIF to PNG, so that you see the original, clearest appearance of the product. Optimized the text format of the license agreement in the installer to make it easier to read. No longer uses bold text on the main interface; replaced it with colored text, which is more readable.
5.0 Apr 01, 2016	Important: fully supports native Español. The [Size] column supports comma style. Optimized internal efficiency. Disabled the automatic new-version check in the PTT Redistributable CLI, to avoid confusing your end users. Optimized the «request remote support» feature: it links to the company site directly, to avoid a second mouse click and page jump. Uses the new company logo on the «About…» interface. Optimized some texts for the current 9 languages. Changed Français (France) to Français, because it is standard Français and usually does not need a country specified.
4.1 Dec 15, 2015	Important: fully supports native Italiano.	Fixed a few text errors in the license agreement of the installer. Fixed a few errors in the Français (France) and Magyar texts.
4.0 Oct 07, 2015	Important: perfected UI effect for any system DPI setting, in any supported Operating Systems (system DPI setting: in Windows 7, it can be changed at «Control Panel -> Appearance and Personalization -> Display -> Make text and other items larger or smaller»). Optimized two English texts.
3.6 Sep 09, 2015	Important: supports Windows 10. Important: fully supports Türkçe. Optimized some Deutsch texts. Optimized a Traditional Chinese text.	Press <Enter> on system dialog boxes will open the selected PDF file in the list. The texts of Traditional Chinese installer are too small in Windows 10.
3.5 Aug 17, 2015	Important: fully supports Deutsch.
3.4 Aug 16, 2015	Important: fully supports native Français (France). Optimized one Traditional Chinese language item.
3.3 Aug 07, 2015	Important: fully supports native Magyar. Optimized English and Português (Brasil) texts. Optimized the user interfaces: unified font name and size, adjusted the size and location of some controls. Uses balloon tooltips. Added company logo to the «About…» interface, and links to company homepage.	The right border of «Output path» is covered. In the extreme case, the language choosing menu may cause a serious internal error.
3.2 Jul 20, 2015	Important: supports Command Line Interface (CLI). Important: supports multiple languages. Important: fully supports native Portuguese (Brazil), 简体中文 and 繁體中文. Important: the installer now supports multiple languages too and lets you choose one during installation; PTT then directly uses the language selected during installation. Important: uses the Golden Ratio for the main interface and its start position. Important: supports shortcut keys for all operations on the main interface. Supports opening the selected PDF by pressing the <Enter> key. Moved all actions that need the default email client to online webpages, to avoid confusion for users without a default email client. Optimized the source code to improve efficiency. Shortened the executable file name from «PDF to Text.exe» to «PTT.exe», to make command-line calls easier. Optimized the location of the drop-down menus on the main interface. Allows you to run multiple instances. Optimized the user interface. Opened the translation interface for the multi-language version. Published as shareware (because it now supports the Command Line Interface (CLI)).	Important: double-clicking an item to open the corresponding PDF file did not work.
3.1 Aug 15, 2013	Supports Unicode filenames and paths.
3.0 Nov 24, 2012	Published as freeware. Supports dragging and dropping PDF files onto the list.	No longer stops adding files when a duplicate PDF file is selected; it just skips the duplicate.
2.2 Aug 12, 2012	Supports converting PDF files with restrictions, such as when «Content Copying», «Saving as Text», or «Page Extraction» are not allowed.
2.1 Feb 22, 2012	Supports viewing the output plain text files directly after converted.
2.0 Oct 10, 2011	Supports opening the original PDF files in the list. Gives prompt if there are converted files.
1.2 Mar 29, 2011	Supports customizing the output path.
1.1 Nov 05, 2010	Adds <Open Result> button to the selected item if it has already been converted.
1.0 Jun 08, 2010	New release.


Official page: https://www.pdf-helper.com/pdf-to-text/
Direct download: https://www.pdf-helper.com/files/pdf-to-text.zip

amazingly fast

Installed on Windows 7 machine. Registered fine with no problems. Tested on a multi-page text only pdf. Was amazingly fast and opened in notepad. I often have need of portions of articles for research. Nice little program.

By wyndham @ Feb 22, 2020

Accurate and fast converter

Accurate and fast converter. Appreciate this software, thank you very much.

By David Roper @ Oct 12, 2019

actually works well

This PDF converter is one that actually works well. The box to drop PDF files into is easy to use, too. The default destination for the changed TXT files is exactly where you had the PDF files. You can also change the destination easily if desired. There is no reason NOT to get this GEM for your tool box. It simply WORKS. Finally we have one to use.

By nameshaker @ Sep 12, 2018

Interesting software

Interesting software, thanks!

By gerrymar @ Nov 29, 2017

This is a great little program and does what it claims.

This is a great little program and does what it claims. Editing in text is now simple, and from this stage the options for converting to your favorite format are endless. This is a keeper.

nice app

Installed and registered without problems on a Win 10 Pro 64-bit system. A small interface opens; you can choose among several languages, add a PDF, and convert it. It does what it claims with single-column text, and also with multi-column text. A small utility for quick PDF to TXT conversion of simple PDF structures, a useful little helper.

By fatherted @ Jun 21, 2015

Another great converter

Another great converter, just what I needed to add to my collection.

By Paulo Neto (BR) @ Feb 15, 2014

That’s great!

I found in your PDF to TXT program something that others do not have: a faithful conversion of a PDF file. For example, I have a file with two columns of text (journal type), and the conversion kept the two columns. That’s great!

By ILoveFreeSoftware.com @ Sep 27, 2013

neat and clean PDF converter

Overall, this PDF to Text Converter offers a simple and easy way to allow editing of text through a text editor. It also supports PDF files with restrictions, such as those that do not allow ‘Content Copying’ and ‘Saving as Text’. It is a neat and clean PDF converter that provides basic, yet excellent functionality and ease of use.

By Softpedia.com @ Jul 26, 2012

Convert multiple PDF files to plain text format with the aid of batch processing operations offered by this handy piece of software

PDF to Text is a small software application whose purpose is to help you convert PDF files to plain text file format using batch processing operations.
User-friendly layout
The tool implements an intuitive behavior, so even less experienced users can easily discover and tweak its functions. Files can be added in the working environment using the built-in browse button or drag-and-drop support.
What’s more, you can view information about each PDF, such as file path, size, and status, remove the selected items from the lists, and clear the entire workspace with just one click.
Conversion options
PDF to Text gives you the possibility to save the converted items to the source folder or specify a user-defined saving directory. It is important to mention that the program is not able to process password-protected PDF files.
Additionally, the application asks you if you want to check out the output directory at the end of the task. It provides support for batch operations, which means you can process multiple files at the same time.
Performance
Since it doesn’t require much computer knowledge to set up the dedicated parameters, you can learn to master the process in no time. During our testing we have noticed that PDF to Text carries out a task quickly and provides very good output results. It doesn’t eat up a lot of CPU and memory, so the overall performance of the computer is not hampered.
Bottom line
All things considered, PDF to Text offers a straightforward software solution and comes bundled with basic features for helping you convert PDF files to plain text file format using batch processing operations.

By BitsDuJour.com @ Nov 09, 2011

Convert PDF Documents to Plain Text Files

The PDF format is great for communicating documents but sometimes you just need to work with the text. Copying and pasting sometimes works, and sometimes produces a load of gibberish. The best way to get your hands on the text of a PDF file is by using today’s discount software promotion, PDF to Text!
PDF to Text lets you change a PDF document to plain text file, with support for the conversion of multiple files in batch. With PDF to Text, you’ll be able to get at the core text, which you can then use in other applications. Plus, you’ll be pleased to know that, for multiple-page PDF files, one conversion will turn all of those pages into a single plain text file, so there’s no need to change each page individually. There’s even support for command line interface input.
You’ll find that PDF to Text is an invaluable tool for working with PDF files that have restrictions on “Save as Text” and content copying. Just load up the PDF, hit a button, and then go on your way with the text that you need!


OCR to Any Converter Command Line

OCR software is used to make the text of a scanned document accessible.
Essentially, OCR software identifies text characters to make the document
searchable and editable. To use OCR software, you simply scan a document and
run the OCR. The process is fully automatic and only takes seconds, leaving you
with a completely searchable and editable document.

OCR to Any Converter Command Line is a Windows Command Line (Console)
application that batch converts scanned PDF, TIFF and image files
(JPEG, JPG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM) to editable Word, Excel,
CSV, HTML, TXT, Pure Text Layer PDF, Invisible Text Layer PDF and other formats.
It includes a Table Recovery Engine: table contents in scanned PDF, TIFF and
image files can be recognized as table objects and inserted into Word, Excel,
HTML, Text, CSV and other formats.

OCR to Any Converter Command Line is the best command line software for OCR
recognition. OCR to Any Converter Command Line has been generally recognized as
the most accurate English OCR program, and it also supports OCR in over 60 other
languages. OCR to Any Converter Command Line can conveniently be run through the
command line, if that is what you prefer, so you have the flexibility that you
need. To test it out, download the free trial version of OCR to Any Converter
Command Line and you can begin using it right away.

https://veryutils.com/dl.php/ocr2any_cmd.zip

OCR Console: Integration of text recognition, barcode recognition and
scanner support into your applications without programming effort. Ideal for
server applications, repetitive tasks and embedded systems.

Efficient conversion of business-critical documents
Convert image documents and PDF files into editable digital formats, directly
from a scanner or from image files. Save them in a multitude of formats: DOCX,
PDF, XML and many more. Ready for editing, sharing and archiving.

Batch conversion with automatic document separation
Divide batches of scanned, multi-page documents into individual documents by the
number of pages or by barcodes or separation words. The ideal preparation for
later archiving.

Fully automatic, embeddable processing without interaction
Add OCR functionality to existing business solutions. Automatically call
parameterized processes from within your document workflow. Ideal for unattended
processing on servers and within batch scripts.

Highest accuracy

OCR Console is based on the best OCR engine and delivers the most
precise results.
Highly accurate recognition and layout-preserving formatting, including
tables, numbering and graphics.

Easy integration

Developed as command line program for easy integration
Quick embedding without programming knowledge
No user interaction required, fully automatable
One-click processing with user-defined batch files

High Flexibility

Control over all functions with command parameters
Compatible with almost all scanners
Conversion of scanned documents and PDF files
Recognition of barcodes (1D and 2D)
Renaming of files according to recognized key words or barcodes
Batch division

Output formats for every scenario

PDF as platform independent document standard
PDF/A 1-3 for long-time archival
TIFF and JPEG for space-saving storage
DOCX, XLSX, PPTX, RTF, TEXT and further Office formats for later editing

OCR to Any Converter Command Line supports the following command line options:

X:\ocr2any_cmd>ocr2any.exe
VeryPDF OCR to Any Converter Command Line v5.3
-------------------------------------------------------
Description:

Convert text based PDF files to plain text files.
Convert scanned PDF files and image files to plain text files and searchable
PDF files by OCR technology.
Convert embedded fonts in PDF file to a new searchable PDF file.
Keep color during PDF, TIFF and image formats to searchable PDF files
conversion.
Deskew, Despeckle and Noise Removal, Auto-Orientation, Dithering, Black
Border Removal.
Use Enhanced OCR Technology to convert Scanned PDF, TIFF and image files to
RTF, DOC, TXT, CSV, Excel, HTML formats.
Create MS Excel document in several layouts.
PDF to Excel Converter: Convert tables from PDF and image files to Microsoft
Excel spreadsheets.
PDF to HTML Converter: Convert your PDFs to high quality reflowed HTML while
preserving styles, tables, etc.
Table Recovery: Superior reconstruction of bordered and borderless tables as
table objects, with formatting, in Word & HTML.

Input formats:

Text based PDF files
Scanned PDF files
Scanned single page and multi-page TIFF files
Scanned JPEG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM files

Output formats:

Plain text files without layout
Plain text files with layout
Plain text based PDF files (PDF contains text only)
Attach OCRed text layer to original PDF file
OCRed BW PDF files with hidden text layer
OCRed Color PDF files with hidden text layer
OCRed Grayscale PDF files with hidden text layer
Output to TIFF, PNG, BMP, TGA, GIF with Deskew, Despeckle, etc. options
Scanned PDF, TIFF and image files to RTF format
Scanned PDF, TIFF and image files to DOC format
Scanned PDF, TIFF and image files to Tab Text format
Scanned PDF, TIFF and image files to CSV format
Scanned PDF, TIFF and image files to MS Excel format
Scanned PDF, TIFF and image files to HTML format
Extract X1, Y1, X2, Y2 coordinates for each character
Extract X1, Y1, X2, Y2 coordinates for each Word

-------------------------------------------------------
Usage: ocr2any.exe [options] [PDF-file] [Text-file]

-firstpage [int]   : first PDF page to convert

-lastpage [int]    : last PDF page to convert

-res [int]         : set resolution, the unit is DPI (default is 300 dpi)

-ownerpwd [string] : set owner password for encrypted PDF file

-userpwd [string] : set user password for encrypted PDF file

-layout            : maintain original physical layout

-layout2           : pdf to table conversion with Best Column Alignment

-table             : same as -layout2

-pdf2table         : same as -layout2

-noc               : don’t insert page breaks 0x0C between pages in text file

-bitcount [int]    : set color depth when rendering PDF page to image data, it can be set to 1, 8 or 24, default is 8 bit

-rotate [int]      : rotate pages before OCR

-threshold [int]   : lightness threshold used to convert image to B&W, from 1 to 255, 0 is auto, default is -1

-imageopt          : deskew and despeckle images automatically

-dither [int]      : convert the color image to B&W using the desired method:
    -dither 0: Floyd-Steinberg
    -dither 1: Ordered-Dithering (4×4)
    -dither 2: Burkes
    -dither 3: Stucki
    -dither 4: Jarvis-Judice-Ninke
    -dither 5: Sierra
    -dither 6: Stevenson-Arce
    -dither 7: Bayer (4×4 ordered dithering)

-resizewidth [int] : resize the image’s width, only available when -resizeheight is used

-resizeheight [int]: resize the image’s height, only available when -resizewidth is used

-scaleimage [int]  : scale the image in percent before OCR, e.g., -scaleimage 200

-flip              : flip the image vertically

-mirror            : mirror the image horizontally

-ocr               : enable OCR function for scanned PDF file

-lang [string]     : choose the language for OCR engine

-ocrmode [int]     : set OCR mode
    -ocrmode 0: output to text file
    -ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
    -ocrmode 2: output to plain text based PDF file
    -ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
    -ocrmode 4: output to OCRed PDF file (Color) with hidden text layer

-text [string]     : add additional text at end of each text page, this parameter supports the following variables:
    %PageNumber%: current page number
    %PageCount% : total page count of PDF file

-outboxfile        : output [X, Y, Width, Height] information for each word during OCR

-producer [string] : Set ‘producer’ to output PDF file

-creator [string] : Set ‘creator’ to output PDF file

-subject [string] : Set ‘subject’ to output PDF file

-title [string]    : Set ‘title’ to output PDF file

-author [string]   : Set ‘author’ to output PDF file

-keywords [string] : Set ‘keywords’ to output PDF file

-ownerpwdout [string]: Set ‘owner password’ to output PDF file

-openpwdout [string] : Set ‘open password’ to output PDF file

-keylen [int]      : Key length (40 or 128 bit)
    -keylen 0: 40 bit RC4 encryption (Acrobat 3 or higher)
    -keylen 1: 128 bit RC4 encryption (Acrobat 5 or higher)
    -keylen 2: 128 bit RC4 encryption (Acrobat 6 or higher)

-encryption [int]   : Set restrictions to PDF file
    -encryption    0: Encrypt the file only
    -encryption 3900: Deny anything
    -encryption    4: Deny printing
    -encryption    8: Deny modification of contents
    -encryption   16: Deny copying of contents
    -encryption   32: No commenting
===128 bit encryption only — ignored if 40 bit encryption is used
    -encryption 256: Deny FillInFormFields
    -encryption 512: Deny ExtractObj
    -encryption 1024: Deny Assemble
    -encryption 2048: Disable high res. printing
    -encryption 4096: Do not encrypt metadata

-ocr2                : use enhanced OCR module to convert scanned PDF and image files to PDF, RTF, DOC, TXT, XLS, CSV, Excel, HTML files

-ocr2aor             : detect page direction and rotate it automatically when -ocr2 used

-ocr2autorotate      : same as -ocr2aor

-ocr2excelmode [int] : set output Excel format when -ocr2 used
    -ocr2excelmode 0: One big sheet + All page sheets
    -ocr2excelmode 1: All page sheets
    -ocr2excelmode 2: One big sheet, default mode

-dumpcharpos        : Output to a Text file with coordinates for each character

-dumpwordpos        : Output to a Text file with coordinates for each word

-outputformat [int] : the format of output document, default is controlled by extension name
    -outputformat    1: output to RTF format
    -outputformat    2: output to ASCII format
    -outputformat    3: output to ASCIILB format, center some text lines
    -outputformat    4: output to 123V2 format
    -outputformat    5: output to AMIPRO1_2 format
    -outputformat    6: output to COMMAASCII format
    -outputformat    7: output to EXCELV2 format
    -outputformat    8: output to SMARTASCII format
    -outputformat    9: output to WORDWIN format, same as RTF format
    -outputformat   10: output to WP50 format
    -outputformat   11: output to WP51 format
    -outputformat   12: output to NATIVE format
    -outputformat   13: output to NATIVE_TEXT format
    -outputformat   14: output to TABASCII format
    -outputformat   15: output to HTML format
    -outputformat 8888: output to plain text based PDF format
    -outputformat 8889: output to plain text file with original layout
    -outputformat 8890: output to plain HTML file with absolute position
    -outputformat 8891: output to CSV file with perfect columns

-outfmt [int]      : same as -outputformat

-gendebugimage     : Generate debug image file

-delblankpages     : Delete blank pages from PDF file

-linewidth [int]   : Remove black borders which width less than this
value, default is 8

-specklesize [int] : Remove the speckles which size less than this value,
default is 20

-$ [string]        : input your License Key
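
The -encryption values listed above look like bit flags: the "Deny anything" value 3900 is exactly the sum of the eight individual deny values (4, 8, 16, 32, 256, 512, 1024 and 2048). The Python sketch below only illustrates that arithmetic. The constant names are hypothetical, and whether ocr2any.exe accepts arbitrary combinations of these values is an assumption that is not confirmed by the option list.

```python
# Restriction values copied from the -encryption option list.
# The constant names are made up for this illustration.
DENY_PRINTING = 4
DENY_MODIFICATION = 8
DENY_COPYING = 16
NO_COMMENTING = 32
DENY_FILL_IN_FORM_FIELDS = 256
DENY_EXTRACT_OBJ = 512
DENY_ASSEMBLE = 1024
DISABLE_HIGH_RES_PRINTING = 2048


def combine(*flags):
    """Combine individual restriction values with a bitwise OR."""
    value = 0
    for flag in flags:
        value |= flag
    return value


deny_anything = combine(
    DENY_PRINTING,
    DENY_MODIFICATION,
    DENY_COPYING,
    NO_COMMENTING,
    DENY_FILL_IN_FORM_FIELDS,
    DENY_EXTRACT_OBJ,
    DENY_ASSEMBLE,
    DISABLE_HIGH_RES_PRINTING,
)
print(deny_anything)  # 3900, the documented "Deny anything" value
```

If combinations are accepted, a value such as combine(DENY_PRINTING, DENY_COPYING) would be passed as -encryption 20. Test this against the trial version before relying on it.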

OCR to Any Converter Command Line Examples:
ocr2any.exe C:\in.pdf C:\out.txt
ocr2any.exe -firstpage 1 -lastpage 1 C:\in.pdf C:\out.txt
ocr2any.exe -ocr -res 300 C:\in.pdf C:\out.txt
ocr2any.exe -ownerpwd 123 -userpwd 456 C:\in.pdf C:\out.txt
ocr2any.exe -layout C:\in.pdf C:\out.txt
ocr2any.exe -layout2 C:\in.pdf C:\out.txt
ocr2any.exe -table C:\in.pdf C:\out.txt
ocr2any.exe -pdf2table C:\in.pdf C:\out.txt
ocr2any.exe -noc C:\in.pdf C:\out.txt
ocr2any.exe C:\in.tif C:\out.txt
ocr2any.exe C:\in.jpg C:\out.txt
ocr2any.exe C:\in.bmp C:\out.txt
ocr2any.exe C:\in.png C:\out.txt
ocr2any.exe -ocr -lang eng C:\in.pdf C:\out.txt
ocr2any.exe -ocr -lang eng+kor C:\in.pdf C:\out.txt
ocr2any.exe -ocr -lang eng+jpn C:\in.pdf C:\out.txt
ocr2any.exe -ocr -bitcount 1 C:\in.pdf C:\out.txt
ocr2any.exe -ocr -bitcount 8 C:\in.pdf C:\out.txt
ocr2any.exe -ocr -bitcount 24 C:\in.pdf C:\out.txt
ocr2any.exe -ocr -lang deu C:\in.pdf C:\out.txt
ocr2any.exe -lang deu C:\in.tif C:\out.txt
ocr2any.exe -text "PageText %PageNumber% of %PageCount%" C:\in.pdf C:\out.txt
ocr2any.exe -subject "subject" C:\in.pdf C:\out.pdf
ocr2any.exe -ownerpwdout 123 -keylen 2 -encryption 3900 C:\in.pdf C:\out.pdf
ocr2any.exe -subject "subject" -title "title" C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang eng -ocrmode 0 C:\in.pdf C:\out.txt
ocr2any.exe -ocr -lang deu -ocrmode 1 C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang eng -ocrmode 2 C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang eng -ocrmode 3 C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang eng -ocrmode 2 -outboxfile C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang fra -ocrmode 1 C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang ita -ocrmode 1 C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang nld -ocrmode 1 C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang spa -ocrmode 1 C:\in.pdf C:\out.pdf
ocr2any.exe -bitcount 24 -ocrmode 4 -ocr C:\in.pdf C:\out.pdf
ocr2any.exe -bitcount 8 -ocrmode 4 -ocr C:\in.pdf C:\out.pdf
ocr2any.exe -ocrmode 4 -ocr C:\in.tif C:\out.pdf
ocr2any.exe -ocrmode 3 -threshold 200 -ocr C:\in.tif C:\out.pdf
ocr2any.exe -ocrmode 4 -rotate 90 -ocr C:\in.tif C:\out.pdf
ocr2any.exe -ocr -lang jpn -ocrmode 4 -bitcount 24 -threshold 240 -res 200 C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang chi_sim -ocrmode 4 -threshold 240 -res 200 C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang chi_tra -ocrmode 4 -threshold 240 -res 200 C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang chi_sim+eng -ocrmode 4 -threshold 240 -res 200 C:\in.pdf C:\out.pdf
ocr2any.exe -ocr -lang chi_sim+deu -ocrmode 4 -threshold 240 -res 200 C:\in.pdf C:\out.pdf
ocr2any.exe -delblankpages D:\test.pdf D:\out.pdf
ocr2any.exe -delblankpages -linewidth 8 D:\test.pdf D:\out.pdf
ocr2any.exe -delblankpages -specklesize 20 D:\test.pdf D:\out.pdf
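
According to the -noc description, the converter inserts a page break character 0x0C (form feed) between pages in the text file unless -noc is given. A text file produced without -noc can therefore be split back into pages. The Python sketch below shows one way to do that. The function name and the sample string are hypothetical, and whether the tool also writes a form feed after the last page is an assumption, so the function tolerates both cases.

```python
def split_pages(text):
    """Split extracted text into pages on the form feed character (0x0C)."""
    pages = text.split("\f")
    # A trailing form feed leaves one empty string at the end; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


# Made-up sample standing in for the contents of an output text file.
sample = "Page one text\n\fPage two text\n\f"
pages = split_pages(sample)
print(len(pages))  # 2
```

To use it on a real file, read the output text file into a string first and pass that string to split_pages.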

Use Enhanced OCR options:
ocr2any.exe -ocr2 -ocr2aor C:\in.tif C:\out.rtf
ocr2any.exe -ocr2 -ocr2aor C:\in.tif C:\out.doc
ocr2any.exe -ocr2 -ocr2aor C:\in.tif C:\out.xls
ocr2any.exe -ocr2 -ocr2aor C:\in.pdf C:\out.rtf
ocr2any.exe -ocr2 -ocr2aor C:\in.pdf C:\out.doc
ocr2any.exe -ocr2 -ocr2excelmode 0 C:\in.pdf C:\out.xls
ocr2any.exe -ocr2 -ocr2excelmode 1 C:\in.pdf C:\out.xls
ocr2any.exe -ocr2 -ocr2excelmode 2 C:\in.pdf C:\out.xls
ocr2any.exe -ocr2 C:\in.pdf C:\out.doc
ocr2any.exe -ocr2 C:\in.pdf C:\out.rtf
ocr2any.exe -ocr2 C:\in.png C:\out.xls
ocr2any.exe -ocr2 C:\in.tif C:\out.csv
ocr2any.exe -ocr2 C:\in.bmp C:\out.txt
ocr2any.exe -ocr2 C:\in.gif C:\out.htm
ocr2any.exe -ocr2 C:\in.pdf C:\out.html
ocr2any.exe -ocr2 D:\temp\*.pdf D:\temp\*.html
ocr2any.exe -ocr2 D:\temp\*.pdf D:\temp\*.doc
ocr2any.exe -ocr2 C:\in.pdf C:\out.rtf
ocr2any.exe -ocr2 -lang deu C:\in.pdf C:\out.doc
ocr2any.exe -ocr2 -lang deu C:\in.pdf C:\out.xls
ocr2any.exe -ocr2 -dumpcharpos C:\in.pdf C:\out.txt
ocr2any.exe -ocr2 -dumpwordpos C:\in.pdf C:\out.txt
ocr2any.exe -ocr2 -dumpcharpos C:\in.pdf C:\out.rtf
ocr2any.exe -ocr2 -dumpwordpos C:\in.pdf C:\out.rtf
ocr2any.exe -ocr2 C:\in.pdf C:\text.pdf
ocr2any.exe -ocr2 C:\in.tif C:\out.pdf
ocr2any.exe -ocr2 C:\in.png C:\out.pdf
ocr2any.exe -ocr2 C:\in.jpg C:\out.pdf
ocr2any.exe -ocr2 C:\in.tif C:\out.doc
ocr2any.exe -ocr2 C:\in.tif C:\out.rtf
ocr2any.exe -ocr2 C:\in.tif C:\out.txt
ocr2any.exe -ocr2 C:\in.tif C:\out.xls
ocr2any.exe -ocr2 -ocr2autorotate C:\in.tif C:\out.pdf
ocr2any.exe -ocr2 -ocr2autorotate C:\in.tif C:\out.doc
ocr2any.exe -ocr2 -outputformat 1 C:\in.tif C:\out.rtf
ocr2any.exe -ocr2 -outputformat 2 C:\in.tif C:\out.txt
ocr2any.exe -ocr2 -outputformat 3 C:\in.tif C:\out.txt
ocr2any.exe -ocr2 -outputformat 6 C:\in.tif C:\out.txt
ocr2any.exe -ocr2 -outputformat 7 C:\in.tif C:\out.xls
ocr2any.exe -ocr2 -outputformat 8 C:\in.tif C:\out.txt
ocr2any.exe -ocr2 -outputformat 9 C:\in.tif C:\out.doc
ocr2any.exe -ocr2 -outputformat 13 C:\in.tif C:\out.txt
ocr2any.exe -ocr2 -outputformat 14 C:\in.tif C:\out.txt
ocr2any.exe -ocr2 -outputformat 15 C:\in.tif C:\out.html
ocr2any.exe -ocr2 -dumpcharpos -dumpwordpos -outputformat 8888 C:\in.tif C:\out.pdf
ocr2any.exe -ocr2 -dumpcharpos -dumpwordpos -outputformat 8889 C:\in.tif C:\out.txt
ocr2any.exe -ocr2 -dumpcharpos -dumpwordpos -outputformat 8890 C:\in.tif C:\out.html
ocr2any.exe -ocr2 -dumpcharpos -dumpwordpos -outputformat 8891 C:\in.tif C:\out.csv
ocr2any.exe -ocr2 -scaleimage 200 -threshold 240 D:\in.tif D:\out.txt
ocr2any.exe -ocr2 -scaleimage 300 D:\in.tif D:\out.txt

Process image files with Deskew, Despeckle and Noise Removal, Black Border
Removal options:
ocr2any.exe -imageopt C:\in.tif C:\out.tif
ocr2any.exe -imageopt -rotate 45 C:\in.png C:\out.tif
ocr2any.exe -imageopt -rotate 90 C:\in.png C:\out.tif
ocr2any.exe -imageopt -threshold 0 C:\in.tif C:\out.bmp
ocr2any.exe -threshold 240 C:\in.tif C:\out.bmp
ocr2any.exe -dither 0 C:\in.bmp C:\out.png
ocr2any.exe -dither 7 C:\in.bmp C:\out.png
ocr2any.exe -imageopt -resizewidth 800 -resizeheight 600 C:\in.gif C:\out.tga
ocr2any.exe -imageopt -flip C:\in.png C:\out.gif
ocr2any.exe -imageopt -mirror C:\in.tif C:\out.pcx
ocr2any.exe -imageopt C:\in.bmp C:\out.tif
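
The -dither 0 example above selects Floyd-Steinberg dithering. As background, the Python sketch below implements the textbook Floyd-Steinberg error diffusion on a grayscale image stored as a list of rows. It is not the converter's own implementation; the function name, the input representation and the fixed cut-off of 128 are assumptions made for illustration.

```python
def floyd_steinberg(gray, cutoff=128):
    """Dither a grayscale image (rows of 0..255 values) to pure black and white."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    # Work on a float copy so that diffused error can accumulate.
    work = [[float(v) for v in row] for row in gray]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255 if old >= cutoff else 0
            out[y][x] = new
            err = old - new
            # Spread the quantization error to unprocessed neighbours
            # with the classic weights 7/16, 3/16, 5/16 and 1/16.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


# A uniform mid-gray patch becomes a mix of black and white pixels.
patch = [[128] * 8 for _ in range(8)]
dithered = floyd_steinberg(patch)
```

Unlike a plain threshold, which would turn the whole mid-gray patch a single color, error diffusion keeps the average brightness roughly the same by alternating black and white pixels.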

The following command line will OCR all PDF files in the D:\temp folder to text files:
for %F in (D:\temp\*.pdf) do ocr2any.exe -ocr -lang deu "%F" "%~dpnF.txt"

The following command line will OCR all PDF files in the D:\temp folder and its
sub-directories to text files:
for /r D:\temp %F in (*.pdf) do ocr2any.exe -ocr "%F" "%~dpnF.txt"

The following command line will OCR all PDF files from the D:\temp folder and output
the text files to the C:\test folder:
for %F in (D:\temp\*.pdf) do ocr2any.exe -ocr "%F" "C:\test\%~nF.txt"

The following command lines use the Enhanced OCR options:
for %F in (D:\temp\*.pdf) do ocr2any.exe -ocr2 -lang deu "%F" "%~dpnF.txt"
for %F in (D:\temp\*.pdf) do ocr2any.exe -ocr2 -lang eng "%F" "%~dpnF.doc"
for %F in (D:\temp\*.tif) do ocr2any.exe -ocr2 "%F" "%~dpnF.doc"
for %F in (D:\temp\*.tif) do ocr2any.exe -ocr2 -ocr2autorotate "%F" "%~dpnF.xls"
for /r D:\temp %F in (*.pdf) do ocr2any.exe -ocr2 "%F" "%~dpnF.rtf"
for %F in (D:\temp\*.pdf) do ocr2any.exe -ocr2 "%F" "C:\test\%~nF.html"
ocr2any.exe -ocr2 D:\temp\*.tif D:\temp\*.html
ocr2any.exe -ocr2 -ocr2excelmode 0 D:\temp\*.pdf D:\temp\*.xls
ocr2any.exe -ocr2 D:\temp\*.png D:\temp\*.rtf
ocr2any.exe -ocr2 D:\temp\*.tif D:\temp\*.csv
ocr2any.exe -ocr2 D:\temp\*.pdf D:\temp\*.doc
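
The same batch logic as the for loops above can be driven from a script. The Python sketch below only builds the argument lists: for each input PDF it derives the output name either next to the source file (like %~dpnF.txt) or in a separate folder (like %~nF.txt), and uses only the -ocr and -lang options from the option list. The function name, its parameters and the sample paths are hypothetical, and it assumes ocr2any.exe is reachable on the PATH.

```python
from pathlib import PureWindowsPath


def build_commands(pdf_paths, out_dir=None, lang=None, exe="ocr2any.exe"):
    """Return one argument list per input PDF, mirroring the batch loops."""
    commands = []
    for pdf in pdf_paths:
        src = PureWindowsPath(pdf)
        if out_dir is None:
            # Same drive, folder and base name, new extension (%~dpnF.txt).
            dst = src.with_suffix(".txt")
        else:
            # Base name only, placed in another folder (%~nF.txt).
            dst = PureWindowsPath(out_dir) / (src.stem + ".txt")
        cmd = [exe, "-ocr"]
        if lang:
            cmd += ["-lang", lang]
        cmd += [str(src), str(dst)]
        commands.append(cmd)
    return commands


# Made-up input paths for illustration.
for cmd in build_commands([r"D:\temp\a.pdf", r"D:\temp\b.pdf"], lang="deu"):
    print(cmd)
```

On a Windows machine with the tool installed, each list could then be passed to subprocess.run(cmd, check=True). Passing arguments as a list avoids the manual quoting that the batch loops need for paths containing spaces.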

The following OCR languages are supported:

Afrikaans (afr)	Greek (ell)	Odiya (ori)
Albanian (sqi)	Gujarati (guj)	Panjabi (pan)
Amharic (amh)	Haitian (hat)	Persian (fas)
Ancient Greek (grc)	Hebrew (heb)	Polish (pol)
Arabic (ara)	Hindi (hin)	Portuguese (por)
Assamese (asm)	Hungarian (hun)	Pushto (pus)
Azerbaijani (aze)	Icelandic (isl)	Romanian (ron)
Basque (eus)	Indic (inc)	Russian (rus)
Belarusian (bel)	Indonesian (ind)	Sanskrit (san)
Bengali (ben)	Inuktitut (iku)	Serbian (srp)
Bosnian (bos)	Irish (gle)	Sinhala (sin)
Bulgarian (bul)	Italian (ita)	Slovak (slk)
Burmese (mya)	Japanese (jpn)	Slovenian (slv)
Catalan (cat)	Javanese (jav)	Spanish (spa)
Cebuano (ceb)	Kannada (kan)	Swahili (swa)
Central Khmer (khm)	Kazakh (kaz)	Swedish (swe)
Cherokee (chr)	Kirghiz (kir)	Syriac (syr)
Chinese — Simplified (chi_sim)	Korean (kor)	Tagalog (tgl)
Chinese — Traditional (chi_tra)	Kurukh (kru)	Tajik (tgk)
Croatian (hrv)	Lao (lao)	Tamil (tam)
Czech (ces)	Latin (lat)	Telugu (tel)
Danish (dan)	Latvian (lav)	Thai (tha)
Dutch (nld)	Lithuanian (lit)	Tibetan (bod)
Dzongkha (dzo)	Macedonian (mkd)	Tigrinya (tir)
English (eng)	Malay (msa)	Turkish (tur)
Esperanto (epo)	Malayalam (mal)	Uighur (uig)
Estonian (est)	Maltese (mlt)	Ukrainian (ukr)
Finnish (fin)	Marathi (mar)	Urdu (urd)
Frankish (frk)	Math/Equations (equ)	Uzbek (uzb)
French (fra)	Middle English (1100-1500) (enm)	Vietnamese (vie)
Galician (glg)	Middle French (1400-1600) (frm)	Welsh (cym)
Georgian (kat)	Nepali (nep)	Yiddish (yid)
German (deu)	Norwegian (nor)

System requirements

Windows 2000 / XP / Server 2003 / Vista / Server 2008 / 7 / 8 and later systems, both 32-bit and 64-bit.


PDF2TXT

Contents

Description

Installation

Choosing PDF Source and TXT Target

Text Extraction Settings

Viewing Area

Toggling between a File and Folder List

Configuration Check Boxes

Action Buttons

URL Source

Hot Keys

The Log File

Command Line Operation

File Association

Change Log

Version 3.5 on February 6, 2012
Updated Tesseract utility for OCR. Updated QuickPDF library. Used that library rather than GhostScript to convert from PDF to .tif files for Tesseract. The result is considerably better OCR quality.

Run PDF2TXT to convert PDF to plain text

Interface Overview

Command Line Interface for batch conversion
