How to Convert PDF to TXT — Free Online Guide
You need the text from a PDF — to paste into another document, feed to an AI tool, run a search on, or process programmatically. The conversion seems trivial. Then you open the output and find half the sentences are scrambled, or the file contains literally nothing but whitespace.
PDF-to-TXT conversion has two distinct cases that require entirely different approaches. This guide explains both, shows you the right tool for each, and includes the command-line methods for handling difficult PDFs.
Quick answer: To extract plain text from a PDF for free: use Convertlo's PDF to TXT converter — processes in-browser, no upload. For scanned PDFs (image-only), you'll need OCR first (Google Drive or Tesseract) before the text can be extracted.
Digital PDF vs Scanned PDF — The Critical Distinction
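PDF-to-TXT tools fail or produce garbage for one of two reasons: the PDF contains no text to extract because it is a scan, or the text is there but comes out in the wrong order. Which fix you need starts with which type of PDF you have: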
| PDF Type | What It Contains | Extraction Method |
|---|---|---|
| Digital PDF | Real text characters embedded in the file | Direct text extraction (fast, accurate) |
| Scanned PDF | Images/photographs of pages | OCR required (slower, may introduce errors) |
How to tell which type you have
Open the PDF in any viewer and try to select text with your cursor. If you can click and drag to highlight individual words, the PDF has embedded text and can be extracted directly. If you can only select the entire page as if it were a picture — or can't select anything at all — it's a scanned PDF and you need OCR.
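If you have many files to sort, you can automate the same test. The sketch below assumes pdfminer.six (see Method 3) is installed. It extracts text from the first few pages and treats the PDF as scanned if almost no visible characters come back. `extract_text` is pdfminer.six's real high-level function. The helper names and the 50-character threshold are illustrative assumptions, not a standard. A scan carrying a few stray text annotations could still slip past this check.

```python
def has_real_text(text: str, min_chars: int = 50) -> bool:
    """Heuristic: count non-whitespace characters and compare to a threshold."""
    # pdfminer.six separates pages with form feeds, so an image-only PDF
    # typically yields nothing but whitespace such as "\x0c\x0c".
    visible = sum(1 for ch in text if not ch.isspace())
    return visible >= min_chars


def pdf_has_text_layer(pdf_path: str, min_chars: int = 50, pages: int = 3) -> bool:
    """True if direct extraction finds embedded text; False suggests a scan that needs OCR."""
    # Imported lazily so has_real_text() works even without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    # Sample only the first few pages to stay fast on long documents.
    text = extract_text(pdf_path, maxpages=pages)
    return has_real_text(text, min_chars)
```

For example, if `pdf_has_text_layer("contract.pdf")` returns False, go straight to one of the OCR methods.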
The Column Scrambling Problem
Even with digital PDFs, a common failure is text from multi-column layouts being extracted in the wrong reading order. Here's why:
PDF stores text as individually positioned elements with X/Y coordinates on the page. "Reading order" is not stored in the file — it's implied by position. A two-column academic paper has text elements scattered across the page, and a naive extractor that processes them top-to-bottom will mix text from both columns together instead of reading left column first, then right column.
Result: "The algorithm The results achieves show that O(n log n) our method complexity outperforms" instead of two separate coherent columns.
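A toy model in Python shows the failure concretely. The element positions are invented for illustration. The gap-based column detection is a deliberately simple stand-in for real layout analysis; it would break on full-width headings, for example. Coordinates follow PDF's convention, where y grows upward from the bottom of the page.

```python
import bisect
from typing import NamedTuple


class TextElement(NamedTuple):
    x: float  # left edge, in points
    y: float  # vertical position, in points; PDF's y axis grows upward
    text: str


# Invented positions for a two-column page: left column at x=72, right at x=320.
PAGE = [
    TextElement(72, 700, "The algorithm"), TextElement(320, 700, "The results"),
    TextElement(72, 686, "achieves"), TextElement(320, 686, "show that"),
    TextElement(72, 672, "O(n log n)"), TextElement(320, 672, "our method"),
    TextElement(72, 658, "complexity"), TextElement(320, 658, "outperforms"),
]


def naive_order(elements: list[TextElement]) -> str:
    # Top to bottom (highest y first), then left to right: columns interleave.
    ordered = sorted(elements, key=lambda e: (-e.y, e.x))
    return " ".join(e.text for e in ordered)


def column_aware_order(elements: list[TextElement], min_gap: float = 100.0) -> str:
    if not elements:
        return ""
    # A new column starts wherever the sorted left edges jump by more than min_gap.
    xs = sorted({e.x for e in elements})
    starts = [xs[0]] + [cur for prev, cur in zip(xs, xs[1:]) if cur - prev > min_gap]

    def column(e: TextElement) -> int:
        # Index of the last column start at or left of this element.
        return bisect.bisect_right(starts, e.x) - 1

    # Read each column top to bottom before moving to the next one.
    ordered = sorted(elements, key=lambda e: (column(e), -e.y, e.x))
    return " ".join(e.text for e in ordered)


print(naive_order(PAGE))
# → The algorithm The results achieves show that O(n log n) our method complexity outperforms
print(column_aware_order(PAGE))
# → The algorithm achieves O(n log n) complexity The results show that our method outperforms
```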
Method 1 — Convert PDF to TXT Free in Your Browser
- Open convertlo.pro/pdf-to-txt.html on any device.
- Drag and drop your PDF file, or click Browse to select it.
- Extraction runs 100% locally — your document never leaves your device.
- Click Download to save the TXT file, or copy the text directly.
Works best on digital PDFs with embedded text. For scanned PDFs, see the OCR methods below. Ideal for confidential documents — contracts, medical records, legal documents — that you don't want to upload to any server.
Method 2 — Google Drive OCR (Best for Scanned PDFs)
- Go to drive.google.com and upload your scanned PDF.
- Once uploaded, right-click the file → Open with → Google Docs.
- Google automatically applies OCR and opens the document with extracted text.
- Go to File → Download → Plain Text (.txt) to save as TXT.
Google Drive OCR is excellent for clearly printed scanned documents. Accuracy is high on clean scans at 200+ DPI. Handwritten text and poor-quality scans have lower accuracy. Note: your document is uploaded to Google's servers.
Method 3 — pdfminer.six (Command Line, Best for Complex Layouts)
- Install:
pip install pdfminer.six
- Extract text with layout analysis:
pdf2txt.py -o output.txt input.pdf
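Layout analysis is already on by default (pass -n to turn it off). If a multi-column document still comes out interleaved, adjust --boxes-flow (-F). This option takes a value from -1.0 to 1.0 and sets how much horizontal versus vertical position counts when ordering text boxes. The default is 0.5; lower values give horizontal position more weight. For example:
pdf2txt.py --boxes-flow 0 -o output.txt input.pdf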
pdfminer.six's layout analysis groups positioned characters into lines and text boxes, then orders those boxes to reconstruct reading order. On academic papers, newspapers, and magazine layouts, that works significantly better than naive top-to-bottom extraction.
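You can also run the same layout analysis from Python through pdfminer.six's high-level API, which is handy for batch conversion or for experimenting with settings. This is a minimal sketch. `extract_text` and `LAParams` are real pdfminer.six names, while `pdf_to_txt` and its defaults are illustrative. `boxes_flow` is the `LAParams` setting that governs the order in which text boxes are read, so it is the first setting to try when columns interleave.

```python
from pathlib import Path


def pdf_to_txt(pdf_path: str, txt_path: str, boxes_flow: float = 0.5) -> int:
    """Extract text with pdfminer.six layout analysis, write it to txt_path,
    and return the number of characters written."""
    # Validate before importing so bad input fails fast and clearly.
    if not -1.0 <= boxes_flow <= 1.0:
        raise ValueError("boxes_flow must be between -1.0 and 1.0")

    # Requires: pip install pdfminer.six
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    # boxes_flow weighs horizontal (-1.0) against vertical (+1.0) position when
    # ordering text boxes; 0.5 is pdfminer.six's default.
    text = extract_text(pdf_path, laparams=LAParams(boxes_flow=boxes_flow))
    Path(txt_path).write_text(text, encoding="utf-8")
    return len(text)
```

Usage: `pdf_to_txt("paper.pdf", "paper.txt")`, then lower `boxes_flow` if the columns still mix.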
Method 4 — Tesseract OCR (Command Line, for Scanned PDFs)
- Install Tesseract from github.com/tesseract-ocr (or run brew install tesseract on Mac).
- Tesseract works on image files, not PDFs. For a scanned PDF, first convert the pages to images with pdftoppm (part of Poppler), then run Tesseract on each image:
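pdftoppm -r 300 input.pdf page
tesseract page-1.ppm output
pdftoppm writes one image per page: page-1.ppm, page-2.ppm, and so on, with zero-padded numbers such as page-01.ppm for documents of 10 or more pages. Tesseract adds the .txt extension itself, so the second command writes output.txt. Repeat it for each page image with a different output name.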
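Because pdftoppm produces one image per page, a multi-page scan means running tesseract once per image and stitching the results together. This is a minimal Python sketch, assuming the pdftoppm (Poppler) and tesseract binaries are installed and on your PATH. The function names are hypothetical. Passing `stdout` as Tesseract's output base prints the recognized text instead of writing a file, so the script can collect it directly.

```python
import re
import subprocess
import tempfile
from pathlib import Path


def page_number(image: Path) -> int:
    """Pull the page number out of a pdftoppm file name such as page-07.ppm."""
    match = re.search(r"-(\d+)\.ppm$", image.name)
    if match is None:
        raise ValueError(f"unexpected file name: {image.name}")
    return int(match.group(1))


def ocr_scanned_pdf(pdf_path: str, txt_path: str, dpi: int = 300) -> None:
    """Render each page with pdftoppm, OCR it with tesseract, and save all text."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        # One PPM image per page: page-1.ppm, page-2.ppm, ... (zero-padded on long PDFs).
        subprocess.run(["pdftoppm", "-r", str(dpi), pdf_path, str(prefix)], check=True)
        # Sort numerically so page order never depends on how names are padded.
        images = sorted(Path(tmp).glob("page-*.ppm"), key=page_number)
        chunks = []
        for image in images:
            # "stdout" as the output base makes tesseract print the text instead of writing a file.
            result = subprocess.run(
                ["tesseract", str(image), "stdout"],
                check=True, capture_output=True, encoding="utf-8",
            )
            chunks.append(result.stdout)
    Path(txt_path).write_text("\n".join(chunks), encoding="utf-8")
```

Usage: `ocr_scanned_pdf("scan.pdf", "scan.txt")`.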
Or use ocrmypdf, a wrapper that handles the PDF→image→OCR→PDF pipeline in one command. It calls Tesseract and Ghostscript, so install both alongside the Python package:
pip install ocrmypdf
ocrmypdf --sidecar output.txt input.pdf /dev/null
The --sidecar output.txt flag saves the OCR text to a separate TXT file. /dev/null discards the output PDF if you only want the text. On Windows replace /dev/null with NUL.
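When you call ocrmypdf from a script, Python's `os.devnull` resolves to the right null device on each platform. One command list then covers both the /dev/null and NUL cases. This is a minimal sketch, assuming ocrmypdf and its dependencies are installed. The function names are illustrative, and the sketch relies on the same discard-to-null-device convention as the command above.

```python
import os
import subprocess


def ocrmypdf_text_only_cmd(pdf_path: str, txt_path: str) -> list[str]:
    # --sidecar writes the OCR text to txt_path; os.devnull ("/dev/null" on
    # Linux/macOS, "nul" on Windows) stands in for the discarded output PDF.
    return ["ocrmypdf", "--sidecar", txt_path, pdf_path, os.devnull]


def ocr_to_txt(pdf_path: str, txt_path: str) -> None:
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(ocrmypdf_text_only_cmd(pdf_path, txt_path), check=True)
```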
Tool Comparison by PDF Type
| Tool | Digital PDF | Scanned PDF | Multi-column | Privacy |
|---|---|---|---|---|
| Convertlo | Excellent | Not supported | Good | 100% local |
| Google Drive OCR | Good | Excellent | Good | Cloud upload |
| pdfminer.six | Excellent | Not supported | Best | 100% local |
| ocrmypdf / Tesseract | Good | Excellent | Good | 100% local |
| Adobe Acrobat | Excellent | Excellent | Excellent | Cloud optional |