How to Convert PDF to TXT — Free Online Guide

May 18, 2026 · 8 min read · Convertlo Team · Last updated: Jun 4, 2026

You need the text from a PDF — to paste into another document, feed to an AI tool, run a search on, or process programmatically. The conversion seems trivial. Then you open the output and find half the sentences are scrambled, or the file contains literally nothing but whitespace.

PDF-to-TXT conversion has two distinct cases that require entirely different approaches. This guide explains both, shows you the right tool for each, and includes the command-line methods for handling difficult PDFs.

Quick answer: To extract plain text from a PDF for free, use Convertlo's PDF to TXT converter, which processes the file in-browser with no upload. Scanned (image-only) PDFs need OCR first (Google Drive or Tesseract) before any text can be extracted.

Digital PDF vs Scanned PDF — The Critical Distinction

PDF-to-TXT tools fail or produce garbage for different reasons depending on which of two PDF types you have, and the right fix follows from the type:

PDF Type	What It Contains	Extraction Method
Digital PDF	Real text characters embedded in the file	Direct text extraction (fast, accurate)
Scanned PDF	Images/photographs of pages	OCR required (slower, has errors)

How to tell which type you have

Open the PDF in any viewer and try to select text with your cursor. If you can click and drag to highlight individual words, the PDF has embedded text and can be extracted directly. If you can only select the entire page as if it were a picture — or can't select anything at all — it's a scanned PDF and you need OCR.

Scanned PDF + standard converter = empty output. Standard PDF-to-TXT tools extract embedded text. If there is none, they return an empty file or whitespace. This is not a bug — there is simply no text layer to extract. You need an OCR tool instead.
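
The select-text check can also be approximated in code: extract the text and see whether anything other than whitespace came back. The sketch below is a rough heuristic, not a definitive test. The function names and the 20-characters-per-page threshold are assumptions made for illustration; the extraction step uses pdfminer.six's `extract_text`, imported lazily so the heuristic itself runs without the library installed.

```python
def looks_scanned(extracted_text: str, page_count: int = 1,
                  min_chars_per_page: int = 20) -> bool:
    """Return True when extraction yielded too little text to be a real text layer.

    The threshold is an illustrative guess; tune it for your own documents.
    """
    # Count characters that are not whitespace (form feeds between pages count as whitespace).
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible < min_chars_per_page * max(page_count, 1)


def classify_pdf(path: str) -> str:
    """Label a PDF 'digital' or 'scanned' based on how much text can be extracted."""
    # Lazy imports: only needed when a real file is inspected.
    from pdfminer.high_level import extract_text
    from pdfminer.pdfpage import PDFPage

    with open(path, "rb") as fh:
        page_count = sum(1 for _ in PDFPage.get_pages(fh))
    text = extract_text(path)
    return "scanned" if looks_scanned(text, page_count) else "digital"
```

Calling `classify_pdf("input.pdf")` on a file of your own returns one of the two labels. A scanned PDF that already carries an OCR text layer will be reported as digital, which is the correct outcome for extraction purposes.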

The Column Scrambling Problem

Even with digital PDFs, a common failure is text from multi-column layouts being extracted in the wrong reading order. Here's why:

PDF stores text as individually positioned elements with X/Y coordinates on the page. "Reading order" is not stored in the file — it's implied by position. A two-column academic paper has text elements scattered across the page, and a naive extractor that processes them top-to-bottom will mix text from both columns together instead of reading left column first, then right column.

Result: "The algorithm The results achieves show that O(n log n) our method complexity outperforms" instead of two separate coherent columns.
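
The scrambling is easy to reproduce with a toy model. The sketch below is illustrative only: the coordinates, the `TextElement` type, and the hard-coded `split_x` column boundary are all made up for the example, and real layout analysers detect column boundaries rather than being told where they are.

```python
from typing import NamedTuple


class TextElement(NamedTuple):
    text: str
    x: float  # left edge of the element
    y: float  # distance from the top of the page (simplified convention)


# Two columns of four lines each, stored in no particular order.
ELEMENTS = [
    TextElement("our method", 320, 130),
    TextElement("The algorithm", 50, 100),
    TextElement("show that", 320, 115),
    TextElement("complexity", 50, 145),
    TextElement("The results", 320, 100),
    TextElement("achieves", 50, 115),
    TextElement("outperforms", 320, 145),
    TextElement("O(n log n)", 50, 130),
]


def reading_key(element: TextElement):
    # Top-to-bottom first, then left-to-right within a row.
    return (element.y, element.x)


def naive_order(elements) -> str:
    """Sort every element on the page by row, ignoring columns."""
    return " ".join(e.text for e in sorted(elements, key=reading_key))


def column_order(elements, split_x: float) -> str:
    """Read the whole left column first, then the whole right column."""
    left = [e for e in elements if e.x < split_x]
    right = [e for e in elements if e.x >= split_x]
    ordered = sorted(left, key=reading_key) + sorted(right, key=reading_key)
    return " ".join(e.text for e in ordered)


print(naive_order(ELEMENTS))
print(column_order(ELEMENTS, split_x=300))
```

The first line printed is the interleaved sentence shown above; the second reads each column in full before moving to the next.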

Tools with better column detection: pdfminer.six (Python library) uses layout analysis to reconstruct reading order from multi-column PDFs. Adobe Acrobat's Export to Text also handles columns well. Simple tools that call a basic PDF parser often scramble multi-column text.

Method 1 — Convert PDF to TXT Free in Your Browser

Convertlo — No Upload, No Install

Recommended for digital PDFs

Open convertlo.pro/pdf-to-txt.html on any device.
Drag and drop your PDF file, or click Browse to select it.
Extraction runs 100% locally — your document never leaves your device.
Click Download to save the TXT file, or copy the text directly.

Works best on digital PDFs with embedded text. For scanned PDFs, see the OCR methods below. Ideal for confidential documents — contracts, medical records, legal documents — that you don't want to upload to any server.

Method 2 — Google Drive OCR (Best for Scanned PDFs)

Google Drive — Free OCR for Scanned Documents

Best for scanned PDFs

Go to drive.google.com and upload your scanned PDF.
Once uploaded, right-click the file → Open with → Google Docs.
Google automatically applies OCR and opens the document with extracted text.
Go to File → Download → Plain Text (.txt) to save as TXT.

Google Drive OCR is excellent for clearly printed scanned documents. Accuracy is high on clean scans at 200+ DPI. Handwritten text and poor-quality scans have lower accuracy. Note: your document is uploaded to Google's servers.

Method 3 — pdfminer.six (Command Line, Best for Complex Layouts)

pdfminer.six — Python, Windows/Mac/Linux

Install: pip install pdfminer.six
Extract text with layout analysis:

pdf2txt.py -o output.txt input.pdf

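Layout analysis is enabled by default, so the command above already applies it. If a multi-column document still comes out in the wrong order, try adjusting --boxes-flow (range -1.0 to 1.0, default 0.5), which controls how much horizontal versus vertical position matters when ordering text boxes. Lower values give more weight to horizontal position; the best value depends on the document:

pdf2txt.py --boxes-flow 0.0 -o output.txt input.pdf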

pdfminer.six uses box-detection algorithms to reconstruct reading order from positioned text elements — significantly better than naive top-to-bottom extraction for academic papers, newspapers, and magazine layouts.
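
The same extraction can be driven from Python, which is handy when processing many files. This is a minimal sketch assuming pdfminer.six is installed; `extract_with_layout` and `validate_boxes_flow` are hypothetical helper names, while `extract_text` and `LAParams` are part of pdfminer.six.

```python
def validate_boxes_flow(value):
    """Check the boxes_flow setting: None, or a number from -1.0 to 1.0."""
    if value is not None and not -1.0 <= value <= 1.0:
        raise ValueError("boxes_flow must be None or between -1.0 and 1.0")
    return value


def extract_with_layout(pdf_path: str, boxes_flow=0.5) -> str:
    """Extract text using pdfminer.six layout analysis.

    boxes_flow weights horizontal versus vertical position when ordering
    text boxes: -1.0 means only horizontal position matters, 1.0 means only
    vertical position matters. None turns off the advanced box ordering.
    """
    validate_boxes_flow(boxes_flow)
    # Lazy imports so the validation helper works without pdfminer.six.
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    return extract_text(pdf_path, laparams=LAParams(boxes_flow=boxes_flow))
```

A reasonable workflow is to run `extract_with_layout("input.pdf")` with the default first, inspect the reading order, and only then experiment with other `boxes_flow` values.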

Method 4 — Tesseract OCR (Command Line, for Scanned PDFs)

Tesseract — Free Open-Source OCR

Install Tesseract from github.com/tesseract-ocr (or brew install tesseract on Mac).
Tesseract works on image files. For a scanned PDF, first convert pages to images:

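pdftoppm -r 300 input.pdf page
tesseract page-1.ppm output

Tesseract appends .txt to the output name itself, so this writes output.txt. Run the second command once per page image; pdftoppm zero-pads the page number in longer documents (for example page-01.ppm).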

Or use ocrmypdf, a wrapper that handles the PDF→image→OCR→PDF pipeline in one command (it needs Tesseract and Ghostscript installed on the system):

pip install ocrmypdf
ocrmypdf --sidecar output.txt input.pdf /dev/null

The --sidecar output.txt flag saves the OCR text to a separate TXT file. /dev/null discards the output PDF if you only want the text. On Windows replace /dev/null with NUL.
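
For multi-page scans, the manual pdftoppm and tesseract steps can be looped in a short script. The sketch below assumes both command-line tools are on the PATH; the function names are hypothetical, and it renders pages as PNG via pdftoppm's `-png` option rather than the default PPM.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftoppm_command(pdf_path, prefix, dpi=300):
    """Build the command that renders each PDF page to <prefix>-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)]


def tesseract_command(image_path, lang="eng"):
    """Build the OCR command; the output base 'stdout' prints text instead of writing a file."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_pdf(pdf_path, lang="eng", dpi=300) -> str:
    """Render every page to an image, OCR each one, and join the results."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(pdftoppm_command(pdf_path, prefix, dpi), check=True)
        pages = []
        # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            result = subprocess.run(
                tesseract_command(image, lang),
                check=True, capture_output=True, text=True,
            )
            pages.append(result.stdout)
    return "\n".join(pages)
```

Usage would look like `Path("output.txt").write_text(ocr_pdf("input.pdf"), encoding="utf-8")`. For most cases ocrmypdf remains the simpler choice, since it also handles page preprocessing.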

Tool Comparison by PDF Type

Tool	Digital PDF	Scanned PDF	Multi-column	Privacy
Convertlo	Excellent	Not supported	Good	100% local
Google Drive OCR	Good	Excellent	Good	Cloud upload
pdfminer.six	Excellent	Not supported	Best	100% local
ocrmypdf / Tesseract	Good	Excellent	Good	100% local
Adobe Acrobat	Excellent	Excellent	Excellent	Cloud optional

Frequently Asked Questions

Why does my PDF to TXT conversion scramble the text?

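PDF stores text as positioned elements, not as a linear reading flow. In multi-column layouts, a naive extractor processes text top-to-bottom and mixes both columns. Tools like pdfminer.six use layout analysis to reconstruct proper reading order. If the output is empty or contains only whitespace, the PDF is likely scanned and needs OCR. If the output is unreadable symbols rather than reordered words, the PDF's fonts may lack a usable character mapping, and running OCR on the pages is the usual workaround.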

What is the difference between a digital PDF and a scanned PDF?

A digital PDF (created from software) contains actual embedded text characters. A scanned PDF is a photograph of a document — it contains images of text, not actual text. Try selecting text in the PDF: if you can highlight words, it has embedded text. If you can only select the whole page as an image, it's scanned and needs OCR.

How can I tell if my PDF is digital or scanned?

Open the PDF in any viewer and try selecting text with your cursor. If you can click and drag to highlight individual words, it has embedded text (digital PDF). If clicking the page selects the whole page as an image, or you can't select anything, it's a scanned PDF requiring OCR.

Which tools handle scanned PDFs?

Google Drive's built-in OCR (upload → right-click → Open with Google Docs) is the easiest free option. For local processing: Tesseract (free, open-source) and ocrmypdf (a Tesseract wrapper with PDF-specific improvements) are excellent command-line tools. Adobe Acrobat has built-in OCR but is paid software.

Does PDF to TXT work on iPhone and Android?

Yes. Convertlo works in any modern browser including Safari on iPhone and Chrome on Android. Your file never leaves your device — all extraction happens locally. For scanned PDFs on mobile, Google Drive's OCR works well through the mobile app.

Convertlo Editorial Team

Document conversion guides covering the practical realities of format compatibility, OCR accuracy, and text extraction from complex layouts.