pdfdq servers communicate with 3 different APIs (Application Programming Interfaces) to perform text extraction, character recognition, image description, or data summarization on PDF files.

API results are inserted into a SQL database, along with a reference image and links to download and review the PDF file or summarize the data returned from extraction, recognition, or description APIs.

Authenticated users can then instantly locate information by searching the database for API results.

Core features of a pdfdq server

PDF text extraction: LLMWhisperer provides 5 modes to recognize or extract text layers from PDF files
Image text recognition: OCR.Space performs character recognition on scanned pages, images, and photos
Image description: GPT generates natural‑language descriptions along with semantically related words
SQL database: extracted and summarized text and file links are stored in a dedicated SQL database
Data summarization: GPT can optionally summarize the text results returned from any of the APIs
Full‑text search: users can search for data in thousands of files at once, with found text highlighted in red
Multi-user interface: share web-based process management and search interfaces with registered users
Included utilities: convert PDF to numbered images, multiple images to PDF, overlay text on images

Practically any type of file can be printed to PDF (documents, spreadsheets, images, emails, invoices...)

Contract & Agreement Analysis – locate clauses, terms, obligations, renewal dates, risk language
Real Estate & Title Archives – deeds, plat maps, surveys, historical packets
Compliance & Audit – regulatory filings, audit packets, certifications, inspection reports
Financial & Accounting – invoices, statements, receipts, financial reports
Litigation Knowledge Management – evidence, briefs, motions, depositions, exhibits
Technical & Engineering – schematics, blueprints, test reports, engineering documents
Customer Support – support tickets, warranty documents, service manuals, customer correspondence
Supply Chain & Logistics – bills of lading, shipping manifests, customs documents, QC reports
Human Resources – employee resumes, certifications, training records, policy guidelines
Research & Knowledge Management – scientific papers, research documents, lab reports, field notes
Project & Program Documentation – proposals, project plans, meeting packets, status reports
Manufacturing & Quality Assurance – inspection reports, certifications, test results, production logs
Sales & Marketing – catalogs, proposals, presentations, contracts, customer correspondence
Operational Data Retrieval – manuals, SOPs, engineering drawings, technical specifications

LLMWhisperer API

native text extraction or character recognition

OCR.Space API

character recognition on text embedded in images

OpenAI GPT API

image description and result summarization

Multiple APIs and process modes capture more detail

Extract information that single-mode PDF optical character recognition (OCR) and text extraction systems silently miss or fail to recognize

Extracted data can be searched with minimal information

pdfdq search interfaces provide AJAX-powered search-suggest hints that let users find information by text string or number sequence

Files can be searched by text, description, or summary

Descriptions including semantic search words make images searchable by text embedded in the image, description, concept, or summary

Streamlined process and information management

Each PDF file added to or deleted from the database instantly updates searchable content, providing real-time control of published data

Extremely simple to start, load, process, and maintain

Select a server subscription below, sign up for API keys
We send you a link to log in as the server administrator
You log in, insert API keys, add editor and viewer accounts
Editors log in and upload PDF files to your pdfdq server
Editors select extraction, recognition, or description API
pdfdq query and file link are sent to the selected API
Query results and file links are inserted in the database
Results can optionally be summarized for search inclusion
Searchable results become available to all users. Boom!
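
For technically minded readers, the processing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `results` table layout, the `process_file` function, and the two stub functions are hypothetical, and an in-memory SQLite database stands in for the MySQL database a pdfdq server uses. No real API is called.

```python
import sqlite3
from typing import Callable, Optional


def process_file(
    db: sqlite3.Connection,
    file_link: str,
    api_name: str,
    call_api: Callable[[str], str],
    summarize: Optional[Callable[[str], str]] = None,
) -> int:
    """Send a file link to the selected API and store what comes back."""
    result = call_api(file_link)  # extraction, recognition, or description
    # Summarization is optional, so the summary column may stay empty.
    summary = summarize(result) if summarize is not None else None
    cur = db.execute(
        "INSERT INTO results (file_link, api, result, summary) VALUES (?, ?, ?, ?)",
        (file_link, api_name, result, summary),
    )
    db.commit()
    return cur.lastrowid


# Hypothetical stand-ins for the real API clients (assumptions, not pdfdq code).
def stub_extract(file_link: str) -> str:
    return f"text extracted from {file_link}"


def stub_summarize(text: str) -> str:
    return text.upper()


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE results ("
    "id INTEGER PRIMARY KEY, file_link TEXT, api TEXT, result TEXT, summary TEXT)"
)
row_id = process_file(db, "files/lease.pdf", "LLMWhisperer", stub_extract, stub_summarize)
```

Once the row is stored, both the result and the optional summary are available to search.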

Test the pdfdq interface below. Enter letters in the search box to see AJAX search-suggest. Click the magnifying glass or Search button. Click the "More" link to view highlighted found text.

Each API has different methods, capabilities, and pricing options. See the table at the bottom of this page for links to more information or to acquire your API keys.

API Feature	LLMWhisperer	OCR.Space	OpenAI GPT‑5.2
Core Function	Fast Whisper‑based OCR with text extraction + layout	Machine Learning OCR engine with strong stylized‑text support	AI vision model with OCR + layout + semantic reasoning
Accuracy (Plain Text)	High for clean scans	High for clean scans	High for clean scans, can hallucinate
Accuracy (Stylized Text)	Moderate	High - excels with curved/embossed text	High - handles unusual fonts well
Layout Preservation	High - tables, forms	Moderate	High - tables, forms, handwriting
Semantic Understanding	None	None	Full semantic interpretation
Speed	Very fast	Fast	Moderate
Security	Encrypted transit, no training, ephemeral processing	Encrypted transit, no training, transient processing	Encrypted transit, no training, transient processing
Pricing	$0.001–$0.0015 per page	$0.002–$0.01 per page	$0.005–$0.02 per page
Service Model	Cloud API	Cloud API	Cloud API

Sign up for your API keys (see the links at the bottom of this page to order) and send us your server selection to turn on search super powers...

Essential

$90.00/month
(annual billing)

25 GB SSD disk space
2 GB RAM / 100% vCPU
Litespeed + LSCache
Redis Object Cache
Unlimited data transfer
Includes SSL certificate

Team

$98.00/month
(annual billing)

50 GB SSD disk space
3 GB RAM / 200% vCPU
Litespeed + LSCache
Redis Object Cache
Unlimited data transfer
Includes SSL certificate

Department

$108.00/month
(annual billing)

100 GB SSD space
4 GB RAM / 200% vCPU
Litespeed + LSCache
Redis Object Cache
Unlimited data transfer
Includes SSL certificate

Professional

$122.00/month
(annual billing)

150 GB SSD space
6 GB RAM / 300% vCPU
Litespeed + LSCache
Redis Object Cache
Unlimited data transfer
Includes SSL certificate

Frequently Asked Questions

What is included in a pdfdq server subscription?

pdfdq subscriptions include a custom-compiled PHPRunner-based pdfdq web interface, a U.S.-based Litespeed PHP web server with LSCache enabled, a private dedicated MySQL relational database, and an active SSL certificate. The subscription amount varies depending on the server option selected above. Each pdfdq server is configured specifically for its subscription account. Please allow up to 24 hours for server provisioning.

How secure is my data?

All file uploads and API interactions occur over encrypted HTTPS connections. Files are isolated per request, and processed files are deleted or marked for deletion by the API at the end of each process. LLMWhisperer's ephemeral processing provides the strongest privacy and compliance posture by eliminating data retention entirely. The transient processing used by GPT and OCR.Space applies limited, policy‑controlled retention to support each API's operational needs. pdfdq servers store the returned results in a private, dedicated, cloud-based SQL database along with the PDF file, which is moved to a secure directory on the server during processing. To log in, system users must enter a registered user name and password, which triggers a two-factor authentication (2FA) message to the user's registered email address. The message contains a 6-character security code that must be entered in the log-in form to gain access.
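
As a rough illustration of the emailed security code step, here is a minimal Python sketch. Only the 6-character length comes from the description above. The alphabet, the ten-minute expiry, and the function names `issue_code` and `verify_code` are assumptions made for the example and do not describe how pdfdq implements it.

```python
import hmac
import secrets
import string
import time

CODE_LENGTH = 6                                    # stated above
ALPHABET = string.ascii_uppercase + string.digits  # assumption
CODE_TTL_SECONDS = 600                             # assumption: 10-minute expiry


def issue_code(now=None):
    """Create a random code and the time at which it stops being valid."""
    now = time.time() if now is None else now
    code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
    return code, now + CODE_TTL_SECONDS


def verify_code(submitted, expected, expires_at, now=None):
    """Accept the code only if it matches and has not expired."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(submitted.encode(), expected.encode())
```

The `secrets` module is used rather than `random` because the code acts as a short-lived credential.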

What types of documents can pdfdq process?

pdfdq can process PDF documents up to 20 MB in size. Since practically any file can be printed to PDF (word processing documents, spreadsheets, images, blueprints, emails, invoices, scans, contracts, reports, forms, archives...), essentially any type of original document can be processed. If your PDF files are large or text-dense, you can use the pdf2img and img2pdf utilities (explained below, demo available above) to split the file into smaller files for processing.

What makes pdfdq more versatile than many other solutions?

Other solutions typically provide a single method for extracting text from PDF files while pdfdq provides 8 different file processing options. LLMWhisperer has Native, Low Cost, High Quality, Form, and Table modes, all with layout preservation. OCR.Space is great for general character recognition and excels at reading text embedded in images with stylized or slanted components. OCR.SpaceCode mode is slightly modified to work with files with embedded programming code, replacing leading "<" characters with a "|" character, making it so the code is still easy to read but won't inadvertently execute in the browser. OpenAI GPT-5.2 can generate detailed descriptions with semantically-related words for images and can also summarize the text results returned from APIs.
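
The "<" replacement used by OCR.SpaceCode mode can be pictured with a short Python sketch. It rests on an assumption: "leading" is read here as the "<" that opens a tag-like token (followed by a letter, "/", "!" or "?"), so comparisons such as `a < b` are left alone. The function name `neutralize_markup` is hypothetical, and the exact rule pdfdq applies may differ.

```python
import re

# Assumption: only a "<" that could start a tag, closing tag, comment,
# or processing instruction is replaced.
_TAG_OPEN = re.compile(r"<(?=[A-Za-z/!?])")


def neutralize_markup(text: str) -> str:
    """Replace tag-opening '<' characters with '|' so a browser will not parse them."""
    return _TAG_OPEN.sub("|", text)
```

For example, `<script>alert(1)</script>` becomes `|script>alert(1)|/script>`, which is still readable as code but is no longer treated as markup.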

How accurate are the extraction or recognition results?

It depends heavily on the original PDF. If the file was printed to PDF by a word processor, spreadsheet, or other desktop application, it will often contain a text "layer" that Native mode can read with very high accuracy while preserving text layout. If the layer is not available (as is typical for scanned documents and images), then High Quality, Form, or Table mode will return accurate results, also preserving text layout, provided the text is printed with readable fonts. For general character recognition without layout preservation, or if the text has stylized or angled components, or if text is embedded in images, OCR.Space will probably return the highest quality results. You can see the results of the different processing modes on various PDF content using the demo pdfdq interface above.

Where is the extracted data stored?

Recognized and extracted text or descriptions, result summaries, generated semantic metadata, optional reference label, and reference image are stored in your pdfdq subscription account's dedicated SQL database, along with a link to download and view the original uploaded PDF, which is moved to a secure directory on your pdfdq server after processing.

Can multiple users access the system?

pdfdq servers support web-enabled multi‑user access with role‑based permissions (administrators, editors, viewers), allowing teams to collaborate securely when uploading documents, running extraction, recognition, and description processes, and searching the results.

How does search work?

Search operates directly against SQL‑stored text and metadata, enabling dynamic full‑text queries using advanced AJAX-based search suggest hints and highlighted found text results. You can test the search interface using the example pdfdq search interface above. Enter a few characters, like "wer" or "ast" or "111" and pause for a second to see AJAX-based search hints. Click the magnifying glass icon to the right of the search box to run the search, then click the "more" link to view the found text, highlighted in red. You can also select the gear icon top right in the demo pdfdq interface above and select Advanced search or Show search panel to build more complex queries involving data from multiple fields.
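
The idea of searching stored text and highlighting the found string can be sketched as follows. This is a simplified stand-in, not the pdfdq implementation: it uses an in-memory SQLite database instead of MySQL, a plain `LIKE` query, and hypothetical names (`results` table, `search`, `highlight`, `like_pattern`).

```python
import html
import re
import sqlite3


def like_pattern(term: str) -> str:
    """Build a 'contains' pattern, escaping the LIKE wildcards % and _."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def highlight(text: str, term: str) -> str:
    """Return HTML-escaped text with each match of term wrapped in a red span."""
    if not term:
        return html.escape(text)
    parts, last = [], 0
    for match in re.finditer(re.escape(term), text, re.IGNORECASE):
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f'<span style="color:red">{html.escape(match.group(0))}</span>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


def search(db: sqlite3.Connection, term: str, limit: int = 10):
    """Find rows whose stored text contains term; return (file_link, highlighted text)."""
    rows = db.execute(
        "SELECT file_link, result FROM results "
        "WHERE result LIKE ? ESCAPE '\\' ORDER BY id LIMIT ?",
        (like_pattern(term), limit),
    )
    return [(link, highlight(text, term)) for link, text in rows]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, file_link TEXT, result TEXT)")
db.executemany(
    "INSERT INTO results (file_link, result) VALUES (?, ?)",
    [("a.pdf", "Invoice 111-42 for Power tools"), ("b.pdf", "Lease renewal terms")],
)
hits = search(db, "wer")
```

A search-suggest endpoint could run the same query with a small `limit` on each pause in typing and return the matches as hints.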

Does pdfdq require plugins or desktop software?

pdfdq interfaces are entirely web browser‑based and don't require client‑side plugins or software installations. Desktop, web, and mobile devices equipped with standard web browsers (Chrome, Explorer, Firefox, Safari...) can all be used by authenticated users to connect and interact with the server.

What do the pdf2img, img2pdf, and txt2img utilities do?

The pdf2img utility converts PDF files into sequential PNG or JPEG images, one image per page. This is useful when you want to process sub-sections of large files. After the large file has been converted to image files, you can download all page images in a single zip file. Open the zip file, extract the pages you want to process, then use the img2pdf utility to generate a new PDF file containing just the selected pages. The txt2img utility can be used to overlay text onto images prior to converting the image to PDF for processing. Images can then be converted to PDF files either by using the img2pdf utility or by dragging the image into a web browser window and printing to PDF (you can optionally reduce or enlarge the image in this process). You can test all of these utilities using the example pdfdq search interface above.

Can pdfdq servers be integrated with other systems?

Servers are configured as isolated, stand-alone applications for security purposes. The only subscriber-available interface to the database is through the pdfdq web browser interface. Authenticated administrator level users can export extracted data in Word, Excel, or CSV formats. Servers can optionally be configured to allow external database access by request.

                    "Businesses with employees that can instantly locate important information could potentially have a pretty significant advantage over businesses with employees that can't."
                

Ready to get started? Sign up for API keys (see table below) and then contact us to register your pdfdq server subscription.

Here are links to the LLMWhisperer, OCR.Space, and OpenAI GPT API websites. You can order your API keys directly from these pages.

Service or Application	Pricing	Description
LLMWhisperer	$0.001–$0.0015/page	LLMWhisperer is adept at extracting complex data, especially repeating sections and line items, when the layout of the document needs to be preserved in the extracted text.
OCRSpace Pro PDF Plan	$60.00/month - 30,000 conversions	The OCR.space OCR service converts PDF files into editable text using Optical Character Recognition (OCR). Supports multi-page files and multi-column text recognition.
OpenAI GPT-5.2	$0.005–$0.02/page	The OpenAI API provides state-of-the-art text generation, natural language processing, and computer vision, ideal for describing individual images and summarizing process results.

pdfdq Pricing & Competitive Comparison

Platform Pricing

Fixed monthly server subscription. API processing costs are paid directly to the API service by the subscriber for wholesale pricing and visibility.

Plan	Server Subscription	Pages Included	Estimated Processing Cost*	Estimated Total Monthly Cost
Essential	$90	Up to 10,000	~$150	~$240
Team	$98	Up to 25,000	~$375	~$473
Department	$108	Up to 50,000	~$750	~$858
Professional	$122	Up to 100,000	~$1,500	~$1,622

*Estimated processing costs assume blended API service usage for the full Pages Included page count at approximately $0.015 per page. Actual costs will vary depending on the number of pages actually processed and the API services selected. LLMWhisperer Native Mode has the lowest processing cost per page, followed by the other LLMWhisperer modes, then OCR.Space, then GPT-5.2.
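
The estimates in the table follow from simple arithmetic: pages processed multiplied by the blended rate, plus the fixed subscription. The Python sketch below reproduces the table values and lets you plug in your own page count or rate. The `estimate` function and `PLANS` dictionary are illustrative names, not part of pdfdq.

```python
from decimal import Decimal

# Plan name -> (monthly subscription in USD, pages included), from the table above.
PLANS = {
    "Essential": (90, 10_000),
    "Team": (98, 25_000),
    "Department": (108, 50_000),
    "Professional": (122, 100_000),
}
BLENDED_RATE = Decimal("0.015")  # approximate blended cost per page, from the footnote


def estimate(plan, pages=None, rate=BLENDED_RATE):
    """Return (processing cost, total monthly cost) for a plan.

    pages defaults to the plan's full included page count.
    """
    subscription, included = PLANS[plan]
    pages = included if pages is None else pages
    processing = pages * rate
    return processing, subscription + processing


essential = estimate("Essential")  # 10,000 pages at $0.015 plus the $90 subscription
light_use = estimate("Essential", pages=2_000, rate=Decimal("0.001"))
```

With the defaults, `estimate("Essential")` gives $150 in processing and $240 in total, matching the first table row. Processing 2,000 pages at a $0.001 per-page rate on the same plan gives $2 in processing and $92 in total.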

Competitive Comparison

Platform	Pricing Model	Estimated Monthly Cost	Time to Deploy	Cost Predictability	Implementation Simplicity
pdfdq	Server Subscription + API Pricing	$240 – $1,622	1-2 Days	High	Very High
Relativity / Everlaw	Per‑matter / per‑GB	$3,000 – $15,000+	Weeks–Months	Low	Low
Hyperscience / Rossum	Enterprise SaaS	$2,000 – $20,000+	Months	Low	Low
ABBYY	Enterprise license	$2,000 – $10,000+	Months	Medium	Low
Google Document AI	Usage‑based API	$300 – $3,000+	Weeks	Low	Medium
AWS Textract	Per‑page + infrastructure	$250 – $2,500+	Weeks	Low	Low
Per‑Seat AI Assistants	Seat‑based SaaS	$1,000+ per user	Days–Weeks	Low	Medium
DIY OCR + LLM Stack	Internal build	Hidden	Months	Very Low	Very Low

PDF Data Query servers help people find needles in haystacks, quickly.