pdfdq servers communicate with 3 different APIs (Application Programming Interfaces) to perform text extraction, character recognition, image description, or data summarization on PDF files.

API results are inserted into a SQL database, along with a reference image and links to download and review the PDF file or summarize the data returned from extraction, recognition, or description APIs.

Authenticated users can then instantly locate information by searching the database for API results.

Core features of a pdfdq server

  • PDF text extraction: LLMWhisperer provides 5 modes to recognize or extract text layers from PDF files
  • Image text recognition: OCR.Space performs character recognition on scanned pages, images, and photos
  • Image description: GPT generates natural‑language descriptions along with semantically related words
  • SQL database: extracted and summarized text and file links are stored in a dedicated SQL database
  • Data summarization: GPT can optionally summarize the text results returned from any of the APIs
  • Full‑text search: users can search for data in thousands of files at once, found text highlighted in red
  • Multi-user interface: share web-based process management and search interfaces with registered users
  • Included utilities: convert PDF to numbered images, multiple images to PDF, overlay text on images
Practically any type of file can be printed to PDF (documents, spreadsheets, images, emails, invoices...)
  • Contract & Agreement Analysis – locate clauses, terms, obligations, renewal dates, risk language
  • Real Estate & Title Archives – deeds, plat maps, surveys, historical packets
  • Compliance & Audit – regulatory filings, audit packets, certifications, inspection reports
  • Financial & Accounting – invoices, statements, receipts, financial reports
  • Litigation Knowledge Management – evidence, briefs, motions, depositions, exhibits
  • Technical & Engineering – schematics, blueprints, test reports, engineering documents
  • Customer Support – support tickets, warranty documents, service manuals, customer correspondence
  • Supply Chain & Logistics – bills of lading, shipping manifests, customs documents, QC reports
  • Human Resources – employee resumes, certifications, training records, policy guidelines
  • Research & Knowledge Management – scientific papers, research documents, lab reports, field notes
  • Project & Program Documentation – proposals, project plans, meeting packets, status reports
  • Manufacturing & Quality Assurance – inspection reports, certifications, test results, production logs
  • Sales & Marketing – catalogs, proposals, presentations, contracts, customer correspondence
  • Operational Data Retrieval – manuals, SOPs, engineering drawings, technical specifications
LLMWhisperer API
native text extraction or character recognition

OCR.Space API
character recognition on text embedded in images

OpenAI GPT API
image description and result summarization
Multiple APIs and process modes capture more detail
Extract information that single-mode PDF optical character recognition (OCR) and text extraction systems silently miss or fail to recognize
Extracted data can be searched with minimal information
pdfdq search interfaces with AJAX-powered search-suggest hints let users find information by text string or number sequence
Files can be searched by text, description, or summary
Descriptions including semantic search words make images searchable by text embedded in the image, description, concept, or summary
Streamlined process and information management
Each PDF file added to or deleted from the database instantly updates searchable content, providing real-time control of published data

Extremely simple to start, load, process, and maintain

  • Select a server subscription below, sign up for API keys
  • We send you a link to log in as the server administrator
  • You log in, insert API keys, add editor and viewer accounts
  • Editors log in and upload PDF files to your pdfdq server
  • Editors select an extraction, recognition, or description API
  • pdfdq query and file link are sent to the selected API
  • Query results and file links are inserted into the database
  • Results can optionally be summarized for search inclusion
  • Searchable results become available to all users. Boom!
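As a rough illustration of the steps above, the sketch below routes one uploaded PDF to the API behind the selected processing option and builds a result record ready for the database. It is a minimal sketch, not pdfdq's implementation: the mode keys, the `ProcessResult` record, and the `call_api` and `summarize` callables are hypothetical stand-ins for the real LLMWhisperer, OCR.Space, and OpenAI requests.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Processing options described on this page, mapped to the API that serves them.
# The key names are hypothetical labels chosen for this sketch.
MODE_TO_API = {
    "native": "LLMWhisperer",
    "low_cost": "LLMWhisperer",
    "high_quality": "LLMWhisperer",
    "form": "LLMWhisperer",
    "table": "LLMWhisperer",
    "ocrspace": "OCR.Space",
    "ocrspace_code": "OCR.Space",
    "describe": "OpenAI GPT",
}


@dataclass
class ProcessResult:
    file_link: str
    api: str
    mode: str
    text: str
    summary: Optional[str] = None


def process_file(
    file_link: str,
    mode: str,
    call_api: Callable[[str, str, str], str],
    summarize: Optional[Callable[[str], str]] = None,
) -> ProcessResult:
    """Send one uploaded PDF to the API behind `mode` and build a result record."""
    if mode not in MODE_TO_API:
        raise ValueError(f"unknown processing mode: {mode!r}")
    api = MODE_TO_API[mode]
    # Hypothetical stub: a real server would make an authenticated HTTPS request here.
    text = call_api(api, mode, file_link)
    result = ProcessResult(file_link=file_link, api=api, mode=mode, text=text)
    # Optional step: summarize the returned text for search inclusion.
    if summarize is not None:
        result.summary = summarize(text)
    return result
```

Passing the API call in as a function keeps the routing logic testable without network access; a real server would replace the stub with requests to each provider.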
Test the pdfdq interface below. Enter letters in the search box to see AJAX search-suggest. Click the magnifying glass or Search button. Click the "More" link to view highlighted found text.

Each API has different methods, capabilities, and pricing options. See the table at the bottom of this page for more information and for links to acquire your API keys.

API Feature | LLMWhisperer | OCR.Space | OpenAI GPT‑5.2
Core Function | Fast Whisper‑based OCR with text extraction + layout | Machine-learning OCR engine with strong stylized‑text support | AI vision model with OCR + layout + semantic reasoning
Accuracy (Plain Text) | High for clean scans | High for clean scans | High for clean scans, can hallucinate
Accuracy (Stylized Text) | Moderate | High - excels with curved/embossed text | High - handles unusual fonts well
Layout Preservation | High - tables, forms | Moderate | High - tables, forms, handwriting
Semantic Understanding | None | None | Full semantic interpretation
Speed | Very fast | Fast | Moderate
Security | Encrypted transit, no training, ephemeral processing | Encrypted transit, no training, transient processing | Encrypted transit, no training, transient processing
Pricing | $0.001–$0.0015 per page | $0.002–$0.01 per page | $0.005–$0.02 per page
Service Model | Cloud API | Cloud API | Cloud API

Sign up for your API keys (see the links at the bottom of this page to order) and send us your server selection to turn on search super powers...

Essential

$90.00/month
(annual billing)
  • 25 GB SSD disk space
  • 2 GB RAM / 100% vCPU
  • Litespeed + LSCache
  • Redis Object Cache
  • Unlimited data transfer
  • Includes SSL certificate

Team

$98.00/month
(annual billing)
  • 50 GB SSD disk space
  • 3 GB RAM / 200% vCPU
  • Litespeed + LSCache
  • Redis Object Cache
  • Unlimited data transfer
  • Includes SSL certificate

Department

$108.00/month
(annual billing)
  • 100 GB SSD space
  • 4 GB RAM / 200% vCPU
  • Litespeed + LSCache
  • Redis Object Cache
  • Unlimited data transfer
  • Includes SSL certificate

Professional

$122.00/month
(annual billing)
  • 150 GB SSD space
  • 6 GB RAM / 300% vCPU
  • Litespeed + LSCache
  • Redis Object Cache
  • Unlimited data transfer
  • Includes SSL certificate

Frequently Asked Questions

What is included in a pdfdq server subscription?
pdfdq subscriptions include a custom-compiled PHPRunner-based pdfdq web interface, a U.S.-based Litespeed PHP web server with LSCache enabled, a private dedicated MySQL relational database, and an active SSL certificate. The subscription amount varies depending on the server option selected above. Each pdfdq server is configured specifically for each subscription account. Please allow up to 24 hours for server provisioning.
How secure is my data?
All file uploads and API interactions occur over encrypted HTTPS connections. Files are isolated per request, and processed files are deleted or marked for deletion by the API at the end of each process. LLMWhisperer's ephemeral processing provides the strongest privacy and compliance posture by eliminating data retention entirely. The transient processing used by GPT and OCR.Space relies on limited, policy‑controlled retention to support each API's operational needs. pdfdq servers store the returned results in a private, dedicated, cloud-based SQL database along with the PDF file, which is moved to a secure directory on the server during processing. System users must log in with a registered user name and password, which triggers a two-factor authentication (2FA) message to the user's registered email address containing a 6-character security code that must be entered in the log-in form to gain access.
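The log-in flow above emails a 6-character security code. As a hedged sketch only (the character set and the function names `generate_security_code` and `verify_security_code` are assumptions, not pdfdq's documented implementation), such a code can be generated with Python's `secrets` module and checked with a constant-time comparison:

```python
import hmac
import secrets
import string

# Assumed alphabet: uppercase letters and digits, minus easily confused O/0 and I/1.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def generate_security_code(length: int = 6) -> str:
    """Return a random code suitable for emailing as a second factor."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify_security_code(expected: str, submitted: str) -> bool:
    """Compare codes in constant time, ignoring surrounding spaces and letter case."""
    normalized = submitted.strip().upper()
    return hmac.compare_digest(expected.encode(), normalized.encode())
```

Using `secrets` rather than `random` matters here because the code guards account access, and `hmac.compare_digest` avoids leaking how many leading characters matched through response timing.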
What types of documents can pdfdq process?
pdfdq can process PDF documents up to 20 MB in size. Because practically any file can be printed to PDF (word processing documents, spreadsheets, images, blueprints, emails, invoices, scans, contracts, reports, forms, archives...), essentially any type of original document can be processed. If your PDF files are large or text-dense, you can use the pdf2img and img2pdf utilities (explained below; demo available above) to split the file into smaller files for processing.
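A server enforcing the 20 MB limit might reject oversized or non-PDF uploads before any API is called. The sketch below is an assumption about how such a check could look, not pdfdq's code: it treats 20 MB as 20 * 1024 * 1024 bytes, relies on the fact that PDF files begin with a `%PDF-` header, and the name `check_pdf_upload` is hypothetical.

```python
import os

# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def check_pdf_upload(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> None:
    """Raise ValueError if the file is too large or does not look like a PDF."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size:,} bytes; the limit is {max_bytes:,} bytes")
    with open(path, "rb") as f:
        # PDF files start with "%PDF-" followed by the version number, e.g. "%PDF-1.7".
        if f.read(5) != b"%PDF-":
            raise ValueError(f"{path} does not start with a PDF header")
```

Files that fail the size check are candidates for splitting with the pdf2img and img2pdf utilities.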
What makes pdfdq more versatile than many other solutions?
Other solutions typically provide a single method for extracting text from PDF files, while pdfdq provides 8 different file processing options. LLMWhisperer has Native, Low Cost, High Quality, Form, and Table modes, all with layout preservation. OCR.Space is great for general character recognition and excels at reading text embedded in images with stylized or slanted components. OCR.SpaceCode mode is a slightly modified OCR.Space mode for files containing programming code: it replaces leading "<" characters with a "|" character, so the code remains easy to read but won't inadvertently execute in the browser. OpenAI GPT-5.2 can generate detailed descriptions with semantically related words for images and can also summarize the text results returned from the APIs.
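The OCR.SpaceCode substitution can be sketched in a few lines. This sketch assumes "leading" means the first non-whitespace character of each line and that indentation should be preserved; the function name `neutralize_leading_angle_brackets` is hypothetical and not part of the OCR.Space API.

```python
def neutralize_leading_angle_brackets(text: str) -> str:
    """Replace a '<' that starts a line (after indentation) with '|'."""
    out_lines = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip(" \t")
        if stripped.startswith("<"):
            # Keep the original indentation, swap only the first '<'.
            indent = line[: len(line) - len(stripped)]
            line = indent + "|" + stripped[1:]
        out_lines.append(line)
    return "".join(out_lines)
```

This only disarms tags that start a line. For displaying arbitrary text safely in a browser, escaping every `<`, `>`, and `&` (for example with Python's `html.escape`) is the more general technique.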
How accurate are the extraction or recognition results?
It depends heavily on the original PDF. If the file was printed to PDF by a word processor, spreadsheet, or other desktop application, it will often contain a text "layer" that Native mode can read with very high accuracy while preserving text layout. If that layer is not available (as is typical for scanned documents and images), High Quality, Form, or Table mode will return accurate results, also preserving text layout, provided the text is printed in readable fonts. For general character recognition without layout preservation, for text with stylized or angled components, or for text embedded in images, OCR.Space will probably return the highest quality results. You can compare the results of the different processing modes on various PDF content using the demo pdfdq interface above.
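The guidance above can be condensed into a small decision helper. It is a hypothetical sketch: the flags stand for judgments a user makes by looking at the PDF, and the returned labels simply name the modes described on this page.

```python
def recommend_mode(
    has_text_layer: bool,
    *,
    stylized_or_angled: bool = False,
    text_in_images: bool = False,
    need_layout: bool = True,
    layout_hint: str = "",
) -> str:
    """Suggest a processing mode following the accuracy guidance in this FAQ."""
    # Stylized or angled text, or text inside images: OCR.Space tends to do best.
    if stylized_or_angled or text_in_images:
        return "OCR.Space"
    # A native text layer (e.g. printed from a word processor) reads most accurately.
    if has_text_layer:
        return "LLMWhisperer Native"
    # No text layer and layout is not needed: general character recognition.
    if not need_layout:
        return "OCR.Space"
    # No text layer but layout matters: pick an LLMWhisperer layout mode.
    if layout_hint == "form":
        return "LLMWhisperer Form"
    if layout_hint == "table":
        return "LLMWhisperer Table"
    return "LLMWhisperer High Quality"
```

For example, a scanned invoice whose line items must keep their columns would call `recommend_mode(False, layout_hint="table")`.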
Where is the extracted data stored?
Recognized and extracted text or descriptions, result summaries, generated semantic metadata, optional reference label, and reference image are stored in your pdfdq subscription account's dedicated SQL database, along with a link to download and view the original uploaded PDF, which is moved to a secure directory on your pdfdq server after processing.
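To picture how one stored record might look, the sketch below creates a table holding the items listed above and inserts a row. It uses Python's built-in `sqlite3` as a stand-in for the server's MySQL database, and the table name, column names, and the `open_db` and `insert_result` helpers are illustrative assumptions, not pdfdq's actual schema.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not pdfdq's actual tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id              INTEGER PRIMARY KEY,
    reference_label TEXT,            -- optional label
    api             TEXT NOT NULL,   -- LLMWhisperer, OCR.Space, or OpenAI GPT
    result_text     TEXT,            -- extracted/recognized text or description
    summary         TEXT,            -- optional result summary
    semantic_terms  TEXT,            -- generated semantic metadata
    reference_image TEXT,            -- path to the reference image
    pdf_link        TEXT NOT NULL    -- link to the PDF in the secure directory
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_result(conn, *, api, result_text, pdf_link, summary=None,
                  semantic_terms=None, reference_label=None, reference_image=None) -> int:
    """Insert one processing result and return its row id."""
    cur = conn.execute(
        "INSERT INTO documents (reference_label, api, result_text, summary,"
        " semantic_terms, reference_image, pdf_link) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (reference_label, api, result_text, summary, semantic_terms,
         reference_image, pdf_link),
    )
    conn.commit()
    return cur.lastrowid
```

Parameterized `?` placeholders keep extracted text, which can contain quotes or code, from being interpreted as SQL.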
Can multiple users access the system?
pdfdq servers support web-enabled multi‑user access with role‑based permissions (administrators, editors, viewers), allowing teams to securely collaborate on uploading documents, running extraction, recognition, and description processes, and performing searches.
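The three roles can be modeled as a permission map. The action names below are hypothetical and the assignments are inferred from this page (administrators add users, insert API keys, and export data; editors upload and process files; all users search), so treat this as a sketch rather than pdfdq's actual permission model.

```python
from enum import Enum


class Role(Enum):
    ADMINISTRATOR = "administrator"
    EDITOR = "editor"
    VIEWER = "viewer"


# Hypothetical action names; assignments inferred from the descriptions on this page.
_VIEW = {"search", "view_results"}
_EDIT = _VIEW | {"upload_pdf", "run_process"}
_ADMIN = _EDIT | {"manage_users", "set_api_keys", "export_data"}

PERMISSIONS = {
    Role.VIEWER: _VIEW,
    Role.EDITOR: _EDIT,
    Role.ADMINISTRATOR: _ADMIN,
}


def can(role: Role, action: str) -> bool:
    """Return True if the role is allowed to perform the action."""
    return action in PERMISSIONS[role]
```

Building each role's set from the one below it keeps the roles strictly nested, so a permission granted to viewers is automatically granted to editors and administrators.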
How does search work?
Search operates directly against SQL‑stored text and metadata, enabling dynamic full‑text queries with AJAX-based search-suggest hints and highlighted found text. You can test the search interface using the example pdfdq search interface above. Enter a few characters, like "wer" or "ast" or "111", and pause for a second to see AJAX-based search hints. Click the magnifying glass icon to the right of the search box to run the search, then click the "more" link to view the found text, highlighted in red. You can also click the gear icon at the top right of the demo pdfdq interface above and choose Advanced search or Show search panel to build more complex queries involving data from multiple fields.
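The search behavior described above (substring matching, search-suggest hints, and red highlighting) can be sketched against an in-memory SQLite table. This is an illustrative assumption, not pdfdq's AJAX endpoint or query code; the `documents` table and the `open_demo_db`, `search`, `suggest`, and `highlight` functions are hypothetical.

```python
import html
import re
import sqlite3


def open_demo_db(rows):
    """Create an in-memory stand-in for the results table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, result_text TEXT, summary TEXT)"
    )
    conn.executemany("INSERT INTO documents (result_text, summary) VALUES (?, ?)", rows)
    return conn


def _like_pattern(term):
    # Escape LIKE wildcards so input such as "50%" is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def search(conn, term, limit=20):
    """Return (id, result_text) rows whose text or summary contains `term`."""
    pattern = _like_pattern(term)
    # SQLite's LIKE is case-insensitive for ASCII letters by default.
    return conn.execute(
        "SELECT id, result_text FROM documents"
        " WHERE result_text LIKE ? ESCAPE '\\' OR summary LIKE ? ESCAPE '\\'"
        " ORDER BY id LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()


def suggest(conn, fragment, limit=10):
    """Return up to `limit` distinct words containing `fragment` (search-suggest hints)."""
    frag = fragment.lower()
    words = set()
    for (text,) in conn.execute("SELECT result_text FROM documents"):
        for word in re.findall(r"\w+", text or ""):
            if frag in word.lower():
                words.add(word)
    return sorted(words, key=str.lower)[:limit]


def highlight(text, term):
    """HTML-escape `text` and wrap each case-insensitive match of `term` in a red span."""
    if not term:
        return html.escape(text)
    parts, last = [], 0
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        parts.append(html.escape(text[last:m.start()]))
        parts.append(f'<span style="color:red">{html.escape(m.group(0))}</span>')
        last = m.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

The `suggest` function scans every row, which is fine for a demo but not for thousands of files; a production server would rely on database indexes or full-text search features instead.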
Does pdfdq require plugins or desktop software?
pdfdq interfaces are entirely web browser‑based and don't require client‑side plugins or software installations. Desktop, web, and mobile devices equipped with standard web browsers (Chrome, Edge, Firefox, Safari...) can all be used by authenticated users to connect and interact with the server.
What do the pdf2img, img2pdf, and txt2img utilities do?
The pdf2img utility converts PDF files into sequential PNG or JPEG images, one image per page. This is useful when you want to process sub-sections of large files. After the large file has been converted to image files, you can download all page images in a single zip file. Open the zip file, extract the pages you want to process, then use the img2pdf utility to generate a new PDF file containing just the selected pages. The txt2img utility can be used to overlay text onto images before converting them to PDF for processing. Images can be converted to PDF files either with the img2pdf utility or by dragging the image into a web browser window and printing to PDF (you can optionally reduce or enlarge the image in this process). You can test all of these utilities using the example pdfdq search interface above.
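Picking pages out of the pdf2img zip file can also be scripted. The sketch below assumes each page image name ends in its page number (for example `report-007.png`); that naming convention and the `extract_pages` function are hypothetical, not documented pdfdq behavior.

```python
import re
import zipfile

# Assumption: each page image name ends with its page number, e.g. "report-007.png".
PAGE_RE = re.compile(r"(\d+)\.(?:png|jpe?g)$", re.IGNORECASE)


def extract_pages(zip_source, pages, dest_dir):
    """Extract the page images whose numbers are in `pages`; return paths in page order."""
    wanted = set(pages)
    found = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            m = PAGE_RE.search(name)
            if m and int(m.group(1)) in wanted:
                # ZipFile.extract returns the path of the extracted file.
                found[int(m.group(1))] = zf.extract(name, dest_dir)
    return [found[p] for p in sorted(found)]
```

The extracted images can then be uploaded to the img2pdf utility to build the smaller PDF.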
Can pdfdq servers be integrated with other systems?
Servers are configured as isolated, stand-alone applications for security purposes. The only subscriber-available interface to the database is the pdfdq web browser interface. Authenticated administrator-level users can export extracted data in Word, Excel, or CSV formats. Servers can optionally be configured to allow external database access on request.
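Where external database access has been enabled on request, an administrator could script a CSV export instead of using the browser. The sketch below writes any query result to CSV text, taking the header row from the query's column names. It is a hypothetical example built on Python's DB-API cursor interface (shown with `sqlite3`), not a description of the pdfdq export feature, and `export_query_csv` is an illustrative name.

```python
import csv
import io


def export_query_csv(conn, sql, params=()):
    """Run a query and return its rows as CSV text with a header row."""
    cur = conn.execute(sql, params)
    # cursor.description holds one tuple per column; the first item is the name.
    header = [col[0] for col in cur.description]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(cur)
    return buf.getvalue()
```

The `csv` module quotes fields containing commas or quotes, which matters because extracted text often contains both.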
"Businesses with employees that can instantly locate important information could potentially have a pretty significant advantage over businesses with employees that can't."

Ready to get started? Sign up for API keys (see table below) and then contact us to register your pdfdq server subscription.


Here are links to the LLMWhisperer, OCR.Space, and OpenAI GPT API websites. You can order your API keys directly from these pages.

Service or Application Pricing Description
LLMWhisperer
  • $0.001–$0.0015/page
LLMWhisperer is adept at extracting complex data, especially repeating sections and line items, when the layout of the document needs to be preserved in the extracted text.
OCRSpace Pro PDF Plan
  • $60.00/month - 30,000 conversions
The OCR.space OCR service converts PDF files into editable text using Optical Character Recognition (OCR). Supports multi-page files and multi-column text recognition.
OpenAI GPT-5.2
  • $0.005–$0.02/page
The OpenAI API provides state-of-the-art text generation, natural language processing, and computer vision, ideal for describing individual images and summarizing process results.

pdfdq Pricing & Competitive Comparison

Platform Pricing

Fixed monthly server subscription. Subscribers pay API processing costs directly to each API service, which provides wholesale pricing and full visibility into usage.

Plan | Server Subscription | Pages Included | Estimated Processing Cost* | Estimated Total Monthly Cost
Essential | $90 | Up to 10,000 | ~$150 | ~$240
Team | $98 | Up to 25,000 | ~$375 | ~$473
Department | $108 | Up to 50,000 | ~$750 | ~$858
Professional | $122 | Up to 100,000 | ~$1,500 | ~$1,622

*Estimated processing costs assume blended API service usage for the full Pages Included page count at approximately $0.015 per page. Actual costs will vary depending on the number of pages actually processed and the API services selected. LLMWhisperer Native mode has the lowest processing cost per page, followed by the other LLMWhisperer modes, then OCR.Space, then GPT-5.2.
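The estimates in the table follow from simple arithmetic: processing cost is pages × per-page rate, and the total adds the server subscription. The sketch below reproduces the table from the figures above, with the $0.015 blended per-page rate as its stated assumption; `estimated_monthly_cost` is an illustrative name.

```python
# Figures from the Platform Pricing table: (plan, monthly subscription USD, pages included).
PLANS = [
    ("Essential", 90, 10_000),
    ("Team", 98, 25_000),
    ("Department", 108, 50_000),
    ("Professional", 122, 100_000),
]
BLENDED_RATE = 0.015  # assumed blended API cost per page, USD


def estimated_monthly_cost(subscription, pages, rate=BLENDED_RATE):
    """Return (processing cost, total monthly cost), rounded to whole dollars."""
    processing = round(pages * rate)
    return processing, subscription + processing


for name, sub, pages in PLANS:
    proc, total = estimated_monthly_cost(sub, pages)
    print(f"{name:<12} ${sub:>3}  {pages:>7,} pages  ~${proc:,}  ~${total:,}")
```

To estimate a different API mix, pass another per-page rate from the API comparison table, for example `rate=0.001` for the low end of LLMWhisperer pricing.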

Competitive Comparison

Platform | Pricing Model | Estimated Monthly Cost | Time to Deploy | Cost Predictability | Implementation Simplicity
pdfdq | Server Subscription + API Pricing | $240 – $1,622 | 1-2 Days | High | Very High
Relativity / Everlaw | Per‑matter / per‑GB | $3,000 – $15,000+ | Weeks–Months | Low | Low
Hyperscience / Rossum | Enterprise SaaS | $2,000 – $20,000+ | Months | Low | Low
ABBYY | Enterprise license | $2,000 – $10,000+ | Months | Medium | Low
Google Document AI | Usage‑based API | $300 – $3,000+ | Weeks | Low | Medium
AWS Textract | Per‑page + infrastructure | $250 – $2,500+ | Weeks | Low | Low
Per‑Seat AI Assistants | Seat‑based SaaS | $1,000+ per user | Days–Weeks | Low | Medium
DIY OCR + LLM Stack | Internal build | Hidden | Months | Very Low | Very Low