Frequently Asked Questions
What is included in a pdfdq server subscription?
pdfdq subscriptions include a custom-compiled PHPRunner-based pdfdq web interface, a U.S.-based LiteSpeed PHP web server with LSCache enabled, a private dedicated MySQL relational database, and an active SSL certificate. The subscription amount varies depending on the server option selected above. Each pdfdq server is configured individually for its subscription account. Please allow up to 24 hours for server provisioning.
How secure is my data?
All file uploads and API interactions occur over encrypted HTTPS connections. Files are isolated per request, and processed files are deleted or marked for deletion by the API at the end of each process. LLMWhisperer's ephemeral processing provides the strongest privacy and compliance posture by eliminating data retention entirely. GPT and OCR.Space use transient processing, with limited, policy‑controlled retention to support each API's operational needs. pdfdq servers store the returned results in a private, dedicated, cloud-based SQL database along with the PDF file, which is moved to a secure directory on the server during processing. System users are required to log in with a registered user name and password. Logging in triggers a 2FA (two-factor authentication) message to the user's registered email address containing a 6-character security code, which must be entered in the log-in form to gain access.
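As an illustration of the emailed security code step, here is a minimal Python sketch of how a 6-character code could be generated and checked. It is not pdfdq's actual implementation: the character set (uppercase letters and digits), the function names, and the case-insensitive comparison are all assumptions made for the example.

```python
import hmac
import secrets
import string

# Assumption: codes use uppercase letters and digits. The FAQ only states
# that the code is 6 characters long.
ALPHABET = string.ascii_uppercase + string.digits
CODE_LENGTH = 6


def generate_security_code(length: int = CODE_LENGTH) -> str:
    """Build a random code using a cryptographically secure generator."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify_security_code(expected: str, submitted: str) -> bool:
    """Compare the codes in constant time, ignoring case and outer spaces."""
    cleaned = submitted.strip().upper()
    return hmac.compare_digest(expected.encode(), cleaned.encode())
```

The secrets module is used instead of random because the code acts as a short-lived credential, and hmac.compare_digest avoids leaking how many leading characters matched.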
What types of documents can pdfdq process?
pdfdq can process PDF documents up to 20 MB in size. Since practically any file can be printed to PDF (word processing documents, spreadsheets, images, blueprints, emails, invoices, scans, contracts, reports, forms, archives...), essentially any type of original document can be processed. If your PDF files are large or text-dense, you can use the pdf2img and img2pdf utilities (explained below, demo available above) to split the file into smaller files for processing.
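If you want to check a file against the size limit before uploading, a small local script along these lines could do it. This is only a sketch: the function name is hypothetical, and it assumes "20 MB" means 20 x 1024 x 1024 bytes, which the FAQ does not specify.

```python
from pathlib import Path

# Assumption: "20 MB" is interpreted as 20 MiB (20 * 1024 * 1024 bytes).
MAX_PDF_BYTES = 20 * 1024 * 1024


def needs_splitting(pdf_path: str, limit: int = MAX_PDF_BYTES) -> bool:
    """Return True when the file on disk is larger than the size limit."""
    return Path(pdf_path).stat().st_size > limit
```

A file that returns True here would be a candidate for the pdf2img and img2pdf splitting workflow.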
What makes pdfdq more versatile than many other solutions?
Other solutions typically provide a single method for extracting text from PDF files, while pdfdq provides 8 different file processing options. LLMWhisperer has Native, Low Cost, High Quality, Form, and Table modes, all with layout preservation. OCR.Space is great for general character recognition and excels at reading text embedded in images with stylized or slanted components. OCR.SpaceCode mode is slightly modified to work with files containing embedded programming code: it replaces leading "<" characters with a "|" character, so the code remains easy to read but won't inadvertently execute in the browser. OpenAI GPT-5.2 can generate detailed descriptions with semantically related words for images and can also summarize the text results returned from the APIs.
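The "<" replacement used by OCR.SpaceCode mode can be pictured with a short Python sketch. This is an illustration, not the pdfdq source: it assumes "leading" means a "<" that opens a tag-like token (one followed by a letter, "/", "!", or "?"), and the function name is hypothetical.

```python
import re

# Assumption: a "leading" "<" is one that starts a tag-like token, such as
# "<script", "</div", "<!--", or "<?php". Other "<" characters are left alone.
TAG_OPENER = re.compile(r"<(?=[A-Za-z/!?])")


def neutralize_markup(text: str) -> str:
    """Replace tag-opening "<" characters with "|" so browsers see no tags."""
    return TAG_OPENER.sub("|", text)


sample = "<script>alert(1)</script> if a < b"
print(neutralize_markup(sample))
# prints: |script>alert(1)|/script> if a < b
```

Because the closing ">" is kept, the recognized code stays close to the original while no longer forming markup a browser would interpret.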
How accurate are the extraction or recognition results?
It depends heavily on the original PDF. If the file was printed to PDF by a word processor, spreadsheet, or other desktop application, it will often contain a text "layer" that Native mode can read with very high accuracy while preserving text layout. If the layer is not available (as is typical for scanned documents and images), then High Quality, Form, or Table mode will return accurate results, also preserving text layout, provided the text is printed in readable fonts. For general character recognition without layout preservation, or if the text has stylized or angled components, or if text is embedded in images, OCR.Space will probably return the highest quality results. You can see the results of the different processing modes on various PDF content using the demo pdfdq interface above.
Where is the extracted data stored?
Recognized and extracted text or descriptions, result summaries, generated semantic metadata, optional reference label, and reference image are stored in your pdfdq subscription account's dedicated SQL database, along with a link to download and view the original uploaded PDF, which is moved to a secure directory on your pdfdq server after processing.
Can multiple users access the system?
pdfdq servers support web-enabled multi‑user access with role‑based permissions (administrators, editors, viewers), allowing teams to collaborate securely on uploading documents, running extraction, recognition, and description processes, and running searches.
How does search work?
Search operates directly against SQL‑stored text and metadata, enabling dynamic full‑text queries with AJAX-based search suggestions and highlighted results. You can test the search interface using the example pdfdq search interface above. Enter a few characters, like "wer" or "ast" or "111", and pause for a second to see the AJAX-based search hints. Click the magnifying glass icon to the right of the search box to run the search, then click the "more" link to view the found text, highlighted in red. You can also select the gear icon at the top right of the demo pdfdq interface above and choose Advanced search or Show search panel to build more complex queries involving data from multiple fields.
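The search behavior described above (substring matching against stored text, suggestion hints, and highlighting) can be sketched in Python. This is a stand-in, not the pdfdq implementation: it uses an in-memory SQLite database instead of MySQL, the table name, column names, and sample rows are invented for the demo, and square brackets take the place of the red highlight.

```python
import re
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with two hypothetical documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO documents (body) VALUES (?)",
        [
            ("Power supply invoice 1110 for the eastern tower",),
            ("Quarterly report on asteroid tracking",),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, term: str) -> list:
    """Return (id, body) rows whose text contains the term."""
    # Escape LIKE wildcards so "%" and "_" typed by the user match literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = "SELECT id, body FROM documents WHERE body LIKE ? ESCAPE '\\' ORDER BY id"
    return conn.execute(sql, (f"%{escaped}%",)).fetchall()


def suggest(conn: sqlite3.Connection, fragment: str, limit: int = 5) -> list:
    """Collect distinct stored words that contain the typed fragment."""
    hints = []
    for _doc_id, body in search(conn, fragment):
        for word in re.findall(r"\w+", body):
            lowered = word.lower()
            if fragment.lower() in lowered and lowered not in hints:
                hints.append(lowered)
    return hints[:limit]


def highlight(text: str, term: str) -> str:
    """Wrap each case-insensitive match in square brackets."""
    return re.sub(re.escape(term), lambda m: f"[{m.group(0)}]", text, flags=re.IGNORECASE)


conn = build_demo_db()
print(suggest(conn, "wer"))            # prints: ['power', 'tower']
print(highlight("Power tower", "wer"))  # prints: Po[wer] to[wer]
```

The query is parameterized rather than built by string concatenation, so user input never becomes part of the SQL statement itself.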
Does pdfdq require plugins or desktop software?
pdfdq interfaces are entirely web browser‑based and don't require client‑side plugins or software installations. Desktop, web, and mobile devices equipped with standard web browsers (Chrome, Explorer, Firefox, Safari...) can all be used by authenticated users to connect and interact with the server.
What do the pdf2img, img2pdf, and txt2img utilities do?
The pdf2img utility converts PDF files into sequential PNG or JPEG images, one image per page. This is useful when you want to process sub-sections of large files. After the large file has been converted to image files, you can download all page images in a single ZIP file. Open the ZIP file, extract the pages you want to process, then use the img2pdf utility to generate a new PDF file containing just the selected pages. The txt2img utility can be used to overlay text onto images prior to converting the image to PDF for processing. Images can then be converted to PDF files either by using the img2pdf utility or by dragging the image into a web browser window and printing to PDF (you can optionally reduce or enlarge the image in this process). You can test all of these utilities using the example pdfdq search interface above.
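The "extract the pages you want" step can also be scripted on your own computer. The Python sketch below pulls selected page images out of a downloaded ZIP file. The file naming pattern ("page-001.png") and the function name are assumptions for illustration, since the FAQ does not state how pdf2img names its images.

```python
import zipfile


def extract_pages(zip_path, wanted_pages, out_dir, name_template="page-{:03d}.png"):
    """Extract only the selected page images and return their paths on disk."""
    # Assumption: images are named by page number, e.g. page-001.png.
    wanted_names = [name_template.format(number) for number in wanted_pages]
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        available = set(archive.namelist())
        for name in wanted_names:
            if name not in available:
                raise KeyError(f"{name} is not in {zip_path}")
            extracted.append(archive.extract(name, out_dir))
    return extracted
```

For example, extract_pages("pages.zip", [2, 5], "selected") would leave only pages 2 and 5 in the "selected" folder, ready to be uploaded to img2pdf.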
Can pdfdq servers be integrated with other systems?
Servers are configured as isolated, stand-alone applications for security purposes. The only subscriber-available interface to the database is the pdfdq web browser interface. Authenticated administrator-level users can export extracted data in Word, Excel, or CSV formats. On request, servers can optionally be configured to allow external database access.