Vaquill ingests over 50 file formats - the ones that show up in real legal practice. Upload them as-is; the system extracts text, runs OCR where needed, and makes the content searchable across your matter.

| Format | Notes |
| --- | --- |
| PDF | Native and image-only; OCR runs automatically on scanned pages |
| DOCX | Modern Word format; preserves headings and tracked changes metadata |
| DOCM | Macro-enabled Word; macros are ignored, text is extracted |
| DOC | Legacy Word; converted before extraction |
| DOT, DOTX | Word templates |
| RTF | Rich Text Format |
| TXT | Plain text |
| MD | Markdown |
| HTML, HTM | Web pages and HTML exports |

| Format | Notes |
| --- | --- |
| XLSX | Modern Excel; all sheets ingested |
| XLSM | Macro-enabled Excel |
| XLS | Legacy Excel |
| XLTX, XLTM, XLT | Excel templates |
| CSV | Tabular data |

| Format | Notes |
| --- | --- |
| PPTX | Modern PowerPoint; slide content and speaker notes |
| PPTM | Macro-enabled PowerPoint |
| PPT | Legacy PowerPoint |
| POT, POTX | PowerPoint templates |

| Format | Notes |
| --- | --- |
| PNG | OCR runs automatically |
| JPG, JPEG | OCR runs automatically |
| GIF | OCR runs on the first frame |
| WebP | OCR runs automatically |
| TIFF, TIF | Multi-page TIFFs supported |
| BMP | OCR runs automatically |
| SVG | Text elements extracted |

| Format | Notes |
| --- | --- |
| EML | Single message export; attachments ingested separately |
| MSG | Outlook message format |
| MBOX | Mailbox archive; all messages ingested |
| PST | Outlook archive; all folders ingested |

Attachments within email files are extracted and ingested as separate searchable documents, with a link back to the parent message.

| Format | Notes |
| --- | --- |
| PY, JS, TS, JAVA, CPP, GO, RUST | Source code files; useful for IP and code-review matters |
| JSON, XML | Structured data |

| Format | Notes |
| --- | --- |
| ZIP | Contents extracted recursively |
| TAR, TAR.GZ | Contents extracted recursively |
| RAR | Contents extracted recursively |
| 7Z | Contents extracted recursively |

When you upload an archive, the system extracts and ingests each file inside as a separate searchable document. Folder structure is preserved.
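
To make "extracted recursively" and "folder structure is preserved" concrete, here is a minimal sketch of recursive ZIP traversal using only Python's standard library. It illustrates the general technique and is not Vaquill's implementation. The function name `walk_zip`, the `max_depth` guard, and the choice to treat a nested archive's path as a folder prefix are all assumptions made for this example.

```python
import io
import zipfile
from pathlib import PurePosixPath


def walk_zip(data: bytes, prefix: str = "", max_depth: int = 5) -> dict[str, bytes]:
    """Return {path: content} for every file in a ZIP, descending into nested ZIPs.

    Hypothetical helper for illustration only.
    """
    if max_depth < 0:
        raise ValueError("archive nesting is deeper than the allowed depth")

    found: dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no content of their own
            # Keep the folder structure by joining the parent prefix and member path.
            path = str(PurePosixPath(prefix) / info.filename)
            content = archive.read(info)
            if info.filename.lower().endswith(".zip"):
                # Nested archive: recurse, using its own path as the folder prefix.
                found.update(walk_zip(content, prefix=path, max_depth=max_depth - 1))
            else:
                found[path] = content
    return found
```

Each key in the returned dictionary is a full path, so a file inside `production/batch.zip` keeps `production/batch.zip/` as its leading folders. The depth guard is there because deeply nested archives are a common way to exhaust resources during extraction.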
Password-protected files (encrypted PDFs, password-locked archives, protected Office files) cannot be processed automatically. Remove the password before upload, or supply the password via the per-file ingestion options.
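
One way to catch password-locked ZIP archives before upload is to check the encryption flag on each entry. The sketch below uses only Python's standard `zipfile` module. The function name `encrypted_members` is illustrative, and this is a local pre-check, not part of Vaquill's ingestion options. It does not cover encrypted PDFs, protected Office files, or other archive formats.

```python
import zipfile


def encrypted_members(zip_source) -> list[str]:
    """List the entries in a ZIP whose 'encrypted' flag is set.

    `zip_source` may be a filesystem path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Bit 0 of the general-purpose flag marks an encrypted entry in the ZIP format.
        return [info.filename for info in archive.infolist() if info.flag_bits & 0x1]
```

An empty list means no entry is flagged as encrypted. A non-empty list names the entries that would need their password removed or supplied.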

OCR for Scanned Documents

For image-based files and scanned PDFs:
  • OCR runs automatically on upload
  • The resulting text is searchable like any other document
  • The original image is preserved; click any citation to see the highlighted passage on the page
  • Handwritten content is recognized where legible
  • Common OCR artifacts (broken hyphens, mis-recognized characters) are cleaned up automatically
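
As an illustration of the "broken hyphens" cleanup mentioned above, the sketch below rejoins words that OCR split across a line break. It is one simple heuristic, not Vaquill's actual cleanup logic, and the function name `join_broken_hyphens` is hypothetical.

```python
import re

# A lowercase letter, a hyphen, a newline, then another lowercase letter.
# Restricting to lowercase avoids joining things like "1990-\n1995".
_LINE_BREAK_HYPHEN = re.compile(r"([a-z])-\n([a-z])")


def join_broken_hyphens(text: str) -> str:
    """Rejoin words hyphenated across a line break, dropping the hyphen and newline."""
    return _LINE_BREAK_HYPHEN.sub(r"\1\2", text)
```

This heuristic also joins genuinely hyphenated compounds that happen to break at the hyphen ("well-" / "known" becomes "wellknown"), so a production cleanup would typically check the joined word against a dictionary first.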

Size and Page Limits

Upload limits vary by plan - see your dashboard or the Subscriptions page for current per-plan caps on file size, page count, and batch size. For very large files or batches, contact support to discuss enterprise ingestion options.

What Happens After Upload

1. Safety scan. The file is scanned for safety. No untrusted code is executed.
2. Text extraction and OCR. Text is extracted; OCR runs on image-based content.
3. Indexing. The document is indexed for search across your matter.
4. Citation detection. Citations within the document are detected and linked to authorities.
5. Ready for analysis. The document is available for analysis in any tool.

Upload progress and processing status are visible in the matter document list.
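
If you track documents in your own tooling, the five processing steps can be modeled as an ordered sequence. The sketch below is only a way to picture that ordering. The `Stage` names and the `next_stage` helper are hypothetical and do not correspond to documented Vaquill status values.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Processing steps in the order listed in this section (names are illustrative)."""
    SAFETY_SCAN = 1
    TEXT_EXTRACTION_AND_OCR = 2
    INDEXING = 3
    CITATION_DETECTION = 4
    READY_FOR_ANALYSIS = 5


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the step that follows `current`, or None once the document is ready."""
    if current is Stage.READY_FOR_ANALYSIS:
        return None
    return Stage(current + 1)
```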

Tips

Group with archives. Upload a ZIP of all documents in a discovery production to preserve folder structure.
Email-to-matter. Forward EML or MSG files to the matter’s unique email address for automatic ingestion (see Email Ingestion).
Re-OCR if needed. If the OCR quality is poor on a scanned document, you can request a re-OCR pass with stricter settings.
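
For the archive tip, a production folder can be packed into a single ZIP with relative paths intact before upload. The sketch below uses Python's standard library. The function name `zip_folder` and the example paths are assumptions for illustration, and the upload itself is left to the normal Vaquill interface.

```python
import zipfile
from pathlib import Path


def zip_folder(folder: Path, archive_path: Path) -> int:
    """Write every file under `folder` into a ZIP, keeping paths relative to `folder`.

    Returns the number of files written. Hypothetical helper for illustration only.
    """
    count = 0
    target = archive_path.resolve()
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and skip the archive itself if it lives inside `folder`.
            if path.is_file() and path.resolve() != target:
                archive.write(path, arcname=path.relative_to(folder).as_posix())
                count += 1
    return count
```

For example, `zip_folder(Path("production_001"), Path("production_001.zip"))` stores `production_001/vol1/a.pdf` under the entry name `vol1/a.pdf`, so the folder layout inside the production survives the round trip.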

Document Search

Search across everything you have uploaded.

Email Ingestion

Forward emails to ingest documents and attachments.

Drive Import

Pull documents directly from Google Drive.