Vaquill ingests over 50 file formats - the ones that show up in real legal practice. Upload them as-is; the system extracts text, runs OCR where needed, and makes the content searchable across your matter.
Documents (PDF, Word, RTF, TXT, Markdown, HTML)
| Format | Notes |
|---|---|
| PDF | Native and image-only; OCR runs automatically on scanned pages |
| DOCX | Modern Word format; preserves headings and tracked changes metadata |
| DOCM | Macro-enabled Word; macros are ignored, text is extracted |
| DOC | Legacy Word; converted before extraction |
| DOT, DOTX | Word templates |
| RTF | Rich Text Format |
| TXT | Plain text |
| MD | Markdown |
| HTML, HTM | Web pages and HTML exports |
Spreadsheets (Excel, CSV)
| Format | Notes |
|---|---|
| XLSX | Modern Excel; all sheets ingested |
| XLSM | Macro-enabled Excel |
| XLS | Legacy Excel |
| XLTX, XLTM, XLT | Excel templates |
| CSV | Tabular data |
Presentations (PowerPoint)
| Format | Notes |
|---|---|
| PPTX | Modern PowerPoint; slide content and speaker notes |
| PPTM | Macro-enabled PowerPoint |
| PPT | Legacy PowerPoint |
| POT, POTX | PowerPoint templates |
Images (OCR applied automatically)
| Format | Notes |
|---|---|
| PNG | OCR runs automatically |
| JPG, JPEG | OCR runs automatically |
| GIF | OCR runs on the first frame |
| WebP | OCR runs automatically |
| TIFF, TIF | Multi-page TIFFs supported |
| BMP | OCR runs automatically |
| SVG | Text elements extracted |
Email (EML, MSG, MBOX, PST)
| Format | Notes |
|---|---|
| EML | Single message export; attachments ingested separately |
| MSG | Outlook message format |
| MBOX | Mailbox archive; all messages ingested |
| PST | Outlook archive; all folders ingested |
Code and data (source files, JSON, XML)
| Format | Notes |
|---|---|
| PY, JS, TS, JAVA, CPP, GO, RUST | Source code files; useful for IP and code-review matters |
| JSON, XML | Structured data |
Archives (ZIP, TAR, RAR, 7Z)
| Format | Notes |
|---|---|
| ZIP | Contents extracted recursively |
| TAR, TAR.GZ | Contents extracted recursively |
| RAR | Contents extracted recursively |
| 7Z | Contents extracted recursively |
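
To illustrate what "extracted recursively" means for archives, here is a minimal Python sketch that walks a ZIP held in memory and descends into any ZIP nested inside it. It is not Vaquill's implementation, which is not documented here; the function name `list_nested_zip` is hypothetical, and the sketch handles only ZIP (not TAR, RAR, or 7Z), using the standard-library `zipfile` module.

```python
import io
import zipfile


def list_nested_zip(data: bytes, prefix: str = "") -> list[str]:
    """Return the paths of all non-ZIP files in an archive, descending into nested ZIPs.

    Hypothetical helper for illustration only; not part of any Vaquill API.
    """
    paths = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            full_path = prefix + info.filename
            if info.filename.lower().endswith(".zip"):
                # Nested archive: read its bytes and recurse, keeping the
                # outer archive name as a path prefix.
                paths.extend(list_nested_zip(archive.read(info), full_path + "/"))
            else:
                paths.append(full_path)
    return paths
```

Running something like this locally before an upload can help you anticipate how many individual documents a single archive will expand into.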
OCR for Scanned Documents
For image-based files and scanned PDFs:
- OCR runs automatically on upload
- The resulting text is searchable like any other document
- The original image is preserved; click any citation to see the highlighted passage on the page
- Handwritten content is recognized where legible
- Common OCR artifacts (broken hyphens, mis-recognized characters) are cleaned up automatically
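
As an illustration of the "broken hyphens" artifact mentioned above, the following Python sketch rejoins words that OCR split across a line break. It is an assumption about what such cleanup can look like, not a description of Vaquill's actual pipeline; the function name `join_broken_hyphens` is hypothetical.

```python
import re

# A word fragment, a hyphen, a line break, then a lowercase continuation.
_BROKEN_HYPHEN = re.compile(r"(\w+)-[ \t]*\n[ \t]*([a-z]\w*)")


def join_broken_hyphens(text: str) -> str:
    """Rejoin words split by a hyphen at a line break (illustrative only)."""
    return _BROKEN_HYPHEN.sub(r"\1\2", text)


cleaned = join_broken_hyphens("The indem-\nnification clause survives.")
```

A simple rule like this also merges genuine compounds that happen to break at the hyphen (for example "well-" followed by "known" on the next line), so a production cleanup step would likely need a dictionary check or similar safeguard.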
Size and Page Limits
Upload limits vary by plan - see your dashboard or the Subscriptions page for current per-plan caps on file size, page count, and batch size. For very large files or batches, contact support to discuss enterprise ingestion options.
What Happens After Upload
Upload progress and processing status are visible in the matter document list.
Related
Document Search
Search across everything you have uploaded.
Email Ingestion
Forward emails to ingest documents and attachments.
Drive Import
Pull documents directly from Google Drive.

