Importing PDF Files

PDF support is available starting with version 0.1.23 (download latest version) and was significantly improved in version 0.1.51. While we do support PDF files, there are some important considerations to keep in mind:

Understanding PDF Complexity

PDF files can be challenging to parse accurately because the format does not enforce a consistent logical structure for content. Unlike formats such as HTML or Markdown, PDFs are primarily designed for visual presentation, which means:

Text flow and ordering can vary significantly between documents
Layout and formatting information may not translate cleanly
Different PDF creation tools can produce very different internal structures

Current Limitations

We are aware of certain limitations in our PDF handling:

Table parsing may not always be accurate
Images from PDF files are not currently displayed
Complex layouts might not be interpreted correctly
Scanned or image-only PDFs (where the text is stored as images) can't be processed, because OCR is not supported at the moment
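
If you want to check ahead of time whether a PDF is likely to be image-only, a rough pre-check can be sketched as follows. This is not how the app itself decides; it is a minimal illustration that assumes the third-party pypdf package is installed, and the function names and the 20-character threshold are hypothetical choices.

```python
def looks_image_only(page_texts, min_chars_per_page=20):
    """Return True when extracted text is too sparse to be a real text layer.

    page_texts is a list with one extracted-text string per page.
    The threshold is an illustrative assumption, not a tuned value.
    """
    if not page_texts:
        return True
    total_chars = sum(len(text.strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page


def page_texts_from_pdf(path):
    """Extract text per page using pypdf (assumed to be installed)."""
    from pypdf import PdfReader  # third-party; imported lazily

    reader = PdfReader(path)
    # extract_text() can return an empty result for image-only pages
    return [page.extract_text() or "" for page in reader.pages]
```

For example, `looks_image_only(page_texts_from_pdf("paper.pdf"))` returning True suggests the file has little or no text layer and probably won't import well. The heuristic can misjudge short documents, so treat the result as a hint.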

Speed

Processing PDF documents can be slow. Starting with version 0.1.51, local AI models are used to improve parsing quality and to address the challenges described above. When you import a medium to large PDF file, you may prefer to let it be processed in the background. Documents that are still being processed won't show up on the home page. You can see in-progress documents in the In-Progress Actions panel at the top right.

Figure: In-Progress Actions panel

Recommendations

If you have access to alternative formats for the same content, we recommend using them instead. For example:

For academic papers, if available on arXiv, use the HTML version instead of PDF
For documentation, prefer native web formats or Markdown when possible
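
As an illustration of the arXiv recommendation, the sketch below rewrites an arXiv PDF or abstract link into the corresponding HTML link. It assumes arXiv's `https://arxiv.org/html/<id>` URL pattern; the function name is hypothetical, and not every paper has an HTML rendering, so confirm that the resulting page loads before importing it.

```python
import re
from urllib.parse import urlparse


def arxiv_html_url(url):
    """Map an arXiv /abs/ or /pdf/ URL to its /html/ counterpart.

    Returns None when the URL is not a recognizable arXiv paper link.
    The HTML page may not exist for every paper.
    """
    parts = urlparse(url)
    if parts.netloc not in ("arxiv.org", "www.arxiv.org"):
        return None
    # Accept /abs/<id> and /pdf/<id>, with an optional ".pdf" suffix
    match = re.fullmatch(r"/(?:abs|pdf)/(.+?)(?:\.pdf)?/?", parts.path)
    if match is None:
        return None
    return f"https://arxiv.org/html/{match.group(1)}"


# The paper ID below is a placeholder used only for illustration.
print(arxiv_html_url("https://arxiv.org/pdf/2401.00001v1"))
```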

Support and Feedback

If you notice any issues with PDF parsing or have suggestions for improvement, please contact our support team. We're actively working on improving PDF support, and your feedback helps us identify and prioritize improvements.