Importing PDF format

PDF import is available from version 0.1.23 onward, so download the latest version to use it. Although PDF files are supported, keep the following considerations in mind:

Understanding PDF Complexity

PDF files can be difficult to parse accurately because they often do not record the document's logical structure (headings, paragraphs, tables, reading order) in a consistent way. Unlike formats such as HTML or Markdown, PDF is designed primarily for visual presentation, which means:

  • Text flow and ordering can vary significantly between documents
  • Layout and formatting information may not translate cleanly
  • Different PDF creation tools can produce very different internal structures
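The point about text ordering can be illustrated with a toy model. A PDF places each run of text at its own position on the page, so the order in which text appears in the file does not have to match the order in which a person reads it. The sketch below is a simplified assumption, not how our importer or any particular PDF library works: `Fragment`, the coordinates, and the `column_split` value are all hypothetical, and `y` is measured from the top of the page for readability (native PDF coordinates start at the bottom-left).

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical text fragment: its text plus where it is drawn on the page."""
    text: str
    x: float  # left edge, in points
    y: float  # distance from the top of the page, in points (simplified convention)


# A hypothetical two-column page whose file happens to store the right column
# first. This is allowed because every fragment carries its own position.
stream_order = [
    Fragment("Right column, line 1.", x=320, y=100),
    Fragment("Right column, line 2.", x=320, y=115),
    Fragment("Left column, line 1.", x=50, y=100),
    Fragment("Left column, line 2.", x=50, y=115),
]


def naive_reading_order(fragments):
    # Sort top-to-bottom, then left-to-right: this interleaves the two columns.
    return [f.text for f in sorted(fragments, key=lambda f: (f.y, f.x))]


def column_aware_order(fragments, column_split=300):
    # Read the whole left column first, then the whole right column.
    return [f.text for f in sorted(fragments, key=lambda f: (f.x >= column_split, f.y))]


if __name__ == "__main__":
    print([f.text for f in stream_order])      # order stored in the file
    print(naive_reading_order(stream_order))   # columns interleaved
    print(column_aware_order(stream_order))    # intended reading order
```

Neither strategy works for every document: the column-aware version only helps when the column boundary is known, which is exactly the kind of layout information a PDF may not state explicitly.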

Current Limitations

PDF import currently has the following known limitations:

  • Table parsing may not always be accurate
  • Images from PDF files are not currently displayed
  • Complex layouts might not be interpreted correctly

Recommendations

If the same content is available in another format, we recommend importing that version instead. For example:

  • For academic papers on arXiv, use the HTML version instead of the PDF when one is available
  • For documentation, prefer native web formats or Markdown when possible
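If you keep arXiv links in abstract (`/abs/`) or PDF (`/pdf/`) form, a small helper like the one below can rewrite them to arXiv's HTML view at `https://arxiv.org/html/<id>`. This is a hypothetical convenience sketch, not part of our product: `arxiv_html_url` is an illustrative name, it handles only new-style identifiers such as `2401.12345` (optionally with a version suffix), and not every paper has an HTML rendering, so the resulting link may not resolve.

```python
import re
from typing import Optional

# New-style arXiv identifiers (e.g. 2401.12345 or 2401.12345v2) inside abs/ or pdf/ URLs.
# Old-style identifiers such as hep-th/9901001 are deliberately not handled.
_ARXIV_URL = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)")


def arxiv_html_url(url: str) -> Optional[str]:
    """Return the arXiv HTML URL for an abs/pdf link, or None if the URL is not recognized."""
    match = _ARXIV_URL.search(url)
    if match is None:
        return None
    return f"https://arxiv.org/html/{match.group(1)}"


if __name__ == "__main__":
    print(arxiv_html_url("https://arxiv.org/abs/2401.12345"))
    print(arxiv_html_url("https://arxiv.org/pdf/2401.12345v2.pdf"))
```

Returning `None` for unrecognized URLs lets the caller fall back to the original PDF link instead of guessing.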

Support and Feedback

If you notice any issues with PDF parsing or have suggestions for improvement, please contact our support team. We are actively improving PDF support, and your feedback helps us identify and prioritize fixes.