Importing PDF format
PDF support is available starting from version 0.1.23 (download latest version). PDF files can be imported, but there are some important considerations to keep in mind:
Understanding PDF Complexity
PDF files are difficult to parse accurately because the format does not impose a standard logical structure on a document's content. Unlike HTML or Markdown, PDF is designed primarily for visual presentation, which means:
- Text flow and ordering can vary significantly between documents
- Layout and formatting information may not translate cleanly
- Different PDF creation tools can produce very different internal structures
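To illustrate why text ordering is ambiguous, here is a toy sketch in Python. It is not our parser: the fragment coordinates, the `row_major` and `column_major` helpers, and the `split_x` threshold are all hypothetical, chosen only to show that the same positioned text can be read in more than one plausible order.

```python
from typing import List, Tuple

# (x, y, text): hypothetical text fragments from a two-column page.
# y grows downward, so a smaller y means higher on the page.
Fragment = Tuple[float, float, str]

page: List[Fragment] = [
    (300.0, 100.0, "B1"),  # right column, first line
    (50.0, 100.0, "A1"),   # left column, first line
    (50.0, 120.0, "A2"),   # left column, second line
    (300.0, 120.0, "B2"),  # right column, second line
]


def row_major(fragments: List[Fragment]) -> List[str]:
    """Read strictly top to bottom, then left to right (ignores columns)."""
    ordered = sorted(fragments, key=lambda f: (f[1], f[0]))
    return [text for _, _, text in ordered]


def column_major(fragments: List[Fragment], split_x: float) -> List[str]:
    """Read the left column top to bottom, then the right column.

    split_x is an assumed x position separating the two columns.
    """
    left = sorted((f for f in fragments if f[0] < split_x), key=lambda f: f[1])
    right = sorted((f for f in fragments if f[0] >= split_x), key=lambda f: f[1])
    return [text for _, _, text in left + right]


print(row_major(page))            # interleaves the two columns
print(column_major(page, 200.0))  # keeps each column together
```

Neither order is wrong given only the positions. A parser has to guess the layout, and that guess can fail on unusual documents.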
Current Limitations
We are aware of certain limitations in our PDF handling:
- Table parsing may not always be accurate
- Images from PDF files are not currently displayed
- Complex layouts might not be interpreted correctly
Recommendations
If you have access to alternative formats for the same content, we recommend using them instead. For example:
- For academic papers, if available on arXiv, use the HTML version instead of PDF
- For documentation, prefer native web formats or Markdown when possible
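For the arXiv case, the following is a minimal Python sketch that rewrites a PDF or abstract link into the corresponding HTML link. It assumes arXiv's `https://arxiv.org/html/<id>` URL pattern and new-style identifiers such as `2402.08954v1`. The helper name `arxiv_html_url` is hypothetical and not part of our product. The function only rewrites the URL. It does not check that an HTML version exists, and arXiv does not provide one for every paper.

```python
import re
from typing import Optional

# Matches new-style arXiv identifiers (e.g. 2402.08954 or 2402.08954v2)
# in /pdf/ or /abs/ links, with an optional ".pdf" suffix.
_ARXIV_LINK = re.compile(
    r"arxiv\.org/(?:pdf|abs)/(\d{4}\.\d{4,5}(?:v\d+)?)(?:\.pdf)?/?$"
)


def arxiv_html_url(url: str) -> Optional[str]:
    """Return the assumed HTML URL for an arXiv link, or None if not recognized."""
    match = _ARXIV_LINK.search(url.strip())
    if match is None:
        return None
    return f"https://arxiv.org/html/{match.group(1)}"


print(arxiv_html_url("https://arxiv.org/pdf/2402.08954v1.pdf"))
print(arxiv_html_url("https://example.com/paper.pdf"))
```

If the resulting page does not load, fall back to the PDF.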
Support and Feedback
If you notice any issues with PDF parsing or have suggestions for improvement, please contact our support team. We're actively working on improving PDF support, and your feedback helps us identify and prioritize fixes.