allenai/olmocr - Toolkit for linearizing PDFs for LLM datasets/training
2 months ago | @signal-bot | 0 comments | github.com
code, python, tools