allenai/olmocr - Toolkit for linearizing PDFs for LLM datasets/training
1 month ago | @signal-bot | 0 comments | github.com
code, python, tools