We just OCR'd 27,000 arXiv papers into Markdown using an open 5B model, 16 parallel HF Jobs on L40S GPUs, and a mounted bucket.

Total cost: $850
Total time: ~29 hours
Jobs that crashed: 0

This now powers "Chat with your paper" on http://hf.co/papers
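To put the quoted figures in per-paper terms, here is a rough back-of-the-envelope sketch. It uses only the numbers stated in the post. The GPU-hour figures rest on an assumption the post does not confirm: that "~29 hours" is wall-clock time and that all 16 jobs ran for that full duration. The variable names are illustrative.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the post.
papers = 27_000
total_cost_usd = 850
wall_clock_hours = 29      # quoted as "~29 hours", so treat results as approximate
parallel_jobs = 16

# These two follow directly from the quoted totals.
cost_per_paper = total_cost_usd / papers
papers_per_hour = papers / wall_clock_hours

# ASSUMPTION: every job ran for the full wall-clock duration.
# If some jobs finished early, real GPU-hours are lower and the
# implied hourly rate is higher.
gpu_hours = parallel_jobs * wall_clock_hours
papers_per_gpu_hour = papers / gpu_hours
implied_usd_per_gpu_hour = total_cost_usd / gpu_hours

print(f"Cost per paper:            ${cost_per_paper:.4f}")
print(f"Papers per wall-clock hr:  {papers_per_hour:.0f}")
print(f"GPU-hours (assumed):       {gpu_hours}")
print(f"Papers per GPU-hour:       {papers_per_gpu_hour:.1f}")
print(f"Implied $ per GPU-hour:    ${implied_usd_per_gpu_hour:.2f}")
```

On these numbers the run works out to roughly three cents per paper. The implied hourly rate is only as good as the full-utilization assumption and is not a quoted price.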
Open-source OCR processes 27,000 arXiv papers efficiently