Hi there. This is definitely an interesting topic, but it's not covered, as the focus is more on the LLM itself than on data cleaning and the like. That said, I remember from my grad school days that there are several Python libraries that let you extract text from PDFs.