Imagine a world where artificial intelligences can navigate the web as easily as we do. This could become a reality thanks to a new standard called llms.txt, proposed by AI veteran Jeremy Howard. This standard is designed to help AI systems find and process information more efficiently.
Why is this relevant? Most web pages are designed for humans, which makes them hard for language models to process: navigation, markup, and boilerplate inflate the text far beyond the content that actually matters. llms.txt is proposed as a solution to this difficulty, giving models access to a site's content in a more focused, machine-friendly form.
Making the web more accessible for LLMs
The format of llms.txt is simple and effective. Each file begins with the project name and a brief summary, followed by additional details and links to other documents, all written in Markdown.
This structure is intended to improve the reading and understanding of websites by AI systems.
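Based on the structure described above, a minimal llms.txt might look like the sketch below. The project name, summary text, and links are all hypothetical, and the exact section layout follows Howard's proposal as summarized here rather than a guaranteed template:

```markdown
# ExampleProject

> A small library for parsing widgets, with a compact API surface.

Brief extra notes a model should know, kept short.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): install and first steps
- [API reference](https://example.com/docs/api.md): full function listing

## Optional

- [Changelog](https://example.com/changelog.md): release history
```

The heading gives the project name, the blockquote gives the one-line summary, and each linked document carries a short description so a model can decide what to fetch.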
Additionally, Howard suggests that website owners provide Markdown versions of their HTML pages, reachable by simply appending .md to the page's URL. Projects like FastHTML are already implementing this approach by automatically generating Markdown versions of their documentation.
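The .md convention can be applied mechanically on the client side. A minimal sketch, assuming the convention is simply to append .md to a page URL, and index.html.md for directory-style URLs ending in a slash (both are assumptions about the convention, and no particular site is guaranteed to serve these files):

```python
def markdown_url(page_url: str) -> str:
    """Return the candidate Markdown URL for an HTML page.

    Appends ".md" to the page URL; for directory-style URLs
    ending in "/", assumes "index.html.md" (a guess at the
    convention, not a guarantee the file exists).
    """
    if page_url.endswith("/"):
        return page_url + "index.html.md"
    return page_url + ".md"


print(markdown_url("https://example.com/docs/page.html"))
# https://example.com/docs/page.html.md
print(markdown_url("https://example.com/docs/"))
# https://example.com/docs/index.html.md
```

A client would try the candidate URL and fall back to the HTML page if the Markdown version is not served.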
This initiative could be especially beneficial for developers and code libraries, as it would make it easier for AIs to understand structured information. The AI company Anthropic has also uploaded its own llms.txt for its documentation, highlighting the importance of this standard.
Collaboration with existing web standards
llms.txt is not meant to replace familiar web standards such as robots.txt and sitemap.xml, but to coexist with them. While those files help search engines crawl pages, llms.txt focuses on helping AIs identify and understand a site's most relevant content, including links to additional resources.
The key to the success of this new standard lies in adoption by web developers. If enough sites begin to use llms.txt, we could witness a radical change in the way AIs read and understand online content.
However, essential questions about the future of the web also arise. Who is responsible when an AI rewrites the content of a site? How is the copyright of the owners protected? These questions still await clear answers from AI labs.