Alibaba's Open-Source Web Agent Tops Leaderboard
Alibaba's WebSailor: First open-source web agent to challenge BrowseComp benchmark, achieving superhuman performance on complex information retrieval tasks.
In internet information retrieval tasks, even powerful LLMs sometimes get trapped in "information fog":
When problems are simple and paths are clear, models can often find answers using memory or one or two searches. However, when facing highly uncertain problems with vague clues, models struggle to get it right.
For example, when we ask a straightforward question (like "What is the population of a certain city?"), a search engine can find the answer immediately.
But if the question is deliberately complex, such as "What is the name of the musical piece that is closely tied to a South American capital, whose lyricist received a local honorary title in the early 21st century, and whose composer attended a famous art academy in western Colombia?", both humans and AI struggle to find a direct entry point.
Problems of this kind require reading many web pages, piecing clues together step by step, and gradually clearing away the fog to reach the answer. This exceeds the limits of human memory and attention, and is far beyond the capabilities of ordinary open-source models.
Is there a way for open-source large models to master this ability to see through the fog?
WebSailor, the latest solution proposed by Alibaba's Tongyi Lab, significantly improves the performance of open-source models on complex web reasoning tasks through a complete set of innovative post-training methods.