LLaMA Version of o1 Released, Based on the AlphaGo Zero Paradigm

Explore the open-source O1 project, recreating OpenAI’s model with AlphaGo Zero, Monte Carlo Tree Search, and LLaMA for advanced AI reasoning.

Nov 05, 2024


Reproducing OpenAI's o1 Reasoning Model: Latest Progress in the Open-Source Community

The LLaMA-based O1 project has just been released by the Shanghai AI Lab team.

The project description explicitly mentions Monte Carlo Tree Search, self-play reinforcement learning, PPO, and AlphaGo Zero's dual policy-and-value paradigm (a prior policy plus a value estimate).
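To make the "prior + value" idea concrete, here is a minimal sketch of the PUCT selection rule that AlphaGo Zero-style tree search uses to combine the two signals. It is not taken from the project's code: the names `Node`, `puct_score`, `select_child`, `backup`, and the constant `c_puct=1.5` are illustrative assumptions, and the single-agent backup (no sign flip between players) is an assumption about how such a search might be applied to reasoning steps.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One edge/state in the search tree (hypothetical structure)."""
    prior: float                 # P(s, a): probability assigned by the policy
    visit_count: int = 0         # N(s, a): how often this child was visited
    value_sum: float = 0.0       # W(s, a): sum of value estimates backed up
    children: dict = field(default_factory=dict)

    def mean_value(self) -> float:
        # Q(s, a): average backed-up value; 0 for unvisited nodes
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Exploration bonus is large for high-prior, rarely visited children
    exploration = (
        c_puct * child.prior * math.sqrt(parent.visit_count)
        / (1 + child.visit_count)
    )
    return child.mean_value() + exploration


def select_child(parent: Node, c_puct: float = 1.5):
    # Pick the (action, child) pair with the highest PUCT score
    return max(
        parent.children.items(),
        key=lambda item: puct_score(parent, item[1], c_puct),
    )


def backup(path, value: float) -> None:
    # Propagate a value estimate along the visited path.
    # Assumption: single-agent setting, so the sign is not flipped per level.
    for node in path:
        node.visit_count += 1
        node.value_sum += value
```

In this sketch the policy prior steers the search toward promising continuations early on, while the accumulated value estimates take over as visit counts grow; a child that receives a poor value after being visited loses priority to its less-explored siblings.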

Note that they mention AlphaGo Zero, not AlphaGo! That suggests the approach does not depend on human-generated data, but instead relies on pure RL with a known set of "rules"!

If true, this could be even bigger than o1!

Continue reading this post for free, courtesy of Meng Li.


AI Disruption
