Benchmark Results

Outstanding Performance of Bailu AI Models Across International Benchmarks

Advanced Mathematical Reasoning

Web Browsing & Information Comprehension

Graduate-Level Question Answering

Human-Level Evaluation

Live Code Generation

Tool Use & Multi-step Reasoning

Terminal Commands & System Operations

Software Engineering Problem Solving
