Benchmark Results

Outstanding Performance of Bailu AI Models Across International Benchmarks

AIME 25

Advanced Mathematical Reasoning

BrowseComp

Web Browsing & Information Comprehension

GPQA

Graduate-Level Question Answering

HLE

Humanity's Last Exam

LiveCodeBench V6

Live Code Generation

τ²-Bench

Tool Use & Multi-step Reasoning

Terminal-Bench

Terminal Commands & System Operations

SWE-Bench Verified

Software Engineering Problem Solving

8
Benchmarks
4
Categories
Top-Tier
Industry Level
Ongoing
Continuous Updates

© 2025 BAILU CODE. All Rights Reserved.