Birdwatch Note Rating
2025-05-04 15:51:18 UTC - HELPFUL
Rated by Participant: 4270719BEB4B8B86E5E1FC4D050BE66C2B30ED9B05CFE1F8C0B907069BD3367D
Original Note:
The post claims that Qwen3-235B "beats Sonnet 3.7 Thinking and OpenAI o1", but it omits that Qwen3-235B was scored using the "whole" edit format while the other models used the "diff" format. Diff formats are much harder for models to produce correctly, so this is not a fair or accurate comparison. https://aider.chat/docs/more/edit-formats.html