Birdwatch Note
2025-02-20 09:22:02 UTC - MISINFORMED_OR_POTENTIALLY_MISLEADING
o3-mini isn't better than grok 3 in EVERY eval. Even without the cons@64 method, grok 3 is better than o3-mini on GPQA (80.2 x 79.7). Without the cons@64 method, Grok 3 mini also outperforms o3-mini in 3 out of the 5 benchmarks shown (AIME'24, GPQA, and LiveCodeBench v5). https://x.ai/blog/grok-3
Written by 917D318FC303781283CA16D8059B3E4C0F52D027F9504C1981D25F625B348A66
Original Tweet
Tweet embedding is no longer reliably available, due to the platform's instability (in terms of both technology and policy). If the Tweet still exists, you can view it here: https://twitter.com/foo_bar/status/1892407015038996740
All Information
- ID - 1892504979610521633
- noteId - 1892504979610521633
- participantId -
- noteAuthorParticipantId - 917D318FC303781283CA16D8059B3E4C0F52D027F9504C1981D25F625B348A66
- createdAtMillis - 1740043322838
- tweetId - 1892407015038996740
- classification - MISINFORMED_OR_POTENTIALLY_MISLEADING
- believable -
- harmful -
- validationDifficulty -
- misleadingOther - 0
- misleadingFactualError - 1
- misleadingManipulatedMedia - 0
- misleadingOutdatedInformation - 0
- misleadingMissingImportantContext - 0
- misleadingUnverifiedClaimAsFact - 0
- misleadingSatire - 0
- notMisleadingOther - 0
- notMisleadingFactuallyCorrect - 0
- notMisleadingOutdatedButNotWhenWritten - 0
- notMisleadingClearlySatire - 0
- notMisleadingPersonalOpinion - 0
- trustworthySources - 1
- summary - o3-mini isn't better than grok 3 in EVERY eval. Even without the cons@64 method, grok 3 is better than o3-mini on GPQA (80.2 x 79.7). Without the cons@64 method, Grok 3 mini also outperforms o3-mini in 3 out of the 5 benchmarks shown (AIME'24, GPQA, and LiveCodeBench v5). https://x.ai/blog/grok-3
Note Ratings
rated at
2025-02-20 21:42:03 -0600
2025-02-20 10:08:11 -0600
2025-02-20 09:54:12 -0600
2025-02-20 09:14:24 -0600
2025-02-20 08:28:47 -0600
2025-02-20 07:31:42 -0600
2025-02-20 07:16:50 -0600
2025-02-20 07:09:35 -0600
2025-02-20 07:03:17 -0600
2025-02-20 07:00:28 -0600
2025-02-20 07:00:03 -0600
2025-02-20 06:27:41 -0600
2025-02-20 06:21:27 -0600
2025-02-20 05:47:24 -0600
2025-02-20 05:45:57 -0600
2025-02-20 05:27:56 -0600
2025-02-20 05:18:44 -0600
2025-02-20 05:18:20 -0600
2025-02-20 05:13:24 -0600
2025-02-20 05:12:28 -0600
2025-02-20 04:46:18 -0600
2025-02-20 04:20:39 -0600
2025-02-20 03:58:15 -0600
2025-02-20 03:58:04 -0600
2025-02-20 03:51:41 -0600
2025-02-20 03:35:46 -0600
2025-02-20 03:31:59 -0600
2025-02-21 02:20:22 -0600
2025-02-20 12:20:03 -0600
2025-02-20 10:30:53 -0600
2025-02-20 07:44:21 -0600
2025-02-20 07:25:26 -0600
2025-02-20 07:19:42 -0600
2025-02-20 06:40:22 -0600
2025-02-20 06:29:06 -0600
2025-02-20 06:06:46 -0600
2025-02-20 05:03:15 -0600
2025-02-20 04:59:17 -0600
2025-02-20 04:30:19 -0600
2025-02-20 04:29:18 -0600
2025-02-20 03:46:30 -0600
2025-02-20 03:28:23 -0600
2025-02-24 12:39:58 -0600
2025-02-23 07:59:51 -0600
2025-02-20 18:26:04 -0600
2025-02-20 15:12:51 -0600
2025-02-20 12:14:33 -0600
2025-02-20 12:08:46 -0600
2025-02-20 07:13:04 -0600
2025-02-20 06:46:07 -0600
2025-02-20 06:44:43 -0600
2025-02-20 06:30:46 -0600
2025-02-20 06:08:34 -0600
2025-02-20 06:00:05 -0600
2025-02-20 05:45:33 -0600
2025-02-20 05:45:17 -0600
2025-02-20 05:22:50 -0600
2025-02-20 05:06:30 -0600
2025-02-20 05:02:23 -0600
2025-02-20 04:32:46 -0600
2025-02-20 04:12:46 -0600
2025-02-20 04:01:00 -0600
2025-02-20 10:53:47 -0600
2025-02-20 10:40:10 -0600
2025-02-20 07:03:47 -0600
2025-02-20 07:03:07 -0600
2025-02-20 07:01:09 -0600
2025-02-20 06:58:39 -0600
2025-02-20 06:27:40 -0600
2025-02-20 05:47:16 -0600
2025-02-20 05:06:41 -0600
2025-02-20 04:44:27 -0600
2025-02-20 04:38:34 -0600
2025-02-20 03:46:42 -0600
2025-07-08 07:47:35 -0500