Birdwatch Note Rating
2024-01-25 16:06:02 UTC - NOT_HELPFUL
Rated by Participant: B35FEC9D7D5AD9E62A3275F7FA7C70C359713A4E241BEDA1DB693CD54F75CCDD
Original Note:
This paper is built on (very) problematic foundations. As RJ Skerry-Ryan at Google pointed out, there is no such thing as raw bytes, and UTF-8 can be considered a hand-designed, highly biased tokenizer. This shouldn't have made it through internal peer-review.
References:
https://en.m.wikipedia.org/wiki/UTF-8
https://x.com/rustyryan/status/1750338422181609832?s=20