If you hack a file with invalid characters into SB, it only shows the part of the file up to the first invalid character.
Also, SmileBASIC will FREEZE if you try to PRGEDIT a line that comes after that character.
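To show what "cut off at the invalid character" looks like, here is a rough Python sketch of a loader that keeps only the text before the first invalid UTF-8 byte. This is just an illustration of the symptom; I have no idea how SmileBASIC actually reads files internally, and `decode_until_invalid` and the sample bytes are made up for the example.

```python
def decode_until_invalid(data: bytes) -> str:
    """Return the text that decodes cleanly before the first invalid UTF-8 byte."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded,
        # so everything before it is valid UTF-8.
        return data[:err.start].decode("utf-8")


# Hypothetical file contents: 0xFF can never appear in valid UTF-8.
sample = b"PRINT 1\n\xffPRINT 2\n"
print(repr(decode_until_invalid(sample)))  # 'PRINT 1\n'
```

Everything after the bad byte is lost, even though the rest of the file is perfectly fine text.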
L4CNN3Z4
Crossed out 4 is still regular 4 ;(
Files with invalid UTF-8 data are cut off
12Me21 Created:
What is an 'invalid character'?
UTF-16 is still supposed to handle all Unicode characters.
Q: What is UTF-16? A: UTF-16 uses a single 16-bit code unit to encode the most common 63K characters, and a pair of 16-bit code units, called surrogates, to encode the 1M less commonly used characters in Unicode. Originally, Unicode was designed as a pure 16-bit encoding, aimed at representing all modern scripts. (Ancient scripts were to be represented with private-use characters.) Over time, and especially after the addition of over 14,500 composite characters for compatibility with legacy sets, it became clear that 16-bits were not sufficient for the user community. Out of this arose UTF-16.

Basically, Unicode will never support character codes higher than 1,114,111 (U+10FFFF), and all of the Unicode encodings (UTF-8, UTF-16, and UTF-32) are able to represent ALL Unicode characters.
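Here is a small Python sketch of the surrogate pair arithmetic, in case it helps. `to_surrogate_pair` is a name I made up for the example. It only handles code points above U+FFFF, since everything below that fits in a single 16-bit unit.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point in U+10000..U+10FFFF into two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go in the low surrogate
    return high, low


highest = 0x10FFFF  # 1,114,111, the largest Unicode code point
print([hex(unit) for unit in to_surrogate_pair(highest)])  # ['0xdbff', '0xdfff']

# All three encodings round-trip the highest code point.
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    encoded = chr(highest).encode(encoding)
    assert encoded.decode(encoding) == chr(highest)
    print(encoding, len(encoded), "bytes")
```

Each surrogate carries 10 bits, so a pair covers 2^20 = 1,048,576 extra code points on top of the 16-bit range, which is where the 1,114,111 limit comes from.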
Well, they don't really have a choice. The goal is to have a single code for every language in the world, and some languages just have completely identical-looking characters with completely different meanings.
I don't really know about it being messy and disorganized. You're probably right about that, and fonts pretty much never try to support more than two or three languages (one of them nearly always being English). But there's really no better alternative. I think it's amazing that we have one single code that can be shared with anyone in the world and any software without losing any data. Before Unicode, everyone had a completely different system.