Lossless floating-point to string conversion, and back.

SquareFingers:
I am looking for a way to convert a value stored in a floating-point variable, to a string. The conversion must be lossless, i.e. any two distinct floating-point values must map to distinct strings (so since 1+1e-10==1 is false, and STR$(1) and STR$(1+1e-10) both give the same string, this means STR$ doesn't do the job). And there must be a way to convert the string back to the precise floating-point value that generated it. Does anyone know of a way to do this?

Does FORMAT$() work? I know there are precision parameters that can be passed.

FORMAT$("%20F",1+1E-10) gives the string " 1.000000", and increasing the number between "%" and "F" just adds more spaces to the left. This is the same string as FORMAT$("%20F",1), even though 1+1E-10==1 is false.
EDIT: There is more to FORMAT$ than I thought, so this is worth investigating further. A#=1E-307:?A#==0 gives false, and A#=1E-307:A#=A#/10:?A#==0 gives true, so the smallest value that can be stored in a floating-point variable appears to be between 1E-308 and 1E-307. FORMAT$("%0.308F",A#) would appear to do the trick. The strings will be long, but I'm not that concerned about that. That's one half of the problem, thanks. Now I need a way to convert the string back to a floating-point value. VAL won't do the trick for very small values. Maybe if, for very small values, I trim zeroes from the left, then add "e-(however many zeroes I trimmed)" to the right of the string, then VAL will do the trick... I shall investigate.
EDIT: Oh no! Comparing with zero using == does not actually test whether the value is zero. A#=1E-323:?A#==0 gives true, but then FORMAT$("%0.330F",A#) gives a string which is different from zero. 3E-324 appears to be about the smallest value for which FORMAT$ gives a string different from zero.
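For comparison, here is a minimal Python sketch (not SmileBASIC) of the problem described above. It assumes the values are IEEE 754 doubles, which the thread does not confirm. Six fixed decimals collide for 1 and 1+1E-10, while 17 significant digits are enough to tell any two finite doubles apart and to read them back exactly. The helper names fixed_six and seventeen_digits are made up for this illustration, and whether FORMAT$ offers a significant-digit specifier is not stated in the thread.

```python
def fixed_six(x: float) -> str:
    """Fixed-point text with six decimals, like the "%20F" output quoted above."""
    return "%20f" % x


def seventeen_digits(x: float) -> str:
    """Text with up to 17 significant digits (assumes IEEE 754 doubles)."""
    return "%.17g" % x


a = 1.0
b = 1.0 + 1e-10

print(a == b)                                    # the two values differ
print(fixed_six(a) == fixed_six(b))              # but six decimals hide the difference
print(seventeen_digits(a), seventeen_digits(b))  # 17 digits keep them apart
print(float(seventeen_digits(b)) == b)           # and the text parses back exactly
```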

Interesting, it seems that denormalized values are being treated as zero in calculations. Might be a quirk of the floating-point hardware? Edit: Might be flush-to-zero mode: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473c/CJAJBEAF.html
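The flush-to-zero guess can be modelled in a few lines of Python (not SmileBASIC). This is only a sketch of the hypothesis: it assumes IEEE 754 doubles, and the helper names flush_to_zero and ftz_equal are invented here. Python itself keeps subnormal values, so the flushing is simulated by hand.

```python
import sys

# Smallest positive normal double; anything nonzero below it is subnormal.
SMALLEST_NORMAL = sys.float_info.min
# Smallest positive subnormal double, 2**-1074, usually written 5e-324.
SMALLEST_SUBNORMAL = 5e-324


def flush_to_zero(x: float) -> float:
    """Return 0.0 for subnormal inputs, as a flush-to-zero unit would."""
    if 0.0 < abs(x) < SMALLEST_NORMAL:
        return 0.0
    return x


def ftz_equal(a: float, b: float) -> bool:
    """Compare two values after flushing subnormals to zero."""
    return flush_to_zero(a) == flush_to_zero(b)


print(SMALLEST_NORMAL)               # lies between 1E-308 and 1E-307
print(ftz_equal(1e-307, 0.0))        # a normal value is still nonzero
print(ftz_equal(1e-307 / 10, 0.0))   # a subnormal value compares equal to zero
print(1e-323 == 0.0)                 # without flushing it is not zero
print(float("3e-324") == SMALLEST_SUBNORMAL, float("2e-324") == 0.0)
```

Under this model the observations in the thread line up: 1E-307 is normal, 1E-307/10 is subnormal and compares as zero, and a decimal literal rounds to a nonzero value only from roughly 2.5E-324 upward.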

If the floats are 64-bit and a character is 8 bits, then there MUST be a unique representation for each floating point value with just 8 characters. But I'm not sure if the floats are 32 or 64 (double), and also, text is Unicode which means that some characters are 8 bits and some are 16, making things more complicated than that. If we can get each byte or every two bytes from a float in an integer, then this is easy. But is there a way to do this? By the way, do you need this code to run fast? Just curious. Depending on what you need this for, you might be better off reworking your code to use fixed-point numbers which can be stored as integers, making string conversion a lot easier with CHR$. Edit: I'll tell you, SmileBoom made this sort of thing a lot simpler in Petit Computer, which had only fixed-point numbers. Every number was some integer multiplied by 1/4096, or in other words, there were 12 bits after the point. Even integers were saved in this format, e.g. 6 = 110.000000000000 binary
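The fixed-point format described above (an integer multiplied by 1/4096, so 12 bits after the point) can be sketched in Python as follows. This only illustrates the arithmetic; the function names are made up, and it says nothing about how Petit Computer stored the values internally beyond what the post states.

```python
FRACTION_BITS = 12
SCALE = 1 << FRACTION_BITS  # 4096


def to_fixed(x: float) -> int:
    """Nearest integer count of 1/4096 steps for x."""
    return round(x * SCALE)


def from_fixed(n: int) -> float:
    """Value represented by n steps of 1/4096."""
    return n / SCALE


def fixed_binary(n: int) -> str:
    """Binary text of a non-negative fixed-point value, 12 bits after the point."""
    whole, fraction = divmod(n, SCALE)
    return f"{whole:b}.{fraction:012b}"


print(to_fixed(6))                # 24576
print(fixed_binary(to_fixed(6)))  # 110.000000000000
print(from_fixed(to_fixed(0.5)))  # 0.5
```

Because every value is a plain integer underneath, converting it to and from text is exact, which is the point being made about string conversion.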

SB uses UTF-16 internally, so every character in a string is really 2 bytes. The UTF-8 conversion happens later when saving to a text file, but it's handled in such a way that you won't need to worry about it. So, 4 characters could store the binary representation of a real if encoded properly.
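As an illustration of the "4 characters" idea, here is a Python sketch (not SmileBASIC) that splits the 8 bytes of a double into four 16-bit code units and back. It assumes the reals are IEEE 754 doubles, and it relies on Python's struct module to reach the raw bytes; the thread does not say whether SmileBASIC offers any way to do that. The function names are hypothetical.

```python
import struct


def float_to_units(x: float) -> str:
    """Pack a double into a 4-character string, one 16-bit code unit per character."""
    units = struct.unpack("<4H", struct.pack("<d", x))
    return "".join(chr(u) for u in units)


def units_to_float(s: str) -> float:
    """Rebuild the double from the four code units produced by float_to_units."""
    units = [ord(c) for c in s]
    return struct.unpack("<d", struct.pack("<4H", *units))[0]


packed = float_to_units(1.0 + 1e-10)
print(len(packed))                             # 4
print([hex(ord(c)) for c in packed])
print(units_to_float(packed) == 1.0 + 1e-10)   # True
```

The resulting strings are not readable text: many of the code units are control characters or values with no glyph, so this only works where every 16-bit value is stored unchanged.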

Thanks for the clarification. What does SB do when using CHR$ with a code that isn't in their font?

Character codes just wrap around. CHR$(65536) is the same as CHR$(0).

By the way, do you need this code to run fast?
Not super fast. Right now I've got code which I think will work for my purposes:
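' Convert A# to a decimal string with an exponent, meant to be read back with VAL
DEF ST$(A#)
 VAR A$,E%
 ' 324 decimal places: enough to show values down to about 3E-324
 A$=FORMAT$("%0.324F",A#)
 ' Drop trailing zeroes from the fraction
 WHILE (RIGHT$(A$,1)=="0")
  A$=LEFT$(A$,LEN(A$)-1)
 WEND
 IF (RIGHT$(A$,1)==".") THEN
  ' Whole number: drop the point, then move trailing zeroes into the exponent
  A$=LEFT$(A$,LEN(A$)-1)
  E%=0
  WHILE (RIGHT$(A$,1)=="0")
   INC E%
   A$=LEFT$(A$,LEN(A$)-1)
  WEND
  ' Nothing left means the value was zero
  IF (A$=="") THEN A$="0" ELSE A$=A$+"E"+STR$(E%)
 ELSE
  ' Has a fraction: move each zero after "0." into a negative exponent
  E%=0
  WHILE (LEFT$(A$,3)=="0.0")
   DEC E%
   A$="0."+RIGHT$(A$,LEN(A$)-3)
  WEND
  A$=A$+"E"+STR$(E%)
 ENDIF
 RETURN A$
END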
I think VAL will undo this operation to give exactly the same floating-point value I started with.
you might be better off reworking your code
The floating-point values are not part of my code: I am working on a tool which will receive data from external sources.
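One way to check the round-trip claim for ST$ is to port it to Python and test it there. The port below is a sketch under assumptions: it takes FORMAT$("%0.324F") to behave like Python's "%.324f" and VAL to behave like float(), neither of which the thread confirms. It only covers finite values, and negative zero is not handled (this port turns it into "-E1").

```python
def st(a: float) -> str:
    """Python port of the ST$ routine above (finite values only)."""
    s = "%.324f" % a
    # Drop trailing zeroes from the fraction.
    while s.endswith("0"):
        s = s[:-1]
    if s.endswith("."):
        # Whole number: drop the point, move trailing zeroes into the exponent.
        s = s[:-1]
        e = 0
        while s.endswith("0"):
            e += 1
            s = s[:-1]
        return "0" if s == "" else s + "E" + str(e)
    # Has a fraction: move each zero after "0." into a negative exponent.
    e = 0
    while s.startswith("0.0"):
        e -= 1
        s = "0." + s[3:]
    return s + "E" + str(e)


for value in [1.5, 100.0, 0.0, 1.0 + 1e-10, 1e-307, 5e-324, -2.5]:
    text = st(value)
    print(len(text), float(text) == value)
```

In this port every value in the list parses back to exactly the number it came from, which supports the claim as long as the two assumptions about FORMAT$ and VAL hold.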

Character codes just wrap around. CHR$(65536) is the same as CHR$(0).
Thanks, although this doesn't answer my question - maybe you misunderstood. Are there 65536 characters defined in the SmileBASIC font? If not, what happens if you insert a value that isn't defined? Will it be saved correctly, and if so, how will it get rendered? Edit: for example, I come from Israel. If I do CHR$(&H05D0), do I get א or what?
you might be better off reworking your code
The floating-point values are not part of my code: I am working on a tool which will receive data from external sources.
That's why I said "Depending on what you need this for" ;)

You can use any of the 65536 character values, but ones that are undefined will all map to the same symbol when displayed, which is a box or something. I would say that the character encoding is actually closer to UCS-2 (because characters larger than 2 bytes are not handled).

The console font has around 4000 characters, the dialog font more. (I think around 5000?) As calc said, characters that aren't defined just show up as a stock "unknown" character, but the character code is still preserved. 05D0 is defined in neither font, unfortunately.

I see. Thanks! I've got a bunch of things I want to test with this, but they'll just have to wait for when it's finally released over here.