Lossless floating-point to string conversion, and back.

SquareFingers Created: ~10 years ago

I am looking for a way to convert a value stored in a floating-point variable, to a string. The conversion must be lossless, i.e. any two distinct floating-point values must map to distinct strings (so since 1+1e-10==1 is false, and STR$(1) and STR$(1+1e-10) both give the same string, this means STR$ doesn't do the job). And there must be a way to convert the string back to the precise floating-point value that generated it. Does anyone know of a way to do this?
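A minimal sketch of the requirement, in Python rather than SmileBASIC, assuming the floating-point values are IEEE 754 doubles (which is what Python's float is). The helper names to_text and from_text are made up for illustration. It shows a short fixed format collapsing two distinct values, while 17 significant digits keeps them apart and parses back to the same value.

```python
# Assumption: floats are IEEE 754 doubles, as Python's float type is.

def to_text(x: float) -> str:
    """Format with 17 significant digits, enough to identify any finite double."""
    return format(x, ".17g")

def from_text(s: str) -> float:
    """Parse decimal text back to the nearest double."""
    return float(s)

a = 1.0
b = 1.0 + 1e-10

# A six-decimal fixed format gives the same string for both values.
print(format(a, ".6f") == format(b, ".6f"))
# Seventeen significant digits keeps them apart...
print(to_text(a) != to_text(b))
# ...and parsing the text returns exactly the value that produced it.
print(from_text(to_text(b)) == b)
```

Whether SmileBASIC's FORMAT$ and VAL behave the same way is a separate question that this sketch does not settle.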

~10 years ago

calc84maniac #2

Does FORMAT$() work? I know there are precision parameters that can be passed.

~10 years ago

SquareFingers #3

FORMAT$("%20F",1+1E-10) gives the string " 1.000000", and increasing the number between "%" and "F" just adds more spaces to the left. This is the same as FORMAT$("%20F",1), even though 1+1E-10==1 is false.

EDIT: There is more to FORMAT$ than I thought, so this is worth investigating further. A#=1E-307:?A#==0 gives false, and A#=1E-307:A#=A#/10:?A#==0 gives true, so the smallest value that can be stored in a floating-point variable appears to be between 1E-307 and 1E-308. FORMAT$("%0.308F",A#) would appear to do the trick. The strings will be long, but I'm not too concerned about that. That's one half of the problem, thanks. Now I need a way to convert the string back to a floating-point value. VAL won't do the trick for very small values. Maybe if, for very small values, I trim zeroes from the left, then add "e-(however many zeroes I trimmed)" to the right of the string, then VAL will do the trick... I shall investigate.

EDIT: Oh no! Comparing against zero with == does not reliably test whether the stored value is zero. A#=1E-323:?A#==0 gives true, but then FORMAT$("%0.330F",A#) gives a string which is different from zero. 3E-324 appears to be about the smallest value for which FORMAT$ gives a string different from zero.

~10 years ago (Edited ~10 years ago by SquareFingers)

calc84maniac #4

Interesting, it seems that denormalized values are being treated as zero in calculations. Might be a quirk of the floating-point hardware? Edit: Might be flush-to-zero mode: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473c/CJAJBEAF.html
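A rough Python sketch of what flush-to-zero would look like, assuming IEEE 754 doubles. Python itself keeps denormal (subnormal) values, so the flushing is imitated by a hypothetical helper, ftz(); has_nonzero_digit() is likewise a made-up name. The numbers mirror the ones reported above: 1E-307 is a normal value, 1E-307/10 is denormal, and 1E-323 is denormal yet still has nonzero digits when printed with enough decimal places.

```python
import sys

# Assumption: IEEE 754 doubles. The smallest normal double is about 2.2250738585072014e-308.
SMALLEST_NORMAL = sys.float_info.min

def ftz(x: float) -> float:
    """Imitate flush-to-zero: treat any subnormal input as zero."""
    return 0.0 if 0.0 < abs(x) < SMALLEST_NORMAL else x

def has_nonzero_digit(x: float, places: int) -> bool:
    """True if fixed-point text with `places` decimals shows a nonzero digit."""
    return any(c in "123456789" for c in format(x, f".{places}f"))

print(ftz(1e-307) == 0.0)              # normal value: not flushed
print(ftz(1e-307 / 10) == 0.0)         # subnormal: compares equal to zero once flushed
print(ftz(1e-323) == 0.0)              # subnormal: flushed in the comparison
print(has_nonzero_digit(1e-323, 330))  # but the stored bits still print as nonzero

# The smallest subnormal is about 4.94e-324, so decimal text at or above
# half of that rounds up to it and anything well below rounds to zero.
print(float("3e-324") == 5e-324, float("2e-324") == 0.0)
```

This would be consistent with comparisons flushing denormals while the stored bit pattern is preserved, though the sketch cannot confirm that this is what the hardware is doing.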

~10 years ago (Edited ~10 years ago by calc84maniac)

NeatNit #5

If the floats are 64-bit and a character is 8 bits, then there MUST be a unique representation for each floating point value with just 8 characters. But I'm not sure if the floats are 32 or 64 (double), and also, text is Unicode which means that some characters are 8 bits and some are 16, making things more complicated than that. If we can get each byte or every two bytes from a float in an integer, then this is easy. But is there a way to do this?

By the way, do you need this code to run fast? Just curious. Depending on what you need this for, you might be better off reworking your code to use fixed-point numbers which can be stored as integers, making string conversion a lot easier with CHR$.

Edit: I'll tell you, SmileBoom made this sort of thing a lot simpler in Petit Computer, which had only fixed-point numbers. Every number was some integer multiplied by 1/4096, or in other words, there were 12 bits after the point. Even integers were saved in this format, e.g. 6 = 110.000000000000 binary.
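To illustrate the fixed-point idea, here is a small Python sketch, assuming only the 12 fractional bits described above. The function names are hypothetical and this is not Petit Computer's actual implementation. A value is stored as an integer count of 1/4096 steps, so converting to and from text is plain integer formatting.

```python
FRACTION_BITS = 12
SCALE = 1 << FRACTION_BITS  # 4096 steps per unit

def to_fixed(x: float) -> int:
    """Store a value as the nearest whole number of 1/4096 steps."""
    return round(x * SCALE)

def from_fixed(n: int) -> float:
    """Recover the value from its step count."""
    return n / SCALE

def fixed_to_binary(n: int) -> str:
    """Render a non-negative step count in binary with the point inserted."""
    bits = format(n, "b").rjust(FRACTION_BITS + 1, "0")
    return bits[:-FRACTION_BITS] + "." + bits[-FRACTION_BITS:]

six = to_fixed(6)
print(six)                          # integer that can be turned into text losslessly
print(fixed_to_binary(six))         # 110.000000000000
print(from_fixed(int(str(six))))    # back through text to 6.0
```

The trade-off is that only multiples of 1/4096 are representable, so this helps only when the data can be restricted to that grid.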

~10 years ago (Edited ~10 years ago by NeatNit)

snail_ #6

NeatNit wrote: If the floats are 64-bit and a character is 8 bits, then there MUST be a unique representation for each floating point value with just 8 characters. But I'm not sure if the floats are 32 or 64 (double), and also, text is Unicode which means that some characters are 8 bits and some are 16, making things more complicated than that. If we can get each byte or every two bytes from a float in an integer, then this is easy. But is there a way to do this? By the way, do you need this code to run fast? Just curious. Depending on what you need this for, you might be better off reworking your code to use fixed-point numbers which can be stored as integers, making string conversion a lot easier with CHR$. Edit: I'll tell you, SmileBoom made this sort of thing a lot simpler in Petit Computer, which had only fixed-point numbers. Every number was some integer multiplied by 1/4096, or in other words, there were 12 bits after the point. Even integers were saved in this format, e.g. 6 = 110.000000000000 binary

SB uses UTF-16 internally, so every character in a string is really 2 bytes. The UTF-8 conversion happens later when saving to a text file, but it's handled in such a way that you won't need to worry about it. So, 4 characters could store the binary representation of a real if encoded properly.
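As an illustration of the four-character idea, here is a Python sketch, assuming 64-bit IEEE 754 doubles and 16-bit code units. The function names are made up, and it relies on Python's struct module to reach the raw bytes, which is exactly the step that has no obvious SmileBASIC equivalent in this thread. Each character carries 16 of the 64 bits.

```python
import struct

def double_to_units(x: float) -> str:
    """Pack the 8 bytes of a double into four 16-bit code units."""
    units = struct.unpack(">4H", struct.pack(">d", x))
    return "".join(chr(u) for u in units)

def units_to_double(s: str) -> float:
    """Rebuild the double from its four 16-bit code units."""
    units = [ord(c) for c in s]
    return struct.unpack(">d", struct.pack(">4H", *units))[0]

for value in (0.0, 1.0 + 1e-10, 5e-324, 1.7976931348623157e308):
    text = double_to_units(value)
    print(len(text), units_to_double(text) == value)
```

Some of the resulting code units will be 0 or fall in the surrogate range, so whether such a string survives saving to a file depends on how the platform encodes it.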

~10 years ago

NeatNit #7

snail_ wrote: SB uses UTF-16 internally, so every character in a string is really 2 bytes. The UTF-8 conversion happens later when saving to a text file, but it's handled in such a way that you won't need to worry about it. So, 4 characters could store the binary representation of a real if encoded properly.

Thanks for the clarification. What does SB do when using CHR$ with a code that isn't in their font?

~10 years ago

snail_ #8

Character codes just wrap around. CHR$(65536) is the same as CHR$(0).

~10 years ago

SquareFingers #9

NeatNit wrote: By the way, do you need this code to run fast?

Not super fast. Right now I've got code which I think will work for my purposes:

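DEF ST$(A#)
 VAR A$,S$,E%
 'Fixed-point text with 324 decimal places
 A$=FORMAT$("%0.324F",A#)
 'Set a leading minus sign aside so the trimming below sees only digits
 S$=""
 IF (LEFT$(A$,1)=="-") THEN
  S$="-"
  A$=RIGHT$(A$,LEN(A$)-1)
 ENDIF
 'Drop trailing zeroes after the decimal point
 WHILE (RIGHT$(A$,1)=="0")
  A$=LEFT$(A$,LEN(A$)-1)
 WEND
 IF (RIGHT$(A$,1)==".") THEN
  'Whole number: move trailing zeroes into a positive exponent
  A$=LEFT$(A$,LEN(A$)-1)
  E%=0
  WHILE (RIGHT$(A$,1)=="0")
   INC E%
   A$=LEFT$(A$,LEN(A$)-1)
  WEND
  IF (A$=="") THEN A$="0" ELSE A$=A$+"E"+STR$(E%)
 ELSE
  'Fraction: move leading zeroes into a negative exponent
  E%=0
  WHILE (LEFT$(A$,3)=="0.0")
   DEC E%
   A$="0."+RIGHT$(A$,LEN(A$)-3)
  WEND
  A$=A$+"E"+STR$(E%)
 ENDIF
 RETURN S$+A$
END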

I think VAL will undo this operation to give exactly the same floating-point value I started with.
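The claim that 324 decimal places are enough can be checked outside SmileBASIC. Below is a Python sketch, assuming IEEE 754 doubles and correctly rounded formatting and parsing (which Python provides; SmileBASIC's FORMAT$ and VAL are not tested here). The reasoning: adjacent doubles are never closer together than about 4.94E-324, while rounding to 324 places moves the value by at most 0.5E-324, so the text stays nearer to the original than to any neighbour. The helper names are hypothetical.

```python
import random
import struct

PLACES = 324  # same precision as the "%0.324F" format string

def to_fixed_text(x: float) -> str:
    """Fixed-point decimal text with 324 places after the point."""
    return format(x, f".{PLACES}f")

def random_finite_double(rng: random.Random) -> float:
    """Draw a random 64-bit pattern, skipping NaN and infinities."""
    while True:
        bits = rng.getrandbits(64)
        x = struct.unpack(">d", bits.to_bytes(8, "big"))[0]
        if x == x and abs(x) != float("inf"):
            return x

rng = random.Random(0)
samples = [1 + 1e-10, 5e-324, -1e-320, 1e-310,
           2.2250738585072014e-308, 1.7976931348623157e308]
samples += [random_finite_double(rng) for _ in range(1000)]

# Every sample should parse back to exactly the value that produced the text.
print(all(float(to_fixed_text(x)) == x for x in samples))
```

The trimming and exponent steps in ST$ only rewrite the same decimal number in a shorter form, so they should not change which value the text denotes.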

NeatNit wrote: you might be better off reworking your code

The floating-point values are not part of my code: I am working on a tool which will receive data from external sources.

~10 years ago

NeatNit #10

snail_ wrote: Character codes just wrap around. CHR$(65536) is the same as CHR$(0).

Thanks, although this doesn't answer my question - maybe you misunderstood. Are there 65536 characters defined in the SmileBASIC font? If not, what happens if you insert a value that isn't defined? Will it be saved correctly, and if so, how will it get rendered? Edit: for example, I come from Israel. If I do CHR$(&H05D0), do I get א or what?

NeatNit wrote: you might be better off reworking your code
SquareFingers wrote: The floating-point values are not part of my code: I am working on a tool which will receive data from external sources.

That's why I said "Depending on what you need this for" ;)

~10 years ago (Edited ~10 years ago by NeatNit)

calc84maniac #11

You can use any of the 65536 character values, but ones that are undefined will all map to the same symbol when displayed, which is a box or something. I would say that the character encoding is actually closer to UCS-2 (because characters larger than 2 bytes are not handled).

~10 years ago

snail_ #12

snail_ wrote: Character codes just wrap around. CHR$(65536) is the same as CHR$(0).
NeatNit wrote: Thanks, although this doesn't answer my question - maybe you misunderstood. Are there 65536 characters defined in the SmileBASIC font? If not, what happens if you insert a value that isn't defined? Will it be saved correctly, and if so, how will it get rendered? Edit: for example, I come from Israel. If I do CHR$(&H05D0), do I get א or what?

The console font has around 4000 characters, the dialog font more. (I think around 5000?) As calc said, characters that aren't defined just show up as a stock "unknown" character, but the character code is still preserved. 05D0 is defined in neither font, unfortunately.

~10 years ago (Edited ~10 years ago by snail_)

NeatNit #13

snail_ wrote: The console font has around 4000 characters, the dialog font more. (I think around 5000?) As calc said, characters that aren't defined just show up as a stock "unknown" character, but the character code is still preserved. 05D0 is defined in neither font, unfortunately.

I see. Thanks! I've got a bunch of things I want to test with this, but they'll just have to wait for when it's finally released over here.

~10 years ago