View Issue Details

IDProjectCategoryView StatusLast Update
0000627LDMud 3.3Portabilitypublic2018-01-29 22:57
Reporterwedsall Assigned Tozesstra  
PriorityurgentSeverityblockReproducibilityalways
Status resolvedResolutionfixed 
Platformx86_64 
Product Version3.3.718 
Target Version3.3.719Fixed in Version3.3.719 
Summary0000627: Floating point numbers on 64bit platforms may be saved incorrectly
DescriptionHello!
 Just wanted to report this, I'm not sure if its a bug or not.

I just moved from 32 to 64 bit.

Our players have an attacks_pending floating integer in their save files.

An example of:
 attacks_pending 3.499965555966e-01=ffff:59995fd0

Gets turned into the following after logging in then logging back out (saving):
 attacks_pending inf=ffff:7ffffff8

And this format is non-restorable.
 Error: Illegal format (value string) when restoring bin/player/_finger from
blab/bla/bla.o line 26..

TagsNo tags attached.

Relationships

related to 0000641 resolvedzesstra LDMud 3.5 save_object() should never write NaN or INF into savefiles. 

Activities

zesstra

2009-04-17 08:43

administrator   ~0001051

I will have a look this evening. Given the nature of the floating number implementation in the driver, this sounds possible.

zesstra

2009-04-17 15:37

administrator   ~0001052

Ok, I can confirm this issue with this number on my machine. The floating point number 3.499965555966e-01 seems to be regarded as 'infinity' when converted into our floating point format and back again:
Floatvar before restore: 0.349997
Floatvar after restore: inf.0
If that value is written to the savefile, it can't be restored. This is because 'numbers' (including floats) in restore_svalue() have to begin with a digit or -. The string 'inf=ffff:7ffffff8' the doesn't match any case in the switch and if regarded as illegal.

One issue here is: restore_object() should be able to restore 'infinity', if save_object() can write that.
The other, why 3.499965555966e-01 is mis-interpreted in the first place.

Currently we split a floating point number into mantissa and exponent, the mantissa being a p_int and the exponent a ph_int. On 32 bit platforms that is a 32bit value and a 16bit value. On 64 bit systems these are 64bit and 32bit wide.
save_svalue() stores only the lower 16 bits of the exponent (0xFFFF). The exponent for the number given above is -1 (0xFFFFFFFF), which is truncated to 0xFFFF in save_svalue(). Upon restore, the exponent is 65535 (0xFFFF) and so the resulting floating point number is 'infinity'.

Big warning: Until we solve this bug, assume the driver will scramble all floating point numbers with negative exponents! Don't use on production systems!

zesstra

2009-04-17 16:24

administrator   ~0001053

Ok, so several things...

1) I think, we need a test case for this (negative exponents). Any other suggestions, which the testcase should include?

2) Solution:
The quickest would be to correct save_svalue(). As Quick-Fix ok, but I think, this is an incomplete solution, because we don't know if other places make assumptions about the size of exponent and mantissa and because on LP64 systems, the FPN are suddenly 96 bit, not 48 without any benefit.
The most portable thing would be to use fixed-width integer like uint32_t and uint16_t for mantissa and exponent. (Which we can only do this if we check all users of floats in the code and are sure, that changes in svalue_s are acceptable.)
In addition, we might want to take 0000496 into account, if we anyway change the svalue type and add a further optional floating point format, using basically double and split only in save_svalue().

2009-04-17 18:11

 

0001-Fix-for-float-corruption-in-restore_svalue.patch (1,927 bytes)   
From bf4d47073a245bd0f181dfb86a15a53005b25daf Mon Sep 17 00:00:00 2001
From: zesstra <zesstra@zesstra.de>
Date: Sat, 18 Apr 2009 00:08:51 +0200
Subject: [PATCH] Fix for float corruption in restore_svalue().

We use 32 bit mantissas and 16 bit exponents for our floats. Ensure to
write and read fixed-width types in save_svalue() and restore_svalue().
Fixes #627, where restore_svalue() interpreted the 16bit integer for the
exponent as 32bit integer, resulting in data corruption if the exponent
was negative.

Signed-off-by: zesstra <zesstra@zesstra.de>
---
 src/object.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/object.c b/src/object.c
index b3a8516..59068f4 100644
--- a/src/object.c
+++ b/src/object.c
@@ -6310,9 +6310,9 @@ save_svalue (svalue_t *v, char delimiter, Bool writable)
         char *source, c;
 
         source = number_buffer;
-        (void)sprintf(source, "%.12e=%"PRIxPHINT":%"PRIxPINT
-                     , READ_DOUBLE(v), v->x.exponent & 0xffff
-                     , v->u.mantissa);
+        (void)sprintf(source, "%.12e=%"PRIx16":%"PRIx32
+                     , READ_DOUBLE(v), (int16_t)v->x.exponent
+                     , (int32_t)v->u.mantissa);
         c = *source++;
         do L_PUTC(c) while ( '\0' != (c = *source++) );
         L_PUTC(delimiter);
@@ -8386,8 +8386,12 @@ restore_svalue (svalue_t *svp, char **pt, char delimiter)
         if ( NULL != (cp = strchr(cp, '=')) &&  restore_ctx->restored_host == CURRENT_HOST)
         {
             cp++;
-            if (sscanf(cp, "%"SCNxPHINT":%"SCNxPINT, &svp->x.exponent, &svp->u.mantissa) != 2)
+            int32_t mantissa;
+            int16_t exponent;
+            if (sscanf(cp, "%"SCNx16":%"SCNx32, &exponent, &mantissa) != 2)
                 return 0;
+            svp->x.exponent=exponent;
+            svp->u.mantissa=mantissa;
         }
         else
         {
-- 
1.6.1

zesstra

2009-04-17 18:27

administrator   ~0001055

I added a quick fix for the immediate issue, which fixes restore_svalue() to interpret the value for the exponent as 16bit wide integer (int16_t). Additionally, the mantissa is also explicitly read as 32bit wide integer (int32_t) and also save_svalue() explicitly writes int16_t and int32_t.

This solves the first part of the problem (data corruption). The second part (the unability to restore 'infinity') is not solved, but the data corruption is anyway the more severe.

wedsall

2009-04-17 19:56

reporter   ~0001056

thanks once again for the quick response Zesstra.

zesstra

2009-04-19 14:04

administrator   ~0001058

I prepared two other patches for the floating point implementation. They don't deal with the issue here, but for the sake of simplicity I will attach them here for discussion:
The first one changes the READ_DOUBLE, SPLIT_DOUBLE, STORE_DOUBLE defines in svalue.h to static functions.
The second one changes interpret.c in F_FLOAT to use LOAD_INT32 and LOAD_SHORT (an old TODO), which removes any conditional compilation from that place.

Comment: the compiler uses ins_long() to store the mantissa which sounds like a waste of 4 byte on LP64 platforms. But despite its name ins_long() stores a 32bit wide integer with PUT_INT32, so that is OK. The downside is: there is no possibility to store a 64 bit value (p_int on LP64) in the bytecode... This may be something we need to repair very soon - I planned to do that during some bytecode cleanup I started for 3.5.x, but it may be needed in 3.3.x as well.

2009-04-19 14:04

 

0002-Changed-Defines-for-FLOAT_FORMAT_0-to-functions.patch (2,112 bytes)   
From b2409c1685880f1d21bbf0fcbc3d983c8a2c3ebd Mon Sep 17 00:00:00 2001
From: zesstra <zesstra@zesstra.de>
Date: Sat, 18 Apr 2009 01:12:45 +0200
Subject: [PATCH 2/3] Changed Defines for FLOAT_FORMAT_0 to functions.

The defines READ_DOUBLE, SPLIT_DOUBLe and STORE_DOUBLE were exchanged
by static inline functions.
STORE_DOUBLE_USED is not needed anymore and defined empty.
Additionally, SPLIT_DOUBLE returns now the mantissa as p_int instead of long.

Signed-off-by: zesstra <zesstra@zesstra.de>
---
 src/svalue.h |   25 ++++++++++++++-----------
 1 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/svalue.h b/src/svalue.h
index 7ade63b..4797e83 100644
--- a/src/svalue.h
+++ b/src/svalue.h
@@ -408,21 +408,24 @@ double READ_DOUBLE(struct svalue *svalue_pnt)
 
 #define FLOAT_FORMAT_0
 
-#define READ_DOUBLE(svalue_pnt) ( ldexp( (double)((svalue_pnt)->u.mantissa) , \
-                (svalue_pnt)->x.exponent-31 ) )
-
+static INLINE double READ_DOUBLE(svalue_t *sv) {
+    return ldexp( (double)(sv->u.mantissa), sv->x.exponent - 31 );
+}
 /* if your machine doesn't use the exponent to designate powers of two,
-   the use of ldexp in SPLIT_DOUBLE won't work; you'll have to mulitply
+   the use of ldexp in SPLIT_DOUBLE won't work; you'll have to multiply
    with 32768. in this case */
 
-#define SPLIT_DOUBLE(double_value, int_pnt) \
-( (long)ldexp( frexp( (double_value), (int_pnt) ), 31) )
+static INLINE p_int SPLIT_DOUBLE(double doublevalue, int *int_p) {
+    return (p_int)ldexp( frexp( doublevalue, int_p ), 31);
+}
 
-#define STORE_DOUBLE_USED int __store_double_int_;
-#define STORE_DOUBLE(dest, double_value) (\
-((dest)->u.mantissa = SPLIT_DOUBLE((double_value), &__store_double_int_)),\
- (dest)->x.exponent = (short)__store_double_int_\
-)
+// STORE_DOUBLE_USED is not needed anymore and defined empty.
+#define STORE_DOUBLE_USED
+static INLINE void STORE_DOUBLE(svalue_t *dest, double doublevalue) {
+    int exponent;
+    dest->u.mantissa = SPLIT_DOUBLE(doublevalue, &exponent);
+    dest->x.exponent = (ph_int)exponent;
+}
 
 #endif /* ifndef STORE_DOUBLE */
 
-- 
1.6.1

2009-04-19 14:04

 

0003-Use-LOAD_-macros-in-F_FLOAT.patch (2,101 bytes)   
From 83efc3844b800e9d5a29a46dbf9ad565405929d1 Mon Sep 17 00:00:00 2001
From: zesstra <zesstra@zesstra.de>
Date: Sun, 19 Apr 2009 19:44:09 +0200
Subject: [PATCH 3/3] Use LOAD_ macros in F_FLOAT.

The compiler stores float literals with PUT_INT32 and PUT_SHORT. Reading
them from the bytecode was done by memcpy(). Some conditional compilation
was required to read the correct datatypes. This patch implements an old
TODO and makes F_FLOAT use the LOAD_INT32 and LOAD_SHORT macros from
bytecode.h.

Signed-off-by: zesstra <zesstra@zesstra.de>
---
 src/interpret.c |   22 +++++-----------------
 1 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/src/interpret.c b/src/interpret.c
index ea2f5c7..8722d64 100644
--- a/src/interpret.c
+++ b/src/interpret.c
@@ -8525,31 +8525,19 @@ again:
          * TODO: This code makes heavy assumptions about data sizes and
          * TODO:: layout. E.g. there need not be a 16-Bit integral type
          * TODO:: available.
-         * TODO: It should be rewritten to use the LOAD_ macros (but
-         * TODO:: then the compiler needs to use them, too.
+         * TODO: short doesn't to be a 16 bit wide type (which the float format
+         * TODO:: expects). LOAD_INT16 would be nice (change in compiler as well).
          */
 
-#if SIZEOF_CHAR_P == 4
-        sp++;
-        sp->type = T_FLOAT;
-
-        memcpy((char *)&sp->u.mantissa, pc, sizeof(sp->u.mantissa));
-        memcpy((char *)&sp->x.exponent, pc + sizeof(sp->u.mantissa), sizeof(sp->x.exponent));
-        pc += sizeof(sp->u.mantissa)+sizeof(sp->x.exponent);
-#else
         int32 mantissa;
-        /* TODO: int16 */ short exponent;
+        short exponent;
 
         sp++;
         sp->type = T_FLOAT;
-
-        memcpy((char *)&mantissa, pc, sizeof(mantissa));
+        LOAD_INT32(mantissa, pc);
+        LOAD_SHORT(exponent, pc);
         sp->u.mantissa = mantissa;
-
-        memcpy((char *)&exponent, pc + sizeof(mantissa), sizeof(exponent));
         sp->x.exponent = exponent;
-        pc += sizeof(mantissa)+sizeof(exponent);
-#endif
         break;
     }
 
-- 
1.6.1

zesstra

2009-05-04 15:04

administrator   ~0001076

I applied the patch and a testcase in r2558.

Note:
There is one issue left here. If a float is actually too large to be represented and it is saved in save_object()/save_value(), a restore with restore_object()/restore_value() will fail (not only for the float, but for the complete savefile). A similar issue may be "NaN".
I think, we have to address this as well, although it is not really urgent.
restore_svalue() expects numbers to start with [0-9,-], but sprintf("%.12e") with my libc will print infinity as 'inf' (and NaN probably as 'nan'/'NaN'). Does anybody know, if the exact format is standardized in C89 or C99?

2009-05-07 03:38

 

0005-Changed-restore_svalue-to-recognize-inf-and-nan.patch (2,343 bytes)   
From f0750f4f3556b9634930fe66ed6408b84812dddc Mon Sep 17 00:00:00 2001
From: zesstra <zesstra@zesstra.de>
Date: Wed, 6 May 2009 21:52:14 +0200
Subject: [PATCH 5/5] Changed restore_svalue() to recognize "inf" and "nan".

C99 specifies to represent 'infinity' and 'NaN' by "[-]inf" or "[-]infinity"
and a string starting with "nan". If restore_svalue() stumbles upon
such data (e.g. "inf=ffff:59995fd0") it tries to restore exponent and
mantissa and ignores "inf"/"infinity"/"nan".

Signed-off-by: zesstra <zesstra@zesstra.de>
---
 src/object.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/object.c b/src/object.c
index 59068f4..5420173 100644
--- a/src/object.c
+++ b/src/object.c
@@ -8359,6 +8359,7 @@ restore_svalue (svalue_t *svp, char **pt, char delimiter)
     case '-':  /* A number */
     case '0': case '1': case '2': case '3': case '4':
     case '5': case '6': case '7': case '8': case '9':
+    case 'i': case 'n':    // for infinity and NaN (C99).
       {
         char c, *numstart = cp;
         int nega = 0;
@@ -8371,7 +8372,8 @@ restore_svalue (svalue_t *svp, char **pt, char delimiter)
         }
 
         while(lexdigit(c = *cp++)) l = (((l << 2) + l) << 1) + (c - '0');
-        if (c != '.')
+        if (c != '.'
+            && c != 'i' && c != 'n')    // for infinity/NaN in floats
         {
             put_number(svp, nega ? -l : l);
             *pt = cp;
@@ -8381,9 +8383,15 @@ restore_svalue (svalue_t *svp, char **pt, char delimiter)
         /* If a float was written by the same host type as we are using,
          * restore the internal representation.
          * Otherwise, parse the float normally.
+         * C99 specifies to write "[-]infinity" or "[-]inf" for infinity and
+         * a string starting with 'nan' for NaN. We should restore such values
+         * the most meaningful way possible (use mantissa + exponent in any case).
          */
         svp->type = T_FLOAT;
-        if ( NULL != (cp = strchr(cp, '=')) &&  restore_ctx->restored_host == CURRENT_HOST)
+        if ( NULL != (cp = strchr(cp, '=')) &&  
+            (restore_ctx->restored_host == CURRENT_HOST
+             || strstr(numstart, "inf") != NULL
+             || strstr(numstart, "nan") != NULL) )
         {
             cp++;
             int32_t mantissa;
-- 
1.6.1

zesstra

2009-05-07 03:42

administrator   ~0001087

C99 specifies that sprintf uses "[-]inf" or "[-]infinity" for infinity and a string starting with "nan" for NaN when printing floating point numbers.
I attached a small patch making restore_svalue() to restore such numbers by using the the separately stored mantissa and exponent regardless of CURRENT_HOST. But I am not sure if there is a better strategy. Comments are welcome. ;-)

zesstra

2009-05-11 16:31

administrator   ~0001102

Just noticed, that my patch misses to take quoted floats into account.
BTW: Would be easier, if we use a new savefile format with a special first character as a tag for floats. The downside is, that the savefiles gets a little bit bigger as less readable for humans.

zesstra

2009-05-26 05:10

administrator   ~0001159

I split the part of restoring infinity and NaN into bug 0000640 for handling in 3.3.720, so that this only deals with the incorrect storing of the floats. And this part is fixed. :-)

zesstra

2010-11-16 10:42

administrator   ~0001911

Fix committed to master branch.

zesstra

2018-01-29 19:59

administrator   ~0002341

Fix committed in revision 6b46c043298f9a4e5bcd1040e0bbd4d58084be9f to master branch (see changeset 1893 for details). Thank you for reporting!

zesstra

2018-01-29 22:57

administrator   ~0002392

Fix committed in revision 6b46c043298f9a4e5bcd1040e0bbd4d58084be9f to master branch (see changeset 3230 for details). Thank you for reporting!

Issue History

Date Modified Username Field Change
2009-04-17 08:15 wedsall New Issue
2009-04-17 08:43 zesstra Note Added: 0001051
2009-04-17 08:43 zesstra Status new => acknowledged
2009-04-17 15:37 zesstra Note Added: 0001052
2009-04-17 15:37 zesstra Status acknowledged => confirmed
2009-04-17 15:38 zesstra Priority normal => urgent
2009-04-17 15:38 zesstra Severity major => block
2009-04-17 15:38 zesstra Target Version => 3.3.719
2009-04-17 15:41 zesstra Reproducibility random => always
2009-04-17 15:41 zesstra Category Runtime => Portability
2009-04-17 15:41 zesstra Platform => x86_64
2009-04-17 15:41 zesstra Summary possible float bug with save_object => Floating point numbers on 64bit platforms may be saved incorrectly
2009-04-17 16:24 zesstra Note Added: 0001053
2009-04-17 18:11 zesstra File Added: 0001-Fix-for-float-corruption-in-restore_svalue.patch
2009-04-17 18:13 zesstra Status confirmed => assigned
2009-04-17 18:13 zesstra Assigned To => zesstra
2009-04-17 18:27 zesstra Note Added: 0001055
2009-04-17 19:56 wedsall Note Added: 0001056
2009-04-19 14:04 zesstra Note Added: 0001058
2009-04-19 14:04 zesstra File Added: 0002-Changed-Defines-for-FLOAT_FORMAT_0-to-functions.patch
2009-04-19 14:04 zesstra File Added: 0003-Use-LOAD_-macros-in-F_FLOAT.patch
2009-05-04 15:04 zesstra Note Added: 0001076
2009-05-05 15:50 zesstra Project LDMud => LDMud 3.3
2009-05-07 03:38 zesstra File Added: 0005-Changed-restore_svalue-to-recognize-inf-and-nan.patch
2009-05-07 03:42 zesstra Note Added: 0001087
2009-05-11 16:31 zesstra Note Added: 0001102
2009-05-26 05:08 zesstra Relationship added related to 0000641
2009-05-26 05:10 zesstra Note Added: 0001159
2009-05-26 05:10 zesstra Status assigned => resolved
2009-05-26 05:10 zesstra Fixed in Version => 3.3.719
2009-05-26 05:10 zesstra Resolution open => fixed
2010-11-16 10:42 zesstra Source_changeset_attached => ldmud.git master 6b46c043
2010-11-16 10:42 zesstra Note Added: 0001911
2018-01-29 19:59 zesstra Source_changeset_attached => ldmud.git master 6b46c043
2018-01-29 19:59 zesstra Note Added: 0002341
2018-01-29 22:57 zesstra Source_changeset_attached => ldmud.git master 6b46c043
2018-01-29 22:57 zesstra Note Added: 0002392