Bak, posted November 26, 2010

I've been debugging this one bug forever now (since the last update) and have isolated it to this debug segment in the code:

```cpp
char buf[256];

char* theta_c_ptr = (char*)&theta;            // theta is a double stack variable
char* speedFactor_ptr = (char*)&speedFactor;  // speedFactor is a double stack variable

double sinTheta = sin(theta);
char* sinTheta_ptr = (char*)&sinTheta;

double res = speedFactor * sinTheta;
char* res_ptr = (char*)&res;

snprintf(buf, sizeof(buf), ": Wormholes.cpp mid, "
    "speedFactor = %f(%08x %08x); "
    "theta = %f(%08x %08x); "
    "sinTheta = %f(%08x %08x); "
    "CHANGE Y SHOULD BE %f(%08x %08x) => %i\n",
    speedFactor, *((int*)speedFactor_ptr), *((int*)(speedFactor_ptr+4)), // print speedFactor in hex
    theta, *((int*)theta_c_ptr), *((int*)(theta_c_ptr+4)),               // print theta in hex
    sinTheta, *((int*)sinTheta_ptr), *((int*)(sinTheta_ptr+4)),          // print sinTheta in hex
    res, *((int*)res_ptr), *((int*)(res_ptr+4)),                         // print res in hex
    (int)(res));

printMessage(buf);
```

~99% of the time when I replay a particular trace I get this output:

```
: Wormholes.cpp mid, speedFactor = 666.666667(55555555 4084d555); theta = -2.498092(6b7a8560 c003fc17); sinTheta = -0.600000(33333333 bfe33333); CHANGE Y SHOULD BE -400.000000(ffffffff c078ffff) => -399
```

~1% of the time, particularly when another thread is accessing the function concurrently, I get this output:

```
: Wormholes.cpp mid, speedFactor = 666.666667(55555555 4084d555); theta = -2.498092(6b7a8560 c003fc17); sinTheta = -0.600000(33333333 bfe33333); CHANGE Y SHOULD BE -400.000000(00000000 c0790000) => -400
```

From this it would appear that "double res = speedFactor * sinTheta;" can produce two different values from identical inputs: 0xc078ffffffffffff (-399.99999999999994, which (int) truncates toward zero to -399) or 0xc079000000000000 (exactly -400.0). The dump prints the low 32-bit word of each double first, because x86 is little-endian. Getting two results from the same inputs seems wrong, and I'm hoping to find the root cause. Here are my ideas:

1. The floating-point rounding mode is set to a "don't care" mode where, based on some unknown factor, it may round in different ways.
2. There is cross-thread memory corruption that changes the value of sinTheta in a slight and usually unnoticeable way.
3. Multiplication is not thread-safe because some compiler/linker options are missing.
4. The floating-point flags are not properly saved/restored when Windows switches between threads.

All of these seem completely wrong to me. Any other ideas? Am I missing something simpler?

JoWie, posted November 26, 2010

Have you tried compiling with -ffloat-store (GCC)? (On x86, the x87 FPU registers have more precision than a double.) You can also make GCC more IEEE compliant using the -mieee or -mieee-with-inexact flag; the latter is very slow. http://gcc.gnu.org/onlinedocs/gcc/DEC-Alpha-Options.html

IEEE compliance will make sure you get the same results on every system.

(Edited November 26, 2010 by JoWie)

Samapico, posted November 26, 2010

Could you isolate the problem better by using non-pointer variables? That could tell you whether it is memory-related or floating-point-related.

Bak (author), posted November 26, 2010

Those options look like they're only for DEC Alpha architectures, but I'll try them anyway.

All the variables are on the stack (not pointers). The pointers are only there so I can print the hex value of the doubles (since you can't use %x on a double directly).
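
Dereferencing an int* that points into a double, as the debug code does, violates the C and C++ strict-aliasing rules, and printing the two halves low word first makes the pattern easy to misread. A minimal sketch of a well-defined alternative, assuming 64-bit IEEE 754 doubles, copies the bytes with memcpy and prints all 16 hex digits at once; double_bits and format_double are hypothetical helper names.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");

/* Copy the bytes of a double into a uint64_t; memcpy is the well-defined
   way to reinterpret an object's representation. */
uint64_t double_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Write "value(pattern)" with the full 64-bit pattern, most significant
   hex digit first, so it reads as one number instead of two swapped words. */
void format_double(char *buf, size_t size, double d) {
    snprintf(buf, size, "%f(%016" PRIx64 ")", d, double_bits(d));
}
```

With this helper, format_double(buf, sizeof buf, res) would print -400.000000(c079000000000000) for the 1% case, and the 99% case would show c078ffffffffffff.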

Samapico, posted November 26, 2010

> All the variables are on the stack (not pointers). The pointers are only there so I can print the hex value of the doubles (since you can't use %x on a double directly).

Oh... haven't really looked at the code, to be honest.

JoWie, posted November 26, 2010

Looks like the -mieee flag is indeed only for DEC Alpha. On http://gcc.gnu.org/wiki/FloatingPointMath I found this interesting:

> For legacy x86 processors without SSE2 support, and for m68080 processors, GCC is only able to fully comply with IEEE 754 semantics for the IEEE double extended (long double) type. Operations on IEEE double precision and IEEE single precision values are performed using double extended precision. In order to have these operations rounded correctly, GCC would have to save the FPU control and status words, enable rounding to 24 or 53 mantissa bits and then restore the FPU state. This would be far too expensive. The extra intermediate precision and range may cause flags not be set or traps not be raised. Also, for double precision, double rounding may affect the final results.
>
> Whether or not intermediate results are rounded to double precision or extended precision depends on optimizations being able to keep values in floating-point registers. The option -ffloat-store prevents GCC from storing floating-point results in registers. While this avoids the indeterministic behavior just described (at great cost), it does not prevent accuracy loss due to double rounding.

I am not sure whether the second paragraph also applies to SSE2. Anyway, SSE2 should be in any Intel/AMD CPU made later than 2002.

(Edited November 26, 2010 by JoWie)
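
The double-rounding effect the wiki mentions can be reproduced with plain C. The sketch below assumes IEEE 754 doubles, and for add_twice an 80-bit x87 long double (LDBL_MANT_DIG == 64, as with GCC on x86 and x86-64 Linux); on targets where long double is wider or equal to double, add_twice typically gives the same answer as add_once. Both function names are illustrative. The exact sum 1 + 2^-53 + 2^-64 lies just above the midpoint between 1.0 and the next double, so rounding it once gives 1 + 2^-52. Rounding it first to a 64-bit significand lands exactly on that midpoint (1 + 2^-53), and the second rounding, ties-to-even, then gives 1.0.

```c
#include <float.h>

/* One rounding: the exact sum is rounded straight to double.
   Only holds when FLT_EVAL_METHOD == 0 (e.g. SSE2 on x86-64); an x87 build
   would already round to extended precision first. */
double add_once(double a, double b) {
    volatile double x = a, y = b;   /* volatile prevents compile-time folding */
    volatile double sum = x + y;
    return sum;
}

/* Two roundings: first to long double, then to double.
   With an 80-bit x87 long double this reproduces double rounding. */
double add_twice(double a, double b) {
    volatile long double x = a, y = b;
    volatile long double wide = x + y;
    return (double)wide;
}
```

With b = 0x1.002p-53 (that is, 2^-53 + 2^-64), add_once(1.0, b) returns 0x1.0000000000001p0 while add_twice(1.0, b) returns 1.0 on such a target: the same sum, two different doubles, depending only on how many times it was rounded.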

Kilo, posted December 13, 2010

If you could stay online for more than 5 minutes, gosh... Are you using optimization?

Bak (author), posted December 20, 2010

arnk found this one: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

To summarize, this defect effectively states that:

assert( (x/y) == (x/y) )

may cause an assertion failure if compiled with optimization. While I understand why it happens, that doesn't mean it isn't a defect. This makes it impossible to turn on the optimizer with any code using floating point and still expect to get a correct result. Perhaps in some situations this is okay, but in general it is not.

Really unbelievable. I'm redoing Discretion to use fixed-point math instead of floating point, at least for physics...
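
A common per-variable workaround for the (x/y) == (x/y) problem, assuming GCC targeting the x87 FPU, is to force each result through a double in memory with volatile, which is what -ffloat-store does for every variable. On CPUs with SSE2, compiling with -msse2 -mfpmath=sse avoids carrying doubles in 80-bit registers altogether. The function name below is illustrative.

```c
/* Compare two evaluations of x / y after storing both as 64-bit doubles.
   Without the volatile stores, an x87 build may keep one quotient in an
   80-bit register and spill the other to memory, so the two can differ. */
int same_quotient(double x, double y) {
    volatile double q1 = x / y;
    volatile double q2 = x / y;
    return q1 == q2;
}
```

The cost is an extra store and load per variable, which is why the wiki calls -ffloat-store expensive when applied to a whole program.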

Bak (author), posted December 20, 2010

Okay, Discretion has been changed to use fixed-point for physics. I keep two tables around, one for sin and one for arctan. It works now... no more of this bug.
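
The thread doesn't give Discretion's table layout. As a minimal sketch of the fixed-point approach, assuming Q16.16 numbers (16 integer bits, 16 fractional bits) and angles measured in 32 steps per full circle, a sine lookup can be built from a quarter-wave table of precomputed integers, so every machine gets bit-identical results from integer operations alone. The table entries are round(sin(k * pi / 16) * 65536); the names, the format and the 32-step resolution are illustrative choices, and an arctan table could be handled the same way.

```c
#include <stdint.h>

typedef int32_t fixed;              /* Q16.16: value = raw / 65536 */
#define FX_ONE 65536

/* Quarter-wave sine table, round(sin(k * pi / 16) * 65536) for k = 0..8.
   Precomputed constants, so no floating point runs at game time. */
static const fixed SIN_QUARTER[9] = {
    0, 12785, 25080, 36410, 46341, 54491, 60547, 64277, 65536
};

/* Convert an integer to Q16.16; valid for -32768 <= n <= 32767. */
fixed fx_from_int(int32_t n) { return n * FX_ONE; }

/* Convert back to an integer, truncating toward zero like (int) does. */
int32_t fx_to_int(fixed x) { return x / FX_ONE; }

/* Multiply in 64 bits, then scale back. Integer division truncates toward
   zero on every conforming compiler, so the result is reproducible. */
fixed fx_mul(fixed a, fixed b) {
    return (fixed)(((int64_t)a * b) / FX_ONE);
}

/* Sine of a binary angle: 32 steps per turn, so 8 steps = 90 degrees.
   Negative angles wrap correctly because conversion to unsigned is modular. */
fixed fx_sin(int32_t angle) {
    unsigned a = (unsigned)angle & 31u;
    unsigned idx = a & 7u;              /* position within the quadrant */
    switch (a >> 3) {                   /* quadrant 0..3 */
    case 0:  return  SIN_QUARTER[idx];
    case 1:  return  SIN_QUARTER[8 - idx];
    case 2:  return -SIN_QUARTER[idx];
    default: return -SIN_QUARTER[8 - idx];
    }
}
```

For example, fx_to_int(fx_mul(fx_from_int(400), fx_sin(24))) is -400 (24 steps is 270 degrees) on every platform. A real game would likely want a finer table, such as 256 or more steps per turn, generated offline.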

Kilo, posted December 20, 2010

Can I trade in the $5 for working on getting the Linux build to work?