Ticket #202 (closed defect: invalid)

Opened 2 years ago

Last modified 15 months ago

Undefined behavior in Coordinate::hashCode

Reported by: mloskot Owned by: mloskot
Priority: major Milestone: 3.2.0
Component: Core Version: svn-trunk
Severity: Significant Keywords: coordinate hash double
Cc:

Description (last modified by mloskot) (diff)

If ./configure fails to detect availability of 64-bit integer, it sets int64 typedef to long int (in file platform.h).

In this case, when in64 is 32-bit wide, undefined behavior occurs in Coordiante::hashDouble() function:

unsigned int Coordinate::hashCode(double d)
{
   int64 f = (int64)(d);
   return (int)(f^(f>>32)); // <--- UB
}

According to the standards C (section 6.5.7/3) and C++ (section 5.8/1):

The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

This error occur in f>>32, when sizeof(f) == 32.

Simple but not ideal fix could be:

#include <cstring> // std::memcpy

int Coordinate::hashDouble(double d)
{
   unsigned int arr[2] = { 0 };
   std::memcpy(arr, &d, sizeof(double));
   return (arr[0] ^ (((arr[1] >> 16) & 0x000000FF)
           | ((arr[1] >> 8) & 0x0000FF00)
           | ((arr[1] << 8) & 0x00FF0000)
           | ((arr[1] << 16) & 0xFF000000)));
}

Change History

Changed 2 years ago by bmharper

Mateusz - you mentioned that this violates aliasing rules:

unsigned int HashDouble?( double d ) {

unsigned int* i = (unsigned int*) &d; return i[0] i[1];

}

Can you explain further, or point me to the relevant C/++ docs?

Thanks, Ben

ps-- Your code references &x. Should this not be &d?

Changed 2 years ago by mloskot

  • description modified (diff)

Changed 2 years ago by mloskot

Ben,

The doc is C++ ISO/IEC 14882:2003 (6.10 lvalues and rvalues, paragraph 15). Also, chapter 6.5 and paragraph 7 of ISO 9899:1999 defines aliasing rules for C language.

Shortly, the rule is as simple as:

Pointers of different types (say int* and float*) can’t point to the same object

It was very well explained in presentation by Andrey Bokhanko and Alexander Isaev from Intel Labs ( PDF)

If standards are too heavy, then Wikipedia is handy, see  Type punning article and floating-point example based on your suggestion and commented this way:

this kind of type punning is more dangerous than most

Just to give an example of how dangerous this kind of hack can be, here is  my story of finding root of a very strange bug in GDAL that took me hours :-)

Changed 2 years ago by bmharper

Thanks Mateusz! I wasn't aware of these issues.

Changed 20 months ago by pramsey

  • milestone set to 3.1.0

Changed 20 months ago by pramsey

  • owner changed from geos-devel@… to mloskot

Changed 20 months ago by pramsey

It doesn't look like hashCode is even used, unless it's a magical method that gets called implicitly by some other operation? Envelope::hashCode uses the double hashcode, but the envelope method seems unused.

Changed 20 months ago by pramsey

  • milestone changed from 3.1.0 to 3.2.0

Defering to 3.2 or when someone finds a use case that exercises this.

Changed 15 months ago by strk

I think there'd be much more troubles if int64 isn't available. See precision/CommonBits.h or index/quadtree/DoubleBits.h, both used (haschCode isn't).

We might bail out on configure if 64bit integers aren't supported by the platform. Mat: do you actually have a system failing that ?

Changed 15 months ago by mloskot

No, I don't have that configuration any more.

Changed 15 months ago by strk

  • status changed from new to closed
  • resolution set to invalid

I've had ./configure print a WARNING about undefined behaviour when 64bit integer type isn't found. See r2558.

Note: See TracTickets for help on using tickets.