Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#2521 closed defect (fixed)

XPlane causes undefined behavior with GCC 4.3.x

Reported by: Mateusz Łoskot Owned by: warmerdam
Priority: normal Milestone:
Component: OGR_SF Version: svn-trunk
Severity: major Keywords: xplane hash gcc o2 aliasing
Cc:

Description (last modified by Mateusz Łoskot)

Recently, Even and I, we have noticed that ogr_xplane.py test fails in GDAL built using GCC 4.3. The problem was very mysterious. Everything works well with pre-4.3 versions of GCC.

TEST: ogr_xplane_awy_dat ... fail
    wrong number of features for layer AirwayIntersection : 20. 14 were expected 

Today, we spent almost all day trying to debug it and we got very close to find what's wrong. We found out that the problem is near the hash generation calls (cpl_hash_set.cpp). We suspected there is a bug in GCC 4.3 or we have signed integer overflow that causes undefined behavior somewhere in the code, etc. Fortunately to GDAL, but not to our investigation, nothing like that seemed to be true.

The success was that we managed to isolate a small test program (see attached hashdouble_aliasing_ub.cpp) and use it to reproduce the problem. However, ome small puzzles were still missing to understand <em>why</em>.

I decided to ask GCC folks for help in explaining this issue. Here is my post on gcc-help with detailed description of the problem: Strange -O2 optimization issue. Here is first response we got in which Brian Dessent reveals what's going wrong in our code.

To quote it shortly, there is a function in file ogr_xplane_awy_reader.cpp:292:

static unsigned long OGRXPlaneAirwayHashDouble(double* pdfVal)
{
   unsigned int* pnValue = (unsigned int*)pdfVal;
   return pnValue[0] ^ pnValue[1];
}

with cast and

this violates the ISO C aliasing rules

I've experimented and did build GDAL with aggressive options

-O2 -fstrict-aliasing -Wstrict-aliasing=1

and GCC 4.3 output a small flood of warnings for lines possibly breaking strict aliasing rules. What's most important here, it does warn about the suspicious cast in

g++ -O2 -fstrict-aliasing -Wstrict-aliasing=1 -fPIC  -Wall  -I.. -I../.. -I/home/mloskot/dev/gdal/_svn/trunk/gdal/port -I/home/mloskot/dev/gdal/_svn/trunk/gdal/gcore -I/home/mloskot/dev/gdal/_svn/trunk/gdal/alg -I/home/mloskot/dev/gdal/_svn/trunk/gdal/ogr -I/home/mloskot/dev/gdal/_svn/trunk/gdal/ogr/ogrsf_frmts -DOGR_ENABLED -I/home/mloskot/dev/gdal/_svn/trunk/gdal/port  -c -o ../o/ogr_xplane_awy_reader.o ogr_xplane_awy_reader.cpp
  mogr_xplane_awy_reader.cpp:294: warning: dereferencing type-punned pointer might break strict-aliasing rules

IMHO, OGRXPlaneAirwayHashDouble should use std::memcpy should cast double to two 32-bit words with union (patch attached), and it will solve the problem.

Attachments (2)

hashdouble_aliasing_ub.cpp (839 bytes ) - added by Mateusz Łoskot 16 years ago.
Test program for strange -O2 optimization of hashing functions by GCC 4.3
ticket-2521-hashdouble-ub-fix.patch (1.9 KB ) - added by Mateusz Łoskot 16 years ago.
Patch with proposed fix for the UB caused by broken strict aliasing in XPlane driver. I vote for double const& than double* so we don't have to remember and worry if given pdfVal is an array or single value.

Download all attachments as: .zip

Change History (8)

by Mateusz Łoskot, 16 years ago

Attachment: hashdouble_aliasing_ub.cpp added

Test program for strange -O2 optimization of hashing functions by GCC 4.3

comment:1 by Mateusz Łoskot, 16 years ago

I've updated configuration of the slave mloskot-gcc43-quick and so it uses following compilation flags:

-O2 -fstrict-aliasing -Wstrict-aliasing=1

Now, everyone can observe compilation output and warnings mentioned above (for example, see Build 13).

comment:2 by Mateusz Łoskot, 16 years ago

Description: modified (diff)

by Mateusz Łoskot, 16 years ago

Patch with proposed fix for the UB caused by broken strict aliasing in XPlane driver. I vote for double const& than double* so we don't have to remember and worry if given pdfVal is an array or single value.

comment:3 by Even Rouault, 16 years ago

Resolution: fixed
Status: newclosed

Fixed in trunk in r15097. I prefered the memcpy version, but your fix was OK too.

Mateusz, many thanks for the in-depth analysis of all that. There will be always things to discover in the marvellous world of C !

comment:4 by Mateusz Łoskot, 16 years ago

Even,

I thank you too for your help!

FYI, first patch I uploaded included memcpy version, but after some discussion with Frank I replaced it with new version that uses union. I just agree with some of Frank's concerns that memcpy may decrease performance. Anyway, if we will want to discuss it deeper and work out some best practice for potential issues in future (see warnings in mloskot-gcc43-quick output), I'm sure Frank will share his considerations.

comment:5 by Even Rouault, 16 years ago

I don't think that performance is a problem in that context. The hash table is a data structure fast enough so that we don't have to worry about an extra memcpy.

By curiosity I looked at the assembly code generated by gcc 4.3 -O2 and you can't see any memcpy in it (that's even true wil non-optimized build with older gcc). The hashdouble function is still inlined. The difference with the buggy version is a few extra assembly mnemonics

comment:6 by Mateusz Łoskot, 16 years ago

Even,

AFAIU Frank, it was more general concern as usage of memcpy() presents serious performance issues.

However, I'm very sure that GCC can optimize memcpy extremely well in that case. We have: local object of automatic storage, it's an array of size that is constant and known in compile-time.

Note: See TracTickets for help on using tickets.