Ticket #2521 (closed defect: fixed)

Opened 4 years ago

Last modified 4 years ago

XPlane causes undefined behavior with GCC 4.3.x

Reported by: mloskot Owned by: warmerdam
Priority: normal Milestone:
Component: OGR_SF Version: svn-trunk
Severity: major Keywords: xplane hash gcc o2 aliasing
Cc:

Description (last modified by mloskot) (diff)

Recently, Even and I, we have noticed that ogr_xplane.py test fails in GDAL built using GCC 4.3. The problem was very mysterious. Everything works well with pre-4.3 versions of GCC.

TEST: ogr_xplane_awy_dat ... fail
    wrong number of features for layer AirwayIntersection : 20. 14 were expected 

Today, we spent almost all day trying to debug it and we got very close to find what's wrong. We found out that the problem is near the hash generation calls (cpl_hash_set.cpp). We suspected there is a bug in GCC 4.3 or we have signed integer overflow that causes undefined behavior somewhere in the code, etc. Fortunately to GDAL, but not to our investigation, nothing like that seemed to be true.

The success was that we managed to isolate a small test program (see attached hashdouble_aliasing_ub.cpp) and use it to reproduce the problem. However, ome small puzzles were still missing to understand <em>why</em>.

I decided to ask GCC folks for help in explaining this issue. Here is my post on gcc-help with detailed description of the problem:  Strange -O2 optimization issue. Here is  first response we got in which Brian Dessent reveals what's going wrong in our code.

To quote it shortly, there is a function in file ogr_xplane_awy_reader.cpp:292:

static unsigned long OGRXPlaneAirwayHashDouble(double* pdfVal)
{
   unsigned int* pnValue = (unsigned int*)pdfVal;
   return pnValue[0] ^ pnValue[1];
}

with cast and

this violates the ISO C aliasing rules

I've experimented and did build GDAL with aggressive options

-O2 -fstrict-aliasing -Wstrict-aliasing=1

and GCC 4.3 output a small flood of warnings for lines possibly breaking strict aliasing rules. What's most important here, it does warn about the suspicious cast in

g++ -O2 -fstrict-aliasing -Wstrict-aliasing=1 -fPIC  -Wall  -I.. -I../.. -I/home/mloskot/dev/gdal/_svn/trunk/gdal/port -I/home/mloskot/dev/gdal/_svn/trunk/gdal/gcore -I/home/mloskot/dev/gdal/_svn/trunk/gdal/alg -I/home/mloskot/dev/gdal/_svn/trunk/gdal/ogr -I/home/mloskot/dev/gdal/_svn/trunk/gdal/ogr/ogrsf_frmts -DOGR_ENABLED -I/home/mloskot/dev/gdal/_svn/trunk/gdal/port  -c -o ../o/ogr_xplane_awy_reader.o ogr_xplane_awy_reader.cpp
  mogr_xplane_awy_reader.cpp:294: warning: dereferencing type-punned pointer might break strict-aliasing rules

IMHO, OGRXPlaneAirwayHashDouble should use std::memcpy should cast double to two 32-bit words with union (patch attached), and it will solve the problem.

Attachments

hashdouble_aliasing_ub.cpp Download (0.8 KB) - added by mloskot 4 years ago.
Test program for strange -O2 optimization of hashing functions by GCC 4.3
ticket-2521-hashdouble-ub-fix.patch Download (1.9 KB) - added by mloskot 4 years ago.
Patch with proposed fix for the UB caused by broken strict aliasing in XPlane driver. I vote for double const& than double* so we don't have to remember and worry if given pdfVal is an array or single value.

Change History

Changed 4 years ago by mloskot

Test program for strange -O2 optimization of hashing functions by GCC 4.3

Changed 4 years ago by mloskot

I've updated configuration of the slave  mloskot-gcc43-quick and so it uses following compilation flags:

-O2 -fstrict-aliasing -Wstrict-aliasing=1

Now, everyone can observe compilation output and warnings mentioned above (for example, see Build 13).

Changed 4 years ago by mloskot

  • description modified (diff)

Changed 4 years ago by mloskot

Patch with proposed fix for the UB caused by broken strict aliasing in XPlane driver. I vote for double const& than double* so we don't have to remember and worry if given pdfVal is an array or single value.

Changed 4 years ago by rouault

  • status changed from new to closed
  • resolution set to fixed

Fixed in trunk in r15097. I prefered the memcpy version, but your fix was OK too.

Mateusz, many thanks for the in-depth analysis of all that. There will be always things to discover in the marvellous world of C !

Changed 4 years ago by mloskot

Even,

I thank you too for your help!

FYI, first patch I uploaded included memcpy version, but after some discussion with Frank I replaced it with new version that uses union. I just agree with some of Frank's concerns that memcpy may decrease performance. Anyway, if we will want to discuss it deeper and work out some best practice for potential issues in future (see warnings in mloskot-gcc43-quick output), I'm sure Frank will share his considerations.

Changed 4 years ago by rouault

I don't think that performance is a problem in that context. The hash table is a data structure fast enough so that we don't have to worry about an extra memcpy.

By curiosity I looked at the assembly code generated by gcc 4.3 -O2 and you can't see any memcpy in it (that's even true wil non-optimized build with older gcc). The hashdouble function is still inlined. The difference with the buggy version is a few extra assembly mnemonics

Changed 4 years ago by mloskot

Even,

AFAIU Frank, it was more general concern as  usage of memcpy() presents serious performance issues.

However, I'm very sure that GCC can optimize memcpy extremely well in that case. We have: local object of automatic storage, it's an array of size that is constant and known in compile-time.

Note: See TracTickets for help on using tickets.