Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#4799 closed defect (fixed)

Windows gdal treats NaN's as 0

Reported by: timlinux Owned by: warmerdam
Priority: normal Milestone: 1.9.2
Component: OGR_SF Version: 1.9.1
Severity: normal Keywords: NaN
Cc:

Description (last modified by timlinux)

This issue was originally diagnosed by Ismail Sunni of the InaSAFE project. In #3576 there was a similar issue filed so feel free to close this and reopen the original if appropriate.

When QGIS opens a shapefile that has a NaN value, it is read (incorrectly) as 0.0 on Windows but (correctly) as NaN on Linux. This causes some of our (InaSAFE) tests to fail under Windows.

We tested with QGIS compiled against GDAL/OGR 1.9.1 on both Linux and Windows.

We also tested using lower level ogr/python bindings and can replicate the issue there too so it is not QGIS specific.

I am attaching a sample data set that you can use to replicate the issue.

If you have any hints as to a possible work around for this (for clients stuck on 1.9.1) in the interim it would be most appreciated.

Attachments (1)

nan_here.tar.gz (1.6 KB) - added by timlinux 10 years ago.
Example file demonstrating nan issue


Change History (9)

by timlinux, 10 years ago

Attachment: nan_here.tar.gz added

Example file demonstrating nan issue

comment:1 by timlinux, 10 years ago

Description: modified (diff)

comment:2 by Even Rouault, 10 years ago

I've looked at the DBF and can see the "nan" string appearing in it. I was wondering if "nan" is really a valid value for a real column in a DBF and if that works with other non-GDAL based tools.

For reference:

  • MS Excel displays a blank cell for those nan values (but also for any non-numeric content that you put there by using a hexadecimal editor);
  • MS Access displays 0;
  • GlobalMapper displays nan, but apparently doesn't manage column types at all, so if instead of "nan" you put "foo", it will display "foo" too.

How was that DBF produced ?

Anyway, I've pushed a fix in trunk (r24899) and branches/1.9 (r24890) for that particular issue (it was due to leading spaces before "nan", caused by the right alignment of values in DBF). Note: I'm afraid that issues with nan will be a never-ending problem. For example, after that fix, ogr2ogr will produce a DBF with "-1.#IND00000000000" values that it will not recognize properly (due to the trailing zeroes. grrr. and I'm not sure that it should be serialized as such, by the way)
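To illustrate the leading-space problem described above, here is a minimal Python sketch. It is not GDAL's code: the field width of 19 and the helper names (`format_dbf_numeric`, `naive_is_nan`, `tolerant_is_nan`) are assumptions made up for the example. It only shows why a check that expects "nan" at the very start of a right-aligned field misses the value.

```python
# Hypothetical width for the numeric column; real DBF files declare
# their own width per field.
FIELD_WIDTH = 19


def format_dbf_numeric(text, width=FIELD_WIDTH):
    """Right-align a value in a fixed-width field, as DBF numeric columns do."""
    return text.rjust(width)


def naive_is_nan(raw):
    """Recognize 'nan' only when it starts the field (misses padded values)."""
    return raw.lower().startswith("nan")


def tolerant_is_nan(raw):
    """Skip the leading padding before checking for 'nan'."""
    return raw.lstrip().lower().startswith("nan")


raw_field = format_dbf_numeric("nan")
print(repr(raw_field))           # 16 spaces followed by 'nan'
print(naive_is_nan(raw_field))   # False: the padding hides the value
print(tolerant_is_nan(raw_field))  # True
```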

comment:3 by Even Rouault, 10 years ago

(Read r24900 for the commit in branches/1.9)

comment:4 by Even Rouault, 10 years ago

r24901 (trunk) : Add test to check that we recognize 'nan' as a numeric value (#4799)

comment:5 by Even Rouault, 10 years ago

Component: default → OGR_SF
Resolution: fixed
Status: new → closed

Closing this ticket. Tim, it would be cool if you can comment on how that DBF was produced (I'm still not convinced it is really valid)

comment:6 by timlinux, 10 years ago

Hi

My apologies for the delay in giving feedback. We are going to test whether the problem goes away if we write zero-length strings or None to those cells. You can leave this ticket closed and I will update it with our findings for others who may follow.

Regards

Tim

comment:7 by ismailsunni, 10 years ago

Thanks for your reply.

To answer your question, the file was generated with the OGR Python bindings on Linux. When we looked at it in QGIS on Linux, it worked. So, obviously the DBF file represents NaN correctly. However, on Windows, the NaN is read as 0.0.

Moreover, when we look at the DBF file using a text editor, it looks the same on Linux and Windows.

So, how does ogr read those NaN on Windows? And why is the leading space only an issue in Windows, not in Linux?

comment:8 by Even Rouault, 10 years ago

ok, that explains things. So the file was produced by OGR. That's not a sign it is interoperable with other software and conformant to the DBF spec. I have no indication that it is valid, nor invalid, but my feeling is that "nan" isn't an expected value that other DBF readers will be able to interpret.

Why "nan" is interpreted correctly in Linux and not in Windows ? Because the C standard library implementation isn't the same in both systems. On Linux, a double whose value is nan is sprintf()'ed as "nan" and atof()/strtod() will interpret "nan", or " nan" correctly, whereas the Windows C standard library doesn't handle the "nan" string at all, hence the need for a software trick in the GDAL wrapper for strtod() to make it work on all platforms.
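As a rough illustration of the kind of platform-independent parsing described above, here is a Python sketch. It is not the GDAL wrapper itself: `portable_strtod` is a hypothetical name, and the spelling lists are assumptions based on what common C runtimes print ("nan" on Linux, forms such as "-1.#IND" or "1.#QNAN" on older Microsoft runtimes). Prefix matching is used so that padded or zero-extended forms like "-1.#IND00000000000" are still recognized.

```python
import math

# Illustrative, non-exhaustive spellings of non-finite values.
_NAN_PREFIXES = ("nan", "-nan", "1.#qnan", "-1.#qnan",
                 "1.#ind", "-1.#ind", "1.#snan", "-1.#snan")
_POS_INF_PREFIXES = ("inf", "+inf", "1.#inf")
_NEG_INF_PREFIXES = ("-inf", "-1.#inf")


def portable_strtod(text):
    """Parse a float, accepting platform-specific NaN/inf spellings.

    Leading and trailing padding is ignored. Raises ValueError for
    text that is not a number at all.
    """
    s = text.strip().lower()
    if s.startswith(_NAN_PREFIXES):
        return float("nan")
    if s.startswith(_NEG_INF_PREFIXES):
        return float("-inf")
    if s.startswith(_POS_INF_PREFIXES):
        return float("inf")
    return float(s)


print(math.isnan(portable_strtod("   nan")))              # True
print(math.isnan(portable_strtod("-1.#IND00000000000")))  # True
print(portable_strtod("  3.5"))                           # 3.5
```

A prefix test is deliberately loose (it would also accept "nanometer"), which is acceptable for a fixed-width numeric column but not for free-form text.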

My position: NaNs are really annoying beasts to deal with (for example, nan compares as different from itself: nan != nan), and you should really avoid manipulating them. That will be a never-ending source of annoyances. For example, I doubt you can use an attribute filter to retrieve field values that would be set to nan.
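The nan != nan behaviour is easy to see in plain Python, and it shows why an equality-based filter cannot find NaN values. This is only a sketch with made-up sample data; it says nothing about how OGR attribute filters are implemented.

```python
import math

nan = float("nan")
values = [1.0, nan, 2.5, nan]  # made-up sample data

# An equality test never matches, because NaN is not equal to itself.
equality_matches = [v for v in values if v == nan]

# An explicit NaN test is needed instead.
isnan_matches = [v for v in values if math.isnan(v)]

print(nan != nan)             # True
print(len(equality_matches))  # 0
print(len(isnan_matches))     # 2
```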

However, OGR (and DBF, and SQL) properly supports the concept of empty/unset/null field, so that should be used instead (just don't set the field value : by default when creating a feature, all its fields are set to the empty/unset/null value)
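A minimal sketch of that advice in Python: only set the field when there is a real number to store, and otherwise leave it unset. `set_field_or_leave_unset` is a hypothetical helper, and `FakeFeature` is a stand-in so the example runs without GDAL installed; with the real bindings you would pass an OGR feature, whose `SetField(name, value)` method is the only call used here.

```python
import math


def set_field_or_leave_unset(feature, name, value):
    """Call feature.SetField only for usable values; skip None and NaN.

    Returns True if the field was set, False if it was left unset.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return False
    feature.SetField(name, value)
    return True


class FakeFeature:
    """Hypothetical stand-in for an OGR feature (records SetField calls)."""

    def __init__(self):
        self.fields = {}

    def SetField(self, name, value):
        self.fields[name] = value


feature = FakeFeature()
set_field_or_leave_unset(feature, "depth", 1.5)
set_field_or_leave_unset(feature, "height", float("nan"))
set_field_or_leave_unset(feature, "width", None)
print(feature.fields)  # {'depth': 1.5}
```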
