Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#4623 closed defect (fixed)

CSV driver does not detect BOM header for UTF-8 files on Windows XP

Reported by: jpalmer
Owned by: Even Rouault
Priority: normal
Milestone: 1.9.1
Component: OGR_SF
Version: svn-trunk
Severity: normal
Keywords: Windows XP, CSV, OGR
Cc: Robert Coup

Description

When I run ogrinfo from my Windows XP command line on a UTF-8 file with a BOM header, the BOM is not ignored and becomes part of the first attribute field name.

Example:

ogrinfo temp.csv temp
INFO: Open of `temp.csv'
      using driver `CSV' successful.

Layer name: temp
Geometry: Unknown (any)
Feature Count: 2
Layer SRS WKT:
(unknown)
id: String (0.0)
name: String (0.0)
WKT: String (0.0)
OGRFeature(temp):1
  id (String) = 426
  name (String) = Colac Bay/Ōraka
  WKT (String) = LOC

OGRFeature(temp):2
  id (String) = 427
  name (String) = Colac Bay/Ōraka
  WKT (String) = BAY

Note this doesn't happen when I run this on my Ubuntu UTF-8 console.

Attachments (1)

temp.csv (141 bytes) - added by jpalmer 12 years ago.


Change History (5)

by jpalmer, 12 years ago

Attachment: temp.csv added

comment:1 by Even Rouault, 12 years ago

That's the first time I've ever seen a CSV file with a UTF-8 BOM marker! Which software produces that?

This also happens on Linux, but the console doesn't print the marker. However, if you redirect the output of ogrinfo to a file and look at it with a hexadecimal editor, you'll see it. It would also cause problems if you translate to shapefile, etc.
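To make the symptom concrete, here is a minimal Python sketch (not GDAL code) of what happens when BOM-prefixed CSV bytes are decoded as plain UTF-8. The sample bytes and the `header_fields` helper are hypothetical, modelled on the field names in the report above.

```python
import codecs
import csv
import io

# Hypothetical sample: UTF-8 CSV bytes preceded by the 3-byte BOM (EF BB BF).
raw = codecs.BOM_UTF8 + "id,name,WKT\n426,Colac Bay/Ōraka,LOC\n".encode("utf-8")


def header_fields(data, encoding="utf-8"):
    """Decode CSV bytes with the given encoding and return the header row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


fields = header_fields(raw)

# Plain "utf-8" keeps the BOM as U+FEFF, so it sticks to the first field name.
print(repr(fields[0]))

# The leading bytes, as a hexadecimal editor would show them.
print(raw[:3].hex(" "))
```

A console may render U+FEFF as nothing at all, which would explain why the marker is invisible in some terminals while still being present in the field name.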

comment:2 by Robert Coup, 12 years ago

Which software produces that?

Not much :) But if you want Excel to reliably open a CSV as UTF8 rather than doing something stupid, then it needs a BOM marker. So for anyone with unicode that's catering to Excel users, it'll have one.
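As an illustration of how such files get produced, the sketch below writes a CSV with a leading BOM using Python's standard `utf-8-sig` codec. The `write_csv_for_excel` helper, the temporary path, and the sample rows are assumptions made for the example, not something taken from this ticket.

```python
import codecs
import csv
import os
import tempfile


def write_csv_for_excel(path, rows):
    """Write rows as UTF-8 CSV preceded by a BOM (the "utf-8-sig" codec)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


# Hypothetical output location and data.
path = os.path.join(tempfile.mkdtemp(), "temp.csv")
write_csv_for_excel(path, [["id", "name", "WKT"], ["426", "Colac Bay/Ōraka", "LOC"]])

# Check that the file really starts with the UTF-8 BOM bytes.
with open(path, "rb") as f:
    starts_with_bom = f.read(3) == codecs.BOM_UTF8
print(starts_with_bom)
```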

comment:3 by Even Rouault, 12 years ago

Resolution: fixed
Status: new → closed

r24258 /trunk/ (3 files in 3 dirs): CSV: Detect and remove UTF-8 BOM marker if found (#4623)

r24259 /branches/1.9/gdal/ogr/ogrsf_frmts/csv/ogrcsvlayer.cpp: CSV: Detect and remove UTF-8 BOM marker if found (#4623)
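The idea of "detect and remove" can be sketched in a few lines of Python. This is only an illustration of the technique under the assumption that the check is applied to the first line of the file; it is not the code committed in r24258/r24259, and `strip_utf8_bom` is a hypothetical name.

```python
import codecs


def strip_utf8_bom(line):
    """Return the byte string without a leading UTF-8 BOM, if one is present."""
    if line.startswith(codecs.BOM_UTF8):
        return line[len(codecs.BOM_UTF8):]
    return line


# Hypothetical first line of a BOM-prefixed CSV file.
header = strip_utf8_bom(codecs.BOM_UTF8 + b"id,name,WKT")
print(header.split(b","))
```

Lines without a BOM pass through unchanged, so the same code path would handle files written with or without the marker.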

comment:4 by jpalmer, 12 years ago

Thanks Even!
