Opened 13 years ago

Closed 13 years ago

#4219 closed defect (worksforme)

[ogr2ogr] Umlauts not recognized correctly in DXF data

Reported by: tomdb Owned by: warmerdam
Priority: normal Milestone:
Component: OGR_SF Version: 1.8.1
Severity: normal Keywords: DXF
Cc:

Description

Version 1.7.3 works well, but version 1.8.x does not correctly recognize German umlauts in DXF data. I have tested the following drivers:

ESRI shapefile

csv

geojson

gml

pgdump

Attachments (1)

sample.dxf (45.3 KB ) - added by tomdb 13 years ago.
sample dxf file with umlauts


Change History (4)

comment:1 by warmerdam, 13 years ago

Component: default → OGR_SF
Keywords: DXF added
Status: new → assigned
Version: unspecified → 1.8.1

Tommaso,

Can you provide a small sample DXF file demonstrating the problem, and some hint how to find the feature with the Umlaut?

by tomdb, 13 years ago

Attachment: sample.dxf added

sample dxf file with umlauts

in reply to:  1 comment:2 by tomdb, 13 years ago

Replying to warmerdam:

> Tommaso,
>
> Can you provide a small sample DXF file demonstrating the problem, and some hint how to find the feature with the Umlaut?

Hi, I attached a DXF file with a single feature that contains umlauts and the symbol 'ß'. The 'ß' is not recognized correctly either. I saved the file as ASCII DXF 2010.

comment:3 by warmerdam, 13 years ago

Resolution: worksforme
Status: assigned → closed

The indicated DXF file has:

$DWGCODEPAGE
  3
ANSI_1252
  9

But the actual MTEXT text is UTF-8 encoded. It is corrupted when I internally try to apply an ANSI_1252 to UTF-8 recoding. I judge the file to be internally inconsistent and OGR's behavior to be reasonable. Please let me know if you feel otherwise.
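To illustrate the kind of corruption described above, here is a minimal Python sketch. It is not OGR's recoding code: it assumes Python's `cp1252` codec is a fair stand-in for ANSI_1252, and the sample string is simply the set of characters from the attached file's text.

```python
# Illustration only: Python's "cp1252" codec stands in for ANSI_1252.
text = "äÄöÖüÜß"

# What the file actually contains: UTF-8 bytes (two bytes per character here).
utf8_bytes = text.encode("utf-8")

# What happens when those bytes are interpreted as ANSI_1252, as the header claims:
# every two-byte UTF-8 sequence turns into two unrelated characters.
misread = utf8_bytes.decode("cp1252")

print(len(text), len(misread))
print(misread)

# The damage is reversible as long as no further conversion has been applied.
recovered = misread.encode("cp1252").decode("utf-8")
print(recovered == text)
```

Each umlaut comes out as two characters, the first of which is always "Ã", which is the typical symptom of UTF-8 text being read as ANSI_1252.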

With GDAL/OGR 1.8 the DXF driver assumes ANSI_1252 while the GDAL/OGR "trunk" code reads the DWGCODEPAGE variable, and also provides an override to change the assumed encoding. This is the DXF_ENCODING configuration variable. By setting it to UTF-8 with "trunk" I can see the proper results.

ogrinfo sample.dxf -al --debug off --config DXF_ENCODING UTF-8
INFO: Open of `sample.dxf'
      using driver `DXF' successful.

Layer name: entities
Geometry: Unknown (any)
Feature Count: 1
Extent: (1.301649, 5.273839) - (1.301649, 5.273839)
Layer SRS WKT:
(unknown)
Layer: String (0.0)
SubClasses: String (0.0)
ExtendedEntity: String (0.0)
Linetype: String (0.0)
EntityHandle: String (0.0)
Text: String (0.0)
OGRFeature(entities):0
  Layer (String) = 0
  SubClasses (String) = AcDbEntity:AcDbMText
  ExtendedEntity (String) = ACAD_MTEXT_DEFINED_HEIGHT_BEGIN     46 1.590669642857143 ACAD_MTEXT_DEFINED_HEIGHT_END
  Linetype (String) = (null)
  EntityHandle (String) = E5
  Text (String) = {Text with umlauts: äÄöÖüÜß}
  Style = LABEL(f:"Arial",t:"{Text with umlauts: äÄöÖüÜß}",s:0.2g,p:7,c:#000000)
  POINT (1.301649107142859 5.273839285714282 0)

I'm marking this "worksforme" but I really mean "works as intended".
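For anyone who wants to check their own files for the same inconsistency before converting, the following is a rough Python sketch. The function names are hypothetical, the header lookup assumes the simple layout shown above (variable name, then group code 3, then the value), and the UTF-8 test is only a heuristic: ANSI_1252 text containing umlauts is very unlikely to also be valid UTF-8, but this is not a proof.

```python
def read_dwgcodepage(dxf_text):
    """Return the $DWGCODEPAGE value from DXF text, or None if it is absent."""
    lines = [line.strip() for line in dxf_text.splitlines()]
    for i, line in enumerate(lines):
        if line == "$DWGCODEPAGE":
            # The variable name is followed by group code 3 and then the value.
            if i + 2 < len(lines) and lines[i + 1] == "3":
                return lines[i + 2]
    return None


def is_valid_utf8(raw):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def looks_inconsistent(raw):
    """Heuristic: header says ANSI_1252, but the non-ASCII bytes form valid UTF-8."""
    # latin-1 maps every byte to a character, so the ASCII header stays readable.
    declared = read_dwgcodepage(raw.decode("latin-1"))
    has_non_ascii = any(byte > 0x7F for byte in raw)
    return declared == "ANSI_1252" and has_non_ascii and is_valid_utf8(raw)


if __name__ == "__main__":
    sample = "  9\n$DWGCODEPAGE\n  3\nANSI_1252\n  9\n$INSBASE\n  1\näöü\n"
    print(looks_inconsistent(sample.encode("utf-8")))   # UTF-8 body, ANSI_1252 header
    print(looks_inconsistent(sample.encode("cp1252")))  # body matches the header
```

If a file is flagged this way, overriding the assumed encoding with DXF_ENCODING as in the ogrinfo call above is the likely remedy.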
