Ticket #4219 (closed defect: worksforme)

Opened 21 months ago

Last modified 21 months ago

[ogr2ogr] Umlauts not recognized correctly in DXF data

Reported by: tomdb
Owned by: warmerdam
Priority: normal
Milestone:
Component: OGR_SF
Version: 1.8.1
Severity: normal
Keywords: DXF
Cc:

Description

Version 1.7.3 works well, but version 1.8.x does not correctly recognize German umlauts in DXF data. I have tested the following drivers:

  • ESRI Shapefile
  • CSV
  • GeoJSON
  • GML
  • PGDump

Attachments

sample.dxf (45.3 KB) - added by tomdb 21 months ago.
sample dxf file with umlauts

Change History

follow-up: ↓ 2   Changed 21 months ago by warmerdam

  • keywords DXF added
  • status changed from new to assigned
  • version changed from unspecified to 1.8.1
  • component changed from default to OGR_SF

Tommaso,

Can you provide a small sample DXF file demonstrating the problem, and some hint how to find the feature with the Umlaut?

Changed 21 months ago by tomdb

sample dxf file with umlauts

in reply to: ↑ 1   Changed 21 months ago by tomdb

Replying to warmerdam:

Tommaso, Can you provide a small sample DXF file demonstrating the problem, and some hint how to find the feature with the Umlaut?

Hi, I attached a DXF file containing a single feature whose text includes umlauts and the symbol 'ß'. The 'ß' is also not recognized correctly. I saved the file as ASCII DXF 2010.

  Changed 21 months ago by warmerdam

  • status changed from assigned to closed
  • resolution set to worksforme

The indicated DXF file has:

$DWGCODEPAGE
  3
ANSI_1252
  9

But the actual MTEXT text is UTF-8 encoded, so it is corrupted when OGR internally applies an ANSI_1252 to UTF-8 recoding. I judge the file to be internally inconsistent and OGR's behavior to be reasonable. Please let me know if you feel otherwise.
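The corruption described above can be reproduced outside OGR. The following is only a sketch in Python, not the driver's code: it assumes that Python's "cp1252" codec is an adequate stand-in for ANSI_1252, and the helper name recode_cp1252_to_utf8 is made up for illustration.

```python
# Illustration only: Python's "cp1252" codec stands in for ANSI_1252.
# recode_cp1252_to_utf8 is a hypothetical helper, not an OGR function.

def recode_cp1252_to_utf8(raw: bytes) -> bytes:
    """Interpret raw as Windows-1252 and re-encode the result as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


original = "äÄöÖüÜß"
stored = original.encode("utf-8")        # bytes as they actually sit in the file
recoded = recode_cp1252_to_utf8(stored)  # result of trusting the ANSI_1252 header
garbled = recoded.decode("utf-8")

print(garbled)                           # no longer the original text
print(len(original), len(garbled))       # every character has turned into two
```

Each umlaut occupies two bytes in UTF-8, and a reader that treats the data as ANSI_1252 turns each of those bytes into a separate character, which is why the text comes out twice as long and unreadable.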

With GDAL/OGR 1.8 the DXF driver assumes ANSI_1252 while the GDAL/OGR "trunk" code reads the DWGCODEPAGE variable, and also provides an override to change the assumed encoding. This is the DXF_ENCODING configuration variable. By setting it to UTF-8 with "trunk" I can see the proper results.

ogrinfo sample.dxf -al --debug off --config DXF_ENCODING UTF-8
INFO: Open of `sample.dxf'
      using driver `DXF' successful.

Layer name: entities
Geometry: Unknown (any)
Feature Count: 1
Extent: (1.301649, 5.273839) - (1.301649, 5.273839)
Layer SRS WKT:
(unknown)
Layer: String (0.0)
SubClasses: String (0.0)
ExtendedEntity: String (0.0)
Linetype: String (0.0)
EntityHandle: String (0.0)
Text: String (0.0)
OGRFeature(entities):0
  Layer (String) = 0
  SubClasses (String) = AcDbEntity:AcDbMText
  ExtendedEntity (String) = ACAD_MTEXT_DEFINED_HEIGHT_BEGIN     46 1.590669642857143 ACAD_MTEXT_DEFINED_HEIGHT_END
  Linetype (String) = (null)
  EntityHandle (String) = E5
  Text (String) = {Text with umlauts: äÄöÖüÜß}
  Style = LABEL(f:"Arial",t:"{Text with umlauts: äÄöÖüÜß}",s:0.2g,p:7,c:#000000)
  POINT (1.301649107142859 5.273839285714282 0)
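
To find out whether another file has the same mismatch before choosing a DXF_ENCODING value, a quick check along these lines may help. It is a heuristic sketch in Python, not part of GDAL/OGR: the function names are hypothetical, it assumes the header stores the code page as group code 3 directly after $DWGCODEPAGE (as in the excerpt above), and it treats "contains non-ASCII bytes and decodes cleanly as UTF-8" as a sign that the text is UTF-8.

```python
# Heuristic sketch; all function names here are hypothetical.

def declared_codepage(raw: bytes):
    """Return the value stored after $DWGCODEPAGE / group code 3, or None."""
    lines = [line.strip() for line in raw.splitlines()]
    for i, line in enumerate(lines):
        if line == b"$DWGCODEPAGE" and i + 2 < len(lines) and lines[i + 1] == b"3":
            return lines[i + 2].decode("ascii", errors="replace")
    return None


def looks_like_utf8(raw: bytes) -> bool:
    """True if raw contains non-ASCII bytes and is valid UTF-8 throughout."""
    if raw.isascii():
        return False  # pure ASCII reads the same under either encoding
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_dxf(raw: bytes):
    """Return (declared code page, whether the content looks like UTF-8)."""
    return declared_codepage(raw), looks_like_utf8(raw)


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            print(check_dxf(f.read()))
```

If the check reports ANSI_1252 together with True, the file is probably in the situation described in this ticket, and the DXF_ENCODING override shown above is the relevant setting.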

I'm marking this "worksforme" but I really mean "works as intended".
