Opened 13 years ago
Closed 13 years ago
#4219 closed defect (worksforme)
[ogr2ogr] Umlauts not recognized correctly in DXF data
Reported by: | tomdb | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | OGR_SF | Version: | 1.8.1 |
Severity: | normal | Keywords: | DXF |
Cc: |
Description
While version 1.7.3 works well, version 1.8.x does not correctly recognize German umlauts in DXF data. I have tested the following drivers:
ESRI Shapefile
CSV
GeoJSON
GML
PGDump
Attachments (1)
Change History (4)
follow-up: 2 comment:1 by , 13 years ago
Component: | default → OGR_SF |
---|---|
Keywords: | DXF added |
Status: | new → assigned |
Version: | unspecified → 1.8.1 |
comment:2 by , 13 years ago
Replying to warmerdam:
Tommaso,
Can you provide a small sample DXF file demonstrating the problem, and some hint how to find the feature with the Umlaut?
Hi, I have attached a DXF file with a single feature that contains umlauts and the symbol 'ß'. The symbol 'ß' is also not recognized correctly. I saved the file as ASCII DXF 2010.
comment:3 by , 13 years ago
Resolution: | → worksforme |
---|---|
Status: | assigned → closed |
The indicated DXF file has:
$DWGCODEPAGE
  3
ANSI_1252
  9
But the actual MTEXT text is UTF-8 encoded. It gets corrupted when OGR internally applies an ANSI_1252 to UTF-8 recoding to it. I judge the file to be internally inconsistent and OGR's behavior to be reasonable. Please let me know if you feel otherwise.
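The corruption described above can be reproduced outside OGR. The following is a minimal sketch, not OGR's actual recoding code: it assumes Python's `cp1252` codec is an adequate stand-in for ANSI_1252, and `looks_like_utf8` is a hypothetical helper introduced only for this illustration.

```python
# Illustration only: Python's "cp1252" codec stands in for ANSI_1252.
# This is not how OGR performs the recoding internally.
original = "äÄöÖüÜß"

# The sample file stores the MTEXT text as UTF-8 bytes ...
raw = original.encode("utf-8")

# ... but a reader that trusts $DWGCODEPAGE interprets those bytes as
# ANSI_1252 and then recodes the result to UTF-8, which garbles the text.
misread = raw.decode("cp1252")
recoded = misread.encode("utf-8")


def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical heuristic: non-ASCII bytes that decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return any(b >= 0x80 for b in data)


print(misread)                                     # garbled text
print(looks_like_utf8(raw))                        # bytes as found in the file
print(looks_like_utf8(original.encode("cp1252")))  # bytes the header promises
```

A check of this kind would flag the sample file as inconsistent: its header declares ANSI_1252, yet its text bytes are valid multi-byte UTF-8.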
With GDAL/OGR 1.8 the DXF driver assumes ANSI_1252 while the GDAL/OGR "trunk" code reads the DWGCODEPAGE variable, and also provides an override to change the assumed encoding. This is the DXF_ENCODING configuration variable. By setting it to UTF-8 with "trunk" I can see the proper results.
ogrinfo sample.dxf -al --debug off --config DXF_ENCODING UTF-8
INFO: Open of `sample.dxf'
      using driver `DXF' successful.
Layer name: entities
Geometry: Unknown (any)
Feature Count: 1
Extent: (1.301649, 5.273839) - (1.301649, 5.273839)
Layer SRS WKT:
(unknown)
Layer: String (0.0)
SubClasses: String (0.0)
ExtendedEntity: String (0.0)
Linetype: String (0.0)
EntityHandle: String (0.0)
Text: String (0.0)
OGRFeature(entities):0
  Layer (String) = 0
  SubClasses (String) = AcDbEntity:AcDbMText
  ExtendedEntity (String) = ACAD_MTEXT_DEFINED_HEIGHT_BEGIN 46 1.590669642857143 ACAD_MTEXT_DEFINED_HEIGHT_END
  Linetype (String) = (null)
  EntityHandle (String) = E5
  Text (String) = {Text with umlauts: äÄöÖüÜß}
  Style = LABEL(f:"Arial",t:"{Text with umlauts: äÄöÖüÜß}",s:0.2g,p:7,c:#000000)
  POINT (1.301649107142859 5.273839285714282 0)
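Since the ticket is about ogr2ogr, the same override can presumably be passed to a conversion. The sketch below only builds the command line and does not run it. It assumes a GDAL/OGR build that supports DXF_ENCODING (the "trunk" code at the time of this ticket, not 1.8). The function name `ogr2ogr_command` and the output file name `sample.shp` are hypothetical and used for illustration.

```python
import shlex


def ogr2ogr_command(src, dst, driver="ESRI Shapefile", dxf_encoding="UTF-8"):
    """Build an ogr2ogr argument list that overrides the assumed DXF encoding.

    Hypothetical helper: it only assembles arguments and never calls GDAL.
    """
    return [
        "ogr2ogr",
        "-f", driver,                              # output format
        dst,                                       # destination comes first
        src,                                       # then the source DXF
        "--config", "DXF_ENCODING", dxf_encoding,  # same override as above
    ]


cmd = ogr2ogr_command("sample.dxf", "sample.shp")
print(shlex.join(cmd))
```

With GDAL installed, the list can be handed to `subprocess.run(cmd, check=True)`. Keeping the arguments as a list avoids shell quoting problems with driver names that contain spaces.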
I'm marking this "worksforme" but I really mean "works as intended".