id summary reporter owner description type status priority milestone component version severity resolution keywords cc 5239 MSSQLSpatial output mangles non-ASCII characters iainelder tamas "This problem was originally described in the gis.se question [http://gis.stackexchange.com/questions/70708/how-to-preserve-non-ascii-characters-in-ogr2ogr-import ""How to preserve non-ASCII characters in ogr2ogr import?""]. I want to import a list of international place names and their locations into a SQL Server database using ogr2ogr. The source data is a CSV file encoded in UTF-8. I've prepared a five-record sample file called {{{london_points.csv}}}. It contains the records for London in Arabic, Czech, English. Russian, and Chinese. It looks like this: {{{ LanguageCode,Name,Longitude,Latitude ""ar"",""لندن"",""-0.143555"",""51.577222"" ""cz"",""Londýn"",""-0.143555"",""51.577222"" ""en"",""London"",""-0.143555"",""51.577222"" ""ru"",""Лондон"",""-0.143555"",""51.577222"" ""zw"",""倫敦"",""-0.143555"",""51.577222"" }}} (Don't worry if it looks like the Arabic line has swapped the {{{Name}}} and {{{Latitude}}} values. It's just a quirk of bidirectional text rendering.) To import the CSV file using ogr2ogr I created a VRT file called {{{london_points.vrt}}}. It tells ogr2ogr to import the data and derive a point column from the Longitude and Latitude columns. It looks like this: {{{ london_points.csv wkbPoint WGS84 }}} I used a command like this to import the records: {{{ ogr2ogr --DEBUG ON -s_srs EPSG:4326 -lco GEOM_TYPE=geography -overwrite -f MSSQLSpatial ""MSSQL:server=.;database=tempdb;trusted_connection=yes"" london_points.vrt }}} For clarity I've put each parameter on its own line. The output shows that ogr2ogr successfully imports five rows: {{{ OGR: OGROpen(london_points.vrt/014AFF40) succeeded as VRT. OGR_MSSQLSpatial: EstablishSession(Connection:""server=.;database=tempdb;trusted_connection=yes"") ODBC: SQLDriverConnect(DRIVER=SQL Server;server=.;database=tempdb;trusted_connection=yes) OGR_MSSQLSpatial: Using column ogr_fid as FID for table london_points. OGR: OGROpen(MSSQL:server=.;database=tempdb;trusted_connection=yes/0338CFF0) succeeded as MSSQLSpatial. OGR: OGROpen(london_points.csv/033A41C8) succeeded as CSV. MSSQLSpatial: DeleteLayer(london_points) OGR_MSSQLSpatial: Using column ogr_fid as FID for table london_points. OGR2OGR: 5 features written in layer 'london_points' ODBC: SQLDisconnect() VRT: 5 features read on layer 'london_points'. CSV: 5 features read on layer 'london_points'. }}} I use a query like this to inspect the results in the database: {{{ SELECT languagecode, name, longitude, latitude FROM tempdb.dbo.london_points; }}} The result set looks like this: {{{ languagecode name longitude latitude ------------ ------------ --------- --------- ar لندن -0.143555 51.577222 cz Londýn -0.143555 51.577222 en London -0.143555 51.577222 ru Лондон -0.143555 51.577222 zw 倫敦 -0.143555 51.577222 }}} The result set contains the correct number of rows. The {{{languagecode}}}, {{{longitude}}}, and {{{latitude}}} columns, which contain only ASCII characters, contain the correct values. But all the non-ASCII characters in the {{{name}}} column are mangled! I traced the import using [http://technet.microsoft.com/en-us/library/ms181091.aspx SQL Server Profiler] to capture the SQL statements that ogr2ogr uses to insert data into the {{{london_points}}} table. The trace captured these insert statements: {{{ INSERT INTO [dbo].[london_points] (ogr_geometry, [languagecode], [name], [longitude], [latitude]) VALUES (geography::STGeomFromText('POINT (-0.143555 51.577222)',4326), 'ar', 'لندن', '-0.143555', '51.577222'); go INSERT INTO [dbo].[london_points] (ogr_geometry, [languagecode], [name], [longitude], [latitude]) VALUES (geography::STGeomFromText('POINT (-0.143555 51.577222)',4326), 'cz', 'Londýn', '-0.143555', '51.577222'); go INSERT INTO [dbo].[london_points] (ogr_geometry, [languagecode], [name], [longitude], [latitude]) VALUES (geography::STGeomFromText('POINT (-0.143555 51.577222)',4326), 'en', 'London', '-0.143555', '51.577222'); go INSERT INTO [dbo].[london_points] (ogr_geometry, [languagecode], [name], [longitude], [latitude]) VALUES (geography::STGeomFromText('POINT (-0.143555 51.577222)',4326), 'ru', 'Лондон', '-0.143555', '51.577222'); go INSERT INTO [dbo].[london_points] (ogr_geometry, [languagecode], [name], [longitude], [latitude]) VALUES (geography::STGeomFromText('POINT (-0.143555 51.577222)',4326), 'zw', '倫敦', '-0.143555', '51.577222'); go }}} For clarity I split each INSERT VALUES statement across two lines. Inspecting the SQL reveals two problems. ogr2ogr mangles the values for the name column before they reach the server. ogr2ogr passes the values as varchar literals ({{{'abc'}}}) rather than nvarchar literals ({{{N'abc'}}}). Even if the query text contained the correct characters, the server would replace with a question mark any character that don't exist in the default code page. This query illustrates how SQL Server loses information when it processes varchar literals containing non-existent characters: {{{ SELECT 'لندن' AS varchar_value, N'لندن' AS nvarchar_value UNION ALL SELECT 'Londýn', N'Londýn' UNION ALL SELECT 'London', N'London' UNION ALL SELECT 'Лондон', N'Лондон' UNION ALL SELECT '倫敦', N'倫敦'; varchar_value nvarchar_value ------------- -------------- ???? لندن Londýn Londýn London London ?????? Лондон ?? 倫敦 }}} My default code page is Windows-1252. It contains characters for English and Czech, so varchar string literals are fine for these languages. Arabic, Russian, and Traditional Chinese supported only in Unicode, so all the characters are replaced by question marks. The MSSQLSpatial driver should be able to generate query text containing the correct nvarchar literal values." defect closed high 2.0.0 OGR_SF 1.9.2 normal fixed tamas