Opened 3 years ago

Closed 18 months ago

#5318 closed defect (fixed)

csv load fails with \"

Reported by: Kurt Schwehr Owned by: Kurt Schwehr
Priority: normal Milestone:
Component: default Version: unspecified
Severity: normal Keywords: csv
Cc:

Description

Don Hatch figured out the fix here.

Given

Description,Attr0,Lon,Lat
"three things",aaa,bbb,ccc
"three more things","ddd",e"e"e,""f""f""f""
"backslashes outside quotes",\ggg,hhh\,i\ii
"double backslashes outside quotes",\\jjj,kkk\\,l\\ll
"backslashes inside quotes","\mmm","nnn\","o\oo"
"double backslashes inside quotes","\\ppp","qqq\\","r\\rr"
"yet another three things",sss,ttt,uuu

opening the file fails to give the expected results.

got:  ['three things', 'aaa', 'bbb', 'ccc']
want: ['three things', 'aaa', 'bbb', 'ccc']

got:  ['three more things', 'ddd', 'eee', 'fff']
want: ['three more things', 'ddd', 'eee', 'fff']

got:  ['backslashes outside quotes', '\\ggg', 'hhh\\', 'i\\ii']
want: ['backslashes outside quotes', '\\ggg', 'hhh\\', 'i\\ii']

got:  ['double backslashes outside quotes', '\\\\jjj', 'kkk\\\\', 'l\\\\ll']
want: ['double backslashes outside quotes', '\\\\jjj', 'kkk\\\\', 'l\\\\ll']

# trouble starts here
got:  ['backslashes inside quotes', '\\mmm', 'nnn\\', 'o\\oo\ndouble backslashes inside quotes']
want: ['backslashes inside quotes', '\\mmm', 'nnn\\', 'o\\oo']

got:  ['yet another three things', 'sss', 'ttt', 'uuu']
want: ['double backslashes inside quotes', '\\\\ppp', 'qqq\\\\', 'r\\\\rr']

Proposed fix - remove this:

&& (i == 0 || pszWorkLine[i-1] != '\\')

source:trunk/gdal/ogr/ogrsf_frmts/csv/ogrcsvlayer.cpp#L169 will become:

    while( TRUE )
    {
        for( ; pszWorkLine[i] != '\0'; i++ )
        {
            if( pszWorkLine[i] == '\"' )
                nCount++;
        }

Change History (8)

comment:1 Changed 3 years ago by Even Rouault

Actually, OGRCSVReadParseLineL() is a port of CSVFindNextLine() in port/cpl_csvp.cpp that was introduced long ago in http://trac.osgeo.org/gdal/changeset/3809, which has an interesting comment : "Take into account [...] that quotes can be escaped with a backslash". However it seems that CSVSplitLine() that was introduced in http://trac.osgeo.org/gdal/changeset/3809 never took into account the fact that quotes could be escaped with a backslash.

If we take http://en.wikipedia.org/wiki/Comma-separated_values as a reference: "Embedded double quote characters may then be represented by a pair of consecutive double quotes (Creativyst 2010), or by prefixing an escape character such as a backslash (for example in Sybase Central)."

So the original intent of supporting backslash to escape double quote isn't unreasonable, but never worked. So IMHO, it probably doesn't hurt to remove the "&& (i == 0
pszWorkLine[i-1] != '
')" test in both ogrcsvlayer.cpp and in port/cpl_csv.cpp too with fixing of the comment.

comment:2 Changed 3 years ago by Kurt Schwehr

Status: newassigned

Thanks Even! Modified trunk csv/ogrcsvlayer.cpp with r26691. I will look into port/cpl_csv.cpp next. If these hold up, I'll back port them to the 1.10 branch.

comment:3 Changed 3 years ago by Kurt Schwehr

Does http://travis-ci.org/rouault/gdal_coverage/builds/15129492 mean that r26692 for port/cpl_csv.cpp is bad? I'm thinking this broke ogr_geom_pickle, ogr_pg_40, ogr_pg_51, ogr_sql_sqlite_16 (and others), etc?

comment:4 Changed 3 years ago by Kurt Schwehr

r26693 reverts r26692

  TEST: ogr_sql_sqlite_start_webserver ... HTTP Server started on port 8080
success
  TEST: ogr_sql_sqlite_16 ... 
http://127.0.0.1:8080/geocoding?q=%s
SELECT ogr_geocode('Paris')
SELECT ogr_geocode('Paris', 'geometry')
SELECT ogr_geocode('Paris', 'display_name') AS display_name
SELECT ogr_geocode('Paris', 'raw') AS raw
localhost - - [08/Dec/2013 10:38:09] code 404, message File Not Found: /geocoding?q=Error&addressdetails=1&limit=1&email=foo%40bar
ERROR 1: HTTP error code : 404
ERROR 1: Line 8: </body> doesn't have matching <body>.
SELECT ogr_geocode('Paris')
SELECT ogr_geocode('Paris', 'geometry')
SELECT ogr_geocode('Paris', 'display_name') AS display_name
SELECT ogr_geocode('Paris', 'raw') AS raw
localhost - - [08/Dec/2013 10:38:09] code 404, message File Not Found: /geocoding?q=Error&addressdetails=1&limit=1&email=foo%40bar
ERROR 1: HTTP error code : 404
ERROR 1: Line 8: </body> doesn't have matching <body>.
success
  TEST: ogr_sql_sqlite_17 ... success

comment:5 Changed 3 years ago by Even Rouault

r26692 was probably fine. I don't see it to break anything. ogr_sql_sqlite_16 and other tests trigger various (recoverable) error conditions, but if their state is 'success', nothing to worry about. Perhaps we should try to make them a bit less verbose to be less frightening.

comment:6 Changed 2 years ago by Even Rouault

What's the status of that ?

comment:7 Changed 2 years ago by Jukka Rahkonen

I have just closed a similar issue #2406. See if is has some valuable additional information.

comment:8 Changed 18 months ago by Jukka Rahkonen

Resolution: fixed
Status: assignedclosed

Proposed fix is in https://trac.osgeo.org/gdal/browser/trunk/gdal/ogr/ogrsf_frmts/csv/ogrcsvlayer.cpp and ogrinfo with GDAL 2.1-dev from the test csv looks good.

Note: See TracTickets for help on using tickets.