Opened 10 years ago
Closed 8 years ago
#5318 closed defect (fixed)
csv load fails with \"
Reported by: | Kurt Schwehr | Owned by: | Kurt Schwehr |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | default | Version: | unspecified |
Severity: | normal | Keywords: | csv |
Cc: |
Description
Don Hatch figured out the fix here.
Given
Description,Attr0,Lon,Lat "three things",aaa,bbb,ccc "three more things","ddd",e"e"e,""f""f""f"" "backslashes outside quotes",\ggg,hhh\,i\ii "double backslashes outside quotes",\\jjj,kkk\\,l\\ll "backslashes inside quotes","\mmm","nnn\","o\oo" "double backslashes inside quotes","\\ppp","qqq\\","r\\rr" "yet another three things",sss,ttt,uuu
opening the file fails to give the expected results.
got: ['three things', 'aaa', 'bbb', 'ccc'] want: ['three things', 'aaa', 'bbb', 'ccc'] got: ['three more things', 'ddd', 'eee', 'fff'] want: ['three more things', 'ddd', 'eee', 'fff'] got: ['backslashes outside quotes', '\\ggg', 'hhh\\', 'i\\ii'] want: ['backslashes outside quotes', '\\ggg', 'hhh\\', 'i\\ii'] got: ['double backslashes outside quotes', '\\\\jjj', 'kkk\\\\', 'l\\\\ll'] want: ['double backslashes outside quotes', '\\\\jjj', 'kkk\\\\', 'l\\\\ll'] # trouble starts here got: ['backslashes inside quotes', '\\mmm', 'nnn\\', 'o\\oo\ndouble backslashes inside quotes'] want: ['backslashes inside quotes', '\\mmm', 'nnn\\', 'o\\oo'] got: ['yet another three things', 'sss', 'ttt', 'uuu'] want: ['double backslashes inside quotes', '\\\\ppp', 'qqq\\\\', 'r\\\\rr']
Proposed fix - remove this:
&& (i == 0 || pszWorkLine[i-1] != '\\')
source:trunk/gdal/ogr/ogrsf_frmts/csv/ogrcsvlayer.cpp#L169 will become:
while( TRUE ) { for( ; pszWorkLine[i] != '\0'; i++ ) { if( pszWorkLine[i] == '\"' ) nCount++; }
Change History (8)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
Status: | new → assigned |
---|
Thanks Even! Modified trunk csv/ogrcsvlayer.cpp with r26691. I will look into port/cpl_csv.cpp next. If these hold up, I'll back port them to the 1.10 branch.
comment:3 by , 10 years ago
Does http://travis-ci.org/rouault/gdal_coverage/builds/15129492 mean that r26692 for port/cpl_csv.cpp is bad? I'm thinking this broke ogr_geom_pickle, ogr_pg_40, ogr_pg_51, ogr_sql_sqlite_16 (and others), etc?
comment:4 by , 10 years ago
TEST: ogr_sql_sqlite_start_webserver ... HTTP Server started on port 8080 success TEST: ogr_sql_sqlite_16 ... http://127.0.0.1:8080/geocoding?q=%s SELECT ogr_geocode('Paris') SELECT ogr_geocode('Paris', 'geometry') SELECT ogr_geocode('Paris', 'display_name') AS display_name SELECT ogr_geocode('Paris', 'raw') AS raw localhost - - [08/Dec/2013 10:38:09] code 404, message File Not Found: /geocoding?q=Error&addressdetails=1&limit=1&email=foo%40bar ERROR 1: HTTP error code : 404 ERROR 1: Line 8: </body> doesn't have matching <body>. SELECT ogr_geocode('Paris') SELECT ogr_geocode('Paris', 'geometry') SELECT ogr_geocode('Paris', 'display_name') AS display_name SELECT ogr_geocode('Paris', 'raw') AS raw localhost - - [08/Dec/2013 10:38:09] code 404, message File Not Found: /geocoding?q=Error&addressdetails=1&limit=1&email=foo%40bar ERROR 1: HTTP error code : 404 ERROR 1: Line 8: </body> doesn't have matching <body>. success TEST: ogr_sql_sqlite_17 ... success
comment:5 by , 10 years ago
r26692 was probably fine. I don't see it to break anything. ogr_sql_sqlite_16 and other tests trigger various (recoverable) error conditions, but if their state is 'success', nothing to worry about. Perhaps we should try to make them a bit less verbose to be less frightening.
comment:7 by , 9 years ago
I have just closed a similar issue #2406. See if is has some valuable additional information.
comment:8 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Proposed fix is in https://trac.osgeo.org/gdal/browser/trunk/gdal/ogr/ogrsf_frmts/csv/ogrcsvlayer.cpp and ogrinfo with GDAL 2.1-dev from the test csv looks good.
Actually, OGRCSVReadParseLineL() is a port of CSVFindNextLine() in port/cpl_csvp.cpp that was introduced long ago in http://trac.osgeo.org/gdal/changeset/3809, which has an interesting comment : "Take into account [...] that quotes can be escaped with a backslash". However it seems that CSVSplitLine() that was introduced in http://trac.osgeo.org/gdal/changeset/3809 never took into account the fact that quotes could be escaped with a backslash.
If we take http://en.wikipedia.org/wiki/Comma-separated_values as a reference: "Embedded double quote characters may then be represented by a pair of consecutive double quotes (Creativyst 2010), or by prefixing an escape character such as a backslash (for example in Sybase Central)."
')" test in both ogrcsvlayer.cpp and in port/cpl_csv.cpp too with fixing of the comment.