Opened 8 years ago
Last modified 5 years ago
#3071 new defect
v.db.join: fails with SQLite because of lower/UPPER case issues
Reported by: | sbl | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.6.2 |
Component: | Default | Version: | unspecified |
Keywords: | vector, DB, join | Cc: | |
CPU: | Unspecified | Platform: | Unspecified |
Description
v.db.join fails when target and other table have columns with the same names but different cases: e.g. ID and id
Here is an example to reproduce:
wget -N http://www.grassbook.org/wp-content/uploads/ncexternal/nc_geology.csv # check original map attributes v.db.select geology column=GEO_NAME,SHAPE_area # import of CSV table db.in.ogr input=nc_geology.csv output=nc_geology # work on copy of geology map in current mapset g.copy vector=geology,mygeology # Add uppercase column, which are lowercase in other table v.db.addcolumn map=mygeology column="GEOL_ID text" v.db.addcolumn map=mygeology column="LONGNAME text" # Fill join column v.db.update map=mygeology column=GEOL_ID query_column=GEO_NAME # check column names of vector map attributes v.info -c mygeology # check column names of legend table db.describe -c nc_geology # join table using key columns (map: "GEOL_ID"; table: "geol_id") v.db.join map=mygeology column=GEOL_ID other_table=nc_geology other_column=geol_id
Please find attached a diff which should fix the issue.
However column case handling might need some more deliberation, since e.g. PostgreSQL handles uppercase column names (when quoted). So comparing column names case-insensitive (as in the diff) is probably only a hack...?
Finally, I was wondering if it is OK that v.db.join possibly overwrites data in the target if both have column names in common (e.g. "cat")... Maybe better to add a flag to allow overwriting column content?
Attachments (3)
Change History (12)
by , 8 years ago
Attachment: | v.db.join.diff added |
---|
comment:1 by , 8 years ago
by , 8 years ago
Attachment: | v.db.join_quote.diff added |
---|
by , 8 years ago
Attachment: | v.db.addcolumn_quote.diff added |
---|
follow-up: 3 comment:2 by , 8 years ago
I checked the patches:
- v.db.addcolumn_quote.diff: contains backticks. This is unlikely to work in Python/SQL. Is this an encoding issue of the patch?
- v.db.join.diff: Not sure how SQLite behaves. I found an older discussion here: http://stackoverflow.com/a/19933159/452464 - is there any official reference on the SQLite site?
- v.db.join_quote.diff: If quotes are needed then also the other SQL queries needs to undergo an audit. Then we need to write down our coding style somewhere near https://trac.osgeo.org/grass/wiki/Grass7/VectorLib/OGRInterface or better a dedicated SQL/DBMI page
comment:3 by , 8 years ago
Replying to neteler:
I checked the patches:
- v.db.addcolumn_quote.diff: contains backticks. This is unlikely to work in Python/SQL. Is this an encoding issue of the patch?
backticks are used in MySQL to quote identifiers: http://dev.mysql.com/doc/refman/5.7/en/identifiers.html
- v.db.join.diff: Not sure how SQLite behaves. I found an older discussion here: http://stackoverflow.com/a/19933159/452464 - is there any official reference on the SQLite site?
See: https://www.sqlite.org/lang_keywords.html
- v.db.join_quote.diff: If quotes are needed then also the other SQL queries needs to undergo an audit. Then we need to write down our coding style somewhere near https://trac.osgeo.org/grass/wiki/Grass7/VectorLib/OGRInterface or better a dedicated SQL/DBMI page
It is indeed something that probably requires a deliberate decision either to allow for table and column names that require quoted identifiers in GRASS modules or not (of course users can do anything the DB backends allow for outside GRASS or using db.execute).
A drawback of quoted identifiers is that also nonsense - including SQL keywords - is possible e.g. in column names.
However, v.in.ogr writes data source names (which can contain upper case characters) as layer names, which again are (at least in some cases) used as table names. And also column names can be laundered to lower case during import (or not, users choice), making it hard to foresee if quoting is required, esp. in multi user environments.
So, at the moment at least, users can run into UPPER/lower case issues for different reasons.
comment:4 by , 8 years ago
Milestone: | 7.0.5 → 7.0.6 |
---|
comment:5 by , 7 years ago
For me this issue pops up on a regularly basis (amongst others with EU statistical data I have to join). Would be great if someone could evaluate the proposed solution (lower case comparison in v.db.join or using quoted identifiers) or come up with a more proper solution...
comment:6 by , 7 years ago
Keywords: | vector DB join added |
---|---|
Milestone: | 7.0.6 → 7.2.3 |
comment:8 by , 7 years ago
Milestone: | → 7.2.4 |
---|
comment:9 by , 5 years ago
Milestone: | 7.2.4 → 7.6.2 |
---|
Still relevant. The first patch would be a simple hack, that would have to be undone, if quoted identifiers were introduced.
Some update:
If SQLite would adhere to SQL 92 standard[1], quoting of identifiers in SQL would be able to fix the issue more properly (meanig that the original case sensitive comparison was correct).
However, it seems that SQLite is not case sensitive even if identifiers are quoted:
So, a case-insensitive workaround for SQLite seems necessary anyway...
And, having tested a simple column quoting approach (see new diffs), it seems that introducing column quoting seems to be a quite invasive change, as it allows for column names which can break several other tools (e.g. the attribute table viewer)...
It seems also relevant what the expected user-input is (esp. quoted or unquoted column names)! Cause after
I get Warning messages for every SQL query on the map...
[1] http://stackoverflow.com/a/19933159