Opened 8 years ago

Last modified 5 years ago

#3071 new defect

v.db.join: fails with SQLite because of lower/UPPER case issues

Reported by: sbl Owned by: grass-dev@…
Priority: normal Milestone: 7.6.2
Component: Default Version: unspecified
Keywords: vector, DB, join Cc:
CPU: Unspecified Platform: Unspecified

Description

v.db.join fails when target and other table have columns with the same names but different cases: e.g. ID and id

Here is an example to reproduce:

wget -N http://www.grassbook.org/wp-content/uploads/ncexternal/nc_geology.csv

# check original map attributes
v.db.select geology column=GEO_NAME,SHAPE_area

# import of CSV table
db.in.ogr input=nc_geology.csv output=nc_geology

# work on copy of geology map in current mapset
g.copy vector=geology,mygeology

# Add uppercase column, which are lowercase in other table
v.db.addcolumn map=mygeology column="GEOL_ID text"
v.db.addcolumn map=mygeology column="LONGNAME text"

# Fill join column
v.db.update map=mygeology column=GEOL_ID query_column=GEO_NAME

# check column names of vector map attributes
v.info -c mygeology

# check column names of legend table
db.describe -c nc_geology

# join table using key columns (map: "GEOL_ID"; table: "geol_id")
v.db.join map=mygeology column=GEOL_ID other_table=nc_geology other_column=geol_id

Please find attached a diff which should fix the issue.

However column case handling might need some more deliberation, since e.g. PostgreSQL handles uppercase column names (when quoted). So comparing column names case-insensitive (as in the diff) is probably only a hack...?

Finally, I was wondering if it is OK that v.db.join possibly overwrites data in the target if both have column names in common (e.g. "cat")... Maybe better to add a flag to allow overwriting column content?

Attachments (3)

v.db.join.diff (904 bytes ) - added by sbl 8 years ago.
v.db.join_quote.diff (1.7 KB ) - added by sbl 8 years ago.
v.db.addcolumn_quote.diff (1.5 KB ) - added by sbl 8 years ago.

Download all attachments as: .zip

Change History (12)

by sbl, 8 years ago

Attachment: v.db.join.diff added

comment:1 by sbl, 8 years ago

Some update:

If SQLite would adhere to SQL 92 standard[1], quoting of identifiers in SQL would be able to fix the issue more properly (meanig that the original case sensitive comparison was correct).

However, it seems that SQLite is not case sensitive even if identifiers are quoted:

v.db.addcolumn map=mygeology column='"c1" text' 
v.db.addcolumn map=mygeology column='"C1" text' # Fails due to duplicate columns

So, a case-insensitive workaround for SQLite seems necessary anyway...

And, having tested a simple column quoting approach (see new diffs), it seems that introducing column quoting seems to be a quite invasive change, as it allows for column names which can break several other tools (e.g. the attribute table viewer)...

It seems also relevant what the expected user-input is (esp. quoted or unquoted column names)! Cause after

db.execute sql='ALTER TABLE "mygeology" ADD COLUMN ""C1"" text' 

I get Warning messages for every SQL query on the map...

WARNING: SQLite driver: unable to parse decltype: C1"" text

[1] http://stackoverflow.com/a/19933159

by sbl, 8 years ago

Attachment: v.db.join_quote.diff added

by sbl, 8 years ago

Attachment: v.db.addcolumn_quote.diff added

comment:2 by neteler, 8 years ago

I checked the patches:

  • v.db.addcolumn_quote.diff​: contains backticks. This is unlikely to work in Python/SQL. Is this an encoding issue of the patch?
  • v.db.join.diff​: Not sure how SQLite behaves. I found an older discussion here: http://stackoverflow.com/a/19933159/452464 - is there any official reference on the SQLite site?
  • v.db.join_quote.diff​: If quotes are needed then also the other SQL queries needs to undergo an audit. Then we need to write down our coding style somewhere near https://trac.osgeo.org/grass/wiki/Grass7/VectorLib/OGRInterface or better a dedicated SQL/DBMI page

in reply to:  2 comment:3 by sbl, 8 years ago

Replying to neteler:

I checked the patches:

  • v.db.addcolumn_quote.diff​: contains backticks. This is unlikely to work in Python/SQL. Is this an encoding issue of the patch?

backticks are used in MySQL to quote identifiers: http://dev.mysql.com/doc/refman/5.7/en/identifiers.html

See: https://www.sqlite.org/lang_keywords.html

It is indeed something that probably requires a deliberate decision either to allow for table and column names that require quoted identifiers in GRASS modules or not (of course users can do anything the DB backends allow for outside GRASS or using db.execute).
A drawback of quoted identifiers is that also nonsense - including SQL keywords - is possible e.g. in column names.
However, v.in.ogr writes data source names (which can contain upper case characters) as layer names, which again are (at least in some cases) used as table names. And also column names can be laundered to lower case during import (or not, users choice), making it hard to foresee if quoting is required, esp. in multi user environments.
So, at the moment at least, users can run into UPPER/lower case issues for different reasons.

comment:4 by neteler, 8 years ago

Milestone: 7.0.57.0.6

comment:5 by sbl, 7 years ago

For me this issue pops up on a regularly basis (amongst others with EU statistical data I have to join). Would be great if someone could evaluate the proposed solution (lower case comparison in v.db.join or using quoted identifiers) or come up with a more proper solution...

comment:6 by sbl, 7 years ago

Keywords: vector DB join added
Milestone: 7.0.67.2.3

comment:7 by martinl, 6 years ago

Milestone: 7.2.3

Ticket retargeted after milestone closed

comment:8 by martinl, 6 years ago

Milestone: 7.2.4

comment:9 by sbl, 5 years ago

Milestone: 7.2.47.6.2

Still relevant. The first patch would be a simple hack, that would have to be undone, if quoted identifiers were introduced.

Note: See TracTickets for help on using tickets.