Opened 12 years ago

Closed 8 years ago

#2221 closed defect (fixed)

OGR SQL limits table name and column name character set

Reported by: trastourf Owned by: chaitanya
Priority: normal Milestone: 1.9.0
Component: OGR_SF Version: unspecified
Severity: normal Keywords: SQL
Cc: tamas, warmerdam

Description

The OGR SQL implementation limits table names and column names to letters, digits and [.+-_*].

For example, with mysql with the table "Départs" wich as a column 'NomDépart?' the command :

ogrinfo MySQL:test,host=localhost -sql "select NomDépart? from départs"

works fine. With an equivalent shape (départs.shp), i can do :

ogrinfo départs départs

this works for the filename, but i can't do :

ogrinfo départs.shp -sql "select NomDépart? from Départs" INFO: Open of `départs.shp'

using driver `ESRI Shapefile' successful.

ERROR 1: SQL: Missing comma after column NomD in SELECT statement.

Attachments (1)

départs.zip (546 bytes) - added by trastourf 12 years ago.

Download all attachments as: .zip

Change History (16)

Changed 12 years ago by trastourf

Attachment: départs.zip added

comment:1 Changed 12 years ago by warmerdam

Cc: tamas warmerdam added
Component: defaultOGR_SF
Keywords: SQL added
Milestone: 1.5.2
Owner: changed from warmerdam to Mateusz Łoskot

I believe the key is to improve this function in gdal/ogr/swq.c:

static int swq_isalphanum( char c )

{

    if( (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '.' || c == '+' || c == '-'
        || c == '_' || c == '*' )
        return TRUE;
    else
        return FALSE;
}

It should be sufficient to extent it to treat all characters >= 128 as alphanumeric though some case may be required for systems with signed characters. Referring to Mateusz for when he is back on contract.

comment:2 Changed 12 years ago by warmerdam

Priority: normalhigh

Elevating priority. I'd like to see this fixed for 1.5.1, with a test case added in the trunk autotest.

comment:3 Changed 12 years ago by warmerdam

Milestone: 1.5.21.5.1

comment:4 Changed 12 years ago by tamas

I would also add support for the quoted strings to be 'blindly' tokenized.

Like for "select 'NomDépart?' from 'Départs'"

The tokenizer would automatically return the string between the quotes without calling swq_isalphanum at all.

comment:5 Changed 12 years ago by mloskot

Priority: highhighest
Status: newassigned

comment:6 Changed 12 years ago by Mateusz Łoskot

Frank,

I'm not sure if extending the character set if a good idea. The character mentioned in the report is non-ASCII character and it's value overflows char type, means

char c = 'é';

stores negative value in c. Actually, >=128 may mean relatively high unsigned value or negative value. Honestly, I'm not sure what solution is portable.

comment:7 Changed 12 years ago by Mateusz Łoskot

I've forgot to add that if we assume signed char, following solution should be sufficient:

if (c < 128 && isalnum(c))
{
   return true;
}

comment:8 Changed 12 years ago by warmerdam

Priority: highesthigh

Mateusz,

Note the current test allows a number of characters that are not isalnum() such as '*'. I'm not sure I know why this is included, but we should not remove it without some clear thought.

I do not see how "c < 128 && isalnum(c)" comes even close to accomplishing the goal of this ticket. Are you suggesting that isalnum() knows about accented characters? We absolutely do not want this test to be based on the current locale settings since we know that is a recipe for disaster.

Can't we just use the current test with one additional test? "
((unsigned char) c) > 127"

Please make sure a test of this makes it into the ogr_sql autotest script.

I also like Tamas' suggestion with regard to quoting, but it might be best to leave that till after 1.5.1 and perhaps even open a distinct ticket for the issue since it is likely to be more involved.

PS. I prefer to reserve "highest" priority for tickets I consider release blockers.

comment:9 Changed 12 years ago by Mateusz Łoskot

Frank,

I used isalnum function above because initially I misunderstood the issue. I thought it needs to be solved using current locale.

I've applied patch to trunk (r13937) along with test case (r13938) and the problem is (almost) fixed.

The table is correctly recognized and following query does not complain about unrecognized table but unrecognized field:

$ ogrinfo départs.shp -sql "select NomDépart from Départs"
INFO: Open of `départs.shp'
      using driver `ESRI Shapefile' successful.
ERROR 1: SQL: Unrecognised field name NomDépart.

Using asterix reports all fields but OGR does not seem to preserve encoding of field names:

$ ogrinfo départs.shp -sql "select * from Départs"
INFO: Open of `départs.shp'
      using driver `ESRI Shapefile' successful.

Layer name: Départs
Geometry: Point
Feature Count: 3
Extent: (0.158000, 0.228000) - (0.690000, 0.640000)
Layer SRS WKT:
(unknown)
ID: Integer (8.0)
NOMD?PART: String (45.0)
OGRFeature(Départs):0
  ID (Integer) = 0
  NOMD?PART (String) = Fr?d?ric
  POINT (0.158 0.64)
...

This issue is also commented in the ogr_sql_test.py:427

So, the problem is semi-fixed only. Any thoughts how to handle the encoding issue?

comment:10 Changed 12 years ago by tamas

Mateusz,

It seems these changes have broken most of the builders because of the UTF-8 charset in the filename cannot be handled by buildbot properly. Do you have a solution for this issue? I guess it might be a bug in the buildbot implementation or so.

Tamas

comment:11 Changed 12 years ago by Mateusz Łoskot

Thanks Tamas for reporting this problem. As we've discussed on IRC, I replaced these files with files having non-UTF filenames and added VRT proxy (r13949)

comment:12 Changed 12 years ago by warmerdam

Milestone: 1.5.11.5.2
Priority: highnormal

I think the remaining problems can be deferred to 1.5.2. I don't know the solution because I don't know where the problem is happening (are the accented characters actually in the .dbf file?)

comment:13 Changed 12 years ago by mloskot

Frank, In the SQL query

"select NomDépart from Départs"

token NomDépart? is a name of attribute from dbf file. It seems that accents in the name of the attribute are not preserved correctly, so attribute names comparison fails.

comment:14 Changed 10 years ago by warmerdam

Milestone: 1.5.4
Owner: changed from Mateusz Łoskot to chaitanya
Status: assignednew

Chaitanya,

Please pursue as time permits, not considered release critical.

comment:15 Changed 8 years ago by Even Rouault

Milestone: 1.9.0
Resolution: fixed
Status: newclosed

Works now in trunk. Probably a combination of OGR 1.8.0 rewritten OGR engine, and perhaps also encoding support in shapefile driver of 1.9.0dev

Test reenabled in r22770

Note: See TracTickets for help on using tickets.