RFC 52: Strict OGR SQL quoting
Author: Even Rouault
Contact: even dot rouault at spatialys dot com
Status: adopted, implemented in GDAL 2.0
This RFC proposes that OGR SQL enforces strict quoting rules for SQL literals and identifiers
Currently the OGR SQL engine deals indifferently with single quote characters (') and double quote characters ("), although they can be used in 2 different contexts :
- to specify string literals
- to specify column or table names (when they need quoting)
SQL 92 mandates that string literals are surrounded with single quote characters whereas quoted identifiers (column, table names) are surrounded with double quote charactes.
<delimited identifier> ::= <double quote> <delimited identifier body> <double quote> <character string literal> ::= <quote> [ <character representation> ... ] <quote>
Current OGR behaviour is generally fine, except in some situations when specifying columns in a SELECT statement or in a WHERE expression. The OGR SQL engine tries to determine the intent with the following rule : if the quoted string matches a column name, then it is assumed to be a column identifier, otherwise a string litteral. But sometimes (see http://trac.osgeo.org/gdal/ticket/5428), there are situations where filters such as "MyField?" = 'MYFIELD' are needed, and currently they always evaluate to TRUE.
To avoid any ambiguities and more conformant to SQL, string literals should be single-quoted, and identifiers (column/table names) unquoted or surrounded by double-quote charachter if conflicting with reserved keywords.
Implementation will be done by Even Rouault (Spatialys).
The proposed implementation lies in the "rfc52_stricter_sql" branch of the https://github.com/rouault/gdal2/tree/rfc52_stricter_sql repository.
The list of changes : https://github.com/rouault/gdal2/compare/rfc52_stricter_sql
This change affects compatibility in the sense that users that relied on the current permissive behaviour will have to adapt their SQL expressions. This might be quite tricky to detect as it will not always cause obvious runtime failures. For example " SELECT 'i_thought_this_would_be_interpreted_as_a_column_name_but_now_it_is_a_string_litteral' FROM atable " will run without error but not produce the previously expected result. So applications will have to review how they build SQL filters ( SetAttributeFilter?() ) and expressions ( ExecuteSQL() )
Expressions fixed to be compliant with stricter rules will also work with older GDAL versions.
A prominent warning should also be put on the OGR SQL documentation page to advertize the change of behaviour
Tests done with SQLite show that it is sometimes tolerant to misuse of quote characters, but in the reverse way as current OGR. So it will accept SELECT "non_existing_column" FROM 'atable' and interpret it as SELECT 'non_existing_column' FROM "atable". On the contrary, PostgreSQL will not accept it. I'd be rather inclined to follow PostgreSQL stricter approach to keep things simple and understandable: "string literals should be single-quoted, and identifiers (column/table names) unquoted or surrounded by double-quote charachter"
The Python autotest suite will be fixed to still pass.
+1 from JukkaR, TamasS, DanielM and EvenR