wiki:rfc28_sqlfunc

Version 6 (modified by warmerdam, 14 years ago) ( diff )

--

RFC 28: OGR SQL Generalized Expressions

Author: Frank Warmerdam
Contact: warmerdam@…
Status: Development

Summary

The OGR SQL evaluation engine currently does not allow general purpose functions to be applied to columns in SELECT statements. Some special purpose functions are supported (ie. CAST, COUNT, AVG, MAX, MIN, and SUM), but not as part of more general expressions and generally in very constrained arrangements. It is the intent of this work item to extend the OGR SQL engine to support fairly general purpose expression evaluation in the output field list of OGR SQL SELECT statements and to implement a few preliminary processing functions in a fashion compatible with standard SQL. As well, expressions used in WHERE clauses will be generalized to support evaluation of non-logical operations, such as math and functions. For example, after implementation it is intended the following could be evaluated.

SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM customers
SELECT id, "Regional Road" AS roadtypename FROM roads where roadtype=3
SELECT (subtotal+salestax) as totalcost from invoice_info where 100 <= (subtotal+salestax)

Technical Approach

Currently logical expressions take a very constrained format with the base elements having to be of the form <field> <operation> <constant_value>. As part of the generalization non-logical expressions will be supported and the left and right side of operators will be equally treated. The current OGR SQL parser is ad hoc and cannot be practically extended to this generalized form of expression. So at this point we will move to a yacc/bison based parser grammar for expressions.

Since it is not really practical to continue to use the existing ad hoc SELECT parsing when parts of the SELECT statement are expressions, the yacc/bison based parser will also be used to parse the whole SELECT statement.

The current expression node will be generalized to have 0-n children (for arguments to functions), and to treat field references and constant values as distinct leaf nodes rather than embedding this information in a node defining an operation.

It should be noted that as a side effect WHERE clauses will also support more general expressions - not just logical comparisons. For instance:

SELECT * WHERE (subtotal+salestax) > 100.0

New Functions

  • Math: +, -, *, /,
  • String: CONCAT, SUBSTR

SELECT Rules

SELECT <field-list> FROM <table_def>
     [LEFT JOIN <table_def> 
      ON [<table_ref>.]<key_field> = [<table_ref>.].<key_field>]*
     [WHERE <where-expr>] 
     [ORDER BY <sort specification list>]

<field-list> ::= <column-spec> [ { , <column-spec> }... ]

<column-spec> ::= <field-spec> [ <as clause> ]
                 | CAST ( <field-spec> AS <data type> ) [ <as clause> ]

<field-spec> ::= [DISTINCT] <field_ref>
                 | <cumm-field-func> ( [DISTINCT] <field-ref> )
                 | <field-expr>
                 | Count(*)

<field-expr> ::= <field_ref>
                 | <constant-value>
                 | <field-expr> <field-operator> <field-expr>
                 | <field-func> ( <field-expr-list> )
                 | ( <field-expr> )

<field-expr-list> ::= field-expr
                 |  field-expr , field-expr-list
                 |  <empty>

<as clause> ::= [ AS ] <column_name>

<data type> ::= character [ ( field_length ) ]
                | float [ ( field_length ) ]
                | numeric [ ( field_length [, field_precision ] ) ]
                | integer [ ( field_length ) ]
                | date [ ( field_length ) ]
                | time [ ( field_length ) ]
                | timestamp [ ( field_length ) ]

<cumm-field-func> ::= AVG | MAX | MIN | SUM | COUNT

<field-operator> ::= '+' | '-' | '/' | '*' | '||'

<field-func> ::= CONCAT | SUBSTR

<field_ref>  ::= [<table_ref>.]field_name

Special Notes

The existing CAST, and column summary functions COUNT, AVG, MIN, MAX and SUM will be treated more-or-less as functions but constrained to be root operations on column definitions and treated as a special case (still).

Compatability Implications

There are no anticipated backward compatability problems. For the most part the change results in some OGR SQL statements to work that would previously have generated an error.

Performance Implications

I am hopeful that the evaluation speed will not be significantly different for simple selections, but each output field will need to be evaluated as an expression (with presumably one value-from-field node).

Implementation Plan

Frank Warmerdam will implement, test and document for the GDAL/OGR 1.8 release.

Testing

All existing OGR SQL test suite tests should pass, and autotest/ogr/ogr_sql_test.py and perhaps other scripts will be extended for the new capabilities.

Documentation

The OGR SQL document will be extended to describe the new capabilities.

Note: See TracWiki for help on using the wiki.