
RFC 59 - MapServer Expression Parser Overhaul

Overview

This is a draft RFC addressing 1) how the Bison/Yacc parser for logical expressions is implemented and 2) where in the MapServer code the parser can be used. This RFC could have broader impacts on query processing depending on additional changes at the driver level, specifically the RDBMS ones. Those changes don't have to occur for this to be a useful addition.

A principal motivation for this work is to support OGC filter expressions in a single pass, in a driver-independent manner.

All of the work detailed here is being prototyped in a sandbox; see:

http://svn.osgeo.org/mapserver/sandbox/sdlime/common-expressions/mapserver

Existing Expression Parsing

The existing logical expression handling in MapServer works as follows:

1) duplicate the expression string
2) substitute the feature's attribute values into the string (e.g. '[name]' => 'Anoka')
3) parse the resulting string with yyparse()

The parser internally calls yylex() for its tokens. Tokens are the smallest pieces of an expression.
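
For example, for a feature whose name attribute is 'Anoka', the expression

  EXPRESSION ('[name]' = 'Anoka')

is rewritten by the substitution step to

  EXPRESSION ('Anoka' = 'Anoka')

and it is this resulting string that gets tokenized and parsed, once per feature.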

Advantages

  • it's simple and it works

Disadvantages

  • substitution limits expressions to string values, so complex types (e.g. geometries) cannot be handled
  • the substitution and tokenization of the resulting string must be repeated for every feature

Proposed Technical Changes

This RFC proposes a number of technical changes. The core change, however, involves updating the way logical expressions are evaluated. Additional changes capitalize on this to bring new capabilities to MapServer.

Core Parser Update

I propose moving to a setup where a logical expression is tokenized once (via our Flex-generated lexer) and then the Bison/Yacc parser works through the tokens (via a custom version of yylex() defined in mapparser.y) as necessary for each feature. This eliminates the substitution and tokenization steps currently necessary and opens up possibilities for supporting more complex objects in expressions. Basically, we'd hang a list/array of tokens off an expressionObj, populate it in msLayerWhichItems() and leverage the tokens as needed in the parser. The following new structs and enums are added to mapserver.h:

enum MS_TOKEN_LOGICAL_ENUM { MS_TOKEN_LOGICAL_AND=100, MS_TOKEN_LOGICAL_OR, MS_TOKEN_LOGICAL_NOT };
enum MS_TOKEN_LITERAL_ENUM { MS_TOKEN_LITERAL_NUMBER=110, MS_TOKEN_LITERAL_STRING, MS_TOKEN_LITERAL_TIME, MS_TOKEN_LITERAL_SHAPE };
enum MS_TOKEN_COMPARISON_ENUM {
  MS_TOKEN_COMPARISON_EQ=120, MS_TOKEN_COMPARISON_NE, MS_TOKEN_COMPARISON_GT, MS_TOKEN_COMPARISON_LT, MS_TOKEN_COMPARISON_LE, MS_TOKEN_COMPARISON_GE, MS_TOKEN_COMPARISON_IEQ,
  MS_TOKEN_COMPARISON_RE, MS_TOKEN_COMPARISON_IRE,
  MS_TOKEN_COMPARISON_IN, MS_TOKEN_COMPARISON_LIKE,
  MS_TOKEN_COMPARISON_INTERSECTS, MS_TOKEN_COMPARISON_DISJOINT, MS_TOKEN_COMPARISON_TOUCHES, MS_TOKEN_COMPARISON_OVERLAPS, MS_TOKEN_COMPARISON_CROSSES, MS_TOKEN_COMPARISON_WITHIN, MS_TOKEN_COMPARISON_CONTAINS,
  MS_TOKEN_COMPARISON_BEYOND, MS_TOKEN_COMPARISON_DWITHIN
};
enum MS_TOKEN_FUNCTION_ENUM { MS_TOKEN_FUNCTION_LENGTH=140, MS_TOKEN_FUNCTION_TOSTRING, MS_TOKEN_FUNCTION_COMMIFY, MS_TOKEN_FUNCTION_AREA, MS_TOKEN_FUNCTION_ROUND, MS_TOKEN_FUNCTION_FROMTEXT };
enum MS_TOKEN_BINDING_ENUM { MS_TOKEN_BINDING_DOUBLE=150, MS_TOKEN_BINDING_INTEGER, MS_TOKEN_BINDING_STRING, MS_TOKEN_BINDING_TIME, MS_TOKEN_BINDING_SHAPE };

typedef union {
  double dblval;
  int intval;
  char *strval;
  struct tm tmval;
  shapeObj *shpval;
  attributeBindingObj bindval;
} tokenValueObj;

typedef struct {
  int token;
  tokenValueObj tokenval;
} tokenObj;

Some of these definitions hint at other features that will be detailed later. When we convert an expression string into a series of tokens we also store away the value associated with each token (if necessary). In many cases the token value is a literal (string or number); in other cases it's a reference to a feature attribute. In the latter case we use the attributeBindingObj already in use by MapServer to encapsulate the information necessary to quickly access the correct data (typically an item index value).
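
As a rough illustration of what gets stored (the member names used below, e.g. bindval.index, are assumptions on my part and may not match the sandbox code exactly), the expression ('[name]' = 'Anoka') might tokenize into three entries:

#include <string.h>

/* Illustration only: how ('[name]' = 'Anoka') might look once tokenized.
   Member names such as bindval.index are assumed, not taken from the sandbox. */
static void exampleTokens(tokenObj *tokens)
{
  tokens[0].token = MS_TOKEN_BINDING_STRING;  /* [name], an attribute binding */
  tokens[0].tokenval.bindval.index = 2;       /* hypothetical item index resolved
                                                 in msLayerWhichItems() */

  tokens[1].token = MS_TOKEN_COMPARISON_EQ;   /* '=' carries no value */

  tokens[2].token = MS_TOKEN_LITERAL_STRING;  /* the string literal */
  tokens[2].tokenval.strval = strdup("Anoka");
}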

We have always had to make expression data available to the parser (and lexer) via temporary global variables. That would continue to be the case here, although different data are shared in this context. One thing to note is that once an expression is tokenized we no longer have to rely on the Flex-generated lexer, so, in theory, it should be easier to implement a thread-safe parser should we choose to do so.
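
To make the mechanics concrete, here is a minimal sketch of how such a custom yylex() could walk the pre-built token list. The struct and function names (tokenListObj, msyylexTokenized) are illustrative only and do not necessarily match the sandbox implementation:

/* Sketch only: the names below are illustrative, not the sandbox code.
   The idea is that expressionObj grows a token list that is populated once. */
typedef struct {
  tokenObj *tokens;   /* built once by the Flex-generated lexer */
  int numtokens;
  int curtoken;       /* parser position while evaluating a feature */
} tokenListObj;        /* conceptually hung off expressionObj */

/* A custom yylex() in mapparser.y then just walks the list, handing each
   pre-computed token (and its literal value or attribute binding) to the
   Bison-generated parser. */
static int msyylexTokenized(tokenListObj *list, tokenValueObj *value)
{
  tokenObj *t;

  if(list->curtoken >= list->numtokens) return 0; /* end of expression */
  t = &(list->tokens[list->curtoken++]);

  switch(t->token) {
    case MS_TOKEN_LITERAL_NUMBER:
    case MS_TOKEN_LITERAL_STRING:
    case MS_TOKEN_LITERAL_TIME:
    case MS_TOKEN_LITERAL_SHAPE:
      *value = t->tokenval;  /* literals were stored at tokenize time */
      break;
    case MS_TOKEN_BINDING_DOUBLE:
    case MS_TOKEN_BINDING_INTEGER:
    case MS_TOKEN_BINDING_STRING:
    case MS_TOKEN_BINDING_TIME:
    case MS_TOKEN_BINDING_SHAPE:
      *value = t->tokenval;  /* the binding is resolved against the current
                                shape's attribute values at this point */
      break;
    default:
      break;                 /* operators and functions carry no value */
  }

  return t->token;
}

In the real code the list would be shared with the parser via the temporary globals described above rather than passed as an argument.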

Extending the Yacc grammar to support spatial operators

The mapserver.h definitions above allow for using shapeObjs within the Yacc grammar (we also define a shapeObj as a new base token type within mapparser.y). There are two types of shape-related tokens: 1) shape bindings, that is, references to the geometry of the feature being evaluated, and 2) shape literals, shapes described as WKT within the expression string. For example, in the expression:

  EXPRESSION (fromText('POINT(500000 5000000)') Intersects [shape])

  1) fromText('POINT(500000 5000000)') defines a shape literal
  2) [shape] is a shape binding

We can use these tokens in the grammar to implement all of the spatial operators MapServer supports via GEOS. Note that in the above example fromText() appears as a function operating on a string. This is handled as a special case when tokenizing the string, since we only want to do it once: the shape literal is created from the enclosed WKT string at tokenization time.
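
A minimal sketch of that special case might look like the following. The helper itself is an assumption (it is not the sandbox code), although msShapeFromWKT() is MapServer's existing WKT reader:

/* Sketch only: turn a WKT string captured by the lexer into a shape-literal
   token so the WKT is parsed exactly once, at tokenize time. */
static int msTokenizeShapeLiteral(const char *wkt, tokenObj *token)
{
  shapeObj *shape;

  shape = msShapeFromWKT(wkt);  /* existing MapServer WKT reader */
  if(!shape) return MS_FAILURE;

  token->token = MS_TOKEN_LITERAL_SHAPE;
  token->tokenval.shpval = shape;
  return MS_SUCCESS;
}

The spatial comparison rules in the grammar could then simply hand their two shape operands to the existing GEOS wrappers (e.g. msGEOSIntersects()).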

Extending the grammar to support spatial functions

By supporting more complex objects we can also support functions that operate on them. For example, we could write:

  EXPRESSION (area([shape]) < 100000)

or rely on even more of the GEOS operators. (Note: only the area function is present in the sandbox.) To do this we need to somehow store a shapeObj's scope so that working copies can be freed as appropriate. I would propose adding an int scope member to the shapeObj structure definition. Shapes created in the course of expression evaluation would be tagged as having limited scope while literals or bindings would be left alone and presumably destroyed later. This saves having to make copies of shapes, which can be expensive.
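
A hedged sketch of that scope handling (the constant and helper names are illustrative, not the sandbox code) might look like:

#include <stdlib.h>

/* Sketch only: shapes created inside expression evaluation get limited scope
   and are freed by the parser; literals and bindings are left for their owners. */
#define MS_SHAPE_SCOPE_EXPRESSION 1  /* temporary, free once the rule has fired */
#define MS_SHAPE_SCOPE_EXTERNAL   0  /* literal or binding, leave alone */

static void msFreeExpressionShape(shapeObj *shape)
{
  if(shape && shape->scope == MS_SHAPE_SCOPE_EXPRESSION) {
    msFreeShape(shape);  /* existing destructor for a shape's contents */
    free(shape);
  }
}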

Context Sensitive Parsing

We could have done this all along, but this would be an opportune time to implement context-sensitive parser use. Presently we expect the parser to produce a true or false result, but we certainly aren't limited to that. For example, at the moment you can write:

  CLASS
    ...
    TEXT ([area] acres)
  END

It looks as if the TEXT value is an expression (and it is stored as such) but it's not evaluated as one. It would be very useful to treat this as a true expression. This would open up a world of formatting options for drivers that don't otherwise support them (e.g. shapefiles). Ticket 2950 is an example where this would come in handy. Within the sandbox I've added toString, round and commify functions so that you can write:

  TEXT (commify(toString([area]*0.000247105381,"%.2f")) + ' ac')

This converts the area from square meters to acres, rounds the result to two decimal places, and adds commas (213234.123455 => 213,234.12) for a crowd-pleasing display.
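
One hedged way to express that context (all of the names below are purely illustrative, nothing here exists in the sandbox) is to have the caller tell the parser what kind of result it expects:

/* Illustration only: the caller sets the expected result type before calling
   yyparse(), and the grammar's top-level rule fills in the matching member. */
enum MS_PARSE_TYPE_ENUM { MS_PARSE_TYPE_BOOLEAN, MS_PARSE_TYPE_STRING, MS_PARSE_TYPE_SHAPE };

typedef struct {
  enum MS_PARSE_TYPE_ENUM type;  /* requested by the caller */
  union {
    int intval;       /* logical EXPRESSION context: true/false */
    char *strval;     /* TEXT context: a formatted string */
    shapeObj *shpval; /* a geometry-producing expression */
  } value;
} parseResultObj;

A CLASS EXPRESSION would then be evaluated asking for a boolean result, while a TEXT expression would ask for a string.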

Expression Use Elsewhere

Backwards Compatibility Issues

Security Issues
