Opened 21 years ago

Closed 14 years ago

Last modified 14 years ago

#471 closed enhancement (fixed)

Update split function...

Reported by: sdlime Owned by: aboudreault
Priority: high Milestone: 6.0 release
Component: MapServer C Library Version: 4.1
Severity: minor Keywords:
Cc: aboudreault, dmorissette, warmerdam

Description (last modified by sdlime)

From steve:

- Fixing behavior of split function (soon to be msStringSplit). At present it
ignores consecutive delimeters. This presents a problem with CSV joined tables
where ,, actually means a field is empty. I'd like to change the behavior to
return an empty string.

From dan:

- While you're at it, I would like to that you also add an option to 
honour strings.  Actually, there is a function called 
CSLTokenizeString2() gdal/port/cpl_string.c that does that and more and 
that you could simply copy after a few small changes.

Change History (13)

comment:1 by sdlime, 21 years ago

Severity: normalenhancement
Status: newassigned
Version: unspecified4.1

comment:2 by sdlime, 16 years ago

Description: modified (diff)
Milestone: 5.4 release

Still a friggin' problem. Giving this one a milestone so it actually gets attention.

Steve

comment:3 by dmorissette, 14 years ago

Milestone: 5.6 release6.0 release

Steve, is the issue mostly with CSV files or are there other uses of split that fail? Alan could take care of this in 6.0 if you want. Just reassign and provide more details on the needs.

comment:4 by sdlime, 14 years ago

Cc: aboudreault added

Let's see, 6 years later... I don't remember but I bet it was CSV related only. Happy to re-assign!

Steve

comment:5 by dmorissette, 14 years ago

Cc: dmorissette added
Owner: changed from sdlime to aboudreault
Status: assignednew

comment:6 by aboudreault, 14 years ago

Cc: warmerdam added

For this bug, there are two options:

  • Keep the original msStringSplit() function and create a new msStringSplitComplex() function that will simply use CSLTokenizeString*() functions of gdal. If gdal is not available, the function will use the msStringSplit().
  • Copy the code of the CSLTokenizeString2() inside MapServer. Of course, there are a few changes to be done to get it working because it uses some other functions of CPL.

Personnally, I think using the CPL component is the best option. It would also add the honour strings option and more. Steve, what would you prefer?

comment:7 by sdlime, 14 years ago

The CPL function would be fine with me. No need to reinvent the wheel...

Steve

comment:8 by aboudreault, 14 years ago

The function msStringSplitComplex has been added:
char msStringSplitComplex(const char *string, const char *delimiters, int *num_tokens, int CSLTFlags);

Available Flags:

  • CSLT_ALLOWEMPTYTOKENS: allow the return of empty tokens when two delimiters in a row occur with no other text between them. If not set, empty tokens will be discarded
  • CSLT_STRIPLEADSPACES: strip leading space characters from the token (as reported by isspace())
  • CSLT_STRIPENDSPACES: strip ending space characters from the token (as reported by isspace())
  • CSLT_HONOURSTRINGS: double quotes can be used to hold values that should not be broken into multiple tokens
  • CSLT_PRESERVEQUOTES: string quotes are carried into the tokens when this is set, otherwise they are removed;
  • CSLT_PRESERVEESCAPES: if set backslash escapes (for backslash itself, and for literal double quotes) will be preserved in the tokens, otherwise
  • the backslashes will be removed in processing.

If mapserver is compiled without gdal support, the function will use msStringSplit rather than CSLTokenizeString2(). In those case, the CSLTFlags are of course ignored and only the first char of the delimiters will be used.

We could also remove msStringTokenize function from the source in the future.

Is the behavior ok?

Committed in r9594.

comment:9 by aboudreault, 14 years ago

emm... I think accepting a const char* as parameter for the delimiters and only use the first char could result in a misuse of the function (when gdal is not available) because the other chars will be silently ignored. Do you think it's ok to let the function as it is or we should take care of that possible problem?

comment:10 by dmorissette, 14 years ago

Good point. Since there is currently no requirement for multi-char delimiters anywhere in MapServer, the simplest solution is probably to always take a single char as argument in msStringSplitComplex(), and to copy it to a string buffer and pass it as a string to the CPL function.

comment:11 by dmorissette, 14 years ago

... and if there is a ever a need for multi-char delimiters somewhere in MapServer we can deal with the issue at that time.

comment:12 by aboudreault, 14 years ago

Resolution: fixed
Status: newclosed

Fixed and committed in r9594 and r9606.

comment:13 by aboudreault, 14 years ago

Daniel notified an important possible issue with the function. Depending if gdal support is enabled or not, the function will allocate the memory in different areas. All list allocated using CSLTokenizeString2 have to be freed by the CSLDestroy method else the application will crash on Windows. That said, since there is no way to determine if the list have been allocated by mapserver or by the gdal library (other than using ifdef USE_GDAL everytime we use the msStringSplit function, which is not a convenient solution)... we decided to simply copy the function in MapServer. This remove the dependency of gdal for that function and the complexity to determine what function to call to free the list.

The function is still named msStringSplitComplex. Here's the flags:

  • MS_HONOURSTRINGS
  • MS_ALLOWEMPTYTOKENS
  • MS_PRESERVEQUOTES
  • MS_PRESERVEESCAPES
  • MS_STRIPLEADSPACES
  • MS_STRIPENDSPACES

The call of msStringSplit has also been replaced by msStringSplitComplex for the CSV case to fix the initial problem.

Fixed in r9610.

Note: See TracTickets for help on using tickets.