Ticket #471 (closed enhancement: fixed)

Opened 10 years ago

Last modified 4 years ago

Update split function...

Reported by: sdlime Owned by: aboudreault
Priority: high Milestone: 6.0 release
Component: MapServer C Library Version: 4.1
Severity: minor Keywords:
Cc: aboudreault, dmorissette, warmerdam

Description (last modified by sdlime) (diff)

From steve:

- Fixing behavior of split function (soon to be msStringSplit). At present it
ignores consecutive delimeters. This presents a problem with CSV joined tables
where ,, actually means a field is empty. I'd like to change the behavior to
return an empty string.

From dan:

- While you're at it, I would like to that you also add an option to 
honour strings.  Actually, there is a function called 
CSLTokenizeString2() gdal/port/cpl_string.c that does that and more and 
that you could simply copy after a few small changes.

Change History

Changed 10 years ago by sdlime

  • status changed from new to assigned
  • version changed from unspecified to 4.1
  • severity changed from normal to enhancement

Changed 5 years ago by sdlime

  • description modified (diff)
  • milestone set to 5.4 release

Still a friggin' problem. Giving this one a milestone so it actually gets attention.

Steve

Changed 4 years ago by dmorissette

  • milestone changed from 5.6 release to 6.0 release

Steve, is the issue mostly with CSV files or are there other uses of split that fail? Alan could take care of this in 6.0 if you want. Just reassign and provide more details on the needs.

Changed 4 years ago by sdlime

  • cc aboudreault added

Let's see, 6 years later... I don't remember but I bet it was CSV related only. Happy to re-assign!

Steve

Changed 4 years ago by dmorissette

  • cc dmorissette added
  • owner changed from sdlime to aboudreault
  • status changed from assigned to new

Changed 4 years ago by aboudreault

  • cc warmerdam added

For this bug, there are two options:

  • Keep the original msStringSplit() function and create a new msStringSplitComplex() function that will simply use CSLTokenizeString*() functions of gdal. If gdal is not available, the function will use the msStringSplit().
  • Copy the code of the CSLTokenizeString2() inside MapServer. Of course, there are a few changes to be done to get it working because it uses some other functions of CPL.

Personnally, I think using the CPL component is the best option. It would also add the honour strings option and more. Steve, what would you prefer?

Changed 4 years ago by sdlime

The CPL function would be fine with me. No need to reinvent the wheel...

Steve

Changed 4 years ago by aboudreault

The function msStringSplitComplex has been added:
char **msStringSplitComplex(const char *string, const char *delimiters, int *num_tokens, int CSLTFlags);

Available Flags:

  • CSLT_ALLOWEMPTYTOKENS: allow the return of empty tokens when two delimiters in a row occur with no other text between them. If not set, empty tokens will be discarded
  • CSLT_STRIPLEADSPACES: strip leading space characters from the token (as reported by isspace())
  • CSLT_STRIPENDSPACES: strip ending space characters from the token (as reported by isspace())
  • CSLT_HONOURSTRINGS: double quotes can be used to hold values that should not be broken into multiple tokens
  • CSLT_PRESERVEQUOTES: string quotes are carried into the tokens when this is set, otherwise they are removed;
  • CSLT_PRESERVEESCAPES: if set backslash escapes (for backslash itself, and for literal double quotes) will be preserved in the tokens, otherwise
  • the backslashes will be removed in processing.

If mapserver is compiled without gdal support, the function will use msStringSplit rather than CSLTokenizeString2(). In those case, the CSLTFlags are of course ignored and only the first char of the delimiters will be used.

We could also remove msStringTokenize function from the source in the future.

Is the behavior ok?

Committed in r9594.

Changed 4 years ago by aboudreault

emm... I think accepting a const char* as parameter for the delimiters and only use the first char could result in a misuse of the function (when gdal is not available) because the other chars will be silently ignored. Do you think it's ok to let the function as it is or we should take care of that possible problem?

Changed 4 years ago by dmorissette

Good point. Since there is currently no requirement for multi-char delimiters anywhere in MapServer, the simplest solution is probably to always take a single char as argument in msStringSplitComplex(), and to copy it to a string buffer and pass it as a string to the CPL function.

Changed 4 years ago by dmorissette

... and if there is a ever a need for multi-char delimiters somewhere in MapServer we can deal with the issue at that time.

Changed 4 years ago by aboudreault

  • status changed from new to closed
  • resolution set to fixed

Fixed and committed in r9594 and r9606.

Changed 4 years ago by aboudreault

Daniel notified an important possible issue with the function. Depending if gdal support is enabled or not, the function will allocate the memory in different areas. All list allocated using CSLTokenizeString2 have to be freed by the CSLDestroy method else the application will crash on Windows. That said, since there is no way to determine if the list have been allocated by mapserver or by the gdal library (other than using ifdef USE_GDAL everytime we use the msStringSplit function, which is not a convenient solution)... we decided to simply copy the function in MapServer. This remove the dependency of gdal for that function and the complexity to determine what function to call to free the list.

The function is still named msStringSplitComplex. Here's the flags:

  • MS_HONOURSTRINGS
  • MS_ALLOWEMPTYTOKENS
  • MS_PRESERVEQUOTES
  • MS_PRESERVEESCAPES
  • MS_STRIPLEADSPACES
  • MS_STRIPENDSPACES

The call of msStringSplit has also been replaced by msStringSplitComplex for the CSV case to fix the initial problem.

Fixed in r9610.

Note: See TracTickets for help on using tickets.