Opened 18 years ago

Closed 18 years ago

#1753 closed defect (fixed)

JavaMapscript handles umlauts wrong.

Reported by: umn-ms@…
Owned by: unicoletti
Priority: high
Milestone:
Component: MapScript-Java
Version: unspecified
Severity: normal
Keywords:
Cc: mapserver@…

Description

Java MapScript doesn't handle German umlauts correctly.
The discussion can be found at
http://search.gmane.org/?query=%22%5BUMN_MAPSERVER-USERS%5D+Java+Mapscript+-+querybyattribut%22&email=&group=&sort=relevance&DEFAULTOP=and&query=

mapscript_wrap.c converts String parameters to UTF-8.
From there they are passed directly to the MapServer kernel.
The MapServer kernel expects a single-byte encoding (usually ISO-8859-1?).
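
The mismatch described above can be reproduced in plain Java, without MapScript. This is only a minimal sketch: the class name `UmlautMismatch`, its helper methods, and the sample value "München" are illustrative and do not come from the ticket. It assumes the native side reads the bytes as ISO-8859-1, as the description suggests.

```java
import java.nio.charset.StandardCharsets;

class UmlautMismatch {
    // Bytes the native side would receive if the Java string is handed over as UTF-8.
    static byte[] asUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // What a single-byte (ISO-8859-1) consumer makes of those bytes.
    static String readAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String value = "M\u00fcnchen"; // "München"
        byte[] utf8 = asUtf8(value);
        String seenByKernel = readAsLatin1(utf8);
        System.out.println(value.length() + " chars, " + utf8.length + " UTF-8 bytes");
        System.out.println("matches original: " + seenByKernel.equals(value));
    }
}
```

The umlaut becomes two bytes in UTF-8, so a consumer that expects one byte per character sees two unrelated characters and an attribute comparison against ISO-8859-1 data fails.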

Attachments (5)

1753-umlauts.patch (1.8 KB ) - added by unicoletti 18 years ago.
typemap for char* to correctly handle conversion from Java to system dependent
subset.zip (70.4 KB ) - added by mapserver@… 18 years ago.
Sample Data
qba.map (421 bytes ) - added by mapserver@… 18 years ago.
Mapfile which includes the sample data
1753-umlauts-ver2.patch (11.0 KB ) - added by unicoletti 18 years ago.
Improved version of previous patch
1753-umlauts-ver3.patch (11.1 KB ) - added by unicoletti 18 years ago.
Modified to work with jdk 1.4


Change History (16)

comment:1 by unicoletti, 18 years ago

Status: new → assigned
What MapServer expects depends on the setting of the LANG and LC_* variables.
Anyway, I was able to work around this issue by converting the sample dbf to
Unicode. After the conversion, queries worked as expected.

While this is probably not the way things should work, it is indeed a solution
for this show-stopper issue.

I did the conversion by hand: I dumped the file with dbfdump and then re-created
it with dbfcreate/dbfadd under a modified LANG=de_DE.UTF-8.
If you know Perl, the whole conversion can be fully automated.
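
The core of such a conversion is re-encoding each attribute value. A minimal Java sketch of that single step follows, assuming the original dbf stores its text as ISO-8859-1; the class `FieldRecode` and the method `latin1ToUtf8` are hypothetical names, and reading or writing the dbf itself is not shown.

```java
import java.nio.charset.StandardCharsets;

class FieldRecode {
    // Re-encode one attribute value from ISO-8859-1 bytes to UTF-8 bytes.
    static byte[] latin1ToUtf8(byte[] latin1) {
        String decoded = new String(latin1, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "München" as stored in an ISO-8859-1 file: the umlaut is the single byte 0xFC.
        byte[] latin1 = { 'M', (byte) 0xFC, 'n', 'c', 'h', 'e', 'n' };
        byte[] utf8 = latin1ToUtf8(latin1);
        System.out.println(latin1.length + " bytes in, " + utf8.length + " bytes out");
    }
}
```

Each umlaut grows from one byte to two, so a converted value can be longer than the original field content.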

by unicoletti, 18 years ago

Attachment: 1753-umlauts.patch added

typemap for char* to correctly handle conversion from Java to system dependent

comment:2 by umn-ms@…, 18 years ago

Hi Umberto

I looked at your patch. I didn't understand every detail, since I'm not familiar
with SWIG. But the approach is obviously very good! In particular, it's good to
use Java's extensive encoding functionality instead of writing our own!

Questions/Hints:
- The reverse direction is missing, isn't it?
  When MapServer delivers a String to Java, the appropriate
  String constructor (String(byte[] bytes)) or something like it should be used.

- String.getBytes() is used. The Javadoc says:
  "Encodes this String into a sequence of bytes using 
   the platform's default charset"
  I ask myself whether it would be better/possible to explicitly specify
  the charset that is used somewhere.
  Example: a static field in mapObj that can be set by the Java programmer.
           Or specify it somehow during the build process?
  Having an easy mechanism to override the default without side effects on the
  machine could be helpful.
  (I'm not sure about this point. But when using ArcIMS I always found it 
   awkward and dangerous that I had to configure global language settings in 
   a specific way just to make ArcIMS run.)

- I cannot promise "extensive testing". But could you mail me the 
  type definition? (What to do with the "patch"? :-)
  I would like to try to generate mapscript_wrap.c. From there on I could maybe
  assist by staring at the generated code to find possible bugs.

Bye
Benedikt
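
The two points raised above (an explicit charset and the missing reverse direction) could look roughly like this on the Java side. This is a sketch only: `ExplicitCharset`, `nativeCharset`, `toNative` and `fromNative` are hypothetical names and are not part of the MapScript API or of the attached patches.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharset {
    // Hypothetical application-level setting, used instead of the platform default.
    static Charset nativeCharset = StandardCharsets.ISO_8859_1;

    // Java -> native: encode with the configured charset.
    static byte[] toNative(String s) {
        return s == null ? null : s.getBytes(nativeCharset);
    }

    // Native -> Java: decode with the same charset (the reverse direction).
    static String fromNative(byte[] bytes) {
        return bytes == null ? null : new String(bytes, nativeCharset);
    }

    public static void main(String[] args) {
        String value = "M\u00fcnchen"; // "München"
        byte[] encoded = toNative(value);
        System.out.println(encoded.length + " bytes, round trip ok: "
                + value.equals(fromNative(encoded)));
    }
}
```

Using the same charset in both directions keeps the round trip lossless for every character that the charset can represent, without touching machine-wide language settings.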


comment:3 by mapserver@…, 18 years ago

Cc: mapserver@… added

comment:4 by unicoletti, 18 years ago

> Hi Umberto
>
> I looked at your patch. I didn't understand every detail, since I'm not familiar
> with SWIG. But the approach is obviously very good! In particular, it's good to
> use Java's extensive encoding functionality instead of writing our own!
>
> Questions/Hints:
> - The reverse direction is missing, isn't it?
>  When MapServer delivers a String to Java, the appropriate
>  String constructor (String(byte[] bytes)) or something like it should be used.
>

Yes, it is, but if this works I can easily write the typemap to
handle the other case.

> - String.getBytes() is used. The Javadoc says:
>  "Encodes this String into a sequence of bytes using
>   the platform's default charset"
>  I ask myself whether it would be better/possible to explicitly specify
>  the charset that is used somewhere.

You already can: simply set the LANG and LC_* variables.

>  Example: a static field in mapObj that can be set by the Java programmer.
>           Or specify it somehow during the build process?
>  Having an easy mechanism to override the default without side effects
>  on the machine could be helpful.
>  (I'm not sure about this point. But when using ArcIMS I always found it
>   awkward and dangerous that I had to configure global language settings in
>   a specific way just to make ArcIMS run.)

That's probably a limitation of Windows. On Linux, every program can
run with a different character set.
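
To see which charset the argument-less String.getBytes() actually uses in a given process, a small probe like the following can help. It is a sketch with a hypothetical class name, `DefaultCharsetProbe`. Note that the result depends on the JVM: older JVMs derive the default charset from the locale at startup, while JDK 18 and later default to UTF-8 unless configured otherwise.

```java
import java.nio.charset.Charset;

class DefaultCharsetProbe {
    // Encode with the platform default charset, as String.getBytes() does.
    static byte[] platformBytes(String s) {
        return s.getBytes();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("LANG: " + System.getenv("LANG"));
        // One byte for a single-byte charset that contains the umlaut, two for UTF-8.
        System.out.println("bytes for u-umlaut: " + platformBytes("\u00fc").length);
    }
}
```

Running it under different LANG settings shows whether a particular JVM follows the environment or not.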


>
> - I cannot promise "extensive testing". But could you mail me the
>  type definition? (What to do with the "patch"? :-)
>  I would like to try to generate mapscript_wrap.c. From there on I could
>  maybe assist by staring at the generated code to find possible bugs.
>

That's my fault: I was in a hurry and simply dumped the patch online.
To apply it cd into the mapscript/java directory and then apply the
patch with:

patch < /path/to/patch/file

The only file that should change is javamodule.i.

Best regards,
Umberto

by mapserver@…, 18 years ago

Attachment: subset.zip added

Sample Data

by mapserver@…, 18 years ago

Attachment: qba.map added

Mapfile which includes the sample data

by unicoletti, 18 years ago

Attachment: 1753-umlauts-ver2.patch added

Improved version of previous patch

comment:5 by unicoletti, 18 years ago

attachments.isobsolete: 0 → 1

by unicoletti, 18 years ago

Attachment: 1753-umlauts-ver3.patch added

Modified to work with jdk 1.4

comment:6 by unicoletti, 18 years ago

attachments.isobsolete: 0 → 1

comment:7 by umn-ms@…, 18 years ago

Hi Umberto

I had a look at mapscript_wrap.c. One little thing:
Memory for the strings generated by JNU_GetStringNativeChars is
allocated with "alloc", so it should be freed with "free" instead of
"ReleaseStringUTFChars".
Maybe/probably it's the same, but this could depend on the Java implementation.

Bye
Benedikt

comment:8 by unicoletti, 18 years ago

I have applied the proposed patch to CVS HEAD, with a small modification to
prevent a segfault when the string to convert is null, plus some examples.
I will wait a few days and close the issue if there is no negative feedback.

comment:9 by sdlime, 18 years ago

Component: MapScript → MapScript-Java

comment:10 by sdlime, 18 years ago

Status: assigned → new

comment:11 by unicoletti, 18 years ago

Resolution: fixed
Status: new → closed
No further comments/complaints, so marking as closed.