Opened 17 years ago
Closed 17 years ago
#2104 closed defect (invalid)
Java Mapscript classObj.setName() wrong characters on legend image
Reported by: | arkadi | Owned by: | unicoletti |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | MapScript-Java | Version: | |
Severity: | normal | Keywords: | |
Cc: |
Description
When non-ASCII data is passed to setName() to set the class name, the characters on the legend are incorrect. Please see the first image (legend-bad.png), where setName() is called directly on a String variable:
classObj c; c.setName(name);
The second image (legend-almost-good.png) is the result of:
c.setName(new String(name.getBytes("UTF-8")));
Which is almost fine, except that the single character "ā" (amacron) between "ņ" (ncedilla) and "ž" (zcaron) is broken. Not sure why this trick works at all.
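One possible explanation, sketched under assumptions rather than confirmed against the MapScript source: new String(bytes) without a charset argument decodes with the platform default (CP1257 here), so the hack yields a string in which each character stands for one UTF-8 byte. If the binding then encodes the string back with the same default charset (an assumption about the binding), the native side ends up with the original UTF-8 bytes. That round trip fails only for bytes that the platform charset cannot represent. "ā" is 0xC4 0x81 in UTF-8, and 0x81 appears to be unassigned in windows-1257, which would explain the one broken character. The class and method names below are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesHackDemo {

    // The hack from the ticket, with the platform charset made explicit:
    // encode as UTF-8, then decode those bytes as a single-byte charset.
    static String hack(String name, Charset platform) {
        return new String(name.getBytes(StandardCharsets.UTF_8), platform);
    }

    // ASSUMPTION: the native side receives the string encoded in the
    // platform charset. This is a guess, not verified MapScript behaviour.
    static byte[] bytesSeenByNative(String s, Charset platform) {
        return s.getBytes(platform);
    }

    // True if the native side would see exactly the UTF-8 bytes of name.
    static boolean survives(String name, Charset platform) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        byte[] seen = bytesSeenByNative(hack(name, platform), platform);
        return Arrays.equals(utf8, seen);
    }

    public static void main(String[] args) {
        String cs = args.length > 0 ? args[0] : "windows-1257";
        if (!Charset.isSupported(cs)) {
            System.out.println(cs + " is not available in this JVM");
            return;
        }
        Charset platform = Charset.forName(cs);
        // \u0146 = ncedilla, \u0101 = amacron, \u017E = zcaron
        String[] samples = {"\u0146", "\u0101", "\u017E"};
        for (String ch : samples) {
            System.out.println("U+" + Integer.toHexString(ch.charAt(0))
                    + " survives: " + survives(ch, platform));
        }
    }
}
```

Unicode escapes are used so the result does not depend on the javac -encoding setting. Running it with another charset name as the first argument shows which characters would be lost on a different platform default.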
I'm pretty positive my setup got every other place right:
- The map is based on SHP data, DBF-s in CP1257 (single-byte) encoding. ENCODING CP1257 in MAP file is set. The output is correct.
- The points are from Oracle Spatial. Oracle NLS_LANG is set to UTF-8 and Mapserver labelObj.setEncoding(UTF-8) was called. Labels display correctly. Setting a class expression on a field that contains non-ASCII data (to draw layer points with different colors) also works.
Mapserver 4.10.1 from MS4W 2.2.3 on Windows XP. Sun Java 1.5.0_09. Tomcat 5.5.20 if that matters.
Attachments (9)
Change History (18)
by , 17 years ago
Attachment: | legend-bad.png added |
---|
follow-up: 2 comment:1 by , 17 years ago
Replying to arkadi:
- The points are from Oracle Spatial. Oracle NLS_LANG is set to UTF-8 and Mapserver labelObj.setEncoding(UTF-8) was called. Labels display correctly. Setting a class expression on a field that contains non-ASCII data (to draw layer points with different colors) also works.
Sorry, the above statement is only partially true. Expressions work only if Oracle NLS_LANG is set to the single-byte CP1257 encoding. Labels on the map work with both UTF-8 and CP1257. So it looks like this is a related problem.
follow-up: 3 comment:2 by , 17 years ago
Status: | new → assigned |
---|
Replying to arkadi:
Replying to arkadi:
- The points are from Oracle Spatial. Oracle NLS_LANG is set to UTF-8 and Mapserver labelObj.setEncoding(UTF-8) was called. Labels display correctly. Setting a class expression on a field that contains non-ASCII data (to draw layer points with different colors) also works.
Sorry, the above statement is only partially true. Expressions work only if Oracle NLS_LANG is set to the single-byte CP1257 encoding. Labels on the map work with both UTF-8 and CP1257. So it looks like this is a related problem.
Dealing with multi-byte encoding is known to be tricky in mapserver, but Java mapscript has excellent support for it; see the example mapscript/java/examples/QueryByAttributeUnicode.java in the source distribution of mapserver. The trick is to set the environment, the database (both client and server!) and the shape files all to the same encoding.
I would avoid this:
c.setName(new String(name.getBytes("UTF-8")));
because it could confuse mapserver. Where did you get the name String BTW?
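To check the "environment" part of that advice, it may help to print what the JVM itself considers its default encoding. This is a minimal sketch using only standard JDK calls; the class name is hypothetical, and it says nothing about the database or shape file side.

```java
import java.nio.charset.Charset;

public class EncodingReport {

    // Collects the JVM settings that influence charset-less calls such as
    // String.getBytes() and new String(byte[]).
    static String report() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding")
                + ", user.language = " + System.getProperty("user.language");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it inside the same Tomcat instance as the application, rather than from a separate command line, shows the values the application actually sees.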
follow-up: 4 comment:3 by , 17 years ago
Replying to unicoletti:
Dealing with multi-byte encoding is known to be tricky in mapserver, but Java mapscript has excellent support for it; see the example mapscript/java/examples/QueryByAttributeUnicode.java in the source distribution of mapserver. The trick is to set the environment, the database (both client and server!) and the shape files all to the same encoding.
As you can see in the screenshots and from the ticket report, every other place is right except setName().
I would avoid this:
c.setName(new String(name.getBytes("UTF-8")));
That's why I simply used setName(name) in the first place. The getBytes() hack is just to get some idea of what may be wrong with the implementation.
because it could confuse mapserver. Where did you get the name String BTW?
The "name" is from database.
To rule out possible problems with that, I just hardcoded a string with Russian and Latvian characters into the setName call:
c.setName("qwerty - йцукенгш - ēūīāšģķļžčņ");
and compiled with javac -encoding utf-8. The resulting legend is attached (legend-constant-string.png).
comment:5 by , 17 years ago
Arkady, did you specify an ENCODING for the LABEL section of the LEGEND section of your mapfile? IMO doing that should fix it.
comment:6 by , 17 years ago
I didn't have any LEGEND sections in the MAP file originally. Now I added:
LEGEND
  LABEL
    TYPE TRUETYPE
    FONT arial
    ENCODING CP1257
    SIZE 8
    ANTIALIAS TRUE
    POSITION auto
    PARTIALS FALSE
    BUFFER 0
    COLOR 0 0 200
    FORCE FALSE
  END
END
That makes it work for Latvian characters, but Russian characters became "?". So one language at a time? Not that good. Setting UTF-8 produced garbage. Setting CP1251 still results in ??? for Russian chars, and Latvian chars are replaced by some Russian chars (that probably have the same code in the 8-bit representation). See the attached legend pictures. The server is on the Windows platform and Latvian is set in the regional settings.
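The "one language at a time" effect can be checked from Java without rendering a legend: a single-byte code page only has room for one regional alphabet, and anything outside it is replaced with "?" on encoding. The sketch below tests which sample strings fit in which charset. The class and method names are hypothetical, and the windows-125x charsets are only present in JVMs that ship the extended charset set, hence the isSupported guard.

```java
import java.nio.charset.Charset;

public class LegendCharsetCheck {

    // True if every character of text can be encoded in the named charset.
    static boolean fits(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String latvian = "\u0113\u016B\u012B\u0101"; // e, u, i, a with macron
        String russian = "\u0439\u0446\u0443\u043A"; // four Cyrillic letters
        String[] candidates = {"windows-1257", "windows-1251", "UTF-8"};
        for (String cs : candidates) {
            if (!Charset.isSupported(cs)) {
                System.out.println(cs + ": not available in this JVM");
                continue;
            }
            System.out.println(cs + ": Latvian fits = " + fits(latvian, cs)
                    + ", Russian fits = " + fits(russian, cs));
        }
    }
}
```

Only a Unicode encoding such as UTF-8 can hold both alphabets in one string, which is why a legend label that mixes them cannot be expressed in either CP1257 or CP1251.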
This is kinda ugly: the data is there in Unicode, so why do we need to set encoding and such? I understand that there may be some internal Mapserver implementation details, but as far as a Java developer is concerned, everything should just work automatically. For JNI there is GetStringUTFChars().
In case this issue is not resolvable, what should the LEGEND label encoding be set to? The "primary" 8-bit encoding for the current Java locale (as set by -Duser.language/country, LANG/LC_CTYPE on Unix, or Regional settings on Windows)?
by , 17 years ago
Attachment: | legend-cp1257.png added |
---|
by , 17 years ago
Attachment: | legend-cp1251.png added |
---|
by , 17 years ago
Attachment: | legend-utf8.png added |
---|
comment:7 by , 17 years ago
The same problem applies to the classObj.getName() function. In the MAP file I have:
LAYER
  NAME dzelzcels_line_lv
  DATA dzelzcels_line_lv
  CLASS
    NAME "Dzelzceļš"
    STYLE
      MINSIZE 3
      MAXSIZE 3
    END
  END
  TYPE line
  STATUS ON
END
File encoding is UTF-8.
See the attached getname.png. The legend (on the right) displays perfectly, but it looks like getName() assumes the file was read in the locale charset and tries to convert the class name into UCS-2 from a single-byte encoding, effectively doubling the number of non-ASCII characters and producing garbage. The code is:
for (int i = 0; i < map.getNumlayers(); ++i) {
    layerObj l = map.getLayer(i);
    classObj c = l.getClass(0);
    if (c == null) continue;
    String name = c.getName();
    if (name == null) continue;
    //name = new String(name.getBytes(), "UTF-8");
    HashMap m = new HashMap();
    m.put("id", i);
    m.put("name", name);
    m.put("visible", (l.getStatus() == mapscriptConstants.MS_ON));
    a.add(m);
}
Uncommenting the new String(name.getBytes(), "UTF-8") hack produces getname-hack.png, which, IMO, confirms the above theory and fixes almost every character.
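The theory can be reproduced in plain Java without MapScript. The sketch below decodes the UTF-8 bytes of the class name as a single-byte charset, which lengthens the string by one character per extra UTF-8 byte, and then undoes the damage the same way the commented-out line does. The class and method names are hypothetical, and ISO-8859-1 stands in for the locale charset because every JVM ships it and it assigns a character to every byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetNameRepairDemo {

    // Simulates the suspected behaviour: UTF-8 bytes from the MAP file are
    // decoded as if they were in a single-byte charset.
    static String misdecode(String original, Charset assumed) {
        return new String(original.getBytes(StandardCharsets.UTF_8), assumed);
    }

    // The workaround from the loop above, with the charset made explicit.
    static String repair(String garbled, Charset assumed) {
        return new String(garbled.getBytes(assumed), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The class name from the MAP file, written with Unicode escapes.
        String name = "Dzelzce\u013C\u0161";
        Charset assumed = StandardCharsets.ISO_8859_1;
        String garbled = misdecode(name, assumed);
        System.out.println("original length: " + name.length());
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("repaired equals original: "
                + repair(garbled, assumed).equals(name));
    }
}
```

With a charset that leaves some byte values unassigned, as CP1257 appears to, the repair step cannot recover characters whose UTF-8 bytes hit those gaps, which may be why the hack fixes almost every character rather than all of them.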
by , 17 years ago
Attachment: | getname.png added |
---|
by , 17 years ago
Attachment: | getname-hack.png added |
---|
comment:8 by , 17 years ago
Converting the MAP file to CP1257 encoding and removing the getBytes() hack allows getName to function correctly, but then the legend image is wrong. See getname-cp1257.png.
by , 17 years ago
Attachment: | getname-cp1257.png added |
---|
comment:9 by , 17 years ago
Resolution: | → invalid |
---|---|
Status: | assigned → closed |
Actually, Mapserver's behavior is correct. Setting everything to CP1257, the native charset of the Latvian Java locale, allows setName, getName, labels and legend to work correctly. Too bad it leaves us with only one language at a time, until a cross-platform UTF-8-enabled Java locale is available.
Thank you for your time!