Opened 17 years ago

Closed 17 years ago

#2104 closed defect (invalid)

Java Mapscript classObj.setName() wrong characters on legend image

Reported by: arkadi Owned by: unicoletti
Priority: normal Milestone:
Component: MapScript-Java Version:
Severity: normal Keywords:
Cc:

Description

When non-ASCII data is passed to setName() to set the class name, the characters on the legend are incorrect. Please see the first image (legend-bad.png), where setName() is called directly on a String variable:

classObj c; c.setName(name);

The second image (legend-almost-good.png) is the result of:

c.setName(new String(name.getBytes("UTF-8")));

This is almost fine, except that the single character "ā" (amacron) between "ņ" (ncedilla) and "ž" (zcaron) is broken. I am not sure why this trick works at all.

I am pretty positive my setup gets every other place right:
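One possible explanation for the almost-working hack can be sketched as follows. It assumes that the Java binding hands strings to MapServer as bytes in the platform default charset (CP1257 on this machine) and that the legend then treats those bytes as UTF-8; the ticket does not confirm either point. The class and method names are made up for illustration, and windows-1257 is passed explicitly instead of relying on the JVM default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SetNameHackDemo {
    // Assumption: stands in for the platform default charset of a Latvian Windows setup.
    static final Charset CP1257 = Charset.forName("windows-1257");

    // Equivalent of new String(name.getBytes("UTF-8")) when the default charset is CP1257.
    static String hack(String name) {
        return new String(name.getBytes(StandardCharsets.UTF_8), CP1257);
    }

    // Bytes the native side would receive if the binding encoded with the default charset (assumed).
    static byte[] bytesSeenByNative(String s) {
        return s.getBytes(CP1257);
    }

    // True if the hack delivers exactly the UTF-8 bytes of the original name.
    static boolean survivesHack(String name) {
        return Arrays.equals(bytesSeenByNative(hack(name)),
                             name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Unicode escapes for n-cedilla, a-macron and z-caron keep the source ASCII-only.
        String[] samples = {"\u0146", "\u0101", "\u017E"};
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.charAt(0)).toUpperCase()
                    + " survives the hack: " + survivesHack(s));
        }
    }
}
```

Under these assumptions the hack smuggles UTF-8 bytes through a single-byte conversion. The UTF-8 form of "ā" is the bytes C4 81, and 0x81 has no character assigned in windows-1257, so that byte is replaced during decoding and cannot be restored, which would match the one broken character in legend-almost-good.png.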

  1. The map is based on SHP data, with DBFs in CP1257 (single-byte) encoding. ENCODING CP1257 is set in the MAP file. The output is correct.
  2. The points are from Oracle Spatial. Oracle NLS_LANG is set to UTF-8 and Mapserver labelObj.setEncoding(UTF-8) was called. Labels display correctly. Setting a class expression on a field that contains non-ASCII data (to draw layer points with different colors) also works.

Mapserver 4.10.1 from MS4W 2.2.3 on Windows XP. Sun Java 1.5.0_09. Tomcat 5.5.20 if that matters.

Attachments (9)

legend-bad.png (20.3 KB ) - added by arkadi 17 years ago.
bad legend
legend-almost-good.png (7.1 KB ) - added by arkadi 17 years ago.
almost good legend
legend-constant-string.png (7.3 KB ) - added by arkadi 17 years ago.
legend from hardcoded string
legend-cp1257.png (8.2 KB ) - added by arkadi 17 years ago.
legend-cp1251.png (7.7 KB ) - added by arkadi 17 years ago.
legend-utf8.png (7.3 KB ) - added by arkadi 17 years ago.
getname.png (22.1 KB ) - added by arkadi 17 years ago.
getname-hack.png (17.4 KB ) - added by arkadi 17 years ago.
getname-cp1257.png (18.2 KB ) - added by arkadi 17 years ago.


Change History (18)

by arkadi, 17 years ago

Attachment: legend-bad.png added

bad legend

by arkadi, 17 years ago

Attachment: legend-almost-good.png added

almost good legend

in reply to:  description ; comment:1 by arkadi, 17 years ago

Replying to arkadi:

  1. The points are from Oracle Spatial. Oracle NLS_LANG is set to UTF-8 and Mapserver labelObj.setEncoding(UTF-8) was called. Labels display correctly. Setting a class expression on a field that contains non-ASCII data (to draw layer points with different colors) also works.

Sorry, the above statement is only partially true. Expressions work only if Oracle NLS_LANG is set to the single-byte CP1257 encoding. Labels on the map work with both UTF-8 and CP1257. So it looks like this is a related problem.

in reply to:  1 ; comment:2 by unicoletti, 17 years ago

Status: new → assigned

Replying to arkadi:

Replying to arkadi:

  1. The points are from Oracle Spatial. Oracle NLS_LANG is set to UTF-8 and Mapserver labelObj.setEncoding(UTF-8) was called. Labels display correctly. Setting a class expression on a field that contains non-ASCII data (to draw layer points with different colors) also works.

Sorry, the above statement is only partially true. Expressions work only if Oracle NLS_LANG is set to the single-byte CP1257 encoding. Labels on the map work with both UTF-8 and CP1257. So it looks like this is a related problem.

Dealing with multi-byte encoding is known to be tricky in mapserver, but Java mapscript has excellent support for it, see the example mapscript/java/examples/QueryByAttributeUnicode.java in the source distribution of mapserver. The trick is to set environment, database (both client and server!) and shape files all to the same encoding.

I would avoid this:

c.setName(new String(name.getBytes("UTF-8")));

because it could confuse mapserver. Where did you get the name String, BTW?

in reply to:  2 ; comment:3 by arkadi, 17 years ago

Replying to unicoletti:

Dealing with multi-byte encoding is known to be tricky in mapserver, but Java mapscript has excellent support for it, see the example mapscript/java/examples/QueryByAttributeUnicode.java in the source distribution of mapserver. The trick is to set environment, database (both client and server!) and shape files all to the same encoding.

As you can see in the screenshots and in the ticket description, every other place is right except setName().

I would avoid this:

c.setName(new String(name.getBytes("UTF-8")));

That's why I simply used setName(name) in the first place. The getBytes() hack is just a way to get some idea of what may be wrong with the implementation.

because it could confuse mapserver. Where did you get the name String, BTW?

The "name" is from database.

To rule out possible problems with that, I hardcoded a string with Russian and Latvian characters into the setName call

c.setName("qwerty - йцукенгш - ēūīāšģķļžčņ");

and compiled with javac -encoding utf-8. The resulting legend is attached (legend-constant-string.png).

by arkadi, 17 years ago

Attachment: legend-constant-string.png added

legend from hardcoded string

in reply to:  3 comment:4 by unicoletti, 17 years ago

Recap:

- labels work as expected
- legend doesn't

I'll look into it asap.

comment:5 by unicoletti, 17 years ago

Arkady, did you specify an ENCODING for the LABEL section of the LEGEND section of your mapfile? IMO doing that should fix it.

comment:6 by arkadi, 17 years ago

I didn't have any LEGEND section in the MAP file originally. Now I have added

LEGEND
  LABEL
    TYPE TRUETYPE
    FONT arial
    ENCODING CP1257
    SIZE 8
    ANTIALIAS TRUE
    POSITION auto
    PARTIALS FALSE
    BUFFER 0
    COLOR 0 0 200
    FORCE FALSE
  END
END

That makes it work for Latvian characters, but Russian characters become "?". So one language at a time? Not that good. Setting UTF-8 produced garbage. Setting CP1251 still results in ??? for Russian chars, and Latvian chars are replaced by some Russian chars (which probably have the same code in the 8-bit representation). See the attached legend pictures. The server is on the Windows platform and Latvian is set in the regional settings.

This is kind of ugly: the data is there in Unicode, so why do we need to set the encoding and such? I understand that there may be some internal Mapserver implementation details, but as far as a Java developer is concerned, everything should just work automatically. For JNI there is GetStringUTFChars().

In case this issue is not resolvable, what should the LEGEND label encoding be set to? The "primary" 8-bit encoding for the current Java locale (as set by -Duser.language/country, LANG/LC_CTYPE on Unix, or Regional settings on Windows)?
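The "one language at a time" limitation can be checked without MapServer. The sketch below only asks the JVM whether sample text is representable in a candidate 8-bit charset; the class name, the helper fits() and the sample strings are hypothetical, and the check says nothing about how MapServer itself treats the ENCODING value.

```java
import java.nio.charset.Charset;

public class LegendEncodingCheck {
    // True if every character of text can be represented in the named charset.
    static boolean fits(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String latvian = "\u0113\u016B\u012B\u0101"; // e-macron, u-macron, i-macron, a-macron
        String russian = "\u0439\u0446\u0443\u043A"; // first four letters of the Russian sample
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        String[] candidates = {"windows-1257", "windows-1251", "UTF-8"};
        for (String cs : candidates) {
            System.out.println(cs + ": latvian=" + fits(latvian, cs)
                    + ", russian=" + fits(russian, cs));
        }
    }
}
```

Neither single-byte code page covers both alphabets, so any 8-bit choice for the legend label has to drop one of the two languages; only a Unicode encoding such as UTF-8 can hold both.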

by arkadi, 17 years ago

Attachment: legend-cp1257.png added

by arkadi, 17 years ago

Attachment: legend-cp1251.png added

by arkadi, 17 years ago

Attachment: legend-utf8.png added

comment:7 by arkadi, 17 years ago

The same problem applies to the classObj.getName() function. In the MAP file I have:

  LAYER
    NAME dzelzcels_line_lv
    DATA dzelzcels_line_lv
    CLASS
      NAME "Dzelzceļš"
      STYLE
        MINSIZE 3
        MAXSIZE 3
      END 
    END 
  
    TYPE line
    STATUS ON
  END 

The file encoding is UTF-8.

See the attached getname.png. The legend (on the right) displays perfectly, but it looks like getName() assumes the file was read in the locale charset and tries to convert the class name into UCS-2 from a single-byte encoding, effectively doubling the number of characters and producing garbage. The code is:

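        // Collect id, class name and visibility of every layer; 'a' is a list declared elsewhere.
        for (int i = 0; i < map.getNumlayers(); ++i) {
            layerObj l = map.getLayer(i);
            classObj c = l.getClass(0);
            if (c == null) continue;   // layer without classes
            String name = c.getName();
            if (name == null) continue; // unnamed class
            // Hack: re-decode the default-charset bytes as UTF-8.
            //name = new String(name.getBytes(), "UTF-8");
            HashMap<String, Object> m = new HashMap<String, Object>();
            m.put("id", i);
            m.put("name", name);
            m.put("visible", (l.getStatus() == mapscriptConstants.MS_ON));
            a.add(m);
        }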

Uncommenting the new String(name.getBytes(), "UTF-8") hack produces getname-hack.png, which, IMO, confirms the above theory and fixes almost every character.
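The theory above can be reproduced in isolation. This is a minimal sketch, assuming getName() decodes the UTF-8 bytes of the MAP file with the platform default charset (taken to be windows-1257 here); the class and method names are invented for the example and no MapScript call is involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetNameDemo {
    // Assumption: the default charset of the Latvian Windows setup in this ticket.
    static final Charset CP1257 = Charset.forName("windows-1257");

    // What getName() would return if UTF-8 bytes were decoded as CP1257 (assumed behavior).
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), CP1257);
    }

    // Equivalent of new String(name.getBytes(), "UTF-8") with CP1257 as the default charset.
    static String undo(String garbled) {
        return new String(garbled.getBytes(CP1257), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Dzelzce\u013C\u0161"; // the class name from the MAP file
        String garbled = misdecode(original);
        System.out.println(original.length() + " chars become " + garbled.length());
        System.out.println("recovered exactly: " + original.equals(undo(garbled)));
    }
}
```

Each non-ASCII letter occupies two bytes in UTF-8, so a single-byte decoder turns it into two characters, which is the doubling seen in getname.png. Reversing the conversion restores a letter only when both of its bytes are assigned in CP1257; a byte with no CP1257 mapping is lost for good, which would explain why the hack fixes almost every character but not all.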

by arkadi, 17 years ago

Attachment: getname.png added

by arkadi, 17 years ago

Attachment: getname-hack.png added

comment:8 by arkadi, 17 years ago

Converting MAP file to CP1257 encoding and removing getBytes() hack allows getName to function correctly, but then, the legend image is wrong. See getname-cp1257.png.

by arkadi, 17 years ago

Attachment: getname-cp1257.png added

comment:9 by arkadi, 17 years ago

Resolution: invalid
Status: assigned → closed

Actually, the Mapserver behavior is correct. Setting everything to CP1257, the native charset of the Latvian Java locale, allows setName, getName, labels and legend to work correctly. Too bad it leaves us with only one language at a time until a cross-platform, UTF-8 enabled Java locale is available.

Thank you for your time!
