Opened 9 years ago

Last modified 4 years ago

#1193 new defect

Python menu: Japanese (double-byte) characters in the menu may cause a parser error.

Reported by: naokiueda Owned by: grass-dev@…
Priority: major Milestone: 6.4.6
Component: Python Version: 6.4.0
Keywords: wingrass Cc:
CPU: Unspecified Platform: MSWindows 7

Description

In a Japanese environment, r.reclass does not launch from the GUI menu.

This is because Japanese characters are double-byte, and if the first or second byte has the same code as '<', '>', or possibly '\', it causes a parser error.

One instance of this problem has already been fixed as follows, but other problems caused by the same mechanism appear to remain.

In GRASS for OSGeo4W, it is fixed in menuform.py, rev. 43275, line 1280:


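    # parse the interface description
    self.grass_task = grassTask()
    handler = processTask(self.grass_task)
    enc = locale.getdefaultlocale()[1]
    if enc and enc.lower() not in ("utf8", "utf-8"):
        # Non-UTF-8 locale (e.g. cp932): decode with the locale encoding, drop the
        # first line (the original XML declaration), prepend a UTF-8 declaration
        # via replace('', ..., 1), and hand UTF-8 bytes to the parser.
        xml.sax.parseString(getInterfaceDescription(cmd[0]).decode(enc)
                            .split('\n', 1)[1]
                            .replace('', '<?xml version="1.0" encoding="utf-8"?>\n', 1)
                            .encode("utf-8"),
                            handler)
    else:
        xml.sax.parseString(getInterfaceDescription(cmd[0]),
                            handler)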
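For readers on Python 3, the same workaround can be sketched as a self-contained function. This is a minimal sketch, not the GRASS code: `prepare_for_expat`, `DescriptionHandler` and the sample XML are hypothetical, and it assumes, as the original does, that the first line of the interface description is the XML declaration. The encoding is passed in explicitly instead of coming from `locale.getdefaultlocale()`.

```python
import xml.sax

UTF8_DECL = '<?xml version="1.0" encoding="utf-8"?>\n'


def prepare_for_expat(raw: bytes, enc) -> bytes:
    """Hypothetical helper mirroring the menuform.py workaround.

    If enc is not UTF-8, decode with enc, drop the first line (assumed to be
    the original XML declaration), prepend a UTF-8 declaration and re-encode.
    """
    if not enc or enc.lower() in ("utf8", "utf-8"):
        return raw
    body = raw.decode(enc).split("\n", 1)[1]
    return (UTF8_DECL + body).encode("utf-8")


class DescriptionHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of <description> elements."""

    def __init__(self):
        super().__init__()
        self.descriptions = []
        self._inside = False

    def startElement(self, name, attrs):
        if name == "description":
            self._inside = True
            self.descriptions.append("")

    def endElement(self, name):
        if name == "description":
            self._inside = False

    def characters(self, content):
        # characters() may be called several times per element, so append.
        if self._inside:
            self.descriptions[-1] += content


# Hypothetical cp932-encoded interface description; "ソ" encodes to 0x83 0x5C.
raw = ('<?xml version="1.0" encoding="cp932"?>\n'
       '<task name="r.reclass"><description>再分類ソ表</description></task>\n'
       ).encode("cp932")

handler = DescriptionHandler()
xml.sax.parseString(prepare_for_expat(raw, "cp932"), handler)
descriptions = handler.descriptions  # ['再分類ソ表']
```

Because expat supports UTF-8 natively, re-encoding before parsing sidesteps the "unknown encoding" error without having to teach the parser about cp932.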


Version: GRASS 6.4.0 (2010) Revision: 37101 Date: 2009-05-10 13:35:38 +0200 (Sun, 10 May 2009) (Fri Oct 08 17:30:18 2010) Command finished (0 sec)


Error log:

    Traceback (most recent call last):
      File "C:/GRASS__6401/etc/wxpython/wxgui.py", line 540, in OnMenuCmd
        cmd = self.GetMenuCmd(event)
      File "C:/GRASS__6401/etc/wxpython/wxgui.py", line 527, in GetMenuCmd
        input = menuform.GUI().GetCommandInputMapParamKey(cmdlist[0])
      File "C:\GRASS6401\etc\wxpython\gui_modules\menuform.py", line 1944, in GetCommandInputMapParamKey
        xml.sax.parseString(getInterfaceDescription(cmd), handler)
      File "C:\OSGeo4W\apps\Python25\lib\xml\sax\__init__.py", line 49, in parseString
        parser.parse(inpsrc)
      File "C:\OSGeo4W\apps\Python25\lib\xml\sax\expatreader.py", line 107, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "C:\OSGeo4W\apps\Python25\lib\xml\sax\xmlreader.py", line 123, in parse
        self.feed(buffer)
      File "C:\OSGeo4W\apps\Python25\lib\xml\sax\expatreader.py", line 211, in feed
        self._err_handler.fatalError(exc)
      File "C:\OSGeo4W\apps\Python25\lib\xml\sax\handler.py", line 38, in fatalError
        raise exception
    xml.sax._exceptions.SAXParseException: <unknown>:1:30: unknown encoding


Also, when I download the source file, its encoding is Shift-JIS (for Japanese) rather than UTF-8. I think the source file should be in UTF-8.

Change History (6)

comment:1 Changed 9 years ago by neteler

I have tried on Linux and I could launch r.reclass in Japanese without problems. Perhaps it is a Windows-only problem.

Wish: please attach your changes to this ticket as a patch.

> Also, when I download the source file, its encoding is Shift-JIS (for Japanese) rather than UTF-8. I think the source file should be in UTF-8.

To which file do you refer?

comment:2 in reply to: 1 Changed 9 years ago by glynn

Replying to neteler:

> I have tried on Linux and I could launch r.reclass in Japanese without problems. Perhaps it is a Windows-only problem.

AFAICT, it's a problem with Shift-JIS (cp932), which isn't compatible with ASCII. Unix systems use EUC-JP, which doesn't have this problem.

Shift-JIS is a multi-byte encoding. Non-ASCII characters have a first byte with the top bit set, but the second byte can be any value >= 64. While this excludes the digits and most of the punctuation characters, it includes [\]^_{|}~.

This makes it incompatible with any code which parses a stream of bytes without reference to the encoding, as e.g. '\' (0x5c) might be an ASCII '\' or it might be the second byte of a JISX0208 character; you can't tell without tracking the shift state.
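The ambiguity can be demonstrated with Python's built-in `cp932` and `euc_jp` codecs. This is an illustrative sketch only: `naive_scan` and `sjis_aware_scan` are hypothetical helpers, and the lead-byte ranges 0x81-0x9F and 0xE0-0xFC are the standard Shift-JIS ones. The katakana "ソ" is a well-known case whose second byte in Shift-JIS is 0x5C.

```python
# ASCII punctuation that can also appear as a Shift-JIS trail byte.
SPECIALS = b"\\[]^_{|}~"


def naive_scan(data: bytes) -> list:
    """Offsets of bytes equal to a special ASCII character, ignoring encoding."""
    return [i for i, b in enumerate(data) if b in SPECIALS]


def sjis_aware_scan(data: bytes) -> list:
    """Same scan, but skips the trail byte that follows a Shift-JIS lead byte."""
    hits, i = [], 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            i += 2  # lead byte: the next byte belongs to the same character
            continue
        if b in SPECIALS:
            hits.append(i)
        i += 1
    return hits


text = "C:\\ソ\\x"  # two real backslashes around the katakana "ソ"
sjis = text.encode("cp932")   # 43 3a 5c 83 5c 5c 78
euc = text.encode("euc_jp")   # 43 3a 5c a5 bd 5c 78

print(naive_scan(sjis))       # [2, 4, 5]  (offset 4 is the trail byte of "ソ")
print(sjis_aware_scan(sjis))  # [2, 5]
print(naive_scan(euc))        # [2, 5]     (both EUC-JP bytes of "ソ" are >= 0xA1)
```

Decoding with the correct codec first and then working on characters rather than bytes avoids the problem altogether, which is essentially what the menuform.py workaround does before handing UTF-8 to the parser.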

Unfortunately, the only Japanese encoding which is supported by Windows' codepage-based API is Shift-JIS (actually, codepage 932, which is Shift-JIS plus the usual Microsoft-specific extensions). There is no UTF-8 codepage (cp 65001 is UTF-8, but it can't be used as a normal codepage).
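Python's codecs mirror the distinction between strict Shift-JIS and Microsoft's code page 932. As an illustration (the helper `try_encode` is hypothetical), the circled digit "①", one of the NEC extensions included in cp932, encodes under `cp932` but is rejected by Python's stricter `shift_jis` codec:

```python
def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if the codec cannot encode text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


print(try_encode("①", "cp932"))      # b'\x87@'  (0x87 0x40)
print(try_encode("①", "shift_jis"))  # None
```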

I don't think that there's any solution to this, other than "don't use kanji (or hiragana or full-width katakana) in command lines". GRASS is stuck using the codepage-based API (unless someone wants to implement UTF-8 equivalents of all of the ANSI C and POSIX functions, and change all of GRASS to use them), and expecting every function which deals with char* to decode it according to the current locale isn't feasible.

comment:3 Changed 9 years ago by hellik

Keywords: wingrass added

comment:4 Changed 8 years ago by hamish

The note in Release/6.4.2RC2-News says it's fixed?

  • "wxGUI: fixed for languages which use double-byte characters (Japanese etc)"

comment:5 in reply to: 4 Changed 7 years ago by neteler

Milestone: 6.4.1 → 6.4.3

Replying to hamish:

> The note in Release/6.4.2RC2-News says it's fixed?
>
>   • "wxGUI: fixed for languages which use double-byte characters (Japanese etc)"

Is it fixed? This is not clear to me.

comment:6 Changed 4 years ago by neteler

Milestone: 6.4.3 → 6.4.6