Opened 10 years ago

Last modified 5 years ago

#2525 new defect

Unable to open sqlite database if path contains non-latin letters

Reported by: marisn Owned by: grass-dev@…
Priority: major Milestone: 7.6.2
Component: wxGUI Version: svn-releasebranch70
Keywords: attribute Cc:
CPU: Unspecified Platform: MSWindows Vista

Description

It seems that any operation touching the attribute database fails if the path contains a non-Latin letter. This is a single instance of a general problem with passing file names as arguments between the GUI and modules. Output in the CMD window:

GRASS_INFO_WARNING(5668,2): Unable open database <C:\Users\Māris\Documents\grass data\nc_basic_spm_grass7\PERMANENT\sqlite\sqlite.db> by driver <sqlite>
GRASS_INFO_END(5668,2)

One of the outputs in the wxGUI command console:

Exception in thread Thread-26:
Traceback (most recent call last):
  File "C:\Program Files\GRASS GIS
7.0.0svn\Python27\lib\threading.py", line 810, in
__bootstrap_inner
    self.run()
  File "C:\Program Files\GRASS GIS
7.0.0svn\gui\wxpython\gui_core\forms.py", line 374, in run
    self.resultQ.put((requestId, self.request.run()))
  File "C:\Program Files\GRASS GIS
7.0.0svn\gui\wxpython\gui_core\forms.py", line 289, in run
    cparams[map]['dbInfo'] = gselect.VectorDBInfo(map)
  File "C:\Program Files\GRASS GIS
7.0.0svn\gui\wxpython\gui_core\gselect.py", line 743, in
__init__
    self._DescribeTables() # -> self.tables
  File "C:\Program Files\GRASS GIS
7.0.0svn\gui\wxpython\gui_core\gselect.py", line 770, in
_DescribeTables
    database = self.layers[layer]["database"])['cols']:
  File "C:\Program Files\GRASS GIS
7.0.0svn\etc\python\grass\script\db.py", line 43, in
db_describe
    s = read_command('db.describe', flags='c', table=table,
**args)
  File "C:\Program Files\GRASS GIS
7.0.0svn\etc\python\grass\script\core.py", line 425, in
read_command
    return handle_errors(returncode, stdout, args, kwargs)
  File "C:\Program Files\GRASS GIS
7.0.0svn\etc\python\grass\script\core.py", line 308, in
handle_errors
    returncode=returncode)
CalledModuleError: Module run None ['db.describe', '-c',
'table=census', 'driver=sqlite', 'database=C:\\Users\\M\xe2r
is\\Documents\\grassdata\\nc_basic_spm_grass7\\PERMANENT\\sq
lite\\sqlite.db'] ended with error
Process ended with non-zero return code 1. See errors in the
(error) output.

GRASS version: 7.0.0svn GRASS SVN Revision: 63925 Build Date: 2015-01-02 Build Platform: i686-pc-mingw32 GDAL/OGR: 1.11.1 PROJ.4: 4.8.0 GEOS: 3.4.2 SQLite: 3.7.17 Python: 2.7.4 wxPython: 2.8.12.1 Platform: Windows-Vista-6.0.6002-SP2

Note: could CommandLineToArgvW be helpful?

Change History (14)

in reply to:  description comment:1 by glynn, 10 years ago

Replying to marisn:

Note: could CommandLineToArgvW be helpful?

What we need is the reverse: something which reliably converts argv[] to a command string.

We actually have one of those (make_command_line() in lib/gis/spawn.c), and Python also has one (list2cmdline() in the subprocess module). The problem is that both of these only reverse the parsing which is done by the executable itself, not that done by the shell. The shell's parsing rules are even less well documented than those of the executable, and even less sane.
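As a rough illustration of the point about list2cmdline(), the sketch below builds a command string from an argument list. The path is shortened and purely illustrative. Note that the function only adds the quoting expected by the child process's own argument parser; characters that are special only to the shell pass through untouched.

```python
import subprocess

# Argument list similar to the one in the traceback in the description
# (the path is shortened and purely illustrative).
argv = [
    "db.describe",
    "-c",
    "table=census",
    "database=C:\\Users\\Māris\\grass data\\sqlite.db",
]

# list2cmdline() quotes arguments so that the C runtime of the child
# process can split the string back into the same argv[].
cmdline = subprocess.list2cmdline(argv)
print(cmdline)

# Characters that only the shell treats specially (& and % here) are
# emitted unquoted, because the shell's parsing is not modelled at all.
shell_sensitive = subprocess.list2cmdline(["echo", "a&b", "100%"])
print(shell_sensitive)
```

Only the argument containing a space gets quoted; nothing is done about the encoding of the non-ASCII character or about shell metacharacters.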

The other issue is that the shell uses two different encodings (codepages): "ANSI" and "OEM". Most of the time this doesn't matter; you can just pass the byte strings straight through. But there are cases (such as using the FOR command with backticks to take process output and use it as an argument) where this doesn't work, and any character which doesn't have the same codepoint in both encodings will cause problems (problems which can't realistically be solved).
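A small sketch of the ANSI/OEM mismatch, assuming a Baltic Windows locale where cp1257 is the "ANSI" codepage and cp775 the "OEM" one (the pairing is an assumption made for illustration; the name is taken from the path in the description):

```python
# Assumed codepage pair for a Baltic Windows locale:
# cp1257 as the "ANSI" codepage, cp775 as the "OEM" codepage.
ANSI, OEM = "cp1257", "cp775"

name = "Māris"

ansi_bytes = name.encode(ANSI)
oem_bytes = name.encode(OEM)

# The ASCII letters agree, but the byte used for "ā" differs.
print(ansi_bytes, oem_bytes)

# Bytes written in one codepage and read back in the other no longer
# spell the original name.
misread = ansi_bytes.decode(OEM, errors="replace")
print(misread == name)
```

The ANSI byte string matches the `M\xe2ris` visible in the CalledModuleError message above.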

As for filenames, the main issues are

  1. If you use byte strings (i.e. char*) (e.g. fopen()), you can't access any file whose name isn't representable in the current codepage. Those files effectively don't exist in the char* world.
  2. The only supported encoding for Japanese is Shift-JIS (cp932), which has the unfortunate feature of not being entirely compatible with ASCII. Specifically, 0x5c is used both for the directory separator (normally backslash, but actually prints as a yen (¥) sign in Japanese locales) and as the second byte of some multi-byte sequences. Meaning that any code which tries to parse filenames as byte strings with 0x5c as a directory separator will often fail on Japanese filenames.
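The 0x5c problem can be sketched as follows. The path is hypothetical; the character "表" is used because its cp932 encoding ends in the byte 0x5c.

```python
# Hypothetical Windows path containing a kanji whose cp932 encoding
# has 0x5c as its second byte.
filename = "C:\\data\\表.txt"
raw = filename.encode("cp932")

# Naive byte-level parsing: split on the backslash byte. The second
# byte of the kanji is mistaken for a directory separator.
naive_parts = raw.split(b"\x5c")
print(naive_parts)

# Decoding first and splitting the text keeps the filename intact.
text_parts = raw.decode("cp932").split("\\")
print(text_parts)
```

The byte-level split yields one component too many and cuts the kanji in half, while the text-level split returns the three expected components.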

Neither of these has any simple solution (not even unreliable "hacks"). The only effective solution is to use the Unicode (i.e. wchar_t*) API.

In practical terms, that would mean writing a compatibility layer which re-implements all of the standard ANSI C and POSIX filesystem calls, taking UTF-8 char* arguments, converting to UTF-16 wchar_t*, then using the Windows-specific wchar_t* functions. Anything which uses third-party library functions which take filenames as char* won't work.

We'd also need custom startup code which used main16(int argc, wchar_t **argv) as the entry point, converted all arguments to UTF-8, then called main(). We'd still have issues with reading filenames from files, stdin, or output from child processes, as these would either have to be in UTF-8 or would need to be converted to UTF-8 (which means that we'd need to know the encoding).
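The data flow of such a layer can be modelled in a few lines. This is only a sketch of the two conversions, not of the actual C code: wide strings are represented as UTF-16-LE byte buffers, and the helper names wide_to_utf8() and utf8_to_wide() are hypothetical.

```python
def wide_to_utf8(wide: bytes) -> bytes:
    """Startup side: turn a UTF-16-LE argument into the UTF-8 form
    that would be handed on to main()."""
    return wide.decode("utf-16-le").encode("utf-8")


def utf8_to_wide(narrow: bytes) -> bytes:
    """Filesystem side: turn a UTF-8 char*-style name back into
    UTF-16-LE before calling a wchar_t* function."""
    return narrow.decode("utf-8").encode("utf-16-le")


# Arguments as a wide entry point would receive them (illustrative values).
args = ["db.describe", "database=C:\\Users\\Māris\\sqlite.db"]
wide_argv = [arg.encode("utf-16-le") for arg in args]

# What main() would see after the startup conversion.
utf8_argv = [wide_to_utf8(arg) for arg in wide_argv]

# What the wrapped filesystem calls would pass to the wide API.
round_trip = [utf8_to_wide(arg) for arg in utf8_argv]
print(round_trip == wide_argv)
```

Because both UTF-8 and UTF-16 cover all of Unicode, the round trip is lossless regardless of the active codepage, which is what the codepage-based char* route cannot offer.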

in reply to:  description ; comment:2 by glynn, 10 years ago

Replying to marisn:

It seems that any operation touching the attribute database fails if the path contains a non-Latin letter. This is a single instance of a general problem with passing file names as arguments between the GUI and modules.

Do you have the same problem with filenames other than database files? E.g. do r.in.* or r.out.* work?

in reply to:  2 ; comment:3 by hellik, 10 years ago

Replying to glynn:

Replying to marisn:

It seems that any operation touching the attribute database fails if the path contains a non-Latin letter. This is a single instance of a general problem with passing file names as arguments between the GUI and modules.

Do you have the same problem with filenames other than database files? E.g. do r.in.* or r.out.* work?

r.out.gdal --verbose input=MRVBF4@user1 output=C:\wd\eudata\Māris\test.tif format=GTiff
Exception in thread Thread-278:
Traceback (most recent call last):
  File "C:\OSGeo4W\apps\Python27\lib\threading.py", line
810, in __bootstrap_inner
    self.run()
  File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\cor
e\gconsole.py", line 155, in run
    self.resultQ.put((requestId, self.requestCmd.run()))
  File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\cor
e\gcmd.py", line 575, in run
    env = self.env)
  File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\cor
e\gcmd.py", line 161, in __init__
    args = map(EncodeString, args)
  File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\cor
e\gcmd.py", line 92, in EncodeString
    return string.encode(_enc)
  File "C:\OSGeo4W\apps\Python27\lib\encodings\cp1252.py",
line 12, in encode
    return
codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character
u'\u0101' in position 21: character maps to <undefined>

in reply to:  2 comment:4 by hellik, 10 years ago

Replying to glynn:

Replying to marisn:

It seems that any operation touching the attribute database fails if the path contains a non-Latin letter. This is a single instance of a general problem with passing file names as arguments between the GUI and modules.

Do you have the same problem with filenames other than database files? E.g. do r.in.* or r.out.* work?

it doesn't work with

r.in.gdal --verbose input=C:\wd\eudata\Māris\test.tif output=testfile

in reply to:  3 comment:5 by glynn, 10 years ago

Replying to hellik:

  File "C:\OSGeo4W\apps\Python27\lib\encodings\cp1252.py",

Again, that character doesn't exist in cp1252. This isn't specific to GRASS; any portable C code will have exactly the same problems. The only way that you can open that file is to use Windows-specific functions (e.g. _wfopen() or CreateFileW()). Even passing it as an argument requires using wmain() instead of main().

I'm only interested in whether it works in a locale which uses cp1257. The fact that it doesn't work with cp1252 is a "wontfix".
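The difference between the two codepages can be checked directly. The sketch below uses the path from the r.out.gdal example; can_encode() is a hypothetical helper written for this illustration.

```python
def can_encode(text: str, codepage: str) -> bool:
    """Return True if every character of text exists in the codepage."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


path = "C:\\wd\\eudata\\Māris\\test.tif"

# "ā" (U+0101) has no slot in the Western European codepage...
in_cp1252 = can_encode(path, "cp1252")
# ...but it does exist in the Baltic one.
in_cp1257 = can_encode(path, "cp1257")

print(in_cp1252, in_cp1257)
```

The failure under cp1252 is the same UnicodeEncodeError reported in the traceback in comment 3.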

comment:6 by neteler, 9 years ago

Keywords: attribute added
Milestone: 7.0.0 → 7.0.3

comment:7 by neteler, 9 years ago

Milestone: 7.0.3

Ticket retargeted after milestone closed

comment:8 by neteler, 9 years ago

Milestone: 7.0.4

Ticket retargeted after 7.0.3 milestone closed

comment:9 by martinl, 8 years ago

Milestone: 7.0.4 → 7.0.5

comment:10 by neteler, 8 years ago

Milestone: 7.0.5 → 7.0.6

comment:11 by neteler, 7 years ago

Milestone: 7.0.6 → 7.0.7

comment:12 by martinl, 5 years ago

What is the status of this ticket?

comment:13 by sbl, 5 years ago

Milestone: 7.0.7 → 7.6.2

In 7.6, the state is actually worse, if the mapset contains non-ascii characters...

With the mapset "tøst", the GUI does not start up. Also, maps named tøst cannot be created:

v.edit map=test tool=create
WARNING: Illegal filename <u_tesø>. Character <?> not allowed.

in reply to:  13 comment:14 by marisn, 5 years ago

Replying to sbl:

In 7.6, the state is actually worse, if the mapset contains non-ascii characters...

Mapset names should not contain non-ASCII characters. This issue is about cases where the mapset name is OK, but the path leading to it contains non-ASCII characters.

Unfortunately, I could not test the issue, as GRASS failed to start on Windows at all due to #3837.
