Opened 10 years ago
Last modified 5 years ago
#2525 new defect
Unable to open sqlite database if path contains non-latin letters
Reported by: | marisn | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 7.6.2 |
Component: | wxGUI | Version: | svn-releasebranch70 |
Keywords: | attribute | Cc: | |
CPU: | Unspecified | Platform: | MSWindows Vista |
Description
It seems that any operation touching the attribute database fails if the path contains a non-Latin letter. This is a single instance of the general problem of passing file names as arguments between the GUI and modules. Output in the CMD window:
GRASS_INFO_WARNING(5668,2): Unable open database <C:\Users\Māris\Documents\grass data\nc_basic_spm_grass7\PERMANENT\sqlite\sqlite.db> by driver <sqlite> GRASS_INFO_END(5668,2)
One of the outputs in the wxGUI command console:
Exception in thread Thread-26:
Traceback (most recent call last):
  File "C:\Program Files\GRASS GIS 7.0.0svn\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Program Files\GRASS GIS 7.0.0svn\gui\wxpython\gui_core\forms.py", line 374, in run
    self.resultQ.put((requestId, self.request.run()))
  File "C:\Program Files\GRASS GIS 7.0.0svn\gui\wxpython\gui_core\forms.py", line 289, in run
    cparams[map]['dbInfo'] = gselect.VectorDBInfo(map)
  File "C:\Program Files\GRASS GIS 7.0.0svn\gui\wxpython\gui_core\gselect.py", line 743, in __init__
    self._DescribeTables() # -> self.tables
  File "C:\Program Files\GRASS GIS 7.0.0svn\gui\wxpython\gui_core\gselect.py", line 770, in _DescribeTables
    database = self.layers[layer]["database"])['cols']:
  File "C:\Program Files\GRASS GIS 7.0.0svn\etc\python\grass\script\db.py", line 43, in db_describe
    s = read_command('db.describe', flags='c', table=table, **args)
  File "C:\Program Files\GRASS GIS 7.0.0svn\etc\python\grass\script\core.py", line 425, in read_command
    return handle_errors(returncode, stdout, args, kwargs)
  File "C:\Program Files\GRASS GIS 7.0.0svn\etc\python\grass\script\core.py", line 308, in handle_errors
    returncode=returncode)
CalledModuleError: Module run None ['db.describe', '-c', 'table=census', 'driver=sqlite', 'database=C:\\Users\\M\xe2ris\\Documents\\grassdata\\nc_basic_spm_grass7\\PERMANENT\\sqlite\\sqlite.db'] ended with error
Process ended with non-zero return code 1. See errors in the (error) output.
GRASS version: 7.0.0svn
GRASS SVN Revision: 63925
Build Date: 2015-01-02
Build Platform: i686-pc-mingw32
GDAL/OGR: 1.11.1
PROJ.4: 4.8.0
GEOS: 3.4.2
SQLite: 3.7.17
Python: 2.7.4
wxPython: 2.8.12.1
Platform: Windows-Vista-6.0.6002-SP2
Note: could CommandLineToArgvW be helpful?
Change History (14)
comment:1 by , 10 years ago
follow-ups: 3 4 comment:2 by , 10 years ago
Replying to marisn:
It seems that any operation touching the attribute database fails if the path contains a non-Latin letter. This is a single instance of the general problem of passing file names as arguments between the GUI and modules.
Do you have the same problem with filenames other than database files? E.g. do r.in.* or r.out.* work?
follow-up: 5 comment:3 by , 10 years ago
Replying to glynn:
Replying to marisn:
It seems that any operation touching the attribute database fails if the path contains a non-Latin letter. This is a single instance of the general problem of passing file names as arguments between the GUI and modules.
Do you have the same problem with filenames other than database files? E.g. do r.in.* or r.out.* work?
r.out.gdal --verbose input=MRVBF4@user1 output=C:\wd\eudata\Māris\test.tif format=GTiff
Exception in thread Thread-278:
Traceback (most recent call last):
  File "C:\OSGeo4W\apps\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\core\gconsole.py", line 155, in run
    self.resultQ.put((requestId, self.requestCmd.run()))
  File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\core\gcmd.py", line 575, in run
    env = self.env)
  File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\core\gcmd.py", line 161, in __init__
    args = map(EncodeString, args)
  File "C:\OSGeo4W\apps\grass\grass-7.1.svn\gui\wxpython\core\gcmd.py", line 92, in EncodeString
    return string.encode(_enc)
  File "C:\OSGeo4W\apps\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0101' in position 21: character maps to <undefined>
comment:4 by , 10 years ago
Replying to glynn:
Replying to marisn:
It seems that any operation touching the attribute database fails if the path contains a non-Latin letter. This is a single instance of the general problem of passing file names as arguments between the GUI and modules.
Do you have the same problem with filenames other than database files? E.g. do r.in.* or r.out.* work?
It doesn't work with r.in.gdal either:
r.in.gdal --verbose input=C:\wd\eudata\Māris\test.tif output=testfile
comment:5 by , 10 years ago
Replying to hellik:
File "C:\OSGeo4W\apps\Python27\lib\encodings\cp1252.py",
Again, that character doesn't exist in cp1252. This isn't specific to GRASS; any portable C code will have exactly the same problems. The only way that you can open that file is to use Windows-specific functions (e.g. _wfopen() or CreateFileW()). Even passing it as an argument requires using wmain() instead of main().
I'm only interested in whether it works in a locale which uses cp1257. The fact that it doesn't work with cp1252 is a "wontfix".
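A minimal Python 3 sketch of the point above, using the argument from the r.out.gdal example: strict encoding to cp1252 fails on U+0101 (ā), just as `EncodeString()` did, while cp1257 maps ā to the single byte 0xE2 (also the byte shown as `\xe2` in the `db.describe` traceback). The helper `try_encode` is illustrative only, not GRASS code.

```python
# Minimal sketch: can the argument from the r.out.gdal example be represented
# in a given Windows "ANSI" code page? (Python 3; the path is illustrative.)
arg = "output=C:\\wd\\eudata\\Māris\\test.tif"

def try_encode(text: str, codec: str):
    """Return text encoded with codec, or None if codec cannot represent it."""
    try:
        return text.encode(codec)  # default errors="strict", like EncodeString()
    except UnicodeEncodeError as err:
        print(f"{codec}: cannot encode {err.object[err.start]!r} "
              f"at position {err.start}: {err.reason}")
        return None

for codec in ("cp1252", "cp1257", "utf-8"):
    encoded = try_encode(arg, codec)
    if encoded is not None:
        print(f"{codec}: {encoded!r}")
```

For cp1252 the sketch reports position 21, matching the traceback, because the seven characters of `output=` precede the path.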
comment:6 by , 9 years ago
Keywords: | attribute added |
---|---|
Milestone: | 7.0.0 → 7.0.3 |
comment:9 by , 8 years ago
Milestone: | 7.0.4 → 7.0.5 |
---|
comment:10 by , 8 years ago
Milestone: | 7.0.5 → 7.0.6 |
---|
comment:11 by , 7 years ago
Milestone: | 7.0.6 → 7.0.7 |
---|
follow-up: 14 comment:13 by , 5 years ago
Milestone: | 7.0.7 → 7.6.2 |
---|
In 7.6 the situation is actually worse if the mapset name contains non-ASCII characters.
With the mapset "tøst", the GUI does not start. Maps named "tøst" cannot be created either:
v.edit map=test tool=create
WARNING: Illegal filename <u_tesø>. Character <?> not allowed.
comment:14 by , 5 years ago
Replying to sbl:
In 7.6 the situation is actually worse if the mapset name contains non-ASCII characters.
Mapset names should not contain non-ASCII characters. This issue is about cases where the mapset name is fine but the path leading to it contains non-ASCII characters.
Unfortunately, I could not test the issue, as GRASS failed to start on Windows at all due to #3837.
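To illustrate the distinction drawn above (restricted names versus unrestricted paths), here is a hypothetical Python check, assuming names are limited to printable ASCII minus a few separator characters chosen for illustration. It is not GRASS's actual rule; it only shows why "tøst" is rejected as a name, and why such a check must not be applied to whole paths, which may legitimately contain spaces or letters such as ā.

```python
# Hypothetical name check, assuming names must consist of printable ASCII
# characters other than a few separators. The separator set is illustrative
# only; this is NOT GRASS's own legal-name rule.
SEPARATORS = set('/"\'@,=*')

def first_illegal_char(name: str):
    """Return the first character that makes `name` illegal, or None if it is OK."""
    for ch in name:
        # "!" .. "~" is the printable ASCII range without the space character.
        if not ("!" <= ch <= "~") or ch in SEPARATORS:
            return ch
    return None

for name in ("PERMANENT", "user1", "tøst", "grass data"):
    bad = first_illegal_char(name)
    status = "OK" if bad is None else f"illegal character {bad!r}"
    print(f"{name!r}: {status}")
```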
Replying to marisn:
What we need is the reverse: something which reliably converts argv[] to a command string.
We actually have one of those (make_command_line() in lib/gis/spawn.c), and Python also has one (list2cmdline() in the subprocess module). The problem is that both of these only reverse the parsing which is done by the executable itself, not that done by the shell. The shell's parsing rules are even less well documented than those of the executable, and even less sane.
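A short Python 3 sketch of the behaviour described above, using `subprocess.list2cmdline()` (importable from `subprocess`, although it is not documented as a public API). It quotes arguments for the Microsoft C runtime's argv parser, but leaves cmd.exe metacharacters such as `%` and `^` untouched. The arguments are illustrative.

```python
import subprocess

# Arguments as a GUI might pass them to a module (paths are illustrative).
args = [
    "db.describe", "-c", "table=census", "driver=sqlite",
    "database=C:\\Users\\Māris\\Documents\\grass data\\sqlite.db",
]

# list2cmdline() applies the MS C runtime quoting rules: arguments containing
# whitespace (or empty ones) are wrapped in double quotes, and backslashes are
# doubled only where they precede a double quote, including the closing one.
print(subprocess.list2cmdline(args))
print(subprocess.list2cmdline(["C:\\grass data\\"]))  # trailing backslash doubled

# It does NOT escape anything for cmd.exe: %VAR% and ^ pass through unchanged,
# so a shell would still expand or strip them.
print(subprocess.list2cmdline(["echo", "%USERPROFILE%", "a^b"]))
```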
The other issue is that the shell uses two different encodings (codepages): "ANSI" and "OEM". Most of the time this doesn't matter; you can just pass the byte strings straight through. But there are cases (such as using the FOR command with backticks to take process output and use it as an argument) where this doesn't work, and any character which doesn't have the same codepoint in both encodings will cause problems (problems which can't realistically be solved).
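A minimal sketch of the ANSI/OEM mismatch, assuming a Latvian Windows installation whose ANSI code page is cp1257 and whose OEM code page is cp775 (this pairing is an assumption for illustration). The same name encodes to different bytes, and reading ANSI bytes as OEM yields different text without any error being raised.

```python
# Minimal sketch, assuming a Latvian Windows setup whose "ANSI" code page is
# cp1257 and whose "OEM" (console) code page is cp775.
name = "Māris"

ansi_bytes = name.encode("cp1257")  # bytes as produced under the ANSI code page
oem_bytes = name.encode("cp775")    # bytes as produced under the OEM code page
print(ansi_bytes, oem_bytes)        # same text, different byte for "ā"

# Bytes produced under one code page but read under the other are silently
# turned into different text. errors="replace" only guards against bytes the
# target code page leaves unmapped.
misread = ansi_bytes.decode("cp775", errors="replace")
print(misread, misread == name)
```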
As for filenames, the main issues are
Neither of these has any simple solution (not even unreliable "hacks"). The only effective solution is to use the Unicode (i.e. wchar_t*) API.
In practical terms, that would mean writing a compatibility layer which re-implements all of the standard ANSI C and POSIX filesystem calls, taking UTF-8 char* arguments, converting to UTF-16 wchar_t*, then using the Windows-specific wchar_t* functions. Anything which uses third-party library functions which take filenames as char* won't work.
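The core of such a compatibility layer is a conversion between UTF-8 `char*` and UTF-16 `wchar_t*`. In C on Windows this would typically be done with `MultiByteToWideChar()` and `WideCharToMultiByte()`; the Python sketch below only models the byte-level transformation. The helper names `utf8_to_wide` and `wide_to_utf8` are hypothetical.

```python
# Illustration (hypothetical helper names) of the conversion a UTF-8 compatibility
# layer would perform before calling a wide-character API such as _wfopen().
def utf8_to_wide(path_utf8: bytes) -> bytes:
    """Convert a UTF-8 char* path into a NUL-terminated UTF-16LE wchar_t* buffer."""
    text = path_utf8.decode("utf-8")  # strict: invalid UTF-8 raises UnicodeDecodeError
    # wchar_t is 16 bits on Windows; characters outside the BMP become surrogate pairs.
    return text.encode("utf-16-le") + b"\x00\x00"

def wide_to_utf8(buf: bytes) -> bytes:
    """Inverse direction, e.g. for names returned by a wide directory-listing API."""
    if buf[-2:] == b"\x00\x00":  # drop the terminating NUL code unit, if present
        buf = buf[:-2]
    return buf.decode("utf-16-le").encode("utf-8")

path = "C:\\Users\\Māris\\Documents\\sqlite.db".encode("utf-8")
wide = utf8_to_wide(path)
print(len(path), len(wide))        # byte lengths of the two representations
print(wide_to_utf8(wide) == path)  # the round trip preserves the path
```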
We'd also need custom startup code which used main16(int argc, wchar_t **argv) as the entry point, converted all arguments to UTF-8, then called main(). We'd still have issues with reading filenames from files, stdin, or output from child processes, as these would either have to be in UTF-8 or would need to be converted to UTF-8 (which means that we'd need to know the encoding).
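A Python sketch of the startup step described above, modelling the wide-character arguments that a wmain()-style entry point receives as Python `str`. `run_with_utf8_argv`, `filename_from_stream` and `demo_main` are hypothetical names; the second helper illustrates why filenames read from streams can only be converted when their encoding is known.

```python
# Hypothetical startup shim: a wide entry point receives UTF-16 arguments,
# modelled here as Python str, converts each to UTF-8 and calls an unchanged
# byte-oriented main().
from typing import Callable, List

def run_with_utf8_argv(main: Callable[[List[bytes]], int], wargv: List[str]) -> int:
    """Convert wide arguments to UTF-8 bytes and delegate to a char*-style main()."""
    # Strict encoding: an unpaired surrogate (possible in Windows names) raises.
    argv = [arg.encode("utf-8") for arg in wargv]
    return main(argv)

def filename_from_stream(raw: bytes, encoding: str) -> bytes:
    """Re-encode a filename read from a file, stdin or a child process as UTF-8.
    This only works if the caller knows the encoding the data was written in."""
    return raw.decode(encoding).rstrip("\r\n").encode("utf-8")

def demo_main(argv: List[bytes]) -> int:
    """Stand-in for a module's main(): print the UTF-8 arguments it receives."""
    for arg in argv:
        print(arg)
    return 0

rc = run_with_utf8_argv(demo_main, ["r.in.gdal", "input=C:\\wd\\eudata\\Māris\\test.tif"])
print(rc)
# The same path arriving on stdin as cp1257 bytes must be converted explicitly:
print(filename_from_stream(b"C:\\wd\\eudata\\M\xe2ris\\test.tif\r\n", "cp1257"))
```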