Opened 15 years ago
Last modified 9 years ago
#745 new defect
r.report formatting doesn't seem to be multi-byte aware
Reported by: | peifer | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | 6.4.6 |
Component: | Raster | Version: | 6.4.0 RCs |
Keywords: | | Cc: | |
CPU: | Unspecified | Platform: | Linux |
Description
| 126|DEG02 - Gera, Kreisfreie Stadt |
| 127|ES617 - Málaga |
| 128|DE915 - Göttingen |
| 129|DE413 - Märkisch-Oderland |
| 130|CH023 - Solothurn |
| 130|CH023 - Solothurn |
| 131|UKI22 - Outer London - South |
| 132|UKD42 - Blackpool |
| 133|GR124 - Pella |
| 134|ITC31 - Imperia |
| 135|DK022 - Vest- og Sydsjælland |
| 136|NL422 - Midden-Limburg |
| 137|RO423 - Hunedoara |
| 138|DEA1D - Rhein-Kreis Neuss |
| 139|CH013 - Genève |
| 140|BE256 - Arr. Roeselare |
| 141|FR625 - Lot |
Change History (5)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Milestone: | 6.4.0 → 6.4.1 |
---|
follow-up: 4 comment:3 by , 14 years ago
Replying to peifer:
The same issue applies to anything which attempts to format output containing user-supplied text (hard-coded text should all be ASCII).
The main problem with fixing this is that the necessary functions aren't available on all systems, so we will need either wrapper functions or a lot of #ifdef's.
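A wrapper along these lines is one way to keep the #ifdef's in a single place. It is only a sketch: text_columns() and the HAVE_WCSWIDTH macro are hypothetical names (not existing GRASS code), and the 256-character buffer is an arbitrary limit. Without the macro, or if conversion fails, it falls back to the byte count, i.e. the current behaviour.

```c
#define _XOPEN_SOURCE 700   /* request wcswidth() on systems that have it */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define TEXT_BUF_LEN 256    /* arbitrary limit chosen for this sketch */

/*
 * Hypothetical wrapper: number of terminal columns needed for s.
 * HAVE_WCSWIDTH is assumed to be set by a configure-time check.
 */
int text_columns(const char *s)
{
#ifdef HAVE_WCSWIDTH
    wchar_t buf[TEXT_BUF_LEN];
    size_t n = mbstowcs(buf, s, TEXT_BUF_LEN);

    /* (size_t)-1 means an invalid multi-byte sequence;
       n == TEXT_BUF_LEN means the string did not fit */
    if (n != (size_t)-1 && n < TEXT_BUF_LEN) {
        int w = wcswidth(buf, n);

        if (w >= 0)         /* -1 means a non-printable character */
            return w;
    }
#endif
    /* Fallback: one column per byte, as printf() would assume */
    return (int)strlen(s);
}
```

Callers would then use text_columns() wherever strlen() is currently used to work out padding.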
An outline approach is to convert the string from multi-byte to wide with mbstowcs() or mbsrtowcs(), then either use wcslen() to find the number of characters in the wide string, or use wcswidth() to find the width (in columns) of the wide string.
wcswidth() correctly handles the "full-width" characters found in CJK locales, which occupy two columns. However, wcswidth() is POSIX while wcslen() is C99. None of the necessary functions are in C89.
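To make the difference between bytes, characters and columns concrete, here is a minimal sketch. The names measure_text() and struct text_size are made up for illustration, and it assumes a POSIX system where mbstowcs() accepts a NULL destination and wcswidth() is available.

```c
#define _XOPEN_SOURCE 700   /* needed for wcswidth() on some systems */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical result type: three different "lengths" of one string */
struct text_size {
    size_t bytes;    /* strlen(): what printf() width specifiers count */
    size_t chars;    /* wcslen(): number of characters */
    int columns;     /* wcswidth(): terminal columns, -1 if not printable */
};

/* Returns 0 on success, -1 on an invalid sequence or allocation failure. */
int measure_text(const char *s, struct text_size *out)
{
    /* With a NULL destination, POSIX mbstowcs() only reports how many
       wide characters the conversion would produce */
    size_t n = mbstowcs(NULL, s, 0);
    wchar_t *w;

    if (n == (size_t)-1)
        return -1;

    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return -1;

    mbstowcs(w, s, n + 1);          /* n + 1 leaves room for the terminator */

    out->bytes = strlen(s);
    out->chars = wcslen(w);
    out->columns = wcswidth(w, n);

    free(w);
    return 0;
}
```

The conversion follows the current LC_CTYPE locale, so a program would need to call setlocale(LC_CTYPE, "") first. In a UTF-8 locale, "Málaga" with a precomposed á should then come out as 7 bytes, 6 characters and 6 columns.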
Also, this rules out relying upon printf() etc. for formatting, as printf's width specifiers count "char"s (i.e. bytes), not characters or columns.
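One way around the printf() limitation is to compute the padding by hand and only let printf-style functions emit the already-measured pieces. The following is a sketch under the same POSIX assumptions as above; format_left() and display_columns() are hypothetical helper names, not part of any existing library.

```c
#define _XOPEN_SOURCE 700   /* needed for wcswidth() on some systems */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Columns occupied by s; falls back to the byte count if the string
   cannot be converted or contains non-printable characters. */
static int display_columns(const char *s)
{
    int cols = -1;
    size_t n = mbstowcs(NULL, s, 0);   /* POSIX: length only, nothing stored */

    if (n != (size_t)-1) {
        wchar_t *w = malloc((n + 1) * sizeof *w);

        if (w != NULL) {
            mbstowcs(w, s, n + 1);
            cols = wcswidth(w, n);
            free(w);
        }
    }
    return cols >= 0 ? cols : (int)strlen(s);
}

/*
 * Write s into out, left-justified and padded with spaces to at least
 * `columns` terminal columns. Returns what snprintf() returns.
 */
int format_left(char *out, size_t outsz, const char *s, int columns)
{
    int w = display_columns(s);
    int pad = columns > w ? columns - w : 0;

    /* "%*s" with an empty string prints exactly `pad` spaces */
    return snprintf(out, outsz, "%s%*s", s, pad, "");
}
```

The padding is derived from the column count rather than the byte count, so the result no longer depends on how many bytes each character takes, provided setlocale(LC_CTYPE, "") has been called so that the conversion uses the user's locale.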
BTW, regarding gawk/mawk: as the ticket notes, POSIX explicitly states that awk matches the behaviour of printf, i.e. field widths are in bytes, not characters/columns.
comment:4 by , 14 years ago
Replying to glynn:
Thanks for the explanations; the situation seems to be complex. It might not be worth investing too much time in this just to get nicer "pretty print" functionality.
> BTW, regarding gawk/mawk: as the ticket notes, POSIX explicitly states that awk matches the behaviour of printf, i.e. field widths are in bytes, not characters/columns.
Gawk maintainer Arnold Robbins changed gawk's printf behaviour after I pointed him to the issue, some 2 years ago. (Needless to say, he knows that POSIX specifies it differently.)
# bash built-in printf
[peifer:~]> printf "%-12s|\n" "dôležité"
dôležité |
# The other printf
[peifer:~]> /usr/bin/printf "%-12s|\n" "dôležité"
dôležité |
# Gawk 3.1.7 and higher
[peifer:~]> gawk 'BEGIN{ printf "%-12s|\n", "dôležité" }'
dôležité |
comment:5 by , 9 years ago
Milestone: | 6.4.1 → 6.4.6 |
---|
A similar issue has been reported for gawk and mawk. That ticket includes suggestions:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=404980