Opened 11 years ago

Last modified 4 years ago

#745 new defect

r.report formatting doesn't seem to be multi-byte aware

Reported by: peifer
Owned by: grass-dev@…
Priority: minor
Milestone: 6.4.6
Component: Raster
Version: 6.4.0 RCs
Keywords:
Cc:
CPU: Unspecified
Platform: Linux

Description

| 126|DEG02 - Gera, Kreisfreie Stadt             |
| 127|ES617 - Málaga                            |
| 128|DE915 - Göttingen                         |
| 129|DE413 - Märkisch-Oderland                 |
| 130|CH023 - Solothurn                          |
| 131|UKI22 - Outer London - South               |
| 132|UKD42 - Blackpool                          |
| 133|GR124 - Pella                              |
| 134|ITC31 - Imperia                            |
| 135|DK022 - Vest- og Sydsjælland              |
| 136|NL422 - Midden-Limburg                     |
| 137|RO423 - Hunedoara                          |
| 138|DEA1D - Rhein-Kreis Neuss                  |
| 139|CH013 - Genève                            |
| 140|BE256 - Arr. Roeselare                     |
| 141|FR625 - Lot                                |

Change History (5)

comment:1 Changed 10 years ago by neteler

A similar issue has been reported for gawk and mawk. The Debian bug report includes suggestions:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=404980

comment:2 Changed 10 years ago by neteler

Milestone changed from 6.4.0 to 6.4.1

comment:3 in reply to: description; Changed 10 years ago by glynn

Replying to peifer:

The same issue applies to anything which attempts to format output containing user-supplied text (hard-coded text should all be ASCII).

The main problem with fixing this is that the necessary functions aren't available on all systems, so we will need either wrapper functions or a lot of #ifdef's.

An outline approach is to convert the string from multi-byte to wide with mbstowcs() or mbsrtowcs(), then either use wcslen() to find the number of characters in the wide string, or use wcswidth() to find the width (in columns) of the wide string.
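A minimal sketch of the conversion outlined above, assuming the program has already called setlocale(LC_CTYPE, "") so that the multi-byte functions follow the user's encoding. The helper names mb_to_wide(), mb_chars() and mb_columns() are hypothetical and are not existing GRASS functions. The sketch uses mbsrtowcs() for the length query, since passing a null destination to it is covered by the C standard.

```c
#define _XOPEN_SOURCE 700   /* expose wcswidth() in <wchar.h> on POSIX systems */
#include <locale.h>         /* setlocale(), to be called once by the program */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a multi-byte string to a newly allocated
 * wide string. Returns NULL if the input is not valid in the current
 * locale's encoding or if memory runs out. The caller frees the result. */
static wchar_t *mb_to_wide(const char *s)
{
    mbstate_t st;
    const char *p = s;
    size_t n;
    wchar_t *w;

    memset(&st, 0, sizeof st);
    n = mbsrtowcs(NULL, &p, 0, &st);    /* length query only */
    if (n == (size_t)-1)
        return NULL;

    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    p = s;
    memset(&st, 0, sizeof st);
    mbsrtowcs(w, &p, n + 1, &st);       /* actual conversion, with terminator */
    return w;
}

/* Number of characters (not bytes) in s, or -1 on error. */
static long mb_chars(const char *s)
{
    wchar_t *w = mb_to_wide(s);
    long n;

    if (w == NULL)
        return -1;
    n = (long)wcslen(w);
    free(w);
    return n;
}

/* Number of terminal columns occupied by s, or -1 on error
 * (wcswidth() itself returns -1 for non-printable characters). */
static int mb_columns(const char *s)
{
    wchar_t *w = mb_to_wide(s);
    int cols;

    if (w == NULL)
        return -1;
    cols = wcswidth(w, wcslen(w));
    free(w);
    return cols;
}
```

In a UTF-8 locale, a string such as "dôležité" occupies 11 bytes but should be reported as 8 characters by mb_chars(); mb_columns() is expected to agree, because none of its characters are full-width.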

wcswidth() correctly handles the "full-width" characters found in CJK locales, which occupy two columns. However, wcswidth() is POSIX while wcslen() is C99. None of the necessary functions are in C89.
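To illustrate the wrapper idea, here is a rough sketch of a single function that degrades gracefully depending on what the platform offers. HAVE_WCHAR_H and HAVE_WCSWIDTH are hypothetical configure-style macros, and text_columns() is a made-up name; none of them are taken from the GRASS build system.

```c
/* HAVE_WCHAR_H and HAVE_WCSWIDTH are hypothetical configure-style macros. */
#ifdef HAVE_WCSWIDTH
#define _XOPEN_SOURCE 700   /* expose wcswidth() on POSIX systems */
#endif
#include <stdlib.h>
#include <string.h>
#ifdef HAVE_WCHAR_H
#include <wchar.h>
#endif

/* Hypothetical wrapper: best available estimate of the width of s.
 * Columns if wcswidth() exists, characters if only the wide-character
 * conversion exists, bytes otherwise. */
static int text_columns(const char *s)
{
#ifdef HAVE_WCHAR_H
    mbstate_t st;
    const char *p = s;
    size_t n;

    memset(&st, 0, sizeof st);
    n = mbsrtowcs(NULL, &p, 0, &st);    /* character count, or (size_t)-1 */
    if (n != (size_t)-1) {
#ifdef HAVE_WCSWIDTH
        wchar_t *w = malloc((n + 1) * sizeof *w);

        if (w != NULL) {
            int cols;

            p = s;
            memset(&st, 0, sizeof st);
            mbsrtowcs(w, &p, n + 1, &st);
            cols = wcswidth(w, n);
            free(w);
            if (cols >= 0)
                return cols;            /* columns */
        }
#endif
        return (int)n;                  /* characters, not columns */
    }
#endif
    return (int)strlen(s);              /* bytes: same result as printf() */
}
```

With neither macro defined the function compiles down to strlen(), which reproduces today's behaviour, so callers could be switched over before any platform detection is in place.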

Also, this rules out relying upon printf() etc for formatting, as printf's width specifiers are in "char"s (i.e. bytes), not characters or columns.
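As a sketch of what replacing a printf() width specifier could look like, the following pads a left-aligned field by columns instead of bytes. The names display_columns() and format_left() are hypothetical, and the code assumes setlocale(LC_CTYPE, "") has been called by the program.

```c
#define _XOPEN_SOURCE 700   /* expose wcswidth() in <wchar.h> on POSIX systems */
#include <locale.h>         /* setlocale(), to be called once by the program */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: columns occupied by s, or -1 if it cannot be measured. */
static int display_columns(const char *s)
{
    mbstate_t st;
    const char *p = s;
    wchar_t *w;
    size_t n;
    int cols;

    memset(&st, 0, sizeof st);
    n = mbsrtowcs(NULL, &p, 0, &st);    /* length query only */
    if (n == (size_t)-1)
        return -1;

    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return -1;

    p = s;
    memset(&st, 0, sizeof st);
    mbsrtowcs(w, &p, n + 1, &st);
    cols = wcswidth(w, n);
    free(w);
    return cols;
}

/* Hypothetical stand-in for snprintf(buf, size, "%-*s", width, s):
 * writes s followed by enough spaces to fill `width` columns. */
static int format_left(char *buf, size_t size, const char *s, int width)
{
    int cols = display_columns(s);
    int pad;

    if (cols < 0)
        cols = (int)strlen(s);          /* fall back to counting bytes */
    pad = width > cols ? width - cols : 0;

    /* "%*s" with an empty string emits exactly `pad` spaces. */
    return snprintf(buf, size, "%s%*s", s, pad, "");
}
```

For an ASCII string the result is identical to "%-*s". For multi-byte text in a UTF-8 locale the padding is computed from the column count, so the closing table border should line up regardless of how many bytes the label occupies.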

BTW, regarding gawk/mawk: as the ticket notes, POSIX explicitly states that awk matches the behaviour of printf, i.e. field widths are in bytes, not characters/columns.

comment:4 in reply to: 3; Changed 10 years ago by peifer

Replying to glynn:

Thanks for the explanations; the situation seems to be complex. It might not be worth investing too much time in this just to get nicer "pretty print" functionality.

> BTW, regarding gawk/mawk: as the ticket notes, POSIX explicitly states that awk matches the behaviour of printf, i.e. field widths are in bytes, not characters/columns.

Gawk maintainer Arnold Robbins changed gawk's printf behaviour after I pointed him to the issue, some 2 years ago. (Needless to say, he knows that POSIX specifies it differently.)

# bash built-in printf
[peifer:~]> printf "%-12s|\n" "dôležité"
dôležité |

# The other printf
[peifer:~]> /usr/bin/printf "%-12s|\n" "dôležité"
dôležité |

# Gawk 3.1.7 and higher
[peifer:~]> gawk 'BEGIN{ printf "%-12s|\n", "dôležité" }'
dôležité    |

comment:5 Changed 4 years ago by neteler

Milestone changed from 6.4.1 to 6.4.6