Opened 6 years ago

Closed 5 years ago

#2087 closed defect (fixed)

grass64 man page: missing words

Reported by: hamish
Owned by: grass-dev@…
Priority: critical
Milestone: 6.4.4
Component: Docs
Version: 6.4.3
Keywords: g.html2man
Cc:
CPU: All
Platform: Linux

Description

Hi,

the current build of GRASS 6.x is losing important information in the man page due to a g.html2man error.

the source file is lib/init/grass6.html

the html looks like:

<h2>SYNOPSIS</h2>

<b>grass64</b> [<b>-</b>] [<b>-v</b>] [<b>-h | -help | --help</b>]
    [<b>-text | -gui | -tcltk | -oldtcltk | -wxpython | -wx]</b>]
    [[[<b>&lt;GISDBASE&gt;/</b>]<b>&lt;LOCATION_NAME&gt;/</b>]
        <b>&lt;MAPSET&gt;</b>]

the resulting man page looks like:

.SH SYNOPSIS
\fBgrass64\fR [\fB-\fR] [\fB-v\fR] [\fB-h | -help | --help\fR]
[\fB-text | -gui | -tcltk | -oldtcltk | -wxpython | -wx]\fR]
[[[\fB/\fR]\fB/\fR]
\fB\fR]

i.e. html:

SYNOPSIS

grass64 [-] [-v] [-h | -help | --help] [-text | -gui | -tcltk | -oldtcltk | -wxpython | -wx]] [[[<GISDBASE>/]<LOCATION_NAME>/] <MAPSET>] 

and man:

SYNOPSIS
       grass65  [-]  [-v] [-h | -help | --help] [-text | -gui | -tcltk |
       -oldtcltk | -wxpython | -wx]] [[[/]/] ]

... the [[<GISDBASE>/]<LOCATION_NAME>/] <MAPSET>] part has lost its words, even though the entities &lt; and &gt; were used rather than anything that could be mistaken for an <html tag>. Is the DoEscape subroutine converting '&lt;' to '<' and '&gt;' to '>' before any unknown HTML tags are thrown away? If so, that conversion should be moved to after the tag removal; see lines 136 and 141:

https://trac.osgeo.org/grass/browser/grass/branches/develbranch_6/tools/g.html2man/g.html2man#L110

?

thanks, Hamish
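
The suspected ordering problem can be reproduced outside the Perl script. The following is a minimal Python sketch, not the g.html2man code itself: the function names are hypothetical, the regular expression mirrors the `s#<[^>]*>##g` substitution, and roff font markup is left out so that only the order of the two steps matters.

```python
import re
from html import unescape

# Same pattern as the Perl substitution s#<[^>]*>##g
TAG_RE = re.compile(r"<[^>]*>")


def unescape_then_strip(line: str) -> str:
    """Problematic order: entities become literal < and >, then get stripped as tags."""
    return TAG_RE.sub("", unescape(line))


def strip_then_unescape(line: str) -> str:
    """Safer order: remove real tags first, decode entities afterwards."""
    return unescape(TAG_RE.sub("", line))


html_line = "[[[<b>&lt;GISDBASE&gt;/</b>]<b>&lt;LOCATION_NAME&gt;/</b>]"

print(unescape_then_strip(html_line))  # [[[/]/]
print(strip_then_unescape(html_line))  # [[[<GISDBASE>/]<LOCATION_NAME>/]
```

The first result matches the truncated text seen in the man page; the second keeps the placeholder names.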

Attachments (1)

g.html2man.diff (350 bytes) - added by mlennert 6 years ago.


Change History (12)

comment:1 Changed 6 years ago by neteler

Also in GRASS 7, the final part which is in HTML

... -wxpython | -wx]] [[[<GISDBASE>/]<LOCATION_NAME>/] 

becomes in MAN:

... -wxpython | -wx]] [[[/]/] ]

comment:2 in reply to:  1 Changed 6 years ago by mlennert

Replying to neteler:

Also in GRASS 7, the final part which is in HTML

... -wxpython | -wx]] [[[<GISDBASE>/]<LOCATION_NAME>/] 

becomes in MAN:

... -wxpython | -wx]] [[[/]/] ]

I cannot confirm this. With a freshly checked out and compiled grass_trunk, I get:

       grass71  [-h  |  -help | --help] [-v | --version] [-c | -c geofile | -c
       EPSG:code] [-text |  -gtext  |  -gui]  [[[<GISDBASE>/]<LOCATION_NAME>/]
       <MAPSET>]

I can confirm it for grass64_release, though:

       grass64  [-] [-v] [-h | -help | --help] [-text | -gui | -tcltk | -oldt‐
       cltk | -wxpython | -wx]] [[[/]/] ]

Moritz

comment:3 Changed 6 years ago by mlennert

The problem seems to be in the function DoLine, lines 136ff:

  &DoEscape($_);
  &DoPara($_);
  if (! $preformat) {
    if (m/^$/) {return 0};
    s#^[ \t]*##;
    s#<[^>]*>##g;

DoEscape is called first, which replaces &lt; and &gt; with the respective symbols; then, in the last line quoted above, these symbols and everything between them are replaced by an empty string. Commenting out that last line, i.e. s#<[^>]*>##g;, solves the problem for grass6.html, but I don't know what other effects this has.

Moritz

comment:4 in reply to:  3 ; Changed 6 years ago by mlennert

Replying to mlennert:

The problem seems to be in the function DoLine, lines 136ff:

  &DoEscape($_);
  &DoPara($_);
  if (! $preformat) {
    if (m/^$/) {return 0};
    s#^[ \t]*##;
    s#<[^>]*>##g;

DoEscape is called first, which replaces &lt; and &gt; with the respective symbols; then, in the last line quoted above, these symbols and everything between them are replaced by an empty string. Commenting out that last line solves the problem for grass6.html, but I don't know what other effects this has.

It leaves a series of HTML tags in the output. So the trick will be to erase all of these tags without erasing the <> around the variable names. That said, do we really need those?

Moritz
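
One way to erase the tags while keeping the <> around variable names is to strip only tag names that are known to be HTML. The sketch below is an illustration in Python, not the attached diff and not the Perl script; `KNOWN_TAGS` is a hypothetical allowlist, and the set of tags actually used in the GRASS HTML pages may differ.

```python
import re
from html import unescape

# Hypothetical allowlist of HTML tag names; adjust to the tags the pages really use.
KNOWN_TAGS = ("a", "b", "i", "em", "strong", "code", "tt", "p", "br", "pre",
              "h1", "h2", "h3", "ul", "ol", "li", "dl", "dt", "dd")

# Matches opening and closing forms of the known tags only, e.g. <b>, </b>, <a href="...">
KNOWN_TAG_RE = re.compile(
    r"</?(?:%s)\b[^>]*>" % "|".join(KNOWN_TAGS), re.IGNORECASE
)


def strip_known_tags(line: str) -> str:
    """Remove known HTML tags and leave any other <...> text untouched."""
    return KNOWN_TAG_RE.sub("", line)


# Entities are decoded first, as DoEscape does, and the placeholder still survives.
decoded = unescape("[<b>-v</b>] <b>&lt;MAPSET&gt;</b>]")
print(strip_known_tags(decoded))  # [-v] <MAPSET>]
```

The obvious limitation is that a placeholder whose name happens to equal a tag name, such as `<CODE>`, would still be removed.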

Changed 6 years ago by mlennert

Attachment: g.html2man.diff added

comment:5 Changed 6 years ago by mlennert

I've attached a very quick and dirty hack that solves this specific issue for me. I don't find it particularly elegant, though. Maybe someone with more perl/regex foo can find a better solution.

Moritz

comment:6 in reply to:  4 Changed 6 years ago by wenzeslaus

Replying to mlennert:

without erasing the <> around the variable names. That said, do we really need those?

I would say no. We still follow man page formatting in the HTML and I don't think that < and > are part of it. For example, this is my man grep:

grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

By the way, I'm still not sure whether parsing whole HTML pages is a good idea. If I were starting from scratch, I would probably use the module's HTML stub and XML interface description, because XML is easier to parse than HTML tag soup. But since we already have the parsing, and the Makefiles are also designed around parsing whole HTML pages, it is probably not worth trying.

Even more by the way, Moritz, you can mark and escape inline code here using backticks `#forexample` or even {{{ and }}} should work inline. New Trac has automatic preview, so we have hope (http://trac.edgewall.org/ticket/8855 and http://trac.edgewall.org/ticket/8721).

comment:7 in reply to:  5 ; Changed 6 years ago by glynn

Replying to mlennert:

Maybe someone with more perl/regex foo can find a better solution.

Does using g.html2man.py from GRASS 7 qualify?

The main drawbacks are that it makes Python a build-time dependency (but eliminates the Perl dependency), and may require some clean-up of the HTML files (the Python version will fail hard on invalid HTML).
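
A parser-based converter avoids the problem because entities arrive as text and never pass through a tag-stripping regex. The following sketch uses the Python standard library's `html.parser` and is only an illustration of that idea: it is not g.html2man.py, the class and function names are hypothetical, and unlike the behaviour described above, `html.parser` is lenient about invalid HTML.

```python
from html.parser import HTMLParser


class SynopsisToRoff(HTMLParser):
    """Hypothetical converter: maps <b> to roff bold and keeps text as-is."""

    def __init__(self):
        # convert_charrefs=True decodes &lt; / &gt; before handle_data is called
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.parts.append(r"\fB")

    def handle_endtag(self, tag):
        if tag == "b":
            self.parts.append(r"\fR")

    def handle_data(self, data):
        # Decoded entities are ordinary text here, so <GISDBASE> is preserved.
        self.parts.append(data)


def to_roff(fragment: str) -> str:
    parser = SynopsisToRoff()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(to_roff("[<b>&lt;GISDBASE&gt;/</b>]"))  # [\fB<GISDBASE>/\fR]
```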

comment:8 in reply to:  7 ; Changed 6 years ago by wenzeslaus

Replying to glynn:

Replying to mlennert:

Maybe someone with more perl/regex foo can find a better solution.

Does using g.html2man.py from GRASS 7 qualify?

The main drawbacks are that it makes Python a build-time dependency (but eliminates the Perl dependency), and may require some clean-up of the HTML files (the Python version will fail hard on invalid HTML).

I would just remove the problematic < and > and leave GRASS 6 (core) without Python (build) dependency. (We have two versions of GRASS, let's keep them different from each other.)

comment:9 in reply to:  8 ; Changed 6 years ago by mlennert

Replying to wenzeslaus:

Replying to glynn:

Replying to mlennert:

Maybe someone with more perl/regex foo can find a better solution.

Does using g.html2man.py from GRASS 7 qualify?

The main drawbacks are that it makes Python a build-time dependency (but eliminates the Perl dependency), and may require some clean-up of the HTML files (the Python version will fail hard on invalid HTML).

I would just remove the problematic < and > and leave GRASS 6 (core) without Python (build) dependency.

As there were no objections to this, I took the liberty to just erase these symbols from the file. The resulting html page and man file appear easily readable to me and I don't think that this issue warrants changing g.html2man.

Leaving this ticket open for now in case anyone objects now or in case someone sees the same problem in another man page.

Moritz
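
To check whether another page is affected in the same way, the HTML sources can be scanned for entity-escaped angle-bracket placeholders. This is a small Python sketch under the assumption that such placeholders always take the form `&lt;NAME&gt;`; the function name and the sample text are illustrative only.

```python
import re

# An escaped placeholder such as &lt;MAPSET&gt; with no nested markup inside
ESCAPED_PLACEHOLDER_RE = re.compile(r"&lt;[^&<>]*&gt;")


def find_escaped_placeholders(text: str):
    """Return (line_number, match) pairs for every &lt;...&gt; span in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ESCAPED_PLACEHOLDER_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = """<h2>SYNOPSIS</h2>
<b>grass64</b> [<b>-v</b>]
    [[[<b>&lt;GISDBASE&gt;/</b>]<b>&lt;LOCATION_NAME&gt;/</b>]
        <b>&lt;MAPSET&gt;</b>]"""

for lineno, text in find_escaped_placeholders(sample):
    print(lineno, text)
```

Any page that produces hits would lose those words in its man page as long as g.html2man decodes entities before stripping tags.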

comment:10 in reply to:  9 Changed 6 years ago by mlennert

Replying to mlennert:

Replying to wenzeslaus:

Replying to glynn:

Replying to mlennert:

Maybe someone with more perl/regex foo can find a better solution.

Does using g.html2man.py from GRASS 7 qualify?

The main drawbacks are that it makes Python a build-time dependency (but eliminates the Perl dependency), and may require some clean-up of the HTML files (the Python version will fail hard on invalid HTML).

I would just remove the problematic < and > and leave GRASS 6 (core) without Python (build) dependency.

As there were no objections to this, I took the liberty to just erase these symbols from the file.

Forgot to mention: r60237 for develbranch_6 and r60238 for releasebranch_6_4.

comment:11 Changed 5 years ago by mlennert

Resolution: fixed
Status: new → closed

Closing as no one has objected to the solution.

Moritz
