Opened 13 years ago

Closed 13 years ago

Last modified 12 years ago

#963 closed defect (fixed)

Stop downloading DTDs

Reported by: robe
Owned by: chodgson
Priority: medium
Milestone: PostGIS 2.0.0
Component: management
Version: master
Keywords:
Cc:

Description

I don't know what's going on, but I can only guess that the DTDs are being downloaded each time the docs build and, worse yet, something is really screwed up in the W3C MathML DTD.

We are getting errors that don't make sense. I get the same errors on my local xsltproc calls too and I tried doing:

xsltproc --novalid --nonet

and it still seems to try to validate, though --nonet causes it to throw:

http://www.w3.org/Math/DTD/mathml2/mathml2.dtd:2096: warning: failed to load external entity "http://www.w3.org/Math/DTD/mathml2/iso8879/isobox.ent"
%ent-isobox;

Does anyone know a way to have it just ignore the DTDs so the build passes and is not subject to the whims of the net and w3c?

Change History (22)

comment:1 by mcayland, 13 years ago

Sure - install the schema files locally. I'm not sure where the correct location is on Windows, but installing the Maths DTD package on Debian solved that problem for me.

comment:2 by robe, 13 years ago

I'm more bothered about our build server. On Windows it's not too big a deal because the files I need build anyway (they just throw warnings), and I don't really need them for building the comments files (which is all I have the patience for :) ).

comment:3 by colivier, 13 years ago

What about including the DTDs in SVN, with a related catalog file?

comment:4 by robe, 13 years ago

I'd be okay with that as long as it doesn't take up a ton of space.

comment:5 by chodgson, 13 years ago

Alternatively I can look at installing the DTDs on the server, it will probably have to wait until next week though.

comment:6 by chodgson, 13 years ago

Hmm, I guess this has the potential to introduce more problems if there are inconsistencies or different versions… I imagine those DTDs should be pretty stable and solid though. Otherwise I think installing on the server is the best option, rather than cluttering our SVN with support files for the documentation build. My recent code reviewing also raises the question of what the licensing on those files is… stuff we don't even have to worry about if we don't include them.

comment:7 by robe, 13 years ago

Good point. Our tarball is big enough anyway. If it's not too much trouble, it would be better to install it on the server. Like I said, what's really annoying is Hudson giving false failures, because then you don't know if it's W3C or a real failure, not to mention that when it happens you can't see your doc updates on the site right away.

comment:8 by mcayland, 13 years ago

We definitely do not want to be including the DTDs as part of the tarball.

The way to think about it is that xsltproc checks a local cache for the DTD first, and only then if it can't find it does it go and download from the origin.

I don't know what OS the build host is running, but on Debian it looks as if it's just a case of "aptitude install docbook-mathml". There will definitely be similar packages around for Fedora/CentOS and other OS too. Installing the relevant packages on the build host is definitely the right fix here.
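
As a rough illustration of the lookup order described in this comment (local catalog first, network only on a miss), here is a minimal Python sketch. It is not how xsltproc or libxml2 is implemented; the function name `resolve_system_id` and the in-memory `catalog` dictionary are hypothetical and exist only for this example.

```python
from typing import Dict, Tuple


def resolve_system_id(system_id: str, catalog: Dict[str, str]) -> Tuple[str, str]:
    """Return (location, source) for a DTD system identifier.

    The local catalog is consulted first; only when the identifier is
    not found there does the resolver fall back to the original URL,
    which means a network fetch.
    """
    local_path = catalog.get(system_id)
    if local_path is not None:
        return local_path, "local"
    return system_id, "network"


# Hypothetical catalog contents, for illustration only.
catalog = {
    "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd": "docbook/4.4/docbookx.dtd",
}
```

With this catalog, a lookup of the 4.4 system identifier is answered locally, while any identifier that is not listed falls through to the network, which is the situation the build server was in.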

comment:9 by chodgson, 13 years ago

Ugh, this is more painful than I thought. I can't seem to find mathml and docbook-mathml packages for CentOS 5.5 (too old …). There aren't that many actual files to install (a few .dtd and a few .ent); the complicated part looks to be setting up the catalog so that the tools can find the files. Does anyone know of a better way to do this, or is anyone familiar with the XML catalog stuff? I'm sure I can figure it out with a bit more time, just thought I'd ask.

comment:10 by colivier, 13 years ago

A working example of a catalog I used:

<?xml version="1.0"?>
<catalog  xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">  

<group prefer="system" />  

<public 
  publicId="-//OASIS//DTD DocBook XML V4.2//EN"  
  uri="docbook/4.2/docbookx.dtd"/>
<system
  systemId="http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"  
  uri="docbook/4.2/docbookx.dtd"/>


<public 
  publicId="-//OASIS//DTD DocBook XML V4.3//EN"  
  uri="docbook/4.3/docbookx.dtd"/>
<system
  systemId="http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"  
  uri="docbook/4.3/docbookx.dtd"/>


<public 
  publicId="-//OASIS//DTD DocBook XML V4.4//EN"  
  uri="docbook/4.4/docbookx.dtd"/>
<system
  systemId="http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"  
  uri="docbook/4.4/docbookx.dtd"/>


<public
  publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
  uri="xhtml/xhtml1-strict.dtd"/>
<system 
  systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
  uri="xhtml/xhtml1-strict.dtd"/>

</catalog>

Note: uri here means a relative path on the file system.

HTH,
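
One way to sanity-check a catalog like the one above is to list its entries and confirm that each uri target exists on disk. The following is a minimal Python sketch under stated assumptions: every uri is a plain relative path (as in the example above) resolved against the directory containing the catalog file, and there are no xml:base overrides or nested catalogs. The function name `catalog_entries` is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

# Namespace used by OASIS XML catalog files.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def catalog_entries(catalog_path):
    """Yield (kind, identifier, resolved_path, exists) for each entry.

    Only <public> and <system> entries are considered. Assumes each
    uri attribute is a path relative to the catalog file's directory.
    """
    base_dir = os.path.dirname(os.path.abspath(catalog_path))
    root = ET.parse(catalog_path).getroot()
    for elem in root.iter():
        if elem.tag == f"{{{CATALOG_NS}}}public":
            kind, identifier = "public", elem.get("publicId")
        elif elem.tag == f"{{{CATALOG_NS}}}system":
            kind, identifier = "system", elem.get("systemId")
        else:
            continue
        uri = elem.get("uri")
        if uri is None:
            continue  # malformed entry; nothing to check
        target = os.path.join(base_dir, uri)
        yield kind, identifier, target, os.path.exists(target)
```

Iterating over `catalog_entries("catalog.xml")` (the file name is an assumption) and printing the entries whose last field is false shows which mappings point at files that were never installed.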

comment:11 by chodgson, 13 years ago

OK, I actually figured this out. The part that was worrying me was that the catalog seemed to be using a hierarchical structure, and every distribution seemed to put its files in different places and use different ways to map from the main catalog (usually at /etc/xml/catalog) to the various files in places like /usr/share/sgml/ and /usr/share/xml/. The trick is that there is actually a little program, "xmlcatalog", which the packages use to "install themselves" into the catalog. So I seem to have been successful at installing more recent RPMs of the mathml and docbook-mathml packages onto the buildbox. At least, their sub-catalogs show up in the main catalog. If anyone can confirm that we're actually using them properly and not downloading stuff, that would be great.

comment:12 by robe, 13 years ago

Chris,

I think we are still downloading stuff, though not mathml anymore so perhaps you fixed that. I just did a commit that had no documentation changes and it failed on documentation build. PostGIS-trunk-build #2340 (r7281)

It threw this complaint:

make[1]: Entering directory `<http://office.refractions.net:1500/job/PostGIS-trunk-build/ws/build/doc/html/image_src'>
make[1]: Nothing to be done for `images'.
make[1]: Leaving directory `<http://office.refractions.net:1500/job/PostGIS-trunk-build/ws/build/doc/html/image_src'>
Build the listings...
http://www.oasis-open.org/docbook/xml/4.5/dbpoolx.mod:1: parser error : Content error in the external subset
HTTP/1.1 200 OK
^
http://www.oasis-open.org/docbook/xml/4.5/dbpoolx.mod:1: parser error : Content error in the external subset
HTTP/1.1 200 OK
   ^
unable to parse <http://office.refractions.net:1500/job/PostGIS-trunk-build/ws/build/doc/postgis-out.xml>
Error: xsltproc failed
make: *** [postgis-2.0.0SVN.pdf] Error 1

comment:13 by robe, 13 years ago

Correction: my last commit that failed is r7282.

comment:14 by chodgson, 13 years ago

Do we actually need DocBook 4.5? We have 4.4 installed locally… this means we're fetching all of the core DocBook DTDs every time… I can see about installing the 4.5 DTDs too, but if we don't actually need anything from 4.5 we can fix it real easy in 2 places in postgis.xml…

comment:15 by robe, 13 years ago

You know I'm not sure. Don't know the difference between the two. I recall Olivier had made some changes a while back and one of those might have been upgrading the docbook, so he might have a better idea.

comment:16 by mcayland, 13 years ago (in reply to: 14)

Replying to chodgson:

Do we actually need DocBook 4.5? We have 4.4 installed locally… this means we're fetching all of the core DocBook DTDs every time… I can see about installing the 4.5 DTDs too, but if we don't actually need anything from 4.5 we can fix it real easy in 2 places in postgis.xml…

Ah that will be it then - since it can't find 4.5 locally, xsltproc will always have to go and download it. Good work Chris.

In fact SVN blame points at a change Kevin made where he actually mentions forcing 4.5 in the commit message: http://trac.osgeo.org/postgis/changeset/6165. Does that mean if we revert the DTD sections then it will just use the currently installed version?

comment:17 by robe, 13 years ago

Chris,

I guess that means we need 4.5 then because putting in the mml namespaces everywhere we use MathML is a real eyesore.

comment:18 by chodgson, 13 years ago

OK, I couldn't find an appropriate package, so I tried manually installing the 4.5 DTDs and added them to the catalog. Let me know if it worked.

comment:19 by mcayland, 13 years ago

Although I don't have the URL handy, the easiest way is to check the Hudson output for the build time. IIRC from my own laptop the build time for the documentation dropped from around 10 mins to 1 min when I installed everything locally.

comment:20 by chodgson, 13 years ago

Status: new → assigned

I worked through some more xml catalog fun (it's actually not that bad once you dive in) and ran a few test builds of the docs. It looks like it's now doing the xsltproc build very quickly for trunk. I'll wait a bit and check that this is showing up in hudson too before I close this.

Note that because the XML catalog is keyed on the publicId and systemId URI strings, there is all kinds of room to break this (and just as much flexibility to fix it). While those strings are supposedly "standard", there are all kinds of hacks included in the packaged xmlcatalog files to handle misuse and common synonyms. My fix just handles the specific URIs that are presently in use in the docs, and I have a feeling that someone will break this at some point in the future by copy/pasting some DocBook XML from somewhere else. The good news is that I now might have a clue how to fix it when it happens.
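
A check along these lines could catch that kind of breakage before a build goes out to the network. This is a minimal Python sketch, not part of the actual fix: it pulls the PUBLIC and SYSTEM identifiers out of a document's DOCTYPE with a regular expression and reports any that are missing from a known set. The exact string comparison is a simplification of real catalog matching, and the names `doctype_ids` and `uncatalogued` are hypothetical.

```python
import re

# Matches: <!DOCTYPE name PUBLIC "public-id" "system-id"
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def doctype_ids(xml_text):
    """Return (public_id, system_id) from the DOCTYPE, or None if absent."""
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def uncatalogued(xml_text, known_public, known_system):
    """List the DOCTYPE identifiers that the local catalog does not know.

    known_public and known_system are sets of identifier strings taken
    from the catalog; comparison is by exact string equality.
    """
    ids = doctype_ids(xml_text)
    if ids is None:
        return []
    public_id, system_id = ids
    missing = []
    if public_id not in known_public:
        missing.append(("public", public_id))
    if system_id not in known_system:
        missing.append(("system", system_id))
    return missing
```

Run against a document that declares DocBook 4.5 with a catalog that only lists 4.4, this reports both identifiers as missing, which is the case that triggered the downloads in this ticket.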

comment:21 by chodgson, 13 years ago

Resolution: fixed
Status: assigned → closed

The build times appear to be down by as much as 4 minutes, which I think is a fair representation of the time saved by not doing the downloads. Re-open this or post a new ticket if we experience any further random failures that seem to be caused by XML processing and/or DTD downloads.

comment:22 by strk, 12 years ago

See #1624 for a general handling of this
