Ticket #612 (closed defect: fixed)

Opened 20 months ago

Last modified 19 months ago

XML files loaded by GeoNetwork may not be in UTF-8 charset and may fail to load

Reported by: simonp Owned by: geonetwork-devel@…
Priority: major Milestone: v2.7.0
Component: General Version:
Keywords: Cc:

Description

Some XML files contain characters that are not UTF-8, usually WINDOWS-1252, because text is often pasted into metadata fields from Microsoft applications. The user doesn't realize that the XML file is then no longer valid UTF-8 and receives strange errors such as 'Error on line 118 of document file:/home/simon/bioreg-test/caab37020028.xml: Invalid byte 2 of 3-byte UTF-8 sequence' when they try to load the XML file into GeoNetwork.

GeoNetwork needs to detect the character set of the file content and convert it to UTF-8 before attempting to load. This can be done using the charset detectors often used in browsers. The attached patch adds the Mozilla juniversalcharsetdetector to the loadFile method in the Jeeves utils/Xml.java class. A system property (jeeves.filecharsetdetectandconvert) must be set to enable charset detection and conversion; it is disabled by default, or if the property is missing.
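For illustration only, here is a minimal sketch of the detect-and-convert idea using nothing but the JDK. It is not the code in the attached patch and it does not use the Mozilla detector: it swaps in a much simpler check (strict UTF-8 decoding, with a fallback to windows-1252, the case described above). The class name CharsetFallback and its methods are hypothetical, and reading the property with Boolean.getBoolean is an assumption about how the switch might be parsed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jeeves or of the attached patch.
class CharsetFallback {
    // Property name from the ticket; parsing it as a boolean is an assumption.
    static final String PROP = "jeeves.filecharsetdetectandconvert";

    // Returns true only if the bytes decode as strict UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Valid UTF-8 is returned unchanged; anything else is assumed to be
    // windows-1252 and re-encoded as UTF-8. A real detector would guess
    // the source charset instead of assuming it.
    static byte[] toUtf8(byte[] data) {
        if (isValidUtf8(data)) {
            return data;
        }
        Charset fallback = Charset.forName("windows-1252");
        return new String(data, fallback).getBytes(StandardCharsets.UTF_8);
    }

    // Converts only when the system property is set to "true".
    static byte[] prepare(byte[] data) {
        return Boolean.getBoolean(PROP) ? toUtf8(data) : data;
    }
}
```

With this sketch, starting the JVM with -Djeeves.filecharsetdetectandconvert=true would make prepare() convert the file bytes before they reach the XML parser; without the property the bytes pass through untouched. If the XML declaration names an encoding other than UTF-8, it would also need to be updated, which this sketch does not attempt.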

Attachments

charsetdetector.patch (8.1 KB) - added by simonp 20 months ago.

Change History

Changed 20 months ago by simonp

Changed 19 months ago by simonp

  • status changed from new to closed
  • resolution set to fixed

Committed in svn rev 8287
