Opened 13 years ago

Closed 13 years ago

#612 closed defect (fixed)

XML files loaded by GeoNetwork may not be UTF-8 encoded and may then fail to load

Reported by: simonp Owned by: geonetwork-devel@…
Priority: major Milestone: v2.7.0
Component: General Version:
Keywords: Cc:

Description

Some XML files contain characters that are not valid UTF-8, usually Windows-1252 characters, because text is often pasted into metadata fields from Microsoft applications. The user doesn't realize that the XML file is no longer valid UTF-8, and receives confusing errors such as 'Error on line 118 of document file:/home/simon/bioreg-test/caab37020028.xml: Invalid byte 2 of 3-byte UTF-8 sequence' when trying to load the file into GeoNetwork.
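To illustrate where that error comes from: in Windows-1252, é is the single byte 0xE9, which a UTF-8 decoder reads as the lead byte of a 3-byte sequence. The byte that follows (here 's') is not a UTF-8 continuation byte, so decoding fails at byte 2 of that sequence. The minimal sketch below uses only strict `java.nio` decoding to show that the same text is valid when encoded as UTF-8 but not when encoded as Windows-1252; the class name `Utf8Check` is hypothetical and not part of GeoNetwork or Jeeves.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true if the bytes form valid UTF-8, false otherwise.
    static boolean isValidUtf8(byte[] data) {
        // Strict decoder: report malformed input instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "r\u00e9sum\u00e9"; // "résumé"
        byte[] asUtf8 = text.getBytes(StandardCharsets.UTF_8);
        // é becomes the single byte 0xE9 in Windows-1252.
        byte[] asCp1252 = text.getBytes(Charset.forName("windows-1252"));
        System.out.println(isValidUtf8(asUtf8));   // true
        System.out.println(isValidUtf8(asCp1252)); // false
    }
}
```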

GeoNetwork needs to detect the character set of the file content and convert it to UTF-8 before attempting to load it. This can be done with the kind of charset detectors often used in browsers. The attached patch adds the Mozilla juniversalcharsetdetector to the loadFile method in the Jeeves utils/Xml.java class. Detection and conversion are enabled by setting the system property jeeves.filecharsetdetectandconvert; they are disabled by default, i.e. when the property is missing.
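The detect-and-convert step, gated by the system property, could be sketched roughly as follows. This is not the patch's code: instead of the juniversalcharsetdetector used by the patch, it swaps in a much simpler heuristic (keep the bytes if they decode as strict UTF-8, otherwise assume Windows-1252), and the class and method names (`CharsetNormalizer`, `toUtf8IfEnabled`, `readNormalized`) are hypothetical. Only the property name comes from the ticket; treating the value `true` as "enabled" is an assumption.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetNormalizer {
    // Property name from the ticket; "true" as the enabling value is assumed.
    static final String PROPERTY = "jeeves.filecharsetdetectandconvert";

    // Assumed fallback encoding; a real detector would guess among many.
    static final Charset FALLBACK = Charset.forName("windows-1252");

    // Decodes as strict UTF-8 if possible, otherwise as the fallback charset.
    static String decode(byte[] data) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(data, FALLBACK);
        }
    }

    // Returns the input unchanged unless the property is set to "true",
    // in which case the content is re-encoded as UTF-8.
    static byte[] toUtf8IfEnabled(byte[] data) {
        if (!Boolean.getBoolean(PROPERTY)) {
            return data; // disabled by default (property missing or not "true")
        }
        return decode(data).getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: read a file and normalise it before XML parsing.
    static byte[] readNormalized(Path path) throws IOException {
        return toUtf8IfEnabled(Files.readAllBytes(path));
    }
}
```

The fallback only covers the Windows-1252 case described in this ticket, and a Windows-1252 file that happens to be valid UTF-8 would be left as-is. Also, if a file's XML declaration names a non-UTF-8 encoding, re-encoding the bytes as UTF-8 without updating that declaration would make the parser misread them.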

Attachments (1)

charsetdetector.patch (8.1 KB) - added by simonp 13 years ago.


Change History (2)

by simonp, 13 years ago

Attachment: charsetdetector.patch added

comment:1 by simonp, 13 years ago

Resolution: fixed
Status: new → closed

Committed in svn rev 8287
