Ticket #612 (closed defect: fixed)
XML files loaded by GeoNetwork may not be in UTF-8 charset and may fail to load
| Reported by: | simonp | Owned by: | geonetwork-devel@… |
|---|---|---|---|
| Priority: | major | Milestone: | v2.7.0 |
| Component: | General | Version: | |
| Keywords: | | Cc: | |
Description
Some XML files contain characters that are not UTF-8 encoded, usually WINDOWS-1252, because text is often pasted into metadata fields from Microsoft applications. The user doesn't realize that the XML file is then no longer valid UTF-8 and receives strange errors such as 'Error on line 118 of document file:/home/simon/bioreg-test/caab37020028.xml: Invalid byte 2 of 3-byte UTF-8 sequence' when trying to load the XML file into GeoNetwork.
GeoNetwork needs to detect the character set of the file content and convert it to UTF-8 before attempting to load the file. This can be done with the charset detectors often used in browsers. The attached patch adds the mozilla juniversalcharsetdetector to the loadFile method in the Jeeves utils/Xml.java class. A system property (jeeves.filecharsetdetectandconvert) must be set to enable charset detection and conversion; it is disabled by default, including when the property is missing.
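
For illustration only, here is a minimal sketch of the detect-and-convert idea. It is not the attached patch. To stay self-contained it uses only the JDK and does not call the detector library: it attempts a strict UTF-8 decode and, if that fails, assumes WINDOWS-1252 (the common case described above). The class name CharsetFallback and its method names are hypothetical, and treating the property value "true" as "enabled" is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jeeves or of the attached patch.
public class CharsetFallback {
    // Property name taken from the ticket; the value "true" is an assumption.
    static final String ENABLE_PROPERTY = "jeeves.filecharsetdetectandconvert";

    // Assumed fallback charset; a real detector would guess this from the bytes.
    static final Charset FALLBACK = Charset.forName("windows-1252");

    // Boolean.getBoolean returns false when the property is missing,
    // so conversion is off by default.
    static boolean conversionEnabled() {
        return Boolean.getBoolean(ENABLE_PROPERTY);
    }

    // Decode strictly as UTF-8; on malformed input, decode as the fallback.
    static String decode(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(raw, FALLBACK);
        }
    }

    // Return UTF-8 bytes ready for the XML parser, or the input unchanged
    // when the feature is disabled.
    static byte[] toUtf8(byte[] raw) {
        if (!conversionEnabled()) {
            return raw;
        }
        return decode(raw).getBytes(StandardCharsets.UTF_8);
    }
}
```

With this sketch, starting the JVM with -Djeeves.filecharsetdetectandconvert=true would turn the conversion on. A statistical detector, as used in the patch, can recognise many more encodings than this two-way fallback.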

