Opened 13 years ago
Closed 13 years ago
#612 closed defect (fixed)
XML files loaded by GeoNetwork may not be in UTF-8 charset and may fail to load
Reported by: | simonp | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | v2.7.0 |
Component: | General | Version: | |
Keywords: | | Cc: | |
Description
Some XML files contain characters that are not UTF-8 encoded, usually WINDOWS-1252, because text is often pasted into metadata fields from Microsoft applications. The user doesn't realize that the XML file is then no longer valid UTF-8 and receives strange errors such as 'Error on line 118 of document file:/home/simon/bioreg-test/caab37020028.xml: Invalid byte 2 of 3-byte UTF-8 sequence' when trying to load the XML file into GeoNetwork.
GeoNetwork needs to detect the character set of the file content and convert it to UTF-8 before attempting to load it. This can be done with the kind of charset detector often used in browsers. The attached patch adds the mozilla juniversalcharsetdetector to the loadFile method in the Jeeves utils/Xml.java class. A system property (jeeves.filecharsetdetectandconvert) must be set to enable charset detection and conversion; it is disabled by default, including when the property is missing.
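For illustration only, here is a minimal sketch of the detect-and-convert idea using just the Java standard library. It is not the attached patch: instead of the Mozilla detector it uses a simpler technique, a strict UTF-8 decode with a fallback to WINDOWS-1252 when that decode fails. The class name CharsetFallback, the method names, the fixed fallback charset, and the reading of "set" as the property value "true" are all assumptions made for this example. It also ignores any encoding named in the XML declaration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jeeves or the attached patch.
public class CharsetFallback {
    // Property name from the ticket. Assumption: only the value "true" enables conversion.
    static final String ENABLE_PROP = "jeeves.filecharsetdetectandconvert";

    // Assumption: WINDOWS-1252 is the only fallback, since the ticket names it as the usual culprit.
    static final Charset FALLBACK = Charset.forName("windows-1252");

    // Returns true if the bytes decode as UTF-8 with no malformed sequences.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns the input unchanged when conversion is disabled or the bytes are
    // already valid UTF-8; otherwise re-encodes them from the fallback charset.
    static byte[] toUtf8(byte[] data) {
        if (!Boolean.getBoolean(ENABLE_PROP) || isValidUtf8(data)) {
            return data;
        }
        return new String(data, FALLBACK).getBytes(StandardCharsets.UTF_8);
    }
}
```

A caller would read the file into a byte array, pass it through toUtf8, and hand the result to the XML parser. A statistical detector such as the one in the patch can recognize many more encodings; this fallback only separates valid UTF-8 from everything else.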
Attachments (1)
Change History (2)
by , 13 years ago
Attachment: | charsetdetector.patch added |
---|
comment:1 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Committed in svn rev 8287