|Version 2 (modified by simonp, 21 months ago)|
Loading XML files that are not UTF-8
|Status||complete - ready to commit|
|Assigned to release||2.7.x|
XML files that are not encoded in UTF-8 will not load. This happens often because users paste content from MS documents into XML files; the pasted text contains WINDOWS-1252 characters, so the file content ends up as WINDOWS-1252 rather than UTF-8. Such content should be converted to UTF-8 where possible and, more importantly, must not create issues in the rest of the processing stream, which almost always assumes UTF-8.
- Type: Core Change
- App: GeoNetwork
- Module: Jeeves
- Vote proposed by Simon on 2011/10/07, result was +/-n (m non-voting members).
GeoNetwork should be able to load and convert XML files that contain characters from character sets other than UTF-8. For example, loading a file with characters from the WINDOWS-1252 charset causes batch import to fail with a message like:
The jeeves.utils.Xml loadFile method needs to be modified to read the file as a stream of bytes, detect the character set and convert to UTF-8 as required. This character set detection capability is enabled by default and is controlled by the Java system property jeeves.filecharsetdetectandconvert.
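The read-bytes, detect, convert sequence described above can be sketched roughly as follows. This is a simplified illustration using only the Java standard library, not the actual jeeves.utils.Xml code: instead of detecting the charset with juniversalchardet, it checks whether the bytes are well-formed UTF-8 and otherwise assumes WINDOWS-1252. The class name CharsetFallbackSketch, its method names and the fixed fallback charset are all hypothetical. The sketch also leaves any encoding declared in the XML prolog untouched.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the real jeeves.utils.Xml code.
final class CharsetFallbackSketch {

    // Assumed fallback charset. The proposal detects the charset with
    // juniversalchardet rather than hard-coding one.
    static final Charset FALLBACK = Charset.forName("windows-1252");

    /** Returns true if the bytes decode as well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the content as UTF-8 bytes; valid UTF-8 input is returned unchanged. */
    static byte[] toUtf8(byte[] raw) {
        if (isValidUtf8(raw)) {
            return raw;
        }
        // Decode with the assumed fallback charset, then re-encode as UTF-8.
        String text = new String(raw, FALLBACK);
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

A caller would read the whole file into a byte array (for example with java.nio.file.Files.readAllBytes), pass it through toUtf8, and hand the result to the XML parser as a UTF-8 stream.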
Backwards Compatibility Issues
None, because character set detection and conversion can be disabled on startup by setting the Java system property jeeves.filecharsetdetectandconvert to disabled, e.g. export JAVA_OPTS="-Djeeves.filecharsetdetectandconvert=disabled" if using Tomcat, or by editing bin/start-geonetwork.sh for Jetty.
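On the Java side, the switch could be read along these lines. This is a minimal sketch, assuming that an unset property or any value other than "disabled" leaves detection on; the class and method names are hypothetical and are not taken from the Jeeves source.

```java
// Hypothetical helper for illustration only; not taken from the Jeeves source.
final class CharsetDetectionToggleSketch {

    // Property name as given in this proposal.
    static final String PROPERTY = "jeeves.filecharsetdetectandconvert";

    /**
     * Assumption: detection stays enabled unless the property is set to
     * "disabled" (compared case-insensitively here).
     */
    static boolean isDetectionEnabled() {
        return !"disabled".equalsIgnoreCase(System.getProperty(PROPERTY));
    }
}
```

With a check like this, loadFile would only run detection and conversion when isDetectionEnabled() returns true, and would otherwise fall back to the existing behaviour.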
New libraries added
juniversalchardet - character set detection jar
- Simon Pigot