Loading XML files that are not UTF-8
Date | 2011/10/07 |
Contact(s) | Simon Pigot |
Last edited | 2011/10/07 |
Status | complete - ready to commit |
Assigned to release | 2.7.x |
Resources | Available |
Ticket # | #612 |
Overview
XML files containing characters encoded in a character set other than UTF-8 fail to load. This happens often because users paste content from MS documents into XML files; the pasted text carries WINDOWS-1252 characters, so the file content is WINDOWS-1252 rather than UTF-8. Such content should be converted to UTF-8 where possible. More importantly, it must not cause problems in the rest of the processing stream, which almost always assumes UTF-8.
Proposal Type
- Type: Core Change
- App: GeoNetwork
- Module: Jeeves
Voting History
- Vote proposed by Simon on 2011/10/07, result was +/-n (m non-voting members).
Motivations
GeoNetwork should be able to load and convert XML files that contain characters from character sets other than UTF-8. For example, loading a file with characters from the WINDOWS-1252 charset causes batch import to fail with an error message (see the attachment failed-batch-import.png).
Proposal
The loadFile method of jeeves.utils.Xml needs to be modified to read the file as a stream of bytes, detect the character set, and convert the content to UTF-8 as required. Character set detection is enabled by default and is controlled by the Java system property jeeves.filecharsetdetectandconvert.
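The read, detect, and convert steps can be sketched as follows. This is a minimal illustration only, not the actual jeeves.utils.Xml change: instead of juniversalchardet it swaps in a simpler stand-in (try strict UTF-8 first, then assume WINDOWS-1252), and the class and method names (CharsetFallbackSketch, decodeToString, toUtf8) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not the Jeeves implementation and not juniversalchardet.
public class CharsetFallbackSketch {

    /**
     * Decode raw file bytes. Strict UTF-8 is tried first; if the bytes are not
     * valid UTF-8, they are assumed (an assumption of this sketch) to be
     * WINDOWS-1252, the case described in the overview.
     */
    public static String decodeToString(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: fall back to the assumed legacy charset.
            return new String(raw, Charset.forName("windows-1252"));
        }
    }

    /** Return the same content re-encoded as UTF-8 bytes. */
    public static byte[] toUtf8(byte[] raw) {
        return decodeToString(raw).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0x93 and 0x94 are the WINDOWS-1252 curly quotes; alone they are invalid UTF-8.
        byte[] cp1252 = {(byte) 0x93, 'h', 'i', (byte) 0x94};
        System.out.println(toUtf8(cp1252).length + " bytes after conversion to UTF-8");
    }
}
```

A real detector such as juniversalchardet can recognise many more character sets than this two-way fallback; the sketch only shows where detection and conversion sit relative to reading the bytes.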
Backwards Compatibility Issues
None, because character set detection and conversion can be disabled at startup by setting the Java system property jeeves.filecharsetdetectandconvert to disabled, e.g. export JAVA_OPTS="-Djeeves.filecharsetdetectandconvert=disabled" when using Tomcat, or by editing bin/start-geonetwork.sh for Jetty.
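How the switch might be read at runtime can be sketched as below. Only the property name and the value disabled come from this proposal; the class name DetectFlagSketch, the method detectionEnabled, the default value "enabled", and the case-insensitive comparison are assumptions made for illustration.

```java
// Hypothetical sketch of reading the opt-out switch; not the Jeeves code.
public class DetectFlagSketch {

    static final String PROP = "jeeves.filecharsetdetectandconvert";

    /**
     * Detection stays on unless the property is explicitly set to "disabled".
     * Treating an unset property as "enabled" is an assumption of this sketch.
     */
    public static boolean detectionEnabled() {
        String value = System.getProperty(PROP, "enabled");
        return !"disabled".equalsIgnoreCase(value);
    }

    public static void main(String[] args) {
        // Run with -Djeeves.filecharsetdetectandconvert=disabled to see "false".
        System.out.println("detection enabled: " + detectionEnabled());
    }
}
```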
New libraries added
juniversalchardet - character set detection jar
Risks
None known.
Participants
- Simon Pigot
Attachments (2)
- failed-batch-import.png (19.0 KB)
- metadata-with-converted-utf8.png (19.8 KB)