Opened 13 years ago
Closed 12 years ago
#896 closed defect (fixed)
sphinx doc build is broken because of BOM
Reported by: | fgdrf | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | OSGeoLive | Keywords: | 6.0 |
Cc: |
Description
Error in daily build log:
Sphinx error: Unable to decode input data. Tried the following encodings: 'UTF-8'. (UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 1: invalid start byte) make: *** [sphinxbuild] Error 1
The error is caused by a byte order mark (BOM). I assume it's coming from a Windows-based edit. More details at http://en.wikipedia.org/wiki/UTF-8#Byte_order_mark
The issue came from the sponsor.rst edits in revision 7725.
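A quick way to confirm whether a given file starts with a BOM is to look at its first three bytes. The following is only a sketch, assuming a POSIX shell with `head`, `od` and `tr` available; the helper name `has_bom` is made up for illustration and is not part of the build scripts.

```shell
# has_bom FILE: succeed (exit 0) if FILE starts with the UTF-8 BOM bytes EF BB BF.
# has_bom is a hypothetical helper name, not part of the OSGeoLive build scripts.
has_bom() {
    # Dump the first three bytes as hex and strip whitespace for comparison.
    first3=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \t\n')
    [ "$first3" = "efbbbf" ]
}

# Example: report on sponsor.rst, but only if it exists in the current directory.
if [ -f sponsor.rst ]; then
    if has_bom sponsor.rst; then
        echo "sponsor.rst: BOM found"
    else
        echo "sponsor.rst: no BOM"
    fi
fi
```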
Change History (10)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
The byte order mark has been added to and removed from the .csv lists of contributors for a while now.
I haven't really been sure if they should be there or not so only did a quick edit just before the last release to stop the table creation from breaking.
It's easy enough to open the file with vi and delete the first two chars if needed. Converting UTF-8 back to ISO-8859-1 isn't too bad either:
iconv -f UTF-8 -t ISO_8859-1 utf_file > iso_file
Qs:
- Should the BOM be there or not?
- Which files (if any) should be saved in UTF-8, and why? (ISO-8859-1 cannot represent non-Western multibyte characters, but that doesn't require the English/Western pages to be in UTF-8 as well.)
This is outside my area of expertise, but the constant "last committer wins" back and forth between text file variants is, as we see here, causing problems.
any tips from the multi-lingual trenches?
thanks, Hamish
comment:3 by , 13 years ago
First of all, the sphinx doc build is still broken. Nevertheless, I agree with you that we should harmonize all the file encodings.
Unless configured otherwise in conf.py, Sphinx assumes that all source files are UTF-8 (http://sphinx.pocoo.org/rest.html#source-encoding). That covers both the documentation (rst) and included files (e.g. csv). The configuration value (source_encoding) isn't set currently, so we should follow the default and not mix encodings between languages, whether that is strictly necessary or not.
Got the build working again with the following steps (the referenced csv files were the incorrect ones, not "sponsors.rst" as printed):
#1 perl -CD -pe 'tr/\x{feff}//d' contributors.csv > xx; mv xx contributors.csv
#2 edited the csv file with vi, typed :set fileencoding=utf-8, then saved and closed
We should keep this ticket open because the files still have mixed encodings (UTF-8, UTF-8 CRLF, "shell archive or script for antique kernel text", "ASCII text", "PARIX object not stripped", etc).
To analyze the docs, run this in a terminal:
for f in `find . -name "*.rst"` ; do file $f ; done | grep -v target
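Since `file` only guesses at the encoding, it may help to also list the files that really do start with a BOM. This is a rough sketch under the assumption that only `.rst` and `.csv` files matter and that file names contain no newlines; `list_bom_files` is a hypothetical name chosen for this example.

```shell
# list_bom_files DIR: print every .rst or .csv file under DIR that starts with
# the UTF-8 BOM (bytes EF BB BF). The function name is hypothetical.
list_bom_files() {
    find "$1" -type f \( -name '*.rst' -o -name '*.csv' \) | while IFS= read -r f; do
        # Compare the first three bytes, rendered as hex, against the BOM.
        first3=$(head -c 3 "$f" | od -An -tx1 | tr -d ' \t\n')
        if [ "$first3" = "efbbbf" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running `list_bom_files .` from the doc directory would print one path per affected file, and nothing at all if the tree is clean.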
comment:4 by , 13 years ago
Note that all text documents are (i.e. should be) set with the svn:eol-style=native property, so CRLF newlines are handled automatically and seamlessly for everyone.
Can your perl line be adapted to work with sed?
Hamish
comment:5 by , 12 years ago
Keywords: | 6.0 added |
---|---|
Priority: | major → critical |
documents are still not building on the disk after 6.0beta1
comment:6 by , 12 years ago
Priority: | critical → blocker |
---|
Hi, is there any progress on this issue? I am marking this as a show stopper, since we cannot release without docs :)
comment:7 by , 12 years ago
Priority: | blocker → minor |
---|
This error with docs seems to have been fixed (for the moment) by someone before me. I can't reproduce the problem.
However the problem might be reintroduced by a manual edit with the wrong format. Until we work out an automated script to fix the issue, I'll leave this issue open, priority=minor.
comment:8 by , 12 years ago
FWIW I'm preprocessing the files before committing, to avoid this kind of problem.
The script is here: https://gist.github.com/2864567
It's using this sed regexp to get rid of BOM:
# Remove BOM
sed -i '1 s/^\xef\xbb\xbf//' "$DOC"
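The `\xHH` escapes and the `-i` flag in that one-liner are GNU sed features. Where they are not available, the same effect can be had by checking the first three bytes and copying the rest of the file. This is a different technique from the sed regexp above, sketched here under the assumption of a POSIX shell with `head`, `od`, `tail` and `mktemp`; `strip_bom` is a hypothetical name and is not taken from the linked gist.

```shell
# strip_bom FILE: remove a leading UTF-8 BOM (EF BB BF) from FILE in place.
# Files without a BOM are left untouched. strip_bom is a hypothetical name.
strip_bom() {
    first3=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \t\n')
    if [ "$first3" = "efbbbf" ]; then
        bom_tmp=$(mktemp) || return 1
        # tail -c +4 copies everything from the fourth byte onward;
        # writing back with cat keeps the original file's permissions.
        tail -c +4 "$1" > "$bom_tmp" && cat "$bom_tmp" > "$1"
        rm -f "$bom_tmp"
    fi
}
```

Calling `strip_bom contributors.csv` twice is harmless, because the second call finds no BOM and does nothing.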
comment:10 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I think this is now fixed. Thanks.
Just a hint: use Notepad++ on Windows and set the encoding to UTF-8 (no BOM), and voilà, everything should work.