Opened 13 years ago
Closed 13 years ago
#3950 closed enhancement (fixed)
[PATCH]: Add CPLRecode() implementation based on iconv() function
Reported by: | dron | Owned by: | dron |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | default | Version: | unspecified |
Severity: | normal | Keywords: | iconv |
Cc: | warmerdam, jdenisgiguere |
Description
RPC 23: Unicode support in OGR suggests to introduce the extended CPLRecode() implementation which can do various conversions using iconv() function. There is a patch implementing this. I have added a separate module cpl_recode_iconv.cpp doing the conversion and frontend module cpl_recode.cpp which falls back to cpl_recode_stub.cpp if iconv() is not available at compile time.
What is NOT done in this patch:
- Unix configure stuff (pretty trivial);
- Windows configure stuff (not so trivial);
- Workarounds for iconv() differences on various platforms (could be done per further reports).
I am ready to proceed with applying the patch and implementing proper configuration if there will be no critics/objections.
Best regards, Andrey
Attachments (2)
Change History (10)
by , 13 years ago
Attachment: | cpl_recode.diff added |
---|
comment:1 by , 13 years ago
Andrey,
do you know if there is any performance impact on always using iconv rather than the stub methods for the few cases they already handle ? I mean utf8 <--> iso-8859-1 conversion is already coded : is there any negative (or positive !) performance impact on delegating it to iconv ? We could for example use iconv in the cases where we don't have a sensible implementation like in CPLRecode()
#ifdef HAVE_ICONV return CPLRecodeIconv(pszSource, pszSrcEncoding, pszDstEncoding); #else /* -------------------------------------------------------------------- */ /* Everything else is treated as a no-op with a warning. */ /* -------------------------------------------------------------------- */ { static int bHaveWarned = FALSE; if( !bHaveWarned ) { bHaveWarned = 1; CPLError( CE_Warning, CPLE_AppDefined, "Recode from %s to %s not supported, no change applied.", pszSrcEncoding, pszDstEncoding ); } return CPLStrdup(pszSource); } #endif
I somehow remember I started something like that in a working copy but I can't find it anymore.
As far as iconv difference, I remember that on some platforms, you must use libiconv_foo rather than iconv_foo.
comment:2 by , 13 years ago
Rouault,
That's an interesting question. There is indeed a performance penalty introduced by iconv() function (I am using the glibc's internal implementation now). I have created a simple benchmark (the source code is attached for reference) which reads some text file in memory and does the conversions on that string. I am using GCC manual as a test file which is large enough. There are results of multiple conversions:
$ gcc -g3 -O2 gdal_recode_perf.c -o gdal_recode_perf -lgdal $ ls -l /tmp/gcc-4.4.info -rw-r--r-- 1 dron dron 1938398 May 3 2010 /tmp/gcc-4.4.info $ ./gdal_recode_perf /tmp/gcc-4.4.info 10000 10000 iterations of CPLRecodeIconv() from UTF-8 to ISO-8859-1 took 119 seconds. 10000 iterations of CPLRecodeStub() from UTF-8 to ISO-8859-1 took 106 seconds. 10000 iterations of CPLRecodeIconv() from ISO-8859-1 to UTF-8 took 103 seconds. 10000 iterations of CPLRecodeStub() from ISO-8859-1 to UTF-8 took 84 seconds.
Subsequent runs produce the same results. So there is a ~10% penalty for UTF-8 to ISO-8859-1 conversion and about 20% for reverse operation. I can go the way you suggested: use existing implementation for appropriate conversions and iconv() for all other ones.
Best regards, Andrey
comment:3 by , 13 years ago
Cc: | added |
---|
follow-ups: 5 6 comment:4 by , 13 years ago
Andrey,
A few points on r21730:
- You've removed port/cpl_config_extras.h from AC_CONFIG_HEADER(port/cpl_config.h:port/cpl_config.h.in:port/cpl_config_extras.h) in configure.in. Apparently cpl_config_extras.h was added in r18144 for MacOSX support, so I think there's no reason to remove it.
- IMHO The new methods xxxIconv or xxxStub have no reason to be CPL_DLL exported. They should remain internal GDAL API
- I think (untested, only quick code review) that the implementation of CPLRecodeFromWCharIconv() is incorrect. You cannot safely cast the const wchar_t *pwszSource to a const char*. In particularly CPLRecodeIconv() calls strlen() on that pointer, so any character 0 will make it stop. The Stub() implementation does a "while( pwszSource[nSrcLen] != 0 ) nSrcLen++;" to compute the length of a wide char string.
Best regards,
Even
comment:5 by , 13 years ago
Roualt, thanks for the code review.
Replying to rouault:
- You've removed port/cpl_config_extras.h from AC_CONFIG_HEADER(port/cpl_config.h:port/cpl_config.h.in:port/cpl_config_extras.h) in configure.in. Apparently cpl_config_extras.h was added in r18144 for MacOSX support, so I think there's no reason to remove it.
Ah, actually it doesn't work with my autoheader. r21735 should restore its inclusion and to be slightly better solution.
- IMHO The new methods xxxIconv or xxxStub have no reason to be CPL_DLL exported. They should remain internal GDAL API
Agreed. Should be fixed with r21736.
More to come.
Best regards, Andrey
comment:6 by , 13 years ago
Even,
Replying to rouault:
- I think (untested, only quick code review) that the implementation of CPLRecodeFromWCharIconv() is incorrect. You cannot safely cast the const wchar_t *pwszSource to a const char*. In particularly CPLRecodeIconv() calls strlen() on that pointer, so any character 0 will make it stop. The Stub() implementation does a "while( pwszSource[nSrcLen] != 0 ) nSrcLen++;" to compute the length of a wide char string.
I think it should be fixed with r21738. We still need a proper test suite though.
Best regards, Andrey
comment:7 by , 13 years ago
comment:8 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I have added a simple test case with r21786. Consider this ticked to be closed for now
Patch adding the iconv()-based CPLRecode() implementation