Changes between Version 4 and Version 5 of rfc23_ogr_unicode


Timestamp:
Apr 25, 2008, 12:39:17 PM (16 years ago)
Author:
warmerdam
Comment:

replace CPLString stuff with a C API.

  • rfc23_ogr_unicode

    v4 v5  
    1313GDAL should be modified to support the following three main ideas:
    1414
    15  1. The CPLString class will be upgraded to support a variety of encoding conversions, including conversion between representations (e.g. UTF-8 to UCS-2/wchar_t).
     15 1. C functions will be provided to support a variety of encoding conversions, including conversion between representations (e.g. UTF-8 to UCS-2/wchar_t).
    1616 2. Character encodings will be identified by iconv() style strings.
    1717 3. OFTString/OFTStringList feature attributes in OGR will be treated as being in UTF-8.
     
    1919This RFC specifically does not attempt to address issues of using non-ascii filenames.  It also does not attempt to make definitions about the encoding of other strings used in GDAL/OGR (such as field names, metadata, etc).  These would presumably be addressed in a later RFC building on this one.
    2020
    21 == CPLString ==
     21== CPLRecode API ==
    2222
    23 The CPLString class will now be assumed to be a potentially multi-byte
    24 encoded string, but with no effort within the CPLString to keep track of
    25 the encoding it is in.  This is left to the higher level code for now.
    26 
    27 However, the CPLString is extended with some convenient mechanisms for
    28 recoding, and for conversion of UTF-8 to/from "wchar_t" (aka UCS-2).
    29 
    30 It is stressed that normal initialization of a CPLString from "const char *"
    31 does not attempt to do any conversions to/from UTF-8.  This rule is kept,
    32 in part to minimize string processing costs for the common case.  When encoding
    33 is believed to be an issue the calling code must keep track.
     23The following three C callable functions will be introduced for recoding strings, and for converting between wchar_t (wide character) and char (multi-byte) formats:
    3424
    3525{{{
     26char *CPLRecode( const char *pszSource,
     27                 const char *pszSrcEncoding, const char *pszDstEncoding );
    3628
    37 // Convert the internal string to a new encoding.
    38 char* CPLString::recode( const char *pszSrcEncoding, const char *pszDstEncoding );
    39 
    40 // Set value based on input encoded string with CPLString set to UTF-8.
    41 // This is equivalent to normal setting, and then a recode() with a destination
    42 // encoding of "UTF-8" and thus is just for convenience.
    43 
    44 CPLString &CPLString::SetAsUTF8( const char *pszInput, const char *pszEncoding = "" );
    45 
    46 // Set value based on input encoded string with CPLString set to UTF-8.
    47 // This is equivalent to normal setting, and then a recode() with a destination
    48 // encoding of "UTF-8" and thus is just for convenience.
    49 
    50 CPLString &CPLString::SetAsUTF8( const wchar_t *pszInput, const char *pszEncoding = "UCS-2" );
    51 
    52 // Construct UTF-8 string object from array of wchar_t elements.
    53 CPLString::CPLString( const wchar_t*pszInput, const char *pszEncoding = "UCS-2" );
    54 
    55 // Returns a wchar_t string which becomes the ownership responsibility of
    56 // the caller (free with CPLFree()).  It is assumed the CPLString is UTF-8.
    57 wchar_t *CPLString::GetAsWChar( const char *pszDstEncoding = "UCS-2" );
     29char *CPLRecodeFromWChar( const wchar_t *pwszSource,
     30                          const char *pszSrcEncoding,
     31                          const char *pszDstEncoding );
     32wchar_t *CPLRecodeToWChar( const char *pszSource,
     33                           const char *pszSrcEncoding,
     34                           const char *pszDstEncoding );
    5835}}}
    5936
    60 I have specifically avoided additional constructors or casting operators
    61 to avoid any possible overloading ambiguities or complication in maintaining
    62 extra state in the CPLString.  Such services can be added in the future based
    63 on the above methods if desired.
     37In each case the input string and the returned string are both zero-terminated, and the returned string should be deallocated with CPLFree().
     38In case of error the returned string will be NULL, and the function will issue a CPLError().  The functions will be marked with CPL_DLL and considered part of the public GDAL/OGR API, for use by applications as well as for internal use.
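
The ownership and error contract described for these functions (zero-terminated input, newly allocated zero-terminated result, NULL on failure, caller releases the result) can be illustrated with a small standalone sketch. This is not GDAL code: `sketch_recode_latin1_to_utf8()` and `sketch_usage()` are hypothetical names, the function handles only ISO-8859-1 to UTF-8, it uses `malloc()`/`free()` where the proposal uses CPLFree(), and it does not issue a CPLError().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in that mimics the proposed CPLRecode() contract.
 * NOT part of GDAL: it only converts ISO-8859-1 to UTF-8.
 * Returns a newly allocated, zero-terminated string owned by the caller
 * (released with free() here, CPLFree() in the proposal), or NULL on error. */
char *sketch_recode_latin1_to_utf8(const char *pszSource)
{
    if (pszSource == NULL)
        return NULL;

    /* Each ISO-8859-1 byte expands to at most two UTF-8 bytes. */
    size_t nSrcLen = strlen(pszSource);
    char *pszResult = malloc(2 * nSrcLen + 1);
    if (pszResult == NULL)
        return NULL;

    size_t iDst = 0;
    for (size_t iSrc = 0; iSrc < nSrcLen; iSrc++)
    {
        unsigned char ch = (unsigned char)pszSource[iSrc];
        if (ch < 0x80)
        {
            /* ASCII bytes are identical in both encodings. */
            pszResult[iDst++] = (char)ch;
        }
        else
        {
            /* U+0080..U+00FF become a lead byte plus one continuation byte. */
            pszResult[iDst++] = (char)(0xC0 | (ch >> 6));
            pszResult[iDst++] = (char)(0x80 | (ch & 0x3F));
        }
    }
    assert(iDst <= 2 * nSrcLen);
    pszResult[iDst] = '\0';
    return pszResult;
}

/* Caller-side pattern: test for NULL, use the result, then release it.
 * Returns 1 if the recoded text matches the expected UTF-8 bytes. */
int sketch_usage(void)
{
    char *pszUTF8 = sketch_recode_latin1_to_utf8("caf\xE9");
    if (pszUTF8 == NULL)
        return 0;
    int bOK = strcmp(pszUTF8, "caf\xC3\xA9") == 0;
    free(pszUTF8);
    return bOK;
}
```

Under the proposal, a caller would presumably follow the same pattern with CPLRecode(), passing iconv()-style encoding names and releasing the result with CPLFree().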
    6439
    6540== Encoding Names ==