Changes between Version 2 and Version 3 of FdoTextReaderEnhancements


Timestamp:
Oct 18, 2007, 2:08:00 PM (17 years ago)
Author:
gregboone
Comment:

--

  • FdoTextReaderEnhancements

    v2 v3  
    150150}}}
    151151
    152 
    153 
     152==== !FdoIoCachedStream ====
     153
     154A new stream class will be added:
     155
     156
     157{{{
     158class FdoIoCachedStream : public FdoIoStream
     159{
     160public:
     161
     162        FDO_API_COMMON static FdoIoCachedStream* Create(
     163                FdoIoStream* baseStream,
     164                FdoInt32 bufferSize=4096
     165        );
     166
     167        FDO_API_COMMON FdoIoStream* GetStream();
     168
     169        FDO_API_COMMON void Flush();
     170};
     171
     172typedef FdoPtr<FdoIoCachedStream> FdoIoCachedStreamP;
     173}}}
     174
     175
     176''Parameters'':
     177
     178baseStream: underlying stream where data will ultimately be read from or written to.
     179
     180bufferSize: size in bytes of the data cache. The cache is used to provide better read/write performance.
     181
     182''Description'':
     183
     184!GetStream() returns the underlying stream.
     185
     186Flush(): if the cache contains data to write, this function writes it to the underlying stream.
     187
     188This class provides buffered reads and writes from and to an !FdoIoStream by adding an in-memory cache.
     189
     190See '''Design Discussion''' below for the rationale behind this class.
     191
     192==== !FdoIoCachedFileStream  ====
     193
     194The following class will be added for convenience:
     195
     196
     197{{{
     198class FdoIoCachedFileStream : public FdoIoCachedStream
     199{
     200public:
     201
     202        FDO_API_COMMON static FdoIoCachedFileStream* Create(
     203                FdoString* fileName,
     204                FdoString* accessModes,
     205                FdoInt32 bufferSize=4096
     206        );
     207
     208        FDO_API_COMMON static FdoIoCachedFileStream* Create(
     209                FILE* fp,
     210                FdoInt32 bufferSize=4096
     211        );
     212};
     213
     214typedef FdoPtr<FdoIoCachedFileStream> FdoIoCachedFileStreamP;
     215}}}
     216
     217The parameters are identical to those of !FdoIoFileStream and !FdoIoCachedStream. The behaviour is also identical except that reads and writes are buffered.
     218
     219!FdoIoCachedFileStream serves as a convenience class for wrapping an !FdoIoCachedStream around an !FdoIoFileStream to provide buffered access to a file. The following:
     220
     221{{{
     222FdoIoCachedFileStreamP cfs = FdoIoCachedFileStream::Create(
     223        L"myfile.txt",
     224        L"r"
     225);
     226}}}
     227
     228is equivalent to:
     229
     230{{{
     231FdoIoFileStreamP fs = FdoIoFileStream::Create(
     232        L"myfile.txt",
     233        L"r"
     234);
     235
     236FdoIoCachedStreamP cfs = FdoIoCachedStream::Create( fs );
     237}}}
     238
     239It might seem odd that the Create( FILE*, !FdoInt32 ) function is needed, since a FILE already provides buffered access. However, !FdoIoFileStream performs reads and writes through the underlying device for the FILE, so the buffering provided by FILE is bypassed. An alternative approach would have been to leverage the buffering provided by FILE. However, once !FdoIoCachedStream is in place, it can also be used to implement buffered I/O for other types of devices.
     240
     241==== !FdoIoTextReader and !FdoIoTextWriter ====
     242
     243The following functions will have a slight behavioural change:
     244
     245{{{
     246FdoIoTextReader::Create( FdoString* fileName )
     247FdoIoTextWriter::Create( FdoString* fileName )
     248}}}
     249
     250These functions used to automatically create an underlying stream of type !FdoIoFileStream. They will change to create the stream as an !FdoIoCachedFileStream. This will provide performance benefits.
     251
     252=== Design Discussion ===
     253
     254The Read functions, being added to !FdoIoTextReader, do not know in advance how much data they will read. Therefore, these functions must do one of two things:
     255
     256• read the data one byte at a time. [[br]]
     257• read fixed sized chunks of data until the end of the string to read is reached.[[br]]
     258
     259For some devices (e.g. files) the second option provides better performance. However, it is much more complicated to implement since the end of the string to read can be in the middle of the current chunk. Therefore, the current read position must be reset from the end of the chunk to the middle. This cannot be done by changing the position on the underlying stream since not all streams support rewinding (some only support forward-only reading). Therefore the text reader would have to cache the remainder of the current chunk, to be read when the next string to read is requested.
     260
     261The first option is much simpler but could be slow when reading from a file (the !FdoIoFileStream class does not provide any buffering on read). Therefore, the second option is preferred.
     262
     263Rather than complicate !FdoIoTextReader with the managing of a read buffer, this complication can be pushed down to a new class, called !FdoIoCachedStream. An added benefit is that doing the caching at the stream level allows buffered writing to file to be supported, providing performance benefits for streamed writing as well.
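
As a rough illustration of why pushing the buffering down keeps the text reader simple, here is a minimal sketch of a line reader that only ever asks its stream for one byte at a time. The names `ByteStream`, `MemoryByteStream` and `ReadLineSketch` are hypothetical stand-ins invented for this example and are not part of the FDO API. The sketch assumes lines end in `'\n'` and does no character-set conversion.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical minimal stream interface; it only stands in for a
// forward-only stream and is not the FDO API.
struct ByteStream {
    virtual ~ByteStream() = default;
    // Copies up to count bytes into dest; returns bytes copied (0 at end).
    virtual std::size_t Read(char* dest, std::size_t count) = 0;
};

// In-memory stream so the sketch can run without touching a file.
class MemoryByteStream : public ByteStream {
public:
    explicit MemoryByteStream(std::string data) : data_(std::move(data)) {}

    std::size_t Read(char* dest, std::size_t count) override {
        const std::size_t n = std::min(count, data_.size() - pos_);
        std::memcpy(dest, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Reads up to the next '\n' (which is consumed but not stored).
// Returns false only when no bytes at all were left to read.
bool ReadLineSketch(ByteStream& in, std::string& line) {
    line.clear();
    char c = 0;
    bool gotAny = false;
    while (in.Read(&c, 1) == 1) {   // one byte per request
        gotAny = true;
        if (c == '\n') break;
        line += c;
    }
    return gotAny;
}
```

Because the reader never reads past the end of a line, it has no leftover chunk to manage. Whether each one-byte request is cheap then depends entirely on the stream it is given.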
     264
     265When the first read request is made to !FdoIoCachedStream, it will read enough data from its base stream to fill its cache. For this read and subsequent reads, the data will be retrieved from the cache. When the end of the cache is reached, it will be flushed and filled again from the base stream.
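
The read behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ByteSource` and `CachedReaderSketch` are hypothetical names, the base stream is simulated in memory, and the buffer size is assumed to be greater than zero. It is not the proposed implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a forward-only base stream (not the FDO API).
class ByteSource {
public:
    explicit ByteSource(std::string data) : data_(std::move(data)) {}

    // Copies up to count bytes into dest; returns bytes copied (0 at end).
    std::size_t Read(char* dest, std::size_t count) {
        const std::size_t n = std::min(count, data_.size() - pos_);
        std::memcpy(dest, data_.data() + pos_, n);
        pos_ += n;
        ++readCalls;
        return n;
    }

    int readCalls = 0;  // number of requests that reached the base stream

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Hypothetical read-side cache. Assumes bufferSize > 0.
class CachedReaderSketch {
public:
    CachedReaderSketch(ByteSource& base, std::size_t bufferSize)
        : base_(base), cache_(bufferSize) {}

    std::size_t Read(char* dest, std::size_t count) {
        std::size_t total = 0;
        while (total < count) {
            if (next_ == filled_) {
                // Cache is empty or used up: refill it from the base stream.
                filled_ = base_.Read(cache_.data(), cache_.size());
                next_ = 0;
                if (filled_ == 0) break;  // base stream is at its end
            }
            const std::size_t n = std::min(count - total, filled_ - next_);
            std::memcpy(dest + total, cache_.data() + next_, n);
            next_ += n;
            total += n;
        }
        return total;
    }

private:
    ByteSource& base_;
    std::vector<char> cache_;
    std::size_t filled_ = 0;  // valid bytes currently held in the cache
    std::size_t next_ = 0;    // next unread position within the cache
};
```

With a 4-byte cache over a 10-byte source, reading one byte at a time reaches the base stream only four times: three refills plus the final request that detects the end of the stream.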
     266
     267When the first write request is made, the data is added to the cache. Subsequent writes will append to the cache. When the cache is full, it will be flushed by writing its contents to the base stream.
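
The write side can be sketched in the same spirit. Again, `ByteSink` and `CachedWriterSketch` are hypothetical names used only for this illustration, and the sketch ignores mixed reads and writes entirely.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the base stream; it records every write.
class ByteSink {
public:
    void Write(const char* src, std::size_t count) {
        data.append(src, count);
        ++writeCalls;
    }

    std::string data;    // everything written so far
    int writeCalls = 0;  // number of writes that reached the base stream
};

// Hypothetical write-side cache (not the FDO API).
class CachedWriterSketch {
public:
    CachedWriterSketch(ByteSink& base, std::size_t bufferSize)
        : base_(base), capacity_(bufferSize > 0 ? bufferSize : 1) {
        cache_.reserve(capacity_);
    }

    // Appends to the cache, flushing each time the cache becomes full.
    void Write(const char* src, std::size_t count) {
        while (count > 0) {
            const std::size_t room = capacity_ - cache_.size();
            const std::size_t n = count < room ? count : room;
            cache_.insert(cache_.end(), src, src + n);
            src += n;
            count -= n;
            if (cache_.size() == capacity_) Flush();
        }
    }

    // Writes any pending cached bytes to the base stream.
    void Flush() {
        if (cache_.empty()) return;
        base_.Write(cache_.data(), cache_.size());
        cache_.clear();
    }

private:
    ByteSink& base_;
    std::size_t capacity_;
    std::vector<char> cache_;
};
```

Note that bytes still sitting in the cache are not visible in the base stream until Flush() is called, which is why an explicit Flush() is part of the proposed API.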
     268
     269The actual algorithm for buffered reading and writing would be more complicated than described above, especially when mixed reads and writes are performed. However, this document concentrates on the API, so the exact algorithm is beyond the scope of the document. To the caller, !FdoIoCachedStream will behave as if the reads and writes had been done directly against the base stream, except that they will be faster.
     270
     271Of the current streaming implementations, the file stream will benefit the most from caching. Caching would provide no benefit to the memory stream. For convenience, an !FdoIoCachedFileStream is provided. It allows a file stream, with cached stream wrapper, to be created in one step.
     272
     273=== More Examples ===
     274
     275The following reads each line of text from "myfile.txt":
     276
     277{{{
     278FdoIoTextReaderP rdr = FdoIoTextReader::Create( L"myfile.txt" );
     279FdoStringP line;
     280
     281while ( rdr->ReadLine(line) ) {
     282        printf( "%ls\n", (FdoString*) line );
     283}
     284}}}
     285
     286The following does the same thing, but with a 50,000 byte read buffer:
     287
     288{{{
     289FdoIoCachedFileStreamP stream = FdoIoCachedFileStream::Create(
     290        L"myfile.txt",
     291        L"rt",
     292        50000
     293);
     294
     295FdoIoTextReaderP rdr = FdoIoTextReader::Create( stream );
     296FdoStringP line;
     297
     298while ( rdr->ReadLine(line) ) {
     299        printf( "%ls\n", (FdoString*) line );
     300}
     301}}}
     302
     303=== Performance Stats ===
     304
     305The addition of the !FdoIoCachedStream class will help performance when reading or writing files. This will also translate into performance improvements for writing XML files. The reading of XML files will not be affected, since this is delegated to Xerces, which already does buffered reads.
     306
     307The improvements on writing were verified by running some tests against the !FdoIoTextWriter class. This class was used to write a 10 MB file on local disk through multiple calls to FdoIoTextWriter::Write(). Several tests were run, each passing a different number of characters per call to Write(). The following table shows the results:
     308
     309||'''# characters per call'''||'''# calls to Write()'''||'''time (secs)'''||'''char/sec'''||
     310||1||10000000||45||222222||
     311||10||1000000||5.07||1972387||
     312||20||500000||2.64||3787878||
     313||100||100000||0.69||14492754||
     314||4000||2500||0.28||35714286||
     315||8000||1250||0.34||35714286||
     316||50000||200||0.34||29411765||
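
For reference, the second and fourth columns can be derived from the first and third. The sketch below assumes the file holds 10,000,000 characters, which is the product of the first two columns in every row; the function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Assumed total number of characters written in each test.
constexpr std::int64_t kTotalChars = 10000000;

// Number of Write() calls needed when each call passes charsPerCall characters.
std::int64_t CallsNeeded(std::int64_t charsPerCall) {
    return kTotalChars / charsPerCall;
}

// Throughput in characters per second, rounded to the nearest integer.
std::int64_t CharsPerSecond(double seconds) {
    return static_cast<std::int64_t>(
        std::llround(static_cast<double>(kTotalChars) / seconds));
}
```

For the 10-character row, `CallsNeeded(10)` gives 1000000 and `CharsPerSecond(5.07)` gives 1972387, matching the table.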
     317
     318