Opened 16 years ago

Closed 16 years ago

#2601 closed enhancement (fixed)

MapServer rendering speed improvement - SwapWord() function replacement.

Reported by: nnikolov Owned by: sdlime
Priority: normal Milestone: 5.2 release
Component: MapServer C Library Version: 5.0
Severity: normal Keywords: SwapWord
Cc: warmerdam, pramsey, dmorissette

Description (last modified by dmorissette)

Please could you consider making the following improvement to Mapserver. With the changes described, we get a reduction of about 19% when running a test that renders 700 maps of varying locations and zooms.

Please bear in mind that we have not tried this on platforms other than the one described, so it may need review to check for platform independence (see detail below).

After using a profiling tool to analyze the performance of different functions within MapServer code, SwapWord() function was identified as a potential improvement target.

After researching the Big/Little Endian issue, macros, recommended by Intel, were identified. Our Mapserver is deployed on Dell desktop PC with Intel Dual-core (64bit) processor, running Fedora Linux OS.

The proposal is to replace the SwapWord() function in mapshape.c and maptree.c files with one of the following macros:

#define SWAP_TWO_BYTES(data) \
( ((data >> 8) & 0x00FF) | ((data << 8) & 0xFF00) )

#define SWAP_FOUR_BYTES(data) \
( ((data >> 24) & 0x000000FF) | ((data >> 8) & 0x0000FF00) | \
((data << 8) & 0x00FF0000) | ((data << 24) & 0xFF000000) )

#define SWAP_EIGHT_BYTES(data) \
( ((data >> 56) & 0x00000000000000FF) | ((data >> 40) & 0x000000000000FF00) | \
((data >> 24) & 0x0000000000FF0000) | ((data >> 8) & 0x00000000FF000000) | \
((data << 8) & 0x000000FF00000000) | ((data << 24) & 0x0000FF0000000000) | \
((data << 40) & 0x00FF000000000000) | ((data << 56) & 0xFF00000000000000) )

The SWAP_TWO_BYTES is actually never used.

The usage of SWAP_EIGHT_BYTE macro however, poses a challenge, due to the fact that it requires the correct type for “data”, i.e. 64 bit data type. To overcome this issue, a temporary pointer of type “long” (long *temp;) was introduced to allow correct type casting. A potential problem is the size of “long” on different machines, which is not guaranteed to be 64 bits on the older machines. I have not found yet a resolution for this issue.

An additional improvement, independent of the use of SwapWord() function or the macros, is moving the “Swap needed” checking outside several “for” loops in the above mentioned files.

Attachments (3)

mapshape.c (92.8 KB ) - added by nnikolov 16 years ago.
maptree.c (26.6 KB ) - added by nnikolov 16 years ago.
swapword.h (979 bytes ) - added by nnikolov 16 years ago.

Download all attachments as: .zip

Change History (19)

by nnikolov, 16 years ago

Attachment: mapshape.c added

by nnikolov, 16 years ago

Attachment: maptree.c added

by nnikolov, 16 years ago

Attachment: swapword.h added

comment:1 by sdlime, 16 years ago

Cc: warmerdam pramsey dmorissette added
Milestone: 5.2 release

Cool, we love performance improvement patches! CC'ing a few folks more familiar with low-level issues like this than I for comment. Setting target to 5.2 release.

Steve

comment:2 by pramsey, 16 years ago

Cute. I like.

Note that with the performance patch already on trunk (#2282), the gains you're going to see from this are going to be smaller. SwapBytes was called a great deal because we were loading every entry in SHX (which is full of big-endian values needing swapping on Intel hardware) on every draw. With the performance patch we are now calling it a lot less. This will still help, but it won't be a 17% gain anymore.

comment:3 by warmerdam, 16 years ago

I would suggest preparing diffs since Paul has just done some major surgery in the same area.

We cannot easily depend on having a 64bit integer type on all platforms, and if we do what it is will vary (long, long long, int64, etc). I would suggest first addressing 32bit byte swapping.

I can test on a bigendian (PPC MacOS X) system once the patches have been applied.

comment:4 by pramsey, 16 years ago

I'm with Frank, most of the swapping required in Intel is 32-bit integer swapping, so just leave the 64 bit double swapping alone for now and you'll still see a gain on your platform. Re-name the macros to indicate that they only work on integer types, unlike the SwapBytes function. (eg SWAP_INT16, SWAP_INT32).

comment:5 by dmorissette, 16 years ago

Description: modified (diff)

comment:6 by sdlime, 16 years ago

Status: newassigned

Ok, so if we can get diff against the development tree then this would be a good addition to the 5.2 release. nnikolov, can do?

Steve

comment:7 by sdlime, 16 years ago

Just checking to see if a diff against the dev tree is a possiblity in the next couple of weeks. Will gladly apply it.

Steve

comment:8 by pramsey, 16 years ago

I've applied SWAP_FOUR_BYTES to r7599. I don't think swap two ever gets used, and as discussed, SWAP_EIGHT is still problematic. I've only applied it in mapshape.c, since the performance of shptree isn't really at issue.

comment:9 by sdlime, 16 years ago

Seems like we can close then? Or leave the rest for a future date?

Steve

comment:10 by pramsey, 16 years ago

Resolution: fixed
Status: assignedclosed

Patch for four-bytes + existing speed improvements in 5.2 should render the remaining gains very small.

comment:11 by nnikolov, 16 years ago

Resolution: fixed
Status: closedreopened

I did some additional work and testing on the issue implementing it on CentOS 32 bit machine (The old implementation was on Fedora 64 bit). The problem was the definition of 64 bit pointer and its use together with SWAP_EIGHT_BYTE macro.

The solution consists of: First, the corresponding temporary pointer needs to be defined as “long long” so to become pointer to 64-bit data. Secondly, the SWAP_EIGHT_BYTE macro needs to be slightly changed to make the compiler to recognise the 64-bit constants in the macro:

#define SWAP_EIGHT_BYTES(data) \ ( ((data >> 56) & 0x00000000000000FFULL) | ((data >> 40) & 0x000000000000FF00ULL) | \ ((data >> 24) & 0x0000000000FF0000ULL) | ((data >> 8) & 0x00000000FF000000ULL) | \ ((data << 8) & 0x000000FF00000000ULL) | ((data << 24) & 0x0000FF0000000000ULL) | \ ((data << 40) & 0x00FF000000000000ULL) | ((data << 56) & 0xFF00000000000000ULL) )

It works fine and it should be good for both 64 and 32-bit machines and allow the use of the SWAP_EIGHT_BYTE macro as well.

comment:12 by sdlime, 16 years ago

Any comments? Seems like something necessary to apply before Wed.

Steve

comment:13 by warmerdam, 16 years ago

As I understand it, this is an additional optimization beyond what we did incorporate, not a fix for what is already in there.

I absolutely would not support making this change at this date if it is just an additional optimization. I think there is a high risk that this sort of "long long" handling will not work on all platforms. I'm essentially sure of it. So if we want to experiment with it, it should be in 5.4, not 5.2.

comment:14 by pramsey, 16 years ago

Agreed, 5.4 for testing. I'm not convinced this will be platform independent until we see for sure. Additionally, it'll only make a different for big-endian platforms, which aren't exactly a huge proportion of the server population these days.

comment:15 by dmorissette, 16 years ago

Since we agree on pushing it to 5.4, I'd suggest we move that enhancement to a new bug since the current bug was implemented and closed in 5.2

comment:16 by sdlime, 16 years ago

Resolution: fixed
Status: reopenedclosed

Thanks for the comments. I've created a new ticket (#2688) and am closing this one...

Steve

Note: See TracTickets for help on using tickets.