Opened 16 years ago
Closed 16 years ago
#2601 closed enhancement (fixed)
MapServer rendering speed improvement - SwapWord() function replacement.
Reported by: | nnikolov | Owned by: | sdlime |
---|---|---|---|
Priority: | normal | Milestone: | 5.2 release |
Component: | MapServer C Library | Version: | 5.0 |
Severity: | normal | Keywords: | SwapWord |
Cc: | warmerdam, pramsey, dmorissette |
Description (last modified by )
Please could you consider making the following improvement to Mapserver. With the changes described, we get a reduction of about 19% when running a test that renders 700 maps of varying locations and zooms.
Please bear in mind that we have not tried this on platforms other than the one described, so it may need review to check for platform independence (see detail below).
After using a profiling tool to analyze the performance of different functions within MapServer code, SwapWord() function was identified as a potential improvement target.
After researching the Big/Little Endian issue, macros, recommended by Intel, were identified. Our Mapserver is deployed on Dell desktop PC with Intel Dual-core (64bit) processor, running Fedora Linux OS.
The proposal is to replace the SwapWord() function in mapshape.c and maptree.c files with one of the following macros:
#define SWAP_TWO_BYTES(data) \ ( ((data >> 8) & 0x00FF) | ((data << 8) & 0xFF00) ) #define SWAP_FOUR_BYTES(data) \ ( ((data >> 24) & 0x000000FF) | ((data >> 8) & 0x0000FF00) | \ ((data << 8) & 0x00FF0000) | ((data << 24) & 0xFF000000) ) #define SWAP_EIGHT_BYTES(data) \ ( ((data >> 56) & 0x00000000000000FF) | ((data >> 40) & 0x000000000000FF00) | \ ((data >> 24) & 0x0000000000FF0000) | ((data >> 8) & 0x00000000FF000000) | \ ((data << 8) & 0x000000FF00000000) | ((data << 24) & 0x0000FF0000000000) | \ ((data << 40) & 0x00FF000000000000) | ((data << 56) & 0xFF00000000000000) )
The SWAP_TWO_BYTES is actually never used.
The usage of SWAP_EIGHT_BYTE macro however, poses a challenge, due to the fact that it requires the correct type for “data”, i.e. 64 bit data type. To overcome this issue, a temporary pointer of type “long” (long *temp;) was introduced to allow correct type casting. A potential problem is the size of “long” on different machines, which is not guaranteed to be 64 bits on the older machines. I have not found yet a resolution for this issue.
An additional improvement, independent of the use of SwapWord() function or the macros, is moving the “Swap needed” checking outside several “for” loops in the above mentioned files.
Attachments (3)
Change History (19)
by , 16 years ago
Attachment: | mapshape.c added |
---|
by , 16 years ago
by , 16 years ago
Attachment: | swapword.h added |
---|
comment:1 by , 16 years ago
Cc: | added |
---|---|
Milestone: | → 5.2 release |
comment:2 by , 16 years ago
Cute. I like.
Note that with the performance patch already on trunk (#2282), the gains you're going to see from this are going to be smaller. SwapBytes was called a great deal because we were loading every entry in SHX (which is full of big-endian values needing swapping on Intel hardware) on every draw. With the performance patch we are now calling it a lot less. This will still help, but it won't be a 17% gain anymore.
comment:3 by , 16 years ago
I would suggest preparing diffs since Paul has just done some major surgery in the same area.
We cannot easily depend on having a 64bit integer type on all platforms, and if we do what it is will vary (long, long long, int64, etc). I would suggest first addressing 32bit byte swapping.
I can test on a bigendian (PPC MacOS X) system once the patches have been applied.
comment:4 by , 16 years ago
I'm with Frank, most of the swapping required in Intel is 32-bit integer swapping, so just leave the 64 bit double swapping alone for now and you'll still see a gain on your platform. Re-name the macros to indicate that they only work on integer types, unlike the SwapBytes function. (eg SWAP_INT16, SWAP_INT32).
comment:5 by , 16 years ago
Description: | modified (diff) |
---|
comment:6 by , 16 years ago
Status: | new → assigned |
---|
Ok, so if we can get diff against the development tree then this would be a good addition to the 5.2 release. nnikolov, can do?
Steve
comment:7 by , 16 years ago
Just checking to see if a diff against the dev tree is a possiblity in the next couple of weeks. Will gladly apply it.
Steve
comment:8 by , 16 years ago
I've applied SWAP_FOUR_BYTES to r7599. I don't think swap two ever gets used, and as discussed, SWAP_EIGHT is still problematic. I've only applied it in mapshape.c, since the performance of shptree isn't really at issue.
comment:9 by , 16 years ago
Seems like we can close then? Or leave the rest for a future date?
Steve
comment:10 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Patch for four-bytes + existing speed improvements in 5.2 should render the remaining gains very small.
comment:11 by , 16 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I did some additional work and testing on the issue implementing it on CentOS 32 bit machine (The old implementation was on Fedora 64 bit). The problem was the definition of 64 bit pointer and its use together with SWAP_EIGHT_BYTE macro.
The solution consists of: First, the corresponding temporary pointer needs to be defined as “long long” so to become pointer to 64-bit data. Secondly, the SWAP_EIGHT_BYTE macro needs to be slightly changed to make the compiler to recognise the 64-bit constants in the macro:
#define SWAP_EIGHT_BYTES(data) \ ( ((data >> 56) & 0x00000000000000FFULL) | ((data >> 40) & 0x000000000000FF00ULL) | \ ((data >> 24) & 0x0000000000FF0000ULL) | ((data >> 8) & 0x00000000FF000000ULL) | \ ((data << 8) & 0x000000FF00000000ULL) | ((data << 24) & 0x0000FF0000000000ULL) | \ ((data << 40) & 0x00FF000000000000ULL) | ((data << 56) & 0xFF00000000000000ULL) )
It works fine and it should be good for both 64 and 32-bit machines and allow the use of the SWAP_EIGHT_BYTE macro as well.
comment:12 by , 16 years ago
Any comments? Seems like something necessary to apply before Wed.
Steve
comment:13 by , 16 years ago
As I understand it, this is an additional optimization beyond what we did incorporate, not a fix for what is already in there.
I absolutely would not support making this change at this date if it is just an additional optimization. I think there is a high risk that this sort of "long long" handling will not work on all platforms. I'm essentially sure of it. So if we want to experiment with it, it should be in 5.4, not 5.2.
comment:14 by , 16 years ago
Agreed, 5.4 for testing. I'm not convinced this will be platform independent until we see for sure. Additionally, it'll only make a different for big-endian platforms, which aren't exactly a huge proportion of the server population these days.
comment:15 by , 16 years ago
Since we agree on pushing it to 5.4, I'd suggest we move that enhancement to a new bug since the current bug was implemented and closed in 5.2
comment:16 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Thanks for the comments. I've created a new ticket (#2688) and am closing this one...
Steve
Cool, we love performance improvement patches! CC'ing a few folks more familiar with low-level issues like this than I for comment. Setting target to 5.2 release.
Steve