Opened 7 years ago
Closed 5 years ago
#6790 closed defect (wontfix)
CPL_IS_DOUBLE_A_INT cleanup needed
Reported by: | Kurt Schwehr | Owned by: | Even Rouault |
---|---|---|---|
Priority: | normal | Milestone: | closed_because_of_github_migration |
Component: | default | Version: | unspecified |
Severity: | normal | Keywords: | asan UndefinedBehavior |
Cc: |
Description
CPL_IS_DOUBLE_A_INT is a bit wonky as added in r29955 with undefined behavior waiting to cause issues. Why not just do the defined thing all the time? GDAL is getting built on non-x86 platforms (arm, power, sparc). A couple extra cycles to be defined behavior seems to be worth it.
e.g.
#if defined(MAKE_SANITIZE_HAPPY) || !(defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)) /*! @cond Doxygen_Suppress */ #define CPL_CPU_REQUIRES_ALIGNED_ACCESS /*! @endcond */ /** Returns whether a double fits on a int */ #define CPL_IS_DOUBLE_A_INT(d) ( (d) >= INT_MIN && (d) <= INT_MAX && (double)(int)(d) == (d) ) #else /* This is technically unspecified behaviour if the double is out of range, but works OK on x86 */ /** Returns whether a double fits on a int */ #define CPL_IS_DOUBLE_A_INT(d) ( (double)(int)(d) == (d) ) #endif
Could we not just have this (with A -> AN grammar fix)? I tried to be as obvious as possible, typed and let the optimizer make it go fast rather than return d >= INT_MIN && d <= INT_MAX && (double)(int)d == d;
#ifdef __cplusplus #include <limits> // Move to the top of the file. inline bool CPLIsDoubleAnInt(double d) { if (d > std::numeric_limits<int>::max()) return false; if (d < std::numeric_limits<int>::min()) return false; return d == static_cast<double>(static_cast<int>(d)); } #endif // __cplusplus
Or better yet... use modf as used here:
#include <cmath> // Move to the top of the file. #include <limits> // Move to the top of the file. inline bool CPL_IS_DOUBLE_A_INT(double d) { if (d > std::numeric_limits<int>::max()) return false; if (d < std::numeric_limits<int>::min()) return false; double dfTmp = 0.0; return modf(d, &dfTmp) == 0.0; }
Change History (7)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
I did a quick look with micorbenchmark.
TL/DR: the unsafe macro and the check and cast in an inline are about the same speed when optimized. So I think we should go with this and drop the use of undefined behavior:
inline bool CPLIsDoubleAnInt2(double d) { if (d > std::numeric_limits<int>::max()) return false; if (d < std::numeric_limits<int>::min()) return false; return d == static_cast<double>(static_cast<int>(d)); }
This was on my desktop with Xeon(R) CPU E5-1650 0 @ 3.20GHz
that's about 2.5 years old with a fairly newish clang bulid env for opt and gcc for debug.
Using an inline with limit check and cast is pretty fast. Both about 2.3 ns. In most of my tests, the cast inline function was actually faster than the unsafe macro. With gcc debug mode, it's all pretty slow. Adding const and ref (&) to the cast case didn't help on performance in the optimized case and hurt in the debug case.
It looks like a difference between 2 ns and 25 ns per check for modf. So, for 1G points it'll add about ~20 seconds. So modf seems like a bad idea.
Benchmark Time(ns) CPU(ns) Iterations ------------------------------------------------------------- BM_MacroUnsafe 2.26 2.26 310165815 BM_Macro 14.6 14.7 48352200 BM_MacroUnsafeInline 2.30 2.30 305550706 BM_MacroInline 14.7 14.7 44510538 BM_Cast 2.29 2.29 310908966 BM_Modf 40.2 40.2 17409677 BM_ModfConstRef 25.9 25.9 26201663 DEBUG: Benchmark Time(ns) CPU(ns) Iterations -------------------------------------------------------------------- DEBUG: BM_MacroUnsafe 96.3 96.3 7073333 DEBUG: BM_Macro 115 115 6342469 DEBUG: BM_MacroUnsafeInline 118 118 6082977 DEBUG: BM_MacroInline 138 138 4955087 DEBUG: BM_Cast 177 177 3886193 DEBUG: BM_Modf 204 204 3461143 DEBUG: BM_ModfConstRef 158 158 4573156 Benchmark Time(ns) CPU(ns) Iterations ------------------------------------------------------- BM_MacroUnsafe 2.37 2.37 297256205 BM_Cast 2.28 2.28 308528102
The test code is unexciting...
#define CPL_IS_DOUBLE_AN_INT_UNSAFE(d) ((double)(int)(d) == (d)) #define CPL_IS_DOUBLE_AN_INT(d) \ ((d) >= INT_MIN && (d) <= INT_MAX && (double)(int)(d) == (d)) inline bool CPLIsDoubleAnInt0(double d) { return CPL_IS_DOUBLE_AN_INT_UNSAFE(d); } inline bool CPLIsDoubleAnInt1(double d) { return CPL_IS_DOUBLE_AN_INT(d); } inline bool CPLIsDoubleAnInt2(double d) { if (d > std::numeric_limits<int>::max()) return false; if (d < std::numeric_limits<int>::min()) return false; return d == static_cast<double>(static_cast<int>(d)); } inline bool CPLIsDoubleAnInt3(double d) { if (d > std::numeric_limits<int>::max()) return false; if (d < std::numeric_limits<int>::min()) return false; double dfTmp = 0.0; return modf(d, &dfTmp) == 0.0; } inline bool CPLIsDoubleAnInt4(const double& d) { if (d > std::numeric_limits<int>::max()) return false; if (d < std::numeric_limits<int>::min()) return false; double dfTmp = 0.0; return modf(d, &dfTmp) == 0.0; } const auto kValuesYes = {-1.0, 0.0, 1.0, -1000001.0, 1000001.0}; const auto kValuesNo = {-1.1, 1e-19, 1.00001, -1000001.999, 1000001.999, 1e100, -1e100, 1.234e300, -1.234e300}; static void BM_MacroUnsafe(benchmark::State& state) { while (state.KeepRunning()) { for (const auto& v : kValuesYes) { CHECK(CPL_IS_DOUBLE_AN_INT_UNSAFE(v)) << v; } for (const auto& v : kValuesNo) { CHECK(!CPL_IS_DOUBLE_AN_INT_UNSAFE(v)) << v; } } } BENCHMARK(BM_MacroUnsafe); static void BM_Macro(benchmark::State& state) { while (state.KeepRunning()) { for (const auto& v : kValuesYes) { CHECK(CPL_IS_DOUBLE_AN_INT(v)) << v; } for (const auto& v : kValuesNo) { CHECK(!CPL_IS_DOUBLE_AN_INT(v)) << v; } } } BENCHMARK(BM_Macro); // ...
comment:3 by , 7 years ago
ok, so the unsafe way is ~2.3 ns and the safe way ~14.6 ns ? That doesn't really tell the effect in a real world case like OGRMakeWktCoordinate(), but their relative cost compared to all the rest of the work done in the function might be neglectable.
OK just go ahead.
comment:4 by , 7 years ago
Sorry, I should have lead with the boiled down summary. The fastest safe method is about the same speed as the unsafe method in this test. But yes, this is an artificial test.
Benchmark Time(ns) CPU(ns) Iterations ------------------------------------------------------- BM_MacroUnsafe 2.37 2.37 297256205 BM_Cast 2.28 2.28 308528102
comment:5 by , 7 years ago
Ah the difference is really neglectable. Let's adopt the safe way unconditionaly
comment:6 by , 7 years ago
r37201 I wasn't able to keep the inline in the cpl_port.h header. grib has some weirdness that I didn't figure out.
comment:7 by , 5 years ago
Milestone: | → closed_because_of_github_migration |
---|---|
Resolution: | → wontfix |
Status: | new → closed |
This ticket has been automatically closed because Trac is no longer used for GDAL bug tracking, since the project has migrated to GitHub. If you believe this ticket is still valid, you may file it to https://github.com/OSGeo/gdal/issues if it is not already reported there.
Could you try to benchmark the performance effect of your proposed change on exporting to WKT a huge geometry (I think this must be the most demanding user of this macro) ?