Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#1617 closed defect (fixed)

[raster] several regress failures on raster (old mingw)

Reported by: robe
Owned by: Bborie Park
Priority: blocker
Milestone: PostGIS 2.0.0
Component: raster
Version: master
Keywords: mingw
Cc:

Description

It's been a while since I've tested raster, because the core postgis tests were failing before. Now all of those pass after Paul's fixes, thanks Paul :)

I'm seeing failures in raster. Also for some reason my temp folder is not being numbered anymore. It's putting the results in Appdata\Local\Temp\pgis_reg instead of Appdata\Local\Temp\pgis_reg_somerandomnumber

 PostgreSQL 9.1.2, compiled by Visual C++ build 1500, 32-bit
 Postgis 2.0.0alpha7SVN - r - 2012-02-26 02:29:02
   GEOS: 3.3.3dev-CAPI-1.7.3
   PROJ: Rel. 4.6.1, 21 August 2008
   GDAL: GDAL 1.9.0, released 2011/12/29

Running tests

 check_raster_columns .. ok 
 check_raster_overviews .. ok 
 rt_io .. ok 
 rt_bytea .. ok 
 box3d .. ok 
 rt_addband .. ok 
 rt_band .. ok 
 rt_asgdalraster .. failed (diff expected obtained: /tmp/pgis_reg/test_8_diff)
 rt_astiff .. failed (diff expected obtained: /tmp/pgis_reg/test_9_diff)
 rt_asjpeg .. failed (diff expected obtained: /tmp/pgis_reg/test_10_diff)
 rt_aspng .. ok 
 rt_union .. ok 
 create_rt_properties_test .. ok 
 rt_dimensions .. ok 
 rt_scale .. ok 
 rt_pixelsize .. ok 
 rt_upperleft .. ok 
 rt_rotation .. ok 
 rt_georeference .. ok 
 rt_set_properties .. ok 
 drop_rt_properties_test .. ok 
 create_rt_empty_raster_test .. ok 
 rt_isempty .. ok 
 rt_hasnoband .. ok 
 drop_rt_empty_raster_test .. ok 
 rt_metadata .. ok 
 create_rt_band_properties_test .. ok 
 rt_band_properties .. ok 
 rt_set_band_properties .. ok 
 rt_summarystats .. ok 
 rt_count .. ok 
 rt_histogram .. ok 
 rt_quantile .. ok 
 rt_valuecount .. ok 
 rt_valuepercent .. ok 
 rt_bandmetadata .. ok 
 rt_pixelvalue .. ok 
 drop_rt_band_properties_test .. ok 
 rt_utility .. ok 
 create_rt_mapalgebra_test .. ok 
 rt_mapalgebraexpr .. ok 
 rt_mapalgebrafct .. ok 
 rt_mapalgebraexpr_2raster .. ok 
 rt_mapalgebrafct_2raster .. ok 
 drop_rt_mapalgebra_test .. ok 
 create_rt_mapalgebrafctngb_test .. ok 
 rt_mapalgebrafctngb .. ok 
 rt_mapalgebrafctngb_userfunc .. ok 
 drop_rt_mapalgebrafctngb_test .. ok 
 rt_reclass .. ok 
 rt_resample .. ok 
 rt_asraster .. ok 
 rt_intersection .. ok 
 rt_clip .. ok 
 create_rt_gist_test .. ok 
 rt_above .. ok 
 rt_below .. ok 
 rt_contained .. ok 
 rt_contain .. ok 
 rt_left .. ok 
 rt_overabove .. ok 
 rt_overbelow .. ok 
 rt_overlap .. ok 
 rt_overleft .. ok 
 rt_overright .. ok 
 rt_right .. ok 
 rt_same .. ok 
 drop_rt_gist_test .. ok 
 rt_spatial_relationship .. ok 
 rt_intersects .. ok 
 rt_samealignment .. ok 
 bug_test_car5 .. ok 
 tickets .. ok 
 loader/Basic .... failed ( test: actual SQL does not match expected.,: /tmp/pgis_reg/loader.out)
.. ok 
 loader/BasicCopy .... failed ( test: actual SQL does not match expected.,: /tmp/pgis_reg/loader.out)
.. ok 
 loader/Tiled10x10 ..... ok 
 loader/Tiled10x10Copy ..... ok 
 uninstall ... ok (3855)

Run tests: 78
Failed: 5

The rt_asgdalraster test is crashing my service. I suspect rt_astiff and rt_asjpeg are failing because they are being run while the pg service is starting back up.

I've attached the regress folder

Attachments (1)

pgis_reg.zip (5.3 KB ) - added by robe 12 years ago.


Change History (24)

by robe, 12 years ago

Attachment: pgis_reg.zip added

comment:1 by robe, 12 years ago

Keywords: mingw added

comment:2 by robe, 12 years ago

Bborie, as I noted in http://www.postgis.org/pipermail/postgis-devel/2012-February/018849.html, I think Paul's recent failure and this one might be related.

I tried Paul's example

SELECT CASE WHEN
 length(ST_AsGDALRaster(ST_AddBand(ST_MakeEmptyRaster(200, 
 200, 10, 10, 2, 2, 0, 0), 1, '8BSI',123, NULL),'PNG')) > 0 
 THEN 1 ELSE 0 END;
 

Works fine with my alpha6 build, but crashes with trunk. I looked at my postgresql logs and this is the last message it gives before it crashes the backend.

Warning 6: PNG driver doesn't support data type Int16. Only eight bit (Byte) and sixteen bit (UInt16) bands supported. Defaulting to Byte

So maybe related to your fix for #1616 ?

comment:3 by Bborie Park, 12 years ago

I'm guessing it has something to do with the change I had to make for the 8BSI pixel type. 8BSI used to map to GDAL pixel type GDT_Byte but now maps to GDT_Int16 to preserve the sign.

What version of GDAL are you using?

comment:4 by robe, 12 years ago

released 1.9.0 compiled under mingw, but Paul might be using trunk and I think his is compiled under VC 2008. Paul which GDAL are you using?

comment:5 by pramsey, 12 years ago

GDAL 1.9.0, compiled with MSVC 2008.

comment:6 by Bborie Park, 12 years ago

Summary: several regress failures on raster (old mingw) → [raster] several regress failures on raster (old mingw)

Can either of you test out raster/test/core/testapi.c from 9303? I've added another test of the GDAL PNG output. I'm expecting that to cause a crash in Windows. I hope that will make the crash easier to debug.

I'm not seeing this issue in Linux 32 and 64-bit (GDAL trunk r24025) and OSX (GDAL release 1.9.0), which makes things even stranger.

comment:7 by pramsey, 12 years ago

It does crash, but unfortunately the stack trace is just as useless (perhaps because I'm built with minimal debugging on my dependent libraries, or perhaps just because I'm not good looking enough)

Program received signal SIGSEGV, Segmentation fault.
0x7855ae7a in memcpy ()
   from C:\WINDOWS\WinSxS\x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.30729.6161_x-ww_31a54e43\msvcr90.dll
(gdb) bt
#0  0x7855ae7a in memcpy ()
   from C:\WINDOWS\WinSxS\x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.30729.6161_x-ww_31a54e43\msvcr90.dll
#1  0x006eb768 in gdal!?IReadBlock@MEMRasterBand@@UAE?AW4CPLErr@@HHPAX@Z ()
   from c:\pgsql\bin\gdal.dll
Backtrace stopped: Not enough registers or memory available to unwind further
(gdb)

comment:8 by pramsey, 12 years ago

The key to most Windows-specific problems seems to be dirty memory, in my experience thus far. For whatever reason, the odds that an address will be zeroed when you first read from it seem much higher on Linux/OSX. So check that all your variables have initializers, and that you don't assume malloc'ed space will be zeroed out.
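
As a hedged illustration of the advice above (a minimal sketch, not code from PostGIS; the helper name `alloc_pixels` is hypothetical), a buffer can be given a known value explicitly instead of relying on whatever `malloc` happens to return:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only: allocate a pixel buffer
 * and set every byte to a known value. malloc() makes no promise about
 * the initial contents, so the fill is done explicitly. */
static uint8_t *alloc_pixels(size_t count, uint8_t fill)
{
    uint8_t *buf = malloc(count);
    if (buf == NULL)
        return NULL;            /* allocation failed */
    memset(buf, fill, count);   /* never read uninitialized bytes */
    return buf;
}
```

When the wanted fill value is zero, `calloc(count, 1)` would do the same in one call. The caller is expected to `free` the returned buffer.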

comment:9 by robe, 12 years ago

On mine it says it's successful:

make -C test check
make[1]: Entering directory `/c/projects/PostGIS/trunk/raster/test'
make -C core check
make[2]: Entering directory `/c/projects/PostGIS/trunk/raster/test/core'
./testapi
Warning 6: PNG driver doesn't support data type Int16. Only eight bit (Byte) and sixteen bit (UInt16) bands supported. Defaulting to Byte
Checking empty and hasnoband functions...
Checking raster properties...
Raster starts with 0 bands
First point on convexhull ring is 0.5,0.5
Second point on convexhull ring is 256.5,1280.5
Third point on convexhull ring is 1280.5,1536.5
Fourth point on convexhull ring is 1024.5,256.5
Fifth point on convexhull ring is 0.5,0.5
Testing rt_raster_gdal_polygonize
Successfully tested rt_raster_gdal_polygonize
Testing 1BB band
Testing 2BB band
Testing 4BUI band
Testing 8BUI band
Testing 8BSI band
Testing 16BSI band
ERROR: rt_band_set_pixel: Coordinates out of range
Testing 16BUI band
ERROR: rt_band_set_pixel: Coordinates out of range
Testing 32BUI band
ERROR: rt_band_set_pixel: Coordinates out of range
Testing 32BSI band
ERROR: rt_band_set_pixel: Coordinates out of range
Testing 32BF band
Testing 64BF band
Testing band hasnodata flag
Testing rt_raster_from_band
Successfully tested rt_raster_from_band
Testing band stats
Successfully tested band stats
Testing rt_raster_replace_band
Successfully tested rt_raster_replace_band
Testing rt_band_reclass
Successfully tested rt_band_reclass
Testing rt_raster_to_gdal
Successfully tested rt_raster_to_gdal
Testing rt_raster_gdal_drivers
Successfully tested rt_raster_gdal_drivers
Testing rt_band_get_value_count
Successfully tested rt_band_get_value_count
Testing rt_raster_from_gdal_dataset
Successfully tested rt_raster_from_gdal_dataset
Testing rt_util_compute_skewed_extent
Successfully tested rt_util_compute_skewed_extent
Testing rt_raster_gdal_warp
Successfully tested rt_raster_gdal_warp
Testing rt_raster_gdal_rasterize
Successfully tested rt_raster_gdal_rasterize
Testing rt_raster_intersects
Successfully tested rt_raster_intersects
Testing rt_raster_same_alignment
Successfully tested rt_raster_same_alignment
Testing rt_raster_from_two_rasters
ERROR: rt_raster_from_two_rasters: The two rasters provided do not have the same alignment
ERROR: rt_raster_from_two_rasters: The two rasters provided do not have the same SRID
ERROR: rt_raster_from_two_rasters: The two rasters provided do not have the same alignment
Successfully tested rt_raster_from_two_rasters
Testing rt_raster_load_offline_band
Successfully tested rt_raster_load_offline_band
./testwkb
 in hexwkb len: 122
out hexwkb len: 122
 in hexwkb: 00000000003FF0000000000000400000000000000040080000000000004010000000000000401400000000000040180000000000000000000A00070008
out hexwkb: 0100000000000000000000F03F000000000000004000000000000008400000000000001040000000000000144000000000000018400A00000007000800
 in hexwkb len: 128
out hexwkb len: 128
 in hexwkb len: 138
out hexwkb len: 138
 in hexwkb len: 152
out hexwkb len: 152
 in hexwkb len: 152
out hexwkb len: 152
ext band path: /tmp/t.tif
ext band  num: 3
 in hexwkb len: 152
out hexwkb len: 152
SRID value -1 converted to the officially unknown SRID value 0
SRID value -1 converted to the officially unknown SRID value 0
 in hexwkb len: 284
out hexwkb len: 284
SRID value -1 converted to the officially unknown SRID value 0
 in hexwkb len: 284
out hexwkb len: 284
SRID value -1 converted to the officially unknown SRID value 0
 in hexwkb len: 284
out hexwkb len: 284
SRID value -1 converted to the officially unknown SRID value 0
 in hexwkb len: 284
out hexwkb len: 284
SRID value -1 converted to the officially unknown SRID value 0
 in hexwkb len: 284
out hexwkb len: 284
All tests successful !


comment:10 by robe, 12 years ago

I agree with Paul though. One reason I'm such a good tester is because windows crashes whenever the memory is dirty in any way and the combination of windows 32 app running on windows 7 64-bit seems to be even better at doing that. So I could consistently cause a crash on those dirty memory array bugs that would take others 10 cycles or more to produce.

comment:11 by robe, 12 years ago

Paul,

So are you saying the testapi test is crashing for you? If that is the case, it might be the interaction between VC++ and native mingw. In my case testapi would be a pure mingw GDAL test, since my GDAL is compiled under mingw and that test doesn't touch PostgreSQL. So I wouldn't see the interaction until my PostgreSQL test, which runs against a PostgreSQL VC++ build.

So it could be that something is not being closed properly at the point where it tries to output the error.

comment:12 by pramsey, 12 years ago

Yes, testapi crashes, and from the stack trace it looks to crash in the same place as the online test. I find it hard to believe your VC postgresql could have anything to do with it, so perhaps you could try your online test with a mingw postgresql and see if it works there to prove me wrong.

comment:13 by pramsey, 12 years ago

Bborie, you could pass the testapi.c through valgrind and look for uninitialized variables and other memory nastinesses.

comment:14 by robe, 12 years ago

Well I get those 3 regress failures still with the test_8_out (rt_asgdalraster) still giving a server crash notice.

However when I run this

SELECT CASE WHEN
 length(ST_AsGDALRaster(ST_AddBand(ST_MakeEmptyRaster(200, 
 200, 10, 10, 2, 2, 0, 0), 1, '8BSI',123, NULL),'PNG')) > 0 
 THEN 1 ELSE 0 END;
 
SELECT version() , postgis_full_version();

PostgreSQL 9.1.0 on i686-pc-mingw32, compiled by gcc.exe (GCC) 3.4.5 (mingw-vista special r3), 32-bit POSTGIS="2.0.0alpha7SVN" GEOS="3.3.3dev-CAPI-1.7.3" PROJ="Rel. 4.6.1, 21 August 2008" GDAL="GDAL 1.9.0, released 2011/12/29" LIBXML="2.7.8" USE_STATS

Under my mingw compiled postgresql it doesn't crash and gives 1. I'll double check on my VC++ build to make sure it still crashes and it consistently crashes. Then I'll recompile my mingw postgres with more debug things enabled if needed so I can troubleshoot the other crash.

comment:15 by robe, 12 years ago

Okay confirmed. mingw on mingw doesn't crash on Paul's sample query

mingw with VC++ compiled PostgreSQL consistently crashes on Paul's sample query.

I still haven't had a chance to troubleshoot the rgdal crash that seems to happen on both, so I guess that one might be a separate issue.

comment:16 by Bborie Park, 12 years ago

Owner: changed from pracine to Bborie Park
Status: new → assigned

Gah! No need to do additional testing. I know what's causing it. PostGIS Raster has the pixel type 8BSI. GDAL does not support 8BSI, so we use the GDAL pixel type GDT_Int16. When converting a raster to a GDAL MEM dataset, we use a pointer to the location of the pixel data. But there is a mismatch: GDAL expects a block of 16-bit signed integers while the data is 8-bit signed integers. Hence the memcpy crash pramsey was getting in gdb: GDAL was expecting the data block to be twice the size of what is in the raster.

So, that's the problem. It only affects the 8BSI pixel type as all other pixel-types have a clean one-to-one match. I hope to have a fix committed sometime today or in the worst case, tomorrow.
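
One way to avoid this kind of mismatch can be sketched as follows. This is an assumption-laden illustration, not the change that was actually committed: the helper name `widen_8bsi_to_int16` is hypothetical, and the idea is simply to copy the 8-bit signed pixels into a buffer whose element size matches the 16-bit type declared to the consumer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only: copy signed 8-bit pixels
 * into a newly allocated signed 16-bit buffer. The result holds `count`
 * elements of 2 bytes each, so a reader that expects Int16 data no
 * longer runs past the end of a 1-byte-per-pixel block.
 * Returns NULL on bad input or allocation failure; caller frees. */
static int16_t *widen_8bsi_to_int16(const int8_t *src, size_t count)
{
    if (src == NULL || count == 0)
        return NULL;

    int16_t *dst = malloc(count * sizeof *dst);
    if (dst == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];    /* integer conversion keeps the sign */

    return dst;
}
```

For a 200 x 200 band this means handing over 80000 bytes rather than the 40000 bytes stored in the raster, which is the factor of two described above.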

in reply to:  7 comment:17 by mcayland, 12 years ago

Replying to pramsey:

It does crash, but unfortunately the stack trace is just as useless (perhaps because I'm built with minimal debugging on my dependent libraries, or perhaps just because I'm not good looking enough)

Program received signal SIGSEGV, Segmentation fault.
0x7855ae7a in memcpy ()
   from C:\WINDOWS\WinSxS\x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.30729.6161_x-ww_31a54e43\msvcr90.dll
(gdb) bt
#0  0x7855ae7a in memcpy ()
   from C:\WINDOWS\WinSxS\x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.30729.6161_x-ww_31a54e43\msvcr90.dll
#1  0x006eb768 in gdal!?IReadBlock@MEMRasterBand@@UAE?AW4CPLErr@@HHPAX@Z ()
   from c:\pgsql\bin\gdal.dll
Backtrace stopped: Not enough registers or memory available to unwind further
(gdb)

Ah yes, you're probably getting caught by the DWARF2 vs. SJLJ exception handling fun. GCC uses DWARF2 while MSVC uses SJLJ - hence if you're mixing across the two, your stack traces will stop at the point where you switch.

I think personally a better idea based upon your email would be for OSGEO to host a set of pre-built Windows DLLs and library headers so that people can quickly grab a tarball/zip file to get involved with development. It should be fairly easy to build everything consistently on mingw, and then everything would "just work".

comment:18 by Bborie Park, 12 years ago

Can someone test r9313? I've fixed the issue regarding PT_8BSI → GDT_Int16 and am expecting that these regressions should no longer exist.

comment:19 by pramsey, 12 years ago

The core API test now passes. Running the online tests now.

comment:20 by pramsey, 12 years ago

Online tests now get past the GDAL failures; loader/Basic and loader/BasicCopy still fail in the raster tests… great work, Bborie!

comment:21 by pramsey, 12 years ago

Committed a fix to run_test to get around the Basic/BasicCopy failures, now running online tests again… can we get past raster?…

comment:22 by pramsey, 12 years ago

Resolution: fixed
Status: assigned → closed

Windows regresses all the way to completion. We have reached the promised land.

comment:23 by Bborie Park, 12 years ago

Thanks Paul for all your work. Time for 64-bit Windows?
