Opened 9 years ago

Closed 8 years ago

#518 closed defect (worksforme)

Postgres backend terminated by exception 0xC0000005

Reported by: loldrup Owned by: pramsey
Priority: critical Milestone: PostGIS 1.4.3
Component: postgis Version: 1.4.X
Keywords: backend terminated exception Cc: lollaj


We are running a postgis production installation ("PostgreSQL 8.4.2, compiled by Visual C++ build 1400, 32-bit" / "POSTGIS="1.4.1" GEOS="3.2.0-CAPI-1.6.0" PROJ="Rel. 4.6.1, 21 August 2008" USE_STATS") on a window box, where the database dies a couple of times per week with the following exception: 2010-05-03 14:43:03 CEST 10032 LOG: server process (PID 2492) was terminated by exception 0xC0000005

I have finally been able reproduce the error using JMeter configured with 5 connections. The error is triggered by two selects using insersect with the same WKT polygon:

The 1. select:

SELECT AsText?(force_2d(Envelope(Extent("geom")))) FROM "schema"."mytable" WHERE "geom" && GeometryFromText?('POLYGON ((725024.0573763638 6188750.973059048, 725024.0573763638 6188753.700331775, 725026.7846490908 6188753.700331775, 725026.7846490908 6188750.973059048, 725024.0573763638 6188750.973059048))', 25832) AND intersects("geom", GeometryFromText?('POLYGON ((725024.0573763638 6188750.973059048, 725024.0573763638 6188753.700331775, 725026.7846490908 6188753.700331775, 725026.7846490908 6188750.973059048, 725024.0573763638 6188750.973059048))', 25832));

,and the second select:

SELECT "oid", "id", "geom" FROM "schema"."mytable" WHERE "geom" && GeometryFromText?('POLYGON ((725024.0573763638 6188750.973059048, 725024.0573763638 6188753.700331775, 725026.7846490908 6188753.700331775, 725026.7846490908 6188750.973059048, 725024.0573763638 6188750.973059048))', 25832) AND intersects("geometri", GeometryFromText?('POLYGON ((725024.0573763638 6188750.973059048, 725024.0573763638 6188753.700331775, 725026.7846490908 6188753.700331775, 725026.7846490908 6188750.973059048, 725024.0573763638 6188750.973059048))', 25832));

I have created a stack-trace, but my installation does not contain any symbols for postgis, but here it is:

Microsoft (R) Windows Debugger Version 6.12.0002.633 X86 Copyright (c) Microsoft Corporation. All rights reserved.

* wait with pending attach Symbol search path is: c:\postgres8.4\symbols;SRV*c:\localsymbols* Executable search path is: ModLoad?: 00400000 008b0000 C:\postgres8.4\bin\postgres.exe ModLoad?: 7c900000 7c9b2000 C:\WINDOWS\system32\ntdll.dll ModLoad?: 7c800000 7c8f6000 C:\WINDOWS\system32\kernel32.dll ModLoad?: 10000000 10034000 C:\postgres8.4\bin\SSLEAY32.dll ModLoad?: 002a0000 0039f000 C:\postgres8.4\bin\LIBEAY32.dll ModLoad?: 71ad0000 71ad9000 C:\WINDOWS\system32\WSOCK32.dll ModLoad?: 71ab0000 71ac7000 C:\WINDOWS\system32\WS2_32.dll ModLoad?: 77dd0000 77e6b000 C:\WINDOWS\system32\ADVAPI32.dll ModLoad?: 77e70000 77f02000 C:\WINDOWS\system32\RPCRT4.dll ModLoad?: 77fe0000 77ff1000 C:\WINDOWS\system32\Secur32.dll ModLoad?: 77c10000 77c68000 C:\WINDOWS\system32\msvcrt.dll ModLoad?: 71aa0000 71aa8000 C:\WINDOWS\system32\WS2HELP.dll ModLoad?: 77f10000 77f59000 C:\WINDOWS\system32\GDI32.dll ModLoad?: 7e410000 7e4a1000 C:\WINDOWS\system32\USER32.dll ModLoad?: 78130000 781cb000 C:\WINDOWS\WinSxS\x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.4053_x-ww_e6967989\MSVCR80.dll ModLoad?: 61cc0000 61cd3000 C:\postgres8.4\bin\libintl-8.dll ModLoad?: 66000000 660e8000 C:\postgres8.4\bin\libiconv-2.dll ModLoad?: 1c000000 1c09b000 C:\postgres8.4\bin\krb5_32.dll ModLoad?: 003b0000 003b7000 C:\postgres8.4\bin\comerr32.dll ModLoad?: 003c0000 003c8000 C:\postgres8.4\bin\k5sprt32.dll ModLoad?: 7c340000 7c396000 C:\postgres8.4\bin\MSVCR71.dll ModLoad?: 003d0000 003f1000 C:\postgres8.4\bin\gssapi32.dll ModLoad?: 00cb0000 00da1000 C:\postgres8.4\bin\libxml2.dll ModLoad?: 00db0000 00e89000 C:\postgres8.4\bin\iconv.dll ModLoad?: 00e90000 00ea3000 C:\postgres8.4\bin\zlib1.dll ModLoad?: 76f60000 76f8c000 C:\WINDOWS\system32\WLDAP32.dll ModLoad?: 76390000 763ad000 C:\WINDOWS\system32\IMM32.DLL ModLoad?: 71a50000 71a8f000 C:\WINDOWS\system32\mswsock.dll ModLoad?: 662b0000 66308000 C:\WINDOWS\system32\hnetcfg.dll ModLoad?: 71a90000 71a98000 C:\WINDOWS\System32\wshtcpip.dll ModLoad?: 6adc0000 6ae4f000 C:\postgres8.4\lib\postgis-1.4.dll ModLoad?: 70040000 700ad000 C:\postgres8.4\bin\libgeos_c-1.dll ModLoad?: 704c0000 70662000 C:\postgres8.4\bin\libgeos-3-2-0.dll ModLoad?: 040d0000 04109000 C:\postgres8.4\bin\libproj.dll (efc.e7c): Break instruction exception - code 80000003 (first chance) eax=7ffd5000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005 eip=7c90120e esp=0413ffcc ebp=0413fff4 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246 ntdllDbgBreakPoint: 7c90120e cc int 3 0:003> g (efc.d64): Access violation - code c0000005 (first chance) First chance exceptions are reported before any exception handling. This exception may be expected and handled. eax=0088ee74 ebx=03f7c238 ecx=00000001 edx=0000234c esi=03f8a000 edi=03f533c0 eip=6addae49 esp=00caf3e0 ebp=00caf488 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246 * ERROR: Symbol file could not be found. Defaulted to export symbols for C:\postgres8.4\lib\postgis-1.4.dll - postgis_1_4GetPrepGeomCache+0x149: 6addae49 f3a5 rep movs dword ptr es:[edi],dword ptr [esi] 0:000> ~*k

. 0 Id: efc.d64 Suspend: 1 Teb: 7ffdf000 Unfrozen ChildEBP RetAddr? WARNING: Stack unwind information not available. Following frames may be wrong. 00caf488 6add7722 postgis_1_4GetPrepGeomCache+0x149 00caf4f8 004dce39 postgis_1_4!intersects+0x102 00caf774 004dcea3 postgresExecMakeFunctionResult+0x459 [c:\pginstaller-repo\\src\backend\executor\execqual.c @ 1686] 00caf798 004dc108 postgresExecEvalFunc+0x33 [c:\pginstaller-repo\\src\backend\executor\execqual.c @ 2116] 00caf7bc 004dff9c postgresExecQual+0x38 [c:\pginstaller-repo\\src\backend\executor\execqual.c @ 4909] 00caf7e0 004e97f7 postgresExecScan+0xac [c:\pginstaller-repo\\src\backend\executor\execscan.c @ 131] 00caf7f0 004d84a9 postgresExecIndexScan+0x27 [c:\pginstaller-repo\\src\backend\executor\nodeindexscan.c @ 159] 00caf800 004ec2d7 postgresExecProcNode+0xa9 [c:\pginstaller-repo\\src\backend\executor\execprocnode.c @ 367] 00caf828 004d8529 postgresExecNestLoop+0x67 [c:\pginstaller-repo\\src\backend\executor\nodenestloop.c @ 120] 00caf838 004ec2d7 postgresExecProcNode+0x129 [c:\pginstaller-repo\\src\backend\executor\execprocnode.c @ 404] 00caf860 004d8529 postgresExecNestLoop+0x67 [c:\pginstaller-repo\\src\backend\executor\nodenestloop.c @ 120] 00caf870 004ec2d7 postgresExecProcNode+0x129 [c:\pginstaller-repo\\src\backend\executor\execprocnode.c @ 404] 00caf898 004d8529 postgresExecNestLoop+0x67 [c:\pginstaller-repo\\src\backend\executor\nodenestloop.c @ 120] 00caf8a8 004ec2d7 postgresExecProcNode+0x129 [c:\pginstaller-repo\\src\backend\executor\execprocnode.c @ 404] 00caf8d0 004d8529 postgresExecNestLoop+0x67 [c:\pginstaller-repo\\src\backend\executor\nodenestloop.c @ 120] 00caf8e0 004ec2d7 postgresExecProcNode+0x129 [c:\pginstaller-repo\\src\backend\executor\execprocnode.c @ 404] 00caf908 004d8529 postgresExecNestLoop+0x67 [c:\pginstaller-repo\\src\backend\executor\nodenestloop.c @ 120] 00caf918 004ec2d7 postgresExecProcNode+0x129 [c:\pginstaller-repo\\src\backend\executor\execprocnode.c @ 404] 00caf940 004d8529 postgresExecNestLoop+0x67 [c:\pginstaller-repo\\src\backend\executor\nodenestloop.c @ 120] 00caf950 004ec2d7 postgresExecProcNode+0x129 [c:\pginstaller-repo\\src\backend\executor\execprocnode.c @ 404] 00caf978 004d8529 postgresExecNestLoop+0x67 [c:\pginstaller-repo\\src\backend\executor\nodenestloop.c @ 120] 00caf988 004ec2d7 postgresExecProcNode+0x129 [c:\pginstaller-repo\\src\backend\executor\execprocnode.c @ 404] 00caf9b0 004d8529 postgresExecNestLoop+0x67 [c:\pginstaller-repo\\src\backend\executor\nodenestloop.c @ 120] 00caf9c0 004e4142 postgresExecProcNode+0x129 [c:\pginstaller-repo\\src\backend\executor\execprocnode.c @ 404] 00cafa04 004d8586 postgres!agg_retrieve_direct+0x62 [c:\pginstaller-repo\\src\backend\executor\nodeagg.c @ 884] 00cafa14 004d71e3 postgresExecProcNode+0x186 [c:\pginstaller-repo\\src\backend\executor\execprocnode.c @ 431] 00cafa70 004d763a postgresExecutePlan+0x93 [c:\pginstaller-repo\\src\backend\executor\execmain.c @ 1504] 00cafa9c 005c996a postgres!standard_ExecutorRun+0x7a [c:\pginstaller-repo\\src\backend\executor\execmain.c @ 314] 00cafac0 005c9f2f postgresPortalRunSelect+0x6a [c:\pginstaller-repo\\src\backend\tcop\pquery.c @ 954] 00cafb3c 005c5f92 postgresPortalRun+0x14f [c:\pginstaller-repo\\src\backend\tcop\pquery.c @ 782] 00cafbec 005c7a68 postgres!exec_execute_message+0x312 [c:\pginstaller-repo\\src\backend\tcop\postgres.c @ 1935] 00cafcd0 00595544 postgresPostgresMain+0xd38 [c:\pginstaller-repo\\src\backend\tcop\postgres.c @ 3671] 00cafd20 00597fa4 postgresBackendRun+0x204 [c:\pginstaller-repo\\src\backend\postmaster\postmaster.c @ 3450] 00caff60 00504167 postgresSubPostmasterMain+0x224 [c:\pginstaller-repo\\src\backend\postmaster\postmaster.c @ 3941] 00caff7c 006ba55d postgres!main+0x177 [c:\pginstaller-repo\\src\backend\main\main.c @ 165] 00caffc0 7c817077 postgres__tmainCRTStartup+0x10f [f:\sp\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 597] 00cafff0 00000000 kernel32BaseProcessStart+0x23

1 Id: efc.dac Suspend: 1 Teb: 7ffde000 Unfrozen

ChildEBP RetAddr? 0164fecc 7c90d3aa ntdllKiFastSystemCallRet 0164fed0 7c8314ae ntdllZwFsControlFile+0xc 0164ff14 0058955f kernel32ConnectNamedPipe+0x52 0164ffb4 7c80b729 postgres!pg_signal_thread+0x9f [c:\pginstaller-repo\\src\backend\port\win32\signal.c @ 275] 0164ffec 00000000 kernel32BaseThreadStart+0x37

2 Id: efc.ed8 Suspend: 1 Teb: 7ffdd000 Unfrozen

ChildEBP RetAddr? 03ecff2c 7c90df5a ntdllKiFastSystemCallRet 03ecff30 7c8025db ntdllZwWaitForSingleObject+0xc 03ecff94 0058ab6b kernel32WaitForSingleObjectEx+0xa8 03ecffb4 7c80b729 postgres!pg_timer_thread+0x2b [c:\pginstaller-repo\\src\backend\port\win32\timer.c @ 51] 03ecffec 00000000 kernel32BaseThreadStart+0x37

Change History (25)

comment:1 Changed 9 years ago by loldrup

If needed, I can create a complete setup (database dump & JMeter .jmx file) so that the error can be reproduced in another installation.

comment:2 Changed 9 years ago by robe

There have been a couple of leak fixes in GEOS since 3.2.0 was released. Before you send a dump, can you try the GEOS 3.2.2 libraries.

They are packaged in with the PostGIS 1.5.2SVN experimental builds. You just need to copy the geos* from the bin folder of that to your PostgreSQL bin folder. Backup your geos* .dlls first before overwriting.

Then to verify just do a

SELECT postgis_full_version();

and it should register the new GEOS.

If you still have problems with the new GEOS then a dump would be helpful to troubleshoot.

Thanks, Regina

comment:3 Changed 9 years ago by loldrup

Ok robe.

I have upgraded GEOS only as described above, SELECT postgis_full_version() returns: "POSTGIS="1.4.1" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.6.1, 21 August 2008" USE_STATS"

But the problem still exists:

2010-05-04 23:19:12 CEST 3296 LOG: server process (PID 3480) was terminated by exception 0xC0000005 2010-05-04 23:19:12 CEST 3296 HINT: See C include file "ntstatus.h" for a description of the hexadecimal value. 2010-05-04 23:19:12 CEST 3296 LOG: terminating any other active server processes

If U supply me with symbol files, I can produce a more complete stack trace if needed.

comment:4 Changed 9 years ago by robe

On second glance looks like its breaking in the prepared geometry logic. Paul would probably have a better idea on whats wrong here. It might not be GEOS.

Paul any thoughts?

comment:5 Changed 9 years ago by pramsey

None. Having a full stack trace would help, but since this is well debugged and heavily used code (ie, this is not an obvious bug, but a subtle one involving very particular inputs) having an actual data table and the ability to reproduce the crash is going to be required to track down the issue.

comment:6 Changed 9 years ago by loldrup

I am trying to create a test setup simplifying the selects, at the data, but then the error cannot be reproduced - Ill be back.

comment:7 Changed 9 years ago by loldrup

I finally managed to produce a dumpfile that can be used to reproduce the error - I have given Paul references to it including a describtion how to reproduce the error.

The odd thing is that if I update or delete rows in the table that is part of the "trigger query" the error can not be reproduced !

I have upgraded to Postgres 8.4.2/PostGis 1.4.2, and the error is still there.

Next week I could try to reproduce the error on a Linux box.

comment:8 Changed 9 years ago by pramsey

Please do try on a linux box. I'm getting no replication here on my OSX box. I'm going to just manually comb that particular code area for the kinds of thing that generally screw up Windows, but I don't hold out a lot of hope. Also see if the problem can be reproduced turning your jmeter down to just one concurrent thread (it should, because those processes should be isolated).

comment:9 Changed 9 years ago by loldrup

Paul, I allready tried to test with only 1 thread in JMeter - the problem first starts at 5 threads or above !

Another shot was to perform all kind's of vacuum/reindex - again without any luck.

comment:10 Changed 9 years ago by pramsey

I wish someone smarter than I could look at this one... in this day and age separate processes shouldn't be able to stomp on one another. I guess there's the pgsql shared memory segment... anyone testing this should ensure their pgsql is compiled with --enable-cassert to make sure that memory operations in pgsql are being double-checked.

comment:11 Changed 9 years ago by loldrup

Did a "CLUSTER myschema.mytable USING idx_mytable;", and the error cannot be reproduced - just like updating or deleting rows....

comment:12 Changed 9 years ago by pramsey

Right, there seems to be a dependency not only on particular data, but having them passed into the function in a particular order, which then (maybe?) creates a condition where the data is written past the bounds of the memory area and, if there are enough other processes running can hit something else and cause it to become invalid, which then causes a failure. At least, that's the guess. If we could get you to reproduce the error on a pgsql that is compiled with --enable-cassert, we *might* see where the initial memory corruption occurs, instead of seeing the effect of it later on when the corrupt segment is then picked up by another process.

comment:13 Changed 9 years ago by loldrup

Paul, what do U exactly means with "on a pgsql"?

Anyways, I just restored the dbase, changed shared_buffers from 32MB to 64MB, ran the Jmeter test (that probably would die within 1000-2000 samples), and it ran for 50.000 samples untill I stopped it (my workstation becomes totaly unusable (m$)).

Are we seeing a "out of memory situation"?

comment:14 Changed 9 years ago by pramsey

Probably you can make it fail faster by lowering shared_buffers then, right?

Yes, it might be a leak.

I mean, building PostGIS against, and running it inside a PostgreSQL that has been build with --enable-cassert.

comment:15 Changed 9 years ago by loldrup

Agreed about the lowering the shared_buffers - statistical testing is so funny! - Could it be a mininum required size, or a leak?

Just restarted the JMeter, stoppet i again at 94715 samples without any problems still using the 64MB shared_buffes. Memory consumption of the 5 backend process's seems to stabilize at around 16MB (I newer learned the "win task manager" - anyone).

Maybe I could keep it running over night (maybe I need to bye some CO2 something..)...

Sorry, I do not have a build environment on the win platform - only on linux.

comment:16 Changed 9 years ago by pramsey

Should take my own advice, will re-run my OSX test with a much smaller shared_buffer and see if it forces a failure.

comment:17 Changed 9 years ago by loldrup

Running on the clean restored database, JMeter test has been running 2 hours - now at 374.000 samples, and still running - still using the 64MB shared_buffes setting. Memory consumption of the 5 backend's now at 61MB each - looks like a leak.

If it's a leak, why does changing table data sometimes remove the leak? - anyways in this ballgame anything is possible!

Ill just leave it running over night... it's getting late in Denmark...

comment:18 Changed 9 years ago by loldrup

The "64MB shared_buffes setting" test died at 07:00:53 local time(GMT+2), and the last I saw of the backend's was at 00:45:00 where memory usage was 148MB pr. backend after 902.000 JMeter samples.

Changed to 16MB shared_buffes setting, and the exact same test stops after only 500 samples.

comment:19 Changed 9 years ago by loldrup

Did a linux build of Postgres 8.4.3 including POSTGIS="1.4.2" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September 2009" USE_STATS, and compiled with --enable-cassert

Changed postgresql.conf with the following options: max_connections = 10 shared_buffers = 128kB client_min_messages = debug5 log_min_messages = debug5

Ran the test again, starting at 09:34 - runnung stable without problems (as expected on a linux build). At 13:34, and 1.889.000 samples the 5 backends was using between 109 and 123MB virtual memory each. At 16:20 JMeter came to a halt, probably another leak there.

I got a 3.9G log-file from postgres, but nothing special there - anyone with a neural network that can grep for "non repeated things" in a GB file?

I am not realy sure what can be concluded at this point apart, that there is some leak in both platforms, and a to low shared_buffers triggers a problem on win platform.

At 17:05 JMeter out controll, so I stopped it.

comment:20 Changed 9 years ago by mcayland

I seem to recall that it was possible to send a signal to a running PostgreSQL backend which would cause it dump out it's memory context information to the server log file (possibly SIGABORT?). If we could do this on your Linux test server, we can at least get a rough idea of *where* the memory leak is occurring.



comment:21 Changed 9 years ago by loldrup

I made a new test on the Linux platform: Disabled the prepared geometry cache by commenting out the "#define PREPARED_GEOM" in lwgeom_geos_prepared.h, but the JMeter test did not show anything new - still leaking memory.

My gues is that we are dealing with 2 problems, that could be related: a) A memory leak, and b) "Postgres backend terminated by exception 0xC0000005" on the windows platform.

I suggest that someone with a windows development/debug environment looks at the bug(s) - Paul has got the data, and JMeter setup.

Another try could be to disable the geometrycache on a windows build, maybe the exception would go away - is there anyone that could compile the relevant DLL's for me?

BTW: I will be on hollyday for the next 10 days.

comment:22 Changed 9 years ago by lollaj

Cc: lollaj added

I can reproduce this problem on the latest stable PostgreSQL 8.4.4 and PostGIS 1.5.0. I've tried with the GEOS 3.2.2 libraries as well without luck. I am running on Windows 7.

comment:23 Changed 9 years ago by lollaj

I'm also able to reproduce on Ubuntu 10.04 LTS:

PostgreSQL 8.4.4 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit

POSTGIS="1.4.0" GEOS="3.1.0-CAPI-1.5.0" PROJ="Rel. 4.7.1, 23 September 2009" USE_STATS

comment:24 Changed 8 years ago by strk

Milestone: PostGIS 1.4.3

comment:25 Changed 8 years ago by pramsey

Resolution: worksforme
Status: newclosed

We've seen no other reports and nobody else can recreate this.

Note: See TracTickets for help on using tickets.