Opened 2 days ago

Closed 27 hours ago

#5798 closed defect (invalid)

Out of memory: Killed process

Reported by: Lars Aksel Opsahl Owned by: strk
Priority: medium Milestone: PostGIS 3.5.1
Component: topology Version: 3.4.x
Keywords: Cc:

Description

I got a new memory error when using Postgis Toplogy now. Have never seen this error before. I am running 100 threads in parallel and some hours I get this error.

Oct 16 00:20:05 cpu11 kernel: [1644367.115588] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/system.slice/system-postgresql.slice/postgresql@16-main.service,task=postgres,pid=662537,uid=133
Oct 16 00:20:05 cpu11 kernel: [1644367.115610] Out of memory: Killed process 662537 (postgres) total-vm:8780856kB, anon-rss:7860kB, file-rss:444kB, shmem-rss:8041096kB, UID:133 pgtables:16928kB oom_score_adj:0
Oct 16 00:20:08 cpu11 systemd[1]: postgresql@16-main.service: Failed with result 'oom-kill'.
Oct 16 00:20:08 cpu11 systemd[1]: postgresql@16-main.service: Consumed 1w 14h 21min 45.086s CPU time.


This problem is present at both this two configs

POSTGIS="3.6.0dev 3.5.0-16-g4fcfd0fd6" [EXTENSION] PGSQL="160" GEOS="3.13.0beta2-CAPI-1.19.0" PROJ="9.3.0 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj DATABASE_PATH=/usr/local/share/proj/proj.db" (compiled against PROJ 9.13.0) LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" TOPOLOGY

POSTGIS="3.6.0dev 3.5.0-22-g224839693" [EXTENSION] PGSQL="160" GEOS="3.13.0beta2-CAPI-1.19.0" PROJ="9.3.0 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj DATABASE_PATH=/usr/local/share/proj/proj.db" (compiled against PROJ 9.13.0) LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" TOPOLOGY

I am now running a new test and I have also

sysctl -w vm.overcommit_memory=2

And I do see that memory is gradually eaten up

Wed Oct 16 05:44:46 CEST 2024
               total        used        free      shared  buff/cache   available
Mem:           125Gi        20Gi        93Gi       3.3Gi        12Gi       100Gi
Swap:          7.8Gi       263Mi       7.6Gi

Wed Oct 16 05:49:54 CEST 2024
               total        used        free      shared  buff/cache   available
Mem:           125Gi        22Gi        88Gi       3.7Gi        14Gi        98Gi
Swap:          7.8Gi       263Mi       7.6Gi
ubuntu, root@cpu11:~# 

Wed Oct 16 06:09:01 CEST 2024
               total        used        free      shared  buff/cache   available
Mem:           125Gi        22Gi        79Gi       5.7Gi        23Gi        96Gi
Swap:          7.8Gi       262Mi       7.6Gi

Wed Oct 16 06:25:38 CEST 2024
               total        used        free      shared  buff/cache   available
Mem:           125Gi        22Gi        77Gi       7.0Gi        25Gi        94Gi
Swap:          7.8Gi       262Mi       7.6Gi

Here is the complete log from syslog from the last log.

Oct 16 00:20:05 cpu11 kernel: [1644367.114868] [ 662535]   133 662535  2192613    56684   700416      458          -900 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114871] [ 662536]   133 662536    18641      435   126976      472             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114874] [ 662537]   133 662537  2195214  2012350 17334272      670             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114878] [ 662540]   133 662540  2192613    17212   294912      496             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114881] [ 662541]   133 662541  2194212     1338   221184      517             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114884] [ 662542]   133 662542  2193167      780   188416      508             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114887] [ 662556]   133 662556  2197872     9552   643072      764             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114890] [ 662559]   133 662559  2200440    15654   729088     3418             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114901] [ 662560]   133 662560  2239702   299801  3440640    11516             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114909] [ 665806]   133 665806  2464749   921966 19582976    88281             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114913] [ 665862]   133 665862  2419285   976456 19218432    79868             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114917] [ 665968]   133 665968  2393075   842596 19001344    87827             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114920] [ 666038]   133 666038  2387447   698377 18935808    89011             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114922] [ 666041]   133 666041  2417106  1150517 19197952    94609             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114926] [ 666043]   133 666043  2368198   516669 18759680    71657             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114929] [ 666058]   133 666058  2367152   752421 18800640    69757             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114932] [ 666131]   133 666131  2373326   865112 18849792    68493             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114935] [ 666137]   133 666137  2431131  1206774 19312640    98984             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114940] [ 666188]   133 666188  2363180   806837 18759680    56174             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114943] [ 666266]   133 666266  2369595   715551 18784256    64895             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114947] [ 666268]   133 666268  2398557   802444 19050496    83483             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114950] [ 666313]   133 666313  2387945   859805 18964480    77019             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114954] [ 666315]   133 666315  2411280   987194 19148800    66393             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114957] [ 666327]   133 666327  2389958   779322 18980864    73531             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114962] [ 666403]   133 666403  2380761   678603 18886656    61683             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114965] [ 666432]   133 666432  2366968   798506 18796544    57632             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114968] [ 666441]   133 666441  2515257  1314710 19980288    66781             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114972] [ 666445]   133 666445  2375433   903633 18866176    59106             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114975] [ 666500]   133 666500  2405779   997691 19107840    61937             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114979] [ 666508]   133 666508  2406856   925944 19111936    48066             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114982] [ 666639]   133 666639  2369685   725350 18817024    51126             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114986] [ 666668]   133 666668  2398864   830083 19046400    31860             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114989] [ 666670]   133 666670  2349296   755876 18620416    30572             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114992] [ 666691]   133 666691  2393287   892246 19005440    45076             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114995] [ 666735]   133 666735  2401319   935289 19070976    26484             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.114999] [ 666752]   133 666752  2380063   544926 18857984    24782             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115002] [ 666789]   133 666789  2418430  1093497 19210240    32148             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115005] [ 666793]   133 666793  2339296   418434 18513920    28633             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115009] [ 666816]   133 666816  2376001   639589 18812928    14548             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115011] [ 666823]   133 666823  2404492  1045778 19099648    10892             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115015] [ 666868]   133 666868  2367088   875554 18796544     1776             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115018] [ 666871]   133 666871  2379566   825016 18894848     2265             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115021] [ 666883]   133 666883  2369865   798628 18821120     1866             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115025] [ 666898]   133 666898  2422905   943067 19243008     2096             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115028] [ 666900]   133 666900  2369322   725249 18812928     3776             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115031] [ 666911]   133 666911  2359212   637610 18722816     2827             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115035] [ 666967]   133 666967  2397861   896821 19046400     2772             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115038] [ 666968]   133 666968  2475851  1257159 19673088     3334             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115041] [ 666972]   133 666972  2477059   968797 19660800     2909             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115044] [ 666973]   133 666973  2442818  1115803 19402752     3362             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115047] [ 667038]   133 667038  2365807   658911 18747392     2799             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115051] [ 667046]   133 667046  2399926  1002120 19062784     2295             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115054] [ 667055]   133 667055  2371241   821674 18829312     2804             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115058] [ 667059]   133 667059  2379741   905288 18898944     2016             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115061] [ 667063]   133 667063  2464180  1204996 19578880     1966             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115065] [ 667078]   133 667078  2372813   796936 18837504     2490             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115068] [ 667080]   133 667080  2388087  1048580 18964480     1975             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115071] [ 667095]   133 667095  2400704   752384 19054592     1737             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115074] [ 667149]     0 667149   110381    13833   286720      254             0 fwupd
Oct 16 00:20:05 cpu11 kernel: [1644367.115077] [ 667170]   133 667170  2368144   501302 18743296     2846             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115079] [ 667280]   133 667280  2551068  1348511 20279296     2051             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115083] [ 667286]   133 667286  2413127  1130892 19169280     1977             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115086] [ 667316]   133 667316  2453458  1057767 19492864     2179             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115090] [ 667334]   133 667334  2360377   420627 18489344     1739             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115095] [ 667347]   133 667347  2467566  1213654 19607552     1917             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115099] [ 667360]   133 667360  2445012   891742 19415040     2480             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115102] [ 667367]   133 667367  2446838  1065619 19439616     3358             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115105] [ 667384]   133 667384  2477083  1239232 19681280     1370             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115108] [ 667397]   133 667397  2327730   489946 18427904     1855             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115111] [ 667537]   133 667537  2370330   792902 18825216     1271             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115114] [ 667550]   133 667550  2368018   569864 18792448     1862             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115117] [ 667569]   133 667569  2456089  1028092 19513344     1488             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115120] [ 667583]   133 667583  2376472   689012 18874368     2114             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115124] [ 667584]   133 667584  2403579   894646 19091456     2440             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115126] [ 667620]   133 667620  2361015   782119 18747392     1675             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115129] [ 667672]   133 667672  2326497   451655 18362368     1691             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115133] [ 667697]   133 667697  2459488   893628 19542016     1725             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115137] [ 667719]   133 667719  2393571   489942 18739200     1631             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115140] [ 667739]   133 667739  2370564   723532 18821120      719             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115143] [ 667740]   133 667740  2344873   385926 18362368      925             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115146] [ 667743]   133 667743  2395908   945641 19030016     1567             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115149] [ 667744]   133 667744  2382390   765106 18915328      975             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115152] [ 667747]   133 667747  2421561  1040185 19234816     1296             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115155] [ 667748]   133 667748  2338907   629757 18534400      680             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115158] [ 667750]   133 667750  2355418   658372 18698240     1056             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115161] [ 667783]   133 667783  2411056  1030101 19148800     1267             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115164] [ 667797]   133 667797  2394930   989270 19017728      973             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115168] [ 667864]   133 667864  2420866   881757 19226624     1514             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115171] [ 667868]   133 667868  2416079  1076182 19189760     1134             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115175] [ 667874]   133 667874  2449475  1084779 19460096     1303             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115177] [ 667875]   133 667875  2386415   707196 18952192      832             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115181] [ 667884]   133 667884  2333831   592638 18505728      650             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115184] [ 667897]   133 667897  2403892   918430 19087360     1032             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115187] [ 667898]   133 667898  2410483   868234 19148800     1168             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115190] [ 667940]   133 667940  2377870   963160 18882560     1010             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115194] [ 667945]   133 667945  2361566   475192 18591744     1015             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115198] [ 667959]   133 667959  2367100   598089 18784256      672             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115201] [ 667993]   133 667993  2338406   536640 18513920     1046             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115204] [ 668000]   133 668000  2356366   955157 18710528      825             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115207] [ 668005]   133 668005  2478921   958788 19685376     1665             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115209] [ 668060]   133 668060  2463214  1083155 19566592      717             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115213] [ 668087]   133 668087  2415386  1080251 19181568      708             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115216] [ 668093]   133 668093  2404299   890056 19091456      645             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115219] [ 668105]   133 668105  2463629  1127000 19574784      541             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115221] [ 668108]   133 668108  2477148  1145401 19681280      875             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115224] [ 668112]   133 668112  2349226   593779 18653184     1652             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115226] [ 668128]   133 668128  2430080   977192 19296256      708             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115229] [ 668130]   133 668130  2383561   895649 18931712      912             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115233] [ 668131]   133 668131  2331780   454451 18325504      454             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115236] [ 668132]   133 668132  2364671   817158 18780160      630             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115238] [ 668135]   133 668135  2328665   726274 18481152      613             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115241] [ 668170]   133 668170  2387278   945936 18960384      557             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115244] [ 668171]   133 668171  2359917   792388 18739200      782             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115246] [ 668173]   133 668173  2352364   555048 18677760      979             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115249] [ 668174]   133 668174  2408385   957073 19128320      587             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115252] [ 668176]   133 668176  2313152   407656 18108416      687             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115255] [ 668220]   133 668220  2359000   748494 18735104      657             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115258] [ 668226]   133 668226  2311832   434443 18194432      788             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115260] [ 668227]   133 668227  2351460   588988 18665472      839             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115263] [ 668231]   133 668231  2355236   624002 18694144     1232             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115267] [ 668236]   133 668236  2353199   544063 18640896      580             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115271] [ 668248]   133 668248  2335155   704638 18542592      522             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115274] [ 668283]   133 668283  2395867   968346 19021824      526             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115277] [ 668316]   133 668316  2329575   439859 18276352      523             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115279] [ 668373]   133 668373  2451333   848397 19468288     1951             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115282] [ 668392]   133 668392  2307054   431530 18128896      696             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115285] [ 668408]   133 668408  2348848   512934 18612224      442             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115288] [ 668441]   133 668441  2402077   852670 19079168     1039             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115298] [ 668442]   133 668442  2355303   602347 18690048      723             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115301] [ 668448]   133 668448  2286300   389037 18014208      568             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115304] [ 668451]   133 668451  2391450   804620 18989056      688             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115306] [ 668456]   133 668456  2311822   558123 18354176      845             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115309] [ 668457]   133 668457  2375696   734785 18857984      505             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115312] [ 668458]   133 668458  2421432   858280 19234816      713             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115315] [ 668496]   133 668496  2391657   774375 18993152      582             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115319] [ 668499]   133 668499  2362119   761190 18759680      804             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115322] [ 668502]   133 668502  2355130   627657 18694144      561             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115324] [ 668505]   133 668505  2343225   521105 18579456      571             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115327] [ 668506]   133 668506  2363628   557165 18743296      564             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115330] [ 668507]   133 668507  2382254   717356 18915328      441             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115333] [ 668508]   133 668508  2321866   398990 18108416      667             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115339] [ 668526]   133 668526  2403978   981960 19083264      796             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115341] [ 668528]   133 668528  2379369   855111 18894848      641             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115344] [ 668531]   133 668531  2418254   829681 19206144      595             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115347] [ 668532]   133 668532  2330037   424382 18362368      643             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115349] [ 668535]   133 668535  2320234   389854 18034688      497             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115352] [ 668599]   133 668599  2457871   901370 19513344      763             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115355] [ 668611]   133 668611  2398837   935621 19050496      562             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115358] [ 668633]   133 668633  2294999   475484 18210816      471             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115361] [ 668634]   133 668634  2394145   885622 19013632      671             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115364] [ 668640]   133 668640  2342641   479414 18522112      658             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115367] [ 668641]   133 668641  2351273   528891 18649088      805             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115369] [ 668774]   133 668774  2494484   954076 19816448     2656             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115372] [ 668814]   133 668814  2443278   803108 19402752      598             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115374] [ 668815]   133 668815  2796063  1140647 22216704      658             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115378] [ 668823]   133 668823  2336605   325163 17317888      526             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115380] [ 668911]   133 668911  2425876   945413 19255296      479             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115383] [ 668933]   133 668933  2335924   396079 18313216     1122             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115385] [ 668935]   133 668935  2362945   357814 17186816      571             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115388] [ 669018]   133 669018  2360750   413156 18292736      797             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115392] [ 669043]   133 669043  2350409   381002 18055168      524             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115395] [ 669044]   133 669044  2296811   236265 16510976      960             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115398] [ 669048]   133 669048  2300770   206764 14393344      534             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115401] [ 669077]   133 669077  2327264   406585 18100224      557             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115403] [ 669094]   133 669094  2506606   762210 19910656      825             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115406] [ 669138]   133 669138  2302055   187977 12500992      576             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115409] [ 669210]   133 669210  2331634   629975 18518016      517             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115411] [ 669220]   133 669220  2375192   846694 18862080      625             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115414] [ 669252]   133 669252  2304722   404707 18145280      770             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115417] [ 669306]   133 669306  2483176   992873 19709952      482             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115420] [ 669331]   133 669331  2482967   940090 19726336      619             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115424] [ 669418]   133 669418  2381810   796735 18915328      717             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115428] [ 669459]   133 669459  2354454   497049 18616320     3921             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115432] [ 669462]   133 669462  2326153   385541 18182144      607             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115435] [ 669470]   133 669470  2304311   324584 17547264      715             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115438] [ 669476]   133 669476  2394955   789008 19021824      608             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115441] [ 669478]   133 669478  2380706   946924 18907136      711             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115444] [ 669483]   133 669483  2347195   534151 18571264      519             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115446] [ 669484]   133 669484  2348300   346606 17346560     1961             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115449] [ 669485]   133 669485  2334832   326072 17059840     1094             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115452] [ 669498]   133 669498  2400951   728343 19070976     4549             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115456] [ 669499]   133 669499  2350499   618190 18657280     1868             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115459] [ 669511]   133 669511  2346917   395089 18251776     3906             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115461] [ 669514]   133 669514  2303504   377795 18022400      920             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115464] [ 669515]   133 669515  2287327   290918 17166336     1788             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115467] [ 669573]   133 669573  2366551   376293 17920000     2567             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115470] [ 669574]   133 669574  2293563   162219 10108928      762             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115473] [ 669575]   133 669575  2356168   396320 18259968     6533             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115476] [ 669581]   133 669581  2313884   259552 15249408     4212             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115480] [ 669603]   133 669603  2400208   618369 19062784     2343             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115483] [ 669604]   133 669604  2471911   928412 19615744     3904             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115486] [ 669611]   133 669611  2321938   313233 16855040      814             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115489] [ 669613]   133 669613  2300645   282589 16547840     1006             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115492] [ 669615]   133 669615  2268683   128282  8761344     2803             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115495] [ 669627]   133 669627  2485833   784031 19746816     1003             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115498] [ 669628]   133 669628  2335783   579066 18530304     1101             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115501] [ 669629]   133 669629  2314182   417982 18366464     3351             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115504] [ 669636]   133 669636  2385193   527259 18849792     1592             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115508] [ 669637]   133 669637  2261309   223998 15880192      885             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115511] [ 669694]   133 669694  2436912   869136 19353600     4554             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115515] [ 669718]   133 669718  2298663   376421 18034688      827             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115518] [ 669722]   133 669722  2431256   805070 19304448     1672             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115522] [ 669728]   133 669728  2486767   906554 19759104     1606             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115525] [ 669729]   133 669729  2460698   826427 19550208     1001             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115528] [ 669739]   133 669739  2335787   402344 18288640     2233             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115531] [ 669748]   133 669748  2244227    78375  3833856     1462             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115534] [ 669750]   133 669750  2320027   264183 14766080      853             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115537] [ 669752]   133 669752  2241738    73108  3022848      422             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115540] [ 669759]   133 669759  2451182  1015872 19468288      581             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115543] [ 669767]   133 669767  2274792   224547 15638528      488             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115554] [ 672091]     0 672091    19064     1376   126976        0             0 cupsd
Oct 16 00:20:05 cpu11 kernel: [1644367.115557] [ 672096]     0 672096    43015     1073   106496        0             0 cups-browsed
Oct 16 00:20:05 cpu11 kernel: [1644367.115568] [ 673101]   112 673101    10402      641    69632        0             0 pickup
Oct 16 00:20:05 cpu11 kernel: [1644367.115576] [ 673160]   133 673160  2458726   741431 16121856      426             0 postgres
Oct 16 00:20:05 cpu11 kernel: [1644367.115586] [ 673209]     0 673209    42146     1254    86016      252             0 (lcluster)
Oct 16 00:20:05 cpu11 kernel: [1644367.115588] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/system.slice/system-postgresql.slice/postgresql@16-main.service,task=postgres,pid=662537,uid=133
Oct 16 00:20:05 cpu11 kernel: [1644367.115610] Out of memory: Killed process 662537 (postgres) total-vm:8780856kB, anon-rss:7860kB, file-rss:444kB, shmem-rss:8041096kB, UID:133 pgtables:16928kB oom_score_adj:0
Oct 16 00:20:08 cpu11 systemd[1]: postgresql@16-main.service: Failed with result 'oom-kill'.
Oct 16 00:20:08 cpu11 systemd[1]: postgresql@16-main.service: Consumed 1w 14h 21min 45.086s CPU time.

Change History (11)

comment:1 by Lars Aksel Opsahl, 2 days ago

I have now set vm.overcommit_memory=2 and run test with only 100 workes and it still seems like free memory is steadly falling.

In this run we create around 1300 topologies with more 65. million edges and more than 3.5 million faces generted from 7 different simple feature input tables. Each of this 1300 topologies are cleaned up by removing faces below 10 m2 . We then reduce the number edges to around 5 millon edges and faces down to below 1 million faces, before it’s moved into single a topology.

When started free memory was around 100 GB. We are now done with 1250 of the temp toplogies and free memory is now

Wed Oct 16 09:35:28 CEST 2024
               total        used        free      shared  buff/cache   available
Mem:           125Gi        57Gi        30Gi       8.3Gi        37Gi        58Gi
Swap:          7.8Gi       259Mi       7.6Gi

comment:2 by Lars Aksel Opsahl, 2 days ago

We started at

2024-10-16 06:48:55.150 CEST >LOG:  starting PostgreSQL 16.4 (Ubuntu 16.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

It failed around 4 hours later, setting overcommit to 2 and reduce threads did not help.

NOTICE:  Start create with transfer_method_from_temp_2_final_topology 3 for has_edges_temp_table_name tmp_dyrkbarjord_10_c3_037.edge_data_tmp_1224 at 2024-10-16 09:40:19.671996+02 used_time: 0.448377 
NOTICE:  Done create with transfer_method_from_temp_2_final_topology 3 for has_edges_temp_table_name tmp_dyrkbarjord_10_c3_037.edge_data_tmp_1224 at 2024-10-16 09:40:20.179359+02 used_time: 0.506995 
NOTICE:  Done adding lines 1213 (edges 18327)  of 1319, and done ANALYZE for tmp_dyrkbarjord_10_c3_037_1174 .
NOTICE:  start to map face 3 for has_edges_temp_table_name tmp_dyrkbarjord_10_c3_037.edge_data_tmp_1224 at 2024-10-16 09:40:20.807004+02 used_time: 0.506995 
TopMemoryContext: 282504 total in 8 blocks; 10184 free (21 chunks); 272320 used
  TopTransactionContext: 8192 total in 1 blocks; 6696 free (1 chunks); 1496 used
  Attopt cache: 16384 total in 2 blocks; 6672 free (3 chunks); 9712 used
  TableSpace cache: 8192 total in 1 blocks; 2128 free (0 chunks); 6064 used
  Type information cache: 24368 total in 2 blocks; 2648 free (0 chunks); 21720 used
  TOAST to main relid map: 262144 total in 6 blocks; 38864 free (23 chunks); 223280 used
  Autovacuum worker: 131072 total in 5 blocks; 56328 free (17 chunks); 74744 used
    Autovacuum Portal: 8192 total in 1 blocks; 7744 free (0 chunks); 448 used
      Vacuum: 8192 total in 1 blocks; 7784 free (0 chunks); 408 used
  Operator class cache: 8192 total in 1 blocks; 592 free (0 chunks); 7600 used
  smgr relation table: 32768 total in 3 blocks; 16880 free (8 chunks); 15888 used
  PgStat Pending: 24576 total in 6 blocks; 4048 free (13 chunks); 20528 used
  TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used
  Portal hash: 8192 total in 1 blocks; 592 free (0 chunks); 7600 used
  TopPortalContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
  Relcache by OID: 16384 total in 2 blocks; 3584 free (2 chunks); 12800 used
  CacheMemoryContext: 1048576 total in 8 blocks; 585048 free (94 chunks); 463528 used
    index info: 2048 total in 2 blocks; 616 free (1 chunks); 1432 used: pg_attrdef_adrelid_adnum_index
    index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_namespace_oid_index
    index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_index_indrelid_index
    index info: 2048 total in 2 blocks; 544 free (1 chunks); 1504 used: pg_toast_2619_index
    index info: 3072 total in 2 blocks; 912 free (1 chunks); 2160 used: pg_statistic_relid_att_inh_index
    index info: 3072 total in 2 blocks; 1088 free (1 chunks); 1984 used: pg_amop_opr_fam_index
    index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_tablespace_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_statistic_ext_relid_index
    index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_operator_oid_index
    index info: 3072 total in 2 blocks; 808 free (1 chunks); 2264 used: pg_amop_fam_strat_index
    index info: 2048 total in 2 blocks; 544 free (1 chunks); 1504 used: pg_cast_source_target_index
    index info: 3072 total in 2 blocks; 1088 free (1 chunks); 1984 used: pg_opclass_am_name_nsp_index
    index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_type_oid_index
    index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_index_indexrelid_index
    index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_opclass_oid_index
    index info: 2048 total in 2 blocks; 544 free (2 chunks); 1504 used: pg_trigger_tgrelid_tgname_index
    index info: 2048 total in 2 blocks; 616 free (3 chunks); 1432 used: pg_rewrite_rel_rulename_index
    index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_class_oid_index
    index info: 2048 total in 2 blocks; 544 free (2 chunks); 1504 used: pg_attribute_relid_attnum_index
    index info: 3072 total in 2 blocks; 808 free (3 chunks); 2264 used: pg_amproc_fam_proc_index
    index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_authid_oid_index
    index info: 3072 total in 2 blocks; 1200 free (3 chunks); 1872 used: pg_auth_members_member_role_index
    index info: 3072 total in 2 blocks; 1232 free (3 chunks); 1840 used: pg_shseclabel_object_index
    index info: 1024 total in 1 blocks; 72 free (0 chunks); 952 used: pg_database_datname_index
    index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_database_oid_index
    index info: 1024 total in 1 blocks; 72 free (0 chunks); 952 used: pg_authid_rolname_index
  PgStat Shared Ref Hash: 7216 total in 2 blocks; 688 free (0 chunks); 6528 used
  PgStat Shared Ref: 16384 total in 5 blocks; 4296 free (5 chunks); 12088 used
  WAL record construction: 50200 total in 2 blocks; 6376 free (0 chunks); 43824 used
  PrivateRefCount: 8192 total in 1 blocks; 2648 free (0 chunks); 5544 used
  MdSmgr: 8192 total in 1 blocks; 7200 free (17 chunks); 992 used
  LOCALLOCK hash: 8192 total in 1 blocks; 592 free (0 chunks); 7600 used
  GUCMemoryContext: 24576 total in 2 blocks; 7952 free (9 chunks); 16624 used
    GUC hash table: 32768 total in 3 blocks; 12704 free (5 chunks); 20064 used
  Timezones: 104112 total in 2 blocks; 2648 free (0 chunks); 101464 used
  Postmaster: 16384 total in 2 blocks; 13712 free (15 chunks); 2672 used
    ident parser context: 1024 total in 1 blocks; 760 free (0 chunks); 264 used
    hba parser context: 50176 total in 9 blocks; 7824 free (12 chunks); 42352 used
  ErrorContext: 8192 total in 1 blocks; 7928 free (4 chunks); 264 used
Grand total: 2321840 bytes in 131 blocks; 885360 free (272 chunks); 1436480 used
< , , , 2024-10-16 09:40:22.693 CEST >ERROR:  out of memory
< , , , 2024-10-16 09:40:22.693 CEST >DETAIL:  Failed on request of size 1073741822 in memory context "TopTransactionContext".
< , , , 2024-10-16 09:40:22.693 CEST >CONTEXT:  automatic vacuum of table "rog_02.pg_toast.pg_toast_2619"

comment:3 by strk, 2 days ago

As a temporary workaround reconnecting should alleviate the problem

in reply to:  3 comment:4 by Lars Aksel Opsahl, 2 days ago

Replying to strk:

As a temporary workaround reconnecting should alleviate the problem

This 1300 jobs are run by 100 workers and every worker connect to the database does it's job and disconnects and then connects to the database again and starts the works with a new topology.

Seems like I have to stop and start postgis to free the memory.

comment:5 by strk, 2 days ago

Are you sure disconnecting all backends isn't enough ?

comment:6 by Lars Aksel Opsahl, 2 days ago

Yes I use https://gitlab.com/nibioopensource/postgres_execute_parallel and this kind of code I have using for many years without any memory problems. This memory problems seems to quite new now.

comment:7 by strk, 2 days ago

If I understand the message correctly PostgreSQL autovacuum process is unable to allocate new memory for the TopTransactionContext memory pool. This isn't necessarely due to a leak, it could also "just" be higher memory consumption. To find a leak you should run a single process and see if the memory of that process keeps raising.

in reply to:  7 comment:9 by Lars Aksel Opsahl, 2 days ago

Replying to strk:

If I understand the message correctly PostgreSQL autovacuum process is unable to allocate new memory for the TopTransactionContext memory pool. This isn't necessarely due to a leak, it could also "just" be higher memory consumption. To find a leak you should run a single process and see if the memory of that process keeps raising.

This has not been an problem before.

sysctl -w vm.overcommit_memory=0 vm.overcommit_memory = 0

I now run a test with 10 threads in parallel .

And started out with

Wed Oct 16 12:08:54 CEST 2024
               total        used        free      shared  buff/cache   available
Mem:           125Gi       4.1Gi       100Gi       3.3Gi        20Gi       117Gi
Swap:          7.8Gi       258Mi       7.6Gi

comment:10 by Lars Aksel Opsahl, 27 hours ago

I tested with ST_RemEdgeNewFace instead off ST_RemEdgeModFace and it was the same behavior. One thing notiched was that did see a lot off LWLock for pg_stat_statements, so I removed pg_stat_statements, but the memory problem continued.

I went back and checked some old scripts and am I not sure that I haves tested wth 200 paralell threads on this server before.

With 200 threds in parallel I run the 1300 jobs with memory usage around 100 GB and I do finish most the jobs yes, but not all before running out off memory.

When running with 60 threads in parallel the memory usage is around 60 GB for the same amount off jobs, but all jobs are done. When we check the logs we find this :

Some jobs takes a long time (almost 3 hours) and needs a lot off memory to remove edges. Done clean small polygons for face_edges_to_remove_list size 7196 for face_table_name tmp_dyrkbarjord_10_c3_037_689.face at 2024-10-17 00:00:39.899873+02 used_time: 8681.734

While the job below remove twice as many faces in about 100 seconds. Done clean small polygons for face_edges_to_remove_list size 16502 for face_table_name tmp_dyrkbarjord_10_c3_037_725.face at 2024-10-16 21:38:11.523127+02 used_time: 105.92985

But we are able move this 1309 topologies which involves around 5 million faces and more than 100 millions edges into a single topology with 900.000 thousand faces (There are still some errors in the result that I have to check out).

So I think you are right there are no clear memory leaks but some times topology needs a lot memory and time.

So I close this one.

comment:11 by Lars Aksel Opsahl, 27 hours ago

Resolution: invalid
Status: newclosed
Note: See TracTickets for help on using tickets.