Opened 5 weeks ago
Closed 3 days ago
#5798 closed defect (fixed)
Out of memory: Killed process
Reported by: | Lars Aksel Opsahl | Owned by: | strk |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 3.5.1 |
Component: | topology | Version: | 3.4.x |
Keywords: | Cc: |
Description
I got a new memory error when using Postgis Toplogy now. Have never seen this error before. I am running 100 threads in parallel and some hours I get this error.
Oct 16 00:20:05 cpu11 kernel: [1644367.115588] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/system.slice/system-postgresql.slice/postgresql@16-main.service,task=postgres,pid=662537,uid=133 Oct 16 00:20:05 cpu11 kernel: [1644367.115610] Out of memory: Killed process 662537 (postgres) total-vm:8780856kB, anon-rss:7860kB, file-rss:444kB, shmem-rss:8041096kB, UID:133 pgtables:16928kB oom_score_adj:0 Oct 16 00:20:08 cpu11 systemd[1]: postgresql@16-main.service: Failed with result 'oom-kill'. Oct 16 00:20:08 cpu11 systemd[1]: postgresql@16-main.service: Consumed 1w 14h 21min 45.086s CPU time.
This problem is present at both this two configs
POSTGIS="3.6.0dev 3.5.0-16-g4fcfd0fd6" [EXTENSION] PGSQL="160" GEOS="3.13.0beta2-CAPI-1.19.0" PROJ="9.3.0 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj DATABASE_PATH=/usr/local/share/proj/proj.db" (compiled against PROJ 9.13.0) LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" TOPOLOGY POSTGIS="3.6.0dev 3.5.0-22-g224839693" [EXTENSION] PGSQL="160" GEOS="3.13.0beta2-CAPI-1.19.0" PROJ="9.3.0 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj DATABASE_PATH=/usr/local/share/proj/proj.db" (compiled against PROJ 9.13.0) LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" TOPOLOGY
I am now running a new test and I have also
sysctl -w vm.overcommit_memory=2
And I do see that memory is gradually eaten up
Wed Oct 16 05:44:46 CEST 2024 total used free shared buff/cache available Mem: 125Gi 20Gi 93Gi 3.3Gi 12Gi 100Gi Swap: 7.8Gi 263Mi 7.6Gi Wed Oct 16 05:49:54 CEST 2024 total used free shared buff/cache available Mem: 125Gi 22Gi 88Gi 3.7Gi 14Gi 98Gi Swap: 7.8Gi 263Mi 7.6Gi ubuntu, root@cpu11:~# Wed Oct 16 06:09:01 CEST 2024 total used free shared buff/cache available Mem: 125Gi 22Gi 79Gi 5.7Gi 23Gi 96Gi Swap: 7.8Gi 262Mi 7.6Gi Wed Oct 16 06:25:38 CEST 2024 total used free shared buff/cache available Mem: 125Gi 22Gi 77Gi 7.0Gi 25Gi 94Gi Swap: 7.8Gi 262Mi 7.6Gi
Here is the complete log from syslog from the last log.
Oct 16 00:20:05 cpu11 kernel: [1644367.114868] [ 662535] 133 662535 2192613 56684 700416 458 -900 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114871] [ 662536] 133 662536 18641 435 126976 472 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114874] [ 662537] 133 662537 2195214 2012350 17334272 670 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114878] [ 662540] 133 662540 2192613 17212 294912 496 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114881] [ 662541] 133 662541 2194212 1338 221184 517 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114884] [ 662542] 133 662542 2193167 780 188416 508 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114887] [ 662556] 133 662556 2197872 9552 643072 764 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114890] [ 662559] 133 662559 2200440 15654 729088 3418 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114901] [ 662560] 133 662560 2239702 299801 3440640 11516 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114909] [ 665806] 133 665806 2464749 921966 19582976 88281 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114913] [ 665862] 133 665862 2419285 976456 19218432 79868 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114917] [ 665968] 133 665968 2393075 842596 19001344 87827 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114920] [ 666038] 133 666038 2387447 698377 18935808 89011 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114922] [ 666041] 133 666041 2417106 1150517 19197952 94609 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114926] [ 666043] 133 666043 2368198 516669 18759680 71657 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114929] [ 666058] 133 666058 2367152 752421 18800640 69757 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114932] [ 666131] 133 666131 2373326 865112 18849792 68493 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114935] [ 666137] 133 666137 2431131 1206774 19312640 98984 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114940] [ 666188] 133 666188 2363180 806837 18759680 56174 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114943] [ 666266] 133 666266 2369595 715551 18784256 64895 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114947] [ 666268] 133 666268 2398557 802444 19050496 83483 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114950] [ 666313] 133 666313 2387945 859805 18964480 77019 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114954] [ 666315] 133 666315 2411280 987194 19148800 66393 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114957] [ 666327] 133 666327 2389958 779322 18980864 73531 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114962] [ 666403] 133 666403 2380761 678603 18886656 61683 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114965] [ 666432] 133 666432 2366968 798506 18796544 57632 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114968] [ 666441] 133 666441 2515257 1314710 19980288 66781 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114972] [ 666445] 133 666445 2375433 903633 18866176 59106 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114975] [ 666500] 133 666500 2405779 997691 19107840 61937 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114979] [ 666508] 133 666508 2406856 925944 19111936 48066 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114982] [ 666639] 133 666639 2369685 725350 18817024 51126 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114986] [ 666668] 133 666668 2398864 830083 19046400 31860 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114989] [ 666670] 133 666670 2349296 755876 18620416 30572 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114992] [ 666691] 133 666691 2393287 892246 19005440 45076 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114995] [ 666735] 133 666735 2401319 935289 19070976 26484 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.114999] [ 666752] 133 666752 2380063 544926 18857984 24782 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115002] [ 666789] 133 666789 2418430 1093497 19210240 32148 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115005] [ 666793] 133 666793 2339296 418434 18513920 28633 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115009] [ 666816] 133 666816 2376001 639589 18812928 14548 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115011] [ 666823] 133 666823 2404492 1045778 19099648 10892 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115015] [ 666868] 133 666868 2367088 875554 18796544 1776 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115018] [ 666871] 133 666871 2379566 825016 18894848 2265 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115021] [ 666883] 133 666883 2369865 798628 18821120 1866 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115025] [ 666898] 133 666898 2422905 943067 19243008 2096 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115028] [ 666900] 133 666900 2369322 725249 18812928 3776 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115031] [ 666911] 133 666911 2359212 637610 18722816 2827 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115035] [ 666967] 133 666967 2397861 896821 19046400 2772 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115038] [ 666968] 133 666968 2475851 1257159 19673088 3334 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115041] [ 666972] 133 666972 2477059 968797 19660800 2909 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115044] [ 666973] 133 666973 2442818 1115803 19402752 3362 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115047] [ 667038] 133 667038 2365807 658911 18747392 2799 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115051] [ 667046] 133 667046 2399926 1002120 19062784 2295 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115054] [ 667055] 133 667055 2371241 821674 18829312 2804 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115058] [ 667059] 133 667059 2379741 905288 18898944 2016 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115061] [ 667063] 133 667063 2464180 1204996 19578880 1966 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115065] [ 667078] 133 667078 2372813 796936 18837504 2490 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115068] [ 667080] 133 667080 2388087 1048580 18964480 1975 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115071] [ 667095] 133 667095 2400704 752384 19054592 1737 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115074] [ 667149] 0 667149 110381 13833 286720 254 0 fwupd Oct 16 00:20:05 cpu11 kernel: [1644367.115077] [ 667170] 133 667170 2368144 501302 18743296 2846 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115079] [ 667280] 133 667280 2551068 1348511 20279296 2051 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115083] [ 667286] 133 667286 2413127 1130892 19169280 1977 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115086] [ 667316] 133 667316 2453458 1057767 19492864 2179 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115090] [ 667334] 133 667334 2360377 420627 18489344 1739 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115095] [ 667347] 133 667347 2467566 1213654 19607552 1917 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115099] [ 667360] 133 667360 2445012 891742 19415040 2480 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115102] [ 667367] 133 667367 2446838 1065619 19439616 3358 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115105] [ 667384] 133 667384 2477083 1239232 19681280 1370 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115108] [ 667397] 133 667397 2327730 489946 18427904 1855 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115111] [ 667537] 133 667537 2370330 792902 18825216 1271 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115114] [ 667550] 133 667550 2368018 569864 18792448 1862 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115117] [ 667569] 133 667569 2456089 1028092 19513344 1488 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115120] [ 667583] 133 667583 2376472 689012 18874368 2114 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115124] [ 667584] 133 667584 2403579 894646 19091456 2440 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115126] [ 667620] 133 667620 2361015 782119 18747392 1675 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115129] [ 667672] 133 667672 2326497 451655 18362368 1691 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115133] [ 667697] 133 667697 2459488 893628 19542016 1725 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115137] [ 667719] 133 667719 2393571 489942 18739200 1631 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115140] [ 667739] 133 667739 2370564 723532 18821120 719 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115143] [ 667740] 133 667740 2344873 385926 18362368 925 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115146] [ 667743] 133 667743 2395908 945641 19030016 1567 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115149] [ 667744] 133 667744 2382390 765106 18915328 975 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115152] [ 667747] 133 667747 2421561 1040185 19234816 1296 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115155] [ 667748] 133 667748 2338907 629757 18534400 680 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115158] [ 667750] 133 667750 2355418 658372 18698240 1056 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115161] [ 667783] 133 667783 2411056 1030101 19148800 1267 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115164] [ 667797] 133 667797 2394930 989270 19017728 973 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115168] [ 667864] 133 667864 2420866 881757 19226624 1514 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115171] [ 667868] 133 667868 2416079 1076182 19189760 1134 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115175] [ 667874] 133 667874 2449475 1084779 19460096 1303 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115177] [ 667875] 133 667875 2386415 707196 18952192 832 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115181] [ 667884] 133 667884 2333831 592638 18505728 650 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115184] [ 667897] 133 667897 2403892 918430 19087360 1032 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115187] [ 667898] 133 667898 2410483 868234 19148800 1168 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115190] [ 667940] 133 667940 2377870 963160 18882560 1010 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115194] [ 667945] 133 667945 2361566 475192 18591744 1015 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115198] [ 667959] 133 667959 2367100 598089 18784256 672 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115201] [ 667993] 133 667993 2338406 536640 18513920 1046 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115204] [ 668000] 133 668000 2356366 955157 18710528 825 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115207] [ 668005] 133 668005 2478921 958788 19685376 1665 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115209] [ 668060] 133 668060 2463214 1083155 19566592 717 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115213] [ 668087] 133 668087 2415386 1080251 19181568 708 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115216] [ 668093] 133 668093 2404299 890056 19091456 645 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115219] [ 668105] 133 668105 2463629 1127000 19574784 541 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115221] [ 668108] 133 668108 2477148 1145401 19681280 875 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115224] [ 668112] 133 668112 2349226 593779 18653184 1652 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115226] [ 668128] 133 668128 2430080 977192 19296256 708 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115229] [ 668130] 133 668130 2383561 895649 18931712 912 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115233] [ 668131] 133 668131 2331780 454451 18325504 454 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115236] [ 668132] 133 668132 2364671 817158 18780160 630 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115238] [ 668135] 133 668135 2328665 726274 18481152 613 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115241] [ 668170] 133 668170 2387278 945936 18960384 557 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115244] [ 668171] 133 668171 2359917 792388 18739200 782 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115246] [ 668173] 133 668173 2352364 555048 18677760 979 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115249] [ 668174] 133 668174 2408385 957073 19128320 587 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115252] [ 668176] 133 668176 2313152 407656 18108416 687 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115255] [ 668220] 133 668220 2359000 748494 18735104 657 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115258] [ 668226] 133 668226 2311832 434443 18194432 788 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115260] [ 668227] 133 668227 2351460 588988 18665472 839 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115263] [ 668231] 133 668231 2355236 624002 18694144 1232 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115267] [ 668236] 133 668236 2353199 544063 18640896 580 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115271] [ 668248] 133 668248 2335155 704638 18542592 522 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115274] [ 668283] 133 668283 2395867 968346 19021824 526 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115277] [ 668316] 133 668316 2329575 439859 18276352 523 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115279] [ 668373] 133 668373 2451333 848397 19468288 1951 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115282] [ 668392] 133 668392 2307054 431530 18128896 696 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115285] [ 668408] 133 668408 2348848 512934 18612224 442 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115288] [ 668441] 133 668441 2402077 852670 19079168 1039 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115298] [ 668442] 133 668442 2355303 602347 18690048 723 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115301] [ 668448] 133 668448 2286300 389037 18014208 568 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115304] [ 668451] 133 668451 2391450 804620 18989056 688 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115306] [ 668456] 133 668456 2311822 558123 18354176 845 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115309] [ 668457] 133 668457 2375696 734785 18857984 505 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115312] [ 668458] 133 668458 2421432 858280 19234816 713 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115315] [ 668496] 133 668496 2391657 774375 18993152 582 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115319] [ 668499] 133 668499 2362119 761190 18759680 804 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115322] [ 668502] 133 668502 2355130 627657 18694144 561 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115324] [ 668505] 133 668505 2343225 521105 18579456 571 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115327] [ 668506] 133 668506 2363628 557165 18743296 564 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115330] [ 668507] 133 668507 2382254 717356 18915328 441 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115333] [ 668508] 133 668508 2321866 398990 18108416 667 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115339] [ 668526] 133 668526 2403978 981960 19083264 796 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115341] [ 668528] 133 668528 2379369 855111 18894848 641 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115344] [ 668531] 133 668531 2418254 829681 19206144 595 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115347] [ 668532] 133 668532 2330037 424382 18362368 643 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115349] [ 668535] 133 668535 2320234 389854 18034688 497 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115352] [ 668599] 133 668599 2457871 901370 19513344 763 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115355] [ 668611] 133 668611 2398837 935621 19050496 562 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115358] [ 668633] 133 668633 2294999 475484 18210816 471 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115361] [ 668634] 133 668634 2394145 885622 19013632 671 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115364] [ 668640] 133 668640 2342641 479414 18522112 658 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115367] [ 668641] 133 668641 2351273 528891 18649088 805 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115369] [ 668774] 133 668774 2494484 954076 19816448 2656 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115372] [ 668814] 133 668814 2443278 803108 19402752 598 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115374] [ 668815] 133 668815 2796063 1140647 22216704 658 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115378] [ 668823] 133 668823 2336605 325163 17317888 526 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115380] [ 668911] 133 668911 2425876 945413 19255296 479 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115383] [ 668933] 133 668933 2335924 396079 18313216 1122 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115385] [ 668935] 133 668935 2362945 357814 17186816 571 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115388] [ 669018] 133 669018 2360750 413156 18292736 797 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115392] [ 669043] 133 669043 2350409 381002 18055168 524 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115395] [ 669044] 133 669044 2296811 236265 16510976 960 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115398] [ 669048] 133 669048 2300770 206764 14393344 534 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115401] [ 669077] 133 669077 2327264 406585 18100224 557 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115403] [ 669094] 133 669094 2506606 762210 19910656 825 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115406] [ 669138] 133 669138 2302055 187977 12500992 576 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115409] [ 669210] 133 669210 2331634 629975 18518016 517 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115411] [ 669220] 133 669220 2375192 846694 18862080 625 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115414] [ 669252] 133 669252 2304722 404707 18145280 770 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115417] [ 669306] 133 669306 2483176 992873 19709952 482 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115420] [ 669331] 133 669331 2482967 940090 19726336 619 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115424] [ 669418] 133 669418 2381810 796735 18915328 717 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115428] [ 669459] 133 669459 2354454 497049 18616320 3921 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115432] [ 669462] 133 669462 2326153 385541 18182144 607 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115435] [ 669470] 133 669470 2304311 324584 17547264 715 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115438] [ 669476] 133 669476 2394955 789008 19021824 608 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115441] [ 669478] 133 669478 2380706 946924 18907136 711 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115444] [ 669483] 133 669483 2347195 534151 18571264 519 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115446] [ 669484] 133 669484 2348300 346606 17346560 1961 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115449] [ 669485] 133 669485 2334832 326072 17059840 1094 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115452] [ 669498] 133 669498 2400951 728343 19070976 4549 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115456] [ 669499] 133 669499 2350499 618190 18657280 1868 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115459] [ 669511] 133 669511 2346917 395089 18251776 3906 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115461] [ 669514] 133 669514 2303504 377795 18022400 920 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115464] [ 669515] 133 669515 2287327 290918 17166336 1788 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115467] [ 669573] 133 669573 2366551 376293 17920000 2567 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115470] [ 669574] 133 669574 2293563 162219 10108928 762 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115473] [ 669575] 133 669575 2356168 396320 18259968 6533 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115476] [ 669581] 133 669581 2313884 259552 15249408 4212 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115480] [ 669603] 133 669603 2400208 618369 19062784 2343 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115483] [ 669604] 133 669604 2471911 928412 19615744 3904 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115486] [ 669611] 133 669611 2321938 313233 16855040 814 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115489] [ 669613] 133 669613 2300645 282589 16547840 1006 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115492] [ 669615] 133 669615 2268683 128282 8761344 2803 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115495] [ 669627] 133 669627 2485833 784031 19746816 1003 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115498] [ 669628] 133 669628 2335783 579066 18530304 1101 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115501] [ 669629] 133 669629 2314182 417982 18366464 3351 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115504] [ 669636] 133 669636 2385193 527259 18849792 1592 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115508] [ 669637] 133 669637 2261309 223998 15880192 885 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115511] [ 669694] 133 669694 2436912 869136 19353600 4554 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115515] [ 669718] 133 669718 2298663 376421 18034688 827 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115518] [ 669722] 133 669722 2431256 805070 19304448 1672 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115522] [ 669728] 133 669728 2486767 906554 19759104 1606 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115525] [ 669729] 133 669729 2460698 826427 19550208 1001 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115528] [ 669739] 133 669739 2335787 402344 18288640 2233 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115531] [ 669748] 133 669748 2244227 78375 3833856 1462 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115534] [ 669750] 133 669750 2320027 264183 14766080 853 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115537] [ 669752] 133 669752 2241738 73108 3022848 422 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115540] [ 669759] 133 669759 2451182 1015872 19468288 581 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115543] [ 669767] 133 669767 2274792 224547 15638528 488 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115554] [ 672091] 0 672091 19064 1376 126976 0 0 cupsd Oct 16 00:20:05 cpu11 kernel: [1644367.115557] [ 672096] 0 672096 43015 1073 106496 0 0 cups-browsed Oct 16 00:20:05 cpu11 kernel: [1644367.115568] [ 673101] 112 673101 10402 641 69632 0 0 pickup Oct 16 00:20:05 cpu11 kernel: [1644367.115576] [ 673160] 133 673160 2458726 741431 16121856 426 0 postgres Oct 16 00:20:05 cpu11 kernel: [1644367.115586] [ 673209] 0 673209 42146 1254 86016 252 0 (lcluster) Oct 16 00:20:05 cpu11 kernel: [1644367.115588] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/system.slice/system-postgresql.slice/postgresql@16-main.service,task=postgres,pid=662537,uid=133 Oct 16 00:20:05 cpu11 kernel: [1644367.115610] Out of memory: Killed process 662537 (postgres) total-vm:8780856kB, anon-rss:7860kB, file-rss:444kB, shmem-rss:8041096kB, UID:133 pgtables:16928kB oom_score_adj:0 Oct 16 00:20:08 cpu11 systemd[1]: postgresql@16-main.service: Failed with result 'oom-kill'. Oct 16 00:20:08 cpu11 systemd[1]: postgresql@16-main.service: Consumed 1w 14h 21min 45.086s CPU time.
Change History (14)
comment:1 by , 5 weeks ago
comment:2 by , 5 weeks ago
We started at
2024-10-16 06:48:55.150 CEST >LOG: starting PostgreSQL 16.4 (Ubuntu 16.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
It failed around 4 hours later, setting overcommit to 2 and reduce threads did not help.
NOTICE: Start create with transfer_method_from_temp_2_final_topology 3 for has_edges_temp_table_name tmp_dyrkbarjord_10_c3_037.edge_data_tmp_1224 at 2024-10-16 09:40:19.671996+02 used_time: 0.448377 NOTICE: Done create with transfer_method_from_temp_2_final_topology 3 for has_edges_temp_table_name tmp_dyrkbarjord_10_c3_037.edge_data_tmp_1224 at 2024-10-16 09:40:20.179359+02 used_time: 0.506995 NOTICE: Done adding lines 1213 (edges 18327) of 1319, and done ANALYZE for tmp_dyrkbarjord_10_c3_037_1174 . NOTICE: start to map face 3 for has_edges_temp_table_name tmp_dyrkbarjord_10_c3_037.edge_data_tmp_1224 at 2024-10-16 09:40:20.807004+02 used_time: 0.506995 TopMemoryContext: 282504 total in 8 blocks; 10184 free (21 chunks); 272320 used TopTransactionContext: 8192 total in 1 blocks; 6696 free (1 chunks); 1496 used Attopt cache: 16384 total in 2 blocks; 6672 free (3 chunks); 9712 used TableSpace cache: 8192 total in 1 blocks; 2128 free (0 chunks); 6064 used Type information cache: 24368 total in 2 blocks; 2648 free (0 chunks); 21720 used TOAST to main relid map: 262144 total in 6 blocks; 38864 free (23 chunks); 223280 used Autovacuum worker: 131072 total in 5 blocks; 56328 free (17 chunks); 74744 used Autovacuum Portal: 8192 total in 1 blocks; 7744 free (0 chunks); 448 used Vacuum: 8192 total in 1 blocks; 7784 free (0 chunks); 408 used Operator class cache: 8192 total in 1 blocks; 592 free (0 chunks); 7600 used smgr relation table: 32768 total in 3 blocks; 16880 free (8 chunks); 15888 used PgStat Pending: 24576 total in 6 blocks; 4048 free (13 chunks); 20528 used TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used Portal hash: 8192 total in 1 blocks; 592 free (0 chunks); 7600 used TopPortalContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used Relcache by OID: 16384 total in 2 blocks; 3584 free (2 chunks); 12800 used CacheMemoryContext: 1048576 total in 8 blocks; 585048 free (94 chunks); 463528 used index info: 2048 total in 2 blocks; 616 free (1 chunks); 1432 used: pg_attrdef_adrelid_adnum_index index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_namespace_oid_index index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_index_indrelid_index index info: 2048 total in 2 blocks; 544 free (1 chunks); 1504 used: pg_toast_2619_index index info: 3072 total in 2 blocks; 912 free (1 chunks); 2160 used: pg_statistic_relid_att_inh_index index info: 3072 total in 2 blocks; 1088 free (1 chunks); 1984 used: pg_amop_opr_fam_index index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_tablespace_oid_index index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_statistic_ext_relid_index index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_operator_oid_index index info: 3072 total in 2 blocks; 808 free (1 chunks); 2264 used: pg_amop_fam_strat_index index info: 2048 total in 2 blocks; 544 free (1 chunks); 1504 used: pg_cast_source_target_index index info: 3072 total in 2 blocks; 1088 free (1 chunks); 1984 used: pg_opclass_am_name_nsp_index index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_type_oid_index index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_index_indexrelid_index index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_opclass_oid_index index info: 2048 total in 2 blocks; 544 free (2 chunks); 1504 used: pg_trigger_tgrelid_tgname_index index info: 2048 total in 2 blocks; 616 free (3 chunks); 1432 used: pg_rewrite_rel_rulename_index index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_class_oid_index index info: 2048 total in 2 blocks; 544 free (2 chunks); 1504 used: pg_attribute_relid_attnum_index index info: 3072 total in 2 blocks; 808 free (3 chunks); 2264 used: pg_amproc_fam_proc_index index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_authid_oid_index index info: 3072 total in 2 blocks; 1200 free (3 chunks); 1872 used: pg_auth_members_member_role_index index info: 3072 total in 2 blocks; 1232 free (3 chunks); 1840 used: pg_shseclabel_object_index index info: 1024 total in 1 blocks; 72 free (0 chunks); 952 used: pg_database_datname_index index info: 2048 total in 2 blocks; 912 free (0 chunks); 1136 used: pg_database_oid_index index info: 1024 total in 1 blocks; 72 free (0 chunks); 952 used: pg_authid_rolname_index PgStat Shared Ref Hash: 7216 total in 2 blocks; 688 free (0 chunks); 6528 used PgStat Shared Ref: 16384 total in 5 blocks; 4296 free (5 chunks); 12088 used WAL record construction: 50200 total in 2 blocks; 6376 free (0 chunks); 43824 used PrivateRefCount: 8192 total in 1 blocks; 2648 free (0 chunks); 5544 used MdSmgr: 8192 total in 1 blocks; 7200 free (17 chunks); 992 used LOCALLOCK hash: 8192 total in 1 blocks; 592 free (0 chunks); 7600 used GUCMemoryContext: 24576 total in 2 blocks; 7952 free (9 chunks); 16624 used GUC hash table: 32768 total in 3 blocks; 12704 free (5 chunks); 20064 used Timezones: 104112 total in 2 blocks; 2648 free (0 chunks); 101464 used Postmaster: 16384 total in 2 blocks; 13712 free (15 chunks); 2672 used ident parser context: 1024 total in 1 blocks; 760 free (0 chunks); 264 used hba parser context: 50176 total in 9 blocks; 7824 free (12 chunks); 42352 used ErrorContext: 8192 total in 1 blocks; 7928 free (4 chunks); 264 used Grand total: 2321840 bytes in 131 blocks; 885360 free (272 chunks); 1436480 used < , , , 2024-10-16 09:40:22.693 CEST >ERROR: out of memory < , , , 2024-10-16 09:40:22.693 CEST >DETAIL: Failed on request of size 1073741822 in memory context "TopTransactionContext". < , , , 2024-10-16 09:40:22.693 CEST >CONTEXT: automatic vacuum of table "rog_02.pg_toast.pg_toast_2619"
follow-up: 4 comment:3 by , 5 weeks ago
As a temporary workaround reconnecting should alleviate the problem
comment:4 by , 5 weeks ago
Replying to strk:
As a temporary workaround reconnecting should alleviate the problem
This 1300 jobs are run by 100 workers and every worker connect to the database does it's job and disconnects and then connects to the database again and starts the works with a new topology.
Seems like I have to stop and start postgis to free the memory.
comment:6 by , 5 weeks ago
Yes I use https://gitlab.com/nibioopensource/postgres_execute_parallel and this kind of code I have using for many years without any memory problems. This memory problems seems to quite new now.
follow-up: 9 comment:7 by , 5 weeks ago
If I understand the message correctly PostgreSQL autovacuum process is unable to allocate new memory for the TopTransactionContext memory pool. This isn't necessarely due to a leak, it could also "just" be higher memory consumption. To find a leak you should run a single process and see if the memory of that process keeps raising.
comment:8 by , 5 weeks ago
PostGIS might be using memory pools of a too wide scope. See https://git.osgeo.org/gitea/postgis/postgis/src/branch/master/topology/postgis_topology.c#L4109 for example, and https://github.com/postgres/postgres/blob/master/src/backend/utils/mmgr/README
comment:9 by , 5 weeks ago
Replying to strk:
If I understand the message correctly PostgreSQL autovacuum process is unable to allocate new memory for the TopTransactionContext memory pool. This isn't necessarely due to a leak, it could also "just" be higher memory consumption. To find a leak you should run a single process and see if the memory of that process keeps raising.
This has not been an problem before.
sysctl -w vm.overcommit_memory=0 vm.overcommit_memory = 0
I now run a test with 10 threads in parallel .
And started out with
Wed Oct 16 12:08:54 CEST 2024 total used free shared buff/cache available Mem: 125Gi 4.1Gi 100Gi 3.3Gi 20Gi 117Gi Swap: 7.8Gi 258Mi 7.6Gi
comment:10 by , 4 weeks ago
I tested with ST_RemEdgeNewFace instead off ST_RemEdgeModFace and it was the same behavior. One thing notiched was that did see a lot off LWLock for pg_stat_statements, so I removed pg_stat_statements, but the memory problem continued.
I went back and checked some old scripts and am I not sure that I haves tested wth 200 paralell threads on this server before.
With 200 threds in parallel I run the 1300 jobs with memory usage around 100 GB and I do finish most the jobs yes, but not all before running out off memory.
When running with 60 threads in parallel the memory usage is around 60 GB for the same amount off jobs, but all jobs are done. When we check the logs we find this :
Some jobs takes a long time (almost 3 hours) and needs a lot off memory to remove edges. Done clean small polygons for face_edges_to_remove_list size 7196 for face_table_name tmp_dyrkbarjord_10_c3_037_689.face at 2024-10-17 00:00:39.899873+02 used_time: 8681.734
While the job below remove twice as many faces in about 100 seconds. Done clean small polygons for face_edges_to_remove_list size 16502 for face_table_name tmp_dyrkbarjord_10_c3_037_725.face at 2024-10-16 21:38:11.523127+02 used_time: 105.92985
But we are able move this 1309 topologies which involves around 5 million faces and more than 100 millions edges into a single topology with 900.000 thousand faces (There are still some errors in the result that I have to check out).
So I think you are right there are no clear memory leaks but some times topology needs a lot memory and time.
So I close this one.
comment:11 by , 4 weeks ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:12 by , 3 days ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
Testing more on this after commit https://git.osgeo.org/gitea/postgis/postgis/commit/75d9b03d6c4ec5d47637f9912081f4eaf332810e
comment:13 by , 3 days ago
After the memory fixes up to https://git.osgeo.org/gitea/postgis/postgis/commit/75d9b03d6c4ec5d47637f9912081f4eaf332810e the test that failed here are now running through with out any memory errors.
Yes I run very low on memory but the code runs through with out any memory errors.
I was now testing with 100 threads in parallel and 1300 max 1300 rows pr cell.
CALL topo_rog_static.rog_overlay( ARRAY[ 'sk_grl.n5_forvaltning_flate n5', 'prod_dyrkbarjord_05.snapshot_ar5_flate ar5', 'prod_dyrkbarjord_05.snapshot_ssb_flate ssb', 'prod_dyrkbarjord_05.snapshot_mdir_flate mdir', 'prod_dyrkbarjord_05.snapshot_dmk_flate dmk', 'prod_dyrkbarjord_05.snapshot_ravine_dtm10_dmk_flate dtm10', 'prod_dyrkbarjord_05.snapshot_ravine_statsforvalter_flate ravine' ], 'tmp_dyrkbarjord_12.dyrkbarjord_05', 4258, --srid 0, -- no snapto 10.0, -- min m2 area to keep 0, --max vertx length false, -- _break_up_big_polygons 1.0, -- min negative buffer value null, remove sliver polygons ARRAY[ ('0103000020A21000000100000005000000B3C573C549A82240C2497FD1316F4F40B3C573C549A8224059E5729B33DC4F40A3DA057EEABE264059E5729B33DC4F40A3DA057EEABE2640C2497FD1316F4F40B3C573C549A82240C2497FD1316F4F40','037')::rog_input_boundary::rog_input_boundary ] , -- run all the area blocks true, -- do_qualitycheck_on_final_reseult -- Method 1 topology.TopoGeo_addLinestring -- Method 2 topology.TopoGeo_LoadGeometry -- Method 3 select from tmp toplogy into master topology 3, 100, -- _max_parallel_jobs int, -- this is the max number of paralell jobs to run. There must be at least the same number of free connections 1300, -- _max_rows_in_each_cell int, -- this is the max number rows that intersects with box before it's split into 4 new boxes null );
comment:14 by , 3 days ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
I have now set vm.overcommit_memory=2 and run test with only 100 workes and it still seems like free memory is steadly falling.
In this run we create around 1300 topologies with more 65. million edges and more than 3.5 million faces generted from 7 different simple feature input tables. Each of this 1300 topologies are cleaned up by removing faces below 10 m2 . We then reduce the number edges to around 5 millon edges and faces down to below 1 million faces, before it’s moved into single a topology.
When started free memory was around 100 GB. We are now done with 1250 of the temp toplogies and free memory is now