Opened 5 years ago

Closed 4 years ago

#3716 closed defect (wontfix)

Gist index irrational growth

Reported by: Marcin Piotrowski Owned by: pramsey
Priority: critical Milestone: PostGIS 2.2.7
Component: postgis Version: 2.1.x
Keywords: Cc:

Description (last modified by pramsey)

If you create a GiST index on a geometry(Polygon,4267) column and do updates on the table, the index size grows enormously even if the indexed column contains only nulls. This is a big problem that caused index to grow to tens of gigabytes for a table with just about 10000 rows, where indexed column contained only nulls, but was updated on a minutely basis (but not on the geometry column). The size was not shrinked during autovacuum, only full vacuum was able to free the disk space.

This is reproducable using following example that inserts 10000 rows and performs 1000 updates on the status column. After that index size is 321 MB (321 MB of indexed nulls ! )

-- psql script to create the database

CREATE DATABASE gist_weirdness WITH TEMPLATE = template0 ENCODING = 'UTF8'
LC_COLLATE = 'en_US.utf8' LC_CTYPE = 'en_US.utf8';
\connect gist_weirdness
CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
CREATE EXTENSION IF NOT EXISTS postgis WITH SCHEMA public;
CREATE TABLE table_with_gist (
    id integer NOT NULL,
    status character varying(10),
    geompolygon geometry(Polygon,4326)
);
CREATE SEQUENCE table_with_gist_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;
ALTER TABLE ONLY table_with_gist ALTER COLUMN id SET DEFAULT
nextval('table_with_gist_id_seq'::regclass);

CREATE INDEX incidents_geompolygon_idx ON table_with_gist USING gist
(geompolygon);


--  pgscript to insert data
DECLARE @I;
set @I = 0;
while @I < 10000
begin
INSERT INTO table_with_gist(status)  VALUES (cast (@I as CHARACTER
VARYING));
set @I = @I + 1;
end


--pgscript to update data 

DECLARE @I;
set @I = 0;
while @I < 1000
begin
update table_with_gist set status=status::integer+7;
set @I = @I + 1;
end


Change History (4)

comment:2 Changed 4 years ago by pramsey

Description: modified (diff)

comment:3 Changed 4 years ago by pramsey

Milestone: PostGIS 2.2.6
Priority: mediumcritical

comment:4 Changed 4 years ago by pramsey

Milestone: PostGIS 2.2.6PostGIS 2.2.7

comment:5 Changed 4 years ago by pramsey

Resolution: wontfix
Status: newclosed

While this is no doubt an operationally awful problem to deal with, it's a PostgreSQL issue, not a PostGIS one. See the same behavior with a GIST index on the native 'box' type:

create table table_with_gist_box (id integer, status text, g box);
create index box_gx on table_with_gist_box using gist (g);
select pg_size_pretty(pg_relation_size('box_gx'::regclass));
update table_with_gist set status = 'bar';^C
insert into table_with_gist_box (status) select 'foo' as status from generate_series(1,100000);
select pg_size_pretty(pg_relation_size('box_gx'::regclass));
update table_with_gist_box set status = 'bar';
select pg_size_pretty(pg_relation_size('box_gx'::regclass));
update table_with_gist_box set status = 'bar';
select pg_size_pretty(pg_relation_size('box_gx'::regclass));
vacuum table_with_gist_box;
select pg_size_pretty(pg_relation_size('box_gx'::regclass));
update table_with_gist_box set status = 'bar';
vacuum table_with_gist_box;
select pg_size_pretty(pg_relation_size('box_gx'::regclass));
update table_with_gist_box set status = 'bar';
vacuum table_with_gist_box;
select pg_size_pretty(pg_relation_size('box_gx'::regclass));
update table_with_gist_box set status = 'bar';
vacuum full table_with_gist_box;
select pg_size_pretty(pg_relation_size('box_gx'::regclass));

I'm sure the folks at https://postgrespro.ru/ would be happy to work on improving the situation for future PgSQL releases.

Note: See TracTickets for help on using tickets.