Opened 11 years ago

Closed 6 years ago

Last modified 2 years ago

#2131 closed enhancement (fixed)

Replace regress_lots_of_points.sql data with generated query

Reported by: strk Owned by: pramsey
Priority: low Milestone: PostGIS Fund Me
Component: postgis Version: master
Keywords: Cc:

Description

The regress/regress_lots_of_points.sql file is 1.7MB. It defines a table with 50000 records (int, point geometry). Could very well be replaced by a one-line query with a similar effect. Would cut at least 700KB from the tarball…

Change History (11)

comment:1 by strk, 11 years ago

Milestone: PostGIS 2.0.2PostGIS 2.1.0
Priority: mediumlow
Type: defectenhancement
Version: 2.0.xtrunk

comment:2 by robe, 10 years ago

Milestone: PostGIS 2.1.0PostGIS Future

punt

comment:3 by robe, 6 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

comment:4 by pramsey, 6 years ago

Resolution: wontfix
Status: newclosed

OK, having looked this over, I don't see any compelling reason to re-write the index regression checks. They are pretty good, and to save a few K we have to muck with them a fair amount.

comment:5 by Mike Taves, 6 years ago

How about we make a deterministic pseudo-random point sequence, based on a linear-feedback shift register. This one should generate a uniform distribution of points between [0, 1000] in both x and y directions. And because it's based on a bit(16) postgres type, I'd expect it to be portable.

CREATE OR REPLACE FUNCTION
  lfsr_point(len integer, start_state integer DEFAULT 6342)
    RETURNS SETOF geometry AS
$$
DECLARE
  lfsr bit(16);
  bt bit(16);
  x float8;
  y float8;
BEGIN
  lfsr := start_state::bit(16);
  FOR i IN 1..$1 LOOP
    bt := ((lfsr >> 0) # (lfsr >> 2) # (lfsr >> 3) # (lfsr >> 5)) & (1::bit(16));
    lfsr := (lfsr >> 1) | (bt << 15);
    x := round(lfsr::integer::numeric / 65536.0 * 1000, 4);
    bt := ((lfsr >> 0) # (lfsr >> 2) # (lfsr >> 3) # (lfsr >> 5)) & (1::bit(16));
    lfsr := (lfsr >> 1) | (bt << 15);
    y := round(lfsr::integer::numeric / 65536.0 * 1000, 4);
    RETURN NEXT ST_MakePoint(x, y);
  END LOOP;
  RETURN;
END;$$ LANGUAGE plpgsql;

And then generate the "random" points:

DROP TABLE IF EXISTS test;
CREATE TABLE test (
    "num" serial PRIMARY KEY,
    "the_geom" geometry
);
INSERT INTO test(the_geom)
SELECT lfsr_point(50000);

This is all fine an dandy, until I looked at the points: https://imgur.com/FgMjyc4 so it just needs a bit of fine tuning…

comment:6 by Mike Taves, 6 years ago

New idea: improve an existing function by adding a new (3rd) parameter:

ST_GeneratePoints( g geometry , npoints numeric , seed integer DEFAULT NULL)

where seed (if provided) could be used to generate a predictable pseudo-random number sequence. I've investigated that the Mersenne Twister (e.g. MT19937) provides the same pseudo-random sequence on different platforms. There are a few small C examples floating around that can be tailored into liblwgeom/lwgeom_geos.c.

comment:7 by strk, 6 years ago

I like the idea of a seed-based random function

comment:8 by pramsey, 6 years ago

I like it in principle… in practice it sure better be 100% stable on different platforms, otherwise it could be quite hard to use effectively. And that's a hard guarantee to provide. I was fairly surprised how much variability PgSQL's on random() had when run with the same seed on different OS's, but all on Intel chips.

comment:9 by Mike Taves, 5 years ago

The seed parameter enhancement is at #4304

comment:11 by Mike Taves <mwtoews@…>, 2 years ago

Resolution: wontfixfixed

In 54eb53a/git:

Replace regress_lots_of_{points,nulls}.sql with generated queries

Closes #2131

Note: See TracTickets for help on using tickets.