Opened 9 years ago

Last modified 5 years ago

#2587 new enhancement

copying attribute tables very inefficient if tables are in same database

Reported by: mlennert Owned by: grass-dev@…
Priority: normal Milestone: 7.6.2
Component: Database Version: svn-trunk
Keywords: libdb copy table Cc:
CPU: Unspecified Platform: Unspecified

Description

The function copy_table in lib/db/dbmi_client/copy_tab.c seems very inefficient to me when both the old and the new table are in the same database and it is a database with real SQL functionalities. IIUC, the function actually selects all rows from the input table into a cursor and the inserts them, row by row, into the output table. With a large table this takes a long time.

Using a simple "CREATE TABLE new_table AS SELECT [*, ListOfColumns] FROM old_table WHERE [Conditions]" is much faster !

So, it would be a great addition if for the relevant drivers, such as SQLite and PostgreSQL, could use this approach.

P.S. There is no LibDB component in the tracker. Don't know if there should be...

Change History (10)

comment:1 by mlennert, 9 years ago

To reinforce this:

I have a vector map with 881,120 points. I have to loop over this map, extracting points according to one attribute. When I use v.extract to extract the points and to create the table it takes quite a long time, most of it in "writing attributes":

time v.extract produnits_light where="cd_nacebel_2010='27120' and etab_emploi_onss_2010>0" out=prod_temp --o

real	0m20.183s
user	0m12.872s
sys	0m17.516s

i.e. 20s, although this only extracts 93 points !

Using a script that extracts the points without attribute table and then creates the table using different tools, it is much quicker. Here's the script (t.sh):

v.extract -t produnits_light where="cd_nacebel_2010='27120' and etab_emploi_onss_2010>0" out=prod_temp --o
v.db.select produnits_light where="cd_nacebel_2010='27120' and etab_emploi_onss_2010>0" sep=, file=test.csv --o
v.info -c produnits_light --quiet | cut -d'|' -f1 |  sed 's/TEXT/STRING/' | sed 's/DOUBLE PRECISION/REAL/' | sed 's/\(.\)\(.*\)/"\1\L\2"/g' | paste -s -d ',' > test.csvt
db.in.ogr input=test.csv out=prod_temp_table --o
v.db.connect -o prod_temp table=prod_temp_table key=cat_
time sh t.sh

real	0m1.241s
user	0m1.000s
sys	0m0.168s

The result is identical except that cat is renamed cat_ by db.in.ogr...

comment:2 by neteler, 8 years ago

Milestone: 7.1.07.2.0

Milestone renamed

comment:3 by neteler, 7 years ago

Milestone: 7.2.07.2.1

Ticket retargeted after milestone closed

comment:4 by martinl, 7 years ago

Milestone: 7.2.17.2.2

comment:5 by martinl, 7 years ago

Milestone: 7.2.27.4.0

All enhancement tickets should be assigned to 7.4 milestone.

comment:6 by neteler, 6 years ago

Milestone: 7.4.07.4.1

Ticket retargeted after milestone closed

comment:7 by neteler, 6 years ago

Milestone: 7.4.17.4.2

comment:8 by martinl, 6 years ago

Milestone: 7.4.27.6.0

All enhancement tickets should be assigned to 7.6 milestone.

comment:9 by martinl, 5 years ago

Milestone: 7.6.07.6.1

Ticket retargeted after milestone closed

comment:10 by martinl, 5 years ago

Milestone: 7.6.17.6.2

Ticket retargeted after milestone closed

Note: See TracTickets for help on using tickets.