Opened 6 years ago

Closed 5 years ago

#2258 closed defect (fixed)

t.create creates DB always in the PERMANENT

Reported by: martinl Owned by: grass-dev@…
Priority: blocker Milestone: 7.0.0
Component: Temporal Version: svn-releasebranch70
Keywords: t.register Cc:
CPU: Unspecified Platform: Unspecified

Description (last modified by martinl)

t.create creates DB not in the current mapset, but always in PERMANENT.

g.mapset -p
modis2002lst_1
t.create output=modis title="MODIS 2002" desc="Ukazkovy casoprostorovy dataset MODIS"
Create temporal database: /opt/grassdata/nc_spm_08_grass7/PERMANENT/tgis/sqlite.db

Attachments (2)

tgis.diff (1.2 KB) - added by martinl 6 years ago.
distr_temporal_database.diff (33.0 KB) - added by huhabla 5 years ago.

Download all attachments as: .zip

Change History (23)

comment:1 Changed 6 years ago by martinl

Description: modified (diff)
Summary: t.register creates DB always in the PERMANENTt.create creates DB always in the PERMANENT

comment:2 Changed 6 years ago by martinl

See the attachment, anyway it didn't help, what helped was t.connect -d, so probably some letf-over in my mapset(?)

Changed 6 years ago by martinl

Attachment: tgis.diff added

comment:3 Changed 6 years ago by huhabla

There is a reason why the temporal database is generated and used by default in the PERMANENT directory. This will assure that by default all maps from different mapsets are registered in the same temporal database. A single temporal database can handle maps from different mapsets, it is designed to handle maps from different mapsets. Consider the temporal database as a location specific, mapset independent storage of temporal information. The temporal framework makes sure that you can not modify space time datasets or time stamps of maps "located" in other mapsets.

You can use t.connect to use a different temporal database (a path for sqlite, connection string for postgresql). Hence you can force that the current mapset will be used to store time stamped maps and space time datasets by running t.connect in the new mapset. Be aware that the connection information is stored mapset specific.

However, using mapset specific temporal databases has the drawback that temporal information from other mapsets (STDS, time stamped maps) will not be available. The reason is that sqlite does not support the merging of different database files at runtime.

About temporal database creation: Each temporal command checks if a temporal database exists, if there is no temporal database then a new one will be created using the default settings (PERMANENT mapset) or the mapset specific settings set by t.connect. Hence if you modify the database connection at runtime with t.connect, the new temporal database will be used, or created if not present, when a temporal command gets invoked (except t.connect).

comment:4 in reply to:  3 ; Changed 6 years ago by glynn

Replying to huhabla:

There is a reason why the temporal database is generated and used by default in the PERMANENT directory.

Does it handle the case where the user lacks the filesystem permissions to write to PERMANENT?

Does it handle race conditions, i.e. where two users both try to create the database at the same time?

Ordinarily, we assume that the user lacks write permission outside their mapset.

comment:5 in reply to:  4 ; Changed 6 years ago by huhabla

Replying to glynn:

Replying to huhabla:

There is a reason why the temporal database is generated and used by default in the PERMANENT directory.

Does it handle the case where the user lacks the filesystem permissions to write to PERMANENT?

There is no explicit error handling implemented. The creation process will terminate with a Python exception.

Does it handle race conditions, i.e. where two users both try to create the database at the same time?

No. How does GRASS handle the race condition in case two user try to create a mapset with the same name at the same time?

Ordinarily, we assume that the user lacks write permission outside their mapset.

The temporal framework supports two SQL database backends: sqlite and postgresql. Sqlite is the default backend that can be used in case of single user GRASS locations. In case of a multi-user GRASS environment the postgresql database backend should be used instead, since it is able to handle concurrent read/write and multi-user access better than sqlite. The postgresql connection must be set explicitly with t.connect for each mapset, since temporal database connections are mapset specific.

I absolutely prefer to use mapset specific sqlite database by default that will merge at runtime with temporal databases from other mapsets to avoid race conditions and filesystem permission problems. But unfortunately the Python sqlite implementation does not support the merge of different sqlite databases at runtime. Other file based databases were no option, because of the lack of time support, or Python support and the introduction of additional dependencies. Therefore the postgresql support to avoid some problems of the sqlite backend.

comment:6 in reply to:  5 Changed 6 years ago by glynn

Replying to huhabla:

Does it handle race conditions, i.e. where two users both try to create the database at the same time?

No. How does GRASS handle the race condition in case two user try to create a mapset with the same name at the same time?

One of the mkdir() calls will fail, and G_make_mapset() will return -1 to its caller.

This is correct behaviour for creating a new mapset, which is supposed to "belong" to the user creating it. Creating a mapset should be a distinct operation from using an existing mapset. If a mapset with that name already exists, creation should fail; a "create" operation shouldn't just use an existing mapset (typically, incorrect ownership will prevent this).

This wouldn't be correct behaviour for creating a shared database, where the desired semantics are "use it if it exists, otherwise create it". In the event of a race condition, the process should either create the database (if it won the race) or use the existing database (if it lost the race). In neither case should it fail.

comment:7 in reply to:  5 ; Changed 6 years ago by neteler

Version: unspecifiedsvn-releasebranch70

Replying to huhabla:

Replying to glynn:

Replying to huhabla:

There is a reason why the temporal database is generated and used by default in the PERMANENT directory.

Does it handle the case where the user lacks the filesystem permissions to write to PERMANENT?

There is no explicit error handling implemented. The creation process will terminate with a Python exception.

Here in our shared /network and multi-user grassdata/ we have exactly this problem: while I am the PERMANENT owner, my colleague cannot write into the DB.

As a compromise, t.create should advertise t.connnect as a solution to the problem, otherwise users get immediately stuck.

comment:8 in reply to:  7 ; Changed 6 years ago by huhabla

Replying to neteler:

Replying to huhabla:

Replying to glynn:

Replying to huhabla:

There is a reason why the temporal database is generated and used by default in the PERMANENT directory.

Does it handle the case where the user lacks the filesystem permissions to write to PERMANENT?

There is no explicit error handling implemented. The creation process will terminate with a Python exception.

Here in our shared /network and multi-user grassdata/ we have exactly this problem: while I am the PERMANENT owner, my colleague cannot write into the DB.

As a compromise, t.create should advertise t.connnect as a solution to the problem, otherwise users get immediately stuck.

Please try r60732, i have improved the error messages in case the database creation or access fails. Database connection failures will now result in a fatal error rather then Python traceback.

comment:9 in reply to:  8 Changed 6 years ago by neteler

Replying to huhabla: ...

Please try r60732

Backported to relbr7 in r60828

comment:10 Changed 5 years ago by hamish

Priority: normalblocker

This is a serious design flaw.

GRASS modules should NEVER expect to be able to write outside their own mapset directory tree.

Mentioning the t.connect work around and internal checks are certainly helpful, but does not solve the underlying problem. Mapsets must be self-contained.

One way is to have it fail-over into a sqlite db in the current mapset dir tree. But really that needs to be the default, and so we need a way for the various mapsets made accessible by g.access to connect, share, and mix their sqlite dbs in memory, without blocking of two parallel grass sessions running in the same common location at the same time.

The reason is that sqlite does not support the merging of different database files at runtime.

Then we'll have to figure out some other way.. there must be a solution, even if it is working with the python sqlite people to make their code work the way we need it to, which would have the secondary effect of helping many other projects too.

Or maybe we can figure out some clever work around on our own- perhaps a module to (re)scan other available mapsets' temporal sqlite dbs and write out a local summary into $MAPSET/.tmp/, even as a flat text file if need be?

I've just come across this starting g.gui.animation from the main menu in OSGeoLive 8.0 after installing GRASS 7 with the "install_grass7" script from a terminal (* run that as a regular user, no sudo).

sorry, but we've got to fix this. :-/

thanks, Hamish

comment:11 in reply to:  10 ; Changed 5 years ago by huhabla

Replying to hamish:

This is a serious design flaw.

GRASS modules should NEVER expect to be able to write outside their own mapset directory tree.

Yes, you are right, it is a serious design flaw that breaks GRASS completely. And there are absolutely no modules in GRASS that write outside their own mapset directory tree. Modules able to produce file output anywhere in the file system and modules that use SQL database backends other than dbf and sqlite are not existent.

Mentioning the t.connect work around and internal checks are certainly helpful, but does not solve the underlying problem. Mapsets must be self-contained.

Having the opportunity to set the temporal database mapset specific using t.connect is not a work around, its a design decision. Well, we can remove sqlite support to force the use of PostgreSQL as temporal database backend to avoid the sqlite dilemma. Using PostgreSQL will avoid writing to any mapset directory.

One way is to have it fail-over into a sqlite db in the current mapset dir tree. But really that needs to be the default, and so we need a way for the various mapsets made accessible by g.access to connect, share, and mix their sqlite dbs in memory, without blocking of two parallel grass sessions running in the same common location at the same time.

Please can you point me to documentation howto share and mix sqlite dbs in memory ... and also for other SQL DBMS?

The reason is that sqlite does not support the merging of different database files at runtime.

Then we'll have to figure out some other way.. there must be a solution, even if it is working with the python sqlite people to make their code work the way we need it to, which would have the secondary effect of helping many other projects too.

Or maybe we can figure out some clever work around on our own- perhaps a module to (re)scan other available mapsets' temporal sqlite dbs and write out a local summary into $MAPSET/.tmp/, even as a flat text file if need be?

The design decision to use a single database is indeed questionable. But it is not limited to sqlite. The temporal framework was designed to support different database backends. Hence the solution to support mapset specific and central temporal databases must work with sqlite, PostgreSQL and possible other database backends. I am thinking since the beginning of the temporal framework design how to to support mapset specific databases and central databases all together, but i don't come to a good solution.

Using several independent temporal databases (mapset specific or ...) will reduce the SQL capabilities that can be used in the temporal framework (IMHO the most important feature of SQL databases). SQL queries that make use of where, group, having and order clauses will most of the time not work across mapset specific independent temporal databases. Such functionality must then be implemented in the temporal framework itself, which requires the parsing and analysis of SQL strings ... which are unfortunately SQL backend specific. This is a lot of work. Hence, most SQL queries using where, group, having and order keywords can not be used to select time stamped map layer or space time datasets from different mapsets (Affected module is t.list and maybe other).

However, all the fancy SQL features can be used to select map layers from space time datasets, since space time datasets are mapset specific. Hence, the temporal framework can be modified to use mapset specific database connections to process space time datasets. This will require the modification of all database queries in the framework to specify the mapset that should be used for the database connection. A central database can be supported by using the same connection for different mapsets. This is a huge effort. Probably all temporal modules must be modified as well.

So, what would be the best solution?

  • Using a central database management system with server based DBMS and abandon sqlite?
  • Removing the SQL capabilities for cross mapset map layer and STDS selection to support mapset specific temporal databases?

  • Living with this design flaw, using t.connect to specify a central temporal database that can be accessed by other mapsets if PERMANENT is not the best choice?

If you think that the current temporal framework design is blocking the release of GRASS7, then we should remove the temporal framework and all temporal modules from the release version, developing it only in trunk, marking it as highly experimental. I am fine with that. Actually, i would to suggest this, since the temporal framework is still in development. Several module manual pages are still missing and the design is obviously not finished yet.

comment:12 in reply to:  11 ; Changed 5 years ago by glynn

Replying to huhabla:

So, what would be the best solution?

Place the SQLite database file in the mapset directory by default. Any other location should only be used if specifically requested by the user.

Assuming that the current user can write to locations other than the current mapset directory, their home directory, or the temporary directory (/tmp or $TMPDIR), is a bug.

comment:13 in reply to:  12 ; Changed 5 years ago by martinl

Replying to glynn:

Replying to huhabla:

So, what would be the best solution?

Place the SQLite database file in the mapset directory by default. Any other location should only be used if specifically requested by the user.

Assuming that the current user can write to locations other than the current mapset directory, their home directory, or the temporary directory (/tmp or $TMPDIR), is a bug.

Is it planned any progress in this issue? Would be nice to reduce number of blockers slowly to zero...

comment:14 in reply to:  13 ; Changed 5 years ago by huhabla

Replying to martinl:

Replying to glynn:

Replying to huhabla:

So, what would be the best solution?

Place the SQLite database file in the mapset directory by default. Any other location should only be used if specifically requested by the user.

Assuming that the current user can write to locations other than the current mapset directory, their home directory, or the temporary directory (/tmp or $TMPDIR), is a bug.

Is it planned any progress in this issue? Would be nice to reduce number of blockers slowly to zero...

Work in progress. I have applied the patches from #2408 in r61956 and r61957. This is the basis for a larger patch, that enables distributed temporal datasets (patch is attached, but very experimental). I will switch to mapset specific temporal databases in case the new approach works stable.

Changed 5 years ago by huhabla

comment:15 in reply to:  14 ; Changed 5 years ago by neteler

Replying to huhabla:

Work in progress. I have applied the patches from #2408 in r61956 and r61957. This is the basis for a larger patch, that enables distributed temporal datasets (patch is attached, but very experimental). I will switch to mapset specific temporal databases in case the new approach works stable.

Is the attached patch the mentioned "larger patch"? What is the state of this ticket?

comment:16 in reply to:  15 Changed 5 years ago by huhabla

Replying to neteler:

Replying to huhabla:

Work in progress. I have applied the patches from #2408 in r61956 and r61957. This is the basis for a larger patch, that enables distributed temporal datasets (patch is attached, but very experimental). I will switch to mapset specific temporal databases in case the new approach works stable.

Is the attached patch the mentioned "larger patch"? What is the state of this ticket?

I have improved the temporal framework in the last months in grass trunk to use as default the distributed temporal database approach. It should be fully functional right now. There are several gunittests in temporal/t.connect/testsuite that tests the new temporal database approach.

Please test and report any issues with the new approach.

I would like to backport all the modifications of the temporal framework and temporal modules to grass 7.0 if everything works fine.

I hope that the gunittest framework will be backported as well.

comment:17 Changed 5 years ago by neteler

It would be great to ave a (partial?) backport soon. Beta4 is overdue...

comment:18 Changed 5 years ago by martinl

what's the status of this ticket? anything missing?

comment:19 in reply to:  18 Changed 5 years ago by martinl

Replying to martinl:

what's the status of this ticket? anything missing?

can we close this ticket?

comment:20 Changed 5 years ago by neteler

The doxygen documentation in lib/temporal/lib/default_name.c needs yet to be updated (I think).

comment:21 Changed 5 years ago by neteler

Resolution: fixed
Status: newclosed

Done in r63769 and r63770. Closing.

Note: See TracTickets for help on using tickets.