Opened 15 years ago
Closed 15 years ago
#163 closed defect (fixed)
Simultaneous reharvests fail
Reported by: | ddnebert | Owned by: | simonp |
---|---|---|---|
Priority: | major | Milestone: | v2.4.3 |
Component: | General | Version: | v2.4.2 |
Keywords: | harvest, concurrency, run | Cc: | |
Description
I have over 50 WMS GetCapabilities documents to check on a daily basis. If I give them all the same revisit frequency, a significant number of the harvest processes return errors, purging the old entries without fetching new ones. It would appear that harvesting lacks multi-threaded support and proper queuing of the results to be indexed by Lucene.
This defect can be demonstrated by selecting more than 3-4 of the listed harvest services and clicking Run. Invariably, one or more will fail. However, if the checks are run individually, they all succeed.
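The queuing suggested above could look roughly like the following sketch: harvester threads only enqueue record identifiers, and a single indexer thread drains the queue, so the index is never written from two harvests at once. This is a hypothetical illustration in plain Java, not how GeoNetwork actually indexes harvested records; `IndexQueue`, `submit`, `close` and `demo` are invented names, and a simple list stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: many harvester threads enqueue records, one thread "indexes" them.
final class IndexQueue {
    // Sentinel object, compared by identity so no real record id can stop the indexer.
    private static final String POISON = new String("<stop>");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Stand-in for the Lucene index; only the indexer thread writes to it.
    private final List<String> indexed = new ArrayList<>();
    private final Thread indexer = new Thread(this::drain, "indexer");

    IndexQueue() {
        indexer.start();
    }

    // Safe to call from any harvester thread: it only enqueues, never touches the index.
    void submit(String recordId) throws InterruptedException {
        queue.put(recordId);
    }

    private void drain() {
        try {
            while (true) {
                String id = queue.take();
                if (id == POISON) {
                    return;
                }
                indexed.add(id); // a real indexer would add a Lucene document here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Processes everything queued so far, stops the indexer, and returns what was indexed.
    List<String> close() throws InterruptedException {
        queue.put(POISON);
        indexer.join(); // join() makes the indexer's writes to `indexed` visible here
        return Collections.unmodifiableList(indexed);
    }

    // Demo: `harvesters` parallel harvests, each submitting `recordsEach` records.
    static int demo(int harvesters, int recordsEach) {
        IndexQueue q = new IndexQueue();
        ExecutorService pool = Executors.newFixedThreadPool(harvesters);
        try {
            for (int h = 0; h < harvesters; h++) {
                final int site = h;
                pool.submit(() -> {
                    for (int r = 0; r < recordsEach; r++) {
                        q.submit("site" + site + "/record" + r);
                    }
                    return null;
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            return q.close().size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `IndexQueue.demo(4, 25)` runs four concurrent "harvests" of 25 records each and returns 100, because every record passes through the one indexer thread.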
Change History (10)
comment:1 by , 15 years ago
comment:2 by , 15 years ago
Class: MSQL Exception, Error: Concurrent Serializable Transaction Conflict
comment:3 by , 15 years ago
Milestone: | → v2.5.0 |
---|---|
Version: | → v2.4.2 |
comment:4 by , 15 years ago
Hi Doug,
Which database are you using when you get this bug?
I have also come across the concurrent serializable transaction conflict in the recent bug on Dbms retries, and before that I used to get it in lots of situations with ORACLE (e.g. doing a big batch import).
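A common way to live with a serializable transaction conflict is to roll back and retry the whole transaction a few times. The sketch below is hypothetical (`ConflictRetry` and `withRetries` are invented names, not the code from the Dbms retries fix) and assumes a conflict can be recognised either by SQLState `40001`, the standard serialization-failure code, or by the "Concurrent Serializable Transaction Conflict" text from the log in comment 2; whether McKoi sets that SQLState is not verified.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper: re-run a unit of work when it fails with a serialization conflict.
final class ConflictRetry {
    private static final String CONFLICT_TEXT = "Concurrent Serializable Transaction Conflict";

    // Assumption: a conflict is reported either with SQLState 40001 (standard
    // serialization failure) or with the message text seen in this ticket.
    static boolean isConflict(SQLException e) {
        String msg = e.getMessage();
        return "40001".equals(e.getSQLState()) || (msg != null && msg.contains(CONFLICT_TEXT));
    }

    // `work` must open and commit (or roll back) its own transaction, so each attempt starts clean.
    static <T> T withRetries(Callable<T> work, int maxAttempts, long backoffMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (SQLException e) {
                if (!isConflict(e) || attempt >= maxAttempts) {
                    throw e; // not a conflict, or out of attempts: give up
                }
                Thread.sleep(backoffMillis * attempt); // wait a little longer after each conflict
            }
        }
    }
}
```

Retrying only helps if the failed transaction was rolled back first; other SQL errors are rethrown immediately so real failures are not masked.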
I think the problem is partly due to the rather restrictive (but very safe) way in which access to data in the database is synchronized between threads. At the moment (except for ORACLE) we use exclusive locks, i.e. only one writer or one reader can hold a lock at any time (isolation level SERIALIZABLE). This appears to be partly because McKoi doesn't support any other isolation level, and partly because this is the safest way (in terms of consistency) to handle the issue.
We could change the default isolation level for all databases except McKoi (which shouldn't be used in production anyway) to allow data to be read at any time as long as no write is in progress (isolation level READ_COMMITTED). This would help with this problem and improve performance too. However, we should first check our database interaction to see whether this would introduce any issues.
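The proposed default could be applied per connection with the standard JDBC `Connection.setTransactionIsolation` call. In this sketch `IsolationPolicy` is a hypothetical class, and detecting McKoi by looking for "mckoi" in `DatabaseMetaData.getDatabaseProductName()` is an assumption about what the driver reports.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Locale;

// Hypothetical sketch of the proposal: READ_COMMITTED everywhere except McKoi.
final class IsolationPolicy {
    // Assumption: McKoi is recognised from the JDBC product name; the exact string is unverified.
    static int isolationFor(String databaseProductName) {
        boolean mckoi = databaseProductName != null
                && databaseProductName.toLowerCase(Locale.ROOT).contains("mckoi");
        return mckoi ? Connection.TRANSACTION_SERIALIZABLE : Connection.TRANSACTION_READ_COMMITTED;
    }

    // Apply right after the connection is obtained, before any transaction starts;
    // JDBC leaves the effect of changing isolation mid-transaction implementation-defined.
    static void apply(Connection conn) throws SQLException {
        String product = conn.getMetaData().getDatabaseProductName();
        conn.setTransactionIsolation(isolationFor(product));
    }
}
```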
The other issue with the harvest is that, by my reckoning, each harvest process uses one connection to the database. If you attempt to start 35 at the same time by selecting them all and choosing Run, I think your harvest will fail anyway: you need to schedule them in groups according to the database pool size for your database (see <resources><resource enabled="true">..... in web/geonetwork/WEB-INF/config.xml), or, more simply, just increase the pool size. The default pool size is 10.
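Scheduling harvests in groups according to the pool size amounts to capping the number of concurrent harvests at the number of pooled connections. Here is a minimal sketch using `java.util.concurrent.Semaphore`, assuming one connection per harvest as estimated above; `BoundedHarvests` and `runAll` are hypothetical names, and the pool size would still have to be read from config.xml separately.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical scheduler: never runs more harvests at once than there are pooled connections.
final class BoundedHarvests {
    // Runs every harvest, at most `poolSize` at a time; returns the peak concurrency observed.
    static int runAll(List<Runnable> harvests, int poolSize) throws InterruptedException {
        Semaphore connections = new Semaphore(poolSize); // one permit per pooled connection
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService exec = Executors.newCachedThreadPool();
        for (Runnable harvest : harvests) {
            exec.execute(() -> {
                try {
                    connections.acquire(); // wait until a connection slot is free
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    harvest.run();
                } finally {
                    running.decrementAndGet();
                    connections.release(); // hand the slot to the next waiting harvest
                }
            });
        }
        exec.shutdown();
        exec.awaitTermination(10, TimeUnit.MINUTES);
        return peak.get();
    }
}
```

A fixed-size thread pool of `poolSize` threads would give the same cap here; a semaphore shared across the application (rather than created per call, as in this sketch) would also cover harvests started from different places, such as the scheduler and the Run button at the same time.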
Cheers, Simon
follow-up: 6 comment:5 by , 15 years ago
Milestone: | v2.5.0 → v2.4.2 |
---|
follow-up: 7 comment:6 by , 15 years ago
Replying to ddnebert:
I am only using the default (embedded) database for experimentation. If using or deploying MySQL or PostgreSQL would make this concurrency problem go away, then I'll try that. I get the same problem if I hit "Run" manually on two or more targets, or if a scheduled harvest on multiple targets takes place. I am not talking about 35 concurrent requests here.
If this is known behavior, it should be flagged or prevented in the interface, e.g. "warning: more than one concurrent harvest not permitted with embedded McKoi database".
comment:7 by , 15 years ago
Replying to ddnebert:
We reinstalled the software, this time using MySQL, and got the same result. If I try to harvest only two remote collections at the same time, the indexing chokes and both need to be redone separately. This is a defect.
comment:8 by , 15 years ago
comment:9 by , 15 years ago
Owner: | changed from | to
---|
I can reproduce the Concurrent Serializable Transaction Conflict on McKoi but not on MySQL.
Can you send a log file and the sites you are attempting to harvest, please?
Version 2.4.2 on Linux.