id,summary,reporter,owner,description,type,status,priority,milestone,component,version,severity,resolution,keywords,cc
1822,Instability and dropped ingest on installed system,mlucas17,gpotts,"from Nicki:

We have had a ton of problems with the ingest and with the server crashing. I have been trying to collect meaningful data and write scripts to keep the server running and to retry failed ingests, so I apologize for not having sent you anything yet.

Overall, it seems that 50% or fewer of the post updates actually make it into the database. I only see error codes when I run them back in manually, because nothing is written to any standard log (nohup, etc.). The errors I see are HTTP 500, 502 and 503. Sometimes, if I restart the OMAR server, I can sneak a few more back in, but it is HIGHLY unreliable.

I have seen this when nothing else was running other than the imagery ingest, which handles maybe 3 images an hour. I ended up writing a script that dumps the list of images in the database and the list of images on the filesystem, then computes the differences between the two. It then tries to ingest each missing image again by running java OmarDataManager <url> addRaster $i inside a for loop, where $i iterates over that list of differences. That script is lucky if it gets 30% of the images back in.

The failure rate is even higher for video. 

We found out yesterday that most of the tags don't show up in the database because the ntm plugin wasn't compiled properly, so the ingest didn't register most of the tags we are actually interested in. We are hoping that the ongoing problem of ""the image not showing up in the right place on google earth"" will be at least partly remedied once the plugin is in use.

Unfortunately, today I am starting by trying to delete all records from the database and ingest them again. This is very troublesome because we have already exposed the service to users, and because the deletion routine (removeRaster) is as problematic as addRaster, if not more so.

The server crashes a lot, sometimes once every 10 minutes. I have a cron job that restarts it. In some cases, such as when it hits a Java out-of-memory error, the Java process doesn't actually die but becomes completely unresponsive. I don't have a way to detect that condition so I can restart it automatically. I have increased the memory for the JVM to 4G (from 1G). We see tons of hs_err_pid error logs, and I have been trying to find some rhyme or reason to why they occur. The stack trace in every one of these files references either addRaster or addVideo somewhere.

We set the ulimit on our machines to unlimited so that the system can now generate core dumps, and there are a bunch of those as well. That is how Jeremy was able to tell me that the ntm plugin wasn't engaged.
",defect,closed,highest,OMAR Mar 2010,Performance and Stability,OMAR 1.8.4,blocker,fixed,,
