#3626 closed defect (fixed)
JPIP driver in endless loop
Reported by: | gaopeng | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | 2.0.0 |
Component: | GDAL_Raster | Version: | 1.6.2 |
Severity: | normal | Keywords: | JPIPKAK |
Cc: |
Description (last modified by )
On one of my test services, the driver worked for awhile, and then gave the following error message each time GetNextUpdatedRegion is called, and returned PENDING, instead of ERROR.
ERROR 1: EOF reached before completing VBAS ERROR 1: EOF reached before completing VBAS ERROR 1: EOF reached before completing VBAS ERROR 1: EOF reached before completing VBAS ERROR 1: EOF reached before completing VBAS ERROR 1: EOF reached before completing VBAS ERROR 1: EOF reached before completing VBAS
Attachments (1)
Change History (27)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Another case of endless loop:
// if there is no pending data and we don't want to wait. if( nSize == 0 ) <<< nSize always 0 { *pnxbufoff = 0; *pnybufoff = 0; *pnxbufsize = 0; *pnybufsize = 0; return GARIO_PENDING; }
comment:3 by , 14 years ago
Description: | modified (diff) |
---|---|
Status: | new → assigned |
comment:4 by , 14 years ago
Keywords: | JPIPKAK added; JPIP removed |
---|
comment:5 by , 14 years ago
I have attempted to reproduce this problem a number of ways with no success. In the 1.6-esri branch I tried:
gdal_translate jpip://216.150.195.220/JP2Server/Nashville_QB -srcwin 10 10 1 1 -outsize 1000 1000 out.tif
as their was a report that zooming in caused problems.
In trunk I also tested with an alternate versions of gdalasyncread which give up on the reading the async image after the first pass and then initiated a new async reader. But this also worked fine. All testing was on linux.
Can you provide a more specific case to reproduce the error? Perhaps with gdal_translate?
comment:6 by , 14 years ago
It is in fact service specific. The following is a service and the command line to reproduce. If you can't, change the offset and size and try again.
C:\GDAL\gdal>gdal_translate jpip://imagery10.arcgisonline.com:8080/JP2Server/red prvimgwebw02/Adelaide_FebMar2008_15cm_GDA94_MGA54 -srcwin 100 100 1000 1000 -out size 1000 1000 out2.tif Input file size is 253334, 593334 0Warning 1: Kakadu Core Warning: Profile violation detected (code-stream is technically illegal). Profile-0 and Profile-1 code-streams must have sufficient DWT levels to permit extraction of a low resolution image which is no larger than 128x128. Try setting a larger va lue for "Clevels" or else set "Sprofile" to 2.
comment:7 by , 14 years ago
Gao,
I am just getting "couldn't connect to host". Are there restrictions on where imagery10.arcgisonline.com is accessable from?
comment:8 by , 14 years ago
Frank,
We are having a problem with accessing port imagery10.arcgisonline.com:8080 from web(imagery10.arcgisonline.com is accessible). We have been working on opening up the port. It took longer than I thought. But the test jpip service should be ready by Wednesday (6/23).
comment:9 by , 14 years ago
Frank,
The service should be ready now.
jpip://imagery10.arcgisonline.com:8080/JP2Server/redprvimgwebw02/Adelaide_FebMar2008_15cm_GDA94_MGA54
comment:10 by , 14 years ago
When I issue the command:
gdal_translate \ jpip://imagery10.arcgisonline.com:8080/JP2Server/redprvimgwebw02/Adelaide_FebMar2008_15cm_GDA94_MGA54 \ -srcwin 100 100 1000 1000 -outsize 1000 1000 out2.tif
I get an http response document that includes:
Exception report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The server encountered an internal error () that prevented it from fulfilling this request.</u></p><p><b>exception</b> <pre >javax.servlet.ServletException: javax.naming.NamingException: Could not derefer ence object [Root exception is javax.naming.NamingException: Could not dereferen ce object [Root exception is javax.ejb.EJBException: java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/jboss/ejb3/interceptors/aop/LifecycleCallba ckBeanMethodInterceptor]] com.ittvis.jp2.web.JP2Server.service(Unknown Source) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) com.ittvis.jp2.web.RequestControlFilter.doFilter(Unknown Source) org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilte r.java:96) </pre></p><p><b>root cause</b> <pre>javax.naming.NamingException: Could not dere ference object [Root exception is javax.naming.NamingException: Could not derefe rence object [Root exception is javax.ejb.EJBException: java.lang.RuntimeExcepti on: java.lang.NoClassDefFoundError: org/jboss/ejb3/interceptors/aop/LifecycleCal lbackBeanMethodInterceptor]] org.jnp.interfaces.NamingContext.resolveLink(NamingContext.java:1352) org.jnp.interfaces.NamingContext.lookup(NamingContext.java:817) org.jnp.interfaces.NamingContext.lookup(NamingContext.java:833) org.jnp.interfaces.NamingContext.lookup(NamingContext.java:686) javax.naming.InitialContext.lookup(Unknown Source) com.ittvis.jp2.web.JP2Server.service(Unknown Source) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) com.ittvis.jp2.web.RequestControlFilter.doFilter(Unknown Source) org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilte r.java:96)
I shall investigate some other configurations.
comment:11 by , 14 years ago
I have been unable to find any request configuration that gets any further on this service. I will note that the request url was:
I'm at a standstill again.
comment:12 by , 14 years ago
Frank, Could you please try again using the following service? Note that it's JPIP:// instead of HTTP://
jpip://imagery10.arcgisonline.com:8080/JP2Server/Adelaide
comment:13 by , 14 years ago
I have occasionally succeeded in reproducing the hang with the request:
gdalasyncread jpip://imagery10.arcgisonline.com:8080/JP2Server/Adelaide outgdb.tif -srcwin 50000 100000 1000 1000 -outsize 1100 1100
I see the debug message:
JPIPWorkerFunc: zero data returned from server: (null)
Breaking in the code, "nSize" is always zero in GetNextUpdatedRegion() the while loop to actually wait for data is never entered since both bhighThreadRunning and bLowThreadRunning are false. This code results in a PENDING exit:
// if there is no pending data and we don't want to wait. if( nSize == 0 ) { *pnxbufoff = 0; *pnybufoff = 0; *pnxbufsize = 0; *pnybufsize = 0; return GARIO_PENDING; }
One obvious approach is to not return PENDING if neither a high nor low priority thread is left running. I shall experiment with fixes.
comment:14 by , 14 years ago
Milestone: | → 1.8.0 |
---|
I have applied a fix to properly error out when the worker thread terminates without all the data - likely due to a timeout. This is in trunk (r19912) and 1.6-esri (r19913).
This fixes the only hang I've been able to reproduce, but I'm also going to dig into the VBAS issue mentioned above, which might be a distinct issue.
I am not sure why no data is coming back from the server. The results are not consistent, but it seems like it might be a server loading issue of some sort.
comment:15 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I implemented some machinery to make it easy to test short return results from the JPIP server. Using that I was able to stimulate the VBAS reading error. I have corrected that and also a lack of kakadu exception handling in some cases.
The changes are in trunk (r19916) and 1.6-esri (r19917).
I *think* this pretty much addresses this bug report. Let me know if there are outstanding issues.
comment:16 by , 14 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Frank, We found another case where the endless look occurs.
- Display the jpip service, Adelaide, zoom and pan - Let it idle for about 4-5 minutes - Pan again and endless loop.
I looked it. The problem is that the http request returns an error, as below, instead of valid pixel data, and the return always has the same size. This is a problem even if it stops (fails) instead of the endless loop.
Could you look at it again?
CPLHTTPResult *psResult = CPLHTTPFetch(osCurrentRequest, apszOptions); << This returns an error if (psResult->nDataLen == 0) {
<html><head><title>JBoss Web/2.1.2.GA - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 500 - </h1><HR size="1" noshade="noshade"><p><b>type</b> Exception report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The server encountered an internal error () that prevented it from fulfilling this request.</u></p><p><b>exception</b> <pre>javax.ejb.NoSuchEJBException: Could not find stateful bean: ad2ue-63u2pp-gbuwxvxi-1-gbwm4cb7-b4 org.jboss.ejb3.cache.simple.SimpleStatefulCache.get(SimpleStatefulCache.java:479) org.jboss.ejb3.cache.simple.SimpleStatefulCache.get(SimpleStatefulCache.java:430) org.jboss.ejb3.stateful.StatefulInstanceInterceptor.invoke(StatefulInstanceInterceptor.java:59) org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102) org.jboss.aspects.tx.TxPolicy.invokeInOurTx(TxPolicy.java:79) org.jboss.aspects.tx.TxInterceptor$Required.invoke(TxInterceptor.java:190) org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102) org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:76) org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102) org.jboss.ejb3.tx.NullInterceptor.invoke(NullInterceptor.java:42) org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102) org.jboss.ejb3.security.RoleBasedAuthorizationInterceptorv2.invoke(RoleBasedAuthorizationInterceptorv2.java:201) org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102) org.jboss.ejb3.security.Ejb3AuthenticationInterceptorv2.invoke(Ejb3AuthenticationInterceptorv2.java:186) org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102) org.jboss.ejb3.ENCPropagationInterceptor.invoke(ENCPropagationInterceptor.java:41) org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102) org.jboss.ejb3.asynchronous.AsynchronousInterceptor.invoke(AsynchronousInterceptor.java:106) org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102) org.jboss.ejb3.BlockContainerShutdownInterceptor.invoke(BlockContainerShutdownInterceptor.java:67) org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102) org.jboss.aspects.currentinvocation.CurrentInvocationInterceptor.invoke(CurrentInvocationInterceptor.java:67) org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102) org.jboss.ejb3.session.SessionSpecContainer.invoke(SessionSpecContainer.java:219) org.jboss.ejb3.proxy.handler.ProxyInvocationHandlerBase.invoke(ProxyInvocationHandlerBase.java:261) org.jboss.ejb3.proxy.handler.session.SessionSpecProxyInvocationHandlerBase.invoke(SessionSpecProxyInvocationHandlerBase.java:101) $Proxy198.request(Unknown Source) com.ittvis.jp2.web.JP2Server.service(Unknown Source) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) com.ittvis.jp2.web.RequestControlFilter.doFilter(Unknown Source) org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96) </pre></p><p><b>note</b> <u>The full stack trace of the root cause is available in the JBoss Web/2.1.2.GA logs.</u></p><HR size="1" noshade="noshade"><h3>JBoss Web/2.1.2.GA</h3></body></html>
comment:17 by , 14 years ago
Gao,
I have tried:
gdalinfo jpip://imagery10.arcgisonline.com:8080/JP2Server/Adelaide
but I get "couldn't connect to host". Is this service still publically available?
comment:18 by , 14 years ago
I have reproduced the problem, and incorporated a number of changes to propogate and error out appropriately on failures in trunk (r20136). However, it seems these changes may be over aggressive, and I am getting normal access failing now. Continuing to investigate...
comment:19 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
I tracked out another problem to bHighThreadFinished sometimes being 1 when the worker thread is launched leading to a race condition. Fixed in trunk (r20137). Both r20137 and r20136 backported to the 1.6-esri branch (r20138).
I *think* things are more stable now. I've done quite a few runs against the Adelaide server including many under valgrind.
comment:20 by , 14 years ago
The endless loop problem is fixed. However, I still see the same error message coming back from the server. Once this happens, the current jpip dataset becomes invalid, and it's no longer usable (the display no longer sharpens).
Looking at the error message, and given that it happens only after the connection is idle for awhile, I am wondering if the connection is timed out, and the client needs to reconnect in order to continue.
Otherwise, this must be data specific to this test jp2 image, or there is something wrong with ITT's JPIP server implmentation. This can't be an as designed behavor.
comment:21 by , 14 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Via Garrett, Norman has suggested using the JPIP "cclose" request in the ::Stop() method. I have review the situation with the error after a lengthy pause in access, and there is no sign that Stop() is called. I have also been unable to find detailed information on how to issue this request.
I have implement a "reinitialization" mechanism in the driver such that a new channel is initialized when an error occurs in JPIPWorkerFunc() fetching data. With this I have been successful in completing a new raster request after one has failed. Note that the current mechanism uses the existing HTTP connection, but does recreate all the JPIP level information. The changes are in trunk (r20318). I'm testing in the gdal-esri branch now.
The following test script was used. About one time in two the request after the pause fails with the aformentioned html error. But with the reinit support the following (Again) request succeeds.
import gdal import time import sys ds = gdal.Open( 'jpip://imagery10.arcgisonline.com:8080/JP2Server/Adelaide' ) ds.GetRasterBand(1).ReadRaster( 0, 0, 500, 500 ) total = 8 for x in range(total): sys.stderr.write('Wait %d more minutes...\n' % (total-x)) time.sleep( 60 ) sys.stderr.write( 'Continue\n' ) ds.GetRasterBand(1).ReadRaster( 0, 500, 500, 500 ) sys.stderr.write( 'Again\n' ) ds.GetRasterBand(1).ReadRaster( 0, 500, 500, 500 ) sys.stderr.write( 'Done\n' )
comment:23 by , 14 years ago
I noticed once timed out, it reinitializes on every read request. It seems that the "Need Reinitialize" flag is not turned off, see attached.
by , 14 years ago
Attachment: | jpipkakdataset.cpp added |
---|
Clear bNeedReinitialize flag after initialized
comment:24 by , 9 years ago
If I understand right, all stuff belonging to this ticket is implemented except perhaps what gaopeng is talking about turning off the "Need Reinitialize" flag.
comment:25 by , 9 years ago
Milestone: | 1.8.1 → 2.0 |
---|---|
Resolution: | → fixed |
Status: | reopened → closed |
There is one mroe case of endless loop: