Opened 10 years ago

Closed 6 years ago

Last modified 5 years ago

#3626 closed defect (fixed)

JPIP driver in endless loop

Reported by: gaopeng Owned by: warmerdam
Priority: normal Milestone: 2.0.0
Component: GDAL_Raster Version: 1.6.2
Severity: normal Keywords: JPIPKAK
Cc:

Description (last modified by warmerdam)

On one of my test services, the driver worked for awhile, and then gave the following error message each time GetNextUpdatedRegion? is called, and returned PENDING, instead of ERROR.

ERROR 1: EOF reached before completing VBAS
ERROR 1: EOF reached before completing VBAS
ERROR 1: EOF reached before completing VBAS
ERROR 1: EOF reached before completing VBAS
ERROR 1: EOF reached before completing VBAS
ERROR 1: EOF reached before completing VBAS
ERROR 1: EOF reached before completing VBAS

Attachments (1)

jpipkakdataset.cpp (70.8 KB) - added by gaopeng 10 years ago.
Clear bNeedReinitialize flag after initialized

Download all attachments as: .zip

Change History (27)

comment:1 Changed 10 years ago by gaopeng

There is one mroe case of endless loop:

void JPIPKAKAsyncReader::Stop()
{
    JPIPKAKDataset *poJDS = (JPIPKAKDataset*)poDS;

    bComplete = 1;
    if (poJDS->pGlobalMutex)
    {
        if (((bHighPriority) && (!poJDS->bHighThreadFinished)) || 
            ((!bHighPriority) && (!poJDS->bLowThreadFinished)))
        {
            // stop the thread
            if (bHighPriority)
            {
                CPLAcquireMutex(poJDS->pGlobalMutex, 100.0);
                poJDS->bHighThreadRunning = 0;
                CPLReleaseMutex(poJDS->pGlobalMutex);
		
                while (!poJDS->bHighThreadFinished)
                    CPLSleep(0.1);  // <<< poJDS->bHighThreadFinished is never true
           ...

comment:2 Changed 10 years ago by gaopeng

Another case of endless loop:

    // if there is no pending data and we don't want to wait.
    if( nSize == 0 )   <<< nSize always 0
    {
        *pnxbufoff = 0;
        *pnybufoff = 0;
        *pnxbufsize = 0;
        *pnybufsize = 0;		
        return GARIO_PENDING;
    }

comment:3 Changed 10 years ago by warmerdam

Description: modified (diff)
Status: newassigned

comment:4 Changed 10 years ago by warmerdam

Keywords: JPIPKAK added; JPIP removed

comment:5 Changed 10 years ago by warmerdam

I have attempted to reproduce this problem a number of ways with no success. In the 1.6-esri branch I tried:

gdal_translate jpip://216.150.195.220/JP2Server/Nashville_QB -srcwin 10 10 1 1 -outsize 1000 1000 out.tif

as their was a report that zooming in caused problems.

In trunk I also tested with an alternate versions of gdalasyncread which give up on the reading the async image after the first pass and then initiated a new async reader. But this also worked fine. All testing was on linux.

Can you provide a more specific case to reproduce the error? Perhaps with gdal_translate?

comment:6 Changed 10 years ago by gaopeng

It is in fact service specific. The following is a service and the command line to reproduce. If you can't, change the offset and size and try again.

C:\GDAL\gdal>gdal_translate jpip://imagery10.arcgisonline.com:8080/JP2Server/red
prvimgwebw02/Adelaide_FebMar2008_15cm_GDA94_MGA54 -srcwin 100 100 1000 1000 -out
size 1000 1000 out2.tif
Input file size is 253334, 593334
0Warning 1: Kakadu Core Warning:
Profile violation detected (code-stream is technically illegal).  Profile-0 and
Profile-1 code-streams must have sufficient DWT levels to permit extraction of a
 low resolution image which is no larger than 128x128.   Try setting a larger va
lue for "Clevels" or else set "Sprofile" to 2.

comment:7 Changed 10 years ago by warmerdam

Gao,

I am just getting "couldn't connect to host". Are there restrictions on where imagery10.arcgisonline.com is accessable from?

comment:8 Changed 10 years ago by gaopeng

Frank,

We are having a problem with accessing port imagery10.arcgisonline.com:8080 from web(imagery10.arcgisonline.com is accessible). We have been working on opening up the port. It took longer than I thought. But the test jpip service should be ready by Wednesday (6/23).

comment:9 Changed 10 years ago by gaopeng

Frank,

The service should be ready now.

jpip://imagery10.arcgisonline.com:8080/JP2Server/redprvimgwebw02/Adelaide_FebMar2008_15cm_GDA94_MGA54

comment:10 Changed 10 years ago by warmerdam

When I issue the command:

gdal_translate \
  jpip://imagery10.arcgisonline.com:8080/JP2Server/redprvimgwebw02/Adelaide_FebMar2008_15cm_GDA94_MGA54 \
  -srcwin 100 100 1000 1000 -outsize 1000 1000 out2.tif

I get an http response document that includes:

 Exception report</p><p><b>message</b>
 <u></u></p><p><b>description</b> <u>The server encountered an internal error ()
 that prevented it from fulfilling this request.</u></p><p><b>exception</b> <pre
>javax.servlet.ServletException: javax.naming.NamingException: Could not derefer
ence object [Root exception is javax.naming.NamingException: Could not dereferen
ce object [Root exception is javax.ejb.EJBException: java.lang.RuntimeException:
 java.lang.NoClassDefFoundError: org/jboss/ejb3/interceptors/aop/LifecycleCallba
ckBeanMethodInterceptor]]
        com.ittvis.jp2.web.JP2Server.service(Unknown Source)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        com.ittvis.jp2.web.RequestControlFilter.doFilter(Unknown Source)
        org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilte
r.java:96)
</pre></p><p><b>root cause</b> <pre>javax.naming.NamingException: Could not dere
ference object [Root exception is javax.naming.NamingException: Could not derefe
rence object [Root exception is javax.ejb.EJBException: java.lang.RuntimeExcepti
on: java.lang.NoClassDefFoundError: org/jboss/ejb3/interceptors/aop/LifecycleCal
lbackBeanMethodInterceptor]]
        org.jnp.interfaces.NamingContext.resolveLink(NamingContext.java:1352)
        org.jnp.interfaces.NamingContext.lookup(NamingContext.java:817)
        org.jnp.interfaces.NamingContext.lookup(NamingContext.java:833)
        org.jnp.interfaces.NamingContext.lookup(NamingContext.java:686)
        javax.naming.InitialContext.lookup(Unknown Source)
        com.ittvis.jp2.web.JP2Server.service(Unknown Source)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        com.ittvis.jp2.web.RequestControlFilter.doFilter(Unknown Source)
        org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilte
r.java:96)

I shall investigate some other configurations.

comment:11 Changed 10 years ago by warmerdam

I have been unable to find any request configuration that gets any further on this service. I will note that the request url was:

http://imagery10.arcgisonline.com:8080/JP2Server/redprvimgwebw02/Adelaide_FebMar2008_15cm_GDA94_MGA54?cnew=http&type=jpp-stream&stream=0&tid=0&len=2000

I'm at a standstill again.

comment:12 Changed 10 years ago by gaopeng

Frank, Could you please try again using the following service? Note that it's JPIP:// instead of HTTP://

jpip://imagery10.arcgisonline.com:8080/JP2Server/Adelaide

comment:13 Changed 10 years ago by warmerdam

I have occasionally succeeded in reproducing the hang with the request:

gdalasyncread jpip://imagery10.arcgisonline.com:8080/JP2Server/Adelaide outgdb.tif -srcwin 50000 100000 1000 1000 -outsize 1100 1100

I see the debug message:

JPIPWorkerFunc: zero data returned from server: (null)

Breaking in the code, "nSize" is always zero in GetNextUpdatedRegion?() the while loop to actually wait for data is never entered since both bhighThreadRunning and bLowThreadRunning are false. This code results in a PENDING exit:

    // if there is no pending data and we don't want to wait.
    if( nSize == 0 )
    {
        *pnxbufoff = 0;
        *pnybufoff = 0;
        *pnxbufsize = 0;
        *pnybufsize = 0;		
        return GARIO_PENDING;
    }

One obvious approach is to not return PENDING if neither a high nor low priority thread is left running. I shall experiment with fixes.

comment:14 Changed 10 years ago by warmerdam

Milestone: 1.8.0

I have applied a fix to properly error out when the worker thread terminates without all the data - likely due to a timeout. This is in trunk (r19912) and 1.6-esri (r19913).

This fixes the only hang I've been able to reproduce, but I'm also going to dig into the VBAS issue mentioned above, which might be a distinct issue.

I am not sure why no data is coming back from the server. The results are not consistent, but it seems like it might be a server loading issue of some sort.

comment:15 Changed 10 years ago by warmerdam

Resolution: fixed
Status: assignedclosed

I implemented some machinery to make it easy to test short return results from the JPIP server. Using that I was able to stimulate the VBAS reading error. I have corrected that and also a lack of kakadu exception handling in some cases.

The changes are in trunk (r19916) and 1.6-esri (r19917).

I *think* this pretty much addresses this bug report. Let me know if there are outstanding issues.

comment:16 Changed 10 years ago by gaopeng

Resolution: fixed
Status: closedreopened

Frank, We found another case where the endless look occurs.

- Display the jpip service, Adelaide, zoom and pan
- Let it idle for about 4-5 minutes
- Pan again and endless loop.

I looked it. The problem is that the http request returns an error, as below, instead of valid pixel data, and the return always has the same size. This is a problem even if it stops (fails) instead of the endless loop.

Could you look at it again?

        CPLHTTPResult *psResult = CPLHTTPFetch(osCurrentRequest, apszOptions); << This returns an error
        if (psResult->nDataLen == 0)
        {
<html><head><title>JBoss Web/2.1.2.GA - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 500 - </h1><HR size="1" noshade="noshade"><p><b>type</b> Exception report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The server encountered an internal error () that prevented it from fulfilling this request.</u></p><p><b>exception</b> <pre>javax.ejb.NoSuchEJBException: Could not find stateful bean: ad2ue-63u2pp-gbuwxvxi-1-gbwm4cb7-b4
	org.jboss.ejb3.cache.simple.SimpleStatefulCache.get(SimpleStatefulCache.java:479)
	org.jboss.ejb3.cache.simple.SimpleStatefulCache.get(SimpleStatefulCache.java:430)
	org.jboss.ejb3.stateful.StatefulInstanceInterceptor.invoke(StatefulInstanceInterceptor.java:59)
	org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
	org.jboss.aspects.tx.TxPolicy.invokeInOurTx(TxPolicy.java:79)
	org.jboss.aspects.tx.TxInterceptor$Required.invoke(TxInterceptor.java:190)
	org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
	org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:76)
	org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
	org.jboss.ejb3.tx.NullInterceptor.invoke(NullInterceptor.java:42)
	org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
	org.jboss.ejb3.security.RoleBasedAuthorizationInterceptorv2.invoke(RoleBasedAuthorizationInterceptorv2.java:201)
	org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
	org.jboss.ejb3.security.Ejb3AuthenticationInterceptorv2.invoke(Ejb3AuthenticationInterceptorv2.java:186)
	org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
	org.jboss.ejb3.ENCPropagationInterceptor.invoke(ENCPropagationInterceptor.java:41)
	org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
	org.jboss.ejb3.asynchronous.AsynchronousInterceptor.invoke(AsynchronousInterceptor.java:106)
	org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
	org.jboss.ejb3.BlockContainerShutdownInterceptor.invoke(BlockContainerShutdownInterceptor.java:67)
	org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
	org.jboss.aspects.currentinvocation.CurrentInvocationInterceptor.invoke(CurrentInvocationInterceptor.java:67)
	org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
	org.jboss.ejb3.session.SessionSpecContainer.invoke(SessionSpecContainer.java:219)
	org.jboss.ejb3.proxy.handler.ProxyInvocationHandlerBase.invoke(ProxyInvocationHandlerBase.java:261)
	org.jboss.ejb3.proxy.handler.session.SessionSpecProxyInvocationHandlerBase.invoke(SessionSpecProxyInvocationHandlerBase.java:101)
	$Proxy198.request(Unknown Source)
	com.ittvis.jp2.web.JP2Server.service(Unknown Source)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
	com.ittvis.jp2.web.RequestControlFilter.doFilter(Unknown Source)
	org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
</pre></p><p><b>note</b> <u>The full stack trace of the root cause is available in the JBoss Web/2.1.2.GA logs.</u></p><HR size="1" noshade="noshade"><h3>JBoss Web/2.1.2.GA</h3></body></html>

comment:17 Changed 10 years ago by warmerdam

Gao,

I have tried:

 gdalinfo jpip://imagery10.arcgisonline.com:8080/JP2Server/Adelaide

but I get "couldn't connect to host". Is this service still publically available?

comment:18 Changed 10 years ago by warmerdam

I have reproduced the problem, and incorporated a number of changes to propogate and error out appropriately on failures in trunk (r20136). However, it seems these changes may be over aggressive, and I am getting normal access failing now. Continuing to investigate...

comment:19 Changed 10 years ago by warmerdam

Resolution: fixed
Status: reopenedclosed

I tracked out another problem to bHighThreadFinished sometimes being 1 when the worker thread is launched leading to a race condition. Fixed in trunk (r20137). Both r20137 and r20136 backported to the 1.6-esri branch (r20138).

I *think* things are more stable now. I've done quite a few runs against the Adelaide server including many under valgrind.

comment:20 Changed 10 years ago by gaopeng

The endless loop problem is fixed. However, I still see the same error message coming back from the server. Once this happens, the current jpip dataset becomes invalid, and it's no longer usable (the display no longer sharpens).

Looking at the error message, and given that it happens only after the connection is idle for awhile, I am wondering if the connection is timed out, and the client needs to reconnect in order to continue.

Otherwise, this must be data specific to this test jp2 image, or there is something wrong with ITT's JPIP server implmentation. This can't be an as designed behavor.

comment:21 Changed 10 years ago by warmerdam

Resolution: fixed
Status: closedreopened

Via Garrett, Norman has suggested using the JPIP "cclose" request in the ::Stop() method. I have review the situation with the error after a lengthy pause in access, and there is no sign that Stop() is called. I have also been unable to find detailed information on how to issue this request.

I have implement a "reinitialization" mechanism in the driver such that a new channel is initialized when an error occurs in JPIPWorkerFunc() fetching data. With this I have been successful in completing a new raster request after one has failed. Note that the current mechanism uses the existing HTTP connection, but does recreate all the JPIP level information. The changes are in trunk (r20318). I'm testing in the gdal-esri branch now.

The following test script was used. About one time in two the request after the pause fails with the aformentioned html error. But with the reinit support the following (Again) request succeeds.

import gdal
import time
import sys

ds = gdal.Open( 'jpip://imagery10.arcgisonline.com:8080/JP2Server/Adelaide' )

ds.GetRasterBand(1).ReadRaster( 0, 0, 500, 500 )

total = 8
for x in range(total):
    sys.stderr.write('Wait %d more minutes...\n' % (total-x))
    time.sleep( 60 )

sys.stderr.write( 'Continue\n' )

ds.GetRasterBand(1).ReadRaster( 0, 500, 500, 500 )

sys.stderr.write( 'Again\n' )

ds.GetRasterBand(1).ReadRaster( 0, 500, 500, 500 )

sys.stderr.write( 'Done\n' )

comment:22 Changed 10 years ago by warmerdam

Tested and migrated to 1.6-esri branch (r20319).

comment:23 Changed 10 years ago by gaopeng

I noticed once timed out, it reinitializes on every read request. It seems that the "Need Reinitialize" flag is not turned off, see attached.

Changed 10 years ago by gaopeng

Attachment: jpipkakdataset.cpp added

Clear bNeedReinitialize flag after initialized

comment:24 Changed 6 years ago by Jukka Rahkonen

If I understand right, all stuff belonging to this ticket is implemented except perhaps what gaopeng is talking about turning off the "Need Reinitialize" flag.

comment:25 Changed 6 years ago by Even Rouault

Milestone: 1.8.12.0
Resolution: fixed
Status: reopenedclosed

Only compiled tested (couldn't find any public JPIP server) : trunk r28235 "JPIPKAK: reset the bNeedReinitialize flag after a timeout (based on modified file by gaopeng, #3626)"

comment:26 Changed 5 years ago by Even Rouault

Milestone: 2.02.0.0

Milestone renamed

Note: See TracTickets for help on using tickets.