Opened 15 years ago

Closed 14 years ago

Last modified 14 years ago

#1131 closed defect (fixed)

Load balancing doesn't support failover

Reported by: brucedechant
Owned by: brucedechant
Priority: low
Milestone: 2.2
Component: General
Version: 2.1.0
Severity: minor
Keywords:
Cc: Christine, Bao
External ID:

Description

The load balancing algorithm used in the web tier doesn't remove servers that are not responding.

Example: In a 2-server site this shows up as an alternating sequence of success/fail/success/fail/... operations, because the load balancing algorithm round-robins between all of the servers in the list, even "bad" ones.

If a "bad" server is encountered, it needs to be removed from the load balancing list of servers, checked periodically, and added back in if it becomes "good" again.
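
The requested behavior can be sketched roughly as follows. This is only an illustration under assumptions, not the code that was committed for this ticket: the class FailoverRoundRobin and its methods Next, MarkBad and RecheckBad are hypothetical names, and the probe callback stands in for whatever connectivity check the web tier would really use.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch; these names are not part of the MapGuide API.
class FailoverRoundRobin {
public:
    explicit FailoverRoundRobin(const std::vector<std::string>& servers) {
        for (const std::string& address : servers) {
            entries_.push_back(Entry{address, true});
        }
    }

    // Returns the next "good" server in round-robin order,
    // or nullptr when every server is currently marked "bad".
    const std::string* Next() {
        for (std::size_t tried = 0; tried < entries_.size(); ++tried) {
            const Entry& entry = entries_[cursor_];
            cursor_ = (cursor_ + 1) % entries_.size();
            if (entry.good) {
                return &entry.address;
            }
        }
        return nullptr;
    }

    // Called when a request to this server fails.
    void MarkBad(const std::string& address) {
        for (Entry& entry : entries_) {
            if (entry.address == address) {
                entry.good = false;
            }
        }
    }

    // Periodic check: probe each "bad" server and restore those that respond.
    template <typename Probe>
    void RecheckBad(Probe probe) {
        for (Entry& entry : entries_) {
            if (!entry.good && probe(entry.address)) {
                entry.good = true;
            }
        }
    }

private:
    struct Entry {
        std::string address;
        bool good;
    };

    std::vector<Entry> entries_;
    std::size_t cursor_ = 0;
};
```

With two servers and one marked "bad", Next() keeps returning the remaining server instead of alternating between success and failure; once RecheckBad finds the other server responding again, it rejoins the rotation.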

Attachments (2)

GetSiteInfo.patch (347 bytes) - added by christinebao 14 years ago.
CommentKmlServiceTest.patch (660 bytes) - added by christinebao 14 years ago.


Change History (19)

comment:1 by brucedechant, 15 years ago

Status: new → assigned

comment:2 by brucedechant, 15 years ago

Fixed in sandbox/adsk/2.1 r4323

comment:3 by brucedechant, 14 years ago

Fixed in sandbox/adsk/2.1 r4329

comment:4 by brucedechant, 14 years ago

Resolution: fixed
Status: assigned → closed

Fixed in trunk r4403

comment:5 by brucedechant, 14 years ago

Added NULL pointer check. trunk r4440

comment:6 by brucedechant, 14 years ago

Added NULL pointer check. sandbox/adsk/2.1 r4441

comment:7 by christinebao, 14 years ago

Resolution: fixed
Status: closed → reopened

After the NULL pointer check was added, the server unit tests no longer crash on a NULL pointer dereference, but 4 test cases still fail.

This is because of the following code:

MgConnectionProperties* MgSiteManager::GetConnectionProperties(
    MgUserInformation* userInfo, MgSiteInfo::MgPortType portType, bool useSessionIp)
{
    …

        if (length > MgSiteInfo::HexStringLength)
        {
            STRING siteHexString = sessionId.substr(
                length - MgSiteInfo::HexStringLength, MgSiteInfo::HexStringLength);
            Ptr<MgSiteInfo> siteInfo = GetSiteInfo(siteHexString);

            if ((NULL != siteInfo.p) && (MgSiteInfo::Ok == siteInfo->GetStatus()))
            {
                connProps = GetConnectionProperties(userInfo, siteInfo, portType);
            }
            else
            {
                // This site is not currently working

                // We have a session, but it will not exist on any other machine so we force the session exception
                throw new MgSessionExpiredException(L"MgSiteManager.GetConnectionProperties",__LINE__,__WFILE__, NULL, L"", NULL);
            }
        }



siteInfo from GetSiteInfo(siteHexString) is NULL, so an exception is thrown saying that the session has expired.

Compare this with the previous code:

            Ptr<MgSiteInfo> siteInfo = new MgSiteInfo(siteHexString);

            if (MgSiteInfo::Ok == siteInfo->GetStatus())
            {
                connProps = GetConnectionProperties(userInfo, siteInfo, portType);
            }


Personally, I think that if matchingSiteInfo is not found in the GetSiteInfo(…) function, we can create a new instance of it. The code could be:

MgSiteInfo* MgSiteManager::GetSiteInfo(CREFSTRING hexString)
{
    MgSiteInfo* matchingSiteInfo = NULL;

    …

    if (matchingSiteInfo == NULL)
        matchingSiteInfo = new MgSiteInfo(hexString);

    return SAFE_ADDREF(matchingSiteInfo);
}


I tested this with the unit tests, and all test cases pass.
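
As a standalone illustration of the "look up, otherwise create" idea, here is a minimal sketch. It deliberately uses std::shared_ptr in place of MapGuide's Ptr<> and SAFE_ADDREF, whose ownership rules are not reproduced here, so it does not mirror the attached patch. SiteInfo, SiteRegistry, Add and FindOrCreate are hypothetical names, and the configured flag is an assumption added only to tell the two cases apart.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a site description; not the real MgSiteInfo.
struct SiteInfo {
    std::string hexId;
    bool configured;  // true if the site came from the configured list
};

// Hypothetical stand-in for the part of a site manager that owns the list.
class SiteRegistry {
public:
    void Add(const std::string& hexId) {
        sites_.push_back(std::make_shared<SiteInfo>(SiteInfo{hexId, true}));
    }

    // Returns the matching configured site, or a fresh unlisted instance
    // built from the hex string, so callers never receive a null pointer.
    std::shared_ptr<SiteInfo> FindOrCreate(const std::string& hexId) const {
        for (const std::shared_ptr<SiteInfo>& site : sites_) {
            if (site->hexId == hexId) {
                return site;  // ownership shared with the registry
            }
        }
        return std::make_shared<SiteInfo>(SiteInfo{hexId, false});  // caller is sole owner
    }

private:
    std::vector<std::shared_ptr<SiteInfo>> sites_;
};
```

The point of the sketch is that the two return paths hand out different ownership: a found entry is shared with the list, while a newly created one belongs only to the caller. With manual reference counting, those two paths need to be balanced separately.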

by christinebao, 14 years ago

Attachment: GetSiteInfo.patch added

comment:8 by christinebao, 14 years ago

Attached patch http://trac.osgeo.org/mapguide/attachment/ticket/1131/GetSiteInfo.patch to fix the server unit tests.

Hi Bruce, would you please check this patch when you come back:

  1. Is this the right fix, and does it avoid breaking your submission?
  2. There is another overload, MgSiteInfo* MgSiteManager::GetSiteInfo(CREFSTRING target, INT32 port). Shall we handle it in the same way?

I am submitting the patch so that the server unit tests do not fail in the next build. I would appreciate it if you could review it and correct me if anything is wrong.

Thanks & regards,
Christine

comment:9 by christinebao, 14 years ago

Cc: Christine Bao added

comment:10 by waltweltonlair, 14 years ago

Fixed a ref-count bug in Christine's submission.

by christinebao, 14 years ago

Attachment: CommentKmlServiceTest.patch added

comment:11 by christinebao, 14 years ago

These three KmlService test cases throw MgConnectionFailedException and cause the site (typically 127.0.0.1) to be marked UnableToConnect. GetNextSite() then fails.
As these three test cases do not actually test anything, I have commented them out temporarily so that the server unit tests pass. They will be reconsidered when Bruce reviews this defect.

comment:12 by brucedechant, 14 years ago

Fixed failed KML unit tests. See submission r4488.

comment:13 by brucedechant, 14 years ago

Resolution: fixed
Status: reopened → closed

comment:14 by brucedechant, 14 years ago

Applied updates to sandbox/adsk/2.1 r4489.

comment:15 by brucedechant, 14 years ago

  • Updated the background check server(s) thread to shut down faster
  • Disabled starting the background check server(s) thread when running unit tests

sandbox/adsk/2.1 r4491

sandbox/adsk/2.2gp r4492

trunk r4493
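
One common way to get both properties listed in this comment (fast shutdown, and no thread at all under unit tests) is to have the worker wait on a condition variable instead of sleeping. The following is only a sketch under assumptions and not the submitted change: BackgroundChecker, its enabled flag and the check callback are hypothetical names.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch; not the MapGuide implementation.
class BackgroundChecker {
public:
    // enabled = false skips starting the thread, e.g. when running unit tests.
    BackgroundChecker(std::function<void()> check,
                      std::chrono::milliseconds interval,
                      bool enabled)
        : check_(std::move(check)), interval_(interval) {
        if (enabled) {
            worker_ = std::thread([this] { Run(); });
        }
    }

    ~BackgroundChecker() { Stop(); }

    // Wakes the worker immediately instead of waiting out the interval.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    bool Running() const { return worker_.joinable(); }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopping_) {
            // Returns true as soon as Stop() sets stopping_, false on timeout.
            if (wake_.wait_for(lock, interval_, [this] { return stopping_; })) {
                break;
            }
            lock.unlock();
            check_();  // probe the servers without holding the lock
            lock.lock();
        }
    }

    std::function<void()> check_;
    std::chrono::milliseconds interval_;
    std::mutex mutex_;
    std::condition_variable wake_;
    bool stopping_ = false;
    std::thread worker_;
};
```

Because Stop() signals the condition variable, shutdown does not have to wait for the check interval to elapse, however long that interval is.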

comment:16 by brucedechant, 14 years ago

Fixed failed KML unit tests.

See submission trunk r4495.

See submission sandbox/adsk/2.1 r4497.

comment:17 by brucedechant, 14 years ago

  • Turn on the failover retry background thread by default and disable it when running unit tests
  • Adjust the failover retry time for the web tier to correspond to a single MapGuide server instead of a multiple MapGuide server configuration

See submission sandbox/adsk/2.2gp r4574

See submission trunk r4575
