[wiki:ProjectSteeringCommittee Project Steering Committee - Home]

== Meeting Info ==

This meeting of the !MapGuide PSC will take place Thursday, April 2, 2009 at 18:00 UTC (2:00 PM ET / noon MT / 11:00 AM PT).

Meeting Chair: Bob Bray

Universal Time: http://www.timeanddate.com/worldclock/fixedtime.html?month=4&day=2&year=2009&hour=12&min=0&sec=0&p1=55

Location: The meeting will be held on IRC at [irc://irc.freenode.net/mapguide #mapguide]

== Agenda ==

 * Appoint a Meeting Secretary
 * Updates on Actions from the [wiki:PscMeeting03-05-2009 Last Meeting]
 * Status of the !PayPal initiative for builds (added by Tom)
 * 2.1 branch (including setting externals to CS-Map 12.01 branch) (first part of this added by Tom)
 * If Paul is here, timeline for next Fusion 2.0 beta?
 * Way forward on Raster for 2.1
 * Others?

== Minutes ==

PSC Members present: Tom, Bob, Bruce, Jason, Haris, Kenneth.

=== Review last meeting's action items ===

 * All open items are already on today's agenda

=== Status for the !PayPal initiative ===

 * There are legal issues involved, since Trevor is an Autodesk employee.
 * An outcome is expected before the next meeting.

=== 2.1 and CS-MAP ===

 * Trunk should point to CS-MAP 12.01
 * Branches are not created until a specific major code change is expected
 * A special case is the MG Connection Manager changes that Haris proposes, which will be developed in a sandbox branch.
 * Tom will assist Haris in setting up the branch, and Bruce agreed to work with Haris on the changes.

=== Fusion 2.0 beta timeline ===

 * Likely to be ready Monday

=== Raster issues for 2.1 ===

 * Debate on what locks are required
 * Suggestion:
   * Add the two fixes produced by Haris (already committed)
   * Add the stylization lock produced by Haris
   * Votes: Tom +1, Bob +1, Bruce +1, Paul +1, Kenneth +1
   * Postpone the refcount fix produced by Traian, and initially suggested by Haris, due to possible issues with the lock

=== End of meeting ===

== Full transcript ==

{{{
[INFO] Channel view for “#mapguide” opened. Hey everyone, are we ready to start? I'm ready Same ready. Who want's to take minutes today - any volunteers? I would like to give it a try today I will volunteer I'll step down then :) you sure? :) Ok, Kenneth it is I checked last meetings minutes and there really are no actions with updates that are not in this weeks agenda already - so let's start there hi So first item: Status of the PayPal initiative for builds Tom this is yours We are currently trying to figure out whether Trevor is allowed to do this as an employee of Autodesk I was hoping to have a better update today, but the meeting with the lawyers did not go through yesterday as planned. So we will have to wait some more That's it. Any questions? 
Apparently the meeting with legal is monday - I asked Trevor just before the PSC meeting So we should know more next week I appologize for the delay on this, everything legal here takes a while Anyway, next item: 2.1 branch (including setting externals to CS-Map 12.01 branch) oops; sorry I'm wondering whether anyone is holding onto a big submission for branch, but they can't because we haven't created the branch yet I mean for trunk ...but maybe we are at the stage where it would be good to create the branch anyway My main thing about getting a branch Is that then we could hard-code specific tags for externals Like for CS-Map Instead of following their trunk We've always been holding off on creating branches because it saves developers some time if they don't have to merge. I'm good with pointing to CSMap 12.01 even from Trunk OK, I'd be OK with that too. We don't really have to branch until we start getting people wanting to make massive chances Or release, whatever comes first :) chances -> changes OK, just say when you want the branch and I'll make it. I'll also make that change to point to CSMap 12.01 today. thanks Tom ok next item If I would sumbit changes for MG connection manager, that would require branch to submit too ? Oops - hold on sorry, Bob slow here Thats ok If you were going to change connection manager to not use refcounting I think that someone wanted that do be done in a post-2.1 Bruce maybe? that was me Jason :) The connection manager code is a vital part of MapGuide and changing it this close to release could be dangerous agrre, only I would like to understand if I would like to change it How big is the change? not so big in coding but may effect all parts of MG as Bruce said I think we should assess the risk - maybe via code review and then decide agree I guess we're still counting on the ADSK beta to give us some stability, and not relying on our own Beta process very much. That will have to change eventually. 
We're not tehnically all that close to release is all I mean :) agree, to do review I would ask fro branch and do changes there ? That is why I think we should look at it HarisK - can you make a patch? HarisK: Someone could create a sandbox for you to work in. There is a sandbox area, HarisK your change seems more appropriate for that ok It seems that you are doing a proof of concept type exercise; please correct me if I'm wrong I hope not :) connection manager is not working properly in MG If more than a couple files have to change, then a sandbox is better than patch anyway (I think) And a patch can easily be generated via SVN once the changes are made and tested in sandbox. and not a proof concept but try to make it work If the changes aren't too risky, it would be nice. Couldn't you do that all locally? Why do you need a branch for this? He could, but then would have to upload patch for review. Also, local changes don't get backed up :) He needs a branch for review purposes Reviews can be done via patches So if he's doing a lot of work, would be better to do it in a branch. The sandbox can easily be deleted post-work yes, if it's a lot of work a branch would be good That's a good reason I think sandbox is right place to do this BTW, HarisK, thanks for doing this. Do you need any help creating the branch? yes, i would need help with svn in any case, thanks OK, I'll create you a sandbox/haris/2.1 branch (feel free to give me a different name) ok, I was also waiting when 2.1 will roll out Anything else on this? and chekc with others when would be appropriate time to start this bigger change I am done :) Is there any reason not to start the change now? I was kind of hopping that sombody who was working on current connection manager will come to email list and we discuss it I think that person would be Bruce that is me :) I've started to look at the code How about you commit what you have to sandbox so people can see, And then discuss? 
:) it is not only code which i wrote but will need also little bit of changes I'm well aware of the issue and would love to get rid of the reference count tracking which would influence perhaps fewer places of MG code ok, great I would like that we exchange ideas how to solve it, before writting full imp[lementtation I would happy to work with you on this in the sandbox :) sounds good sounds perfect to me we can finish this issue chat offline so we can contiune IMO - it would be great to get that in for 2.1 So let's see how it shapes up we can try :) Next item was: timeline for next Fusion 2.0 beta? But no Paul today next beta? Anyone else know the status? -->| pagameba has joined #mapguide :) lol pagameba: ? not bad timing your ears ringing? hola! sorry, was in a meeting and lost track of time :D burning more like agile development So, Fusion 2.0 beta next? when? zjames was eavesdropping for me um hang on ... lemme ask madair what his plans are pagameba: he said on list he wanted to fix http://trac.osgeo.org/fusion/ticket/238 first looking oh right - we talked about it yes, we want to fix that first but we could do another beta right after I won't be able to work on it until tomorrow so earliest would be tomorrow or perhaps monday ok - thanks for joining Paul and for the update pagameba: I don't want to push. I could just point at trunk when building test installers Modify my working copy externals, update, etc... I guess although theoretically trunk could get changes that aren't going into 2.0 I'd rather we be more thorough in keeping 2.0 current That would be cool :) IS OL 2.8 going into 2.0? :) yes, I think so Mike has OL trunk in 2.0 That's cool. Guess I'm going too OT :) which is basically 2.8 Next topic? Sure - everyones favorite. 
Raster There was some more mail on that today I think Personally I'd like to see Traian's approach tested, as that change is more scalable than one big lock I think we need to decide for 2.1 Hi, for 2.1, just ship the lock people seem to like the lock Well maybe everyone but me :) rbray, I worry about that. Primarily because the only reason we started with locks in FDOGDAL, and single pool in MapGuide was multithreading instability in GDAL I have a few machines that I can load test on, if I get some pathed dll's You guys are trying to fix a refcounting bug with locks -- it's not right. I really think we need to understand problems and divide them no we are not I don't have time to build a 2.0.2 from scratch, then patch, but I can test with a prebuilt dll set If we can get the non-reference-counted connection manager into 2.1, then we can do away with the lock and allow users to choose between threaded and non-threaded providers. It really requires two things I think connection manager is something wrong One is the connection manager changes another story is if gdal can support multithreading we cant mix that in true - that is the other if/when it will support we will remove locks from fdo gdal Even if you ignore the multithreading part of GDAL, there is still a refcounting problem, that is causing behavior that is different from that of other providers. Traian_: is refcounting part of the FDO contract though? it seems not all That should be fixed I would assume ADSK raster too and I think i saw one more provider with it no ADSK raster doesn't have the same issue but regardless it chrashes same way with rasters but it is not point anyhow The feature reader in the GDAL provider needs a strong reference to the connection, since it needs the connection. 
yes, I worte that in my first email but It's the same for the feature reader in SDF or SHP or whatever also fdo client as MG can't get object from connection,close connection and use object is there anything wrong with putting both in? If that bug is fixed, then you don't need the lock in the stylization code. Um...not sure I agree Harris There is nothing wrong with putting both in. with both it will lock That's why I said, just ship with the lock, if users want it. If an object needs another for it to be valid, then it should hold a strong reference to it there is no library of db hich will allow you to close connection and read from connection but we are talking abut something which is not the reall issue/ not reall cause of problem I agree with that too, Ideally the reader would hold a strong reference to the connection If there is time pressure, I'm pro shipping with a lock, but if there is a possibility to test a lock free solution that would be nicer if the connection is closed, the readeer should return an error There is not a big performance difference between lock and no lock in practice -- I tried it, at least on a dual core machine. btw, it is not even reader But that is not the way any of the providers are currently implemented but a raster object returned I played with gdal a year or so ago there were problems with it Makes no difference - same logic should apply regardless of the object Frank added lock on every gdal call in provider i would not now unlock that until we have solve MG issues and then really look at gdal multithreading capabillities Yes, even with the lock you get the crash, as I said, you can't fix a refcounting problem with locks. You can repro the problem from a single thread too. and also reall gains in prfoamnce from that part Gee - thanks Walt! yes, closing connection and resuing object from it will crash Seems the order of things should be: there were 3 probems with MG 2.1 rasters 1. memory leak 2. unalloacted pointer use and 3. 
this connection manager problem Evaluate teh connection manager issues, if we fix, then decide what to do here rbray, agreed we cant fix connection manager for 2.1 ( two weks) If we don't fix the connection manager, then let's just ship with the big freaking lock 2.1 doesn't have to be in two weeks. not, fix change and the difference between fix and change is? Can we also fix the refcounting in the provider? It can't hurt. yes, ofcourse I would like to do both - the strong reference in the reader of GDAL and the FDO connection manager :) I really wrote in email that I want to change that and in my code I did just it is not the main issue to solve Traian: your fix is only for the provider, not anything in the server code? Yes, provider only, but with the fix, you can potentially use the provider without pooling connections to it, since connecting becomes pretty fast. Thus bypassing the whole connection pool thing in theory... I believe it will come to unable to create new connections So your fix will only introduce a new gdalprovider.dll ? Yes, you can get to that, in my tests it does, and then it waits a bit and then it can create new ones. In my code, I also removed the locks from the MG stylization code, and messed with serverconfig.ini. The real change is in the provider though... And what results did you get from your tests Traian? So, if only the provider is updated, it gets stable, but has issues in long runs? I am in favor with shipping with the lock, since I get a suspcious thing with FDO XML parsing, which happens when you open connections concurrently. I have not had time to debug this, but I think it happens any time FDO parses XML. It affects all providers which parse XML on open though... The problem with the crash/deadlock disappears in my tests. I'd prefer to either fix the connection manager to work properly with single-threaded providers, or ship with the lock, before presenting multithreaded GDAL provider as a testing option for users. 
It had problems in 1.2 when it wasn't constrained And I value stability over performance. jasonbirch: may be the problems had to do with refcounting, and not threading :) Well, maybe, but we need a stable base to test from. If it ships with singlethread setup as default in serverconfig.ini (and it works) I see no issues in allowing users to test multithreading, unless it is known to be unstable Yes, let's just add the global raster lock, but also fix the refcoutning in the provider. We can remove the global lock from the provider itself since it is superceded by the lock in MG. GDAL is allready single threaded Imean fdo gdal perfoamnce is gained on enabling pool for fdo gdal and resuing connections GDAL Provider was made single threaded in a failed attempt to fix the crashing problem wasn't it? So there is no reason for it to be single threaded HarisK: if Traian's refcount fix is added to the provider, what is the problems that you see? :) I mean, we eventually want to remove the locks from the GDAL Raster provider, so we shouldn't be doing thigns thatwill assume that it will always be single-threaded It is my suggestion from first email taht provider needs to add ref count as well behaved I thought that was what Traians fix does? tomf2: what are we doing that assumes single-threaded, other than temporary lock? but only that will lock if we have lock on rasters too I just did what Haris suggested for the provider. It's not something I invented I also removed the locks and sped up connection opening The locks inside the provider will be unnecessary due to the BFL in MapGuide itself schema cache ? 
BFL = big f-ing lock but we have also other fdo clients jasonbirch: I was concerned about Haris' statement that "GDAL is allready single threaded", so if we do things that assume the slow nature of a single threaded provider we could end up with something that is not as fast as it could be in the future and because we dont know (and i think is not stable) how gdal is safe in mt, i would leave those gdal locks We certainly don't want BFL's in all clients From here, it looks as if Traian and Haris disagree on the solution, but if Traians patch is what Haris suggested, I'm not following the dicussion... no, not in all clienst other clienst who open connection, use connection, close connection will work tomf2: I think maybe terminology difference between single-threaded and per-connection-threaded. My patch is only part of what Haris wants... I claim it's sufficient... This is going nowhere. Here is what I would do this release, you guys take it for what it's worth: 1. Fix refcount in provider, remove lock from provider. 2. Add BFL to MG. I'd only want to do that if the changes proposed to the connection manager are not implemented. The #2 The #1 should be done regardless I think Yep - I agree isn't #2 what "CustomConnectionPoolSize=1" means? That #1 should be done Should be - yes #1 should be done regardless ksgeograf, that's correct, that's what it's supposed to mean, but it didn't work... I'd like #2 for now due to the FDO XML parsing being single-threaded, which gives me a heap corruption every few thousand concurrent requests. I don't think it will happen in real use cases, but until we fix it, I'd keep the lock. remove gdal lock from provider is not right way, I strgly beleiev I dont understand why adding uanother lkevel of uncertainty ksgeograf: perhaps it didn't work because of the refcount problem? 
we have problem with rasters for very long time why experiment with multithreading gdal tomf2: yes, that was what I was thinking Perhaps we need to think about this differently You guys are guessing; haris tested this fairly thoroughly now, and a year ago. Stability is key for 2.1 - yes So lets do what we need to do to achieve that, even if there are extra locks we may not need For 2.2, we find some time to remove all the freaking locks and look for the real source of hte problem You'll never find it with a bunch of irrelevent locks in the way, which mask the problem jasonbirch: Does that mean that Haris has found the "CustomConnectionPoolSize" to be faulty, even in the case of correct refcounts? ksgeograf: yes I think so. Haris? jasonbirch: I thought that Haris didn't touch the GDAL Raster provider yet I looked at provider year ago and in last 2 weeks a lot tomf2: his proposed fix didn't involve the provlder, but his testing (including this time around) included changes to refcounting. it wasn't easy to find what is exactly going on Alone, they didn't fix the problem. What does the patch that people are happy with in the last week contain? HarisK: kenneth and I were wondering if you did the refcounting fix in the GDAL Raster PRovider and then make sure CustomConnectioPoolSize was set to 1 at the same time 3 fixes PoolSize doesn pla with GDAL there is fix peace of code in MG, if single threaded then pool size == 1 which is also something to change, I believe Traian_: that patch contains a fix for the memory leak and invalid allocation, and the big lock. That's all. there were 2 bugs causing unhandled exception when working with rasters so, the only thing that is not checked in yet, is the big lock. Let's add that, and also fix the refcounting in the provider, and call it a day? +1 +1 +1 - for 2.1 I am concerned that you're looking at 2.2 to fix the connection manager. HarisK: does that seem reasonable for you? jason: correct. 
I'm not convinced that the GDAL provider is the only place this problem is manifesting, just the most obvious place. that would need to be tested If the work that HarisK is doing is deemed safe enough, I want to see it in the 2.1 branch, if not for the initial 2.1 release. if you add ref count and leave lock it will not work if i remember correctly also, there is allready one lock in stylization code on excecute query So fixing the refcount breaks your fix? +1 for 2.1 not that I really understand :) could be, cant remeber this second jasonbirch: so if I have this right, you want to put in some changes that might destabilize the code to fix some problems that we don't know exist (but I agree, probably do somewhere- but may not be caused by the connection manager) tomf2: is the code stable? Not my experience... +1 for 2.1 it doesn't sound right to me, we could put it in a 2.1.1. sorry guys, I am lost in debate now I now what works that is what we sumbited as patches So then let's go with that for 2.1 all other stuff is something we need to further wok and test ok, but it looks to me as if the patches Traian submitted are indeed fixing the refcount problem you describe? there is a huge potential in improving rasters in MG For 2.1, I can live with stable and slow... Right now anything is better than a constantly crashing service OK, so as long as you're open to working on the connection manager for 2.1.1, then I'd be happy to see 2.1 ship with just the BFL :) yes, I don't think that refcounting is the best possible way to determine usage of a connection Kenneth, yes only adding ref to provider will halt MG ( or in combo with lock) can't remember now I'm happy to work on the FDO connection manager post 2.1 I'd be worried about fixing the refcounting, if HarisK saw a negative interaction between that and the lock. jasonbirch: BTW I got some news today that there may be a problem with the way the web tier closes connections and this may cause some instability. 
Perhaps this is one of the stability issues you are seeing? tomf2: yes, quite possibly. tomf2: is fix for that likely to make it into 2.1? :) jasonbirch: ask Trevor :) So are we done? I think that supporting many concurrent users hasn't been thoroughly done yet. And it seems that many developers are working on this at the same time Would it be worthwhile to set up a wiki or something for people to put their findings? We should just make an area on the trac wiki for that and Yes - that would be helpful as long as people update it I plan to blog about the use of 'The Grinder" to replicate many users too. All of my testing so far has been with real users, but artificial load can be useful too. Haris used it to good effect for the raster testing. We use the GRinder here for our load tests Chris has made some nice scripts Oh. Would be cool to share those in the SVN :) great, share it please :) Is this the final suggestion for 2.1 then? * Suggestion: * Add the two fixes produced by Haris * Add the refcount fix produced by Traian, and initially suggested by Haris * Add a lock produced by Haris, but not yet submitted as a patch we obviusly need better unittest ksgeograf: I think all we're going to add is the big lock in the stylization code. ksgeograf: I believe so Chris is on holidays until the 20th Remind me at that time tomf2: will do. the one called "mapguide_raster_stability.patch" ? I'm done, thanks Bob I think so. The other one was already submitted (in a modified form) OK, so that's what we'll do for 2.1 I have to run, so thanks everyone And adding the refcounting fixes into fdo 3.4 may introduce more instability. Nice short meeting... jasonbirch: the refcounting fix does not introduce instability, but is useless if there is a giant lock. Of course, I can't convince you of that by just saying it. no :) Show me the money :) it's impossible to prove that a problem doesn't exist it's a general scientific concept exactly... the burden is on you to show a problem... 
:D I would personally go to B.C. and buy you a a beer if the lock + refcount fix breaks rasters... and then go to Slovenia to buy a beer for Haris OK, so if first thread has strong connection to fdo provider, and does execute, then second thread comes in and there's only a single connection what happens? Does it throw exception because second connection can't be opened? Or does it wait for the first one to close then proceed? it waits there is retry logic to wait and it gies out after a short period right, the retry logic kicks in and with many concurent users no images for them so it waits, then retries? it doesnt work I tested it with many concurrent and it worked fine many = 16 on very short raster access yes ut it doesnt really work I tested with short and long (small and big rasters). Of course, I can't be 100% sure it works. You guys have more real tests we will talk about beer :) either in Canada or Slovenia Haris: it will have to be in June/July, that's when I go visit in the area Haris will be in BC for GeoWeb :) I wonder who else here is going to be there... So, If I were to produce a raster provider with the fixes submitted by Traian, on a MapGuide running with the patched dll's Jason made, it would lock up? great I can send you guys a patched provider to play with I compile against trunk, so I need to rebuild on 3.4... Traian_: can you rebuild against 3.4 but using GDAL 1.6? :) I can, but I thought it was using GDAL 1.6 already... Oh... You're right. I was thinking 3.3 Kenneth's still running MG 2.0.2 I think. urgh yes I am :D I am too, in production. But am testing against 2.1 in Open Source time frames, that's like running a Ford Model T :) I have MGE 2010 running in test, but I can't get the OGR provider working I forgot a bit, but trying to remember only adding ref count to provider will again throw exception what part of OGR doesn't work? 
HarisK: it would help if you tell me what exception -- I have found another problem which has to do with threading, when parsing XML using FDO's XML stuff. not sure yet, it just throws Unclassified Exception and occasionally "Resource busy" what driver? PostGIS? MapInfo hmm... MapInfo is file-based, should be rock solid... Strange. Not really asking for help, just an excuse for driving a Ford T :D Is OGR provider advertised as single-threaded? :) not sure... No, MapGuide does dangerous things when the provider says "single-threaded" :) Yeah, it's hard-coded to set the pool size to 1, regardless of the settings in the config file. That was a bit annoying :) was? For Haris when trying to figure out why his config file settings weren't changing the behaviour :) Ah, I see Yeah, I had the same problem with the settings when I was testing... Took a while to figure out. Traian_, what are those dangerous thrings? It was a joke -- it's the override setting that sets the Gdal provider to use one connection it's not necessary anyway, since the provider already reports it's single-threaded when I removed the setting from the config, it still remained single connection limit and I couldn't figure out why... tomf2> Blame bdecahnt :) so, was there a decision what to do for 2.1? I think it was commit the lock and be done with it. ok do you mean: not fix the refcounting? The lock works OK without it, and make it pointless. The refcounting is not a problem if the global lock is there... Unless you are doing something other than stylization, in which case you are screwed. Since other APIs don't have the global lock... ok, so basically what is out now in the patched dll's ? ksgeograf: yes. The memory leak was already fixed by ADSK for 2.1 And the second issue has been committed by Walt, So the lock is all that's remaining. ok
}}}
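
The refcounting argument in the transcript (an object that needs its connection should hold a strong reference to it, so that closing the connection cannot invalidate the object) can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the actual providers are C++. Every name here (`Connection`, `WeakRaster`, `StrongRaster`, `read_pixels`) is hypothetical and is not part of the FDO or MapGuide APIs. In the real C++ code the "weak" case would be a dangling pointer and a likely crash, not a clean exception.

```python
class Connection:
    """Hypothetical stand-in for a provider connection with manual refcounting."""

    def __init__(self):
        self._refs = 1        # the creator (for example a connection pool) holds one reference
        self.is_open = True

    def add_ref(self):
        self._refs += 1

    def release(self):
        self._refs -= 1
        if self._refs == 0:
            self.is_open = False  # resources are freed only when nobody needs them

    def read_pixels(self):
        if not self.is_open:
            raise RuntimeError("connection is closed")
        return b"\x00\x01"


class WeakRaster:
    """Raster that only remembers its connection (the problematic pattern)."""

    def __init__(self, conn):
        self._conn = conn     # no add_ref: the connection may be freed underneath us

    def data(self):
        return self._conn.read_pixels()


class StrongRaster:
    """Raster that takes its own reference, keeping the connection alive."""

    def __init__(self, conn):
        conn.add_ref()
        self._conn = conn

    def data(self):
        return self._conn.read_pixels()

    def dispose(self):
        self._conn.release()  # the last release actually closes the connection
```

With `WeakRaster`, a caller that releases the connection and then reads the raster gets an error. With `StrongRaster`, the same sequence succeeds, and the connection closes only once the raster is disposed. This sketch says nothing about GDAL thread safety or the FDO XML parsing issue, which the meeting treated as separate questions.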