Windmill Never Hangs When You Want It To
I've been making slow progress on getting our Windmill tests re-enabled on Launchpad. We've got a lot of people digging into JavaScript on Launchpad now, which is nice, but we have relatively few JavaScript experts, so I've been fielding a fair number of questions lately on good JavaScript patterns, YUI 3, testing, and especially on how to divide testing between YUI tests and Windmill. I'm thrilled about these interruptions, don't get me wrong. It means more of us are becoming proficient with JavaScript hacking in Launchpad. We're also getting close to finishing our first feature on my squad. So yesterday was largely voice calls and IRC chats. The queue to speak to me was longer than the PvP arena queue in DC Universe Online.
But I did make some progress with Windmill! The test suite is passing in the Windmill run in Steve Kowalik's Jenkins instance.
Ok, so I can't take any credit for this. It was a fix from Ian Booth, which landed in devel and got the broken test going again. In the end, the test was correct and required code changes to fix the overlay.
The couple of hours I spent on Windmill yesterday, I spent running the dev server for Launchpad and poking at recipe pages. I had sprinkled some Y.log statements around the disable_existing_builds function in lib/lp/code/javascript/requestbuild_overlay.js and was trying to work out what the various variables were for. I was toggling back and forth between the dev server and the Jenkins page for Windmill, trying to work out where the test was failing, when I realized the build was passing again. So I ran the test in my copy of devel, confirmed it was passing now, and looked through the logs for the revision to be sure of what had happened.
After that, I was near my end of day so I fired off the branch to ec2 test, to see if I could get a hang. My first test run passed, so I fired off another before bed, which also passed. This is part of the frustration of Windmill. It's passing in ec2 for me. It's passing in Jenkins now. But I know that as soon as I turn it back on for our runs in buildbot, it will start failing for someone. If I didn't have team lead responsibilities and knew I could babysit this properly, I would turn it on and chase failures directly as they appear.
My plan today is to do something like the following, in the hopes that I get a hang:
for i in $(seq 10); do ./bin/test -cvv --layer=WindmillLayer; done
If I do, then I'll attach with gdb and get a backtrace to see if I can work out what is causing the hang.
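A plain loop like that just sits there if a run hangs, so someone has to notice. Here is a rough sketch of a wrapper that notices for you. It is an illustration, not something from the Launchpad tree: the run_until_hang name and the time limit are my own assumptions. It runs the command in the background, polls it once a second, and if a run is still alive past the limit it leaves the process running and prints the pid so gdb can attach.

```shell
#!/bin/sh
# Hypothetical helper (not part of Launchpad): repeat a command and stop at
# the first run that outlives a time limit, leaving that process alive.
# Usage: run_until_hang RUNS LIMIT_SECONDS COMMAND [ARGS...]
run_until_hang() {
    runs=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        "$@" &                      # start the run in the background
        pid=$!
        waited=0
        # Poll once per second for as long as the process still exists.
        while kill -0 "$pid" 2>/dev/null; do
            if [ "$waited" -ge "$limit" ]; then
                echo "run $i: still running after ${limit}s, pid $pid"
                echo "attach with: gdb -p $pid"
                return 1            # do NOT kill it; we want the backtrace
            fi
            sleep 1
            waited=$((waited + 1))
        done
        status=0
        wait "$pid" || status=$?    # collect the exit status of the run
        echo "run $i: exited with status $status"
        i=$((i + 1))
    done
    return 0
}
```

With that defined, something like `run_until_hang 10 3600 ./bin/test -cvv --layer=WindmillLayer` would do the ten runs. The 3600 second limit is a guess on my part and would need to be comfortably longer than a normal Windmill run. Once it reports a pid, `gdb -p PID` attaches to the stuck process, and `thread apply all bt` inside gdb prints a backtrace for every thread.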
Posted by deryck on March 17, 2011

