Just when you think that you have the worst job in IT, a story like this comes along! Last Monday the London Stock Exchange (LSE) experienced a full-day outage. Traders who were ready to trade were unable to connect to one of the LSE’s main trading applications. No connection, no trading.
If you think back a week or so, you’ll realize that Monday was a very important day in stock trading land. The U.S. Government had just stepped in to shore up Fannie Mae and Freddie Mac. Over in London, that meant lots of traders wanted to buy or sell British bank stocks because of the impact they thought this move would have on British stocks. However, for a full day nobody could trade anything!
The LSE uses a trading program called TradElect, a 15-month-old proprietary application that they’ve built using Microsoft technology. It appears that the traders were unable to connect to this application, and that is why everyone experienced the outage.
The big question is why? The most likely explanation is that trading volume grew too quickly and exceeded the capacity of their software and hardware. Although the LSE is not talking, we can take some educated guesses as to what went wrong here. Since TradElect has been in service for 15 months, it’s probably not the fault of the functionality of the application. Additionally, since the problem lasted the entire day, the IT team clearly wasn’t able to fix it by reverting to a previous version of the application, so this doesn’t look like an “upgrade gone wrong” problem. My guess is that this is an old-fashioned “too much volume” problem.
I almost hate to use the term, but could “cloud computing” be the solution for the LSE? Specifically, should they design their apps to run on their own servers in their data center, but build in an option to expand to additional servers located in some secure cloud in the event of a surge in trading like the one that (tried to) happen on Monday? You can never guess exactly how much computing capacity you’ll need, and perhaps this is where the brave new world of cloud computing can shine. Maybe this is a question that the next LSE CIO will have an opportunity to answer…
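To make the "run locally, overflow to the cloud" idea a bit more concrete, here is a minimal sketch in Python. It is purely illustrative and says nothing about how TradElect actually works: the `Pool` and `BurstRouter` names, the capacity numbers, and the idea of counting connection slots are all my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A group of servers with a fixed number of connection slots (hypothetical)."""
    name: str
    capacity: int
    active: int = 0

    def has_room(self) -> bool:
        return self.active < self.capacity


class BurstRouter:
    """Prefer the local data center; overflow to a cloud pool only when it is full."""

    def __init__(self, local: Pool, cloud: Pool):
        self.local = local
        self.cloud = cloud

    def connect(self) -> str:
        # Try pools in priority order: local first, cloud as overflow.
        for pool in (self.local, self.cloud):
            if pool.has_room():
                pool.active += 1
                return pool.name
        # Both pools are full: refuse explicitly rather than hang.
        raise RuntimeError("no capacity available")

    def disconnect(self, pool_name: str) -> None:
        # Free a slot in whichever pool the session was placed in.
        pool = self.local if pool_name == self.local.name else self.cloud
        if pool.active > 0:
            pool.active -= 1


# Illustrative numbers only: 2 local slots, 3 cloud slots, 4 incoming traders.
router = BurstRouter(Pool("local", capacity=2), Pool("cloud", capacity=3))
placements = [router.connect() for _ in range(4)]
print(placements)
```

With these made-up numbers, the first two traders land on the local servers and the next two spill over to the cloud pool. A real trading system would also have to worry about latency, data consistency, and security between the two sites, which this sketch ignores entirely.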
Have you ever had a problem where one of your applications got overwhelmed by too much user volume? Did the app go down or just stumble? What did you do? Probably even more importantly, what changes did you make later on to prevent the situation from happening again? Leave a comment and let me know what you think.