PayPal Outage Points To CIO Failure – The Accidental Successful CIO

The basic job of a CIO is to ensure that a company’s IT infrastructure operates smoothly and allows the company to conduct business. On Monday, August 3, 2009, PayPal’s CIO failed at this most basic of jobs.

A quick check of PayPal’s senior management structure reveals that they don’t have a CIO position (which in and of itself is rather amazing), but Ryan D. Downs is their Senior Vice President, Worldwide Operations, and so he’s their de facto CIO. What went wrong, Ryan?

The Facts Behind The Failure

On Monday, August 3rd, PayPal experienced a worldwide outage that affected all of their customer-facing systems. As a result, millions of PayPal’s customers who rely on them to approve and complete financial transactions were unable to do so. This was a long outage: it started at 1:30 pm EST and lasted until at least 6:30 pm EST.

PayPal is attributing this outage to “internal” issues.

PayPal is a huge business. In the most recent quarter, PayPal handled $16.7B in customer online commerce transactions. In the past the company has stated that it normally handles $2,000 in online transactions every second. Just in case you are doing the math, a five-hour outage is 18,000 seconds, which means that this outage prevented at least $36M worth of business from happening.
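
For anyone who wants to check that math, here is a minimal sketch of the back-of-the-envelope calculation. It uses only the figures quoted above ($2,000 per second, 1:30 pm to 6:30 pm); the function and constant names are made up for illustration, and the result is a rough estimate of blocked volume, not an official loss figure.

```python
# Figures quoted in the post; the names below are illustrative only.
DOLLARS_PER_SECOND = 2_000   # PayPal's stated normal transaction volume
OUTAGE_START_HOUR = 13.5     # 1:30 pm, as a decimal hour
OUTAGE_END_HOUR = 18.5       # 6:30 pm, the earliest reported end of the outage


def blocked_volume(start_hour, end_hour, dollars_per_second):
    """Estimate the dollar volume that could not be processed during an outage."""
    outage_seconds = (end_hour - start_hour) * 3600
    return outage_seconds * dollars_per_second


estimate = blocked_volume(OUTAGE_START_HOUR, OUTAGE_END_HOUR, DOLLARS_PER_SECOND)
print(f"Estimated blocked volume: ${estimate:,.0f}")
```

Because the outage lasted "at least" five hours, this number is a floor rather than a ceiling.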

What The CIO Did Wrong

I have no magic insights into what went wrong at PayPal, but it’s pretty easy to make a guess. Back in 2005, customers got shut out of PayPal for about 5 days when a software update went very, very wrong. I’m willing to bet that some sort of update process got away from them once again. This is just sloppy IT work.

This is exactly the type of basic “blocking & tackling” that CIOs have to take care of as part of building a solid IT foundation. Clearly this has not been done at PayPal.

The reason that this is such a scandal is that it’s happened at PayPal before. Once a problem is known, the CIO needs to step in and make sure that it will never happen again. We’re not just talking about establishing a fail-safe update process, but also making whatever changes are needed to the PayPal infrastructure in order to make sure that problems like this can’t ripple throughout the system.

Additionally, creating a process for rolling back changes is critical. If a bad change slips through the system and starts to go into production, you need to have the ability to get the system back to the way that it used to be.
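
To make the rollback idea concrete, here is a minimal sketch of what such a process could look like. To be clear, this is not how PayPal deploys software; the `Deployer` class, the version labels, and the health check are all hypothetical names invented for illustration. The point is simply that every release is recorded, checked, and automatically reverted if the check fails.

```python
from typing import Callable, List


class Deployer:
    """Hypothetical deployer that remembers prior versions so it can roll back."""

    def __init__(self, initial_version: str, health_check: Callable[[str], bool]):
        # The history always holds at least the known-good starting version.
        self.history: List[str] = [initial_version]
        self.health_check = health_check

    @property
    def current(self) -> str:
        return self.history[-1]

    def deploy(self, new_version: str) -> bool:
        """Release new_version; revert automatically if the health check fails."""
        self.history.append(new_version)
        if self.health_check(new_version):
            return True
        self.rollback()
        return False

    def rollback(self) -> str:
        """Return the system to the previous version, if there is one."""
        if len(self.history) > 1:
            self.history.pop()
        return self.current


# Usage: a stand-in health check that rejects one known-bad release.
deployer = Deployer("v1", health_check=lambda version: version != "v2-bad")
deployer.deploy("v2-bad")   # fails the check and is rolled back
print(deployer.current)     # still running v1
```

A real rollback process also has to deal with database changes and partially completed transactions, which this sketch leaves out entirely.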

Final Thoughts

Major outages like this reflect badly on all CIOs. There is no reason that an outage like this should be allowed to happen, especially since PayPal has had problems like this in the past. PayPal can’t claim that they didn’t have enough funding to prevent this problem: they are the fastest growing part of the eBay corporation.

In the end it all comes down to planning. Finding the time to gather the right people to run through “what if” scenarios and then following through with the recommendations that come out of these meetings is what every CIO needs to do. If Ryan takes the time to do this, then he will have found a way to apply IT to enable the rest of the company to grow quicker, move faster, and do more.

What We’ll Be Talking About Next Time

Hewlett-Packard is a huge IT products and services company that lives and dies by the actions of its sales teams. Making sure that the sales teams get paid should be a simple task, right? Think again…