I’m hoping that you are not familiar with the U.S. food stamp program. This is a government-funded program that provides people who are living below the poverty line with money that can only be spent on food. Clearly it’s a critical program, and the people who are enrolled in it desperately need it. It also demonstrates the importance of information technology, which is why it’s unacceptable when the IT systems that support the program stop working. Clearly the person who has the CIO job is the one to blame…
What Went Wrong With The Food Stamp System
The core of the problem is that the company Xerox is responsible for providing the back office IT systems that run the U.S. government’s food stamp program. The Electronic Benefits Transfer (EBT) system allows recipients of government food stamps to purchase goods using a digital card with a set spending limit. The other day, a power outage during a routine maintenance test caused a temporary glitch in the food stamp program.
One of the results of this glitch was that shoppers were able to sweep through the aisles at stores and buy as much as they could carry because their preset spending limit had been removed. This caused a great deal of concern at Walmart stores when shoppers started to show up at the checkout with fully loaded carts.
However, another side effect of the glitch was that other food stamp shoppers were unable to purchase any food. The glitch caused food stamp recipients in 17 states to lose access for much of a Saturday to the electronic system used by stores to verify their benefits. This left many unable to buy any groceries.
What Should Have Been Done
Clearly this situation should never have been allowed to happen. The Xerox team that designed the food stamp system did not do the required amount of testing. It appears as though they got themselves caught in the IT equivalent of a perfect storm: during a routine test of a backup system, a power glitch hit, and that placed the system into a previously unknown state.
Editor’s Note: Many thanks to Carol Zierhoffer, who wrote to me after this article was published in order to correct some misunderstandings on my part. I had stated that I felt that it was the Xerox CIO’s fault that this outage had been allowed to happen. Carol pointed out that she was CIO at Xerox for just 18 months and had already left when the outage occurred (ouch, but that’s for another article some day). Next she revealed that the CIO at Xerox does not have responsibility for the underlying applications that support the external offerings of the heritage ACS company. Yes, yes, there are all sorts of issues with this, but as Carol says, it’s the way that things currently are. In Carol’s own words: “By way of background, ACS was acquired by Xerox in Feb, 2010 and is now called Xerox Services (XS). XS has 100+ Strategic Business Units, each with their own IT organizations that are run independently by the Strategic Business Unit. That is how management chooses to run the company. The heritage Xerox side of the business is different, where the CIO does have full responsibility for all systems both customer facing and internal.”
The reason that I’m holding Xerox and their corporate CIO structure responsible for this is that we all know that events like this can happen. No, we can’t predict exactly what they’ll look like, but we can almost certainly predict that they’ll happen. That’s why it’s the CIO’s responsibility to make sure that the IT systems that they are responsible for have the ability to deal with unplanned circumstances.
There were two problems associated with this outage: the granting of unlimited spending to food stamp program participants and the inability of people to access the system. The removal of spending limits looks like a simple programming bug, and effective code reviews should have detected it long ago. Far less acceptable is the extended service outage that a brief power outage caused. This is a fundamental system design problem that should never have occurred. Xerox needs to go back and fix things. Improving their code review procedures would be a good start, but redesigning the food stamp system to improve its reliability is a must.
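To make the two failure modes concrete, here is a minimal sketch of how a purchase check could behave when the balance system is unreachable. It is not how the actual EBT system works; every name in it (`authorize`, `lookup_balance`, `OFFLINE_LIMIT_CENTS`, the stub balance sources) is hypothetical and chosen only for illustration. The idea is that the check tries a backup source when the primary is down, and if neither answers it falls back to a small, fixed emergency cap instead of either removing the limit or refusing every purchase.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class BalanceUnavailable(Exception):
    """Raised by a balance source that cannot be reached."""


@dataclass(frozen=True)
class Decision:
    approved: bool
    reason: str


# Hypothetical emergency cap (in cents), used only when no balance source answers.
OFFLINE_LIMIT_CENTS = 5_000

BalanceSource = Callable[[str], int]


def lookup_balance(card_id: str, sources: Sequence[BalanceSource]) -> Optional[int]:
    """Try each source in order; return None if every one is unavailable."""
    for source in sources:
        try:
            return source(card_id)
        except BalanceUnavailable:
            continue
    return None


def authorize(card_id: str, amount_cents: int,
              sources: Sequence[BalanceSource]) -> Decision:
    """Approve a purchase without ever treating a missing limit as 'no limit'."""
    if amount_cents <= 0:
        return Decision(False, "invalid amount")

    balance = lookup_balance(card_id, sources)
    if balance is None:
        # Degraded mode: small purchases still go through, large ones do not.
        if amount_cents <= OFFLINE_LIMIT_CENTS:
            return Decision(True, "offline limit")
        return Decision(False, "over offline limit")

    if amount_cents <= balance:
        return Decision(True, "within balance")
    return Decision(False, "insufficient balance")


# Stub sources for illustration only: a primary that is down and a working backup.
def primary_down(card_id: str) -> int:
    raise BalanceUnavailable(card_id)


def backup_up(card_id: str) -> int:
    return 12_000
```

With these stubs, `authorize("card-1", 10_000, [primary_down, backup_up])` is approved from the backup, while `authorize("card-1", 50_000, [primary_down])` is declined because no source answered and the amount exceeds the emergency cap. Whether an offline cap is the right policy, and how large it should be, is a business decision; the point of the sketch is only that the degraded state is designed for rather than discovered in production.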
What All Of This Means For You
The U.S. food stamp program is a critical system that allows people who could not otherwise afford food to buy it. This means that it is a mission-critical system and always has to be there to support these people who really can’t speak for themselves. However, the system recently experienced an outage that prevented people from purchasing food for a period of time.
The outage is reported to have been caused by a routine test of the system’s backup capabilities. As IT professionals, we can all understand how this type of testing can cause a ripple effect that could cause a system to shut down. However, when a system is mission critical, its design has to take events like this into account and needs to have ways to prevent them from impacting the vulnerable end users. Clearly this was not the case.
The person in the CIO position at Xerox has some answering to do. It’s understood that the system may have been installed before she became CIO. However, as CIO it is her responsibility to evaluate the level of risk associated with all of the company’s systems, and clearly this has not been done for the food stamp application. Let us hope that she now realizes the importance of this system and that design changes will be made that will prevent an outage like this from ever happening again.
Question For You: What do you think Xerox’s first step should be to prevent this from happening again?
What We’ll Be Talking About Next Time
Sometimes you just have to change everything. When you get the CIO job great things are going to be expected of you. Everyone understands the importance of information technology and so they are going to be looking at you with the assumption that you have all of the answers. Of course you don’t, but you can’t tell them that. Instead, you are going to have to show them. This means that you’re going to have to shake things up a bit.