By now you have heard of the amazon issues that plagued many websites a few days ago. I want to talk about one key part of the issue that often gets overlooked. If you read through their message describing their service disruption (https://aws.amazon.com/message/41926/) you will notice a section where they discuss some changes to the tools they use to manage their systems.
So let’s take a step back for a moment. Amazon attributed the service disruption to basically a simple mistake when executing a command. Although following an established playbook, something was entered incorrectly, leading to the disruption. I don’t want to get into all the details about the disruption, but rather, lets fast forward to considering the tools that are used.
Whether we are developing applications for external use or tools used to manage our systems, we don’t always think about all the possible threat scenarios. Sure, we think about some of the common threats and put in some simple safeguards, but what about some of those more detailed issues.
In this case, the tools allowed manipulating sub-systems that shouldn’t have been possible. It allowed shutting systems down too quickly, which caused some issues. This is no different than many of the tools that we write. We understand the task and create something to make it happen. But then our systems get more complicated. They pull in other systems which have access to yet other systems. A change to something directly accessible has an effect on a system 3 hops away.
Now, the chances of this edge case from happening are probably pretty slim. The chances that this case was missed during design is pretty good. Especially if you are not regularly reassessing your systems. What may have not been possible 2 years ago when the system was created is now possible today.
Even with Threat Modeling and secure design, things change. If the tool isn’t being updated, then the systems it controls are. We must have the ability to learn from these situations and identify how we can apply them to our own situations. How many tools have you created that can control your systems where a simple wrong click or incorrect parameter can create a huge headache. Do you even understand your systems well enough to know if doing something too fast or too slow can cause a problem. What about the effect of shutting one machine down has on any other machines. Are there warning messages for things that may have adverse effects? Are there safeguards in place of someone doing something out of the norm?
We are all busy and going back through working systems isn’t a high priority. These tools/systems may require an addition of new features, which makes for a perfect time to reassess it. Like everything in security, peripheral vision is very important. When working on one piece, it is always good to peek at other code around it. Take the time to verify that everything is up to date and as expected. Not any changes and determine if any enhancements or redesign is in order. Most of the time, these are edge case scenarios, but they can have a big impact if they occur.
Jardine Software helps companies get more value from their application security programs. Let’s talk about how we can help you.
James Jardine is the CEO and Principal Consultant at Jardine Software Inc. He has over 15 years of combined development and security experience. If you are interested in learning more about Jardine Software, you can reach him at firstname.lastname@example.org or @jardinesoftware on twitter.