I came across an interesting post on twitter the other day (https://twitter.com/suffert/status/567486188383379456) that depicts a sidewalk with a sign indicating what wasn’t allowed on the sidewalk. You have seen these before: NO bicycles, skateboards, rollerblades, roller skates, scooters. In the information technology sector, this is known as a black list; a list that defines what is NOT allowed or permitted. You can see black lists all over the place, input validation, output encoding, etc.
The other type of list that we are more commonly seeing is a white list; a list that defines what IS allowed indicating that everything else is NOT allowed. While writing this post I was drawing a blank on where I have seen thin in the physical world and it wasn’t until I was talking to a colleague about this that I realized I had the perfect example: Handicap parking. Handicap parking signs are meant to say that only people with that designation can park there and everyone else is prohibited. In technology, we are seeing it a lot more for input validation and output encoding because it is usually a smaller list compared to a black list. Lets compare the two and see what pros and cons exist.
Honestly, they both provide good protection when properly defined. Depending on the data, a black list can actually be a strong control. For example, if we have a system that has special escape sequences to identify its control characters. While simplified down (and I know there are more characters than this) SQL uses the (‘) apostrophe as a control character. It is that delimiter to determine what is data and what is command. If SQL only had one control character (the apostrophe) then a black list would be sufficient. Put the apostrophe into the black list and any time that character appeared you could reject it, or escape it. Unfortunately, it is rare that the list will be that small. Using the example of SQL, what happens if in the future the update is released and now the (-) dash is a special character, or the (#) hashtag? Now the list has to be updated and re-deployed and during that time before deployment the application could be vulnerable.
A white list defines exactly what is good and puts everything else up for question. For this example lets take a first name field and look at input validation. If the field is defined as only (a-z) characters then it is easy to set up a white list using a regular expression to say only the letters (a-z) will be accepted. Every other character will be rejected. A regular expression for (a-z) is much simpler than trying to record every other character out there into the black list. What if you forget one? In this case you really don’t forget any because it is such a limited set. In the example I gave earlier with the handicap parking, the sign is simple: One designation that is allowed. What if the sign used a black list? Can you imagine the number of prohibited items there would be?
Another example is in output encoding to protect against HTML context cross-site scripting. I created a document a few years ago showing the different encoding methods in .NET (http://www.jardinesoftware.com/Documents/ASPNET_HTML_Encoding.pdf). Looking at this, there are five characters that are encoded using a black list build into .NET (<,>,”,&,’). This list defines what will get output encoded when using the HTMLEncode method. These are some of the most common characters used to perform cross-site scripting. What if a new character is found to be a problem? This method won’t cover it. With a white list we could say encode everything except for (a-z). Now if a new special character is determined to be a problem it is already encoded for us.
EFFECT ON USER
You wouldn’t expect much effect on the users if all you are doing is saying what is and isn’t allowed based on the use of the data. However, lets go back to the initial example that started all of this, the twitter post. Setting up the black list was most likely fairly simple. Here are some common problem items we see, lets just prohibit them. Of course then someone comes along on a unicycle and while probably shouldn’t be there, are not in violation of the sign. So it appears as a “Good Enough” solution that shouldn’t inhibit any valid users.
I posed the question on what the white list would look like. The first response I got back was “unassisted movement only” from a friend of mine, Tim Tomes.
Seems like a pretty good idea, I am not sure I would have thought of unassisted movement, but lets dig a little deeper. What about a wheelchair or crutches?
The point here is that with a white list, if it is too narrow, it could effect the ability for valid users to use the system. In this case, just using “unassisted movement only”, while a great first draft, would have prohibited anyone in a wheel chair from using the sidewalk. The point is that because a white list will prohibit anything in the list, it must be scrutinized and tested much more to ensure that it is exactly what is needed. Unlike a black list where there can be a control after the black list to continue limiting down items, if it is blocked by the white list there is no way to still have it later on.
I like both black lists and white lists and I believe they both have their place. It is important for you to analyze what your situation is to determine what the best course of action will be. In some cases a black list will be exactly what you are looking for, in others the white list will be the right fit. WE often get this feeling that we have to make blanket statements like “White lists are better so only use those.” Situations are different, the lists are different and you want to use the one that best fits your needs. Take a moment to determine what the pros and cons are to each and select the best fit.