Monthly Archives: February 2015

Black Lists and White Lists: Overview

I came across an interesting post on Twitter the other day (https://twitter.com/suffert/status/567486188383379456) that depicts a sidewalk with a sign listing what wasn’t allowed on it. You have seen these before: NO bicycles, skateboards, rollerblades, roller skates, scooters. In the information technology sector, this is known as a black list: a list that defines what is NOT allowed or permitted. You can see black lists all over the place: input validation, output encoding, etc.

The other type of list, which we are seeing more and more, is a white list: a list that defines what IS allowed, meaning that everything else is NOT allowed. While writing this post I was drawing a blank on where I have seen this in the physical world, and it wasn’t until I was talking to a colleague that I realized I had the perfect example: handicap parking. Handicap parking signs say that only people with that designation can park there; everyone else is prohibited. In technology, we are seeing white lists a lot more for input validation and output encoding because the list is usually smaller than the equivalent black list. Let’s compare the two and see what pros and cons exist.

PROTECTION
Honestly, they both provide good protection when properly defined. Depending on the data, a black list can actually be a strong control. For example, consider a system that uses special characters or escape sequences as its control characters. Simplifying (and I know there are more characters than this), SQL uses the apostrophe (') as a control character: it is the delimiter that separates data from command. If SQL only had one control character (the apostrophe), then a black list would be sufficient. Put the apostrophe into the black list, and any time that character appeared you could reject it or escape it. Unfortunately, it is rare that the list will be that small. Using the example of SQL, what happens if a future update is released and now the dash (-) is a special character, or the hash (#)? Now the list has to be updated and redeployed, and until that deployment happens the application could be vulnerable.
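As a rough illustration of the reject-or-escape idea, here is a minimal Python sketch. The names `BLACK_LIST`, `is_rejected` and `escape_apostrophes` are made up for this example, and the one-character list is the same simplification used above, not a complete list of SQL control characters.

```python
# Hypothetical black list for this sketch: the apostrophe is the only
# character treated as a control character (a deliberate simplification).
BLACK_LIST = {"'"}


def is_rejected(value, black_list=BLACK_LIST):
    """Return True when the value contains any black-listed character."""
    return any(ch in black_list for ch in value)


def escape_apostrophes(value):
    """Escape instead of reject: double each apostrophe in the value."""
    return value.replace("'", "''")


# A name with an apostrophe is caught by the list.
print(is_rejected("O'Brien"))          # True
print(escape_apostrophes("O'Brien"))   # O''Brien

# If the dash and hash later become special, the list has to grow.
# Until it does, this value passes straight through.
print(is_rejected("a -- b"))                           # False
print(is_rejected("a -- b", BLACK_LIST | {"-", "#"}))  # True
```

The last two calls show the maintenance problem: the check only catches the new characters once someone has updated the list and redeployed it.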

A white list defines exactly what is good and puts everything else up for question. For this example let’s take a first name field and look at input validation. If the field is defined as only (a-z) characters, then it is easy to set up a white list using a regular expression that says only the letters (a-z) will be accepted. Every other character will be rejected. A regular expression for (a-z) is much simpler than trying to record every other character out there in a black list. What if you forget one? With the white list you can’t really forget any, because it is such a limited set. In the example I gave earlier with the handicap parking, the sign is simple: one designation that is allowed. What if the sign used a black list? Can you imagine the number of prohibited items there would be?
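A minimal sketch of that first name white list in Python might look like the following. The function name `is_valid_first_name` is invented for the example, and accepting upper-case letters as well as lower-case is an assumption on my part; adjust the pattern to whatever the field is actually defined as.

```python
import re

# White list: one or more letters a-z and nothing else.
# Assumption for this sketch: upper-case letters are accepted too.
FIRST_NAME_PATTERN = re.compile(r"[a-z]+", re.IGNORECASE)


def is_valid_first_name(value):
    """Accept the value only if every character is in the white list."""
    # fullmatch requires the whole string to match, so a trailing newline
    # or any other stray character causes a rejection.
    return FIRST_NAME_PATTERN.fullmatch(value) is not None


print(is_valid_first_name("Alice"))    # True
print(is_valid_first_name("Al1ce"))    # False
print(is_valid_first_name("O'Brien"))  # False
```

Note that a legitimate name like O'Brien is rejected by this pattern. Nothing is forgotten, but the list still has to match how the field is really used.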

Another example is in output encoding to protect against HTML context cross-site scripting. I created a document a few years ago showing the different encoding methods in .NET (http://www.jardinesoftware.com/Documents/ASPNET_HTML_Encoding.pdf). Looking at this, there are five characters that are encoded using a black list built into .NET (<, >, ", &, '). This list defines what will get output encoded when using the HTMLEncode method. These are some of the most common characters used to perform cross-site scripting. What if a new character is found to be a problem? This method won’t cover it. With a white list we could say encode everything except for (a-z). Now if a new special character is determined to be a problem, it is already encoded for us.
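To make the difference concrete, here is a small Python sketch of both encoding approaches. It is not the .NET implementation: the function names are invented, the entity strings are the standard HTML ones rather than a claim about the exact .NET output, and treating upper-case letters as safe is my own assumption.

```python
# Black list approach: only these five characters are encoded.
BLACK_LIST_ENCODINGS = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
}

# White list approach: only these characters are left alone.
# Assumption for this sketch: A-Z is considered safe along with a-z.
SAFE_CHARACTERS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")


def encode_with_black_list(text):
    """Encode the five listed characters and pass everything else through."""
    return "".join(BLACK_LIST_ENCODINGS.get(ch, ch) for ch in text)


def encode_with_white_list(text):
    """Encode every character that is not on the safe list."""
    return "".join(
        ch if ch in SAFE_CHARACTERS else "&#{};".format(ord(ch)) for ch in text
    )


print(encode_with_black_list("<b>"))  # &lt;b&gt;
print(encode_with_white_list("<b>"))  # &#60;b&#62;

# A character that is on neither list, such as the equals sign:
print(encode_with_black_list("a=b"))  # a=b
print(encode_with_white_list("a=b"))  # a&#61;b
```

The equals sign stands in for a hypothetical newly discovered problem character. The black list version lets it through unchanged, while the white list version was already encoding it.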

EFFECT ON USER
You wouldn’t expect much effect on the users if all you are doing is saying what is and isn’t allowed based on the use of the data. However, let’s go back to the initial example that started all of this, the Twitter post. Setting up the black list was most likely fairly simple: here are some common problem items we see, let’s just prohibit them. Of course, then someone comes along on a unicycle, and while they probably shouldn’t be there, they are not in violation of the sign. So it appears to be a “Good Enough” solution that shouldn’t inhibit any valid users.

I posed the question of what the white list would look like. The first response I got back was “unassisted movement only” from a friend of mine, Tim Tomes.

Seems like a pretty good idea. I am not sure I would have thought of unassisted movement, but let’s dig a little deeper. What about a wheelchair or crutches?

The point here is that a white list that is too narrow can affect the ability of valid users to use the system. In this case, “unassisted movement only”, while a great first draft, would have prohibited anyone in a wheelchair from using the sidewalk. Because a white list prohibits anything that is not in the list, it must be scrutinized and tested much more to ensure that it is exactly what is needed. With a black list, a later control can continue to narrow down what gets through; with a white list, anything it blocks is gone and cannot be allowed back in later on.

CONCLUSION
I like both black lists and white lists and I believe they both have their place. It is important to analyze your situation to determine the best course of action. In some cases a black list will be exactly what you are looking for; in others the white list will be the right fit. We often get this feeling that we have to make blanket statements like “White lists are better so only use those.” Situations are different and the lists are different, and you want to use the one that best fits your needs. Take a moment to weigh the pros and cons of each and select the best fit.

Sensitive Data and Storage Issues

Do you know what constitutes sensitive data in your organization? How about in your state or industry? As developers or business analysts we often do not follow the nitty-gritty details of sensitive information regulations or laws. It is not that we don’t want to enforce them; I think we often just don’t know about them. It is often thought that the CIO, CISO or a privacy officer is responsible for understanding our data and the level to which it needs to be protected. I completely believe that these positions should understand the rules and regulations around privacy and what counts as sensitive data. This can be difficult, though, because there are multiple definitions depending on what state and what industry you are in.

When developing an application, do you give much thought to data storage and sensitive information beyond the user’s password? What is it that defines sensitive information for you and your organization? While it may not be a developer’s or business analyst’s main focus, it is important that everyone in the development lifecycle understand the data being processed and stored and any rules around it.

The first place to look in most organizations is probably your policies and procedures. Most likely there are data classification documents that describe what is sensitive data and how that data must be handled. If your organization doesn’t have this type of documentation, this is a good time to start thinking about it. Often, this documentation is created by a privacy team, the security team, or some other office outside of the development teams. While it is probably not your job to create the documentation, it is important that you know and understand it.

The second thing to do is to look at your local state regulations for your industry. Regulations or definitions may be different depending on whether you are in healthcare, PCI, or some other industry. Unfortunately, while many states have laws in place (usually around data breach notifications), they are not standardized across the states. This may change soon, as there is a movement in the government to create a nationwide breach notification law, which may make things a little easier and more consistent. Until then, we are stuck scouring the internet looking for these different laws.

Some examples of these laws are in New Jersey, with a bill that recently went into effect, and Florida, with the Florida Information Protection Act of 2014. The two are similar, yet have their differences. For example, while NJ calls out Driver’s License and State ID card, Florida also adds Passport, Military ID and other government documents used to verify identity. The Florida law also covers user name, password and secret questions and answers. The following is a quick summary of the data that can be considered sensitive:

New Jersey

  • First Name (or initial) and last name linked with one or more of the following:
    • Social Security Number
    • Driver’s License or State ID card Number
    • Address
    • Identifiable health information

Florida

  • First Name (or initial) and last name with one or more of the following:
    • Social Security Number
    • Driver’s License, Passport, Military ID, or other similar number on government document used to verify identity
    • Financial Account Number, Credit or Debit Card Number in combination with
      • Security Code
      • Access Code
      • Password
    • Medical history, mental or physical condition, medical treatment or diagnosis by a health care professional
    • Health insurance policy number or Subscriber ID and any unique identifier used by health insurer to identify individual
  • User name or email address in combination with a password or security question and answer that would permit access to online account

It is important to protect this sensitive information because many times it is what the attackers are after. Both of the laws above require that the information be rendered unusable (encrypted) to be protected. All too often we think only about the user’s password or possibly their social security number, but rarely are we thinking about some of this other information. When we know during design and development what data we use and how it needs to be protected then it is that much easier to do it right the first time.

Take the time to catalog all of the data elements your application uses and how you are protecting them (if needed). You can’t protect what you don’t know you have, so it is important to first inventory and then determine where the holes may be.
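One lightweight way to start such a catalog is a simple structure that records each data element, whether it is considered sensitive, and how it is protected. The sketch below is only an illustration: the `DataElement` class, the `find_holes` helper and the sample entries are all hypothetical, and what counts as sensitive has to come from your own policies and the laws that apply to you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    sensitive: bool
    protection: str  # e.g. "encrypted" or "hashed"; empty when nothing is applied


def find_holes(inventory):
    """Return the names of sensitive elements that have no protection recorded."""
    return [e.name for e in inventory if e.sensitive and not e.protection]


# Made-up inventory for a hypothetical application.
INVENTORY = [
    DataElement("password", sensitive=True, protection="hashed"),
    DataElement("social_security_number", sensitive=True, protection="encrypted"),
    DataElement("drivers_license_number", sensitive=True, protection=""),
    DataElement("security_question_answer", sensitive=True, protection=""),
    DataElement("favorite_color", sensitive=False, protection=""),
]

print(find_holes(INVENTORY))
# ['drivers_license_number', 'security_question_answer']
```

Even a list this small makes the gaps visible: the password and social security number were thought about, the other sensitive items were not.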

Episode 21 of the DevelopSec Podcast discusses this more if you want to take a listen.