Tag Archives: data

New Year’s Resolutions

Here we are, at the start of another year. As we reflect on 2017, this is where we really start to focus on what lies ahead in 2018. The new year is always interesting because it usually doesn’t affect our build cycles or releases, with the exception of accounting for vacations. Yet this is the time of year when many people get re-focused and motivated to change old habits or try something new.

As I look back on 2017, a lot of news headlines focused on security. Many of them highlighted breaches, several of which were termed “mega” breaches. The trend of hyped-up headlines glorifying monster breaches will likely continue through 2018 and beyond. We know that breaches can, or will, happen. We have seen examples of different techniques used to gain unauthorized access to data. This won’t change, and will most likely become more prevalent going forward. The amount of information available to potential attackers is enormous, making our job of application security that much more important.

One of the biggest lessons to take away from 2017 is that privacy is important. In addition, private data is not limited to PCI or HIPAA. All sorts of data can be considered private and require the custodian to take proper steps to protect it. It doesn’t matter if the data is held by a Fortune 500 company or a one-person shop. To someone, that data is worth something. As we look into 2018, this reminds us that we must understand what data we have. We must know what type of regulations it may fall under, what applications contain it, and how we are protecting it. Just because data may not fall under a regulation doesn’t mean it should be overlooked. In the end, it is the expectation of our customers and clients that we will handle their data responsibly.

Protecting this data is not about how much money you spend or what tools you buy. Every organization is different. Every application development team is different. I encourage everyone to take the time to research and understand what your team needs to be successful. As in the past, throughout the year I will be posting thoughts on different application security topics. If you have any questions or topics, feel free to share them with me. Looking for someone to talk to about application security? Reach out. I have services available to help organizations and individuals reach new heights and solve problems.

What are your New Year’s Resolutions when it comes to application security?

Tips for Securing Test Data (Scrubbing?)

An application typically has multiple environments from development through to full production. It is rare to find an application that doesn’t use some form of data. Some applications may use just a little data with a very simple database, while others may have very complex database schemas with a lot of data. Developers usually load just enough data to test the features/functions being implemented in the current iteration. Production systems contain actual customer information which may be very sensitive in nature. Finally, we have the test environments. These environments need to be fully functional, requiring lots of data, but where should the data come from?

It is common to see data from production copied into the test environments. Because many test systems have fewer security controls in place, this may inadvertently expose sensitive data. In addition to securing the environment, here are a few tips to help protect sensitive data when populating lower-level environments.

  • Don’t Use Production Data
  • Disassociate Sensitive Information
  • Remove Sensitive Information

Don’t Use Production Data
The safest solution is to not use actual production data in any other environment. As with any other security control, if you don’t have the information, you have less risk. While production data may be the most realistic indicator of how the system is used, it often comes with a high risk of exposure. Generating test data with scripts has benefits: the data is less likely to contain sensitive information, and it can make test automation more reliable. It is also possible to script in edge cases, or values that are less common in real data, to enable better test cases.
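As a rough illustration of scripted test data, here is a minimal Python sketch. The field names, name pools, and the `generate_customers` function are all hypothetical and not taken from any particular system. It leans on two conventions: Social Security numbers with area number 000 are never issued, and 555-01XX phone numbers are reserved for fictional use.

```python
import random

# Hypothetical name pools; none of these come from real customers.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Casey"]
LAST_NAMES = ["Tester", "Sample", "Example", "Placeholder"]

# Hand-written edge cases that real data rarely contains.
EDGE_CASES = [
    {"first_name": "", "last_name": "O'Neil-Smith",
     "tax_id": "000-00-9998", "phone": "904-555-0198"},
    {"first_name": "A" * 100, "last_name": "Åström",
     "tax_id": "000-00-9999", "phone": "904-555-0199"},
]


def generate_customers(count, seed=2018):
    """Return `count` made-up customer rows followed by the edge cases."""
    rng = random.Random(seed)  # a fixed seed keeps test runs repeatable
    rows = []
    for i in range(count):
        rows.append({
            "first_name": rng.choice(FIRST_NAMES),
            "last_name": rng.choice(LAST_NAMES),
            # Area number 000 is never issued as a real SSN.
            "tax_id": f"000-00-{i % 10000:04d}",
            # 555-01XX numbers are reserved for fictional use.
            "phone": f"904-555-01{i % 100:02d}",
        })
    # Copy the edge cases so callers cannot mutate the shared list.
    return rows + [dict(case) for case in EDGE_CASES]
```

Because the generator is seeded, the same rows come back on every run, which tends to make automated tests easier to debug.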

Disassociate Sensitive Information
If you have to (or decide to) use data from production, one option is to make sure sensitive data is disassociated. There are many ways to do this, depending on how your system works. Some places will just scramble the fields: the data is real, but the values in each column are re-arranged so that the data in any given row is no longer related. The following table shows the initial data (Note: This data is made up):

First Name   Last Name   Tax ID        Phone
John         Smith       333-33-3333   904-555-6588
Debra        Jones       111-11-1111   301-555-2395
Jason        Walker      999-99-9999   011-138-9443

The following table shows the same data from above, but it has been disassociated. Notice how the data is no longer related to any specific person. Keep in mind that the data here is a very small sample, so working out the combinations to recover the real data would not be that difficult. However, with a large dataset, it could be enough to help slow an attacker.

First Name   Last Name   Tax ID        Phone
John         Walker      111-11-1111   904-555-6588
Debra        Smith       999-99-9999   011-138-9443
Jason        Jones       333-33-3333   301-555-2395
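One way to sketch this kind of scrambling in Python is to shuffle each column independently. The `disassociate` function and the dictionary keys below are hypothetical, and this is only one possible approach, not a description of any specific tool.

```python
import random


def disassociate(rows, columns, seed=None):
    """Shuffle the values of each listed column independently across rows.

    Returns new rows; the input rows are left untouched.
    """
    rng = random.Random(seed)
    result = [dict(row) for row in rows]
    for column in columns:
        values = [row[column] for row in rows]
        rng.shuffle(values)  # each column gets its own shuffle
        for row, value in zip(result, values):
            row[column] = value
    return result


customers = [
    {"first_name": "John", "last_name": "Smith",
     "tax_id": "333-33-3333", "phone": "904-555-6588"},
    {"first_name": "Debra", "last_name": "Jones",
     "tax_id": "111-11-1111", "phone": "301-555-2395"},
    {"first_name": "Jason", "last_name": "Walker",
     "tax_id": "999-99-9999", "phone": "011-138-9443"},
]
scrambled = disassociate(customers, ["last_name", "tax_id", "phone"])
```

A random shuffle can leave some values in their original row, especially in a small sample like this one, so this is a way to slow an attacker rather than a guarantee of anonymity.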

Depending on the features of the system, this may not be ideal. Imagine that the system actually sends emails or ships items. Of course you have disabled these features so they don’t actually function in test, right? Either way, data like phone numbers, addresses, and email addresses could lead to an incident if those features were executed against this scrambled data. Customers would be confused if they received a notification that had some of their information but also someone else’s information. That is a headache you don’t want to deal with. On another note, things like email addresses can be self-identifying all by themselves. This might be information that should be removed or further mangled to protect your users’ identities.

Remove Sensitive Information
One option is to remove any data considered to be sensitive. It is important to check the corporate guidelines or data classifications for the specific requirements for sensitive information. The sensitive data could be replaced with generic placeholder data. For example, replace all phone numbers with (999)999-9999 or emails with test@sometestexample.com.
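A minimal Python sketch of that replacement, using the placeholder values mentioned above, might look like the following. The `mask_sensitive` function and the `phone` and `email` field names are hypothetical; which fields count as sensitive depends on your own data classification.

```python
# Placeholder values from the example above, keyed by hypothetical field names.
PLACEHOLDERS = {
    "phone": "(999)999-9999",
    "email": "test@sometestexample.com",
}


def mask_sensitive(rows, placeholders=PLACEHOLDERS):
    """Return copies of the rows with sensitive fields overwritten."""
    return [
        {field: placeholders.get(field, value) for field, value in row.items()}
        for row in rows
    ]


masked = mask_sensitive([
    {"name": "Test User", "phone": "301-555-2395", "email": "someone@example.com"},
])
```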

It can be more difficult when a sensitive field is used as a search field or a unique identifier. If that phone number is used as a search field, setting all phone numbers to the same value won’t work well in the test environment because you can’t really test the search feature. It would return all or nothing, which would not be a desired test case.
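One possible workaround, sketched below in Python, is to give every row its own fake value so that a search still returns a single match. The `unique_fake_phone` and `replace_phones` names and the (999) numbering scheme are assumptions made for illustration only.

```python
def unique_fake_phone(index):
    """Build a distinct placeholder phone number from a row index."""
    if not 0 <= index < 10_000_000:
        raise ValueError("index must fit in seven digits")
    # Split the index into a 3-digit and a 4-digit group.
    return f"(999){index // 10000:03d}-{index % 10000:04d}"


def replace_phones(rows):
    """Return copies of the rows, each with its own fake phone number."""
    return [
        {**row, "phone": unique_fake_phone(i)}
        for i, row in enumerate(rows)
    ]
```

Since each index maps to exactly one number, the search feature can be tested against known values without any real phone numbers in the test database.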

Check with your internal security office to understand the policies and procedures that are in place regarding production data. If no policies exist, work with the security team to help define them. By working together it is possible to understand the risks and hopefully reduce them. Determine a procedure that will work in your situation.

Sensitive Data and Storage Issues

Do you know what constitutes sensitive data in your organization? How about in your state or industry? As developers or business analysts, we often do not follow the nitty-gritty details of sensitive information regulations or laws. Not that we don’t want to enforce them, but I think we often just don’t know about them. It is often thought that the CIO, CISO, or a privacy officer is responsible for understanding our data and to what level it needs to be protected. I completely believe that these positions should understand the rules and regulations around privacy and what counts as sensitive data, although this can be difficult because there are multiple definitions depending on what state and what industry you are in.

When developing an application, do you give much thought to data storage and sensitive information beyond the user’s password? What is it that defines sensitive information for you and your organization? While it may not be a developer’s or business analyst’s main focus, it is important that everyone in the development lifecycle understand the data being processed and stored and any rules around it.

The first place to look in most organizations is probably your policies and procedures. Most likely there are data classification documents that describe what is sensitive data and how that data must be handled. If your organization doesn’t have this type of documentation, this is a good time to start thinking about it. Often, this documentation is created by a privacy team, the security team, or some other office outside of the development teams. While it is probably not your job to create the documentation, it is important that you know and understand it.

The second thing to do is to look at your local state regulations for your industry. Regulations or definitions may be different depending on whether you are in healthcare, PCI, or some other industry. Unfortunately, many states have laws in place (usually around data breach notifications), but they are not standardized across the states. This may change soon, as there is a movement in the government to create a nationwide breach notification law, which may make things a little easier and more consistent. Until then, we are stuck scouring the internet looking for these different laws.

Some examples of these laws are in New Jersey, with this bill that recently went into effect, and Florida with the Florida Information Protection Act of 2014. Both of these are similar, yet have their differences. For example, while NJ calls out Driver’s license and State ID card, Florida also adds Passport, Military ID and other government documents used to verify identity. The Florida law also discusses Username, password and secret questions and answers. The following shows a quick summary of the data that can be considered sensitive:

New Jersey

  • First Name (or initial) and last name linked with one or more of the following:
    • Social Security Number
    • Driver’s License or State ID card Number
    • Address
    • Identifiable health information

Florida

  • First Name (or initial) and last name with one or more of the following:
    • Social Security Number
    • Driver’s License, Passport, Military ID, or other similar number on government document used to verify identity
    • Financial Account Number, Credit or Debit Card Number in combination with
      • Security Code
      • Access Code
      • Password
    • Medical history, mental or physical condition, medical treatment or diagnosis by a health care professional
    • Health insurance policy number or Subscriber ID and any unique identifier used by health insurer to identify individual
  • User name or email address in combination with a password or security question and answer that would permit access to online account

It is important to protect this sensitive information because many times it is what the attackers are after. Both of the laws above require that the information be rendered unusable (encrypted) to be protected. All too often we think only about the user’s password or possibly their Social Security number, but rarely do we think about some of this other information. When we know during design and development what data we use and how it needs to be protected, it is that much easier to do it right the first time.

Take the time to catalog all of the data elements your application uses and how you are protecting them (if needed). You can’t protect what you don’t know you have, so it is important to first inventory and then determine where the holes may be.
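A catalog like this does not need special tooling to get started. The Python sketch below is one hypothetical way to record data elements and flag the holes; the `DataElement` fields, the sample inventory, and the `find_gaps` function are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataElement:
    name: str                         # e.g. "tax_id"
    location: str                     # application or table that stores it
    sensitive: bool                   # per your data classification
    protection: Optional[str] = None  # None means no protection recorded


def find_gaps(inventory: List[DataElement]) -> List[DataElement]:
    """Return sensitive elements that have no protection recorded."""
    return [e for e in inventory if e.sensitive and not e.protection]


# Hypothetical inventory for illustration only.
INVENTORY = [
    DataElement("password", "accounts", True, "salted hash"),
    DataElement("tax_id", "customers", True),
    DataElement("security_answer", "accounts", True),
    DataElement("favorite_color", "profiles", False),
]

for element in find_gaps(INVENTORY):
    print(f"Unprotected sensitive data: {element.location}.{element.name}")
```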

Episode 21 of the DevelopSec Podcast discusses this more if you want to take a listen.