An application typically has multiple environments from development through to full production. It is rare to find an application that doesn’t use some form of data. Some applications may use just a little data with a very simple database, while others may have very complex database schemas with a lot of data. Developers usually load just enough data to test the features/functions being implemented in the current iteration. Production systems contain actual customer information which may be very sensitive in nature. Finally, we have the test environments. These environments need to be fully functional, requiring lots of data, but where should the data come from?
In many cases it is common to see data from production copied into the test environments. Due to many test systems having less security controls in place, this may inadvertently expose sensitive data. In addition to securing the environment, here are a few tips to help protect the sensitive data when trying to populate lower level environments.
- Don’t Use Production Data
- Disassociate Sensitive Information
- Remove Sensitive Information
Don’t Use Production Data
The safest solution is to not use actual production data in any other environments. Like any other security control, if you don’t have the information you have less risk. While this data may be most realistic to indicate how the system is used, it often comes with a high risk exposure. There are benefits to using scripts to generate test data because it is less likely to contain sensitive information and it can be easier to make test automation more successful. It is also possible to script in values that may be edge cases or less common in real data that can help enable better test cases.
Disassociate Sensitive Information
If you have (or decide) to use data from production one option is to make sure sensitive data is disassociated. There are many ways to do this, depending on how your system works. Some places will just scramble the fields so the data is real, but the different columns are re-arranged so that data for any given row is actually not related. The following table shows the initial data (Note: This data is made up):
First Name | Last Name | Tax ID | Phone |
---|---|---|---|
John | Smith | 333-33-3333 | 904-555-6588 |
Debra | Jones | 111-11-1111 | 301-555-2395 |
Jason | Walker | 999-99-9999 | 011-138-9443 |
The following table shows the same data from above, but it has been disassociated. Notice how the data is no longer related to any specific person. Keep in mind that the data here is a very small sample so the combinations to get the real data would not be that difficult. However with a large dataset, it could be enough to help slow an attacker.
First Name | Last Name | Tax ID | Phone |
---|---|---|---|
John | Walker | 111-11-1111 | 904-555-6588 |
Debra | Smith | 999-99-9999 | 011-138-9443 |
Jason | Jones | 333-33-3333 | 301-555-2395 |
Depending on features of the system, this may not be ideal. Imagine that the system actually sends emails or ships items. Of course you have disabled these features so they don’t actually function in test, right? Either way, data like phone numbers, addresses, email addresses, etc could lead to an incident if executed against this random data. Customers would be confused if they received a notification that had some of their information but also someone else’s information. A headache you don’t want to deal with. On another note, things like email addresses can be self identifying all by themselves. This might be information that should be removed or further mangled as to protect your user identities.
Remove the Sensitive Data
One option is to remove any data considered to be sensitive. It is important to check with the corporate guidelines or data classifications for the specific requirements for sensitive information. The sensitive data could be replace by just place holder generic data. For example, replace all phone numbers with (999)999-9999 or emails with test@sometestexample.com.
It can be more difficult when a sensitive field is used as a search field or a unique identifier. If that phone number is used as a search field, setting all phone numbers to the same value won’t work well in the test environment because you can’t really test the search feature. It would return all or nothing, which would not be a desired test case.
Check with your internal security office to understand the policies and procedures that are in place regarding production data. If no policies exist, work with the security team to help define them. By working together it is possible to understand the risks and hopefully reduce them. Determine a procedure that will work in your situation.
Leave a Reply
You must be logged in to post a comment.