In the application security community it is common to talk about untrusted data. Talk about any type of injection attack (SQLi, XSS, XXE, etc) and one of the first terms mentions is untrusted data. In some cases it is also known as user data. While we hear the phrase all the time, are we sure everyone understands what it means? What is untrusted data? It is important that anyone associated with creating and testing applications understand the concept of untrusted data.
Unfortunately, it can get a little fuzzy as you start getting into different situations. Lets start with the most simplistic definition:
Untrusted data is data that is controlled by the user.
While this is very simple, it is often confusing to many. When we say controlled by the user, what does that really mean? For some, they stop at data in text boxes or drop down lists, overlooking hidden fields, request headers (cookies, referer, user agent, etc), etc. And that is just on the client side.
From the client perspective, we need to include any data that could be manipulated before it gets to the server. That includes cookie values, hidden form fields, user agent, referer field, and any other item that is available. A common mistake is to assume if there is no input box on the page for the data, it cannot be manipulated. Thanks to browser plugins and web proxies, that is far from the truth. All of that data is vulnerable to manipulation.
What about on the server side? Can we assume everything on the server is trusted? First, lets think about the resources we have server-side. There are configuration files, file system objects, databases, web services, etc. What would you trust out of these systems?
It is typical to trust configuration files stored on the file system of the web server. Think your web.xml or web.config files. These are typically deployed with the application files and are not easy to update. Access to those files in production should be very limited and it would not be easy to open them up to others for manipulation. What about data from the database? Often I hear people trusting the database. This is a dangerous option. Lets take an example.
You have a web application that uses a database to store its information. The web application does a good job of input validation (type, length, etc) for any data that can be stored in the database. Since the web application does good input validation, it lacks on output encoding because it assumes the data in the database is good. Today, maybe no other apps write to that database. Maybe the only way to get data into that database is either via SQL Script run by a DBA or through the app with good input validation. Even here, there are weaknesses. What if the input validation misses an attack payload. Sure the validation is good, but does it catch everything? What if a rogue DBA manipulates a script to put malicious data into the database? Could happen.
Now, think about the future, when the application owner requests a mobile application to use the same database, or they decide to create a user interface for data that previously was not available for update in the application previously. Now, that data that was thought to be safe (even though it probably wasn’t) is now even less trusted. The mobile application or other interfaces may not be as stringent as thought.
The above example has been seen in real applications numerous times. It is a good example of how what we might think at first is trusted data really isn’t.
Web services are similar to the database. Even internal web services should be considered untrusted when it comes to the validity of the data being returned. We don’t know how the data on the other side of that service is being handled, so how can we trust it?
When working with data, take the time to think about where that data is coming from and whether or not you are willing to take the risk of trusting it. Think beyond what you can see. It is more than just fields on the form that users can manipulate. It is more than just the client-side data, even though we use terms like user controlled data. There is a lot of talk about threat modeling in application security, and it is a great way to help identify these trust boundaries. Using a data flow model and showing the separation of trust can make it much easier to understand what data we trust and what we don’t. At the bottom of this article is a very basic, high level, rough draft data flow diagram that shows basic trust boundaries for some of the elements I mentioned above. This is just an example and is not indicative of an actual system.
When it comes to output encoding to protect against cross site scripting or proper encoding for SQL, LDAP, or OS calls, the safest approach is to just perform the encoding. While you may trust a value from the web.config it doesn’t mean you can’t properly do output encoding to protect from XSS. From a pure code perspective, that is the most secure code. Assuming the data is safe, while it may be, does increase risk to some level.
If you are new to application security or training others, make sure they really understand what is meant by untrusted data. Go deeper than what is on the surface. Ask questions. Thank about different scenarios. Use stories from past experience. Once the concept is better understood it makes triaging flaws from the different automated tools much easier and should reduce number of bugs going forward.
Example Data Flow Diagram Showing Trust Boundaries
(This is only an example for demonstration purposes.)