Attackers take advantage of an application by manipulating the inputs to the system. For example, a first name field or even a request header like the user-agent. Applications wouldn’t be very useful if they didn’t accept any input from the end user. Unfortunately, this is the key attack vector. One of the basic techniques used to help protect a system is to us input validation, which assesses the input to determine if it is should be accepted.
Many development groups have fought with the concept of input validation in regards to how much should be done. Get to extreme with it and you may never implement it at all. It is important to understand what your goals are for the input validation, goals that should align with what the application does. Make the validation to complex and it is too easy to introduce errors. Start out with a simple plan, don’t go for the gusto out of the start gate. Here are some tips for starting out with input validation.
One of the easiest checks to implement is checking the data type of the input field. If you are expecting a date data type and the input received is not a valid date then it is invalid and should be rejected. Numbers are also fairly simple to validate to ensure they match the type expected. Many of the most popular frameworks provide methods to determine if a string matches a specific data type. Use these features to validate this type of data. Unfortunately, when the data type is a free form string then the check isn’t as useful. This is why we have other checks that we will perform.
You have determined that the input is of the correct type, say a number or a date. Now it is time to validate that it falls within the correct range of acceptable values. Imagine that you have an age field on a form. Depending on your business cases, lets say that a valid age is 0-150. It is a simple step to ensure that the value entered falls into this valid range. Another example is a date field. What if the date represents a signed date and can’t be in the future. In this case, it is important to ensure that the value entered is not later than the current day.
This is really more for the free form text fields to validate that the input doesn’t exceed the expected length for the field. In many cases, this lines up with the database or back end storage. For example, in SQL Server you may specify a first name field that is 100 characters or a state field that only allows two characters for the abbreviation. Limiting down the length of the input can significantly inhibit some of the attacks that are out there. Two characters can be tough to exploit, while possible depending on the application design.
White lists can be a very effective control during input validation. Depending on the type of data that is being accepted it may be possible to indicate that only alphabetical characters are acceptable. In other cases you can open the white list up to other special characters but then deny everything else. It is important to remember that even white lists can allow malicious input depending on the values required to be accepted.
Black lists are another control that can be used when there are specifically known bad characters that shouldn’t be allowed. Like white lists, this doesn’t mean that attack payloads can’t get through, but it should make them more difficult.
Regular expressions help with white and black lists as well as pattern matching. With the former it is just a matter of defining what the acceptable characters are. Pattern matching is a little different and really aligns more with the range item discussed earlier. Pattern matching is great for free form fields that have a specific pattern. A social security number or an email address. These fields have an exact format that can be matched, allowing you to determine if it is right or not.
Test Your Validation
Make sure that you test the validation routines to ensure they are working as expected. If the field shouldn’t allow negative numbers, make sure that it rejects negative numbers. If the field should only allow email addresses then ensure that it rejects any pattern that doesn’t match a valid email address.
Validate Server Side
Maybe this should have been first? Client side validation is a great feature for the end user. They get immediate feedback to what is wrong with their data without having that round trip to the server and back. THAT IS EASILY BYPASSED!! It is imperative that the validation is also checked on the server. It is too easy to bypass client side validation so this must be done at the server where the user cannot intercept it and bypass it.
Don’t Try To Solve All Problems
Don’t try to solve all output issues with input validation. It is common for someone to try and block cross site scripting with input validation, it is what WAFs do right? But is that the goal? What if that input isn’t even sent to a browser? Are you also going to try and block SQL Injection, LDAP Injection, Command Injection, XML Injection, Parameter Manipulation, etc. with input validation? Now we are getting back to an overly complex solution when there are other solutions for these issues. These types of items shouldn’t be ignored and that is why we have the regular expressions and white lists to help decrease the chance that these payloads make it into the system. We also have output encoding and parameterized queries that help with these additional issues. Don’t spend all of your time focusing on input validation and forget about where this data is going and protecting it there.
Input validation is only half of the solution, the other half is implemented when the data is transmitted from the application to another system (ie. database, web browser, command shell). Just like we don’t want to rely solely on output encoding, we don’t want to do that with input validation. Assess the application and its needs to determine what solution is going to provide the best results. If you are not doing any input validation, start as soon as possible. It is your first line of defense. Start small with the Type/Range/Length checks. When those are working as expected then you can start working into the more advanced input validation techniques. Make sure the output encoding routines are also inplace.
Don’t overwhelm yourself with validating every possible attack. This is one cog in the system. There should also be other controls like monitoring and auditing that catch things as well. Implementing something for input validation is better than nothing, but don’t let it be a false sense of security. Just because you check the length of a string and limit it to 25 characters, don’t think a payload can’t be sent and exploited. It is, however, raising the bar (albeit just a little).
Take the time to assess what you are doing for input validation and work with the business to determine what valid inputs are for the different fields. You can’t do fine validation if you don’t understand the data being received. Don’t forget that data comes in many different forms or encodings. Convert all data to a specific encoding before you perform any validation on it.