Input Validation – “All input is evil” – CS2 – In progress


Summary:

Any input that comes into a program from an external source – such as a user typing at a keyboard or a network connection – can be a source of security problems and disastrous bugs. All input should be treated as potentially dangerous.

Description:

All interesting software packages rely upon external input. Although information typed at a computer might be the most familiar, networks and external devices can also send data to a program. Generally, this data will be of a specific type: for example, a user interface that requests a person’s name might be written to expect a series of alphabetic characters. If data of the correct type and form is provided, the program might work fine. However, if programs are not carefully written, attackers can construct inputs that cause malicious code to be executed.

Risk – How can it happen?

Any data that can enter your program from an external source can be a potential source of problems. If external data is not checked to verify that it has the right type of information, the right amount of information, and the right structure of information, it can cause problems.

Input validation errors can lead to buffer overflows if the data being provided is used as an index into an array. Input that is used as the basis for a database search can be used as the basis for SQL injections, which use carefully constructed inputs to make relational databases reveal data inappropriately or even destroy data.
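
To make the SQL injection risk concrete, here is a minimal sketch that uses no real database. The function names, the table name users, and the column name are all hypothetical, chosen only for illustration. It shows how pasting input directly into query text lets the input change the meaning of the query, and one possible check that would reject such input.

```cpp
#include <cctype>
#include <string>

// Hypothetical query builder: pastes the input directly into the SQL text.
// The table and column names are made up for this illustration.
std::string buildQueryUnsafe(const std::string& name) {
    return "SELECT * FROM users WHERE name = '" + name + "'";
}

// One possible check: accept only non-empty, purely alphabetic names.
// (Assumption: real names may need a more generous rule than this.)
bool isPlausibleName(const std::string& name) {
    if (name.empty()) return false;
    for (unsigned char ch : name) {
        if (!std::isalpha(ch)) return false;
    }
    return true;
}
```

Given the input x' OR '1'='1, buildQueryUnsafe produces SELECT * FROM users WHERE name = 'x' OR '1'='1', a condition that is true for every row. isPlausibleName rejects that input because it contains quotes and spaces.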

Example of occurrence:

A Norwegian woman mistyped her account number on an internet banking system. Instead of typing her 11-digit account number, she accidentally typed an extra digit, for a total of 12 digits. The system discarded the extra digit, and transferred $100,000 to the (incorrect) account given by the 11 remaining digits. A simple dialog box informing her that she had typed too many digits would have gone a long way towards avoiding this expensive error.

Olsen, Kai. “The $100,000 Keying error” IEEE Computer, August 2008

Example in Code:

This program stores the squares of the numbers from one to ten in an array, and then asks the user to type a number. The square of that number will then be returned:

#include <iostream>
using namespace std;
int main()
{
  unsigned int vals[10];
  size_t which;
  unsigned int square;

  for (size_t i = 0; i < 10; i++ )
    vals[i] = (i+1)*(i+1);

  cout << "Please type a number: ";
  cin >> which;

  square = vals[which-1];
  cout << "The square of " << which << " is " << square << endl;
  return 0;
}

This program has two input validation problems. The first comes with the use of cin to read an integer from the console. If the user types a whole number, this will work just fine. However, if the user types something that is not a whole number, the program may not work correctly. Floating point values are cut off at the decimal point (3.2 is read as 3, with the rest left unread), and non-numeric input causes the read to fail, leaving which without a meaningful value. A robust program would catch this error, provide a clear and appropriate error message, and ask the person to re-type their input.

The second problem occurs when the array is accessed. Even if the user provides an appropriate integer, the value may be out of the range of the array. An array containing 10 elements can only be accessed by indices 0,1,...,9. Thus, the only values of which that will work correctly are 1,2,...,10. Any value outside of this range will lead to an attempt to access a value outside the range of the array. This includes 0: because which is unsigned, which-1 wraps around to a very large number. In C and C++ programs, this can lead to unpredictable results, ranging from incorrect answers to crashed programs, and to buffer overflows that might be exploited by malicious software.
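
One way to address both problems is to read the input as a string and check it before using it as an index. The sketch below is an illustration, not the only possible fix. The function name parseChoice and the three-character length limit are assumptions made for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Parses `text` as a whole number between 1 and `maxValue`.
// Returns true and stores the number in `out` only if every check passes.
// Assumption: inputs longer than three characters are rejected outright,
// which also keeps the arithmetic below from overflowing.
bool parseChoice(const std::string& text, std::size_t maxValue, std::size_t& out) {
    if (text.empty() || text.size() > 3) return false;   // presence and length
    std::size_t value = 0;
    for (unsigned char ch : text) {
        if (!std::isdigit(ch)) return false;             // type: digits only
        value = value * 10 + (ch - '0');
    }
    if (value < 1 || value > maxValue) return false;     // range
    out = value;
    return true;
}
```

In the program above, cin >> which could be replaced by reading a whole line with getline(cin, line) and indexing vals only when parseChoice(line, 10, which) returns true. Note that this sketch also rejects input with leading or trailing spaces; a friendlier version might trim them first.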

How can I properly validate input?

Check your input: The basic rule for input validation is to check that input data matches all of the constraints that it must meet to be used correctly in the given circumstance. In many cases, this can be very difficult: confirming that a set of digits is, in fact, a telephone number may require consideration of the many differing phone number formats used by countries around the world. Some of the checks that you might want to use include:

  • Type: Input data should be of the right type. Names should generally be alphabetic, numbers numeric. Punctuation and other uncommon characters are particularly troubling, as they can often be used to form the basis of code-injection attacks. Many programs will handle input data by assuming that all input is of string form, verifying that the string contains appropriate characters, and then converting the string into the desired data type.
  • Range: Verify that numbers are within a range of possible values: For example, the month of a person's date of birth should lie between 1 and 12. Another common range check involves values that may lead to division by zero errors.
  • Plausibility: Check that values make sense: a person's age shouldn't be less than 0 or more than 150.
  • Presence check: Guarantee presence of important data – the omission of important data can be seen as an input validation error.
  • Length: Input that is either too long or too short will not be legitimate. Phone numbers generally don't have 39 digits; Social Security Numbers have exactly 9 digits.
  • Format: Dates, credit card numbers, and other data types have limitations on the number of digits and any other characters used for separation. For example, dates are usually specified by 2 digits for the month, one or two for the day, and either two or four for the year.
  • Checksums: Identification numbers, such as bank account numbers, often have check digits: additional digits included at the end of a number to provide a verifiability check. The check digit is determined by a calculation based on the remaining digits – if the check digit does not match the result of the calculation, either the ID is bad or the check digit is bad. In either case, the number should be rejected as invalid.
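
Several of the checks in this list can be sketched as small C++ functions. The function names below are hypothetical. The checksum example uses the Luhn algorithm, one widely used check-digit scheme; which scheme a particular kind of identification number actually uses is an assumption you would need to confirm.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Type check: true if `s` is non-empty and contains only decimal digits.
bool isAllDigits(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char ch : s) {
        if (!std::isdigit(ch)) return false;
    }
    return true;
}

// Length check: true if `s` is a string of exactly `n` digits.
bool hasExactDigits(const std::string& s, std::size_t n) {
    return s.size() == n && isAllDigits(s);
}

// Range check: true if low <= value <= high.
bool inRange(int value, int low, int high) {
    return value >= low && value <= high;
}

// Checksum: Luhn check over a digit string whose last digit is the check digit.
bool luhnValid(const std::string& s) {
    if (!isAllDigits(s)) return false;
    int sum = 0;
    bool doubleIt = false;               // the check digit itself is not doubled
    for (std::size_t i = s.size(); i-- > 0; ) {
        int d = s[i] - '0';
        if (doubleIt) {
            d *= 2;
            if (d > 9) d -= 9;           // same as adding the two digits of d
        }
        sum += d;
        doubleIt = !doubleIt;            // double every second digit from the right
    }
    return sum % 10 == 0;
}
```

For example, hasExactDigits(ssn, 9) covers the length rule for Social Security Numbers, and inRange(month, 1, 12) covers the month rule. Passing a checksum only shows that the number is well formed, not that it belongs to the intended person or account.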

Use appropriate language tools: The safety of tools that read user input varies across programming languages and systems. Some languages, such as C and C++, have library calls that read user input into a character buffer without checking the bounds of that buffer, causing both a buffer overflow and an input validation problem. Alternative libraries specifically designed with security in mind are often more robust.

The choice of programming language can play a role in the potential severity of input validation vulnerabilities. As statically typed languages, Java and C++ require that the type of data stored in a variable be known ahead of time. This requirement leads to the type mismatch problem when, for example, a string such as “abcd” is typed in response to a request for an integer. Dynamically typed languages such as Perl and Ruby do not have any such requirement: any variable can store any type of value. Of course, these languages do not eliminate validation problems – you may still run into trouble if you use a string to retrieve an item from an integer-indexed array. Some languages provide additional help in the form of built-in procedures that can be used to remove potentially damaging characters from input strings.

Recover Appropriately: A robust program will respond to invalid input in a manner that is appropriate, correct, and secure. For user input, this will often mean providing an informative error message and requesting re-entry of the data. Invalid input from other sources – such as a network connection – may require alternate measures. Arbitrary decisions such as truncating or otherwise reformatting data to “make it fit” should be avoided.

Program 1

#include <iostream>
#include <string>
using namespace std;
void getNames(string[], size_t);
size_t getWhich();
string getName(size_t, string[], size_t);
const size_t SIZE = 10;
int main()
{
  size_t which;
  string names[SIZE];
  getNames(names,SIZE);

  which = getWhich();
  string aName = getName(which,names,SIZE);
  cout << "You chose name: "  << aName << endl;

  return 0;
}
// Reads sz names from the console into the array.
void getNames(string names[],size_t sz)
{
  for (size_t i = 0; i < sz; i++ )
  {
    cout << "type name # " << i+1 <<": ";
    cin >> names[i];
  }
}
// Asks the user which name (counting from 1) to retrieve.
size_t getWhich()
{
  size_t x;
  cout << "Which name: ";
  cin >> x;
  return x;
}
// Returns the n-th name (counting from 1), or "" if n is out of range.
string getName(size_t n,string vals[],size_t sz)
{
    if (n >=1 && n <= sz)
      return vals[n-1];
    else
      return "";
}

Lab Questions:

  1. Complete the security checklist for this program. Submit marked program and completed checklist.
  2. List the potential input validation errors.
  3. Provide example inputs that might cause validation problems, and describe the problems that they might cause.
  4. What happens if you type non-numeric characters for either the number of names or which name you wanted to retrieve?
  5. Revise the program* to properly validate input and gracefully recover from errors.

*Copying and pasting programs may result in syntax errors and other inconsistencies. It is recommended you type each program.

Program 2 (optional - check with your instructor if you need to complete this program)

You’re writing a program that will be used to submit bids on items from an online auction site. Each item is available in multiple lots – for example, there might be 100 boxes of crayons available. Your program must ask users for two important pieces of information:

  1. The price that they are willing to pay for the item, given in dollars and cents.
  2. The quantity of that item that they want to bid on. This quantity must be at least one, and it must be a whole number – it’s not possible to buy fractional parts of an item.

Your program should validate the user input for both quantities. To do so, take the following steps:

    • Price of items
      1. Ask the user to type the price of the item.
      2. Read this value into a string.
      3. Write a routine that will examine the string to verify that it contains a number that can be a legal amount of money. This string must contain:
        • Some number of digits (possibly zero) – the number of dollars
        • An optional decimal point
        • Some number of digits (possibly zero) – the number of cents
        • At least one digit in total
        Thus, “12.34”, “12”, and “.34” are all valid prices, but “12,34” and “.” are not. To examine all of the characters in the string, you can use the s.at(i) method from the string class. Check each character in the string to see that it is either a decimal point or a digit. You should also check to make sure that there are not multiple decimal points. This routine should return a bool value that is true if the string is a legitimate price and false otherwise.
      4. If the value provided is not legitimate, print an error message.
      5. If the value is legitimate, convert it into a float (to store the value). You will need to write a routine to do this. Hint: convert the digits to an integer by looping through the string. Examine each character, converting each digit to an integer as follows (once you’ve verified that it’s a digit): int c = s.at(i) - '0'; Keep track of the number of digits past the decimal point, and use that number to divide the integer into dollars and cents. Thus “123.45” becomes 12345 divided by 100.
    • Number of items: This will be similar to the price, but you must simply check for whole numbers – no decimals allowed. The quantity must be greater than or equal to one. You will convert the result to an int – not a float – once again, using a routine that you will write.
    • An interactive loop: put these two checks in a loop that will ask for both values, check both to see if they are valid, and then repeat requests for any values that are not valid. The easiest way to do this is probably to have two boolean values, one for each input quantity. These values will represent the validity of the input quantities. You will stay in an input loop as long as at least one of them is false. When this is the case, you will then check the values to see which is false (i.e., invalid). If a value is not valid, you will repeat the prompt and validate the response. This will continue until both are valid.

Security Checklist

Vulnerability: Input Validation    Course: CS2

Task - Check each line of code    Completed
1. Mark each variable that receives external input with a V
For each statement that is marked with a V, verify that the variable is checked for each of these criteria. Note any that is not checked for:
  1. Length
  2. Range (reasonableness?)
  3. Format
  4. Type
Shaded areas indicate vulnerabilities!

Discussion Questions:

  1. You're writing a program that asks the user to type in a telephone number. How might you validate that the characters that they've typed represent a legal telephone number? You should assume that you're only concerned about phone numbers from the US, but you want to give users as much flexibility as possible, in terms of spaces and punctuation characters. List some rules that you might use. Make sure that you complete this question before moving on to question #2.
  2. Find an example of a phone number that doesn't fit your rules.
  3. Describe an example of an input validation problem that you may have encountered. If you can't remember having any sort of problem, try some web pages or other software tools – try to find a system that fails to validate input data correctly.

Further Work (optional - check with your instructor if you need to answer the following questions)

  1. If input is sufficiently cryptic, it might be hard to provide useful error messages in response to invalid input. Describe some strategies that might be used to help users recover from invalid input.
  2. Revisit Program 2. Are there any inputs that the above description accepts as valid that perhaps should be considered invalid? If so, what are they and how might you handle them?





Copyright © Towson University