Input Validation – “All Input is Evil” – CS2


Background

Summary:

Any input that comes into a program from an external source – such as a user typing at a keyboard or a network connection – can be the source of security concerns and disastrous bugs. All input should be treated as potentially dangerous.

Description:

Most software packages rely on external input, whether from the keyboard, the network, or other external sources. Generally, this input will be of a specific type: for example, a user interface that requests a person’s name expects a series of alphabetic characters. If programs do not check that their input actually has the expected form, attackers can construct inputs that cause malicious code to be executed.

Risk – How Can It Happen?

All input data is a potential source of problems. If input is not checked to verify that it has the correct type, format, and length, it can cause the program to fail or misbehave. Failure to validate input can lead to serious security risks such as integer errors, buffer overflows, and SQL injection, among others.

Fat Finger
Drawing used by permission of Dominik Joswig

Example of Occurrence:

A Norwegian woman mistyped her account number on an internet banking system. Instead of typing her 11-digit account number, she accidentally typed an extra digit, for a total of 12 digits. The system discarded the extra digit and transferred $100,000 to the (incorrect) account given by the 11 remaining digits. A simple dialog box informing her that she had typed too many digits might have prevented this expensive error. Olsen, Kai. “The $100,000 Keying error” IEEE Computer, August 2008

Example in Code:

This program returns the square of an input number from a stored array.

import java.util.Scanner;
public class InputValidationExample {

  public static void main(String[] args) {
    int[] vals = new int[10];

    for (int i = 0; i < 10; i++) {
      vals[i] = (i+1)*(i+1);
    }

    System.out.print("Please type a number: ");
    Scanner sc = new Scanner(System.in);
    int which = sc.nextInt();

    int square = vals[which-1];
    System.out.println("The square of "+which+" is "+square);
  }
}

This program has two input validation problems. The first occurs if the user types a non-integer value. In Java, this causes Scanner.nextInt() to throw an InputMismatchException. The second occurs if the user enters a value that does not lie between 1 and 10. Because the program uses which-1 as the array index, any other value leads to an ArrayIndexOutOfBoundsException. A robust program would catch these errors, provide a clear and appropriate error message, and ask the user to re-type their input.
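One way a more robust version might look is sketched below. It is only one possible fix: it assumes the program should keep prompting until it receives a whole number from 1 to 10. The class name ValidatedSquareLookup and the helper readIntInRange are names invented for this illustration. The sketch reads a whole line as a string and converts it with Integer.parseInt, so a bad entry cannot linger in the Scanner and be re-read on the next attempt.

```java
import java.util.Scanner;

// Illustrative sketch only; class and method names are hypothetical.
class ValidatedSquareLookup {

  // Keeps asking until the user types a whole number between min and max (inclusive).
  static int readIntInRange(Scanner sc, String prompt, int min, int max) {
    while (true) {
      System.out.print(prompt);
      if (!sc.hasNextLine()) {
        // No more input at all (for example, end of file): give up cleanly.
        throw new IllegalStateException("Input ended before a valid number was entered");
      }
      String line = sc.nextLine().trim();
      try {
        int value = Integer.parseInt(line);   // type check
        if (value >= min && value <= max) {   // range check
          return value;
        }
        System.out.println("Please enter a number between " + min + " and " + max + ".");
      } catch (NumberFormatException e) {
        System.out.println("\"" + line + "\" is not a whole number this program can accept.");
      }
    }
  }

  public static void main(String[] args) {
    int[] vals = new int[10];
    for (int i = 0; i < vals.length; i++) {
      vals[i] = (i + 1) * (i + 1);
    }

    Scanner sc = new Scanner(System.in);
    int which = readIntInRange(sc, "Please type a number from 1 to 10: ", 1, vals.length);

    // which is now guaranteed to be a valid position, so which-1 is a safe index.
    System.out.println("The square of " + which + " is " + vals[which - 1]);
  }
}
```

Passing the Scanner in as a parameter, rather than creating it inside the helper, also makes it possible to exercise the helper with prepared input instead of a live keyboard.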

How can I properly validate input?

Check your input: The basic rule for input validation is to check that input data matches all of the constraints that it must meet to be used correctly in the given circumstance. In many cases, this can be very difficult: confirming that a set of digits is, in fact, a telephone number may require consideration of the many differing phone number formats used by countries around the world. Some of the checks that you might want to use include:

  • Type: Input data should be of the right type. Names should generally be alphabetic, numbers numeric. Punctuation and other uncommon characters are particularly troubling, as they can often be used to form the basis of code-injection attacks. Many programs will handle input data by assuming that all input is of string form, verifying that the string contains appropriate characters, and then converting the string into the desired data type.
  • Range: Verify that numbers are within a range of possible values. For example, the month of a person’s date of birth should lie between 1 and 12. Another common range check guards against values that would lead to division-by-zero errors.
  • Plausibility: Check that values make sense: a person’s age shouldn’t be less than 0 or more than 150.
  • Presence check: Guarantee presence of important data – the omission of important data can be seen as an input validation error.
  • Length: Input that is either too long or too short will not be legitimate. Phone numbers generally don’t have 39 digits; Social Security Numbers have exactly 9 digits.
  • Format: Dates, credit card numbers, and other data types have limitations on the number of digits and any other characters used for separation. For example, dates are usually specified by one or two digits for the month, one or two for the day, and either two or four for the year.
  • Checksums: Identification numbers, such as bank account numbers, often have check digits: additional digits included at the end of a number to provide a verifiability check. The check digit is determined by a calculation based on the remaining digits – if the check digit does not match the result of the calculation, either the ID is bad or the check digit is bad. In either case, the number should be rejected as invalid.
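Several of the checks listed above can be sketched as small Java methods. This is a minimal illustration, not a complete validation library: the class name FieldChecks and all of its method names are hypothetical. The checksum method uses the Luhn algorithm, a well-known check-digit scheme used for credit card numbers. It was chosen here as an assumption for the example, and a particular bank or ID scheme may use a different calculation.

```java
// Illustrative sketch only; class and method names are hypothetical.
class FieldChecks {

  // Type: true if s is non-empty and every character is one of '0'..'9'.
  static boolean isAllDigits(String s) {
    if (s == null || s.isEmpty()) {
      return false;
    }
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < '0' || c > '9') {
        return false;
      }
    }
    return true;
  }

  // Range and plausibility: true if min <= value <= max.
  static boolean inRange(int value, int min, int max) {
    return value >= min && value <= max;
  }

  // Presence and length: true if s is present and has exactly the expected length.
  static boolean hasLength(String s, int expected) {
    return s != null && s.length() == expected;
  }

  // Checksum (Luhn algorithm): working from the rightmost digit, double every
  // second digit, subtract 9 from any doubled value above 9, and add everything up.
  // The number passes if the total is a multiple of 10.
  static boolean passesLuhn(String digits) {
    if (!isAllDigits(digits) || digits.length() < 2) {
      return false;
    }
    int sum = 0;
    boolean doubleIt = false;
    for (int i = digits.length() - 1; i >= 0; i--) {
      int d = digits.charAt(i) - '0';
      if (doubleIt) {
        d *= 2;
        if (d > 9) {
          d -= 9;
        }
      }
      sum += d;
      doubleIt = !doubleIt;
    }
    return sum % 10 == 0;
  }

  public static void main(String[] args) {
    System.out.println("Month 13 in range 1..12? " + inRange(13, 1, 12));
    System.out.println("Age 200 plausible (0..150)? " + inRange(200, 0, 150));
    String ssn = "123456789";
    System.out.println("9 digits? " + (isAllDigits(ssn) && hasLength(ssn, 9)));
    System.out.println("Luhn check passes? " + passesLuhn("79927398713"));
  }
}
```

Each method performs one check and returns a boolean, so a program can combine them and report exactly which constraint a given input failed.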

Use appropriate language tools: The safety of tools that read user input varies across programming languages and systems. Some languages, such as C and C++, have library calls that read user input into a character buffer without checking the bounds of that buffer, which can cause both a buffer overflow and an input validation problem. Alternative libraries specifically designed with security in mind are often more robust.

The choice of programming language can affect the potential severity of input validation vulnerabilities. As statically typed languages, Java and C++ require that the type of data stored in a variable be known ahead of time. This requirement leads to a type mismatch problem when, for example, a string such as “abcd” is typed in response to a request for an integer. Dynamically typed languages such as Perl and Ruby have no such requirement – any variable can store any type of value. Of course, these languages do not eliminate validation problems – you may still run into trouble if you use a string to retrieve an item from an integer-indexed array. Some languages provide additional help in the form of built-in procedures that can be used to remove potentially damaging characters from input strings.

Recover Appropriately: A robust program will respond to invalid input in a manner that is appropriate, correct, and secure. For user input, this will often mean providing an informative error message and requesting re-entry of the data. Invalid input from other sources – such as a network connection – may require alternate measures. Arbitrary decisions such as truncating or otherwise reformatting data to “make it fit” should be avoided.
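As a sketch of this advice, the routine below rejects an account number of the wrong length and explains why, instead of silently discarding extra digits as the banking system in the earlier example did. The class name AccountNumberCheck, the method findProblem, and the message wording are all hypothetical. The 11-digit length is taken from that example and is an assumption about the format, not a general rule.

```java
import java.util.Optional;

// Illustrative sketch only; class and method names are hypothetical.
class AccountNumberCheck {

  // Assumed length, taken from the 11-digit account number in the banking example.
  static final int ACCOUNT_DIGITS = 11;

  // Returns a message describing what is wrong, or an empty Optional when the
  // input is acceptable. Apart from ignoring leading and trailing whitespace,
  // the input is never shortened, padded, or otherwise "made to fit".
  static Optional<String> findProblem(String typed) {
    if (typed == null || typed.trim().isEmpty()) {
      return Optional.of("Please enter an account number.");
    }
    String s = typed.trim();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < '0' || c > '9') {
        return Optional.of("An account number may contain digits only.");
      }
    }
    if (s.length() != ACCOUNT_DIGITS) {
      return Optional.of("You typed " + s.length() + " digits, but an account number has "
          + ACCOUNT_DIGITS + ". Please check the number and type it again.");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    String[] samples = {"12345678901", "123456789012", "1234 5678901"};
    for (String sample : samples) {
      Optional<String> problem = findProblem(sample);
      System.out.println(sample + " -> " + problem.orElse("accepted"));
    }
  }
}
```

Returning a message rather than a corrected value leaves the decision with the user, who is the only one who knows which digit was the mistake.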

Laboratory Assignment

Program 1

import java.util.*;

public class Input {
  public static void main(String[] args) {
    Scanner scan = new Scanner(System.in);

    int sz = getArraySize(scan);
    String[] names = getNames(scan,sz);

    int which = getWhich(scan);
    String aName = getName(which,names);

    System.out.println("You chose name: "+aName);
  }

  public static int getArraySize(Scanner scan) {
    System.out.print("How many names? ");
    int n =  scan.nextInt();
    scan.nextLine();
    return n;
  }

  public static String[] getNames(Scanner scan, int sz) {
    String[] names = new String[sz];
    for (int i = 0; i < sz; i++ ){
      System.out.print("type name # "+(i+1)+": ");
      names[i] = scan.nextLine();
    }
    return names;
  }

  public static int getWhich(Scanner scan) {
    System.out.print("Which name: ");
    int x = scan.nextInt();
    return x;
  }

  public static String getName(int n,String[] vals) {
    return vals[n-1];
  }
}
Lab Questions:
  1. Complete the security checklist for this program. Submit marked program and completed checklist.
  2. List the potential input validation errors.
  3. Provide example inputs that might cause validation problems, and describe the problems that they might cause.
  4. What happens if you type non-numeric characters for either the number of names or which name you wanted to retrieve?
  5. Revise the program* to properly validate input and gracefully recover from errors.

*Copying and pasting programs may result in syntax errors and other inconsistencies. It is recommended you type each program.

Program 2 (optional – check with your instructor if you need to complete this program)

You’re writing a program that will be used to submit bids on items from an online auction site. Each item is available in multiple lots – for example, there might be 100 boxes of crayons available. Your program must ask users for two important pieces of information:

  • The price that they are willing to pay for the item, given in dollars and cents.
  • The quantity of that item that they want to bid on. This quantity must be at least one, and it must be a whole number – it’s not possible to buy fractional parts of an item.

Your program should validate the user input for both values. To do so, take the following steps:

  • Price of items
    1. Ask the user to type the price
    2. Read this value into a string
    3. Write a routine that will examine the string to verify that it contains a number that can be a legal amount of money. This string must contain:
      • Some number of digits (possibly zero) – the number of dollars
      • An optional decimal point
      • Some number of digits (possibly zero) – the number of cents
      • There must be at least one digit in the number. Thus, “12.34”, “12”, and “.34” are all valid prices, but “12,34” and “.” are not. To examine all of the characters in the string, you can use the charAt(i) method of the String class. You can check each character in the string to see that it is either a decimal point or a digit. You should also check to make sure that there are not multiple decimal points.
    4. This routine should return a boolean value that is true if the string is a legitimate price and false otherwise.

    5. If the value provided is not legitimate, print an error message.
    6. If the value is legitimate, convert it into a float by using
      float f = Float.parseFloat(s);

      where s is the string that the user typed.

  • Number of items: This will be similar to the price, but you must simply check for whole numbers – no decimals allowed. The quantity must be greater than or equal to one. You will convert the result to an int (using Integer.parseInt(s)), not a float.
  • An interactive loop: Put these two checks in a loop that will ask for both values, check both to see if they are valid, and then repeat requests for any values that are not valid. The easiest way to do this is probably to have two boolean values, one for each input value. These booleans will represent the validity of the inputs. You will stay in the input loop as long as at least one of them is false. When this is the case, you will check the booleans to see which is false (i.e., invalid). If a value is not valid, you will repeat the prompt and validate the response. This will continue until both are valid.

Security Checklist

Vulnerability: Input Validation    Course: CS2
Task – Check each line of code    Completed
1. Mark each variable that receives external input with a V
For each statement that is marked with a V, verify that the variable is checked for each of these criteria. Note any criterion that is not checked:
1. Length
2. Range (reasonableness?)
3. Format
4. Type
Shaded areas indicate vulnerabilities!

Discussion Questions

  1. You’re writing a program that asks the user to type in a telephone number. How might you validate that the characters that they’ve typed represent a legal telephone number? You should assume that you’re only concerned about phone numbers from the US, but you want to give users as much flexibility as possible, in terms of spaces and punctuation characters. List some rules that you might use. Make sure that you complete this question before moving on to question #2.
  2. Find an example of a phone number that doesn’t fit your rules.
  3. Describe an example of an input validation problem that you may have encountered. If you can’t remember having any sort of problem, try some web pages or other software tools – try to find a system that fails to validate input data correctly.

Further Work (optional – check with your instructor if you need to answer the following questions)

  1. If input is sufficiently cryptic, it might be hard to provide useful error messages in response to invalid input. Describe some strategies that might be used to help users recover from invalid input.
  2. Revisit Program 2. Are there any inputs that the above description accepts as valid that perhaps should be considered invalid? If so, what are they and how might you handle them?





Olsen, Kai. “The $100,000 Keying error” IEEE Computer, August 2008

 
Copyright © Towson University