From: Sverre H. Huseby (shhthathost.com)
Date: Thu Mar 14 2002 - 01:57:16 CST
I personally like to split the "filtering" into two or three parts:
1. Input validation. This can (and should?) be done as the first step of
every script. Validation checks that input has the correct type,
e.g. that an integer is in fact an integer, that an e-mail address
is a legal address, that a name contains only characters from a
certain set, and so on.
The action for illegal input depends on whether the input is
user-generated (input fields in a form) or server-generated (hidden
fields, drop-down lists, check boxes, URLs from a previous page, etc.).
Errors in user input may be caused by misspelling or lack of
knowledge of our input validation rules. The action taken is
typically to redisplay the input form with an error message.
Errors in server-generated input should not happen and may be a
sign of an intrusion attempt. I normally just display a very
simple error page noting that the incident has been logged.
2. Sub-system meta-character "washing" (escaping). The handling of
meta-characters depends on which sub-system the data is passed to.
An SQL query needs different washing from an XPath query, which in
turn needs different washing from a system call dealing with the file
system.
I like to delay the meta-character washing until the data is passed
to the sub-system in question, as washing transforms the input.
Also, washing needs to be done on anything passed to sub-systems,
not just user-generated input.
Washing is not the same as validation.
3. Washing of HTML output (may be seen as meta-character washing for
an HTML sub-system, so this point is somewhat redundant).
Whatever is passed to the client side passes through an HTML
character encoding filter.
Some people like to do HTML encoding on input when it comes in,
but I prefer delaying it to output time for a couple of reasons:
* It's not just user generated input that must be HTML
encoded: When reading from a file, a database or whatever,
HTML encoding should be done before passing the content to
the client. It is easier to remember that if the rule is
"wash output when output is done".
* I don't like storing HTML character entities in a database.
It makes it harder to search from non-web-based interfaces,
and it conflicts with the maximum number of characters in a field
(a name of 20 characters may take up more space after HTML encoding).
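The three steps above might be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the author's framework: the name and id rules, the exception classes and the `users` table are hypothetical, and for the SQL sub-system it swaps hand-written escaping for the driver's parameter binding (`sqlite3` `?` placeholders), which takes care of the meta-characters at the point where data enters the database.

```python
import html
import logging
import re
import sqlite3

# Hypothetical rules for this sketch; a real application defines its own.
NAME_RE = re.compile(r"[A-Za-z' -]{1,20}")  # letters, space, apostrophe, hyphen
ID_RE = re.compile(r"[0-9]{1,9}")


class UserInputError(ValueError):
    """Bad user-typed input: redisplay the form with this message."""


class ServerInputError(ValueError):
    """Bad server-generated input (hidden field, drop-down, URL): possible intrusion."""


def validate_name(value: str) -> str:
    # Step 1: validation only checks the format; it does not transform the value.
    if not NAME_RE.fullmatch(value):
        raise UserInputError("Names may contain only letters, spaces, ' and -.")
    return value


def validate_id(value: str) -> int:
    # The id comes from a hidden field, so bad input is not a typing mistake.
    if not ID_RE.fullmatch(value):
        logging.warning("Invalid server-generated id %r", value)
        raise ServerInputError("The incident has been logged.")
    return int(value)


def store_name(db: sqlite3.Connection, user_id: int, name: str) -> None:
    # Step 2: meta-characters are handled where data enters the SQL sub-system.
    # Parameter binding lets the driver do it, so the raw name is stored.
    db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))


def render_name(db: sqlite3.Connection, user_id: int) -> str:
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    # Step 3: HTML-encode at output time, also for data read from the database.
    return "<p>Hello, " + html.escape(row[0]) + "</p>"


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(20))")
store_name(db, validate_id("7"), validate_name("O'Brien"))
print(render_name(db, 7))  # <p>Hello, O&#x27;Brien</p>
```

Because encoding happens on output, the database holds `O'Brien` (7 characters) rather than `O&#x27;Brien` (12), which avoids the field-length and searching problems mentioned above.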
As you can see, doing both input validation and meta-character washing
may be redundant if the input validation step forbids the
meta-characters. Often, redundancy is considered bad, but when it
comes to security I think it is a must.
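A small Python sketch of that deliberate redundancy, using the shell as the sub-system. The rules, the `grep_command` helper and `users.txt` are hypothetical. With the strict rule, `shlex.quote` changes nothing, because validation already forbids every shell meta-character; the washing is kept anyway so that the command stays safe if the rule is later relaxed to accept names like "O'Brien".

```python
import re
import shlex

# Hypothetical rules: the strict one forbids all shell meta-characters,
# the loose one is what a later change might introduce.
STRICT_NAME = re.compile(r"[A-Za-z]{1,20}")
LOOSE_NAME = re.compile(r"[A-Za-z' -]{1,20}")


def grep_command(name: str, rule: re.Pattern = STRICT_NAME) -> str:
    """Build a shell command line that searches a (hypothetical) users.txt."""
    # Layer 1: validation rejects anything outside the allowed character set.
    if not rule.fullmatch(name):
        raise ValueError("invalid name")
    # Layer 2: wash for the shell anyway. With STRICT_NAME this is a no-op,
    # but it keeps the command safe if the rule is ever loosened.
    return "grep -F -- " + shlex.quote(name) + " users.txt"


print(grep_command("Smith"))                # grep -F -- Smith users.txt
print(grep_command("O'Brien", LOOSE_NAME))  # grep -F -- 'O'"'"'Brien' users.txt
```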
And I would like to add: I like to build a framework for doing
validation and washing, and if possible, force all scripts to use the
framework rather than directly picking up the input variables.
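One possible shape for such a framework, sketched in Python. The `ValidatedInput` class and its `integer`/`matching` getters are hypothetical names for illustration. Each script receives the wrapper instead of the raw parameters and can only read values through getters that validate first. Python cannot truly hide the raw dictionary; the leading underscore only marks it as off-limits by convention.

```python
import re
from typing import Mapping


class ValidatedInput:
    """Hypothetical framework object handed to every script instead of raw input."""

    def __init__(self, raw: Mapping[str, str]) -> None:
        self._raw = dict(raw)  # raw values; scripts should never touch these

    def _get(self, key: str) -> str:
        if key not in self._raw:
            raise KeyError(f"missing parameter {key!r}")
        return self._raw[key]

    def integer(self, key: str) -> int:
        # Optional minus sign followed by one to nine digits.
        value = self._get(key)
        if not re.fullmatch(r"-?[0-9]{1,9}", value):
            raise ValueError(f"{key!r} is not an integer")
        return int(value)

    def matching(self, key: str, pattern: str) -> str:
        # The whole value must match the caller-supplied pattern.
        value = self._get(key)
        if not re.fullmatch(pattern, value):
            raise ValueError(f"{key!r} contains illegal characters")
        return value


params = ValidatedInput({"id": "42", "name": "Smith"})
print(params.integer("id"))                        # 42
print(params.matching("name", r"[A-Za-z]{1,20}"))  # Smith
```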
--
shhthathost.com              Computer Geek? Try my Nerd Quiz
http://shh.thathost.com/     http://nerdquiz.thathost.com/