From: Sverre H. Huseby (shh_at_thathost.com)
Date: Wed Oct 16 2002 - 15:48:52 CDT

    [b0iler]

    | Also, you are just sending the input values of parameters. What
    | about the names of the parameters (the $key variables)? They could
    | contain potentially dangerous XSS which is often printed to the
    | client. Also, user input (GPC) is not the only tainted data in a
    | script. Any data that comes from an outside source is potentially
    | dangerous. Files, databases, ENV variables, etc.. need to be
    | treated as if it contains the most clever tricks to evade your
    | filtering and protection schemes.

    Correct. And I've tried to say the same quite a few times on several
    securityfocus lists over the last two years.

    We need to shift the focus away from _input_. Input is never
    troublesome in itself. It only becomes troublesome when it is put in
    a context in which it is interpreted in some way, and then only when
    parts of it are interpreted not as plain data but as something else.
    As b0iler (whoever that is :) ) correctly states above, data from the
    inside may cause just as much trouble as data from the outside. And
    it may do so deep inside a multi-tier system, far from the web layer.

    It's when data is passed somewhere for interpretation that it gets
    troublesome. We should thus pay attention to the format of the data
    whenever we _pass_it_along_, rather than when we receive it from the
    outside. Web applications tend to pass data along all the time:

      * to database servers, often by concatenating the data with strings
        containing SQL constructs, or by using some kind of prepared
        statement mechanism (much better).

      * to shell command interpreters (yikes!).

      * to the OS by sending file names to file handling functions, host
        names to name resolution libraries and so on. (There is a large
        amount of "so on" for the OS.)

      * to legacy systems written in some obscure language using some
        equally obscure protocol.

      * to other web servers (B2B) using XML, URL parameters or whatever.

      * to other processes running on the same server, using some
        internally made protocol.

      * many, many, many more...

      * and last, but not least, to the web browser of the user. Which
        luckily is just another sub-system, covered by the same rule as
        the rest.
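
    To make the first item concrete, here is a minimal sketch of the
    difference between string concatenation and a prepared-statement
    style placeholder. It assumes Python's standard sqlite3 module; the
    users table and the function names are hypothetical and serve only
    as illustration.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        [("Sinead O'Connor",), ("b0iler",)],
    )
    return conn


def find_user_concat(conn, name):
    # Risky: the data becomes part of the SQL text, so a quote inside
    # `name` ends the string constant early and the rest is parsed as SQL.
    query = "SELECT id FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_param(conn, name):
    # The "?" placeholder hands the value to the database separately
    # from the SQL text, so it is never parsed as SQL.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)
    ).fetchall()
```

    With the placeholder version, a name such as "Sinead O'Connor" is
    matched as plain data. With the concatenating version the same name
    produces an SQL syntax error, and a value like "x' OR '1'='1"
    changes the meaning of the query.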

    And to repeat: "Data" is not only user input. It is anything, no
    matter the source. Every system we pass data to has its own way of
    interpreting it, and the interpretation depends on context. Some
    examples:

      * when building strings containing SQL queries, the quote character
        may cause trouble if it appears prematurely in an SQL string
        constant. _Any_ data passed as part of an SQL statement
        _that_is_to_be_interpreted_as_a_string_constant_ will need to
        have quotes escaped in some way. (No, we can't generally forbid
        quotes. How would I be able to write "can't" a few words back if
        you forbid the quote?) And no, we can't generally escape quotes
        at input time either, because then they will look rather funny
        to the _other_ sub-systems, in which quotes have no special
        meaning (e.g. a text file or the user's browser).

        For more on this, see another vuln-dev-mail of mine available
        here:

          http://shh.thathost.com/text/passing-data-03.txt

      * when talking to the OS, null-bytes may create confusion when
        passing strings, as the OS (normally written in C) treats '\0'
        as a string terminator. Most "modern" languages do not. We'll
        generally need to pay attention to null-bytes when talking to
        sub-systems written in C, because our view of the string will
        differ from the view taken by the OS.

        But there are other things as well. If we pass a _file_name_ to
        the OS, we may need to pay attention to slashes (and for some
        obscure OSes, backslashes) and double-dots as well, as they will
        switch context from _file_ to _directory_.

        There are hundreds of other examples of how talking to one
        particular sub-function (e.g. open()) of a sub-system (e.g. the
        OS) needs careful handling of a selected set of characters.

      * and then comes the browser again. The HTML parser in the browser
        gives special meaning to < (tag start), > (tag end) and &
        (character entity). And inside those < and >, " and ' (both
        attribute value encapsulators) may suddenly have a special
        meaning too. We'll need to escape them somehow, so that they are
        treated not as special characters but as plain characters.
        The correct way is to use HTML encoding (as most of you know).
        The wrong way (generally) is to replace the special characters
        with nothing. Imagine all the complaints you will get if you make
        a discussion forum for mathematicians, and disallow < and > ...
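
    The file name and browser examples above can be sketched as two
    small helpers, each applied right before the data is handed to its
    sub-system. This is an illustration under stated assumptions, not a
    complete defence: safe_join and to_html are hypothetical names, and
    html.escape is the HTML encoder from Python's standard library.

```python
import html
import os


def safe_join(base_dir, file_name):
    """Join base_dir and file_name, refusing anything that is not a
    plain file name (hypothetical helper, illustration only)."""
    if "\0" in file_name:
        # A C-based OS interface would see the string end here.
        raise ValueError("null byte in file name")
    if "/" in file_name or "\\" in file_name:
        # Separators switch the context from file to directory.
        raise ValueError("directory separator in file name")
    if file_name in ("", ".", ".."):
        raise ValueError("not a plain file name")
    return os.path.join(base_dir, file_name)


def to_html(text):
    # Encode instead of removing: <, >, & and both quote characters
    # become character entities, so the browser shows them as text.
    return html.escape(text, quote=True)
```

    Encoding rather than stripping means the mathematicians keep their
    symbols: to_html("1 < 2 & 3 > 2") yields "1 &lt; 2 &amp; 3 &gt; 2",
    which the browser renders as the original text.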

    It is generally _not_possible_ to fetch data from the request and
    start by doing something to it that will suit all the possible
    sub-systems in one go. Not without imposing severe restrictions on
    what the data may contain. ("Sorry, Sinead, but your name will have
    to be OConnor for now"). And not without introducing strange
    appearances in some of the sub-systems. ("Welcome, Sinead
    O\'Connor").
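
    The "Welcome, Sinead O\'Connor" effect is easy to reproduce. The
    sketch below assumes a hypothetical escape_at_input step that
    backslash-escapes quotes as soon as the data arrives, and a
    hypothetical greeting_html function that encodes for the browser at
    output time. Both names are made up for illustration.

```python
import html


def escape_at_input(value):
    # Hypothetical input-time filter: backslash-escape single quotes
    # once, for every sub-system, as soon as the data arrives.
    return value.replace("'", "\\'")


def greeting_html(name):
    # Encode for the HTML context at the moment the data is passed on.
    return "<p>Welcome, " + html.escape(name) + "</p>"
```

    Passing the raw name gives "<p>Welcome, Sinead O&#x27;Connor</p>",
    which the browser shows correctly. Passing the input-escaped name
    gives "<p>Welcome, Sinead O\&#x27;Connor</p>": the backslash means
    nothing to the HTML parser, so the user sees it.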

    Input validation has been given _far_ too much focus. It may be good
    as a first measure, to be able to give users nice feedback when data
    don't match the business rules and other high level rules ("the file
    name is not supposed to contain directory elements"), but it
    generally won't solve the low level problems. In systems over toy
    size, data is passed between many different sub-systems, which often
    have different meta-characters that may be abused. People who
    believe that input validation at the web layer will prevent security
    problems several layers down below (or when data come back to the
    first layer again) have given the issue too little thought, IMNSHO.

    Focus on input validation, but focus even more on handling every
    possible meta-character, meta-byte, meta-word or whatever before
    passing the data along to the next sub-system, whatever that is. And
    that rule goes for every layer of the application, not just the web
    layer.
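
    As one more instance of the same rule, here is a hedged sketch for
    the shell case, assuming a POSIX shell and Python's standard shlex
    module. build_grep_command is a hypothetical helper, and grep is
    only an example command.

```python
import shlex


def build_grep_command(pattern, path):
    # shlex.quote wraps each value so that a POSIX shell treats it as
    # one literal word, whatever meta-characters it contains.
    return "grep -- " + shlex.quote(pattern) + " " + shlex.quote(path)
```

    A pattern such as "x; rm -rf /" then reaches grep as a single
    argument instead of being split into a second command. Where it is
    an option, passing an argument list to subprocess.run without a
    shell avoids the shell's interpretation altogether.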

    Sverre - who feels this discussion would fit better at webappsec than
             at vuln-dev.

    -- 
    shh_at_thathost.com		Computer Geek?  Try my Nerd Quiz
    http://shh.thathost.com/	http://nerdquiz.thathost.com/