Re: [Patch:] restore the native use of isdigit() instead of ap_isdigit() in httpd.

From: Garance A Drosihn (drosihrpi.edu)
Date: Thu Mar 30 2006 - 21:38:50 CST


At 3:21 PM -0700 3/30/06, Daniel Boulet wrote:
>Hmmm. The isdigit man page from the Single Unix Specification
>on the Open Group's web site (www.unix.org) declares isdigit
>as follows:
>
> #include <ctype.h>
>
> int isdigit(int c);
>
>This documentation for the isdigit interface would appear to
>allow the parameter to be a char since chars can certainly
>be cast to ints without any loss of information.

And this is exactly why we have the problem. Programmers
read the above, and jump to the same conclusion you did.

Note that on some platforms, 'char' is the same as 'signed
char', while on other hardware platforms a plain 'char' is
defined to be the same as 'unsigned char'.

When 'char' is signed, the values it holds can be
negative. You never "lose" any information converting
that to an int, but you also do not always end up with a
value in the proper range for these functions. And if you
test your code on a platform where 'char' == 'unsigned char',
then even a good battery of unit-tests on that platform
will not discover this coding error.
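
A minimal sketch of that conversion, assuming an 8-bit char.
The function names are made up for illustration. 'signed
char' is used explicitly so the result does not depend on
how the platform defines plain 'char'. With 8-bit chars, the
byte 0xE9 held in a signed char has the value -23; passed
as-is it stays -23, while going through 'unsigned char'
first gives 233.

```c
#include <assert.h>
#include <limits.h>

/* The int a <ctype.h> function receives when a signed char is passed as-is. */
int passed_directly(signed char c)
{
    return c;   /* value-preserving: a negative char stays negative */
}

/* The int it receives when the char goes through unsigned char first. */
int passed_with_cast(signed char c)
{
    int v = (unsigned char)c;
    assert(v >= 0 && v <= UCHAR_MAX);   /* always inside the expected range */
    return v;
}
```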

And since the acceptable range is EOF (typically -1) plus
0 to 255, there are some of these functions where you cannot
even cheat by pretending the range is -255 to 255. The value
you'd want the routine to return for a "char value" of -1
will not be the correct value for EOF.
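
The collision can be sketched as follows. The function names
are hypothetical, and the first result depends on EOF being
-1, which is common but not guaranteed; the standard only
requires EOF to be a negative int.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

/* Nonzero if a signed char holding -1 (byte 0xFF on typical 8-bit,
   two's-complement hardware) converts to the same int as EOF. */
int raw_char_looks_like_eof(void)
{
    signed char c = -1;
    return (int)c == EOF;
}

/* Same value, converted through unsigned char first. The result is
   non-negative, so it can never compare equal to EOF. */
int cast_char_looks_like_eof(void)
{
    signed char c = -1;
    assert(EOF < 0);   /* required by the C standard */
    return (int)(unsigned char)c == EOF;
}
```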

The stupid thing in these routines is that the behavior is
UNDEFINED for values outside the correct range. We would
probably be doing ourselves (as programmers) a favor if we
made it a major error to pass in any negative value other
than the value for EOF. Many programmers have read the
description of these routines, and never quite understood
that these routines do *not* take a parameter of 'char'.
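
One way to approximate that "major error" in your own code
is a checked wrapper. This is only a sketch: ctype_arg_ok()
and checked_isdigit() are hypothetical names, not part of
any library, and the check disappears when NDEBUG is defined.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Hypothetical helper: nonzero if v is a legal <ctype.h> argument,
   i.e. EOF or a value representable as unsigned char. */
int ctype_arg_ok(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

/* Hypothetical wrapper: aborts on an out-of-range argument instead
   of silently invoking undefined behavior. */
int checked_isdigit(int v)
{
    assert(ctype_arg_ok(v));
    return isdigit(v);
}
```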

I've seen people write updates to change:
      isblah((unsigned char)*someptr)
to:
      isblah(*someptr)
because they thought the original programmer was an idiot.
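
For reference, here is the cast in a complete loop. This is
an illustrative sketch; count_digits() is a made-up helper,
not code from httpd.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the decimal digits in a NUL-terminated string. */
size_t count_digits(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* The cast is the point: without it, a negative *s would be
           outside the range isdigit() is defined for. */
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}
```

The cast costs nothing at run time and keeps the call
well-defined whether plain 'char' is signed or unsigned.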

--
Garance Alistair Drosehn = gadgilead.netel.rpi.edu
Senior Systems Programmer or gadfreebsd.org
Rensselaer Polytechnic Institute or drosihrpi.edu