Overflow

Two words very common in Computer Science are ``buffer overflow'' and ``integer overflow''. Buffer overflow happens when a programmer instructs the computer to write more bytes than the buffer may hold; For example: programmer declares char buffer[2] and instructs computer: buffer[0] = 'o', buffer[1] = 'v', buffer[2] = 'e'. Since the array named ``buffer'' is not big enough to store the letter ``e'', the last one that was attempted, programmer has overflown the buffer and that causes, many times, security holes in the program.

Integer overflow happens when programmer instructs computer to store a number that requires more bits than the integer is capable of; for example: how could the computer store the number 2 in a 1-bit integer data type? In binary (base 2), numbers are expressed with zeros and ones. So, 0d = 0b, 1d = 1b, 2d = 10b. So the number 2 requires 2 bits and if attempted to be stored in a 1-bit integer, overflow is what happens.

Solution

Certify availability. If a buffer can only hold x amount of bytes, make sure computer won't go beyond that. If integer has only y bits, make sure integers have less or equal number of bits than y before instructing computer to store it.

Implementation of scan_uint64()

The function below was written by D. J. Bernstein; the overflow detection was written and nicely explained by Wayne Marshall. The function converts a buffer s to an unsigned integer with 64 bits available. That is, the number can range from 0 to 2^64 - 1.
int scan_uint64(register char *s, register uint64 *u) {
 register unsigned int pos;
 register uint64 r;
 register uint64 c;

 pos = 0; r = 0;

 for ( ;; ) {
   c = (uint64) (unsigned char) (s[pos] - '0');
   if (c < 10) {
       r = r * 10 + c;
     ++pos; continue;
   }
   break;
 }

 *u = r;
 return pos;
}
The function converts chars to integers as usual, but r may overflow. The next implementation solves the problem.
int scan_uint64(register char *s, register uint64 *u) {
 register unsigned int pos;
 register uint64 r;
 register uint64 c;

 pos = 0; r = 0;

 for ( ;; ) {
   c = (uint64) (unsigned char) (s[pos] - '0');
   if (c < 10) {
     if( ((ULLONG_MAX - c) / 10) >= r)
       r = r * 10 + c;
     else return -1; /* overflow */
     ++pos; continue;
   }
   break;
 }

 *u = r;
 return pos;
}
Constant ULLONG_MAX is defined as 0xffffffffffffffffULL in some systems and in a similar fashion in others --- depending on how many bytes is the size of a long.

The conversion is done by multiplying r by the base to append a zero at the end. Example: r = 5. Multiply by 10: r = 50. The next character to be converted is char = '4'. Convert it to integer c; c = 4. Add it to r. r = 50 + 4 = 54.

Verifying

The verification made by ((ULLONG_MAX - c) / 10) >= r) can be geometrically represented and it may offer a clearer view of why it works. Let's take a 4-bit integer data type. With 4 bits, the maximum value we can store in an unsigned integer is 15. We will try to add 10 + 6 --- which overflows --- and demonstrate how the verification from scan_uint64() traps the overflow. We use # to represent a digit and we call it a ``block''.
-----########## 10 blocks
---------######  6 blocks
By adding 10 + 6, we are putting these blocks together. To check, we can verify if one block would overlap on another.
############### 15 blocks (maximum)
##########      10 blocks
         ######  6 blocks
         ^      It overlaps by one block. Overflow.
This is what scan_uint64() does. It checks to see if they'd overlap. Demonstration: Computer will read the string "16" and try to store it in a 4-bit unsigned integer with scan_uint64. First character read is '1', conversion is done and r receives the value of 1. To get 16, computer needs to multiply r by 10 --- our base --- and then add 6, but before multiplying r by 10, computer checks first if 6 can be added without overflow. Verifying: 15 - 6 = 9. Next, 9 / 10 = 0. Is 0 >= 1? No. Overflow.

Created: Fri Dec 31 09:45:02 EST 2004.
Updated: Wed Jan 05 20:56:16 EST 2005.