Buffer overflow attacks and what programmers need to know about them
Buffer overflow is one of the most widely discussed topics in security circles. In this post we will see how a buffer overflow attack is carried out, which mechanisms compilers and operating systems provide to prevent it, and finally what you as a programmer should be careful about to avoid buffer overflows.
A buffer overflow attack can target buffers that live either on the stack or on the heap. Any call to a function that does not check the length of its input (one example is C's gets()) makes the program vulnerable to buffer overflow.
Buffer overflow attack (when the buffer is allocated on the stack):
In this kind of attack, the attacker can execute carefully crafted “shell code” when the function containing the overflowed buffer returns. Let's look into it in more detail.
/* Program with buffer overflow vulnerability */
#include <unistd.h>
#include <stdio.h>
#include <string.h>   /* for strncmp() */

int ValidatePassword(int input1, int input2) {
    int isPasswordValid = 0;
    char passwd_val[8] = "AAAAAAA";
    char password_inp[8];
    int z = 0;

    printf("Input Password:");
    gets(password_inp); /* unsafe call: no bound on the input length */
    if (!strncmp(password_inp, passwd_val, 7)) {
        isPasswordValid = 1;
    }
    z = input1 + input2;
    printf("%d\n", z);
    return isPasswordValid;
}

int main(int argc, char *argv[]) {
    int y = ValidatePassword(3, 5);
    printf("%d\n", y);
    return 0;
}
The table below shows the actual layout of the program compiled using “gcc -g bufferOverflowStack.c -fno-stack-protector -O0”. The table is generated using the GNU debugger (gdb):
(gdb) disass main
(gdb) disass ValidatePassword
Here are some key observations:
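The stack grows from higher addresses to lower addresses (as indicated in the table), while gets() fills password_inp toward higher addresses. Input longer than the buffer therefore runs into whatever lies above it in the frame, eventually reaching the saved return address of ValidatePassword.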
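To make this concrete without triggering a real overflow (which is undefined behaviour and whose layout is compiler specific), here is a minimal sketch that models two neighbouring locals as a struct. The struct SimFrame, its field order and the function simulate_overflow are hypothetical; the only assumption is that the flag sits at a higher address than the input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of two adjacent locals in a frame: the input buffer at
   the lower address, the validity flag directly above it. */
struct SimFrame {
    char password_inp[8];
    unsigned char is_password_valid;   /* one byte keeps the model simple */
};

/* Copies len bytes of "user input" to the start of the simulated frame the
   way gets() would: without looking at the size of password_inp. The copy is
   clamped to the struct so the simulation itself stays well defined.
   Returns the value of the flag afterwards. */
int simulate_overflow(const char *input, size_t len)
{
    struct SimFrame frame;

    assert(input != NULL);
    memset(&frame, 0, sizeof frame);
    if (len > sizeof frame)
        len = sizeof frame;
    memcpy(&frame, input, len);        /* password_inp is at offset 0 */
    return frame.is_password_valid;
}
```

Calling simulate_overflow("BBBBBBBB\x01", 9) returns 1 although no password comparison took place: the ninth byte landed in the flag. In the real program the same mechanism, with a longer input, reaches the saved return address.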
How the shell code is created:
Let's say the attacker wants to get a Unix command prompt ($) when “the function under attack” finishes. To do so, the attacker has to place the shell code at the return location of “the function under attack”. The shell code is usually system specific (it depends on the operating system and on the installed libraries). A sample shell code looks like this:
<address of system> + <dummy ret address> + <address of /bin/sh>
How do you find these addresses for the program above? In gdb, p system prints the address of system(), and find searches the memory that follows it for the string "/bin/sh":
(gdb) p system
(gdb) find system,+999999999,"/bin/sh"
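Once the addresses are known, the layout <address of system> + <dummy ret address> + <address of /bin/sh> can be assembled as raw bytes. The sketch below assumes a 32-bit little-endian x86 target. The function name build_payload, the padding length and every address passed to it are placeholders: the real distance from the buffer to the saved return address has to be measured in gdb for each build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores a 32-bit value in little-endian byte order. */
static void put_le32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xff);
    dst[1] = (unsigned char)((v >> 8) & 0xff);
    dst[2] = (unsigned char)((v >> 16) & 0xff);
    dst[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Writes pad_len filler bytes followed by the three 32-bit words
   <system> <dummy ret> <"/bin/sh"> into out.
   Returns the payload length, or 0 if out is too small. */
size_t build_payload(unsigned char *out, size_t cap, size_t pad_len,
                     uint32_t system_addr, uint32_t dummy_ret,
                     uint32_t binsh_addr)
{
    assert(out != NULL);
    if (pad_len > cap || cap - pad_len < 12)
        return 0;
    memset(out, 'A', pad_len);                 /* fills buffer and saved registers */
    put_le32(out + pad_len, system_addr);      /* overwrites the return address */
    put_le32(out + pad_len + 4, dummy_ret);    /* where system() would return to */
    put_le32(out + pad_len + 8, binsh_addr);   /* first argument of system() */
    return pad_len + 12;
}
```

Because gets() stops reading at a newline, an address that contains the byte 0x0a cannot be delivered through this particular input path.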
Buffer overflow attack (when the buffer is allocated on the heap):
Since the heap is a separate area of memory, a buffer allocated on the heap does not live in the same area as the function's execution context. So in this case the attacker needs a different approach: overwriting the control structures (metadata) of malloc, or of whichever heap memory manager is in use. With a heap buffer overflow an attacker can write an arbitrary value to a location of their choice. Let's explore this in more detail.
A program that allocates memory on the heap and uses gets() for user input is prone to a heap overflow vulnerability:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    char *buf1 = malloc(128);
    char *buf2 = malloc(256);

    gets(buf2); /* unsafe call: input longer than the buffer overflows it */
    free(buf2);
    free(buf1);
    return 0;
}
The above program is compiled with “gcc -g heapoverflowvur.c -fno-stack-protector -O0”
Notice:
The heap grows from lower addresses to higher addresses (it grows up).
The stack grows from higher addresses to lower addresses (it grows down).
The green highlighted address is the metadata which the attacker can overwrite.
An attacker can write beyond the allocated length of the buffer passed to gets(), thereby overwriting malloc's metadata. There are two well-known methods to write to “any location of the process”:
Unlink attack: This attack overwrites the fd and bk pointers in the metadata of a free chunk. Because malloc's unlink routine moves free chunks between lists by writing through these pointers, corrupted pointers make it write attacker-chosen data to an attacker-chosen location.
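Here is a detailed outline:
/* malloc library code: take a chunk (P) off a bin list */
void unlink(malloc_chunk *P, malloc_chunk *BK, malloc_chunk *FD)
{
    FD = P->fd;    /* chunk after P in the list */
    BK = P->bk;    /* chunk before P in the list */
    FD->bk = BK;   /* write 1: stores BK at FD + offset of bk */
    BK->fd = FD;   /* write 2: stores FD at BK + offset of fd */
}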
1) The attacker overwrites the fd and bk pointers of a free chunk with a memory address and data of their choice, namely:
- fd is overwritten with the address the attacker wants to write to, minus the offset of the bk field within a chunk (12 bytes on 32-bit glibc, 24 bytes on 64-bit), because unlink adds that offset back when it executes FD->bk
- bk is overwritten with the data the attacker wants to write to that location (since BK->fd = FD also writes through bk, this value must itself be a writable address)
2) Once the free chunk is removed from its free list, the unlink() function above is called and the data is written to the desired location. unlink() is invoked when the free chunk is handed out for a new request or when it is coalesced with a neighbouring free chunk.
3) This unlink attack no longer works against recent glibc. Newer versions are hardened: unlink first verifies that FD->bk and BK->fd both point back to P and aborts if they do not.
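The two writes and the hardening check can be tried out safely in a small simulation. This is a sketch under stated assumptions, not glibc's code: struct chunk is a simplified stand-in for glibc's chunk header, a static word array plays the role of process memory, the function names are made up, and pointers are assumed to have the same size as size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified chunk header modelled on glibc's struct malloc_chunk. */
struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;
    struct chunk *bk;
};

_Static_assert(sizeof(struct chunk *) == sizeof(size_t),
               "sketch assumes pointers and size_t have the same size");

/* Old-style unlink without integrity checks. The stores go through memcpy at
   byte offsets so the model stays well defined even though fd and bk do not
   point at real chunks. */
static void unchecked_unlink(const struct chunk *p)
{
    struct chunk *fd, *bk;

    assert(p != NULL);
    fd = p->fd;
    bk = p->bk;
    memcpy((char *)fd + offsetof(struct chunk, bk), &bk, sizeof bk); /* FD->bk = BK */
    memcpy((char *)bk + offsetof(struct chunk, fd), &fd, sizeof fd); /* BK->fd = FD */
}

/* Hardened variant: refuses to unlink unless both neighbours point back at p. */
static int checked_unlink(const struct chunk *p)
{
    struct chunk *fd_bk, *bk_fd;

    memcpy(&fd_bk, (char *)p->fd + offsetof(struct chunk, bk), sizeof fd_bk);
    memcpy(&bk_fd, (char *)p->bk + offsetof(struct chunk, fd), sizeof bk_fd);
    if (fd_bk != p || bk_fd != p)
        return 0;                      /* corrupted list detected */
    unchecked_unlink(p);
    return 1;
}

/* Runs the attack with words[8] as the target location.
   Returns 1 if the target ended up holding the attacker's value (p.bk). */
int unlink_attack_demo(int hardened)
{
    static size_t words[16];           /* stand-in for process memory */
    struct chunk p, *written;

    memset(words, 0, sizeof words);
    p.prev_size = 0;
    p.size = 0;
    /* fd = target minus the offset of bk; bk = value to write (must be writable). */
    p.fd = (struct chunk *)((char *)&words[8] - offsetof(struct chunk, bk));
    p.bk = (struct chunk *)&words[0];
    if (hardened)
        (void)checked_unlink(&p);
    else
        unchecked_unlink(&p);
    memcpy(&written, &words[8], sizeof written);
    return written == p.bk;
}
```

unlink_attack_demo(0) reports that the target was overwritten, while unlink_attack_demo(1) reports that the check rejected the corrupted chunk and left the target untouched.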
House of Force attack:
This technique overwrites the metadata of the top malloc chunk so that a subsequent call to malloc returns memory at a location the attacker chooses (controlled through the request size).
- In the example above: first we write enough bytes to fill the buffer that sits directly below the top chunk (buf2, the last allocation in the vulnerable program shown above). Then we keep writing until the size field of the top chunk, which follows that buffer, holds 0xFFFFFFFF.
- With its size set to 0xFFFFFFFF, the top chunk appears able to satisfy a request of any size, so malloc() serves even a very large request from the existing pool instead of asking for heap expansion.
- A malloc() call with an attacker-chosen size then advances the top pointer: new av->top = previous av->top + requested size (plus header and alignment adjustments). The following malloc() is served from the new top, so by choosing the request size the attacker decides where that allocation lands, and whatever the program writes into it is written to the location of the attacker's choice.
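The arithmetic behind this can be sketched with 32-bit unsigned integers. The model is deliberately simplified: it ignores chunk headers, alignment and glibc's exact size check, and the function names and the addresses used with them are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified size check: can the top chunk serve this request without
   growing the heap? */
int top_can_serve(uint32_t top_size, uint32_t request)
{
    return request <= top_size;
}

/* Simplified model of carving a request off the top chunk: the top pointer
   advances by the request size, wrapping around modulo 2^32. */
uint32_t top_after_malloc(uint32_t top, uint32_t top_size, uint32_t request)
{
    assert(top_can_serve(top_size, request)); /* otherwise the heap would be grown */
    return (uint32_t)(top + request);
}

/* Request size that moves the top pointer from top to target. For a target
   below the heap the subtraction wraps and yields a huge value. */
uint32_t request_for_target(uint32_t top, uint32_t target)
{
    return (uint32_t)(target - top);
}
```

With a hypothetical top pointer of 0x0804b100 and a target of 0x08049f00, request_for_target gives 0xffffee00. A top chunk with an honest size such as 0x20000 cannot serve that request, but one whose size was overwritten with 0xFFFFFFFF can, and the top pointer wraps around to the target.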
What do compilers and operating systems do to prevent these kinds of attacks?
ASLR: To create shell code, the attacker has to find the addresses of executables and functions in the process's memory. The operating system makes these addresses (say, that of "/bin/sh") harder to predict by randomizing the memory layout each time a program is loaded. This is called ASLR (address space layout randomization). In practice not all modules of a program are ASLR-enabled, so the vulnerability still exists in many programs.
Stack Canary: Stack canaries work by modifying the prologue and epilogue of protected functions to place a value on the stack and to check it, respectively. If a stack buffer is overwritten, the changed canary is noticed before the function returns into the shell code, and an exception is raised. But what if the exception handler record itself (SEH on Windows) is overwritten? Then the attacker's handler runs instead, so stack canaries are not foolproof.
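The prologue and epilogue logic can be imitated by hand to see what the compiler-generated check does. This is only a simulation under stated assumptions: the struct GuardedFrame, its field order, the fixed canary bytes and the function guarded_copy are all hypothetical, and real canaries are random values chosen per process.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fixed for the demo; a real canary is random and unknown to the attacker. */
static const unsigned char CANARY[4] = {0x00, 0x3b, 0x9a, 0xc7};

/* Hypothetical frame model: buffer, then canary, then saved return address. */
struct GuardedFrame {
    char buf[8];
    unsigned char canary[4];
    unsigned char saved_ret[4];
};

/* Returns 1 if the "function" may return normally, 0 if the epilogue check
   finds that the canary was overwritten. */
int guarded_copy(const char *input, size_t len)
{
    struct GuardedFrame frame;

    assert(input != NULL);
    memset(&frame, 0, sizeof frame);
    memcpy(frame.canary, CANARY, sizeof frame.canary);  /* prologue: place canary */
    if (len > sizeof frame)
        len = sizeof frame;                             /* keep the simulation in bounds */
    memcpy(&frame, input, len);                         /* unchecked write starting at buf */
    return memcmp(frame.canary, CANARY, sizeof CANARY) == 0;  /* epilogue: check */
}
```

An input that fits into buf leaves the canary intact. A 16-byte input overwrites the canary on its way to saved_ret, so the check fails before the corrupted return address would be used.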
DEP / NX
DEP (Data Execution Prevention) and NX (the no-execute page bit) mark data regions of memory, such as the stack and the heap, as non-executable and force a hardware-level exception if the program tries to execute code there. Many exploits based on ROP (return-oriented programming, which chains existing code snippets) exist to bypass the DEP / NX check.
What should a programmer do to avoid buffer overflow attacks?
- Never trust user input.
- Always use functions that check the size of the input, like gets_s(), getline() or fgets() (gets_s() is an optional C11 Annex K function and is not available in glibc). Never use gets()!
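As an illustration of the second point, here is a minimal sketch of a bounded line reader built on fgets(). The helper name read_line and its return convention are my own choices, not a standard API, and cap is assumed to fit in an int.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (capacity cap, including the terminating NUL).
   Returns 1 if the whole line fit, 0 if it was truncated, -1 on EOF or error.
   The rest of an over-long line is discarded so that it does not leak into
   the next read. */
int read_line(FILE *in, char *buf, size_t cap)
{
    size_t len;
    int c;

    assert(in != NULL && buf != NULL && cap >= 2);
    if (fgets(buf, (int)cap, in) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';           /* strip the newline: the line fit */
        return 1;
    }
    c = fgetc(in);
    if (c == EOF || c == '\n')
        return 1;                      /* the line ended exactly at the limit */
    while ((c = fgetc(in)) != '\n' && c != EOF)
        ;                              /* discard the rest of the over-long line */
    return 0;
}
```

In ValidatePassword above, the call gets(password_inp) could be replaced with read_line(stdin, password_inp, sizeof password_inp), treating a return value other than 1 as a failed login.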