In this lecture, I am going to cover:
- ELF files and how they look in memory at runtime.
- How the stack is organized during a function call.
- What is assembly? What does assembly code look like?
- What are $ebp and $esp?
In computing, the Executable and Linkable Format (ELF, formerly named Extensible Linking Format), is a common standard file format for executable files, object code, shared libraries, and core dumps. Ref : ELF
Did that go over your head? Don't worry. Let me simplify it.
ELF is basically a format that specifies how binary code (either executable or linkable) is laid out in a file and how it will be loaded into memory. The ELF header contains a lot of information about the contents of the file. You can use the following command to view the header of an ELF file. Here I am running it on one of the files in this repo.
readelf -h ./function_call
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x8048310
Start of program headers: 52 (bytes into file)
Start of section headers: 6860 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 9
Size of section headers: 40 (bytes)
Number of section headers: 36
Section header string table index: 33
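Notice two things for now; they will be important for the rest of the course:
- Type: EXEC, meaning this particular binary file is an executable.
- Data: 2's complement, little endian, meaning this file is compiled for a machine that stores multi-byte values (addresses included) with the least significant byte first. You can see this in the disassembly later in this lecture, where the address 0x80484d0 is encoded as the bytes d0 84 04 08.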
+---------------+ Highest Address 0xffffffff
| cmd line args |
| env Variable |
+---------------+
| STACK |
+--+------------+
| | |
| | |
| v ^ |
| | |
| | |
+------------+--+
| HEAP |
+---------------+
| Uninitialized |
| Data(BSS) |
+---------------+
| Initialized |
| Data |
+---------------+
| Read Only |
| data |
| + |
| code |
+---------------+ Lowest Address 0X00000000
The above diagram shows how the 4GB virtual address space of any 32-bit binary looks when it is loaded into memory. For ease of understanding, I have drawn it in top-down order, i.e. the highest address at the top and the lowest at the bottom. Let's understand the different sections (from the top):
- The very first section stores the command line arguments and environment variables that are passed to the program when it is executed.
- Stack: This stores the variables created inside functions. Don't confuse these with the dynamic variables created by the *alloc family of functions. Local variables are dynamic only in the sense that their memory is assigned at run time, when the function is called. Normally declared variables inside functions are stored on the stack. The stack grows in reverse order, i.e. from the highest address towards the lowest address.
- Heap: Dynamic variables that are created by the *alloc family of functions. As the arrows in the diagram show, the heap grows towards the stack.
- BSS: This section stores the uninitialized (global + static) variables. They are automatically initialized to 0.
- Initialized data: This section stores the (global + static) variables that are initialized to some value.
- The last section stores all the read-only data and the code of the program in binary form.
The stack is used during a function call to save the state of the caller, so that when control returns from the called function the caller can continue to execute normally. This is how the stack looks after a function call is made.
+ Previous function |
| Stack frame |
| |
+----------------------+ <--- previous function stack frame end here
|Space for return value|
+----------------------+
|Arguments for function|
+----------------------+
| return address |
+----------------------+
| saved $ebp |
+----------------------+
| | <--- padding done by compilers
+----------------------+
| local variables |
| |
| |
| |
| |
+----------------------+
| |
| |
| unused space |
+ +
Now let's try to understand what $ebp and $esp are used for.
- esp: As you can see in the above diagram, the stack pointer (esp) keeps changing after each push operation on the stack. It always points to the top of the stack.
- ebp: At runtime, variables have no names. Local variables are addressed as offsets from the base of the current stack frame, and this base is pointed to by the ebp register. That's why, when a function calls another function, the value of the ebp register is saved onto the stack, and the ebp register becomes available to hold the base of the new stack frame.
Assembly is the language that sits just above machine language. High-level languages such as C are first compiled to assembly language, which is then translated to machine language.
Let's take an example of how the code for a function call looks.
objdump -d ./function_call
0804840b <foo>:
804840b: 55 push %ebp
804840c: 89 e5 mov %esp,%ebp
804840e: 83 ec 08 sub $0x8,%esp
8048411: 83 ec 0c sub $0xc,%esp
8048414: 68 d0 84 04 08 push $0x80484d0
8048419: e8 c2 fe ff ff call 80482e0 <printf@plt>
804841e: 83 c4 10 add $0x10,%esp
8048421: 90 nop
8048422: c9 leave
8048423: c3 ret
08048424 <main>:
8048424: 8d 4c 24 04 lea 0x4(%esp),%ecx
8048428: 83 e4 f0 and $0xfffffff0,%esp
804842b: ff 71 fc pushl -0x4(%ecx)
804842e: 55 push %ebp
804842f: 89 e5 mov %esp,%ebp
8048431: 51 push %ecx
8048432: 83 ec 04 sub $0x4,%esp
8048435: e8 d1 ff ff ff call 804840b <foo>
804843a: b8 00 00 00 00 mov $0x0,%eax
804843f: 83 c4 04 add $0x4,%esp
8048442: 59 pop %ecx
8048443: 5d pop %ebp
8048444: 8d 61 fc lea -0x4(%ecx),%esp
8048447: c3 ret
8048448: 66 90 xchg %ax,%ax
804844a: 66 90 xchg %ax,%ax
804844c: 66 90 xchg %ax,%ax
804844e: 66 90 xchg %ax,%ax
I have only copied the code of the main and foo functions here. Observe the call to the foo function from main.
- Main seems to push nothing before the function call. That means foo does not take any arguments.
- The call instruction asks the CPU to save the return address (the address of the instruction that follows the call, 0x804843a here) onto the stack. This is done by the call instruction itself, so no separate push is visible in the code.
- The first instruction of foo pushes $ebp onto the stack.
- The very next instruction copies $esp into $ebp, so that $ebp points to the current top of the stack. The whole sequence can be divided into three parts, which are shown in the flow below.
main foo
+-----------------+ +-----------------+
| | | |
|1. ret val space | |1. pushes ebp |
|2. arguments | +------------->CALL Inst +------------------> |2. updates ebp |
| | + | |
| | v | |
+-----------------+ Pushes the return address +-----------------+
In the previous section on assembly language, I showed how the code for a simple function call looks. In this section I will show how buffers look in asm code.
#include <stdio.h>

void foo()
{
    char ch[10]; /* local buffer, lives in foo's stack frame */
    printf("Calling from function");
}

int main()
{
    foo();
    return 0;
}
Corresponding foo function in assembly.
0804840b <foo>:
804840b: 55 push %ebp
804840c: 89 e5 mov %esp,%ebp
804840e: 83 ec 18 sub $0x18,%esp
8048411: 83 ec 0c sub $0xc,%esp
8048414: 68 d0 84 04 08 push $0x80484d0
8048419: e8 c2 fe ff ff call 80482e0 <printf@plt>
804841e: 83 c4 10 add $0x10,%esp
8048421: 90 nop
8048422: c9 leave
8048423: c3 ret
Notice the two sub instructions:
804840e: 83 ec 18 sub $0x18,%esp
8048411: 83 ec 0c sub $0xc,%esp
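Compare this with the earlier listing of foo: the first sub has changed from $0x8 to $0x18. That first sub is the one that reserves space for the local variables, and the 16 extra bytes make room for the 10-byte ch buffer (the compiler rounds the size up). The second sub is the same in both listings, so for now we can ignore it. Together with the 4-byte push of the printf argument it moves esp by 0x10 bytes, which the add $0x10,%esp after the call undoes, so it is most likely padding added by the compiler. Notice that, as I mentioned, names mean nothing in asm. Variables are just offsets from $esp or $ebp.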