Lecture 1: Understanding Memories

PDF.

Video Lecture Hindi

In this lecture, I am going to cover:

  • ELF files and how they look in memory at runtime.
  • The organization of the stack during a function call.
  • What assembly is and what assembly code looks like.
  • What $ebp and $esp are.

ELF

In computing, the Executable and Linkable Format (ELF, formerly named Extensible Linking Format), is a common standard file format for executable files, object code, shared libraries, and core dumps. Ref : ELF

Did that jump over your head? Don't worry. Let me simplify it. ELF is basically a format that specifies how binary code (either executable or linkable) is laid out in the file and, from there, how it is loaded into memory. The ELF header contains a lot of information about the contents of the file. You can view the header with the following command. I am running it on one of the files of this repo.

readelf -h ./function_call

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048310
  Start of program headers:          52 (bytes into file)
  Start of section headers:          6860 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         9
  Size of section headers:           40 (bytes)
  Number of section headers:         36
  Section header string table index: 33

Notice two things for now; they will be important throughout the course:

  1. Type: EXEC, meaning this particular binary file is an executable.
  2. Data: 2's complement, little endian, meaning this file is compiled for a machine that stores multi-byte values (addresses included) in little-endian byte order, i.e. least significant byte first. What are they?

How does an ELF file look in memory?

                        +---------------+ Highest Address 0xffffffff
                        | cmd line args |
                        | env Variable  |
                        +---------------+
                        |     STACK     |
                        +--+------------+
                        |  |            |
                        |  |            |
                        |  v         ^  |
                        |            |  |
                        |            |  |
                        +------------+--+
                        |     HEAP      |
                        +---------------+
                        | Uninitialized |
                        |   Data(BSS)   |
                        +---------------+
                        |  Initialized  |
                        |     Data      |
                        +---------------+
                        |   Read Only   |
                        |     data      |
                        |       +       |
                        |     code      |
                        +---------------+ Lowest Address 0x00000000

The above diagram shows what the 4GB virtual address space of any 32-bit binary looks like when it is loaded into memory. For ease of understanding, I have drawn the address space in top-down order, i.e. highest address at the top and lowest at the bottom. Let's understand the different sections (from the top):

  1. The very first section stores the command line arguments and environment variables that are passed to the program when it is executed.

  2. Stack: This stores the local variables created inside functions. Don't confuse these with the dynamic memory returned by the *alloc family of functions. Every local variable is dynamic in the sense that its memory is assigned only at run time, but normally declared variables inside functions live on the stack. The stack grows in reverse order, i.e. from the highest address towards the lowest address.

  3. Heap: Dynamic variables that are created by the *alloc family of functions.

  4. BSS: This section stores the uninitialized (global + static) variables. They are automatically initialized to 0.

  5. The next section stores the (global + static) variables that are initialized to some value.

  6. The last section stores all the read-only data and the code of the program in machine (binary) form.

Organization of the stack during a function call

The stack is used during a function call to save the state of the caller, so that when control returns from the called function the caller can continue to execute normally. This is what the stack looks like after a function call is made.

                        +   Previous function  |
                        |     Stack frame      |
                        |                      |
                        +----------------------+ <--- previous function stack frame end here
                        |Space for return value|
                        +----------------------+
                        |Arguments for function|
                        +----------------------+
                        |    return address    |
                        +----------------------+
                        |     saved $ebp       |
                        +----------------------+
                        |                      | <---  padding done by compilers
                        +----------------------+
                        |    local variables   |
                        |                      |
                        |                      |
                        |                      |
                        |                      |
                        +----------------------+
                        |                      |
                        |                      |
                        |     unused space     |
                        +                      +

How the stack grows:

Towards lower memory addresses. (Figure: Stack Growth)

Now let's try to understand what $ebp and $esp are used for.

  1. esp: As you can see in the above diagram, the stack pointer, esp, changes after each push onto the stack. It always holds the address of the top of the stack.
  2. ebp: At runtime, variables have no names. They are referenced as offsets from the base of the current stack frame, and that base is held in the ebp register. That's why, when a function calls another function, the value of the ebp register is saved onto the stack: the register then becomes available to hold the base of the new stack frame.

Assembly language

It's the language that sits just above machine language. High-level languages such as C are first compiled to assembly language, which is then translated to machine language. Let's take an example of what the code for a function call looks like.

objdump -d ./function_call

0804840b <foo>:
 804840b:    55                       push   %ebp
 804840c:    89 e5                    mov    %esp,%ebp
 804840e:    83 ec 08                 sub    $0x8,%esp
 8048411:    83 ec 0c                 sub    $0xc,%esp
 8048414:    68 d0 84 04 08           push   $0x80484d0
 8048419:    e8 c2 fe ff ff           call   80482e0 <printf@plt>
 804841e:    83 c4 10                 add    $0x10,%esp
 8048421:    90                       nop
 8048422:    c9                       leave
 8048423:    c3                       ret

08048424 <main>:
 8048424:    8d 4c 24 04              lea    0x4(%esp),%ecx
 8048428:    83 e4 f0                 and    $0xfffffff0,%esp
 804842b:    ff 71 fc                 pushl  -0x4(%ecx)
 804842e:    55                       push   %ebp
 804842f:    89 e5                    mov    %esp,%ebp
 8048431:    51                       push   %ecx
 8048432:    83 ec 04                 sub    $0x4,%esp
 8048435:    e8 d1 ff ff ff           call   804840b <foo>
 804843a:    b8 00 00 00 00           mov    $0x0,%eax
 804843f:    83 c4 04                 add    $0x4,%esp
 8048442:    59                       pop    %ecx
 8048443:    5d                       pop    %ebp
 8048444:    8d 61 fc                 lea    -0x4(%ecx),%esp
 8048447:    c3                       ret
 8048448:    66 90                    xchg   %ax,%ax
 804844a:    66 90                    xchg   %ax,%ax
 804844c:    66 90                    xchg   %ax,%ax
 804844e:    66 90                    xchg   %ax,%ax

I have only copied the code of the main and foo functions here. Observe the call to the foo function from main.

  1. Main seems to push nothing before the function call. That means foo does not take any arguments.
  2. The call instruction makes the CPU save the return address (the address of the instruction right after the call) onto the stack. This is done by the call instruction itself, so it is not visible as a separate instruction in the code.
  3. The first instruction of foo pushes $ebp onto the stack.
  4. The very next instruction makes $ebp point to where $esp points. These steps can be divided into three parts, which are explained in the flow below.
      main                                                                foo
+-----------------+                                               +-----------------+
|                 |                                               |                 |
|1. ret val space |                                               |1. pushes ebp    |
|2. arguments     | +------------->CALL Inst +------------------> |2. updates ebp   |
|                 |                    +                          |                 |
|                 |                    v                          |                 |
+-----------------+       Pushes the return address               +-----------------+

Assembly Language 2

In the previous Assembly language section, I showed what the code for a simple function call looks like. In this section we will see what buffers look like in asm code.

#include <stdio.h>
void foo()
{
    char ch[10];
    printf("Calling from fucntion");
}

int main()
{
    foo();
    return 0;
}

Corresponding foo function in assembly.

0804840b <foo>:
 804840b:    55                       push   %ebp
 804840c:    89 e5                    mov    %esp,%ebp
 804840e:    83 ec 18                 sub    $0x18,%esp
 8048411:    83 ec 0c                 sub    $0xc,%esp
 8048414:    68 d0 84 04 08           push   $0x80484d0
 8048419:    e8 c2 fe ff ff           call   80482e0 <printf@plt>
 804841e:    83 c4 10                 add    $0x10,%esp
 8048421:    90                       nop
 8048422:    c9                       leave
 8048423:    c3                       ret

Notice the two sub instructions.

804840e:    83 ec 18                 sub    $0x18,%esp
8048411:    83 ec 0c                 sub    $0xc,%esp

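Compare this listing with the earlier foo: the second sub is unchanged, while the first one grew from $0x8 to $0x18. The first sub is therefore the one that moves esp down to make room for the local variables, including the ch buffer (0x18 - 0x8 = 0x10 = 16 bytes, i.e. the 10-byte buffer plus padding). The second sub $0xc, together with the 4-byte push that follows it, sets aside 16 bytes for the printf call and is undone by add $0x10,%esp once the call returns. Notice, as I mentioned, that names are nothing in asm. They are just references relative to $esp or $ebp.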