Program Execution in x86 Architecture
👷‍♂️ Software Architecture Series — Part 30
When we think about memory in modern computer systems, we think of a large array of bytes, each with its own address. In a multiprocessor setting, all CPUs share the same memory. When a developer writes a program, it resides on disk. For execution, the program must be loaded into main memory, since the CPU can only work with data that resides in memory. When a program is loaded into main memory, the operating system sets the program counter to the address of the program's first instruction. The program counter is a special register whose value is incremented to point to the next instruction in sequence after the current instruction executes. If the system runs out of memory, the OS might temporarily move some data back to the disk (swap space) and bring it back into memory when needed. This process is called paging or swapping, and it helps manage limited memory resources.
While writing a program, a developer uses symbolic names for variables and instructions, but the computer needs actual memory locations to execute it. Moreover, programs do not always run at the same memory location: the operating system might load a program into different memory regions based on availability. Modern computer systems also run multiple programs simultaneously, and these programs need to share memory resources without interfering with each other.
Address Binding
To translate symbolic addresses (in the source code of a program) into actual memory locations in the system, address binding is performed. Once the source code is compiled, symbolic addresses are converted into relocatable addresses, also known as logical or virtual addresses, which are not tied to a specific physical memory location but are defined relative to the start of the program or module. This means that the code does not assume any specific location in memory. Instead, it references memory locations based on offsets from the beginning of the program. For example, if a variable is placed 8 bytes from the beginning of the program, its relocatable address might be expressed as “8 bytes from the start”.
Since a relocatable address is relative, the entire program can be loaded into different physical memory locations without altering the code. Considering the previous example, whether the program is loaded at memory address 0x1000 or 0x2000, the variable will always be 8 bytes from the start of the program. This gives the OS the flexibility to load the program wherever there is free memory. Moreover, the same compiled code can be loaded into different memory locations at runtime without recompilation, which is crucial for supporting multiple programs simultaneously.
Linking
Large programs are generally modular in nature. With relocatable addresses, each module is compiled separately into an object file, which contains machine code but with unresolved references (e.g., calls to functions defined in other modules). To resolve references between different files and ensure that all function and variable references point to the correct locations, a special tool called a linker is used. The linker's primary job is to combine these object files into a single executable with all references resolved.
The process of linking can be static or dynamic. If a program uses functions from external libraries (e.g., standard C library functions such as printf()), the linker can include the relevant library code in the final executable. This is called static linking because the library code becomes part of the executable file. Alternatively, the linker can generate references to shared libraries (also known as dynamic-link libraries, or DLLs). In this case, the actual library code is not included in the executable; instead, it is loaded at runtime by the loader. This is called dynamic linking and allows multiple programs to share the same library code. The linker resolves relocatable or logical addresses across different modules and combines them into a single executable. At this stage, the logical addresses are still relative to the program's address space, not yet tied to any specific physical memory location.
Loading
The final executable file contains machine code ready to be loaded into memory for execution. The operating system uses a tool called a loader to read the executable file from disk and load the program's code, data, and other sections into main memory. The executable file contains instructions for the loader on how to organize these sections in memory. The loader allocates memory for the program's code, data, and stack and sets up any necessary memory regions, such as the heap for dynamic memory allocation. Under memory constraints, a program may not be loaded at the location specified by the linker. In this case, the loader may need to perform additional relocation by adjusting the addresses of variables and functions to match the actual memory location where the program is loaded. With dynamic linking, the loader also needs to locate and load the necessary shared libraries into memory and resolve references to functions and variables in these libraries. Modern systems often use position-independent code for shared libraries, allowing them to be loaded at different memory locations without needing relocation.
Once the program is loaded, the loader sets up the execution environment by initializing CPU registers, including the program counter and stack pointer. It initializes the segment registers (like CS, DS, SS in x86 architecture) with the base addresses of the different segments (code, data, stack). It also sets up the environment variables and command-line arguments that the program may need. Then it transfers control to the program's entry point (usually the main function) to begin executing program instructions.
Execution
The logical address generated by the CPU is often segmented: it consists of a segment identifier and an offset that specifies a relative address within that segment. The CPU converts the logical address into a linear address through segmentation, an intermediate step in address translation.
Segmentation
Say, a logical address is represented as ‘CS:0040’. Here, ‘CS’ is the segment identifier and ‘0040’ is the offset within the segment.
A segment identifier is a 16-bit field, often called a segment selector, and the offset is a 32-bit field. The segment selector is further divided into an Index (13 bits), a Table Indicator (1 bit), and a Requested Privilege Level (2 bits). The index points to an entry in the Global Descriptor Table (GDT) or the Local Descriptor Table (LDT). These tables store segment descriptors that describe the properties of each segment. A segment descriptor contains the base address of the segment (the starting point of the segment) and the segment limit.
Once the CPU has the segment selector and the offset, it uses the index part of the segment selector to find the corresponding segment descriptor in the GDT or LDT, which in turn reveals the base address of the segment. The linear address is calculated by adding the offset to the base address of the segment.
Linear Address = Base Address + Offset
Considering the above example of the logical address 'CS:0040', suppose the segment selector points to a segment descriptor with a base address of 0x2000. The offset within this segment is 0x0040. The linear address would be calculated as:
Linear Address = 0x2000 + 0x0040 = 0x2040
Paging
In older systems where paging is not used, the processor maps the linear address directly to a physical address. However, modern operating systems use paging as a memory management technique to translate linear addresses to physical addresses. It allows processes to use more memory than what is physically available. The OS manages this by swapping pages between physical memory and disk (swap space). By keeping track of which pages are in memory and which are on disk, the system can handle larger workloads efficiently, providing an illusion of large, contiguous memory for each process.
This is achieved by dividing both physical memory (RAM) and linear memory into fixed-size blocks. The blocks in physical memory are known as frames, while those in linear memory are called pages. A page table keeps track of where each page of linear memory (which, after segmentation, can be referred to as virtual memory) resides in physical memory. It also contains additional information, such as access permissions and whether the page is present in memory or swapped out to disk. When the CPU generates a virtual or linear address, this address is divided into two parts: the Page Number (the higher-order bits, which identify the page in virtual memory) and the Offset (the lower-order bits, which give the offset within that page).
For example, if a system uses 4 KB pages (which is 2¹² bytes), the lower 12 bits of the linear address will be used as the offset, and the remaining higher bits will be the page number. The page number is used as an index to look up page table entries (PTEs), which contain the physical frame number (PFN) corresponding to the page number from the linear address along with other metadata, such as whether the page is present in physical memory, its access permissions, etc. The physical address is then computed by combining the physical frame number from the page table entry with the offset from the linear address.
Physical Address = Physical Frame Base Address + Offset, where the frame base address is the physical frame number multiplied by the page size.
Continuing with our example, if the linear address 0x2040 maps to a physical page frame starting at 0x3000, then corresponding physical address would be calculated as:
Physical Address = Physical Frame Base Address + Offset within Page = 0x3000 + 0x0040 = 0x3040
The run-time mapping from virtual to physical addresses is done by a hardware device called the memory-management unit (MMU). By using virtual addresses, the MMU helps isolate the memory of different processes, which enhances security and stability by preventing one process from accessing another’s memory. However, repeatedly accessing the page table in memory to perform the above-described translation can be time-consuming. To speed up address translation, modern MMUs use a cache called the Translation Lookaside Buffer (TLB) that stores recent translations of virtual to physical addresses.
The TLB is a small, fast cache that holds a subset of the page table entries. When the CPU generates a virtual address, the MMU first checks the TLB for the corresponding physical address mapping. If the mapping is found in the TLB, the physical address can be retrieved quickly without accessing the page table in memory, avoiding a page-table lookup for every memory operation. If the mapping is not found in the TLB (a TLB miss), the MMU must perform a full page-table walk to translate the virtual address, traversing the multi-level page tables stored in memory to find the physical address. Once found, the translation is added to the TLB so that future accesses to the same page result in a faster TLB hit.
In a nutshell, the linker prepares the program for execution by resolving addresses and combining object files, the loader loads the program into memory and sets up the initial environment, and the CPU along with the MMU handles the actual address translation during program execution.