Memory Ramblings: Static Allocation
This post is about static allocation. It was originally going to be the first in a series about memory (memory allocation, memory management, and so on), and it may still become one some day.
I am assuming the reader can understand C (and/or C++) and has an understanding of its memory model and build tools (compiler and linker). This post will have a bit of assembly for the Microchip PIC12 and for Intel/AMD x64.
Memory Allocation
Memory allocation is the process of reserving memory and dividing it up for specific tasks or purposes. There are two aspects to memory allocation - a low level that deals with requesting memory from the operating system and a high level that deals with dividing up memory and reserving it for specific purposes.
Low Level (Reserving Memory)
At the low level, there are two different approaches: static and runtime allocation. With static allocation, either a program is running alone on a computer (with no operating system) and can divide up physical memory as needed, or the program has memory reserved for it by the operating system as part of program loading. In runtime allocation, the program performs a system call to request more (or less) memory from the operating system at runtime.
Runtime allocation is usually called dynamic allocation; however, that term is also used to refer to heap allocators like malloc()/free() in C or new/delete in C++. I'll call it runtime allocation in this article to avoid the confusion, and note that I am specifically referring to reserving memory from the operating system, not from the C/C++ heap. The C/C++ heap itself relies on system calls to get large chunks of memory, which are then managed by a heap algorithm in order to implement malloc()/free() and new/delete.
On Linux, static allocation is performed through a coordination of the compiler, linker, and the program loader (specifically the ELF format loader). The compiler and linker produce a binary that describes what sections of memory need to be allocated (and where), and the OS sets that up when loading the program. Runtime allocation can be done with the mmap(), mmap2(), brk(), or sbrk() system calls.
On embedded platforms with no OS, there is no runtime allocation, since the program already has control and access to all of the system's memory. Instead, there is just static allocation in the assembler or compiler to help you divide up all the memory you have access to. You may still have calls used to allocate from a heap, or even map pages into virtual memory, but your program already owns and controls all the memory to begin with. You did not need to ask an OS for access to it.
High Level (Dividing Memory)
At the high level, you have memory but you need to divide it up in a useful, efficient way. In C, the language provides static allocation and manages memory on a call stack, and the C standard library provides a heap allocator through the malloc()/free() interface.
Other techniques include linear allocators (also called memory arenas), block or pool allocators, reference counting, and garbage collection.
Later posts might look at some of these allocation strategies.
Static Allocation
Static allocation is the most basic memory allocation technique, to the point that it is not usually thought of as such. Static allocation is used both to obtain memory from the OS and to divide up the memory for different variables and data structures.
With static allocation, you reserve and partition memory at compile time (or earlier). Because the memory layout is generated at compile time, it cannot change or "grow" during runtime; whatever variables, arrays, etc. are declared in the program are all there will ever be. Statically allocated memory is available at the very start of the program and persists for the duration of the program.
Static allocation is closely related to program loading, and the same systems that allocate your static memory also set up the memory to store your (machine) code. The program loader is instructed on how to set up a process's virtual address space by reading the headers of the executable.
In this post, we'll look at how static allocation is implemented as well as the costs and benefits of using it. We will start by looking at assembly language programs on an extremely simple device (PIC12), and then work towards the implementation for C on modern computers (x64).
Manual Memory Layout
The simplest way to perform static allocation is to manually select the memory locations for all of your data. This approach is used in assembly on single-program computers with no OS (e.g., a NES game). The programmer manually assigns a purpose to each memory address (or to a range of them). The assembler may help you do this by supporting aliases, but even that is not required - you could also use the memory addresses directly. There is little help from the compiler or assembler; it is a manual process done by the programmer before compilation.
For an example of manually laying out memory, we'll look at a program for the PIC12F508 microcontroller. The PIC12F508 is a very small, 8-pin microcontroller. It has:
- 512 words of ROM for program instructions!
- 25 bytes RAM! (Yes, 25. Not a typo.)
- A Harvard architecture. (Separate program (ROM) and data (RAM) address spaces).
- A 1 MIPS CPU: single-cycle instructions executing at 1 MHz (from a 4 MHz internal clock)! (Slower than a TI calculator.)
- Only 33 simple instructions!
- A free assembler (mpasm) and free-ish C compiler (xc8).
You can read the data sheet here. See the Instruction Set section for details on the 33 instructions. If you don't understand the assembly perfectly, don't worry - I'll explain the important parts below the source. Here is the program:
; MPASM instructions are of the form:
; label directive operand1, operand2 ...
;
; Most directives correspond to machine instructions, and cause the
; instruction to be output at the current address location, and then
; move the current location forward by their size.
;
; The label is optional. If there is a label, the label
; becomes an alias for the current address, which is typically
; where the directive following the label writes an instruction.
; You can also place a label on a line alone, without a directive.
;
; Some directives do not output instructions, but instead change
; assembler settings.
; Here is the program itself:
; Behold, a program more boring than hello world: A delayed output.
; MPASM assembler ceremony (set device and import definitions)
list p=12f508 ; Set the device type
#include p12f508.inc ; Include definitions (aliases) for the device
; Set chip configuration bits: (Use internal oscillator, no watchdog timer)
__config _OSC_IntRC & _WDT_OFF & _CP_OFF & _MCLRE_ON
; Creates the reset vector (entry point on startup)
reset_vector org 0x0000
goto main_program
; Declares an alias foo for memory location 0x7
foo equ 0x7
main_program org 0x0001
movlw 0x1E
tris GPIO ; set gp0 pin to output mode, others to input mode
movlw 0x6F
movwf foo ; set foo to 0x6f
loop
decf foo ; foo--
btfss STATUS, Z ; check foo == 0
goto loop ; repeat loop if false, otherwise:
movlw 1 ;
movwf GPIO ; set output on
goto $ ; and halt
end ; MPASM demands this be here, it does nothing
This program initializes a byte at memory location 0x7 to the value 0x6F. It loops while decrementing the value in 0x7, and once it hits zero, it exits the loop, sets an output pin on the microcontroller to on, and then halts (loops in place forever). Exciting stuff.
The key line here is foo equ 0x7. This assembly directive creates an alias foo for the value 0x7. This is not a variable - it's an alias, like a pre-processor definition in C. Wherever foo is used, 0x7 is substituted. This allows us to refer to memory location 0x7 with a mnemonic name. Essentially, as the programmer, we have decided that 0x7 will store a value we will call foo. By using many such aliases, a programmer can manually organize and reserve the available memory.
As an example, let's see how you would use this method to declare an array of 10 2D vectors, each of which contains a 16-bit X field and a 16-bit Y field. (I am assuming we have a lot more RAM than the 12F508 now :-))
vec2d_size equ 0x4 ; Size of vector
vec2d_x equ 0x0 ; Offset of X field
vec2d_y equ 0x2 ; Offset of Y field
my_vecs equ 0x7 ; Location of 10 vectors
next_var equ (my_vecs + vec2d_size * 10)
The size of the data structure and the offsets to its fields are recorded manually. Space at 0x7 is reserved for the 10 vectors. The next variable in RAM has its position computed from the amount of space that the vectors take up, putting it at a location of 0x2F (47). Without an assembler that can compute arithmetic expressions, laying out memory this way becomes tedious and error-prone. Even with support for arithmetic expressions, your program will get verbose, with explicit address calculations clogging up your variable definitions. It would be nice if the assembler could do this for us - which it can, and which we will look at next.
Note that this assembly program also places the code at exact addresses. The org directive specifies the current program memory position to output code at. So, for example, the line main_program org 0x0001 makes the following assembly instructions write out to program memory starting at address 0x0001, and makes main_program an alias for 0x0001 (the PIC has separate address spaces for program and data memory, so this does not clash with foo at 0x7).
We can see the result of the equ and org directives by looking at the disassembly:
ADDRESS DISM
000 GOTO 0x1
001 MOVLW 0x1E
002 TRIS 0x6
003 MOVLW 0x6F
004 MOVWF 0x7
005 DECF 0x7, F
006 BTFSS 0x3, 0x2
007 GOTO 0x5
008 MOVLW 0x1
009 MOVWF 0x6
00A GOTO 0xA
Linker Based Static Allocation
The next stage after a manual layout (aka absolute addressed) assembly program is to make relocatable code and incorporate a linker. When using a linker, each file is assembled into a binary object file which contains the output in an intermediate form that has not applied absolute addresses yet. The linker then combines the object files into a final program binary.
In an absolute addressed assembly program like the last one, you can spread
your code across multiple files with the
include
directive, but since all of
the variables and code are manually placed at absolute addresses, these
files need to be aware of what memory is being used by each other file. If
you incorporate a third-party library into your program, you
have to go into its code and shuffle everything around to make sure it does
not use any of the memory you have already decided to use for your program. In
general, every time you want to combine two source files together, you need to
make sure their memory layouts do not conflict.
In contrast, when using a linker, it decides where to place everything, so the object file from the third-party library can be linked with your program's object files without any changes to its source. Your assembly files no longer reference absolute memory addresses (unless you are doing so explicitly to access hardware).
To use the linker, your program is assembled as a collection of "translation units". A translation unit generally consists of a single source file, but can contain multiple files via includes. The assembler will process the translation unit and produce a single object file. Each object file contains one or more "sections", which are memory regions that have to be placed at absolute addresses by the linker. Generally, there will be one code section and one data section, but more complex platforms will have more.
Let's look at an example of linking on the PIC12:
list p=12f508
#include p12f508.inc
__config _OSC_IntRC & _WDT_OFF & _CP_OFF & _MCLRE_ON
global start
; The 'code' directive means:
; The following directives' outputs are placed into the code section
reset_vector code 0x0000
goto start
; The 'udata' directive means:
; The following directives' outputs are placed into the udata section
udata
; Reserve a memory location, assign alias foo:
foo res 1
; Go back to the code section
main_program code
start
movlw 0x1E
tris GPIO ; gp0 is output, others are inputs
movlw 0x6F
movwf foo ; set foo to 0x6f
loop
decf foo ; foo--
btfss STATUS, Z ; check foo == 0
goto loop ; repeat if false
movlw 1
movwf GPIO ; set output
goto $ ; halt
end
This program does the same thing as the last one, but there are some big changes: the code directive replaces the org directive, res replaces equ, the udata line has been added, and there is a new global directive too.
The code directive sets the assembler to output into the code section. Optionally, the directive takes an absolute address to force the output to a specific place; this is used to set interrupt vectors, which are located at fixed addresses. However, the majority of code will not have an absolute address. The udata directive has also been added. It sets the assembler to output into the udata (user RAM) section.
The res directive reserves memory and makes its label an alias for the address of the reserved space. The line foo res 1 reserves 1 byte of memory and makes foo an alias for the location of that reserved memory. The programmer does not know the location of the memory for foo; instead, it is decided by the linker when it combines the object files.
Code in separate object files does not share the same symbols, so a way is needed to indicate that a particular symbol comes from another object file. A file can export a symbol and make it visible to other translation units with the global directive. In other files, the extern directive declares that a symbol comes from a different object file.
Let's look at the disassembly:
ADDRESS DISM
000 GOTO 0x4
001 XOR W, 0xFF
002 RETLW 0x0
003 RETLW 0x0
004 MOVLW 0x1E
005 TRIS 0x6
006 MOVLW 0x6F
007 MOVWF 0x7
008 DECF 0x7
009 BTFSS 0x3, 0x2
00A GOTO 0x8
00B MOVLW 0x1
00C MOVWF 0x6
00D GOTO 0xD
The linker decided that foo should be at 0x7 - the same address we chose in the previous example. This is because it is the first general purpose RAM location, so the first reserved byte of memory gets placed there. The linker started the main_program code section at 0x004 instead of 0x001.
Let's repeat the example from last time, where we created an array of 10 vectors:
vec2d_size equ 0x4 ; Size of vector
vec2d_x equ 0x0 ; Offset of X field
vec2d_y equ 0x2 ; Offset of Y field
my_vecs res vec2d_size*10
next_var res 2
another res 1
This is a huge improvement over the last version. Now you only need to know the size of your data structure and how many to allocate. You reserve that much space, and the linker takes care of finding a place to put it. The variables that follow do not need to compute any offsets. This is starting to look a lot like declarations in C! (We will look at those next.) Note that you do not need a linker combining object files to lay out code and reserve memory this way - you could make an assembler or programming language that does the same thing all in one big compilation. However, historically, computers did not have enough memory to assemble and lay out an entire program at once, so the linker was created to break the job down into per-file steps. This separate compilation and linking model is still (sadly) used by C and C++ to this day.
Static Allocation in C
Now it's time to look at what happens when we statically allocate in a C program. In C, static allocation is used for all global variables, as well as any locals declared with the static keyword.
Here is the program:
#include <xc.h> // Include for PIC platform specific definitions
#include <stdint.h> // Standard integer sizes
uint8_t foo;
uint8_t baz;
uint8_t bar = 17;
void main(void) {
TRIS = 0x1E;
foo = 0x6F;
do {
foo--;
} while (foo);
GPIO = 1;
while (1) {}
}
Once again, this program does the same thing as the last one. Just like with the res directive, globals in C get placed in memory by the linker. However, the C language provides some guarantees about statically allocated variables: they are initialized to 0, or you can provide an initial value. To help demonstrate this, I've added two extra variables, one with no value (initialized to zero) and the other initialized to 17. Let's look at the disassembly:
ADDRESS DISM
000 GOTO 0x1
001 CALL 0x6
002 MOVWF 0x9
003 CLRF 0x8
004 CLRF FSR
005 GOTO 0x1F3
006 RETLW 0x11
... snip ...
1F3 MOVLW 0x1E
1F4 TRIS GPIO
1F5 MOVLW 0x6F
1F6 MOVWF 0x7
1F7 MOVLW 0x1
1F8 SUBWF 0x7, F
1F9 MOVF 0x7, W
1FA BTFSS STATUS, 0x2
1FB GOTO 0x1F7
1FC MOVLW 0x1
1FD MOVWF GPIO
1FE GOTO 0x1FE
1FF XORLW 0xFF
The main function starts at 0x1F3 (the Microchip xc8 linker seems to allocate from the top of memory down). You'll notice there are some instructions at 0x001 that run before main. This code copies the value 17 from program memory into the location for the variable bar (0x9), and zeroes the memory location of baz (0x8).
Because the PIC architecture has separate program and data address spaces, there is no way for data memory to hold any known value at startup. So, on this architecture, there is a runtime cost to static allocation in C, since all of the data must be either cleared to zero or copied from constants stored in program memory. Generally speaking, any architecture with either separate program and data address spaces or with program code stored in ROM will need to do runtime work to set up C static variables. Next, let's move to a modern architecture with a modern OS and see what happens there.
Static Allocation with an OS and Virtual Memory
We will go back to assembly, but this time, we will use the NASM assembler on Linux on an Intel x64 architecture processor. If you are unfamiliar with x64 assembly, there are plenty of guides across the internet (like this) in addition to Intel's own documentation. Here's a basic x64 "hello world" for Linux:
; Enter the .bss section, the section for uninitialized
; static data
section .bss
; declare an uninitialized 64 bit variable
; named static_uninit_var
static_uninit_var: resq 1
; Enter the .data section, the section for initialized
; static data
section .data
; Create an initialized char/byte array,
; with a newline at the end (the byte value 10)
; with a label/alias 'static_str'
static_str: db "hello, world", 10
; Save the length of the string as an alias
; The $ represents the current output address,
; which is just after the string. So
; $ - static_str == length of the string
static_str_len: equ $-static_str
; Declare an initialized 64 bit integer,
; initial value is 4
static_number: dq 4
; Enter the text section, the section for program instructions
section .text
global _start ; Export the entry point for the program
_start:
mov rax, [static_number] ; rax = *static_number
add rax, 10 ; rax += 10
mov [static_uninit_var], rax ; *static_uninit_var = rax
cmp rax, 14 ; compare rax and 14
jne .no_print ; jump to .no_print if not equal
; Call the write system call (Linux kernel interface):
; A Linux system call is performed by loading arguments
; into specific registers and then running the syscall
; instruction (on x64 at least)
;
; The first argument, in register rax, is the call number,
; denoting which operation to perform.
mov rax, 1 ; write(
mov rdi, 1 ; STDOUT_NO,
mov rsi, static_str ; static_str,
mov rdx, static_str_len ; static_str_len)
syscall
.no_print:
; Call the exit system call
mov rax, 60 ; exit(
mov rdi, 0 ; 0)
syscall
; Note - Build with: nasm -f elf64 prog1.asm -o prog1.o
; Link with: ld prog1.o -o prog1
This program prints "hello, world\n" to stdout and then exits with an exit code of 0. The syscall instruction triggers a jump into the kernel's system call handler. Parameters are passed in specific registers as per the Linux ABI, with register rax containing the system call id. We use this to make two Linux system calls, write() and exit().
Code and data are placed into one or more sections defined by the section directive. The text section contains all program instructions, the data section contains initialized static data, and the bss section contains uninitialized static data. Just like in the Microchip assembler, uninitialized data is declared with "res" directives (resq reserves quad-words, which are 8 bytes each).
New, though, is the stuff in the ".data" section. Literal data can be added to the object file with the d* family of directives. db declares bytes, and when followed by a string literal, declares all of the bytes in the string. static_str: db "hello, world", 10 reserves 13 bytes for the "hello, world" string data followed by 10 (a newline). static_number: dq 4 reserves space for a 64-bit integer initialized to the value 4.
ELF64 Program Loading
So all this looks nice, but it raises the question of how the program and its data get loaded in the first place. On the Microchip architecture, the program is written to flash ROM and runs right at power-up. But in this case, the program comes from disk and the operating system must load and launch it.
Furthermore, you no longer have access to a simple block of physical memory. Instead, you have a virtual address space that is managed for you by the kernel. With a virtual address space, there is a layer of indirection between your program and the physical memory. The kernel configures a page table that maps address ranges in your program to pages of physical memory. When you access memory in your program, your virtual addresses are first translated into physical ones as per the page table, and then the lookup occurs. So, if the program expects the string "hello, world" to be in a specific place, it needs to get mapped there by the kernel.
To figure this out, we need to look at the executable file that NASM produces.
The
readelf
program can print out text tables of the headers of an
ELF64 binary. With the command
readelf -S prog1
, we can get a list of
the sections:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 00000000004000b0 000000b0
0000000000000041 0000000000000000 AX 0 0 16
[ 2] .data PROGBITS 00000000006000f4 000000f4
0000000000000015 0000000000000000 WA 0 0 4
[ 3] .bss NOBITS 000000000060010c 00000109
000000000000000c 0000000000000000 WA 0 0 4
[ 4] .shstrtab STRTAB 0000000000000000 000002cc
000000000000002c 0000000000000000 0 0 1
[ 5] .symtab SYMTAB 0000000000000000 00000110
0000000000000150 0000000000000018 6 10 8
[ 6] .strtab STRTAB 0000000000000000 00000260
000000000000006c 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
Here you can see the sections that we used, plus a few more. For example, the C language usually puts string constants in the read-only .rodata section, not the data section as we have done. What's important here are the address and offset columns. The address value is the location in virtual memory where the section begins, and the offset is the section's position from the beginning of the ELF64 file.
If we look at the disassembly of the program, we can see that it finds the "hello, world\n" string at address 0x6000f4, within the range the data section is set to be mapped to:
Disassembly of section .text:
00000000004000b0 <_start>:
4000b0: 48 8b 04 25 01 01 60 mov rax,QWORD PTR ds:0x600101
4000b7: 00
4000b8: 48 83 c0 0a add rax,0xa
4000bc: 48 89 04 25 0c 01 60 mov QWORD PTR ds:0x60010c,rax
4000c3: 00
4000c4: 48 83 f8 0e cmp rax,0xe
4000c8: 75 1b jne 4000e5 <_start.no_print>
4000ca: b8 01 00 00 00 mov eax,0x1
4000cf: bf 01 00 00 00 mov edi,0x1
4000d4: 48 be f4 00 60 00 00 movabs rsi,0x6000f4
4000db: 00 00 00
4000de: ba 0d 00 00 00 mov edx,0xd
4000e3: 0f 05 syscall
00000000004000e5 <_start.no_print>:
4000e5: b8 3c 00 00 00 mov eax,0x3c
4000ea: bf 00 00 00 00 mov edi,0x0
4000ef: 0f 05 syscall
Next, let's list the ELF64 program headers, which show us the segments that get loaded into memory, with readelf -l prog1:
Elf file type is EXEC (Executable file)
Entry point 0x4000b0
There are 2 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000f1 0x00000000000000f1 R E 200000
LOAD 0x00000000000000f4 0x00000000006000f4 0x00000000006000f4
0x0000000000000015 0x0000000000000024 RW 200000
Section to Segment mapping:
Segment Sections...
00 .text
01 .data .bss
An ELF64 binary contains a collection of segments. LOAD (PT_LOAD) segments are regions of memory that are supposed to be mapped into the process's virtual address space. The first LOAD segment contains the text section; it gets placed at 0x400000 and is readable/executable. The second segment contains both the data and bss sections, which get mapped at 0x6000f4 and are readable/writable. Notice that the MemSiz of the second segment is larger than its FileSiz; this is because the bss section is all uninitialized data. Instead of getting data from the file, the bss section is initialized to zeroes.
So how do these get into the running process? The code for loading ELF format binaries is in fs/binfmt_elf.c. The function in question is load_elf_binary. Amidst a whole bunch of work, it eventually loops through all of the program headers looking for PT_LOAD entries, mmaps (vm_mmap_pgoff) each segment, and calls brk (vm_brk_flags) to map anonymous (zeroed) pages for uninitialized regions such as the bss section whenever MemSiz > FileSiz.
So, when a program is loaded on a modern Linux system, statically allocated memory is set up in mostly the same way as runtime allocations: mmap and brk. In this case, the file being memory mapped is the program binary itself, and it is all set up automatically by the ELF loading code. There must always be at least one mmap performed by the kernel, since, of course, your program cannot memory map itself before it is in memory. You could, however, make a program that contains only a text section and has no data or bss at all.
Static Allocation and ASLR
Lastly, we will look at what happens with static allocations on a modern Linux C program. Here is the program:
#include <stdio.h>
const char *foo = "hello, world";
int main() {
printf("%llx: %s\n", (long long)foo, foo);
return 0;
}
When you compile and run this program (assuming you have a kernel configured normally for modern Linux distributions), something interesting will happen. Every time you run the program, the address of the string will change!
eric@ubuntu:~/statics$ ./prog5
562c07baa714: hello, world
eric@ubuntu:~/statics$ ./prog5
55e6f8c5c714: hello, world
eric@ubuntu:~/statics$ ./prog5
55c677a52714: hello, world
What changed? The first sign is in the ELF header. The assembly program produced an ELF binary of type ET_EXEC, but GCC produced an ET_DYN file - a file that is relocatable and dynamically loadable. A relocatable binary can be loaded at any address and still be set up to work correctly. This format is typically used for dynamic (aka shared) libraries, which need to be loaded at runtime by another program, meaning that they will not always be placed at the same address.
The disassembly hints at what's going on here (run objdump -M intel -d prog5). This disassembly is pretty long, so I won't paste it below - you can read it here. You can also see the list of sections in the ELF binary here.
Instead of generating a normal executable, where the program code and data are at known locations in virtual memory, the compiler has produced Position Independent Code. In position independent code, your statically allocated data is not addressed directly. Instead, the actual location of the data is looked up in a Global Offset Table (GOT).
The GOT is a table of addresses loaded from the ELF binary and then populated by the dynamic loader. The entries are used to find the location of statically allocated data in sections that have been dynamically loaded into the virtual address space. This allows dynamic libraries to be loaded at any location in memory while still having their state accessible. There is a similar table, the procedure linkage table (PLT), for calling functions in dynamically loaded code.
So if this is an executable, not a dynamic library, why did GCC make it dynamic, relocatable code? It does this so that the program's code and data can be intentionally loaded at a different random location on every run - a feature called address space layout randomization (ASLR). ASLR is a security feature that attempts to make it much harder for code injected via an overflow exploit (or similar) to perform dangerous operations. For example, an exploit may overwrite a return address to jump to the libc system() function and launch a shell, or it may try to read a private key out of the program's memory. Attacks like these become much harder when the locations of such things in memory are random - the attack now needs some way to search for and find everything.
Costs and Benefits of Static Allocation
Costs
The most obvious cost of static allocation is that it is fixed in size. All of the arrays, hashes, etc. that you allocate statically are given a size at compile time, and that size won't change.
Another cost of static allocation is poor error reporting. If you rely on static allocation to allocate a large block of memory and it is too big for the system it is run on, the program will just not start. It might print a message like "Segmentation fault", which is confusing and unhelpful to the end user. If you allocate the memory yourself, you can check for failure and either print a useful error message or change the program behavior to require less memory. When you statically allocate, you are not just setting a maximum memory usage, but also a minimum usage.
On microcontroller platforms like the Microchip PIC, a large amount of static allocation may introduce a delay at startup while the initialized data is copied to RAM and the uninitialized data is cleared to zero. Depending on your design constraints, the time this adds to startup may be problematic. Plus, you pay twice for your initialized data: if it is non-constant, it has to be stored in program memory and copied into data memory, which can waste a lot of space on small devices.
C++ constructors do not play well with static allocation. Within a single translation unit, globals are constructed at startup in the order of declaration. Across translation units, the order is implementation specific. This may make static allocation hard to use when relying on C++ constructors for initialization, depending on how interdependent your declared objects are.
Benefits
The most obvious cost of static allocation is also the most obvious benefit: it is fixed in size at compile time. Since you know the whole data layout at compile time, the exact memory usage of your program can be guaranteed. This is extremely valuable on memory constrained devices like 8-bit microcontrollers, where you want to be certain that the program can never exceed its memory budget.
Because static allocations are fixed for the duration of a program and known at compile time, they do not require any runtime pointer management to prevent leaks. Additionally, they are all packed into one contiguous section, so there is no external fragmentation.
In languages like C that let you set initial values for static data, static allocations allow you to store data directly in the source code, and have it available immediately at the start of the program.
A Common Misconception
A common misconception regarding static allocation is that using it will result in tons of global-variable-filled spaghetti code. The assumption that static allocation implies global state is false. Visibility is separate from how the memory was allocated. In C, for example, static allocations can have global scope, file scope (a static global), or function scope (a static local). Just because data is allocated statically does not mean it has to be accessed globally.
When writing your application, you can have one small area of the code own the static allocation, and have the rest of your program access it through a pointer that is passed in as an argument. This allows you to use static allocation without directly accessing global state, and also supports refactoring later to other allocation strategies.
Next Time?
Well, that's it for this topic. The next article might be about the next most basic allocator, the call stack, and will have some C and x64 assembly. No more PIC12 :).