Programming From The Ground Up
By Jonathan Bartlett
Trying to be a programmer without understanding how a CPU works is like trying to practice medicine without learning anatomy. Sure, you can have limited success curing patients with medical advice gleaned from Google, but on the whole you’re going to be a pretty bad doctor. For those who missed out on learning assembly language, I highly recommend working through this book, even if you’ll never program in assembly again. I promise that all kind of lights will go on in your head and you’ll be a vastly better programmer.
The author gives an introduction to programming using Linux assembly language.
- 1. Introduction
- 2. Computer Architecture
- 3. First Programs
- 4. Functions
- 5. Dealing with Files
- 7. Developing Robust Programs
- 8. Sharing Functions with Code Libraries
- 9. Intermediate Memory Topics
- 10. Counting like a Computer
- 11. High-Level Languages
- 12. Optimization
- 13. Moving On from Here
1. Introduction
Essentially there are 3 kinds of languages:
- Machine Language: What the computer actually sees and deals with. Every command is a number or a sequence of numbers.
- Assembly Language: The same as machine language, but every instruction is represented by a letter sequence which is easier for a human to remember, plus some other small improvements.
- High-Level Language: Used to make programming easier with a more natural language. A single command is translated to multiple instructions in machine language.
2. Computer Architecture
Modern computers are based on the Von Neumann architecture: a division into two main parts, the CPU and the memory.
Structure of Computer Memory
Memory can be compared to a room full of post office boxes. Each location has a number (its address) and each location has the same fixed size, in which a single number can be stored.
There is no difference between a program or data stored in memory except how it is used by the CPU. Both are stored and accessed the same way.
The CPU
The CPU reads an instruction from memory and executes it: fetch-execute cycle. In order to accomplish this, it has the following elements:
- Program Counter
- Instruction Decoder
- Data Bus
- General-purpose Registers
- Arithmetic and Logic Unit
The program counter holds the address of the memory location which contains the instruction which is going to be executed next.
The CPU fetches a number from this memory location and passes it on to the instruction decoder. This figures out what the instruction means (addition, subtraction, data movement, etc.) and which locations of memory are going to be needed.
Then the data bus is used to fetch the data from the memory locations which are needed for the calculation. The data bus can be seen on a motherboard as the wires which run between the memory and the CPU.
The CPU itself has some high-speed memory locations called registers. There are general-purpose and special-purpose registers.
Now that the CPU has everything it needs, it passes the decoded instruction and the data to the arithmetic and logic unit for the actual processing. The results of the calculation are placed in a register or on the data bus for storage in memory.
Some Terms
Each storage location in memory is referred to by a number called its address. The size of a single storage location is a byte, which can hold a number between 0 and 255. On a 32-bit CPU the registers are 4 bytes long. This is called the computer's word size. A word can hold about 4 billion different values. Memory addresses are as long as the word size and therefore fit into a register.
When an address is stored in memory it is called a pointer.
Data Accessing Methods
- immediate mode: The data is embedded in the instruction itself.
- register addressing mode: The data is found in a register.
- direct addressing mode: The instruction is provided with a memory address from which to load the data, e.g. into a register.
- indexed addressing mode: The instruction is provided with a memory address and an index register used as an offset. On x86 processors you can also give a multiplier so that you can index a whole word or larger record at a time.
- indirect addressing mode: The instruction is provided with a register which contains a pointer to where the data is in memory.
- base pointer addressing mode: Like indirect addressing mode, but including an offset.
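As a quick illustration (a minimal sketch, not taken from the book; the label value is made up for the example), the following 32-bit program touches a few of these modes:

.section .data
value:
    .long 42

.section .text
.globl _start
_start:
    movl $1, %eax          # immediate mode: the constant 1 is embedded in the instruction
    movl value, %ebx       # direct addressing mode: load the contents stored at 'value'
    movl %ebx, %ecx        # register addressing mode: the data comes from a register
    movl $1, %eax          # exit system call number
    int  $0x80             # exit; %ebx (42) becomes the exit status

Assembled with as and linked with ld (adding --32 and -m elf_i386 on a 64-bit system), running it and checking echo $? should show 42, the value loaded via direct addressing.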
3. First Programs
The programs demonstrated in this chapter are written in 32-bit assembly language. This is what the book uses, but it is no longer current. In 64-bit assembly one would use the 64-bit register %rax instead of the 32-bit %eax, and accordingly the movq instruction instead of movl.
Exercise 1
.section .data
.section .text
.globl _start
_start:
movl $1, %eax # 1 is the code for the exit system call
movl $0, %ebx # 0 is the return code
int $0x80 # control transfer from user space to kernel
# by interrupt 0x80
Assembling and Linking
The source is assembled to an object file:
$ as chap3ex1.s -o exit.o
The object file is then linked to an executable binary
$ ld exit.o -o exit
More info on object files and linking can be found here
Outline of a program
Anything starting with a period like .section .data
is a pseudo operation (or assembler directive). These are not translated directly into machine instructions but are handled by the assembler itself. The .data section can contain static data like numbers or strings. The .text section contains the machine instructions.
The directive .globl _start
makes _start a global symbol, meaning it is visible to the linker instead of being discarded after assembly. _start is a special symbol that is always needed because it marks the beginning of the program, which the loader has to know. Symbols are replaced by something else (usually an address) by either the assembler or the linker; they are used to reference locations of data or code by name.
The line _start:
is a label. A label is a symbol followed by a colon. When the program is assembled, each data value and each instruction gets an address. The label tells the assembler that the symbol's value is the address of the element following the label.
The instruction movl $1, %eax
writes the number 1 into the general-purpose register %eax. A number literal is indicated by the $ sign, which leads to immediate mode data access.
The line int $0x80
triggers an interrupt, which interrupts the program flow and transfers CPU control to the interrupt handler that was set up by the Linux kernel. This is how a system call is made.
Exercise 2
The source can be found on GitHub
Some "ifs" like jmp
, je
or jle
are used for flow control.
A list of .long values is needed in memory, so it is created in the .section .data under the label data_items:. Now we can refer to the memory address of the list's first item by using the symbol of the same name.
In our case, using 32-bit registers, a long value uses 4 bytes. For 64-bit see: LP64 data model.
Only 3 variables are needed, so everything can stay in registers and nothing has to be moved to memory yet.
The destination index register %edi is incremented in the loop with incl to walk through the list.
When two values are compared with the cmpl instruction, the result is stored in the status register %eflags.
indexed addressing mode
The value currently pointed at by the index is then copied from the data_items list to the %eax register: movl data_items(,%edi,4), %eax
This follows the schema movl ADDRESS(OFFSET,%INDEX_REGISTER,MULTIPLIER), %DESTINATION_REGISTER
which leads to FINAL_ADDRESS = ADDRESS + OFFSET + (MULTIPLIER * INDEX)
indirect addressing mode
Though not used in the current program, it would work like movl (%eax), %ebx, where %eax contains the address from which data is moved to %ebx.
base pointer addressing mode
Similar to indirect addressing mode, but adding a constant value to the address, like movl 4(%eax), %ebx, which goes 4 bytes into the record at which %eax points.
immediate addressing mode
Is used with literals like $0: movl $0, %eax
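Putting these pieces together, a sketch along the lines of the chapter's maximum-finding program (the exact source is on GitHub; this version is reconstructed from the description above) could look like this:

.section .data
data_items:
    .long 3,67,34,222,45,75,54,34,44,33,22,11,66,0

.section .text
.globl _start
_start:
    movl $0, %edi                   # the index register starts at 0
    movl data_items(,%edi,4), %eax  # load the first item (indexed addressing mode)
    movl %eax, %ebx                 # %ebx holds the current maximum

start_loop:
    cmpl $0, %eax                   # a 0 marks the end of the list
    je loop_exit
    incl %edi                       # advance the index
    movl data_items(,%edi,4), %eax  # load the next item
    cmpl %ebx, %eax                 # is it bigger than the current maximum?
    jle start_loop
    movl %eax, %ebx                 # yes: remember it as the new maximum
    jmp start_loop

loop_exit:
    movl $1, %eax                   # exit system call
    int $0x80                       # the maximum in %ebx becomes the exit status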
4. Functions
Functions are used to break programs apart into manageable pieces.
The parameters plus the expectation of what the function does are called the function's interface.
A program is composed of many functions. There are often high-level functions which call low-level helper functions. On the lowest level there are the so-called primitive functions (or just primitives), on which everything else is built. Typically the primitives are provided by the operating system.
How Functions Work
- function name: The name is a symbol for the address of the function. It is created by a label before the function's code.
- function parameters: The data items which are passed to the function so it has something to work on.
- local variables: Temporary storage locations needed for processing. After the function returns they are thrown away.
- static variables: Storage locations which are not thrown away after return, but reused every time the function is called. They cannot be accessed by other parts of the program. Static variables should generally be avoided because they can cause problems.
- global variables: Data storage that is outside of the function, but used by it for processing. This is often considered bad practice, but configuration values are a reasonable use case.
- return address: A special "parameter 0" that is not used for processing. It tells the function where code execution should continue after the function has finished. The call instruction automatically passes the return address to the function; ret is used to jump back to this address and continue execution.
- return value: The data which is the result of the processing, passed back to the calling code. Most languages support only a single return value.
How variables are stored and how parameters and return values are passed is the language's calling convention.
In assembly language any calling convention can be used, but on Linux the most popular one is that of the C programming language.
Assembly Functions using the C Calling Convention
A region of memory is used as the stack. The stack starts at the top of memory (the highest addresses) and grows downward. pushl pushes a value onto the top of the stack; because the stack grows downward, the top sits at the lowest address of the stack's memory. popl removes the top element from the stack and places it in a register or memory location of your choice.
The stack pointer register %esp always holds the address of the current top of the stack. So pushl subtracts 4 from %esp so that it points to the word below the former top, and popl adds 4 to %esp accordingly.
Before calling a function, the program pushes the parameters onto the stack in reverse order. Then the call
instruction is used. This does two things:
- The call instruction pushes the address of the instruction following it onto the stack as the return address.
- Then it changes the instruction pointer %eip to point to the address of the function's first instruction.
Then the function begins. At this time the stack looks like this:
Parameter #N
...
Parameter2
Parameter1
Return Address <--- (%esp) (the special "parameter 0")
At the beginning of the function two steps are done:
- The function pushes the current value of the base pointer register %ebp onto the stack as the "old base pointer": pushl %ebp
- The current top of the stack (where the old base pointer now sits) becomes the function's own "base", so the base pointer register %ebp is overwritten with this value: movl %esp, %ebp

The base pointer helps the function find its parameters and local variables on the stack, as we will see soon. It is a reference to a well-known position in the function's stack frame, which contains all of the function's stack data: parameters, local variables and the return address.
After these two steps the stack looks like this:
Parameter #N <--- ((N * 4) + 4)(%ebp)
...
Parameter2 <--- 12(%ebp)
Parameter1 <--- 8(%ebp)
Return Address <--- 4(%ebp)
Old %ebp <--- permanently (%ebp) and currently (%esp)
Now we assume that the function needs two local variables, each the size of one word. To reserve the space on the stack, we move the stack pointer past two more words (which means lower in terms of memory address): subl $8, %esp
Because these variables live in the stack frame, they disappear together with the frame when the function returns; that is why they are called local variables.
After allocating the two words on the stack it looks like this:
Parameter #N <--- ((N * 4) + 4)(%ebp)
...
Parameter2 <--- 12(%ebp)
Parameter1 <--- 8(%ebp)
Return Address <--- 4(%ebp)
Old %ebp <--- (%ebp)
Local Variable 1 <--- -4(%ebp)
Local Variable 2 <--- permanently -8(%ebp) and currently (%esp)
When the function is finished with its processing and about to return it does the following:
- storing the return value in %eax
- resetting the stack to the state it was in before the call
- returning execution to the code location where the function was called from
movl %ebp, %esp # setting the stack (top) pointer to the base position
popl %ebp # getting the old base position into base pointer register
ret # return
Control is now back at the calling code, which is responsible for removing the function parameters from the stack if they are no longer needed.
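As a sketch of the whole convention (the function name sum_of_squares and the chosen operation are made up for illustration, not from the book), here is a function with a prologue, one local variable, and an epilogue, together with a caller that pushes the parameters and cleans them up afterwards:

.section .text
.globl _start

# sum_of_squares(a, b) -- returns a*a + b*b in %eax
sum_of_squares:
    pushl %ebp                 # save the old base pointer
    movl  %esp, %ebp           # the current stack top becomes our base
    subl  $4, %esp             # reserve one word for a local variable

    movl  8(%ebp), %eax        # first parameter: a
    imull %eax, %eax           # a * a
    movl  %eax, -4(%ebp)       # store it in the local variable

    movl  12(%ebp), %eax       # second parameter: b
    imull %eax, %eax           # b * b
    addl  -4(%ebp), %eax       # return value: a*a + b*b

    movl  %ebp, %esp           # throw away the local variable
    popl  %ebp                 # restore the caller's base pointer
    ret                        # jump back to the return address

_start:
    pushl $4                   # second parameter (pushed first: reverse order)
    pushl $3                   # first parameter
    call  sum_of_squares
    addl  $8, %esp             # the caller removes the parameters again

    movl  %eax, %ebx           # use the return value as the exit status
    movl  $1, %eax             # exit system call
    int   $0x80

Running it and checking echo $? should give 25 (3*3 + 4*4), since %ebx is set from the returned %eax.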
Destruction of Registers
When making a function call, one should assume that the values which are currently in registers will be wiped out. Only the base pointer register %ebp is preserved, and in the Linux C calling convention also %ebx, %edi and %esi. %eax can always be overwritten since it holds the return value.
The C calling convention is also known as ABI: Application Binary Interface.
Details can be found at the Linux Foundation's refspecs, e.g. the System V ABI, Edition 4.1 from 1997.
Recursive Functions
An example of a recursive function is the factorial, e.g. 4! = 4 * 3 * 2 * 1. The factorial of 4 is 4 times the factorial of 3, so the definition of the factorial function includes the function itself, applied to n-1.
In order to terminate the recursion a base case is needed: The factorial of 1 is 1.
This works fine because local variables are located on the stack frame and each function call gets its own stack frame.
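A sketch of such a recursive factorial in assembly (reconstructed along the lines of the book's example, assuming n >= 1 so the base case is reached):

.section .text
.globl _start

factorial:
    pushl %ebp
    movl  %esp, %ebp
    movl  8(%ebp), %eax        # the argument n
    cmpl  $1, %eax             # base case: factorial(1) = 1
    je    end_factorial
    decl  %eax                 # compute n - 1
    pushl %eax
    call  factorial            # recursive call: factorial(n - 1), result in %eax
    addl  $4, %esp             # remove the parameter again
    movl  8(%ebp), %ecx        # fetch our own n from our own stack frame
    imull %ecx, %eax           # n * factorial(n - 1)
end_factorial:
    movl  %ebp, %esp
    popl  %ebp
    ret

_start:
    pushl $4                   # compute factorial(4)
    call  factorial
    addl  $4, %esp
    movl  %eax, %ebx           # 24 becomes the exit status
    movl  $1, %eax             # exit system call
    int   $0x80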
5. Dealing with Files
Data which is stored on disk is called persistent.
The UNIX File Concept
Every File, no matter which program created it, is a sequential stream of bytes.
When a program wants to access a file, it has to open it by name. The operating system then hands back a file descriptor, by which the file can be referred to later. When the reading/writing is done, the file is closed, which makes the file descriptor obsolete.
We work with files in the following way (a sketch follows the list):
- Make the open system call, with the parameters:
  - file name, null terminated
  - mode (e.g. 0 for read only)
  - permission set, like 0666
  As return value we get the file descriptor.
- Now a read system call can be made with the following parameters:
  - file descriptor
  - buffer address
  - buffer size
  The return value is either the number of characters read or a negative error code.
- A write system call takes the same parameters as read; the buffer should be filled with the data to write.
- The close system call takes only the file descriptor as a parameter, which is invalid afterwards.
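Here is a hedged sketch of that sequence (the file name /etc/hostname is just an example path, and the buffer reservation with .lcomm is explained in the next subsection). It opens a file, reads up to 500 bytes, writes them to STDOUT and closes the file; the system call numbers 5, 3, 4, 6 and 1 are the 32-bit Linux numbers for open, read, write, close and exit:

.section .data
filename:
    .ascii "/etc/hostname\0"   # any readable file; just an example path

.section .bss
.lcomm buffer, 500             # 500-byte buffer for the file contents

.section .text
.globl _start
_start:
    movl $5, %eax              # open system call
    movl $filename, %ebx       # address of the null-terminated file name
    movl $0, %ecx              # mode 0: read only
    movl $0666, %edx           # permission set (only used when creating a file)
    int  $0x80                 # the file descriptor comes back in %eax
    movl %eax, %esi            # keep the descriptor in %esi

    movl $3, %eax              # read system call
    movl %esi, %ebx            # file descriptor
    movl $buffer, %ecx         # buffer address
    movl $500, %edx            # buffer size
    int  $0x80                 # returns the number of bytes read (or a negative error)

    movl %eax, %edx            # write exactly the number of bytes that were read
    movl $4, %eax              # write system call
    movl $1, %ebx              # file descriptor 1 is STDOUT
    movl $buffer, %ecx
    int  $0x80

    movl $6, %eax              # close system call
    movl %esi, %ebx            # the descriptor is invalid afterwards
    int  $0x80

    movl $1, %eax              # exit system call
    movl $0, %ebx
    int  $0x80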
Buffers
A buffer is a contiguous block of bytes used for bulk data transfer. This fixed-size area of memory is used to temporarily store the bytes/characters which the operating system has read from disk, for example.
In order to reserve memory for the buffer, the section .bss can be used in addition to .text and .data. This is a way to reserve uninitialized storage without taking up space in the executable:
.lcomm my_buffer, 500
This creates the symbol my_buffer which references a 500-byte location in memory.
To store the address of my_buffer into a register one uses:
movl $my_buffer, %ecx
Without the dollar sign, direct addressing mode would be used, reading the content of the memory location at my_buffer. But here we use immediate mode addressing, loading the address my_buffer itself.
Special Files
Linux programs already have 3 special file descriptors open when they begin:
- STDIN: standard input. This read-only file can e.g. represent the keyboard. It has the file descriptor 0.
- STDOUT: standard output. This write-only file can e.g. represent the console display. It has the file descriptor 1.
- STDERR: standard error. It is used to separate error messages from the normal program output on standard output. It has the file descriptor 2.

UNIX-based operating systems treat all input/output systems as files, like network connections or even audio devices. Interprocess communication via pipes also uses special files.
Defining Constants
.equ LINUX_SYSCALL, 0x80
can be used like: int $LINUX_SYSCALL
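A small sketch of how such constants make a program more readable (the names SYS_EXIT, SYS_WRITE, STDOUT, message and message_len are invented for the example; only LINUX_SYSCALL comes from the text above):

.equ LINUX_SYSCALL, 0x80       # the interrupt number for Linux system calls
.equ SYS_EXIT, 1               # system call number for exit
.equ SYS_WRITE, 4              # system call number for write
.equ STDOUT, 1                 # the standard output file descriptor

.section .data
message:
    .ascii "hello\n"
.equ message_len, . - message  # length of the message, computed by the assembler

.section .text
.globl _start
_start:
    movl $SYS_WRITE, %eax
    movl $STDOUT, %ebx
    movl $message, %ecx
    movl $message_len, %edx
    int  $LINUX_SYSCALL

    movl $SYS_EXIT, %eax
    movl $0, %ebx
    int  $LINUX_SYSCALL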
7. Developing Robust Programs
Robust programs handle errors and do not crash, no matter what the user does.
User Testing
One needs to test what happens if the user enters data in an unexpected way, e.g. entering a letter instead of a number. A good way to ensure this is to let other programmers and users test the program. Especially non-programmers, who don't know what the computer expects, are more likely to find bugs.
Data Testing
When testing functions with input data, the most important thing is to test the corner cases.
For numeric data one always needs to test:
- The number 0
- The number 1
- A number within the expected range
- A number outside the expected range
- The first number in the expected range
- The last number in the expected range
- The first number below the expected range
- The first number above the expected range
For lists one needs to test 0 items, 1 item, a lot of items, etc.
One should also test data around any turning points. E.g. if an age under or over 30 is handled differently, then 29, 30 and 31 need to be tested at least.
In C there is the assert macro which can be turned off at compile time once the code is stable, so the binary is faster.
Module Testing
A program must not only be tested as a whole, but one needs to test also the individual pieces. In order to test functions effectively with different input data, one needs to develop additional functions whose sole purpose is to call functions for testing. These are called drivers (not to be confused with hardware drivers).
If the code being tested makes calls to functions which do not exist yet, one can write a small function called a stub which simply returns a value so that testing can proceed.
Having Error Codes
It is good to have a unique error code for every possible contingency. This is often all the user has when reporting an error. It should not be confused with troubleshooting why the error happened.
Recovery Points
To simplify error handling, it is good to break the program apart into distinct units, where each unit fails and is recovered as a whole. E.g. reading a configuration file could be a unit. If there is a failure at any point (opening, reading, parsing), the program simply skips to the recovery point for that problem. The recovery function can then e.g. close the file if it is still open.
A good fall-back mechanism, to save effort or as a last resort, is to have a single function which simply prints the error code and message and then exits the program.
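A minimal sketch of such a fall-back error routine (the error code, message and symbol names are invented for the example): it writes a fixed message to STDERR and exits with a non-zero status.

.equ LINUX_SYSCALL, 0x80
.equ SYS_WRITE, 4
.equ SYS_EXIT, 1
.equ STDERR, 2

.section .data
error_msg:
    .ascii "Error 0001: could not open the configuration file\n"
.equ error_msg_len, . - error_msg

.section .text
.globl error_exit
error_exit:
    movl $SYS_WRITE, %eax      # write the message to standard error
    movl $STDERR, %ebx
    movl $error_msg, %ecx
    movl $error_msg_len, %edx
    int  $LINUX_SYSCALL

    movl $SYS_EXIT, %eax       # then give up and exit
    movl $1, %ebx              # a non-zero exit status signals failure
    int  $LINUX_SYSCALL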
8. Sharing Functions with Code Libraries
Programming can be made easier by having a set of functions on the system that are shared among any program that wishes to use them. Having a repository of shared code, a program can point to a dynamic library which contains the function needed.
If a bug is found in a library function it only has to be fixed once, and all programs using it are updated automatically. But this can also be a drawback, as it is possible that a program inadvertently relies on a bug in a shared function.
This and other drawbacks led to the term "DLL hell". But it is generally assumed that the advantages outweigh the drawbacks.
The shared code files are called: shared libraries, dynamic libraries, shared objects, dynamic-link libraries, DLLs, or .so files.
This book deals with dynamic libraries but not shared libraries, which use position-independent code and are linked differently.
The programs from previous chapters were statically-linked executables, which contained all of the functionality.
The helloworld program from this chapter is dynamically-linked, which means that some parts of the program code are contained in external libraries:
$ ld -dynamic-linker /lib/ld-linux.so.2 -o hello hello-lib.o -lc
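A sketch of roughly what such a dynamically-linked hello world looks like in assembly (file and label names chosen for the example): instead of making a write system call itself, it just calls the libc functions printf and exit, which the dynamic linker resolves at load time.

.section .data
helloworld:
    .ascii "hello world\n\0"

.section .text
.globl _start
_start:
    pushl $helloworld          # the string address is the only parameter
    call  printf               # resolved by the dynamic linker to the copy in libc
    addl  $4, %esp             # the caller cleans up the parameter

    pushl $0                   # exit status
    call  exit                 # also provided by libc

It is assembled as before and linked with the ld command shown above (on a 64-bit system, 32-bit libraries and the corresponding --32 / -m elf_i386 flags are needed).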
When the program is started, the file /lib/ld-linux.so.2 is loaded first. This dynamic linker sees that the library libc.so is needed and checks the directories from /etc/ld.so.conf and LD_LIBRARY_PATH. The library is then loaded into the program's memory, and references to e.g. printf in the program are resolved to the actual location in the library.
To check which libraries an executable uses:
$ ldd ./hello
# linux-gate.so.1 (0xf772a000)
# libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xf7557000)
# /lib/ld-linux.so.2 (0x5661f000)
The dynamic symbols a library references can be printed:
$ objdump -R /lib/x86_64-linux-gnu/libz.so.1 | tail
000000000021a148 R_X86_64_JUMP_SLOT open
000000000021a150 R_X86_64_JUMP_SLOT inflateInit2_
000000000021a158 R_X86_64_JUMP_SLOT inflateReset
000000000021a160 R_X86_64_JUMP_SLOT lseek64
000000000021a168 R_X86_64_JUMP_SLOT deflateResetKeep
000000000021a170 R_X86_64_JUMP_SLOT inflateReset2
000000000021a178 R_X86_64_JUMP_SLOT strerror
000000000021a180 R_X86_64_JUMP_SLOT __cxa_finalize
You can override symbols in the stock libraries by creating a library with the same symbols and specifying the library in LD_PRELOAD
:
$ LD_PRELOAD=/path/to/my/malloc.so /bin/ls
9. Intermediate Memory Topics
A computer looks at memory as a long sequence of numbered storage locations. Program code and data are stored the same way, and the computer does not know which locations hold which. It only knows where to start executing.
Review of some Terms:
- Byte: The size of a memory storage location. On x86 it can hold a number between 0 and 255.
- Word: The size of a normal register. On 32-bit x86 it is 4 bytes long. Most computer operations handle a word at a time.
- Address: A number that refers to a byte of memory. If data spans several bytes, its address is the address of the first byte. Normally one does not type an address directly, but lets the assembler handle it by using labels and symbols; the assembler replaces a symbol with its address.
- Pointer: A register or memory word whose value is an address. E.g. %ebp is used as an address.
When a program is loaded, its memory looks like this:

Environment Variables   <--- 0xbfffffff
Arg #N
...
Arg #2
Arg #1
Program name
# of arguments          <--- %esp
(unmapped memory)
Program Code and Data   <--- 0x08048000
The last accessible memory address is called the break. It is the boundary between the data section (later the heap) and the unmapped memory above it.
Memory Mapping
or Every Memory Address is a Lie.
A program running on Linux only ever sees virtual memory. Physical memory refers to the actual RAM chips inside the computer; virtual memory is the way a program sees memory. Linux loads a program into a physical memory region large enough to fit it and tells the CPU to pretend that this memory actually starts at the address 0x08048000. The process of assigning virtual addresses to physical addresses is called mapping.
Initially the area between the .bss section and the stack is not mapped. The beginning of this area of unmapped memory is called the break.
Virtual memory can also be mapped to disk, using swap space.
In order to make memory mapping more efficient, memory is divided into pages of 4096 bytes.
The amount of memory that a program actually occupies in physical memory is called the resident set size. The program top shows the resident set size next to the virtual size.
Getting More Memory
If a program needs more memory, it is possible to move the break up in order to create the heap segment, and to move it up further to increase the heap size. Inside the heap one uses the concepts of allocating and deallocating so memory can be reused. Otherwise you could run out of memory by just moving the break further and further without freeing parts of memory that are no longer needed. In C this is handled by malloc and free.
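A hedged sketch of moving the break directly (on 32-bit Linux, brk is system call number 45; calling it with an argument of 0 is the usual way to ask for the current break without moving it):

.equ LINUX_SYSCALL, 0x80
.equ SYS_BRK, 45               # the brk system call on 32-bit Linux

.section .text
.globl _start
_start:
    movl $SYS_BRK, %eax
    movl $0, %ebx              # ask for the break without moving it
    int  $LINUX_SYSCALL        # %eax now holds the current break address

    movl %eax, %ebx
    addl $4096, %ebx           # request one more page after the current break
    movl $SYS_BRK, %eax
    int  $LINUX_SYSCALL        # %eax holds the new break; the page is now usable

    movl $1, %eax              # exit system call
    movl $0, %ebx
    int  $LINUX_SYSCALL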
Performance Issues
The malloc code in the book's example always iterates through all memory segments, which means it runs in linear time. Better would be an algorithm that runs in constant time.
Another performance problem is that the brk system call is used very often. Every system call forces the CPU to switch from user mode to kernel mode and back, a context switch, which is slow.
10. Counting like a Computer
Humans count in decimal (base ten), but computers count in binary (base two). Each digit of a binary number is called a bit, which stands for binary digit.
Binary Operations
George Boole was the first to systematically study the boolean operators:
- AND
- OR
- NOT
- XOR
If you XOR a number with itself, all bits become zero. Doing this on a register is faster than loading a zero into it, so movl $0, %eax is often replaced by xorl %eax, %eax.
Binary numbers are also used for representing other data. In the following example the bits encode what people like:
| | Bob | Alice |
|---|---|---|
| Food | 1 | 1 |
| Techno | 1 | 0 |
| Designer Clothes | 0 | 1 |
| Football | 1 | 1 |
Now if you want to know which things both Bob and Alice like, you would use the AND operation
1101 AND
1011
-------
1001
For finding out what they disagree on, XOR is used:
1101 XOR
1011
-------
0110
Additionally there are two bit operations which are not boolean: shift and rotate.
A left shift moves each digit one place to the left, puts a zero into the rightmost place, and chops off the leftmost digit. A left rotate does the same thing, but puts the chopped-off leftmost digit into the rightmost place instead of a zero.
Shift left 10010111 = 00101110
Rotate left 10010111 = 00101111
In order to extract particular bits, shifting and masking are used. Masking is done with an AND against a number that has the bits we are interested in set to 1; shifting then moves the masked bits down to the least significant positions.
When a number represents a set of options for a function, the individual true/false elements are called flags. Here are some flags for the linux open system call.
O_WRONLY = 0b00000001
O_RDWR = 0b00000010
O_CREAT = 0b01000000
You can OR these flags together in the needed combination. So if you need O_WRONLY and O_CREAT, the result is 0b01000001, which is 0101 in octal.
Program Status Register
The CPU has a special register called the program status register. It holds information about what happened during a computation. For example, the carry flag indicates whether a register overflowed. It also holds the result of the cmpl instruction, which is used by conditional jumps like jge or jne.
Floating-point Numbers
The computer stores decimal numbers in two parts: exponent and mantissa. For example, 12345.2 can be represented as 1.23452 * 10^4. The mantissa is 1.23452 and the exponent is 4, with a base of 10. Computers, however, use a base of 2: all numbers are stored as X.XXXXX * 2^XXXX, and the number 1 is stored as 1.00000 * 2^0. Separating the mantissa and the exponent into two different values is called a floating-point representation.
If a floating-point value is big enough, adding 1.0 to it does not change its value. On x86 platforms, a 32-bit floating-point number stops being affected by adding 1.0 beyond 16777216.0, because the 1.0 is no longer significant; the number simply no longer changes. So a multiplication followed by an addition may give a different result than if the addition is performed first.
On most computers floating-point arithmetic takes much longer than integer arithmetic.
Negative Numbers
The first intuition would be to use one bit as the sign. But this has several disadvantages. E.g. there would be a positive and a negative zero. Instead the two’s complement representation is used.
Signed values have a different range. For example a byte can normally have values up to 255. But a signed byte can store values from -128 to 127.
If a 32-bit two's complement value needs to be extended to 64 bits, this cannot simply be done by zero-padding; instead, sign extension must be used. There is also a sign-preserving shift right, sarl, and a shift right which does not preserve the sign, shrl.
Octal and Hexadecimal Numbers
Octal digits represent a group of three bits. Linux permissions are expressed using octal numbers.
In the assembler, octal values are indicated by a leading zero, like 0664.
Hexadecimal is base 16. One digit represents 4 bits, which is half a byte, so two digits represent one byte.
In the assembler, hex values are indicated by a leading 0x, like 0xFFFF.
Order of Bytes in a Word
The x86 processor is a little-endian processor, which means that it stores the "little end", i.e. the least significant byte, first in memory. So if a register holds the word 0x5d23efee, it is written to memory in reverse byte order: ee ef 23 5d. The bits within each byte are ordered normally. Other processors are big-endian processors and store the most significant byte first.
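A tiny sketch that makes the byte order visible (the label stored_word is made up): the word 0x5d23efee is stored in memory, and reading back only its first byte with movb yields 0xee, the least significant byte, so the program's exit status is 238.

.section .data
stored_word:
    .long 0x5d23efee           # written to memory as the bytes ee ef 23 5d

.section .text
.globl _start
_start:
    movl $0, %ebx
    movb stored_word, %bl      # load only the first byte in memory
    movl $1, %eax              # exit system call
    int  $0x80                 # the exit status is 0xee (238), the "little end"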
11. High-Level Languages
For "real-world" programming, assembly language is too cumbersome most of the time, so many computer languages have been invented to make programming easier.
It is useful to know a wide variety of languages with different concepts. The more languages you know, the easier it is to pick up a new language.
Compiled and Interpreted Languages
Many languages are compiled languages: a statement is translated into one or more machine instructions ahead of time.
Other languages are interpreted. These require the user to run a program called an interpreter, which in turn runs the given program.
There is also a class of hybrid languages which compile a program to byte code before execution, so that the interpreter can work much faster.
High-level languages are oriented around the programmer instead of around the machine. This can include the following features:
- Grouping multiple operations into a single expression
- Using "big values" which are more conceptual than the 4-byte words that computers normally deal with, e.g. having strings as a single value
- Better flow control than just jumps
- Checking the types of assignments
- Memory handled automatically
- Resembling the problem domain rather than the computer hardware
Your First C Program
#include <stdio.h>
int main(int argc, char **argv)
{
/* print to standard out */
puts("Hello World\n");
return 0;
}
#include <stdio.h> is a preprocessor directive: it pulls in the standard I/O header, which declares (among other things) the puts function used below.
Then there is the declaration of the function main, including its name, arguments and return type. One does not have to worry about where the arguments are on the stack - the C compiler takes care of that, and also of loading values into and out of registers.
puts("Hello World\n"); is a function call. In assembly one would have to push the arguments of the function onto the stack before calling it. C takes care of this complexity, and also of defining storage for the string.
Finally the main function returns the number 0 which is automatically used as exit code for the program.
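For comparison, a sketch of roughly what the compiler generates for this program (hand-written here, so details will differ from real compiler output): the string address is pushed as the single argument, puts is called, the caller cleans up, and 0 is returned.

.section .data
hello:
    .ascii "Hello World\n\0"

.section .text
.globl main
main:
    pushl %ebp                 # the usual prologue
    movl  %esp, %ebp

    pushl $hello               # push the address of the string (the only argument)
    call  puts                 # the C library function declared in stdio.h
    addl  $4, %esp             # the caller removes the argument again

    movl  $0, %eax             # return 0 (becomes the program's exit code)
    movl  %ebp, %esp
    popl  %ebp
    ret

Assuming a gcc with 32-bit support, it can be built with gcc -m32 hello.s -o hello, since gcc links in the C startup code that calls main.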
In assembly language a program is tied to both the operating system and the hardware platform, but a high-level program can be compiled for different target platforms.
12. Optimization
Optimization is the process of making an application run more effectively. It can be optimized for speed, memory usage, disk usage or other things.
When to Optimize
It is better to not optimize at all than to optimize too soon. This is because optimization generally makes the code less clear and more complex which increases the cost of maintenance.
Where to Optimize
A profiler is a good tool for finding out in which functions the program spends the most time.
In general there are two categories of optimization: local optimizations and global optimizations. Local means making a specific piece of code run faster, while global optimizations are structural in nature.
Local Optimizations
Some of these may already be done by a compiler's optimizer:
- Precomputing Calculations
- Remembering Calculation Results
- Locality of Reference
- Register Usage
- Inline Functions
- Optimized Instructions
- Different Memory Addressing Modes
- Word-aligned Data in Memory
Global Optimizations
These are about the structure of a program and involve achieving the following properties:
Parallelization
This means that an algorithm can effectively be split among multiple processes. The more parallelizable an application is, the better it can take advantage of multiprocessor and clustered computer configurations.
Statelessness
Stateless functions and programs are those that rely entirely on the data explicitly passed to them for functioning. This makes them parallelizable, and they often benefit from caching.
13. Moving On from Here
Congratulations on getting this far!
Even if you never use assembly language again, you have gained a valuable perspective and mental framework for understanding the rest of computer science.
There are essentially three methods to learn more:
- From the Bottom Up - This is how this book teaches: it starts with low-level programming and works toward more general topics.
- From the Top Down - This focuses on what you want to do with the computer and teaches you how to break it down more and more.
- From the Middle - This is characterized by books which teach a specific programming language or API. These are not as concerned with concepts as with specifics.
A good programmer takes all three approaches into account and is constantly learning and pushing his limits.
From the Bottom Up
- Programming from the Ground Up by Jonathan Bartlett
- Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson and Ronald L. Rivest
- The Art of Computer Programming by Donald Knuth
- Programming Languages by Samuel N. Kamin
- Modern Operating Systems by Andrew Tanenbaum
- Linkers and Loaders by John Levine
From the Top Down
- How to Design Programs by Matthias Felleisen, Robert Bruce Findler, Matthew Flatt and Shriram Krishnamurthi
- Simply Scheme: Introducing Computer Science by Brian Harvey and Matthew Wright
- How to Think Like a Computer Scientist: Learning with Python by Allen Downey, Jeff Elkner and Chris Meyers
- Structure and Interpretation of Computer Programs by Harold Abelson and Gerald Jay Sussman with Julie Sussman
- Design Patterns by Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides
- What Not How: The Business Rules Approach to Application Development by Chris Date
- The Algorithm Design Manual by Steve Skiena
- Programming Language Pragmatics by Michael Scott
- Essentials of Programming Languages by Daniel P. Friedman and Mitchell Wand
From the Middle Out
- Programming Perl by Larry Wall et al.
- Common LISP: The Language by Guy L. Steele
- ANSI Common LISP by Paul Graham
- The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie
- The Waite Group's C Primer Plus by Stephen Prata
- The C++ Programming Language by Bjarne Stroustrup
- Thinking in Java by Bruce Eckel
- The Scheme Programming Language by R. Kent Dybvig
- Linux Assembly Language Programming by Bob Neveln
Specialized Topics
- Practical Programming - Programming Pearls by Jon Louis Bentley
- Databases - Understanding Relational Databases by Fabian Pascal
- Project Management - The Mythical Man-Month by Fred P. Brooks
- UNIX Programming - The Art of UNIX Programming by Eric S. Raymond
- UNIX Programming - Advanced Programming in the UNIX Environment by W. Richard Stevens
- Network Programming - UNIX Network Programming by W. Richard Stevens
- Generic Programming - Modern C++ Design by Andrei Alexandrescu
- Compilers - The Art of Compiler Design: Theory and Practice by Thomas Pittman and James Peters
- Compilers - Advanced Compiler Design and Implementation by Steven S. Muchnick
- Development Process - Refactoring: Improving the Design of Existing Code by Martin Fowler et al.
- Typesetting - Computers & Typesetting by Donald Knuth
- Cryptography - Applied Cryptography by Bruce Schneier
- Linux - Professional Linux Programming by Neil Matthew et al.
- Linux Kernel - Linux Device Drivers by Alessandro Rubini et al.
- Open Source Programming - The Cathedral and the Bazaar by Eric S. Raymond
- Computer Architecture - Computer Architecture: A Quantitative Approach by John L. Hennessy and David A. Patterson