By Jonathan Bartlett

Trying to be a programmer without understanding how a CPU works is like trying to practice medicine without learning anatomy. Sure, you can have limited success curing patients with medical advice gleaned from Google, but on the whole you’re going to be a pretty bad doctor. For those who missed out on learning assembly language, I highly recommend working through this book, even if you’ll never program in assembly again. I promise that all kind of lights will go on in your head and you’ll be a vastly better programmer.

Joel Spolsky

The author gives an introduction to Programming using Linux Assembly Language

1. Introduction

Essentially there are 3 kinds of languages:

  • Machine Language

    What the computer actually sees and deals with. Every Command is a number or a sequence of numbers.

  • Assembly Language

    The same as machine language, but every instruction is represented by a letter sequence which is easier for a human to remember. Plus some other small improvements.

  • High-Level Language

    Used to make programming easier with a more natural language. A single command is translated to multiple instructions in machine language.

2. Computer Architecture

Modern computers are based on Von Neumann architecture: A Division into two main parts - the CPU and the Memory.

Structure of Computer Memory

Can be compared to room full of post boxes. Each location has a number and each location has the same fixed size length in which a single number can be stored.

There is no difference between a program or data stored in memory except how it is used by the CPU. Both are stored and accessed the same way.

The CPU

The CPU reads an instruction from memory and executes it: fetch-execute cycle. In order to accomplish this, it has the following elements:

  • Program Counter
  • Instruction Decoder
  • Data Bus
  • General-purpose Registers
  • Arithmetic and Logic Unit

The program counter holds the address of the memory location which contains the instruction which is going to be executed next.

The CPU fetches a number from this memory location and passes it on to the instruction decoder. This figures out what the instruction means (addition, subtraction, data movement, etc.) and which locations of memory are going to be needed.

Then the data bus is used to fetch from the memory location which are needed for the calculation. The data bus can be seen on a mainboard as wires which go out from the memory to the CPU.

The CPU itself has some high-speed memory locations called registers. There are different general- and specia-purpose registers.

Now that the CPU has everything it needs, it passes the decoded instruction and the data to the arithmetic and logic unit for the actual processing. The resulta of the calculation are placed in a register or on the data bus for memory storage.

Some Terms

Each storage location in memory is referred to by a number called it’s address. The size of a single storage location is called a byte which is number between 0 and 255. On a 32-bit CPU the registers are 4 bytes long. This is called the computers word size. This fits about 4 billion values. Memory addresses are as long as the word size and therefore fit into a register.

When an address is stored in memory it is called a pointer.

Data Accessing Methods

  • immediate mode

    The data is embedded in the instruction itself.

  • register addressing mode

    The data is found in a register.

  • direct addressing mode

    The instruction is provided with an address of memory from where to load the data e.g. into a register.

  • indexed addressing mode

    The instruction is provided with an address of memory and an index register used as offset. On x86 processors you can also give a multiplier to index a whole word or larger record at a time.

  • indirect addressing mode

    The instruction is provided with a register which contains a pointer to where the data is in memory.

  • base pointer addressing mode

    Like indirect addressing mode, but including an offset

3. First Programs

The Programs demonstrated in this chapter are in 32-bit assembly language. This is what the book uses, but is not up to date. In 64-bit assembly one would use the 64-bit register %rax instead of the 32-bit %eax. And accordinly the movq instruction instead of movl.

Exercise 1

	        .section .data

        	.section .text
        	.globl _start

    _start:
	        movl $1, %eax 	# 1 is the code for the exit system call
	
         	movl $0, %ebx	# 0 is the return code

        	int $0x80	# control transfer from user space to kernel
		            	# by interrupt 0x80
			

Assembling and Linking

The source is asssembled to an object file:

$ as chap3ex1.s -o exit.o

The object file is then linked to an executable binary

$ ld exit.o -o exit

More info on object files and linking can be found here

Outline of a program

Anything starting with a period like .section .data is a pseudo operation (or assembler directive). These are not translated directly into a machine instructions but handled by the assembler differently. The .data section can contain static data like numbers or strings. The .text section contains the machine instructions.

The directive .globl _start makes _start a global symbol. Meaning it is visible to the linker instead of beeing discarded after assembly. _start is special symbol that is always needed because it marks the beginning of the program, which is needed for loading. Symbols are going to be replaced by something else by either the assembler or the linker. They are used to reference locations of data or program by name.

The line _start: is a label. A label is a symbol with a colon. When the program is assembled, each data value and each instruction is getting an address. The label tells the assembler that the symbols value is the address of the element following the label.

The instruction movl $1, %eax writes the number 1 into general purpose register %eax. A number liteal is indicated by the $-sign which leads to immediate mode data access.

The line int $0x80 triggers an interrupt which interrupts the program flow and transfers the cpu control to the interrupt handler which was set up by the linux kernel. This way a system call is done.

Exercise 2

The source can be found on GitHub

Some "ifs" like jmp, je or jle are used for flow control.

A list of .long values is needed in memory, so is created in the .section .data under the label data_items:. Now we can refer to the memory address of the lists first item by using the symbol with the same name.

In our case, using 32-bit registers, a long value uses 4 bytes. For 64-bit see: LP64 data model.

Only 3 variables are needed, so everything can stay in registers and must not yet be moved to memory.

The destination index register %edi is incremented in the loop with incl to walk through the list.

When two values are compared with the cmpl instruction, the result is stored in the status register %eflags.

indexed addressing mode

The value currently pointed at by the index is then copied from the data_items list to the %eax register: movl data_items(,%edi,4), %eax

using the schema: movl ADDRESS(OFFSET,%INDEX_REGISTER,MULTIPLIER), %DESTINATION_REGISTER

leading to FINAL_ADDRESS = ADDRESS + OFFSET + ( MULTIPLIER * INDEX )

indirect addressing mode

Though not used in the current program it would work like movl (%eax), %ebx. Where %eax contains the address from where data is moved to %ebx.

base pointer addressing mode

Similar to indirect addressing mode, but adding a constant value to the address like: movl 4(%eax), %ebx going 4 bytes into the record at which %eax points.

immediate addressing mode

Is used with literals like $0: movl $0, %eax

4. Functions

Are used to break programs apart into pieces.

The parameters plus the expectation what to do is called the functions interface.

A program is composed of many functions. There are often high-level functions which call low-level helper functions. On the lowest level there are the so called primitive functions (or just primitives) on which everything else is build on top of. Typically the primitives are provided by the operating system.

How Functions Work

  • function name

    The name is a symbol for the address of the function. It is created by a label before the functions code.

  • function parameters

    These are data items which are passed to the function in order to do something with it.

  • local variables

    Temporary storage locations which are needed for processing. After return they are thrown away.

  • static variables

    storage locations which are not thrown away after return, but reused every thime the function is called. They cannot be accessed by other parts of the program. Static variables should be avoided because they can cause problems.

  • global variables

    Data storage that is outside of the function, but used by it for processing. This often considered bad practice but a good use case are configuration values.

  • return address

    A special Parameter0 that is not used for processing. It tells the function where the code execution should continou after the function has finished. The call instruction automatically passes the the return address to the function. ret is used for jumping back to this address for execution.

  • return value

    The data which is the result of the processing is passed back to the main program. Most languages have only a single return value.

How variables are stored and parameters and return values are passed is the languages calling convention.

In assembly language any calling convention can be used, but on linux platforms the most popular is the one from the C programming language.

Assembly Functions using the C Calling Convention

A region of memory is used as the stack. The stack starts at the highest address of memory pushl pushes a value on top of the the stack, but which is then at the bottom of the stacks memory, which grows up side down in memory. popl removes the top element from the stack and places it in a register or memory location of your choice.

The stack register %esp does always hold a pointer to the current top of the stack. So pushl will subtract 4 from %esp in order to point to the word in the memory segment below the former top. popl does add 4 to %esp accordingly.

Before calling a function, the program pushes the parameters onto stack in reverse order. Then the call instruction is used. This does two things:

  1. The call instruction pushes the address of its following instruction onto the stack as return address.
  2. Then it changes the instruction pointer %eip to point to the address of functions first instruction.

Then the function begins. At this time the stack looks like this:

Parameter #N
...
Parameter2
Parameter1
Return Address <--- (%esp) //Parameter0

At the beginning of the function two steps are done:

  1. The function needs to push the current value of the base pointer register %ebp onto the stack as "old base pointer" by doing pushl %ebp

  2. The current top of the stack ("old base pointer") is the memory address that is from now on used as the functions own "base" and therefore is the %ebp base pointer register overwritten with this value: `movl %esp, %ebp.

The base pointer helps the function to find its parameters on the stack below and the local variables above, which we will see soon.

It is reference to well known position in the functions stack frame, which contains all the functions stack variables like parameters, local variables and return address.

After these two steps the stack looks like this:

Parameter #N    <--- (N * 4) + 4
...
Parameter2      <--- 12(%ebp)
Parameter1      <--- 8(%ebp)
Return Address  <--- 4(%ebp)
Old %ebp        <--- permanently (%ebp) and currently (%esp)

Now we assume that the function will need two local variables, each of the size of one word. So in order to reserve the space on the stack we move the stack pointer two positions higher. (which means actually lower in terms of memory address): subl $8, %esp

That the variables are located on the stack frame has the effect that they will disappear together with the stack fram when the function returns. So they are called local variables.

After allocating the two words on the stack it looks like this:

Parameter #N     <--- (N * 4) + 4
...
Parameter2       <--- 12(%ebp)
Parameter1       <--- 8(%ebp)
Return Address   <--- 4(%ebp)
Old %ebp         <--- (%ebp)
Local Variable 1 <--- -4(%ebp)
Local Variable 2 <--- permanently -8(%ebp) and currently (%esp)

When the function is finished with its processing and about to return it does the following:

  1. storing the return value in %eax

  2. reseting the stack to was before the call

  3. returning the execution at the code location where the function was called from

movl %ebp, %esp   # setting the stack (top) pointer to the base position
popl %ebp         # getting the old base position into base pointer register
ret               # return

Control is now back to the calling code, which is in duty to remove the function parameters from the stack if they are not needed anymore

Destruction of Registers

When making a function call, one should asume that the values which are currently in registers are wiped out. Only the base pointer register %ebp is preserved and in the Linux C calling convention also %ebx, %edi and %esi. %eax can always be overwritten with the return value.

The C calling convention is also known as ABI: Application Binary Interface.

Details can be fount at the linux foundation’s refspecs. E.g. System V ABI Edition 4.1. from 1997.

Recursive Functions

An example for a recursive function is the factorial like 4! = 4 * 3 * 2 * 1. The factorial of 4 is 4 times the factorial of 3. So the definition of the factorial function does include the function itself but of n-1.

In order to terminate the recursion a base case is needed: The factorial of 1 is 1.

This works fine because local variables are located on the stack frame and each function call gets its own stack frame.

5. Dealing with Files

Data which is stored on disk is called persistent

The UNIX File Concept

Every File, no matter which program created it, is a sequential stream of bytes.

When program wants to access a file, is has to open it by name. This makes the operating system give you a file descriptor, with which it can be refered to later. When the reading/writing is done, the file is closed, which makes the file descriptor obsolete.

We work with files the following way:

  1. make the open system call, with the parameters:
    • file name, null terminated
    • mode (e.g. 0 for read only)
    • permission set, like 0666
  2. as return value we get the file descriptor

  3. now a read system call can be made with the following parameters:
    • file descriptor
    • buffer address
    • buffer size

    return value is either the number of characters read or an negative error code

    a write system call takes the same parameters as read and the buffer should be filled with data to write

  4. the close system call takes only the file descriptor as parameter, which is then invalid afterwards

Buffers

A buffer is continous block of bytes which is used for bulk data transfer. This fixed size area of memory is used to temporarily store the bytes/characters which the operating system did read from the disk e.g.

In order to reserve memory for the buffer, the section .bss can be used in addition to .text and .data. This is a way to reserve uninitilized storage without taking space in the executable:

.lcomm my_buffer, 500

This creates the symbol my_buffer which references a 500-byte location in memory.

To store the address of my_buffer into a register one uses:

movl $my_buffer %ecx

without the dollar sign, direct addressing mode would be used, reading the content of the memory location at my_buffer. But here we use immediate mode addressing, loading the value of my_buffer itself.

Special Files

Linux programms already have 3 special file descriptors open when they begin:

  • STDIN Standing for standard input. This read-only file can e.g. represent the keyboard. It as the file descriptor 0.

  • STDOUT Standing for standard output. This write-only file can e.g. represent the console display. It as the file descriptor 1.

  • STDERR Standing for standard error. Is used to split error messages from the normal program output at standard out. It as the file descriptor 2.

UNIX-based operating systems treat all input/output systems as files, like network connections or even audio devices. Interprocess communication with with pipes are also special files.

Defining Constants

.equ LINUX_SYSCALL, 0x80 can be used like: int $LINUX_SYSCALL

7. Developing Robust Programs

Robust programs handle errors and do not crash, no matter what the user does.

User Testing

One needs to test what happens if the user enters data in an unexpected way. E.g. entering a letter instead of number. A good way to ensure this is to let other programmers and users test the program. Especially non-programmers who don’t know what the computer expects are more likely to find bugs.

Data Testing

When testing functions wiht input data, most important is testing the corner cases.

For numeric data one always needs to test:

  • The number 0

  • The number 1

  • A number within the expected range

  • A number outside the expected range

  • The first number in the expected range

  • The last number in the expected range

  • The first number below the expected range

  • The first number above the expected range

For lists one needs to test 0 items, 1 item, a lot of items, etc.

One should also test data around any turning points. E.g. if an age under or over 30 is handled differently, then 29, 30 and 31 need to be tested at least.

In C there is the assert macro which can be turned off at compile time once the code is stable, so the binary is faster.

Module Testing

A program must not only be tested as a whole, but one needs to test also the individual pieces. In order to test functions effectively with different input data, one needs to develop additional functions whose sole purpose is to call functions for testing. These are called drivers (not to be confused with hardware drivers).

If the code beeing testing is making calls to functions which do not exist yet, one can write a small function called a stub which simply returns a value for proceeding.

Having Error Codes

It is good to have a unique error code for every possible contingency. This is often everything the users has when reporting an error. This should not be confused with troubleshooting why the error happended.

Recovery Points

To simplify error handling is good to break the program apart into disctinct units, where each unit fails and is recovered as a whole. E.g. reading a configuration file could be a unit. If there is failure at any point (opening, reading, parsing) then the program would simply skip to the recovery point for that problem. The recovery function can then e.g. close the file if it is still left open.

A good fall-back mechanism for saving effort or as a last resort is to have a single function which does simply print the error code and message and then exit the program.

8. Sharing Functions with Code Libraries

Programming can be made easier by having a set of functions on the system that are shared among any program that wishes to use them. Having a repository of shared code, a program can point to a dynamic library which contains the function needed.

If a bug is found in a library function it only has to be fixed once and all programs using it are updated automatically. But this can be also a drawback as it is possible that a program inadvertantly relies on a bug in shared function.

This and other drawbacks lead to term "DLL hell". But is generally assumed that the advantages outweight the drawbacks.

The shared code file are called: shared libraries, dynamic libraries, shared objects, dynamic-link libraries, DLLs, or .so files.

This book deals with dynamic libraries but not shared libraries which are using position-independent code and are linked differently.

The programs from previous chapters were statically-linked executables, which contained all of the functionality.

The helloworld program from this chapter is dynamically-linked, which means that some parts of the program code are contained in external libraries:

$ ld -dynamic-linker /lib/ld-linux.so.2 -o hello hello-lib.o -lc

when the program is started, the file /lib/ld-linux.so.2 is loaded first. This dynamic linker sees that the library libc.so is needed and checks the directories from /etc/ld.so.conf and LD_LIBRARY_PATH. The library is then loaded into the programs memory and instances of e.g. printf in the program are replaced with the actual location in the library.

To check which binaries an executable uses do:

$ ldd ./hello
# linux-gate.so.1 (0xf772a000)
# libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xf7557000)
# /lib/ld-linux.so.2 (0x5661f000)

The symbols from a library can be printed:

$ objdump -R /lib/x86_64-linux-gnu/libz.so.1 | tail
000000000021a148 R_X86_64_JUMP_SLOT  open
000000000021a150 R_X86_64_JUMP_SLOT  inflateInit2_
000000000021a158 R_X86_64_JUMP_SLOT  inflateReset
000000000021a160 R_X86_64_JUMP_SLOT  lseek64
000000000021a168 R_X86_64_JUMP_SLOT  deflateResetKeep
000000000021a170 R_X86_64_JUMP_SLOT  inflateReset2
000000000021a178 R_X86_64_JUMP_SLOT  strerror
000000000021a180 R_X86_64_JUMP_SLOT  __cxa_finalize

You can override symbols in the stock libraries by creating a library with the same symbols and specifying the library in LD_PRELOAD:

$ LD_PRELOAD=/path/to/my/malloc.so /bin/ls

9. Intermediate Memory Topics

A computer looks at memory as a long sequence of numbered storage locations. Each are the same for programs and data and the computer does not know which are which. It only knows where to start executing.

Review of some Terms:

  • Byte The size of a memory storage location. On x86 it can hold a number between 0 and 255.

  • Word The size of a normal register. On 32-bit x86 it is 4 bytes long. Most computer operations handle a word at a time.

  • Address A number that refers to byte of memory. If data spans several bytes, it’s address is the address of the first byte. Normally one does not type an address directly, but one let it the assembler by using labels and symbols. The assembler will replace a symbol with its address.

  • Pointer A pointer is a register or memory word whose value is an address. E.g. %ebp is used as an address.

   
Environment Variables <- 0xbfffffff
Arg #n  
 
Arg #2  
Arg #1  
Program name  
# of arguments <- %esp

Unmapped Memory

 
Program Code and Data <- 0x08048000

The last accessible memory address is called the break. It is the boundary between the unmapped memory and the code and data segment or rather the heap.

Memory Mapping

or Every Memory Address is a Lie.

A program runnning on Linux can only access virtual memory. The physical memory refers to the actual RAM chips inside the computer. Virtual Memory is the way a program sees memory. Linux loads a program into a physical memory space large enough to fit and tell the CPU to pretend that this memory is actually at the address 0x08048000. The process of assinging virtual addresses to physical is called mapping.

Initially the area between .bss and stack is not mapped. The beginning of the area of unmapped memory is called the break

Virtual Memory can also be mapped to disk using the swap.

In order to make memory mapping more efficient, it is seperated into pages of 4096 bytes.

The amount of memory that a program really takes in physical memory is called the resident set size. The program top shows the resident size next to the virtual size.

Getting More Memory

If a program needs more memory, it is possible to move up the break point in order to create the heap segment, and move up more to increase the heap size. Inside the heap one uses the concept of allocate and deallocate so memory can be reused. Otherwise you could run out of memory, by just moving break more and more without freeing parts of memory which are not needed anymore. In C this is called malloc and free.

Performance Issues

The malloc code in the example always iterates through all memory segment, which means it runs in linear time. Better would be an algorithm that runs in constant time.

Another performance problem is, that the brk system call is used very often. When the CPU has to switch from user mode to kernel mode, this is called a context switch, which is slow.

10. Counting like a Computer

Humans count in decimal (base ten), but computers in binary (base two). Each digit in binary numer is called a bit, which stands for binary digit.

Binary Operations

George Boole was the first who studied boolean operators:

  • AND
  • OR
  • NOT
  • XOR

If you XOR a number with itself, all bits turn to zero. Doing this on a register is faster then loading.

So movl $0, %eax is often replace by xorl %eax, %eax.

Binary Numbers are also used for representing other data. In the following Example this is what people like:

Bob  
Food 1
Techno 1
Designer Clothes 0
Football 1
Alice  
Food 1
Techno 0
Designer Clothes 1
Football 1

Now if you want to know which things both Bob and Alice like, you would use the AND operation

1101 AND
1011
-------
1001

For finding out what they disagree on, XOR is used:

1101 XOR
1011
-------
0110

Additionaly there two binary operators which are not boolean: shift and rotate.

A left shift moves each digit one space to the left, puts a zero in the first spot, and chops off the furthest digit to the left. A left rotate does the same thing, but takes the furthest digit to the left and puts in the first spot.

Shift left 10010111 = 00101110

Rotate left 10010111 = 00101111

In order to output special bits, shifting and masking is needed. Masking is done by doing an AND with a number that has the bits we are interested in set to 1.

When a number represents a set of options for a function, the individual true/false elements are called flags. Here are some flags for the linux open system call.

O_WRONLY = 0b00000001

O_RDWR = 0b00000010

O_CREAT = 0b01000000

You can OR these flags together in the needed combination. So if you need O_WRONLY and O_CREAT it would be 0101

Program Status Register

The CPU has a special register called the program status register. It holds information about what happened in computation. There is the carry flag to indicate if a register was overflowed. It also hold the result of cmpl instruction, which is used by conditional jumps like jge or jne.

Floating-point Numbers

The computer stores decimal numbers in two parts: exponent and mantissa. For example the 12345.2 can be represented as as 1.23452 * 10^4. The mantissa is 1.23452 and the exponent is 4 with a base of 10. But computers use a base of 2. All numbers are stored as X.XXXXX * 2^XXXX. The number 1 is stored as 1.00000 * 2^0. Separating the mantissa and the exponent into tow different values is called a floating point representation.

If a floating point value is big enough, adding 1.0 to it does not change it’s value. On x86 platforms, a 32-bit floating-point number cannot have 1.0 added to it past 16777216.0, because it is no longer significant. The number no longer changes when 1.0 added to it. So if there is a multiplication followed by an addition, it may give a different result than if the addition is performed first.

On most computers floatin point arithmetic takes much longer than integer arithmetic.

Negative Numers

The first intuition would be to use one bit as the sign. But this has several disadvantages. E.g. there would be a positive and a negative zero. Instead the two’s complement representation is used.

Signed values have a different range. For example a byte can normally have values up to 255. But a signed byte can store values from -128 to 127.

If a 32 bit value in two’s complement would be needed to extend to 64 bit, it cannot simply be done by 0-padding. instead sign extension must be used. There is also a sign-preserving shift-right, sarl and a shift right which does not preserve the sign, shrl.

Octal and Hexadecimal Numbers

Octal digits represent a grouping of three bits. linux permissions are done using octal numbers.

In the assembler octal values are indicated by a leading zero, like: 0664.

hexadecimal is base 16. One digit represents 4 bits which is half a byte. So two digits represent one byte.

In the assembler hex values are indicated by leading 0x like: 0xFFFF.

Order of Bytes in a Word

The x86 processor is a little-endian processor which meanst that it stores the “little end”, or least significant byte first in memory. So if there is a word in a register: 0x5d 23 ef ee It would be written to memory in reverse order: 0xee ef 23 5d. But the bits within the bytes are ordered normally. Other processors are big-endian processors.

11. High-Level Languages

For “real-world” programming, assembly language is to cumbersome most of the time. So many computer languages have been invented to make programming easier.

It is useful to know a wide variety of languages with different concepts. The more languages you know, the easier it is to pick up a new language.

Compiled and Interpreted Languages

Many languages are compiled languages. A statement is translated into one or many machine instructions.

Other languages are interpreted. These require the user to run a program called interpreter which in turn runs the given program.

There is also a class of hybrid languages which partially compile a program to byte code before execution. So the interpreter can work much faster.

High-level languages are oriented around the programmer instead of araound the machine. This can include the following features:

  • Grouping multiple operations into a single expression

  • Using “big values” which are more conceptual than the 4-byte words that computers normally deal with. E.g. having strings as a single value.

  • Better flow control than just jumps

  • Checking types of assingments

  • Memory handled automatically

  • Resembling the problem domain rather than the computer hardware

Your First C Program

#include <stdio.h>

int main(int argc, char **argv)
{
  /* print to standard out */
  puts("Hello World\n");

  return 0;
}

*#include * tells the **preprocessor** to paste the file stdio.h into the program.

Then there is the declaration of the function main, including it’s name, arguments and return type. One does not have to worry where the arguments are on the stack - the C compiler takes care of that and also of loading values into and out of registers.

puts(“Hello World\n”); is a function call. In assembly one had to push the arguments of the function onto the stack before calling the function. But C takes care of this complexity and also of defining storage for the string.

Finally the main function returns the number 0 which is automatically used as exit code for the program.

In assembly language a program is tied to both the operating system and the hardware platform, but a high-level program can be compiled for different target platforms.

12. Optimization

The process of making an application run more effectively. It can be optimized for speed, memory usage, disk usage or other things.

When to Optimize

It is better to not optimize at all than to optimize too soon. This is because optimization generally makes the code less clear and more complex which increases the cost of maintenance.

Where to Optimize

A profiler is a good tool for finding out in which functions the program spends the most time.

In general there are two categories of optimization - local optimizations and global optimizations. Local means making a specific peace of code run faster while global optimiziations are of structural nature.

Local Optimizations

Some of these may be done already by a compilers optimizer

  • Precomputing Calculations

  • Remembering Calculation Results

  • Locality of Reference

  • Register Usage

  • Inline Functions

  • Optimized Instructions

  • Different Memory Addressing Modes

  • Word-aligned Data in Memory

Global Optimizations

These are about the structure of a program. It involves achieving the following properties:

Parallelization

This means that an algorithm can effectively be split among multiply processes. The more parallelizable an application is, the better it can take advantage of multiprocessors and clustered computer configurations.

Statelessness

Stateless functions and programs are those that rely entirely on the data explicitly passed to them for functioning. This makes them parallelizable and often benefit from caching.

13. Moving On from Here

Congratulations on getting this far!

Even if you never use assembly language again, you have gained a valuable perspective and mental framework for understanding the rest of computer science.

There are essentially three methods to learn more:

  • From the Bottom Up - This is how this book teaches. It starts with low level programming and works towards more generalized teaching.

  • From the Top Down - This focuses on what you want to do with the computer and teaches you how to break it down more and more.

  • From the Middle - This is characterized by books which teach a specific programming language or API. These are not as concerned with concepts but with specifics.

A good programmer takes all three approaches into account and is constantly learning and pushing his limits.

From the Bottom Up

From the Top Down

From the Middle Out

Specialized Topics