
2008/08/17

[0x02]. Notes on Assembly - Acquainting oneself with the Memory

Most of Assembly Language is about the CPU talking to main memory, so I'm going to dive deep into this subject. Let's start with memory management, so we can move smoothly on to the CPU and understand what the different CPU registers were designed for.

For a start, read How Computers Work: Processor and Main Memory by Roger Young to understand in more detail how memory addressing and memory I/O operations are performed.

Below is a list of essential terms connected to memory and its management.

Random Access Memory (RAM)
  • The main working memory in PCs. Its defining characteristic is the way data is accessed (read or written): purely by electrical signals, so any location can be reached in roughly the same time. (This is very different from other storage media, e.g. magnetic tape, where reading requires mechanical movement of the tape, which takes a long time, and where the time needed to read a particular piece of data depends on its physical location on the medium.)
  • Programs and their data are loaded into RAM at execution time.
x86 Processor Modes
Memory is always accessed under the supervision of the processor, if not by the CPU itself, so the CPU mode determines how memory is accessed.
  • Real mode - the original CPU mode, dating back to the 8086 (it only got its name once the 286 introduced protected mode). It has a 20-bit address space, so it can address only 2^20 bytes (= 1 MiB) of memory. Segments in real mode are always 64 KiB in size.

  • Protected mode - for compatibility reasons, all x86 CPUs start in real mode (so they can support archaic operating systems like DOS) and can then be switched to protected mode by setting the appropriate flag in a control register. Protected mode adds features such as
    • virtual memory,
    • virtual 8086 mode,
    • privilege levels,
    • multitasking,
    • and others...
    Segment sizes can vary.

  • Unreal mode - this mode breaks the 20-bit addressing limit of real mode and allows up to 4 GiB of memory to be addressed.

  • Long mode - this mode is available on 64-bit processors only. It lets 64-bit applications run natively, while 16- and 32-bit protected-mode applications run in compatibility mode and can be executed without problems.
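
The address-space sizes quoted for the modes above follow directly from the width of the address. As a minimal sketch (the function name addressable_bytes is made up for this note and is not part of any API), the arithmetic looks like this in C:

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct byte addresses that fit in an address of `bits` bits.
 * Hypothetical helper for illustration; only widths below 64 are supported,
 * because 2^64 itself does not fit into a uint64_t. */
uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}
```

Calling addressable_bytes(20) gives 1048576 (1 MiB, the real mode limit) and addressable_bytes(32) gives 4294967296 (4 GiB, the figure quoted for unreal mode).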
Virtual Memory
On modern operating systems hundreds of processes run at the same time. Summed up, the memory they use at any given time can exceed the amount of physical RAM. This is possible thanks to virtual memory, a technique that tricks running programs into thinking that they have more RAM at their disposal than is actually available. It works by moving the memory of inactive processes out to secondary storage. This is called paging. Moreover, virtual memory lets the operating system protect and manage memory according to its own policy.

Paging
During this process, inactive areas of real memory are dumped onto secondary storage and read back into RAM when a program accesses them.
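
To make the idea more concrete, here is a toy model of a page table lookup in C. It is only a sketch under stated assumptions: the 4 KiB page size matches the usual x86 page size, but the eight-page address space, the page_entry type and the translate function are invented for this note. A real page table is a multi-level structure walked by hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* 4 KiB pages: low 12 bits are the offset */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  8u                   /* hypothetical, tiny virtual address space */

typedef struct {
    bool     present;  /* true: page is in RAM; false: it was dumped to secondary storage */
    uint32_t frame;    /* physical frame number, meaningful only when present */
} page_entry;

/* Translates a virtual address to a physical one.
 * Returns true and stores the result in *paddr when the page is in RAM.
 * Returns false when it is not; that is the point where a real system
 * raises a page fault and reads the page back from secondary storage. */
bool translate(const page_entry table[NUM_PAGES], uint32_t vaddr, uint32_t *paddr)
{
    assert(table != NULL && paddr != NULL);

    uint32_t page   = vaddr >> PAGE_SHIFT;        /* which page */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);   /* position inside the page */

    if (page >= NUM_PAGES || !table[page].present)
        return false;

    *paddr = (table[page].frame << PAGE_SHIFT) | offset;
    return true;
}
```

With virtual page 1 mapped to physical frame 5, the virtual address 0x1234 translates to 0x5234: the page number changes, the offset 0x234 stays the same.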

Segmentation
A relative (rather than absolute) way of addressing physical memory using the Segment:Offset notation. Best explained in Daniel Sedory's article on the subject.
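
In real mode the Segment:Offset pair is turned into a physical address by shifting the segment left by four bits (multiplying it by 16) and adding the offset. The C sketch below shows the arithmetic; the function names are made up for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Real mode address calculation: physical = segment * 16 + offset.
 * The result can exceed 20 bits (up to 0x10FFEF); what happens to that
 * extra bit depends on the machine, so it is left untouched here. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* The reverse direction: one of many Segment:Offset pairs naming the same
 * 20-bit address, here the one whose offset is smaller than 16. */
void to_segment_offset(uint32_t linear, uint16_t *segment, uint16_t *offset)
{
    assert(linear <= 0xFFFFFu);          /* must fit in 20 bits */
    *segment = (uint16_t)(linear >> 4);
    *offset  = (uint16_t)(linear & 0xFu);
}
```

Note that the notation is not unique: 07C0:0000 and 0000:7C00 both resolve to the physical address 0x7C00.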

Memory Management Unit
A hardware part of the CPU that controls how the CPU accesses memory. Its four main functions are:
  • translating virtual-to-physical addresses;
  • memory protection;
  • cache control;
  • bus arbitration;
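
Memory protection can be pictured as a permission check made on every access. The sketch below is a simplified model and not how any particular MMU is implemented: the three flags are modelled on the present, writable and user bits of an x86 page table entry, while the names and the access_allowed function are invented for this note.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-page permission flags (hypothetical names). */
enum {
    PTE_PRESENT  = 1 << 0,  /* page is mapped and in RAM */
    PTE_WRITABLE = 1 << 1,  /* writes are allowed */
    PTE_USER     = 1 << 2   /* unprivileged code may touch the page */
};

/* Decides whether an access is allowed. A real MMU makes this decision in
 * hardware and raises a fault when the answer is "no". This simplified model
 * denies every write to a read-only page, including privileged ones. */
bool access_allowed(unsigned flags, bool is_write, bool from_user_mode)
{
    /* Only the three flags modelled above are accepted. */
    assert((flags & ~(unsigned)(PTE_PRESENT | PTE_WRITABLE | PTE_USER)) == 0);

    if (!(flags & PTE_PRESENT))
        return false;
    if (is_write && !(flags & PTE_WRITABLE))
        return false;
    if (from_user_mode && !(flags & PTE_USER))
        return false;
    return true;
}
```

For example, a page marked PTE_PRESENT | PTE_WRITABLE can be written by the kernel, but any access from user mode is refused because PTE_USER is missing.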
So, roughly:
  1. When an x86 computer is turned on, it starts in real mode and can only address the first mebibyte (1024 * 1024 bytes = 1 MiB) of RAM. This is more than enough to bootstrap an operating system.
  2. The operating system can switch the CPU from real mode into protected mode.
  3. Once in protected mode, the system can take advantage of e.g. memory protection and virtual memory to manage memory resources. It does so with the help of the Memory Management Unit.

Sources:
http://en.wikipedia.org/wiki/X86
http://en.wikipedia.org/wiki/Memory_management_unit
http://en.wikipedia.org/wiki/Virtual_memory

2008/08/05

[0x01]. Notes on Assembly - AT&T vs Intel syntax

There are two main syntaxes for x86 Assembly Language: AT&T and Intel. The former comes from AT&T, dates back to the early days of UNIX and is the usual choice on UNIX-like systems; the original intention was to preserve portability and compatibility between different UNIX flavors. The latter was invented by Intel and is commonly used on Microsoft systems. I have a bit of a chicken-and-egg problem here, as I have no idea which party borrowed more from the other, but that is not important here. The main differences between the two syntaxes are as follows:

Mnemonics
  • Intel: case-insensitive.
  • AT&T: lowercase.
Registers
  • Intel: case-insensitive, written as AH, ax, Eax.
  • AT&T: lowercase and preceded by a % (percent) sign, as in %eax, %ax.
Operand size
  • Intel: memory operands are prefixed with their size: byte ptr ADDR (8 bits), word ptr ADDR (16 bits), dword ptr ADDR (32 bits), qword ptr ADDR (64 bits).
  • AT&T: machine instructions end with one of four possible suffixes: b (byte), w (word), l (long word), q (quad word), e.g. movl, movw, movb.
Operand order
  • Intel: the programmer first specifies the destination and then the source operand. "mov bx, ax" moves ax to bx.
  • AT&T: you first specify the source and then the destination operand. "movw %ax, %bx" moves %ax to %bx.
Immediate operands
  • Intel: immediate operands, like numbers or memory addresses, are written with an "h" suffix for hex, a "b" suffix for binary, or no suffix at all for decimal.
  • AT&T: immediate operands are preceded by a $ (dollar sign).
Comments
  • Intel: a comment starts with a ; (semicolon).
  • AT&T: a comment starts with a # (hash).
Jumps and calls
  • Intel: jump and call operands are undelimited.
  • AT&T: indirect jump and call operands are prefixed with an * (asterisk).
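
The Intel rule for numeric literals ("h" for hex, "b" for binary, no suffix for decimal) is easy to express in code. The C function below is a minimal sketch of that rule only, not the parser of any real assembler; the name parse_intel_immediate is made up, and requiring the literal to start with a decimal digit (as in 0FFh) is an assumption borrowed from common assembler practice.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Parses an Intel-style numeric literal such as "1Fh", "101b" or "42".
 * Returns 0 and stores the value in *out on success, -1 on malformed input. */
int parse_intel_immediate(const char *text, unsigned long *out)
{
    assert(text != NULL && out != NULL);

    char   digits[64];
    char  *end;
    int    base = 10;                       /* no suffix: decimal */
    size_t len  = strlen(text);

    if (len == 0 || len >= sizeof digits)
        return -1;
    if (!isdigit((unsigned char)text[0]))   /* assumed: literals start with 0-9 */
        return -1;

    char last = (char)tolower((unsigned char)text[len - 1]);
    if (last == 'h') {                      /* "h" suffix: hexadecimal */
        base = 16;
        len--;
    } else if (last == 'b') {               /* "b" suffix: binary */
        base = 2;
        len--;
    }

    memcpy(digits, text, len);              /* copy the literal without its suffix */
    digits[len] = '\0';

    *out = strtoul(digits, &end, base);
    return (*end == '\0') ? 0 : -1;         /* leftover characters mean bad digits */
}
```

With this sketch "1Fh" parses to 31, "101b" to 5 and "42" to 42, while "1F" is rejected because F is not a decimal digit.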

So the same C program (main.c), which only returns 0 to the environment, looks like this in each syntax:

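AT&T:

.text
.globl _main
_main:
    pushl %ebp            # save the caller's frame pointer
    movl %esp, %ebp       # set up our own stack frame
    subl $8, %esp         # reserve 8 bytes on the stack
    movl $0, %eax         # return value: 0
    leave                 # tear the stack frame down again
    ret
.subsections_via_symbols

C:

int main(void){
    return 0;
}

Intel:

[SECTION .text]
global _main              ; export the symbol, like .globl above
_main:
    push ebp              ; save the caller's frame pointer
    mov ebp, esp          ; set up our own stack frame
    sub esp, 8            ; reserve 8 bytes on the stack
    mov eax, 0            ; return value: 0
    leave                 ; tear the stack frame down again
    ret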



Sources:
http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/gnu-assembler/i386-syntax.html
http://en.wikipedia.org/wiki/Unix