Segmentation Registers Use

Segmentation registers use

Expanding on Benoit's answer to question 3...

The division of programs into logical parts such as code, constant data, modifiable data and stack is done by different agents at different points in time.

First, your compiler (and linker) creates executable files where this division is specified. If you look at a number of executable file formats (PE, ELF, etc.), you'll see that they support some kind of sections or segments or whatever you want to call them. Besides addresses, sizes and locations within the file, those sections bear attributes telling the OS their purpose, e.g. this section contains code (and here's the entry point), this one holds initialized constant data, that one holds uninitialized data (typically not taking space in the file), here's something about the stack, over there is the list of dependencies (e.g. DLLs), etc.

Next, when the OS starts executing the program, it parses the file to see how much memory the program needs, where each section goes and what memory protection each section needs. The latter is commonly enforced via page tables. The code pages are marked as executable and read-only, the constant data pages as not executable and read-only, and other data pages (including those of the stack) as not executable and read-write. This is how it ought to be normally.

Sometimes programs need regions that are read-write and, at the same time, executable, for dynamically generated code or just to be able to modify the existing code. The combined RWX access can be either specified in the executable file or requested at run time.
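To make the mapping above concrete, here is a minimal Python sketch of how a loader might turn section kinds into page protections. The section-kind names, the `Prot` flags and `protection_for` are hypothetical illustrations, not taken from any real loader or from the PE/ELF specifications (which encode these attributes as per-section flag bits).

```python
from enum import Flag


class Prot(Flag):
    """Hypothetical page-protection flags, loosely modelled on R/W/X page bits."""
    NONE = 0
    R = 1
    W = 2
    X = 4


# Hypothetical section kinds and the protection a loader would normally apply.
DEFAULT_PROTECTION = {
    "code": Prot.R | Prot.X,        # executable, read-only
    "const_data": Prot.R,           # read-only, not executable
    "data": Prot.R | Prot.W,        # read-write, not executable
    "bss": Prot.R | Prot.W,         # uninitialized data, zero-filled at load time
    "stack": Prot.R | Prot.W,       # read-write, not executable
}


def protection_for(kind: str, needs_rwx: bool = False) -> Prot:
    """Return the page protection for a section kind.

    `needs_rwx=True` models the exceptional case of a region that must be
    writable and executable at once (e.g. for dynamically generated code).
    """
    if needs_rwx:
        return Prot.R | Prot.W | Prot.X
    return DEFAULT_PROTECTION[kind]
```

Keeping RWX as an explicit opt-in mirrors the text: it is the exception, requested either in the file or at run time.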

There can be other special pages, such as guard pages for dynamic stack expansion, which are placed next to the stack pages. For example, your program starts with enough pages allocated for a 64KB stack; when the program tries to access memory beyond that point, the OS intercepts the access to those guard pages, allocates more pages for the stack (up to the maximum supported size) and moves the guard pages further down. These pages don't need to be specified in the executable file; the OS can handle them on its own. The file should only specify the stack size(s) and perhaps the location.
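A toy model of guard-page stack growth, as a Python sketch. It assumes 4 KiB pages, a single guard page directly below the committed stack and a hard maximum size; `GrowableStack` and its parameters are hypothetical, and real OSes differ in the details (how many guard pages, how the fault is reported).

```python
PAGE_SIZE = 4096  # assumed page size


class GrowableStack:
    """Toy model of a downward-growing stack protected by one guard page.

    `top` is the highest stack address (exclusive); addresses are plain ints.
    """

    def __init__(self, top: int, initial_size: int, max_size: int):
        self.top = top
        self.max_size = max_size
        self.committed_low = top - initial_size          # lowest usable address
        self.guard_low = self.committed_low - PAGE_SIZE  # guard page sits just below

    def access(self, addr: int) -> None:
        """Simulate touching `addr`; grow the stack if the guard page is hit."""
        if self.committed_low <= addr < self.top:
            return  # ordinary access to already committed stack memory
        if self.guard_low <= addr < self.committed_low:
            new_low = self.guard_low  # the guard page becomes a normal stack page
            if self.top - new_low > self.max_size:
                raise MemoryError("stack overflow: maximum size reached")
            self.committed_low = new_low
            self.guard_low = new_low - PAGE_SIZE  # move the guard page further down
            return
        raise MemoryError(f"access violation at {addr:#x}")

    @property
    def size(self) -> int:
        return self.top - self.committed_low
```

Note that skipping over the guard page entirely (touching memory more than one page below the committed area) is reported as a plain access violation in this model, which is why stack probing matters for large stack frames.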

If there's no hardware or code in the OS to distinguish code memory from data memory or to enforce memory access rights, the division is purely formal. 16-bit real-mode DOS programs (COM and EXE) didn't have code, data and stack segments marked in any special way. COM programs had everything in one common 64KB segment; they started with IP=0x100 and SP=0xFFxx, and the order of code and data inside could be arbitrary, so the two could intertwine practically freely. DOS EXE files only specified the starting CS:IP and SS:SP locations, and beyond that the code, data and stack segments were indistinguishable to DOS. All it needed to do was load the file, perform relocation (for EXEs only), set up the PSP (Program Segment Prefix, containing the command line parameters and some other control info), and load SS:SP and CS:IP. It could not protect memory because memory protection isn't available in real address mode, and so the 16-bit DOS executable formats were very simple.
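For EXEs, the relocation step is simple enough to sketch in a few lines of Python. This assumes the MZ format's relocation table of (offset, segment) word pairs, each naming a word inside the load module that holds a segment value; the loader adds the segment at which the module was loaded (typically the PSP segment plus 10h) to each such word, and offsets the header's CS and SS the same way. The function names are hypothetical.

```python
import struct


def apply_mz_relocations(image: bytearray, relocs, load_segment: int) -> None:
    """Patch segment references in a loaded DOS EXE image, in place.

    `image` is the load module (the file minus its header), `relocs` is a list
    of (offset, segment) pairs from the relocation table, and `load_segment` is
    the paragraph at which the image was placed. Each pair names a 16-bit word
    at segment:offset relative to the image start.
    """
    for offset, segment in relocs:
        pos = segment * 16 + offset                      # position inside the image
        (word,) = struct.unpack_from("<H", image, pos)   # little-endian word
        struct.pack_into("<H", image, pos, (word + load_segment) & 0xFFFF)


def initial_registers(header_cs, header_ip, header_ss, header_sp, load_segment):
    """Return (CS, IP, SS, SP) for the relocated image."""
    return ((load_segment + header_cs) & 0xFFFF, header_ip,
            (load_segment + header_ss) & 0xFFFF, header_sp)
```

Nothing in this process records which words are code or data; the loader only patches segment values, which is the point made above.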

Trouble understanding stack segment register

In x86, the stack is a Last-In-First-Out (LIFO) structure where the SS segment register marks the start and the stack pointer SP points directly above the free space on the stack. In memory, the free space is lower than the used space because the stack grows downward. It is this downward expansion that makes calling the stack pointer "the top of the stack" confusing: it is counterintuitive for the top to be at the bottom.

In x86-16, the stack can occupy at most 64KB (65536 bytes). SP, being a 16-bit register, can never address anything outside this stack segment.

Now if your program initialization has these instructions:

mov  ax, 5000h
mov  ss, ax
mov  sp, 7000h

you are saying that the stack is going to be a chunk of 28672 (7000h) bytes starting at linear address 0005'0000h and ending at linear address 0005'6FFFh. At this point in your program you can say that "the stack is empty", and it would be a severe programming error to e.g. pop ax while the SS:SP register pair holds 5000h:7000h.
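The linear addresses above follow from the real-mode rule linear = segment * 16 + offset. A minimal Python sketch (the function name is hypothetical) that reproduces those numbers:

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of segment:offset in real mode: segment * 16 + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # On an 8086 the 20-bit result wraps at 1 MiB; with A20 enabled on later
    # CPUs, addresses up to 0x10FFEF are reachable, so no masking is applied.
    return (segment << 4) + offset


start = real_mode_linear(0x5000, 0x0000)      # 0x50000
last = real_mode_linear(0x5000, 0x7000 - 1)   # 0x56FFF, last byte of the stack
```

The 64KB segment that starts at 5000h ends exactly where segment 6000h begins, which is why the diagrams below put 6000h at the right edge.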

| 5000h (SS)                                                     | 6000h
|                                                                |
|<--------------------------------- 64KB ----------------------->|
|      This is the stack     |   This is not part of the stack   |
|                                                                |
|                            ^                                   |
| 0                          | SP=7000h                    65535 |

In order to place a new item on the stack (push / call / int), the stack pointer SP is lowered and then the new item is written at that address. For removal (pop / ret / iret), the item SP points at is read and then SP is raised.
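Here is a minimal Python model of that push/pop order, assuming word-sized items only and treating never-written memory as zero. `RealModeStack` is a hypothetical helper; replaying the two loops in this answer with it gives the same SP values as the diagrams.

```python
class RealModeStack:
    """Minimal model of 16-bit PUSH/POP on SS:SP (word-sized items only)."""

    def __init__(self, ss: int, sp: int):
        self.ss = ss & 0xFFFF
        self.sp = sp & 0xFFFF
        self.memory = {}  # sparse memory: linear address -> byte

    def _linear(self, offset: int) -> int:
        # Offsets stay inside the 64KB segment, as SP is only 16 bits wide.
        return (self.ss << 4) + (offset & 0xFFFF)

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF                           # lower SP first...
        self.memory[self._linear(self.sp)] = value & 0xFF          # ...then write,
        self.memory[self._linear(self.sp + 1)] = (value >> 8) & 0xFF  # low byte first

    def pop(self) -> int:
        low = self.memory.get(self._linear(self.sp), 0)            # read first...
        high = self.memory.get(self._linear(self.sp + 1), 0)
        self.sp = (self.sp + 2) & 0xFFFF                           # ...then raise SP
        return low | (high << 8)


# Replay "mov cx, 6144 / push cx / loop" and then 4096 pops.
st = RealModeStack(0x5000, 0x7000)
cx = 6144
while True:
    st.push(cx)
    cx -= 1           # LOOP decrements CX and repeats while it is nonzero
    if cx == 0:
        break
after_push = st.sp    # 0x4000
for _ in range(4096):
    st.pop()
after_pop = st.sp     # 0x6000
```

6144 pushes of 2 bytes move SP down by 3000h (7000h to 4000h), and 4096 pops move it back up by 2000h (to 6000h).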

Let's see that in action:

  mov  cx, 6144
More:
  push cx
  loop More

Register-wise, only the stack pointer SP has changed.

| 5000h (SS)                                                     | 6000h
|                                                                |
|<--------------------------------- 64KB ----------------------->|
|      This is the stack     |   This is not part of the stack   |
|                xxxxxxxxxxxx                                    |
|                ^                                               |
| 0              | SP=4000h                                65535 |

Now removing two thirds of it:

  mov  cx, 4096
More:
  pop  ax
  loop More

Once again, register-wise only the stack pointer SP has changed.

| 5000h (SS)                                                     | 6000h
|                                                                |
|<--------------------------------- 64KB ----------------------->|
|      This is the stack     |   This is not part of the stack   |
|                        xxxx                                    |
|                        ^                                       |
| 0                      | SP=6000h                        65535 |

We can read/write the stack memory just like any other memory. However, because memory is segmented, we would ordinarily need an SS: segment override:

mov  ax, [ss:6000h]

mov  bx, 6000h
mov  ax, [ss:bx]

Or we could just make DS refer to the stack segment:

mov  cx, 5000h
mov  ds, cx 

mov  ax, [6000h]

mov  bx, 6000h
mov  ax, [bx]

Here begins the strange case of BP. The designers made it so that every memory reference that uses the BP register as a base is relative to the stack segment by default. We can address data in the stack segment without specifying a segment override or changing the DS segment register if we load the offset address into BP:

mov  bp, 6000h
mov  ax, [bp]

Other than this 'stickiness' to the stack segment, there's nothing special about BP.
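The default-segment rule for 16-bit addressing fits in a tiny Python sketch. `default_segment` is a hypothetical helper; it covers only ordinary memory operands (PUSH/POP, which always use SS:SP, and string-instruction destinations, which always use ES:DI, are left out).

```python
def default_segment(base_regs, override=None):
    """Segment register used by a 16-bit memory operand.

    `base_regs` is the set of registers in the address expression, e.g.
    {"bp", "si"} for [bp+si] or an empty set for a direct address like [6000h].
    An explicit segment override prefix wins; otherwise any BP-based form
    defaults to SS and every other form defaults to DS.
    """
    if override is not None:
        return override
    return "ss" if "bp" in base_regs else "ds"
```

So `mov ax, [bp]` reads SS:BP while `mov ax, [bx]` reads DS:BX, matching the examples above.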

How is it being specified which segment register should be used (x86)

The default segment is DS; that’s what the processor uses in your example.

In 64-bit mode, it doesn’t matter what segment is used, because the segment base is always 0 and permissions are ignored. (There are one or two minor differences, which I won’t go into here.)

In 32-bit mode, most OSes set the base of all segments to 0 and give them the same permissions, so again it doesn’t matter.

In code where it does matter (especially 16-bit code that needs to use more than 64 KB of memory), the code must use far pointers, which include the segment selector as part of the pointer value. The software must load the selector into a segment register in order to perform the memory access.
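As an illustration of what a far pointer looks like in memory, here is a Python sketch assuming the 16:16 layout that LDS/LES load: the offset is stored first (low word) and the segment second (high word). The helper names are hypothetical.

```python
import struct


def pack_far_pointer(segment: int, offset: int) -> bytes:
    """Store a 16:16 far pointer as LDS/LES expect it: offset first, then segment."""
    return struct.pack("<HH", offset, segment)


def unpack_far_pointer(raw: bytes):
    """Return (segment, offset) from a 4-byte little-endian far pointer."""
    offset, segment = struct.unpack("<HH", raw)
    return segment, offset


ptr = pack_far_pointer(0x5000, 0x6000)   # bytes 00 60 00 50
as_dword = struct.unpack("<I", ptr)[0]   # 0x50006000, read as "5000:6000"
```

Read as one 32-bit value, the far pointer shows up as segment:offset, which is why such pointers are usually written that way.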

Why 64 bit mode ( Long mode ) doesn't use segment registers?

In a manner of speaking, when you perform array ("indexed") type addressing with general registers, you are doing essentially the same thing as the segment registers. In the bad old days of 8-bit and 16-bit programming, many applications required much more data (and occasionally more code) than a 16-bit address could reach.

So, many CPUs solved this by having a larger addressable memory space than 16-bit addresses could reach, and made those regions of memory accessible by means of "segment registers" or similar. A program would set a "segment register" to point above the 16-bit (65536-byte) address space. Then, when certain instructions were executed, the CPU would add the address specified by the instruction to the appropriate (or specified) "segment register" to read data (or code) beyond the range of 16-bit addresses or 16-bit offsets.

However, the situation today is opposite!

How so? Today, a 64-bit CPU's addresses can cover more memory than any machine actually contains. Most 64-bit CPUs today can address something like 40 to 48 bits of physical memory. True, there is nothing to stop them from addressing a full 64-bit memory space, but their designers know nobody (but the NSA) can afford that much RAM, and besides, hanging that much RAM on the CPU bus would load it down with capacitance and slow down ALL memory accesses outside the CPU chip.

Therefore, the current generation of mainstream CPUs can address 40 to 48 bits of memory space, which is more than 99.999% of the market would ever imagine reaching. Note that 32 bits is 4 gigabytes (which some people do exceed today by a factor of 2, 4, 8 or 16), but even 40 bits can address 256 * 4GB == 1024GB == 1TB. While 64GB of RAM is reasonable today, and perhaps even 256GB in extreme cases, 1024GB just isn't necessary except for perhaps 0.001% of applications, and is unaffordable to boot.

And if you are in that 0.001% category, just buy one of the CPUs that address 48 bits of physical memory, and you're talking 256TB... which is currently impractical because it would load down the memory bus with vastly too much capacitance (maybe even to the point where the memory bus would stop working completely).
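The address-space sizes quoted above are plain powers of two. A quick Python check, assuming GB/TB here mean 2^30/2^40 bytes as in the text (`addressable_bytes` is a hypothetical helper):

```python
GiB = 1 << 30
TiB = 1 << 40


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


four_gb = addressable_bytes(32) // GiB     # 4
one_tb = addressable_bytes(40) // TiB      # 1
big_tb = addressable_bytes(48) // TiB      # 256
```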

The point is this. When your normal addressing modes with normal 64-bit registers can already address vastly more memory than your computer can contain, the conventional reason to add segment registers vanishes.

This doesn't mean people could not find useful purposes for segment registers in 64-bit CPUs. They could; several possibilities are evident. However, with 64-bit general registers and a 64-bit address space, there is nothing segment registers can do that general registers cannot. And general purpose registers have a great many uses that segment registers do not. Therefore, anyone planning to add more registers to a modern 64-bit CPU would add general purpose registers (which can do "anything") rather than very limited purpose "segment registers".

And indeed they have. As you may have noticed, AMD and Intel keep adding more [sorta] general-purpose registers to the SIMD register-file, and AMD doubled the number of [truly] general purpose registers when they designed their 64-bit x86_64 CPUs (which Intel copied).

What is the purpose of segment registers in x86 protected mode?

Basically the purpose is the same as in real mode, but the way they work is slightly different. DS in your example selects a descriptor in your GDT ("Global Descriptor Table"; look this term up if you really want to understand this), which contains information such as the base address, the limit and the granularity. Your offset is then added to the base address, and that's it. If you are on Windows (and I bet it's the same on Linux), you don't generally have to worry about these segment registers. As you said, it's a flat model: the code and data descriptors in use all start at base 0 and cover the whole address space, so if you don't change these registers it works as if they didn't exist.
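To make "base address, limit, granularity" concrete, here is a Python sketch that decodes an 8-byte GDT descriptor and applies a simplified limit check before adding the base. It assumes a 32-bit expand-up code or data segment and ignores the access-rights checks; the function names are hypothetical. The two constants are typical flat-model code and data descriptors (base 0, limit 4 GiB).

```python
def decode_descriptor(raw: int):
    """Decode (base, effective limit, 4K granularity) from a 64-bit descriptor.

    Layout: limit 15:0 in bits 0-15, base 23:0 in bits 16-39, access byte in
    bits 40-47, limit 19:16 in bits 48-51, flags (AVL, L, D/B, G) in bits
    52-55 and base 31:24 in bits 56-63.
    """
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    granularity_4k = bool((raw >> 55) & 1)
    if granularity_4k:
        limit = (limit << 12) | 0xFFF  # limit is counted in 4 KiB units
    return base, limit, granularity_4k


def translate(raw_descriptor: int, offset: int) -> int:
    """Linear address = base + offset, after a simplified expand-up limit check."""
    base, limit, _ = decode_descriptor(raw_descriptor)
    if offset > limit:
        raise ValueError("offset beyond segment limit (a fault on real hardware)")
    return (base + offset) & 0xFFFFFFFF


FLAT_CODE = 0x00CF9A000000FFFF  # typical flat 32-bit code descriptor
FLAT_DATA = 0x00CF92000000FFFF  # typical flat 32-bit data descriptor
```

With base 0 and a 4 GiB limit, `translate` returns the offset unchanged, which is exactly why a flat model lets you forget the segment registers.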

Do the x86 segment registers have special meaning/usage on modern CPUs and OSes?

In 2002, Linux kernel hacker Ingo Molnar used segmentation when implementing Exec Shield, a form of data execution prevention, on 32-bit x86 systems. This is one modern use of segmentation that I'm aware of, but it was mostly a case of getting the most mileage out of hardware mechanisms you can't change. Segmentation is not used to implement data execution prevention on x86-64 CPUs with NX support.

The FS and GS segment registers are still used on x86-64:

In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to the effective address. The FS and GS segments are exceptions. These segment registers (which hold the segment base) can be used as an additional base registers in linear address calculations. They facilitate addressing local data and certain operating system data structures.
Intel System Programming Guide, Chapter 3.2.4

On x86-64, Linux uses FS for thread-local storage and GS for per-CPU data in kernel space. See "How are the fs/gs registers used in Linux AMD64?" and "Detail about MSR_GS_BASE in linux x86 64".
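In 64-bit mode the linear-address calculation quoted above reduces to a couple of cases. A Python sketch, where `linear_address_64` is hypothetical and the FS/GS base values stand for whatever the OS loaded (e.g. through the IA32_FS_BASE/IA32_GS_BASE MSRs); canonical-address checks are omitted and the example base values are made up.

```python
MASK64 = (1 << 64) - 1


def linear_address_64(segment: str, effective_address: int,
                      fs_base: int = 0, gs_base: int = 0) -> int:
    """Linear address in 64-bit mode.

    CS, DS, ES and SS bases are treated as zero, so the linear address equals
    the effective address; FS and GS add their base.
    """
    if segment == "fs":
        base = fs_base
    elif segment == "gs":
        base = gs_base
    elif segment in ("cs", "ds", "es", "ss"):
        base = 0
    else:
        raise ValueError(f"unknown segment register {segment!r}")
    return (base + effective_address) & MASK64


# A thread-local variable at fs:0x28, with a made-up per-thread FS base.
tls_addr = linear_address_64("fs", 0x28, fs_base=0x7F0000000000)
```

Giving each thread its own FS base is what lets the same `fs:offset` instruction reach different per-thread data.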

Is it possible to use segment registers in an assignment?

Yes, it is possible, but it doesn't do what you think it does.

The instruction

mov al, es:0x48

loads a byte from address es:0x48 into al. This will only load 0x48 into al if the byte at address es:0x48 holds 0x48.

The x86 instruction set has no instruction to compute linear addresses. Even something like

lea ax, es:0x48

will only give you 0x48 (i.e. the offset into the segment) regardless of what es holds.

How are segment registers involved in memory address translation?

The usual translation goes as follows:

 Logical address   -->   GDT -->  Linear address          --> Page tables --> Physical Address
(segment:offset)                 (segment base + offset)         

\______________________________________________________/ 
                  Virtual address                                     
             (can be either logical or linear)
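A toy version of the non-virtualized pipeline in Python, assuming a 32-bit segment base taken from the descriptor and a hypothetical single-level page table (a dict from linear page number to physical frame number) with 4 KiB pages; real x86 paging walks multi-level tables and checks permissions at each step. The function name and table contents are made up for illustration.

```python
PAGE_SHIFT = 12                  # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def logical_to_physical(segment_base: int, offset: int, page_table: dict) -> int:
    """Toy two-step translation: logical -> linear -> physical."""
    linear = (segment_base + offset) & 0xFFFFFFFF       # segmentation step
    page, page_offset = linear >> PAGE_SHIFT, linear & PAGE_MASK
    if page not in page_table:
        raise LookupError(f"page fault at linear address {linear:#x}")
    return (page_table[page] << PAGE_SHIFT) | page_offset  # paging step


page_table = {0x00400: 0x1A2B3}  # linear page 0x400 -> physical frame 0x1A2B3
phys = logical_to_physical(0x00400000, 0x456, page_table)
```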

If running in VMX non-root mode (i.e. in a VM) and EPT is enabled then:

 Logical address   -->   GDT -->  Linear address          --> Page tables --> Guest Physical Address --> EPT --> (System) Physical Address
(segment:offset)                 (segment base + offset)         

\______________________________________________________/                      \__________________________________________________________/
                  Virtual address                                                        Physical address
             (can be either logical or linear)

If an IOMMU is present (e.g. Intel's VT-d):

Logical address   -->   GDT -->  Linear address          --> Page tables --> Guest Physical Address --> EPT --> (System) Physical Address  -->  1 or 2 level translation --> (IO) Physical address
(segment:offset)                 (segment base + offset)         

\______________________________________________________/                     \___________________________________________________________________________________________________________________/
                  Virtual address                                                        Physical address
             (can be either logical or linear)

The IOMMU can even perform the translation of the Guest Virtual Address or the Guest Physical Address (one of its purposes is to reify the virtual address of an application to the hardware and simplify the management of the plethora of address spaces encountered during the translation).

Note: As Hadi Brais pointed out, the term "Virtual address" only designates a Linear address in the Intel and AMD manuals.

I find it more useful to label both the logical and the linear addresses as virtual because they are before the page translation step.

The segment register holds a segment selector that indexes a segment descriptor, which is used to perform the security checks and to get the segment base that is added to the offset part of the logical address.
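The selector itself is a small bit field: bits 0-1 are the requested privilege level (RPL), bit 2 selects the GDT or LDT, and bits 3-15 are the descriptor index, which is why a descriptor table can have at most 8192 entries. A Python sketch with hypothetical helper names; `table_base` stands in for the GDT base address loaded with LGDT.

```python
def decode_selector(selector: int):
    """Split a 16-bit segment selector into (index, table, rpl)."""
    rpl = selector & 0b11                            # requested privilege level
    table = "LDT" if selector & 0b100 else "GDT"     # table indicator bit
    index = selector >> 3                            # 13-bit descriptor index
    return index, table, rpl


def descriptor_address(table_base: int, selector: int) -> int:
    """Linear address of the 8-byte descriptor that `selector` refers to."""
    index, _, _ = decode_selector(selector)
    return table_base + index * 8
```

For example, selector 08h is GDT entry 1 at RPL 0, and 1Bh is GDT entry 3 at RPL 3.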

After that, it's done.

Every address specified at the instruction level is a logical address - requiring the lookup of the segment descriptor.

To avoid reading the descriptor from memory every time an instruction accesses memory, which would be a performance killer, the CPU caches it in a hidden part of the segment register when the selector is loaded.

The OS sets up the segment registers based on what it needs to do, but it rarely needs more than four segments anyway.

The primary intent of segmentation (in protected mode) was to provide process isolation by defining non-overlapping segments for each program.

A program usually needs only a stack segment, a data segment and a code segment. The other three (ES, FS and GS) are there to avoid saving/restoring the data segment register, a concern back when the maximum segment size was 64KiB (read: real mode; FS and GS were added later, though).

Today OSes use a flat model where there are only two segments (code and data/stack; this is a simplification, other segments are required) encompassing the whole address space, plus OS-specific segments for things like TLS or the PEB/TEB.

So six segment registers are already more than needed, and the 8192 entries of the GDT are there in case they are ever needed.

Assembler use of segment register

It is unclear what part of this question I should answer.

The original error was related to a typo: DS should have been DX.
The version of MASM you use (5.10) doesn't support segment and size overrides inside the square brackets []. Code like this:
```
MOV     [WORD PTR DS:CARLOC],DX
```
Needs to be written as:
```
MOV     WORD PTR DS:[CARLOC],DX
```
The version of MASM and LINK you are using doesn't generate COM programs. You need a program that used to come with DOS called EXE2BIN that could convert certain types of EXE programs to COM. You'd have to run EXE2BIN like this:
```
EXE2BIN progname.exe progname.com
```
As far as I am aware, this version of MASM doesn't support the simplified segment directives .MODEL, .CODE, .DATA and .STACK, so they need to be removed.

Rather than use EXE2BIN to convert from an EXE to a COM program you can modify the code to run as an EXE program. Remove the lines:

    .MODEL  TINY
    .CODE
    .ORG 100h

Create a STACK segment with something like:

STACK SEGMENT STACK
    db 512 DUP(?)
STACK ENDS

An EXE program needs to initialize the DS (and ES if necessary) early on at the program start. This is unlike COM programs where CS=DS=ES=SS and no such initialization is necessary. You'd add these lines to initialize DS:

    MOV     AX, CODE                ; Point DS at the CODE segment
    MOV     DS, AX

You placed all your data in the CODE segment so you need to initialize DS to be the same as CODE.

The final version of the program that should run as an EXE is:

        TITLE   FormulaONE TURBO (256 byte game)

STACK SEGMENT STACK
    db 512 DUP(?)
STACK ENDS

CODE    SEGMENT BYTE PUBLIC 'CODE'
        ASSUME  CS:CODE,DS:CODE

;--------------------------------------------------------------------------
;                       ACTUAL PROGRAM BEGINS HERE
;--------------------------------------------------------------------------
START:
        MOV     AX, CODE                ; Point DS at the CODE segment
        MOV     DS, AX                  ;   (our data lives there too)
        XOR     BP,BP                   ; Reset score to 0 (AX=CODE here)
        MOV     AX,0600H                ; Clear Screen and home the cursor
        CALL    SCROLL                  ;   (AH=06h with AL=0 clears)
;--------------------------------------------------------------------------
;                             MAIN GAME LOOP
;--------------------------------------------------------------------------
GAME:
        MOV     DX,1629H                ; Load CAR loc (LINE 16H, COL 29H)
CARLOC  EQU     $-2                     ; Self modifying code (CAR loc)

        CALL    MOVEIT                  ; Move cursor to DH,DL (car loc)
;--------------------------------------------------------------------------
;  Erase the car at old screen location
;--------------------------------------------------------------------------
        MOV     AL,20H                  ; Print 5 spaces
        PUSH    AX
        OUT     61H,AL                  ;   Turn off speaker (AL=00100000b)
        MOV     BL,70H                                                ;^^
        MOV     CL,5
        INT     10H

        MOV     AX,0E0AH                ; Move cursor to next line
        INT     10H

        POP     AX                      ; Print 5 more spaces
        INT     10H
;--------------------------------------------------------------------------
;  Move to new car location based on shift key status
;--------------------------------------------------------------------------
        MOV     CL,40H                  ; Get shift key status
        MOV     ES,CX                   ;   (=MOV ES,0040H)
        MOV     AL,BYTE PTR ES:[0017H]

        TEST    AL,1                    ; Right SHIFT key pressed?
        JZ      TRYLFT                  ;    No...Try left shift
        INC     DX                      ;    Yes..move car right 1 space
TRYLFT: TEST    AL,2                    ; Left SHIFT key pressed?
        JZ      KEYEND                  ;    No...done checking keys
        DEC     DX                      ;    Yes..move car left 1 space
KEYEND: MOV     WORD PTR DS:[CARLOC],DX ; Save new car location in memory
                                        ; (That is the self-modifying part)
        PUSH    DX                      ; Save car location on stack also
;--------------------------------------------------------------------------
;  Scroll the track down one line
;--------------------------------------------------------------------------
        MOV     AX,0701H                ; Scroll screen down 1 line
        CALL    SCROLL                  ;   this also sets BH=0 and BL=2
                                        ;   and homes the cursor

        MOV     CL,40                   ; Print left side of track
LMARGN  EQU     $-1                     ;   (Pointer to Left Margin)
        INT     10H

        MOV     DX,CX                   ; Find right side of track position
        ADD     DX,26                   ;   (Starting track width = 26)
TRKWID  EQU     $-1                     ;   (Pointer to Track Width)
        MOV     CL,80
        SUB     CX,DX
        CALL    MOVEIT                  ; Move cursor to right side of track
        INT     10H                     ; Print grass on right side of track
;--------------------------------------------------------------------------
;  Print the score in the lower right corner of the screen
;--------------------------------------------------------------------------
        MOV     DX,184EH                ; Screen loc 77,25 bottom right
        CALL    MOVEIT                  ; Move cursor to score location
        MOV     AX,BP                   ; Move Score to AX
        MOV     CL,8                    ; Shift score right 8 bits
        SAR     AX,CL                   ; (This makes it hard to get to Z!)
        ADD     AX,0E00H+65             ; MOV AH,0Eh & Convert score to A-Z
        INT     10H                     ; Print the score on the screen
;--------------------------------------------------------------------------
;  Check for a collision
;--------------------------------------------------------------------------
        POP     DX                      ; Restore car location from stack
        CALL    MOVEIT                  ; Move cursor under left front tire
        JNZ     PCAR                    ; Hit something? Yes... Print our
                                        ;   red car and exit the game
        PUSH    DX                      ; Save left tire position to stack
        ADD     DL,4                    ; Move cursor under right front tire
        CALL    MOVEIT                  ; Check to see if we hit something
        POP     DX                      ;    Restore our car position
        JNZ     PCAR                    ; Hit something? Yes... Print our
                                        ;   red car and exit the game
        PUSH    DX                      ; Save car position to stack
;--------------------------------------------------------------------------
;  No collision, go ahead and print our car (red)
;--------------------------------------------------------------------------
        CALL    PCAR                    ; Print our red car (CX=8)
;--------------------------------------------------------------------------
;  Slow game down by waiting for 3 vertical retraces and play sound effects
;--------------------------------------------------------------------------
        MOV     CL,3                    ; CX is delay in vertical retraces
DELAY:  MOV     DX,03DAH                ; Video screen port
HERE:   IN      AL,DX                   ; Get current video status
        TEST    AL,8                    ; Check vertical retrace bit
        JNE     HERE                    ; Wait for 1 full vertical retrace
HERE2:                                  ; Turn on and off speaker...
        ADD     AL,BYTE PTR DS:[005DH]  ;   (Check command line for Q)
        DEC     AX                      ;   (which is for Quiet mode.)
        OUT     61H,AL                  ; while waiting for screen refresh
        IN      AL,DX
        TEST    AL,8
        JE      HERE2
        LOOP    DELAY                   ; Go wait for another until CX=0
;--------------------------------------------------------------------------
;  Keep track of our current score
;--------------------------------------------------------------------------
        INC     BP                      ; Count lines printed so far (score)
;--------------------------------------------------------------------------
;  Adjust size and placement of track
;--------------------------------------------------------------------------
        POP     DX                      ; Restore our car position fm stack
        MOV     AX,BP                   ; TEST AL=2 bytes, TEST BP=4 bytes

        TEST    AL,255                  ; Make track smaller each 256 lines
        JNZ     NOCHG                   ;   Go around if not time for change
        DEC     BYTE PTR DS:[TRKWID]    ;   Change width (Self-mod code!)
NOCHG:

        TEST    AL,9                    ; Make track wavy every so often
        JNZ     ENEMY                   ;  Time to go straight
        TEST    AL,128                  ;  Left or right?
        JZ      LEFT
        ADD     BYTE PTR DS:[LMARGN],2  ; -Move right 2 spaces (Self-mod!)
;        INC     DX                      ;    Make sure that enemy car
;        INC     DX                      ;      stays ON the track. (TAI)
LEFT:   DEC     BYTE PTR DS:[LMARGN]    ; -Move left 1 space   (Self-mod!)
;        DEC     DX                      ;    Make sure that enemy car
                                        ;      stays ON the track. (TAI)
;--------------------------------------------------------------------------
;  Draw an opponent car every 15 screen lines
;--------------------------------------------------------------------------
ENEMY:                                  ; Our car position is in DX register
        MOV     DH,0                    ; Make it into enemy position using
                                        ; True Artificial Intelligence (tm)
                                        ; ^    ^          ^  TAI :-)
        TEST    AL,15                   ; Every 15 lines print enemy car
        MOV     AX,OFFSET GAME          ;    Prepare for RET below
        PUSH    AX                      ;    Use RET as a jump to GAME loop
        JNZ     GOBACK                  ; Not time yet to print enemy car
;--------------------------------------------------------------------------
;                PRINT CAR AT SCREEN LOCATION "DX"
;
; On entry:  DH points to line, DL to column, CX to car graphic offset
;                                             (8 for red, 0 for blue car)
; On exit:  The proper car will be drawn.  Also, if we used CALL PCAR to
;           get here we will be returned into the program at that point.
;           If we used JNZ PCAR to get here we will be returned to the
;           DOS prompt (the game will end).
;--------------------------------------------------------------------------
PCAR:
        PUSH    BP                      ; Save our current score counter
        MOV     BP,OFFSET CAR2          ; Point to the car graphic
        ADD     BP,CX                   ;   Add offset to proper car
        SUB     BYTE PTR [BP+4],24      ; Print stripe on hood of car
        MOV     AX,1302H                ; Print the car to the screen
        PUSH    AX                      ;    AX may change in INT 10h call
        MOV     CL,5                    ;    Graphic is 5 characters wide
        PUSH    DS                      ;    It is located in the data seg
        POP     ES                      ;      but INT 10h needs that in ES
        INT     10H                     ; Print the first line of the car
        ADD     BYTE PTR [BP+4],24      ; Print cockpit and rear stripe
        POP     AX                      ; (=MOV AX,1302H)
        INC     DH                      ; Point to next line of the screen
        INT     10H                     ; Print the second line of the car
        POP     BP                      ; Restore current score counter
GOBACK: RET
CAR2:  
        DB      0DCH,70H,0DEH,71H,0D2H,1FH,0DDH,71H     ; Blue car graphic
        DB      0DCH,70H                                ;   Common tire
        DB      0DEH,74H,0D2H,4EH,0DDH,74H,0DCH,70H     ; Red car graphic
;--------------------------------------------------------------------------
;                     SCROLL SCREEN DOWN "AL" LINES
;                      (or if AH=6, clear screen)
;
; On entry:  AH must be 7, AL must be number of lines to scroll (1)
;
