Why Do I Need to Use [ ] (Square Brackets) When Moving Data from Register to Memory, But Not When Other Way Around

Why do I need to use [ ] (square brackets) when moving data from register to memory, but not when other way around?

Using brackets and not using brackets are basically two different things:

Brackets mean the value stored in memory at the given address.

An expression without brackets means the address (or value) itself.

Examples:

mov ecx, 1234

Means: Write the value 1234 to the register ecx

mov ecx, [1234]

Means: Write the value that is stored in memory at address 1234 to the register ecx

mov [1234], ecx

Means: Write the value stored in ecx to the memory at address 1234

mov 1234, ecx

... makes no sense (in this syntax) because 1234 is a constant number which cannot be changed.
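The first three cases can be mimicked with a toy model, written in Python rather than assembly so it runs anywhere. The names `regs` and `mem` and the contents of address 1234 are made up for this sketch; a dict stands in for RAM, and Python's own square brackets happen to play the same role as the assembler's.

```python
# Toy model only: one dict as the register file, one dict as memory.
regs = {"ecx": 0}
mem = {1234: 0xCAFE}        # assumed contents of address 1234

regs["ecx"] = 1234          # mov ecx, 1234    -> ecx holds the number 1234
after_immediate = regs["ecx"]

regs["ecx"] = mem[1234]     # mov ecx, [1234]  -> ecx holds what is stored at 1234
after_load = regs["ecx"]

regs["ecx"] = 42
mem[1234] = regs["ecx"]     # mov [1234], ecx  -> memory at 1234 now holds 42

print(after_immediate, hex(after_load), mem[1234])
```

There is no line for `mov 1234, ecx` because it would have to read `1234 = regs["ecx"]`, which Python rejects for the same reason the assembler does: a constant is not a place you can store into.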

The Linux "write" syscall (INT 80h, EAX=4) expects the address of the data to be written, not the data itself.

This is why you do not use brackets at this position.

What do square brackets mean in x86 assembly?

Let's make a very simple example and imagine we have a CPU with only two registers, EAX and EBX.

mov ebx, eax

Simply copies the value in eax to the ebx register

| EAX : 01234567 |  ---->  | EAX : 01234567 |
| EBX : 00000000 |  ====>  | EBX : 01234567 |

Now let's add some memory space

ADDRESS           VALUE
00000000 6A43210D
00000004 51C9A847
00000008 169B87F1
0000000C C981A517
00000010 9A16D875
00000014 54C9815F

mov [ebx], eax

Moves the value in eax to the memory address contained in ebx.

| EAX : 01234567 |  --no change-->  | EAX : 01234567 |
| EBX : 00000008 |  --no change-->  | EBX : 00000008 |

ADDRESS VALUE
00000000 6A43210D -> 6A43210D
00000004 51C9A847 -> 51C9A847
00000008 169B87F1 =====> 01234567
0000000C C981A517 -> C981A517
00000010 9A16D875 -> 9A16D875
00000014 54C9815F -> 54C9815F


mov ebx, [eax]

Moves the value from the memory address contained in eax to ebx.

| EAX : 00000008 |  ---->  | EAX : 00000008 |
| EBX : 01234567 |  ====>  | EBX : 169B87F1 |

[No change to memory]
ADDRESS VALUE
00000000 6A43210D
00000004 51C9A847
00000008 169B87F1
0000000C C981A517
00000010 9A16D875
00000014 54C9815F


mov [ebx], [eax]

Finally, you might expect this to copy the value from the memory address contained in eax to the memory address contained in ebx.

| EAX : 00000010 |  --no change-->  | EAX : 00000010 |
| EBX : 00000008 |  --no change-->  | EBX : 00000008 |

ADDRESS VALUE
00000000 6A43210D -> 6A43210D
00000004 51C9A847 -> 51C9A847
00000008 169B87F1 =====> 9A16D875
0000000C C981A517 -> C981A517
00000010 *9A16D875 -> 9A16D875
00000014 54C9815F -> 54C9815F

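But this combination is disallowed by the x86 architecture: a mov instruction cannot take two memory operands. To copy from memory to memory, you load the value into a register first and then store it.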

The use of brackets is therefore equivalent to a dereferencing operation.
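The walk-through can be replayed with a small Python model that uses the same memory table. This is only a sketch: `fresh_memory`, `regs` and `mem` are names invented here, each table row is treated as one 32-bit cell, and the scratch register ecx in the last case is an arbitrary choice.

```python
def fresh_memory():
    """Return the example memory table as {address: 32-bit value}."""
    return {
        0x00: 0x6A43210D, 0x04: 0x51C9A847, 0x08: 0x169B87F1,
        0x0C: 0xC981A517, 0x10: 0x9A16D875, 0x14: 0x54C9815F,
    }

# mov [ebx], eax : store eax at the address held in ebx
mem = fresh_memory()
regs = {"eax": 0x01234567, "ebx": 0x00000008}
mem[regs["ebx"]] = regs["eax"]
stored = mem[0x08]

# mov ebx, [eax] : load from the address held in eax
mem = fresh_memory()
regs = {"eax": 0x00000008, "ebx": 0x01234567}
regs["ebx"] = mem[regs["eax"]]
loaded = regs["ebx"]

# mov [ebx], [eax] is not encodable, so the copy goes through a register:
#   mov ecx, [eax]
#   mov [ebx], ecx
mem = fresh_memory()
regs = {"eax": 0x00000010, "ebx": 0x00000008, "ecx": 0}
regs["ecx"] = mem[regs["eax"]]
mem[regs["ebx"]] = regs["ecx"]
copied = mem[0x08]

print(f"{stored:08X} {loaded:08X} {copied:08X}")
```

In every case the bracketed operand becomes a lookup in `mem`, while the bare register name is used directly.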

What does adding two registers in square brackets mean?

This adds the values of the two registers and uses the sum as a memory address, either to retrieve the value at that address:

 MOV EDX, [EBX+EAX]

or store a value to that location:

 MOV [EBX+EDX], ECX
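A rough Python sketch of both instructions, assuming a made-up block of data starting at address 0x1000. The register values are invented for illustration, and the model treats every address as a single cell, so it ignores operand size.

```python
# Assumed data: address 0x1000 + i holds the value 10 * i.
mem = {0x1000 + i: 10 * i for i in range(8)}
regs = {"eax": 3, "ebx": 0x1000, "ecx": 0xAB, "edx": 0}

# MOV EDX, [EBX+EAX] : first add the registers, then load from that address
address = regs["ebx"] + regs["eax"]
regs["edx"] = mem[address]
loaded = regs["edx"]

# MOV [EBX+EDX], ECX : first add the registers, then store ecx at that address
regs["edx"] = 5
mem[regs["ebx"] + regs["edx"]] = regs["ecx"]

print(hex(address), loaded, hex(mem[0x1005]))
```

A common way to use this form is to keep the start of an array in one register and an offset into it in the other.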

Basic use of immediates vs. square brackets in YASM/NASM x86 assembly

Indeed, your thought is correct. That is, bl will contain 5 and cl will contain the memory address of buffer (in fact, the label buffer is itself a memory address).


Now, let me explain the differences between the operations you mentioned:

  • Moving an immediate into a register can be done using mov reg, imm. What may be confusing is that labels such as buffer are themselves immediate values: they stand for an address.

  • You cannot really move a register into an immediate, since immediate values are constants, like 2 or FF1Ah. What you can do is move a register to the place the constant points to. You can do it like mov [const], reg.

  • You can also use indirect addressing like mov reg2, [reg1], provided reg1 points to a valid location, and it will transfer the value pointed to by reg1 to reg2.


So, mov cl, buffer will move the address of buffer to cl (which may or may not give the correct address, since cl is only one byte long), whereas mov cl, [buffer] will get the actual value.
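To see why one byte is not enough for an address, here is a small Python sketch. The address F5B1 and the stored byte 5 are assumptions made for the example, and `to_8_bits` is a helper invented here to imitate an 8-bit register.

```python
BUFFER = 0xF5B1                 # hypothetical address assigned to the label
mem = {BUFFER: 0x05}            # assumed byte stored in buffer

def to_8_bits(value):
    """Keep only the low byte, as an 8-bit register such as cl would."""
    return value & 0xFF

cl_from_label = to_8_bits(BUFFER)        # mov cl, buffer   -> low byte of the address
cl_from_memory = to_8_bits(mem[BUFFER])  # mov cl, [buffer] -> the byte stored there

print(hex(cl_from_label), hex(cl_from_memory))
```

A real assembler or linker may warn or refuse instead of silently truncating; the model only shows that the upper part of the address cannot fit in cl.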

Summary

  • When you use [a], you refer to the value stored at the place a points to. For example, if a is F5B1, then [a] refers to the value at address F5B1 in RAM.
  • Labels are addresses, i.e. values like F5B1.
  • Values stored in registers do not have to be referenced as [reg], because registers do not have addresses. In this respect, a register's contents can be thought of as an immediate value.

What do the brackets mean in NASM syntax for x86 asm?

[L1] means the memory contents at address L1. After running mov al, [L1] here, the al register will receive the byte at address L1 (the letter 'w').

8086- why can't we move an immediate data into segment register?

Remember that the syntax of assembly language (any assembly) is just a human-readable way to write machine code. The rules of what you can do in machine code depend on how the processor's electronics were designed, not on what the assembler syntax could easily support.

So, just because it looks like you could write mov DS, 5000h and that conceptually it doesn't seem like there is a reason why you shouldn't be able to do it, it's really about "is there a mechanism by which the processor can load a segment register directly from an immediate value?"

In the case of 8086 assembly, I figure that the reason is simply that the engineers just didn't create an electric path that could feed a signal from the memory I/O data lines to the lines that write to the segment registers.


Why? I have several theories, but no authoritative knowledge.

The most likely reason is simply one of simplifying the design: it takes extra wiring and gates to do that, and it was an uncommon enough operation (this was the 1970s) that it was not worth the real estate on the chip. This is not surprising; the 8086 already went overboard allowing any of the normal registers to be connected to the ALU (arithmetic logic unit), which allows any register to be used as an accumulator. I'm sure that wasn't cheap to do. Most processors at the time only allowed one register (the accumulator) to be used for that purpose.


As for the brackets, you are correct. Let's say memory position 5000h contains the number 4321h. mov ax, 5000h puts the value 5000h into ax, while mov ax, [5000h] loads 4321h from memory into ax. Essentially, the brackets act like the * pointer dereference operator in C.
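The same two instructions, sketched in Python with a byte-addressed memory. One detail this makes visible is that a 16-bit load reads two bytes, and x86 stores the low byte first (little-endian). The `mem` bytearray is a stand-in invented for this example.

```python
# Byte-addressed toy memory covering a 64 KiB address space.
mem = bytearray(0x10000)
mem[0x5000:0x5002] = (0x4321).to_bytes(2, "little")   # 21h at 5000h, 43h at 5001h

ax = 0x5000                                           # mov ax, 5000h
ax_immediate = ax

ax = int.from_bytes(mem[0x5000:0x5002], "little")     # mov ax, [5000h]
ax_loaded = ax

print(hex(ax_immediate), hex(ax_loaded), mem[0x5000:0x5002].hex())
```

The C equivalent of the second line would be something like reading through a pointer that holds 5000h, which is exactly the dereference analogy.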

Just to highlight the fact that assembly is an idealized abstraction of what machine code can do, you should note that the two variations are not the same instruction with different parameters, but completely different opcodes. They could have used – say – MOV for the first and MVD (MoVe Direct addressed memory) for the second opcode, but they must have decided that the bracket syntax was easier for programmers to remember.
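To make the "different opcodes" point concrete, here is a sketch that builds the machine code bytes for both variations in 16-bit mode. It uses the short encodings an assembler would typically pick (B8 for loading an immediate into ax, A1 for loading ax from a direct address); other, longer encodings of the same instructions exist. The function names are invented for this example.

```python
def mov_ax_imm16(value):
    """Encode `mov ax, value` (16-bit mode): opcode B8, then the immediate."""
    return bytes([0xB8]) + value.to_bytes(2, "little")

def mov_ax_from_addr16(address):
    """Encode `mov ax, [address]` (16-bit mode): opcode A1, then the address."""
    return bytes([0xA1]) + address.to_bytes(2, "little")

print(mov_ax_imm16(0x5000).hex())        # bytes for: mov ax, 5000h
print(mov_ax_from_addr16(0x5000).hex())  # bytes for: mov ax, [5000h]
```

Both encodings carry the same two bytes for 5000h; only the first byte differs, and that byte alone decides whether the processor treats 5000h as a value or as an address.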


