What Does a Hexadecimal Number, with a Register in Parentheses, Mean in Assembly

It is one of the many x86 addressing modes. Specifically, this is referred to as "displacement" addressing.

Since you said you used objdump and didn't specify that you used the -M flag, I'm going to assume this is in GAS (AT&T) syntax, as opposed to Intel syntax. This means that the first operand is the source, and the second operand is the destination.

The lea 0x1C(%ebp),%eax instruction means, "Take the value in %ebp, add 0x1C (28 in decimal), then store that value in %eax".
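
As a rough model of that computation, here is a minimal Python sketch. The function name lea_disp and the sample %ebp value are made up for illustration; the mask only models 32-bit register wraparound.

```python
def lea_disp(base: int, disp: int) -> int:
    """Model `lea disp(%base),%dest`: compute base + disp, no memory access."""
    return (base + disp) & 0xFFFFFFFF  # registers are 32 bits wide

ebp = 0xBFFFF000              # hypothetical value of %ebp
eax = lea_disp(ebp, 0x1C)     # lea 0x1C(%ebp),%eax
print(hex(eax))               # 0xbffff01c
```

Note that nothing is read from memory here: lea only does the address arithmetic.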

x86-64 Assembly: Two registers in parentheses? movsd (%rdx,%rsi,8), %xmm0

This is the so-called AT&T (or GAS) syntax. It's an alternative to the more popular Intel syntax. In AT&T syntax the address operand syntax is:

segment:displacement(base register, index register, scale factor)

where most parts are optional. In your example %rdx is the base register, %rsi is the index register and 8 is the scale factor.

The instruction loads 64 bits from the address rdx + rsi * 8 into the lower 64-bit part of the xmm0 register. In Intel syntax it would be:

movsd   xmm0, [rdx+rsi*8]

which is a bit more intuitive (at least to me).
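
To make the address arithmetic concrete, here is a small Python sketch. The helper effective_address, the register values, and the four-element array of doubles are all assumptions for illustration; memory is modelled as a plain byte string starting at address 0.

```python
import struct

def effective_address(disp=0, base=0, index=0, scale=1):
    """disp(base, index, scale) -> base + index*scale + disp (64-bit wrap)."""
    return (base + index * scale + disp) & 0xFFFFFFFFFFFFFFFF

# Hypothetical memory: an array of four doubles starting at address 0.
memory = struct.pack("<4d", 1.5, 2.5, 3.5, 4.5)

rdx, rsi = 0, 2                               # base address, element index
addr = effective_address(base=rdx, index=rsi, scale=8)
(xmm0_low,) = struct.unpack_from("<d", memory, addr)  # 8-byte load, like movsd
print(addr, xmm0_low)                         # 16 3.5
```

With a scale factor of 8, %rsi behaves like an array index into 8-byte elements, which is why this form shows up when compilers index arrays of doubles.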

What do the brackets mean in NASM syntax for x86 asm?

[L1] means the memory contents at address L1. After running mov al, [L1] here, the al register will receive the byte at address L1 (the letter 'w').
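
The difference between a label and its bracketed form can be sketched in Python. This is only a model: the 16-byte memory and the address 8 chosen for L1 are hypothetical.

```python
memory = bytearray(16)        # hypothetical flat memory, addresses 0..15
L1 = 8                        # hypothetical address assigned to the label L1
memory[L1] = ord("w")         # the byte stored at L1

address_of_l1 = L1            # what the bare label L1 stands for: an address
al = memory[L1]               # like `mov al, [L1]`: the byte at that address

print(address_of_l1, chr(al))   # 8 w
```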

What is the -4 for in assembler: movl $1, -4(%rbp)

The -4 / -8 / -12 are byte offsets relative to the address held in rbp, the base (frame) pointer of the current stack frame. Local variables sit at negative offsets because the stack grows downward. Each slot is 4 bytes / 32 bits because that is the size of int on your machine.
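
A minimal Python sketch of those stores, assuming a 32-byte slice of stack memory and a made-up rbp value; the second store at -8 is added only to show the next slot.

```python
import struct

stack = bytearray(32)     # hypothetical slice of stack memory, addresses 0..31
rbp = 16                  # hypothetical frame pointer, as an offset into `stack`

# movl $1, -4(%rbp): store the 32-bit value 1 at address rbp - 4
struct.pack_into("<i", stack, rbp - 4, 1)
# movl $2, -8(%rbp): a second int local would sit 4 bytes lower (illustrative)
struct.pack_into("<i", stack, rbp - 8, 2)

print(stack[rbp - 8:rbp].hex(" "))   # 02 00 00 00 01 00 00 00
```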

CMP in x86 with parentheses and address

In AT&T syntax this form represents

OFFSET(BASE REGISTER, INDEX REGISTER, INDEX SCALE)

so the address represented is the value of BASE REGISTER (if present) + INDEX * SCALE (if present) + OFFSET, so

EBX*4 + 0x80498d4 in your case.
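
As a sketch of that formula in Python (the helper name and the sample %ebx values are assumptions; the operand is presumed to have no base register, as in the result above):

```python
def effective_address(offset=0, base=0, index=0, scale=1):
    """OFFSET(BASE, INDEX, SCALE) -> BASE + INDEX*SCALE + OFFSET (32-bit wrap)."""
    return (base + index * scale + offset) & 0xFFFFFFFF

# With a scale of 4, each step of %ebx moves to the next 4-byte table entry.
for ebx in range(3):
    print(ebx, hex(effective_address(offset=0x80498D4, index=ebx, scale=4)))
# 0 0x80498d4
# 1 0x80498d8
# 2 0x80498dc
```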

What is the significance of the LEA instruction = 0xb48daed9 +3479: lea -0xc(%ebp),%esp?

On the mechanical level, the instruction

lea -0xc(%ebp),%esp

adds -0xc (that is: -12) to %ebp and writes the result to %esp.

On the logical level, it allocates a called function's stack frame. I'd expect to see it in a context similar to this:

push %ebp              # save previous base pointer
mov %esp,%ebp          # set %ebp = %esp: old stack pointer is new base pointer
lea -0xc(%ebp),%esp    # allocate 12 bytes for local variables

%ebp and %esp are the stack pointer registers. %ebp points to the base of the stack frame and %esp to its "top" (actually the bottom because the stack grows downward), so the lea instruction moves the stack pointer 12 bytes below the base, staking a claim of 12 bytes for local variables. Doing this after saving the old base pointer and setting the new base pointer to the old stack pointer pushes a new frame of 12 bytes onto the call stack.
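
The three-instruction prologue can be traced with a small Python model. The entry values of %esp and %ebp are invented, and memory is a sparse dictionary rather than real stack memory.

```python
MASK = 0xFFFFFFFF
memory = {}                          # sparse model of the stack: address -> 32-bit value
esp, ebp = 0xBFFFF010, 0xBFFFF040    # hypothetical register values on entry

# push %ebp: decrement the stack pointer, then store the old base pointer
esp = (esp - 4) & MASK
memory[esp] = ebp
# mov %esp,%ebp: the old stack pointer becomes the new base pointer
ebp = esp
# lea -0xc(%ebp),%esp: move the stack pointer 12 bytes below the base
esp = (ebp - 0xC) & MASK

print(hex(ebp), hex(esp), ebp - esp)   # 0xbffff00c 0xbffff000 12
```

The saved base pointer sits at the address %ebp now holds, which is why overwriting it corrupts the caller's frame on return.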

It seems unlikely that this instruction itself causes a trap, but in the event of a stack overflow, the allocated stack frame will be invalid and explosions are expected when trying to use it. My suspicion is that you have a runaway recursive function.

Another possibility, as @abligh mentions, is that the stack pointer became corrupted somewhere along the line. This can happen, among other things, if a buffer overflow happens in a stack-allocated buffer so that a previously saved base pointer is overwritten with garbage. Upon return from the function, the garbage is restored in lieu of the overwritten base pointer, and a subsequent function call will not have anything sensible with which to work.

Array in Hexadecimal in Assembly x86 MASM

A WORD is 2 BYTEs. A DWORD is two WORDs ("D" stands for "double"). A QWORD is four WORDs ("Q" stands for "quad").

Memory is addressed in bytes, i.e. the content of memory can be viewed like this (for three bytes with values 0xB, 20, 10):

address | value
----------------
0000 | 0B
0001 | 14
0002 | 0A

A WORD then occupies two bytes in memory. On x86 the least significant byte goes at the lower address and the most significant byte at the higher address.

So WORD 0x1234 is stored in memory at address 0xDEAD as:

address | value
----------------
DEAD | 34
DEAE | 12
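
The same layout can be reproduced with Python's struct module, where the format "<H" means a little-endian unsigned 16-bit value. This is only a sketch; the address 0xDEAD is used purely as a label for printing.

```python
import struct

word = struct.pack("<H", 0x1234)     # little-endian unsigned 16-bit
base = 0xDEAD
for offset, byte in enumerate(word):
    print(f"{base + offset:04X} | {byte:02X}")
# DEAD | 34
# DEAE | 12
```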

Registers on x86 are a special, tiny bit of memory located directly on the CPU itself. They are not addressable by numerical addresses like the ones above, but only by an instruction opcode containing the number of the register (in source they are named ax, bx, ...).

That means you have no registers in your question, and it makes no sense to talk about registers in it.

In a normal assembler [B+2] would be BYTE 40 (the bytes at B are: 30, 0, 40, 0, 70, 0, 11, 0). In MASM it may be different, as it tries to work with "variables" and also considers their size, so [B+2] may be treated as WORD 70. I don't know for sure, and I don't want to know; MASM has too many quirks to be used logically, and you have to learn them. (Just create short code with B WORD 0, 1, 2, 3, 4 and MOV ax,[B+2], then check the disassembly in a debugger.)

[A+2] is 10. You are missing the point that [A] is [A+0]. Like in C/C++ arrays, indexing goes from 0, not from 1.

The rest of the answers can be easily figured out if you draw the bytes on paper (for example, DWORD 0x310 compiles to the hex bytes 10 03 00 00).

I wonder where you got 0x15 in first possible answer, as I don't see any value 21 in A.


edit due to new comments ... I will "compile" it for you, make sure you either understand every byte, or ask under answer which one is not clear.

; A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
A:
0B 14 0A 0D 0C
; B WORD 0d30, 0d40, 0d70, 0hB
B: ;▼ ▼ ▼ ▼
1E 00 28 00 46 00 0B 00
; D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
D: ;▼ ▼ ▼ ▼ ▼ ▼
B0 00 00 00 00 02 00 00 10 03 00 00 00 04 00 00 00 05 00 00 00 06 00 00
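
If you want to double-check the byte dump, a short Python sketch with the struct module produces the same bytes. The format codes B, H and I stand for 1-, 2- and 4-byte unsigned values, and "<" selects little-endian order.

```python
import struct

A = struct.pack("<5B", 0x0B, 20, 10, 13, 0x0C)                    # BYTE array
B = struct.pack("<4H", 30, 40, 70, 0x0B)                          # WORD array
D = struct.pack("<6I", 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600)   # DWORD array

for name, data in (("A", A), ("B", B), ("D", D)):
    print(name + ":", data.hex(" ").upper())
# A: 0B 14 0A 0D 0C
# B: 1E 00 28 00 46 00 0B 00
# D: B0 00 00 00 00 02 00 00 10 03 00 00 00 04 00 00 00 05 00 00 00 06 00 00
```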

Notice how A, B and D are just labels marking some address in memory, that's how most Assemblers work with symbols. In MASM it's more tricky, as it tries to be "clever" and keeps not only the address around, but also it knows the D was defined as DWORD and not BYTE. That's not the case with different assemblers.

Now [D+4] in MASM is tricky, it will probably use the size knowledge to default to DWORD size of that expression (in other assemblers you should specify, like "DWORD PTR [D+4]", or it is deduced from target register size automatically, when possible). So [D+4] will fetch bytes 00 02 00 00 = DWORD 00000200. (I just hope MASM doesn't recalculate also the +4 offset as +4th dword, ie +16 in bytes).

Now to your comments. I will tear them apart into tiny bits with mistakes, because while it's often easy to understand what you meant, in Assembly, once you start writing code, it's not enough to have good intentions. You must be exact and accurate: the CPU will not fill any gap, it will do exactly what you wrote.

Can you explain how did you get 0d13 of A and through to 0d30 of B @Jester?

Go to my "compiled" bytes. D-1 (when offsets are in bytes) means one byte back from the D: address, i.e. that 00 at the end of the B line. Now for D-10, count 10 bytes back from D: ... That lands on 0D in the A line, as 8 bytes are in the B array, and the remaining two are at the end of the A array.

Now if you read from that address 4 bytes: 0D 0C 1E 00 = DWORD 001E0C0D. (Jester mixed up decimal 13 into 13h by accident in his final "dword" value)

each value in B will occupy two "slots" as you count back? And each value in A will occupy four "slots"?

It's the other way around: two values in B will form 1 DWORD slot, and four values in A will form 1 DWORD. Just as the "D" data of 6 DWORDs can also be treated as 12 WORD values, or 24 BYTE values. For example DWORD PTR [A+2] is 1E0C0D0A.
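
These reads can be checked with a short Python sketch. It assumes that A, B and D are laid out back to back, as in the "compiled" bytes, and places A at address 0; the helper dword_at is hypothetical.

```python
import struct

memory = (struct.pack("<5B", 0x0B, 20, 10, 13, 0x0C)                        # A
          + struct.pack("<4H", 30, 40, 70, 0x0B)                             # B
          + struct.pack("<6I", 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600))     # D
A, B, D = 0, 5, 13        # label addresses, assuming contiguous layout from 0

def dword_at(addr):
    """Read 4 bytes at addr as a little-endian DWORD."""
    return struct.unpack_from("<I", memory, addr)[0]

print(f"{dword_at(D - 10):08X}")   # 001E0C0D
print(f"{dword_at(A + 2):08X}")    # 1E0C0D0A
print(f"{dword_at(D + 4):08X}")    # 00000200
```

All offsets here are in bytes; the size of the read (DWORD) is independent of how the data was declared.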

first value of A,B, and D are offsets x86 is little endian

"value of A" is actually some memory address, I think I automatically don't mention "value" in such case, but "address", "pointer" or "label" (although "value of symbol A" is valid English sentence, and can be resolved after symbols have addresses assigned).

OFFSET A has a particular special meaning in MASM: it takes the byte offset of address A from the start of its segment. In 32-bit mode this is usually the "address" for a human, as segments start from 0 and memory is flat-mapped. In real mode the segment part of the address was important, as the offset was only 16 bits (only 64k of memory is addressable through the offset alone).

In your case I would say "value at A", as "content of memory at address A". It's subtle, I know, but when everyone talks like this, it's clear.

B is 0d40

[B+2] is 40. B+2 is some address+2. B is some address. It's the [x] brackets marking "value from memory at x".

Although in MASM it's a bit different, it will compile mov eax,D as mov eax,DWORD PTR [D] to mimic "variable" usage, but that's specific quirk of MASM. Avoid using that syntax, it hides memory usage from unfocused reader of source, use mov eax,[D] even in MASM (or get rid of MASM ideally).

D...bytes (in hex) = 0x200, 0,0,0,...

0x200 is not a byte. Hex formatting has the neat feature that each pair of digits forms a single byte. So hex 200 is 3 digits => one and a half bytes.

Consider how those DWORD values were created from bytes.. in decimal formatting you would have to recalculate the whole value, so bytes 40,30,20,10 are 40 + 30*256 + 20*65536 + 10*16777216 = 169090600 -> the original values are not visible there. With hexa 28 1E 14 0A you just reassemble them in correct order 0A141E28.
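
The same arithmetic can be verified in Python with int.from_bytes, as a quick sketch:

```python
raw = bytes([40, 30, 20, 10])            # bytes in memory order, lowest address first
value = int.from_bytes(raw, "little")    # 40 + 30*256 + 20*65536 + 10*16777216

print(value)                     # 169090600
print(f"{value:08X}")            # 0A141E28
print(raw.hex(" ").upper())      # 28 1E 14 0A
```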

D is 0x200.

No, D is an address. And even [D] is 0xB0.

Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C?

B0 is at D+0 address. You don't count it into those 10 bytes in [D-10], that B0 is zero bytes beyond D (D+0). Look at my "compiled" memory and count bytes there to get comfortable with offsets.


