What Are These Seemingly-Useless Callq Instructions in My X86 Object Files For

What are these seemingly-useless callq instructions in my x86 object files for?

The 00 00 00 00 (relative) target address in e8 00 00 00 00 is intended to be filled in by the linker. It doesn't mean that the call falls through. It just means you are disassembling an object file that has not been linked yet.

Also, a call to the next instruction, if that was the end result after the link phase, would not be a no-op, because it changes the stack (a certain hint that this is not what is going on in your case).
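To make "filled in by the linker" a bit more concrete, here is a minimal sketch of the arithmetic a linker performs on those four placeholder bytes. It assumes an x86-64 ELF-style PC-relative relocation with an addend of -4; the function name and all addresses are made up for illustration and do not come from the question.

```cpp
#include <cstdint>

// Value a linker would store in a 4-byte PC-relative field: S + A - P.
//   sym_addr   (S): final address of the called function
//   addend     (A): -4 for call rel32, because the CPU measures the
//                   displacement from the end of the 4-byte field
//   field_addr (P): final address of the first displacement byte
constexpr std::uint32_t pc32_value(std::uint64_t sym_addr, std::int64_t addend,
                                   std::uint64_t field_addr) {
    return static_cast<std::uint32_t>(
        sym_addr + static_cast<std::uint64_t>(addend) - field_addr);
}

// Hypothetical layout: the e8 byte ends up at 0x401000, so the field is at
// 0x401001 and the next instruction at 0x401005; the callee is at 0x401020.
static_assert(pc32_value(0x401020, -4, 0x401001) == 0x1b,
              "e8 00 00 00 00 becomes e8 1b 00 00 00");
```

With these made-up addresses the patched displacement is 0x1b, and 0x401005 + 0x1b is 0x401020, the callee. Until that patching happens, the zero bytes are only a placeholder.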

function call seems not working properly in disassembled code

It calls address 0, which is the address of the f function.

e8 is the call instruction in x86 according to this:

http://www.cs.cmu.edu/~fp/courses/15213-s07/misc/asm64-handout.pdf

call uses a displacement relative to the next instruction, which is at memory location 1e. Adding the (negative) displacement to 1e gives memory location 0. So the displacement is measured from 1e, but the address that actually gets called is 0.
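The arithmetic can be written out as a small sketch. The call address 0x19 and the displacement value are assumptions chosen so that the next instruction sits at 1e as in the answer; the original disassembly is not shown here, so treat the concrete bytes as hypothetical.

```cpp
#include <cstdint>

// Target of a near call (opcode e8): address of the next instruction
// plus the sign-extended 32-bit displacement.
constexpr std::uint64_t call_target(std::uint64_t call_addr, std::int32_t rel32) {
    const std::uint64_t next = call_addr + 5;   // e8 + 4 displacement bytes
    return next + static_cast<std::uint64_t>(static_cast<std::int64_t>(rel32));
}

// Hypothetical layout: the call sits at 0x19, so the next instruction is at
// 0x1e; a displacement of -0x1e (bytes e2 ff ff ff) lands on address 0.
static_assert(call_target(0x19, -0x1e) == 0x0, "backward call reaches address 0");
// A zero displacement would simply name the next instruction.
static_assert(call_target(0x19, 0) == 0x1e, "zero displacement");
```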

which MOV instructions in the x86 are not used or the least used, and can be used for a custom MOV extension

Your best bet is regular mov with a prefix that GCC will never emit on its own. i.e. create a new mov encoding that includes a mandatory prefix in front of any other mov. Like how lzcnt is rep bsr.

Or if you're modifying GCC and as, you can add a new mnemonic that just uses otherwise-invalid (in 64-bit mode) single byte opcodes for memory-source, memory-dest, and immediate-source versions of mov. AMD64 freed up several opcodes, including the BCD instructions like AAM, and push/pop most segment registers. (x86-64 can still mov to/from Sregs, but there's just 1 opcode per direction, not 2 per Sreg for push ds/pop ds etc.)

Assuming my workload just includes integers i.e. most probably wont be using the xmm and mmx registers

Bad assumption for XMM: GCC aggressively uses 16-byte movaps / movups instead of copying structs 4 or 8 bytes at a time. It's not at all rare to find vector mov instructions in scalar integer code as part of inline expansion of small known-length memcpy or struct / array init. Also, those mov instructions have at least 2-byte opcodes (SSE1 0F 28 movaps, so a prefix in front of plain mov is the same size as your idea would have been).

However, you're right about MMX regs. I don't think modern GCC will ever emit movq mm0, mm1 or use MMX at all, unless you use MMX intrinsics. Definitely not when targeting 64-bit code.

Also mov to/from control regs (0f 20/22 /r) or debug registers (0f 21/23 /r) both use the mov mnemonic, but gcc will definitely never emit either on its own. The other operand must be a general-purpose register. So that's technically the answer to your title question, but probably not what you actually want.


GCC doesn't parse its inline asm template string, it just includes it in its asm text output to feed to the assembler after substituting for %number operands. So GCC itself is not an obstacle to emitting arbitrary asm text using inline asm.

And you can use .byte to emit arbitrary machine code.

Perhaps a good option would be to use a 0E byte as a prefix for your special mov encoding that you're going to make GEM decode specially. 0E is push CS in 32-bit mode, invalid in 64-bit mode. GCC will never emit either.

Or just an F2 repne prefix; GCC will never emit repne in front of a mov opcode (where it doesn't apply), only movs. (F3 rep / repe means xrelease when used on a memory-destination instruction so don't use that. https://www.felixcloutier.com/x86/xacquire:xrelease says that F2 repne is the xacquire prefix when used with locked instructions, which doesn't include mov to memory so it will be silently ignored there.)

As usual, prefixes that don't apply have no documented behaviour, but in practice CPUs that don't understand a rep / repne ignore it. Some future CPU might understand it to mean something special, and that's exactly what you're doing with GEM.

Picking .byte 0x0e; instead of repne; might be a better choice if you want to guard against accidentally leaving these prefixes in a build you run on a real CPU. (It will #UD -> SIGILL in 64-bit mode, or usually crash from messing up the stack in 32-bit mode.) But if you do want to be able to run the exact same binary on a real CPU, with the same code alignment and everything, then an ignored REP prefix is ideal.
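On the simulator side, recognising the marked instruction could look roughly like the sketch below. This is not how any particular simulator's decoder is structured; is_marked_mov is a hypothetical helper, it only handles the F2 choice, and it only covers the common one-byte mov opcodes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical check a simulator's decoder might run on an instruction's bytes:
// is this a mov marked with an F2 (repne) prefix?
// Only opcodes 88/89/8A/8B (reg/mem forms) and C6/C7 (immediate to r/m) are
// covered; other legacy prefixes and the remaining mov encodings are ignored.
constexpr bool is_marked_mov(const std::uint8_t* bytes, std::size_t len) {
    std::size_t i = 0;
    if (len == 0 || bytes[i] != 0xF2) return false;   // marker prefix must come first
    ++i;
    if (i < len && (bytes[i] & 0xF0) == 0x40) ++i;    // optional REX prefix (64-bit mode)
    if (i >= len) return false;
    const std::uint8_t op = bytes[i];
    return (op >= 0x88 && op <= 0x8B) || op == 0xC6 || op == 0xC7;
}
```

A real decoder would also have to decide what to do with repne in front of string instructions such as movs, which this check deliberately leaves alone because that prefix is meaningful there.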


Using a prefix in front of a standard mov instruction has the advantage of letting the assembler encode the operands for you:

    template<class T>
    void fancymov(T& dst, T src) {
        // fixme: imm -> mem needs a size suffix, defeating template
        // unless you use Intel-syntax where the operand includes "dword ptr"
        asm("repne; movl %1, %0"
    #if 1
            : "=m"(dst)
            : "ri" (src)
    #else
            : "=g,r"(dst)
            : "ri,rmi" (src)
    #endif
            : // no clobbers
        );
    }

    void test(int *dst, long src) {
        fancymov(*dst, (int)src);
        fancymov(dst[1], 123);
    }

(Multi-alternative constraints let the compiler pick either reg/mem destination or reg/mem source. In practice it prefers the register destination even when that will cost it another instruction to do its own store, so that sucks.)

On the Godbolt compiler explorer, for the version that only allows a memory-destination:

    test(int*, long):
        repne; movl %esi, (%rdi)       # F2 89 37
        repne; movl $123, 4(%rdi)      # F2 C7 47 04 7B 00 00 00
        ret

If you wanted this to be usable for loads, I think you'd have to make 2 separate versions of the function and use the load version or store version manually, where appropriate, because GCC seems to want to use reg,reg whenever it can.


Or with the version allowing register outputs (or another version that returns the result as a T, see the Godbolt link):

    test2(int*, long):
        repne; mov %esi, %esi
        repne; mov $123, %eax
        movl %esi, (%rdi)
        movl %eax, 4(%rdi)
        ret
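For the separate load version mentioned above, something along these lines might work. It is only a sketch for GCC/Clang: fancyload and fancyload_demo are hypothetical names, the type is fixed to int to sidestep the size-suffix problem, and on non-x86 targets it falls back to a plain load so it still compiles.

```cpp
#include <cassert>

// Hypothetical load-only counterpart to fancymov: memory source, register result.
inline int fancyload(const int& src) {
#if defined(__x86_64__) || defined(__i386__)
    int result;
    asm("repne; movl %1, %0"
        : "=r"(result)   // register destination
        : "m"(src)       // force a memory source operand
        : // no clobbers
    );
    return result;
#else
    return src;          // non-x86 fallback: ordinary load, no prefix
#endif
}

// Usage: on real hardware this should behave like an ordinary load,
// assuming the CPU ignores the inapplicable prefix as described earlier.
inline void fancyload_demo() {
    int table[2] = {41, 123};
    assert(fancyload(table[1]) == 123);
}
```

Forcing "m" on the input and "=r" on the output leaves the compiler no reg,reg choice, which is the point of splitting the load and store cases into separate functions.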

Difference between JS and JL x86 instructions

JS jumps if the sign flag is set (SF=1), while JL jumps if the sign flag doesn't equal the overflow flag (SF != OF).

There are situations where one of these conditions will be met, but not the other. Consider the following:

mov al, -100
cmp al, 30

Here the flags will be set based on the result of -100 - 30. -100 is negative and 30 is positive, but the result (-130) can not be represented by 8 bits in two's complement, so you get arithmetic overflow and a result of positive 126.

This is perhaps easier to see if we use hexadecimal notation: -100 == 0x9C, 30 == 0x1E, 0x9C - 0x1E = 0x7E == 126.

So we have a positive result (SF=0) and overflow (OF=1). Therefore, in this case JS would not jump but JL would (since SF != OF).

Which jump condition you should use depends on what you're trying to achieve. If you're comparing two values, want them interpreted as signed, and want to jump if one is less than the other, use JL. If you want to jump if the result of a calculation is negative, use JS.
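The same example can be checked with a small model of the two flags. This is a sketch that only tracks SF and OF for an 8-bit cmp; the helper names are made up for illustration.

```cpp
#include <cstdint>

struct Flags { bool sf; bool of; };

// SF and OF as produced by an 8-bit `cmp a, b`, i.e. the subtraction a - b
// with the result discarded.
constexpr Flags cmp8(std::int8_t a, std::int8_t b) {
    const int wide = int(a) - int(b);              // exact result, no wraparound
    return Flags{ (wide & 0x80) != 0,              // SF: bit 7 of the 8-bit result
                  wide < -128 || wide > 127 };     // OF: result does not fit in 8 bits
}

constexpr bool js_taken(Flags f) { return f.sf; }
constexpr bool jl_taken(Flags f) { return f.sf != f.of; }

// -100 vs 30: SF=0, OF=1, so JS is not taken but JL is.
static_assert(!js_taken(cmp8(-100, 30)) && jl_taken(cmp8(-100, 30)),
              "-100 < 30 as signed values");
// 100 vs -100: SF=1, OF=1, so JS is taken but JL is not.
static_assert(js_taken(cmp8(100, -100)) && !jl_taken(cmp8(100, -100)),
              "100 is not less than -100");
```

The second case shows the disagreement in the other direction: the wrapped result looks negative, yet the signed comparison says "not less".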

what are the reasons for virtualising these instructions?

Modifying the machine state (segment bases / limits, disabling interrupts, etc.) obviously can't be allowed, or the guest could break out of the VM or at least hang it. (E.g. by running an infinite loop with interrupts disabled.)

pushf/popf are slightly subtle: remember that IF (the interrupts-enabled bit which cli/sti flip) is one of the bits in EFLAGS.

You want the physical machine to have interrupts enabled while the guest disables interrupts. But you also want the guest to see IF=0 when it has interrupts disabled on the virtual x86 that it's running on. So you need to virtualize pushf as well as popf.
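As a rough sketch of that idea, a hypervisor could keep the guest's IF in its own state and splice it into the flags image whenever the guest executes pushf or popf. The names below are hypothetical, and everything except the IF bit (IOPL, other privileged flags) is ignored.

```cpp
#include <cstdint>

constexpr std::uint32_t IF_BIT = 1u << 9;   // interrupt-enable flag in EFLAGS

// Emulated pushf: start from the real flags, but show the guest its own IF.
constexpr std::uint32_t emulate_pushf(std::uint32_t real_eflags, bool virtual_if) {
    return virtual_if ? (real_eflags | IF_BIT) : (real_eflags & ~IF_BIT);
}

struct PopfResult {
    std::uint32_t real_eflags;   // what is actually loaded into the CPU
    bool virtual_if;             // what the guest now believes IF is
};

// Emulated popf: remember the IF the guest asked for, but keep the
// physical IF set so the host still receives interrupts.
constexpr PopfResult emulate_popf(std::uint32_t popped) {
    return PopfResult{ popped | IF_BIT, (popped & IF_BIT) != 0 };
}
```

A guest that runs cli and then pushf would see IF=0 in the pushed image via emulate_pushf(real, false), while the real flags register keeps IF=1 throughout.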

Gameboy rotate instructions

With instruction mnemonics, it's usually a mistake to read too much into the names. Here, though, there is an answer if you look at the long names shown in some (but not all) references:

RL: Rotate Left

RLC: Rotate Left Circular

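These names make sense if you think of RLC as rotating the 8 register bits in a closed circle: bit 7 wraps around to bit 0 (and is also copied into the carry flag). RL instead rotates through the carry flag, so the carry acts as a ninth bit in the cycle. The 'C' in RLC isn't for "carry", it's for "circular".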
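A small model of the two rotates, going by the usual Z80-style definitions, may help. It is a sketch that tracks only the 8-bit value and the carry flag, leaves the other flags out, and uses made-up names (Rot, rlc, rl).

```cpp
#include <cstdint>

struct Rot { std::uint8_t value; bool carry; };

// RLC: the 8 register bits rotate among themselves; bit 7 wraps to bit 0
// and is also copied into the carry flag.
constexpr Rot rlc(std::uint8_t v) {
    return Rot{ static_cast<std::uint8_t>((v << 1) | (v >> 7)), (v & 0x80) != 0 };
}

// RL: rotate through carry; the old carry enters bit 0 and bit 7 becomes
// the new carry, giving a 9-bit cycle.
constexpr Rot rl(std::uint8_t v, bool carry_in) {
    return Rot{ static_cast<std::uint8_t>((v << 1) | (carry_in ? 1 : 0)),
                (v & 0x80) != 0 };
}

// Same input, different outcome: with carry clear, 0x80 becomes 0x01 under
// RLC but 0x00 under RL (the set bit moves into the carry instead).
static_assert(rlc(0x80).value == 0x01 && rlc(0x80).carry, "RLC wraps bit 7 to bit 0");
static_assert(rl(0x80, false).value == 0x00 && rl(0x80, false).carry, "RL goes through carry");
```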


