What are these seemingly-useless callq instructions in my x86 object files for?
The 00 00 00 00 (relative) target address in e8 00 00 00 00 is intended to be filled in by the linker. It doesn't mean that the call falls through. It just means you are disassembling an object file that has not been linked yet.
Also, a call to the next instruction, if that was the end result after the link phase, would not be a no-op, because it changes the stack (a certain hint that this is not what is going on in your case).
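To make the "relative to the next instruction" rule concrete, here is a minimal sketch of how a disassembler could compute the target of a 5-byte call rel32. The function name call_rel32_target and the example offset 0x19 are assumptions made for illustration; they do not come from the question.

```cpp
#include <cassert>
#include <cstdint>

// Returns the address a 5-byte `call rel32` (opcode E8) transfers control to.
// insn_addr is the address (or section offset) of the E8 byte itself.
std::uint64_t call_rel32_target(std::uint64_t insn_addr, const std::uint8_t bytes[5]) {
    assert(bytes[0] == 0xE8);
    // The displacement is stored little-endian in the four bytes after the opcode.
    const std::uint32_t raw = static_cast<std::uint32_t>(bytes[1])
                            | static_cast<std::uint32_t>(bytes[2]) << 8
                            | static_cast<std::uint32_t>(bytes[3]) << 16
                            | static_cast<std::uint32_t>(bytes[4]) << 24;
    const std::int32_t rel = static_cast<std::int32_t>(raw);
    // The signed displacement is added to the address of the NEXT instruction.
    return insn_addr + 5 + static_cast<std::uint64_t>(static_cast<std::int64_t>(rel));
}
```

With the unlinked placeholder e8 00 00 00 00 placed at offset 0x19, this returns 0x1e, the offset of the following instruction, which is why an unlinked object file disassembles as if the call went nowhere useful. Once the linker patches in a real displacement, the same computation yields the actual callee.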
function call seems not working properly in disassembled code
It calls address 0, which is the address of the f function.
e8 is the call instruction in x86 according to this:
http://www.cs.cmu.edu/~fp/courses/15213-s07/misc/asm64-handout.pdf
call uses the displacement relative to the next instruction, at memory location 1e. That becomes memory location 0. So it's callq 1e when in reality it's calling address 0.
which MOV instructions in the x86 are not used or the least used, and can be used for a custom MOV extension
Your best bet is regular mov with a prefix that GCC will never emit on its own, i.e. create a new mov encoding that includes a mandatory prefix in front of any other mov. Like how lzcnt is rep bsr.
Or if you're modifying GCC and as, you can add a new mnemonic that just uses otherwise-invalid (in 64-bit mode) single byte opcodes for memory-source, memory-dest, and immediate-source versions of mov. AMD64 freed up several opcodes, including the BCD instructions like AAM, and push/pop most segment registers. (x86-64 can still mov to/from Sregs, but there's just 1 opcode per direction, not 2 per Sreg for push ds/pop ds etc.)
Assuming my workload just includes integers i.e. most probably wont be using the xmm and mmx registers
Bad assumption for XMM: GCC aggressively uses 16-byte movaps / movups instead of copying structs 4 or 8 bytes at a time. It's not at all rare to find vector mov instructions in scalar integer code as part of inline expansion of small known-length memcpy or struct / array init. Also, those mov instructions have at least 2-byte opcodes (SSE1 0F 28 movaps, so a prefix in front of plain mov is the same size as your idea would have been).
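As an illustration of the kind of integer-only code meant here, the sketch below copies 16 bytes in two ways. The names Pair, copy_pair and copy16 are made up for this example. With optimization enabled, GCC targeting x86-64 will typically turn either function into a single 16-byte vector load and store through an XMM register, though the exact instructions depend on the compiler version and flags, so check the output yourself (for example on Godbolt).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Two 64-bit integers: 16 bytes, no floating point anywhere.
struct Pair {
    std::int64_t a;
    std::int64_t b;
};

// Plain struct assignment; a candidate for one 16-byte vector load + store.
void copy_pair(Pair& dst, const Pair& src) {
    dst = src;
}

// Small memcpy with a compile-time-constant length, usually expanded inline.
void copy16(void* dst, const void* src) {
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, 16);
}
```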
However, you're right about MMX regs. I don't think modern GCC will ever emit movq mm0, mm1 or use MMX at all, unless you use MMX intrinsics. Definitely not when targeting 64-bit code.
Also mov to/from control regs (0f 20/22 /r) or debug registers (0f 21/23 /r) are both the mov mnemonic, but GCC will definitely never emit either on its own. They are only available with a GP register as the operand that isn't the debug or control register. So that's technically the answer to your title question, but probably not what you actually want.
GCC doesn't parse its inline asm template string; it just includes it in its asm text output to feed to the assembler after substituting for %number operands. So GCC itself is not an obstacle to emitting arbitrary asm text using inline asm. And you can use .byte to emit arbitrary machine code.
Perhaps a good option would be to use a 0E byte as a prefix for your special mov encoding that you're going to make GEM decode specially. 0E is push CS in 32-bit mode, invalid in 64-bit mode. GCC will never emit either.
Or just an F2 repne prefix; GCC will never emit repne in front of a mov opcode (where it doesn't apply), only movs. (F3 rep / repe means xrelease when used on a memory-destination instruction so don't use that. https://www.felixcloutier.com/x86/xacquire:xrelease says that F2 repne is the xacquire prefix when used with locked instructions, which doesn't include mov to memory so it will be silently ignored there.)
As usual, prefixes that don't apply have no documented behaviour, but in practice CPUs that don't understand a rep / repne ignore it. Some future CPU might understand it to mean something special, and that's exactly what you're doing with GEM.
Picking .byte 0x0e; instead of repne; might be a better choice if you want to guard against accidentally leaving these prefixes in a build you run on a real CPU. (It will #UD -> SIGILL in 64-bit mode, or usually crash from messing up the stack in 32-bit mode.) But if you do want to be able to run the exact same binary on a real CPU, with the same code alignment and everything, then an ignored REP prefix is ideal.
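On the decoder side, recognizing such a marked mov could look roughly like the sketch below. This is a hypothetical helper, not code from GEM or any real emulator: the names kCustomMovPrefix and is_custom_mov are invented here, and it only handles the common 88..8B and C6/C7 mov opcodes with an optional REX byte, not the moffs or move-immediate-to-register forms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical marker byte: 0E is invalid in 64-bit mode, so compilers never emit it.
constexpr std::uint8_t kCustomMovPrefix = 0x0E;

// Returns true if the bytes start with the marker followed by a plain mov.
bool is_custom_mov(const std::uint8_t* code, std::size_t len) {
    assert(code != nullptr);
    std::size_t i = 0;
    if (i >= len || code[i] != kCustomMovPrefix) return false;
    ++i;
    // An optional REX prefix (40..4F) sits immediately before the opcode.
    if (i < len && (code[i] & 0xF0) == 0x40) ++i;
    if (i >= len) return false;
    const std::uint8_t op = code[i];
    // 88/89: mov r/m, r    8A/8B: mov r, r/m
    if (op >= 0x88 && op <= 0x8B) return true;
    // C6/C7 are mov r/m, imm only when the ModRM reg field is 0 (the /0 form).
    if (op == 0xC6 || op == 0xC7) {
        return i + 1 < len && ((code[i + 1] >> 3) & 7) == 0;
    }
    return false;
}
```

The same structure would work with F2 as the marker instead of 0E; the difference is only what a real CPU does when it meets the byte, as described above.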
Using a prefix in front of a standard mov instruction has the advantage of letting the assembler encode the operands for you:
template<class T>
void fancymov(T& dst, T src) {
// fixme: imm -> mem needs a size suffix, defeating template
// unless you use Intel-syntax where the operand includes "dword ptr"
asm("repne; movl %1, %0"
#if 1
: "=m"(dst)
: "ri" (src)
#else
: "=g,r"(dst)
: "ri,rmi" (src)
#endif
: // no clobbers
);
}
void test(int *dst, long src) {
fancymov(*dst, (int)src);
fancymov(dst[1], 123);
}
(Multi-alternative constraints let the compiler pick either reg/mem destination or reg/mem source. In practice it prefers the register destination even when that will cost it another instruction to do its own store, so that sucks.)
On the Godbolt compiler explorer, for the version that only allows a memory-destination:
test(int*, long):
repne; movl %esi, (%rdi) # F2 89 37
repne; movl $123, 4(%rdi) # F2 C7 47 04 7B 00 00 00
ret
If you wanted this to be usable for loads, I think you'd have to make 2 separate versions of the function and use the load version or store version manually, where appropriate, because GCC seems to want to use reg,reg whenever it can.
Or with the version allowing register outputs (or another version that returns the result as a T, see the Godbolt link):
test2(int*, long):
repne; mov %esi, %esi
repne; mov $123, %eax
movl %esi, (%rdi)
movl %eax, 4(%rdi)
ret
Difference between JS and JL x86 instructions
JS jumps if the sign flag is set (SF=1), while JL jumps if the sign flag doesn't equal the overflow flag (SF != OF).
There are situations where one of these conditions will be met, but not the other. Consider the following:
mov al, -100
cmp al, 30
Here the flags will be set based on the result of -100 - 30. -100 is negative and 30 is positive, but the result (-130) cannot be represented by 8 bits in two's complement, so you get arithmetic overflow and a result of positive 126.
This is perhaps easier to see if we use hexadecimal notation: -100 == 0x9C, 30 == 0x1E, 0x9C - 0x1E = 0x7E == 126.
So we have a positive result (SF=0) and overflow (OF=1). Therefore, in this case JS would not jump but JL would (since SF != OF).
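The same arithmetic can be checked with a small model of an 8-bit cmp. This is only a sketch of how SF and OF are derived for a subtraction; the names Flags, cmp8, js_taken and jl_taken are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

struct Flags {
    bool sf;              // sign flag: top bit of the 8-bit result
    bool of;              // overflow flag: true result did not fit in 8 signed bits
    std::int8_t result;   // the wrapped 8-bit result (cmp discards it, shown for clarity)
};

// Models `cmp a, b` on 8-bit operands, i.e. the flags produced by a - b.
Flags cmp8(std::int8_t a, std::int8_t b) {
    const int wide = static_cast<int>(a) - static_cast<int>(b);  // exact result
    const std::uint8_t wrapped = static_cast<std::uint8_t>(wide); // keep the low 8 bits
    Flags f;
    f.result = static_cast<std::int8_t>(wrapped);
    f.sf = (wrapped & 0x80) != 0;
    f.of = wide < -128 || wide > 127;
    return f;
}

bool js_taken(const Flags& f) { return f.sf; }          // JS: SF == 1
bool jl_taken(const Flags& f) { return f.sf != f.of; }  // JL: SF != OF
```

For cmp8(-100, 30) the exact result is -130, the wrapped result is 126, SF is clear and OF is set, so jl_taken is true while js_taken is false. For a comparison that does not overflow, such as cmp8(5, 10), both conditions agree.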
Which jump condition you should use depends on what you're trying to achieve. If you're comparing two values, want them to be interpreted as signed, and want to jump if one is less than the other, use JL. If you want to jump if the result of a calculation is negative, use JS.
what are the reasons for virtualising these instructions?
Modifying the machine state (segment bases / limits, disabling interrupts, etc.) obviously can't be allowed, or the guest could break out of the VM or at least hang it. (E.g. by running an infinite loop with interrupts disabled.)
pushf/popf are slightly subtle: remember that IF (the interrupts-enabled bit which cli/sti flip) is one of the bits in EFLAGS.
You want the physical machine to have interrupts enabled while the guest disables interrupts. But you also want the guest to see IF=0 when it has interrupts disabled on the virtual x86 that it's running on. So you need to virtualize pushf as well as popf.
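A minimal sketch of that idea follows, assuming a hypervisor that keeps a per-guest "virtual IF" and emulates the four instructions after trapping them. The VirtualCpu struct and the emulate_* functions are hypothetical names for this example, not any real hypervisor's API. The only hardware fact relied on is that IF is bit 9 of EFLAGS.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kIfBit = 1u << 9;  // IF is bit 9 of EFLAGS

// Hypothetical per-guest state: the interrupt flag as the guest believes it to be.
struct VirtualCpu {
    bool virtual_if = true;
};

// cli/sti only change the guest's view; the physical IF stays set.
void emulate_cli(VirtualCpu& vcpu) { vcpu.virtual_if = false; }
void emulate_sti(VirtualCpu& vcpu) { vcpu.virtual_if = true; }

// pushf: the guest must see its own IF, not the host's.
std::uint32_t emulate_pushf(const VirtualCpu& vcpu, std::uint32_t host_eflags) {
    assert((host_eflags & kIfBit) != 0);  // the host keeps interrupts enabled
    return vcpu.virtual_if ? (host_eflags | kIfBit) : (host_eflags & ~kIfBit);
}

// popf: take the IF bit from the popped value into the virtual state only.
void emulate_popf(VirtualCpu& vcpu, std::uint32_t popped_value) {
    vcpu.virtual_if = (popped_value & kIfBit) != 0;
}
```

After emulate_cli, emulate_pushf reports IF=0 to the guest even though the host value passed in has IF=1, which is exactly the mismatch that makes an unvirtualized pushf visible to the guest.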
Gameboy rotate instructions
With instruction mnemonics, it's usually a mistake to read too much into the names. However, here there is an answer of sorts, found by looking at the long names shown in some (but not all) places:
RL: Rotate Left
RLC: Rotate Left Circular
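These names make sense if you think of RLC as rotating the eight bits of the register among themselves: bit 7 wraps around to bit 0 (and is also copied into the carry flag), whereas RL rotates through the carry flag, so the old carry enters bit 0. The 'C' in RLC isn't for "carry", it's for "circular".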
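The difference between the two rotates is easy to see in a small model. This sketch only tracks the 8-bit result and the carry flag and ignores the other flags; the names RotateResult, rl and rlc are chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

struct RotateResult {
    std::uint8_t value;  // register contents after the rotate
    bool carry;          // carry flag after the rotate
};

// RL: rotate left through carry. The old carry enters bit 0, bit 7 becomes the new carry.
RotateResult rl(std::uint8_t value, bool carry_in) {
    RotateResult r;
    r.carry = (value & 0x80) != 0;
    r.value = static_cast<std::uint8_t>((value << 1) | (carry_in ? 1 : 0));
    return r;
}

// RLC: rotate left circular. Bit 7 wraps around to bit 0 and is also copied to carry.
RotateResult rlc(std::uint8_t value) {
    RotateResult r;
    r.carry = (value & 0x80) != 0;
    r.value = static_cast<std::uint8_t>((value << 1) | (value >> 7));
    assert(r.carry == ((r.value & 0x01) != 0));  // bit 0 always matches the new carry
    return r;
}
```

Starting from 0x80 with carry clear, rlc gives 0x01 with carry set, while rl gives 0x00 with carry set, because the bit shifted out has to go through the carry flag before it can come back in.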