Direct C Function Call Using Gcc's Inline Assembly

Direct C function call using GCC's inline assembly

I got the answer from GCC's mailing list:

asm("call %P0" : : "i"(callee));  // FIXME: missing clobbers

Now I just need to find out what %P0 actually means because it seems to be an undocumented feature...

Edit: After looking at the GCC source code, it's not exactly clear what the code P in front of a constraint means. But, among other things, it prevents GCC from putting a $ in front of constant values. Which is exactly what I need in this case.


For this to be safe, you need to tell the compiler about all registers that the function call might modify, e.g. : "eax", "ecx", "edx", "xmm0", "xmm1", ..., "st(0)", "st(1)", ....

See Calling printf in extended inline ASM for a full x86-64 example of correctly and safely making a function call from inline asm.

Calling a function in gcc inline assembly

Generally you'll want to do something like

void *x;
asm(".. code that writes to register %0" : "=r"(x) : ...
int r = some_function(x);
asm(".. code that uses the result..." : ... : "r"(r), ...

That is, you don't want to do the function call in the inline asm at all. That way you don't have to worry about details of the calling conventions, or stack frame management.

Is it possible to call a built in function from assembly in C++

No. Builtin functions aren't real functions that you can call with call. They always inline when used in C / C++.

For example, if you want int __builtin_popcount (unsigned int x) to get either a popcnt instruction for targets with -mpopcnt, or a byte-wise lookup table for targets that don't support the popcnt instruction, you are out of luck. You will have to #ifdef yourself and use popcnt or an alternative sequence of instructions.


The function you're talking about, __builtin_ia32_aesenc128 is just a wrapper for the aesenc assembly instruction which you can just use directly if writing in asm.


If you're writing asm instead of using C++ intrinsics (like #include <immintrin.h>) for performance, you need to have a look at http://agner.org/optimize/ to write more efficient asm. e.g. use %ecx as a loop counter, not %cx. You're gaining nothing from using a 16-bit partial register.

You could also write more efficient inline-asm constraints, e.g. the movq %%rbx, %0 is a waste of an instruction. You could have used %0 the whole time instead of an explict %rbx. If your inline asm starts or ends with a mov instruction to copy to/from an output/input operand, usually you're doing it wrong. Let the compiler allocate registers for you. See the inline-assembly tag wiki.

Or better, https://gcc.gnu.org/wiki/DontUseInlineAsm. Code with intrinsics typically compiles well for x86. See Intel's intrinsics guide: #include <immintrin.h> and use __m128i _mm_aesenc_si128 (__m128i a, __m128i RoundKey). (In gcc that's just a wrapper for __builtin_ia32_aesenc128, but it makes your code portable to other x86 compilers.)

Calling printf in extended inline ASM

Specific problem to your code: RDI is not maintained across a function call (see below). It is correct before the first call to printf but is clobbered by printf. You'll need to temporarily store it elsewhere first. A register that isn't clobbered will be convenient. You can then save a copy before printf, and copy it back to RDI after.


I do not recommend doing what you are suggesting (making function calls in inline assembler). It will be very difficult for the compiler to optimize things. It is very easy to get things wrong. David Wohlferd wrote a very good article on reasons not to use inline assembly unless absolutely necessary.

Among other things the 64-bit System V ABI mandates a 128-byte red zone. That means you can't push anything onto the stack without potential corruption. Remember: doing a CALL pushes a return address on the stack. Quick and dirty way to resolve this problem is to subtract 128 from RSP when your inline assembler starts and then add 128 back when finished.

The 128-byte area beyond the location pointed to by %rsp is considered to
be reserved and shall not be modified by signal or interrupt handlers.8 Therefore,
functions may use this area for temporary data that is not needed across function
calls. In particular, leaf functions may use this area for their entire stack frame,
rather than adjusting the stack pointer in the prologue and epilogue. This area is
known as the red zone.

Another issue to be concerned about is the requirement for the stack to be 16-byte aligned (or possibly 32-byte aligned depending on the parameters) prior to any function call. This is required by the 64-bit ABI as well:

The end of the input argument area shall be aligned on a 16 (32, if __m256 is
passed on stack) byte boundary. In other words, the value (%rsp + 8) is always
a multiple of 16 (32) when control is transferred to the function entry point.

Note: This requirement for 16-byte alignment upon a CALL to a function is also required on 32-bit Linux for GCC >= 4.5:

In context of the C programming language, function arguments are pushed on the stack in the reverse order. In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary when calling a function (previous versions only required a 4-byte alignment.)

Since we call printf in inline assembler we should ensure that we align the stack to a 16-byte boundary before making the call.

You also have to be aware that when calling a function some registers are preserved across a function call and some are not. Specifically those that may be clobbered by a function call are listed in Figure 3.4 of the 64-bit ABI (see previous link). Those registers are RAX, RCX, RDX, RD8-RD11, XMM0-XMM15, MMX0-MMX7, ST0-ST7 . These are all potentially destroyed so should be put in the clobber list if they don't appear in the input and output constraints.

The following code should satisfy most of the conditions to ensure that inline assembler that calls another function will not inadvertently clobber registers, preserves the redzone, and maintains 16-byte alignment before a call:

int main()
{
const char* test = "test\n";
long dummyreg; /* dummyreg used to allow GCC to pick available register */

__asm__ __volatile__ (
"add $-128, %%rsp\n\t" /* Skip the current redzone */
"mov %%rsp, %[temp]\n\t" /* Copy RSP to available register */
"and $-16, %%rsp\n\t" /* Align stack to 16-byte boundary */
"mov %[test], %%rdi\n\t" /* RDI is address of string */
"xor %%eax, %%eax\n\t" /* Variadic function set AL. This case 0 */
"call printf\n\t"
"mov %[test], %%rdi\n\t" /* RDI is address of string again */
"xor %%eax, %%eax\n\t" /* Variadic function set AL. This case 0 */
"call printf\n\t"
"mov %[temp], %%rsp\n\t" /* Restore RSP */
"sub $-128, %%rsp\n\t" /* Add 128 to RSP to restore to orig */
: [temp]"=&r"(dummyreg) /* Allow GCC to pick available output register. Modified
before all inputs consumed so use & for early clobber*/
: [test]"r"(test), /* Choose available register as input operand */
"m"(test) /* Dummy constraint to make sure test array
is fully realized in memory before inline
assembly is executed */
: "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11",
"xmm0","xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
"xmm8","xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
"mm0","mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm6",
"st", "st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)", "st(7)"
);

return 0;
}

I used an input constraint to allow the template to choose an available register to be used to pass the str address through. This ensures that we have a register to store the str address between the calls to printf. I also get the assembler template to choose an available location for storing RSP temporarily by using a dummy register. The registers chosen will not include any one already chosen/listed as an input/output/clobber operand.

This looks very messy, but failure to do it correctly could lead to problems later as you program becomes more complex. This is why calling functions that conform to the System V 64-bit ABI within inline assembler is generally not the best way to do things.

Is this assembly function call safe/complete?

The original question was Is this assembly function call safe/complete?. The answer to that is: no. While it may appear to work in this simple example (especially if optimizations are disabled), you are violating rules that will eventually lead to failures (ones that are really hard to track down).

I'd like to address the (obvious) followup question of how to make it safe, but without feedback from the OP on the actual intent, I can't really do that.

So, I'll do the best I can with what we have and try to describe the things that make it unsafe and some of the things you can do about it.

Let's start by simplifying that asm:

 __asm__(
"mov %0, %%edi;"
:
: "g"(a)
);

Even with this single statement, this code is already unsafe. Why? Because we are changing the value of a register (edi) without letting the compiler know.

How can the compiler not know you ask? After all, it's right there in the asm! The answer comes from this line in the gcc docs:

GCC does not parse the assembler instructions themselves and does not
know what they mean or even whether they are valid assembler input.

In that case, how do you let gcc know what's going on? The answer lies in using the constraints (the stuff after the colons) to describe the impact of the asm.

Perhaps the simplest way to fix this code would be like this:

  __asm__(
"mov %0, %%edi;"
:
: "g"(a)
: edi
);

This adds edi to the clobber list. In brief, this tell gcc that the value of edi is going to be changed by the code, and that gcc shouldn't assume any particular value will be in it when the asm exits.

Now, while that's the easiest, it's not necessarily the best way. Consider this code:

  __asm__(
""
:
: "D"(a)
);

This uses a machine constraint to tell gcc to put the value of the variable a into the edi register for you. Doing it this way, gcc will load the register for you at a 'convenient' time, perhaps by always keeping a in edi.

There is one (significant) caveat to this code: By putting the parameter after the 2nd colon, we are declaring it to be an input. Input parameters are required to be read-only (ie they must have the same value on exiting the asm).

In your case, the call statement means that we won't be able to guarantee that edi won't be changed, so this doesn't quite work. There are a few ways to deal with this. The easiest is to move the constraint up after the first colon, making it an output, and specify "+D" to indicate that the value is read+write. But then the contents of a are going to be pretty much undefined after the asm (printf could set it to anything). If destroying a is unacceptable, there's always something like this:

int junk;
__asm__ volatile (
""
: "=D" (junk)
: "0"(a)
);

This tells gcc that on starting the asm, it should put the value of the variable a into the same place as output constraint #0 (ie edi). It also says that on output, edi won't be a anymore, it will contain the variable junk.

Edit: Since the 'junk' variable isn't actually going to be used, we need to add the volatile qualifier. Volatile was implicit when there weren't any output parameters.

One other point on that line: You end it with a semi-colon. This is legal and will work as expected. However, if you ever want to use the -S command line option to see exactly what code got generated (and if you want to get good with inline asm, you will), you will find that produces difficult-to-read code. I'd recommend using \n\t instead of a semi-colon.

All that and we're still on the first line...

Obviously the same would apply to the other two mov statements.

Which brings us to the call statement.

Both Michael and I have listed a number of reasons doing call in inline asm is difficult.

  • Handling all the registers that may be clobbered by the function call's ABI.
  • Handling red-zone.
  • Handling alignment.
  • Memory clobber.

If the goal here is 'learning,' then feel free to experiment. But I don't know that I would ever feel comfortable doing this in production code. Even when it looks like it works, I'd never feel confident there wasn't some weird case I'd missed. That's aside from my normal concerns about using inline asm at all.

I know, that's a lot of information. Probably more than you were looking for as an introduction to gcc's asm command, but you've picked a challenging place to start.

If you haven't done so already, spend time looking over all the docs in gcc's Assembly Language interface. There's a lot of good information there along with examples to try to explain how it all works.

Inline assembly in C

Use this syntax, you can access variables declared in C from the inline assembly

#include <stdio.h>

int main() {
int number = 0;
printf("%d\n",number);
asm(
"mov %[number],%%eax\n"
"inc %%eax\n"
"mov %%eax,%[number]\n"
: [number] "=m" (number) : "m" (number) : "eax", "cc" );
printf("%d\n",number);
return 0;
}

You can let the compiler load number into the eax register for you by specifying the "a" constraint on the input

#include <stdio.h>

int main() {
int number = 0;
printf("%d\n",number);
asm(
"inc %%eax\n"
"mov %%eax,%[number]\n"
: [number] "=m" (number) : "a" (number) : "cc" );
printf("%d\n",number);
return 0;
}

And since x86 inc instruction can operate on memory directly you could reduce it to this

#include <stdio.h>

int main() {
int number = 0;
printf("%d\n",number);
asm(
"incl %[number]\n" /* incl -> "long" (32-bits) */
: [number] "=m" (number) : "m" (number) : "cc" );
printf("%d\n",number);
return 0;
}

For more information see gcc documentation:

6.41 Assembler Instructions with C Expression Operands

How do I pass arguments to C++ functions when I call them from inline assembly

To push 8-byte values such as doubles, you won't be able to use a regular PUSH instruction. And neither do you push floating-point parameters (or doubles) on to the floating-point stack. You need to put these fat parameters on the stack 'by hand'. For example, to push π as a parameter to a function f:

  __asm {
FLDPI // load pi onto FP stack
SUB ESP,8 // make room for double on processor stack
FSTP QWORD PTR [ESP] // store pi in proc stack slot (and pop from FP stack)
CALL f
ADD ESP,8 // clean up stack (assuming f is _cdecl)
}


Related Topics



Leave a reply



Submit