What Does the "Lock" Instruction Mean in X86 Assembly

What does the lock instruction mean in x86 assembly?


  1. LOCK is not an instruction itself: it is an instruction prefix, which applies to the following instruction. That instruction must be something that does a read-modify-write on memory (INC, XCHG, CMPXCHG etc.). In this case it is the incl (%ecx) instruction, which increments the long word at the address held in the ecx register.

    The LOCK prefix ensures that the CPU has exclusive ownership of the appropriate cache line for the duration of the operation. It also acts as a full memory barrier: loads and stores are not reordered across the locked instruction. Exclusive ownership may be achieved by asserting a bus lock, but the CPU will avoid this where possible. If the bus is locked then it is only for the duration of the locked instruction.

  2. This code copies the address of the variable to be incremented off the stack into the ecx register, then it does lock incl (%ecx) to atomically increment that variable by 1. The next two instructions set the eax register (which holds the return value from the function) to 0 if the new value of the variable is 0, and 1 otherwise. The operation is an increment, not an add (hence the name).
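
For comparison, here is a minimal C11 sketch of the same operation. The function name increment_and_test is made up for illustration and is not taken from the code in the question. On x86, GCC and Clang typically compile an atomic_fetch_add like this to a LOCK-prefixed instruction; the exact instruction is the compiler's choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: atomically add 1 to *counter.
   Returns 0 if the new value is 0, and 1 otherwise. */
int increment_and_test(atomic_int *counter)
{
    assert(counter != NULL);
    /* atomic_fetch_add returns the value from *before* the add. */
    int old_value = atomic_fetch_add(counter, 1);
    /* The new value is 0 exactly when the old value was -1.
       Comparing the old value avoids overflow in plain int arithmetic. */
    return old_value != -1;
}
```

Starting from -1, the first call returns 0 (the variable is now 0) and the next call returns 1.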

LOCK prefix of Intel instruction. What is the point?

I certainly would not call lock useless. lock cmpxchg is the standard way to perform compare-and-swap, which is the basic building block of many synchronization algorithms.

Also, see fetch-and-add.
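
To make both ideas concrete, here is a rough C11 sketch. The helper names atomic_store_max and fetch_and_add are invented for this example. On x86, compilers typically turn the compare-exchange into lock cmpxchg and the fetch-add into lock xadd (or lock add when the result is unused), but that is up to the compiler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: atomically raise *target to at least `candidate`.
   There is no single x86 "atomic max" instruction, so this retries with
   compare-and-swap. Returns the value of *target seen after the call. */
int atomic_store_max(atomic_int *target, int candidate)
{
    assert(target != NULL);
    int observed = atomic_load(target);
    while (observed < candidate) {
        /* On failure, `observed` is refreshed with the current value. */
        if (atomic_compare_exchange_weak(target, &observed, candidate))
            return candidate;
    }
    return observed; /* the current value was already >= candidate */
}

/* Fetch-and-add: returns the value from before the addition. */
int fetch_and_add(atomic_int *target, int delta)
{
    assert(target != NULL);
    return atomic_fetch_add(target, delta);
}
```

The compare-and-swap loop is the general pattern: read the value, compute the new one, and retry if another thread changed it in between.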

8086 lock pin and ASM LOCK prefix how it works

If the only CPU has the memory bus locked, no other device can read or change memory contents during that time, not even via DMA. (Or with multiple CPUs on a shared bus with no cache, same deal.) Therefore, no other memory operations at all can happen between the load and the store of a lock add [di], ax for example, making it atomic with respect to any possible observer. (Other than a logic analyzer connected to the bus, which doesn't count.)

Semi-related: Can num++ be atomic for 'int num'? describes how the lock prefix works on modern CPUs for cacheable memory, providing RMW atomicity without a bus lock, just hanging on to the cache line for the duration.

We call this a "cache lock"; all modern CPUs work this way for aligned locked operations, only doing an expensive bus lock on something like xchg [mem], ax that spans a boundary between two cache-lines. That hurts throughput on all cores, and is so expensive that modern CPUs have a way to make that always fault, but not other unaligned loads/stores, as well as performance counters for it.

Fun fact: xchg [mem], reg has implicit lock semantics on 386 and newer. (Which is unfortunate because it makes it unusable for performance reasons as just a plain load/store when you're running low on registers). It didn't on 286 or earlier, unless you did lock xchg. This is possibly related to the fact that there were SMP 386 systems (with a primitive sequentially-consistent memory model). The modern x86 memory model applies to 486 and later SMP systems.
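
The implicit lock on xchg is what makes a simple test-and-set spinlock work. Below is a minimal C11 sketch; the spinlock type and function names are hypothetical, and it assumes a compiler such as GCC or Clang that lowers atomic_exchange to xchg on x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal spinlock: 0 = free, 1 = held. */
typedef struct { atomic_int held; } spinlock;

/* Swap in 1 and look at what was there before.
   The lock was acquired only if the old value was 0. */
bool spinlock_try_acquire(spinlock *lock)
{
    assert(lock != NULL);
    return atomic_exchange_explicit(&lock->held, 1, memory_order_acquire) == 0;
}

void spinlock_acquire(spinlock *lock)
{
    while (!spinlock_try_acquire(lock)) {
        /* Wait on a plain load so spinning cores do not keep
           issuing locked exchanges on the same cache line. */
        while (atomic_load_explicit(&lock->held, memory_order_relaxed) != 0)
            ;
    }
}

void spinlock_release(spinlock *lock)
{
    assert(lock != NULL);
    atomic_store_explicit(&lock->held, 0, memory_order_release);
}
```

This is only a sketch: a production lock would add a pause hint in the wait loop and some form of backoff or fairness.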

How do I use the LOCK ASM prefix to read a value?

Use the XADD or MOV instruction instead of the ADD instruction.
See also the MFENCE, LFENCE and SFENCE instructions.

EDIT:
You can't use the LOCK prefix with the ADD instruction if the source operand is a memory operand!

From: "Intel® 64 and IA-32 Architectures Software Developer’s Manual"

The LOCK prefix can be prepended only
to the following instructions and only
to those forms of the instructions
where the destination operand is a
memory operand: ADD, ADC, AND, BTC,
BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC,
NEG, NOT, OR, SBB, SUB, XOR, XADD, and
XCHG. If the LOCK prefix is used with
one of these instructions and the
source operand is a memory operand, an
undefined opcode exception (#UD) may
be generated. An undefined opcode
exception will also be generated if
the LOCK prefix is used with any
instruction not in the above list. The
XCHG instruction always asserts the
LOCK# signal regardless of the
presence or absence of the LOCK prefix.

EDIT2:
From: "Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A"

8.1.1 Guaranteed Atomic Operations.
The Intel486 processor (and newer
processors since) guarantees that the
following basic memory operations will
always be carried out atomically:

  • Reading or writing a byte
  • Reading or writing a word aligned
    on a 16-bit boundary
  • Reading or writing a doubleword aligned on a 32-bit boundary

The Pentium processor (and newer
processors since) guarantees that the
following additional memory operations
will always be carried out atomically:

  • Reading or writing a quadword aligned on a 64-bit boundary
  • 16-bit accesses to uncached memory locations that fit within a 32-bit
    data bus

The P6 family processors (and newer
processors since) guarantee that the
following additional memory operation
will always be carried out atomically:

  • Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit
    within a cache line

Accesses to cacheable memory that are
split across bus widths, cache lines,
and page boundaries are not guaranteed
to be atomic by the
Intel Core 2 Duo,
Intel Core Duo, Pentium M, Pentium 4,
Intel Xeon, P6 family, Pentium, and
Intel486 processors. The Intel Core 2
Duo, Intel Core Duo, Pentium M,
Pentium 4, Intel Xeon, and P6 family
processors provide bus control signals
that permit external memory subsystems
to make split accesses atomic;
however, nonaligned data accesses will
seriously impact the performance of
the processor and should be avoided.

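So, for reading I prefer to use the CMPXCHG instruction with the LOCK prefix. The memory operand has to be the destination, as the manual text quoted above requires. Afterwards EAX holds the value of J: if the two were already equal the same value is written back, otherwise J is loaded into EAX.

LOCK        CMPXCHG   [J], EAX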

For writing:

MOV   [J], EAX
SFENCE
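
As a point of comparison, here is how the same read and write might look in C11. This is a sketch and the function names are invented. GCC and Clang on x86-64 typically compile an aligned atomic_load to a plain MOV, which matches the guarantees in section 8.1.1 quoted above, and a sequentially consistent atomic_store to XCHG (or MOV followed by MFENCE). The last function imitates the LOCK CMPXCHG read trick.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: atomic read of *j (sequentially consistent). */
int read_value(atomic_int *j)
{
    assert(j != NULL);
    return atomic_load(j);
}

/* Hypothetical helper: atomic write of *j (sequentially consistent). */
void write_value(atomic_int *j, int value)
{
    assert(j != NULL);
    atomic_store(j, value);
}

/* Read through compare-and-swap, in the spirit of LOCK CMPXCHG.
   If *j is 0, the exchange succeeds and rewrites 0 with 0.
   Otherwise it fails and copies the current value into `expected`.
   Either way `expected` ends up equal to *j. */
int read_via_cas(atomic_int *j)
{
    assert(j != NULL);
    int expected = 0;
    atomic_compare_exchange_strong(j, &expected, 0);
    return expected;
}
```

The compare-and-swap read is a locked read-modify-write, so it costs more than a plain load; for an aligned int the plain atomic load is normally enough.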


What is the scope of lock prefix?

The lock prefix affects a single instruction.

Instructions stop being atomic when they modify memory shared between several CPUs. Modifications that involve reading a memory operand, performing some operation on it (e.g. AND, XOR, INC, etc.) and then writing it back are not seen as atomic by other CPUs. The lock prefix "locks" the memory location, so the 3 steps (Read, Modify, Write) appear as one, i.e. other CPUs can only observe the state before the locked instruction or the state after it, never an intermediate one.

See the official CPU documentation from Intel or AMD.

EDIT: In your newly added example neither of those instructions can be interrupted, if we're talking about interrupts. Interrupts occur between entire instructions. The lock prefix makes the sub instruction atomic. The sete instruction is not intended to be atomic, it's there to transform the ZF flag into a zero or non-zero integer value.
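
The same subtract-then-test pattern can be sketched in C11. The name subtract_and_test_zero is hypothetical and this is not the code from the question. The atomic subtraction corresponds to the locked sub, and the comparison on the returned value plays the role of sete.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: atomically subtract `amount` from *value and
   report whether the result is zero. */
bool subtract_and_test_zero(atomic_int *value, int amount)
{
    assert(value != NULL);
    /* Only this step needs to be atomic. It returns the old value. */
    int before = atomic_fetch_sub(value, amount);
    /* Works on a private copy, so it needs no atomicity of its own.
       The new value is before - amount, which is zero iff they are equal. */
    return before == amount;
}
```

This is the usual shape of a reference-count release: exactly one thread sees the result reach zero, and that thread frees the object.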

What do these x86 Assembly instruction codes mean?

You'll want to see Intel® 64 and IA-32 Architectures Software Developer Manuals.

"slash x" denotes that part of the instruction is encoded in the opcode (reg) part of the modr/m byte. See Vol 2A Chapter 2 INSTRUCTION FORMAT.

"ib" and "id" mean "immediate byte" and "immediate dword" respectively. You can see all the abbreviations in Vol 2A Appendix A.2 OPCODE MAP / KEY TO ABBREVIATIONS.

What does the operand of this mov instruction underlined in this image mean?

See above, _a$ and $T3853 are symbols defined to value 12 and 8 respectively. So

mov ecx, DWORD PTR _a$[esp-4]

is the same as

mov ecx, DWORD PTR 12[esp-4]

or

mov ecx, DWORD PTR [esp-4+12]

or

mov ecx, DWORD PTR [esp+8]

