What does the lock instruction mean in x86 assembly?
LOCK is not an instruction itself: it is an instruction prefix, which applies to the following instruction. That instruction must be something that does a read-modify-write on memory (INC, XCHG, CMPXCHG, etc.); in this case it is the incl (%ecx) instruction, which increments the long word at the address held in the ecx register.
The LOCK prefix ensures that the CPU has exclusive ownership of the appropriate cache line for the duration of the operation, and provides certain additional ordering guarantees. This may be achieved by asserting a bus lock, but the CPU will avoid this where possible. If the bus is locked, then it is only for the duration of the locked instruction.
This code copies the address of the variable to be incremented off the stack into the ecx register, then it does lock incl (%ecx) to atomically increment that variable by 1. The next two instructions set the eax register (which holds the return value from the function) to 0 if the new value of the variable is 0, and 1 otherwise. The operation is an increment, not an add (hence the name).
LOCK prefix of Intel instruction. What is the point?
I certainly would not call lock useless. lock cmpxchg is the standard way to perform compare-and-swap, which is the basic building block of many synchronization algorithms.
Also, see fetch-and-add.
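A sketch of both building blocks in C, assuming GCC or Clang inline asm on x86 (with __atomic builtin fallbacks elsewhere); the names cas_int, fetch_and_add and atomic_max are hypothetical. cas_int wraps lock cmpxchg, fetch_and_add wraps lock xadd, and atomic_max shows the usual CAS retry loop that synchronization algorithms build on top of compare-and-swap.

```c
#include <assert.h>
#include <stdbool.h>

/* Compare-and-swap: if *p == expected, store desired and return true. */
static bool cas_int(int *p, int expected, int desired)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned char ok;
    __asm__ __volatile__(
        "lock cmpxchgl %3, %1\n\t"  /* compare EAX with *p; swap if equal */
        "sete %0"                   /* ZF=1 means the swap happened */
        : "=q"(ok), "+m"(*p), "+a"(expected)
        : "r"(desired)
        : "memory", "cc");
    return ok;
#else
    return __atomic_compare_exchange_n(p, &expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
}

/* Fetch-and-add: atomically add delta to *p and return the old value. */
static int fetch_and_add(int *p, int delta)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__(
        "lock xaddl %0, %1"  /* tmp = *p + delta; delta = *p; *p = tmp */
        : "+r"(delta), "+m"(*p)
        :
        : "memory", "cc");
    return delta;            /* XADD left the previous *p in the register */
#else
    return __atomic_fetch_add(p, delta, __ATOMIC_SEQ_CST);
#endif
}

/* A typical CAS retry loop: atomically raise *p to at least v.
   Returns the value seen before any update. */
static int atomic_max(int *p, int v)
{
    int old = __atomic_load_n(p, __ATOMIC_RELAXED);
    while (old < v && !cas_int(p, old, v))
        old = __atomic_load_n(p, __ATOMIC_RELAXED);  /* lost a race: reload */
    return old;
}
```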
8086 lock pin and ASM LOCK prefix how it works
If the only CPU in the system has the memory bus locked, no other device can read or change memory contents during that time, not even via DMA. (The same holds with multiple CPUs on a shared bus with no cache.) Therefore, no other memory operations at all can happen between the load and the store of, for example, lock add [di], ax, making it atomic with respect to any possible observer (other than a logic analyzer connected to the bus, which doesn't count).
Semi-related: "Can num++ be atomic for 'int num'?" describes how the lock prefix works on modern CPUs for cacheable memory: it provides RMW atomicity without a bus lock, just by hanging on to the cache line for the duration.
We call this a "cache lock"; all modern CPUs work this way for aligned locked operations, only doing an expensive bus lock on something like xchg [mem], ax that spans a boundary between two cache lines. Such a split lock hurts throughput on all cores, and is so expensive that modern CPUs have a way to make split locks always fault (without faulting on other unaligned loads/stores), as well as performance counters for them.
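Whether a locked access gets the cheap cache lock or the expensive split (bus) lock depends on whether the operand stays inside one cache line. A small C helper to check that, assuming 64-byte lines (typical for current x86 CPUs, but an assumption here); is_split_access is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. The real size can be queried
   (e.g. via CPUID) rather than hard-coded. */
#define CACHE_LINE 64u

/* True if an access of `size` bytes (size >= 1) at `addr` crosses a
   cache-line boundary, i.e. a LOCKed access there would be a split lock. */
static bool is_split_access(uintptr_t addr, size_t size)
{
    uintptr_t first_line = addr / CACHE_LINE;
    uintptr_t last_line = (addr + size - 1) / CACHE_LINE;
    return first_line != last_line;
}
```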
Fun fact: xchg [mem], reg has implicit lock semantics on 386 and newer. This is unfortunate, because it makes xchg too slow to use as a plain load/store swap when you're running low on registers. It didn't on 286 or earlier, unless you used lock xchg. This is possibly related to the fact that there were SMP 386 systems (with a primitive sequentially-consistent memory model). The modern x86 memory model applies to 486 and later SMP systems.
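Because of that implicit lock, a test-and-set spinlock can be built from a plain xchg with no lock prefix. A minimal sketch in C, assuming GCC or Clang on x86 (with an __atomic fallback elsewhere); the spinlock type and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int locked; } spinlock;   /* hypothetical type: 0 = free */

static bool spin_trylock(spinlock *l)
{
    int prev = 1;
#if defined(__x86_64__) || defined(__i386__)
    /* No LOCK prefix written: XCHG with a memory operand is locked anyway. */
    __asm__ __volatile__("xchgl %0, %1"
                         : "+r"(prev), "+m"(l->locked)
                         :
                         : "memory");
#else
    prev = __atomic_exchange_n(&l->locked, prev, __ATOMIC_ACQUIRE);
#endif
    return prev == 0;   /* we own it only if it was free before the swap */
}

static void spin_lock(spinlock *l)
{
    while (!spin_trylock(l)) {
        /* Wait with plain loads so waiters don't keep grabbing the line. */
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED))
            ;
    }
}

static void spin_unlock(spinlock *l)
{
    /* A release store; on x86 this compiles to an ordinary MOV. */
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
}
```

Production locks usually also add a pause hint and back-off in the wait loop; this sketch only shows the xchg-based acquire.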
How do I use the LOCK ASM prefix to read a value?
Use the XADD or MOV instruction instead of ADD!
See also the MFENCE, LFENCE and SFENCE instructions!
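A sketch of both suggestions in C, assuming GCC or Clang (read_via_xadd and read_via_mov are hypothetical names): a lock xadd of 0 returns the current value without changing it, while an aligned 32-bit load needs no lock at all.

```c
#include <assert.h>

/* 1) LOCK XADD of 0: a full atomic RMW that returns the current value
      without changing it (it still takes the cache line for writing). */
static int read_via_xadd(int *p)
{
#if defined(__x86_64__) || defined(__i386__)
    int v = 0;
    __asm__ __volatile__("lock xaddl %0, %1"   /* *p += 0; v = old *p */
                         : "+r"(v), "+m"(*p)
                         :
                         : "memory", "cc");
    return v;
#else
    return __atomic_fetch_add(p, 0, __ATOMIC_SEQ_CST);
#endif
}

/* 2) A plain aligned load: a single MOV on x86, already atomic for an
      aligned 32-bit value; ordinary x86 loads already have acquire ordering. */
static int read_via_mov(const int *p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}
```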
EDIT:
You can't use the LOCK prefix with ADD if the source operand is a memory operand!
From the "Intel® 64 and IA-32 Architectures Software Developer’s Manual":
The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If the LOCK prefix is used with one of these instructions and the source operand is a memory operand, an undefined opcode exception (#UD) may be generated. An undefined opcode exception will also be generated if the LOCK prefix is used with any instruction not in the above list. The XCHG instruction always asserts the LOCK# signal regardless of the presence or absence of the LOCK prefix.
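The quoted rule amounts to a simple lookup: the mnemonic must be in the list and the destination must be memory. A hypothetical C helper encoding just the list quoted above (it is not an assembler and ignores operand encodings):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mnemonics the quoted SDM passage lists as accepting LOCK. */
static const char *const lockable[] = {
    "ADD", "ADC", "AND", "BTC", "BTR", "BTS", "CMPXCHG", "CMPXCHG8B",
    "DEC", "INC", "NEG", "NOT", "OR", "SBB", "SUB", "XOR", "XADD", "XCHG",
};

/* True if LOCK may be used with `mnemonic` given whether its destination
   is memory; other combinations may raise #UD. */
static bool lock_prefix_allowed(const char *mnemonic, bool dest_is_mem)
{
    if (!dest_is_mem)
        return false;
    for (size_t i = 0; i < sizeof lockable / sizeof lockable[0]; i++)
        if (strcmp(mnemonic, lockable[i]) == 0)
            return true;
    return false;
}
```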
EDIT2:
From the "Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A":
8.1.1 Guaranteed Atomic Operations
The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:
- Reading or writing a byte
- Reading or writing a word aligned on a 16-bit boundary
- Reading or writing a doubleword aligned on a 32-bit boundary
The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:
- Reading or writing a quadword aligned on a 64-bit boundary
- 16-bit accesses to uncached memory locations that fit within a 32-bit data bus
The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:
- Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line
Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.
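The tiers above can be folded into one check for ordinary cacheable memory on P6-family and newer CPUs. A C sketch, assuming 64-byte cache lines; plain_access_is_atomic is a hypothetical name and it models only the quoted rules for cached memory, not the uncached case:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumption: 64-byte cache lines */

/* Is a plain (non-LOCKed) load or store of `size` bytes at `addr`
   guaranteed atomic on P6-or-newer for cacheable memory? */
static bool plain_access_is_atomic(uintptr_t addr, size_t size)
{
    if (size != 1 && size != 2 && size != 4 && size != 8)
        return false;                      /* only these sizes are covered */
    if (addr % size == 0)
        return true;                       /* naturally aligned */
    /* P6+: unaligned 16/32/64-bit accesses are atomic within one line. */
    return addr / CACHE_LINE == (addr + size - 1) / CACHE_LINE;
}
```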
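So, for reading I prefer to use the CMPXCHG instruction with the LOCK prefix, like:
LOCK CMPXCHG [J], EAX
(The memory operand must be the destination. If [J] equals EAX, the same value is written back; otherwise [J] is loaded into EAX, so either way EAX ends up holding [J].)
For writing:
MOV [J], EAX
SFENCE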
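A C sketch of that read idiom, assuming GCC or Clang on x86 (locked_read is a hypothetical name). With EAX and the source register both 0, the instruction either rewrites an existing 0 or loads the current value into EAX:

```c
#include <assert.h>

/* Read *p with LOCK CMPXCHG, using 0 as both the expected and new value. */
static int locked_read(int *p)
{
#if defined(__x86_64__) || defined(__i386__)
    int value = 0;   /* EAX: the "expected" value */
    __asm__ __volatile__("lock cmpxchgl %2, %1"  /* Intel: cmpxchg [p], reg */
                         : "+a"(value), "+m"(*p)
                         : "r"(0)
                         : "memory", "cc");
    return value;    /* equal: EAX was already *p; else EAX = *p */
#else
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
#endif
}
```

It is still a locked read-modify-write, so it needs a writable location and takes the cache line exclusively.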
What is the scope of lock prefix?
The lock prefix affects a single instruction.
Without the prefix, instructions that read a memory operand, perform some operation on it (e.g. AND, XOR, INC) and then write it back are not atomic as seen by other CPUs sharing that memory. The lock prefix "locks" the memory location, so the 3 steps (read, modify, write) look like one, i.e. other CPUs can only observe what was before and what was after the locked instruction.
See the official CPU documentation from Intel or AMD.
EDIT: In your newly added example, if we're talking about interrupts, neither of those instructions can be interrupted: interrupts occur between entire instructions. The lock prefix makes the sub instruction atomic. The sete instruction is not intended to be atomic; it's there to transform the ZF flag into a zero or non-zero integer value.
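A C sketch of the same shape, assuming GCC or Clang on x86 (sub_and_test is a hypothetical name): only the lock subl is atomic, and sete merely copies the ZF it produced into a byte. A typical use is dropping a reference count and learning whether it reached zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Atomically subtract n from *p; return true if the result is zero. */
static bool sub_and_test(int *p, int n)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned char zero;
    __asm__ __volatile__(
        "lock subl %2, %0\n\t"  /* the atomic part: RMW on *p, sets ZF */
        "sete %1"               /* not atomic, just reads ZF from the SUB */
        : "+m"(*p), "=q"(zero)
        : "ir"(n)
        : "memory", "cc");
    return zero;
#else
    return __atomic_sub_fetch(p, n, __ATOMIC_SEQ_CST) == 0;
#endif
}
```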
What do these x86 Assembly instruction codes mean?
You'll want to see Intel® 64 and IA-32 Architectures Software Developer Manuals.
"Slash x" (/digit, e.g. /0) denotes that part of the instruction (an opcode extension) is encoded in the reg field of the ModR/M byte. See Vol 2A Chapter 2 INSTRUCTION FORMAT.
"ib" and "id" mean "immediate byte" and "immediate dword" respectively. You can see all the abbreviations in Vol 2A Appendix A.2 OPCODE MAP / KEY TO ABBREVIATIONS.
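To make the notation concrete, a small C sketch (helper names are hypothetical) that builds the ModR/M byte and encodes lock add dword [ecx], imm8 for 32-bit mode as F0 83 /0 ib. By the same rule, lock incl (%ecx) is F0 FF 01, since INC r/m32 is FF /0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ModR/M layout: mod (bits 7-6), reg (bits 5-3), r/m (bits 2-0).
   For a "/digit" opcode the reg field holds the digit, not a register. */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

enum { REG_ECX = 1 };   /* ECX's number in the r/m field */

/* Encode "lock add dword [ecx], imm8" (32-bit mode); returns the length. */
static size_t encode_lock_add_ecx_imm8(uint8_t out[4], int8_t imm)
{
    out[0] = 0xF0;                  /* LOCK prefix */
    out[1] = 0x83;                  /* ADD r/m32, imm8 (sign-extended) */
    out[2] = modrm(0, 0, REG_ECX);  /* mod=00: [ecx]; reg=/0 */
    out[3] = (uint8_t)imm;          /* "ib": one immediate byte */
    return 4;
}
```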
What does the operand of this mov instruction underlined in this image mean?
See above: _a$ and $T3853 are symbols defined to the values 12 and 8 respectively. So
mov ecx, DWORD PTR _a$[esp-4]
is the same as
mov ecx, DWORD PTR 12[esp-4]
or
mov ecx, DWORD PTR [esp-4+12]
or
mov ecx, DWORD PTR [esp+8]