How Large Is a Dword with 32- and 64-Bit Code

How large is a DWORD with 32- and 64-bit code?

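A DWORD is 32 bits in both 32- and 64-bit code. Strictly speaking, a word on a 32-bit computer is 32 bits, but the DWORD type is a leftover from the good old days of 16-bit, when a word was 16 bits and a double word was 32.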

To make it easier to port programs to the newer systems, Microsoft decided that none of the old types would change size.

You can find the official list here:
http://msdn.microsoft.com/en-us/library/aa383751(VS.85).aspx

All the platform-dependent types that changed with the transition from 32-bit to 64-bit end with _PTR (DWORD_PTR will be 32-bit on 32-bit Windows and 64-bit on 64-bit Windows).
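A minimal C sketch of that rule, assuming stand-in typedefs rather than the real Windows SDK definitions (the `_SKETCH` names are hypothetical, chosen so the block compiles on any platform). The DWORD stand-in is pinned at 4 bytes, while the DWORD_PTR stand-in follows the pointer width of whatever target you build for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Windows typedefs; the real ones come from
   the Windows SDK headers. */
typedef uint32_t  DWORD_SKETCH;      /* fixed width on every target */
typedef uintptr_t DWORD_PTR_SKETCH;  /* wide enough to hold a pointer */

/* The fixed-width type never changes size, whatever the target. */
static_assert(sizeof(DWORD_SKETCH) == 4, "DWORD stand-in must be 4 bytes");

/* Returns 1 when the pointer-sized integer is exactly as wide as a pointer,
   which is the case on common flat-memory 32- and 64-bit targets. */
int dword_ptr_matches_pointer(void)
{
    return sizeof(DWORD_PTR_SKETCH) == sizeof(void *);
}
```

Built as 32-bit code, `sizeof(DWORD_PTR_SKETCH)` would be 4; built as 64-bit code, 8. `sizeof(DWORD_SKETCH)` is 4 either way.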

64-bit Windows API: what is the size of a C/C++ DWORD?

The only things that change size between 32-bit and 64-bit are pointers. So DWORD stays 32 bits wide.

Some types are not obviously pointers, e.g. HANDLE, LPARAM, WPARAM. But these three change width too, because they can hold pointers.
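To illustrate why such types have to follow the pointer width, here is a small sketch. `LPARAM_SKETCH` is a hypothetical stand-in for LPARAM (which Windows defines as a pointer-sized signed integer), and both function names are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for LPARAM: a pointer-sized signed integer. */
typedef intptr_t LPARAM_SKETCH;

/* Pack a pointer into the pointer-sized integer and unpack it again, the way
   a message sender and receiver would. Returns 1 if the pointer survives. */
int pointer_survives_lparam(void *p)
{
    LPARAM_SKETCH packed = (LPARAM_SKETCH)p;
    void *unpacked = (void *)packed;
    return unpacked == p;
}

/* Address bits that a 32-bit integer would drop on this target:
   0 for a 32-bit build, 32 for a 64-bit build. */
unsigned bits_lost_in_dword(void)
{
    unsigned pointer_bits = (unsigned)(sizeof(void *) * CHAR_BIT);
    return pointer_bits > 32u ? pointer_bits - 32u : 0u;
}
```

If LPARAM had stayed 32 bits wide, a 64-bit build would drop the upper half of any pointer passed through it, which is why it had to grow.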

What's the size of a QWORD on a 64-bit machine?

In x86 terminology/documentation, a "word" is 16 bits because x86 evolved out of 16-bit 8086. Changing the meaning of the term as extensions were added would have just been confusing, because Intel still had to document 16-bit mode and everything, and instruction mnemonics like cwd (sign-extend word to dword) bake the terminology into the ISA.

x86 word = 2 bytes
x86 dword = 4 bytes (double word)
x86 qword = 8 bytes (quad word)
x86 double-quad or xmmword = 16 bytes, e.g. movdqa xmm0, [rdi].

Also in the cqo mnemonic, oct-word. (Sign-extend RAX into RDX:RAX, e.g. before idiv)
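As a rough illustration of what those sign-extension mnemonics compute, the following C sketch models the upper half produced by cwd and cqo. The function names are invented for this example, and the registers are modelled as plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* x86 naming: word = 2 bytes, dword = 4 bytes, qword = 8 bytes. */
static_assert(sizeof(uint16_t) == 2 && sizeof(uint32_t) == 4 &&
              sizeof(uint64_t) == 8, "word / dword / qword sizes");

/* cwd: sign-extend the word in AX into the dword DX:AX.
   Returns the value that ends up in DX. */
uint16_t cwd_dx(uint16_t ax)
{
    return (ax & 0x8000u) ? 0xFFFFu : 0x0000u;
}

/* cqo: sign-extend the qword in RAX into the oct-word RDX:RAX.
   Returns the value that ends up in RDX. */
uint64_t cqo_rdx(uint64_t rax)
{
    return (rax >> 63) ? UINT64_MAX : 0u;
}
```

In both cases the low half is left unchanged and the high half is filled with copies of the sign bit.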

And then we have fun instructions like punpcklqdq (shuffle together two qwords into a dqword), or pclmulqdq for carry-less multiplication of qwords, producing a dq full result. Beyond that, SIMD mnemonics tend to spell out sizes in bits, as in AVX2 vextracti128, or AVX-512 (with optional per-element masking) vextractf64x4 to extract the high 256 bits of a ZMM register.

Not to mention stuff like "tbyte" = 10 byte x87 extended-precision float; x86 is weird and not everything is a power of 2. Also 48-bit seg:off 16:32 far pointers in Protected mode. (Basically never used, just the 32-bit offset part.)

Most other 64-bit ISAs evolved out of 32-bit ISAs (AArch64, MIPS64, PowerPC64, etc.), or were 64-bit from the start (Alpha), so "word" means 32 bits in that context.

32-bit word = 4 bytes
dword = 8 bytes (double word), e.g. MIPS daddu is 64-bit integer add
qword = 16 bytes (quad word), if supported at all.

"Machine word" and putting labels on architectures.

The whole concept of "machine word" doesn't really apply to x86, with its machine-code format being a byte stream, and equal support for multiple operand-sizes, and unaligned loads/stores that mostly don't care about naturally aligned stuff, only cache line boundaries for normal cacheable memory.
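For C code that wants to rely on this, a common idiom is to go through memcpy, which keeps the access well defined in the language no matter how the address is aligned. This is only a sketch; the helper names are made up, and whether the compiler emits a single mov depends on the target and optimization level.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint32_t) == 4, "a dword is 4 bytes");

/* Load a dword from any byte address, aligned or not. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store a dword to any byte address, aligned or not. */
void store_u32_unaligned(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

A store followed by a load at the same odd offset (for example `buf + 1`) returns the value that was stored.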

Even "word oriented" RISCs can have a different natural size for registers and cache accesses than their instruction width, or what their documentation uses as a "word".

The whole concept of "word size" is over-rated in general, not just on x86. Even 64-bit RISC ISAs can load/store aligned 32-bit or 64-bit memory with equal efficiency, so pick whichever is most useful for what you're doing. Don't base your choice on figuring out which one is the machine's "word size", unless there's only one maximally efficient size (e.g. 32-bit on some 32-bit RISCs), then you can usefully call that the word size.

A "word" doesn't mean 64 bits on any 64-bit machine I've heard of. Even DEC Alpha AXP, which was designed from the ground up to be aggressively 64-bit, uses 32-bit instruction words. IIRC, the manual calls a word 32 bits.

Being able to load 64 bits into an integer register with a single instruction does not make that the "word size". Bitness and word size don't have hard specific technical meanings; most CPUs have multiple different sizes internally. (e.g. 64-byte buses between L2 and L1d cache on Intel since Haswell, along with 32-byte SIMD load/store.)

So it's basically up to the CPU vendor's documentation authors to choose what "word" (and thus dword / qword) mean for their ISA.

Fun fact: SPARC64 talks about "short word" (32 bits) vs. "long word" (64 bits), rather than word / double-word. I don't know if just "word" without any qualifier has any meaning in 64-bit SPARC documentation.

How many bits does a WORD contain in 32/64 bit OS respectively?

The concept of a "word" has several meanings. There are three embedded in the question.

The generic term "processor word", in context of CPU architectures
The "bit size" of software/OS, vs the "bit size" of hardware
The all-caps term WORD, meaning a 16 bit value - This is a part of the Windows "Win32" C language API

When describing the Win32 WORD type definition, this also comes up:

The Intel/AMD instruction set concept of a "Word", "Doubleword", and "Quadword"

The generic term "processor word", in context of CPU architectures

In common/generic usage, a "processor word" refers to the size of a processor register. It can also refer to the size of CPU instruction, or the size of a pointer (depending on which exact CPU architecture). In simple cases, a 32 bit processor will have a 32 bit "word" size (and pointer size). A 64 bit processor will have a 64 bit "word" size (and pointer size).

There is a Wikipedia article on this "processor word" concept, which details all the generic uses of the term, and the sizes for several current and historical CPU architectures.

"Bit size" of software/OS vs the "bit size" of hardware

A "64 bit" CPU and a "64 bit" OS are necessary in order to run "64 bit" software. This much is probably obvious.

"64 bit software" uses 64 bit instructions (e.g. adding 64 bit numbers together, or copying 64 bits of data from a processor register to RAM at the same time). It also can use a 64 bit pointer size. This means that instead of only being able to use a maximum of 4 Gigabytes of RAM (like "32 bit software"), it can theoretically use about 17 Billion Gigabytes of RAM (16 Exabytes).
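The arithmetic behind those figures can be checked with a few lines of C. This is a sketch with a made-up function name, and it uses binary units (1 GiB = 2^30 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* GiB addressable with a flat pointer of the given width: 2^bits / 2^30.
   The shift below is valid for widths from 30 to 93 bits; only 32 and 64
   matter for this discussion. */
uint64_t addressable_gib(unsigned pointer_bits)
{
    assert(pointer_bits >= 30u && pointer_bits <= 93u);
    return UINT64_C(1) << (pointer_bits - 30u);
}
```

`addressable_gib(32)` is 4, and `addressable_gib(64)` is 2^34 = 17,179,869,184, i.e. roughly 17 billion GiB or 16 EiB.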

A "64 bit" x64/x86 CPU can also run "32 bit" (or even "16 bit") software. It can do this without any changes to the code, and without having to rebuild the software. This is because all the old CPU instructions still exist on new CPUs, and they are backwards compatible.

These concepts aren't strictly the same as the generic concept of a "processor word", but are closely related.

Note: This concept starts getting slightly more complicated when you talk about older and more specialized processors (especially older video game systems), but the question wasn't really about those so I won't go into detail. Those tend to be talked about as "64 bit" or "8 bit" systems, but the truth is a bit more complicated than that. See the "processor word" wiki article I linked above, or an article about the specific system in question.

The question's specific context - `WORD`, in all-caps

The capitalization and the specific sizes in the question (16 bit for WORD, on a 32 bit OS) imply something different than the generic term "processor word".

In legacy Windows programming (the Win32 API), there is a type definition called WORD, which is 16 bits wide. This made sense when processors were 16-bit. However, even when you compile code that uses this type for a 32-bit or 64-bit target, it is still 16 bits. A DWORD in the Win32 API is 32 bits, and a QWORD is 64 bits.
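A short sketch of how these fixed widths relate to each other, using hypothetical `_SKETCH` stand-ins instead of the real Windows headers. The helpers mirror what the Win32 LOWORD and HIWORD macros do for a 32-bit value, but they are written here as ordinary functions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Win32 types, so this compiles anywhere. */
typedef uint16_t WORD_SKETCH;
typedef uint32_t DWORD_SKETCH;

/* These hold for 16-, 32- and 64-bit targets alike. */
static_assert(sizeof(WORD_SKETCH) == 2, "WORD stand-in is 16 bits");
static_assert(sizeof(DWORD_SKETCH) == 4, "DWORD stand-in is 32 bits");

/* Lower 16 bits of a DWORD. */
WORD_SKETCH low_word(DWORD_SKETCH d)
{
    return (WORD_SKETCH)(d & 0xFFFFu);
}

/* Upper 16 bits of a DWORD. */
WORD_SKETCH high_word(DWORD_SKETCH d)
{
    return (WORD_SKETCH)(d >> 16);
}
```

For example, `low_word(0x12345678)` is `0x5678` and `high_word(0x12345678)` is `0x1234`, regardless of the target's bitness.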

This is because Microsoft really tries very hard in their Win32 API to support backwards compatibility without having to do any changes to code. For the most part you can compile the Win32 samples from the Windows 95 era without changes, and they'll still work exactly the same way today.

Microsoft very likely inherited this naming scheme from Intel (and possibly AMD) documentation.

The Intel/AMD instruction set concept of a "Word", "Doubleword", etc

In Intel docs, a "Word" (Win32 WORD) is 16 bits. A "Doubleword" (Win32 DWORD) is 32 bits. A "Quadword" (Win32 QWORD) is 64 bits. The related assembly instruction names also reflect this naming scheme (e.g. MMX Add Packed Integers PADD instructions: PADDW, PADDD, PADDQ).
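To make the W and Q suffixes concrete, here is a plain-C emulation of what PADDW and PADDQ do to one 64-bit MMX-sized value. It is a sketch for illustration only (the function names are made up, and real code would use the instructions or intrinsics directly).

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint16_t) == 2, "a word lane is 2 bytes");

/* PADDW-style add: four independent 16-bit lanes, each wrapping around on
   overflow without carrying into its neighbour. */
uint64_t paddw_sketch(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        uint16_t sum = (uint16_t)(x + y);          /* wraps modulo 2^16 */
        result |= (uint64_t)sum << (16 * lane);
    }
    return result;
}

/* PADDQ-style add on one 64-bit lane: an ordinary wrapping 64-bit add,
   so carries travel across the whole qword. */
uint64_t paddq_sketch(uint64_t a, uint64_t b)
{
    return a + b;
}
```

With inputs `0x0001FFFF0001FFFF` and `0x0001000100010001`, the word-wise add gives `0x0002000000020000` (the `FFFF` lanes wrap to zero), while the qword add gives `0x0003000000030000` because the carries cross the 16-bit boundaries.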

For some examples, you can check the Wikipedia article on the x86 instruction set, or the Intel software development manuals.

This naming scheme doesn't necessarily make sense in terms of the general concept of a "processor word", since these concepts only address a part of a register. However they do make sense in terms of creating a stable programming interface for x86 programs. This is a big part of why you can use "32 bit" (and 16 bit) programs on top of a "64 bit" OS.

POINTER bit size

The size of a pointer is 4 bytes on a 32-bit runtime and 8 bytes on a 64-bit runtime.

The sentence you found in the documentation just says that the compiler expects a DWORD when you do the difference of 2 pointers.
Meaning, you will get that warning when you try to do something like this:

diTest := pTest - pTest2;

Here diTest is a DINT, and pTest and pTest2 are two pointers.

It also means you may lose information on 64-bit systems if you assign the difference of two pointers to a DWORD.
In fact, you will lose the upper 4 bytes.

DWORDs are 4 bytes long, and pointers on 64-bit systems are 8 bytes long.
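The loss can be shown with a small C model. This is not Structured Text and not the CODESYS API; addresses are modelled as 64-bit integers (an assumption made so the sketch behaves the same on any host), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Difference of two 64-bit addresses, kept at full width. */
uint64_t address_difference(uint64_t high, uint64_t low)
{
    assert(high >= low);
    return high - low;
}

/* What survives when that difference is stored in a 4-byte DWORD:
   only the lower 32 bits. */
uint32_t stored_in_dword(uint64_t difference)
{
    return (uint32_t)difference;
}
```

Two addresses that are `0x200000000` bytes apart produce a stored value of 0, while a small difference such as `0x6000` survives unchanged. That is why the bug can stay hidden until the pointers happen to be far apart.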

In order to store the addresses of your pointers in a way that is cross platform use the PVOID type, which is 4 bytes on 32 bit and 8 bytes on 64 bit systems. PVOID is available in the CAA Types library.

Alternatively, you can use __XWORD, as PVOID is an alias of __XWORD, which is converted into LWORD on 64-bit platforms and DWORD on 32-bit platforms.

Assembly 32-bit addressing size instead of 64-bit in 64-bit Mode

This would be unsafe on x86-64 MacOS for example, or in a Linux PIE executable. Program size isn't the only factor because it's not loaded starting at virtual address 0. The first byte of your program may be at something like 0x555555555000, so truncating an address to 32 bits would break your code no matter how small your program is.

(You'd get an invalid-relocation linker error from using [.data + rax*4] in that case, though, just from using .data as an absolute disp32; see "32-bit absolute addresses no longer allowed in x86-64 Linux?". But if you'd used [edi + eax*4] with a valid pointer in RDI, you could write code that would assemble but crash in a PIE executable or a MacOS executable.)

But yes, the default non-PIE Linux code model places all code and static data in the low 2GiB of virtual address space so 32-bit absolute sign- or zero-extended numbers can represent addresses.

Your data in memory is the same size regardless of how you address it, so your alternatives are

 movzx    eax, al
 mov      eax, DWORD [4 * eax + table_of_32bit_pointers]  ; pointless
 mov      eax, DWORD [4 * rax + table_of_32bit_pointers]  ; good

 ; RAX holds a zero-extended pointer.

mov rax, QWORD [8 * rax + .data] would load 8 bytes from a different location. You're still mixing up address size and operand-size.

Using compact 32-bit pointers in memory doesn't mean you have to use 32-bit address size when you load them.
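The same separation of storage size and address size can be sketched in C. One deliberate change from the assembly: the table holds 32-bit offsets into a pool rather than absolute 32-bit addresses, because casting a 32-bit integer to a pointer is not portable C. The names `pool`, `table_of_32bit_offsets` and `lookup` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Three NUL-terminated strings stored back to back. */
static const char pool[] = "zero\0one\0two";

/* Compact 4-byte entries instead of 8-byte pointers. */
static const uint32_t table_of_32bit_offsets[] = { 0, 5, 9 };

const char *lookup(uint8_t index)
{
    assert(index < 3);
    uint64_t wide_index = index;                          /* like movzx */
    uint64_t entry = table_of_32bit_offsets[wide_index];  /* 4-byte load, zero-extended */
    return pool + entry;                                  /* full-width address arithmetic */
}
```

Each table entry occupies 4 bytes in memory, yet every address calculation is done at the full pointer width of the target.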

Like I explained in your previous question there's no reason to use 32-bit address-size after zero-extending an index to 64-bit with movzx eax, al. (BTW, prefer movzx ecx, al; mov-elimination only works between different registers.)

BTW, if your strings are all the same length, or you can pad them to fixed length cheaply, you don't need a table of pointers. You can instead just compute the address from the start of the first string + scaled index. e.g. p = .DATA1 + idx*5 in this case, where your strings are 5 bytes long each.

lea  eax, [.DATA1 + RAX + RAX*4]    ; 4+1 = 5
; eax points at the selected 5-byte string buffer
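In C, the same idea is just pointer arithmetic on a flat array of fixed-size records. A minimal sketch, assuming three 5-byte records; the names and contents are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { RECORD_LEN = 5, RECORD_COUNT = 3 };

/* Three 5-byte records stored back to back: 4 characters plus a terminator
   each (the last terminator is the literal's implicit NUL). */
static const char records[RECORD_COUNT * RECORD_LEN] = "zero\0one \0two ";

/* No pointer table needed: the address is base + idx * 5,
   which is what the lea computes as base + idx + idx*4. */
const char *record_at(size_t idx)
{
    assert(idx < RECORD_COUNT);
    return records + idx * RECORD_LEN;
}
```

The trade-off is padding: shorter strings waste a few bytes, but the lookup avoids a memory load entirely.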

Also, don't use .data as a symbol name. It's the name of a section so that's going to get confusing.

is Dword still a word or is it two word

The size of a word is architecture-specific. It usually refers to a unit that the ISA handles natively. A doubleword, or DWORD, is simply a unit whose size is twice that of a word.

So if you are talking about an architecture where the size of a word is 16bit (e.g. Intel 8086), then DWORDs can hold 32bits of information. Since -123456 is FFFE1DC0 (w/ sign extension to 32bit), it can indeed be stored in one DWORD.
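The bit pattern is easy to verify in C. This sketch assumes a 32-bit DWORD as in the 8086 example above; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int32_t) == 4, "a 32-bit dword is 4 bytes");

/* Two's-complement bit pattern of a signed 32-bit value, as a DWORD
   would hold it. Signed-to-unsigned conversion is modulo 2^32 in C. */
uint32_t dword_bits(int32_t value)
{
    return (uint32_t)value;
}

/* Whether a signed value fits in one 32-bit DWORD at all. */
int fits_in_dword(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

`dword_bits(-123456)` yields `0xFFFE1DC0`, matching the value quoted above, whereas a 16-bit word could not hold this number.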
