How large is a DWORD with 32- and 64-bit code?
On 32-bit computers a machine word is actually 32 bits, but the DWORD type is a leftover from the good old days of 16-bit Windows.
To make it easier to port programs to the newer system, Microsoft decided that none of the old types would change size.
You can find the official list here:
http://msdn.microsoft.com/en-us/library/aa383751(VS.85).aspx
Many of the platform-dependent integer types that changed with the transition from 32-bit to 64-bit end with _PTR (DWORD_PTR is 32 bits on 32-bit Windows and 64 bits on 64-bit Windows).
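A minimal sketch of the difference, assuming a C11 compiler (for _Static_assert). So that it compiles without <windows.h>, the typedef names DWORD_SKETCH and DWORD_PTR_SKETCH and the helper address_fits_in_dword are hypothetical stand-ins that mirror the documented widths; on Windows you would use the real DWORD and DWORD_PTR.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Win32 types so this compiles anywhere;
   on Windows, include <windows.h> and use DWORD / DWORD_PTR instead. */
typedef uint32_t  DWORD_SKETCH;      /* DWORD: 32 bits on every Windows target */
typedef uintptr_t DWORD_PTR_SKETCH;  /* DWORD_PTR: unsigned, pointer-sized */

/* DWORD keeps its size in 32-bit and 64-bit builds... */
_Static_assert(sizeof(DWORD_SKETCH) == 4, "DWORD is always 4 bytes");
/* ...while DWORD_PTR follows the pointer size (4 or 8 bytes). */
_Static_assert(sizeof(DWORD_PTR_SKETCH) == sizeof(void *),
               "DWORD_PTR is pointer-sized");

/* Returns 1 if an address survives a round trip through a DWORD.
   In a 64-bit build this fails for any address at or above 4 GiB. */
int address_fits_in_dword(const void *p) {
    DWORD_PTR_SKETCH wide = (DWORD_PTR_SKETCH)p;
    DWORD_SKETCH narrow = (DWORD_SKETCH)wide;  /* keeps only the low 32 bits */
    return (DWORD_PTR_SKETCH)narrow == wide;
}
```

In a 32-bit build the helper always returns 1, because the pointer itself is only 32 bits wide.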
64-bit Windows API: what is the size of a C/C++ DWORD?
The only things that change size between 32-bit and 64-bit Windows are pointers (and pointer-sized types), so DWORD stays 32 bits wide.
Some types are not obviously pointers, e.g. HANDLE, LPARAM and WPARAM, but these three do change width because they are pointer-sized and can actually hold pointers.
What's the size of a QWORD on a 64-bit machine?
In x86 terminology/documentation, a "word" is 16 bits because x86 evolved out of 16-bit 8086. Changing the meaning of the term as extensions were added would have just been confusing, because Intel still had to document 16-bit mode and everything, and instruction mnemonics like cwd (sign-extend word to dword) bake the terminology into the ISA.
- x86 word = 2 bytes
- x86 dword = 4 bytes (double word)
- x86 qword = 8 bytes (quad word)
- x86 double-quad or xmmword = 16 bytes, e.g. movdqa xmm0, [rdi]. Also an oct-word in the cqo mnemonic (sign-extend RAX into RDX:RAX, e.g. before idiv).
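The cwd/cdq/cqo family copies the sign bit of AX/EAX/RAX into every bit of DX/EDX/RDX, producing a value of twice the width. A rough C model of the high half each instruction writes; the function names are hypothetical, not compiler intrinsics:

```c
#include <stdint.h>

/* cwd: sign-extend the word in AX into DX:AX; returns the new DX. */
uint16_t cwd_dx(int16_t ax) { return ax < 0 ? 0xFFFFu : 0u; }

/* cdq: sign-extend the dword in EAX into EDX:EAX; returns the new EDX. */
uint32_t cdq_edx(int32_t eax) { return eax < 0 ? 0xFFFFFFFFu : 0u; }

/* cqo: sign-extend the qword in RAX into RDX:RAX (an oct-word);
   returns the new RDX. Typically used before a signed idiv. */
uint64_t cqo_rdx(int64_t rax) { return rax < 0 ? UINT64_MAX : 0u; }
```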
And then we have fun instructions like punpcklqdq (shuffle together two qwords into a dqword) or pclmulqdq (carry-less multiplication of qwords, producing a full dq result). But beyond that, SIMD mnemonics tend to spell out element sizes in bits, e.g. AVX vextracti128, or AVX-512 vextractf64x4 (with optional per-element masking) to extract the high 256 bits of a ZMM register.
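To make "a full dq result" concrete: a carry-less product of two qwords can need up to 127 bits, so it has to be returned as a 16-byte dqword. Below is a slow, portable sketch of the arithmetic pclmulqdq performs on one qword from each source; it is not how the hardware does it, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* 128-bit result held as two qword halves (a "dqword"). */
typedef struct { uint64_t lo, hi; } dqword_sketch;

/* Carry-less (GF(2)) multiply: like schoolbook binary multiplication,
   but partial products are combined with XOR, so no carries propagate. */
dqword_sketch clmul64_sketch(uint64_t a, uint64_t b) {
    dqword_sketch r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;                /* part that stays in the low qword */
            if (i != 0)
                r.hi ^= a >> (64 - i);     /* bits shifted out into the high qword */
        }
    }
    return r;
}
```

For example, 3 times 3 carry-lessly is 5 (binary 11 XOR 110 = 101), not 9.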
Not to mention stuff like "tbyte" = 10-byte x87 extended-precision float; x86 is weird and not everything is a power of 2. There are also 48-bit seg:off 16:32 far pointers in protected mode (basically never used; normally just the 32-bit offset part is).
Most other 64-bit ISAs evolved out of 32-bit ISAs (AArch64, MIPS64, PowerPC64, etc.), so "word" means 32 bits in that context.
- 32-bit word = 4 bytes
- dword = 8 bytes (double word), e.g. MIPS daddu is a 64-bit integer add
- qword = 16 bytes (quad word), if supported at all.
"Machine word" and putting labels on architectures.
The whole concept of a "machine word" doesn't really apply to x86: its machine-code format is a byte stream, it has equal support for multiple operand sizes, and its unaligned loads/stores mostly don't care about natural alignment, only about cache-line boundaries for normal cacheable memory.
Even "word oriented" RISCs can have a different natural size for registers and cache accesses than their instruction width, or what their documentation uses as a "word".
The whole concept of "word size" is over-rated in general, not just on x86. Even 64-bit RISC ISAs can load/store aligned 32-bit or 64-bit memory with equal efficiency, so pick whichever is most useful for what you're doing. Don't base your choice on figuring out which one is the machine's "word size", unless there's only one maximally efficient size (e.g. 32-bit on some 32-bit RISCs); then you can usefully call that the word size.
A "word" doesn't mean 64 bits on any 64-bit machine I've heard of. Even DEC Alpha AXP, which was 64-bit from the start and designed from the ground up to be aggressively 64-bit, uses 32-bit instruction words, and its manual keeps the VAX naming: a word is 16 bits, a longword 32 bits and a quadword 64 bits.
Being able to load 64 bits into an integer register with a single instruction does not make that the "word size". Bitness and word size don't have hard, specific technical meanings; most CPUs have multiple different sizes internally (e.g. 64-byte buses between L2 and L1d cache on Intel since Haswell, along with 32-byte SIMD loads/stores).
So it's basically up to the CPU vendor's documentation authors to choose what "word" (and thus dword / qword) mean for their ISA.
Fun fact: SPARC64 talks about "short word" (32 bits) vs. "long word" (64 bits), rather than word / double-word. I don't know if just "word" without any qualifier has any meaning in 64-bit SPARC documentation.
How many bits does a WORD contain in 32/64 bit OS respectively?
The concept of a "word" has several meanings. There are three meanings embedded in the question:
- The generic term "processor word", in the context of CPU architectures
- The "bit size" of software/OS, vs the "bit size" of hardware
- The all-caps term WORD, meaning a 16 bit value. This is part of the Windows "Win32" C language API
When describing the Win32 WORD type definition, this also comes up:
- The Intel/AMD instruction set concept of a "Word", "Doubleword", and "Quadword"
The generic term "processor word", in context of CPU architectures
In common/generic usage, a "processor word" refers to the size of a processor register. It can also refer to the size of a CPU instruction, or the size of a pointer (depending on which exact CPU architecture). In simple cases, a 32 bit processor will have a 32 bit "word" size (and pointer size). A 64 bit processor will have a 64 bit "word" size (and pointer size).
There is a Wikipedia article on this "processor word" concept, which details all the generic uses of the term, and the sizes for several current and historical CPU architectures.
"Bit size" of software/OS vs the "bit size" of hardware
A "64 bit" CPU and a "64 bit" OS are necessary in order to run "64 bit" software. This much is probably obvious.
"64 bit software" uses 64 bit instructions (e.g. adding 64 bit numbers together, or copying 64 bits of data from a processor register to RAM at the same time). It can also use a 64 bit pointer size. This means that instead of only being able to address a maximum of 4 GiB of memory (like "32 bit software"), it can theoretically address about 17 billion GiB (16 EiB).
A "64 bit" x64/x86 CPU can also run "32 bit" (or even "16 bit") software. It can do this without any changes to the code, and without having to rebuild the software. This is because all the old CPU instructions still exist on new CPUs, and they are backwards compatible.
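The 4 GiB and 16 EiB figures above follow directly from the pointer width: an n-bit pointer can name 2^n distinct byte addresses. (Current x86-64 CPUs implement fewer virtual-address bits than 64, which is why the 64-bit figure is theoretical.)

```latex
2^{32}\ \text{bytes} = 4\ \text{GiB}
\qquad
2^{64}\ \text{bytes} = 2^{4} \cdot 2^{60}\ \text{bytes} = 16\ \text{EiB} = 2^{34}\ \text{GiB} \approx 1.7 \times 10^{10}\ \text{GiB}
```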
These concepts aren't strictly the same as the generic concept of a "processor word", but are closely related.
Note: This concept starts getting slightly more complicated when you talk about older and more specialized processors (especially older video game systems), but the question wasn't really about those so I won't go into detail. Those tend to be talked about as "64 bit" or "8 bit" systems, but the truth is a bit more complicated than that. See the "processor word" wiki article I linked above, or an article about the specific system in question.
The question's specific context - WORD, in all-caps
The capitalization and the specific sizes in the question (16 bit for WORD, on a 32 bit OS) imply something different than the generic term "processor word".
In legacy Windows programming (the Win32 API), there is a typedef called WORD, the size of which is 16 bits. This made sense when processors were 16 bit. However, even when you compile code that uses this type for a 32 bit or 64 bit target, it will still be 16 bits. A DWORD in the Win32 API is 32 bits, and a QWORD is 64 bits.
This is because Microsoft tries very hard in the Win32 API to preserve backwards compatibility without requiring any changes to code. For the most part you can compile the Win32 samples from the Windows 95 era without changes, and they'll still work exactly the same way today.
Microsoft very likely inherited this naming scheme from Intel (and possibly AMD) documentation.
The Intel/AMD instruction set concept of a "Word", "Doubleword", etc
In Intel docs, a "Word" (Win32 WORD) is 16 bits. A "Doubleword" (Win32 DWORD) is 32 bits. A "Quadword" (Win32 QWORD) is 64 bits. The related assembly instruction names also reflect this naming scheme (e.g. the MMX/SSE2 "Add Packed Integers" instructions PADDW, PADDD and PADDQ).
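The size names matter because the same 64 bits of input give different results depending on the element size. A rough scalar model of what the packed-add instructions do to one 64-bit chunk; the function name is hypothetical and this is not an intrinsic:

```c
#include <stdint.h>

/* Portable model of a packed add on one 64-bit register, treating it as
   lanes of lane_bits bits: 16 = PADDW, 32 = PADDD, 64 = PADDQ.
   lane_bits must be 16, 32 or 64. Each lane wraps around on its own:
   carries never cross from one lane into the next. */
uint64_t packed_add_sketch(uint64_t a, uint64_t b, unsigned lane_bits) {
    if (lane_bits == 64)
        return a + b;                              /* a single qword lane */
    uint64_t mask = (UINT64_C(1) << lane_bits) - 1;
    uint64_t result = 0;
    for (unsigned shift = 0; shift < 64; shift += lane_bits) {
        uint64_t lane = ((a >> shift) + (b >> shift)) & mask;  /* wrap in-lane */
        result |= lane << shift;
    }
    return result;
}
```

Adding 1 to 0xFFFF gives 0 as words (the low word wraps) but 0x10000 as dwords or qwords.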
For some examples, you can check the Wikipedia article on the x86 instruction set, or the Intel software developer's manuals.
This naming scheme doesn't necessarily make sense in terms of the general concept of a "processor word", since these concepts only address a part of a register. However they do make sense in terms of creating a stable programming interface for x86 programs. This is a big part of why you can run "32 bit" programs on top of a "64 bit" OS (and 16 bit programs on a 32 bit OS).
POINTER bit size
The size of a pointer is 4 bytes on a 32-bit runtime and 8 bytes on a 64-bit runtime.
The sentence you found in the documentation just says that the compiler expects a DWORD when you take the difference of two pointers.
This means you will get that warning when you try to do something like this:
diTest := pTest - pTest2;
with diTest being a DINT and pTest and pTest2 being two pointers.
It also means you may lose information if you assign the difference of two pointers to a DWORD on 64-bit systems: the upper 4 bytes of the 8-byte result are discarded, because a DWORD is 4 bytes long while pointers on 64-bit systems are 8 bytes long.
To store addresses in a cross-platform way, use the PVOID type, which is 4 bytes on 32-bit and 8 bytes on 64-bit systems. PVOID is available in the CAA Types library.
Alternatively, you can use __XWORD, since PVOID is an alias of __XWORD, which is converted into LWORD on 64-bit platforms and DWORD on 32-bit platforms.
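A rough C analogy of the truncation, assuming a 64-bit target. The function names are hypothetical, and C's conversion to uint32_t only stands in for the CODESYS assignment to a DWORD, whose exact rules may differ:

```c
#include <stdint.h>

/* Hypothetical model: a pointer difference computed at 64-bit width
   is stored in a 32-bit DWORD. Conversion to uint32_t is modulo 2^32,
   so only the low 4 bytes survive. */
uint32_t store_diff_in_dword(int64_t diff) {
    return (uint32_t)diff;
}

/* Returns 1 if the DWORD still holds the exact difference: it must be
   non-negative (DWORD is unsigned) and fit in 32 bits. */
int dword_keeps_diff(int64_t diff) {
    return diff >= 0 && diff <= (int64_t)UINT32_MAX;
}
```

A difference of 4 GiB + 16 bytes comes back as just 16, and a negative difference turns into a large unsigned number.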
Assembly 32-bit addressing size instead of 64-bit in 64-bit Mode
This would be unsafe on x86-64 macOS, for example, or in a Linux PIE executable. Program size isn't the only factor, because the program isn't loaded starting at virtual address 0. The first byte of your program may be at something like 0x555555555000, so truncating an address to 32 bits would break your code no matter how small your program is.
(You'd get an invalid-relocation linker error from using [.data + rax*4] in that case, though, just from using .data as an absolute disp32; see "32-bit absolute addresses no longer allowed in x86-64 Linux?". But if you'd used [edi + eax*4] with a valid pointer in RDI, you could write code that would assemble but crash in a PIE executable or a macOS executable.)
But yes, the default non-PIE Linux code model places all code and static data in the low 2 GiB of virtual address space, so 32-bit absolute sign- or zero-extended numbers can represent addresses.
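Which addresses survive depends on how the 32-bit value gets extended. A small sketch of the two checks, with hypothetical helper names; 0x401000 is used as a typical non-PIE Linux address and 0x555555555000 as the PIE-style address from the first paragraph:

```c
#include <stdint.h>

/* Returns 1 if addr can be encoded as a disp32 in an x86-64 addressing
   mode: the CPU sign-extends disp32 to 64 bits, so this covers the low
   2 GiB and the top 2 GiB of the address space. */
int fits_sign_extended_disp32(uint64_t addr) {
    int64_t s = (int64_t)addr;
    return s >= INT32_MIN && s <= INT32_MAX;
}

/* Returns 1 if addr survives truncation to 32 bits followed by
   zero-extension, which is what writing a 32-bit register such as EDI does. */
int fits_zero_extended_32(uint64_t addr) {
    return addr <= UINT32_MAX;
}
```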
Your data in memory is the same size regardless of how you address it, so your alternatives are:
movzx eax, al
mov eax, DWORD [4 * eax + table_of_32bit_pointers] ; pointless: 32-bit address size needs a 67h prefix for no benefit
mov eax, DWORD [4 * rax + table_of_32bit_pointers] ; good
; RAX holds a zero-extended pointer.
Using mov rax, QWORD [8 * rax + .data] instead would load 8 bytes from a different location (the index is scaled by 8 instead of 4). You're still mixing up address size and operand-size.
Using compact 32-bit pointers in memory doesn't mean you have to use 32-bit address size when you load them.
Like I explained in your previous question, there's no reason to use 32-bit address-size after zero-extending an index to 64-bit with movzx eax, al. (BTW, prefer movzx ecx, al; mov-elimination only works between different registers.)
BTW, if your strings are all the same length, or you can pad them to fixed length cheaply, you don't need a table of pointers. You can instead just compute the address from the start of the first string + scaled index, e.g. p = .DATA1 + idx*5 in this case, where your strings are 5 bytes long each.
lea eax, [.DATA1 + RAX + RAX*4] ; 4+1 = 5
; eax points at the selected 5-byte string buffer
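The same base + idx*5 computation in C, as a sketch: the three-entry table and its contents are hypothetical, each string is NUL-padded to exactly 5 bytes, and idx + idx*4 mirrors the lea addressing mode above.

```c
#include <stddef.h>

/* Hypothetical table: three 4-character strings, each padded to 5 bytes
   (4 characters + NUL), stored back to back with no pointer table. */
static const char strings[] = "abcd\0" "efgh\0" "ijkl\0";

/* Address of entry idx = base + idx*5, computed as base + idx + idx*4,
   just like lea eax, [.DATA1 + RAX + RAX*4]. */
const char *select_string(size_t idx) {
    return strings + idx + idx * 4;
}
```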
Also, don't use .data as a symbol name. It's the name of a section, so that's going to get confusing.
Is a DWORD still a word, or is it two words?
The size of a word is architecture-specific; it usually refers to a unit that the ISA handles natively. A doubleword (DWORD) is merely a unit twice the size of a word.
So if you are talking about an architecture where a word is 16 bits (e.g. the Intel 8086), then a DWORD can hold 32 bits of information. Since -123456 is FFFE1DC0 in hexadecimal (two's complement, sign-extended to 32 bits), it can indeed be stored in one DWORD.
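The arithmetic can be checked with a short C sketch (the helper names are hypothetical): converting -123456 to a 32-bit unsigned value gives its two's-complement bit pattern, and splitting that pattern shows the two 16-bit words an 8086 would keep in a register pair such as DX:AX.

```c
#include <stdint.h>

/* Two's-complement bit pattern of a 32-bit signed value (a DWORD). */
uint32_t dword_bits(int32_t v) { return (uint32_t)v; }

/* On a machine with 16-bit words, a DWORD is made of two words,
   e.g. DX holds the high word and AX the low word. */
uint16_t high_word(uint32_t d) { return (uint16_t)(d >> 16); }
uint16_t low_word(uint32_t d)  { return (uint16_t)(d & 0xFFFFu); }
```

For -123456 this gives 0xFFFE1DC0, i.e. high word 0xFFFE and low word 0x1DC0.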