Can You Enter X64 32-Bit "Long Compatibility Sub-Mode" Outside of Kernel Mode

Am I guaranteed to not encounter non-64-bit instructions if there are no compatibility mode switches in x86-64?

Every sequence of bytes of machine code either decodes as instructions or raises a #UD illegal-instruction exception. With the CPU in 64-bit mode, that means they're decoded as 64-bit mode instructions if they don't fault. See also Is x86 32-bit assembly code valid x86 64-bit assembly code? (no, not in general).

If it's a normal program emitted by a compiler, it's unlikely there are any illegal instructions in its machine code, unless someone used inline asm, or used your program to disassemble a non-code section. The other exception is an obfuscated program that puts partial instructions ahead of the actual jump target, so simple disassemblers get confused and decode with instruction boundaries different from the ones the CPU will actually use. x86 machine code is a byte stream that is not self-synchronizing.

TL;DR: in a normal program, yes, every sequence of bytes you encounter when disassembling decodes as valid 64-bit-mode instructions.

66 and 67 do not switch modes; they merely override the operand size (66) or the address size (67) for that one instruction. e.g. in 66 40 90, the 40 is still a REX prefix in 64-bit mode (for the NOP instruction that follows). So the whole sequence is just a nop (xchg ax,ax); the 66 doesn't make it decode as it would in 32-bit mode, as inc ax / xchg eax,eax.

Try assembling and then disassembling db 0x66, 0x40, 0x90 with nasm -felf32 then with nasm -felf64 to see how that same sequence decodes in 64-bit mode, not like it would in 32-bit mode.
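If you don't have nasm handy, the mode-dependence of those bytes can be sketched in a few lines of C. This is only an illustration, not a real decoder: classify_byte, cpu_mode and byte_kind are names made up here, and the function only looks at a single byte in the position where an instruction starts.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum cpu_mode { MODE_32BIT, MODE_64BIT };

enum byte_kind {
    KIND_OPERAND_SIZE_PREFIX,  /* 0x66 in either mode */
    KIND_ADDRESS_SIZE_PREFIX,  /* 0x67 in either mode */
    KIND_REX_PREFIX,           /* 0x40..0x4F in 64-bit mode */
    KIND_INC_REG,              /* 0x40..0x47 outside 64-bit mode */
    KIND_DEC_REG,              /* 0x48..0x4F outside 64-bit mode */
    KIND_OTHER                 /* anything this sketch doesn't cover */
};

/* Classify one byte found at the start of an instruction. */
enum byte_kind classify_byte(uint8_t b, enum cpu_mode mode)
{
    if (b == 0x66)
        return KIND_OPERAND_SIZE_PREFIX;
    if (b == 0x67)
        return KIND_ADDRESS_SIZE_PREFIX;
    if (b >= 0x40 && b <= 0x4F) {
        if (mode == MODE_64BIT)
            return KIND_REX_PREFIX;
        return (b < 0x48) ? KIND_INC_REG : KIND_DEC_REG;
    }
    return KIND_OTHER;
}
```

Calling classify_byte(0x40, MODE_64BIT) gives KIND_REX_PREFIX, while the same byte with MODE_32BIT gives KIND_INC_REG. The 0x66 byte is an operand-size prefix in both cases, which is the point: it changes the operand size, not the mode.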

Many instruction encodings are the same in both 32 and 64-bit mode, since they share the same default operand-size (for non-stack instructions). e.g. b8 39 30 00 00 mov eax,0x3039 is the code for mov eax, 12345 in either 32 or 64-bit mode.
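To see where those five bytes come from, here is a minimal C sketch of the mov r32, imm32 encoding (opcode B8 plus the register number, followed by the immediate in little-endian order). The function name encode_mov_r32_imm32 is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the 5-byte "mov r32, imm32" encoding into out[0..4].
   reg is the register number 0..7 (0 = eax).
   Returns the number of bytes written. */
size_t encode_mov_r32_imm32(uint8_t out[5], unsigned reg, uint32_t imm)
{
    out[0] = (uint8_t)(0xB8 + (reg & 7));     /* opcode B8+rd */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(imm >> (8 * i)); /* little-endian immediate */
    return 5;
}
```

With reg = 0 and imm = 12345 this fills the buffer with b8 39 30 00 00, the same bytes in 32-bit and 64-bit mode.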

(When you say "64-bit instruction", I hope you don't mean 64-bit operand-size, because 64-bit mode doesn't imply that. All operand-sizes from 8 to 64-bit are encodeable in 64-bit mode for most instructions.)

And yes, it's safe to assume that user-space programs don't switch modes by doing a far jmp. Unless you're on Windows: there the WOW64 DLLs do that for some reason instead of directly calling into the kernel. (On Linux, 32-bit user-space uses sysenter or another direct system-call instruction.)

Is it possible to use both 64 bit and 32 bit instructions in the same executable in 64 bit Linux?

Switching between 64-bit mode and compatibility mode (the two sub-modes of long mode) is done by changing CS. User mode code cannot modify the descriptor table, but it can perform a far jump or far call to a code segment that is already present in the descriptor table. I think that in Linux (for example) the required compatibility mode descriptor is present.

Here is sample code for Linux (Ubuntu). Build with

$ gcc -no-pie switch_mode.c switch_cs.s

switch_mode.c:

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>

extern bool switch_cs(int cs, bool (*f)());
extern bool check_mode();

int main(int argc, char **argv)
{
    int cs = 0x23;
    if (argc > 1)
        cs = strtoull(argv[1], 0, 16);
    printf("switch to CS=%02x\n", cs);

    bool r = switch_cs(cs, check_mode);

    if (r)
        printf("cs=%02x: 64-bit mode\n", cs);
    else
        printf("cs=%02x: 32-bit mode\n", cs);

    return 0;
}

switch_cs.s:

        .intel_syntax noprefix
        .code64
        .text
        .globl switch_cs
switch_cs:
        push    rbx
        push    rbp
        mov     rbp, rsp
        sub     rsp, 0x18

        mov     rbx, rsp
        movq    [rbx], offset .L1
        mov     [rbx+4], edi

        // Before the lcall, switch to a stack below 4GB.
        // This assumes that the data segment is below 4GB.
        mov     rsp, offset stack+0xf0
        lcall   [rbx]

        // restore rsp to the original stack
        leave
        pop     rbx
        ret

        .code32
.L1:
        call    esi
        lret

        .code64
        .globl check_mode
// returns false for 32-bit mode; true for 64-bit mode
check_mode:
        xor     eax, eax
        // In 32-bit mode, the REX.W byte (0x48) of this instruction
        // decodes on its own, so it executes as: dec eax; test eax, eax
        test    rax, rax
        setz    al
        ret

        .data
        .align  16
stack:  .space 0x100
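The value 0x23 that switch_mode.c uses by default is a segment selector, and its bit fields can be pulled apart with a small C sketch. decode_selector and selector_fields are made-up names. The layout (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0) is the architectural one, but which descriptor sits at a given index is an OS choice, so treat 0x23 as what this Linux example assumes rather than a constant.

```c
#include <stdint.h>

struct selector_fields {
    unsigned index;  /* descriptor table index, bits 15..3 */
    unsigned ti;     /* table indicator, bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level, bits 1..0 */
};

/* Split a 16-bit segment selector into its three fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1;
    f.rpl   = sel & 3;
    return f;
}
```

decode_selector(0x23) yields index 4 in the GDT with RPL 3, i.e. a user-mode selector. Passing a different selector on the command line (for example 33, which decodes to GDT index 6 with RPL 3) lets you compare how check_mode behaves for other descriptors.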

x86 32 bit opcodes that differ in x86-x64 or entirely removed

Almost all instructions that are available in both modes have the same opcodes in both modes.

Removed instructions:

Binary-coded-decimal stuff like AAM (ASCII-adjust after multiplication) for fixing up binary-coded-decimal after doing normal binary add/sub/mul/div on a register holding two base-10 digits in each 4-bit half. They ran slowly anyway, and weren't used. Storing numbers as binary integers instead of BCD is widespread.
push of CS/DS/ES/SS and pop of DS/ES/SS were removed. push/pop FS and GS are still valid (those two segments can still have a non-zero base in long mode). mov Sreg, r32 and mov r32, Sreg are still available for the "neutered" segment registers, so you can emulate push / pop using a scratch integer reg. CS still matters; a far jump to another code segment can switch to 32-bit mode, and the others still need valid segment descriptors.
Other obscure segment stuff like ARPL: Adjust RPL Field of Segment Selector. It's really just a bit-field clamp and set-flags instruction for integer registers, so it could be emulated by a few other instructions in the rare places where a kernel might want it.
Maybe some other obscure or privileged instructions that compilers never used in 32-bit code. (Not that compilers ever emitted any of the above either, without intrinsics or inline asm.)

Removed (repurposed) encodings of some still-available instructions: In your case, 32-bit can use the inc r32 single-byte opcodes (0x40 + register-number). 64-bit mode only has the inc r/m32 encoding, where the register to be incremented is specified with a 2nd byte. (In this case, the 0x4x bytes were repurposed as the REX prefix byte.)

Intel's insn reference (follow the link in the x86 tag wiki) shows the following for inc:

Opcode   Instruction   Op/En   64-Bit Mode   Compat/Leg Mode   Description
FF /0    INC r/m32     M       Valid         Valid             Increment r/m doubleword by 1.
40+ rd   INC r32       O       N.E.          Valid             Increment doubleword register by 1.

N.E. means not encodable. The Op/En column describes how operands are encoded.
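The bytes 0x40..0x4F that are N.E. as inc/dec in 64-bit mode carry four flag bits in their low nibble when read as a REX prefix (bit pattern 0100WRXB). A rough C sketch of pulling those bits out follows; decode_rex and rex_bits are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

struct rex_bits {
    bool w;  /* 1 = 64-bit operand size */
    bool r;  /* extends the ModRM.reg field */
    bool x;  /* extends the SIB.index field */
    bool b;  /* extends ModRM.rm, SIB.base, or the opcode register field */
};

/* Returns true and fills *out if byte has the REX pattern 0100WRXB,
   false (leaving *out untouched) otherwise. */
bool decode_rex(uint8_t byte, struct rex_bits *out)
{
    if ((byte & 0xF0) != 0x40)
        return false;
    out->w = (byte >> 3) & 1;
    out->r = (byte >> 2) & 1;
    out->x = (byte >> 1) & 1;
    out->b = byte & 1;
    return true;
}
```

For example, 0x48 decodes as W=1 with R, X and B clear, which is the prefix an assembler emits for a plain 64-bit operand-size instruction on the low eight registers.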

Jan Hubicka's AMD64 ISA overview briefly describes the repurposing of single-byte inc/dec opcodes for REX prefixes, and the default operand sizes and how immediate data is still 32-bit. movabs is available for loading 64-bit immediate constants, or load/store from/to a 64-bit absolute address.

AMD's AMD64 manual, Section 2.5.11 Reassigned Opcodes has a table which is quite short. It only lists:

4x inc/dec r32 that turned into REX prefixes
63 ARPL that became MOVSXD (sign-extend dword to qword when used with REX.W=1, i.e. with the W bit set in the REX prefix).

Early AMD64 and Intel EM64T CPUs left out SAHF/LAHF in long mode, but later re-added those instructions with the same opcodes as in 32-bit. That table also doesn't list instructions that were removed entirely (the BCD instructions and maybe others) to make room for possible future extensions.

They could have simplified things a lot, and made x86-64 a much cleaner instruction set with more room for future extensions, but every difference from 32-bit means more decoder transistors. There are no machine instructions that moved to a different opcode in 64-bit.

Multiple machine instructions often share the same asm mnemonic, mov being the most overloaded one. There are loads, stores, mov with immediate-constants, move to/from segment registers, all in 8-bit and 32-bit. (16-bit is the 32-bit with an operand-size prefix, same for 64-bit with a REX prefix.) There's a special opcode for loading RAX from a 64-bit absolute address. There's also a special opcode for loading a 64-bit immediate-constant into a register. (AT&T syntax calls this movabs, but it's still just mov in Intel/NASM)
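As an illustration of the 64-bit-immediate form, here is a hedged C sketch of the mov r64, imm64 encoding: a REX prefix with W set (plus B for r8..r15), opcode B8 plus the low three register bits, then eight immediate bytes in little-endian order. encode_mov_r64_imm64 is a made-up name, not part of any assembler library.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the 10-byte "mov r64, imm64" encoding into out[0..9].
   reg is the register number 0..15 (0 = rax, 8 = r8).
   Returns the number of bytes written. */
size_t encode_mov_r64_imm64(uint8_t out[10], unsigned reg, uint64_t imm)
{
    out[0] = (uint8_t)(0x48 | ((reg >> 3) & 1)); /* REX.W, plus REX.B for r8..r15 */
    out[1] = (uint8_t)(0xB8 + (reg & 7));        /* opcode B8+rd */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(imm >> (8 * i));  /* little-endian immediate */
    return 10;
}
```

With reg = 0 and imm = 0x1122334455667788 the buffer holds 48 b8 88 77 66 55 44 33 22 11. Without the REX.W byte, the same B8 opcode is the ordinary 5-byte mov eax, imm32.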

Is it possible to run 16 bit code in an operating system that supports Intel IA-32e mode?

16-bit DOS apps can't run under 64-bit Windows, because virtual-8086 mode isn't available in long mode.

However, 16-bit protected mode is still available, so technically it's possible to run 16-bit Windows 3.x apps. That's how Wine runs 16-bit Windows apps in 64-bit Linux. Unfortunately 64-bit Windows doesn't have the same capability. The reason isn't that a CPU in long mode cannot run 16-bit instructions, but that the number of significant bits in a handle has increased.

The primary reason is that handles have 32 significant bits on 64-bit Windows. Therefore, handles cannot be truncated and passed to 16-bit applications without loss of data.
https://learn.microsoft.com/en-us/windows/win32/winprog64/running-32-bit-applications
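The truncation problem in that quote is easy to show with a small C sketch. The function name and the sample values are invented for illustration; real Windows handle values aren't specified here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a handle with 32 significant bits would come back
   unchanged after being squeezed through a 16-bit value. */
bool survives_16bit_round_trip(uint32_t handle)
{
    uint16_t truncated = (uint16_t)handle;  /* what a 16-bit app could hold */
    return (uint32_t)truncated == handle;
}
```

A value like 0x1234 round-trips, but 0x00012345 comes back as 0x2345, so a 16-bit app would hand the system a different handle than the one it was given.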

So if you want to run 16-bit apps on 64-bit Windows, you have to use a virtual machine.

For more detailed information please read Peter Cordes' answer

See also Can a 64-bit computer (x86) run a 16-bit OS natively, without emulation?

Difference between far JMP and far CALL in a long 64-bit mode

An inter-privilege-level far call is possible in 64-bit mode through a call gate. The code segment descriptor specified by the target call gate must be non-conforming and its DPL must be numerically smaller than the CPL. Then the new CPL is set to that DPL. On the other hand, inter-privilege-level control transfer is not possible with a far jump. That is, if a far jump goes through a call gate whose code segment descriptor is non-conforming with DPL < CPL, then a general protection (GP) exception occurs.
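The rule above can be written down as a small decision function. This is a simplified model, assuming only the DPL-versus-CPL check on the gate's target code segment matters. It ignores the gate's own DPL, the RPL, segment-present checks and everything else the CPU verifies. All names are hypothetical.

```c
#include <stdbool.h>

enum far_transfer { FAR_CALL, FAR_JMP };
enum gate_result  { SAME_PRIVILEGE, MORE_PRIVILEGE, FAULT_GP };

/* Model of the privilege check on the code segment a call gate points to.
   Lower numbers mean more privilege (0 = kernel, 3 = user). */
enum gate_result gate_target_check(enum far_transfer kind, bool conforming,
                                   unsigned cpl, unsigned target_dpl)
{
    if (target_dpl > cpl)
        return FAULT_GP;        /* never transfer to less-privileged code */
    if (conforming || target_dpl == cpl)
        return SAME_PRIVILEGE;  /* CPL stays as it is */
    /* Non-conforming target with DPL < CPL: only a far CALL may do this. */
    return (kind == FAR_CALL) ? MORE_PRIVILEGE : FAULT_GP;
}
```

So gate_target_check(FAR_CALL, false, 3, 0) reports a switch to more privilege, while the same arguments with FAR_JMP report a #GP.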

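Through a call gate, you can't JMP to a non-64-bit code segment from a 64-bit segment: in long mode the gate's target must be a 64-bit code segment, otherwise a GP occurs. (A far jump directly to a compatibility-mode code segment descriptor, without a gate, is a different case.)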