What Is the Difference Between arm64 and armhf

What is the difference between arm64 and armhf?

armhf stands for "ARM hard float" and is the name of the Debian port for ARM processors (ARMv7 and later) that have hardware floating-point support.

On the BeagleBone Black, for example:

:~$ dpkg --print-architecture
armhf

Other commands (such as uname -a or arch) will just show armv7l. The CPU details are in /proc/cpuinfo:

:~$ cat /proc/cpuinfo 
processor       : 0
model name      : ARMv7 Processor rev 2 (v7l)
BogoMIPS        : 995.32
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls
...

The vfpv3 entry listed under Features is what indicates the hardware floating-point support.
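The check described above can also be scripted. The following is a minimal sketch, assuming a 32-bit ARM kernel whose /proc/cpuinfo has a Features line like the one shown; the function names cpu_features and has_hardware_float are made up for illustration, and the sample text is copied from the output above so the snippet runs on any machine.

```python
def cpu_features(cpuinfo_text):
    """Return the set of flags on the first 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def has_hardware_float(cpuinfo_text):
    """True if any VFP flag (vfp, vfpv3, ...) appears in the Features line."""
    return any(flag.startswith("vfp") for flag in cpu_features(cpuinfo_text))


# Sample text copied from the BeagleBone Black output shown above.
SAMPLE = """\
processor       : 0
model name      : ARMv7 Processor rev 2 (v7l)
BogoMIPS        : 995.32
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls
"""

if __name__ == "__main__":
    print(sorted(cpu_features(SAMPLE)))
    print(has_hardware_float(SAMPLE))
```

On a real board you would pass open("/proc/cpuinfo").read() instead of SAMPLE. Note that 64-bit ARM kernels report different flag names, so this sketch only applies to the armv7l case discussed here.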

Incidentally, armhf, if your processor supports it, basically supersedes Raspbian, which if I understand correctly was mainly a rebuild of armhf for the ARMv6 processor of the original Raspberry Pi. That chip has a hardware floating-point unit (VFPv2) but falls below the ARMv7 baseline that Debian's armhf port requires. Nowadays, of course, there's a whole ecosystem built up around Raspbian, so they're probably not going to abandon it. However, this is partly why the BeagleBone runs straight Debian, and that's fine even if you're used to Raspbian, unless you want some of the special bundled non-free software such as Mathematica.

Is arm64 better at compatibility than armhf?

Yes, arm64 is the better choice for compatibility with the available Docker images. A process in an arm64 image can also use more than 4 GB of memory.

As of 22 February 2022 on hub.docker.com:

123,643 ARM64 images
69,567 ARM images
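The 4 GB figure mentioned above follows directly from pointer width. Here is a small worked calculation; the helper name address_space_bytes is illustrative only, and the 48-bit value is an assumption about a common arm64 Linux configuration rather than something stated in the answer.

```python
GIB = 2 ** 30  # bytes in one gibibyte
TIB = 2 ** 40  # bytes in one tebibyte


def address_space_bytes(pointer_bits):
    """Number of distinct byte addresses a pointer of this width can express."""
    return 2 ** pointer_bits


if __name__ == "__main__":
    # armhf: 32-bit pointers give a hard ceiling of 4 GiB per process.
    print(address_space_bytes(32) // GIB, "GiB")
    # arm64: assuming 48-bit virtual addresses (a common configuration).
    print(address_space_bytes(48) // TIB, "TiB")
```

In practice a 32-bit process gets less than the full 4 GiB, because the kernel reserves part of the address space for itself.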

Which Debian architecture, armel or armhf, do I use to emulate a BeagleBone Black?

The armel architecture supports the ARMv4 instruction set.
This architecture handles floating-point computation in a compatibility mode, which slows performance but allows compatibility with code written for processors without floating-point units.
So you can use the armel architecture to build highly compatible systems.

The armhf architecture supports the ARMv7 platform and later, and it adds direct hardware floating-point support.
This means the armhf architecture is faster than armel, but it lacks compatibility with the older architectures.

Source: http://www.xappsoftware.com/wordpress/2013/01/31/armhf-versus-armel/
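The armel/armhf trade-off described above can be written down as a small decision rule. This is only a sketch: the function debian_arm_port and its parameters are hypothetical, and it assumes that armhf needs ARMv7 plus a VFPv3 floating-point unit while arm64 needs a 64-bit ARMv8 system.

```python
def debian_arm_port(arch_version, has_vfpv3, is_64bit=False):
    """Pick a Debian ARM port name from a few CPU properties.

    arch_version: ARM architecture version as an int (5, 6, 7, 8, ...)
    has_vfpv3:    True if the CPU reports VFPv3 (or newer) hardware floating point
    is_64bit:     True if the system runs in 64-bit (AArch64) mode
    """
    if is_64bit and arch_version >= 8:
        return "arm64"
    if arch_version >= 7 and has_vfpv3:
        return "armhf"
    # Older cores, or cores without a suitable FPU, fall back to the
    # slower but more compatible soft-float port.
    return "armel"


if __name__ == "__main__":
    # BeagleBone Black: ARMv7 with vfpv3 in /proc/cpuinfo.
    print(debian_arm_port(7, True))
```

For the BeagleBone Black (ARMv7 with vfpv3) the rule picks armhf, which matches what dpkg --print-architecture reports on the real board.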

Is there a performance advantage to ARM64?

I am not sure a general response can be given, but I can provide some examples of differences. There are of course additional differences added in version 8 of the ARM architecture, which apply regardless of target instruction set.

Performance-positive additions in AArch64

32 general-purpose registers give compilers more wiggle room.
I/D cache synchronization mechanisms are accessible from user mode (no system call needed).
Load/Store-Pair instructions make it possible to load 128 bits of data with one instruction and still remain RISC-like.
The removal of near-universal conditional execution makes more out-of-order execution possible.
The change in layout of NEON registers (D0 is still the lower half of Q0, but D1 is now the lower half of Q1 rather than the upper half of Q0) makes more out-of-order execution possible.
64-bit pointers make pointer tagging possible.
CSEL (conditional select) enables all kinds of crazy optimizations.
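To make the pointer-tagging item in the list above concrete, here is a simulation using plain integers. It is a sketch under stated assumptions: the 8-bit tag in the top byte is an illustrative layout, the names tag_pointer and untag_pointer are invented, and real code would do this in C or assembly on actual pointers.

```python
ADDRESS_BITS = 56                      # low 56 bits hold the address (assumed layout)
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def tag_pointer(address, tag):
    """Pack an 8-bit tag into the unused top byte of a 64-bit address."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag must fit in 8 bits")
    if address < 0 or address & ~ADDRESS_MASK:
        raise ValueError("address must fit in 56 bits")
    return (tag << ADDRESS_BITS) | address


def untag_pointer(tagged):
    """Split a tagged value back into (tag, address)."""
    return tagged >> ADDRESS_BITS, tagged & ADDRESS_MASK


if __name__ == "__main__":
    tagged = tag_pointer(0x00007FFFDEADBEE0, 0x2A)
    print(hex(tagged))
    print([hex(part) for part in untag_pointer(tagged)])
```

The trick works because 64-bit pointers have far more bits than current systems need for addressing, so the spare high bits can carry metadata such as a type or GC mark. With 32-bit pointers there are no spare high bits to borrow.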

Performance-negative changes in AArch64

More registers may also mean higher pressure on the stack.
Larger pointers mean larger memory footprint.
Removal of near-universal conditional execution may cause higher pressure on branch predictor.
Removal of load/store-multiple means more instructions needed for function entry/exit.

Performance-relevant changes in ARMv8-A

Load-Acquire/Store-Release semantics remove the need for explicit memory barriers for basic synchronization operations.

I probably forgot lots of things, but those are some of the more obvious changes.

How does the ARM architecture differ from x86?

ARM is a RISC (Reduced Instruction Set Computing) architecture while x86 is a CISC (Complex Instruction Set Computing) one.

The core difference between them in this respect is that ARM ALU instructions operate only on registers, with a few separate instructions for loading and storing data from/to memory, while x86 can use memory or register operands with ALU instructions, sometimes getting the same work done in fewer instructions. Sometimes x86 needs more instructions, because ARM has its own useful tricks, like loading a pair of registers in one instruction or using a shifted register as part of another operation. Up until ARMv8 / AArch64, ARM was a native 32-bit architecture, favoring four-byte operations over others.

So ARM is a simpler architecture, leading to small silicon area and lots of power-saving features, while x86 is a heavyweight in terms of both power consumption and production.

To answer your question "Is the x86 Architecture specially designed to work with a keyboard while ARM expects to be mobile?": x86 isn't specially designed to work with a keyboard, just like ARM isn't designed specifically for mobile. However, again because of the core architectural choices, x86 also has instructions to work directly with a separate IO address space, while ARM does not. Instead, ARM uses memory-mapped IO for everything, including reading/writing PCI IO space. (PCI IO space is rarely needed with modern devices because it's slow on x86. Modern USB controllers, for example, don't use it, so accessing USB-connected devices is as efficient as the USB controller makes it.)

If you need a document to quote, this is what the Cortex-A Series Programmer's Guide (4.0) says about the differences between RISC and CISC architectures:

An ARM processor is a Reduced Instruction Set Computer (RISC) processor. Complex Instruction Set Computer (CISC) processors, like the x86, have a rich instruction set capable of doing complex things with a single instruction. Such processors often have significant amounts of internal logic that decode machine instructions to sequences of internal operations (microcode). RISC architectures, in contrast, have a smaller number of more general purpose instructions, that might be executed with significantly fewer transistors, making the silicon cheaper and more power efficient. Like other RISC architectures, ARM cores have a large number of general-purpose registers and many instructions execute in a single cycle. It has simple addressing modes, where all load/store addresses can be determined from register contents and instruction fields.

ARM also provides a paper titled "Architectures, Processors, and Devices Development Article" describing how those terms apply to their business.

An example comparing instruction set architecture:

For example, if you needed some sort of bytewise memory comparison block in your application (generated by the compiler, skipping details), this is what it might look like on x86 when optimizing for code size over speed. (rep movsb / rep stosb are fast-ish on modern CPUs; the conditional-rep comparison instructions aren't.)

repe cmpsb         /* repeat while equal compare string bytewise */

while on ARM the shortest form might look like this (without error checking, or optimization for comparing multiple bytes at once, etc.):

top:
ldrb r2, [r0], #1  /* load a byte from address in r0 into r2, increment r0 after */
ldrb r3, [r1], #1  /* load a byte from address in r1 into r3, increment r1 after */
subs r2, r3, r2    /* subtract r2 from r3 and put result into r2      */
beq  top           /* branch(/jump) if result is zero                 */

which should give you a hint on how RISC and CISC instruction sets differ in complexity. Interestingly, x86 does not have write-back addressing modes (that load and increment the pointer) except via its "string" instructions like lodsd.
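For readers who don't read assembly, the logic of both snippets above can be sketched in a high-level language. This is an illustrative equivalent, not output from any compiler; the function name matching_prefix_length is invented, and unlike the ARM loop it also stops at the end of the buffers.

```python
def matching_prefix_length(a, b):
    """Compare two byte strings byte by byte; return how many leading bytes match."""
    i = 0
    # Mirrors the loop: load one byte from each buffer, advance, repeat while equal.
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return i


if __name__ == "__main__":
    print(matching_prefix_length(b"hello", b"help!"))
```

The x86 version folds the load, compare, pointer increment, and loop test into one instruction, while the ARM version spells each step out as a separate simple instruction.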
