Differences Between Arm "Versions" (Armv7 Only)

Differences between ARM versions? (ARMv7 only)

I would assume that it's indicating packages compiled for the little-endian soft-float ("armv7l") and little-endian hard-float ("armv7hl") ABIs as appropriate - i.e. it's a software thing and only tangentially related to the hardware.

In other words, you don't actually have an "armv7l" processor - you have an ARMv7 processor which may well have a hardware FPU (and can run big-endian if you really wanted to), but you happen to be running a soft-float userspace that doesn't rely on one being present - just like running an i686 distribution doesn't imply you're not on an x86_64 machine. Different Linux distributions have different names for their various ports but some trivial poking around suggests this case might be openSUSE's convention.

Differences between ARM architectures from a C programmer's perspective?

The ARM world is a bit messy.

For C programmers, things are simple: all ARM architectures offer a regular 32-bit programming model with flat addressing. As long as you stay with C source code, the only differences you may see concern endianness and performance. Most ARM processors (even old models) can run both big-endian and little-endian; the choice is then made by the logic board and the operating system. Good C code is endian-neutral: it compiles and works correctly regardless of the platform's endianness. Endian neutrality is good for reliability and maintainability, but also for performance: non-neutral code is code which accesses the same data through pointers of distinct sizes, and this wreaks havoc with the strict-aliasing rules that the compiler uses to optimize code.
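To make endian neutrality concrete, here is a minimal sketch (the function name is ours, not from the original answer) that decodes a 32-bit little-endian value byte by byte, so it compiles to correct code whichever endianness the platform uses, without breaking strict aliasing:

#include <stdint.h>

/* Endian-neutral decoding: assemble the value from individual bytes
   instead of casting the buffer to a uint32_t pointer. Works the same
   on big- and little-endian targets and never accesses the same data
   through pointers of distinct sizes. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}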

The situation is quite different if you consider binary compatibility (i.e. reusing code which has been compiled once):


  • There are several instruction sets:
    1. the original ARM instruction set with a 26-bit program counter (very old, very unlikely to be encountered nowadays)
    2. the ARM instruction set with a 32-bit program counter (often called "ARM code")
    3. the Thumb instruction set (16-bit simplified opcodes)
    4. the Thumb-2 instruction set (Thumb with extensions)

A given processor may implement several instruction sets. The newest processor which knows only ARM code is the StrongARM, an ARMv4 representative which is already quite old (15 years). The ARM7TDMI (ARMv4T architecture) knows both ARM and Thumb, as do almost all subsequent ARM systems except the Cortex-M line. ARM and Thumb code can be mixed together within the same application, as long as the proper glue is inserted where the conventions change; this is called Thumb interworking and can be handled automatically by the C compiler.
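As a rough sketch of what "handled automatically by the C compiler" means in practice (the file and function names here are hypothetical), ARM and Thumb objects can be compiled separately with GCC's -marm / -mthumb / -mthumb-interwork switches and linked together; the toolchain inserts the interworking glue at the boundaries:

/* thumb_part.c -- built as Thumb code, e.g.:
 *   arm-none-eabi-gcc -mthumb -mthumb-interwork -c thumb_part.c
 * The companion file arm_part.c would be built as ARM code:
 *   arm-none-eabi-gcc -marm -mthumb-interwork -c arm_part.c
 */
extern int arm_func(int x);   /* defined in the ARM-compiled object */

int thumb_func(int x)         /* compiled as Thumb */
{
    return arm_func(x) + 1;   /* cross-instruction-set call: the
                                 toolchain emits interworking glue */
}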

The Cortex-M0 knows only Thumb instructions, plus a few extensions: in "normal" ARM processors, the operating system must use ARM code (for handling interrupts), so the Cortex-M0 knows a few Thumb-for-OS things. This does not matter for application code.

The other Cortex-M cores know only Thumb-2. Thumb-2 is mostly backward-compatible with Thumb, at least at the assembly level.


  • Some architectures add extra instructions.

Thus, if some code is compiled with a compiler switch saying that it is for an ARMv6, the compiler may use one of the few instructions which the ARMv6 has but the ARMv5 does not. This is a common situation, encountered on almost all platforms: e.g., if you compile C code on a PC with GCC using the -march=core2 flag, then the resulting binary may fail to run on an older Pentium processor.
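A concrete ARM instance of the same effect (our illustration): the REV byte-reverse instruction was added in ARMv6, so the same C source compiles differently depending on -march:

#include <stdint.h>

/* With gcc -march=armv6 (or later) the compiler can emit the single
   ARMv6 REV instruction here; with -march=armv5te it must fall back
   to a sequence of shifts and ORs. A binary built for ARMv6 would
   hit an undefined instruction on an ARMv5 core. */
uint32_t swap_bytes(uint32_t x)
{
    return __builtin_bswap32(x);
}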


  • There are several call conventions.

The call convention is the set of rules which specify how functions exchange parameters and return values. The processor knows only of its registers and has no notion of a stack. The call convention tells which registers the parameters go in, and how they are encoded (e.g. if there is a char parameter, it goes in the low 8 bits of a register, but is the caller supposed to clear or sign-extend the upper 24 bits, or not?). It describes the stack structure and alignment, and it normalizes alignment conditions and padding for structure fields.
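For instance (a small sketch of the structure-layout rules): under AAPCS a 4-byte int must be 4-byte aligned within a structure, which dictates the padding and total size below:

#include <stddef.h>
#include <stdio.h>

struct example {
    char c;   /* offset 0 */
              /* 3 bytes of padding so that i is 4-byte aligned */
    int  i;   /* offset 4 */
};            /* sizeof(struct example) == 8 under AAPCS */

int main(void)
{
    printf("size=%zu, offset of i=%zu\n",
           sizeof(struct example), offsetof(struct example, i));
    return 0;
}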

There are two main conventions for ARM, called ATPCS (old) and AAPCS (new). They are quite different on the subject of floating point values. For integer parameters, they are mostly identical (but AAPCS requires a stricter stack alignment). Of course, conventions vary depending on the instruction set, and the presence of Thumb interworking.

In some cases, it is possible to have some binary code which conforms to both ATPCS and AAPCS, but that is not reliable and there is no warning on mismatch. So the bottom-line is: you cannot have true binary compatibility between systems which use distinct call conventions.


  • There are optional coprocessors.

The ARM architecture can be extended with optional elements, which add their own instructions to the core instruction set. The FPU is such an optional coprocessor (and it is very rarely encountered in practice). Another coprocessor is NEON, a SIMD instruction set found on some of the newer ARM processors.
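As a small sketch of coprocessor-specific code (our example), the NEON intrinsics from the standard arm_neon.h header add four single-precision floats in one SIMD operation; the resulting binary requires a NEON-capable core:

#include <arm_neon.h>

/* Adds four pairs of floats at once on the NEON unit. Compiled with
   e.g. gcc -mfpu=neon; the generated vector opcodes are undefined on
   a core without NEON. */
void add4(const float *a, const float *b, float *out)
{
    float32x4_t va = vld1q_f32(a);        /* load 4 floats from a */
    float32x4_t vb = vld1q_f32(b);        /* load 4 floats from b */
    vst1q_f32(out, vaddq_f32(va, vb));    /* vector add, then store */
}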

Code which uses a coprocessor will not run on a processor which does not feature that coprocessor, unless the operating system traps the corresponding opcodes and emulates the coprocessor in software (this is more or less what happens with floating-point arguments when using the ATPCS call convention, and it is slow).


To sum up, if you have C code, then recompile it. Do not try to reuse code compiled for another architecture or system.

Do different ARM manufacturers provide different instruction sets?

Go to http://infocenter.arm.com; along the left, under Contents, look for ARM Architecture, and under that, Reference Manuals. There used to be a single ARM ARM (ARM Architecture Reference Manual), but the family has grown to the point where they had to break it up into, well, families.

The ARM ARMs are going to show you the instruction sets. What I think they now call the ARMv5 manual is the old ARM ARM. There you will find the ARM instructions (32-bit) and Thumb instructions (16-bit). For each instruction they list which architectures support it, so you might see an ARMv5 instruction that is not supported by ARMv4 (ARMv4 being the ARM7 generation; the popular ARM7TDMI core is ARMv4T). Thumb instructions are supported by ARMv4T and newer, and so on.

So there is the core 32-bit ARM instruction set, which you may already be used to, with new instructions added from time to time and bugs/restrictions fixed (ldr r0,[r0] for example), etc.

The floating point unit has had one or two overhauls. Most cores do not have an FPU, and even when a core supports one, that doesn't mean the chip vendor included it in the chip. FPA is the older one, VFP the newer, and now there is the NEON stuff. If you pay attention, these all fall into the generic coprocessor instruction category, but you don't have to know or use the coprocessor encodings; there are mnemonics/aliases for everything.

There is/was this Java/Jazelle thing; same story, some cores might have it as an option, which doesn't mean the vendor included it.

There are at least two sets of thumb2 extensions to the thumb instruction set. Before the thumb2 extensions, the thumb instructions were all 16 bit and had a one-to-one mapping to an ARM instruction. That makes sense: you only need an ARM core, and the decoder translates the smaller instruction into an ARM instruction and feeds that to the core. All instructions are 16 bit except the long branch (BL), and if you look at its encoding you can quite easily decode it as two separate 16-bit instructions.

So then they decided to make their microcontroller offering smaller, instead of everyone just using the ARM7TDMI and consuming that chip size and power. The Cortex-M processors are Thumb-only: they do not support 32-bit ARM instructions, and there is no ARM core that the Thumb instructions are translated to; it is a new core. The ARMv6-M (a.k.a. Cortex-M0 and Cortex-M1) takes the thumb instruction set and adds a few 32-bit instructions to close the performance gap to ARM (thumb was smaller, yes, but a little slower than ARM if you compiled the same code to both; from my experiments it took something like 10-20% more instructions to use thumb). In theory thumb-2 (ARMv7-M) outperforms ARM when and where you can compare them. For whatever reason the Cortex-M3, which is ARMv7-M, came out first, and it has a bunch of 32-bit thumb2 instructions added to the thumb instruction set. I recently counted: ARMv6-M added something like 20 instructions, and ARMv7-M has something like 140-150 instructions added to the base thumb instruction set. Thumb2 is basically variable word length, and the Thumb-only processors (no 32-bit ARM mode at all) are the Cortex-M series.

Looking at it, it is almost like they rebuilt the ARM instruction set again under the name thumb: not completely, but you get back a lot of ARM-like instructions, three-register forms instead of two, being able to reach the higher registers and use immediates, etc. What this caused is a desire to write asm that assembles for both ARM and thumb/thumb2, so they came up with a unified syntax. You can write an instruction like

add r0,r1

If assembling for thumb, that is the instruction; if assembling for ARM, the assembler will convert it to

add r0,r0,r1

for you, instead of giving a syntax error. You have to specify that you are using the unified syntax, at least with the GNU binutils assembler (gas), where the directive is .syntax unified.

An equally important set of documents is the Technical Reference Manuals (TRMs), also at infocenter.arm.com. Each core has a TRM; actually, each revision of each core has a TRM. The extra-cost items like L2 caches also have their own TRM, for each revision. It is important to find out which core the chip vendor bought/used and, if possible, the revision (rev 2.0 = r2p0, rev 1.0 = r1p0, etc.), as there are programming differences as well as errata differences between them. (Don't trust Linux as a reference! It is a huge mess; every time I look, yet another company has completely misunderstood and misapplied core/errata differences. It is a bit of a disaster at the moment.)

Sometimes the TRM includes instruction information, or paints a clearer picture of what that core does and does not support. The ARM ARMs are generic: they cover the whole family, or a number of families of cores, where the TRM is very specific to one core. An example of the confusion between the ARM ARM and the TRM is that looking at the ARM ARM you might get the impression you can choose between the BE-32 and BE-8 big-endian modes; the reality is you have one or the other. ARMv6 and newer is BE-8, period, get used to it. ARMv4 and ARMv5 are BE-32, which before ARMv6 was just called big endian. I highly recommend NOT using big endian on an ARM, despite what you think you might gain from it: go with the native mode and you will save yourself a ton of work and failure. I mention this from personal experience, trying to figure out why the bits described in an ARM ARM just didn't work on the core I was using.

A 64-bit core is somewhere in the development phase; I wouldn't be surprised if it is done and just waiting for someone to pull the trigger and use it. Actually, the ARMv8 doc is available; downloading it now.

Short answer: at infocenter.arm.com, under ARM Architecture, you will find all the docs describing the different instruction sets as well as the improvements/additions made to those instruction sets over time.

What are the advantages of armv7 over armv6 when compiling iPhone apps?

One of the bigger differences is that armv6 devices have fully pipelined hardware support for double-precision floating-point arithmetic, while on armv7 devices double-precision arithmetic runs on a much slower, non-pipelined VFP unit.

To compensate, the armv7 architecture has a "NEON" unit that provides blindingly fast hardware support for single precision floating point arithmetic.

This is something you'll need to take into account if you're doing anything that involves floating point arithmetic, whether you're doing it in single or double precision. If you're doing it in double precision, but don't necessarily need that amount of precision, you can probably get a significant performance boost on armv7 devices by using single precision instead.
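As a sketch of the kind of change meant here (our example, not from the original answer), switching a hot loop from double to float, and from the double-precision library routine to its single-precision counterpart, keeps the work on the fast single-precision path:

#include <math.h>

/* Single-precision version: float arithmetic and sinf() are
   substantially cheaper on armv7 devices than the equivalent
   double + sin() loop would be. */
void apply_sine(float *samples, int n)
{
    for (int i = 0; i < n; i++)
        samples[i] = sinf(samples[i]);   /* sinf, not sin */
}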

Apple covered a lot of the differences between armv6 and armv7, along with an introduction to the Accelerate framework, in one of their WWDC sessions this year. The videos should still be available on iTunes (as of July '10).

How does the ARM architecture differ from x86?

ARM is a RISC (Reduced Instruction Set Computing) architecture while x86 is a CISC (Complex Instruction Set Computing) one.

The core difference between those in this respect is that ARM instructions operate only on registers, with a few instructions for loading and storing data from/to memory, while x86 can apply ALU instructions directly to memory or register operands, sometimes getting the same work done in fewer instructions, sometimes in more, because ARM has its own useful tricks, like loading a pair of registers in one instruction or using a shifted register as part of another operation. Up until ARMv8 / AArch64, ARM was a native 32-bit architecture, favoring four-byte operations over others.
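To make the load/store point concrete (our illustration), consider a C statement that modifies memory in place; x86 can fold the load, add and store into one instruction, while ARM must spell them out:

/* For *p += 1:
 *
 * x86 can encode this as a single instruction with a memory operand:
 *     add dword ptr [rdi], 1
 *
 * ARM, being a load/store architecture, needs three:
 *     ldr r3, [r0]
 *     add r3, r3, #1
 *     str r3, [r0]
 */
void increment(int *p)
{
    *p += 1;
}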

So ARM is a simpler architecture, leading to small silicon area and lots of power-saving features, while x86 ends up being a power beast in terms of both power consumption and production.

To answer your question "Is the x86 architecture specially designed to work with a keyboard while ARM expects to be mobile?": x86 isn't specially designed to work with a keyboard, just as ARM isn't designed specifically for mobile. However, again because of the core architectural choices, x86 has instructions for working directly with a separate I/O address space, while ARM does not. Instead, ARM uses memory-mapped I/O for everything, including reading/writing PCI I/O space. (Port I/O is rarely needed with modern devices anyway, since it is slow even on x86; a modern USB controller, for instance, is accessed through memory-mapped registers, so talking to USB-connected devices is as efficient as the controller itself allows.)
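As a sketch of what memory-mapped I/O looks like from C on ARM (the register address below is made up, purely for illustration): a device register is just a volatile pointer, and no special IN/OUT instructions are involved:

#include <stdint.h>

/* Hypothetical device register address -- illustrative only. */
#define UART_DATA_REG ((volatile uint32_t *)0x40001000u)

/* On ARM, talking to hardware is an ordinary store to a special
   address; x86 additionally offers IN/OUT instructions for its
   separate port I/O address space. */
static inline void uart_send(uint8_t byte)
{
    *UART_DATA_REG = byte;
}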

If you need a document to quote, this is what the Cortex-A Series Programmer's Guide (4.0) says about the differences between RISC and CISC architectures:

An ARM processor is a Reduced Instruction Set Computer (RISC) processor.

Complex Instruction Set Computer (CISC) processors, like the x86, have a rich instruction set capable of doing complex things with a single instruction. Such processors often have significant amounts of internal logic that decode machine instructions to sequences of internal operations (microcode).

RISC architectures, in contrast, have a smaller number of more general purpose instructions, that might be executed with significantly fewer transistors, making the silicon cheaper and more power efficient. Like other RISC architectures, ARM cores have a large number of general-purpose registers and many instructions execute in a single cycle. It has simple addressing modes, where all load/store addresses can be determined from register contents and instruction fields.

ARM also provides a paper titled Architectures, Processors, and Devices Development Article, describing how those terms apply to their business.

An example comparing the instruction set architectures:

For example, if you needed some sort of bytewise memory-comparison block in your application (generated by the compiler, skipping the details), this is how it might look on x86 when optimizing for code size over speed. (rep movsb / rep stosb are fast-ish on modern CPUs; the conditional-rep comparison instructions like repe cmpsb are not.)

repe cmpsb         /* repeat while equal compare string bytewise */

while on ARM the shortest form might look like this (without error checking, or optimizations such as comparing multiple bytes at once):

top:
ldrb r2, [r0], #1  /* load a byte from the address in r0 into r2, then increment r0 */
ldrb r3, [r1], #1  /* load a byte from the address in r1 into r3, then increment r1 */
subs r2, r3, r2    /* subtract r2 from r3, put the result in r2 and set the flags */
beq top            /* loop back while the bytes were equal (result was zero) */

which should give you a hint of how RISC and CISC instruction sets differ in complexity. Interestingly, x86 does not have write-back addressing modes (which load and increment the pointer in one instruction) except via its "string" instructions like lodsd.


