Alignment VS Stride in Swift

Here is a simple example:

struct Foo {
    let a: Int16
    let b: Int8
}

print(MemoryLayout<Foo>.size) // 3
print(MemoryLayout<Foo>.alignment) // 2
print(MemoryLayout<Foo>.stride) // 4

  • The alignment of the struct is the maximal alignment of all its
    fields, in this case the maximum of 2 and 1.
  • The stride of the struct is the size rounded up to alignment,
    here 3 rounded up to a multiple of 2, which is 4.

The stride is the distance between the starts of contiguous instances of the same type in memory:

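import Foundation

let array = [Foo(a: 1, b: 2), Foo(a: 3, b: 4), Foo(a: 5, b: 6)]
array.withUnsafeBytes {
    // 3 elements * stride 4 = 12 bytes; the 4th byte of each element is padding with an unspecified value
    print(Data($0) as NSData) // <01000234 03000474 0500066f>
    print($0.count) // 12
}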

The struct stride is a multiple of the struct alignment, so that
all instances (and therefore all instance fields) are properly aligned.
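Both properties can be checked on the array above with a minimal sketch: it measures each element's byte distance from the first element and tests whether each element's address is a multiple of the alignment. It repeats the Foo definition so it runs on its own, and it relies on Array storing its elements contiguously, which withUnsafeBufferPointer exposes.

```swift
struct Foo {
    let a: Int16
    let b: Int8
}

let array = [Foo(a: 1, b: 2), Foo(a: 3, b: 4), Foo(a: 5, b: 6)]

// For each element: its byte distance from element 0, and whether its
// address is a multiple of MemoryLayout<Foo>.alignment.
let (distances, allAligned) = array.withUnsafeBufferPointer { buffer -> ([Int], Bool) in
    let base = UnsafeRawPointer(buffer.baseAddress!)
    var offsets: [Int] = []
    var aligned = true
    for i in buffer.indices {
        let element = UnsafeRawPointer(buffer.baseAddress! + i)
        offsets.append(base.distance(to: element))
        aligned = aligned && Int(bitPattern: element) % MemoryLayout<Foo>.alignment == 0
    }
    return (offsets, aligned)
}

print(distances)  // [0, 4, 8], i.e. multiples of MemoryLayout<Foo>.stride
print(allAligned) // true
```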

The details can be found in Type Layout:

Fragile Struct and Tuple Layout

Structs and tuples currently share the same layout algorithm, noted as the "Universal" layout algorithm in the compiler implementation. The algorithm is as follows:

  • Start with a size of 0 and an alignment of 1.
  • Iterate through the fields, in element order for tuples, or in var
    declaration order for structs. For each field:

    • Update size by rounding up to the alignment of the field, that is,
      increasing it to the least value greater or equal to size and evenly
      divisible by the alignment of the field.
    • Assign the offset of the field to the current value of size.
    • Update size by adding the size of the field.
    • Update alignment to the max of alignment and the alignment of the field.
  • The final size and alignment are the size and alignment of the
    aggregate. The stride of the type is the final size rounded up to
    alignment.
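The algorithm is short enough to transcribe. The following is a minimal sketch, not the compiler's implementation: each field is reduced to a hypothetical FieldLayout holding only its size and alignment, and the result can be compared with MemoryLayout. One addition beyond the quoted text: Swift reports a stride of at least 1 even for zero-sized types (MemoryLayout<Void>.stride is 1), so the sketch clamps the stride the same way.

```swift
// Hypothetical description of one field: just its size and alignment in bytes.
struct FieldLayout {
    let size: Int
    let alignment: Int

    init(size: Int, alignment: Int) {
        self.size = size
        self.alignment = alignment
    }

    // Takes the size and alignment of an existing type.
    init<T>(_: T.Type) {
        self.init(size: MemoryLayout<T>.size, alignment: MemoryLayout<T>.alignment)
    }
}

// Smallest multiple of `alignment` that is >= `value` (alignment must be > 0).
func roundUp(_ value: Int, toMultipleOf alignment: Int) -> Int {
    return (value + alignment - 1) / alignment * alignment
}

// Sketch of the "Universal" struct/tuple layout algorithm quoted above.
func universalLayout(_ fields: [FieldLayout])
    -> (offsets: [Int], size: Int, alignment: Int, stride: Int) {
    var size = 0
    var alignment = 1
    var offsets: [Int] = []
    for field in fields {
        size = roundUp(size, toMultipleOf: field.alignment) // pad up to the field's alignment
        offsets.append(size)                                // the field starts here
        size += field.size                                  // add the field's size, not its stride
        alignment = max(alignment, field.alignment)
    }
    // Size rounded up to alignment, but never 0 (matching what Swift reports for empty types).
    let finalStride = max(1, roundUp(size, toMultipleOf: alignment))
    return (offsets, size, alignment, finalStride)
}

// Foo from above: an Int16 followed by an Int8.
let foo = universalLayout([FieldLayout(Int16.self), FieldLayout(Int8.self)])
print(foo.offsets, foo.size, foo.alignment, foo.stride) // [0, 2] 3 2 4

// Tuples share the algorithm, so the tuple (Int16, Int8) should agree.
print(MemoryLayout<(Int16, Int8)>.size, MemoryLayout<(Int16, Int8)>.stride) // 3 4
```

Because each field contributes its size rather than its stride, a nested struct's tail padding is available to the next field.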

Is MemoryLayout<T>.size/stride/alignment compile time?

This code:

func getSizeOfInt64() -> Int {
    return MemoryLayout<Int64>.size
}

generates this assembly:

MyApp`getSizeOfInt64():
0x1000015a0 <+0>: pushq %rbp
0x1000015a1 <+1>: movq %rsp, %rbp
0x1000015a4 <+4>: movl $0x8, %eax
0x1000015a9 <+9>: popq %rbp
0x1000015aa <+10>: retq

From this, it appears that MemoryLayout<Int64>.size is indeed compile-time.

Checking the assembly for stride and alignment is left to the reader, but they give similar results (actually identical, in the case of Int64).

EDIT:

If we're talking about generic functions, more work obviously has to be done: the function does not know at compile time which type it is getting the size of, so it can't just put in a constant. But if you define your function to take a type instead of an instance of the type, it does a little less work than in your example:

func getSizeOf<T>(_: T.Type) -> Int {
    return MemoryLayout<T>.size
}

Called like getSizeOf(UInt64.self), it generates this assembly:

MyApp`getSizeOf<A>(_:):
0x100001590 <+0>: pushq %rbp
0x100001591 <+1>: movq %rsp, %rbp
0x100001594 <+4>: movq %rsi, -0x8(%rbp)
0x100001598 <+8>: movq %rdi, -0x10(%rbp)
-> 0x10000159c <+12>: movq -0x8(%rsi), %rsi
0x1000015a0 <+16>: movq 0x88(%rsi), %rax
0x1000015a7 <+23>: popq %rbp
0x1000015a8 <+24>: retq
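For comparison, here is a minimal sketch of both styles of generic helper; the names sizeOfType and sizeOfValue are hypothetical. MemoryLayout.size(ofValue:) is the standard library's value-based counterpart to MemoryLayout<T>.size, and it also depends only on the static type T, not on the value. In an unoptimized build the generic body appears to read the size from the type's runtime metadata, as in the assembly above; whether the optimizer specializes a particular call down to a constant is not something to rely on.

```swift
// Type-based: the caller passes the metatype, e.g. UInt64.self.
func sizeOfType<T>(_: T.Type) -> Int {
    return MemoryLayout<T>.size
}

// Value-based: the caller passes an instance; only its static type T matters.
func sizeOfValue<T>(_ value: T) -> Int {
    return MemoryLayout.size(ofValue: value)
}

print(sizeOfType(UInt64.self))        // 8
print(sizeOfValue(UInt64(42)))        // 8
print(sizeOfType((Int16, Int8).self)) // 3
```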

Does Swift guarantee the storage order of fields in classes and structs?

Yes, the order of the struct elements in memory is the order of
their declaration. The details can be found in Type Layout, in the
same "Fragile Struct and Tuple Layout" section that is quoted in full
in the first answer above: the "Universal" layout algorithm iterates
through the fields "in element order for tuples, or in var declaration
order for structs". Note however the use of "currently" ("Structs and
tuples currently share the same layout algorithm"), so this may change
in a future version of Swift.

The padding/alignment is different from C:

Note that this differs from C or LLVM's normal layout rules in that size and stride are distinct; whereas C layout requires that an embedded struct's size be padded out to its alignment and that nothing be laid out there, Swift layout allows an outer struct to lay out fields in the inner struct's tail padding, alignment permitting.

Only if a struct is imported from C is it guaranteed to have
the same memory layout as in C. Joe Groff from Apple writes in
[swift-users] Mapping C semantics to Swift:

If you depend on a specific layout, you should define the struct in C and import it into Swift for now.

and later in that discussion:

You can leave the struct defined in C and import it into Swift. Swift will respect C's layout.

Example:

struct A {
    var a: UInt8 = 0
    var b: UInt32 = 0
    var c: UInt8 = 0
}

struct B {
    var sa: A
    var d: UInt8 = 0
}

// Swift 2:
print(sizeof(A), strideof(A)) // 9, 12
print(sizeof(B), strideof(B)) // 10, 12

// Swift 3:
print(MemoryLayout<A>.size, MemoryLayout<A>.stride) // 9, 12
print(MemoryLayout<B>.size, MemoryLayout<B>.stride) // 10, 12
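Swift 4.2 added MemoryLayout.offset(of:), which reports the byte offset at which a stored property starts. A minimal sketch, repeating the A and B definitions above so it runs on its own:

```swift
struct A {
    var a: UInt8 = 0
    var b: UInt32 = 0
    var c: UInt8 = 0
}

struct B {
    var sa: A
    var d: UInt8 = 0
}

// Byte offsets of stored properties (offset(of:) returns nil only for
// properties that are not directly addressable, such as computed ones).
print(MemoryLayout<A>.offset(of: \A.b)!) // 4: padded up to UInt32's alignment
print(MemoryLayout<A>.offset(of: \A.c)!) // 8
print(MemoryLayout<B>.offset(of: \B.d)!) // 9: A.size is 9, but A.stride is 12
```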

Here var d: UInt8 is laid out in the tail padding of var sa: A.
If you define the same structures in C

#include <stdint.h>

struct CA {
    uint8_t a;
    uint32_t b;
    uint8_t c;
};

struct CB {
    struct CA ca;
    uint8_t d;
};

and import them into Swift, then

// Swift 2:
print(sizeof(CA), strideof(CA)) // 9, 12
print(sizeof(CB), strideof(CB)) // 13, 16

// Swift 3:
print(MemoryLayout<CA>.size, MemoryLayout<CA>.stride) // 12, 12
print(MemoryLayout<CB>.size, MemoryLayout<CB>.stride) // 16, 16

because uint8_t d is laid out after the tail padding of struct CA ca.

As of Swift 3, both size and stride return the same value
(including the struct padding) for structures imported from C,
i.e. the same value as sizeof in C would return.

Here is a simple function which helps to demonstrate the above (Swift 3):

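import Foundation

func showMemory<T>(_ ptr: UnsafePointer<T>) {
    // Copy the first MemoryLayout<T>.size bytes (no trailing padding) and print them as hex
    let data = Data(bytes: UnsafeRawPointer(ptr), count: MemoryLayout<T>.size)
    print(data as NSData)
}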

The structures defined in Swift:

var a = A(a: 0xaa, b: 0xbbbbbbbb, c: 0xcc)
showMemory(&a) // <aa000000 bbbbbbbb cc>

var b = B(sa: a, d: 0xdd)
showMemory(&b) // <aa000000 bbbbbbbb ccdd>

The structures imported from C:

var ca = CA(a: 0xaa, b: 0xbbbbbbbb, c: 0xcc)
showMemory(&ca) // <aa000000 bbbbbbbb cc000000>

var cb = CB(ca: ca, d: 0xdd)
showMemory(&cb) // <aa000000 bbbbbbbb cc000000 dd000000>

Alignment of simd_packed vector in Swift (vs Metal Shader language)

I actually think it's impossible to achieve relaxed alignment like this with a packed type in Swift. I think the Swift compiler just can't carry the alignment attributes over to the Swift interface.

I think this makes simd_packed_float4 useless in Swift.

I have made a playground to check this, and using it as it's intended doesn't work.

import simd

MemoryLayout<simd_float4>.stride
MemoryLayout<simd_packed_float4>.alignment

let capacity = 8
let buffer = UnsafeMutableBufferPointer<Float>.allocate(capacity: capacity)

for i in 0..<capacity {
buffer[i] = Float(i)
}

let rawBuffer = UnsafeMutableRawBufferPointer(buffer)

let readAligned = rawBuffer.load(fromByteOffset: MemoryLayout<Float>.stride * 4, as: simd_packed_float4.self)

print(readAligned)

let readUnaligned = rawBuffer.load(fromByteOffset: MemoryLayout<Float>.stride * 2, as: simd_packed_float4.self)

print(readUnaligned)

This outputs:

SIMD4<Float>(4.0, 5.0, 6.0, 7.0)
Swift/UnsafeRawPointer.swift:900: Fatal error: load from misaligned raw pointer
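On Swift 5.7 and later, SE-0349 added loadUnaligned(fromByteOffset:as:) to the raw pointer and raw buffer pointer types, which performs such a read without the alignment check. A minimal sketch, using the standard library's SIMD4<Float> (simd_float4 is a typealias for it on Apple platforms) so it does not need import simd; it assumes the array's storage is 16-byte aligned, which is typical but not guaranteed, so that offset 8 really is misaligned.

```swift
// Eight Floats 0...7 in contiguous memory.
let floats: [Float] = (0..<8).map { Float($0) }

// At byte offset 8 the vector is typically misaligned for SIMD4<Float>
// (alignment 16), so load(fromByteOffset:as:) would trap; loadUnaligned accepts any offset.
let unaligned = floats.withUnsafeBytes { raw in
    raw.loadUnaligned(fromByteOffset: MemoryLayout<Float>.stride * 2, as: SIMD4<Float>.self)
}
print(unaligned) // SIMD4<Float>(2.0, 3.0, 4.0, 5.0)
```

On older Swift versions, the component-wise approach is still needed.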

If you do need to load unaligned simd_float4 vectors from buffers or store them into buffers, I would suggest just making an extension that does this component-wise, so that all the alignments work out, kind of like this:

extension UnsafeMutableRawBufferPointer {
    // Loads four consecutive Floats one at a time; each Float load only needs 4-byte alignment
    func loadFloat4(fromByteOffset offset: Int) -> simd_float4 {
        let x = load(fromByteOffset: offset + MemoryLayout<Float>.stride * 0, as: Float.self)
        let y = load(fromByteOffset: offset + MemoryLayout<Float>.stride * 1, as: Float.self)
        let z = load(fromByteOffset: offset + MemoryLayout<Float>.stride * 2, as: Float.self)
        let w = load(fromByteOffset: offset + MemoryLayout<Float>.stride * 3, as: Float.self)

        return simd_float4(x, y, z, w)
    }
}

let readUnaligned2 = rawBuffer.loadFloat4(fromByteOffset: MemoryLayout<Float>.stride * 2)
print(readUnaligned2)

Or you can even make it generic
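A minimal sketch of a generic version over the standard library's SIMD protocol, assuming the vector's scalars are stored contiguously at their natural stride; loadVector(fromByteOffset:as:) is a hypothetical name. Each scalar is loaded separately, so the offset only has to satisfy the scalar's alignment (4 bytes for Float), not the vector's.

```swift
extension UnsafeMutableRawBufferPointer {
    // Hypothetical helper: loads a SIMD vector one scalar at a time, so each
    // individual load only needs the scalar's alignment.
    func loadVector<V: SIMD>(fromByteOffset offset: Int, as type: V.Type = V.self) -> V {
        var result = V() // every lane is overwritten below
        let scalarStride = MemoryLayout<V.Scalar>.stride
        for i in result.indices {
            result[i] = load(fromByteOffset: offset + scalarStride * i, as: V.Scalar.self)
        }
        return result
    }
}

// Usage: eight Floats 0...7, read as a 4-lane vector starting at the third Float.
let floats = UnsafeMutableBufferPointer<Float>.allocate(capacity: 8)
floats.initialize(repeating: 0)
for i in 0..<8 { floats[i] = Float(i) }
let raw = UnsafeMutableRawBufferPointer(floats)

let v = raw.loadVector(fromByteOffset: MemoryLayout<Float>.stride * 2, as: SIMD4<Float>.self)
print(v) // SIMD4<Float>(2.0, 3.0, 4.0, 5.0)
floats.deallocate()
```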

Using MemoryLayout on a struct gives the incorrect size

You cannot rely on simply adding together the sizes of the individual fields of a structure to get the struct's size.

Swift can add padding between the fields of a struct so that each field sits on a byte boundary matching its alignment, which makes accessing the data at runtime more efficient.

If you want to allocate one item, you can simply use the size of the memory layout. If you want a contiguous block of n instances, you should allocate the block based on the stride of the layout.
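A minimal sketch of the difference, using a hypothetical Packet struct whose field sizes add up to only 6 bytes:

```swift
// Hypothetical struct: 1 + 4 + 1 = 6 bytes of fields.
struct Packet {
    var flag: UInt8
    var value: UInt32
    var tag: UInt8
}

print(MemoryLayout<Packet>.size)   // 9: value is padded to offset 4, tag sits at offset 8
print(MemoryLayout<Packet>.stride) // 12: size rounded up to the alignment of 4

// A contiguous block of `count` instances needs stride * count bytes, not size * count.
let count = 4
let byteCount = MemoryLayout<Packet>.stride * count // 48 (4 * 9 = 36 would be too small)
let raw = UnsafeMutableRawPointer.allocate(byteCount: byteCount,
                                           alignment: MemoryLayout<Packet>.alignment)
let packets = raw.bindMemory(to: Packet.self, capacity: count)
packets.initialize(repeating: Packet(flag: 1, value: 2, tag: 3), count: count)

// packets[i] starts at byte i * stride, so every element is correctly aligned.
let lastTag = packets[count - 1].tag
print(lastTag) // 3

packets.deinitialize(count: count)
raw.deallocate()
```

UnsafeMutablePointer<Packet>.allocate(capacity:) does the stride arithmetic for you; the raw version only makes it explicit.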

Why do some types (e.g. Float80) have a memory alignment bigger than word size?

With the help of Martin R's link and the hint that it is a processor design decision, I found the reason why.

Cache lines.

Cache lines are very small blocks of memory for the processor; on my 64-bit Intel Mac a cache line is 128 bits (16 bytes).

As seen in the picture in the question, there is a difference between the dotted and the bold lines: the bold lines are the boundaries between the processor's cache lines. You don't want to load two cache lines if a little more memory cost lets you load only one. So suppose the processor only allows types with a size of 8 bytes or more to be aligned at the start of a cache line (every multiple of 16). Then no type that is as big as a cache line (double the word size in my case, 16 bytes) ever needs two cache-line reads. As you can see in the picture, only the red blocks cross a bold line, so they are not allowed by design.
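The argument can be checked with a small sketch. It treats the line size as a parameter set to the 16-byte figure from this answer, and crossesLine and anyCrossing are hypothetical helpers, not system APIs. It tries every placement allowed by a given alignment and reports whether any of them would straddle two lines.

```swift
// Hypothetical helper: does a value of `size` bytes at byte `offset`
// touch two different lines of `lineSize` bytes?
func crossesLine(offset: Int, size: Int, lineSize: Int) -> Bool {
    return offset / lineSize != (offset + size - 1) / lineSize
}

// Hypothetical helper: tries every offset that is a multiple of `alignment`
// in the first `span` bytes and reports whether any placement crosses a line.
func anyCrossing(size: Int, alignment: Int, lineSize: Int, span: Int = 256) -> Bool {
    return stride(from: 0, to: span, by: alignment).contains { offset in
        crossesLine(offset: offset, size: size, lineSize: lineSize)
    }
}

let lineSize = 16 // the line size assumed in this answer

print(anyCrossing(size: 16, alignment: 16, lineSize: lineSize)) // false
print(anyCrossing(size: 16, alignment: 8, lineSize: lineSize))  // true: offset 8 spans bytes 8...23
print(anyCrossing(size: 10, alignment: 16, lineSize: lineSize)) // false: e.g. a 10-byte Float80 on x86
print(anyCrossing(size: 10, alignment: 8, lineSize: lineSize))  // true: offset 8 spans bytes 8...17
```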

See the link attached for more info.

Cache effects


