Is there a way to convert a double array to a struct array?
Use struct and num2cell:
data = [1,2;3,4];
S = struct ('data', num2cell(data));
Can a struct of doubles be typecast to an array of doubles in C?
This leads to undefined behaviour. The layout of the struct is not totally prescribed by the standard. For instance, there may be padding.
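One way to sidestep the problem is to copy member by member instead of casting. The following is a minimal sketch in C++ (the same idea carries over to C); the `Vec3` type and the `flatten` and `is_tightly_packed` helpers are hypothetical names chosen for illustration, not something from the original question.

```cpp
#include <cassert>  // assert
#include <cstddef>  // offsetof, std::size_t

// Hypothetical struct of doubles, for illustration only.
struct Vec3 {
    double x, y, z;
};

// Copies n structs into a flat array of 3*n doubles, member by member.
// This is well-defined whatever padding the compiler inserts.
inline void flatten(const Vec3* src, std::size_t n, double* out) {
    assert(src != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[3 * i + 0] = src[i].x;
        out[3 * i + 1] = src[i].y;
        out[3 * i + 2] = src[i].z;
    }
}

// Reports whether this particular compiler laid Vec3 out without padding.
// A true result only describes the layout; it does not make the cast legal.
constexpr bool is_tightly_packed() {
    return sizeof(Vec3) == 3 * sizeof(double)
        && offsetof(Vec3, y) == 1 * sizeof(double)
        && offsetof(Vec3, z) == 2 * sizeof(double);
}
```

Compilers will usually turn the copy loop into a plain block copy when the layout happens to be tight, so the well-defined version rarely costs anything.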
Is it legal to cast array of wrapper structs containing POD to the array of POD type it contains?
This code is not valid, for several reasons. First, casting a pointer to the first member of a struct to a pointer to the struct itself violates the strict aliasing rule. You can fix this by making Wrapper a child class of Data.
The second issue is more problematic: you are trying to treat an array (a vector in this case) polymorphically. sizeof(Data) is different from sizeof(Wrapper), so an attempt to index an array of Wrapper elements as if it were an array of Data elements will end up pointing into arbitrary locations in the array.
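The stride mismatch can be made visible with a small sketch. The `Data` and `Wrapper` definitions below are assumptions for illustration (the wrapper is given one extra member so that the sizes differ), and `unwrap` is a hypothetical helper that copies the payloads out instead of reinterpreting the array.

```cpp
#include <cassert>  // assert
#include <vector>

// Hypothetical POD payload and a wrapper that carries extra state.
struct Data {
    int a;
    int b;
};

struct Wrapper {
    Data data;
    int extra;
};

// The element strides differ, so Wrapper[i] and Data[i] are at different addresses.
static_assert(sizeof(Wrapper) != sizeof(Data),
              "strides differ: the arrays cannot alias each other");

// Well-defined alternative: copy each payload into a real array of Data.
inline std::vector<Data> unwrap(const std::vector<Wrapper>& ws) {
    std::vector<Data> out;
    out.reserve(ws.size());
    for (const Wrapper& w : ws) {
        out.push_back(w.data);
    }
    assert(out.size() == ws.size());
    return out;
}
```

The copy costs time and memory, but every element of the result is a genuine Data object at the stride the rest of the code expects.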
Fast interleave 2 double arrays into an array of structs with 2 float and 1 int (loop invariant) member, with SIMD double-float conversion?
Loosely inspired by Intel's 4x3 transposition example and based on @PeterCordes's solution, here is an AVX1 solution, which should reach a throughput of 8 structs per 8 cycles (the bottleneck is still p5):
#include <immintrin.h>
#include <stddef.h>
struct f2u {
float O1, O2;
unsigned int Offset;
};
static const unsigned uiDefaultOffset = 123;
void cvt_interleave_avx(f2u *__restrict dst, double *__restrict pA, double *__restrict pB, ptrdiff_t len)
{
__m256 voffset = _mm256_castsi256_ps(_mm256_set1_epi32(uiDefaultOffset));
// 8 structs per iteration
ptrdiff_t i=0;
for(; i<len-7; i+=8)
{
// destination address for next 8 structs as float*:
float* dst_f = reinterpret_cast<float*>(dst + i);
// 4*vcvtpd2ps ---> 4*(p1,p5,p23)
__m128 inA3210 = _mm256_cvtpd_ps(_mm256_loadu_pd(&pA[i]));
__m128 inB3210 = _mm256_cvtpd_ps(_mm256_loadu_pd(&pB[i]));
__m128 inA7654 = _mm256_cvtpd_ps(_mm256_loadu_pd(&pA[i+4]));
__m128 inB7654 = _mm256_cvtpd_ps(_mm256_loadu_pd(&pB[i+4]));
// 2*vinsertf128 ---> 2*p5
__m256 A76543210 = _mm256_set_m128(inA7654,inA3210);
__m256 B76543210 = _mm256_set_m128(inB7654,inB3210);
// 2*vpermilps ---> 2*p5
__m256 A56741230 = _mm256_shuffle_ps(A76543210,A76543210,_MM_SHUFFLE(1,2,3,0));
__m256 B67452301 = _mm256_shuffle_ps(B76543210,B76543210,_MM_SHUFFLE(2,3,0,1));
// 6*vblendps ---> 6*p015 (does not need to use p5)
__m256 outA1__B0A0 = _mm256_blend_ps(A56741230,B67452301,2+16*2);
__m256 outA1ccB0A0 = _mm256_blend_ps(outA1__B0A0,voffset,4+16*4);
__m256 outB2A2__B1 = _mm256_blend_ps(B67452301,A56741230,4+16*4);
__m256 outB2A2ccB1 = _mm256_blend_ps(outB2A2__B1,voffset,2+16*2);
__m256 outccB3__cc = _mm256_blend_ps(voffset,B67452301,4+16*4);
__m256 outccB3A3cc = _mm256_blend_ps(outccB3__cc,A56741230,2+16*2);
// 3* vmovups ---> 3*(p237,p4)
_mm_storeu_ps(dst_f+ 0,_mm256_castps256_ps128(outA1ccB0A0));
_mm_storeu_ps(dst_f+ 4,_mm256_castps256_ps128(outB2A2ccB1));
_mm_storeu_ps(dst_f+ 8,_mm256_castps256_ps128(outccB3A3cc));
// 3*vextractf128 ---> 3*(p23,p4)
_mm_storeu_ps(dst_f+12,_mm256_extractf128_ps(outA1ccB0A0,1));
_mm_storeu_ps(dst_f+16,_mm256_extractf128_ps(outB2A2ccB1,1));
_mm_storeu_ps(dst_f+20,_mm256_extractf128_ps(outccB3A3cc,1));
}
// scalar cleanup for the remaining elements when len is not a multiple of 8
for (; i < len; i++)
{
dst[i].O1 = static_cast<float>(pA[i]);
dst[i].O2 = static_cast<float>(pB[i]);
dst[i].Offset = uiDefaultOffset;
}
}
Godbolt link, with minimal test-code at the end: https://godbolt.org/z/0kTO2b
For some reason, gcc does not like to generate a vcvtpd2ps that converts directly from memory to a register. This might work better with aligned loads (having input and output aligned is likely beneficial anyway). And clang apparently wants to outsmart me with one of the vextractf128 instructions at the end.
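When checking a vectorized routine like this one, it can help to compare its output against a plain scalar version. The sketch below is an assumption about how such a check might look, not the test code behind the Godbolt link: the `ref` namespace and the `cvt_interleave_scalar` and `same_output` names are hypothetical, and the struct is redeclared inside the namespace only so that the snippet is self-contained.

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::ptrdiff_t

namespace ref {

// Same layout as the f2u struct above, redeclared to keep this sketch standalone.
struct f2u {
    float O1, O2;
    unsigned int Offset;
};

// Scalar reference: one struct per iteration, no SIMD.
inline void cvt_interleave_scalar(f2u* dst, const double* pA, const double* pB,
                                  std::ptrdiff_t len, unsigned int offset) {
    assert(len >= 0);
    for (std::ptrdiff_t i = 0; i < len; ++i) {
        dst[i].O1 = static_cast<float>(pA[i]);
        dst[i].O2 = static_cast<float>(pB[i]);
        dst[i].Offset = offset;
    }
}

// Field-wise comparison of two output buffers.
// Note: NaN inputs would compare unequal here even if both outputs are NaN.
inline bool same_output(const f2u* x, const f2u* y, std::ptrdiff_t len) {
    for (std::ptrdiff_t i = 0; i < len; ++i) {
        if (x[i].O1 != y[i].O1 || x[i].O2 != y[i].O2 || x[i].Offset != y[i].Offset) {
            return false;
        }
    }
    return true;
}

}  // namespace ref
```

Running both versions on lengths that are not multiples of 8 (for example 0, 1, 7, 9) also exercises the boundary between the vector loop and the scalar cleanup.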
Casting a managed array to an array of structs without copying
There is actually a cheat, but it is an ugly, totally unsafe cheat:
[StructLayout(LayoutKind.Sequential)]
//[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct DataStructure
{
public int Id;
public double Value;
}
[StructLayout(LayoutKind.Explicit)]
public struct DataStructureConverter
{
[FieldOffset(0)]
public int[] IntArray;
[FieldOffset(0)]
public DataStructure[] DataStructureArray;
}
and then you can convert it without problems:
var myarray = new int[8];
myarray[0] = 1;
myarray[3] = 2;
//myarray[4] = 2;
DataStructure[] ds = new DataStructureConverter { IntArray = myarray }.DataStructureArray;
int i1 = ds[0].Id;
int i2 = ds[1].Id;
Note that, depending on the size of DataStructure (16 bytes or 12 bytes), you either have to use Pack = 4 (if it is 12 bytes) or you don't need anything (if it is 16 bytes); see explanation (1) below.
I'll add that this technique is undocumented and totally unsafe. It even has a problem: ds.Length isn't the length of the DataStructure[] but the length of the int[] (so in the example given it is 8, not 2).
The "technique" is the same I described here and originally described here.
explanation (1)
sizeof(double) is 8 bytes, so Value is normally aligned on an 8-byte boundary, which leaves a 4-byte "gap" between Id (which has sizeof(int) == 4) and Value. So normally sizeof(DataStructure) == 16. Depending on how DataStructure is built, this gap may not be present, hence the Pack = 4 that forces alignment on a 4-byte boundary.
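The same padding effect can be observed outside .NET. The following C++ sketch is only an analogy to the C# layout described above, assuming a 4-byte int, an 8-byte double, and a compiler that supports `#pragma pack` (GCC, Clang and MSVC do); the `DataDefault`, `DataPacked4`, `gap_default`, `gap_packed4` and `check_packed_layout` names are hypothetical.

```cpp
#include <cassert>  // assert
#include <cstddef>  // offsetof, std::size_t

// Default layout: the compiler may pad after Id so that Value is naturally aligned.
struct DataDefault {
    int Id;
    double Value;
};

// Packed layout: members are aligned to at most 4 bytes, like Pack = 4 in C#.
#pragma pack(push, 4)
struct DataPacked4 {
    int Id;
    double Value;
};
#pragma pack(pop)

// Padding bytes between Id and Value in each layout.
inline std::size_t gap_default() { return offsetof(DataDefault, Value) - sizeof(int); }
inline std::size_t gap_packed4() { return offsetof(DataPacked4, Value) - sizeof(int); }

// Runtime check of the packed layout (assumes 4-byte int and 8-byte double).
inline void check_packed_layout() {
    assert(gap_packed4() == 0);
    assert(sizeof(DataPacked4) == 12);
}
```

On a typical 64-bit target the default struct has a 4-byte gap and a size of 16, while the packed one has no gap and a size of 12, which mirrors the 16-byte versus 12-byte cases discussed above.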