How to Output Array of Doubles to Hard Drive

So you want to do it in a single write/read? It's not too hard. The following code should work; it could use some extra error checking, but the trial case was successful:

#include <cstddef>   // size_t
#include <cstdlib>   // rand
#include <string>
#include <fstream>
#include <iostream>

bool saveArray( const double* pdata, size_t length, const std::string& file_path )
{
    std::ofstream os(file_path.c_str(), std::ios::binary | std::ios::out);
    if ( !os.is_open() )
        return false;
    os.write(reinterpret_cast<const char*>(pdata), std::streamsize(length*sizeof(double)));
    os.close();
    return os.good(); // false if the write or the final flush failed
}

bool loadArray( double* pdata, size_t length, const std::string& file_path)
{
    std::ifstream is(file_path.c_str(), std::ios::binary | std::ios::in);
    if ( !is.is_open() )
        return false;
    is.read(reinterpret_cast<char*>(pdata), std::streamsize(length*sizeof(double)));
    return is.good(); // false on a short read; the destructor closes the file
}

int main()
{
    double* pDbl = new double[1000];
    int i;
    for (i=0 ; i<1000 ; i++)
        pDbl[i] = double(rand());

    saveArray(pDbl,1000,"test.txt");

    double* pDblFromFile = new double[1000];
    loadArray(pDblFromFile, 1000, "test.txt");

    for (i=0 ; i<1000 ; i++)
    {
        if ( pDbl[i] != pDblFromFile[i] )
        {
            std::cout << "error, loaded data not the same!\n";
            break;
        }
    }
    if ( i==1000 )
        std::cout << "success!\n";

    delete [] pDbl;
    delete [] pDblFromFile;

    return 0;
}

Just make sure you allocate appropriately sized buffers! But that's a whole other topic.
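On the buffer point, one way to avoid guessing the size is to let the file length decide it. The following is only a sketch, assuming C++11 and that the file was written on the same machine by something like saveArray above; saveVector and loadVector are hypothetical names, not part of the original answer.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes the vector's raw bytes to a binary file.
bool saveVector(const std::vector<double>& data, const std::string& file_path)
{
    std::ofstream os(file_path.c_str(), std::ios::binary);
    if (!os)
        return false;
    os.write(reinterpret_cast<const char*>(data.data()),
             static_cast<std::streamsize>(data.size() * sizeof(double)));
    return os.good();
}

// Hypothetical helper: sizes the buffer from the file length, so the
// caller does not need to know the element count in advance.
bool loadVector(std::vector<double>& data, const std::string& file_path)
{
    // std::ios::ate opens the file positioned at its end.
    std::ifstream is(file_path.c_str(), std::ios::binary | std::ios::ate);
    if (!is)
        return false;
    const std::streamoff bytes = is.tellg();   // position at end == file size
    if (bytes < 0 || bytes % static_cast<std::streamoff>(sizeof(double)) != 0)
        return false;                          // not a whole number of doubles
    data.resize(static_cast<std::size_t>(bytes) / sizeof(double));
    is.seekg(0, std::ios::beg);
    is.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(bytes));
    return is.good();
}
```

The vector owns the memory, so there is no new[]/delete[] pair to get wrong.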

How to convert an array of array of Doubles to an RDD[String]

Maybe something like this:

scala> val testDensities: Array[Array[Double]] = Array(Array(1.1, 1.2), Array(2.1, 2.2), Array(3.1, 3.2))
scala> val strRdd = sc.parallelize(testDensities).map(_.mkString("[",",","]"))
strRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[16] at map at <console>:26
scala> strRdd.collect
res7: Array[String] = Array([1.1,1.2], [2.1,2.2], [3.1,3.2])

But I have two questions:

Why do you want to do this? I assume it is only because you are learning and playing with Spark.
Why do you use "Array"? This is not the first time I have seen people trying to turn everything into arrays. Keep the RDD until the end and use more generic collection types.

Why your code is wrong:
Because you apply the map to your local array (in the driver) and only then create an RDD, from a Seq that wraps the already mapped array.
So:

You are not parallelizing the execution of the maps. In fact, you are parallelizing nothing.
You create an RDD of arrays of strings, not an RDD of String.

If you execute your code in the console:

scala> val testData = sc.parallelize(Seq(testDensities.map { x => x.toArray }.map { x => x.toString() } ))
testData: org.apache.spark.rdd.RDD[Array[String]] = ParallelCollectionRDD[14] at parallelize at <console>:26

the response is clear: RDD[Array[String]]

printing all contents of array in C#

You may try this:

foreach(var item in yourArray)
{
    Console.WriteLine(item.ToString());
}

Also you may want to try something like this:

yourArray.ToList().ForEach(i => Console.WriteLine(i.ToString()));

EDIT: to get output in one line [based on your comment]:

 Console.WriteLine("[{0}]", string.Join(", ", yourArray));
 //output style:  [8, 1, 8, 8, 4, 8, 6, 8, 8, 8]

EDIT(2019): As mentioned in other answers, it is better to use the Array.ForEach<T> method; there is no need for the ToList step.

Array.ForEach(yourArray, Console.WriteLine);

about limit of the array output to file in c++

Your arrays are too big to fit on the stack, which has a limited size no matter how much RAM you have. A quick fix is to make them static. The better solution is to allocate them dynamically, using either new or vector. The fact that your array is 2D makes it a bit tricky to pass a pointer to write(). Can you do this?

vector<double> data(2000000000);
fstream outfile("output.dat", ios::out | ios::binary);
outfile.write((char*)&data[0], data.size() * sizeof (double));

If this throws std::bad_alloc, then your OS could not provide enough memory (2,000,000,000 doubles at 8 bytes each is about 16 GB).
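For the 2D case, a common workaround is to keep the data in one flat vector and compute the index yourself, so a single pointer covers the whole array. This is a rough sketch only; Grid and writeGrid are made-up names for illustration and it assumes C++11.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical 2D wrapper: rows * cols doubles stored contiguously on the heap.
struct Grid
{
    std::size_t rows;
    std::size_t cols;
    std::vector<double> cells;

    Grid(std::size_t r, std::size_t c) : rows(r), cols(c), cells(r * c, 0.0) {}

    // Row-major indexing: element (i, j) lives at i * cols + j.
    double& at(std::size_t i, std::size_t j) { return cells[i * cols + j]; }
};

// Hypothetical helper: dumps the whole grid with one write call.
bool writeGrid(const Grid& g, const std::string& file_path)
{
    std::ofstream out(file_path.c_str(), std::ios::binary);
    if (!out)
        return false;
    out.write(reinterpret_cast<const char*>(g.cells.data()),
              static_cast<std::streamsize>(g.cells.size() * sizeof(double)));
    return out.good();
}
```

Because the storage is contiguous, g.cells.data() plays the role of &data[0] in the snippet above.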

summing array of doubles with large value span : proper algorithm

As GuyGreer suggested, you can use Kahan summation:

double sum = 0.0;
double c = 0.0;
for (double value : values) {
    double y = value - c;
    double t = sum + y;
    c = (t - sum) - y;
    sum = t;
}

EDIT: You should also consider using Horner's method to evaluate the polynomial.

double value = coeffs[degree];
for (auto i = degree; i-- > 0;) {
    value *= x;
    value += coeffs[i];
}
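For completeness, here is the same loop as a self-contained function. It is a sketch under one assumption that the snippet does not state: coeffs[i] is taken to be the coefficient of x^i, so the highest-degree coefficient comes last. The name horner is mine.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper: coeffs[i] is assumed to be the coefficient of x^i.
double horner(const std::vector<double>& coeffs, double x)
{
    if (coeffs.empty())
        return 0.0;                       // empty polynomial evaluates to zero
    const std::size_t degree = coeffs.size() - 1;
    double value = coeffs[degree];        // start from the leading coefficient
    for (std::size_t i = degree; i-- > 0;) {
        value *= x;
        value += coeffs[i];
    }
    return value;
}
```

For example, horner({1.0, 2.0, 3.0}, 2.0) evaluates 1 + 2x + 3x^2 at x = 2, which is 17.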

Issues saving double as binary in c++

The trouble is that a base-10 ASCII representation of a double is fragile, and it will not give you back the same value if you print too few digits (10 is not enough). You need std::numeric_limits<double>::max_digits10 significant digits (17 for an IEEE 754 double), and even then an exact round trip relies on both the writer and the reader converting with correct rounding.
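A small experiment makes the digit count concrete. This is only a sketch, assuming C++11 and an IEEE 754 double with a standard library whose conversions round correctly; toDecimal and roundTrips are hypothetical helper names.

```cpp
#include <cstdlib>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: formats value with the given number of significant digits.
std::string toDecimal(double value, int digits)
{
    std::ostringstream out;
    out << std::setprecision(digits) << value;
    return out.str();
}

// Hypothetical helper: true if the text parses back to exactly the same double.
bool roundTrips(double value, int digits)
{
    const std::string text = toDecimal(value, digits);
    return std::strtod(text.c_str(), nullptr) == value;
}
```

With the value 0.1 + 0.2, roundTrips(v, 10) comes back false because the text is just "0.3", while roundTrips(v, std::numeric_limits<double>::max_digits10) comes back true under the assumptions above.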

The other issue is that the C++ standard does not fix the binary representation of a double, so relying on it is fragile and can break code easily. Simply changing the compiler or compiler settings can result in a different double format, and when you change architectures you have no guarantees at all.
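If you do go with raw binary anyway, you can at least make the assumptions explicit. The sketch below (C++11, names invented for illustration) refuses to compile unless double is a 64-bit IEEE 754 type, and exposes the bit pattern and byte order so a file header could record them.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Fail at compile time if double is not a 64-bit IEEE 754 (IEC 559) type.
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "raw double dumps assume 64-bit IEEE 754");

// Hypothetical helper: the bit pattern of a double as an integer.
std::uint64_t bitsOf(double value)
{
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Hypothetical helper: true if the least significant byte is stored first.
bool isLittleEndian()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

On an IEEE 754 platform bitsOf(1.0) is 0x3FF0000000000000. A reader on another machine would still have to compare byte order and swap bytes if it differs.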

You can serialize it to text in a non lossy representation by using the hex format for doubles.

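 // Chaining std::fixed and std::scientific does not work: each manipulator
 // clears the other's flag. Set both flags at once instead.
 stream.setf(std::ios_base::fixed | std::ios_base::scientific, std::ios_base::floatfield);
 stream << particles[i].pos[0];

 // Since C++11 the same thing can be written as

 stream << std::hexfloat << particles[i].pos[0];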

This has the same effect as "%a" in printf() in C, which prints the value as "Hexadecimal floating point, lowercase". The mantissa is printed as hexadecimal digits and the exponent as a power of two, in a very specific format. Since the underlying representation is binary, these values can be represented exactly in hex, which provides a non-lossy way of transferring data between systems. It also drops leading and trailing zeros, so for a lot of numbers it is relatively compact.
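Here is a minimal round-trip sketch of the hex format, assuming C++11. The helper names are mine. Reading back uses std::strtod, which accepts hexadecimal floats, because stream extraction (>>) of this format is not handled consistently by all standard library implementations.

```cpp
#include <cstdlib>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: formats a double as hexadecimal floating point text.
std::string toHexString(double value)
{
    std::ostringstream out;
    out << std::hexfloat << value;
    return out.str();
}

// Hypothetical helper: parses text produced by toHexString back into a double.
double fromHexString(const std::string& text)
{
    return std::strtod(text.c_str(), nullptr);
}
```

For values such as 0.1 or 1.0 / 3.0, fromHexString(toHexString(x)) should compare equal to x. The exact text can differ between standard libraries, so it is safer not to depend on a particular spelling.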

On the Python side, this format is also supported. You should be able to read the value as a string and then convert it to a float using float.fromhex().

see: https://docs.python.org/3/library/stdtypes.html#float.fromhex

But your goal is to save space:

But now, to save space, I am trying to save the configuration as a binary file.

I would ask: do you really need to save space? Are you running in a low-powered, low-resource environment? If so, then saving space can definitely matter (such environments are rare nowadays, but they do exist).

But it seems like you are running some form of particle simulation. This does not scream low-resource use case. Even if you have terabytes of data, I would still go with a portable, easy-to-read format over binary, preferably one that is not lossy. Storage space is cheap.
