Writing a Matrix into a Single Txt File with MPI


It's not a good idea to write large amounts of data as text. It's really slow, it generates unnecessarily large files, and it's a pain to deal with. Large amounts of data should be written as binary, with only summary data for humans written as text. Make the stuff the computer is going to deal with easy for the computer, and only the stuff you're actually going to sit down and read easy for you to deal with (e.g., text).

Whether you're going to write as text or binary, you can use MPI-IO to coordinate your output so that all processes together generate one large file. We have a little tutorial on the topic (using MPI-IO, HDF5, and NetCDF). For MPI-IO, the trick is to define a type (here, a subarray) that describes the local layout of the data in terms of the global layout of the file, and then write to the file using that type as the "view". Each process sees only its own view, and the MPI-IO library coordinates the output so that, as long as the views are non-overlapping, everything comes out as one big file.

If we were writing this out in binary, we'd just point MPI_File_write at our data and be done with it; since we're using text, we have to convert our data into a string first. We define our array type the way we normally would have, except that instead of being made of MPI_FLOATs, it is made of a new type which is charspernum characters per number.

The code follows:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <mpi.h>

/* allocate an n x m array as one contiguous block plus row pointers */
float **alloc2d(int n, int m) {
    float *data = malloc(n*m*sizeof(float));
    float **array = malloc(n*sizeof(float *));
    for (int i=0; i<n; i++)
        array[i] = &(data[i*m]);
    return array;
}

int main(int argc, char **argv) {
    int ierr, rank, size;
    MPI_File file;
    MPI_Status status;
    MPI_Datatype num_as_string;
    MPI_Datatype localarray;
    const int nrows=10;
    const int ncols=10;
    float **data;
    const char *const fmt="%8.3f ";
    const char *const endfmt="%8.3f\n";
    int startrow, endrow, locnrows;

    const int charspernum=9;

    ierr = MPI_Init(&argc, &argv);
    ierr|= MPI_Comm_size(MPI_COMM_WORLD, &size);
    ierr|= MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* split the rows evenly; the last rank also takes the remainder */
    locnrows = nrows/size;
    startrow = rank * locnrows;
    endrow = startrow + locnrows - 1;
    if (rank == size-1) {
        endrow = nrows - 1;
        locnrows = endrow - startrow + 1;
    }

    /* allocate local data */
    data = alloc2d(locnrows, ncols);

    /* fill local data */
    for (int i=0; i<locnrows; i++)
        for (int j=0; j<ncols; j++)
            data[i][j] = rank;

    /* each number is represented by charspernum chars */
    MPI_Type_contiguous(charspernum, MPI_CHAR, &num_as_string);
    MPI_Type_commit(&num_as_string);

    /* convert our data into txt; the +1 leaves room for the NUL
       that the final sprintf writes after the last number */
    char *data_as_txt = malloc((locnrows*ncols*charspernum + 1)*sizeof(char));
    int count = 0;
    for (int i=0; i<locnrows; i++) {
        for (int j=0; j<ncols-1; j++) {
            sprintf(&data_as_txt[count*charspernum], fmt, data[i][j]);
            count++;
        }
        sprintf(&data_as_txt[count*charspernum], endfmt, data[i][ncols-1]);
        count++;
    }

    printf("%d: %s\n", rank, data_as_txt);

    /* create a type describing our piece of the array */
    int globalsizes[2] = {nrows, ncols};
    int localsizes [2] = {locnrows, ncols};
    int starts[2] = {startrow, 0};
    int order = MPI_ORDER_C;

    MPI_Type_create_subarray(2, globalsizes, localsizes, starts, order, num_as_string, &localarray);
    MPI_Type_commit(&localarray);

    /* open the file, and set the view */
    MPI_File_open(MPI_COMM_WORLD, "all-data.txt",
                  MPI_MODE_CREATE|MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &file);

    MPI_File_set_view(file, 0, MPI_CHAR, localarray,
                      "native", MPI_INFO_NULL);

    /* collective write: each rank writes locnrows*ncols fixed-width numbers */
    MPI_File_write_all(file, data_as_txt, locnrows*ncols, num_as_string, &status);
    MPI_File_close(&file);

    MPI_Type_free(&localarray);
    MPI_Type_free(&num_as_string);

    free(data_as_txt);
    free(data[0]);
    free(data);

    MPI_Finalize();
    return 0;
}

Running gives:

$ mpicc -o matrixastxt matrixastxt.c  -std=c99
$ mpirun -np 4 ./matrixastxt
$ more all-data.txt
   0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
   0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
   1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
   1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
   2.000    2.000    2.000    2.000    2.000    2.000    2.000    2.000    2.000    2.000
   2.000    2.000    2.000    2.000    2.000    2.000    2.000    2.000    2.000    2.000
   3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000
   3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000
   3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000
   3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000    3.000

Writing a large matrix in a single file using MPI

I would say that the easy way is to use a library designed to perform such operations efficiently: http://2decomp.org/mpiio.html

You can also look at their source code (files io.f90 and io_write_one.f90).

In the source code, you will see a call to MPI_FILE_SET_SIZE that may be relevant for your case.

EDIT: consider using "call MPI_File_Set_View(fhandle, 0_MPI_OFFSET_KIND,...", as suggested in the answer to "MPI-IO: MPI_File_Set_View vs. MPI_File_Seek".

Parallel output using MPI IO to a single file

Your binary file output is almost right, but your calculations of the offset within the file and of the amount of data to write are incorrect. You want your offset to be

MPI_Offset offset = sizeof(double)*Pstart;

not

MPI_Offset offset = sizeof(double)*rank;

otherwise you'll have each rank overwriting the others' data, as (say) rank 3 out of nprocs=5 starts writing at double number 3 in the file, not at (30/5)*3 = 18.

Also, you want each rank to write NNN/nprocs doubles, not sizeof(double) doubles, meaning you want

MPI_File_write(file, localArray, NNN/nprocs, MPI_DOUBLE, &status);

How to write the data as a text file is a much bigger issue; you have to convert the data into strings internally and then output those strings, using careful formatting so that you know exactly how many characters each line requires. That is described in another answer on this site.

writing a matrix into a txt file with ; after each row C++?

Just add a ; before the newline:

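std::ofstream output("Power vector.txt");
// The loop bounds (starting at 1) are kept as in the question;
// start at 0 if the first row and column should be written too.
for (k=1; k<PowerMatrix.size(); k++)
{
    for (l=1; l<PowerMatrix.size(); l++)
    {
        // index with the loop variables k and l
        output << PowerMatrix[k][l] << " "; // behaves like cout - cout is also a stream
    }
    output << ";" << std::endl;
}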

How to read a txt file in MPI by a single process? Why my approach does not work?

I finally found the problem. I am working under Windows 7 with Visual Studio, and it seems I have to give the path of my file explicitly. Even when I put "Im.txt" in the same folder as the source file, it does not work.


