C++ Reading CSV File and Assigning Values to Array

reading a csv file into struct array

In the while loop:

while (fgets(buf, 255, bookFile) != NULL)

each call to fgets copies the next line from the file into the same buffer, buf. Since tmp (and therefore books[i].name) points somewhere inside that buffer, the text it points to is overwritten as well, so the names you stored earlier change too.

tmp = strtok(NULL, ";");
books[i].name = tmp;

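Instead of assigning the pointer, allocate memory for the name of each struct in the array and copy the string into it with strcpy (or use strdup, which allocates and copies in one call; it is POSIX, and part of standard C only since C23).

You can find an explanation of the differences between strcpy and strdup here: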
strcpy vs strdup
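A minimal sketch of the fix, assuming a simplified struct book with only a char *name member (the question's full struct is not shown). copy_string and set_book_name are hypothetical helper names: copy_string does what strdup does using only standard C, so each name gets its own memory and no longer changes when fgets reuses the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct mirroring the question's books[i].name */
struct book
{
    char *name;
};

/* Returns a heap copy of src (same effect as strdup, but only standard C).
   The caller must free() the result. Returns NULL if malloc fails. */
char *copy_string(const char *src)
{
    char *copy = malloc(strlen(src) + 1); /* +1 for the '\0' terminator */
    if (copy != NULL)
        strcpy(copy, src);
    return copy;
}

/* Stores a private copy of token in b->name instead of the pointer itself.
   Returns 1 on success, 0 if memory could not be allocated. */
int set_book_name(struct book *b, const char *token)
{
    b->name = copy_string(token);
    return b->name != NULL;
}
```

Each name must later be released with free(books[i].name).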

Reading data from a .csv file to append into a 2D array in c

Yes, you are right, you are using strtok incorrectly.

The first thing I would do is read each line and then parse the line using
strtok, like this:

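char line[1024];

const char *delim = ",\n";

while(fgets(line, sizeof line, fp))
{
    // first call: pass the line; all following calls: pass NULL
    char *token = strtok(line, delim);

    // check for NULL first, an empty line has no tokens at all
    while(token != NULL)
    {
        printf("token: %s\n", token);
        token = strtok(NULL, delim);
    }
}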

strtok requires that all subsequent calls for the same string pass NULL as
the first argument. strtok returns NULL when no more tokens can be found,
usually because the end of the line has been reached. Note that I added the
newline to the delimiters argument: when the destination buffer is large
enough, fgets stores the newline as well. Putting the newline in the delimiter
list is a nice trick because strtok will get rid of the newline for you.
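The newline trick can be checked in isolation. The sketch below wraps the loop in a hypothetical helper, tokenize_line (not from the original answer), that collects pointers to the tokens of one line. The tokens point into the line itself, so they are only valid until the buffer is overwritten by the next fgets.

```c
#include <stddef.h>
#include <string.h>

/* Splits line in place on ',' and '\n' and stores up to max pointers to
   the tokens in out. Because '\n' is a delimiter, the newline that fgets
   keeps at the end of the line never ends up in the last token. */
size_t tokenize_line(char *line, char *out[], size_t max)
{
    const char *delim = ",\n";
    size_t n = 0;

    for (char *token = strtok(line, delim);
         token != NULL && n < max;
         token = strtok(NULL, delim))
        out[n++] = token;

    return n;
}
```

One caveat worth knowing: strtok treats a run of delimiters as a single separator, so for a line like "a,,b" it returns two tokens, and the empty cell in the middle is silently skipped.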

The code above gives you a way of getting each cell of the csv as a string. You
would have to convert the values yourself. This is the tricky bit: if the csv
contains empty spaces, quotes, etc., you need different strategies to parse the
correct value of the cell. You can use functions like strtol and friends, which
let you detect conversion errors, but they are not bulletproof; there will be
cases where they fail as well.
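A minimal sketch of the strtol approach, assuming a cell should hold exactly one base-10 integer with optional surrounding whitespace. parse_int_cell is a hypothetical name; the checks on endptr and errno are what strtol offers over sscanf or atoi.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Converts one CSV cell to an int. Returns 1 and stores the value in *out
   on success; returns 0 if the cell is empty, has anything other than a
   number in it (surrounding whitespace is allowed), or does not fit in int. */
int parse_int_cell(const char *cell, int *out)
{
    char *end;
    errno = 0;
    long val = strtol(cell, &end, 10); /* skips leading whitespace itself */

    if (end == cell)
        return 0;                      /* no digits at all */
    while (isspace((unsigned char)*end))
        end++;                         /* allow trailing spaces or '\n' */
    if (*end != '\0')
        return 0;                      /* junk after the number, e.g. "12abc" */
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return 0;                      /* out of range for int */

    *out = (int)val;
    return 1;
}
```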

An easy example would be:

char line[1024];

const char *delim = ",\n";

while(fgets(line, sizeof line, fp))
{
    char *token = strtok(line, delim);

    while(token != NULL)
    {
        int val;
        if(sscanf(token, "%d", &val) != 1)
            fprintf(stderr, "'%s' is not a number!\n", token);
        else
            printf("number found: %d\n", val);

        token = strtok(NULL, delim);
    }
}

Note that this does not cover all cases, for example cells that are in quotes.
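For quoted cells, strtok is no longer enough because a comma inside quotes is not a separator. Below is a minimal sketch, assuming RFC 4180-style quoting (a field may be wrapped in double quotes, and "" inside it stands for one quote). split_csv_line and MAX_FIELD are hypothetical names; since it works on one line at a time, it does not handle newlines inside quoted fields.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FIELD 64 /* assumed maximum field length; longer fields are truncated */

/* Splits one CSV line into fields, copying each field into fields[i].
   Commas inside quotes are kept, "" inside quotes becomes one '"',
   and empty fields ("a,,b") are returned as empty strings.
   Returns the number of fields found (at most max_fields). */
size_t split_csv_line(const char *line, char fields[][MAX_FIELD], size_t max_fields)
{
    size_t count = 0;
    const char *p = line;

    while (count < max_fields)
    {
        size_t len = 0;
        int quoted = (*p == '"');
        if (quoted)
            p++;                         /* skip the opening quote */

        while (*p != '\0')
        {
            if (quoted && *p == '"')
            {
                if (p[1] == '"')
                    p++;                 /* "" -> keep one literal quote */
                else
                {
                    quoted = 0;          /* closing quote, not copied */
                    p++;
                    continue;
                }
            }
            else if (!quoted && (*p == ',' || *p == '\n'))
                break;                   /* end of this field */

            if (len < MAX_FIELD - 1)
                fields[count][len++] = *p;
            p++;
        }
        fields[count][len] = '\0';
        count++;

        if (*p != ',')
            break;                       /* newline or end of string */
        p++;                             /* skip the comma, parse next field */
    }
    return count;
}
```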

The last thing to be done is to store the values. One way of doing it is to
allocate memory for a pointer to an int array and reallocate memory for every
cell. Here again the problem lies in the csv file: sometimes it has the wrong
format, some rows are empty, or some rows have more or fewer columns than the
others, which can be tricky to handle. At this point it would be a good idea to
use a library for parsing csv.
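A minimal sketch of "reallocate memory for every cell", assuming a single growing int array per row. append_cell is a hypothetical name. Growing by one element per call is the simplest form; doubling the capacity instead is a common way to reduce the number of realloc calls.

```c
#include <stdlib.h>

/* Appends value to a heap-allocated int array, growing it by one element.
   On failure the old array is left untouched and 0 is returned; on success
   *arr and *len are updated and 1 is returned. Start with *arr == NULL. */
int append_cell(int **arr, size_t *len, int value)
{
    /* assign to a temporary so the old block is not lost if realloc fails */
    int *tmp = realloc(*arr, (*len + 1) * sizeof **arr);
    if (tmp == NULL)
        return 0;

    tmp[*len] = value;
    *arr = tmp;
    (*len)++;
    return 1;
}
```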

The following code assumes that the csv is well formatted, that the number of
columns is the same across all rows, and that no line is longer than 1023
characters. When *cols is 0, I calculate the number of columns based on the
first line. If other rows have fewer columns, all remaining values will be 0
(because calloc sets newly allocated memory to 0). If there are more columns
than in the first row, these columns will be ignored:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int **parse_csv(const char *filename, size_t *rows, size_t *cols)
{
    if(filename == NULL || rows == NULL || cols == NULL)
        return NULL;

    FILE *fp = fopen(filename, "r");

    if(fp == NULL)
        return NULL;

    int **csv = NULL, **tmp;

    *rows = 0;
    *cols = 0;

    char line[1024];
    char *token;
    const char *delim = ",\n";

    while(fgets(line, sizeof line, fp))
    {
        if(*cols == 0)
        {
            // calculating number of columns from the first line
            // (strtok modifies its input, so count on a copy)
            char copy[1024];
            strcpy(copy, line);

            for(token = strtok(copy, delim); token != NULL; token = strtok(NULL, delim))
                (*cols)++;

            if(*cols == 0)
                continue; // blank line before the first data row, skip it
        }

        tmp = realloc(csv, (*rows + 1) * sizeof *csv);
        if(tmp == NULL)
            break; // return all parsed rows so far

        csv = tmp;

        int *row = calloc(*cols, sizeof *row);
        if(row == NULL)
            break; // return all parsed rows so far

        size_t idx = 0;

        for(token = strtok(line, delim); token != NULL && idx < *cols;
            token = strtok(NULL, delim), idx++)
        {
            if(sscanf(token, "%d", row + idx) != 1)
                row[idx] = 0; // in case the conversion fails,
                              // just to make sure to have a defined value
                              // in the cell
        }

        csv[*rows] = row;
        (*rows)++; // increment rows count
    }

    fclose(fp); // closed on every path, including allocation failures

    if(*rows == 0)
    {
        free(csv); // nothing parsed (empty file or allocation failure)
        return NULL;
    }

    return csv;
}

void free_csv(int **csv, size_t rows)
{
    if(csv == NULL)
        return;

    for(size_t i = 0; i < rows; ++i)
        free(csv[i]);

    free(csv);
}

Now you can parse it like this:

size_t cols, rows;
int **csv = parse_csv("file.csv", &rows, &cols);

if(csv == NULL)
{
    // error handling...
    // do not continue
}

...

free_csv(csv, rows);

Now csv[3][4] would give you the cell at row 3, col 4 (starting from 0).
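As a small illustration of that indexing, here is a sketch of a hypothetical helper, sum_column (not part of the original answer), that walks one column of the rows x cols matrix returned by parse_csv.

```c
#include <stddef.h>

/* Sums column col of a rows x cols matrix as returned by parse_csv.
   csv[r][col] is the cell at row r, column col, both counted from 0.
   Returns 0 for a NULL matrix or an out-of-range column. */
long sum_column(int **csv, size_t rows, size_t cols, size_t col)
{
    long sum = 0;

    if (csv == NULL || col >= cols)
        return 0;

    for (size_t r = 0; r < rows; r++)
        sum += csv[r][col];

    return sum;
}
```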


Edit

Things I noticed from your code:

void main() is wrong. main should be declared with one of the following prototypes:

  • int main(void);
  • int main(int argc, char **argv);
  • int main(int argc, char *argv[]);

Another:

int main(void)
{
char *strcat(char *dest, const char *src);
char *strtok(char *str, const char *delim);
...

}

Don't declare these functions inside main, and don't declare them yourself at
all: the standard header files already do that. In this case include string.h:

#include <string.h>

int main(void)
{
...
}

Another:

const char *delim = (const char *)',';

This is just wrong, it's like trying to sell an apple and call it an orange.
',' is a character constant, not a string; in C it has type int and, in ASCII,
the value 44. It's the same as doing:

const char *delim = (const char*) 44;

which makes delim point to address 44. Passing that pointer to strtok is
undefined behavior and will most likely crash.

You have to use double quotes:

const char *delim = ",";

Note that 'x' and "x" are not the same. 'x' is a single character constant
with the value 120 (see ASCII). "x" is a string literal: an array of characters
that ends with the '\0'-terminating byte, aka a string, and in an expression it
gives you a pointer to its first character. Those are fundamentally different
things in C.
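The difference can be made visible with sizeof. The sketch below assumes an ASCII system (so 'x' is 120); show_difference is a hypothetical name used only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* 'x' is a character constant: in C it has type int (not char),
   and its value is the character code, 120 in ASCII. */
static const int ch = 'x';

/* "x" is a string literal: an array of 2 chars, 'x' followed by '\0'.
   Assigned to a pointer, it decays to the address of the first char. */
static const char *str = "x";

void show_difference(void)
{
    printf("'x' = %d, sizeof 'x' = %zu\n", ch, sizeof 'x');
    printf("\"x\" = %s, sizeof \"x\" = %zu, strlen(\"x\") = %zu\n",
           str, sizeof "x", strlen(str));
}
```

This is why strtok needs a delimiter string like "," or ",\n": it expects a pointer to a '\0'-terminated array of delimiter characters, not a single character value.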

Read a large CSV File and storing the content in C language

Your code had several issues. Please find the corrected version below.

Some of the issues you had:

  1. Not saving the result of strtok properly in Filedata.time. You need to copy the string, not assign the pointer. So you need to allocate space for Filedata.time using malloc (don't forget to free each Filedata.time you allocated) and copy the result of strtok there; alternatively, you can use fixed-length strings if you like.

  2. Using the wrong format specifier in the last printf for Data[i].time_diff.

  3. Not using atoi for time_diff.

  4. Use of uninitialized variables: you used the filesize variable without initializing it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ARR_SIZE 1000    /* assumed: maximum number of records */
    #define BUFFER_SIZE 1024 /* assumed: maximum line length */

    struct Filedata
    {
        char *time; /* allocated with malloc, freed at the end of main */
        int time_diff;
        int SN;
        int RS;
        int Fr;
    };
    struct Filedata Data[ARR_SIZE];

    /* strtok wrapper: returns "" instead of NULL when a line has fewer
       fields than expected, so atoi and strlen never get a NULL pointer */
    static char *next_token(char *str, const char *delim)
    {
        char *token = strtok(str, delim);
        return token != NULL ? token : "";
    }

    int main(int argc, char *argv[])
    {
        char *buffer;
        FILE *fp;
        char *token;
        int filesize = 0; /* initialized now (see item 4) */
        int i = 0, j = 0;

        if ((fp = fopen("C:\\test.txt", "r")) == NULL)
        {
            printf("file cannot be opened");
            return 1;
        }

        buffer = malloc(BUFFER_SIZE);
        if (buffer == NULL)
        {
            printf("Error: Out of Memory");
            fclose(fp);
            return 1;
        }

        /* print the header line, if the file has one */
        if (fgets(buffer, BUFFER_SIZE, fp) != NULL)
        {
            token = strtok(buffer, ";");
            while (token != NULL)
            {
                printf(" \t%s", token);
                token = strtok(NULL, ";");
            }
        }

        while (i < ARR_SIZE && fgets(buffer, BUFFER_SIZE, fp))
        {
            token = next_token(buffer, ";");
            Data[i].time = malloc(strlen(token) + 1); /* exactly the size needed */
            if (Data[i].time == NULL)
                break;
            strcpy(Data[i].time, token); /* copy the string, don't assign the pointer */

            Data[i].time_diff = atoi(next_token(NULL, ";"));
            Data[i].SN = atoi(next_token(NULL, "; "));
            Data[i].RS = atoi(next_token(NULL, "; "));
            Data[i].Fr = atoi(next_token(NULL, "; "));

            printf("\t%s\t%d\t%d\t%d\t%d\t \n", Data[i].time, Data[i].time_diff,
                   Data[i].SN, Data[i].RS, Data[i].Fr);
            i++;
        }

        // Note: Also don't forget to free each Data[i].time for
        // which you allocated space, e.g.
        for (j = 0; j < i; j++)
            free(Data[j].time);

        free(buffer);
        fclose(fp);
        return 0;
    }
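Item 1 mentions fixed-length strings as an alternative to malloc. A minimal sketch of that variant, assuming a time field never needs more than 31 characters (TIME_LEN is an assumed value); FiledataFixed and set_time are hypothetical names, not from the original answer.

```c
#include <stdio.h>
#include <string.h>

#define TIME_LEN 32 /* assumed maximum time length, including the '\0' */

/* Hypothetical variant of struct Filedata with a fixed-length time field,
   so no malloc/free is needed per record. */
struct FiledataFixed
{
    char time[TIME_LEN];
    int time_diff;
    int SN;
    int RS;
    int Fr;
};

/* Copies token into rec->time. snprintf always writes the terminating
   '\0' and truncates tokens longer than TIME_LEN - 1 characters. */
void set_time(struct FiledataFixed *rec, const char *token)
{
    snprintf(rec->time, sizeof rec->time, "%s", token);
}
```

The trade-off: nothing to free and no allocation failures to handle, but overly long values are silently truncated.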

