I am working on a C program that parses data from a CSV file. Some fields in the file may be empty, and I need to handle those cases properly. Here are some example rows:
123,John,Doe,,New York
124,Mary,,01/01/2000,Los Angeles
125,,,,
I want to store the parsed data in an array of structs:
typedef struct {
    int ID;
    char FirstName[25];
    char LastName[25];
    char dateOfBirth[15];
    char cityOfBirth[25];
    int del;
} Student;
I attempted to parse the CSV file using sscanf like this:
int read = sscanf(line, "%d,%24[^,],%24[^,],%14[^,],%24[^,\n]",
                  &student.ID,
                  student.FirstName,
                  student.LastName,
                  student.dateOfBirth,
                  student.cityOfBirth);
student.del = 0;
if (read != 5) {
    printf("Error reading line or handling empty fields.\n");
    return;
}
However, when a field is empty (e.g., ,,), sscanf doesn't work as I expected. It either skips empty fields or fails to parse the line correctly.
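The likely cause is that a %[^,] conversion must match at least one character, so on an empty field the conversion fails and sscanf stops there, returning the number of fields converted so far. A minimal sketch to observe this, using the same format string as above (the helper name count_converted is made up for illustration):

```c
#include <stdio.h>

/* Hypothetical helper: runs the format string from the question on one
   line and returns how many conversions sscanf completed. */
int count_converted(const char *line)
{
    int id;
    char first[25], last[25], dob[15], city[25];

    /* %[^,] needs at least one non-comma character; an empty field is a
       matching failure, and sscanf returns the count reached so far. */
    return sscanf(line, "%d,%24[^,],%24[^,],%14[^,],%24[^,\n]",
                  &id, first, last, dob, city);
}
```

For the three sample rows this returns 3, 2 and 1 respectively, so the read != 5 check rejects every one of them.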
To work around this, I tried using strtok to tokenize the CSV row after reading it as a string:
char delimiters[] = ",";
char line[256];
if (fgets(line, sizeof(line), csvFile) == NULL) return; // EOF or read error
char *token = strtok(line, delimiters);
if (token != NULL) student.ID = atoi(token);
else student.ID = 0; // Handle empty ID
token = strtok(NULL, delimiters);
// Bounded copy so a long token cannot overflow FirstName
if (token != NULL) snprintf(student.FirstName, sizeof(student.FirstName), "%s", token);
else strcpy(student.FirstName, ""); // Handle empty field
This partially works, but strtok skips consecutive delimiters, so I cannot reliably detect and handle empty fields.
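One way around the strtok behaviour is to split on every single comma with strcspn, which reports a length of 0 for an empty field instead of skipping it. The following is a sketch only, assuming no quoted fields and no commas inside values; split_fields, copy_field and parse_student are hypothetical helper names, not library functions:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int ID;
    char FirstName[25];
    char LastName[25];
    char dateOfBirth[15];
    char cityOfBirth[25];
    int del;
} Student;

/* Hypothetical helper: split `line` in place on commas, keeping empty
   fields. Returns the number of fields stored in `fields` (at most max). */
int split_fields(char *line, char *fields[], int max)
{
    int n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';   /* drop the line ending kept by fgets */

    while (n < max) {
        size_t len = strcspn(p, ",");     /* field length, 0 for an empty field */
        fields[n++] = p;
        if (p[len] == '\0')
            break;                        /* that was the last field */
        p[len] = '\0';                    /* terminate this field */
        p += len + 1;                     /* continue after the comma */
    }
    return n;
}

/* Hypothetical helper: truncating copy that always NUL-terminates. */
void copy_field(char *dst, size_t dstsize, const char *src)
{
    snprintf(dst, dstsize, "%s", src);
}

/* Returns 1 on success, 0 if the line does not have exactly 5 fields
   (extra fields beyond the fifth are ignored in this sketch). */
int parse_student(char *line, Student *s)
{
    char *f[5];

    if (split_fields(line, f, 5) != 5)
        return 0;
    s->ID = (int)strtol(f[0], NULL, 10);  /* an empty ID becomes 0 */
    copy_field(s->FirstName, sizeof s->FirstName, f[1]);
    copy_field(s->LastName, sizeof s->LastName, f[2]);
    copy_field(s->dateOfBirth, sizeof s->dateOfBirth, f[3]);
    copy_field(s->cityOfBirth, sizeof s->cityOfBirth, f[4]);
    s->del = 0;
    return 1;
}
```

Called on a buffer filled by fgets, parse_student(line, &student) leaves empty fields as empty strings, so a row like 125,,,, still yields five fields.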
scanf() will never succeed here, IMO, because you absolutely must read the CSV file a character at a time (or at least parse the read buffer a character at a time) to catch all the nuances of the CSV format. I recommend first obtaining a description of the CSV encoding rules (RFC 4180 is one commonly cited reference), then building your code from that.
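As a rough illustration of the character-at-a-time approach, here is a sketch of a field reader that also handles double-quoted fields and the "" escape described in RFC 4180. It assumes each record fits on one line (quoted fields containing line breaks are not handled), and next_csv_field is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the field starting at *cursor into `out`
   (truncated to outsize - 1 characters, outsize must be at least 1), then
   advance *cursor past the field and its trailing comma.
   Returns 1 if another field follows, 0 at the end of the record. */
int next_csv_field(const char **cursor, char *out, size_t outsize)
{
    const char *p = *cursor;
    size_t n = 0;
    int quoted = 0;

    if (*p == '"') {                 /* a leading quote starts a quoted field */
        quoted = 1;
        p++;
    }
    while (*p != '\0') {
        char c = *p;
        if (quoted) {
            if (c == '"') {
                if (p[1] == '"') {
                    p++;             /* "" inside quotes: keep one literal quote */
                } else {
                    quoted = 0;      /* closing quote, not stored */
                    p++;
                    continue;
                }
            }
        } else if (c == ',' || c == '\n' || c == '\r') {
            break;                   /* end of an unquoted (or closed) field */
        }
        if (n + 1 < outsize)
            out[n++] = c;            /* store, silently truncating if full */
        p++;
    }
    out[n] = '\0';

    if (*p == ',') {                 /* a comma means one more field follows */
        *cursor = p + 1;
        return 1;
    }
    *cursor = p;
    return 0;
}
```

A caller would set a const char * to the start of the line and call next_csv_field in a do/while loop until it returns 0. Empty fields come back as empty strings, and a comma inside a quoted field does not split it.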