R: Read in .csv file and convert into multiple column data frame

Question

I am new to R and am having a lot of trouble just reading in a .csv file and converting it into a data.frame with 7 columns. Here is what I am doing:

gene_symbols_table <- as.data.frame(read.csv(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=TRUE, sep=","))

After that I get a data.frame with dimensions 46761 x 1, but I need it to be 46761 x 7. I tried the following Stack Overflow threads:

But somehow nothing is working in my case. Here is how the table looks:

> head(gene_symbols_table, 3)
  input.reason.matches.organism.name.primaryIdentifier.symbol.briefDescription.class.secondaryIdentifier
1 WBGene00008675 MATCH 1 Caenorhabditis elegans WBGene00008675 irld-26  Gene F11A5.7
2 WBGene00008676 MATCH 1 Caenorhabditis elegans WBGene00008676 oac-15  Gene F11A5.8
3 WBGene00008677 MATCH 1 Caenorhabditis elegans WBGene00008677   Gene F11A5.9

The .csv file in Excel looks like this:

input | reason | matches | organism.name | primaryIdentifier | symbol | briefDescription
WBGene00008675 | MATCH | 1 | Caenorhabditis elegans | WBGene00008675 | irld-26 | ...
...

The following code:

gene_symbols_table <- read.table(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=FALSE, sep=",", col.names = paste0("V",seq_len(7)), fill = TRUE)

This seems to work, but dim shows right away that the result is wrong: 20124 x 7. The first rows then look like this:

  V1
1 input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier
2 WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7
3 WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8
  V2 V3 V4 V5
1
2
3

1

So the result is wrong: everything ends up in V1 and the other columns are empty.

Other attempts at read.table are giving me the error specified in the second stackoverflow thread.

I have also tried splitting the data.frame with one column into 7, but so far no success.

I think you'll need to include more lines of the file (as displayed in a text editor, not Excel) in order to get help. Your Excel snippet suggests you might need a sep = "|" argument, but this remains unclear. Also, read.csv() already returns a data frame, so you don't need as.data.frame().
– Thomas, Commented Dec 13, 2017 at 21:57
I added '|' myself here for the sake of visualizing it better. In Excel these are just cells.
– Nikita Vlasenko, Commented Dec 13, 2017 at 21:58
@NikitaVlasenko Do you have any way of knowing if your data is 'ragged', meaning that some rows could have more or fewer than 7 columns? Another reason you could get that error is if your data has an index column without a column name.
– Nate, Commented Dec 13, 2017 at 21:59
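
Both comments above come down to inspecting the raw file before parsing it. A minimal base R sketch of that check, assuming a small hypothetical sample that mirrors the rows shown in the question (the temporary file stands in for the real gene_symbols_matching.csv):

```r
# Hypothetical sample lines copied from the output shown in the question;
# the real file is not available here, so a temporary file stands in for it.
sample_lines <- c(
  "input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier",
  "WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7",
  "WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8"
)
tmp <- tempfile(fileext = ".csv")
writeLines(sample_lines, tmp)

# Look at the raw text exactly as a text editor would show it.
print(readLines(tmp, n = 2))

# Count fields per line for each candidate separator.
n_comma <- count.fields(tmp, sep = ",")
n_semi  <- count.fields(tmp, sep = ";")

# A separator that yields one field per line is almost certainly wrong.
# More than one distinct count in the table would indicate ragged rows.
print(table(n_comma))
print(table(n_semi))
```

On the real file, the same two count.fields() calls would show which separator splits the lines and whether every row has the same number of fields.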

phil_t · Accepted Answer · 2017-12-13 22:00:23Z

4

Judging from how the table looks, the separator is a space or a semicolon, not a comma. So either specify that explicitly, or try fread from the data.table package, which detects the separator automatically.

library(data.table)
gene_symbols_table <- as.data.frame(fread(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=TRUE))
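
For the "specify the separator" route, here is a minimal base R sketch. It assumes the file really is semicolon-delimited, as the read.table output in the question suggests, and it uses a hypothetical temporary file with sample rows copied from the question rather than the real file:

```r
# Hypothetical sample mirroring the rows shown in the question.
sample_lines <- c(
  "input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier",
  "WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7",
  "WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8"
)
tmp <- tempfile(fileext = ".csv")
writeLines(sample_lines, tmp)

# With sep = "," the whole line lands in a single column,
# which reproduces the symptom from the question.
wrong <- read.csv(tmp, header = TRUE, sep = ",")
dim(wrong)   # 2 1

# With sep = ";" each field gets its own column.
right <- read.csv(tmp, header = TRUE, sep = ";", stringsAsFactors = FALSE)
dim(right)   # 2 9
str(right)
```

Note that the header shown in the question lists nine fields, so the correctly parsed table has nine columns rather than seven. If only some of them are needed, they can be selected afterwards, for example with right[, c("input", "symbol")].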

