
I am new to R and am having a lot of trouble simply reading a .csv file into a data.frame with 7 columns. Here is what I am doing:

gene_symbols_table <- as.data.frame(read.csv(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=TRUE, sep=","))

After that I get a data.frame with dim = 46761 x 1, but I need it to be 46761 x 7. I tried the solutions from the following Stack Overflow threads:

  1. How can you read a CSV file in R with different number of columns

  2. read.delim() - errors "more columns than column names" and "header and ''col.names" are of different lengths"

  3. Split a column of a data frame to multiple columns

None of them worked in my case. Here is what the table looks like:

> head(gene_symbols_table, 3)
input.reason.matches.organism.name.primaryIdentifier.symbol.briefDescription.class.secondaryIdentifier
1                     WBGene00008675 MATCH 1 Caenorhabditis elegans WBGene00008675 irld-26  Gene F11A5.7
2                      WBGene00008676 MATCH 1 Caenorhabditis elegans WBGene00008676 oac-15  Gene F11A5.8
3                            WBGene00008677 MATCH 1 Caenorhabditis elegans WBGene00008677   Gene F11A5.9
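A single column with a dotted name is what read.csv() produces when the separator does not match the file. A minimal sketch that reproduces the symptom, assuming the file is semicolon-delimited; `raw` is a hypothetical stand-in for the real file, built from the header line and two rows shown in the read.table output further down:

```r
# Hypothetical stand-in for gene_symbols_matching.csv: header and two rows
# copied from the V1 output shown further down
raw <- c(
  "input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier",
  "WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7",
  "WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8"
)

# sep = "," never matches anything, so every line is read as one field
one_col <- read.csv(text = raw, header = TRUE, sep = ",")
dim(one_col)
# [1] 2 1

# check.names = TRUE (the default) turns each ";" in the header into ".",
# which gives the long dotted column name seen in head() above
names(one_col)
```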

The .csv file in Excel looks like this:

input          | reason | matches | organism.name          | primaryIdentifier | symbol  | briefDescription
WBGene00008675 | MATCH  | 1       | Caenorhabditis elegans | WBGene00008675    | irld-26 | ...
...

This code seems to work at first:

gene_symbols_table <- read.table(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=FALSE, sep=",",
                                 col.names = paste0("V",seq_len(7)), fill = TRUE)

However, dim() shows right away that the result is wrong: 20124 x 7. The first rows look like this:

  V1
1 input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier
2                     WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7
3                      WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8
  V2 V3 V4 V5
1
2
3

So this is wrong as well.

Other attempts with read.table() give me the errors described in the second Stack Overflow thread.

I have also tried splitting the one-column data.frame into 7 columns, but so far without success.
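If the one-column data frame is all you have, its strings can still be split on ";" afterwards. A minimal base-R sketch, where `split_column` and `one_col` are hypothetical names and the two rows are copied from the V1 output above. Note that strsplit() drops trailing empty fields, so each row is padded before the rows are stacked:

```r
# Hypothetical helper: split a character column on a fixed separator
# and stack the pieces into a data frame with columns V1, V2, ...
split_column <- function(x, sep = ";") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  n <- max(lengths(parts))
  # strsplit() drops trailing empty fields, so pad each row with "" up to n
  padded <- lapply(parts, function(p) c(p, rep("", n - length(p))))
  as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
}

# Stand-in for the broken 46761 x 1 result: two rows copied from the V1 output
one_col <- data.frame(
  x = c("WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7",
        "WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8"),
  stringsAsFactors = FALSE
)

fixed <- split_column(one_col[[1]])

# The dotted column name cannot be split back reliably ("organism.name"
# itself contains a dot), so the header is typed in from the file's first line
names(fixed) <- c("input", "reason", "matches", "organism.name",
                  "primaryIdentifier", "symbol", "briefDescription",
                  "class", "secondaryIdentifier")
```

Every column comes out as character; type.convert() can be applied afterwards if numeric columns are needed. Re-reading the file with the correct separator is usually simpler than this repair.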

  • What happens when you change sep=',' to sep=';'? Commented Dec 13, 2017 at 21:54
  • I get a "more columns than column names" error. Commented Dec 13, 2017 at 21:56
  • I think you'll need to include more lines of the file (as displayed in a text editor, not Excel) in order to get help. Your Excel snippet suggests you might need a sep = "|" argument, but this remains unclear. Also, the response from read.csv() is a data frame, so you don't need as.data.frame(). Commented Dec 13, 2017 at 21:57
  • I added the '|' characters myself here to visualize it better. In Excel these are just cells. Commented Dec 13, 2017 at 21:58
  • @NikitaVlasenko Do you have any way of knowing if your data is 'ragged', meaning that some rows could have more or less than 7 columns? Another reason you could have that error is if you have an index column in your data without a column name. Commented Dec 13, 2017 at 21:59
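One way to check whether the data are ragged is count.fields() from base R (utils package), which reports how many fields each line has for a given separator. A minimal sketch, assuming the file is semicolon-delimited as the V1 output suggests; the temporary file and its three lines (copied from that output) are a hypothetical stand-in for the real file, whose path would be passed instead:

```r
# Hypothetical stand-in for the real file: header plus two rows copied
# from the V1 output above, written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c(
  "input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier",
  "WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7",
  "WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8"
), path)

# Number of fields on each line for two candidate separators; quote = "\""
# matches read.csv(), so apostrophes in descriptions are not treated as quotes
fields_comma <- count.fields(path, sep = ",", quote = "\"")
fields_semi  <- count.fields(path, sep = ";", quote = "\"")

# A single distinct count means the file is not ragged for that separator
table(fields_comma)
table(fields_semi)
```

On this sample every line has 1 field with "," and 9 with ";", which also suggests the file has 9 columns rather than 7.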

1 Answer


Judging from the output, the separator seems to be a semicolon (or possibly a space), not a comma. So either specify that with the sep argument, or try fread() from the data.table package, which detects the separator automatically:

library(data.table)

gene_symbols_table <- as.data.frame(fread(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=TRUE))