read.csv() error: more columns than column names

Question

I'm trying to input a CSV file, but I get the following error:

associatedata <- read.csv("AssociatedSpeciesID_1.csv", header=TRUE, fileEncoding = 'UTF-8-BOM') %>% mutate_all(na_if, "")

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names

Here's the CSV below. I can't find where the number of columns doesn't match up. I've tried the common solutions from other questions, but nothing has worked.

ObjectID,GlobalID,AssociatedSpeciesKnown,Associates,NewAssociate,UnknownSpecies_Description,AssociatedSpeciesAbundance,Coflowering,ParentGlobalID,CreationDate,Creator,EditDate,Editor
1,54e33e7c-1ff1-464f-8872-df027fcfe8ec,known,Amelanchier utahensis,,,Few,no,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
2,68420bc9-d6c6-4d7f-a149-7306399ce5c1,known,NewSpecies,Genus species,,Occasional,yes,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
3,88a6807b-b00c-4e58-84bb-4e8cb61409ae,unknown,,,ritiidiwjjviern bg,Common,no,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
4,9fc8ea4a-e197-42cc-bd75-614d5b106364,known,Artemisia nova,,,Common,no,ea9eb086-89c2-4aa5-a2f6-95519cd35a58,1/7/2022 3:56:26 PM,ejob_BLM,1/7/2022 3:56:26 PM,ejob,,

Without having actually looked at it, I'm going to guess that there is a space somewhere. Have you tried setting check.names = FALSE? The names might come in wacky, but it will help narrow down the issue.
– Kat, Commented Jan 7, 2022 at 16:54
Not sure what causes the error, but maybe you can try another reader such as read_csv() from the tidyverse. I'm not entirely sure it would help, but it could be worth a try.
– monte, Commented Jan 7, 2022 at 16:55
@Kat No luck, I still get the same error. Do spaces in the field values matter? Because there are spaces there.
– Emily, Commented Jan 7, 2022 at 17:00
Whitespace can definitely make a difference. You could try header = FALSE, skip = 1 and then read in just line one for the headings (nrows = 1). Does that work?
– Kat, Commented Jan 7, 2022 at 17:02

G. Grothendieck · Accepted Answer · 2022-01-07 17:53:57Z

The header has 13 fields, but every other record has 15. Examining the file, we see that each data line ends with two trailing commas.

count.fields("abc.csv", sep = ",")
## [1] 13 15 15 15 15
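The same check can be turned into a quick way to list the offending lines. This is only a sketch on a small made-up file (the temporary file and its three columns are hypothetical stand-ins, not the file from the question); it compares each line's field count with the header's.

```r
# Hypothetical miniature file with the same defect:
# 3 header fields, but 5 fields on each data line (two trailing commas).
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3,,", "4,5,6,,"), tmp)

# Number of fields on every line of the file
n <- count.fields(tmp, sep = ",")

# Line numbers whose field count differs from the header line
bad <- which(n != n[1])
bad
```

Any line number reported in `bad` is a line worth opening in a text editor.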

1) If we remove the two trailing commas then it works. (You may not need the strip.white but it was added because the code in the Note at the end is indented 4 spaces to satisfy SO. It won't hurt.)

L <- "abc.csv" |>
  readLines() |>
  sub(pattern = ",,$", replacement = "")
DF <- read.csv(text = L, strip.white = TRUE)

giving

> str(DF)
'data.frame':   4 obs. of  13 variables:
 $ ObjectID                  : int  1 2 3 4
 $ GlobalID                  : chr  "54e33e7c-1ff1-464f-8872-df027fcfe8ec" "68420bc9-d6c6-4d7f-a149-7306399ce5c1" "88a6807b-b00c-4e58-84bb-4e8cb61409ae" "9fc8ea4a-e197-42cc-bd75-614d5b106364"
 $ AssociatedSpeciesKnown    : chr  "known" "known" "unknown" "known"
 $ Associates                : chr  "Amelanchier utahensis" "NewSpecies" "" "Artemisia nova"
 $ NewAssociate              : chr  "" "Genus species" "" ""
 $ UnknownSpecies_Description: chr  "" "" "ritiidiwjjviern bg" ""
 $ AssociatedSpeciesAbundance: chr  "Few" "Occasional" "Common" "Common"
 $ Coflowering               : chr  "no" "yes" "no" "no"
 $ ParentGlobalID            : chr  "9fc6b840-8584-4045-b69f-f0e9488a1f06" "9fc6b840-8584-4045-b69f-f0e9488a1f06" "9fc6b840-8584-4045-b69f-f0e9488a1f06" "ea9eb086-89c2-4aa5-a2f6-95519cd35a58"
 $ CreationDate              : chr  "1/7/2022 3:55:46 PM" "1/7/2022 3:55:46 PM" "1/7/2022 3:55:46 PM" "1/7/2022 3:56:26 PM"
 $ Creator                   : chr  "ejob_BLM" "ejob_BLM" "ejob_BLM" "ejob_BLM"
 $ EditDate                  : chr  "1/7/2022 3:55:46 PM" "1/7/2022 3:55:46 PM" "1/7/2022 3:55:46 PM" "1/7/2022 3:56:26 PM"
 $ Editor                    : chr  "ejob" "ejob" "ejob" "ejob"
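The question also converts empty strings to NA after reading. As a hedged sketch, approach 1 can probably do that in the same step through the `na.strings` argument of `read.csv`. The small file below is a hypothetical stand-in with the same trailing-comma defect, not the data from the question.

```r
# Hypothetical miniature file: 3 header fields, two extra trailing commas per data line.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,note", "1,Artemisia nova,,,", "2,,seen once,,"), tmp)

# Drop the two trailing commas, as in approach 1
L <- sub(",,$", "", readLines(tmp))

# Treat empty fields (and the literal string NA) as missing while reading
DF <- read.csv(text = L, strip.white = TRUE, na.strings = c("", "NA"))
DF
```

This assumes every data line really does end in exactly two surplus commas; a line without them would lose legitimately empty trailing fields if the pattern were loosened.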

2) Alternately if sed is on your path then:

read.csv(pipe("sed -e s/,,$// abc.csv"), strip.white = TRUE)

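3) This would also work: read the data lines without the header, keep only the first 13 columns, and then take the column names from the first line of the file.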

DF <- read.csv("abc.csv", header = FALSE, skip = 1, strip.white = TRUE)[1:13]
names(DF) <- read.table("abc.csv", sep = ",", strip.white = TRUE, nrows = 1)
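A variant of approach 3 that avoids hard-coding 13 is to read the header line first and use its width to decide how many columns to keep. This is a sketch under assumptions: the temporary file and its column names are made up for illustration, and `unlist()` is used only to turn the one-row header data frame into a plain character vector.

```r
# Hypothetical miniature file: 3 header fields, 5 fields per data line.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,status",
             "1,Artemisia nova,known,,",
             "2,Genus species,unknown,,"), tmp)

# Read only the header line, as character, to get the names and their count
hdr <- read.table(tmp, sep = ",", nrows = 1, strip.white = TRUE,
                  colClasses = "character")
k <- ncol(hdr)

# Read the data lines without a header and keep the first k columns
DF <- read.csv(tmp, header = FALSE, skip = 1, strip.white = TRUE)[seq_len(k)]
names(DF) <- unlist(hdr, use.names = FALSE)
str(DF)
```

The surplus columns are simply dropped, so this only makes sense when they are known to be empty, as they are here.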

Note

Generate file from question.

Lines <- "ObjectID,GlobalID,AssociatedSpeciesKnown,Associates,NewAssociate,UnknownSpecies_Description,AssociatedSpeciesAbundance,Coflowering,ParentGlobalID,CreationDate,Creator,EditDate,Editor
1,54e33e7c-1ff1-464f-8872-df027fcfe8ec,known,Amelanchier utahensis,,,Few,no,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
2,68420bc9-d6c6-4d7f-a149-7306399ce5c1,known,NewSpecies,Genus species,,Occasional,yes,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
3,88a6807b-b00c-4e58-84bb-4e8cb61409ae,unknown,,,ritiidiwjjviern bg,Common,no,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
4,9fc8ea4a-e197-42cc-bd75-614d5b106364,known,Artemisia nova,,,Common,no,ea9eb086-89c2-4aa5-a2f6-95519cd35a58,1/7/2022 3:56:26 PM,ejob_BLM,1/7/2022 3:56:26 PM,ejob,,
"
cat(Lines, file = "abc.csv")
