use first row data as column names in r

Question

I have a dirty dataset that I could not read with header = T. After reading and cleaning it, I would like to use what is now the first row of data as the column names. I tried multiple methods from Stack Overflow without success. What could be the problem?

The dataset t1 should look like this after clean up:

      V1    V2  V3  V4  V5
1   col1    col2    col3    col4
2   row1    2   4   5   56
3   row2    74  74  3   534
4   row3    865 768 8   7
5   row4    68  86  65  87

I tried: colnames(t1) <- t1[1,]. Nothing happens.
I tried: names(t1) <- t1[1,]. Nothing happens.
I tried: lapply(t1, function(x) {names(x) <- x[1, ]; x}). It returns an error message:
```
Error in `[.default`(x, 1, ) : incorrect number of dimensions
```

Could anyone help?

Looking at your data, do you have blanks in some columns? Try str(t1[1,]) and see if it's doing what you expect.
– MikeRSpencer, Commented Aug 17, 2015 at 16:08
colnames(t1) <- t1[1, ] and then t1 <- t1[-1, ] should work.
– nicholaspooran, Commented Nov 7, 2023 at 9:56

zek19 · Accepted Answer · 2019-12-12 13:01:14Z

77

Sam Firke's ever useful package janitor has a function especially for this: row_to_names.

Example from his documentation:

library(janitor)

x <- data.frame(X_1 = c(NA, "Title", 1:3),
           X_2 = c(NA, "Title2", 4:6))
x %>%
  row_to_names(row_number = 2)

answered Dec 12, 2019 at 13:01


Pierre L · Accepted Answer · 2015-08-17 15:46:23Z

29

header.true <- function(df) {
  names(df) <- as.character(unlist(df[1,]))
  df[-1,]
}

Test

df1 <- data.frame(c("a", 1,2,3), c("b", 4,5,6))
header.true(df1)
  a b
2 1 4
3 2 5
4 3 6
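
One thing the test above does not show is that the promoted data frame still holds character columns, because the header text forced every column to be read as text. A minimal sketch of a follow-up step, assuming the data was read with header = FALSE (the sample text and the names raw and clean are made up for illustration), uses base R's type.convert to re-guess the column types:

```r
# Pierre L's helper, repeated so the example is self-contained
header.true <- function(df) {
  names(df) <- as.character(unlist(df[1, ]))
  df[-1, ]
}

# Illustrative data: the header line ends up as the first data row
raw <- read.table(text = "col1 col2 col3 col4 col5
row1 2 4 5 56
row2 74 74 3 534
row3 865 768 8 7", header = FALSE, stringsAsFactors = FALSE)

clean <- header.true(raw)

# All columns are still character here; let R re-guess the types
clean <- type.convert(clean, as.is = TRUE)

# Optional: reset the row names, which otherwise start at 2
rownames(clean) <- NULL

str(clean)
```

With as.is = TRUE, text columns stay character instead of becoming factors.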

edited Aug 17, 2015 at 15:46

answered Aug 17, 2015 at 15:40


1 Comment

PesKchan Over a year ago

Life-saver little function. Every time, I have to deal with the V1 issue; not sure why.

mpalanco · Accepted Answer · 2015-08-17 18:12:55Z

17

The columns of your data frame are probably factors, which is why the code you tried didn't work. You can check this using str(df).

First option

Use the argument stringsAsFactors = FALSE when you import your data:

df <- read.table(text =  "V1    V2  V3  V4  V5
                        col1    col2    col3    col4 col5
                        row1    2   4   5   56
                        row2    74  74  3   534
                        row3    865 768 8   7
                        row4    68  86  65  87", header = TRUE, 
                        stringsAsFactors = FALSE )

Then you can use your first attempt, then remove your first row if you'd like:

colnames(df) <- df[1,]
df <- df[-1, ]

Second option

It will work if your columns are factors or characters:

names(df) <- lapply(df[1, ], as.character)
df <- df[-1,]

Output:

  col1 col2 col3 col4 col5
2 row1    2    4    5   56
3 row2   74   74    3  534
4 row3  865  768    8    7
5 row4   68   86   65   87
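
To make the factor issue concrete, here is a small sketch with a made-up two-column data frame (the data and the name new_names are assumptions for illustration). It converts each first-row cell to its label explicitly, which should behave the same whether the columns are factors or characters:

```r
# Illustrative data with factor columns, as older R versions produced by default
df <- data.frame(V1 = c("col1", "row1", "row2"),
                 V2 = c("col2", "2", "74"),
                 stringsAsFactors = TRUE)

# Inspect the first row: each cell is a factor, stored as an integer code
str(df[1, ])

# Convert each cell to its label; vapply guarantees one string per column
new_names <- vapply(df[1, ], as.character, character(1))

df <- df[-1, ]
names(df) <- new_names

# Optional: drop the now-unused header label from each factor's levels
df <- droplevels(df)
df
```

Assigning df[1, ] directly relies on an implicit conversion of the whole row, which for factor cells can yield the integer codes rather than the labels.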

edited Aug 17, 2015 at 18:12

answered Aug 17, 2015 at 17:49


1 Comment

Matthew Kozubov Over a year ago

Not sure if relevant, but I had a matrix, and this solution almost worked except I changed names(df) to colnames(df), and it seems to have worked?

Kim · Accepted Answer · 2020-09-08 03:27:27Z

12

While @sbha has already offered a tidyverse solution, I would like to leave a fully pipeable dplyr option. I agree that this could be an incredibly useful function.

library(dplyr)
data.frame(x = c("a", 1, 2, 3), y = c("b", 4, 5, 6)) %>%
  `colnames<-`(.[1, ]) %>%
  .[-1, ]
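
For comparison, a similar pipeable step can be sketched without dplyr, assuming R 4.1 or later for the native pipe. The helper name first_row_to_names is hypothetical and not part of any package:

```r
# Hypothetical helper: promote row 1 to column names and drop it
first_row_to_names <- function(d) {
  stats::setNames(d[-1, , drop = FALSE], as.character(unlist(d[1, ])))
}

res <- data.frame(x = c("a", 1, 2, 3),
                  y = c("b", 4, 5, 6),
                  stringsAsFactors = FALSE) |>
  first_row_to_names()

res
```

Wrapping the two steps in a named function keeps the pipeline readable and avoids the backtick-quoted replacement function.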

answered Sep 8, 2020 at 3:27


mattbawn · Accepted Answer · 2015-08-17 16:31:34Z

7

How about:

my.names <- t1[1,]

colnames(t1) <- my.names

i.e. specifically naming the row as a variable?

with the following code:

namex <-c("col1","col2","col3","col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)
row3 <- c(865, 768, 8, 7)
row4 <- c(68, 58, 65, 87)

t1 <- data.frame(namex, row1, row2, row3, row4)
t1 <- t(t1)

my.names <- t1[1,]

colnames(t1) <- my.names

It seems to work, but maybe I'm missing something?

edited Aug 17, 2015 at 16:31

answered Aug 17, 2015 at 15:50


1 Comment

Veerendra Gadekar Over a year ago

yes you are missing two steps, first you need to remove the first row which you are using as column names and convert the matrix to data.frame
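
The two missing steps mentioned in this comment could be sketched as follows, using a shortened version of the answer's data. The final numeric conversion is an extra assumption on my part: t() on a mixed data frame returns a character matrix, so the values come back as text.

```r
namex <- c("col1", "col2", "col3", "col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)

# t() on a mixed data frame gives a character matrix
m <- t(data.frame(namex, row1, row2))

# Use the first row as column names
colnames(m) <- m[1, ]

# Step 1: remove the row that was promoted to names
m <- m[-1, , drop = FALSE]

# Step 2: convert the matrix back to a data frame
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Assumed extra step: turn the text values back into numbers
df[] <- lapply(df, as.numeric)

str(df)
```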

Marcus · Accepted Answer · 2020-12-17 12:55:30Z

6

You almost did that, only missed calling a vector with c

colnames(t1)=t1[c(1),]

Then you can erase the first row, as now it is doubled

t1=t1[-c(1),]

answered Dec 17, 2020 at 12:55


1 Comment

anatol Over a year ago

best solution ever!

MikeRSpencer · Accepted Answer · 2015-08-17 16:21:07Z

5

Take a step back: when you read your data, use skip=1 in read.table to skip the first line entirely. This should make life a bit easier when you're cleaning data, particularly for data types. This is key, as your problem stems from your data being encoded as factors.

You can then read in your column names separately with nrows=1 in read.table.
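
A minimal sketch of this two-pass read, assuming the column names sit on the first line of the input (the inline text stands in for a real file, and the names txt, hdr and dat are made up for illustration):

```r
# Illustrative input; with a real file, pass the path instead of text =
txt <- "col1 col2 col3 col4 col5
row1 2 4 5 56
row2 74 74 3 534"

# Pass 1: read only the header line
hdr <- read.table(text = txt, nrows = 1, header = FALSE,
                  stringsAsFactors = FALSE)

# Pass 2: skip the header line so the numeric columns are typed correctly
dat <- read.table(text = txt, skip = 1, header = FALSE,
                  stringsAsFactors = FALSE)

names(dat) <- as.character(unlist(hdr[1, ]))
str(dat)
```

Because the header text never enters the data rows, read.table can infer numeric types on its own and no later conversion is needed.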

edited Aug 17, 2015 at 16:21

answered Aug 17, 2015 at 16:11


sbha · Accepted Answer · 2019-08-16 21:42:32Z

3

Similar to some of the other answers, here is a dplyr/tidyverse option:

library(tidyverse)

names(df) <- df %>% slice(1) %>% unlist()
df <- df %>% slice(-1)

answered Aug 16, 2019 at 21:42


DMillan · Accepted Answer · 2018-06-04 09:05:44Z

1

Using data.table:

library(data.table)

namex <-c("col1","col2","col3","col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)
row3 <- c(865, 768, 8, 7)
row4 <- c(68, 58, 65, 87)

t1 <- data.table(namex, row1, row2, row3, row4)
t1 <- data.table(t(t1))

setnames(t1, as.character(t1[1,]))
t1 <- t1[-1,]

answered Jun 4, 2018 at 9:05


otteheng · Accepted Answer · 2021-06-16 16:26:06Z

0

Building off of Pierre L's answer. Sometimes the first row in a document ends up getting split into two or more rows when pulled into a data frame. This slight modification helped solve that for me.

header.true <- function(df) {
  r1 <- as.character(unlist(df[1,]))
  r2 <- as.character(unlist(df[2,]))
  r1.2 <- paste(r1,r2, sep = ".")
  names(df) <- r1.2
  df[-c(1,2),]
}

Test

df1 <- data.frame(c("a", "xx",1,2,3), c("b", "xx",4,5,6))
header.true(df1)
  a.xx b.xx
3    1    4
4    2    5
5    3    6
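
The same idea could be generalized to any number of header rows. This is only a sketch: the helper name header_from_rows and its arguments are hypothetical, and it assumes n is at least 1.

```r
# Hypothetical helper: paste the first n rows of each column into one name
header_from_rows <- function(df, n = 1, sep = ".") {
  stopifnot(n >= 1, n < nrow(df))
  idx <- seq_len(n)
  hdr <- vapply(df[idx, , drop = FALSE],
                function(col) paste(as.character(col), collapse = sep),
                character(1))
  out <- df[-idx, , drop = FALSE]
  names(out) <- unname(hdr)
  out
}

df1 <- data.frame(c("a", "xx", 1, 2, 3), c("b", "xx", 4, 5, 6),
                  stringsAsFactors = FALSE)
res <- header_from_rows(df1, n = 2)
res
```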

answered Jun 16, 2021 at 16:26


cigien · Accepted Answer · 2023-02-12 12:47:07Z

0

I think the shortest way is:

colnames(df) <- unlist(df[1, ])

edited Feb 12, 2023 at 12:47 by cigien

answered Feb 12, 2023 at 11:06 by Grad Doc

LMc · Accepted Answer · 2024-08-05 22:08:20Z

Here are two possibilities that also provide some useful flexibility (shown below):

library(unheadr)

mash_colnames(df, n_name_rows = 1, keep_names = FALSE)
#>   col1 col2 col3 col4 col5
#> 2 row1    2    4    5   56
#> 3 row2   74   74    3  534
#> 4 row3  865  768    8    7
#> 5 row4   68   86   65   87

library(scrutiny)

row_to_colnames(df)

Both these options allow you to get column names if they are broken across multiple rows, with the unheadr package offering a bit more flexibility:

babies <-
  data.frame(
    stringsAsFactors = FALSE,
    Baby = c(NA, NA, "Angie", "Yean", "Pierre"),
    Age = c("in", "months", "11", "9", "7"),
    Weight = c("kg", NA, "2", "3", "4"),
    Ward = c(NA, NA, "A", "B", "C")
  )

babies
#>     Baby    Age Weight Ward
#> 1   <NA>     in     kg <NA>
#> 2   <NA> months   <NA> <NA>
#> 3  Angie     11      2    A
#> 4   Yean      9      3    B
#> 5 Pierre      7      4    C

mash_colnames(babies, n_name_rows = 2, keep_names = TRUE)
#>     Baby Age_in_months Weight_kg Ward
#> 3  Angie            11         2    A
#> 4   Yean             9         3    B
#> 5 Pierre             7         4    C

Or some survey data tend to have user response options in a different row from the question:

survey <-
  data.frame(
    stringsAsFactors = FALSE,
    X1 = c("Participant", NA, "12", "34", "45", "123"),
    X2 = c(
      "How did you hear about us?",
      "TV", "TRUE", "FALSE", "FALSE", "FALSE"
    ),
    X3 = c(NA, "Social Media", "FALSE", "TRUE", "FALSE", "FALSE"),
    X4 = c(NA, "Radio", "FALSE", "TRUE", "FALSE", "TRUE"),
    X5 = c(NA, "Flyer", "FALSE", "FALSE", "FALSE", "FALSE"),
    X6 = c("Age", NA, "31", "23", "19", "24")
  )
survey
#>            X1                         X2           X3    X4    X5   X6
#> 1 Participant How did you hear about us?         <NA>  <NA>  <NA>  Age
#> 2        <NA>                         TV Social Media Radio Flyer <NA>
#> 3          12                       TRUE        FALSE FALSE FALSE   31
#> 4          34                      FALSE         TRUE  TRUE FALSE   23
#> 5          45                      FALSE        FALSE FALSE FALSE   19
#> 6         123                      FALSE        FALSE  TRUE FALSE   24

mash_colnames(survey, 2, keep_names = FALSE, sliding_headers = TRUE, sep = "_")
#>   Participant How did you hear about us?_TV How did you hear about us?_Social Media How did you hear about us?_Radio How did you hear about us?_Flyer Age
#> 3          12                          TRUE                                   FALSE                            FALSE                            FALSE  31
#> 4          34                         FALSE                                    TRUE                             TRUE                            FALSE  23
#> 5          45                         FALSE                                   FALSE                            FALSE                            FALSE  19
#> 6         123                         FALSE                                   FALSE                             TRUE                            FALSE  24
