For loop for converting character data to numeric in a data frame

Question

I have 50 data frames (each with a different name) that share the same 10 columns of climate data. The first 5 columns contain numbers, but their class is "character". The next 4 columns already have the correct class (numeric), and the last one (named 'wind dir') is character, so it needs no change.

I tried two ways to convert the class of those 5 columns in all 50 data frames, but neither worked.

1st way) First, I created a vector with the names of those 50 data frames and called it onomata.

Second, I created a vector col_numbers2 <- c(1:5) with the numbers of the columns I want to convert.

Then I wrote the following code:

for(i in onomata){
  i[col_numbers2] <- sapply(i[col_numbers2], as.numeric)
}

When I checked the class of those first five columns, nothing had changed. (No error was reported when the code ran.)
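Nothing changes because `for (i in onomata)` iterates over the names, so `i` is a character string such as "df1", not the data frame itself; the assignment only modifies that local variable. A minimal sketch, assuming the data frames live in the current (global) environment, that fetches each object by name with get() and writes the converted copy back with assign(). The data frames df_a and df_b are hypothetical stand-ins.

```r
# Hypothetical stand-ins for two of the 50 data frames
df_a <- data.frame(day = c("02", "03"), hour = c("05", "08"),
                   wind_dir = c("NW", "WNW"), stringsAsFactors = FALSE)
df_b <- data.frame(day = c("10", "11"), hour = c("15", "16"),
                   wind_dir = c("N", "NNW"), stringsAsFactors = FALSE)

onomata <- c("df_a", "df_b")   # names of the data frames, as in the question
col_numbers2 <- 1:2            # columns to convert (1:5 in the real data)

for (nm in onomata) {
  df <- get(nm)                                             # fetch the data frame by its name
  df[col_numbers2] <- lapply(df[col_numbers2], as.numeric)  # convert; df stays a data.frame
  assign(nm, df)                                            # overwrite the object with the converted copy
}

str(df_a)
```

Using lapply() instead of sapply() on the right-hand side keeps each converted column as a separate vector, so the data.frame structure is preserved.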

2nd way) Then I tried the dplyr package with a for loop; the code is as follows:

library(dplyr)

for(i in onomata){
  i <- i %>%
    mutate_at(vars(-`wind_dir`), as.numeric)
}

In this case I excluded the character column and applied mutate to the rest of the data frame, but I received this error message:

Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "character"
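The error has the same root cause: inside the loop `i` is still a string such as "df1", so mutate_at() receives an object of class "character" instead of a data frame. (In current dplyr, mutate_at() is superseded by mutate(across(...)).) The column selection itself can also be written in base R; a minimal sketch with a hypothetical helper convert_except() and made-up data, assuming the column to leave as character is named wind_dir:

```r
# Hypothetical helper: convert every column except the named ones to numeric
convert_except <- function(df, skip = "wind_dir") {
  cols <- setdiff(names(df), skip)          # base R analogue of vars(-wind_dir)
  df[cols] <- lapply(df[cols], as.numeric)  # assigning a list keeps the data.frame class
  df
}

# Hypothetical data frame with the same kind of columns as in the question
df_c <- data.frame(day = c("02", "03"), Tdry = c("2.4", "2.3"),
                   wind_dir = c("WNW", "NW"), stringsAsFactors = FALSE)

df_c <- convert_except(df_c)
sapply(df_c, class)
```

The helper still has to be given the data frame itself, for example convert_except(get(i)) inside the loop, or applied to a list of data frames with lapply().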

What do you think I am doing wrong?

Thank you

Original data table (what I get when I use read.table() on each txt file):

date	Time	Tdry	Humidity	Wind_velocity	Wind_direction	Wind_gust
02/01/15	02:00	2.4	77.0	6.4	WNW	20.9
02/01/15	03:00	2.3	77.0	11.3	NW	30.6
02/01/15	04:00	2.3	77.0	9.7	NW	20.9
02/01/15	05:00	2.3	77.0	11.3	NW	30.6
02/01/15	06:00	2.3	78.0	9.7	NW	19.3
02/01/15	07:00	2.2	79.0	12.9	NNW	35.4
02/01/15	08:00	2.4	79.0	8.0	NW	14.5
02/01/15	09:00	2.6	79.0	8.0	WNW	20.9

Data after I split columns 1 and 2 (date and time):

day	month	year	Hour	Tdry	Humidity	Wind_velocity	Wind_direction	Wind_gust
02	01	15	02	2.4	77.0	6.4	WNW	20.9
02	01	15	03	2.3	77.0	11.3	NW	30.6
02	01	15	04	2.3	77.0	9.7	NW	20.9
02	01	15	05	2.3	77.0	11.3	NW	30.6
02	01	15	06	2.3	78.0	9.7	NW	19.3
02	01	15	07	2.2	79.0	12.9	NNW	35.4
02	01	15	08	2.4	79.0	8.0	NW	14.5
02	01	15	09	2.6	79.0	8.0	WNW	20.9
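Whatever tool is used for the split, the pieces of a string are themselves strings, which is why the new columns end up as "character". A minimal base R sketch, assuming the date is dd/mm/yy and the time HH:MM as in the table above, that converts each piece with as.numeric() at the moment the new column is created (the two rows are copied from the question):

```r
# Two rows from the original table above
raw <- data.frame(date = c("02/01/15", "02/01/15"), Time = c("02:00", "03:00"),
                  Tdry = c(2.4, 2.3), stringsAsFactors = FALSE)

# Split each string and bind the pieces into a character matrix (one row per observation)
date_parts <- do.call(rbind, strsplit(raw$date, "/", fixed = TRUE))
time_parts <- do.call(rbind, strsplit(raw$Time, ":", fixed = TRUE))

# Convert the pieces to numeric while creating the new columns
raw$day   <- as.numeric(date_parts[, 1])
raw$month <- as.numeric(date_parts[, 2])
raw$year  <- as.numeric(date_parts[, 3])
raw$Hour  <- as.numeric(time_parts[, 1])

str(raw)
```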

Hi, I understand that the 50 data frames are in your environment as separate objects. Is there a way for you to put all these data frames in a list? – Paul, commented May 10, 2021 at 9:27
Hi Paul! Yes, I have all 50 txt files on my hard drive, but the 5 columns I want to convert are not in these files; I created them by splitting 2 columns. – Kon Ath, commented May 10, 2021 at 9:29
I can see a solution if you load these tables as a list of data frames and then use lapply or mapply to change the columns from character to numeric. However, when you load these files into R, are these columns read as character? If so, there may be a problem with the data itself (a wrong decimal separator, for example). – Paul, commented May 10, 2021 at 9:34
Paul, these 5 character columns were created by splitting 2 other character columns (date in the form "01/05/19" and hour "05:00"), so I don't have the option of reading them from the files as numeric. – Kon Ath, commented May 10, 2021 at 9:38
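Paul's question about which columns come in as character can be checked directly. A sketch that reads the first rows of the question's table with read.table(text = ...) instead of a file, assuming the files are tab-separated as the pasted table suggests, and lists the class of each column:

```r
# First rows of one file, copied from the question (tab-separated, with a header line)
txt <- "date\tTime\tTdry\tHumidity\tWind_velocity\tWind_direction\tWind_gust
02/01/15\t02:00\t2.4\t77.0\t6.4\tWNW\t20.9
02/01/15\t03:00\t2.3\t77.0\t11.3\tNW\t30.6"

sample_df <- read.table(text = txt, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Class of each column as guessed by read.table
sapply(sample_df, class)
```

Only date, Time and Wind_direction come back as character; the measurements are already numeric, which is consistent with the character columns coming from the split rather than from the files.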

Paul · Accepted Answer · 2021-05-10 15:04:58Z

Here are two possible ways. Both rely on getting all your data frames into a list (called df_list in the examples below). To achieve this you could use mget() (e.g. mget(onomata)) or list.files().

Once this is done, you can use lapply (or mapply) to go through all your dataframes.
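A minimal sketch of that list workflow with hypothetical data frames dfa and dfb: mget() collects the objects named in onomata into a named list, lapply() processes each one, and base R's list2env() is one way to copy the results back into the workspace under their original names if you still need them as separate objects.

```r
# Hypothetical data frames standing in for the objects named in onomata
dfa <- data.frame(day = c("02", "03"), Tdry = c(2.4, 2.3), stringsAsFactors = FALSE)
dfb <- data.frame(day = c("10", "11"), Tdry = c(1.9, 2.0), stringsAsFactors = FALSE)
onomata <- c("dfa", "dfb")

df_list <- mget(onomata)   # named list: df_list$dfa, df_list$dfb

# lapply() returns a new list, so the result must be assigned
df_list <- lapply(df_list, function(df) {
  df$day <- as.numeric(df$day)
  df
})

# Optional: overwrite dfa and dfb with the converted versions
invisible(list2env(df_list, envir = environment()))
```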

Solution 1

To transform your data, I suggest you first convert it to POSIXct format and then extract the relevant elements into the columns you want.

# create a custom function that transforms each dataframe the way you want
fun_split_datehour <- function(df){
  
  # create a POSIXct column with info on date and time
  # (use "%d/%m/%y %H:%M" instead for two-digit years such as "02/01/15")
  df[, "datetime"] <- as.POSIXct(paste(df$date, df$hour), format = "%d/%m/%Y %H:%M")
  
  # Extract elements you need from the date & time column and store them in new columns
  df[,"year"] <- as.numeric(format(df[, "datetime"], format = "%Y"))
  df[,"month"] <- as.numeric(format(df[, "datetime"], format = "%m"))
  df[,"day"] <- as.numeric(format(df[, "datetime"], format = "%d"))
  df[,"hour"] <- as.numeric(format(df[, "datetime"], format = "%H"))
  df[,"min"] <- as.numeric(format(df[, "datetime"], format = "%M"))
  
  return(df)
}

# use this function on each dataframe of your list
# lapply() returns a new list, so assign the result to keep it
df_list <- lapply(df_list, FUN = fun_split_datehour)

Adapted from Split date data (m/d/y) into 3 separate columns (this answer)

Data:

# two dummy data frames; the date and hour formats do not matter, because you can tell as.POSIXct what to expect with the format argument (see ?as.POSIXct)
df1 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                  hour = c("05:32", "08:20", "15:33"))
df2 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                  hour = c("05:32", "08:20", "15:33"))
# you can replace c("df1", "df2") with onomata: df_list <- mget(onomata)
df_list <- mget(c("df1", "df2"))

Outputs:

> lapply(df_list, FUN = fun_split_datehour)
$df1
        date hour            datetime year month day min
1 02/01/2010    5 2010-01-02 05:32:00 2010     1   2  32
2 03/02/2010    8 2010-02-03 08:20:00 2010     2   3  20
3 10/09/2010   15 2010-09-10 15:33:00 2010     9  10  33

$df2
        date hour            datetime year month day min
1 02/01/2010    5 2010-01-02 05:32:00 2010     1   2  32
2 03/02/2010    8 2010-02-03 08:20:00 2010     2   3  20
3 10/09/2010   15 2010-09-10 15:33:00 2010     9  10  33

The columns year, month, day, hour and min are numeric; you can check with str(lapply(df_list, FUN = fun_split_datehour)).
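Besides str(), a compact way to see which columns are numeric in every data frame of a list is to combine sapply() with vapply(..., is.numeric, logical(1)). A sketch on a hypothetical two-element list:

```r
# Hypothetical list of data frames after conversion
df_list <- list(
  df1 = data.frame(year = c(2010, 2010), hour = c(5, 8),
                   note = c("a", "b"), stringsAsFactors = FALSE),
  df2 = data.frame(year = c(2010, 2011), hour = c(15, 9),
                   note = c("c", "d"), stringsAsFactors = FALSE)
)

# Logical matrix: one row per column name, one column per data frame
numeric_check <- sapply(df_list, function(df) vapply(df, is.numeric, logical(1)))
numeric_check
```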

Note: looking at the question you asked before this one, you might find https://stackoverflow.com/a/24376207/10264278 useful. In addition, using POSIXct format will save you time if you want to make plots, arrange rows by date, etc.
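A variant of fun_split_datehour adapted to the question's own columns, as a sketch: it assumes the columns are named date and Time, that the date is dd/mm/yy (hence %y rather than %Y), and uses tz = "UTC" as an assumption to avoid daylight-saving gaps. Instead of format() followed by as.numeric(), it reads the numeric fields of a POSIXlt object directly (year counts from 1900, mon from 0). The function name split_datetime_lt is hypothetical.

```r
# Hypothetical variant for the question's columns: date = "dd/mm/yy", Time = "HH:MM"
split_datetime_lt <- function(df) {
  lt <- as.POSIXlt(paste(df$date, df$Time), format = "%d/%m/%y %H:%M", tz = "UTC")
  df$datetime <- as.POSIXct(lt)   # keep a full date-time column for sorting and plotting
  df$year  <- lt$year + 1900      # POSIXlt years count from 1900
  df$month <- lt$mon + 1          # POSIXlt months run from 0 to 11
  df$day   <- lt$mday
  df$hour  <- lt$hour
  df
}

# Two rows from the question's original table
obs <- data.frame(date = c("02/01/15", "02/01/15"), Time = c("02:00", "03:00"),
                  Tdry = c(2.4, 2.3), stringsAsFactors = FALSE)

res <- lapply(list(obs = obs), split_datetime_lt)
res$obs[, c("year", "month", "day", "hour")]
```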

Solution 2

If you do not want to use POSIXct, you could do:

# Dummy data changed to match your situation, with the date already split
dfa <- data.frame(day = c("02", "03", "10"),
                  hour = c("05", "08", "15"))
dfb <- data.frame(day = c("02", "03", "10"),
                  hour = c("05", "08", "15"))
df_list <- mget(c("dfa", "dfb"))

# Same thing, use lapply() to go through each dataframe of the list and apply() to use as.numeric on the wanted columns
lapply(df_list, FUN = function(df){as.data.frame(apply(df[1:2], 2, as.numeric))}) # change df[1:2] to select columns you want to convert in your actual dataframes
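Note that apply() turns the selected columns into a matrix, so as.data.frame(apply(df[1:2], ...)) returns only those columns. A sketch of the same conversion that assigns a list back into the selected columns instead, so any other columns (here a hypothetical wind_dir) are kept:

```r
dfa <- data.frame(day = c("02", "03", "10"), hour = c("05", "08", "15"),
                  wind_dir = c("NW", "WNW", "N"), stringsAsFactors = FALSE)
dfb <- data.frame(day = c("02", "03", "10"), hour = c("05", "08", "15"),
                  wind_dir = c("NNW", "NW", "W"), stringsAsFactors = FALSE)
df_list <- list(dfa = dfa, dfb = dfb)

# Replace only columns 1:2; wind_dir stays character and each element stays a data.frame
df_list <- lapply(df_list, function(df) {
  df[1:2] <- lapply(df[1:2], as.numeric)
  df
})

str(df_list$dfa)
```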

Rui Barradas · Accepted Answer · 2021-05-10 09:40:00Z


Maybe the following code can help.
First, get the filenames with list.files. Second, read them all in with lapply. If read.table is not the appropriate function, see help("read.table"); it is the same help page as for read.csv, read.csv2, etc. Then coerce the first 5 columns of all data.frames to numeric in one go.

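filenames <- list.files(path = "your_directory", pattern = "\\.txt$", full.names = TRUE)  # full paths, so read.table finds the files
onomata <- lapply(filenames, read.table, header = TRUE)  # header = TRUE assumes the first line holds column names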

onomata <- lapply(onomata, function(X){
  X[1:5] <- lapply(X[1:5], as.numeric)
  X
})
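A self-contained sketch of the reading step that first writes two small files to tempdir(), so it can run anywhere; the file names station1.txt and station2.txt and their contents (taken from rows in the question) are made up. With this data, date and Time come back as character, so as.numeric() on them would give NA, which is the objection raised in the comments.

```r
# Write two small tab-separated files to a temporary directory (stand-ins for the 50 txt files)
demo_dir <- file.path(tempdir(), "climate_demo")
dir.create(demo_dir, showWarnings = FALSE)
txt <- c("date\tTime\tTdry\tHumidity",
         "02/01/15\t02:00\t2.4\t77.0",
         "02/01/15\t03:00\t2.3\t77.0")
writeLines(txt, file.path(demo_dir, "station1.txt"))
writeLines(txt, file.path(demo_dir, "station2.txt"))

# full.names = TRUE returns paths that read.table can open from any working directory
filenames <- list.files(path = demo_dir, pattern = "\\.txt$", full.names = TRUE)
onomata <- lapply(filenames, read.table, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
names(onomata) <- sub("\\.txt$", "", basename(filenames))  # name each data frame after its file

str(onomata)
```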

Comments

Kon Ath

Thank you for the answer, Rui. The problem is that the columns I am trying to convert do not exist in the original txt files; I created them by splitting 2 other character columns (column 1: date in the form "01/05/19" and column 2: hour "05:00"), so the new columns look like this: col1: 01, col2: 05, col3: 19, col4: 05, col5: 00. If I could read them from the source files, your script would be fine!

Rui Barradas

@Paul X[1:5] <- lapply(X[1:5], as.numeric) keeps the structure of data.frame X; there's no need for apply.

Rui Barradas

@H.Johnson Can you post the first rows of one of the files in the question? Also, I am not sure whether you really need 5 columns, or whether a conversion to class "Date" or "POSIXt" would be better.

Kon Ath

Thanks for your answers, everyone. This afternoon, when I'm at my PC, I will post some rows from the original txt file and the new data frame I created after splitting the two columns. Thank you for your time and answers, and see you in a few hours!

Kon Ath

Your solution worked, Paul! Instead of splitting the date and time character columns into new character columns, I should have converted them to POSIXt (and the hour with the chron function) and then separated them with the method you suggested. Thank you very much!
