0

I have 50 data frames (different name each) with 10 (same name) columns of climate data. The first 5 columns although they are numbers, their class is "character". The rest 4 columns are already in the correct class (numeric) and the last one (named 'wind dir') is in character class so no change is needed.

I tried two ways to convert the class of those 5 columns in all 50 data frames, but nothing worked.

1st way) Firstly I've created a vector with the names of those 50 data frames and I named it onomata.

Secondly I've created a vector col_numbers2 <- c(1:5) with the number of columns I would like to convert.

Then I wrote the following code:

for(i in onomata){
  i[col_numbers2] <- sapply(i[col_numbers2], as.numeric)
}

Checking the class of those first five columns I saw that nothing changed. (No error report after executing the code)

2nd way) Then I tried to use the dplyr package with a for loop and the code is as follows:

for(i in onomata){
 i <- i %>%
  mutate_at(vars(-`wind_dir`),as.numeric)

In this case, I excluded the character column, and I applied the mutate function to the whole data frame, but I received an error message :

Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "character"

What do you think I am doing wrong ?

Thank you

Original data table (what I get when I use read.table() for each txt file:

date Time Tdry Humidity Wind_velocity Wind_direction Wind_gust
02/01/15 02:00 2.4 77.0 6.4 WNW 20.9
02/01/15 03:00 2.3 77.0 11.3 NW 30.6
02/01/15 04:00 2.3 77.0 9.7 NW 20.9
02/01/15 05:00 2.3 77.0 11.3 NW 30.6
02/01/15 06:00 2.3 78.0 9.7 NW 19.3
02/01/15 07:00 2.2 79.0 12.9 NNW 35.4
02/01/15 08:00 2.4 79.0 8.0 NW 14.5
02/01/15 09:00 2.6 79.0 8.0 WNW 20.9

Data after I split data in columns 1 and 2 (date, time):

day month year Hour Minutes Tdry Humidity Wind_velocity Wind_direction Wind_gust
02 01 15 02 00 2.4 77.0 6.4 WNW 20.9
02 01 15 03 00 2.3 77.0 11.3 NW 30.6
02 01 15 04 00 2.3 77.0 9.7 NW 20.9
02 01 15 05 00 2.3 77.0 11.3 NW 30.6
02 01 15 06 00 2.3 78.0 9.7 NW 19.3
02 01 15 07 00 2.2 79.0 12.9 NNW 35.4
02 01 15 08 00 2.4 79.0 8.0 NW 14.5
02 01 15 09 00 2.6 79.0 8.0 WNW 20.9
4
  • Hi, I understand the 50 dataframes are in your environment as separated objects. Is there a way for you to put all these dataframes in a list? Commented May 10, 2021 at 9:27
  • Hi Paul! Yes, I have all 50 txt files in my hard drive, but these 5 columns I want to convert are not included in these files, I created them by breaking up 2 columns. Commented May 10, 2021 at 9:29
  • I can see a solution if you load these tables as a list of dataframe. Then use lapply or mapply to change the columns from character to numeric. However, when you load these files in R, are these columns considered as character? If yes, maybe there is a problem with the data itself (like wrong decimal separator, etc.) Commented May 10, 2021 at 9:34
  • 1
    These 5 character class columns Paul are created by breaking 2 other character columns (date with type "01/05/19" and hour "05:00"), so I don't have the choice to read those files as numeric or something. Commented May 10, 2021 at 9:38

2 Answers 2

1

Here are two possible ways. Both relies on getting all your files in a list of dataframes (called df_list in the example below). To acheive this you could use mget() (ex: mget(onomata) or list.files()).

Once this is done, you can use lapply (or mapply) to go through all your dataframes.

Solution 1

To transform your data, I propose you 1st convert it into POSIXct format and then extract the relevant elements to make the wanted columns.

# create a custom function that transforms each dataframe the way you want
fun_split_datehour <- function(df){
  
  df[, "datetime"] <- as.POSIXct(paste(df$date, df$hour), format = "%d/%m/%Y %H:%M") # create a POSIXct column with info on date and time
  
  # Extract elements you need from the date & time column and store them in new columns
  df[,"year"] <- as.numeric(format(df[, "datetime"], format = "%Y"))
  df[,"month"] <- as.numeric(format(df[, "datetime"], format = "%m"))
  df[,"day"] <- as.numeric(format(df[, "datetime"], format = "%d"))
  df[,"hour"] <- as.numeric(format(df[, "datetime"], format = "%H"))
  df[,"min"] <- as.numeric(format(df[, "datetime"], format = "%M"))
  
  return(df)
}

# use this function on each dataframe of your list
lapply(df_list, FUN = fun_split_datehour)

Adapted from Split date data (m/d/y) into 3 separate columns (this answer)

Data:

# two dummy dataframe, date and hour format does not matter, you can tell as.POSIXct what to expect using format argument (see ?as.POSIXct)
df1 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                 hour = c("05:32", "08:20", "15:33"))
df2 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                  hour = c("05:32", "08:20", "15:33"))
# you can replace c("df1", "df2") with onomata:  df_list <- mget(onomata)
df_list <- mget(c("df1", "df2"))

Outputs:

> lapply(df_list, FUN = fun_split_datehour)
$df1
        date hour            datetime year month day min
1 2010-01-02    5 2010-01-02 05:32:00 2010     1   2  32
2 2010-02-03    8 2010-02-03 08:20:00 2010     2   3  20
3 2010-09-10   15 2010-09-10 15:33:00 2010     9  10  33

$df2
        date hour            datetime year month day min
1 2010-01-02    5 2010-01-02 05:32:00 2010     1   2  32
2 2010-02-03    8 2010-02-03 08:20:00 2010     2   3  20
3 2010-09-10   15 2010-09-10 15:33:00 2010     9  10  33

And columns year, month, day, hour and min are numeric. You can check using str(lapply(df_list, FUN = fun_split_datehour)).

Note: looking at the question you asked before this one, you might find https://stackoverflow.com/a/24376207/10264278 usefull. In addition, using POSIXct format will save you time if you want to make plots, arrange, etc.


Solution 2

If you do not want to use POSIXct, you could do:

# Dummy data changed to match you situation with already splited date
dfa <- data.frame(day = c("02", "03", "10"),
                  hour = c("05", "08", "15"))
dfb <- data.frame(day = c("02", "03", "10"),
                  hour = c("05", "08", "15"))
df_list <- mget(c("dfa", "dfb"))

# Same thing, use lapply() to go through each dataframe of the list and apply() to use as.numeric on the wanted columns
lapply(df_list, FUN = function(df){as.data.frame(apply(df[1:2], 2, as.numeric))}) # change df[1:2] to select columns you want to convert in your actual dataframes
Sign up to request clarification or add additional context in comments.

Comments

1

Maybe the following code can help.
First, get the filenames with list.files. Second, read them all in with lapply. If read.table is not the appropriate function, read help("read.table"), it is the same page as for read.csv, read.csv2, etc. Then, coerce the first 5 columns of all data.frames to numeric in one go.

filenames <- list.files(path = "your_directory", pattern = "\\.txt")
onomata <- lapply(filenames, read.table)

onomata <- lapply(onomata, function(X){
  X[1:5] <- lapply(X[1:5], as.numeric)
  X
})

9 Comments

Thank you for the Answer Rui. The problem is that these columns I am trying to convert does not exist in the original files (txt), I 've created created them by breaking 2 other character columns (column 1: date with type "01/05/19" and column 2: hour "05:00"), so the new columns are like this: col1: 01 col2: 05 col3: 19 col4: 05 col5: 00. If I had the choice to read them from the source file, your script seems fine!
@Paul X[1:5] <- lapply(X[1:5], as.numeric) will keep the structure of data.frame X, there's no need for apply.
@H.Johnson Can you post the first rows of one of the files in the question? And I do not understand if you need 5 columns or if a conversion to class "Date" or "POSIXt" wouldn't be better.
Guys thanks for your answers, in the afternoon when I'll be on my PC I will post some rows from the original txt file, and the new data frame I've created after breaking up the two columns. Thank you for your time and asnwers and see you in a few hours!
Your solution Paul worked! I shouldn't have split the character column of date and time into new character columns, but instead I should have convert them into POSIXt / and hour (with chron function), and then separate them with the method you suggested. Thank you very much!
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.