I have this data frame:

data <- structure(list(
  location = c("bern", "bern", "zurich", "zurich", "basel", "basel", "basel"),
  location_latitude = c(4.1, 4.1, 6.2, 6.2, 7.3, 7.3, 7.3),
  location_longitude = c(2.1, 2.1, 3.2, 3.2, 5.6, 5.6, 5.6),
  location_population = c(38, 38, 72, 72, 46, 46, 46),
  origin = c("zurich", "basel", "bern", "basel", "bern", "zurich", "locarno"),
  origin_temperature = c(12, 20, 21, 20, 21, 12, 27)
), row.names = c(NA, 7L), class = "data.frame")

I have latitude and longitude for location, but I don’t have latitude and longitude for origin.

I want to insert two columns and populate them with latitude and longitude for origin, based on corresponding coordinates of column location, like this:

data_needed <- structure(list(
  location = c("bern", "bern", "zurich", "zurich", "basel", "basel", "basel"),
  location_latitude = c(4.1, 4.1, 6.2, 6.2, 7.3, 7.3, 7.3),
  location_longitude = c(2.1, 2.1, 3.2, 3.2, 5.6, 5.6, 5.6),
  location_population = c(38, 38, 72, 72, 46, 46, 46),
  origin = c("zurich", "basel", "bern", "basel", "bern", "zurich", "locarno"),
  origin_latitude = c("6.2", "7.3", "4.1", "7.3", "4.1", "6.2", "NA"),
  origin_longitude = c("3.2", "5.6", "2.1", "5.6", "2.1", "3.2", "NA"),
  origin_temperature = c(12, 20, 21, 20, 21, 12, 27)
), row.names = c(NA, 7L), class = "data.frame")

I assume it needs to be done column wise, but I don’t know how to do it.

Also I don’t want to have to add conditions that specify locations (e.g., if “zurich”), because the dataset has thousands of locations and origins. I need this to be done ‘automatically’.

Also note that origins that have no matching coordinates in locations (such as Locarno) should return NAs.

Please help!

2 Answers

Using base R:

data <- within(data, origin_latitude <- location_latitude[match(origin, location)])
data <- within(data, origin_longitude <- location_longitude[match(origin, location)])

Using data.table:

library(data.table)

setDT(data)
data[, 
     c("origin_latitude", "origin_longitude") := .SD[match(origin, location)], 
     .SDcols = c("location_latitude", "location_longitude")]
2 Comments

Many thanks for base R solution. Can you please explain how it works? In particular, the match function?
match() returns a vector of the positions of (first) matches of its first argument in its second. See for example match(c("J", "O"), LETTERS). So match(origin, location) is asking, what is the first row where I can find this origin in the location column. And when we know the row we can find the corresponding lat/long.
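
To make the match() explanation concrete, here is a minimal sketch using plain vectors with the values from the question. The names idx and origin_latitude are illustrative only, not part of the answer's code.

```r
# Toy vectors mirroring the question's columns (values copied from the question).
location <- c("bern", "bern", "zurich", "zurich", "basel", "basel", "basel")
location_latitude <- c(4.1, 4.1, 6.2, 6.2, 7.3, 7.3, 7.3)
origin <- c("zurich", "basel", "bern", "basel", "bern", "zurich", "locarno")

# For each origin, the position of its first occurrence in location;
# NA when the origin never appears as a location ("locarno").
idx <- match(origin, location)
idx
# 3 5 1 5 1 3 NA

# Indexing with NA returns NA, so unmatched origins get NA automatically.
origin_latitude <- location_latitude[idx]
origin_latitude
# 6.2 7.3 4.1 7.3 4.1 6.2 NA
```

Because match() only reports the first hit, this approach assumes every row for a given location carries the same coordinates, as in the question's data.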

Here is a way using dplyr

library(dplyr)

data %>%
    select(origin = "location", origin_latitude = "location_latitude", origin_longitude = "location_longitude") %>%
    distinct() %>%
    left_join(data, ., by = "origin") %>%
    select(-origin_temperature, origin_temperature)

  location location_latitude location_longitude location_population  origin origin_latitude origin_longitude origin_temperature
1     bern               4.1                2.1                  38  zurich             6.2              3.2                 12
2     bern               4.1                2.1                  38   basel             7.3              5.6                 20
3   zurich               6.2                3.2                  72    bern             4.1              2.1                 21
4   zurich               6.2                3.2                  72   basel             7.3              5.6                 20
5    basel               7.3                5.6                  46    bern             4.1              2.1                 21
6    basel               7.3                5.6                  46  zurich             6.2              3.2                 12
7    basel               7.3                5.6                  46 locarno              NA               NA                 27

2 Comments

it worked, thank you so much. 2 quick questions so that I can understand the code: I gather that 'distinct()' retains unique rows, but why is this needed in this case? Also, what does '.,' do/mean when using 'left_join'?
We use distinct() to prevent duplicate rows when we join. The . refers to the output of the preceding pipe step (the de-duplicated lookup table), which is passed as the second argument of left_join() and joined to our original data.
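
A small sketch of why distinct() matters, using a cut-down version of the question's data (latitude only, to keep it short). The names lookup_dup, lookup, joined_dup and joined are hypothetical, and suppressWarnings() is there only because recent dplyr versions warn about many-to-many joins.

```r
library(dplyr)

# Cut-down copy of the question's data (latitude only).
data <- data.frame(
  location = c("bern", "bern", "zurich", "zurich", "basel", "basel", "basel"),
  location_latitude = c(4.1, 4.1, 6.2, 6.2, 7.3, 7.3, 7.3),
  origin = c("zurich", "basel", "bern", "basel", "bern", "zurich", "locarno")
)

# Lookup table built from the location columns, still containing repeated rows.
lookup_dup <- data %>%
  select(origin = location, origin_latitude = location_latitude)

# The same lookup table with one row per location.
lookup <- distinct(lookup_dup)

# Without distinct(), each row of data is repeated once per matching lookup row.
joined_dup <- suppressWarnings(left_join(data, lookup_dup, by = "origin"))
joined <- left_join(data, lookup, by = "origin")

nrow(joined_dup)
# 15
nrow(joined)
# 7
```

With the de-duplicated lookup the result keeps the original seven rows, and the "locarno" row gets NA because it has no match.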