Data

I have two datasets:
* The segments dataset represents the road segments (lhrs).
* The hwys dataset represents the highways that contain the individual lhrs.

> segments
# A tibble: 1 x 5
   lhrs mto_collision_ref_number latitude longitude highway_number
  <dbl>                    <dbl>    <dbl>     <dbl>          <dbl>
1 10004                  1549630     42.9     -78.9              1 


> hwys
# A tibble: 5 x 3
  STREET          longitude latitude
  <fct>               <dbl>    <dbl>
1 HIGHWAY 3           -80.0     42.9
2 ADELAIDE AVE E      -78.9     43.9
3 HOWARD AVE          -83.0     42.2
4 HIGHWAY 12          -79.7     44.7
5 CORONATION BLVD     -80.3     43.4

The problem

As you can see, the STREET column is missing from the segments dataset. I want to create it by computing the distance between a given lhrs and each STREET from their longitude and latitude values. That means comparing the single (longitude, latitude) pair of an lhrs against all 5 STREET locations and picking the STREET with the minimum distance. I think this can be done with the purrr package.

My code

I can find the distances between each lhrs and every STREET with geosphere::distVincentyEllipsoid() as follows:

library(tidyverse)
library(geosphere)  # provides distVincentyEllipsoid(), which is called unqualified below

segments_nested <- segments %>% group_by(mto_collision_ref_number) %>% nest()


segments_nested %>% 
  mutate(diztances = purrr::map(
    data, ~ distVincentyEllipsoid(hwys %>% select(longitude, latitude),
                                             c(.$longitude, .$latitude)))) %>% 
  unnest(.preserve = data)


# A tibble: 5 x 3
  mto_collision_ref_number data             diztances
                     <dbl> <list>               <dbl>
1                  1549630 <tibble [1 x 4]>    85316.
2                  1549630 <tibble [1 x 4]>   110700.
3                  1549630 <tibble [1 x 4]>   342921.
4                  1549630 <tibble [1 x 4]>   213961.
5                  1549630 <tibble [1 x 4]>   125547.  

HOWEVER, I still can't figure out how to connect these distances with the STREET. Please guide me on how to use purrr::map to calculate the distances AS WELL AS the corresponding STREET. Once I have that, I can just group_by(mto_collision_ref_number) and take the minimum with summarize(min(diztances)).

  • It's easier to help if you use dput to post your data. Commented Mar 19, 2019 at 17:59
  • Piping your last set of code into another unnest() called with no arguments gets a data frame of the references, distances, and then the columns that make up the contents of data. Final dimensions are 5x6 Commented Mar 19, 2019 at 18:10
  • @camille That does not include STREET. Commented Mar 19, 2019 at 20:15

1 Answer

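One way home is to take advantage of the flexibility of the anonymous function and have it return an object that is already to spec, i.e. one that carries STREET next to each distance. I used a combination of group_by() and transmute(): transmute() drops every column it does not create, but it always keeps the grouping variables, so grouping hwys by STREET keeps STREET in the result.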

# this is setup for transmute() so we keep 'STREET' around
hwys <- group_by(hwys, STREET) 

segments_nested %>%
  mutate(results = purrr::map(
    data, ~ transmute(hwys, diztances = geosphere::distVincentyEllipsoid(c(longitude, latitude),
                                             c(.$longitude, .$latitude))))) %>% 
  unnest(results)

And bingo 'STREET' is back on the menu boys!

  mto_collision_ref_number STREET         diztances
                     <int> <chr>              <dbl>
1                  1549630 HIGHWAY3          89840.
2                  1549630 ADELAIDEAVEE     111101.
3                  1549630 HOWARDAVE        345569.
4                  1549630 HIGHWAY12        210099.
5                  1549630 CORONATIONBLVD   126702.
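
The remaining step mentioned in the question, keeping only the closest STREET per collision, could be sketched roughly as follows. This is a minimal sketch rather than part of the original answer: it assumes the rounded coordinates shown in the question's printouts, uses mutate() on an ungrouped hwys instead of the grouped transmute() above, and the names `nearest` and `seg` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Data re-typed from the rounded printouts in the question (an assumption;
# the real data has more decimal places, so exact distances will differ)
segments <- tibble(
  lhrs = 10004,
  mto_collision_ref_number = 1549630,
  latitude = 42.9,
  longitude = -78.9,
  highway_number = 1
)
hwys <- tibble(
  STREET = c("HIGHWAY 3", "ADELAIDE AVE E", "HOWARD AVE",
             "HIGHWAY 12", "CORONATION BLVD"),
  longitude = c(-80.0, -78.9, -83.0, -79.7, -80.3),
  latitude = c(42.9, 43.9, 42.2, 44.7, 43.4)
)

nearest <- segments %>%
  group_by(mto_collision_ref_number) %>%
  nest() %>%
  mutate(results = map(data, function(seg) {
    # One row per STREET, with its distance in metres to this segment
    hwys %>%
      mutate(diztances = geosphere::distVincentyEllipsoid(
        cbind(longitude, latitude),
        c(seg$longitude, seg$latitude)
      ))
  })) %>%
  select(-data) %>%
  unnest(results) %>%
  # Keep the single closest STREET for each collision
  group_by(mto_collision_ref_number) %>%
  slice(which.min(diztances)) %>%
  ungroup()

nearest
```

Using slice(which.min(diztances)) instead of summarize(min(diztances)) keeps the whole row, so STREET stays attached to the minimum distance.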

In the future, try to share your data in an easier-to-reproduce format. I prefer read.table(text = ), but dput() is also fine, as suggested above. I had to copy, paste, and manipulate your output chunk to get it into R (which is why the STREET values below have no spaces):

segments <- read.table(
  text = "lhrs mto_collision_ref_number latitude longitude highway_number
  1 10004 1549630 42.9 -78.9 1",
  header = TRUE,
  stringsAsFactors = FALSE
)
hwys <- read.table(
  text = "  STREET longitude latitude
  1 HIGHWAY3 -80.0 42.9
  2 ADELAIDEAVEE  -78.9 43.9
  3 HOWARDAVE -83.0 42.2
  4 HIGHWAY12 -79.7 44.7
  5 CORONATIONBLVD -80.3 43.4",
  header = TRUE,
  stringsAsFactors = FALSE
)
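
For the dput() route, a small sketch of what sharing a sample might look like is given here. It assumes a plain data frame without spatial geometry columns, and `rebuilt` is a made-up name; head() is used only to keep the printed output short when the full object is large.

```r
# Example data, typed in from the printout in the question
hwys <- data.frame(
  STREET = c("HIGHWAY 3", "ADELAIDE AVE E", "HOWARD AVE",
             "HIGHWAY 12", "CORONATION BLVD"),
  longitude = c(-80.0, -78.9, -83.0, -79.7, -80.3),
  latitude = c(42.9, 43.9, 42.2, 44.7, 43.4),
  stringsAsFactors = FALSE
)

# Print a copy-pasteable R expression for the first few rows only
dput(head(hwys, 3))

# Round trip: evaluating the printed text rebuilds the same rows
rebuilt <- eval(parse(text = capture.output(dput(head(hwys, 3)))))
all.equal(rebuilt, head(hwys, 3))
```

Pasting the dput() output into a question lets others recreate the object with a single assignment.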

1 Comment

Thanks a lot! +1 for the gif :) I don't know why but the output of dput for these data was too long. It was probably due to the spatial geometric info. in the data.
