I have imported a text file into R as a data frame with 4 columns and 1180598 rows. The first five rows are:

  Vehicle ID  Time    Vehicle Class  Preceding Vehicle
 1   2        0.1           2               0
 2   2        0.2           2               0
 3   2        0.3           2               0
 4   2        0.4           2               0
 5   2        0.5           2               0

The left-most column above is the row index. 'Vehicle ID' is the ID of the vehicle observed at the time given in the 'Time' column. There are 2169 vehicles in total, but only vehicle 2 is shown here. 'Vehicle Class' can be 1 = motorcycle, 2 = car or 3 = truck; in the data shown above it is a car. 'Preceding Vehicle' is the ID of the vehicle preceding the one in the 'Vehicle ID' column.

I want to create a new column, 'Preceding Vehicle Class', from the information above. To find the preceding vehicle's class, R must take the ID in the 'Preceding Vehicle' column, find that same ID in the 'Vehicle ID' column, read the class from 'Vehicle Class' on that row, and store it in the new 'Preceding Vehicle Class' column. I have tried the following code, but it runs for more than 5 minutes without producing a result:

for (i in a[,'Preceding Vehicle'])  for (j in a[,'Vehicle ID']) {
if (i==j) {pclass <- a[,'Vehicle ID']} else {pclass <- 0} }
a[,'Preceding Vehicle Class'] <-  pclass

'a' is the name of the data frame. Please help me fix the code.

2 Answers

Using the following version of a:

a <- structure(list(VehicleID = c(0L, 0L, 2L, 2L), Time = c(0.1, 0.2, 0.4, 0.5), VehicleClass = c(8L, 8L, 2L, 2L), PrecedingVehicle = c(-1L, -1L, 0L, 0L)), .Names = c("VehicleID", "Time", "VehicleClass", "PrecedingVehicle"), class = "data.frame", row.names = c("1", "2", "9", "10"))

Which looks like:

   VehicleID Time VehicleClass PrecedingVehicle
1          0  0.1            8               -1
2          0  0.2            8               -1
9          2  0.4            2                0
10         2  0.5            2                0

You can just do:

a$PrecVehClass <- a$VehicleClass[match(a$PrecedingVehicle,a$VehicleID)]

Which will give you your desired result:

   VehicleID Time VehicleClass PrecedingVehicle PrecVehClass
1          0  0.1            8               -1           NA
2          0  0.2            8               -1           NA
9          2  0.4            2                0            8
10         2  0.5            2                0            8
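
To see why this works: match(x, table) returns, for each element of x, the position of its first occurrence in table (or NA if it is absent), and indexing VehicleClass by those positions pulls out the class. Below is a minimal sketch on a toy frame. It assumes, as the question implies, that a vehicle's class is the same on every one of its rows, so taking the first match is safe. The name toy and its values are made up for illustration.

```r
# Toy data in the same shape as `a` (values invented for illustration).
toy <- data.frame(
  VehicleID        = c(1L, 1L, 2L, 2L, 3L),
  VehicleClass     = c(3L, 3L, 2L, 2L, 1L),
  PrecedingVehicle = c(0L, 0L, 1L, 1L, 2L)
)

# Position of the first row whose VehicleID equals each PrecedingVehicle;
# NA where the preceding vehicle never appears as a VehicleID.
idx <- match(toy$PrecedingVehicle, toy$VehicleID)

# Index the class column by those positions to get the preceding class.
toy$PrecVehClass <- toy$VehicleClass[idx]
```

Because match is vectorised, the lookup happens in one call and avoids the nested loop over every pair of rows that the code in the question attempts.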

Given a as in thelatemail's answer:

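# unique() keeps one lookup row per vehicle; without it, every row of a
# is repeated once for each matching row of the lookup table.
new_a = merge(a, unique(a[, c('VehicleID', 'VehicleClass')]),
              by.x='PrecedingVehicle',
              by.y='VehicleID',
              all.x=TRUE)

# The key column comes first in the result, followed by the remaining columns.
names(new_a) = c("PrecedingVehicle", "VehicleID", "Time", "VehicleClass",
                 "Preceding Vehicle Class")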

All of the processing is done by merge in the first statement. I just couldn't find a more elegant way to rename the columns...

If you are familiar with SQL, this is exactly a left outer self-join.
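
One possible way to avoid renaming afterwards is to name the lookup table's columns before the join. The sketch below is only an illustration of that idea: lookup and PrecedingVehicleClass are names chosen here, and it assumes each vehicle has a single class, so unique() leaves one row per vehicle.

```r
# Same sample frame as in the answers above.
a <- data.frame(
  VehicleID        = c(0L, 0L, 2L, 2L),
  Time             = c(0.1, 0.2, 0.4, 0.5),
  VehicleClass     = c(8L, 8L, 2L, 2L),
  PrecedingVehicle = c(-1L, -1L, 0L, 0L)
)

# One row per vehicle, with columns already named for the join.
# `lookup` and `PrecedingVehicleClass` are illustrative names.
lookup <- setNames(
  unique(a[, c("VehicleID", "VehicleClass")]),
  c("PrecedingVehicle", "PrecedingVehicleClass")
)

# Left join on the shared column name; rows without a match get NA.
new_a <- merge(a, lookup, by = "PrecedingVehicle", all.x = TRUE)
```

Note that merge sorts the result by the key column by default, so the original row order of a is not preserved.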
