1

Imagine the following data (the code to reproduce all the outputs is at the end):

df

           cars horsepower year safety
1        Toyota        140 2008      4
2      Chrysler        120 2009      4
3          Ford        140 2010      5
4           BMW        150 2008      3
5 Mercedes-Benz        150 2008      3
6       Hyundai        120 2009      4
7        Jaguar        150 2007      3
8         Tesla        120 2010      5

I'd like to swap the cars to get something like:

   cars_initial    cars_match horsepower year safety horsepowerMatch yearMatch safetyMatch
1        Toyota           BMW        140 2008      4             150      2008           3
2         Tesla      Chrysler        120 2010      5             120      2009           4
3 Mercedes-Benz          Ford        150 2008      3             140      2010           5
4        Jaguar       Hyundai        150 2007      3             120      2009           4
5       Hyundai        Jaguar        120 2009      4             150      2007           3
6          Ford Mercedes-Benz        140 2010      5             150      2008           3
7      Chrysler         Tesla        120 2009      4             120      2010           5
8           BMW        Toyota        150 2008      3             140      2008           4

Now this is a typical assignment problem that was in the case above solved randomly, i.e. by cost matrix set to 0 in all cases.

What I'm interested in are the outcomes. In the above case, the solution yields the following stats:

stats

  horsepower year safety
1       0.25 0.25      0

That is to say, 1/4 of swaps had an equal horsepower, etc.

Here is my question: How to solve such assignments by setting constraints on what exactly should be the outcome statistics directly, without the trial-and-error approach with setting the costs?

For instance, what if I would like to have a solution where safety has more than 0.20 match, and year at least 0.10, like below?

desiredOutput

   cars_initial    cars_match
1        Toyota      Chrysler
2         Tesla Mercedes-Benz
3 Mercedes-Benz           BMW
4        Jaguar        Toyota
5       Hyundai         Tesla
6          Ford       Hyundai
7      Chrysler        Jaguar
8           BMW          Ford

statsDesired

  horsepower year safety
1       0.25 0.12   0.25

Of course I could just set the cost matrix to a lower number in all cases where safety of cars is equal.

But is there a way to influence the result by directly setting the constraint on what should be the outcome statistics?

Perhaps there is a way to optimize the costs in order to arrive at the desired result?

The code:

library(lpSolve)
library(dplyr)
library(tidyr)

set.seed(1)

df <- data.frame(
  cars = c("Toyota", "Chrysler", "Ford", "BMW", "Mercedes-Benz", "Hyundai", "Jaguar", "Tesla"),
  horsepower = c(140, 120, 140, 150, 150, 120, 150, 120),
  year = c(2008, 2009, 2010, 2008, 2008, 2009, 2007, 2010),
  safety = c(4, 4, 5, 3, 3, 4, 3, 5)
)

mat <- df %>% select(cars) %>%
  crossing(df %>% select(cars)) %>%
  mutate(val = 0) %>% 
  spread(cars, val)

solved <- lp.assign(mat %>% select(-cars1) %>% as.matrix())$solution

matches <- as.data.frame(solved) %>%
  setNames(., names(mat %>% select(-cars1))) %>%
  bind_cols(mat %>% select(cars1)) %>%
  gather(key, val, -cars1) %>%
  filter(val == 1) %>% select(-val, cars_initial = cars1, cars_match = key)

nms <- c("cars", paste0(names(df %>% select(-cars)), "Match"))

matches <- matches %>%
  left_join(df, by = c("cars_initial" = "cars")) %>%
  left_join(df %>% setNames(., nms), by = c("cars_match" = "cars"))

stats <- matches %>%
  summarise(
    horsepower = round(sum(horsepower == horsepowerMatch) / n(), 2),
    year = round(sum(year == yearMatch) / n(), 2),
    safety = round(sum(safety == safetyMatch) / n(), 2)
  )

desiredOutput <- data.frame(cars_initial = matches$cars_initial, cars_match = c("Chrysler", "Mercedes-Benz", "BMW", "Toyota", "Tesla", "Hyundai", "Jaguar", "Ford"))

statsDesired <- desiredOutput %>%
  left_join(df, by = c("cars_initial" = "cars")) %>%
  left_join(df %>% setNames(., nms), by = c("cars_match" = "cars")) %>%
  summarise(
    horsepower = round(sum(horsepower == horsepowerMatch) / n(), 2),
    year = round(sum(year == yearMatch) / n(), 2),
    safety = round(sum(safety == safetyMatch) / n(), 2)
  )

I hope the examples above are sufficient, this is my first question so please let me know if I need to provide something more.

The code is in R, but I have also added the tag Python as I don't really mind the language of possible solutions.

6
  • Welcome to SO! Let me make sure I understand your question correctly. Are you saying: For each car type, choose a match, so that at least a% of the pairs have the same HP as each other, at least b% of the pairs have the same year as each other, and at least c% of the pairs have the same safety as each other? Commented May 23, 2019 at 21:12
  • Thank you! Exactly, that's a very good formulation. Commented May 23, 2019 at 21:13
  • I would formulate this as an integer programming problem. How familiar are you with IP? Commented May 23, 2019 at 21:14
  • Not at all I must admit! Do you have some examples of similar types of problems and their solutions? Commented May 23, 2019 at 21:16
  • Well in that case it would be a whole new field for you to learn. I will get you started on a formulation for this problem and you can decide whether you want to keep going with it or try another approach. Commented May 23, 2019 at 21:18

1 Answer 1

1

Here is a partial formulation of this problem as an integer programming (IP) problem.

Let I be the set of car types. For car types i and j in I, let:

  • h[i,j] = 1 if cars i and j have the same horsepower
  • y[i,j] = 1 if cars i and j have the same year
  • and similarly for s[i,j] (safety)

These are parameters, meaning inputs to your model. (You'll need to write code to calculate these binary quantities based on your data table.)

Now introduce the following decision variables, i.e., variables that your IP model will choose values of:

  • x[i,j] = 1 if we assign car type j as type i's match

Now, normally an IP has an objective function that we want to minimize or maximize. In this case, there is no objective function -- you just want to find a set of matches that satisfies your constraints. So your objective function can just be:

minimize 0

Here is the first constraint. It says: At least a of the matches must have the same horsepower. (a is a fraction.) The left-hand side is the number of matches that have the same horsepower: For each pair of car types i and j, if j is assigned as i's match and they have the same horsepower, count a 1; otherwise, count a 0. The right-hand side is the number of matches you want, i.e., a fraction of the whole set.

subject to sum {i in I, j in I} h[i,j] * x[i,j] >= a * |I|

Now formulate similar constraints for the other categories.

Next, you need a constraint that says each car type i must be assigned to exactly one car type j:

subject to sum {j in I} x[i,j] == 1 for all i in I

Finally, you need constraints saying that the decision variables are binary:

subject to x[i,j] in {0,1} for all i, j in I

Now, in terms of solving this thing, you will need to either use a mathematical modeling language like AMPL or GAMS, or a package like PuLP for Python.

I hope this helps. I might have bitten off more than you can chew here.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks a lot! This should get me started indeed. I will wait a bit to see if there are any other answers, otherwise I accept yours.
No worries, that’s totally fine.
Agreed this is a good fit for IP. Of note in this construction of the problem, when you are applying constraints to the outcome, you will likely get the first answer that satisfies your criteria, not the best/optimal answer because you don't have an objective function. Further, it isn't clear how you would set these constraints. How would you know beforehand that 20% horsepower match is achievable? This setup could fail to produce any results. I would make an objective function. However, if the problem is truly this small, just enumerate all the options (in this case 8!) and sort.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.