1

I can't simplify my data so I put them entirely. I would like to build the best possible team of 11 players according to the "niveau" column. Each "id" has a "niveau" note for the "statut" column. I think it would be necessary to test all the possible combinations of "niveau" without there being any "id" duplicates in order to obtain the best average level of the 11 players, but I don't know how to proceed. Do you have an idea please? Thank you

import pandas as pd

data = {'statut': {0: 'titulaire_01', 1: 'titulaire_01', 2: 'titulaire_01', 3: 'titulaire_01', 4: 'titulaire_01', 5: 'titulaire_01', 6: 'titulaire_01', 7: 'titulaire_01', 8: 'titulaire_02', 9: 'titulaire_02', 10: 'titulaire_02', 11: 'titulaire_02', 12: 'titulaire_02', 13: 'titulaire_02', 14: 'titulaire_02', 15: 'titulaire_02', 16: 'titulaire_02', 17: 'titulaire_02', 18: 'titulaire_02', 19: 'titulaire_02', 20: 'titulaire_02', 21: 'titulaire_02', 22: 'titulaire_02', 23: 'titulaire_02', 24: 'titulaire_02', 25: 'titulaire_02', 26: 'titulaire_02', 27: 'titulaire_02', 28: 'titulaire_03', 29: 'titulaire_03', 30: 'titulaire_03', 31: 'titulaire_03', 32: 'titulaire_03', 33: 'titulaire_03', 34: 'titulaire_03', 35: 
'titulaire_03', 36: 'titulaire_03', 37: 'titulaire_03', 38: 'titulaire_03', 39: 'titulaire_03', 40: 'titulaire_03', 41: 'titulaire_03', 42: 'titulaire_03', 43: 'titulaire_03', 44: 'titulaire_03', 45: 'titulaire_03', 46: 'titulaire_03', 47: 'titulaire_03', 48: 'titulaire_04', 49: 'titulaire_04', 50: 'titulaire_04', 51: 'titulaire_04', 52: 'titulaire_04', 53: 'titulaire_04', 54: 'titulaire_04', 55: 'titulaire_04', 56: 'titulaire_04', 57: 'titulaire_05', 58: 'titulaire_05', 59: 'titulaire_05', 60: 'titulaire_05', 61: 'titulaire_05', 62: 'titulaire_05', 63: 'titulaire_05', 64: 'titulaire_05', 65: 'titulaire_05', 66: 'titulaire_05', 67: 'titulaire_06', 68: 'titulaire_06', 69: 'titulaire_06', 70: 'titulaire_06', 71: 'titulaire_06', 72: 'titulaire_06', 73: 'titulaire_06', 74: 'titulaire_06', 75: 'titulaire_06', 76: 'titulaire_06', 77: 'titulaire_06', 78: 'titulaire_06', 79: 'titulaire_07', 80: 'titulaire_07', 81: 'titulaire_07', 82: 'titulaire_07', 83: 'titulaire_07', 84: 'titulaire_07', 85: 'titulaire_07', 86: 'titulaire_07', 87: 'titulaire_07', 88: 'titulaire_07', 89: 'titulaire_07', 90: 'titulaire_07', 91: 'titulaire_07', 92: 'titulaire_07', 93: 'titulaire_07', 94: 'titulaire_07', 95: 'titulaire_07', 96: 'titulaire_07', 97: 'titulaire_07', 98: 'titulaire_08', 99: 'titulaire_08', 100: 'titulaire_08', 101: 'titulaire_08', 102: 'titulaire_08', 103: 'titulaire_08', 104: 'titulaire_08', 105: 'titulaire_08', 106: 'titulaire_08', 107: 'titulaire_08', 108: 'titulaire_08', 109: 'titulaire_08', 110: 'titulaire_08', 111: 'titulaire_08', 112: 'titulaire_08', 113: 'titulaire_08', 114: 'titulaire_08', 115: 'titulaire_08', 116: 'titulaire_08', 117: 'titulaire_09', 118: 'titulaire_09', 119: 'titulaire_09', 120: 'titulaire_09', 121: 'titulaire_09', 122: 'titulaire_09', 123: 'titulaire_09', 124: 'titulaire_09', 125: 'titulaire_09', 126: 'titulaire_09', 127: 'titulaire_09', 128: 'titulaire_09', 129: 'titulaire_09', 130: 'titulaire_09', 131: 'titulaire_09', 132: 'titulaire_09', 133: 'titulaire_09', 134: 'titulaire_09', 135: 'titulaire_09', 136: 'titulaire_10', 137: 'titulaire_10', 138: 'titulaire_10', 139: 'titulaire_10', 140: 'titulaire_10', 141: 'titulaire_10', 142: 'titulaire_10', 143: 'titulaire_10', 144: 'titulaire_10', 145: 'titulaire_10', 146: 'titulaire_10', 147: 'titulaire_10', 148: 'titulaire_10', 149: 'titulaire_10', 150: 'titulaire_10', 151: 'titulaire_10', 152: 'titulaire_10', 153: 'titulaire_10', 154: 'titulaire_10', 155: 'titulaire_10', 156: 'titulaire_10', 157: 'titulaire_10', 158: 'titulaire_11', 159: 'titulaire_11', 160: 'titulaire_11', 161: 'titulaire_11', 162: 'titulaire_11', 163: 'titulaire_11', 164: 'titulaire_11', 165: 'titulaire_11', 166: 'titulaire_11', 167: 'titulaire_11', 168: 'titulaire_11', 169: 'titulaire_11', 170: 'titulaire_11', 171: 'titulaire_11', 172: 'titulaire_11', 173: 'titulaire_11', 174: 'titulaire_11', 175: 'titulaire_11', 176: 'titulaire_11', 177: 'titulaire_11', 178: 'titulaire_11', 179: 'titulaire_11'}, 'id': {0: 2002134607, 1: 2002043469, 2: 67156610, 3: 73201503, 4: 2000165962, 5: 2000143545, 6: 2002042688, 7: 2000055323, 8: 49054631, 9: 48031358, 10: 49048802, 11: 2002042816, 12: 2000045508, 13: 73201458, 14: 67191910, 15: 2002134617, 16: 2002042628, 17: 2000023214, 18: 2000165961, 19: 2000121963, 20: 2000045487, 21: 2000006106, 22: 14196664, 23: 2000055604, 24: 2002043613, 25: 49054633, 26: 49037900, 27: 2002043635, 28: 48031358, 29: 49037900, 30: 2002043635, 31: 2000121963, 32: 2000165961, 33: 67191910, 34: 2002042816, 35: 73201458, 36: 49054633, 37: 2000045487, 38: 2002043613, 39: 2000006106, 40: 2000055604, 41: 2000023214, 42: 2000045508, 43: 2002042628, 44: 14196664, 45: 2002134617, 46: 49054631, 47: 49048802, 48: 49040506, 49: 85126966, 50: 83169864, 51: 2002043476, 52: 2000045508, 53: 2002043613, 54: 2002042669, 55: 2000023214, 56: 73201460, 57: 67211095, 58: 83169864, 59: 13196665, 60: 2000055604, 61: 2000011411, 62: 2000165964, 63: 73201458, 64: 2002042939, 65: 2002043635, 66: 2002043613, 67: 2000045698, 68: 2002042722, 69: 2000132382, 70: 49054633, 71: 2002042845, 72: 2000045520, 73: 73201505, 74: 73201458, 75: 70137157, 76: 49040506, 77: 2002043635, 78: 2000143548, 79: 73200890, 80: 49060705, 81: 2000045543, 82: 2000045698, 
83: 2000011617, 84: 2002042722, 85: 2002042642, 86: 2000113673, 87: 85137101, 88: 19217413, 89: 2000147147, 90: 2002042845, 91: 2002043003, 92: 2002042627, 93: 2002042966, 94: 2000047331, 95: 2002042666, 96: 2000134665, 97: 2002042690, 98: 2000011617, 99: 2000045698, 100: 49060705, 101: 2000047331, 102: 2000147147, 103: 2000134665, 104: 2000113673, 105: 73200890, 106: 2002042845, 107: 19217413, 108: 2000045543, 109: 2002043003, 110: 2002042722, 111: 2002042666, 112: 2002042966, 113: 2002042627, 114: 2002042690, 115: 2002042642, 116: 85137101, 117: 2000134665, 118: 2002042666, 119: 2002042627, 120: 2000047331, 121: 2002042966, 122: 2002043003, 123: 2002042690, 124: 2002042845, 125: 2000147147, 126: 19217413, 127: 85137101, 128: 2002042722, 129: 2002042642, 130: 2000045543, 131: 2000011617, 132: 2000113673, 133: 49060705, 134: 73200890, 135: 2000045698, 136: 62124125, 137: 2002043171, 138: 2000165960, 139: 2002134617, 140: 2002042690, 141: 2000047311, 142: 2000105477, 143: 2002042627, 144: 2000037444, 145: 49060705, 146: 2002042642, 147: 2002134611, 148: 2002043003, 149: 2002042966, 150: 73201412, 151: 2002042813, 152: 67256520, 153: 2000047306, 154: 2002042983, 155: 12092876, 156: 96026541, 157: 2002043636, 158: 2000165960, 159: 49060705, 160: 12092876, 161: 2002042690, 162: 2002134617, 163: 2002042642, 164: 73201412, 165: 62124125, 166: 2000105477, 167: 2002042966, 168: 96026541, 169: 2002042983, 170: 2000047311, 171: 2002043171, 172: 2002134611, 173: 2002042813, 174: 2000047306, 175: 67256520, 176: 2002043003, 177: 2002043636, 178: 2002042627, 179: 2000037444}, 'niveau': {0: 13.605263157894736, 1: 25.13157894736842, 2: 22.473684210526315, 3: 16.236842105263158, 4: 15.789473684210526, 5: 15.342105263157896, 6: 28.394736842105264, 7: 14.789473684210526, 8: 16.727272727272727, 9: 25.741935483870968, 10: 17.424242424242426, 11: 28.03030303030303, 12: 16.696969696969695, 13: 16.636363636363637, 14: 25.454545454545453, 15: 16.484848484848484, 16: 30.606060606060606, 17: 16.424242424242426, 18: 17.151515151515152, 19: 17.151515151515152, 20: 19.151515151515152, 21: 22.03030303030303, 22: 25.272727272727273, 23: 19.818181818181817, 24: 25.12121212121212, 25: 20.272727272727273, 26: 28.09090909090909, 27: 26.0, 28: 26.06451612903226, 29: 28.545454545454547, 30: 26.242424242424242, 31: 17.454545454545453, 32: 17.606060606060606, 33: 25.757575757575758, 34: 28.333333333333332, 35: 17.09090909090909, 36: 20.575757575757574, 37: 19.454545454545453, 38: 25.272727272727273, 39: 21.575757575757574, 40: 20.12121212121212, 41: 15.969696969696969, 42: 16.393939393939394, 43: 30.303030303030305, 44: 25.515151515151516, 45: 16.939393939393938, 46: 17.03030303030303, 47: 17.87878787878788, 48: 18.142857142857142, 49: 24.37142857142857, 50: 24.057142857142857, 51: 25.4, 52: 15.17142857142857, 53: 23.34285714285714, 54: 28.142857142857142, 55: 15.085714285714285, 56: 16.257142857142856, 57: 23.34285714285714, 58: 23.771428571428572, 59: 22.6, 60: 18.285714285714285, 61: 18.685714285714287, 62: 16.514285714285716, 63: 15.82857142857143, 64: 25.885714285714286, 65: 26.142857142857142, 66: 23.485714285714284, 67: 17.564102564102566, 68: 28.384615384615383, 69: 17.153846153846153, 70: 18.205128205128204, 71: 25.46153846153846, 72: 15.512820512820513, 73: 14.615384615384615, 74: 14.846153846153847, 75: 17.564102564102566, 76: 17.487179487179485, 77: 24.974358974358974, 78: 14.461538461538462, 79: 22.5, 80: 20.0625, 81: 19.84375, 82: 18.9375, 83: 20.25, 84: 31.59375, 85: 33.1875, 86: 18.34375, 
87: 24.71875, 88: 26.03125, 89: 18.09375, 90: 28.34375, 91: 29.1875, 92: 32.46875, 93: 30.09375, 94: 18.5625, 95: 31.9375, 96: 15.28125, 97: 32.3125, 98: 19.9375, 99: 18.625, 100: 19.8125, 101: 18.8125, 102: 18.40625, 103: 15.75, 104: 18.03125, 105: 22.1875, 106: 28.09375, 107: 26.34375, 108: 20.15625, 109: 29.4375, 110: 31.34375, 111: 31.78125, 112: 
29.84375, 113: 32.21875, 114: 32.625, 115: 33.5, 116: 24.46875, 117: 15.870967741935484, 118: 31.483870967741936, 119: 32.354838709677416, 120: 18.29032258064516, 121: 29.741935483870968, 122: 29.677419354838708, 123: 32.41935483870968, 124: 28.129032258064516, 125: 18.032258064516128, 126: 26.06451612903226, 127: 24.70967741935484, 128: 31.838709677419356, 129: 33.61290322580645, 130: 20.35483870967742, 131: 19.129032258064516, 132: 18.580645161290324, 133: 20.419354838709676, 134: 22.483870967741936, 135: 19.451612903225808, 136: 23.59375, 137: 30.78125, 138: 19.28125, 139: 16.03125, 140: 31.78125, 141: 19.625, 142: 19.09375, 143: 32.0625, 144: 20.65625, 145: 20.625, 146: 32.96875, 147: 20.71875, 148: 29.15625, 149: 29.5, 150: 17.875, 151: 29.0625, 152: 21.28125, 153: 18.84375, 154: 28.4375, 155: 24.84375, 156: 26.53125, 157: 29.0625, 158: 18.8125, 159: 20.375, 160: 24.53125, 161: 32.09375, 162: 15.5625, 163: 33.28125, 164: 18.34375, 165: 23.125, 166: 18.625, 167: 29.25, 168: 26.84375, 169: 28.125, 170: 19.3125, 171: 30.53125, 172: 20.875, 173: 28.75, 174: 18.53125, 175: 21.03125, 176: 29.40625, 177: 29.375, 178: 31.8125, 179: 20.34375}}

df = pd.DataFrame(data)
print(df)
           statut          id     niveau
0    titulaire_01  2002134607  13.605263
1    titulaire_01  2002043469  25.131579
2    titulaire_01    67156610  22.473684
3    titulaire_01    73201503  16.236842
4    titulaire_01  2000165962  15.789474
..            ...         ...        ...
175  titulaire_11    67256520  21.031250
176  titulaire_11  2002043003  29.406250
177  titulaire_11  2002043636  29.375000
178  titulaire_11  2002042627  31.812500
179  titulaire_11  2000037444  20.343750

if I do groupby("statut") keeping the max of the "niveau" column I have "id" duplicates, an "id" can be in several "titulaire_01" and "titulaire_02" etc.. the result should be 11 rows with no duplicates

4
  • I am not sure whether I am understanding your problem correctly, but a solution may be to sort niveau column, by descending order, and take the top 11. Commented Oct 7, 2022 at 10:15
  • sorry I use the translator I will modify to try to understand better, I have to keep the best possible combination without duplicates Commented Oct 7, 2022 at 10:26
  • This falls into a calss of problem similar to optimal packing. Essentially, given that each item has a cost and a value, how can you maximize your value within the hard constraints of the cost. Here your cost is 11 unique ids, and your optimization goal is to maximmize the nveau sum. I'm not sure how your data is structured but you have a couple of options. Firstly, and exhaustive approach where you try every legal combination. If your data is too big that would take too long. Or you can look into metaheuristic techniques such as genetic algroithms. Commented Oct 7, 2022 at 11:12
  • @pseudoabdul it looks like a classic assignment problem, which scipy knows how to solve (see my answer below for an example) Commented Oct 7, 2022 at 11:18

1 Answer 1

1

It looks like an optimization problem, you can pivot your data to a rectangular format, then use scipy.optimize.linear_sum_assignment:

from scipy.optimize import linear_sum_assignment

df2 = df.pivot_table(index='id', columns='statut', values='niveau',
                     fill_value=0) # or fill_value=-np.inf

ID, statut = linear_sum_assignment(df2, maximize=True)

out = (pd.DataFrame({'statut': df2.columns[statut], 'id': df2.index[ID]})
         .sort_values(by='statut', ignore_index=True)
      )

output:

          statut          id
0   titulaire_01  2002042688
1   titulaire_02  2002042628
2   titulaire_03    49037900
3   titulaire_04  2002042669
4   titulaire_05  2002043635
5   titulaire_06  2002042722
6   titulaire_07  2002042666
7   titulaire_08  2002042690
8   titulaire_09  2002042627
9   titulaire_10  2002043171
10  titulaire_11  2002042642
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.