2

I have a CSV file that contains information about journeys taken by car.

enter image description here

I want to wrangle this data so that I have a single list for each journey (every row). That list should contain the journey_code as the first item in the list and then all the subsequent MGRS units as separate items. Lastly, I would want to have all those journey lists grouped into a parent list.

If I was doing this manually, it would look like this:

journeyCodeA = ['journeyCodeA', 'mgrs1', 'mgrs2', 'mgrs3']
journeyCodeB = ['journeyCodeB', 'mgrs2', 'mgrs4', 'mgrs7']
combinedList = [journeyCodeA, journeyCodeB]

Here is what I have so far to create a list per row and to combine the desired columns.

comparison_journey_mgrs = pd.read_csv(r"journey-mgrs.csv", delimiter = ',')
comparison_journey_mgrs['mgrs_grids'] = comparison_journey_mgrs['mgrs_grids'].str.replace(" ","")
comparison_journey_list = []

for index, rows in comparison_route_mgrs.iterrows():
        holding_list = [rows.journey_code, rows.mgrs_grids]
        comparison_journey_list.append(holding_list)

The problem with this is that it is treating the mgrs_grids column as a single string.

My list looks like this:

[['7211863-140','18TWL927129,18TWL888113,18TWL888113,...,18TWL903128']]

But I want it to looks like this:

[['7211863-140','18TWL927129', '18TWL888113', '18TWL888113',..., '18TWL903128']]

I am struggling to find a way to iterate over each row of the dataframe, reference the mgrs_grids column and then convert the comma-separated string to a list in-place.

Thanks for any help!


{'driver_code': {0: 7211863, 1: 7211863, 2: 7211863, 3: 7211863},
 'journey_code': {0: '7211863-140',
  1: '7211863-105',
  2: '7211863-50',
  3: '7211863-109'},
 'mgrs_grids': {0: '18TWL927129,18TWL888113,18TWL888113,18TWL887113,18TWL888113,18TWL887113,18TWL887113,18TWL887113,18TWL903128',
  1: '18TWL927129,18TWL939112,18TWL939112,18TWL939113,18TWL939113,18TWL939113,18TWL939113,18TWL939113,18TWL939113,18TWL960111,18TWL960112',
  2: '18TWL927129,18TWL889085,18TWL889085,18TWL888085,18TWL888085,18TWL888085,18TWL888085,18TWL888085,18TWL890085',
  3: '18TWL927129,18TWL952106,18TWL952106,18TWL952106,18TWL952106,18TWL952106,18TWL952106,18TWL952106,18TWL952105,18TWL951103'}}
0

2 Answers 2

15
# use str split on the column
df.mgrs_grids = df.mgrs_grids.str.split(',')

# display(df)
   driver_code journey_code                                                                                                                                       mgrs_grids
0      7211863  7211863-140                            [18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1      7211863  7211863-105  [18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2      7211863   7211863-50                            [18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3      7211863  7211863-109               [18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]

print(type(df.loc[0, 'mgrs_grids']))
[out]:
list

separate row per value

# get a separate row for each value
df = df.explode('mgrs_grids').reset_index(drop=True)

# display(df.hea())
   driver_code journey_code   mgrs_grids
0      7211863  7211863-140  18TWL927129
1      7211863  7211863-140  18TWL888113
2      7211863  7211863-140  18TWL888113
3      7211863  7211863-140  18TWL887113
4      7211863  7211863-140  18TWL888113

Update

  • Here is another option, which combines the 'journey_code' to the front of 'mgrs_grids', and then splits the string into a list.
    • This list is assigned back to 'mgrs_grids', but can also be assigned to a new column.
# add the journey code to mgrs_grids and then split
df.mgrs_grids = (df.journey_code + ',' + df.mgrs_grids).str.split(',')

# display(df.head())
   driver_code journey_code                                                                                                                                                    mgrs_grids
0      7211863  7211863-140                            [7211863-140, 18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1      7211863  7211863-105  [7211863-105, 18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2      7211863   7211863-50                             [7211863-50, 18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3      7211863  7211863-109               [7211863-109, 18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]

# output to nested list
df.mgrs_grids.tolist()

[out]:
[['7211863-140', '18TWL927129', '18TWL888113', '18TWL888113', '18TWL887113', '18TWL888113', '18TWL887113', '18TWL887113', '18TWL887113', '18TWL903128'],
 ['7211863-105', '18TWL927129', '18TWL939112', '18TWL939112', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL960111', '18TWL960112'],
 ['7211863-50', '18TWL927129', '18TWL889085', '18TWL889085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL890085'],
 ['7211863-109', '18TWL927129', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952105', '18TWL951103']]
Sign up to request clarification or add additional context in comments.

Comments

3

You could also split and explode your dataframe into a tabular format.

df1 = df.join(df['mgrs_grids'].str.split(',',expand=True).stack().reset_index(1),how='outer')\
        .drop(['level_1','mgrs_grids'],1).rename(columns={0 : 'mgrs_grids'})


print(df1)

   driver_code journey_code   mgrs_grids
0      7211863  7211863-140  18TWL927129
0      7211863  7211863-140  18TWL888113
0      7211863  7211863-140  18TWL888113
0      7211863  7211863-140  18TWL887113
0      7211863  7211863-140  18TWL888113
0      7211863  7211863-140  18TWL887113
0      7211863  7211863-140  18TWL887113
0      7211863  7211863-140  18TWL887113
0      7211863  7211863-140  18TWL903128
1      7211863  7211863-105  18TWL927129
1      7211863  7211863-105  18TWL939112
1      7211863  7211863-105  18TWL939112
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL960111
1      7211863  7211863-105  18TWL960112
2      7211863   7211863-50  18TWL927129
2      7211863   7211863-50  18TWL889085
2      7211863   7211863-50  18TWL889085
2      7211863   7211863-50  18TWL888085
2      7211863   7211863-50  18TWL888085
2      7211863   7211863-50  18TWL888085
2      7211863   7211863-50  18TWL888085
2      7211863   7211863-50  18TWL888085
2      7211863   7211863-50  18TWL890085
3      7211863  7211863-109  18TWL927129
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952105
3      7211863  7211863-109  18TWL951103

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.