Overview: I am building a recommendation system that compares a course a student has already taken against a catalog of available courses the student has not yet taken. The system returns 3 recommended courses.

Issue: I call a custom recommendation function, which returns 3 values, inside a for loop that iterates through a transcript of classes already taken. The loop essentially finds the 3 classes the student should take next. The problem is that all the recommended classes end up in a single column cell, and I have not found an easy way to break them out into separate columns.

Deeper dive:

I have a function (c_recommend) that returns 3 recommendations in the form of a series:

output: Series

INDEX  Program Title
123    program 1
456    program 2
789    program 3

I then use this function (c_recommend) inside a for loop that iterates over the rows of a transcript, taking each course title and comparing it to the catalog of classes.

## create an empty list
results = list()

## run through the transcript
for i in transcript.index:
    ## append the student's name, the course already taken, and the recommended courses (3 will appear)
    results.append([transcript['student'].loc[i], transcript['Course'].loc[i], c_recommend(transcript['Course'].loc[i])])

output: List

Student  Taken Class  Recommended Classes
111      program 1    program 2, program 3, program 4
222      program 2    program 5, program 1, program 3
333      program 3    program 2, program 1, program 4

The recommended classes are all bunched into one cell because the c_recommend function returns three values at once. I need a way to separate those 3 values into their own columns, like so:

desired output:

Student  Taken Class  Recommended Classes  Reco Class 2  Reco Class 3
111      program 1    program 2            program 3     program 4
222      program 2    program 5            program 1     program 3
333      program 3    program 2            program 1     program 4

I have tried converting the list to a pandas DataFrame and splitting it, using regex to split on the commas, and using nested loops. Alas, none of these worked and the columns do not separate :( Ideally, once this issue is fixed, I would like to convert the result to a pandas DataFrame. Maybe there is an easier way to handle this with pandas?

I would appreciate all and any insight even if that means rewriting my function.

TIA!

1 Answer

The best approach would probably be to output a Series (i.e., several columns) directly in your initial transformation.
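For illustration, here is a minimal sketch of that idea, assuming c_recommend returns a pandas Series of three titles as described in the question. The c_recommend stand-in and the sample transcript below are hypothetical and exist only to make the example runnable; the column names 'student' and 'Course' are taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's c_recommend:
# returns a Series holding 3 course titles other than the one passed in.
def c_recommend(course):
    catalog = ["program 1", "program 2", "program 3", "program 4", "program 5"]
    picks = [c for c in catalog if c != course][:3]
    return pd.Series(picks, index=[123, 456, 789], name="Program Title")

# Hypothetical transcript using the column names from the question.
transcript = pd.DataFrame({
    "student": [111, 222, 333],
    "Course": ["program 1", "program 2", "program 3"],
})

rows = []
for student, course in zip(transcript["student"], transcript["Course"]):
    recos = c_recommend(course).tolist()    # Series -> plain list of 3 titles
    rows.append([student, course, *recos])  # unpack so each title gets its own field

columns = ["Student", "Taken Class", "Reco Class 1", "Reco Class 2", "Reco Class 3"]
result = pd.DataFrame(rows, columns=columns)
print(result)
```

Unpacking the Series into the row (instead of appending the Series object itself) is what keeps each recommendation in its own column when the list of rows is turned into a DataFrame.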

As you did not provide information on this, here is a way to rework your first output:

(df.drop('Recommended Classes', axis=1)
   .join(df['Recommended Classes']
           .str.split(', ', expand=True)  # split list of recommended classes
           .rename(columns=lambda x: x+1) # increment column name
           .add_prefix('Reco Class ')     # add prefix to column name
        )
)

output:

   Student  Taken Class  Reco Class 1 Reco Class 2 Reco Class 3
0       111   program 1     program 2    program 3    program 4
1       222   program 2     program 5    program 1    program 3
2       333   program 3     program 2    program 1    program 4
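Note that .str.split only applies when each cell holds a single comma-separated string. If the cells instead hold a list of titles (for example, after calling .tolist() on the Series returned by c_recommend), a possible variant is to expand the lists into a new DataFrame and join it back. This is only a sketch: the sample df below is made up to mirror the tables in the question.

```python
import pandas as pd

# Made-up sample data: each cell of 'Recommended Classes' is a list, not a string.
df = pd.DataFrame({
    "Student": [111, 222, 333],
    "Taken Class": ["program 1", "program 2", "program 3"],
    "Recommended Classes": [
        ["program 2", "program 3", "program 4"],
        ["program 5", "program 1", "program 3"],
        ["program 2", "program 1", "program 4"],
    ],
})

# Turn the column of lists into one column per list position,
# keeping the original index so the join lines up row by row.
expanded = pd.DataFrame(df["Recommended Classes"].tolist(), index=df.index)
expanded.columns = [f"Reco Class {i + 1}" for i in expanded.columns]

out = df.drop(columns="Recommended Classes").join(expanded)
print(out)
```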