I'm working on a student machine-learning project, using Python and pandas to analyze web data. For this I need to convert multiple rows of data (all from one session) into a single row. A session has a variable length, and each row consists of 5 values: referer, ip, time, requestAdress, session, which I want to store in columns.

    import pandas as pd

    # df_work holds one row per request; session is the id of the session to flatten
    df_row = pd.DataFrame()
    length_row = len(df_work[df_work['session'] == session])  # counter used as column suffix
    for row in df_work[df_work['session'] == session].itertuples():
        # row = (Index, referer, ip, time, requestAdress, session), so the values start at position 1
        for i in range(1, len(row)):
            name = ['referer', 'ip', 'time', 'requestAdress', 'session']
            df_row[str(name[i-1]) + str(length_row)] = row[i]
            print(row[i])
        length_row -= 1
    print(df_row)

The output is:

https://www.google.de/
x5d80e060.dyn.telefonica.de
2016-07-06 03:41:02
/kuenstlerbedarf/oelfarben/
-8730846718325754703

Empty DataFrame
Columns: [referer28, ip28, time28, requestAdress28, session28, referer27, ip27, time27, requestAdress27, session27, referer26, ip26, time26, requestAdress26, session26, referer25, ip25, time25, requestAdress25, session25, referer24, ip24, time24, requestAdress24, session24, referer23, ip23, time23, requestAdress23, session23, referer22, ip22, time22, requestAdress22, session22, referer21, ip21, time21, requestAdress21, session21, referer20, ip20, time20, requestAdress20, session20, referer19, ip19, time19, requestAdress19, session19, referer18, ip18, time18, requestAdress18, session18, referer17, ip17, time17, requestAdress17, session17, referer16, ip16, time16, requestAdress16, session16, referer15, ip15, time15, requestAdress15, session15, referer14, ip14, time14, requestAdress14, session14, referer13, ip13, time13, requestAdress13, session13, referer12, ip12, time12, requestAdress12, session12, referer11, ip11, time11, requestAdress11, session11, referer10, ip10, time10, requestAdress10, session10, referer9, ip9, time9, requestAdress9, session9, ...]
Index: []

So the dynamic naming of the columns works, but the DataFrame remains empty. All I found regarding this problem were this and this question.

I want to know why the assignment df_row[str(name[i-1]) + str(length_row)] = row[i] does not work, and how I can achieve my goal of filling the dynamically named columns with the given values.

A big THANX in advance!

  • If your df is initially empty then simply assigning like this won't add a row, you need to use append Commented Nov 15, 2016 at 14:34
  • what's df_row, all i can see is row Commented Nov 15, 2016 at 14:44
  • sry. edited code snippet. Commented Nov 15, 2016 at 14:51
  • ah ok, then edchum's comment is the right one... what you are doing there is changing the existing values of the dataframe which is empty, so you need to append the row... however, this will be slow... the suggested way would be to allocate a large enough dataframe and then trim it as needed. Also seems you can safely take name = ['referer', 'ip', 'time', 'requestAdress', 'session'] out of the loops. Commented Nov 15, 2016 at 16:15
  • @EdChum +JohnSmith : Thank you, like you said, the empty DataFrame was the Problem. Do you want to post an answer, or is this question going to the trash? :) Commented Nov 17, 2016 at 10:29
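
The behavior described in the comments can be reproduced with a small sketch. It assumes a tiny, made-up stand-in for df_work (the second request row and the host name are invented for illustration) and the names flat and counter are hypothetical. Instead of append, which was removed from DataFrame in pandas 2.0, it collects the values in a dict and builds a one-row DataFrame at the end. This is one possible way to do it, not necessarily the approach the commenters had in mind.

```python
import pandas as pd

# Hypothetical stand-in for df_work: two requests belonging to one session.
df_work = pd.DataFrame({
    'referer': ['https://www.google.de/', 'https://www.example.com/'],
    'ip': ['host-a', 'host-a'],
    'time': ['2016-07-06 03:41:02', '2016-07-06 03:42:10'],
    'requestAdress': ['/kuenstlerbedarf/oelfarben/', '/kuenstlerbedarf/'],
    'session': [-8730846718325754703, -8730846718325754703],
})
session = -8730846718325754703

# Cause: a scalar assigned to a new column is broadcast over the existing index.
# An empty DataFrame has no index entries, so the column is created without rows.
empty = pd.DataFrame()
empty['referer1'] = 'https://www.google.de/'
print(empty.shape)  # one column, zero rows

# Possible fix: gather all values in a plain dict, then build a one-row frame.
name = ['referer', 'ip', 'time', 'requestAdress', 'session']
rows = df_work.loc[df_work['session'] == session, name]
flat = {}
counter = len(rows)  # same countdown suffix as in the question
for row in rows.itertuples(index=False):
    for col, value in zip(name, row):
        flat[col + str(counter)] = value
    counter -= 1

df_row = pd.DataFrame([flat])  # a list with one dict gives exactly one row
print(df_row)
```

Building the dict first also avoids growing a DataFrame column by column inside the loop, which is the slowness one of the comments warns about.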
