Rearrange data for pandas dataframe?

Question

I received a tab-delimited file from a server that outputs answers to questions on a per-respondent basis. I'd like to import the data into a pandas DataFrame where each column is a question and each row holds one respondent's answers. Here's what it looks like for one respondent:

[2072]  Anonymous
Q-0 [01] Sat 25 May 2013  7:43 PM UTC +0000 0.14    Student (Graduate/ Undergraduate)
Q-1 [01] Sat 25 May 2013  7:43 PM UTC +0000 0.00    
Q-1 [01] Sat 25 May 2013  7:43 PM UTC +0000 0.00    1|1|1|1|4|
Q-2 [01] Sat 25 May 2013  7:43 PM UTC +0000 1.00    1-3
Q-3 [01] Sat 25 May 2013  7:43 PM UTC +0000 0.50    Male
Q-4 [01] Sat 25 May 2013  7:43 PM UTC +0000 0.33    18-24
Q-5 [01] Sat 25 May 2013  7:43 PM UTC +0000 1.00    
Q-6 [01] Sat 25 May 2013  7:43 PM UTC +0000 0.00    Prefer not to answer
Q-7 [01] Sat 25 May 2013  7:43 PM UTC +0000 0.50    Yes
Q-8 [01] Sat 25 May 2013  7:43 PM UTC +0000 0.13    Bachelor's Degree
Q-9 [01] Sat 25 May 2013  7:43 PM UTC +0000 0.00    Other
Q-10    [01] Sat 25 May 2013  7:43 PM UTC +0000 0.00    Mathematics
Q-11    [01] Sat 25 May 2013  7:43 PM UTC +0000 0.33    High school
Q-11    [01] Sat 25 May 2013  7:43 PM UTC +0000 0.33    College (introductory courses)
Q-12    [01] Sat 25 May 2013  7:43 PM UTC +0000 1.00    Professional
Q-13    [01] Sat 25 May 2013  7:43 PM UTC +0000 0.50    Mac OS X
Q-14    [01] Sat 25 May 2013  7:43 PM UTC +0000 0.25    Every week
Q-15    [01] Sat 25 May 2013  7:43 PM UTC +0000 0.00    A test that proves or disproves of some abstract theory about the world
Q-16    [01] Sat 25 May 2013  7:43 PM UTC +0000 0.00    
Q-17    [01] Sat 25 May 2013  7:43 PM UTC +0000 2.00    Yes
Q-18    [01] Sat 25 May 2013  7:43 PM UTC +0000 0.00    
Q-19    [01] Sat 25 May 2013  7:43 PM UTC +0000 0.20    Timely feedback from the instructor
Q-20    [01] Sat 25 May 2013  7:43 PM UTC +0000 0.00

There's a carriage return between each respondent's answers. Thanks for any help!

Hmm... why the downvote, gang? This seems like a well-stated use case that may apply to others.
– Dan Allan, Commented May 31, 2013 at 7:13

Dan Allan · Accepted Answer · 2013-05-30 15:28:01Z

1

The nontrivial step is delineating each respondent's block. What about rewriting the file to prefix each line with the respondent's ID? For example, in the case of "Anonymous," I see "2072".

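import re

respondent_id = None
with open('filename') as src, open('new_file', 'w') as dst:
    for line in src:
        # A line is either a header like "[####] Student_Name" or an answer like "Q-...".
        m = re.match(r'\[(\d+)\]\s', line)
        if m:
            # Header line: remember the respondent's ID and skip the line.
            respondent_id = m.group(1)
            continue
        if not line.strip():
            # Blank separator between respondents: nothing to write.
            continue
        # Answer line: write it back prefixed with the ID and a tab, like "####<TAB>Q-...".
        dst.write(respondent_id + '\t' + line)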

Then use pandas read_csv to load the revised file, assigning the first two columns (respondent ID and question) to the index. They will form a MultiIndex. Then use unstack to pivot the question level of the index into columns.
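The read_csv and unstack steps might look roughly like the sketch below. It is a minimal illustration, not a tested solution for the real file: the column names, the assumed column layout (ID, question, timestamp, score, answer) and the second respondent (2073) are all made up for the example, and the data is read from an in-memory string instead of new_file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the rewritten file. The column layout and the
# second respondent (2073) are assumptions made for this example only.
sample = (
    "2072\tQ-0\t[01] Sat 25 May 2013\t0.14\tStudent\n"
    "2072\tQ-2\t[01] Sat 25 May 2013\t1.00\t1-3\n"
    "2072\tQ-3\t[01] Sat 25 May 2013\t0.50\tMale\n"
    "2073\tQ-0\t[01] Sun 26 May 2013\t0.20\tFaculty\n"
    "2073\tQ-2\t[01] Sun 26 May 2013\t0.50\t4-6\n"
    "2073\tQ-3\t[01] Sun 26 May 2013\t0.50\tFemale\n"
)

# Load the tab-separated rows; the first two columns become a MultiIndex.
df = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,
    names=["respondent", "question", "timestamp", "score", "answer"],
    index_col=["respondent", "question"],
)

# Pivot the "question" index level into columns: one row per respondent.
wide = df["answer"].unstack("question")
print(wide)
```

One caveat: unstack raises an error when the index contains duplicate entries, and the sample in the question repeats Q-1 and Q-11 for the same respondent. In that case the duplicates would need to be dropped or combined first, for example with df.groupby(level=[0, 1]).last(), depending on which answer should win.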

(Full Disclosure: I tested the regex, but I haven't tested it all.)

3 Comments

Jeff Over a year ago

Actually, if the blocks are a fixed size (e.g. 10 rows each), you could just read the file in and then group it with BinGrouper, I think.

Dan Allan Over a year ago

Cool. I had no idea that was a thing.

Jeff Over a year ago

Actually, this is easier: df.groupby(df.index.to_series() // 3).sum() (for every 3 rows). With BinGrouper you have to specify the labels directly.
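The fixed-size grouping idea from this comment can be sketched as follows. The frame and its "value" column are invented for illustration, and the sketch assumes a default integer index. It uses floor division (//) so that it also behaves in Python 3, where / returns floats and would put every row in its own group.

```python
import pandas as pd

# Toy frame with a default RangeIndex (0..6); the data is made up.
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6, 7]})

# Rows 0-2 get label 0, rows 3-5 get label 1, row 6 gets label 2.
block = df.index.to_series() // 3

# Aggregate each block of (up to) three rows.
totals = df.groupby(block).sum()
print(totals)
```

This only applies when every respondent's block has the same number of rows, which may not hold for the sample in the question, where some questions appear twice.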

dannycab · Accepted Answer · 2013-05-30 16:21:40Z

0

Here's what worked for me:

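import re

respondent_id = None
with open('filename') as src, open('new_file', 'w') as dst:
    for line in src:
        # Header lines start with the bracketed respondent ID, e.g. [2072].
        m = re.match(r'\[\d+\]', line)
        if m:
            # Keep the ID, brackets included, and skip the header line.
            respondent_id = m.group()
            continue
        if not line.strip():
            continue  # skip blank separator lines
        # Prefix each answer line with the ID and a tab.
        dst.write(respondent_id + '\t' + line)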
