How would I sort an outputted text file by the nth column - Python

Question

My code:

infile = open("ALE.txt", "r")
outfile = open("ALE_sorted.txt", "w")

for line in infile:
    data = line.strip().split(',')
    wins = int(data[2])
    percentage = 162 / wins
    p = str(data[0]) + ", " + data[1] + ", " + data[2] + ", " + str(round(percentage, 3)) + "\n"
    outfile.write(p)
infile.close()
outfile.close()

The original input file (ALE.txt) contains only the first three columns shown below. The text file written by the code above looks like this:

Baltimore, 93, 69, 2.348
Boston, 69, 93, 1.742
New York, 95, 67, 2.418
Tampa Bay, 90, 72, 2.25
Toronto, 73, 89, 1.82

I know the code correctly calculates the win percentage (column 2/total wins), but I would like to sort this list by the 4th column (win percentage).

You might find this a lot simpler if you use pandas or, at least, the csv module. Also, why eval?
– pvg, Commented Oct 23, 2017 at 2:20
@pvg I changed eval to int. I didn't think it mattered; I got the same result.
– user8816916, Commented Oct 23, 2017 at 2:24
Create a tuple and sort it based on the fourth column. stackoverflow.com/questions/12087905/…
– user8753793, Commented Oct 23, 2017 at 2:25

Keerthana Prabhakaran · Accepted Answer · 2017-10-23 05:05:12Z

1

Append your data to a list, say d.

Sort it by the fourth item (index 3, i.e. the 4th column) of each row. Reference: operator.itemgetter

Write the sorted data to your output file.

Contents of the input file:

[kiran@localhost ~]$ cat infile.txt
Baltimore, 93, 69
Boston, 69, 93
New York, 95, 67
Tampa Bay, 90, 72
Toronto, 73, 89

Code:

>>> from operator import itemgetter
>>> d = []
>>> with open('infile.txt', 'r') as infile:
...     for line in infile:
...             data = line.strip().split(',')
...             wins = int(data[2])
...             percentage = 162 / float(wins)
...             data.append(str(round(percentage, 3))) # add the percentage to the row, which already holds the name and two scores
...             d.append(data) # add the row to the list `d`
...
>>> print(d)
[['Baltimore', ' 93', ' 69', '2.348'], ['Boston', ' 69', ' 93', '1.742'], ['New York', ' 95', ' 67', '2.418'], ['Tampa Bay', ' 90', ' 72', '2.25'], ['Toronto', ' 73', ' 89', '1.82']]
>>> d.sort(key=itemgetter(3)) # sort `d` by the item at index 3 (the 4th column) of each row
>>> print(d)
[['Boston', ' 69', ' 93', '1.742'], ['Toronto', ' 73', ' 89', '1.82'], ['Tampa Bay', ' 90', ' 72', '2.25'], ['Baltimore', ' 93', ' 69', '2.348'], ['New York', ' 95', ' 67', '2.418']]
>>> #write the items in list d to your output file
>>>
>>> with open('outfile.txt','w') as outfile:
...     for line in d:
...             outfile.write(','.join(line)+'\n')
...
>>>

Contents of the output file:

[kiran@localhost ~]$ cat outfile.txt
Boston, 69, 93,1.742
Toronto, 73, 89,1.82
Tampa Bay, 90, 72,2.25
Baltimore, 93, 69,2.348
New York, 95, 67,2.418
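A caveat on the sort above: itemgetter(3) compares the percentages as strings, which gives the right order here only because every value has a single digit before the decimal point. A minimal sketch, assuming rows shaped like `d` (name, two scores, percentage string), that converts the key to float instead; the helper name `sort_by_percentage` and the sample rows are hypothetical:

```python
def sort_by_percentage(rows, reverse=False):
    """Return rows sorted numerically by the 4th column (index 3)."""
    # float() turns '10.5' into 10.5, so it sorts after 9.2;
    # compared as strings, '10.5' < '9.2' because '1' < '9'.
    return sorted(rows, key=lambda row: float(row[3]), reverse=reverse)


# Hypothetical rows where string order and numeric order disagree.
rows = [['A', ' 1', ' 2', '9.2'], ['B', ' 3', ' 4', '10.5'], ['C', ' 5', ' 6', '2.25']]

print([row[0] for row in sorted(rows, key=lambda row: row[3])])  # string order: ['B', 'C', 'A']
print([row[0] for row in sort_by_percentage(rows)])              # numeric order: ['C', 'A', 'B']
```

Passing reverse=True gives highest-first order, as in the lambda version from the comments below.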

edited Oct 23, 2017 at 5:05

answered Oct 23, 2017 at 4:11

Keerthana Prabhakaran



3 Comments

user8816916 Over a year ago

This is the best answer. Thank you. I was getting "itemgetter is not defined", so I had to change the line where you sorted the list to "d.sort(key=lambda x: x[3], reverse=True)", but it works and it's now a little easier to understand.

Keerthana Prabhakaran Over a year ago

Glad to have helped. Sorry, I forgot to add the import statement. Adding from operator import itemgetter should get you past the error.

Keerthana Prabhakaran Over a year ago

I've added comments for your reference. Let me know if you need further explanation.

N M · Accepted Answer · 2017-10-23 02:51:26Z

0

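First, when handling this, it is preferable to strip each field after splitting, e.g. [i.strip() for i in line.split(',')], so that no leading spaces remain. (line.split(',').strip() would fail, because a list has no strip method.)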

import csv

total_games = 162  # games per team per season: the 162 used in the question

with open('ALE.txt', 'r', newline='') as infile:
    reader = csv.reader(infile)
    data = []
    for line in reader:
        formatted_line = [i.strip() for i in line]  # remove the space after each comma
        wins = int(formatted_line[2])
        percentage = 100 * wins / total_games
        formatted_line.append(str(round(percentage, 3)))
        data.append(formatted_line)
    # key must be passed by keyword; float() makes the comparison numeric
    data = sorted(data, key=lambda x: float(x[3]))
with open('ALE_sorted.txt', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(data)
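As an alternative to stripping each field by hand, the csv module can drop the space after each delimiter itself through the skipinitialspace dialect option. A minimal sketch, assuming the three-column ALE.txt layout from the question; io.StringIO stands in for the real file so the snippet runs on its own, and 162 / column 3 is the question's own formula:

```python
import csv
import io

# Stand-in for open('ALE.txt', newline=''); same three-column layout as the question.
infile = io.StringIO(
    "Baltimore, 93, 69\n"
    "Boston, 69, 93\n"
    "New York, 95, 67\n"
    "Tampa Bay, 90, 72\n"
    "Toronto, 73, 89\n"
)

# skipinitialspace=True drops the blank after each comma, so no .strip() is needed;
# `if row` skips blank lines, which csv.reader returns as empty lists.
rows = [row for row in csv.reader(infile, skipinitialspace=True) if row]
for row in rows:
    row.append(round(162 / int(row[2]), 3))  # same formula as the question's code

rows.sort(key=lambda row: row[3])  # the 4th column is already a float, so this is numeric
print(rows[0])  # ['Boston', '69', '93', 1.742]
```

Keeping the percentage as a float until it is written out avoids the string-comparison issue entirely.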

answered Oct 23, 2017 at 2:51

N M


6 Comments

user8816916 Over a year ago

I just get the error "total_wins is not defined" when trying to use this code.

OneCricketeer Over a year ago

@Choco Well, looking at the variables, can't you see that wins probably is the correct name?

user8816916 Over a year ago

Yes, that would be the correct variable, hence my confusion as to why you would write total_wins

N M Over a year ago

@ChocolateGoosePoosey I don't understand why you use 162/wins to get percentage. I simply included the wins/total_wins * 100 to get you to check your calculations.

user8816916 Over a year ago

total_wins is still not defined. Sorry, I don't follow


Erick Shepherd · Accepted Answer · 2017-10-23 03:35:58Z

0

Try this:

infile  = open("ALE.txt", "r")
outfile = open("ALE_sorted.txt", "w")

master_data = []

# Load in data from the infile and calculate the win percentage.
for line in infile:

    data = line.strip().split(', ')

    wins = int(data[2])
    percentage = 162 / wins
    data.append(str(round(percentage, 3)))

    master_data.append(data)

# Sort by the last column in reverse order by value and store the
# sorted values and original indices in a list of tuples.
sorted_column = sorted([(float(data[-1]), index) for index, data in
                        enumerate(master_data)], reverse=True)

# Reassign master_data according to the sorted positions.
master_data = [master_data[data[1]] for data in sorted_column]

# Write each line to the outfile.
for data in master_data:

    outfile.write(", ".join(data) + "\n")

infile.close()
outfile.close()

Where the contents of infile are the following:

Baltimore, 93, 69
Boston, 69, 93
New York, 95, 67
Tampa Bay, 90, 72
Toronto, 73, 89

The resultant outfile contains the following sorted by the values of the newly generated fourth column from highest to lowest:

New York, 95, 67, 2.418
Baltimore, 93, 69, 2.348
Tampa Bay, 90, 72, 2.25
Toronto, 73, 89, 1.82
Boston, 69, 93, 1.742
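The (value, index) tuple step above can also be expressed with the key parameter of sort. A minimal sketch, assuming master_data holds rows like those built above (the sample rows are copied from the output shown). One small difference: with key=, rows with equal percentages keep their original relative order, because Python's sort is stable even with reverse=True, whereas sorting (value, index) tuples in reverse puts the later row first.

```python
master_data = [
    ['Baltimore', '93', '69', '2.348'],
    ['Boston', '69', '93', '1.742'],
    ['New York', '95', '67', '2.418'],
    ['Tampa Bay', '90', '72', '2.25'],
    ['Toronto', '73', '89', '1.82'],
]

# Sort in place by the last column, highest first; float() gives numeric ordering.
master_data.sort(key=lambda data: float(data[-1]), reverse=True)

# Prints the same five lines as the outfile above, New York first and Boston last.
for data in master_data:
    print(", ".join(data))
```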

edited Oct 23, 2017 at 3:35

answered Oct 23, 2017 at 2:45

Erick Shepherd


4 Comments

user8816916 Over a year ago

I forgot to mention that the infile (ALE.txt) contains only the first 3 columns shown above. The one shown is the intended format, just not sorted correctly. Is there a way to simply sort by the 4th column without changing my code too much?

Erick Shepherd Over a year ago

I just edited the post. Let me know if the changes better reflect what you were going for.

OneCricketeer Over a year ago

Why not use csv.writer instead of joining a list on commas anyway?

Erick Shepherd Over a year ago

@cricket_007 That would definitely be more flexible, but I was trying to work within the given format.
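For reference, the csv.writer route suggested in the comment might look like this. A minimal sketch, assuming the rows are already sorted; io.StringIO stands in for ALE_sorted.txt, and lineterminator='\n' is a choice made here so each row ends like the other answers' output (csv.writer defaults to '\r\n'). Because csv.writer takes a single-character delimiter, the space after each comma is lost.

```python
import csv
import io

sorted_rows = [
    ['New York', '95', '67', '2.418'],
    ['Baltimore', '93', '69', '2.348'],
]

# With a real file, open it with newline='' as the csv docs recommend.
outfile = io.StringIO()
writer = csv.writer(outfile, lineterminator='\n')
writer.writerows(sorted_rows)  # quotes a field only if it contains a comma, quote or newline

print(outfile.getvalue())
```

The flexibility mentioned above is mostly about quoting: a team name containing a comma would be written in quotes instead of silently adding a column.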

Beatriz Kanzki · Accepted Answer · 2017-10-23 03:57:38Z

0

The best way to sort by the 4th column is to open your file using pandas. Here's how to do it:

import pandas as pd

# header=None: the file has no header row, so pandas numbers the columns 0, 1, 2, 3
outfile = pd.read_csv("ALE_sorted.txt", header=None, skipinitialspace=True)
column = outfile.columns.values.tolist()  # will give you the names of your columns

# It will return [0, 1, 2, 3], where 3 is your fourth column.

outfile = outfile.sort_values(by=[3])  # sort_values returns a new, sorted DataFrame

print(outfile[3])  # to see the sorted column

This will yield:

answered Oct 23, 2017 at 3:57

Beatriz Kanzki

