0

My code:

infile = open("ALE.txt", "r")
outfile = open("ALE_sorted.txt", "w")

for line in infile:
    data = line.strip().split(',')
    wins = int(data[2])
    percentage = 162 / wins
    p = (str(data[0]) + ", " + data[1] + ", " + data[2] + ", " +
         str(round(percentage, 3)) + "\n")
    outfile.write(p)
infile.close()
outfile.close()

The original infile("ALE.txt") is just the first three columns below. The text file that is output from the code above looks like this:

Baltimore, 93, 69, 2.348
Boston, 69, 93, 1.742
New York, 95, 67, 2.418
Tampa Bay, 90, 72, 2.25
Toronto, 73, 89, 1.82

I know the code correctly calculates the win percentage (column 2/total wins), but I would like to sort this list by the 4th column (win percentage).

6
  • why not place it in a list, and sort ? Commented Oct 23, 2017 at 2:19
  • You might find this a lot simpler if you use pandas or at least, the csv module. Also eval? Commented Oct 23, 2017 at 2:20
  • @S.R. How would I go about doing that? Commented Oct 23, 2017 at 2:23
  • @pvg changed eval to int. Didn't think it mattered - got same result Commented Oct 23, 2017 at 2:24
  • Create a tuple and sort it based on the fourth column. stackoverflow.com/questions/12087905/… Commented Oct 23, 2017 at 2:25

4 Answers

1

Append your data to a list, say d.

Sort it by the item at index 3 (the 4th column) of each sublist. Reference - operator.itemgetter

Write the sorted data to your output file.

Contents of input file

[kiran@localhost ~]$ cat infile.txt
Baltimore, 93, 69
Boston, 69, 93
New York, 95, 67
Tampa Bay, 90, 72
Toronto, 73, 89

Code:

>>> from operator import itemgetter
>>> d=[]
>>> with open('infile.txt','r') as infile:
...     for line in infile.readlines():
...             data = line.strip().split(',')
...             wins = int(data[2])
...             percentage = 162 / float(wins)
...             data.append(str(round(percentage, 3))) #add percentage to your list that already contains the name and two scores.
...             d.append(data) # add the line to a list `d`
...
>>> print(d)
[['Baltimore', ' 93', ' 69', '2.348'], ['Boston', ' 69', ' 93', '1.742'], ['New York', ' 95', ' 67', '2.418'], ['Tampa Bay', ' 90', ' 72', '2.25'], ['Toronto', ' 73', ' 89', '1.82']]
>>> d.sort(key=itemgetter(3)) #sort the list `d` by index 3 (the 4th column) of each sublist.
>>> print(d)
[['Boston', ' 69', ' 93', '1.742'], ['Toronto', ' 73', ' 89', '1.82'], ['Tampa Bay', ' 90', ' 72', '2.25'], ['Baltimore', ' 93', ' 69', '2.348'], ['New York', ' 95', ' 67', '2.418']]
>>> #write the items in list d to your output file
>>>
>>> with open('outfile.txt','w') as outfile:
...     for line in d:
...             outfile.write(','.join(line)+'\n')
...
>>>

Content of output file:

[kiran@localhost ~]$ cat outfile.txt
Boston, 69, 93,1.742
Toronto, 73, 89,1.82
Tampa Bay, 90, 72,2.25
Baltimore, 93, 69,2.348
New York, 95, 67,2.418
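
One caveat with the session above: the fourth column is stored as a string, so itemgetter(3) compares text rather than numbers. That happens to give the right order for this data, but it would not if the values had different numbers of digits before the decimal point. A minimal sketch with made-up values (the rows list is hypothetical, not taken from the question), comparing a string key with a float key:

```python
from operator import itemgetter

# Hypothetical rows; the last field is a number stored as text.
rows = [["A", "10.5"], ["B", "2.25"], ["C", "1.742"]]

# String comparison: "10.5" sorts before "2.25" because "1" < "2".
by_text = sorted(rows, key=itemgetter(1))

# Numeric comparison: convert the field to float inside the key.
by_value = sorted(rows, key=lambda row: float(row[1]))

print([r[1] for r in by_text])   # ['1.742', '10.5', '2.25']
print([r[1] for r in by_value])  # ['1.742', '2.25', '10.5']
```

Converting inside the key leaves the stored strings untouched, so the rows can still be joined and written out as before.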

3 Comments

This is the best answer. Thank you. Because I was getting "itemgetter is not defined", I had to change the line where you sorted the list to "d.sort(key=lambda x: x[3], reverse=True)", but it works and it's now a little easier to understand.
Glad to have helped. Sorry, I forgot to add the import statement. from operator import itemgetter should get you through the error.
I've added comments for your reference. Let me know if you need further explanation.
0

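First, when handling this, it is preferable to split the line on commas and then strip each field, e.g. [field.strip() for field in line.split(',')] (a list has no strip() method, so line.split(',').strip() would fail).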

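import csv

# Assumption: total_wins is not defined in the original answer; 162 (the
# season length used in the question) is filled in here so the code runs.
total_wins = 162

with open('ALE.txt', 'r', newline='') as infile:
    reader = csv.reader(infile)
    data = []
    for line in reader:
        formatted_line = [i.strip() for i in line]
        wins = int(formatted_line[2])
        percentage = 100*wins/total_wins
        formatted_line.append(str(round(percentage,3)))
        data.append(formatted_line)
    # Sort by the numeric value of the new 4th column (index 3).
    data = sorted(data, key=lambda x: float(x[3]))
with open('ALE_sorted.txt', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(data)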
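
For reference, here is a self-contained sketch of the csv-module approach that runs without any files. It uses io.StringIO in place of ALE.txt, and the helper name add_ratio_and_sort is made up for illustration. It keeps the question's 162 / third-column formula rather than the percentage above, and it sorts on the numeric value before turning it into text.

```python
import csv
import io

def add_ratio_and_sort(text, games=162):
    """Parse 'name, a, b' rows, append games / b, sort by that value (descending)."""
    # skipinitialspace drops the blank after each comma, so no manual strip is needed.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for name, second, third in reader:
        ratio = games / int(third)  # same formula as the question's code
        rows.append((name, second, third, ratio))
    rows.sort(key=lambda row: row[3], reverse=True)
    # Format the ratio only after sorting, so the sort compares numbers.
    return [[n, s, t, str(round(r, 3))] for n, s, t, r in rows]

sample = (
    "Baltimore, 93, 69\n"
    "Boston, 69, 93\n"
    "New York, 95, 67\n"
    "Tampa Bay, 90, 72\n"
    "Toronto, 73, 89\n"
)
sorted_rows = add_ratio_and_sort(sample)

# Write to an in-memory buffer instead of ALE_sorted.txt.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(sorted_rows)
print(out.getvalue())
```

Swapping the two StringIO objects for open('ALE.txt', newline='') and open('ALE_sorted.txt', 'w', newline='') should give the file-based version.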

6 Comments

I just get the error "total wins" is not defined when trying to use this code
@Choco Well, looking at the variables, can't you see that wins probably is the correct name?
Yes, that would be the correct variable, hence my confusion as to why you would write total_wins
@ChocolateGoosePoosey I don't understand why you use 162/wins to get percentage. I simply included the wins/total_wins * 100 to get you to check your calculations.
total_wins is still not defined. Sorry, I don't follow
0

Try this:

infile  = open("ALE.txt", "r")
outfile = open("ALE_sorted.txt", "w")

master_data = []

# Load in data from the infile and calculate the win percentage.
for line in infile:

    data = line.strip().split(', ')

    wins = int(data[2])
    percentage = 162 / wins
    data.append(str(round(percentage, 3)))

    master_data.append(data)

# Sort by the last column in reverse order by value and store the 
# sorted values and original indices in a list of tuples.
sorted_column = sorted([(float(data[-1]), index) for index, data in \
                        enumerate(master_data)], reverse = True)

# Reassign master_data according to the sorted positions.
master_data   = [master_data[data[1]] for data in sorted_column]

# Write each line to the outfile.
for data in master_data:

    outfile.write(str(", ".join(data) + "\n"))

infile.close()
outfile.close()

Where the contents of infile are the following:

Baltimore, 93, 69
Boston, 69, 93
New York, 95, 67
Tampa Bay, 90, 72
Toronto, 73, 89

The resultant outfile contains the following sorted by the values of the newly generated fourth column from highest to lowest:

New York, 95, 67, 2.418
Baltimore, 93, 69, 2.348
Tampa Bay, 90, 72, 2.25
Toronto, 73, 89, 1.82
Boston, 69, 93, 1.742
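
The index-and-tuple step in the code above can likely be collapsed into a single sort with a key function. A short sketch, assuming master_data holds the same lists of strings that the loop builds (hard-coded here so the example runs on its own):

```python
# Rows as the loop above would build them: three input fields plus the ratio as text.
master_data = [
    ["Baltimore", "93", "69", "2.348"],
    ["Boston", "69", "93", "1.742"],
    ["New York", "95", "67", "2.418"],
    ["Tampa Bay", "90", "72", "2.25"],
    ["Toronto", "73", "89", "1.82"],
]

# Sort in place on the numeric value of the last column, highest first.
master_data.sort(key=lambda row: float(row[-1]), reverse=True)

for row in master_data:
    print(", ".join(row))
```

The key converts the last field to float only for comparison, so the rows themselves stay as strings and can be joined and written exactly as before.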

4 Comments

I forgot to mention that the infile("ALE.txt") is just the first 3 columns shown above. The one shown is the intended format, just not sorted correctly. Is there a way to simply sort by the 4th column without changing my code too much?
I just edited the post. Let me know if the changes better reflect what you were going for.
Why not use CsvWriter instead of joining a list on commas anyway?
@cricket_007 That would definitely be more flexible, but was trying to work within the given format.
0

One way to sort by the 4th column is to load the file with pandas. Here's how to do it:

import pandas as pd

# The file has no header row, so pandas labels the columns 0, 1, 2, 3.
df = pd.read_csv("ALE_sorted.txt", header=None, skipinitialspace=True)
columns = df.columns.values.tolist()  # column labels: [0, 1, 2, 3]; 3 is your fourth column

# sort_values returns a new DataFrame, so keep the result.
df = df.sort_values(by=3)

print(df[3])  # to see the sorted column

The fourth column then comes out in ascending order, with these values (pandas also prints the row index and dtype alongside them):

1.742
1.82
2.25
2.348
2.418
