I want to read in a text file (see below) and create columns for the English leagues only. Where the "Alias name" starts with "England_", I'd like to create a new column with the alias name as the header and the player names in the rows. Note that the first alias appears as "Aliases for" in the text file rather than "Alias name".

"-----------------------------------------------------------------------------------------------------------" 
"-                                            NEW TEAM                                                    -" 
"-----------------------------------------------------------------------------------------------------------" 
Europe Players
17/04/2019
07:59 p.m.

Aliases for England_Premier League

-------------------------------------------------------------------------------
Harry Kane
Mohamed Salah
Kevin De Bruyne

The command completed successfully.

Alias name     England_Division 1
Comment        Teams

Members

-------------------------------------------------------------------------------
Will Grigg
Jonson Clarke-Harris
Jerry Yates
Ivan Toney
Troy Parrott
The command completed successfully.

Alias name     Spanish La Liga
Comment        

Members

-------------------------------------------------------------------------------
Lionel Messi
Luis Suarez
Cristiano Ronaldo
Sergio Ramos
The command completed successfully.

Alias name     England_Division 2
Comment        

Members

-------------------------------------------------------------------------------
Eoin Doyle
Matt Watters
James Vughan
The command completed successfully.

This is how I'm currently reading in the data:

df = pd.read_csv(r'Desktop\SampleData.txt', sep='\n', header=None)

This gives me a pandas DataFrame with one column. I'm fairly new to Python, so I'm wondering how I would go about getting the result below. Should I use a delimiter when reading in the file?

England_Premier League  England_Division 1    England_Division 2
Harry Kane              Will Grigg            Eoin Doyle
Mohamed Salah           Jonson Clarke-Harris  Matt Watters
Kevin De Bruyne         Ivan Toney            James Vughan
                        Troy Parrott

1 Answer

You can use the re module for this task. For example:

import re
import pandas as pd


txt = """
"-----------------------------------------------------------------------------------------------------------" 
"-                                            NEW TEAM                                                    -" 
"-----------------------------------------------------------------------------------------------------------" 
Europe Players
17/04/2019
07:59 p.m.

Aliases for England_Premier League

-------------------------------------------------------------------------------
Harry Kane
Mohamed Salah
Kevin De Bruyne

The command completed successfully.

Alias name     England_Division 1
Comment        Teams

Members

-------------------------------------------------------------------------------
Will Grigg
Jonson Clarke-Harris
Jerry Yates
Ivan Toney
Troy Parrott
The command completed successfully.

Alias name     Spanish La Liga
Comment        

Members

-------------------------------------------------------------------------------
Lionel Messi
Luis Suarez
Cristiano Ronaldo
Sergio Ramos
The command completed successfully.

Alias name     England_Division 2
Comment        

Members

-------------------------------------------------------------------------------
Eoin Doyle
Matt Watters
James Vughan
The command completed successfully.
"""

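# Capture the alias name after "Aliases for" or "Alias name" (re.M: ^ and $ match per line).
r_competitions = re.compile(r"^Alias(?:(?:es for)| name)\s*(.*?)$", flags=re.M)
# Capture everything between a line of dashes and "The command" (re.S: . also matches newlines).
r_names = re.compile(r"^-+$\s*(.*?)\s*The command", flags=re.M | re.S)

dfs = []
# Pair each alias name with its block of member names, in file order.
for comp, names in zip(r_competitions.findall(txt), r_names.findall(txt)):
    # Skip non-English competitions.
    if "England" not in comp:
        continue
    # One single-column DataFrame per competition, one row per player.
    dfs.append(pd.DataFrame({comp: names.split("\n")}))

# Place the columns side by side and blank out the padding in shorter columns.
print(pd.concat(dfs, axis=1).fillna(""))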

Prints:

  England_Premier League    England_Division 1 England_Division 2
0             Harry Kane            Will Grigg         Eoin Doyle
1          Mohamed Salah  Jonson Clarke-Harris       Matt Watters
2        Kevin De Bruyne           Jerry Yates       James Vughan
3                                   Ivan Toney                   
4                                 Troy Parrott                   
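
To see what the two patterns do on their own, here is a minimal sketch that runs them against a shortened stand-in for the report (the `sample` string is made up for illustration, not the full file) and prints what each one captures:

```python
import re

# A shortened stand-in for the report text (two groups only).
sample = """Aliases for England_Premier League

-------------------------------------------------------------------------------
Harry Kane
Mohamed Salah

The command completed successfully.

Alias name     Spanish La Liga
Comment

Members

-------------------------------------------------------------------------------
Lionel Messi
The command completed successfully.
"""

# ^Alias                 the line must start with "Alias" (re.M makes ^ and $ work per line)
# (?:(?:es for)| name)   followed by "es for" or " name", matched but not captured
# \s*(.*?)$              skip whitespace, then capture the rest of the line
r_competitions = re.compile(r"^Alias(?:(?:es for)| name)\s*(.*?)$", flags=re.M)

# ^-+$                   a line consisting only of dashes
# \s*(.*?)\s*            capture as little as possible, without surrounding whitespace;
#                        re.S lets "." cross line breaks so several names fit in one capture
# The command            stop at the first "The command ..." footer
r_names = re.compile(r"^-+$\s*(.*?)\s*The command", flags=re.M | re.S)

competitions = r_competitions.findall(sample)
name_blocks = r_names.findall(sample)

print(competitions)  # ['England_Premier League', 'Spanish La Liga']
print(name_blocks)   # ['Harry Kane\nMohamed Salah', 'Lionel Messi']
```

Because both lists come back in file order, `zip` can pair each alias name with its block of names, and `split("\n")` turns a block into individual players. This relies on every alias header being followed by exactly one dashed line and one "The command" footer, as in the sample data.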

6 Comments

Very nice answer
@andrej That looks perfect. I just have a few questions, if you don't mind. Are you calling the original text file "txt"? Also, would it be possible for you to add some simple comments, if it's not too much trouble, so I can see exactly what each line of code is doing?
@PythonBeginner txt is just a variable name. You can load the file with, for example, txt = open("your_file.txt", "r").read()
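
A minimal sketch of that loading step combined with the answer's two patterns, in case it helps. The helper name `english_groups`, the utf-8 default, and the throwaway demo file are assumptions for illustration; the real file may need a different path and encoding.

```python
import re
import tempfile
from pathlib import Path

# Same two patterns as in the answer.
R_COMPETITIONS = re.compile(r"^Alias(?:(?:es for)| name)\s*(.*?)$", flags=re.M)
R_NAMES = re.compile(r"^-+$\s*(.*?)\s*The command", flags=re.M | re.S)


def english_groups(path, encoding="utf-8"):
    """Return {alias name: [player, ...]} for aliases containing "England".

    Hypothetical helper; the utf-8 default is an assumption about the file.
    """
    # Read the whole file into one string, since the patterns span several lines.
    txt = Path(path).read_text(encoding=encoding)
    groups = {}
    for comp, names in zip(R_COMPETITIONS.findall(txt), R_NAMES.findall(txt)):
        if "England" in comp:
            groups[comp] = names.split("\n")
    return groups


# Demo on a throwaway file holding a shortened copy of the report.
sample = (
    "Aliases for England_Premier League\n\n"
    "-----\nHarry Kane\nMohamed Salah\n\n"
    "The command completed successfully.\n\n"
    "Alias name     Spanish La Liga\nComment\n\nMembers\n\n"
    "-----\nLionel Messi\n"
    "The command completed successfully.\n"
)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "SampleData.txt"
    sample_path.write_text(sample, encoding="utf-8")
    groups = english_groups(sample_path)

print(groups)  # {'England_Premier League': ['Harry Kane', 'Mohamed Salah']}
```

The resulting dictionary can then be handed to pandas, for instance with pd.DataFrame({k: pd.Series(v) for k, v in groups.items()}).fillna(""), which pads the shorter columns.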
@AndrejKesely Thank you. I need to do some research into the for loop now to see how it is pulling the data.
@AndrejKesely Would you mind walking me through the re.compile patterns used?