Converting a text file to csv with columns

Question

I want to convert a text file to a CSV file with the columns name, date, and description. I'm new to Python and haven't found a proper way to do this. Can someone guide me? Below is the sample text file.

================================================== ====
Title: Whole case
Location: oyuri
From: Aki 
Date: 2018/11/30 (Friday) 11:55:29
================================================== =====
1: Aki 
2018/12/05 (Wed) 17:33:17
An approval notice has been sent.
-------------------------------------------------- ------------------
2: Aki
2018/12/06 (Thursday) 17:14:30
I was notified by Mr. Id, the agent of the other party.

-------------------------------------------------- ------------------
3: kano, etc.
2018/12/07 (Friday) 11:44:45
Please call rito.
-------------------------------------------------- ------------------

Ferris · Accepted Answer · 2021-03-03 09:59:07Z

Find the rows that contain a message separator line, e.g. '-----' or '======'.
Use np.where(cond, 1, 0).cumsum() to tag each separate message.
Filter out the lines that contain '-----' or '======'.
Group by tag, join the lines with the separator '\n', then use str.split to expand the result into columns.
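The tagging step is easier to follow on a tiny input. The sketch below is only an illustration: the miniature frame and the column name `line` are made up for the example and are not part of the answer's code.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature input: two messages separated by divider lines.
df = pd.DataFrame({"line": [
    "=====",
    "1: Aki",
    "first message",
    "-----",
    "2: Aki",
    "second message",
    "-----",
]})

# True on divider rows, False everywhere else.
is_sep = df["line"].str.contains("-----|=====")

# Every divider adds 1 to the running total, so all rows up to the
# next divider share the same tag value.
df["tag"] = np.where(is_sep, 1, 0).cumsum()

# Drop the divider rows, then join the remaining lines of each tag.
messages = df[~is_sep].groupby("tag")["line"].apply("\n".join)
```

Because the cumulative sum only changes on divider rows, the tag acts as a message id without any explicit loop.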

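import numpy as np
import pandas as pd

# read the file into a single column, one row per non-blank line
# (file is the path to the text file; pd.read_csv(file, sep='\n', header=None)
# did this in older pandas, but newer versions reject '\n' as a separator)
with open(file) as f:
    df = pd.DataFrame([line for line in f.read().splitlines() if line.strip()])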

# locate the rows that contain ------ or ======
cond = df[0].str.contains('-----|======')
df['tag'] = np.where(cond, 1, 0).cumsum()

# keep only the message lines: drop separators and the header block (tag 1)
cond2 = df['tag'] >= 2
dfn = df[(~cond & cond2)].copy()

# join the lines of each message, then split them into three columns
df_output = (dfn.groupby('tag')[0]
            .apply('\n'.join)
            .str.split('\n', n=2, expand=True))
df_output.columns = ['name', 'date', 'Description']

output:

              name                            date  \
tag                                                  
2.0        1: Aki        2018/12/05 (Wed) 17:33:17   
3.0         2: Aki  2018/12/06 (Thursday) 17:14:30   
4.0  3: kano, etc.    2018/12/07 (Friday) 11:44:45   

                                           Description  
tag                                                     
2.0                  An approval notice has been sent.  
3.0  I was notified by Mr. Id, the agent of the oth...  
4.0                                  Please call rito.

For reference, df with the tag column:

                                                    0  tag
0   ==============================================...    1
1                                   Title: Whole case    1
2                                     Location: oyuri    1
3                                          From: Aki     1
4                  Date: 2018/11/30 (Friday) 11:55:29    1
5   ==============================================...    2
6                                             1: Aki     2
7                           2018/12/05 (Wed) 17:33:17    2
8                   An approval notice has been sent.    2
9   ----------------------------------------------...    3
10                                             2: Aki    3
11                     2018/12/06 (Thursday) 17:14:30    3
12  I was notified by Mr. Id, the agent of the oth...    3
13  ----------------------------------------------...    4
14                                      3: kano, etc.    4
15                       2018/12/07 (Friday) 11:44:45    4
16                                  Please call rito.    4
17  ----------------------------------------------...    5

You can go on to clean up the name column:

obj = df_output['name'].str.strip().str.split(r':\s*')
df_output['name'] = obj.str[-1]
df_output['idx'] = obj.str[0]
df_output = df_output.set_index('idx')

           name                            date  \
idx                                               
1           Aki       2018/12/05 (Wed) 17:33:17   
2           Aki  2018/12/06 (Thursday) 17:14:30   
3    kano, etc.    2018/12/07 (Friday) 11:44:45   

                                           Description  
idx                                                     
1                    An approval notice has been sent.  
2    I was notified by Mr. Id, the agent of the oth...  
3                                    Please call rito.

Add the header fields as extra columns:

# header lines sit in the first block (tag 1) and contain a colon
cond = (df['tag'] == 1) & (df[0].str.contains(':'))
header_dict = dict(df.loc[cond, 0].str.split(': ', n=1).values)

# {'Title': 'Whole case',
#  'Location': 'oyuri',
#  'From': 'Aki ',
#  'Date': '2018/11/30 (Friday) 11:55:29'}

# assigning a scalar repeats the value on every row
for k, v in header_dict.items():
    df_output[k] = v
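The question asks for a CSV file, so the last step is writing df_output out. A minimal sketch, assuming a small stand-in for df_output and header_dict (the frame below is hand-built from the sample data for illustration, and the in-memory buffer is only there to keep the example self-contained):

```python
import io
import pandas as pd

# Hypothetical stand-ins for df_output and header_dict from the steps above.
df_output = pd.DataFrame(
    {
        "name": ["Aki", "kano, etc."],
        "date": ["2018/12/05 (Wed) 17:33:17", "2018/12/07 (Friday) 11:44:45"],
        "Description": ["An approval notice has been sent.", "Please call rito."],
    },
    index=pd.Index(["1", "3"], name="idx"),
)
header_dict = {"Title": "Whole case", "Location": "oyuri"}

# Assigning a scalar to a new column repeats it on every row.
for key, value in header_dict.items():
    df_output[key] = value.strip()

# Write to an in-memory buffer here; pass a file path to create a real file.
buffer = io.StringIO()
df_output.to_csv(buffer)
csv_text = buffer.getvalue()
```

Fields that contain commas, such as "kano, etc.", are quoted automatically by to_csv, so they stay in a single column.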

edited Mar 3, 2021 at 9:59

answered Mar 3, 2021 at 5:59

Ferris

5 Comments

Tejas Over a year ago

Thank you for the answer, Ferris. :) It is working completely fine. What if we want to add the header content as well? That is, From (i.e. Aki) in the name column, Date in the date column, and the rest in Description.

Tejas Over a year ago

I'm trying to do that. Thank you a lot Ferris

Tejas Over a year ago

I am trying to make the index and name columns different, but I'm not able to do that using the above code.

Ferris Over a year ago

You can post a new question and describe the details.

Tejas Over a year ago

Yes, sure, I will do that. Thank you, Ferris.

Ali · Accepted Answer · 2021-03-03 05:06:12Z

I outline below a very simplistic approach to achieving your task. The general idea is to:

Read in your text file using open()
Split the text into a list
Isolate the information in each element of the list
Export the information to a csv using pandas

I would recommend using Jupyter Notebooks to get a better idea of what I have done here.

import pandas as pd

# open file and extract text
text_path = 'text.txt'
with open(text_path) as f:
    text = f.read()

# split text into a list
lines = text.split('\n')

# remove heading (the first six lines: two separator lines plus four header fields)
len_heading = 6
lines = lines[len_heading:]

# separate information using divider
divider = '-----'
data = []
start = 0
for i, line in enumerate(lines):
    
    # add elements to data if divider found
    if line.startswith(divider):
        data.append(lines[start:i])
        start = i+1

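# extract name, date and description from data
names, dates, description = [], [], []
for info in data:

    # drop blank lines and skip blocks that lack a name, date or description
    info = [line.strip() for line in info if line.strip()]
    if len(info) < 3:
        continue

    # this is a very simplistic approach, please add checks
    # to make sure you are getting the right data
    name = info[0].split(':', 1)[-1].strip()  # text after the "1:" prefix
    date = info[1][:10]                       # the "YYYY/MM/DD" part only
    desc = ' '.join(info[2:])                 # description may span several lines

    names.append(name)
    dates.append(date)
    description.append(desc)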

# create pandas dataframe
df = pd.DataFrame({'name': names, 'date': dates, 'description': description})

# export dataframe to csv
df.to_csv('converted_text.csv', index=False)

You should get a CSV file with name, date and description columns.
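The code comment above asks for extra checks. One possible way to add them, using only the standard library, is sketched below. The regular expressions, the function names parse_entries and to_csv_text, and the abridged sample are assumptions made for this example; adjust the patterns to the real file.

```python
import csv
import io
import re

# Abridged sample based on the question (shortened dividers, two entries).
SAMPLE = """\
=====================
Title: Whole case
From: Aki
=====================
1: Aki
2018/12/05 (Wed) 17:33:17
An approval notice has been sent.
---------------------
2: Aki
2018/12/06 (Thursday) 17:14:30
I was notified by Mr. Id, the agent of the other party.

---------------------
"""

DIVIDER = re.compile(r"^\s*[-=]{5,}")                # "-----" or "=====" lines
ENTRY_START = re.compile(r"^(\d+)\s*:\s*(.*\S)\s*$")  # e.g. "1: Aki"
DATE = re.compile(r"^(\d{4}/\d{2}/\d{2})\b")          # e.g. "2018/12/05 ..."


def parse_entries(text):
    """Return a list of dicts with name, date and description keys."""
    # Collect the non-blank lines found between divider lines.
    blocks, current = [], []
    for line in text.splitlines():
        if DIVIDER.match(line):
            if current:
                blocks.append(current)
            current = []
        elif line.strip():
            current.append(line.strip())
    if current:
        blocks.append(current)

    rows = []
    for block in blocks:
        # Keep only blocks shaped like "N: name" followed by a date line;
        # this also skips the Title/From header block.
        if len(block) < 2:
            continue
        first = ENTRY_START.match(block[0])
        date = DATE.match(block[1])
        if not (first and date):
            continue
        rows.append({
            "name": first.group(2),
            "date": date.group(1),
            "description": " ".join(block[2:]),
        })
    return rows


def to_csv_text(rows):
    """Serialise the parsed rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "date", "description"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_entries(SAMPLE)
csv_text = to_csv_text(rows)
```

Blocks that do not match the expected shape are skipped rather than guessed at, which makes missing or empty columns easier to trace back to the input.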

answered Mar 3, 2021 at 5:06

Ali

1 Comment

Tejas Over a year ago

Thank you for the help, ALS777. It is partially working. Working: the data is now separated into columns. Not working: 1. the description column is empty; 2. some of the entries are missing from the output.
