I am trying to import a .txt file into a pandas DataFrame. The file looks like this:
12345 20191113418824004 S20191013
23456 20191030T20.60XA X20191230
The resulting DataFrame should look like this:
memberid Date1 Code Flag Date2
12345 20191113 418824004 S 20191013
23456 20191030 T20.60XA X 20191230
So far I have tried:
import pandas as pd
data = pd.read_csv("diag.txt", delimiter="\t")
df = pd.DataFrame(data, columns=['memberid', 'Date1', 'Code', 'Flag', 'Date2'])
but every column comes out as NaN. I'm not sure why even the memberid column is not being picked up. Any guidance is much appreciated.
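A likely explanation for the all-NaN result, sketched below with the two sample rows held in memory (io.StringIO stands in for diag.txt): the values are separated by spaces, not tabs, so delimiter="\t" never splits anything. Each line becomes a single field, and the first line is consumed as the header. pd.DataFrame(data, columns=[...]) then asks for column labels that do not exist, which reindexes the frame and fills every requested column with NaN.

```python
import io

import pandas as pd

# The two sample rows from the question; io.StringIO stands in for diag.txt.
raw = "12345 20191113418824004 S20191013\n23456 20191030T20.60XA X20191230\n"

# No tab characters in the data, so each line is read as one field,
# and the first line becomes the (single) column header.
data = pd.read_csv(io.StringIO(raw), delimiter="\t")
print(data.columns.tolist())  # ['12345 20191113418824004 S20191013']

# None of the requested labels exist in `data`, so pandas reindexes
# and every requested column is filled with NaN.
df = pd.DataFrame(data, columns=["memberid", "Date1", "Code", "Flag", "Date2"])
print(df)
```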
Here are the rules for separating the values:
Let's take the first row, 12345 20191113418824004 S20191013:
- The first continuous series of numbers (up to the first space), 12345, is the memberid.
- In the next chunk (20191113418824004), the first 8 characters become Date1 and whatever is left after them becomes Code. In this case 20191113 is the date and the rest, 418824004, is the code.
- In the next chunk (S20191013), the first letter becomes Flag and the rest becomes Date2. This third "column", if I may call it that, is always varchar(9). So in this case S is the flag and the rest, 20191013, is Date2.
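The rules above can be sketched in pandas as follows, assuming the three chunks on each line are always separated by whitespace and that every value should stay as text (so codes like T20.60XA and any leading zeros survive). parse_diag and the intermediate column names date_code and flag_date are hypothetical names chosen for this illustration.

```python
import io

import pandas as pd

# Sample rows from the question; pass "diag.txt" instead for the real file.
raw = "12345 20191113418824004 S20191013\n23456 20191030T20.60XA X20191230\n"


def parse_diag(source) -> pd.DataFrame:
    """Hypothetical helper: split each line into memberid, Date1, Code, Flag, Date2."""
    # Split on runs of whitespace, treat the first line as data (no header),
    # and read everything as strings so the values are kept verbatim.
    chunks = pd.read_csv(
        source,
        sep=r"\s+",
        header=None,
        names=["memberid", "date_code", "flag_date"],
        dtype=str,
    )
    return pd.DataFrame(
        {
            "memberid": chunks["memberid"],
            # First 8 characters of the second chunk are Date1; the rest is Code.
            "Date1": chunks["date_code"].str[:8],
            "Code": chunks["date_code"].str[8:],
            # First character of the third chunk is Flag; the rest is Date2.
            "Flag": chunks["flag_date"].str[0],
            "Date2": chunks["flag_date"].str[1:],
        }
    )


df = parse_diag(io.StringIO(raw))
print(df)
```

If real date values are needed afterwards, pd.to_datetime(df["Date1"], format="%Y%m%d") converts the text dates; keeping them as strings first avoids pandas guessing at their type while reading.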
P.S. This is all random mock data that I generated by hand; it contains no sensitive information.
txt file, but are the values actually stuck together like that in the file, or are there tabs present between S20191013, for example? And if the values are actually stuck together like that, can you outline the rules for how they should be separated into columns?
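One way to answer the tabs-or-spaces question is to print each raw line with repr(), which shows a tab as \t instead of blank space. The helper below, describe_separators, is a hypothetical sketch; it accepts any iterable of lines, so an open file handle works as well as the in-memory sample used here.

```python
import io


def describe_separators(lines):
    """Hypothetical helper: return (line number, contains a tab?, repr of line)."""
    report = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        report.append((number, "\t" in text, repr(text)))
    return report


# For the real file: with open("diag.txt") as fh: describe_separators(fh)
sample = io.StringIO("12345 20191113418824004 S20191013\n23456 20191030T20.60XA X20191230\n")
for number, has_tab, shown in describe_separators(sample):
    print(number, has_tab, shown)
```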