0

I have a TXT file that looks like this:

ETP   474654 0|170122|160222|MXP|    14045.84|           |     4711.00|       0|      0|      0|      0|   4711|      0
BA6 91215257 1|310122|      |MXP|            |    9053.93|            |        |       |       |       |       |
TDO   301530 1|010222|      |MXP|            |     280.91|            |        |       |       |       |       |
ETP   475384 0|260122|250222|MXP|   198340.87|           |      917.70|       0|      0|      0|      0|    917|      0
ANC 91163164 2|290122|      |MXP|            |     200.66|            |        |       |       |       |       |
BA6 91215555 1|140222|      |MXP|            |  193278.06|            |        |       |       |       |       |
TDO   302435 1|150222|      |MXP|            |    3944.45|            |        |       |       |       |       |
ETP   481186 0|020422|020522|MXP|    53597.34|           |      184.08|       0|      0|    184|      0|      0|      0
ANC 91164671 2|120422|      |MXP|            |     324.32|            |        |       |       |       |       |
BA6 91217161 1|200422|      |MXP|            |   52027.16|            |        |       |       |       |       |
TDO   306773 1|210422|      |MXP|            |    1061.78|            |        |       |       |       |       |
ETP   481188 0|020422|020522|MXP|    82599.09|           |      275.29|       0|      0|    275|      0|      0|      0
BA6 91217159 1|200422|      |MXP|            |   80677.32|            |        |       |       |       |       |
TDO   306775 1|210422|      |MXP|            |    1646.48|            |        |       |       |       |       |
ETP   483241 0|020522|010622|MXP|   162587.22|           |    20367.05|       0|  20367|      0|      0|      0|      0
ANC 91165149 2|060522|      |MXP|            |    1930.81|            |        |       |       |       |       |
BA6 91217906 2|230522|      |MXP|            |  137083.58|            |        |       |       |       |       |
TDO   308497 1|240522|      |MXP|            |    3205.78|            |        |       |       |       |       |
ETP   485561 0|300522|290622|MXP|    43411.90|           |    43181.22|   43181|      0|      0|      0|      0|      0
ANC 91165759 2|020622|      |MXP|            |     230.68|            |        |       | 

I want to extract the data from every row that contains ETP.
The first 6-digit number is the ETP ID.
The number |170122| is a date.
The number |160222| is a date.
The next value, |14045.84|, should also be displayed.
If a non-zero value follows, in this case |4711.00|, it should also be displayed.

It should return something like:

ETP 474654 | 170122 | 160222 | 14045.84 | 4711.00  

Ideally, it should also format the dates and look like this:

ETP 474654 | 17/01/22 | 16/02/22 | 14045.84 | 4711.00  

I am new to Python and would like to know whether this is possible, and whether someone could point me in the right direction to solve it. Thanks for the help.

3
  • Have you already tried something? What particular problem occurred? Commented Jun 8, 2022 at 17:53
  • the csv module might be of interest to you Commented Jun 8, 2022 at 18:22
  • The first number has more than 6 digits. Commented Jun 8, 2022 at 18:32
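
For reference, the csv module mentioned in the comment can read pipe-delimited text when it is given delimiter="|". The following is only a minimal sketch: it uses an in-memory copy of two shortened sample lines instead of a real file, and the names sample and etp_rows are made up for the example.

```python
import csv
import io

# Two shortened sample lines standing in for the real file
sample = io.StringIO(
    "ETP   474654 0|170122|160222|MXP|    14045.84|           |     4711.00|       0\n"
    "BA6 91215257 1|310122|      |MXP|            |    9053.93|            |        \n"
)

etp_rows = []
for row in csv.reader(sample, delimiter="|"):
    # Trim the padding around every field
    fields = [field.strip() for field in row]
    # Keep only the rows whose first field starts with ETP
    if fields and fields[0].startswith("ETP"):
        etp_rows.append(fields)

print(etp_rows)
```

With a real file, the StringIO object would be replaced by the result of open("data.txt", newline="").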

3 Answers

1

Python lets you read a file using the built-in open() function:

https://docs.python.org/3/library/functions.html#open

You can then read through the file line by line with a for loop:

with open("some_file.txt", "r") as file:  # the file is closed automatically
    for line in file:
        ...

To format the data the way you want, you'll need some of Python's string methods, namely:

columns = line.split("|")

This gives you a list of all the strings between the | characters, so the first element for the first line would be:

ETP   474654 0

If you want to get rid of the repeated spaces, you can run

line = " ".join(line.split())

before splitting on |. This breaks the whole line into whitespace-separated words and then rejoins them into a single string with one space between each of the words. Fields after the first | may still carry a leading space, so call strip() on each column if you need it removed.
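
As a small illustration of these two steps, here is a sketch that uses a shortened copy of the first sample line. The names normalized and columns are only chosen for this example.

```python
line = "ETP   474654 0|170122|160222|MXP|    14045.84|           |     4711.00|       0"

# Collapse every run of whitespace into a single space
normalized = " ".join(line.split())
print(normalized)

# Split on the pipe character and trim what is left around each field
columns = [column.strip() for column in normalized.split("|")]
print(columns)
```

The empty sixth column survives as an empty string, which is why the later steps need to skip empty values.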

The first column appears to contain an extra item, so you can get rid of that by doing

first_column = columns[0].split()

to split the three items in that column apart (calling split() with no argument also copes with the repeated spaces), and then put them back into the list as

columns[0] = first_column[0] + " " + first_column[1]

to drop the unwanted third item.

Then, to combine the rest of the columns, you can use a loop to append to a string:

output_string = ""
for column in columns:
    column = column.strip()  # drop the padding around each value
    if column == "":
        continue  # skip empty columns
    output_string += column + " | "

plus

output_string = output_string.rstrip(" |")

to remove the extra " | " left at the end. This leaves you with something like your example output, with the dates still unformatted.
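
Putting these steps together, a minimal sketch might look like the following. It assumes the dates are in DDMMYY form and that the wanted values sit in the column positions shown in the question. The function names format_date and extract_etp are made up for this example, and the date formatting with datetime is an addition that the steps above do not cover.

```python
from datetime import datetime


def format_date(raw):
    """Turn a DDMMYY string such as '170122' into '17/01/22'."""
    return datetime.strptime(raw, "%d%m%y").strftime("%d/%m/%y")


def extract_etp(line):
    """Return the formatted summary for an ETP line, or None for other lines."""
    columns = [column.strip() for column in line.split("|")]
    first_column = columns[0].split()  # e.g. ['ETP', '474654', '0']
    if len(first_column) < 2 or first_column[0] != "ETP":
        return None
    wanted = [
        first_column[0] + " " + first_column[1],
        format_date(columns[1]),
        format_date(columns[2]),
        columns[4],  # amount that follows the currency code
    ]
    # Append the first later value that is present and not zero
    for column in columns[5:]:
        if column and float(column) != 0:
            wanted.append(column)
            break
    return " | ".join(wanted)


sample = [
    "ETP   474654 0|170122|160222|MXP|    14045.84|           |     4711.00|       0|      0|      0|      0|   4711|      0",
    "BA6 91215257 1|310122|      |MXP|            |    9053.93|            |        |       |       |       |       |",
]
for line in sample:
    result = extract_etp(line)
    if result is not None:
        print(result)
```

To run it over a real file, the sample list can be swapped for the open file object, since iterating over a file also yields one line at a time.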


0

First, sorry if I'm being overly redundant in my answer, but taking into account this part of your question:

I am new to python and would like to know if this is possible and if someone could point me in the right direccion to solve this. Thanks for the help.

I'll assume you're a beginner and suggest a tool that may come in handy.

I recommend you take a look at the Pandas library, especially its documentation and basic examples, to get started. If you don't already have it installed, just run this command in the (base) terminal: pip install pandas

The main idea is then to create a data frame that you can edit and work on throughout your script. You can then run

import pandas as pd

data_frame = pd.read_csv('your-file-name.txt', sep='|', header=None)

to open your file and create your data frame (sep='|' matches the pipe-delimited layout, and header=None is needed because the file has no header row). From there, you can inspect your data frame with the following commands:

print(data_frame.shape)

to see the dimensions of your data, or

print(data_frame.head(n=5))

to view its first lines. With this organized data frame in hand, there are several functions within Pandas to edit your content as desired. My main suggestion would be to transpose your data using

data_frame = data_frame.transpose()

and then run a loop to select just the ETP columns and edit your third line (which you mentioned holds the date) into the desired format, just by writing "/" between the numbers, like the mechanism suggested here. After selecting the ETP columns you can also transpose the data again to go back to working with the dates as columns.
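
As a rough sketch of the same idea, the example below reads a few shortened sample lines from memory, keeps the ETP rows, and reformats the two date columns. Note that it swaps in a different technique from the one described above: it filters rows with a boolean mask instead of transposing and looping. It also assumes the dates are in DDMMYY form, and the names sample and etp are made up for the example.

```python
import io

import pandas as pd

# Shortened sample lines standing in for the real file
sample = io.StringIO(
    "ETP   474654 0|170122|160222|MXP|    14045.84|           |     4711.00\n"
    "BA6 91215257 1|310122|      |MXP|            |    9053.93|            \n"
    "ETP   475384 0|260122|250222|MXP|   198340.87|           |      917.70\n"
)

# dtype=str keeps leading zeros in dates such as 020422
data_frame = pd.read_csv(sample, sep="|", header=None, dtype=str)

# Keep only the rows whose first column starts with ETP
etp = data_frame[data_frame[0].str.startswith("ETP")].copy()

# Reformat the two date columns from DDMMYY to DD/MM/YY
for column in (1, 2):
    etp[column] = pd.to_datetime(etp[column], format="%d%m%y").dt.strftime("%d/%m/%y")

# Show the ID column, both dates, and the two amount columns
print(etp[[0, 1, 2, 4, 6]])
```

The amount columns keep their original padding here; calling .str.strip() on them would remove it.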

I hope this tool will be of some help in your work!


0

Thanks for the answers! Unfortunately, as I said, I am new to Python and I am using a web compiler. Fortunately, I was able to find a simpler solution. I am a bit surprised no one suggested it, but then again, maybe you are also new to Python.

word = input("Type ETP and press enter: ")

# Print every line that contains the search word, with its line number
with open("data.txt", "r") as file:
    for count, s in enumerate(file, start=1):
        L = s.split()
        if word in L:
            print("Line:", count, ":", s)

1 Comment

"I am a bit surprised no one suggested it" ? On 8 June, ifier left an answer which pretty much covered what eventually ended up in your solution from today (30 June), including pointers to the official docs: stackoverflow.com/a/72550338/1923870 It would have made more sense to accept their answer instead of adding and accepting your own, now. You could still have added this code snippet in form of a comment on their answer.
