
I have a CSV file that looks like this:

Mon-000101,100.27242,9.608597,11.082,10.034,0.39,I,0.39,I,31.1,31.1,,double with 1355,,,,,,,,
Mon-000171,100.2923,9.52286,14.834,14.385,0.45,I,0.45,I,33.7,33.7,,,,,,,,,,
Mon-000174,100.27621,9.563802,11.605,10.134,0.95,I,1.29,I,30.8,30.8,,,,,,,,,,

...it's a few hundred lines long.

I just want to grab the Mon-###### items (all of them, not just Mon-000101). I have this really ugly little script I threw together:

file_list1 = open(raw_input("Enter your list file: "))
file_lines = []
for line in file_list1:
    line.replace(' ','\n')
    for item in line.split('\n'):
        file_lines.append(item)
stringit = ''
for item in file_lines:
    stringit += item

IDs = re.findall('Mon-\d\d\d\d\d\d',stringit)
stringIDs = str(IDs)
new = stringIDs.replace(',','\n')

newer = new.replace('\'','')
newer2 = newer.replace('[\]','')
newer3 = newer2.replace(']','')
newer4 = newer3.replace('[','')
newer5 = newer4.replace(' ','')
file_write = open("Testit.txt","w+")
file_write.write(newer4)
print newer4
file_write.close()

I know it's ugly. Clearly I don't know what I'm doing with the regex stuff, but aside from that I want to know a more efficient way of replacing all the characters that I'm replacing. I know this isn't how it's done. I've tried something along the lines of

newer2 = newer.replace('([\',\[\] ])','') 

which I sorta pieced together from various posts. That didn't work though, in fact it didn't do anything.

I want to see what a more efficient way of doing this looks like.

Thanks.

I'm also aware that my variable naming is not sufficient/not up to the style guide. This is just something I quickly threw together.

  • What's supposed to get written to your file? I can't tell what you're trying to do with the multiple replace calls, but I'm almost certain this is trivially replaceable using the csv library. Commented Nov 27, 2013 at 21:17
  • I just want those Mon-###### IDs. This script works, but it's ridiculous. Commented Nov 27, 2013 at 21:18

2 Answers


Assuming the IDs are always the first part of the line, this is a simple way to do it:

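import csv

# 'rb' is what the csv module expects on Python 2; on Python 3 use open(..., newline='') instead.
with open('some_list_file.txt', 'rb') as list_file:
    reader = csv.reader(list_file)
    with open('Testit.txt', 'w') as output_file:
        # The ID is the first field of every row.
        output_file.writelines(line[0] + '\n' for line in reader)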

If the position varies, it gets just a little more complicated:

import csv

# 'rb' is what the csv module expects on Python 2; on Python 3 use open(..., newline='') instead.
with open('some_list_file.txt', 'rb') as list_file:
    reader = csv.reader(list_file)
    with open('Testit.txt', 'w') as output_file:
        for line in reader:
            # Collect every field in this row that looks like an ID.
            IDs = [part for part in line if part.startswith('Mon-')]
            if IDs:
                output_file.write(IDs[0] + '\n')  # or accept multiple ID values if that's a possibility

You can shorten that a little if you're sure there's a Mon- entry in every line:

    with open('Testit.txt', 'w') as output_file:
        output_file.writelines([part for part in line if part.startswith('Mon-')][0] + '\n' for line in reader)
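
The same idea can be tried without any files, because csv.reader accepts any iterable of lines. The sketch below feeds it the three sample rows from the question. The function name extract_ids and its prefix parameter are made up for illustration, and the code assumes Python 3.

```python
import csv


def extract_ids(lines, prefix='Mon-'):
    """Return the first field starting with `prefix` from each CSV row.

    Rows without such a field are skipped. `extract_ids` and `prefix`
    are illustrative names, not part of the csv module.
    """
    ids = []
    for row in csv.reader(lines):
        matches = [field for field in row if field.startswith(prefix)]
        if matches:
            ids.append(matches[0])
    return ids


# Sample rows copied from the question.
sample = [
    'Mon-000101,100.27242,9.608597,11.082,10.034,0.39,I,0.39,I,31.1,31.1,,double with 1355,,,,,,,,',
    'Mon-000171,100.2923,9.52286,14.834,14.385,0.45,I,0.45,I,33.7,33.7,,,,,,,,,,',
    'Mon-000174,100.27621,9.563802,11.605,10.134,0.95,I,1.29,I,30.8,30.8,,,,,,,,,,',
]
print('\n'.join(extract_ids(sample)))
```

To use it on a real file, pass the open file object in place of sample.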

2 Comments

Assume that they aren't though...what then?
Nice. I'll look this over in finer detail when I have a bit more time. Thank you.

Use the regex pattern ^Mon-\d{6} with the m (multiline) modifier.
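
In Python the m modifier is spelled re.MULTILINE. It makes ^ match at the start of every line instead of only at the start of the string. Here is a minimal sketch, assuming the whole file has been read into one string; the names text, ID_PATTERN and find_ids are illustrative, and the rows are shortened versions of the question's sample.

```python
import re

# ^ anchors at each line start because of re.MULTILINE; \d{6} is six digits.
ID_PATTERN = re.compile(r'^Mon-\d{6}', re.MULTILINE)


def find_ids(text):
    """Return every Mon-###### ID that starts a line of `text`."""
    return ID_PATTERN.findall(text)


# Shortened sample rows from the question, joined into one string.
text = (
    'Mon-000101,100.27242,9.608597,11.082\n'
    'Mon-000171,100.2923,9.52286,14.834\n'
    'Mon-000174,100.27621,9.563802,11.605\n'
)
ids = find_ids(text)

# Join the list directly instead of calling str() on it and
# stripping the brackets and quotes afterwards.
print('\n'.join(ids))

# str.replace only matches literal text; re.sub is the regex-aware call.
stripped = re.sub(r"[\[\]', ]", '', "['Mon-000101', 'Mon-000171']")
print(stripped)
```

Joining with '\n' gives one ID per line, so the chain of replace calls in the question is not needed. The attempt with newer.replace('([\',\[\] ])','') did nothing because str.replace searches for that exact literal string rather than treating it as a pattern.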
