1

Python newbie here. I've been working my way through this code to create a string that includes a date. I have bits of the code working to get the data I want; however, I need help formatting the string to tie the data together.

This is what I have so far:

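import glob
import subprocess

def get_rectype_count(filename, rectype):
    # count the lines whose 6th ';'-separated field matches rectype (case-insensitive)
    return int(subprocess.check_output('''zcat %s |  '''
                                       '''awk 'BEGIN {FS=";"};{print $6}' | '''
                                       '''grep -i %r | wc -l''' %
                                       (filename, rectype), shell=True))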

str = "MY VALUES ("
rectypes = 'click', 'bounce'
for myfilename in glob.iglob('*.gz'):
        #print (rectypes)
        print str.join(rectypes)
        print (timestr)
        print([get_rectype_count(myfilename, rectype)
                               for rectype in rectypes])

My output looks like this:

clickMY VALUES (bounce
'2015-07-01'
[222, 0]

I'm trying to create this output file:

MY VALUES ('2015-07-01', click, 222)
MY VALUES ('2015-07-01', bounce, 0)
  • Why not just tie the pieces of the string together with the string concatenation operator, i.e. a + b? Also, if the print statement ends with a trailing comma, it will not output a newline character. Commented Jul 2, 2015 at 5:08
  • str is a built-in type; you shouldn't redefine it. Very confusing! Commented Jul 2, 2015 at 5:10

2 Answers

5

When you call join on a string it joins together everything in the sequence passed to it, using itself as the separator.

>>> '123'.join(['click', 'bounce'])
'click123bounce'
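
This is also why the question's code printed clickMY VALUES (bounce: the string "MY VALUES (" was used as the separator between the two record types. A minimal sketch of that behaviour follows; the variable names separator, joined and comma_joined are only illustrative, and print is written as a function so the snippet also runs on Python 3.

```python
rectypes = ('click', 'bounce')

# The string that join is called on is placed BETWEEN the items.
separator = "MY VALUES ("
joined = separator.join(rectypes)

# A more typical use: a short delimiter such as ", ".
comma_joined = ", ".join(rectypes)

print(joined)
print(comma_joined)
```

So join is the right tool for gluing items together with a delimiter, not for adding a prefix to each item.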

Python supports formatting strings using replacement fields:

>>> values = "MY VALUES ('{date}', {rec}, {rec_count})"
>>> values.format(date='2015-07-01', rec='click', rec_count=222)
"MY VALUES ('2015-07-01', click, 222)"

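One thing to watch for: in the question, print (timestr) shows the date with quotes around it, which suggests (this is an assumption, since the definition of timestr is not shown) that the quotes are part of the string itself. If so, the quotes in the template would be doubled. A small sketch that strips them first, using made-up counts in place of get_rectype_count:

```python
values = "MY VALUES ('{date}', {rec}, {rec_count})"

# Assumption: timestr already contains literal quote characters,
# as the printed output in the question seems to indicate.
timestr = "'2015-07-01'"
date = timestr.strip("'")  # remove surrounding quotes, if any

# Made-up counts standing in for get_rectype_count results.
sample_counts = [('click', 222), ('bounce', 0)]

rows = [values.format(date=date, rec=rec, rec_count=count)
        for rec, count in sample_counts]

for row in rows:
    print(row)
```

If timestr holds a plain date without quotes, strip leaves it unchanged, so the extra step does no harm.
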
With your code:

for myfilename in glob.iglob('*.gz'):
    for rec in rectypes:
        rec_count = get_rectype_count(myfilename, rec)
        print values.format(date=timestr, rec=rec, rec_count=rec_count)
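
If you would rather not shell out to zcat, awk and grep, the counting can probably be done in Python itself. The following is only a sketch under some assumptions: count_rectype is a hypothetical replacement for get_rectype_count, it treats rectype as a plain substring rather than a grep pattern, and it uses gzip.open in text mode, which needs Python 3. The sample file contents are invented for the demonstration.

```python
import gzip
import os
import tempfile


def count_rectype(path, rectype):
    """Count lines whose 6th ';'-separated field contains rectype,
    ignoring case (roughly what the awk | grep -i | wc -l pipeline does)."""
    needle = rectype.lower()
    count = 0
    with gzip.open(path, 'rt') as handle:
        for line in handle:
            fields = line.rstrip('\n').split(';')
            if len(fields) >= 6 and needle in fields[5].lower():
                count += 1
    return count


# Invented sample data, written to a temporary .gz file.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.gz')
with gzip.open(sample_path, 'wt') as handle:
    handle.write('a;b;c;d;e;click;g\n')
    handle.write('a;b;c;d;e;CLICK;g\n')
    handle.write('a;b;c;d;e;open;g\n')
    handle.write('click;b;c\n')  # too few fields, not counted

values = "MY VALUES ('{date}', {rec}, {rec_count})"
rows = [values.format(date='2015-07-01', rec=rec,
                      rec_count=count_rectype(sample_path, rec))
        for rec in ('click', 'bounce')]
for row in rows:
    print(row)
```

This avoids shell=True and the quoting issues that come with building a shell command from a filename.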

edit:

If you want to use join, you can join on a newline, \n:

>>> print '\n'.join(['line1', 'line2'])
line1
line2

Putting it together:

print '\n'.join(values.format(date=timestr,
                              rec=rec,
                              rec_count=get_rectype_count(filename, rec))
                for filename in glob.iglob('*.gz')
                for rec in rectypes)
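
Since the goal in the question is an output file, the joined string can be written out instead of printed. A rough sketch, assuming a plain date in timestr, a made-up counts dictionary in place of get_rectype_count, and a hypothetical output name values.txt placed in a temporary directory:

```python
import os
import tempfile

values = "MY VALUES ('{date}', {rec}, {rec_count})"
timestr = '2015-07-01'                # assumed: plain date, no quotes
rectypes = ('click', 'bounce')
counts = {'click': 222, 'bounce': 0}  # stand-in for get_rectype_count

# Build all lines first, then join them with newlines.
report = '\n'.join(values.format(date=timestr, rec=rec,
                                 rec_count=counts[rec])
                   for rec in rectypes)

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.mkdtemp(), 'values.txt')
with open(out_path, 'w') as out:
    out.write(report + '\n')  # join adds no trailing newline

with open(out_path) as check:
    written = check.read()
print(written)
```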

3 Comments

Thanks for making it look easy.
I was testing the other solution using join. Got the following syntax error: for filename in glob.iglob('*.gz') ^ SyntaxError: invalid syntax
@noober there was a missing closing parenthesis for .format(
2

Try this:

str1 = "MY VALUES ("
rectypes = ['click', 'bounce']
for myfilename in glob.iglob('*.gz'):
    # one count per record type, in the same order as rectypes
    k = [get_rectype_count(myfilename, rectype)
         for rectype in rectypes]

    # print one line per record type for this file
    for i in range(len(rectypes)):
        print str1 + str(timestr) + ", " + rectypes[i] + ", " + str(k[i]) + ")"
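
Note that the variable is called str1 here rather than str. With the original name, the str(...) calls above would fail, because the name str would refer to the string instead of the built-in type. A small illustration, wrapped in hypothetical functions (shadowed and renamed) so that the built-in is only hidden locally:

```python
def shadowed():
    # Assigning to str hides the built-in type inside this function.
    str = "MY VALUES ("
    try:
        return str(222)  # tries to call a string
    except TypeError as exc:
        return "TypeError: %s" % exc


def renamed():
    # A different name leaves the built-in str usable.
    str1 = "MY VALUES ("
    return str1 + str(222) + ")"


shadow_result = shadowed()
rename_result = renamed()
print(shadow_result)
print(rename_result)
```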
