The code below works, but it puts each line of a multiline string into a new row instead of keeping the whole string in one cell, which is the desired state.

While researching possible solutions, I read:

  • That I should try enclosing the string in double quotes
  • That replacing newlines with carriage returns helps
  • That I should pass the values as a list/tuple via a [] declaration

Having tried all of the above, I'm thoroughly confused as to why this isn't working. If I remove the newlines, everything goes into a single cell in the row, but then it's hard to read.

Is there no way to preserve the formatting when writing a multiline string with csv.writer?

import csv
import os
import re
import subprocess

# ntDict (defined elsewhere) is presumably an OrderedDict of RegRipper plugin names.
def run_rip():
    os.chdir('/REGRIPPERDIR/RegRipper2.8')
    for k in ntDict:
        run_command = "".join(["./rip.pl", " -r /APATH/NTUSER.DAT -p ", str(k)])
        process = subprocess.Popen(run_command,
                                   shell=True,
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
        out, err = process.communicate()
        pattern = re.compile(r'lastwrite|(\d{2}:\d{2}:\d{2})|alert|trust|Value')
        # first: text before the first newline; second: text up to the first blank line;
        # rest: everything after that blank line
        grouping = re.compile(r'(?P<first>.+?)(\n)(?P<second>.+?)([\n]{2})(?P<rest>.+[\n])', re.MULTILINE | re.DOTALL)
        if pattern.findall(out):
            match = re.search(grouping, out)
            first = match.group('first')
            second = match.group('second')
            # Attempted workarounds: drop blank lines, prepend a double quote, switch to \r\n line breaks
            rest = ('"%s' % os.linesep.join([s for s in match.group('rest').splitlines() if s]))
            rest = rest.replace('\n', '\r\n')
            headers = ['Name', 'Description', 'Found']
            f = csv.writer(open('/APATH/ntuser.csv', 'ab'),
                           dialect=csv.excel,
                           delimiter='\t')
            f.writerow(headers)
            f.writerow((first, second, rest))
            print(out)
        ntDict.popitem(last=False)

run_rip()

Sample output: [image: three-column output, not reproduced]

Edit: As requested in the comments below, here is a sample of the third string, rest, as it is collected, along with the first two strings. The text below is what gets passed to csv.writer.

Baseline\n #First string as defined by the regex
(All) scans a hive\n #Second String as defined by the regex

Total values checked    : 64\n   #This and below is the rest string
Number of binary value lengths : 11\n
...\n
Value -5: 176 bytes [# times: 1]\n
Value -4: 712 bytes [# times: 5]\n

Desired state: [image not reproduced]

  • Can I suggest that you post some csv text (i.e. a string) that gives you the desired layout? The underlying problem is that csv isn't really a standard. What you're asking looks to be about whatever format the app that's producing your pasted images accepts. Commented Feb 20, 2016 at 6:04
  • @dementedhedgehog The app I'm interfacing with is Registry Ripper, which outputs non-standard text and produces a lot of results that aren't needed. So my approach has been to call the app, read stdout, split the data into the three variables assigned above, then write them out to a .csv. However, the third variable contains many newlines, which unfortunately are being read by csv.writer and output to separate cells, as shown in the sample output. Is there a way to preserve newlines/carriage returns in the .csv output so that I can obtain the desired state? Commented Feb 21, 2016 at 16:35
  • Are there other .csv modules that would do the above? Am I not being clear in my question, or is it the lack of a bounty? This issue has been nagging me since I posted it. I've spent hours trying solutions and it's just out of my grasp (likely because I'm new to programming). Commented Feb 22, 2016 at 14:43
  • Check this out stackoverflow.com/questions/2668678/… Commented Feb 22, 2016 at 21:54
  • You can also write your own csv writer. It's pretty trivial (if you need to prepend a space to values that look like this: "2\n3"). I don't have Excel so I can't test what formats Excel imports successfully (and it's likely to vary by Excel version anyway). Commented Feb 22, 2016 at 21:56
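A hand-rolled writer along the lines of the last comment could look like the following minimal sketch. The helper names quote_field and format_row are hypothetical, and the quoting rule (quote only fields that contain the delimiter, the quote character, or a line break, doubling any embedded quotes) mirrors what Python's csv module does by default. It does not implement the space-prepending trick the comment mentions.

```python
def quote_field(value, delimiter='\t', quotechar='"'):
    """Quote a field only if it contains the delimiter, the quote character, or a line break."""
    text = str(value)
    if any(ch in text for ch in (delimiter, quotechar, '\n', '\r')):
        # Double any embedded quote characters, then wrap the whole field in quotes
        text = quotechar + text.replace(quotechar, quotechar * 2) + quotechar
    return text


def format_row(values, delimiter='\t', lineterminator='\r\n'):
    """Join the quoted fields of one record and end it with an Excel-style line terminator."""
    return delimiter.join(quote_field(v, delimiter) for v in values) + lineterminator


print(repr(format_row(("1", "2\n3", "4"))))  # '1\t"2\n3"\t4\r\n'
```

Writing the result of format_row to a file gives the same layout as csv.writer with a tab delimiter; a custom writer is only worth it if a particular spreadsheet app needs something the csv module's dialect options can't express.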

4 Answers

Not an answer... but I want code formatting.

import csv

x = ("1", "2\n3", "4")
# newline='' (Python 3) lets the csv module control line endings itself
with open('foo', 'w', newline='') as fh:
    f = csv.writer(fh, dialect=csv.excel, delimiter='\t')
    f.writerow(x)

Produces this:

$ less foo
1       "2
3"      4

And that's "valid" tab-separated CSV; it's just that Excel doesn't handle it "properly". What counts as valid or proper here is really an implementation issue, since CSV isn't a strictly standardized format. The extra double quotes around the 2\n3 value are kind of annoying.
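To check that this output really is well-formed tab-separated CSV, a minimal sketch can round-trip the same row through csv.writer and csv.reader. It uses an in-memory io.StringIO buffer instead of the file foo, purely so the result can be inspected; this illustration is not part of the original answer.

```python
import csv
import io

# Write the same row as above into an in-memory buffer
buf = io.StringIO()
csv.writer(buf, dialect=csv.excel, delimiter='\t').writerow(("1", "2\n3", "4"))
text = buf.getvalue()
print(repr(text))  # '1\t"2\n3"\t4\r\n'

# Reading with the same delimiter recovers three fields, the middle one still multiline
row = next(csv.reader(io.StringIO(text), delimiter='\t'))
print(row)  # ['1', '2\n3', '4']
```

So the file itself round-trips correctly; any difference in how the middle cell is displayed comes from the program that imports it.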

Check out https://pypi.python.org/pypi/xlwt (pre-XML Excel format, .xls) or http://xlsxwriter.readthedocs.org/ (XML-based Excel format, .xlsx) for third-party libraries that write Excel files directly.

You can simply use \n to separate the items in a multiline cell. Just make sure the whole cell content is enclosed in double quotes:

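filename = 'out.csv'  # hypothetical output path
with open(filename, "w", encoding='utf-8') as f:
    # The quoted third field spans two lines: c and d
    f.write('a,b,"c\nd",e\n')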

This example writes a row to a CSV file in which the third cell has two lines, c and d.
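The csv module generates the same quoting automatically, so the field doesn't have to be quoted by hand. A minimal sketch, assuming Python 3, that writes the same row through csv.writer into an in-memory buffer (lineterminator='\n' is set only so the result matches the hand-written line exactly; the default is '\r\n'):

```python
import csv
import io

hand_written = 'a,b,"c\nd",e\n'

buf = io.StringIO()
writer = csv.writer(buf, lineterminator='\n')
# No manual quotes: the writer quotes 'c\nd' because it contains '\n',
# which is part of the line terminator
writer.writerow(['a', 'b', 'c\nd', 'e'])

print(buf.getvalue() == hand_written)  # True
```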

With the guidance of the comments, I found my answer: Excel simply mangles the formatting (for reasons covered in the comments), but when the file is opened in LibreOffice, the formatting is retained.

The thread suggested in the comments (Importing CSV with line breaks in Excel 2007) proposes a workaround that involves quoting the actual newlines rather than quoting the entire string, which is what I was doing.
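Applied to the question's code, this suggests dropping the manual leading quote and the \r\n replacement, and letting csv.writer quote the multiline field itself. The following is a minimal sketch, assuming Python 3 (hence newline='' instead of the question's Python 2 'ab' mode). The write_rows helper and the sample strings are hypothetical stand-ins for the first, second and rest values parsed from RegRipper's output, and a temporary path replaces /APATH/ntuser.csv. Whether a multiline cell is displayed as such still depends on the spreadsheet app, as this answer notes.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Append rows to a tab-delimited CSV file, letting csv.writer quote multiline cells."""
    # newline='' stops Python from translating line endings, so the writer's own
    # '\r\n' record terminator and the '\n' inside cells are written unchanged
    with open(path, 'a', newline='') as fh:
        writer = csv.writer(fh, dialect=csv.excel, delimiter='\t')
        writer.writerows(rows)


# Hypothetical values shaped like the first/second/rest strings in the question
first = 'Baseline'
second = '(All) scans a hive'
raw_rest = 'Total values checked    : 64\n\nValue -5: 176 bytes [# times: 1]\n'
# Drop blank lines but keep plain '\n' between the remaining ones; no manual quoting
rest = '\n'.join(s for s in raw_rest.splitlines() if s)

path = os.path.join(tempfile.mkdtemp(), 'ntuser.csv')
write_rows(path, [['Name', 'Description', 'Found'], [first, second, rest]])

# Reading the file back shows one header row and one data row with three cells
with open(path, newline='') as fh:
    rows = list(csv.reader(fh, delimiter='\t'))
print(len(rows))  # 2
```

Writing the header once, rather than on every loop iteration as the question's code does, would also avoid repeated header rows in the output file.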

This is how I solved this problem.

Input Data:

('<email address>', 'firstName', 'lastName', 'Address1', 'Address1\nAddress2', 'IP1\nIP2\nIP3', 'PhoneNumber')

Desired CSV format: [image not reproduced]

Here is the Python code that produces it:

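import csv

# cs is presumably a DB-API cursor on the connection ctx, and sql selects the
# seven columns written below; all three are set up elsewhere.
try:
    cs.execute(sql)
    row = cs.fetchone()
    while row is not None:
        email = row[0]
        filename = '{0}.csv'.format(email)
        # utf-8-sig writes a BOM; newline='' lets csv.writer control line endings
        with open(filename, "w", encoding='utf-8-sig', newline='') as fp:
            writer = csv.writer(fp)
            writer.writerow(('REQUESTER EMAIL', 'FIRST NAME', 'LAST NAME', 'BILLING ADDRESSES', 'SHIPPING ADDRESSES', 'IP ADDRESSES', 'PHONE NUMBER'))
            # Multiline values (addresses, IPs) are quoted automatically by csv.writer
            writer.writerow(row[:7])
        row = cs.fetchone()
finally:
    cs.close()
ctx.close()

See this line:

open(filename, "w", encoding='utf-8-sig', newline='')

Setting the encoding to 'utf-8-sig' is what does the trick. ('utf-8-sig' writes a UTF-8 byte order mark at the start of the file, which Excel uses to recognize the encoding, and newline='' is what the csv module documentation recommends for files passed to csv.writer.)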
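To see what the 'utf-8-sig' codec actually changes, here is a minimal sketch (Python 3; the temporary file path and sample values are hypothetical) that writes one row and inspects the raw bytes. The file starts with the three-byte UTF-8 byte order mark, and the multiline values are quoted by csv.writer.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'contacts.csv')
with open(path, 'w', encoding='utf-8-sig', newline='') as fp:
    csv.writer(fp).writerow(('Address1\nAddress2', 'IP1\nIP2\nIP3'))

with open(path, 'rb') as fh:
    raw = fh.read()

print(raw[:3])  # b'\xef\xbb\xbf' (the UTF-8 byte order mark)
print(raw)      # b'\xef\xbb\xbf"Address1\nAddress2","IP1\nIP2\nIP3"\r\n'
```

Reading the file back with encoding='utf-8-sig' strips the byte order mark again, so Python code consuming the file sees the same values that were written.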

Thanks, Hussain Bohra
