9

I am writing a program that analyzes a large directory text file line by line. In doing so, I am trying to extract different parts of the file and categorize them as 'Name', 'Address', etc. However, due to the format of the file, I am running into a problem. Some of the text I have is split across two lines, such as:

'123 ABCDEF ST
APT 456'

How can I make it so that even through line-by-line analysis, Python returns this as a single-line string in the form of

'123 ABCDEF ST APT 456'?

2
  • I get the feeling that, since you're saying "Line by line analysis", you don't want all newlines removed, but only those, eg, between single-quotes. Is that true? Commented Aug 21, 2013 at 23:56
  • See also: Unindent and convert multiline string to single line Commented Oct 31, 2024 at 16:25

6 Answers 6

16

If you want to remove newlines:

"".join( my_string.splitlines())

4 Comments

It works as it is supposed to do. However, after having removed the newlines some words "collide" with each other. How to fix this problem? How to put a space in between?
If you do " ".join(my_string.splitlines()) you'll get a space-separated string instead. But you may then get things like trailing spaces; at that point you probably want " ".join(line.strip() for line in my_string.splitlines())
How to use this for a csv file?
The whole file? Or just rows?
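
A minimal sketch of the two join variants discussed in this answer and its comments, assuming the two-line address from the question is held in a string. The name my_string follows the answer; collapse_lines is a hypothetical helper name, not something from the original posts.

```python
def collapse_lines(text, sep=" "):
    """Join the lines of `text` with `sep`, trimming whitespace around each line."""
    # str.splitlines() splits on \n, \r\n and other line boundaries
    # and does not keep the line endings.
    return sep.join(line.strip() for line in text.splitlines())


my_string = "123 ABCDEF ST\nAPT 456"

print("".join(my_string.splitlines()))   # 123 ABCDEF STAPT 456
print(collapse_lines(my_string))         # 123 ABCDEF ST APT 456
```

Joining with an empty string is what makes the words "collide"; joining with a space after stripping each line avoids both the collision and doubled spaces.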
4

Assuming you are using Windows, if you print the repr of the file's contents to your screen, you will see

'123 ABCDEF ST\nAPT 456\n'

Each \n represents a line break.

There are a number of ways to get rid of the newlines in the file. One easy way is to split the string on the newline characters and then rejoin the items of the resulting list:

 # myFile is assumed to hold the file's contents as a string
 myList = myFile.split('\n')
 newString = ' '.join(myList)
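
One thing to watch with split('\n'): a trailing newline leaves an empty last item, so the joined string ends with a space. A hedged alternative, assuming it is acceptable to collapse every run of whitespace (not only newlines) into a single space, is str.split() with no arguments. The variable names below are made up for the illustration.

```python
text = "123 ABCDEF ST\nAPT 456\n"

# split('\n') keeps an empty string after the final newline ...
parts = text.split("\n")
with_trailing_space = " ".join(parts)

# ... while split() with no arguments splits on any run of whitespace
# and drops empty strings, so nothing is left dangling at the end.
collapsed = " ".join(text.split())

print(repr(with_trailing_space))  # '123 ABCDEF ST APT 456 '
print(repr(collapsed))            # '123 ABCDEF ST APT 456'
```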


3

To replace the newlines with a space:

address = '123 ABCDEF ST\nAPT 456\n'
# Strings are immutable, so replace() returns a new string; assign the result
address = address.replace("\n", " ")


1
import re

def mergeline(c, l): 
    if c: return c.rstrip() + " " + l 
    else: return l

def getline(fname):
    qstart = re.compile(r'^\'[^\']*$')
    qend   = re.compile(r'.*\'$')
    with open(fname) as f:
        linecache, halfline = ("", False)
        for line in f:

            if not halfline: linecache = ""  
            linecache = mergeline(linecache, line)

            if halfline: halfline = not re.match(qend, line)
            else: halfline = re.match(qstart, line)

            if not halfline: 
                yield linecache
        if halfline: 
            yield linecache

for line in getline('input'):
    print(line.rstrip())

4 Comments

This hurt my head to follow, but looks pretty efficient. I tested it and it works as long as there's only one {address} field per row, but OP mentioned Name and Address so could be multiple fields per row, each potentially split on multiple lines. For example, this wouldn't be processed correctly 'name' 'address\nmore address'. But yeah very functional, and unlike most of the other answers, won't just return one giant single line with all newlines replaced by spaces.
Thanks! I don't think this will fail for the case you are mentioning because qstart will not match the 'name'.
You're right that the qstart regex won't match name, because it only matches a single quote at the beginning of a row followed by any characters other than a single quote until the end of the line. So my point was that the intention (I'm guessing, of course) is probably that it should match the beginning of the address field and pull the rest of the address up to the same line, even when address is not the first field in the row. Ultimately this is probably better handled by the csv library or similar, opening the file with newline=''
You may be right, my understanding is that the OP wants to read split quoted strings as a single line without worrying about this problem. I read the question again and can't say I'm right or wrong :)
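
A rough sketch of the csv suggestion from the comment above. It assumes, beyond anything stated in the question, that the fields are comma-separated and quoted with single quotes; the sample data and the name read_records are hypothetical.

```python
import csv
import io

# Hypothetical sample data: two single-quoted fields per record,
# with one address broken across two physical lines.
sample = "'JOHN DOE','123 ABCDEF ST\nAPT 456'\n'JANE ROE','789 GHIJKL AVE'\n"


def read_records(fileobj):
    """Yield each record as a list of fields, inner newlines replaced by spaces."""
    # The reader keeps a newline that sits inside a quoted field as part
    # of that field instead of treating it as the end of the record.
    reader = csv.reader(fileobj, quotechar="'")
    for row in reader:
        yield [" ".join(field.splitlines()) for field in row]


# With a real file, open it with newline='' as the csv documentation advises:
#     with open('myfile.txt', newline='') as fh: ...
for record in read_records(io.StringIO(sample, newline="")):
    print(record)
```

Unlike the regex approach, this handles a split field anywhere in the row, but only if the file really follows that quoting convention.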
0

Assuming you're iterating through your file with something like this:

with open('myfile.txt') as fh:
  for line in fh:
    # Code here

And also assuming strings in your text file are delimited with single quotes, I would do this:

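# Strip the trailing newline before testing, otherwise endswith("'") never matches
while not line.rstrip().endswith("'"):
  line = line.rstrip() + ' ' + next(fh)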

That's a lot of assuming though.
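
The same idea can be sketched as a standalone generator, under the same assumption that strings are delimited with single quotes and, additionally, that no field contains a stray apostrophe (a name like O'BRIEN would confuse it). The name merge_quoted_lines is hypothetical.

```python
def merge_quoted_lines(lines, quote="'"):
    """Yield logical lines, joining physical lines until quotes are balanced."""
    pending = ""
    for raw in lines:
        piece = raw.rstrip("\r\n")
        pending = f"{pending} {piece}" if pending else piece
        # An odd number of quote characters means a string is still open.
        if pending.count(quote) % 2 == 0:
            yield pending
            pending = ""
    if pending:
        # Unterminated quote at end of input: emit what was collected.
        yield pending


physical_lines = ["'123 ABCDEF ST\n", "APT 456'\n", "'SOME NAME'\n"]
for logical in merge_quoted_lines(physical_lines):
    print(logical)
```

Because it accepts any iterable of lines, an open file object can be passed in directly, and the rest of the line-by-line analysis stays unchanged.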


0

I think I might have found an easy solution: just call .replace('\n', " ") on whatever string you want to convert.

For example, say you have

my_string = "hi i am a programmer\nand i like to code in python"

or anything similar. To convert it, you can do

my_string = my_string.replace('\n', " ")

replace() returns a new string rather than changing the original, so the result has to be assigned.

Hope it helps.
