Is there a better way to concatenate continuous string elements in Python?

Question

Problem Context

I am trying to create a chat log dataset from WhatsApp chats. To give some context for the problem: let M denote a message and R a response. Chats do not always alternate strictly; for example, they tend to look like this:

[ M, M, M, R, R, M, M, R, R, M ... and so on]

I am trying to concatenate consecutive runs of M's and R's. For the above example, I want output like this:

Desired Output

[ "M M M", "R R", "M M" , "R R", "M ... and so on ]

An Example of Realistic Data:

Input --> ["M: Hi", "M: How are you?", "R: Heyy", "R: Im cool", "R: Wbu?"] (length=5)

Output --> ["M: Hi M: How are you?", "R: Heyy R: Im cool R: Wbu?"] (length = 2)

Is there a faster, more efficient way of doing this? I have already read this Stack Overflow link, but I didn't find a solution there.

So far, this is what I have tried.

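final = []
continuous_string = ''
for i, ele in enumerate(chats):
    if i > 0:
        prev = chats[i-1][0]
        current = ele[0]

        # Collect the previous element into the current run
        continuous_string += chats[i-1]
        if current == prev:
            continuous_string += ' '
            continue
        else:
            # Speaker changed: store the finished run and start a new one
            final.append(continuous_string)
            continuous_string = ''
# Add the last element and store the final run
if chats:
    final.append(continuous_string + chats[-1])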

Explanation of my code: I have a chats list in which every message starts with 'M' and every response starts with 'R'. I keep track of the prev and current values in the list, and when there is a change (a transition from M -> R or R -> M), I append everything collected in continuous_string to the final list.

Again, my question is: Is there a shortcut or a function in Python that does the same thing effectively in fewer lines?

Why are you doing + '. ' if there is no . in the desired output?
– sanyassh, Commented Feb 24, 2019 at 13:21
Ahh! Those are just messages which I want to concatenate with ". ". For the sake of the problem, their presence is irrelevant. Thanks for pointing out. I will make an edit! @Sanya
– Sssssuppp, Commented Feb 24, 2019 at 13:25
Please add some realistic sample data to your question so that people will stop posting useless answers that only work with the letters "M" and "R".
– Aran-Fey, Commented Feb 24, 2019 at 13:38

Mykola Zotko · Accepted Answer · 2019-09-09 05:13:15Z

5

You can use the function groupby():

from itertools import groupby

l = ['A', 'A', 'B', 'B']

[' '.join(g) for _, g in groupby(l)]
# ['A A', 'B B']

To group the data from your example, you need to add a key to the groupby() function:

l = ["M: Hi", "M: How are you?", "R: Heyy", "R: Im cool", "R: Wbu?"]

[' '.join(g) for _, g in groupby(l, key=lambda x: x[0])]
# ['M: Hi M: How are you?', 'R: Heyy R: Im cool R: Wbu?']

As @TrebuchetMS mentioned in the comments, the key lambda x: x.split(':')[0] might be more reliable. It depends on your data.
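To illustrate why the key matters, here is a minimal sketch comparing the two keys on data where both speakers share a first letter. The sample list and the helper name speaker are assumptions made up for this illustration, and it assumes every entry has the "name: text" form.

```python
from itertools import groupby

# Hypothetical sample data: both names start with "M"
chats = ["Megan: Hi", "Megan: How are you?", "Max: Heyy", "Max: Im cool"]


def speaker(message):
    # Everything before the first colon is treated as the speaker name
    return message.partition(':')[0]


# Keying on the first character merges both speakers into one group
by_first_letter = [' '.join(g) for _, g in groupby(chats, key=lambda x: x[0])]

# Keying on the full name keeps the speakers apart
by_speaker = [' '.join(g) for _, g in groupby(chats, key=speaker)]

print(by_first_letter)
# ['Megan: Hi Megan: How are you? Max: Heyy Max: Im cool']
print(by_speaker)
# ['Megan: Hi Megan: How are you?', 'Max: Heyy Max: Im cool']
```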

edited Sep 9, 2019 at 5:13

answered Feb 24, 2019 at 13:26

Mykola Zotko


5 Comments

Sssssuppp Over a year ago

I have edited the question a bit. Can you show how your answer changes for that, please?

Mykola Zotko Over a year ago

@Satya I added the solution for your realistic data.

TrebledJ Over a year ago

Maybe x.partition(':')[0] or x.split(':')[0] in the lambda might be more reliable for data where the first letter is the same for different users. E.g. ["Megan: .", "Max: ."].

Sssssuppp Over a year ago

@MykolaZotko sorry I'm troubling you but can you give a brief explanation of how groupby works? Even a link which could explain it properly would be alright.

Mykola Zotko Over a year ago

@Satya You get an iterator which returns consecutive keys and groups, a bit like a dict whose values are the groups. For groupby('abbbcc') you get an iterator which looks like {'a': ['a'], 'b': ['b', 'b', 'b'], 'c': ['c', 'c']} (the lists in this example are actually iterators in the groupby object).
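A small sketch of that behaviour, for anyone who wants to try it: each group is turned into a list so it can be printed. The second input, 'aabaa', is an extra example not taken from the thread; it shows that only adjacent equal items are grouped, which is where the dict analogy stops holding.

```python
from itertools import groupby

# Materialize each group as a list so the pairs can be inspected
pairs = [(key, list(group)) for key, group in groupby('abbbcc')]
print(pairs)
# [('a', ['a']), ('b', ['b', 'b', 'b']), ('c', ['c', 'c'])]

# Only adjacent equal items are grouped, so a key can appear more than once
repeated = [(key, list(group)) for key, group in groupby('aabaa')]
print(repeated)
# [('a', ['a', 'a']), ('b', ['b']), ('a', ['a', 'a'])]
```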

molamk · Accepted Answer · 2019-02-24 14:27:11Z

Algorithm

Initialize a temporary item. This will help determine if the speaker has changed
For each item
- Extract the speaker
- If it's the same, append to the text of the last item of the array
- Else append a new item in the list containing the speaker and text

Implementation

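def parse(x):
    # Split on the first colon only, so colons inside the message text survive
    speaker, _, text = x.partition(':')
    return speaker, text.strip()


def compress(l):
    ans = []
    prev = None
    for x in l:
        curr, text = parse(x)
        if curr != prev:
            # New speaker: start a new item
            prev = curr
            ans.append(x)
        else:
            # Same speaker: append the text to the last item
            ans[-1] += f' {text}'
    return ans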

Character names

IN:  ["M: Hi", "M: How are you?", "R: Heyy", "R: Im cool", "R: Wbu?"]
OUT: ['M: Hi How are you?', 'R: Heyy Im cool Wbu?']

String names

IN:  ["Mike: Hi", "Mike: How are you?", "Mary: Heyy", "Mary: Im cool", "Mary: Wbu?"]
OUT: ['Mike: Hi How are you?', 'Mary: Heyy Im cool Wbu?']
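For comparison, the same output format (speaker written once per run) can probably also be produced with itertools.groupby. This is only a sketch, assuming every entry has the "speaker: text" form; the names split_message and merge_runs are hypothetical and not part of either answer.

```python
from itertools import groupby


def split_message(message):
    # Assumes the "speaker: text" form; splits on the first colon only
    speaker, _, text = message.partition(':')
    return speaker, text.strip()


def merge_runs(messages):
    # Parse every entry into a (speaker, text) pair first
    parsed = [split_message(m) for m in messages]
    merged = []
    # Group adjacent pairs that share the same speaker
    for speaker, run in groupby(parsed, key=lambda pair: pair[0]):
        texts = ' '.join(text for _, text in run)
        merged.append(f'{speaker}: {texts}')
    return merged


result = merge_runs(["M: Hi", "M: How are you?", "R: Heyy", "R: Im cool", "R: Wbu?"])
print(result)
# ['M: Hi How are you?', 'R: Heyy Im cool Wbu?']
```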
