String Compression using python

Question

Everything else seems to work just fine, but the count for the last character is always off by 1. For example, if I input abcccddd, I get a1b1c3d2 but I should get a1b1c3d3. Any hint would be much appreciated!

Prompt: String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2b1c5a3. If the "compressed" string would not become smaller than the original string, your method should return the original string. You can assume the string has only uppercase and lowercase letters (a - z). Do the easy thing first: compress the string, then compare the lengths. Be careful that you aren't repeatedly concatenating strings together; this can be very inefficient.

def compression(string): 
    hash = {}
    list = []
    count = 0
    for i in range(len(string) - 1): 
        if string[i - 1] != string[i] or i == 0: 
            if string[i] != string[i + 1] or i == len(string) - 2: 
                count = count + 1
                list.append(str(string[i]))
                list.append(str(count))
                count = 0
            elif string[i] == string[i + 1]: 
                count = count + 1
        elif string[i - 1] == string[i]:
            if string[i] != string[i + 1] or i == len(string) - 2: 
                count = count + 1
                list.append(str(string[i]))
                list.append(str(count))
                count = 0
            if string[i] == string[i + 1]: 
                count = count + 1
        print(list)
    result =  "".join(list)
    if len(result) == len(string): 
        return string
    else: 
        return result
string = "abcccfffgggg"
compression(string)

You're making this way more complicated than it needs to be. Each time through the loop, save the current character in a variable. On subsequent iterations, check whether the current character is the same as that variable. If it is, increment the counter; otherwise output the saved character followed by the counter, then reset the counter to 1.
– Barmar, Commented Jun 2, 2022 at 23:10
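
One way to read the suggestion in this comment is sketched below. It is an illustration only, not the commenter's own code: the names encode_runs and compress are hypothetical, and the pieces are collected in a list and joined once, following the prompt's hint about avoiding repeated concatenation.

```python
def encode_runs(text):
    """Return the run-length form of text, e.g. 'aab' -> 'a2b1'."""
    if not text:
        return ""  # no characters, so no runs to report

    parts = []       # pieces are joined once at the end
    saved = text[0]  # character of the run currently being counted
    count = 1

    for ch in text[1:]:
        if ch == saved:
            count += 1
        else:
            parts.append(f"{saved}{count}")  # close the finished run
            saved = ch
            count = 1

    parts.append(f"{saved}{count}")  # the final run is closed after the loop
    return "".join(parts)


def compress(text):
    """Apply the prompt's rule: keep the original unless the result is shorter."""
    result = encode_runs(text)
    return result if len(result) < len(text) else text


print(encode_runs("abcccddd"))  # a1b1c3d3
print(compress("aabcccccaaa"))  # a2b1c5a3
print(compress("abcccddd"))     # abcccddd, since a1b1c3d3 is not shorter
```

Closing the last run after the loop is what avoids the off-by-one on the final character.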
1) Does the code really need to consider the previous and the next characters, in addition to the current one? 2) What should happen the first time through the loop, when there isn't a previous character? What should happen the last time through the loop, when there isn't a next character? What should happen if there is only one character? What should happen if there are no characters at all? 3) What do you expect to happen when -1 is used as an index for the string?
– Karl Knechtel, Commented Jun 2, 2022 at 23:16
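
The two indexing points raised in this comment can be checked directly. The snippet below is a small illustration using the question's example input; the variable names are hypothetical.

```python
s = "abcccddd"

# range(len(s) - 1) yields 0 .. len(s) - 2, so the last index is never visited.
visited = list(range(len(s) - 1))
print(visited)                 # [0, 1, 2, 3, 4, 5, 6]
print(len(s) - 1 in visited)   # False

# A negative index counts from the end, so s[i - 1] with i == 0 is the last
# character of the string rather than an error.
last_via_negative = s[0 - 1]
print(last_via_negative)       # d
```

With the loop in the question, index 7 is therefore only ever seen through the s[i + 1] lookahead and is never counted itself, which is consistent with the final count being one too low.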

Daniel Hao · Accepted Answer · 2022-06-02 23:53:04Z

3

If you are open to using the itertools module, try groupby:

from itertools import groupby

s = 'bbbbaacddd'
# groupby yields one (character, run) pair per run of equal characters
groups = [(label, len(list(group)))
          for label, group in groupby(s)]

compressed = "".join("{}{}".format(label, count) for label, count in groups)

print(compressed)  # b4a2c1d3
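
As a possible extension of the snippet above, the groupby approach can be wrapped in a function that also applies the prompt's rule about returning the original string. This is a sketch; compress_groupby is a hypothetical name, not part of itertools.

```python
from itertools import groupby


def compress_groupby(text):
    # One "<char><count>" piece per run; sum(1 for _ in group) counts the run
    # without building an intermediate list.
    parts = [f"{label}{sum(1 for _ in group)}" for label, group in groupby(text)]
    result = "".join(parts)
    # Keep the compressed form only when it is strictly shorter.
    return result if len(result) < len(text) else text


print(compress_groupby("bbbbaacddd"))  # b4a2c1d3
print(compress_groupby("abcd"))        # abcd
```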

Another way is to use run_length from the third-party more_itertools package:

>>> from more_itertools import run_length
>>> compressed = list(run_length.encode(s))
>>> compressed
[('b', 4), ('a', 2), ('c', 1), ('d', 3)]
>>> ''.join("{}{}".format(label, count) for label, count in compressed)
'b4a2c1d3'

edited Jun 2, 2022 at 23:53

answered Jun 2, 2022 at 23:26

Daniel Hao


EnriqueBet · Accepted Answer · 2022-06-02 23:16:01Z

1

You can simplify this by keeping a running count for the current character in a dictionary and deleting the entry once that run has been written out. Note that this version omits the count when a character appears only once.

string = "aabccccaaaa"

output = ""
lastchar = string[0]
counts = {lastchar:1}

for i in range(1, len(string)):
    s = string[i]
    if s == lastchar:
        counts[s] += 1
    else:
        output += f"{lastchar}{counts[lastchar]}" if counts[lastchar] > 1 else lastchar
        del counts[lastchar]
        counts[s] = 1
    lastchar = s

# Parentheses are needed so the conditional applies only to the last run
print(output + (f"{lastchar}{counts[lastchar]}" if counts[lastchar] > 1 else lastchar))

answered Jun 2, 2022 at 23:16

EnriqueBet


1 Comment

Mark Over a year ago

What's the point of using a dict that only ever has one key? Why not just a single integer variable? Also, looping with for i in range(1, len(string)) is needlessly indirect when you can just use for s in string[1:].
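
The change this comment proposes might look like the sketch below. It keeps the answer's output format, where a count of 1 is omitted; the function name compress_skip_ones and the list-and-join detail are assumptions added for illustration.

```python
def compress_skip_ones(string):
    if not string:
        return string

    pieces = []
    lastchar = string[0]
    count = 1  # a plain integer replaces the one-key dict

    for s in string[1:]:  # iterate over the characters directly
        if s == lastchar:
            count += 1
        else:
            pieces.append(f"{lastchar}{count}" if count > 1 else lastchar)
            lastchar = s
            count = 1

    # Write out the final run after the loop ends.
    pieces.append(f"{lastchar}{count}" if count > 1 else lastchar)
    return "".join(pieces)


print(compress_skip_ones("aabccccaaaa"))  # a2bc4a4
```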

Subhendu Sekhar Baug · Accepted Answer · 2023-09-23 10:09:13Z

0

A Python function to perform string compression. For example, "aabcccccaaa" becomes "a2b1c5a3". Note that it always returns the compressed form, even when that is not shorter than the input.

def string_compression(s):
    result = ""
    if not s:
        return result
    char_count = 1  # Initialize character count to 1
    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            char_count += 1
        else:
            result += s[i - 1] + str(char_count)
            char_count = 1
    result += s[-1] + str(char_count)
    return result

print(string_compression('aabcccccaaa'))

edited Sep 23, 2023 at 10:09

answered Sep 23, 2023 at 8:03

Subhendu Sekhar Baug


The fourth bird · Accepted Answer · 2022-06-03 16:36:55Z

-1

You could use a pattern with a backreference, ([a-z])\1*, which matches each run of a repeated character, and assemble the final string from the captured character and the length of each match.

Then you can compare the length of the original string and the assembled string.

Example code

import re

strings = [
    "abcccddd",
    "aabcccccaaa",
    "abcd",
    "aabbccddeeffffffffffffff",
    "a"
]

def compression(s):
    res = ''.join([x.group(1) + str(len(x.group())) for x in re.finditer(r"([a-z])\1*", s, re.I)])
    return res if len(s) >= len(res) else s

for s in strings:
    print(compression(s))

Output

a1b1c3d3
a2b1c5a3
abcd
a2b2c2d2e2f14
a
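
To see why the length of each match gives the count, it can help to print what the pattern captures. The snippet below is a small illustration that reuses the pattern from the answer; the variable names are hypothetical.

```python
import re

pattern = re.compile(r"([a-z])\1*", re.I)

# For each match: group(1) is the captured character, group() is the whole run.
runs = [(m.group(1), m.group(), len(m.group()))
        for m in pattern.finditer("aabcccccaaa")]

for char, run, length in runs:
    print(char, run, length)
# a aa 2
# b b 1
# c ccccc 5
# a aaa 3
```

Because of the re.I flag, the backreference is also matched case-insensitively, so a mixed-case run such as aA would be treated as one run. Drop the flag if upper and lower case should be counted separately.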

answered Jun 3, 2022 at 16:36

The fourth bird

