Create a compress function in Python?

Question

I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.

Ex:

>>> compress("ddaaaff")
'd2a3f2'


 def compress(s):
     count=0

     for i in range(0,len(s)):
         if s[i] == s[i-1]:
             count += 1
         c = s.count(s[i])

     return str(s[i]) + str(c)

What have you tried code wise? This looks pretty lazy without any effort put into it. — Jesse Glover
– Jesse Glover, Commented Sep 30, 2015 at 0:30

Patrick Yu · Accepted Answer · 2015-09-30 02:42:46Z

19

Here is a short python implementation of a compression function:

def compress(string):

    res = ""

    count = 1

    #Add in first character
    res += string[0]

    #Iterate through loop, skipping last one
    for i in range(len(string)-1):
        if(string[i] == string[i+1]):
            count+=1
        else:
            if(count > 1):
                #Ignore if no repeats
                res += str(count)
            res += string[i+1]
            count = 1
    #print last one
    if(count > 1):
        res += str(count)
    return res

Here are a few examples:

>>> compress("ddaaaff")
'd2a3f2'
>>> compress("daaaafffyy")
'da4f3y2'
>>> compress("mississippi")
'mis2is2ip2i'

answered Sep 30, 2015 at 2:42

Patrick Yu

9721 gold badge7 silver badges19 bronze badges

Sign up to request clarification or add additional context in comments.

8 Comments

Cero Over a year ago

Thank you so much, the second step is where I was confused.

Patrick Yu Over a year ago

Glad to hear that :-)

Cero Over a year ago

Is there a way to do the reverse procedure?

Patrick Yu Over a year ago

Do you mean by "reverse procedure" as to uncompress text? If that's what you want, you can just iterate over the compressed string, and everytime you hit an integer n, you can print the last character n times.

Cero Over a year ago

Yes, what would the notation be for checking the integer in a string?

|

Jagrut Trivedi · Accepted Answer · 2020-05-12 08:55:39Z

9

Short version with generators:

from itertools import groupby
import re
def compress(string):
    return re.sub(r'(?<![0-9])[1](?![0-9])', '', ''.join('%s%s' % (char, sum(1 for _ in group)) for char, group in groupby(string)))

(1) Grouping by chars with groupby(string)

(2) Counting length of group with sum(1 for _ in group) (because no len on group is possible)

(3) Joining into proper format

(4) Removing 1 chars for single items when there is a no digit before and after 1

edited May 12, 2020 at 8:55

Jagrut Trivedi

1,3191 gold badge14 silver badges18 bronze badges

answered Apr 5, 2017 at 8:41

Alexandre Fedosov

1414 silver badges5 bronze badges

1 Comment

Jagrut Trivedi Over a year ago

This will not work with 11m character. Expected output should be m11 but it will give only m.

Prune · Accepted Answer · 2015-09-30 00:48:35Z

4

There are several reasons why this doesn't work. You really need to try debugging this yourself first. Put in a few print statements to trace the execution. For instance:

def compress(s):
    count=0

    for i in range(0, len(s)):
        print "Checking character", i, s[i]
        if s[i] == s[i-1]:
            count += 1
        c = s.count(s[i])
        print "Found", s[i], c, "times"

    return str(s[i]) + str(c)

print compress("ddaaaff")

Here's the output:

Checking character 0 d
Found d 2 times
Checking character 1 d
Found d 2 times
Checking character 2 a
Found a 3 times
Checking character 3 a
Found a 3 times
Checking character 4 a
Found a 3 times
Checking character 5 f
Found f 2 times
Checking character 6 f
Found f 2 times
f2

Process finished with exit code 0

(1) You throw away the results of all but the last letter's search. (2) You count all occurrences, not merely the consecutive ones. (3) You cast a string to a string -- redundant.

Try working through this example with pencil and paper. Write down the steps you use, as a human being, to parse the string. Work on translating those to Python.

answered Sep 30, 2015 at 0:48

Prune

78k14 gold badges63 silver badges83 bronze badges

7 Comments

Cero Over a year ago

How would I count all occurrences?

Prune Over a year ago

You did count all occurrences with the "count" function. Reworking the algorithm from here is your job; we'll help with code that's already written.

Cero Over a year ago

What would I do once I got the result of the first character?

Prune Over a year ago

Did you work through this with pencil & paper? What did you do there? [These are leading questions; give it a try.]

Prune Over a year ago

More specifically, how did you go about turning "ddaaaff" into "d2a3f2" when you worked on paper? Note the areas of the paper where you kept partial results: those are your variables. Write down the steps you used, and where you repeated things (loops).

|

Puneet Chaudhary · Accepted Answer · 2018-06-06 06:48:14Z

1

x="mississippi"
res = ""
count = 0
while (len(x) > 0):
    count = 1
    res= ""
    for j in range(1, len(x)):
        if x[0]==x[j]:
            count= count + 1
        else:
            res = res + x[j]
    print(x[0], count, end=" ")
    x=res

answered Jun 6, 2018 at 6:48

Puneet Chaudhary

111 bronze badge

1 Comment

Puneet Chaudhary Over a year ago

Output: m 1 i 4 s 4 p 2

Viv · Accepted Answer · 2019-01-02 18:11:35Z

1

Just another simplest way to perform this:

def compress(str1):
    output = ''
    initial = str1[0]
    output = output + initial
    count = 1
    for item in str1[1:]:
        if item == initial:
            count = count + 1
        else:
            if count == 1:
                count = ''
            output = output + str(count)
            count = 1
            initial = item
            output = output + item
    print (output)

Which gives the output as required, examples:

>> compress("aaaaaaaccddddeehhyiiiuuo")
a7c2d4e2h2yi3u2o

>> compress("lllhhjuuuirrdtt")
l3h2ju3ir2dt

>> compress("mississippi")
mis2is2ip2i

answered Jan 2, 2019 at 18:11

Viv

2051 gold badge2 silver badges8 bronze badges

Comments

Gaganpreet Sran · Accepted Answer · 2020-05-13 03:41:35Z

1

from collections import Counter
def string_compression(string):
    counter = Counter(string)
    result = ''
    for k, v in counter.items():
        result = result + k + str(v)
    print(result)

answered May 13, 2020 at 3:41

Gaganpreet Sran

111 bronze badge

1 Comment

publicdomain Over a year ago

DO NOT ANSWER CODE ONLY. Describe what you changed. Describe how it effects the outcome and what it improves. That greatly improves you answers quality ^^

tanz · Accepted Answer · 2018-08-14 17:33:15Z

0

input = "mississippi"
count = 1
for i in range(1, len(input) + 1):
    if i == len(input):
        print(input[i - 1] + str(count), end="")
        break
    else:
        if input[i - 1] == input[i]:
            count += 1
    else:
            print(input[i - 1] + str(count), end="")
            count = 1

Output : m1i1s2i1s2i1p2i1

answered Aug 14, 2018 at 17:33

tanz

1

1 Comment

Baptiste Mille-Mathias Over a year ago

I think the author wanted number for repeated character only, so it should "mis2is2p2i"

Syed Imad · Accepted Answer · 2018-09-03 17:22:24Z

0

s=input("Enter the string:")
temp={}
result=" "
for x in s:
    if x in temp:
        temp[x]=temp[x]+1
    else:
        temp[x]=1
for key,value in temp.items():
    result+=str(key)+str(value)

print(result)

answered Sep 3, 2018 at 17:22

Syed Imad

1

1 Comment

Baptiste Mille-Mathias Over a year ago

indentation of the last line is wrong, so explanation for the person asking the question would be great.

vramazing · Accepted Answer · 2019-09-23 18:39:42Z

0

Here is something I wrote.

def stringCompression(str1):
  counter=0
  prevChar = str1[0]
  str2=""
  charChanged = False
  loopCounter = 0

  for char in str1:
      if(char==prevChar):
          counter+=1
          charChanged = False
      else:
          str2 += prevChar + str(counter)
          counter=1
          prevChar = char
          if(loopCounter == len(str1) - 1):
              str2 += prevChar + str(counter)
          charChanged = True
      loopCounter+=1
  if(not charChanged):
      str2+= prevChar + str(counter)

  return str2

Not the best code I guess. But works well.

a -> a1

aaabbbccc -> a3b3c3

answered Sep 23, 2019 at 18:39

vramazing

921 gold badge1 silver badge6 bronze badges

Comments

ishmam · Accepted Answer · 2019-10-14 11:12:52Z

0

This is a solution to the problem. But keep in mind that this method only effectively works if there's a lot of repetition, specifically if consecutive characters are repetitive. Otherwise, it will only worsen the situation.

e.g.,
AABCD --> A2B1C1D1
BcDG ---> B1c1D1G1

def compress_string(s):
    result = [""] * len(s)
    visited = None

    index = 0
    count = 1

    for c in s:
        if c == visited:
            count += 1
            result[index] = f"{c}{count}"
        else:
            count = 1
            index += 1
            result[index] = f"{c}{count}"
            visited = c

    return "".join(result)

answered Oct 14, 2019 at 11:12

ishmam

962 bronze badges

Comments

Pedram · Accepted Answer · 2019-12-12 10:06:47Z

0

You can simply achieve that by:

gstr="aaabbccccdddee"
last=gstr[0]
count=0
rstr=""
for i in gstr:
    if i==last:
        count=count+1
    elif i!=last:
        rstr=rstr+last+str(count)
        count=1
        last=i
rstr=rstr+last+str(count)
print ("Required string for given string {} after conversion is {}.".format(gstr,rstr))

edited Dec 12, 2019 at 10:06

Pedram

3,9303 gold badges23 silver badges37 bronze badges

answered Dec 12, 2019 at 9:06

vikashkmr3188

1

1 Comment

www.hybriscx.com Over a year ago

Hi Vikas - Its recommended to add textual deails and code only answers are not highly appreciated on this forum.

Julian · Accepted Answer · 2020-01-04 15:21:01Z

0

Here is a short python implementation of a compression function:

#d=compress('xxcccdex')
#print(d)

def compress(word):
    list1=[]
    for i in range(len(word)):
        list1.append(word[i].lower())
    num=0
    dict1={}
    for i in range(len(list1)):
        if(list1[i] in list(dict1.keys())):
            dict1[list1[i]]=dict1[list1[i]]+1
        else:
            dict1[list1[i]]=1

    s=list(dict1.keys())
    v=list(dict1.values())
    word=''
    for i in range(len(s)):
        word=word+s[i]+str(v[i])
    return word

edited Jan 4, 2020 at 15:21

Julian

37.3k24 gold badges138 silver badges196 bronze badges

answered Jan 4, 2020 at 14:58

sagar bhadra

1

Comments

user4331 · Accepted Answer · 2020-01-25 09:18:18Z

0

Below logic will work irrespective of

Data structure
Group By OR Set or any sort of compression logic
Capital or non-capital characters

Character repeat if not sequential

def fstrComp_1(stng):
sRes = ""
cont = 1        
for i in range(len(stng)):

 if not stng[i] in sRes:
    stng = stng.lower()
    n = stng.count(stng[i])
    if  n > 1: 
        cont = n
        sRes += stng[i] + str(cont)
    else:
        sRes += stng[i]

    print(sRes)

fstrComp_1("aB*b?cC&")

answered Jan 25, 2020 at 9:18

user4331

12 bronze badges

Comments

Xavier Chia · Accepted Answer · 2020-03-03 03:21:28Z

I wanted to do it by partitioning the string. So aabbcc would become: ['aa', 'bb', 'cc']

This is how I did it:

def compression(string):

    # Creating a partitioned list
    alist = list(string)
    master = []
    n = len(alist)

    for i in range(n):
        if alist[i] == alist[i-1]:
            master[-1] += alist[i]
        else:
            master += alist[i]


    # Adding the partitions together in a new string
    newString = "" 
    for i in master:
        newString += i[0] + str(len(i))

    # If the newString is longer than the old string, return old string (you've not 
    # compressed it in length)
    if len(newString) > n:
        return string
    return newString



string = 'aabbcc'
print(compression(string))

AutomationTest · Accepted Answer · 2020-04-30 23:34:03Z

0

string = 'aabccccd' output = '2a3b4c4d'

new_string = " "
count = 1
for i in range(len(string)-1):
    if string[i] == string[i+1]:
        count = count + 1
    else:         
        new_string =  new_string + str(count) + string[i]
        count = 1 
new_string = new_string + str(count) + string[i+1]    
print(new_string)

answered Apr 30, 2020 at 23:34

AutomationTest

1092 gold badges3 silver badges15 bronze badges

Comments

tobych · Accepted Answer · 2020-05-18 07:48:35Z

For a coding interview, where it was about the algorithm, and not about my knowledge of Python, its internal representation of data structures, or the time complexity of operations such as string concatenation:

def compress(message: str) -> str:
    output = ""
    length = 0
    previous: str = None
    for char in message:
        if previous is None or char == previous:
            length += 1
        else:
            output += previous
            if length > 1:
                output += str(length)
            length = 1
        previous = char
    if previous is not None:
        output += previous
        if length > 1:
            output += str(length)
    return output

For code I'd actually use in production, not reinventing any wheels, being more testable, using iterators until the last step for space efficiency, and using join() instead of string concatenation for time efficiency:

from itertools import groupby
from typing import Iterator


def compressed_groups(message: str) -> Iterator[str]:
    for char, group in groupby(message):
        length = sum(1 for _ in group)
        yield char + (str(length) if length > 1 else "")


def compress(message: str) -> str:
    return "".join(compressed_groups(message))

Taking things a step further, for even more testability:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


def segments(message: str) -> Iterator[Segment]:
    for char, group in groupby(message):
        yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return "".join(str(s) for s in segments(message))

Going all-out and providing a Value Object CompressedString:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


class CompressedString(str):

    @classmethod
    def compress(cls, message: str) -> "CompressedString":
        return cls("".join(str(s) for s in cls._segments(message)))

    @staticmethod
    def _segments(message: str) -> Iterator[Segment]:
        for char, group in groupby(message):
            yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return CompressedString.compress(message)

user222213 · Accepted Answer · 2020-05-29 02:40:56Z

0

def compress(val):
    print(len(val))
    end=0
    count=1
    result=""
    for i in range(0,len(val)-1):
        #print(val[i],val[i+1])
        if val[i]==val[i+1]:
            count=count+1
            #print(count,val[i])
        elif val[i]!=val[i+1]:
            #print(end,i)
            result=result+val[end]+str(count)
            end=i+1
            count=1
    result=result+val[-1]+str(count)
    return result
res=compress("I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.")
print(len(res))

answered May 29, 2020 at 2:40

user222213

1111 gold badge2 silver badges12 bronze badges

Comments

harryghgim · Accepted Answer · 2020-08-28 04:34:13Z

0

Use python's standard library re.

def compress(string):
    import re
    p=r'(\w+?)\1+' # non greedy, group1 1
    sub_str=string
    for m in re.finditer(p,string):
        num=m[0].count(m[1])
        sub_str=re.sub(m[0],f'{m[1]}{num}',sub_str)
    return sub_str

string='aaaaaaaabbbbbbbbbcccccccckkkkkkkkkkkppp'
string2='ababcdcd'
string3='abcdabcd' 
string4='ababcdabcdefabcdcd' 

print(compress(string))
print(compress(string2))
print(compress(string3))
print(compress(string4))

Resut:

a8b9c8k11p3                                                                     
ab2cd2                                                                          
abcd2
ab2cdabcdefabcd2

edited Aug 28, 2020 at 4:34

answered Aug 28, 2020 at 3:32

harryghgim

1,5804 gold badges16 silver badges39 bronze badges

Comments

Tomerikoo · Accepted Answer · 2021-03-03 16:12:35Z

0

Using generators:

input = "aaaddddffwwqqaattttttteeeeeee"

from itertools import groupby
print(''.join(([char+str(len(list(group))) for char, group in groupby(input)])))

edited Mar 3, 2021 at 16:12

Tomerikoo

19.6k16 gold badges57 silver badges68 bronze badges

answered Mar 3, 2021 at 16:08

Malaya Raj Lenka

111 bronze badge

Comments

Mazhar · Accepted Answer · 2021-03-21 22:23:21Z

0

def compress(string):

    # taking out unique characters from the string

    unique_chars = []
    for c in string:
        if not c in unique_chars:
            unique_chars.append(c)

    # Now count the characters

    res = ""

    for i in range(len(unique_chars)):
        count = string.count(unique_chars[i])
        res += unique_chars[i]+str(count)

    return res


string = 'aabccccd'
compress(string)

answered Mar 21, 2021 at 22:23

Mazhar

1

Comments

Tomerikoo · Accepted Answer · 2021-08-08 18:36:20Z

0

from collections import Counter

def char_count(input_str):
    my_dict = Counter(input_str)
    print(my_dict)
    output_str = ""
    for i in input_str:
        if i not in output_str:
            output_str += i
            output_str += str(my_dict[i])
    return output_str

result = char_count("zddaaaffccc")
print(result)

edited Aug 8, 2021 at 18:36

Tomerikoo

19.6k16 gold badges57 silver badges68 bronze badges

answered Aug 8, 2021 at 18:02

srijitha

1

Comments

Suraj Vijay · Accepted Answer · 2022-04-05 15:03:40Z

0

This is the modification of Patrick Yu's code. It code fails for the below test cases.

SAMPLE INPUT:
c
aaaaaaaaaabcdefgh

EXPECTED OUTPUT:
c1
a10b1c1d1e1f1g1h1

OUPUT OF Patrick's Code:
c
a10bcdefgh

Below is the modified code:


def Compress(S):
    Ans = S[0]
    count = 1
    for i in range(len(S)-1):
        if S[i] == S[i+1]:
            count += 1
        else:
            if count >= 1:
                Ans += str(count)
            Ans += S[i+1]
            count = 1
    if count>=1:
        Ans += str(count)
    return Ans

Just the condition must be changed from greater(">") to greater than equal to(">=") when comparing the count with 1.

edited Apr 5, 2022 at 15:03

answered Sep 27, 2020 at 4:30

Suraj Vijay

134 bronze badges

Comments

Hemangi Santoki · Accepted Answer · 2023-04-12 22:24:43Z

-1

str_input = 'aabbccabca'
output = 'a2b2c2a1b1c1a1'
temp = str_input[0]
new_str = ''
tmp_dict = {}
for i in list(str_input):
    if temp == i:
        if i in tmp_dict.keys():
            tmp_dict[i]=tmp_dict[i]+1
        else:
            tmp_dict.update({i:1})
    else:
        for key in tmp_dict:
            new_str+=key+str(tmp_dict[key])
        tmp_dict = {}
        tmp_dict.update({i:1})
        temp = i
for key in tmp_dict:
    new_str+=key+str(tmp_dict[key])
print(new_str)

answered Apr 12, 2023 at 22:24

Hemangi Santoki

1

1 Comment

Chris Over a year ago

This question already has dozens of answers. Dumping code without any explanation isn't very helpful at this point. How does this improve upon what is already here? Why should we use this instead of one of the other answers?

Collectives™ on Stack Overflow

Create a compress function in Python?

23 Answers 23

8 Comments

1 Comment

7 Comments

1 Comment

Comments

1 Comment

1 Comment

1 Comment

Comments

Comments

1 Comment

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

23 Answers 23

8 Comments

1 Comment

7 Comments

1 Comment

Comments

1 Comment

1 Comment

1 Comment

Comments

Comments

1 Comment

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related