10

I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.

Ex:

>>> compress("ddaaaff")
'd2a3f2'


 def compress(s):
     count=0

     for i in range(0,len(s)):
         if s[i] == s[i-1]:
             count += 1
         c = s.count(s[i])

     return str(s[i]) + str(c)
1
  • 3
    What have you tried code wise? This looks pretty lazy without any effort put into it. Commented Sep 30, 2015 at 0:30

23 Answers 23

19

Here is a short python implementation of a compression function:

def compress(string):

    res = ""

    count = 1

    #Add in first character
    res += string[0]

    #Iterate through loop, skipping last one
    for i in range(len(string)-1):
        if(string[i] == string[i+1]):
            count+=1
        else:
            if(count > 1):
                #Ignore if no repeats
                res += str(count)
            res += string[i+1]
            count = 1
    #print last one
    if(count > 1):
        res += str(count)
    return res

Here are a few examples:

>>> compress("ddaaaff")
'd2a3f2'
>>> compress("daaaafffyy")
'da4f3y2'
>>> compress("mississippi")
'mis2is2ip2i'
Sign up to request clarification or add additional context in comments.

8 Comments

Thank you so much, the second step is where I was confused.
Glad to hear that :-)
Is there a way to do the reverse procedure?
Do you mean by "reverse procedure" as to uncompress text? If that's what you want, you can just iterate over the compressed string, and everytime you hit an integer n, you can print the last character n times.
Yes, what would the notation be for checking the integer in a string?
|
9

Short version with generators:

from itertools import groupby
import re
def compress(string):
    return re.sub(r'(?<![0-9])[1](?![0-9])', '', ''.join('%s%s' % (char, sum(1 for _ in group)) for char, group in groupby(string)))

(1) Grouping by chars with groupby(string)

(2) Counting length of group with sum(1 for _ in group) (because no len on group is possible)

(3) Joining into proper format

(4) Removing 1 chars for single items when there is a no digit before and after 1

1 Comment

This will not work with 11m character. Expected output should be m11 but it will give only m.
4

There are several reasons why this doesn't work. You really need to try debugging this yourself first. Put in a few print statements to trace the execution. For instance:

def compress(s):
    count=0

    for i in range(0, len(s)):
        print "Checking character", i, s[i]
        if s[i] == s[i-1]:
            count += 1
        c = s.count(s[i])
        print "Found", s[i], c, "times"

    return str(s[i]) + str(c)

print compress("ddaaaff")

Here's the output:

Checking character 0 d
Found d 2 times
Checking character 1 d
Found d 2 times
Checking character 2 a
Found a 3 times
Checking character 3 a
Found a 3 times
Checking character 4 a
Found a 3 times
Checking character 5 f
Found f 2 times
Checking character 6 f
Found f 2 times
f2

Process finished with exit code 0

(1) You throw away the results of all but the last letter's search. (2) You count all occurrences, not merely the consecutive ones. (3) You cast a string to a string -- redundant.

Try working through this example with pencil and paper. Write down the steps you use, as a human being, to parse the string. Work on translating those to Python.

7 Comments

How would I count all occurrences?
You did count all occurrences with the "count" function. Reworking the algorithm from here is your job; we'll help with code that's already written.
What would I do once I got the result of the first character?
Did you work through this with pencil & paper? What did you do there? [These are leading questions; give it a try.]
More specifically, how did you go about turning "ddaaaff" into "d2a3f2" when you worked on paper? Note the areas of the paper where you kept partial results: those are your variables. Write down the steps you used, and where you repeated things (loops).
|
1
x="mississippi"
res = ""
count = 0
while (len(x) > 0):
    count = 1
    res= ""
    for j in range(1, len(x)):
        if x[0]==x[j]:
            count= count + 1
        else:
            res = res + x[j]
    print(x[0], count, end=" ")
    x=res

1 Comment

Output: m 1 i 4 s 4 p 2
1

Just another simplest way to perform this:

def compress(str1):
    output = ''
    initial = str1[0]
    output = output + initial
    count = 1
    for item in str1[1:]:
        if item == initial:
            count = count + 1
        else:
            if count == 1:
                count = ''
            output = output + str(count)
            count = 1
            initial = item
            output = output + item
    print (output)

Which gives the output as required, examples:

>> compress("aaaaaaaccddddeehhyiiiuuo")
a7c2d4e2h2yi3u2o

>> compress("lllhhjuuuirrdtt")
l3h2ju3ir2dt

>> compress("mississippi")
mis2is2ip2i

Comments

1
from collections import Counter
def string_compression(string):
    counter = Counter(string)
    result = ''
    for k, v in counter.items():
        result = result + k + str(v)
    print(result)

1 Comment

DO NOT ANSWER CODE ONLY. Describe what you changed. Describe how it effects the outcome and what it improves. That greatly improves you answers quality ^^
0
input = "mississippi"
count = 1
for i in range(1, len(input) + 1):
    if i == len(input):
        print(input[i - 1] + str(count), end="")
        break
    else:
        if input[i - 1] == input[i]:
            count += 1
    else:
            print(input[i - 1] + str(count), end="")
            count = 1

Output : m1i1s2i1s2i1p2i1

1 Comment

I think the author wanted number for repeated character only, so it should "mis2is2p2i"
0
s=input("Enter the string:")
temp={}
result=" "
for x in s:
    if x in temp:
        temp[x]=temp[x]+1
    else:
        temp[x]=1
for key,value in temp.items():
    result+=str(key)+str(value)

print(result)

1 Comment

indentation of the last line is wrong, so explanation for the person asking the question would be great.
0

Here is something I wrote.

def stringCompression(str1):
  counter=0
  prevChar = str1[0]
  str2=""
  charChanged = False
  loopCounter = 0

  for char in str1:
      if(char==prevChar):
          counter+=1
          charChanged = False
      else:
          str2 += prevChar + str(counter)
          counter=1
          prevChar = char
          if(loopCounter == len(str1) - 1):
              str2 += prevChar + str(counter)
          charChanged = True
      loopCounter+=1
  if(not charChanged):
      str2+= prevChar + str(counter)

  return str2

Not the best code I guess. But works well.

a -> a1

aaabbbccc -> a3b3c3

Comments

0

This is a solution to the problem. But keep in mind that this method only effectively works if there's a lot of repetition, specifically if consecutive characters are repetitive. Otherwise, it will only worsen the situation.

e.g.,
AABCD --> A2B1C1D1
BcDG ---> B1c1D1G1

def compress_string(s):
    result = [""] * len(s)
    visited = None

    index = 0
    count = 1

    for c in s:
        if c == visited:
            count += 1
            result[index] = f"{c}{count}"
        else:
            count = 1
            index += 1
            result[index] = f"{c}{count}"
            visited = c

    return "".join(result)

Comments

0

You can simply achieve that by:

gstr="aaabbccccdddee"
last=gstr[0]
count=0
rstr=""
for i in gstr:
    if i==last:
        count=count+1
    elif i!=last:
        rstr=rstr+last+str(count)
        count=1
        last=i
rstr=rstr+last+str(count)
print ("Required string for given string {} after conversion is {}.".format(gstr,rstr))

1 Comment

Hi Vikas - Its recommended to add textual deails and code only answers are not highly appreciated on this forum.
0

Here is a short python implementation of a compression function:

#d=compress('xxcccdex')
#print(d)

def compress(word):
    list1=[]
    for i in range(len(word)):
        list1.append(word[i].lower())
    num=0
    dict1={}
    for i in range(len(list1)):
        if(list1[i] in list(dict1.keys())):
            dict1[list1[i]]=dict1[list1[i]]+1
        else:
            dict1[list1[i]]=1

    s=list(dict1.keys())
    v=list(dict1.values())
    word=''
    for i in range(len(s)):
        word=word+s[i]+str(v[i])
    return word

Comments

0

Below logic will work irrespective of

  1. Data structure
  2. Group By OR Set or any sort of compression logic
  3. Capital or non-capital characters
  4. Character repeat if not sequential

    def fstrComp_1(stng):
    sRes = ""
    cont = 1        
    for i in range(len(stng)):
    
     if not stng[i] in sRes:
        stng = stng.lower()
        n = stng.count(stng[i])
        if  n > 1: 
            cont = n
            sRes += stng[i] + str(cont)
        else:
            sRes += stng[i]
    
        print(sRes)
    
    fstrComp_1("aB*b?cC&")
    

Comments

0

I wanted to do it by partitioning the string. So aabbcc would become: ['aa', 'bb', 'cc']

This is how I did it:

def compression(string):

    # Creating a partitioned list
    alist = list(string)
    master = []
    n = len(alist)

    for i in range(n):
        if alist[i] == alist[i-1]:
            master[-1] += alist[i]
        else:
            master += alist[i]


    # Adding the partitions together in a new string
    newString = "" 
    for i in master:
        newString += i[0] + str(len(i))

    # If the newString is longer than the old string, return old string (you've not 
    # compressed it in length)
    if len(newString) > n:
        return string
    return newString



string = 'aabbcc'
print(compression(string))

Comments

0

string = 'aabccccd' output = '2a3b4c4d'

new_string = " "
count = 1
for i in range(len(string)-1):
    if string[i] == string[i+1]:
        count = count + 1
    else:         
        new_string =  new_string + str(count) + string[i]
        count = 1 
new_string = new_string + str(count) + string[i+1]    
print(new_string)

Comments

0

For a coding interview, where it was about the algorithm, and not about my knowledge of Python, its internal representation of data structures, or the time complexity of operations such as string concatenation:

def compress(message: str) -> str:
    output = ""
    length = 0
    previous: str = None
    for char in message:
        if previous is None or char == previous:
            length += 1
        else:
            output += previous
            if length > 1:
                output += str(length)
            length = 1
        previous = char
    if previous is not None:
        output += previous
        if length > 1:
            output += str(length)
    return output

For code I'd actually use in production, not reinventing any wheels, being more testable, using iterators until the last step for space efficiency, and using join() instead of string concatenation for time efficiency:

from itertools import groupby
from typing import Iterator


def compressed_groups(message: str) -> Iterator[str]:
    for char, group in groupby(message):
        length = sum(1 for _ in group)
        yield char + (str(length) if length > 1 else "")


def compress(message: str) -> str:
    return "".join(compressed_groups(message))

Taking things a step further, for even more testability:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


def segments(message: str) -> Iterator[Segment]:
    for char, group in groupby(message):
        yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return "".join(str(s) for s in segments(message))

Going all-out and providing a Value Object CompressedString:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


class CompressedString(str):

    @classmethod
    def compress(cls, message: str) -> "CompressedString":
        return cls("".join(str(s) for s in cls._segments(message)))

    @staticmethod
    def _segments(message: str) -> Iterator[Segment]:
        for char, group in groupby(message):
            yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return CompressedString.compress(message)

Comments

0
def compress(val):
    print(len(val))
    end=0
    count=1
    result=""
    for i in range(0,len(val)-1):
        #print(val[i],val[i+1])
        if val[i]==val[i+1]:
            count=count+1
            #print(count,val[i])
        elif val[i]!=val[i+1]:
            #print(end,i)
            result=result+val[end]+str(count)
            end=i+1
            count=1
    result=result+val[-1]+str(count)
    return result
res=compress("I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.")
print(len(res))

Comments

0

Use python's standard library re.

def compress(string):
    import re
    p=r'(\w+?)\1+' # non greedy, group1 1
    sub_str=string
    for m in re.finditer(p,string):
        num=m[0].count(m[1])
        sub_str=re.sub(m[0],f'{m[1]}{num}',sub_str)
    return sub_str
string='aaaaaaaabbbbbbbbbcccccccckkkkkkkkkkkppp'
string2='ababcdcd'
string3='abcdabcd' 
string4='ababcdabcdefabcdcd' 

print(compress(string))
print(compress(string2))
print(compress(string3))
print(compress(string4))

Resut:

a8b9c8k11p3                                                                     
ab2cd2                                                                          
abcd2
ab2cdabcdefabcd2 

Comments

0

Using generators:

input = "aaaddddffwwqqaattttttteeeeeee"

from itertools import groupby
print(''.join(([char+str(len(list(group))) for char, group in groupby(input)])))

Comments

0
def compress(string):

    # taking out unique characters from the string

    unique_chars = []
    for c in string:
        if not c in unique_chars:
            unique_chars.append(c)

    # Now count the characters

    res = ""

    for i in range(len(unique_chars)):
        count = string.count(unique_chars[i])
        res += unique_chars[i]+str(count)

    return res


string = 'aabccccd'
compress(string)

Comments

0
from collections import Counter

def char_count(input_str):
    my_dict = Counter(input_str)
    print(my_dict)
    output_str = ""
    for i in input_str:
        if i not in output_str:
            output_str += i
            output_str += str(my_dict[i])
    return output_str

result = char_count("zddaaaffccc")
print(result)

Comments

0

This is the modification of Patrick Yu's code. It code fails for the below test cases.

SAMPLE INPUT:
c
aaaaaaaaaabcdefgh

EXPECTED OUTPUT:
c1
a10b1c1d1e1f1g1h1

OUPUT OF Patrick's Code:
c
a10bcdefgh

Below is the modified code:


def Compress(S):
    Ans = S[0]
    count = 1
    for i in range(len(S)-1):
        if S[i] == S[i+1]:
            count += 1
        else:
            if count >= 1:
                Ans += str(count)
            Ans += S[i+1]
            count = 1
    if count>=1:
        Ans += str(count)
    return Ans

Just the condition must be changed from greater(">") to greater than equal to(">=") when comparing the count with 1.

Comments

-1
str_input = 'aabbccabca'
output = 'a2b2c2a1b1c1a1'
temp = str_input[0]
new_str = ''
tmp_dict = {}
for i in list(str_input):
    if temp == i:
        if i in tmp_dict.keys():
            tmp_dict[i]=tmp_dict[i]+1
        else:
            tmp_dict.update({i:1})
    else:
        for key in tmp_dict:
            new_str+=key+str(tmp_dict[key])
        tmp_dict = {}
        tmp_dict.update({i:1})
        temp = i
for key in tmp_dict:
    new_str+=key+str(tmp_dict[key])
print(new_str)

1 Comment

This question already has dozens of answers. Dumping code without any explanation isn't very helpful at this point. How does this improve upon what is already here? Why should we use this instead of one of the other answers?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.