I have a DataFrame that looks like:

import pandas as pd

df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])

I'm counting the occurrences of each unique 'type' value, one prefix at a time, using the following:

sub_cat = ['critical::',
           'hardware::',
           'software::'
           ]

for cat in sub_cat:
    x = df[df.type.str.startswith(cat)]

    count = x.groupby('type').size()
    if len(count) > 0:
        print(count)
    else:
        print(cat, '0')

Results are correct but the output is sloppy:

type
critical::issue::A    2
dtype: int64
type
hardware::issue::B    1
dtype: int64
software:: 0

I'd like to format the output to make it more readable, with a single header and one line per type, like the following example:

type
critical::issue::A    2
hardware::issue::B    1
software:: 0

Any suggestions?

4 Answers

As an alternative, you can simply change:

print(count)

To:

print(count.to_string(header=False))

You get:

critical::issue::A    2
hardware::issue::B    1
software:: 0

Then add a print("type") before the loop so the header is printed once, and you have the output you asked for.
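Putting the two changes together, a minimal sketch of the full loop could look like the following. The helper name `report_lines` is hypothetical (it is not part of the original code); it collects the lines in a list instead of printing them directly so the result is easy to inspect.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['date', 'type', 'version'],
    data=[
        ['2017-07-01', 'critical::issue::A', 'version1'],
        ['2017-07-01', 'critical::issue::A', 'version2'],
        ['2017-07-01', 'hardware::issue::B', 'version1'],
    ],
)
sub_cat = ['critical::', 'hardware::', 'software::']


def report_lines(df, sub_cat):
    """Hypothetical helper: build the report as a list of text lines."""
    lines = ['type']  # header emitted once, before the loop
    for cat in sub_cat:
        count = df[df['type'].str.startswith(cat)].groupby('type').size()
        if len(count) > 0:
            # header=False drops the index-name line; to_string() does not
            # append the "dtype: int64" footer by default
            lines.extend(count.to_string(header=False).splitlines())
        else:
            lines.append('{} 0'.format(cat))
    return lines


print('\n'.join(report_lines(df, sub_cat)))
```

Because each group is rendered separately, the count column is padded per group rather than across the whole report.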

1 Comment

Perfect simple solution. Thank you Anton

You could loop over the entries of the grouped count Series and print them one at a time:

for cat in sub_cat:
    x = df[df.type.str.startswith(cat)]
    count = x.groupby('type').size()
    if len(count) > 0:
        # Series.iteritems() was removed in pandas 2.0; items() is the replacement
        for ind, row in count.items():
            print(ind, row)
    else:
        print(cat, '0')

Output is as follows:

critical::issue::A 2
hardware::issue::B 1
software:: 0

Here is your code with suggested changes:

import pandas as pd

df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-02', 'critical::issue::B', 'version3'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])

sub_cat = ['critical::',
           'hardware::',
           'software::']

print("type")

for cat in sub_cat:
    x = df[df.type.str.startswith(cat)]

    count = x.groupby('type').size()

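    # 'count' is a Series indexed by 'type'; use .iloc for positional access,
    # since count[i] with an integer on a string index is deprecated
    for i in range(len(count)):
        print("{}\t{}".format(count.index[i], count.iloc[i]))

    if len(count) == 0:
        print("{}\t{}".format(cat, 0))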

It produces:

type
critical::issue::A      2
critical::issue::B      1
hardware::issue::B      1
software::      0
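Tab separation leaves the zero-count row out of line with the others, because the labels have different lengths. One possible variation, sketched here under the assumption that all rows can be collected before printing, pads every label to the width of the longest one. The helper name `aligned_report` and the two-space gap are illustrative choices, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['date', 'type', 'version'],
    data=[
        ['2017-07-01', 'critical::issue::A', 'version1'],
        ['2017-07-01', 'critical::issue::A', 'version2'],
        ['2017-07-02', 'critical::issue::B', 'version3'],
        ['2017-07-01', 'hardware::issue::B', 'version1'],
    ],
)
sub_cat = ['critical::', 'hardware::', 'software::']


def aligned_report(df, sub_cat, header='type'):
    """Hypothetical helper: collect (label, count) rows, then pad the labels."""
    rows = []
    for cat in sub_cat:
        count = df[df['type'].str.startswith(cat)].groupby('type').size()
        if len(count) == 0:
            rows.append((cat, 0))  # prefix with no matching rows
        else:
            rows.extend((label, int(n)) for label, n in count.items())
    # Pad every label to the longest one so the counts share a column
    width = max((len(label) for label, _ in rows), default=0)
    lines = [header]
    lines.extend('{}  {}'.format(label.ljust(width), n) for label, n in rows)
    return lines


print('\n'.join(aligned_report(df, sub_cat)))
```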

Consider this Pandas approach:

In [79]: res = df.groupby('type').size()

In [80]: res
Out[80]:
type
critical::issue::A    2
hardware::issue::B    1
dtype: int64

In [81]: s = pd.Series(sub_cat)

In [82]: idx = s[~s.isin(df.type.str.extract(r'(\w+::)', expand=False).unique())].values

In [83]: res = pd.concat([res, pd.Series([0] * len(idx), index=idx)])  # Series.append was removed in pandas 2.0

In [84]: res
Out[84]:
critical::issue::A    2
hardware::issue::B    1
software::            0
dtype: int64
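For readers who want to run this outside IPython, the same idea can be sketched as a standalone script. It combines the two Series with `pd.concat`; the variable names `present` and `missing` are assumptions introduced here for readability. Note that the regular expression `(\w+::)` captures only the first segment of each type, which is what the prefixes in `sub_cat` look like.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['date', 'type', 'version'],
    data=[
        ['2017-07-01', 'critical::issue::A', 'version1'],
        ['2017-07-01', 'critical::issue::A', 'version2'],
        ['2017-07-01', 'hardware::issue::B', 'version1'],
    ],
)
sub_cat = ['critical::', 'hardware::', 'software::']

# Count every type in a single groupby, without looping over prefixes
res = df.groupby('type').size()

# Leading prefixes (e.g. 'critical::') that actually occur in the data
present = set(df['type'].str.extract(r'(\w+::)', expand=False).unique())

# Prefixes from sub_cat with no matching rows get an explicit zero
missing = [cat for cat in sub_cat if cat not in present]
res = pd.concat([res, pd.Series(0, index=missing, dtype='int64')])

print(res.to_string())
```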
