Below is a sample string. How can I convert this string to a pandas DataFrame?

    str1 = """
    Feature Id & Feature Desc                             Status   Failed Total 
    ---------------------------------------------------   -------- ------ -----
    RKSPACE (RackSpace Test In)                           Passed   0      1     
    D1 (Drum 1 Test)                                      Passed   0      1     
    D2 (Drum 2 Test)                                      Passed   0      1     
    D3 (Drum 3 Test)                                      Passed   0      1     
    PRIMUS (PRIMUS Ink Test)                              Not-run  0      0     
    RGB (RGB Color Test)                                  Passed   0      1     
    YONO (App Test)                                       Not-run  0      0     
    PSENSE (Paper Sensor Test)                            Not-run  0      0     
    TFlag (Flag Test)                                     Not-run  0      0     
    MEMT (Memory Test)                                    Passed   0      1     
    CRG (CARRIAGE Test)                                   Not-run  0      0    
    """

I tried the following code:

    import pandas as pd
    from io import StringIO
    def get_dataframe(str1):
        test_data = StringIO(str1)
        df = pd.read_csv(test_data, sep=r'\s+', comment='--', engine='python')
        return df

The result I get is ugly and incorrect. I have checked other posts, but didn't find any question that deals with spaces inside a column's values. If there were no spaces in the first column, getting the DataFrame would be easy, but how do I convert the string to a DataFrame while preserving the layout of str1? Any help would be appreciated. Thanks.

1 Answer

You can use read_fwf:

str1 = """
Feature Id & Feature Desc                             Status   Failed Total 
---------------------------------------------------   -------- ------ -----
RKSPACE (RackSpace Test In)                           Passed   0      1     
D1 (Drum 1 Test)                                      Passed   0      1     
D2 (Drum 2 Test)                                      Passed   0      1     
D3 (Drum 3 Test)                                      Passed   0      1     
PRIMUS (PRIMUS Ink Test)                              Not-run  0      0     
RGB (RGB Color Test)                                  Passed   0      1     
YONO (App Test)                                       Not-run  0      0     
PSENSE (Paper Sensor Test)                            Not-run  0      0     
TFlag (Flag Test)                                     Not-run  0      0     
MEMT (Memory Test)                                    Passed   0      1     
CRG (CARRIAGE Test)                                   Not-run  0      0    
"""

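import pandas as pd
from io import StringIO  # pd.compat.StringIO is not available in recent pandas releases

df = pd.read_fwf(StringIO(str1),
                 colspecs=[(0, 50), (51, 62), (63, 69), (70, 76)],  # half-open (start, end) character spans
                 skiprows=[2],
                 header=[1])
print(df)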
      Feature Id & Feature Desc   Status  Failed  Total
0   RKSPACE (RackSpace Test In)   Passed       0      1
1              D1 (Drum 1 Test)   Passed       0      1
2              D2 (Drum 2 Test)   Passed       0      1
3              D3 (Drum 3 Test)   Passed       0      1
4      PRIMUS (PRIMUS Ink Test)  Not-run       0      0
5          RGB (RGB Color Test)   Passed       0      1
6               YONO (App Test)  Not-run       0      0
7    PSENSE (Paper Sensor Test)  Not-run       0      0
8             TFlag (Flag Test)  Not-run       0      0
9            MEMT (Memory Test)   Passed       0      1
10          CRG (CARRIAGE Test)  Not-run       0      0
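
The colspecs values above are written by hand. Because the table carries a dashed rule under the header, the same spans can be read off that rule instead. The following is a minimal sketch using only the standard library; the helper names `colspecs_from_rule` and `split_fixed_width` and the shortened sample table are illustrative assumptions, not part of pandas or of the original data.

```python
import re


def colspecs_from_rule(rule_line):
    """Return one half-open (start, end) span per run of dashes."""
    return [m.span() for m in re.finditer(r"-+", rule_line)]


def split_fixed_width(text):
    """Split a dashed-rule table into a header list and a list of row lists."""
    # Drop blank lines so the header is first and the rule is second.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header_line, rule_line, *data_lines = lines
    specs = colspecs_from_rule(rule_line)

    def cut(line):
        # Slice each column by position, then trim the padding.
        return [line[start:end].strip() for start, end in specs]

    return cut(header_line), [cut(ln) for ln in data_lines]


# Hypothetical, shortened table in the same layout as str1.
sample = """
Name & Desc        Status   Total
----------------   -------- -----
D1 (Drum 1 Test)   Passed   1
CRG (Test)         Not-run  0
"""

header, rows = split_fixed_width(sample)
print(header)
for row in rows:
    print(row)
```

The (start, end) pairs are half-open, which is the convention read_fwf documents for colspecs, so the list returned by `colspecs_from_rule` could be passed to it directly.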

Thanks to @gyoza for a simpler solution, which lets read_fwf infer the column widths:

from io import StringIO  # replaces pd.compat.StringIO, which recent pandas no longer provides

df = pd.read_fwf(StringIO(str1),
                 skiprows=[2],
                 header=[1])
print(df)
      Feature Id & Feature Desc   Status  Failed  Total
0   RKSPACE (RackSpace Test In)   Passed       0      1
1              D1 (Drum 1 Test)   Passed       0      1
2              D2 (Drum 2 Test)   Passed       0      1
3              D3 (Drum 3 Test)   Passed       0      1
4      PRIMUS (PRIMUS Ink Test)  Not-run       0      0
5          RGB (RGB Color Test)   Passed       0      1
6               YONO (App Test)  Not-run       0      0
7    PSENSE (Paper Sensor Test)  Not-run       0      0
8             TFlag (Flag Test)  Not-run       0      0
9            MEMT (Memory Test)   Passed       0      1
10          CRG (CARRIAGE Test)  Not-run       0      0

2 Comments

@gyoza - Thank you ;)
Thanks for the solution. When I use the exact solution above, I get NaN in all the rows and columns. If I use header=1 instead of header=[1], it works fine. Thanks again for the solution.
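
The header=1 versus header=[1] difference reported here may come down to how a given pandas version counts the blank first line of str1; that is an assumption, not something verified here. One way to sidestep the question is to strip the surrounding newlines first, so the header is unambiguously line 0 and the dashed rule is line 1. A minimal sketch, where `read_dashed_table` and the shortened sample table are hypothetical:

```python
from io import StringIO

import pandas as pd

# Hypothetical, shortened table in the same layout as str1.
sample = """
Name & Desc        Status   Total
----------------   -------- -----
D1 (Drum 1 Test)   Passed   1
CRG (Test)         Not-run  0
"""


def read_dashed_table(text):
    """Read a fixed-width table whose second line is a dashed rule."""
    # Remove leading/trailing newlines so no blank line precedes the header.
    cleaned = text.strip("\n")
    # header=0 takes the first line as column names; skiprows=[1] drops the rule.
    return pd.read_fwf(StringIO(cleaned), header=0, skiprows=[1])


df = read_dashed_table(sample)
print(df)
```

Column widths are left to read_fwf's default inference here; explicit colspecs can still be passed if the inferred boundaries turn out wrong for a particular table.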
