Edit row of DataFrame IF row contains specific string

Question

I have multiple dataframes stored in a dictionary.

Each dataframe has 3 columns as shown below

exceldata_1['Sheet1']

                      0                     1       2

0   Sv2.55+Fv2.04R02[2022-01-01T00         16      29.464Z]
1   - SC                                   OK       NaN
2   - PC1 Number                            1       NaN
3   - PC1 Main Status                      OK       NaN
4   - PC1 PV                       4294954868       NaN
... ... ... ...
1046    - C Temperature                  17�C        NaN
1047    Sv2.55+Fv2.04R02[2022-01-01T23     16       30.782Z]
1048    - Level SS             High      NaN
1049    Sv2.55+Fv2.04R02[2022-01-01T23  16  34.235Z]
1050    Sv2.55+Fv2.04R02[2022-01-01T23  16  38.657Z]
1051 rows × 3 columns

I want to do this : Search each row of dataframe if it contains "Sv2." and change that row as follows Remove "Sv2.55+Fv2.04R02[" this part and combine the remaining data to form the date and time correctly in each column....Showing the desired outpu below...The last column can be deleted as it will not contain any data after performing this operation.

           0                    1                 2

0     2022-01-01               00:16:29          NaN
1   - SC                          OK             NaN
2   - PC1 Number                  1              NaN
3   - PC1 Main Status            OK              NaN
4   - PC1 PV                   4294954868        NaN
... ... ... ...
1046    - C Temperature           17�C           NaN
1047    2022-01-01             23:16:30          NaN
1048    - Level SS               High            NaN
1049    2022-01-01             23:16:34          NaN
1050    2022-01-01             23:16:38          NaN
1051 rows × 3 columns

How can I achieve this?

Could you please clarify your input format a bit more: Are those parts Unnamed: 0 the index of the dataframe, or is it actually part of column 0 and you want to contain it in column 0? Best would be if you do a df.head(10).to_dict() or so and add it to the question, to avoid confusion. — Timus
– Timus, Commented Aug 25, 2022 at 9:19
Thanks. I'm still a bit unsure regarding the format, but I have updated my first attempt (which only included the date part, and not the time). — Timus
– Timus, Commented Aug 25, 2022 at 11:25
Are the relevant string parts distributed over the 3 columns? (See my last edit.) — Timus
– Timus, Commented Aug 25, 2022 at 12:09

Irsyaduddin · Accepted Answer · 2022-08-25 08:24:34Z

1

Using regular expressions should work

for i in range(len(df)):
    text=df['0'][i]
    if re.search('Sv',text)!=None:
        item_list=re.split('\[|T|\s\s|Z',text[:-1])
        df.iloc[i,0]=item_list[1]
        df.iloc[i,1]=item_list[2]+':'+item_list[3]+':'+item_list[4]

edited Aug 25, 2022 at 8:24

answered Aug 25, 2022 at 7:30

Irsyaduddin

663 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Pradyumna TK Over a year ago

Hie @Irsyaduddin..glad you answered it in a short time...I could not understand "text" in re.search() and re.split()....Can you please elaborate if possible? But I tried to implement the same..It gives me "Name Error: 'text' is not defined.

Irsyaduddin Over a year ago

oh, I'm sorry, the text is supposed to be the text in your df column '0', i updated my answer

Pradyumna TK Over a year ago

@Irsyaduddin...Yup..understood that now :)...But I get an error "IndexError: list index out of range"...Think there is some problem with the split and list...I tried to check what is stored in the item_list and found ['Sv2.55+Fv2.04R02', '2022-01-01', '0'] only 3 items..but we were trying to access 4th element in the list.I tried to figure it but was not so successful :p

Irsyaduddin Over a year ago

ah, i see, can you give one example of the actual string? as im not sure the format for it, i copied yours and i got 2 whitespaces which is where the '\s\s' is for in the code

Pradyumna TK Over a year ago

@Iesyaduddin The actual string would be this Sv2.55+Fv2.04R02[2022-01-01T00 16 29.464Z] right?

|

Timus · Accepted Answer · 2022-08-25 12:26:53Z

With df one of your dataframes you could try the following:

m = df[0].str.contains("Sv2.")
ser = df.loc[m, 0] + " " + df.loc[m, 1] + " " + df.loc[m, 2]
datetime = pd.to_datetime(
    ser.str.extract(r"Sv2\..*?\[(.*?)\]")[0].str.replace(r"\s+", " ", regex=True),
    format="%Y-%m-%dT%H %M %S.%fZ"
)
df.loc[m, 0] = datetime.dt.strftime("%Y-%m-%d")
df.loc[m, 1] = datetime.dt.strftime("%H:%M:%S")
df.loc[m, 2] = np.NaN

First build a mask m that selects the rows that contain a "Sv2." in the first column.
Based on that build a series ser with the relevant strings, added together with a blank inbetween.
Use .str.extract to fetch the datetime-part via the capture group of a regex: Look for the "Sv2."-part, then go forward until the opening bracket "[", and then catch all until the closing bracket "]".
Convert those strings with pd.to_datetime to datetimes (see here for the format codes).
Extract the required parts with .dt.strftime into the resp. columns.

Alternative approach without real datetimes:

m = df[0].str.contains("Sv2.")
ser = df.loc[m, 0] + " " + df.loc[m, 1] + " " + df.loc[m, 2]
datetime = ser.str.extract(
    r"Sv2\..*?\[(\d{4}-\d{2}-\d{2}).*?(\d{2}\s+\d{2}\s+\d{2})\."
)
datetime[1] = datetime[1].str.replace(r"\s+", ":", regex=True)
df.loc[m, [0, 1]] = datetime
df.loc[m, 2] = np.NaN

Result for the following sample df (taken from your example)

                                0     1         2
0  Sv2.55+Fv2.04R02[2022-01-01T00    16  29.464Z]
1                            - SC    Ok       NaN
2                     - PC Number     1       NaN
3                         - PC MS    Ok       NaN
4                     - PC PValue     8       NaN
5                      - Level SS  High       NaN
6  Sv2.55+Fv2.04R02[2022-01-01T23    16  34.235Z]
7  Sv2.55+Fv2.04R02[2022-01-01T23    16  38.657Z]

is

             0         1   2
0   2022-01-01  00:16:29 NaN
1         - SC        Ok NaN
2  - PC Number         1 NaN
3      - PC MS        Ok NaN
4  - PC PValue         8 NaN
5   - Level SS      High NaN
6   2022-01-01  23:16:34 NaN
7   2022-01-01  23:16:38 NaN

Hello ...Thank You so much for investing your valuable time on my question....But I have achieved my desired result by making few changes to the answer given by @Irsyaduddin ... You answer looks very elaborate..Thanks for your time...Hope it will be helpful for others

Pradyumna TK · Accepted Answer · 2022-08-25 12:38:48Z

Thanks for the idea on how to proceed @Irsyaduddin ..With some modifications to his answer, I was able to achieve it.

Make sure all the data types in your dataframe are strings

import re
for i in range(len(df1)):
    text= (df1[0][i])+df1[1][i]+(df1[2][i]) #combining data from all cols
    if re.search('Sv',text)!=None:
        item_list=re.split('\[|T|Z',text)
        df1.iloc[i,0]=item_list[1]
        df1.iloc[i,1]=item_list[2][:2]+":"+item_list[2] 
                        [2:4]+":"+item_list[2][4:6]
        df1.iloc[i,2]='NaN'
  df1

Result:

                0                 1            2
    0   2022-01-01             00:16:29        NaN
    1   - Server Connection       OK           nan
    2   - PC1 Number               1           nan
    3   - PC1 MS                  OK           nan
    4   - PC1 PV               4294954868      nan
    ... ... ... ...
    1046    - C Temperature       17�C         nan
    1047    2022-01-01             23:16:30    NaN
    1048    - Level Sensor Status   High       nan
    1049    2022-01-01             23:16:34    NaN
    1050    2022-01-01             23:16:38    NaN
    1051 rows × 3 columns

Result of Split:

item_list
['Sv2.55+Fv2.04R02', '2022-01-01', '001629.464', '] ']

Collectives™ on Stack Overflow

Edit row of DataFrame IF row contains specific string

3 Answers 3

6 Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

6 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related