Clean CSS Style Blocks from Pandas DataFrame

Question

I have a df with some records that look like this:

Untitledp { margin-top: 0px;margin-bottom: 0px;line-height: 1.15; } body { font-family: 'Times New Roman';font-style: Normal;font-weight: normal;font-size: 13.3333333333333px; } .Normal { telerik-style-type: paragraph;telerik-style-name: Normal;border-collapse: collapse; } .TableNormal { telerik-style-type: table;telerik-style-name: TableNormal;border-collapse: collapse; } .s_F0039783 { telerik-style-type: local;font-size: 13.34px; } .s_45EBF2E0 { telerik-style-type: local;font-family: 'Times New Roman';font-size: 13.3333333333333px;color: #000000; } A sentence that I actually want.

I want to remove the CSS style blocks and return only the sentence at the end. The number of CSS blocks can differ from record to record. All records start with "Untitledp" and end with the text I want (with no style blocks after the text).

How should I clean these blocks? I use BeautifulSoup to remove HTML tags, but it doesn't work on these blocks.

Unatiel · Accepted Answer · 2017-08-05 16:27:36Z


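A regex can be used for this, with re.sub():

import re

# Greedy match: everything from the start of the string through the last "}"
regex = re.compile(r'.+\s*{.*}')
regex.sub('', s)  # s is a copy-paste of your sample
# returns ' A sentence that I actually want.'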

This works on your sample. Be careful, though: if the sentence you want contains { or }, the pattern will fail, because the greedy match runs through the last } in the string. Sentences don't typically contain these characters, so it should be safe for most records.
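To apply the same idea to every record rather than a single string, the regex can be wrapped in a small helper. The following is only a sketch: the function name `strip_css_blocks` and the shortened sample string are assumptions for illustration, and the pattern is the one from the answer with the braces escaped for readability.

```python
import re

# Same pattern as in the answer, with the braces escaped for readability.
# Greedy: matches from the start of the string through the last "}".
CSS_BLOCKS = re.compile(r".+\s*\{.*\}")


def strip_css_blocks(text: str) -> str:
    """Drop everything up to and including the last '}' and trim whitespace.

    strip_css_blocks is a hypothetical helper name, not part of any library.
    """
    return CSS_BLOCKS.sub("", text).strip()


# Shortened version of the sample record from the question.
sample = (
    "Untitledp { margin-top: 0px;line-height: 1.15; } "
    "body { font-family: 'Times New Roman'; } "
    "A sentence that I actually want."
)

cleaned = strip_css_blocks(sample)
print(cleaned)
```

With a DataFrame, the helper could be applied row by row with something like `df["text"].apply(strip_css_blocks)`, where `text` is a hypothetical column name. Records without any braces pass through unchanged apart from the whitespace trim, and the caveat about braces inside the wanted sentence still applies.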

edited Aug 5, 2017 at 16:27

answered Aug 4, 2017 at 20:43

