Regex help transforming hex codes and an HTML snippet into an array and a simple list

Question

I am trying to convert the section below from the FA (Font Awesome) cheatsheet into an array and into a simple text file with Notepad++, scraping the FA page either from its source or from a direct HTML copy.

First, I used the TextFX plugin, but it won't match the braces under TextFX -> Quick options. Which option should I try?
Second, with help from here on SO, I also tried find `^.*(fa-[^\s]*).*` and replace `\1`. But because pasting from the FA cheatsheet into NP++ produced one long line, the `^.*` approach did not work.
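Why the anchored pattern fails on a one-line paste can be seen with a small Python sketch. Python's `re` module only stands in for Notepad++'s regex engine here (an assumption; for patterns this simple the two should behave alike), and the sample line is the one-line text quoted in the comments further down.

```python
import re

# One long line, as produced by pasting the cheatsheet into Notepad++
line = " fa-adjust [&#xf042;]  fa-adn [&#xf170;]  fa-align-center [&#xf037;]"

# The greedy leading .* swallows the whole line and backtracks only as far as
# the LAST "fa-", so the entire line collapses to a single icon name.
collapsed = re.sub(r"^.*(fa-[^\s]*).*", r"\1", line)
print(collapsed)  # fa-align-center

# Matching just the names (no ^ anchor, no surrounding .*) finds every one.
names = re.findall(r"fa-[^\s]*", line)
print("\n".join(names))
```

The `^...$`-per-line idea only works once every item sits on its own line; on a single line, matching the item itself repeatedly is what finds them all.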
1. How do I transform it into a 2D array? I need a CSV of 2 columns: `<div class="col-md-4 col-sm-6 col-lg-3">` and `<i class="fa fa-bank"></i>`.
2. How do I transform it into a simple text list with one description per line, e.g.:
   fa-tumblr-square
   fa-bank
   ...

Please help me understand the regex, or point me to the right tool option; whichever works is fine.

<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-tty      
  <span class="muted">[&amp;#xf1e4;]</span>
</div>

<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-tumblr      
  <span class="muted">[&amp;#xf173;]</span>
</div>

<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-tumblr-square      
  <span class="muted">[&amp;#xf174;]</span>
</div>
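Both requested outputs can be sketched in Python, outside Notepad++. This assumes the two CSV columns wanted are the icon name and its hex code, and that the markup looks exactly like the snippet above; the header names `name` and `hex` are made up for illustration.

```python
import csv
import io
import re

# The cheatsheet snippet from the question (icon glyphs omitted).
snippet = """
<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-tty
  <span class="muted">[&amp;#xf1e4;]</span>
</div>

<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-tumblr
  <span class="muted">[&amp;#xf173;]</span>
</div>

<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-tumblr-square
  <span class="muted">[&amp;#xf174;]</span>
</div>
"""

# Group 1: the icon name; group 2: the hex digits inside the muted span.
# "fa-fw" in the class attribute never matches, because it is not followed
# by whitespace and a <span>.
ENTRY = re.compile(r'(fa-[\w-]+)\s*<span class="muted">\[&amp;#x([0-9a-f]+);\]</span>')

rows = ENTRY.findall(snippet)  # list of (name, hex) tuples: the 2D array

# 2D array -> CSV text with a header row
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["name", "hex"])
writer.writerows(rows)
print(out.getvalue())

# Simple list: one icon name per line
names = [name for name, _ in rows]
print("\n".join(names))
```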

Edit 1: @ShellFish Here is what I get: nothing is matched when I use the regex `(. )(fa[^ ]*)([^ ]*)`, with or without the newline option. In the Replace field I tried a space and an HTML comment as well. (Screenshot: regex result on the div list; it does not match.)

ShellFish · Accepted Answer · 2015-07-05 12:12:11Z


Creating the HTML list

awk

You could do this with awk. First copy the site content into a text file, then run the following script:

BEGIN {
    # set record separator to a space, file is split in records
    RS = " "
    # separate print variables using a double quote
    OFS = "\""
}
# if record (string in between spaces) is the word alias
$0 ~ "(alias)" {
    # skip this line and make sure line number isn't counted
    NR = NR - 1
    getline
}
# print if the record number is 1, 4, 7 (i.e. a symbol)
NR % 3 == 1 {
    print "<div class=", "col-md-4 col-sm-6 col-lg-3", ">"
    # $1 contains first field which is the entire record
    print "  <i class=", "fa fa-fw", ">" $1 "</i>"
}     
# print lines 2, 5...
NR % 3 == 2 {
    print "  " $1
}   
# analogous for lines 3, 6, 9 ...
NR % 3 == 0 { 
    # sub amp
    sub (/&/, "&amp;", $1)
    print "  <span class=", "muted", ">" $1 "</span>"
    print "</div>\n"
}

The comments should make the script clear. You can use it as follows:

$ awk -f script.awk file

where `file` is the path to the file with the site content and `script.awk` contains the code above.

Example usage:

$ awk -f script.awk file | head -n 11
<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-adn
  <span class="muted">[&amp;#xf170;]</span>
</div>

<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-align-center
  <span class="muted">[&amp;#xf037;]</span>
</div>
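Since awk may not be installed on a Windows machine, here is a rough Python port of the awk script above. It is a sketch that makes the same assumption as the script (the copied text is a sequence of glyph, name, `[&#x...;]` triples, possibly interrupted by `(alias)` markers); the function name `to_html` and the file name `script.py` are made up for illustration. Unlike the awk version, it splits on runs of whitespace, so doubled spaces or line breaks don't produce empty records that would throw off the count of three.

```python
import sys

DIV_CLASS = "col-md-4 col-sm-6 col-lg-3"


def to_html(text: str) -> str:
    """Turn 'glyph name [&#x....;]' triples into the cheatsheet's div markup."""
    # Split on whitespace and drop the "(alias)" markers, mirroring the awk
    # script's RS = " " and its alias rule.
    tokens = [t for t in text.split() if t != "(alias)"]
    blocks = []
    # Walk the tokens three at a time: symbol, name, hex code.
    # An incomplete trailing triple is ignored.
    for i in range(0, len(tokens) - 2, 3):
        glyph, name, code = tokens[i:i + 3]
        # Like the awk sub(), escape the ampersand in the hex code.
        code = code.replace("&", "&amp;")
        blocks.append(
            f'<div class="{DIV_CLASS}">\n'
            f'  <i class="fa fa-fw">{glyph}</i>\n'
            f"  {name}\n"
            f'  <span class="muted">{code}</span>\n'
            f"</div>\n"
        )
    # A blank line between blocks, as in the awk output.
    return "\n".join(blocks)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py file
    with open(sys.argv[1], encoding="utf-8") as f:
        print(to_html(f.read()))
```

It is run the same way as the awk version, e.g. `python script.py file`.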

notepad++

First remove all aliases from the file: look for
```
 (alias)
```
and remove all occurrences (including the leading space).
Look for the following pattern in the file:
```
(. )(fa[^ ]*) ([^ ]*)
```
This matches exactly one item from the list. Replace it with the following string:
```
<div class="col-md-4 col-sm-6 col-lg-3">\r\n<i class="fa fa-fw">$1</i>\r\n$2\r\n<span class="muted">$3</span>\r\n</div>\r\n\r\n
```
Here `$i` refers to the i-th group captured by the regex; a group is the part of the regex between `(` and `)`. If this doesn't work, you may have to reference the groups as `\i` instead, and the replacement string becomes:
```
<div class="col-md-4 col-sm-6 col-lg-3">\r\n<i class="fa fa-fw">\1</i>\r\n\2\r\n<span class="muted">\3</span>\r\n</div>\r\n\r\n
```
Finally, substitute the ampersands: look for `&` and replace it with `&amp;`.
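The Notepad++ steps above can be checked with a Python sketch, using `re` as a stand-in for Notepad++'s engine and `X`/`Y` as placeholders for the icon glyphs (both assumptions). It shows one thing worth watching for: if the third group follows `(fa[^ ]*)` with no space in between, it can only ever match an empty string, because the greedy `[^ ]*` in group 2 has already consumed every non-space character. A literal space between the two groups lets group 3 capture the hex code.

```python
import re

# One-line paste; "X" and "Y" stand in for the private-use glyph characters.
text = "X fa-adjust [&#xf042;] Y fa-adn [&#xf170;]"

# Group 3 directly after group 2: always empty, since group 2 stops at a space.
as_printed = re.compile(r"(. )(fa[^ ]*)([^ ]*)")
print(as_printed.search(text).groups())  # ('X ', 'fa-adjust', '')

# A literal space between groups 2 and 3 lets group 3 pick up the hex code.
fixed = re.compile(r"(. )(fa[^ ]*) ([^ ]*)")
print(fixed.search(text).groups())  # ('X ', 'fa-adjust', '[&#xf042;]')

# Same replacement as above, with \n instead of \r\n; "\\1" is a group reference.
template = (
    '<div class="col-md-4 col-sm-6 col-lg-3">\n'
    '<i class="fa fa-fw">\\1</i>\n'
    "\\2\n"
    '<span class="muted">\\3</span>\n'
    "</div>\n\n"
)

# Build the divs, then escape the ampersands as the last step.
html = fixed.sub(template, text).replace("&", "&amp;")
print(html)
```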

Creating Item List

This list can be created from either the HTML list or the copied file. In both cases you simply need to grab `(fa[^ ]*)`. For the one-line file this can be done as follows:

Remove the aliases again (see above).
Search pattern:
```
. (fa[^ ]*) [^ ]*
```
and replace it with `\1\r\n`, or with `$1\r\n` if that doesn't work.
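A Python sketch of the item-list replacement, again with `X`/`Y`/`Z` as placeholder glyphs and `re` standing in for Notepad++. The pattern here has a literal space before the final `[^ ]*`, so the hex code is consumed along with the name; without that space the final `[^ ]*` matches an empty string and the codes stay in the output.

```python
import re

# "X", "Y", "Z" are placeholders for the glyph characters in the copied text.
text = "X fa-adjust [&#xf042;] Y fa-adn [&#xf170;] Z fa-align-center [&#xf037;]"

# Replace each "glyph name [code]" triple with just the name and a line break.
listing = re.sub(r". (fa[^ ]*) [^ ]*", r"\1\n", text)

# The spaces that separated the triples are left at the start of the next
# line, so strip each line and skip empty ones.
names = [line.strip() for line in listing.splitlines() if line.strip()]
print("\n".join(names))
```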

Creating `div` lines

To create the div lines, simply match `(. fa[^ ]* [^ ]*)` and replace it with `\1\r\n`, or with `$1\r\n` if the backslash doesn't work. This places a newline after each div entry.
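The same idea for the div lines, sketched in Python with placeholder glyphs (`X`, `Y`) and `re` as a stand-in for Notepad++. As with the item list, a literal space before the last `[^ ]*` makes the match include the hex code. The separating space ends up at the start of the following line, so the sketch strips it.

```python
import re

# "X" and "Y" are placeholders for the glyph characters in the copied text.
text = "X fa-adjust [&#xf042;] Y fa-adn [&#xf170;]"

# Capture one whole "glyph name [code]" entry and put a line break after it.
split = re.sub(r"(. fa[^ ]* [^ ]*)", r"\1\n", text)

# Each entry is now on its own line; drop the leftover separator spaces.
entries = [line.strip() for line in split.splitlines() if line.strip()]
print("\n".join(entries))
```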

edited Jul 5, 2015 at 12:12

answered Jul 4, 2015 at 21:21

ShellFish



14 Comments

aggie Over a year ago

Appreciate the answer. However, I have to do this on Windows, where I have Notepad++ but no awk... I am trying to incorporate this into my build process.

ShellFish Over a year ago

Does notepad allow substitution using regex and backreferences?

aggie Over a year ago

Notepad++ does (notepad-plus-plus.org); the Windows Notepad is cr@p. I found an online AWK runner and am trying that now... tutorialspoint.com/execute_awk_online.php

ShellFish Over a year ago

@aggie I added a solution for notepad++, should work I think.

aggie Over a year ago

Tried it... the regex doesn't match; I pasted a picture above. I tried it on both the div code and this:

 fa-adjust [&#xf042;]  fa-adn [&#xf170;]  fa-align-center [&#xf037;]  fa-align-justify [&#xf039;]  fa-align-left [&#xf036;]  fa-align-right [&#xf038;]  fa-ambulance [&#xf0f9;]  fa-anchor [&#xf13d;]  fa-android [&#xf17b;]  fa-angellist [&#xf209;]  fa-angle-double-down [&#xf103;]

I'm trying to scrape this page: fontawesome.io/icons
