
I am trying to scrape the Font Awesome (FA) cheatsheet page, using either the page source or a direct copy of the rendered page, and convert the section below into an array and into a simple text file with Notepad++.

  • First, I tried the TextFX plugin, but it won't match the braces under TextFX -> Quick options. Which option should I try?

  • Second, with help from Stack Overflow, I also tried find: ^.*(fa-[^\s]*).* and replace: \1, but because pasting the FA cheatsheet into Notepad++ produced one long line, the ^.* approach did not work.

    1. How do I transform it into a 2D array? I need a CSV with 2 columns, taken from <div class="col-md-4 col-sm-6 col-lg-3"> and <i class="fa fa-bank"></i>.
    2. How do I transform it into a simple text list with one icon name per line, like this:

    fa-tumblr-square
    fa-bank
    ...

Please help me understand the regex snippet or the tool option; whichever works is fine.


<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-tty      
  <span class="muted">[&amp;#xf1e4;]</span>
</div>

<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-tumblr      
  <span class="muted">[&amp;#xf173;]</span>
</div>

<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-tumblr-square      
  <span class="muted">[&amp;#xf174;]</span>
</div>

Edit 1: @ShellFish Here is what I get: nothing is matched when I use the regex \(. \)\(fa[^ ]*\)\([^ ]*\), with and without the newline option. In the Replace field I tried a space and an HTML comment as well. [Screenshot: regex result on the div list, does not match]

1 Answer

Creating the HTML list

awk

You could do this using awk. First copy the site content into a text file. Then execute the following script:

BEGIN {
    # split the input into records on spaces
    RS = " "
    # use a double quote as output field separator, so every comma
    # in the print statements below emits a literal "
    OFS = "\""
}
# if the record (the string between two spaces) is the (alias) marker
$0 ~ /\(alias\)/ {
    # skip it and make sure it isn't counted in the record number
    NR = NR - 1
    getline
}
# print if the record number is 1, 4, 7, ... (i.e. a symbol)
NR % 3 == 1 {
    print "<div class=", "col-md-4 col-sm-6 col-lg-3", ">"
    # $1 contains the first field, which is the entire record
    print "  <i class=", "fa fa-fw", ">" $1 "</i>"
}
# print records 2, 5, 8, ... (the icon name)
NR % 3 == 2 {
    print "  " $1
}
# analogous for records 3, 6, 9, ... (the code point)
NR % 3 == 0 {
    # escape the ampersand; in the replacement, & stands for the
    # matched "&", so "&amp;" produces &amp;
    sub(/&/, "&amp;", $1)
    print "  <span class=", "muted", ">" $1 "</span>"
    print "</div>\n"
}

The comments should make the script clear. You can use it as follows:

$ awk -f script.awk file

where file is the path to the file with the site content and script.awk contains the code above.

Example usage:

$ awk -f script.awk file | head -n 11
<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-adn
  <span class="muted">[&amp;#xf170;]</span>
</div>

<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-align-center
  <span class="muted">[&amp;#xf037;]</span>
</div>
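The question also asks for a two-column CSV (a 2D array). Here is a minimal Python sketch for that, assuming the two columns are the icon name and the hex code point, and that the input is div markup shaped like the example output above. ICON_RE, divs_to_rows and rows_to_csv are hypothetical names, and the name,codepoint header is an arbitrary choice.

```python
import csv
import io
import re

# One icon: the name on its own line, followed by the muted span holding the
# code point, e.g. "  fa-adn" then '<span class="muted">[&amp;#xf170;]</span>'.
ICON_RE = re.compile(
    r'(fa-[a-z0-9-]+)\s*'
    r'<span class="muted">\[&amp;#x([0-9a-f]+);\]</span>'
)


def divs_to_rows(html):
    """Return the 2D array [[name, codepoint], ...] in document order."""
    return [[name, code] for name, code in ICON_RE.findall(html)]


def rows_to_csv(rows):
    """Serialize the rows as CSV text, preceded by a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "codepoint"])
    writer.writerows(rows)
    return buf.getvalue()


SAMPLE = """\
<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-adn
  <span class="muted">[&amp;#xf170;]</span>
</div>

<div class="col-md-4 col-sm-6 col-lg-3">
  <i class="fa fa-fw"></i>
  fa-align-center
  <span class="muted">[&amp;#xf037;]</span>
</div>
"""

if __name__ == "__main__":
    print(rows_to_csv(divs_to_rows(SAMPLE)), end="")
    # name,codepoint
    # fa-adn,f170
    # fa-align-center,f037
```

Python's csv module takes care of quoting, so the file can be opened in a spreadsheet or read back into a 2D list with csv.reader.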

Notepad++

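  1. First remove all aliases from the file: look for

     (alias)

    and remove all occurrences (including the leading space). Do this with Search Mode set to Normal; in Regular expression mode the parentheses would have to be escaped as \(alias\).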

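  2. With Search Mode set to Regular expression, look for the following pattern in the file:

    (.) (fa[^ ]*) ([^ ]*)


    This matches exactly one item from the list: the glyph, the icon name and the code point, separated by single spaces (the space before the third group matters; without it, group 3 is always empty and the code point is left behind). Write the groups with plain parentheses: in Notepad++ \( and \) match literal parentheses, which is why the escaped version in the question matches nothing. Replace each match with the following string: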

     <div class="col-md-4 col-sm-6 col-lg-3">\r\n<i class="fa fa-fw">$1</i>\r\n$2\r\n<span class="muted">$3</span>\r\n</div>\r\n\r\n
    

    Here $i refers to the i-th group captured by the regex; a group is the part of the regex between ( and ). If this doesn't work, you may have to access the groups using \i instead, in which case the replacement string becomes:

    <div class="col-md-4 col-sm-6 col-lg-3">\r\n<i class="fa fa-fw">\1</i>\r\n\2\r\n<span class="muted">\3</span>\r\n</div>\r\n\r\n
    
  3. Substitute the ampersands: look for & and replace it with &amp; (a literal search in Normal mode is enough).
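To check the three steps outside Notepad++, they can be replayed with Python's re module. This is a minimal sketch, assuming re treats these simple patterns the way Notepad++ does. It uses the search pattern (.) (fa[^ ]*) ([^ ]*), with a space before the third group so that group 3 captures the code point, and the \1-style replacement string; re.sub also expands \r\n in it. The * characters stand in for the icon glyphs, and copied_text_to_divs is a hypothetical name.

```python
import re

# Step 2 search pattern: glyph, icon name and code point, separated by spaces.
ITEM_RE = re.compile(r"(.) (fa[^ ]*) ([^ ]*)")

# Step 2 replacement, written as typed into Notepad++ (\1..\3 and \r\n);
# re.sub expands these escapes itself.
DIV_REPLACEMENT = (
    r'<div class="col-md-4 col-sm-6 col-lg-3">\r\n'
    r'<i class="fa fa-fw">\1</i>\r\n'
    r'\2\r\n'
    r'<span class="muted">\3</span>\r\n'
    r'</div>\r\n\r\n'
)


def copied_text_to_divs(text):
    # Step 1: drop the alias markers as a literal (Normal mode) replacement.
    text = text.replace(" (alias)", "")
    # Step 2: turn every glyph/name/code-point triple into a div block.
    text = ITEM_RE.sub(DIV_REPLACEMENT, text)
    # Step 3: escape the ampersands, again as a literal replacement.
    return text.replace("&", "&amp;")


if __name__ == "__main__":
    one_line = "* fa-adjust [&#xf042;] * fa-automobile (alias) [&#xf1b9;]"
    print(copied_text_to_divs(one_line))
```

The order matters: step 3 runs last because the replacement template contains no ampersands, so only the code points get escaped.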

Creating Item List

This list can be created from either the generated div list or the copied one-line file. Either way you simply need to grab the (fa[^ ]*) part. For the one-line file this can be done as follows:

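  1. Remove the aliases again (see above).
  2. Search pattern:

    [ ]?. (fa[^ ]*) [^ ]*


    and replace it with \1\r\n, or $1\r\n if that doesn't work. The [^ ]* after the name consumes the code point, and the optional leading space consumes the separator between two items, so only the names remain.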
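The item-list recipe can be checked the same way. This minimal Python sketch assumes re.sub treats the pattern [ ]?. (fa[^ ]*) [^ ]* like Notepad++ does; that pattern also consumes the code point and the space separating two items, so only the names are left. copied_text_to_names is a hypothetical name and * stands in for the glyphs.

```python
import re

# Optional separating space, glyph, space, icon name (group 1), space, code point.
NAME_RE = re.compile(r"[ ]?. (fa[^ ]*) [^ ]*")


def copied_text_to_names(text):
    # Step 1: remove the alias markers (literal replacement).
    text = text.replace(" (alias)", "")
    # Step 2: keep only the icon name, one per CRLF-terminated line.
    return NAME_RE.sub(r"\1\r\n", text)


if __name__ == "__main__":
    one_line = "* fa-adjust [&#xf042;] * fa-adn [&#xf170;] * fa-automobile (alias) [&#xf1b9;]"
    print(copied_text_to_names(one_line), end="")
    # fa-adjust
    # fa-adn
    # fa-automobile
```

The same pattern also copes with text where the glyphs were lost and only two spaces separate the items, as in the sample pasted in the comments.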

Creating div lines

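To put each entry (glyph, icon name and code point) on its own line, first remove the aliases as above, then match [ ]?(. fa[^ ]* [^ ]*) and replace it with \1\r\n, or $1\r\n if the backslash doesn't work. This places a newline after each entry; the optional space outside the group swallows the separator, so it doesn't end up at the start of the next line.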
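For one entry per line, here is a minimal Python sketch with the pattern [ ]?(. fa[^ ]* [^ ]*), assuming re.sub behaves like Notepad++ here. The optional space sits outside the group, so it is dropped instead of being copied to the start of the next line. copied_text_to_entries is a hypothetical name and * stands in for the glyphs.

```python
import re

# Optional separating space (outside the group, so it is dropped), then one
# entry as group 1: glyph, space, icon name, space, code point.
ENTRY_RE = re.compile(r"[ ]?(. fa[^ ]* [^ ]*)")


def copied_text_to_entries(text):
    # Remove the alias markers first, as in the steps above.
    text = text.replace(" (alias)", "")
    # Put every entry on its own CRLF-terminated line.
    return ENTRY_RE.sub(r"\1\r\n", text)


if __name__ == "__main__":
    one_line = "* fa-adjust [&#xf042;] * fa-automobile (alias) [&#xf1b9;]"
    print(copied_text_to_entries(one_line), end="")
    # * fa-adjust [&#xf042;]
    # * fa-automobile [&#xf1b9;]
```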


14 Comments

Appreciate the answer. However, I have to do this on Windows, where I have Notepad++ but no awk. I am trying to incorporate this into my build process.
Does notepad allow substitution using regex and backreferences?
Notepad++ does (notepad-plus-plus.org); the Windows Notepad is cr@p. I found an online awk runner and am trying that now: tutorialspoint.com/execute_awk_online.php
@aggie I added a solution for Notepad++; it should work, I think.
Tried it... it doesn't match the regex; I pasted a picture above. I tried it on both the div code and this:  fa-adjust [&#xf042;]  fa-adn [&#xf170;]  fa-align-center [&#xf037;]  fa-align-justify [&#xf039;]  fa-align-left [&#xf036;]  fa-align-right [&#xf038;]  fa-ambulance [&#xf0f9;]  fa-anchor [&#xf13d;]  fa-android [&#xf17b;]  fa-angellist [&#xf209;]  fa-angle-double-down [&#xf103;]. I am trying to scrape this page: fontawesome.io/icons