
I have some text in which I want to replace relative link targets with full URLs.

The text looks like this:

Some text here 
[...]
-   CRAN Task View: [Bayesian](Bayesian.html)
-   CRAN Task View: [Cluster](Cluster.html)
-   CRAN Task View: [Databases](Databases.html)
-   CRAN Task View: [Environmetrics](Environmetrics.html)
[...]
End of text here

As you can see, the link targets are bare file names rather than full URLs. For example, Bayesian.html should be http://cran.rstudio.com/web/views/Bayesian.html

The final result should be

Some text here 
[...]
-   CRAN Task View: [Bayesian](http://cran.rstudio.com/web/views/Bayesian.html)
-   CRAN Task View: [Cluster](http://cran.rstudio.com/web/views/Cluster.html)
-   CRAN Task View: [Databases](http://cran.rstudio.com/web/views/Databases.html)
-   CRAN Task View: [Environmetrics](http://cran.rstudio.com/web/views/Environmetrics.html)
[...]
End of text here

So far, I was able to "subset" my text file using the following command:

grep "CRAN Task View: \[" $FILE

But when I try to pipe to this:

sed -e 's|\\([a-zA-Z]*\\)\\.html|http://cran.rstudio.com/web/views/\\1.html|'

It doesn't work. How can I get sed to apply the replacement to the lines selected by the grep command?

I'm on macOS Mojave.

2 Answers


This sed should work for you:

sed -E '/CRAN Task View:/s~\(([^)]+)\)~(http://cran.rstudio.com/web/views/\1)~' file

Some text here
[...]
-   CRAN Task View: [Bayesian](http://cran.rstudio.com/web/views/Bayesian.html)
-   CRAN Task View: [Cluster](http://cran.rstudio.com/web/views/Cluster.html)
-   CRAN Task View: [Databases](http://cran.rstudio.com/web/views/Databases.html)
-   CRAN Task View: [Environmetrics](http://cran.rstudio.com/web/views/Environmetrics.html)
[...]
End of text here

RegEx Details:

  • /CRAN Task View:/: Run the substitution only on lines that match the text "CRAN Task View:"
  • s~: Substitute, using ~ as the delimiter so the slashes in the URL need no escaping
  • \(: Match a literal (
  • ([^)]+): Match 1+ non-) characters in capture group #1
  • \): Match a literal )
  • (http://cran.rstudio.com/web/views/\1): Replacement that rebuilds the link target using back-reference #1
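To see the address and the capture group at work, here is a minimal sketch that runs the same command on a small made-up input. The temporary file and the third sample line ("See [other](other.html)") are assumptions added for illustration; they are not part of the original text.

```shell
# Hypothetical sample input written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'Some text here' \
  '-   CRAN Task View: [Bayesian](Bayesian.html)' \
  'See [other](other.html)' > "$sample"

# Same command as above: only lines matching the address are rewritten.
result=$(sed -E '/CRAN Task View:/s~\(([^)]+)\)~(http://cran.rstudio.com/web/views/\1)~' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

The "CRAN Task View:" line gets the full URL, while the other two lines pass through unchanged, including the one that contains a link but does not match the address.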

sed -e 's|\\([a-zA-Z]*\\)\\.html|http://cran.rstudio.com/web/views/\\1.html|' It doesn't work.

This is a quoting issue. Inside single quotes '...', backslashes \ need no escaping. Bash passes '\\(' to sed unchanged as \\(, which sed interprets as a literal backslash followed by (. Therefore, your pattern only matches text such as \(someLetters\)\.html with the backslashes literally present, which never occurs in your file.

You probably meant sed 's|\([a-zA-Z]*\)\.html|http://cran.rstudio.com/web/views/\1.html|'.
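The difference can be checked directly in the shell. This is a small sketch on one sample line; the variable names (line, passed, broken, fixed) are chosen here for illustration.

```shell
line='-   CRAN Task View: [Bayesian](Bayesian.html)'

# Inside single quotes the shell keeps both backslashes, so sed receives \\( .
passed=$(printf '%s' '\\(')
printf 'sed receives: %s\n' "$passed"

# Doubled backslashes: the pattern asks for literal backslashes, so nothing matches.
broken=$(printf '%s\n' "$line" | sed -e 's|\\([a-zA-Z]*\\)\\.html|http://cran.rstudio.com/web/views/\\1.html|')

# Single backslashes: \( \) form a group and \. is a literal dot.
fixed=$(printf '%s\n' "$line" | sed 's|\([a-zA-Z]*\)\.html|http://cran.rstudio.com/web/views/\1.html|')

printf 'broken: %s\n' "$broken"
printf 'fixed:  %s\n' "$fixed"
```

The "broken" output is the input line unchanged, while the "fixed" output carries the full URL.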

By the way: sed can also do the grep part for you. Also, with -E you need fewer backslashes. And since you append the .html again, you don't need the group \(....\) in the first place: & in the replacement stands for the whole matched text.

sed -E -n '/CRAN Task View: \[/s|[a-zA-Z]*\.html|http://cran.rstudio.com/web/views/&|p'
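Note that -n together with the p flag prints only the lines where a substitution happened, which mirrors the grep step. If the goal is the whole text with the links fixed, as in the expected result from the question, dropping both is one option. A sketch comparing the two on a shortened sample input (the variable names are assumptions for illustration):

```shell
input='Some text here
-   CRAN Task View: [Bayesian](Bayesian.html)
-   CRAN Task View: [Cluster](Cluster.html)
End of text here'

# -n plus the p flag: print only the lines where a substitution happened.
only_matches=$(printf '%s\n' "$input" |
  sed -E -n '/CRAN Task View: \[/s|[a-zA-Z]*\.html|http://cran.rstudio.com/web/views/&|p')

# Without -n and p: every line is printed, changed or not.
whole_text=$(printf '%s\n' "$input" |
  sed -E '/CRAN Task View: \[/s|[a-zA-Z]*\.html|http://cran.rstudio.com/web/views/&|')

printf '%s\n' '--- only matches ---' "$only_matches" '--- whole text ---' "$whole_text"
```

To save the result, redirecting the second form to a new file and then moving it over the original should work with any sed. The in-place flag -i is an alternative, but its syntax differs between the BSD sed shipped with macOS (which expects a backup suffix argument, possibly empty) and GNU sed.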
