2

Let's say I have a user input "placeofjo.blogspot.com".

My code extracts links from this website and places them in a text file.

Now the text file has these contents:

http://www.twitter.com/jozefinfin/
http://www.facebook.com/jozefinfin/
http://placeofjo.blogspot.com/2008_08_01_archive.html
http://placeofjo.blogspot.com/2008_09_01_archive.html
http://placeofjo.blogspot.com/2008_10_01_archive.html
http://placeofjo.blogspot.com/2008_11_01_archive.html
http://placeofjo.blogspot.com/2008_12_01_archive.html
http://placeofjo.blogspot.com/2009_01_01_archive.html
http://placeofjo.blogspot.com/2009_02_01_archive.html
http://placeofjo.blogspot.com/2009_03_01_archive.html
http://placeofjo.blogspot.com/2009_04_01_archive.html
http://placeofjo.blogspot.com/2009_05_01_archive.html
http://placeofjo.blogspot.com/2009_06_01_archive.html
http://placeofjo.blogspot.com/2009_07_01_archive.html
http://placeofjo.blogspot.com/2009_08_01_archive.html
http://placeofjo.blogspot.com/2009_09_01_archive.html
http://placeofjo.blogspot.com/2009_10_01_archive.html
http://placeofjo.blogspot.com/2009_11_01_archive.html
http://placeofjo.blogspot.com/2010_01_01_archive.html
http://placeofjo.blogspot.com/2010_02_01_archive.html
http://placeofjo.blogspot.com/2010_04_01_archive.html
http://placeofjo.blogspot.com/2010_06_01_archive.html
http://placeofjo.blogspot.com/2010_07_01_archive.html
http://placeofjo.blogspot.com/2010_08_01_archive.html
http://placeofjo.blogspot.com/2010_10_01_archive.html
http://placeofjo.blogspot.com/2010_11_01_archive.html
http://placeofjo.blogspot.com/2011_01_01_archive.html
http://placeofjo.blogspot.com/2011_02_01_archive.html
http://placeofjo.blogspot.com/2011_03_01_archive.html
http://endlessdance.blogspot.com
http://blogskins.com/me/aaaaaa
http://weheartit.com

I would like to delete:

http://www.twitter.com/jozefinfin/
http://www.facebook.com/jozefinfin/
http://endlessdance.blogspot.com
http://blogskins.com/me/aaaaaa
http://weheartit.com

and keep only the lines that contain the user's input. How do I do this?

Desired contents of the text file:

    http://placeofjo.blogspot.com/2008_08_01_archive.html
    http://placeofjo.blogspot.com/2008_09_01_archive.html
    http://placeofjo.blogspot.com/2008_10_01_archive.html
    "                    "
    "                    "

4 Answers

1
  1. Read the file line by line.
  2. Check whether the line contains the user input.
  3. If so, write it to a new file.
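
The three steps above can be sketched as follows. This is only a minimal illustration, assuming UTF-8 text files and a plain substring test; the class name `LineByLineFilter`, the method `copyMatchingLines`, and the file paths passed on the command line are hypothetical and not part of the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LineByLineFilter {

    // Copies every line of `source` that contains `userInput` into `target`
    // and returns the number of lines that were kept.
    static int copyMatchingLines(Path source, Path target, String userInput) throws IOException {
        int kept = 0;
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {   // step 1: read line by line
                if (line.contains(userInput)) {            // step 2: check for the user input
                    writer.write(line);                    // step 3: write to the new file
                    writer.newLine();
                    kept++;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.out.println("usage: LineByLineFilter <source> <target> <userInput>");
            return;
        }
        int kept = copyMatchingLines(Paths.get(args[0]), Paths.get(args[1]), args[2]);
        System.out.println(kept + " links kept");
    }
}
```

Because only one line is held in memory at a time, this variant should also cope with files that are too large to load in full.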

0

Assuming that you can hold the whole list of links in memory at the same time, which you likely can since it's the links from a single website:

  1. Read in the file, split on newlines, and generate a List of links.
  2. Filter the list to remove any non-matching links
  3. Write the resulting filtered list back to the file, replacing the old contents of the file

For the matching in the filter, my thought would be to use

string.indexOf(inputToMatch) >= 0 // it matches (indexOf returns -1 when there is no match)
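
A rough sketch of this read, filter, and write-back approach could look like the code below. It assumes a UTF-8 file small enough to fit in memory; the names `InMemoryLinkFilter`, `keepMatching`, and `filterFileInPlace` are made up for illustration. It uses `String.contains`, which is true exactly when `indexOf` returns a value of 0 or more, so a match at the very start of a line is also kept.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

class InMemoryLinkFilter {

    // Step 2: keep only the links that contain the text to match.
    static List<String> keepMatching(List<String> links, String inputToMatch) {
        return links.stream()
                .filter(link -> link.contains(inputToMatch))
                .collect(Collectors.toList());
    }

    // Steps 1 and 3: read all lines, filter them, and overwrite the same file.
    static void filterFileInPlace(Path file, String inputToMatch) throws IOException {
        List<String> links = Files.readAllLines(file, StandardCharsets.UTF_8);
        // Files.write truncates an existing file by default, replacing the old contents.
        Files.write(file, keepMatching(links, inputToMatch), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: InMemoryLinkFilter <file> <inputToMatch>");
            return;
        }
        filterFileInPlace(Paths.get(args[0]), args[1]);
    }
}
```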

0

Instead of building a text file and then filtering it, do the filtering when you parse the web page. Just look for links that match your criteria and write only the good links to the file.
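
As a sketch of filtering at extraction time: the question does not show how the links are extracted, so the code below simply assumes they arrive as an `Iterable<String>`. Rather than a substring test, it compares the host part of each link with the user input via `java.net.URI`, which is a stricter criterion chosen here purely for illustration. `SiteLinkWriter`, `isOnSite`, and `writeSiteLinks` are hypothetical names.

```java
import java.io.IOException;
import java.io.Writer;
import java.net.URI;
import java.net.URISyntaxException;

class SiteLinkWriter {

    // True when the link's host equals the site the user entered (case-insensitive).
    static boolean isOnSite(String link, String site) {
        try {
            String host = new URI(link).getHost();
            return host != null && host.equalsIgnoreCase(site);
        } catch (URISyntaxException e) {
            return false; // malformed links are treated as non-matching
        }
    }

    // Writes only the matching links, one per line, and returns how many were written.
    static int writeSiteLinks(Iterable<String> extractedLinks, String site, Writer out)
            throws IOException {
        int written = 0;
        for (String link : extractedLinks) {
            if (isOnSite(link, site)) {
                out.write(link);
                out.write(System.lineSeparator());
                written++;
            }
        }
        return written;
    }
}
```

With this in place the unwanted links never reach the file, so no second pass over it is needed.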

0

Here is a regex way to solve this problem. However, you should not use this solution with big files.

import java.io.File;
import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.commons.io.FileUtils;

public class FileReplacer {

    public static void main(String[] args) {
        replaceFileContent();
    }

    public static void replaceFileContent() {
        try {
            // Read the whole input file into a single string.
            String allStr = FileUtils.readFileToString(new File("c:/temp/data.txt"));
            // Match every line that does NOT start with the blog's URL, together with
            // its line terminator (\r\n or \n), so that no empty lines are left behind.
            Pattern pattern = Pattern.compile(
                    "^(?!http://placeofjo\\.blogspot\\.com/.*$).+$(\\r?\\n)?", Pattern.MULTILINE);
            // Delete the matched lines and write what remains to a new file.
            String newAllStr = pattern.matcher(allStr).replaceAll("");
            FileUtils.writeStringToFile(new File("c:/temp/newdata.txt"), newAllStr);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}

2 Comments

If the pattern were compiled once and then applied line by line in a loop instead of using MULTILINE, would that be so bad performance-wise? That's what I would do.
@ArtB Well, in that case performance would not degrade too much, because only one line would be considered at a time. But again, if a line contains thousands of characters, it would not be a good choice.
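
The variant discussed in these comments, compiling the pattern once and applying it to one line at a time, might look roughly like this. It is a sketch under assumptions: the lines are already available as a `List<String>`, and the names `PerLineRegexFilter`, `blogLinkPattern`, and `keepMatching` are invented for the example. `Pattern.quote` is used so that the dots in the user input are matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PerLineRegexFilter {

    // Builds the pattern once; the user input is quoted so "." is not a wildcard.
    static Pattern blogLinkPattern(String userInput) {
        return Pattern.compile("http://" + Pattern.quote(userInput) + "/.*");
    }

    // Applies the precompiled pattern to each line and keeps the full-line matches.
    static List<String> keepMatching(List<String> lines, Pattern pattern) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Pattern pattern = blogLinkPattern("placeofjo.blogspot.com");
        List<String> lines = new ArrayList<>();
        lines.add("http://placeofjo.blogspot.com/2008_08_01_archive.html");
        lines.add("http://weheartit.com");
        System.out.println(keepMatching(lines, pattern));
    }
}
```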
