
I'm trying to extract a particular non-ASCII character from a buffer. I'm reading a file of movie names that has some non-ASCII characters sprinkled in, like so:

1|Tóy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Gét Shorty (1995)

I was able to pick out the lines that contain non-ASCII characters, but I can't figure out how to get that particular character from those lines and replace it with an ASCII character from the map I've made.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {

        HashMap<Character, Character>Char_Map = new HashMap<>();
        Char_Map.put('o','ó');
        Char_Map.put('e','é');
        Char_Map.put('i','ï');

        for(Map.Entry<Character,Character> entry: Char_Map.entrySet())
        {
            System.out.println(entry.getKey() + " -> "+ entry.getValue());
        }

        try
        {
            BufferedReader br = new BufferedReader(new FileReader("movie-names.txt"));
            String contentLine= br.readLine();


            while(contentLine != null)
            {
                String[] contents = contentLine.split("\\|");
                boolean result = contents[1].matches("\\A\\p{ASCII}*\\z");

                if(!result)
                {
                    System.out.println(contentLine);

                    

                    //System.out.println();
                }

                contentLine= br.readLine();

            }
        }
        catch (IOException ioe)
        {
            System.out.println("Cannot open file as it doesn't exist");
        }
    }
}

I tried using something along the lines of:

if((contentLine.charAt(i) == something

But I'm not sure how to continue from there.


2 Answers


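You can just use replaceAll. Put this in the while loop so that it runs on each line you read from the file. With this change, you won't need the split and the if (... matches) check anymore. Java strings are immutable, so replaceAll returns a new string instead of modifying contentLine; assign the result back:

contentLine = contentLine.replaceAll("ó", "o");
contentLine = contentLine.replaceAll("é", "e");
contentLine = contentLine.replaceAll("ï", "i");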
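A minimal, self-contained sketch of why the result has to be assigned: calling replaceAll and ignoring its return value leaves the original string untouched. The class name ReplaceDemo and the helper toAscii are hypothetical, and the sketch assumes the source file is compiled as UTF-8 (the default since JDK 18; on older JDKs pass -encoding UTF-8 to javac).

```java
public class ReplaceDemo {
    // Hypothetical helper: returns a copy of line with the three accented characters replaced.
    // Each replaceAll call returns a new String; the input string itself is never modified.
    static String toAscii(String line) {
        line = line.replaceAll("ó", "o");
        line = line.replaceAll("é", "e");
        line = line.replaceAll("ï", "i");
        return line;
    }

    public static void main(String[] args) {
        String original = "1|Tóy Story (1995)";
        original.replaceAll("ó", "o");          // result discarded, so original is unchanged
        System.out.println(original);           // 1|Tóy Story (1995)
        System.out.println(toAscii(original));  // 1|Toy Story (1995)
    }
}
```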

If you want to keep a map, make the non-ASCII characters the keys and their ASCII replacements the values, then iterate over its entries:

Map<String, String> map = new HashMap<>();
map.put("ó", "o");
// ... and all the others

Later, in your loop reading the contents, you replace all the characters:

for (Map.Entry<String, String> entry : map.entrySet())
{
    String oldChar = entry.getKey();
    String newChar = entry.getValue();
    contentLine = contentLine.replaceAll(oldChar, newChar);
}
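One caveat with the loop above: replaceAll treats its first argument as a regular expression. That is harmless for ó, é and ï, but a key such as "(" or "." would be interpreted as regex syntax. A sketch of the same loop using String.replace(CharSequence, CharSequence), which matches literally; the class MapReplace and method replaceAllKeys are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapReplace {
    // Replaces every occurrence of each key with its value.
    // String.replace matches the key literally, so regex metacharacters in keys are safe.
    static String replaceAllKeys(String line, Map<String, String> map) {
        for (Map.Entry<String, String> entry : map.entrySet()) {
            line = line.replace(entry.getKey(), entry.getValue());
        }
        return line;
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("ó", "o");
        map.put("é", "e");
        map.put("ï", "i");
        System.out.println(replaceAllKeys("4|Gét Shorty (1995)", map)); // 4|Get Shorty (1995)
    }
}
```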

Here is a complete example:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) throws Exception {
        Map<String, String> nonAsciiToAscii = new HashMap<>();
        nonAsciiToAscii.put("ó", "o");
        nonAsciiToAscii.put("é", "e");
        nonAsciiToAscii.put("ï", "i");

        // try-with-resources closes the reader when the loop finishes or an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader("movie-names.txt")))
        {
            String contentLine = br.readLine();
            while (contentLine != null)
            {
                // Replace every non-ASCII key with its ASCII value
                for (Map.Entry<String, String> entry : nonAsciiToAscii.entrySet())
                {
                    String oldChar = entry.getKey();
                    String newChar = entry.getValue();
                    contentLine = contentLine.replaceAll(oldChar, newChar);
                }

                System.out.println(contentLine); // or whatever else you want to do with the cleaned lines

                contentLine = br.readLine();
            }
        }
    }
}

This prints:

robert:~$ javac Main.java && java Main
1|Toy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Get Shorty (1995)
robert:~$
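As an alternative that is not part of this answer: if the goal is to strip accents in general rather than map a fixed set of characters, java.text.Normalizer can decompose each accented letter into a base letter plus combining marks, which a regex can then delete. A sketch under that assumption (StripAccents and stripAccents are hypothetical names); note that letters with no Unicode decomposition, such as ø or ß, pass through unchanged.

```java
import java.text.Normalizer;

public class StripAccents {
    // NFD splits "ó" into "o" + U+0301 (combining acute accent);
    // \p{M} then removes every combining mark.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("1|Tóy Story (1995)")); // 1|Toy Story (1995)
    }
}
```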

3 Comments

Hmm, perhaps it doesn't work because my file is encoded in ISO-8859-1?
Perhaps. Try passing the charset to the FileReader like this: new FileReader("movie-names.txt", java.nio.charset.StandardCharsets.ISO_8859_1) (this constructor is available since Java 11). In the long term I'd convert the file to UTF-8, though.
Wait, I think I got it, though I'm not entirely sure how. I compiled and ran it from the command line and it worked! Thank you very much, mate.
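Regarding the encoding question in the comments: a sketch of reading the file with an explicit charset, so decoding no longer depends on the platform default (which typically differs between an IDE and a terminal). It uses Files.newBufferedReader(Path, Charset), available since Java 7, as an alternative to the FileReader constructor mentioned above; it assumes the file really is ISO-8859-1, and ReadLatin1 and readLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ReadLatin1 {
    // Reads every line, decoding the bytes as ISO-8859-1 instead of the platform default.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines(Paths.get("movie-names.txt"))) {
            System.out.println(line);
        }
    }
}
```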

You want to flip your keys and values:

Map<Character, Character> charMap = new HashMap<>();
charMap.put('ó','o');
charMap.put('é','e');
charMap.put('ï','i');

and then look up each character, falling back to the character itself when it has no mapping:

char mappedChar = charMap.getOrDefault(inputChar, inputChar);

To get the characters of a string, call String#toCharArray().
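Putting those pieces together, a sketch of converting a whole line character by character with the flipped map and a StringBuilder. CharMapper and toAscii are hypothetical names; the approach works per UTF-16 char, which is fine for accented Latin letters like these.

```java
import java.util.HashMap;
import java.util.Map;

public class CharMapper {
    // Keys are the non-ASCII characters to replace; values are their ASCII substitutes.
    private static final Map<Character, Character> CHAR_MAP = new HashMap<>();
    static {
        CHAR_MAP.put('ó', 'o');
        CHAR_MAP.put('é', 'e');
        CHAR_MAP.put('ï', 'i');
    }

    // Maps each character through CHAR_MAP; characters without an entry are kept as-is.
    static String toAscii(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        for (char c : line.toCharArray()) {
            sb.append(CHAR_MAP.getOrDefault(c, c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("4|Gét Shorty (1995)")); // 4|Get Shorty (1995)
    }
}
```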
