
I have a program which reads a very big txt file and changes the order of some columns in it. For more details about what it does exactly see my question here. I use a list of maps, and I suspect this is too much for the Java virtual machine, since the txt file has 400,000 entries, but I have no idea what else to do. With a smaller txt file it works fine. Otherwise it runs for more than an hour and then I get an OutOfMemoryError.

Here is my code:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class Final {

public static void main(String[] args) {

    String path = "C:\\Users\\Ferid\\Downloads\\secdef\\secdef.txt";

    File file = new File(path);

    new Final().updateFile(file);
}

private void updateFile(File file) {

    List<String> allRows = getAllRows(file);

    String[] baseRow = allRows.get(0).split("\\|");

    List<String> columns = getBaseColumns(baseRow);
    System.out.println(columns.size());

    appendNewColumns(allRows, columns);
    System.out.println(columns.size());

    List<Map<String, String>> mapList = convertToMap(allRows, columns);

    List<String> newList = new ArrayList<String>();

    appendHeader(columns, newList);

    appendData(mapList, newList, columns);

    String toPath = "C:\\Users\\Ferid\\Downloads\\secdef\\finalz2.txt";

    writeToNewFile(newList, toPath);

}

/**
 * Returns all rows from the file.
 */
private static List<String> getAllRows(File file) {

    List<String> allRows = new ArrayList<>();
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String row;
        while ((row = reader.readLine()) != null) {
            allRows.add(row);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return allRows;
}

/**
 * Returns the base columns from the first row.
 */
private static List<String> getBaseColumns(String[] baseRow) {
    List<String> columns = new ArrayList<>();
    for (String rowEntry : baseRow) {
        String[] entry = rowEntry.split("=");
        columns.add(entry[0]);
    }
    return columns;
}

/**
 * Adds all new columns found in the remaining rows.
 */
private static void appendNewColumns(List<String> rows, List<String> columns) {
    for (String row : rows) {
        String[] splittedRow = row.split("\\|");
        for (String column : splittedRow) {
            String[] entry = column.split("=");
            if (columns.contains(entry[0])) {
                continue;
            }
            columns.add(entry[0]);
        }
    }
}

/**
 * Converts the list entries to maps.
 */
private static List<Map<String, String>> convertToMap(List<String> rows, List<String> columns) {
    List<Map<String, String>> mapList = new ArrayList<>();
    for (String row : rows) {
        Map<String, String> map = new TreeMap<>();
        String[] splittedRow = row.split("\\|");
        List<String> rowList = Arrays.asList(splittedRow);
        for (String col : columns) {
            String newCol = findByColumn(rowList, col);
            if (newCol == null) {
                map.put(col, "null");
            } else {
                String[] arr = newCol.split("=");
                map.put(col, arr[1]);
            }
        }
        mapList.add(map);
    }
    return mapList;

}

/**
 * Finds the row entry belonging to the given column name.
 */
private static String findByColumn(List<String> row, String col) {
    // match "COL=" exactly, so that e.g. column "A" does not also match "AB=..."
    return row.stream().filter(o -> o.startsWith(col + "=")).findFirst().orElse(null);
}

/**
 * Adds the header row to the new list.
 */
private static void appendHeader(List<String> columns, List<String> list1) {
    String header = "";
    for (String column : columns) {
        header += column + "|";
    }
    list1.add(header + "\n");
}

/**
 * Adds all data rows to the new list.
 */
private static void appendData(List<Map<String, String>> mapList, List<String> list1, List<String> columns) {
    for (Map<String, String> entry : mapList) {
        String line = "";
        for (String key : columns) {
            line += entry.get(key) + "|";
        }

        list1.add(line + "\n");
    }
}

/**
 * Writes all values to the new file.
 */
private static void writeToNewFile(List<String> list, String path) {
    // try-with-resources closes the stream even if an exception is thrown
    try (FileOutputStream out = new FileOutputStream(new File(path))) {
        for (String line : list) {
            out.write(line.getBytes());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

}
  • How much memory did you assign to Java? Commented Aug 5, 2019 at 11:00
  • @JFMeier where can I see this? Commented Aug 5, 2019 at 11:00
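To answer the "where can I see this?" question: one quick way to check how much heap the JVM actually has available is to query `Runtime` from a small standalone program (a minimal sketch, not tied to the program above; run it with and without an `-Xmx` flag to compare):

```java
// Prints the JVM's current heap limits. The default maximum depends on
// the machine and JVM version, so the numbers will vary.
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap:   " + rt.maxMemory() / (1024 * 1024) + " MB");
        System.out.println("total heap: " + rt.totalMemory() / (1024 * 1024) + " MB");
        System.out.println("free heap:  " + rt.freeMemory() / (1024 * 1024) + " MB");
    }
}
```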

3 Answers


In cases like this it makes sense, if at all possible, to read the file line by line, handle each line separately, and NOT keep the whole file in memory.

Currently your code looks like this:

  1. read all lines into list L
  2. for each row in L, find all columns
  3. convert the rows in L to maps (storing the string "null" instead of simply omitting absent values; this is probably what really bites you in the end!)
  4. serialize the maps as rows

I call bs on just increasing the available memory; it will simply fail later. You have a general problem with memory usage and performance here. Let me propose a different way:

 1. for each line read (don't read the whole file at once!):
    1.1 find the columns and collect them in list C
 2. for each line read (again, don't read the whole file at once; process each line as you read it):
    2.1 for each column in C, write the value if the row contains it, or null
    2.2 append the line to the result file (don't keep the result in memory either!)

So somewhat like this:

  List<String> columns = new ArrayList<>();

  // first pass: collect every column name, keeping each one only once
  try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
      String row;
      while ((row = reader.readLine()) != null) {
          for (String col : getColumns(row)) {
              if (!columns.contains(col)) {
                  columns.add(col);
              }
          }
      }
  } catch (IOException e) {
      e.printStackTrace();
  }

  // second pass: write each row out as soon as it has been read
  try (BufferedReader reader = new BufferedReader(new FileReader(file));
       BufferedWriter writer = new BufferedWriter(new FileWriter(outFile))) {
      String row;
      while ((row = reader.readLine()) != null) {
          writeRow(row, columns, writer);
      }
  } catch (IOException e) {
      e.printStackTrace();
  }
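The sketch calls getColumns and writeRow without defining them; one possible shape for these helpers (hypothetical names kept from the sketch, assuming the same KEY=VALUE fields separated by | as in the question's data) would be:

```java
// Hypothetical helpers for the sketch above. Assumes each row consists of
// KEY=VALUE fields separated by '|', as in the question's data format.
private static List<String> getColumns(String row) {
    List<String> cols = new ArrayList<>();
    for (String field : row.split("\\|")) {
        cols.add(field.split("=", 2)[0]); // keep only the key part
    }
    return cols;
}

private static void writeRow(String row, List<String> columns,
                             BufferedWriter writer) throws IOException {
    // index the row's fields once instead of scanning the row per column
    Map<String, String> fields = new HashMap<>();
    for (String field : row.split("\\|")) {
        String[] kv = field.split("=", 2);
        fields.put(kv[0], kv.length > 1 ? kv[1] : "");
    }
    StringBuilder line = new StringBuilder();
    for (String col : columns) {
        line.append(fields.getOrDefault(col, "null")).append('|');
    }
    writer.write(line.toString());
    writer.newLine();
}
```

Building a HashMap per row keeps the lookup per column O(1), which also avoids the quadratic findByColumn scan from the original code.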



You can specify the maximum amount of memory the JVM may use with the -Xmx flag:

e.g. -Xmx8G (use M for megabytes or G for gigabytes).



We aren't really able to give a concrete recommendation for the amount of memory to allocate, because that will depend greatly on your server setup, the size of your user base, and their behaviour. You will need to find a value that works for you, i.e. no noticeable GC pauses, and no OutOfMemory errors.

For reference, the 3 most common parameters used to change the memory (heap) allocation are:

  • -Xms - the minimum size of the heap
  • -Xmx - the maximum size of the heap
  • -XX:MaxPermSize - the maximum size of PermGen (not used in Java 8 and above)

If you do decide to increase the memory settings, there are a few general guidelines to follow.

  • Increase Xmx in small increments (eg 512mb at a time), until you no longer experience the OutOfMemory error. This is because increasing the heap beyond the capabilities of your server to adequately Garbage Collect can cause other problems (eg performance/freezing)
  • If your error is java.lang.OutOfMemoryError: PermGen space, increase the -XX:MaxPermSize parameter in 256mb increments until the error stops occurring.
  • If your error does not reference PermGen, there is no need to increase it. In a simplistic explanation, PermGen is used to store classes, is generally quite static in size, and has been removed in Java 8. More info here.
  • Consider setting Xms and Xmx to the same value, as this can decrease the time GC takes, because the JVM will not attempt to resize the heap down on each collection.

If you start Confluence as a Service on Windows, then you should not use these instructions. Instead, refer to the "Windows Service" section below.

You should only follow these instructions if you are starting Confluence via the batch file. The batch file is not used when Confluence is started as a Service.

To Configure System Properties in Windows Installations When Starting from the .bat File,

  1. Shutdown Confluence
  2. From /bin (Stand-alone) or /bin (EAR-WAR installation), open setenv.bat.
  3. Find the section

CATALINA_OPTS="-Xms1024m -Xmx1024m -XX:+UseG1GC $CATALINA_OPTS" in Confluence 5.8 or above

CATALINA_OPTS="$CATALINA_OPTS -Xms1024m -Xmx1024m -XX:MaxPermSize=256m -XX:+UseG1GC" in Confluence 5.6 or 5.7

JAVA_OPTS="-Xms256m -Xmx512m -XX:MaxPermSize=256m in previous versions

  4. Change the values as required (Xmx is the maximum heap, Xms the minimum, and MaxPermSize the PermGen size)
  5. Start Confluence

Comments
