1

I have a log file with the following format:

3
1 2 3
1 2 3 
1 2 3
1 2 3
4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4

The single number states the width of the matrix that follows; all matrices have the same height. There can be several matrices within the same log file. I want to parse the matrix data into an array. I read the lines with scala.io.Source.fromFile(f).getLines.mkString, but I'm struggling to fill the array.

for(i <- 0 to 3) {
    for(j <- 0 to N-1) {
        matrix(i)(j) = ...
    }
}

If the lines were indexed the same way as I want the matrix to be, this wouldn't be so hard. But lines(n) contains whitespace, newlines... What am I doing wrong?

1
  • Btw, why not fromFile(f).mkString instead of fromFile(f).getLines.mkString? Commented Sep 20, 2013 at 14:15
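For context, here is a minimal sketch of what these calls return, using an in-memory string instead of a file (the names text, joined, raw and lines are only illustrative). getLines strips the line terminators, so a following mkString without a separator glues neighbouring lines together, which may explain why the rows become hard to tell apart.

```scala
import scala.io.Source

// Illustrative input: a width line followed by two rows.
val text = "3\n1 2 3\n1 2 3"

// getLines() drops the newlines; mkString then concatenates without a separator.
val joined = Source.fromString(text).getLines().mkString

// Source.mkString keeps the text exactly as it was read, newlines included.
val raw = Source.fromString(text).mkString

// Keeping the lines as a List preserves one element per input line.
val lines = Source.fromString(text).getLines().toList
```

Here joined is "31 2 31 2 3", so the row boundaries are lost, while lines still holds one string per row.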

7 Answers

6

You can do this quite easily in a few simple steps:

  1. First break the input into a List of lines
  2. Then break each line into a List of Strings
  3. Then convert each String in the list to an Int
  4. And finally fold this sequence of Int arrays into a List of matrices, where each matrix is a List of Arrays (using a simple state machine)

The state machine is quite simple.

  1. It first reads the number of lines in the next matrix and memorizes it
  2. It then reads in that number of lines to the current matrix
  3. After it has read the memorized number of lines it adds the current matrix to the list of read matrixes and goes back to step 1

The code will look something like this:

    import io.Source

    def input = Source.fromString(
       """|3
          |1 2 1
          |1 2 2 
          |1 2 3
          |4
          |1 2 3 1
          |1 2 3 2
          |1 2 3 3
          |1 2 3 4""".stripMargin) // You would probably use Source.fromFile(...)

    type Matrix = List[Array[Int]]

    sealed trait Command
    case object ReadLength extends Command
    case class ReadLines(i: Int, matrix: Matrix) extends Command

    case class State(c: Command, l: List[Matrix])

    val parsedMatrixes = input.getLines().map(_.split(" ")).map(_.map(_.toInt)).foldLeft(State(ReadLength, List())) {
       case (State(ReadLength, matrixes), line) => State(ReadLines(line(0), List()), matrixes)
       case (State(ReadLines(1, currentMatrix), matrixes), line) => State(ReadLength,((line::currentMatrix).reverse)::matrixes)
       case (State(ReadLines(i, currentMatrix), matrixes), line) => State(ReadLines(i - 1, line::currentMatrix), matrixes)
    }.l.reverse

And gives you the following result:

parsedMatrixes: List[Matrix] = 
List(
  List(Array(1, 2, 1), 
       Array(1, 2, 2), 
       Array(1, 2, 3)), 
  List(Array(1, 2, 3, 1), 
       Array(1, 2, 3, 2), 
       Array(1, 2, 3, 3), 
       Array(1, 2, 3, 4)))

Please be aware that this cannot be the final solution: it has no error handling, and it does not free its resources (it never closes the source).
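One possible way to address both points is sketched below. It assumes Scala 2.13 or later (for scala.util.Using); readLinesSafely and parseRow are hypothetical helper names, not part of the code above.

```scala
import scala.io.Source
import scala.util.{Try, Using}

// Hypothetical helper: reads all lines and closes the source afterwards,
// even if opening or reading the file fails. Errors end up in the Try.
def readLinesSafely(path: String): Try[List[String]] =
  Using(Source.fromFile(path))(_.getLines().toList)

// Hypothetical helper: parses one line of whitespace-separated integers.
// A malformed token yields a Failure instead of an uncaught exception.
def parseRow(line: String): Try[Array[Int]] =
  Try(line.trim.split("\\s+").map(_.toInt))
```

The caller can then decide how to react to a Failure, for example by reporting the offending line, before handing the parsed rows to the fold.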


20 Comments

Wow. Really wish I knew Scala a bit better. Thanks so much! =)
How can I use stripMargin() when using Source.fromFile(...)? It seems dependent on using that to work properly.
stripMargin is only a trick to put multi line strings into a Scala file. When working with files, you might use def input=Source.fromFile("path/to/file") or something similar.
Yes, I'm using scala.io.Source.fromFile(file), but it doesn't work as it does with your fromString(). Any idea why? When I try to access the second matrix array, I get an error.
Hm, I just reran the code with def input=Source.fromFile("""C:\Users\sschwets\Desktop_tmp\matrix.txt""") and it worked without problems. I got exactly the same result as in the example above. The type of input is scala.io.BufferedSource with the file. This might be the first thing to check.
4

I think the state machine is not needed; the following will give you a data structure equivalent in shape and content to the state machine solution:

import scala.io.Source

val input = Source.fromString(
  """|3
     |1 2 1
     |1 2 2
     |1 2 3
     |3 2 1
     |4
     |1 2 3 1
     |1 2 3 2
     |1 2 3 3
     |1 2 3 4""".stripMargin)

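// Each group is one width line followed by four rows.
// Matching on Seq rather than List also works on Scala versions where grouped yields a non-List Seq.
val matrices = input.getLines().grouped(5).map {
  case Seq(w, l1, l2, l3, l4) =>
    // feel free to use the value of `w.toInt` to do an assertion on the 4 lines
    List(l1, l2, l3, l4) map { _.split(' ').map(_.toInt).toList }
}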

for (matrix <- matrices)
  println(matrix.map(_.mkString("[", ", ", "]")).mkString("\n"))

// prints:
// [1, 2, 1]
// [1, 2, 2]
// [1, 2, 3]
// [3, 2, 1]
// [1, 2, 3, 1]
// [1, 2, 3, 2]
// [1, 2, 3, 3]
// [1, 2, 3, 4]
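The width check hinted at in the code comment could look roughly like the sketch below. It assumes every block is one width line followed by exactly four rows; parseBlocks and rowsPerMatrix are illustrative names, not part of the answer.

```scala
// Assumption: each matrix block is a width line followed by exactly 4 rows.
val rowsPerMatrix = 4

def parseBlocks(lines: Iterator[String]): List[List[List[Int]]] =
  lines.grouped(rowsPerMatrix + 1).map { block =>
    // The first line of the block states the expected width.
    val width = block.head.trim.toInt
    val rows = block.tail.map(_.trim.split("\\s+").map(_.toInt).toList).toList
    // Fail early with a descriptive message instead of returning a ragged matrix.
    require(rows.length == rowsPerMatrix, s"expected $rowsPerMatrix rows, got ${rows.length}")
    require(rows.forall(_.length == width), s"expected rows of width $width")
    rows
  }.toList
```

With the sample input, parseBlocks(input.getLines()) would return the same two matrices, and a row with the wrong number of columns throws an IllegalArgumentException.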

3 Comments

Nice, your code is quite short. But you can make it even shorter. val matrices = input.getLines.grouped(5).map(_.tail.map(_.split(" ").map(_.toInt).toList))
Yes; however, I'd argue the shorter version would 1) (arguably) be less readable and 2) would not allow the capture of the matrix width value in case it's needed for input validation.
The state machine is needed when parsing matrices with different numbers of rows. If it is guaranteed that each matrix has the same number of rows, then no state machine is needed.
0

stefan's code is a great example of a functional state machine, but I personally would prefer something like this:

import io.Source

val input = Source.fromString(
   """|3
      |1 2 1
      |1 2 2
      |1 2 3
      |1 2 4
      |4
      |1 2 3 1
      |1 2 3 2
      |1 2 3 3
      |1 2 3 4""".stripMargin)

type Matrix = List[List[Int]]

def readMatrix(list: List[Int], height: Int, width: Int): Matrix =  {
  list.take(height * width).grouped(width).toList
}

def readMatrices(list: List[Int]): List[Matrix] = {
  if (list.isEmpty) List()
  else readMatrix(list.tail, 4, list.head) :: readMatrices(list.drop(4 * list.head + 1))
}

def printMatrix(matrix: Matrix) = println(matrix.map(_.mkString("", ", ", "")).mkString("", "\n", "\n"))

val parsedMatrices = readMatrices(input.mkString.split("\\s+").map(_.toInt).toList)
parsedMatrices.foreach(printMatrix)


0

How about the following recursive solution?

import scala.collection.mutable.ListBuffer

val fixedHeight = 4

def readMatrices(lines: List[String]): List[Array[Array[Int]]] = {

  def readMatrices0(lines: List[String], result: ListBuffer[Array[Array[Int]]]): List[Array[Array[Int]]] = lines match {
    case Nil => result.toList // no lines left: return what has been collected
    case head :: tail =>
      val n = head.trim.toInt // the width line
      val mat = readMatrix(tail.take(fixedHeight))
      // check that the matrix has width n:
      require(mat.forall(_.length == n), "Incorrect width")
      // += appends in place and returns the buffer itself
      readMatrices0(tail.drop(fixedHeight), result += mat)
  }

  def readMatrix(lines: List[String]): Array[Array[Int]] =
    lines.map(_.split(' ').map(_.toInt)).toArray

  readMatrices0(lines, new ListBuffer)
}

// getLines returns an Iterator, so convert it to a List first
val mats = readMatrices(scala.io.Source.fromFile(f).getLines().toList)


0

OK, I think I have a nice one:

  • It deals with matrices with different numbers of rows (I bet there are matrices with different row counts in the poster's log files :-)
  • It does not parse the whole input at once, but step by step (and hence uses less memory)
  • It allows validation of the input

Preparation

The first part is nearly the same as in my first answer, but I added zipWithIndex to preserve the line numbers of the input.

    import io.Source

    def rawInput = Source.fromString(
       """|3
          |1 2 1
          |1 2 2 
          |1 2 3
          |4
          |1 2 3 1
          |1 2 3 2
          |1 2 3 3
          |1 2 3 4""".stripMargin) // You would probably use Source.fromFile(...)

    type Matrix = List[Array[Int]]

    def parsedInput = rawInput.getLines().map(_.split(" ")).map(_.map(_.toInt)).zipWithIndex

Version with Iterator

This version uses a classical Java iterator with mutable state. It is not in functional style, but should run quite fast:

    def matrixIterator= new Iterator[Matrix] {
      val input = parsedInput

      var expectedNumerOfRows : Option[Int] = None

      override def hasNext = input.hasNext

      override def next() : Matrix = {
        import collection.mutable.MutableList
        var matrix : MutableList[Array[Int]] = MutableList()
        while (input.hasNext) {
          val (currentLine, lineNumber)=input.next()
          if (currentLine.size==1){
            expectedNumerOfRows=Some(currentLine.head)
            return matrix.toList
          }else{
            matrix+=currentLine
            expectedNumerOfRows.filter(_ != currentLine.size).foreach{ expected : Int =>
              //println(String.format("Warning in line %s: Expected %s columns, got %s", lineNumber+1, expected, currentLine.size))
            }
          }
        }
        return matrix.toList
      }
    }.drop(1) // skip the empty matrix that next() returns when it meets the first header line

Version with Stream

This version uses Scala streams. It is recursive (although not tail recursive) and does not use mutable variables. It should be a little bit slower than the Iterator version, but is much more readable:

    def matrixStream : Stream[Matrix] = {
      def matrix(input : Iterator[(Array[Int], Int)], numberOfColumns : Int, currentMatrix : Matrix) : Stream[Matrix] = {
        if (!input.hasNext) {
          currentMatrix.reverse #:: Stream.empty // rows were prepended, so restore input order
        }else{
          val (line, number) = input.next()
          if (line.size == 1) {
            currentMatrix.reverse #:: matrix(input, line.head, List.empty)
          }else{
            //if (numberOfColumns != line.size) println(...)
            matrix(input, numberOfColumns, line :: currentMatrix)
          }
        }
      }
      matrix(parsedInput,0,List()).drop(1)
    }


-1

Even without regular expressions:

for (i <- 0 to 3) {
  matrix(i) = line.split(" ")
}

1 Comment

But then I only get the first 4 numbers, it doesn't iterate over the entire line. And it also has some problems with newlines... :-/ for(i <- 0 to 3) { puzzle.clues(i) = lines.split(" ").map(_.toInt); } results in every row like "1231"
-1

Regular expressions can help.

val space = " ".r
val arrayOfNumbersAsStrings = space.split(line)
val arrayOfNumbersAsInts = arrayOfNumbersAsStrings.map(_.toInt)

UPD

val arrayOfNumbersAsStrings = line.split(' ')
val arrayOfNumbersAsInts = arrayOfNumbersAsStrings.map(_.toInt)

4 Comments

This just exemplifies how to split a line and convert the resulting items to Ints. It does not, however, attempt to solve the OP's problem. Also, line.split(" ") will do the job; regular expressions are not needed at all here.
Don't you know that line.split(" ") uses regular expression? And it recompiles the pattern " " every time. However, if you suggest to use line.split(' ') then yes, that would be much faster.
OK, line.split(' ') then—my point was that mentioning regexps in a simple case like this seems like an overkill :)
Well, creating a state machine just for reading a couple of ints ... Regular expressions are a powerful tool for any sort of parsing. If @timmyc31 had a slightly different format, he could easily parse it with regexes.
