I have a txt file with the following sample data:

host{
      Entry {
          id: "foo"
      }
       Entry {
          id: "bar"
      }
    }

port{
      Entry {
          id: "lorem"
      }
       Entry {
          id: "ipsum"
      }
    }

It has 300+ of those Entry values. I'd like to read the file and extract the id values belonging to the port section. It's not valid JSON, so I can't use the json decoder. Is there any other way of extracting the values?

2 Comments

  • The text file in question has multiple id values, but I'm only interested in the ones in the port section. Commented May 5, 2015 at 12:08
  • If it's more complex than this, try writing a parser/lexer: blog.gopheracademy.com/advent-2014/parsers-lexers Commented May 5, 2015 at 16:41

2 Answers

If the structure is the same throughout and all you want is the id values, you can do something like this (on the Playground):

package main

import (
    "fmt"
    "strings"
)

func main() {
    // This will work only if ids don't have spaces
    fields := strings.Fields(input1)
    for i, field := range fields {
        if field == "id:" {
            fmt.Println("Got an id: ", fields[i+1][1:len(fields[i+1])-1])
        }
    }
    fmt.Println()

    // This will extract all strings enclosed in ""
    for i1 := 0; ; {
        i := strings.Index(input2[i1:], "\"") // find the next opening " after the last match
        if i < 0 { // no more quotes, so we are done
            break
        }
        i1 += i + 1 // absolute position just past the opening "
        i2 := strings.Index(input2[i1:], "\"") // find the closing "
        if i2 < 0 { // unterminated string, stop here
            break
        }
        fmt.Println(input2[i1 : i1+i2]) // print the string between the quotes
        i1 += i2 + 1 // continue after the closing "
    }


    // Reading the text line by line and only processing port sections
    parts := []string{"port{", "  Entry {", "      id: \"foo bar\"", "  }", "   Entry {", "      id: \"more foo bar\"", "  }", "}"}
    isPortSection := false
    for _, part := range parts {
        // A section header switches port processing on or off
        if strings.HasPrefix(part, "port") {
            isPortSection = true
        }
        if strings.HasPrefix(part, "host") {
            isPortSection = false
        }
        if isPortSection && strings.HasPrefix(strings.TrimSpace(part), "id:") {
            line := strings.TrimSpace(part)
            // Skip the leading `id: "` (5 bytes) and the trailing quote
            fmt.Println(line[5 : len(line)-1])
        }
    }
}

var input1 string = `port{
  Entry {
      id: "foo"
  }
   Entry {
      id: "bar"
  }
}`

var input2 string = `port{
  Entry {
      id: "foo bar"
  }
   Entry {
      id: "more foo bar"
  }
}`

Prints:

Got an id:  foo
Got an id:  bar

foo bar
more foo bar
foo bar
more foo bar

Instead of printing the ids in the loop, you can append them to a slice, store them in a map, or do whatever else you need. And of course, instead of using the string literals, you would read the lines from your file.

6 Comments

Simple solution, but only works if the value of the ids does not contain spaces.
Good point. Added a second solution that will work with ids containing spaces as well.
Is there a way to do the above when you have multiple id values but you're only interested in the ones contained in the "port" section?
Sure, how you'd do that depends on the format your strings are in. If you have each section as a string like in my sample, you can simply test with if strings.HasPrefix(input, "port"). Process the string when true and skip it otherwise.
Not sure how it would work with the sample text I've provided..care to share?

I believe text/scanner might be very useful here. It's not plug&play, but it will let you tokenise the input and will parse your strings nicely (spaces, escaped values etc.). A quick proof of concept: a scanner with a simple state machine that captures every id: {str} pattern whose outermost section is port:

package main

import (
    "fmt"
    "strings"
    "text/scanner"
)

// src is a shortened copy of the sample data from the question
const src = `host{
    Entry {
        id: "foo"
    }
}
port{
    Entry {
        id: "lorem"
    }
    Entry {
        id: "ipsum"
    }
}`

// Keep state of parsing process
const (
    StateNone = iota
    StateID
    StateIDColon
)

func main() {
    var s scanner.Scanner
    s.Init(strings.NewReader(src))

    state := StateNone
    lastToken := ""        // last token text
    sections := []string{} // section stack

    for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
        txt := s.TokenText()
        switch txt {
        case "id":
            if state == StateNone {
                state = StateID
            } else {
                state = StateNone
            }
        case ":":
            if state == StateID {
                state = StateIDColon
            } else {
                state = StateNone
            }
        case "{":
            // Add section
            sections = append(sections, lastToken)
        case "}":
            // Remove section
            if len(sections) > 0 {
                sections = sections[:len(sections)-1]
            }
        default:
            // The length check avoids a panic for an id outside any section
            if state == StateIDColon && len(sections) > 0 && sections[0] == "port" {
                // Our string is here; txt is the raw token, quotes included
                fmt.Println(txt)
            }
            state = StateNone
        }
        lastToken = txt
    }
}

You can play with it here. This surely requires some more work if you need to validate the input structure etc., but it seems like a good starting point to me.

4 Comments

thanks for the contrib...I'm not a fan of "magic numbers" which is what I see in your code, what does "state=1, state=2, state = 0" mean? makes it quite hard for someone to look at the code and figure out what it's doing or not doing..
Sure, I just wanted to produce a more compact example... or I was simply lazy. I've updated my answer (and play link) with constants which should better explain the behaviour. As I said, this is not validating the structure of your data, so IDs lying outside of your Entry blocks will be parsed as well. You'd need more states to be able to validate that.
cool..thanks..what I actually now need is way to extract id values from only one section..
You will need to keep track of the current section stack (tracing curly brackets). I've updated my answer for you, so you can, for example, take only IDs where first section on stack is port.
