I have a [textual] tree like this:

+---step-1
|   +---step_2
|   |   +---step3
|   |   \---step4
|   +---step_2.1
|   \---step_2.2
+---step1.2

Tree2

+---step-1
|   \---step_2
|   |   +---step3
|   |   \---step4
+---step1.2

This is just a small example; the real tree can be deeper and have more children.

Right now I'm doing this:

for (int i = 0; i < cmdOutList.Count; i++)
{
    string s = cmdOutList[i];
    String value = Regex.Match(s, @"(?<=\---).*").Value;
    value = value.Replace("\r", "");
    if (s[1].ToString() == "-")
    {
        DirectoryNode p = new DirectoryNode { Name = value };
        //p.AddChild(f);
        directoryList.Add(p);
    }
    else
    {
        DirectoryNode f = new DirectoryNode { Name = value };
        directoryList[i - 1].AddChild(f);
        directoryList.Add(f);
    }
}

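But this doesn't handle "step_2.1" and "step_2.2": each non-root node is attached to whatever node came on the previous line, so they are not added as children of step-1.

I think I'm doing this totally wrong. Maybe someone can help me out with this.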

EDIT:

Here is the DirectoryNode class, to make that a bit clearer:

public class DirectoryNode
{
    public DirectoryNode()
    {
        this.Children = new List<DirectoryNode>();
    }
    public DirectoryNode ParentObject { get; set; }
    public string Name;
    public List<DirectoryNode> Children { get; set; }

    public void AddChild(DirectoryNode child)
    {
        child.ParentObject = this;
        this.Children.Add(child);
    }
}
  • I think your tree is ambiguous, or I misunderstand some part. Step3 is followed by step4 after a \---, but when the same happens for step2.1 we get step2.2? Commented Feb 10, 2011 at 15:19
  • Names in the tree are irrelevant; think of them as random folder names. Commented Feb 10, 2011 at 15:23
  • If there were a step_2.3 in your example, would it be preceded by +--- or \---? Commented Feb 10, 2011 at 15:31
  • Justin, if there were a step_2.3, it would be preceded by \--- and step_2.2 would be preceded by +---. Commented Feb 10, 2011 at 15:35

2 Answers

If your text is that simple (just either +--- or \--- preceded by a series of |), then a regex might be more than you need (and what's tripping you up).

DirectoryNode currentParent = null;
DirectoryNode current = null;
int lastStartIndex = 0;

foreach(string temp in cmdOutList)
{
    string line = temp;

    // Position of the branch marker: one of the two is -1, so Max picks the one present.
    int startIndex = Math.Max(line.IndexOf("+"), line.IndexOf(@"\"));

    line = line.Substring(startIndex);

    if(startIndex > lastStartIndex) 
    {
        currentParent = current;
    }
    else if(startIndex < lastStartIndex)
    {
        for(int i = 0; i < (lastStartIndex - startIndex) / 4; i++)
        {
            if(currentParent == null) break;

            currentParent = currentParent.ParentObject;
        }
    }

    lastStartIndex = startIndex;

    current = new DirectoryNode() { Name = line.Substring(4) };

    if(currentParent != null)
    {
        currentParent.AddChild(current);
    }
    else
    {
        directoryList.Add(current);
    }
}

11 Comments

OK, I was testing it and it breaks when the tree looks like Tree2: step1.2 is still added as a child of step-1.
@value: Tree-2 looks to be improperly formed (at least based on the rules I wrote this using). Shouldn't +---step_2 be \---step_2, since it's the last node under its parent?
@Adam, yes, sorry about that. It was just a typo in the tree and has nothing to do with the problem.
@value: I see. See if the edit I just made takes care of the issue. I'm not in front of VS right now, so I can't test it myself, but it's a fairly minor change.
@Adam, the change you made breaks the code for the first tree example. It won't add step_2.1 and step_2.2 as children of step-1.

Regex definitely looks unnecessary here, since the symbols in your markup language (that's what it is, after all) are both static and few. That is: although the label names may vary, the tokens you need to look for when parsing them into relevant pieces will never be anything other than +---, \---, and |.

From a question I answered yesterday: "Regexes are extremely useful for describing a whole class of needles in a largely unknown haystack, but they're not the right tool for input that's in a very static format."

String manipulation is what you want for parsing this, especially since you're dealing with a recursive markup language, which can't be fully understood by regex anyway. I'd also suggest creating a tree-type data structure to store the data (which, surprisingly, doesn't seem to be included in the framework unless they added it after 2.0).
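As a rough illustration of the string-manipulation approach, here is a minimal sketch. It assumes every nesting level is exactly four characters wide and that each node line contains a +--- or \--- marker, as in the question's examples. TreeParser and Parse are hypothetical names, and DirectoryNode is a compact copy of the class from the question, repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Compact copy of the question's DirectoryNode, included for self-containment.
public class DirectoryNode
{
    public DirectoryNode ParentObject { get; set; }
    public string Name;
    public List<DirectoryNode> Children { get; } = new List<DirectoryNode>();

    public void AddChild(DirectoryNode child)
    {
        child.ParentObject = this;
        Children.Add(child);
    }
}

// Hypothetical helper, not part of any framework.
public static class TreeParser
{
    // Returns the top-level nodes; children hang off them via AddChild.
    public static List<DirectoryNode> Parse(IEnumerable<string> lines)
    {
        var roots = new List<DirectoryNode>();
        // path[d] is the most recent node seen at depth d.
        var path = new List<DirectoryNode>();

        foreach (string raw in lines)
        {
            string line = raw.TrimEnd('\r', '\n');

            // Find the first branch marker, whichever kind it is.
            int marker = line.IndexOf("+---", StringComparison.Ordinal);
            int alt = line.IndexOf(@"\---", StringComparison.Ordinal);
            if (marker < 0 || (alt >= 0 && alt < marker)) marker = alt;
            if (marker < 0) continue; // not a node line

            // Assumption: each level is indented by exactly 4 characters.
            int depth = marker / 4;
            if (depth > path.Count) depth = path.Count; // tolerate odd indentation

            var node = new DirectoryNode { Name = line.Substring(marker + 4) };

            // Forget everything at this depth or deeper, then attach to the parent.
            path.RemoveRange(depth, path.Count - depth);
            if (depth == 0) roots.Add(node);
            else path[depth - 1].AddChild(node);

            path.Add(node);
        }
        return roots;
    }
}
```

Calling TreeParser.Parse(cmdOutList) on the first tree from the question should give two roots, step-1 and step1.2, with step_2, step_2.1 and step_2.2 under step-1. Keeping the current path in a list means the parent is looked up from the line's depth directly, not inferred from the previous line.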

As an aside, your regex above seems to have an unnecessary \ in it, but that doesn't matter in most regex flavors.
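To make the aside concrete for .NET specifically: outside a character class, an escaped hyphen simply matches a literal hyphen, so the two patterns below should behave identically. This is a small sketch; RegexAside and its method names are hypothetical and exist only for the comparison.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper used only to compare the two patterns.
public static class RegexAside
{
    // The pattern from the question, with the escaped hyphen.
    public static string NameWithEscape(string line)
    {
        return Regex.Match(line, @"(?<=\---).*").Value;
    }

    // The same pattern without the backslash.
    public static string NameWithoutEscape(string line)
    {
        return Regex.Match(line, @"(?<=---).*").Value;
    }
}
```

Both methods return the text that follows the first run of three hyphens, which is the node name for lines shaped like the ones in the question.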
