Update

I want an expression (XPath, regex, or similar) that can match an XML element in a particular namespace. For example, I want to retrieve the value of the link element (i.e. the http://url inside <b:link>http://url</b:link>) shown below. However, the namespace prefix varies between XML files, as shown in cases 1-3.

Given the characters allowed in a namespace prefix (e.g. is any character valid?), could anyone provide a solution (XPath, regex, or similar)?

Please note that because the XML file is unknown, the namespace and prefix are unknown until runtime. Does that mean I cannot use XDocument/XmlDocument, because they require the namespace to be known in the code?

Update

Case 1

<A xmlns:b="link">
<b:link>http://url
</b:link>
</A>

Case 2

<A xmlns="link">
<link>http://url
</link>
</A>

Case 3

<A xmlns:a123="link">
<a123:link>http://url
</a123:link>
</A>

Please note that the URL within the link element could be any HTTP URL and is unknown until runtime.

2 Answers

You need to know the namespaces you will be dealing with and register them with an XmlNamespaceManager. Here is an example:

    XmlDocument doc = new XmlDocument();
    doc.LoadXml("<A xmlns:b='link'><b:Books /></A>");
    XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
    nsmgr.AddNamespace("b", "link");

    XmlNodeList books = doc.SelectNodes("//b:Books", nsmgr);

And if you want to do this using XDocument, which I would recommend for its brevity, here is how:

    XDocument xDoc = XDocument.Parse("<A xmlns:b='link'><b:Books /></A>");
    XNamespace ns = "link";
    var books = xDoc.Descendants(ns + "Books");

If you do not know the namespace(s) ahead of time, you can query across an XDocument using only the local name of each element, which ignores both the prefix and the namespace URI. Here's an example:

    XDocument xDoc = XDocument.Parse("<A xmlns:b='link'><b:Books /></A>");
    var books = xDoc.Descendants().Where(e => e.Name.LocalName.ToLower() == "books");
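Applied to the link element from the question, the same local-name idea might look like the sketch below. The class and method names (`LinkFinder`, `FromXDocument`, `FromXPath`) are made up for illustration. The second method shows an XPath equivalent, using the standard XPath 1.0 `local-name()` function so that no XmlNamespaceManager is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class for illustration; not part of the original answer.
public static class LinkFinder
{
    // LINQ to XML: match on the local name only, ignoring prefix and namespace URI.
    public static List<string> FromXDocument(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants()
            .Where(e => e.Name.LocalName == "link")
            .Select(e => e.Value.Trim())   // Trim drops the newline after the URL
            .ToList();
    }

    // XPath 1.0: local-name() compares the element name without its prefix,
    // so no namespace has to be registered up front.
    public static List<string> FromXPath(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes("//*[local-name()='link']")
            .Cast<XmlNode>()
            .Select(n => n.InnerText.Trim())
            .ToList();
    }
}
```

Both methods should return the URL for all three cases in the question, whether the element is written as `<b:link>`, `<link>` with a default namespace, or `<a123:link>`. If the namespace URI does become known at runtime, the filter could additionally check `e.Name.NamespaceName` to avoid matching unrelated elements that happen to be called link.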

2 Comments

Because the XML file is unknown, the namespace and prefix are unknown until runtime. Does that mean I cannot use this approach?
Please see my updated answer; it should work when you don't know the namespaces. XDocument is a nice API for processing XML.

Use an XML parser, not a regex.

That being said, you could use:

<(?:(.+?):)?Books />

And the namespace prefix would be in capture group 1.

In fact, I'd more strongly recommend you use

<(?:([^<>]+?):)?Books />

To prevent mistakes like matching across another set of XML tags (who would use <> in a namespace prefix anyway?!)
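For the link element with text content, as in the question's cases, a regex along the same lines might look like the sketch below. It is illustrative only: `LinkRegex` and `ExtractFirst` are hypothetical names, and the pattern assumes the element has no attributes, no nested elements, and no CDATA or comments. A namespace prefix is an NCName, so it cannot contain a colon, whitespace, `<`, or `>`; the prefix character class here is deliberately looser than the full NCName rules.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; an XML parser remains the safer choice.
public static class LinkRegex
{
    // Group 1: optional namespace prefix. Group 2: element text, whitespace-trimmed.
    private static readonly Regex Pattern = new Regex(
        @"<(?:([^<>\s:]+):)?link\s*>\s*([^<]*?)\s*</(?:[^<>\s:]+:)?link\s*>");

    // Returns the text of the first link element, or null if none is found.
    public static string ExtractFirst(string xml)
    {
        Match m = Pattern.Match(xml);
        return m.Success ? m.Groups[2].Value : null;
    }
}
```

Note that this does not check that the closing tag uses the same prefix as the opening tag, and it does not decode entities such as `&amp;` inside the URL.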

4 Comments

If you have XDocument/XmlDocument, then regex is entirely unnecessary; that's the point of such parsers. I'm not a C# guru, so someone else will have to help you with using those classes.
I changed the element to <a123:link>url here </a123:link> rather than <a123:link />. What changes need to be made? Please refer to the cases in my updated post. Thanks.
What do you want to extract? The URL? (Your question still says you want Book tags.) I'll give you a regex to do that if you want, but I'd recommend @GemCer's answer over mine, as it is much, much better for parsing XML than regex.
Thanks! @GemCer has a better solution.
