What is wrong with my use of XPath in C#?

Question

I'm trying to do a bit of scraping in a c# application.

I am trying to access 4 pieces of information on the following page: https://smstestbed.nist.gov/vds/current

CreationTime
Availibility
Linear X and Y coords

The following function is where I am polling a live data feed from a remote machining tool. The problem I have is that whilst I have been able to print 'CreationTime' to a terminal, my XPath use is horrifically clunky and as far as This Link seems to suggest I should be able to do what I am doing in the 2 lines after my comment

"//This should be a far better way of accessing the data but for some reason the second line fails"

Unfortunately I am getting AvailabilityNode was Null.

public static void PollNIST()
    {
        string NISTSourceURL = "https://smstestbed.nist.gov/vds/current";  // Gives us a human friendly reference to the HTM
        //-------------------------------- Current (mostly) Working Version---------------------------------------------------------------------------------
        // Retrieve raw HTML
        var NISTTargetURL = NISTSourceURL;
        var NISTHttpClient = new HttpClient();
        var NISTXMLRaw = NISTHttpClient.GetStringAsync(NISTTargetURL);  // We now have all of the HTML / XML Data as a raw string
                                                                        //Console.WriteLine(MazXMLRaw.Result);                   // Prints the resulting HTML to a terminal as a debug tool    (Works)   
        XmlDocument CurNISTXML = new XmlDocument();               // Generate Blank XML Doc
        CurNISTXML.LoadXml(NISTXMLRaw.Result);                     // This (".result") passes the actual string?, should then be loaded into new XML file

        var elementHeader = CurNISTXML.GetElementsByTagName("Header");
        var curNISTHeader = elementHeader.Item(0);
        var creationTime = curNISTHeader.Attributes[0];  // We actually have the creationTime            
        string CurNISTTime = creationTime.InnerText; ; //      //*[@id="mtconnect content"]/ul/li[1]

        //This should be a far better way of accessing the data but for some reason the second line fails
        XmlNode AvailabilityNode = CurNISTXML.SelectSingleNode("/table[1]/tbody/tr[1]");  //*[@id="mtconnect content"]/table[1]/tbody/tr[1]/td[7] // Xpath Availability
        var CurNISTStatus = AvailabilityNode.InnerText; //      //*[@id="mtconnect content"]/ul/li[1]


        string CurNistX = ""; //      //*[@id="mtconnect content"]/table[5]/tbody/tr/td[7]
        string CurNistY = ""; //      //*[@id="mtconnect content"]/table[6]/tbody/tr/td[7]

        Console.WriteLine("-------BEGIN NIST DATA PACKET-------");
        Console.WriteLine("NIST Time  : " + creationTime.InnerText);
        Console.WriteLine("NIST Status: " + CurNISTStatus);    
        Console.WriteLine("NIST X Pos.: " + CurNistX);
        Console.WriteLine("NIST Y Pos.: " + CurNistY);
        Console.WriteLine("--------END NIST DATA PACKET--------");

        //var currentNIST = new NISTDataSet()// Create new instance ofNISTdata object
    }

Any ideas?

YOu are trying to parse an html webpage using xml. YOu are using the wrong URL. The data is avaiable as XML but you need to use s different URL. See : nist.gov/programs-projects/materials-data-curation-system — jdweng
– jdweng, Commented Nov 6, 2018 at 10:51
Are you sure? If I print the XML doc to console it's all there, and creationtime works just fine. — GigaJoules
– GigaJoules, Commented Nov 6, 2018 at 10:56
This is my first time writing c# so I'm getting stuck with things that are probably quite simple — GigaJoules
– GigaJoules, Commented Nov 6, 2018 at 11:07
The timestamp is gained only using the link given in the first line of the method — GigaJoules
– GigaJoules, Commented Nov 6, 2018 at 11:27

Michael Kay · Accepted Answer · 2018-11-06 12:25:25Z

1

The XPath expression

/table[1]/tbody/tr[1]

will succeed only if the outermost element of the document is a table element, which seems unlikely. I haven't tried to understand the logic of the page or of your code, but this definitely looks wrong. "/" at the start of a path expression selects from the root of the tree.

answered Nov 6, 2018 at 12:25

Michael Kay

165k11 gold badges97 silver badges173 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

GigaJoules Over a year ago

Yeah I though that, I've tried several different things there which is why I think that single slash is there

Mate Mrše Over a year ago

@GigaJoules Does '//table[1]/tbody/tr[1]' select what you wanted? It is unclear to me which element you are trying to select.

Michael Kay Over a year ago

@GigaJoules We see a lot of questions where people have scattered random punctuation around their XPath expressions in the hope that it will act as magic fairy dust. It's rarely an effective strategy. Save yourself time, read the manual.

GigaJoules Over a year ago

I'm looking to pull the word 'available' from the top right cell of the first table, and the 'value' number of tables 'linear x' and 'linear y'

GigaJoules Over a year ago

Going for the attribute ID was a far better option in the end, as each element has a unique identifier and only occurs once.

GigaJoules · Accepted Answer · 2018-11-13 12:42:07Z

So it turns out there was nothing wrong with how I was extracting the XML, only with my Paths.

public static void PollNIST()
        {
            string NISTSourceURL = "https://smstestbed.nist.gov/vds/current";  // Gives us a human friendly reference to the HTMl
            // string NistXmlUrl = // Someone on stackexchange is claiming that there is another url for the XML but viewsource says otherwise 
            //-------------------------------- Current (mostly) Working Version---------------------------------------------------------------------------------
            var NISTHttpClient = new HttpClient();
            var NISTXMLRaw = NISTHttpClient.GetStringAsync(NISTSourceURL);  // We now have all of the HTML / XML Data as a raw string
                                                                            //Console.WriteLine(MazXMLRaw.Result);                   // Prints the resulting HTML to a terminal as a debug tool    (Works)   
            XmlDocument CurNISTXML = new XmlDocument();               // Generate Blank XML Doc
            CurNISTXML.LoadXml(NISTXMLRaw.Result);                     // This (".result") passes the actual string?, should then be loaded into new XML file

            // Get CreationTime (WORKING!)
            XmlNodeList elementHeader = CurNISTXML.GetElementsByTagName("Header");
            XmlNode curNISTHeader = elementHeader.Item(0);
            XmlAttribute creationTime = curNISTHeader.Attributes[0];  // We now have the creationTime element          
            string CurNISTTime = creationTime.InnerText;  //      //*[@id="mtconnect content"]/ul/li[1]

            // Get availability (WORKING!)
            XmlNodeList nodeAvailability = CurNISTXML.GetElementsByTagName("Availability");
            XmlNode availability = nodeAvailability.Item(0); // I think this is maybe a bit of a hackish / improper way to do this?
            string curNISTStatus = availability.InnerText;

            //Get linear tool X Coord.
            XmlNodeList deviceStream = CurNISTXML.GetElementsByTagName("ComponentStream");
            XmlNode linearCompXStream = deviceStream.Item(4);
            string curNISTX = linearCompXStream.InnerText; //  We do not need to break down the nodes any further as the value is the only text within

            //Get Linear tool y Coord.            
            XmlNode linearCompYStream = deviceStream.Item(5);
            string curNISTY = linearCompYStream.InnerText; //  We do not need to break down the nodes any further as the value is the only text within


            Console.WriteLine("-------BEGIN NIST DATA PACKET-------");
            Console.WriteLine("NIST Time  : " + creationTime.InnerText);
            Console.WriteLine("NIST Status: " + curNISTStatus);    
            Console.WriteLine("NIST X Pos.: " + curNISTX);
            Console.WriteLine("NIST Y Pos.: " + curNISTY);
            Console.WriteLine("--------END NIST DATA PACKET--------");

            //var currentNIST = new NISTDataSet()// Create new instance ofNISTdata object
        }

works nicely.

Collectives™ on Stack Overflow

What is wrong with my use of XPath in C#?

2 Answers 2

5 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

5 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related