I'm trying to do a bit of scraping in a C# application.

I am trying to access 4 pieces of information on the following page: https://smstestbed.nist.gov/vds/current

  • CreationTime
  • Availability
  • Linear X and Y coords

The following function polls a live data feed from a remote machining tool. While I have been able to print 'CreationTime' to a terminal, my XPath use is horrifically clunky, and as far as this link seems to suggest, I should be able to do what I am attempting in the two lines after this comment:

"//This should be a far better way of accessing the data but for some reason the second line fails"

Unfortunately, AvailabilityNode comes back null.

public static void PollNIST()
    {
        string NISTSourceURL = "https://smstestbed.nist.gov/vds/current";  // Gives us a human friendly reference to the HTM
        //-------------------------------- Current (mostly) Working Version---------------------------------------------------------------------------------
        // Retrieve raw HTML
        var NISTTargetURL = NISTSourceURL;
        var NISTHttpClient = new HttpClient();
        var NISTXMLRaw = NISTHttpClient.GetStringAsync(NISTTargetURL);  // Task<string> that will hold all of the HTML / XML data as a raw string
        //Console.WriteLine(NISTXMLRaw.Result);                     // Prints the resulting HTML to a terminal as a debug tool (works)
        XmlDocument CurNISTXML = new XmlDocument();               // Generate blank XML doc
        CurNISTXML.LoadXml(NISTXMLRaw.Result);                     // .Result blocks until the download finishes and returns the string, which is then parsed into the XML doc

        var elementHeader = CurNISTXML.GetElementsByTagName("Header");
        var curNISTHeader = elementHeader.Item(0);
        var creationTime = curNISTHeader.Attributes[0];  // We actually have the creationTime            
        string CurNISTTime = creationTime.InnerText; //      //*[@id="mtconnect content"]/ul/li[1]

        //This should be a far better way of accessing the data but for some reason the second line fails
        XmlNode AvailabilityNode = CurNISTXML.SelectSingleNode("/table[1]/tbody/tr[1]");  //*[@id="mtconnect content"]/table[1]/tbody/tr[1]/td[7] // Xpath Availability
        var CurNISTStatus = AvailabilityNode.InnerText; //      //*[@id="mtconnect content"]/ul/li[1]


        string CurNistX = ""; //      //*[@id="mtconnect content"]/table[5]/tbody/tr/td[7]
        string CurNistY = ""; //      //*[@id="mtconnect content"]/table[6]/tbody/tr/td[7]

        Console.WriteLine("-------BEGIN NIST DATA PACKET-------");
        Console.WriteLine("NIST Time  : " + creationTime.InnerText);
        Console.WriteLine("NIST Status: " + CurNISTStatus);    
        Console.WriteLine("NIST X Pos.: " + CurNistX);
        Console.WriteLine("NIST Y Pos.: " + CurNistY);
        Console.WriteLine("--------END NIST DATA PACKET--------");

        //var currentNIST = new NISTDataSet()// Create new instance ofNISTdata object
    }

Any ideas?

  • You are trying to parse an HTML web page as XML, and you are using the wrong URL. The data is available as XML, but you need to use a different URL. See: nist.gov/programs-projects/materials-data-curation-system Commented Nov 6, 2018 at 10:51
  • Are you sure? If I print the XML doc to console it's all there, and creationtime works just fine. Commented Nov 6, 2018 at 10:56
  • This is my first time writing C#, so I'm getting stuck on things that are probably quite simple. Commented Nov 6, 2018 at 11:07
  • What XML link are you using? What you posted is only HTML. Commented Nov 6, 2018 at 11:25
  • The timestamp is obtained using only the link given in the first line of the method. Commented Nov 6, 2018 at 11:27

2 Answers

The XPath expression

/table[1]/tbody/tr[1]

will succeed only if the outermost element of the document is a table element, which seems unlikely. I haven't tried to understand the logic of the page or of your code, but this definitely looks wrong. "/" at the start of a path expression selects from the root of the tree.
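
The difference between a leading "/" and a leading "//" can be checked with a small sketch. The sample document, the class name XPathRootDemo, and the helper SelectText are all hypothetical and only stand in for the real page, which is not reproduced here; the point is only that the rooted path finds nothing while the descendant search does.

```csharp
using System;
using System.Xml;

public static class XPathRootDemo
{
    // Hypothetical sample document; NOT the real NIST payload.
    public const string SampleXml =
        "<page><body><table><tbody><tr><td>AVAILABLE</td></tr></tbody></table></body></page>";

    // Returns the InnerText of the first node matching xpath, or null when nothing matches.
    public static string SelectText(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node?.InnerText;
    }

    public static void Main()
    {
        // A leading "/" starts at the root, so <table> would have to be the
        // outermost element. Here it is not, and the result is null.
        Console.WriteLine(SelectText(SampleXml, "/table[1]/tbody/tr[1]") ?? "(null)");

        // A leading "//" searches the descendants of the root instead.
        Console.WriteLine(SelectText(SampleXml, "//table[1]/tbody/tr[1]") ?? "(null)");
    }
}
```

With this sample, the first call yields null and the second yields the cell text. Whether "//table[1]/..." matches anything in the real feed depends on whether the payload actually contains table elements, which this sketch does not establish.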

5 Comments

Yeah, I thought that. I've tried several different things there, which is why I think that single slash is there.
@GigaJoules Does '//table[1]/tbody/tr[1]' select what you wanted? It is unclear to me which element you are trying to select.
@GigaJoules We see a lot of questions where people have scattered random punctuation around their XPath expressions in the hope that it will act as magic fairy dust. It's rarely an effective strategy. Save yourself time, read the manual.
I'm looking to pull the word 'available' from the top right cell of the first table, and the 'value' number of tables 'linear x' and 'linear y'
Going for the attribute ID was a far better option in the end, as each element has a unique identifier and only occurs once.
0

So it turns out there was nothing wrong with how I was extracting the XML, only with my XPath expressions.

public static void PollNIST()
        {
            string NISTSourceURL = "https://smstestbed.nist.gov/vds/current";  // Gives us a human-friendly reference to the HTML
            // string NistXmlUrl = // Someone on Stack Exchange is claiming that there is another URL for the XML, but view-source says otherwise
            //-------------------------------- Current (mostly) Working Version---------------------------------------------------------------------------------
            var NISTHttpClient = new HttpClient();
            var NISTXMLRaw = NISTHttpClient.GetStringAsync(NISTSourceURL);  // Task<string> that will hold all of the HTML / XML data as a raw string
            //Console.WriteLine(NISTXMLRaw.Result);                     // Prints the resulting HTML to a terminal as a debug tool (works)
            XmlDocument CurNISTXML = new XmlDocument();               // Generate blank XML doc
            CurNISTXML.LoadXml(NISTXMLRaw.Result);                     // .Result blocks until the download completes, then the string is parsed into the XML doc

            // Get CreationTime (WORKING!)
            XmlNodeList elementHeader = CurNISTXML.GetElementsByTagName("Header");
            XmlNode curNISTHeader = elementHeader.Item(0);
            XmlAttribute creationTime = curNISTHeader.Attributes[0];  // We now have the creationTime attribute (relies on it being the first attribute of Header)
            string CurNISTTime = creationTime.InnerText;  //      //*[@id="mtconnect content"]/ul/li[1]

            // Get availability (WORKING!)
            XmlNodeList nodeAvailability = CurNISTXML.GetElementsByTagName("Availability");
            XmlNode availability = nodeAvailability.Item(0); // I think this is maybe a bit of a hackish / improper way to do this?
            string curNISTStatus = availability.InnerText;

            //Get linear tool X Coord.
            XmlNodeList deviceStream = CurNISTXML.GetElementsByTagName("ComponentStream");
            XmlNode linearCompXStream = deviceStream.Item(4);
            string curNISTX = linearCompXStream.InnerText; //  We do not need to break down the nodes any further as the value is the only text within

            //Get Linear tool y Coord.            
            XmlNode linearCompYStream = deviceStream.Item(5);
            string curNISTY = linearCompYStream.InnerText; //  We do not need to break down the nodes any further as the value is the only text within


            Console.WriteLine("-------BEGIN NIST DATA PACKET-------");
            Console.WriteLine("NIST Time  : " + creationTime.InnerText);
            Console.WriteLine("NIST Status: " + curNISTStatus);    
            Console.WriteLine("NIST X Pos.: " + curNISTX);
            Console.WriteLine("NIST Y Pos.: " + curNISTY);
            Console.WriteLine("--------END NIST DATA PACKET--------");

            //var currentNIST = new NISTDataSet()// Create new instance ofNISTdata object
        }

This works nicely.
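
As noted in the comments on the other answer, selecting by a unique attribute ID is less brittle than indexing into ComponentStream by position. Below is a rough sketch of that idea. The sample XML, the namespace URI, the attribute name dataItemId, the ids avail and x_pos, and the helpers ValueById and CreationTime are all assumptions made for illustration, not the real feed's schema. The sketch also assumes the document declares a default namespace, in which case an unprefixed XPath name such as //Availability matches nothing, even though GetElementsByTagName still works.

```csharp
using System;
using System.Xml;

public static class AttributeSelectDemo
{
    // Hypothetical sample; element names, ids, and the namespace URI are made up.
    public const string SampleXml =
        "<Streams xmlns='urn:example:streams'>" +
        "<Header creationTime='2018-11-06T11:00:00Z' />" +
        "<ComponentStream name='base'>" +
        "<Availability dataItemId='avail'>AVAILABLE</Availability>" +
        "</ComponentStream>" +
        "<ComponentStream name='X'>" +
        "<Position dataItemId='x_pos'>12.5</Position>" +
        "</ComponentStream>" +
        "</Streams>";

    public static XmlDocument Load(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    // Text of the element whose dataItemId attribute equals id, or null if absent.
    public static string ValueById(XmlDocument doc, string id)
    {
        // "*" matches elements in any namespace, and unprefixed attributes
        // are in no namespace, so no namespace manager is needed here.
        XmlNode node = doc.SelectSingleNode("//*[@dataItemId='" + id + "']");
        return node?.InnerText;
    }

    // Reads Header/@creationTime by name rather than by attribute position.
    public static string CreationTime(XmlDocument doc)
    {
        // Bind a prefix (here "m", an arbitrary choice) to the document's default namespace.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("m", doc.DocumentElement.NamespaceURI);
        XmlNode attr = doc.SelectSingleNode("/*/m:Header/@creationTime", ns);
        return attr?.Value;
    }

    public static void Main()
    {
        XmlDocument doc = Load(SampleXml);
        Console.WriteLine("Status: " + ValueById(doc, "avail"));
        Console.WriteLine("X Pos.: " + ValueById(doc, "x_pos"));
        Console.WriteLine("Time  : " + CreationTime(doc));
        // The unprefixed name does not match the namespaced element.
        Console.WriteLine(doc.SelectSingleNode("//Availability") == null);
    }
}
```

Looking values up by ID means the code keeps working if the feed adds or reorders component streams, whereas Item(4) and Item(5) would silently point at different data. The ids to use would have to be read from the actual XML.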
