When I view the source of a page and use Ctrl+F to search for "id=", I get 82 results. I want to extract only the number that follows each "id=". For example, if the attribute is id=344, I want just the 344 as a string, added to the List.

My current approach doesn't return what I expected. I thought I would get all the links this way and filter them afterwards, but I'm getting empty strings and bits of text, none of which is what I wanted. I guess using InnerText is wrong.

idsnumbers = new List<string>();
HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = hw.Load("http://www.tapuz.co.il/forums2008/");
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    idsnumbers.Add(link.InnerText);
}

Update: I'm now getting a NullReferenceException:

System.NullReferenceException was unhandled
  _HResult=-2147467261
  _message=Object reference not set to an instance of an object.
  HResult=-2147467261
  IsTransient=false
  Message=Object reference not set to an instance of an object.
  Source=WindowsFormsApplication1
  StackTrace:
       at WindowsFormsApplication1.Form1..ctor() in d:\C-Sharp\Tapuz Images\WindowsFormsApplication1\WindowsFormsApplication1\Form1.cs:line 50
       at WindowsFormsApplication1.Program.Main() in d:\C-Sharp\Tapuz Images\WindowsFormsApplication1\WindowsFormsApplication1\Program.cs:line 19
       at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException: 

1 Answer

You should read the ids from the attributes. InnerText returns only the text inside the element, between its opening and closing tags. So:

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    idsnumbers.Add(link.Attributes["id"].Value);
}

And if you want to extract only the numbers from the ids, you could use Regex or int.TryParse.
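As a rough illustration of those two options, here is a minimal sketch. The class and method names (`IdNumbers`, `ExtractDigits`, `IsWholeNumber`) are made up for this example, and it assumes the id value is already available as a string:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class IdNumbers
{
    // Returns the first run of ASCII digits found in the value,
    // or null when the value is empty or contains no digits.
    public static string ExtractDigits(string value)
    {
        if (string.IsNullOrEmpty(value)) return null;
        Match m = Regex.Match(value, "[0-9]+");
        return m.Success ? m.Value : null;
    }

    // Returns true when the value parses as an int (e.g. "344"),
    // false when it contains anything else (e.g. "post344").
    public static bool IsWholeNumber(string value)
    {
        int parsed;
        return int.TryParse(value, out parsed);
    }
}
```

Regex is probably the better fit when the digits are embedded in other text; int.TryParse is enough when the value is expected to be a bare number.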

2 Comments

torvin, I'm getting a null exception on the line: idsnumbers.Add(link.Attributes["id"].Value); I added the full exception message to my question.
If link.Attributes["id"] is null, then that <a> element has no id attribute. Just add a null check.
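A minimal sketch of that null check, assuming the same HtmlAgilityPack document as in the question. The `IdCollector` class and `CollectIds` method are hypothetical names used only for illustration. It also guards against `SelectNodes` itself returning null, which, as far as I know, is what HtmlAgilityPack does when the XPath matches nothing:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration only.
public static class IdCollector
{
    public static List<string> CollectIds(HtmlAgilityPack.HtmlDocument doc)
    {
        var ids = new List<string>();

        // SelectNodes can return null rather than an empty collection.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null) return ids;

        foreach (HtmlNode link in links)
        {
            // The indexer returns null when the element has no id attribute.
            HtmlAttribute id = link.Attributes["id"];
            if (id != null)
            {
                ids.Add(id.Value);
            }
        }
        return ids;
    }
}
```

Changing the XPath to "//a[@id]" would select only anchors that have an id in the first place, but the null check on the result of SelectNodes would still be needed.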
