Java – Can the value of an HTML anchor tag be fetched using XPath?


If I have HTML that looks like:

<td class="blah">&nbsp;<a href="http://.....">????</a>&nbsp;</td>

Could I get the ???? value using XPath?
What would the expression look like?

Best Solution

XPath normally operates on a parsed XML document rather than on raw HTML, but some parsers (e.g. the one built into PHP) have a relaxed mode that will parse most HTML, too.
If you want to find all <a> elements that are direct children of <td class="blah">, the XPath you need is one of:

//td[@class = 'blah']/a
//td[@class = 'blah']/a[@href = 'http://...']

(The first expression matches every link in the cell; the second matches only the link with that specific URL.)
This will give you a node set. You'll need to iterate through it and, for each node, check that it has exactly one child node and that its firstChild is a text node. The value of that firstChild is then the ????
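
Since the question is tagged Java, here is a minimal sketch of that loop using the standard javax.xml.xpath and org.w3c.dom APIs. It assumes the markup is already well-formed XML: the JDK parser does not have a relaxed HTML mode, and &nbsp; is not a predefined XML entity, so the sample input uses &#160; instead. The class name AnchorText, the method name anchorTexts and the URL http://example.com/ are placeholders made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AnchorText {

    // Hypothetical helper: returns the text of every <a> that is a direct
    // child of <td class="blah">. Assumes the input is well-formed XML.
    static List<String> anchorTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList anchors = (NodeList) xpath.evaluate(
                    "//td[@class = 'blah']/a", doc, XPathConstants.NODESET);

            List<String> texts = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Node a = anchors.item(i);
                Node first = a.getFirstChild();
                // Keep only anchors whose single child is a text node.
                if (a.getChildNodes().getLength() == 1
                        && first.getNodeType() == Node.TEXT_NODE) {
                    texts.add(first.getNodeValue());
                }
            }
            return texts;
        } catch (Exception e) {
            // Parser, I/O and XPath failures are all reported the same way here.
            throw new IllegalArgumentException("could not parse or query the input", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder markup modelled on the question; the URL is invented.
        String xml = "<table><tr><td class=\"blah\">&#160;"
                + "<a href=\"http://example.com/\">????</a>&#160;</td></tr></table>";
        System.out.println(anchorTexts(xml));
    }
}
```

If you don't need the child-node checks, changing the expression to //td[@class = 'blah']/a/text() should select the text nodes directly. For real-world HTML that is not well-formed you would first have to run it through a lenient HTML parser, which this sketch does not cover.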