The readLines function displays all the content of the source page in one line.
con = url("target_url_here")
htmlcode = readLines(con)
The readLines function appears to have concatenated all the lines of the source page into one line, so I see no way to navigate to the 15th line of the original HTML source page.
The next approach is to parse it using the XML package together with the httr package.
library("httr")
library("XML")  # provides htmlParse() and xpathSApply()
html <- GET("target_url_here")
content2 = content(html,as="text")
parsedHtml = htmlParse(content2,asText=TRUE)
By printing out the parsedHtml, it retains the html format and displays all the contents as it can be seen in the source page.
Now suppose I want to extract the title, so the function
xpathSApply(parsedHtml,"//title",xmlValue)
will give the title.
But my question is: how do I navigate to any given line, say the 15th line, of the HTML? In other words, how can I treat the HTML as a vector of strings, where each element of the vector is a separate line of the HTML page or parsed HTML object?
Best Solution
Having a closer look at the docs for readLines(), it actually returns a character vector with one element per line read. So in your case you can simply do
htmlcode[15]
to access the 15th line of the original HTML source page.
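To make the indexing concrete, here is a minimal sketch that avoids any network call. It assumes a small made-up page written to a temporary file; the names page, tmp, html_lines and split_lines are illustrative and not from the question. It also shows one possible way to get the same per-line vector when the page is already held as a single string, as with the text returned by content(html, as="text").

```r
# Hypothetical stand-in for a downloaded page, written to a temporary file
page <- c("<html>", "<head>", "<title>Example page</title>", "</head>",
          "<body>", "<p>Hello</p>", "</body>", "</html>")
tmp <- tempfile(fileext = ".html")
writeLines(page, tmp)

# readLines() returns a character vector with one element per line
html_lines <- readLines(tmp)
print(length(html_lines))  # number of lines read
print(html_lines[3])       # the 3rd line of the source

# If the page is a single string instead, split it on line breaks
# (the pattern also tolerates Windows-style \r\n endings)
single_string <- paste(page, collapse = "\n")
split_lines <- strsplit(single_string, "\r?\n")[[1]]
print(identical(html_lines, split_lines))
```

With a real page the same indexing applies to the vector returned by readLines(con), for example htmlcode[15], provided the source actually has at least 15 lines.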