When I read the page with the readLines function, all the content of the source page appears on a single line.
con = url("target_url_here")
htmlcode = readLines(con)
readLines appears to have concatenated all the lines of the source page into one line, so I cannot navigate to, say, the 15th line of the original HTML source.
My next approach was to fetch the page with the httr package and parse it with the XML package.
library("httr")
library("XML")  # provides htmlParse
html <- GET("target_url_here")
content2 = content(html, as = "text")
parsedHtml = htmlParse(content2, asText = TRUE)
Printing parsedHtml shows that it retains the HTML formatting and displays all the contents as they appear in the source page.
Now suppose I want to extract the title, so the function
will give the title.
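For illustration, here is one way to pull out the title with the XML package. This is a sketch and not necessarily the exact call I used. The inline page and the names `page`, `doc` and `page_title` are made up so the example runs without a network connection.

```r
library(XML)

# Hypothetical inline page standing in for the real target URL
page <- "<html><head><title>Example page</title></head><body><p>Hello</p></body></html>"

# Parse the string as HTML (asText = TRUE means "this is content, not a file name")
doc <- htmlParse(page, asText = TRUE)

# Select every <title> node with XPath and return its text content
page_title <- xpathSApply(doc, "//title", xmlValue)
print(page_title)
```

This works on the parsed tree, so it finds nodes by structure and has no notion of line numbers in the original source.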
But my question is: how do I navigate to any given line, say the 15th line, of the HTML? In other words, how can I treat the HTML as a vector of strings, where each element of the vector is a separate line of the HTML page or parsed HTML object?
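To make the goal concrete, here is a minimal sketch of the behaviour I am after, using a small made-up multi-line string in place of the real page (the names `raw_html` and `html_lines` are hypothetical). It assumes the raw text still contains its line breaks, which may not hold for the actual page.

```r
# Hypothetical multi-line page standing in for the downloaded source text
raw_html <- "<html>\n<head>\n<title>Example page</title>\n</head>\n<body>\n<p>Hello</p>\n</body>\n</html>"

# Split the single string on line endings (handles both \n and \r\n);
# strsplit returns a list, so take its first element to get a character vector
html_lines <- strsplit(raw_html, "\r?\n")[[1]]

# Each element is now one line of the source, addressable by index
print(length(html_lines))
print(html_lines[3])
```

With a vector like this, the 15th line of a longer page would simply be `html_lines[15]`. Checking `length()` on the result of readLines in the same way would show whether it really returned one element or one element per line.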