How to simulate a web site login in ASP.NET, then scrape some data from a page


Does anyone have any recommendations for performing the following in ASP.NET code:

1) Log in to a password-protected site with a username and password (the target site is not necessarily ASP.NET)

2) Navigate to a specific page and/or perform a search

3) Pull specific data from the page (this is the easy part)

Although using an API would be nice, the source site does not provide this capability.

The login is very straightforward (username, password, submit button), with no CAPTCHAs or similar.

Best Solution

Check out my answer to this question:
surfing with the same CookieContainer

There's a WebClient class built into .NET, but it isn't good at getting through authentication barriers, so I wrote the code in that answer some time ago to help with the grunt work. Unfortunately, you still need to study the site's responses to know what requests to send and how to parse the results. And make sure to read my disclaimers: parts of my code frankly aren't very good, and it's in VB.NET (which is a problem for some people). But on the whole it works pretty well.
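For readers who prefer C#, here is a minimal sketch of the shared-CookieContainer idea. It is not the code from the linked answer. The class name CookieAwareWebClient, the helper LoginAndFetch, and the form field names "username" and "password" are all hypothetical; the real field names and URLs have to be read from the target site's login form.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// Hypothetical helper: a WebClient that sends and stores cookies
// through one shared CookieContainer, so the session cookie set at
// login is replayed on later requests.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest http)
        {
            // Attach the shared container to every HTTP request.
            http.CookieContainer = Cookies;
        }
        return request;
    }
}

public static class ScrapeExample
{
    // loginUrl and dataUrl are placeholders supplied by the caller.
    // The field names below are assumptions; inspect the real form.
    public static string LoginAndFetch(
        string loginUrl, string dataUrl, string user, string password)
    {
        using (var client = new CookieAwareWebClient())
        {
            var form = new NameValueCollection
            {
                { "username", user },
                { "password", password }
            };

            // Step 1: POST the login form; any cookies in the response
            // are stored in client.Cookies.
            client.UploadValues(loginUrl, "POST", form);

            // Step 2: request the protected page with the same cookies.
            // Step 3 (parsing the returned HTML) is left to the caller.
            return client.DownloadString(dataUrl);
        }
    }
}
```

Many login forms also expect hidden fields (ASP.NET sites, for example, post __VIEWSTATE), so a sketch like this may need to fetch the login page first and copy those values into the form before posting.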
