Anyone? I posted on Statalist and no one has responded yet (after 7 days). I need to use it for my research. Please help.
How can I use Stata for webscraping?
-
Depends on what you want to do. What kind of pages? If the pages are even moderately complex, parsing the HTML as a text file will be a mess, and you had better know regular expressions really well.
Honestly, it would be easier (and much more useful in the future) to learn webscraping in R/Python than to try to do this in Stata. Look at BeautifulSoup in Python or httr/RCurl in R.
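To give a rough idea of what "use a parser instead of regexes" looks like, here is a minimal sketch. It uses Python's built-in html.parser rather than BeautifulSoup so it runs without installing anything; the sample HTML, the `LinkExtractor` class, and the `extract_links` helper are all made up for illustration and are not from any real site or library.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Made-up page content, for illustration only.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li><a href="/speech/1">First <b>speech</b></a></li>
    <li><a href="/speech/2">Second speech</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Parse an HTML string and return the list of (href, text) pairs."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

The parser tracks tags for you, so nested markup like the `<b>` inside the first link does not break anything, which is where a regex over the raw text tends to get messy. For a real page you would fetch the HTML first and pass it to the same function.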
-
rvest or XML are the more relevant packages for scraping in R.
-
OP is obviously trying to troll. If you actually want to do it in Stata, it is very easy; I have done it before. There is an article on it in the Stata Journal. But given that you appear to be trolling, if you want the link, go find it yourself. If I remember correctly, they were using presidential data.
Thanks all. I want to parse the HTML page, not just download it. I will learn Python/R.