Web Scraping for Emergency Data Collection

During the COVID-19 pandemic, epidemiologists are scrambling to compile information that will allow them to analyze trends from diverse local sources. In the Americas region, analysts at the Pan American Health Organization (PAHO/WHO) must pull data daily from dozens of countries' national websites. Some countries publish downloadable compiled datasets, while others only publish data embedded in a website's HTML.

Web scraping tools can save analysts precious time during a rapidly evolving epidemic (or, frankly, in any other circumstance). This repo contains R code for scraping websites with minimal formatting (where the HTML table can be easily identified) and those with heavy formatting (where individual elements are bundled in layer after layer of formatting containers).
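
To illustrate the two cases, here is a minimal sketch using the rvest package. The URL and CSS class names are hypothetical placeholders, not the actual sources or selectors used by this repo's scripts.

```r
library(rvest)

# Hypothetical ministry-of-health page (placeholder URL).
page <- read_html("https://example.gov/covid19-dashboard")

# Case 1: minimal formatting. The page contains a plain HTML <table>,
# which rvest can parse straight into a data frame.
cases_table <- page |>
  html_element("table") |>   # grab the first table on the page
  html_table()

# Case 2: heavy formatting. Figures are buried in nested <div> containers,
# so we target them with CSS selectors and clean the text ourselves.
# ".stat-card", ".label", and ".value" are assumed class names.
cards  <- html_elements(page, "div.stat-card")
labels <- cards |> html_element(".label") |> html_text2()
counts <- cards |> html_element(".value") |> html_text2() |>
  gsub(pattern = "[^0-9]", replacement = "") |>  # drop thousands separators
  as.numeric()

stats <- data.frame(indicator = labels, count = counts)
```

Either path ends in an ordinary data frame, so downstream cleaning and trend analysis can proceed the same way regardless of how the source site publishes its numbers.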

Full code, description and instructions can be found here.