Learn how to pull data from an API. This tutorial covers API scraping concepts and common challenges, then walks you through creating your own Twitter API scraper.
I’ve done a few projects that involve API scraping of some sort, whether for Twitter, AWS, Google, Medium, JIRA, or you name it. It’s a fairly common task when you’re a freelance developer. Across these implementations, I’ve used a few libraries, including bottleneck and promise-queue, and sometimes I just rolled my own. However, none of the existing solutions covered every aspect of scraping.
That’s why I created my own solution, api-toolkit, as a basis for API scraping. I also built a second project on top of it, twitter-toolkit. In my experience, api-toolkit solves about 90% of the challenges you will encounter when scraping an API, including:
- Key/Secret Management
- A simple queue whose jobs transition between four states: Queued, Pending, Complete, and Failed
- Logging
- Wait time between requests
- Concurrency
- Multiple Queues
- Rate Limiting
- Error Handling
- Progress Bars
- Debugging with Chrome Inspector
- Pagination
- Pausing/Resuming
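The job lifecycle from the list above can be sketched as a small state machine. This is a minimal TypeScript illustration, not api-toolkit's actual code. The `JobQueue` class, its method names, and the decision to let a Failed job be re-queued for a retry are all assumptions made for the example. It also shows where a concurrency cap fits: `startAvailable()` promotes Queued jobs to Pending only while fewer than `concurrency` jobs are in flight.

```typescript
// The four job states from the list above.
type JobState = "Queued" | "Pending" | "Complete" | "Failed";

// Legal transitions (an assumption for this sketch): a queued job starts,
// a pending job finishes or fails, and a failed job may be re-queued for a retry.
const TRANSITIONS: Record<JobState, JobState[]> = {
  Queued: ["Pending"],
  Pending: ["Complete", "Failed"],
  Complete: [],
  Failed: ["Queued"],
};

interface Job {
  id: string;
  state: JobState;
  error?: string;
}

// Hypothetical queue class, for illustration only.
class JobQueue {
  private jobs: Job[] = [];
  private readonly concurrency: number;

  // `concurrency` caps how many jobs may be Pending at the same time.
  constructor(concurrency: number) {
    this.concurrency = concurrency;
  }

  enqueue(id: string): void {
    this.jobs.push({ id, state: "Queued" });
  }

  // Starts Queued jobs in insertion order until the concurrency cap is reached.
  // Returns the started jobs so the caller can fire their requests.
  startAvailable(): Job[] {
    const started: Job[] = [];
    let pending = this.count("Pending");
    for (const job of this.jobs) {
      if (pending >= this.concurrency) break;
      if (job.state === "Queued") {
        this.transition(job, "Pending");
        started.push(job);
        pending++;
      }
    }
    return started;
  }

  complete(id: string): void {
    this.transition(this.find(id), "Complete");
  }

  fail(id: string, error: string): void {
    const job = this.find(id);
    this.transition(job, "Failed");
    job.error = error;
  }

  retry(id: string): void {
    const job = this.find(id);
    this.transition(job, "Queued");
    delete job.error;
  }

  count(state: JobState): number {
    return this.jobs.filter((j) => j.state === state).length;
  }

  // Rejects any move not listed in TRANSITIONS.
  private transition(job: Job, next: JobState): void {
    if (TRANSITIONS[job.state].indexOf(next) === -1) {
      throw new Error(`Illegal transition ${job.state} -> ${next} for job ${job.id}`);
    }
    job.state = next;
  }

  private find(id: string): Job {
    const matches = this.jobs.filter((j) => j.id === id);
    if (matches.length === 0) throw new Error(`Unknown job ${id}`);
    return matches[0];
  }
}
```

Rejecting illegal transitions up front, such as completing the same job twice, turns silent bookkeeping bugs into immediate errors.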
If at any point you get stuck on how the code works, look in those two repos for a working example. api-toolkit is the base set of utilities that you will share across all your APIs, and twitter-toolkit is an example of how you would use this base set for scraping the Twitter API.
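The split between a shared base layer and an API-specific layer can be pictured with a hypothetical sketch: a generic sliding-window `RateLimiter` that would live in the base layer, with each API supplying only its own limits. The class, its methods, and every number here are illustrative assumptions, not the real API of api-toolkit or twitter-toolkit, and the limits are placeholders rather than any provider's documented quotas.

```typescript
// Per-API settings (hypothetical shape).
interface RateLimitConfig {
  maxRequests: number; // requests allowed per window
  windowMs: number; // window length in milliseconds
}

// Base layer: a generic sliding-window limiter shared by every API.
class RateLimiter {
  private readonly config: RateLimitConfig;
  private timestamps: number[] = [];

  constructor(config: RateLimitConfig) {
    this.config = config;
  }

  // Returns how many ms to wait before the next request is allowed (0 = send now).
  // `now` is injected for testability and is assumed never to go backwards.
  waitTime(now: number): number {
    // Drop requests that have slid out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.config.windowMs);
    if (this.timestamps.length < this.config.maxRequests) return 0;
    // The oldest request still in the window decides when a slot frees up.
    return this.timestamps[0] + this.config.windowMs - now;
  }

  // Records that a request was sent at `now`.
  record(now: number): void {
    this.timestamps.push(now);
  }
}

// API-specific layer (hypothetical): each toolkit supplies only its own numbers.
// These values are placeholders, not any provider's documented limits.
const limiters: Record<string, RateLimiter> = {
  twitter: new RateLimiter({ maxRequests: 100, windowMs: 60 * 1000 }),
  jira: new RateLimiter({ maxRequests: 10, windowMs: 1000 }),
};
```

Keeping the limiter free of API-specific details means a new toolkit only has to declare its limits, while the waiting logic stays in one tested place.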
Source: API Scraping in the Real World