API Scraping in the Real World

Learn how to pull data from an API. This tutorial covers API scraping concepts and challenges, and walks you through creating your own Twitter API scraper.

I’ve done a few projects that involve API scraping of some sort, whether for Twitter, AWS, Google, Medium, or JIRA. It’s a fairly common task when you’re a freelance developer. Across these implementations, I’ve used a few libraries, including bottleneck and promise-queue, and sometimes just written my own. However, none of the existing solutions covered every aspect of scraping.

That’s why I created my own solution, api-toolkit, as a basis for API scraping. I also created a second project, twitter-toolkit, built on top of it. api-toolkit handles about 90% of the challenges you will encounter when scraping APIs, including:

  • Key/Secret Management
  • A simple queue whose items move between 4 states: Queued, Pending, Complete, Failed
  • Logging
  • Wait time between requests
  • Concurrency
  • Multiple Queues
  • Rate Limiting
  • Error Handling
  • Progress Bars
  • Debugging with Chrome Inspector
  • Pagination
  • Pausing/Resuming
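
To make a few of these items concrete, here is a minimal sketch of a queue whose jobs move through the four states, with a concurrency limit and a wait time between requests. This is not api-toolkit's actual code: the names Job, transition, and runQueue are hypothetical, and I'm assuming the only valid moves are Queued to Pending, then Pending to either Complete or Failed.

```typescript
// Hypothetical sketch: these names are illustrative, not api-toolkit's API.
type State = "Queued" | "Pending" | "Complete" | "Failed";

interface Job<T> {
  run: () => Promise<T>; // the request to perform
  state: State;
  result?: T;
  error?: unknown;
}

// Assumed transition table: Queued -> Pending -> Complete | Failed.
const allowed: Record<State, State[]> = {
  Queued: ["Pending"],
  Pending: ["Complete", "Failed"],
  Complete: [],
  Failed: [],
};

// Moves a job to the next state, rejecting moves the table does not allow.
function transition<T>(job: Job<T>, next: State): void {
  if (allowed[job.state].indexOf(next) === -1) {
    throw new Error("Invalid transition: " + job.state + " -> " + next);
  }
  job.state = next;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs jobs with at most `concurrency` in flight. Each worker pauses
// `waitMs` after finishing a job before it claims the next one.
async function runQueue<T>(
  jobs: Job<T>[],
  concurrency: number,
  waitMs: number
): Promise<void> {
  let cursor = 0; // index of the next unclaimed job
  const worker = async (): Promise<void> => {
    while (cursor < jobs.length) {
      const job = jobs[cursor++];
      transition(job, "Pending");
      try {
        job.result = await job.run();
        transition(job, "Complete");
      } catch (err) {
        // A failed request marks only this job; the worker keeps going.
        job.error = err;
        transition(job, "Failed");
      }
      await sleep(waitMs);
    }
  };
  const count = Math.min(concurrency, jobs.length);
  const workers: Promise<void>[] = [];
  for (let i = 0; i < count; i++) workers.push(worker());
  await Promise.all(workers);
}
```

Calling runQueue(jobs, 2, 500) would keep at most two requests in flight and pause each worker for half a second between requests. Rate limiting, pagination, and pausing/resuming are left out to keep the sketch short.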

If you get stuck on how the code works at any point, you can look in those two repos for working examples. api-toolkit is the base set of utilities that you will share across all your APIs, and twitter-toolkit is an example of how you would use this base set to scrape the Twitter API.

