API Scraping in the Real World – Code Mentor

Learn how to pull data from an API. This tutorial covers API scraping concepts and challenges, and walks you through creating your own Twitter API scraper.

I’ve done a few projects that involve API scraping of some sort, whether it’s Twitter, AWS, Google, Medium, JIRA, you name it. It’s a fairly common task when you’re a freelance developer. Throughout these implementations, I’ve used a few libraries, including bottleneck and promise-queue, and sometimes I’ve just written my own. However, none of the existing solutions covered every aspect of scraping.

That’s why I created my own solution, api-toolkit, as a basis for API scraping. I also created another project based on it, twitter-toolkit. api-toolkit solves 90% of the challenges you will encounter when scraping your own APIs, including:

  • Key/Secret Management
  • Building a simple queue that can transition between 4 states: Queued, Pending, Complete, Failed
  • Logging
  • Wait time between requests
  • Concurrency
  • Multiple Queues
  • Rate Limiting
  • Error Handling
  • Progress Bars
  • Debugging with Chrome Inspector
  • Pagination
  • Pausing/Resuming
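
To make a few of these items concrete, here is a minimal sketch of a queue whose jobs move between the four states, with a concurrency limit and a wait time between requests. It is not the api-toolkit implementation: the names `SimpleQueue`, `Job`, `add`, `counts`, and `drain` are hypothetical and chosen only for illustration.

```typescript
// Hypothetical sketch, not api-toolkit's actual code.
type JobState = "queued" | "pending" | "complete" | "failed";

interface Job<T> {
  id: number;
  state: JobState;
  run: () => Promise<T>;
  result?: T;
  error?: unknown;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

class SimpleQueue<T> {
  private jobs: Job<T>[] = [];
  private nextId = 1;
  private concurrency: number; // max jobs in the "pending" state at once
  private waitMs: number; // pause each worker takes after every job

  constructor(concurrency: number, waitMs: number) {
    this.concurrency = concurrency;
    this.waitMs = waitMs;
  }

  // Add a task; every job starts in the "queued" state.
  add(run: () => Promise<T>): Job<T> {
    const job: Job<T> = { id: this.nextId++, state: "queued", run };
    this.jobs.push(job);
    return job;
  }

  // Count jobs per state, e.g. to drive logging or a progress bar.
  counts(): Record<JobState, number> {
    const totals: Record<JobState, number> = {
      queued: 0,
      pending: 0,
      complete: 0,
      failed: 0,
    };
    for (const job of this.jobs) totals[job.state]++;
    return totals;
  }

  // Start `concurrency` workers and resolve once no queued jobs remain.
  async drain(): Promise<void> {
    const worker = async (): Promise<void> => {
      for (;;) {
        const job = this.jobs.find((j) => j.state === "queued");
        if (!job) return;
        // Claimed synchronously, so two workers never take the same job.
        job.state = "pending";
        try {
          job.result = await job.run();
          job.state = "complete";
        } catch (err) {
          // A failed request is recorded and does not stop the queue.
          job.error = err;
          job.state = "failed";
        }
        await sleep(this.waitMs);
      }
    };
    await Promise.all(
      Array.from({ length: this.concurrency }, () => worker())
    );
  }
}
```

With something like this, `new SimpleQueue<string>(2, 500)` would run at most two requests at a time, with each worker pausing 500 ms after every request. Real rate limiting, pausing/resuming, and key management need more than this sketch provides.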

If at any point you get stuck on how the code works, you can look in those two repos for a working example. api-toolkit is the base set of utilities that you will share across all your APIs, and twitter-toolkit is an example of how you would use this base set for scraping the Twitter API.
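
The split between a shared base and an API-specific layer can be sketched with pagination as the example. This is an assumption about the general shape, not code from either repo: `Page`, `FetchPage`, `collectAll`, and `fetchFakeTimeline` are hypothetical names, and the "API" here is an in-memory stand-in rather than a real Twitter call.

```typescript
// Shared layer: knows how to follow cursors, but nothing about any one API.
interface Page<T> {
  items: T[];
  nextCursor?: string; // absent on the last page
}

type FetchPage<T> = (cursor?: string) => Promise<Page<T>>;

async function collectAll<T>(
  fetchPage: FetchPage<T>,
  maxPages = 100 // safety cap so a misbehaving API cannot loop forever
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  for (let page = 0; page < maxPages; page++) {
    const { items, nextCursor } = await fetchPage(cursor);
    all.push(...items);
    if (!nextCursor) break;
    cursor = nextCursor;
  }
  return all;
}

// API-specific layer: a fake two-page "timeline" standing in for real HTTP.
const fakePages: Record<string, Page<string>> = {
  start: { items: ["a", "b"], nextCursor: "p2" },
  p2: { items: ["c"] },
};

const fetchFakeTimeline: FetchPage<string> = async (cursor = "start") =>
  fakePages[cursor] ?? { items: [] };
```

Calling `collectAll(fetchFakeTimeline)` walks both pages and gathers their items in order. Swapping in a different `FetchPage` function is all a second API would need to reuse the same loop.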

Source: API Scraping in the Real World

Kourosh
