If you are unsure how many processors you have, use os.cpu_count().

Multiple options are available for rate-limiting your Pushshift API requests. They fall into two types: rate-averaging and exponential backoff. If you're unsure which to use, refer to the benchmark comparison.

PMAW by default rate limits using rate-averaging, so that the concurrent API requests to the Pushshift server are limited to your provided rate. Providing a rate_limit value is optional; it defaults to 60 requests per minute, which is the recommended value for interacting with the Pushshift API. Increasing this value above 60 will increase the number of rejected requests and the burden on the Pushshift server. The maximum recommended value is 100 requests per minute.

Additionally, the rate-limiting behaviour can be constrained by the max_sleep parameter, which sets a maximum period of time to sleep between requests.

Exponential Backoff

Exponential backoff can be used by setting limit_type to backoff. Four flavours of backoff are available based on the usage of jitter: None, full, equal, and decorr (decorrelated).

Exponential backoff is calculated by multiplying base_backoff by 2 to the power of the number of failed batches. This spaces batches out, lowering the effective request rate once requests start to be rejected.
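The backoff calculation described above can be sketched in a few lines. This is only an illustration, not PMAW's own code: the function name backoff_sleep, the default values, the cap at max_sleep, and the exact jitter formulas (taken from the commonly used full, equal, and decorrelated jitter variants) are all assumptions.

```python
import random


def backoff_sleep(failed_batches, base_backoff=0.5, max_sleep=60.0,
                  jitter=None, prev_sleep=None):
    """Return an illustrative number of seconds to sleep before the next batch.

    Hypothetical helper: the defaults and jitter formulas are assumptions,
    not values read from PMAW.
    """
    # Plain exponential backoff: base_backoff * 2 ** failed_batches,
    # capped at max_sleep (the cap is an assumption of this sketch).
    exp = min(max_sleep, base_backoff * 2 ** failed_batches)

    if jitter is None:
        return exp
    if jitter == "full":
        # Anywhere between zero and the exponential value.
        return random.uniform(0, exp)
    if jitter == "equal":
        # Half fixed, half random.
        return exp / 2 + random.uniform(0, exp / 2)
    if jitter == "decorr":
        # Decorrelated: based on the previous sleep, not the failure count.
        prev = base_backoff if prev_sleep is None else prev_sleep
        return min(max_sleep, random.uniform(base_backoff, prev * 3))
    raise ValueError("unknown jitter type: {!r}".format(jitter))


# Without jitter the sleep doubles with each failed batch until it hits the cap.
for failures in range(5):
    print(failures, backoff_sleep(failures))
```

Without jitter the result is deterministic, so three failed batches with a base of 0.5 give 0.5 * 2 ** 3 = 4.0 seconds. The jittered variants randomise that value so that concurrent clients do not all retry at the same moment.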
Since API requests are I/O-bound, they can benefit from being run asynchronously using multiple threads. Implementing intelligent rate limiting can minimize both the number of rejected requests and the time it takes to complete them.

PMAW currently supports Python 3.5 or later. To install it via pip, run:

    pip install pmaw

Then import the class and create an API instance:

    from pmaw import PushshiftAPI
    api = PushshiftAPI()
The time it takes for your code to finish pulling all this data is limited by both your network latency and the response time of the Pushshift server, which can vary throughout the day.

API libraries such as PRAW and PSAW currently run requests sequentially, which can cause thousands of API calls to take many hours to complete.
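To see why sequential requests are slow for I/O-bound work, here is a minimal sketch that compares a sequential loop with a thread pool. It makes no real API calls: fake_request is a hypothetical stand-in that sleeps to simulate latency, and the worker count of 10 is an arbitrary choice for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(request_id, latency=0.05):
    """Hypothetical stand-in for one API call; sleeps to simulate latency."""
    time.sleep(latency)
    return request_id


def run_sequential(request_ids):
    # One request at a time: total time is roughly len(request_ids) * latency.
    return [fake_request(i) for i in request_ids]


def run_threaded(request_ids, num_workers=10):
    # Threads overlap the waiting time; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(fake_request, request_ids))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


ids = list(range(20))
_, sequential_time = timed(run_sequential, ids)
_, threaded_time = timed(run_threaded, ids)
print("sequential: {:.2f}s, threaded: {:.2f}s".format(sequential_time, threaded_time))
```

Because each simulated request spends its time waiting rather than computing, the threaded version finishes in a fraction of the sequential time. Real requests would also have to respect the server's rate limit.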
Building large datasets from Reddit submission and comment data can require thousands of calls to the Pushshift API.

The following three methods are currently supported:
Search Comments: search_comments
Search Submissions: search_submissions
Search Submission Comment IDs: search_submission_comment_ids
When you call a method, PMAW completes all the API calls required by the query before returning a Response generator object.
PMAW is a wrapper for the Pushshift API which uses multithreading to retrieve Reddit comments and submissions. General usage is through the PushshiftAPI class, which provides methods for interacting with different Pushshift endpoints; please view the Pushshift Docs for more details on the endpoints and accepted parameters. Parameters are provided through keyword arguments when calling the method, and some methods have required parameters.
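The call pattern described here (keyword arguments in, a generator-like response out) can be sketched as follows. So that it runs offline, the sketch uses FakeAPI, a hypothetical stand-in for PushshiftAPI; collect_results is also a hypothetical helper, and the placeholder records are invented for the illustration.

```python
def collect_results(api, method_name, **params):
    """Call one search method on a PushshiftAPI-like object and return a list.

    Hypothetical helper: it only assumes the method accepts keyword
    arguments and returns something iterable.
    """
    search = getattr(api, method_name)
    response = search(**params)
    # Iterating the response materialises every item it yields.
    return [item for item in response]


class FakeAPI:
    """Hypothetical stand-in for PushshiftAPI so the sketch runs offline."""

    def search_comments(self, subreddit, limit):
        # Yield placeholder records lazily, mimicking a generator response.
        return ({"id": i, "subreddit": subreddit} for i in range(limit))


comments = collect_results(FakeAPI(), "search_comments",
                           subreddit="science", limit=3)
print(len(comments))
```

With PMAW installed, the same helper should work with a real PushshiftAPI() instance in place of FakeAPI(), with subreddit and limit passed through as Pushshift query parameters.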