Does Spotipy break after a certain number of requests? Am I hitting the API too much?

I had a script that ran for about an hour.
df is a data frame created by rows I queried from my own Postgres database.
I'm iterating over that to get the artist_name value and make the API call.
There are ~47k records in that table.
After an hour or so, the script stopped giving me any responses. There are no errors.
The line that breaks is results = artist_albums(...)
Putting a breakpoint() before it works, but once that line runs, it stops. There are no status errors, no 429, nothing.
Did I hit the Spotify API too much?
```
for idx, val in enumerate(df['artist_name']):
    # get albums for each artist
    results = spotify.artist_albums('spotify:artist:' + val, album_type='album')
    albums = results['items']
    while results['next']:
        results = spotify.next(results)
        albums.extend(results['items'])
        sleep(0.5)
    for album in albums:
        print(album['name'])
        try:
            q = (album['name'],
                 album['id'],
                 album['uri'],
                 album['release_date'],
                 album['release_date_precision'],
                 album['total_tracks'],
                 album['artists'][0]['id'],
                 album['type'],
                 album['external_urls']['spotify'])
            cur.execute("""insert into schema.table values (
                %s, %s, %s, %s, %s,
                %s, %s, %s, %s)""", q)
            conn.commit()
        except Exception as exc:
            # log the failed album and keep going
            print(exc)
```

You have probably hit the Spotify API's rate limit, which works on a 30-second rolling window.
If your app makes a lot of Web API requests in a short period of time
then it may receive a 429 error response from Spotify. This indicates
that your app has reached our Web API rate limit. The Web API has rate
limits in order to keep our API reliable and to help third-party
developers use the API in a responsible way.
Spotify’s API rate limit is calculated based on the number of calls that your app makes to Spotify in a rolling 30 second window.
A way to avoid this is to introduce some waiting time between API calls, for example using time.sleep:
import time
time.sleep(10) # sleeps for 10 seconds
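If you would rather have the script recover on its own than stall, you can also catch the rate-limit error and back off before retrying. Here is a minimal sketch, assuming the spotipy.Spotify client from the question is named spotify; the Retry-After handling is illustrative, since older spotipy versions may not expose the response headers on the exception:
```
import time
import spotipy

def artist_albums_with_backoff(artist_uri, max_retries=5):
    """Call spotify.artist_albums, sleeping and retrying on HTTP 429."""
    for attempt in range(max_retries):
        try:
            return spotify.artist_albums(artist_uri, album_type='album')
        except spotipy.SpotifyException as exc:
            if exc.http_status != 429:
                raise
            # Spotify's 429 response carries a Retry-After header (seconds);
            # fall back to a short fixed wait if the header isn't available.
            wait = int((getattr(exc, 'headers', None) or {}).get('Retry-After', 5))
            time.sleep(wait + 1)
    raise RuntimeError('still rate limited after {} retries'.format(max_retries))
```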

Related

Google Photos API mediaItems list/search methods ignore pageSize param

I am attempting to retrieve all media items that a given Google Photos user has, irrespective of any album(s) they are in. However, when I use either the mediaItems.list or the mediaItems.search method, the pageSize param I include in the request is either being ignored or not fully fulfilled.
Details of mediaItems.list request
GET https://photoslibrary.googleapis.com/v1/mediaItems?pageSize=<###>
Details of mediaItems.search request
POST https://photoslibrary.googleapis.com/v1/mediaItems:search
BODY { 'pageSize': <###> }
I have made a simple implementation of these two requests here as an example for this question, it just requires a valid accessToken to use:
https://jsfiddle.net/zb2htog1/
Running this script with the following pageSize values against a Google Photos account with hundreds of photos and tens of albums consistently returns the same unexpected numbers of results for both methods:
| Request pageSize | Returned media items count |
|------------------|----------------------------|
| 1                | 1                          |
| 25               | 9                          |
| 50               | 17                         |
| 100              | 34                         |
I know that Google states the following for the pageSize parameter for both of these methods:
“Maximum number of media items to return in the response. Fewer media
items might be returned than the specified number. The default
pageSize is 25, the maximum is 100.”
I originally assumed that fewer media items might be returned because an account has fewer media items in total than the requested pageSize, or because a request with a pageToken has reached the end of a set of paged results. However, I am now wondering if this just means that results may vary in general?
Can anyone else confirm if they have the same experience when using these methods without an album ID for an account with a suitable amount of photos to test this? Or am I perhaps constructing my requests in an incorrect fashion?
I experience something similar; I get back about half of what I expect.
If I don't set the pageSize, I get back just 13; if I set it to 100, I get back 50.
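Given that the per-page counts are unreliable, a practical workaround is to ignore how many items each page returns and simply follow nextPageToken until it disappears. A rough sketch with the requests library, assuming you already have a valid OAuth access token:
```
import requests

def list_all_media_items(access_token, page_size=100):
    """Follow nextPageToken until the library has been fully paged through."""
    url = 'https://photoslibrary.googleapis.com/v1/mediaItems'
    headers = {'Authorization': 'Bearer ' + access_token}
    items, page_token = [], None
    while True:
        params = {'pageSize': page_size}
        if page_token:
            params['pageToken'] = page_token
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        body = resp.json()
        items.extend(body.get('mediaItems', []))
        page_token = body.get('nextPageToken')
        if not page_token:
            return items
```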

Problem with Revolut API transactions list

We have been experiencing trouble with the Revolut API for a few days.
We use that library: https://github.com/useme-com/revolut-python
Now when we try to retrieve a list of transactions, we receive:
root## python3 transactions.py
HTTP 400 for https://b2b.revolut.com/api/1.0/transactions: Duplicate key User#XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXX (attempted merging values XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXX and YYYYYYYY-YYYY-YYYY-YYYY-YYYYYYYYYY)
The code is pretty straightforward; for debugging it is basically:
[...]
# Enable Session
session = RenewableSession(refreshtoken,clientid,jwttoken)
# Create API Client
revolut = Client(session)
# Transactions Display
for transaction in revolut.transactions():
    print(transaction)
[...]
The same code worked on our side until 3 days ago, without errors.
Any ideas on what's going on?
Is it possible there is a failure on Revolut's side?
They are not responding about this (we have already opened a ticket).
Thanks.
I got this issue while using a high count param:
https://b2b.revolut.com/api/1.0/transactions?count=1000
Reducing count to 100 or 200 got me a good response. I think Revolut has some issues serving API responses that include very old historical transactions, due to changes in data structure/merging etc. at their end.
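For reference, here is a minimal sketch of that workaround, requesting a smaller batch directly against the REST endpoint instead of going through the library; ACCESS_TOKEN is a placeholder for a valid bearer token:
```
import requests

ACCESS_TOKEN = 'your-access-token'  # placeholder for a valid bearer token

resp = requests.get(
    'https://b2b.revolut.com/api/1.0/transactions',
    headers={'Authorization': 'Bearer ' + ACCESS_TOKEN},
    params={'count': 100},  # smaller batches avoided the HTTP 400 for me
)
resp.raise_for_status()
for transaction in resp.json():
    print(transaction)
```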

I am getting KeyError: 'groups' when trying to fetch nearby venues using the Foursquare API

Hi, I am getting KeyError: 'groups' when trying to fetch nearby venues using the Foursquare API. Following is my code:
LIMIT = 100  # limit of number of venues returned by Foursquare API
radius = 500  # define radius
venues_list = []
for lat, lng, post, borough, neighborhood, hospital in zip(hospital_df['Latitude'], hospital_df['Longitude'], hospital_df['Pincode'], hospital_df['District'], hospital_df['Location'], hospital_df['Hospital_Name']):
    print(neighborhood)
    print(borough)
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        lng,
        radius,
        LIMIT)
    results = requests.get(url).json()["response"]['groups'][0]['items']
    venues_list.append([(
        post,
        borough,
        neighborhood,
        hospital,
        lat,
        lng,
        v['venue']['name'],
        v['venue']['location']['lat'],
        v['venue']['location']['lng'],
        v['venue']['categories'][0]['name']) for v in results])
nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
nearby_venues.columns = ['PostalCode', 'Borough', 'Neighborhood', 'Hospital', 'Neighborhood_Latitude', 'Neighborhood_Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
I keep getting the following error:
KeyError: 'groups'
I had the same "KeyError: 'groups'" issue with very similar code. What I found was that the URL variable I built was incorrectly formed (I had added a filter for categoryId but passed an incorrectly formatted value for it).
Once I corrected this, the line results = requests.get(url).json()["response"]['groups'][0]['items'] ran without errors.
My guess is that when I submitted the incorrectly formed URL, the JSON returned was an error message without the expected structure that includes 'groups'.
In my case this was due to exceeding the number of free calls to the Foursquare API. I have very similar code, prewritten as part of an online course, and I had been running it fine for many days with no issues. Then I suddenly got the 'groups' KeyError a few times. I stopped working, and the next morning the code ran fine; after a few calls I got the error again. So I checked the JSON response, and it didn't contain the key 'groups' because it was essentially a JSON body telling me the quota was exceeded.
Try resetting your client secret in your Foursquare account. It worked for me.
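Whatever the root cause (quota, malformed URL, bad credentials), it helps to check the meta block of the Foursquare response before indexing into 'groups', so the real error message surfaces instead of a KeyError. A small defensive sketch, reusing the url built in the question:
```
import requests

data = requests.get(url).json()
meta = data.get('meta', {})
if meta.get('code') != 200:
    # e.g. quota exceeded or an invalid parameter -- surface the real reason
    raise RuntimeError('Foursquare error {}: {}'.format(
        meta.get('code'), meta.get('errorDetail')))
results = data['response']['groups'][0]['items']
```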

gcp:datastore - get overall status __Stat_Total__ using Python API

I want to get overall Datastore statistics for my dashboard application. I went through the docs and there are no tutorials about how to get the statistics from Datastore. There is the GQL query SELECT * FROM __Stat_Total__, which returns:
builtin_index_bytes,
builtin_index_count,
bytes,
composite_index_bytes,
composite_index_count,
count,
entity_bytes,
timestamp.
I want to display all these details through the Python API client.
I tried a few examples which didn't work out.
def get_summary_statistics(self):
    # [START getting the summary statistics]
    stats = self.client.query(kind=self.kind_name)
    overall_stats = stats.__Stat_Total__()
    return overall_stats
How do I get all the Datastore statistics?
The Cloud Datastore NDB administration documentation has some information about __Stat_Total__ and other stat entities, along with a small example script that queries Datastore stats:
from google.appengine.ext.ndb import stats
global_stat = stats.GlobalStat.query().get()
print 'Total bytes stored: %d' % global_stat.bytes
print 'Total entities stored: %d' % global_stat.count
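If you are using the standalone google-cloud-datastore client rather than App Engine NDB, a plain query on the special kind __Stat_Total__ should return the same summary entity. A minimal sketch, assuming default credentials are configured:
```
from google.cloud import datastore

client = datastore.Client()

# __Stat_Total__ is the built-in kind that holds the summary statistics
query = client.query(kind='__Stat_Total__')
for stat in query.fetch():
    print(stat['bytes'], stat['count'], stat['entity_bytes'], stat['timestamp'])
```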

Twitter API count more than 100, using the Twitter search API

I want to search tweets related to 'data' and get more than 100 results.
This is my Python code:
from twython import Twython

twitter = Twython(app_key=APP_KEY, app_secret=APP_SECRET)
for status in twitter.search(q='"data"', count=10000)["statuses"]:
    user = status["user"]["screen_name"].encode('utf-8')
    text = status["text"]
    data = "{0} {1} {2}".format(user, text, '\n\n')
    print(data)
    f.writelines(data)  # f is assumed to be an already-open output file
So what you're trying to do uses the Twitter API. Specifically the GET search/tweets endpoint.
In the docs for this endpoint:
https://dev.twitter.com/rest/reference/get/search/tweets
We can see that count has a maximum value of 100.
So even though you specify 10000, it only returns 100 because that's the max.
I've not tried either, but you can likely use the until or max_id parameters also mentioned in the docs to get more results/the next 100 results.
Keep in mind that "the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week" - the docs.
Hope this helps!
You can use the field next_token of the response to get more tweets.
Refer to these articles:
https://lixinjack.com/how-to-collect-more-than-100-tweets-when-using-twitter-api-v2/
https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/paginate
The max_id parameter is the key and it is further explained here:
To use max_id correctly, an application’s first request to a timeline
endpoint should only specify a count. When processing this and
subsequent responses, keep track of the lowest ID received. This ID
should be passed as the value of the max_id parameter for the next
request, which will only return Tweets with IDs lower than or equal to
the value of the max_id parameter.
https://developer.twitter.com/en/docs/tweets/timelines/guides/working-with-timelines
In other words, using the lowest ID retrieved from a search, you can access the older tweets. As mentioned by Tyler, the non-commercial version is limited to 7 days, but the commercial version can search up to 30 days.
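Putting that together, here is a rough sketch of max_id pagination with Twython: request pages of up to 100 tweets and keep passing the lowest ID seen (minus 1) as max_id until there are no more results or you have enough. It assumes APP_KEY and APP_SECRET are defined as in the question, and it is still bound by the 7-day search window:
```
from twython import Twython

twitter = Twython(app_key=APP_KEY, app_secret=APP_SECRET)

collected = []
max_id = None
while len(collected) < 1000:
    kwargs = {'q': '"data"', 'count': 100}
    if max_id is not None:
        kwargs['max_id'] = max_id
    statuses = twitter.search(**kwargs)['statuses']
    if not statuses:
        break
    collected.extend(statuses)
    # only ask for tweets strictly older than the oldest one seen so far
    max_id = min(s['id'] for s in statuses) - 1

print(len(collected), 'tweets collected')
```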
