Filter out own retweets Tweepy - python-3.x

My application (Python 3 + Tweepy) finds a hashtag and retweets the matching tweets. I get a "Retweet is not permissible for this status" error when it tries to retweet my own tweets. How can I filter those out?
# retweet function
def hashtag_Retweet():
    print(tweet.id)
    api.retweet(tweet.id)  # retweet
    print(tweet.text)
    return

query = '#foosball'
our_own_id = '3678887154'  # Made up for this post

tweets = api.search(query)
for tweet in tweets:
    # make sure that tweet does not come from host
    hashtag_Retweet()

Something like this would work. Note that tweet.user.id is an int while our_own_id above is a string, so compare against tweet.user.id_str (or cast one side):

for tweet in tweets:
    if tweet.user.id_str != our_own_id:
        hashtag_Retweet()

Hope it helps.
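If you would rather not hard-code your own ID, a minimal sketch (assuming Tweepy 3.x, where api.me() returns the authenticated user, and an already-authenticated api object):

import tweepy

me = api.me()  # the authenticated account

for tweet in api.search('#foosball'):
    if tweet.user.id != me.id:  # skip our own tweets
        try:
            api.retweet(tweet.id)
        except tweepy.TweepError as e:
            print('Skipping %s: %s' % (tweet.id, e))

The try/except also absorbs the "Retweet is not permissible" error for remaining edge cases, such as tweets you have already retweeted.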

Related

How to fix KeyError 'statuses' while collecting tweets?

I was collecting users' tweets using TwitterAPI when I stumbled upon this error.
Since I'm planning to crawl at least 500 tweets with different attributes, and each query returns at most 100 tweets, I made a function.
!pip install TwitterAPI
from TwitterAPI import TwitterAPI
import json
CONSUMER_KEY = ''         # ENTER YOUR CONSUMER_KEY
CONSUMER_SECRET = ''      # ENTER YOUR CONSUMER_SECRET
OAUTH_TOKEN = ''          # ENTER YOUR OAUTH_TOKEN
OAUTH_TOKEN_SECRET = ''   # ENTER YOUR OAUTH_TOKEN_SECRET
api = TwitterAPI(CONSUMER_KEY, CONSUMER_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
Here's how my function goes:
def retrieve_tweets(api, keyword, batch_count, total_count):
    tweets = []
    batch_count = str(batch_count)
    # pass the variables themselves, not the quoted literals 'keyword' etc.
    resp = api.request('search/tweets', {'q': keyword,
                                         'count': batch_count,
                                         'lang': 'en',
                                         'result_type': 'recent',
                                         })
    # store the tweets in the list
    tweets += resp.json()['statuses']
    # find the max_id_str for the next batch
    ids = [tweet['id'] for tweet in tweets]
    max_id_str = str(min(ids))
    # loop until as many tweets as total_count are collected
    number_of_tweets = len(tweets)
    while number_of_tweets < total_count:
        print("{} tweets are collected for keyword {}. Last tweet created at {}".format(
            number_of_tweets, keyword, tweets[number_of_tweets - 1]['created_at']))
        resp = api.request('search/tweets', {'q': keyword,
                                             'count': batch_count,
                                             'lang': 'en',
                                             'result_type': 'recent',
                                             'max_id': max_id_str,
                                             })
        tweets += resp.json()['statuses']
        ids = [tweet['id'] for tweet in tweets]
        max_id_str = str(min(ids))
        number_of_tweets = len(tweets)
    print("{} tweets are collected for keyword {}. Last tweet created at {}".format(
        number_of_tweets, keyword, tweets[number_of_tweets - 1]['created_at']))
    return tweets
After that, I ran the function as follows:
first_group = retrieve_tweets(api, 'Rock', 100, 500)
It kept running fine until around the 180th tweet, then this popped up:
179 tweets are collected for keyword Rock. Last tweet created at Mon Apr 29 02:04:05 +0000 2019
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-9-cbeb6ede7a5a> in <module>
8 # Your function call should look like this: retrieve_tweets(api,'keyword',single_count,total_count)
9
---> 10 k1_tweets = retrieve_tweets(api, 'Restaurant', 100, 500) #INSERT YOUR CODE HERE
11
12
<ipython-input-7-0d0c87e7c3e9> in retrieve_tweets(api, keyword, batch_count, total_count)
55 )
56
---> 57 tweets += resp.json()['statuses']
58 ids = [tweet['id'] for tweet in tweets]
59 max_id_str = str(min(ids))
KeyError: 'statuses'
It should have run smoothly all the way to 500, and I've tested the key 'statuses' multiple times before.
Additionally, this happens at random points in the collecting phase; at times I managed to finish my first group of 500 tweets, but the error would then pop up while collecting the second group.
Also, once this error pops up, I can't use the key 'statuses' anymore until I shut down my editor and run everything all over again.
Here's the simple test that I always run before and after the error occurs:
a = api.request('search/tweets', {'q': 'Fun', 'count':'10'})
a1 = a.json()
a1['statuses']
You can use dict.get to fetch the value for the key 'statuses'; it returns None if the key is not present, otherwise it gives the value:

statuses = resp.json().get('statuses')
if statuses:
    tweets += statuses
    ids = [tweet['id'] for tweet in tweets]
    max_id_str = str(min(ids))
    number_of_tweets = len(tweets)
The JSON response from Twitter will not always contain a 'statuses' key. You need to handle a response that contains an 'errors' key as well. Error responses are documented here: https://developer.twitter.com/en/docs/ads/general/guides/response-codes.html
Also, your code uses resp.json() to get this JSON structure. This is fine, but you can also use the iterator that comes with TwitterAPI. The iterator yields the items contained in either 'statuses' or 'errors'. Here is the usage:
resp = api.request('search/tweets', {'q': 'pizza'})
for item in resp.get_iterator():
    if 'text' in item:
        print(item['text'])
    elif 'message' in item:
        print('%s (%d)' % (item['message'], item['code']))
One more thing you may not be aware of: TwitterAPI comes with a utility class, TwitterPager, that will make successive requests and keep track of max_id for you. Here's a short example: https://github.com/geduldig/TwitterAPI/blob/master/examples/page_tweets.py
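For illustration, a minimal sketch of that approach applied to this question (TwitterPager ships with the TwitterAPI package; the stopping condition at 500 tweets is just an example):

from TwitterAPI import TwitterPager

# Assumes `api` is the authenticated TwitterAPI instance from above.
pager = TwitterPager(api, 'search/tweets',
                     {'q': 'Rock', 'count': 100, 'lang': 'en'})

tweets = []
for item in pager.get_iterator():
    if 'text' in item:       # a tweet
        tweets.append(item)
        if len(tweets) >= 500:
            break
    elif 'message' in item:  # an error, e.g. rate limit exceeded
        print('%s (%d)' % (item['message'], item['code']))
        break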

Crawl only tweet metadata, without the tweet text, using an ID list

CONTEXT: I have a list of tweet IDs and their textual content, and I need to crawl their metadata. However, my code crawls the tweet metadata and the text as well. Since I have about 100K tweet IDs, I do not wish to waste time crawling the tweet text again.
Question: How can I adapt the following code so that I download only the tweet metadata? I'm using tweepy and Python 3.6.
def get_tweets_single(twapi, idfilepath):
    # tweet_id = '522778758168580098'
    tw_list = []
    with open(idfilepath, 'r') as f1:  # a file that contains tweet IDs
        lines = f1.readlines()
    for line in lines:
        tweet_id = line.rstrip('\n')
        try:
            print(tweet_id)
            tweet = twapi.get_status(tweet_id)  # tweepy call to fetch one tweet
            tw_list.append(tweet)
            # tweet = twapi.statuses_lookup(id_=tweet_id, include_entities=True, trim_user=True)
            with open(idjsonFile, 'a', encoding='utf-8') as f2:
                json.dump(tweet._json, f2)
        except tweepy.TweepError as te:
            print('Failed to get tweet ID %s: %s' % (tweet_id, te.reason))

def main(args):
    print('hello')
    # connect to twitter
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
    api = tweepy.API(auth)
    get_tweets_single(api, idfilepath)
You cannot download only the metadata; the tweet text always comes with it.
Looking at the documentation, you can choose to exclude information about the user with trim_user=True, but that's the only thing you can strip out.
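What you can do is cut the number of requests by a factor of 100: statuses_lookup accepts up to 100 IDs per call in Tweepy 3.x. A sketch of that approach (the batching helper is my own; the idjsonFile parameter name follows the question's code):

import json
import tweepy

def get_tweets_batched(twapi, idfilepath, idjsonFile):
    # Read all IDs, then fetch them 100 at a time (the statuses_lookup limit).
    with open(idfilepath, 'r') as f1:
        ids = [line.strip() for line in f1 if line.strip()]
    with open(idjsonFile, 'a', encoding='utf-8') as f2:
        for i in range(0, len(ids), 100):
            try:
                tweets = twapi.statuses_lookup(ids[i:i + 100], trim_user=True)
                for tweet in tweets:
                    json.dump(tweet._json, f2)
                    f2.write('\n')
            except tweepy.TweepError as te:
                print('Failed on batch starting at %d: %s' % (i, te.reason))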

How to extract tweets related to a particular country?

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import tweepy
import re
from textblob import TextBlob
import pandas as pd
import numpy as np
ACCESS_TOKEN="XXXX"
ACCESS_SECRET="XXXX"
CONSUMER_KEY="XXXX"
CONSUMER_SECRET="XXXX"
def twitter_setup():
    """
    Utility function to set up the Twitter API
    with our access keys provided.
    """
    # Authentication and access using keys:
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
    # Return API with authentication:
    api = tweepy.API(auth)
    return api

extractor = twitter_setup()
tweets = extractor.user_timeline(screen_name="realDonaldTrump", count=200)

data = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])
data['len'] = np.array([len(tweet.text) for tweet in tweets])
data['ID'] = np.array([tweet.id for tweet in tweets])
data['Date'] = np.array([tweet.created_at for tweet in tweets])
data['Source'] = np.array([tweet.source for tweet in tweets])
data['Likes'] = np.array([tweet.favorite_count for tweet in tweets])
data['RTs'] = np.array([tweet.retweet_count for tweet in tweets])

def clean_tweet(tweet):
    '''
    Utility function to clean the text in a tweet by removing
    links and special characters using regex.
    '''
    return ' '.join(re.sub(r"(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

def analize_sentiment(tweet):
    '''
    Utility function to classify the polarity of a tweet
    using textblob.
    '''
    analysis = TextBlob(clean_tweet(tweet))
    # print(analysis.sentiment.polarity)
    if analysis.sentiment.polarity > 0:
        return 1
    elif analysis.sentiment.polarity == 0:
        return 0
    else:
        return -1

data['SA'] = np.array([analize_sentiment(tweet) for tweet in data['Tweets']])
display(data.head(200))
I am working on a project in which we extract tweets from some world leaders and then try to compare their relationships with other countries based on their tweets. So far we have extracted the tweets from Donald Trump's account and categorized them into positive and negative, but the problem I am facing is how to separate the tweets country-wise. Is there any way to extract only those tweets in which he/she has tweeted about some country, ignoring the rest, so that we get only the tweets related to that country?
I don't have enough reputation to add a comment, but you should know that you have posted all your access tokens, and that is a bad idea.
You might load a list of country names, such as the one in this github repo by marijn, which also contains a list of nationalities.
Check per tweet whether a name from the list occurs (so you would have to iterate over the list). You might add a counter for each country occurring in a tweet, and add this counter data as a column to your dataframe (similar to your earlier approach for the sentiment analysis); see the sketch below.
This is just an idea; I'm not able to comment yet because I'm new.
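A minimal sketch of that idea (the short countries list here is a hypothetical stand-in; in practice you would load the full list from marijn's repo):

# Hypothetical short list; load the full country list from marijn's repo in practice.
countries = ['China', 'Mexico', 'Russia', 'Iran', 'Canada']

def countries_mentioned(tweet_text):
    '''Return the country names that occur in the tweet text.'''
    return [c for c in countries if c.lower() in tweet_text.lower()]

data['Countries'] = [countries_mentioned(t) for t in data['Tweets']]
# Keep only the tweets that mention at least one country:
country_tweets = data[data['Countries'].map(len) > 0]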

How to retrieve all historical public tweets with Twitter Premium Search API in Sandbox version (using next token)

I want to download all historical tweets with certain hashtags and/or keywords for a research project. I got the premium Twitter API for that. I'm using the amazing TwitterAPI package to take care of auth and so on.
My problem is that I'm not an expert developer and I have some issues understanding how the next token works and how to get all the tweets into a CSV.
What I want to achieve is to have all the tweets in one single CSV, without having to manually change the fromDate and toDate values. Right now I don't know how to get the next token and how to use it to concatenate requests.
So far I got here:
from TwitterAPI import TwitterAPI
import csv

SEARCH_TERM = 'my-search-term-here'
PRODUCT = 'fullarchive'
LABEL = 'here-goes-my-dev-env'

api = TwitterAPI("consumer_key",
                 "consumer_secret",
                 "access_token_key",
                 "access_token_secret")

r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL),
                {'query': SEARCH_TERM,
                 'fromDate': '200603220000',
                 'toDate': '201806020000'})

csvFile = open('2006-2018.csv', 'a')
csvWriter = csv.writer(csvFile)

for item in r:
    csvWriter.writerow([item['created_at'], item['user']['screen_name'],
                        item['text'] if 'text' in item else item])
I would be really thankful for any help!
Cheers!
First of all, TwitterAPI includes a helper class that will take care of this for you. TwitterPager works with many types of Twitter endpoints, not just premium search. Here is an example to get you started: https://github.com/geduldig/TwitterAPI/blob/master/examples/page_tweets.py
But to answer your question, the strategy you should take is to put the request you currently have inside a while loop. Then:
1. Each request will return a next field, which you can get with r.json()['next'].
2. When you are done processing the current batch of tweets and are ready for your next request, include the next parameter set to the value above.
3. Eventually a request will not include a next field in the returned JSON. At that point, break out of the while loop.
Something like the following:
next_token = None
while True:
    params = {'query': SEARCH_TERM,
              'fromDate': '200603220000',
              'toDate': '201806020000'}
    if next_token:
        params['next'] = next_token  # only sent after the first request
    r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL), params)
    if r.status_code != 200:
        break
    for item in r:
        csvWriter.writerow([item['created_at'], item['user']['screen_name'],
                            item['text'] if 'text' in item else item])
    body = r.json()
    if 'next' not in body:
        break
    next_token = body['next']

How to subscribe with the YouTube API?

https://developers.google.com/youtube/v3/code_samples/apps-script#subscribe_to_channel
Hello,
I can't figure out how to subscribe to a YouTube channel with a POST request. I'm not looking to use YouTubeSubscriptions as shown above. I'm simply looking to pass an API key, but can't seem to figure it out. Any suggestions?
If you don't want to use YouTubeSubscriptions, you have to get the session_token after logging in to your YouTube account.
The session_token is stored in a hidden input tag:
document.querySelector('input[name=session_token]').value
Alternatively, do a full-text search for the XSRF_TOKEN field; the corresponding value is the session_token. The matching regex:
const regex = /\'XSRF_TOKEN\':(.*?)\"(.*?)\"/g
Below is an implementation in Python 3:
import re
import json

def YouTubeSubscribe(url, SessionManager):
    while True:
        try:
            html = SessionManager.get(url).text
            session_token = (re.findall(r"XSRF_TOKEN\W*(.*)=", html, re.IGNORECASE)[0]).split('"')[0]
            id_yt = url.replace("https://www.youtube.com/channel/", "")
            params = (('name', 'subscribeEndpoint'),)
            data = [
                ('sej', '{"clickTrackingParams":"","commandMetadata":{"webCommandMetadata":{"url":"/service_ajax","sendPost":true}},"subscribeEndpoint":{"channelIds":["' + id_yt + '"],"params":"EgIIAg%3D%3D"}}'),
                ('session_token', session_token + "=="),
            ]
            response = SessionManager.post('https://www.youtube.com/service_ajax', params=params, data=data)
            check_state = json.loads(response.content)['code']
            if check_state == "SUCCESS":
                return 1
            else:
                return 0
        except Exception as e:
            print("[E] YouTubeSubscribe: " + str(e))
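Usage would look something like this (a hedged sketch: it assumes you already have a requests.Session that is logged in to YouTube, and the channel URL is just an example):

import requests

session = requests.Session()
# ... log this session in to YouTube first (not shown) ...
ok = YouTubeSubscribe("https://www.youtube.com/channel/UC_x5XG1OV2P6uZZ5FSM9Ttw", session)
print("Subscribed!" if ok else "Failed")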
