I'm trying to get information about the users who added a specific tweet to their favorites, but I can't find it in the documentation.
It seems unfair that Twitter can do this itself but doesn't expose it as an API method.
Apparently, the only way to do this is to scrape Twitter's website:
import urllib2
from lxml.html import parse

# returns (list of retweeting user ids, list of favoriting user ids) for a given screen_name and status_id
def get_twitter_user_rts_and_favs(screen_name, status_id):
    url = urllib2.urlopen('https://twitter.com/' + screen_name + '/status/' + status_id)
    root = parse(url).getroot()
    num_rts = 0
    num_favs = 0
    rt_users = []
    fav_users = []
    for ul in root.find_class('stats'):
        for li in ul.cssselect('li'):
            cls_name = li.attrib['class']
            if cls_name.find('retweet') >= 0:
                num_rts = int(li.cssselect('a')[0].attrib['data-tweet-stat-count'])
            elif cls_name.find('favorit') >= 0:
                num_favs = int(li.cssselect('a')[0].attrib['data-tweet-stat-count'])
            elif cls_name.find('avatar') >= 0 or cls_name.find('face-pile') >= 0:
                for users in li.cssselect('a'):
                    # apparently, favs are listed before retweets, but the retweet summary is listed before the fav summary
                    # if in doubt, you can diff the uids returned here against the retweet uids from the official api
                    if num_favs > 0:
                        num_favs -= 1
                        fav_users.append(users.attrib['data-user-id'])
                    else:
                        rt_users.append(users.attrib['data-user-id'])
    return rt_users, fav_users

# example
if __name__ == '__main__':
    print get_twitter_user_rts_and_favs('alien_merchant', '674104400013578240')
Short answer: You can't do this perfectly.
Long answer: You can do this with some effort, but it isn't going to be anywhere near perfect. You can use the Twitter API to monitor the activity of up to 4,000 user IDs. If a tweet is created by one of the 4,000 people you monitor, you can get all the information about it, including the people who have favourited it. This also requires that you push all the information about the people you monitor into a database (I use MongoDB). You can then query the database for information about your tweet.
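For what it's worth, a bare-bones sketch of the monitoring-plus-database part might look like the following. It assumes the statuses/filter streaming endpoint via the TwitterAPI library and a local MongoDB; the credentials and user IDs are placeholders, and it only illustrates the collection side, not the favourite lookup itself.
from TwitterAPI import TwitterAPI
from pymongo import MongoClient

# placeholder credentials; use your own keys
api = TwitterAPI("consumer_key", "consumer_secret",
                 "access_token_key", "access_token_secret")
db = MongoClient()["twitter_monitor"]

# comma-separated list of the user IDs you want to monitor (placeholder IDs)
followed_ids = "783214,6253282"

# stream tweets from those accounts and push each one into the database
# so it can be queried later
stream = api.request("statuses/filter", {"follow": followed_ids})
for item in stream:
    db.tweets.insert_one(item)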
Twitter API v2 has new likes functionality:
https://twittercommunity.com/t/announcing-twitter-api-v2-likes-lookup-and-blocks-lookup/154353
To get users who have liked a Tweet, use the GET /2/tweets/:id/liking_users endpoint.
They've also provided example code in their GitHub repo.
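For instance, a minimal sketch using the requests library might look like this; the bearer token and tweet ID are placeholders you would replace with your own.
import requests

# placeholder values; substitute your own bearer token and tweet ID
BEARER_TOKEN = "YOUR-BEARER-TOKEN"
TWEET_ID = "1234567890123456789"

resp = requests.get(
    "https://api.twitter.com/2/tweets/%s/liking_users" % TWEET_ID,
    headers={"Authorization": "Bearer %s" % BEARER_TOKEN},
)
for user in resp.json().get("data", []):
    print(user["id"], user["username"])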
Use the endpoint favorites/list with max_id set to the tweet you're looking for.
https://dev.twitter.com/rest/reference/get/favorites/list
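For example, a rough sketch with the TwitterAPI library might look like this; the credentials, screen name, and tweet ID below are placeholders.
from TwitterAPI import TwitterAPI

# placeholder credentials; use your own keys
api = TwitterAPI("consumer_key", "consumer_secret",
                 "access_token_key", "access_token_secret")

# favorites/list returns tweets liked by the given account; max_id caps the
# results at the tweet ID you are interested in
r = api.request('favorites/list', {
    'screen_name': 'some_user',      # hypothetical account
    'max_id': '674104400013578240',  # the tweet you are looking for
    'count': 200,
})
for item in r:
    print(item['id_str'], item.get('text'))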
I'm trying to get all the questions, with details, from the Stack Exchange API for a given user ID using the following code:
response = requests.get("http://api.stackexchange.com/2.2/users/2593236/questions?")
However, I receive this error message.
{"error_id":400,"error_message":"site is required","error_name":"bad_parameter"}
Can anyone help me fix this issue and retrieve all of a user's questions by their user ID?
To download all questions or answers from a specific user and stack, you can use:
import requests, json

all_items = []
user = 2593236
stack = "stackoverflow.com"
qa = "questions"  # or "answers"
page = 1

while True:
    u = f"https://api.stackexchange.com/2.2/users/{user}/{qa}?site={stack}&page={page}&pagesize=100"
    j = requests.get(u).json()
    if j:
        all_items += j["items"]
        if not j['has_more']:
            print("No more Pages")
            break
        elif not j['quota_remaining']:
            print("No Quota Remaining")
            break
    else:
        print("No Questions")
        break
    page += 1

if all_items:
    print(f"How many {qa}? ", len(all_items))
    # save questions/answers to a file
    with open(f"{user}_{qa}_{stack}.json", "w") as f:
        f.write(json.dumps(all_items))
The error message is pretty clear: you have to include a site parameter, as explained in the documentation:
Each of these methods operates on a single site at a time, identified by the site parameter. This parameter can be the full domain name (ie. "stackoverflow.com"), or a short form identified by api_site_parameter on the site object.
Try
http://api.stackexchange.com/2.2/users/2593236/questions?site=stackoverflow.com
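For instance, using the requests library and passing the required site parameter via params:
import requests

# same request as above, with the required site parameter supplied via params
resp = requests.get(
    "https://api.stackexchange.com/2.2/users/2593236/questions",
    params={"site": "stackoverflow.com"},
)
for q in resp.json().get("items", []):
    print(q["question_id"], q["title"])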
I am using TwitterAPI in Python 3 with premium search to find archived tweets that were retweeted by user1 from user2 and contain specific keywords. Following some suggestions, I used https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/operators-by-product and https://github.com/geduldig/TwitterAPI to write the code below, but when I run it I get no output and no error message.
The code works fine when I leave out the retweets_of and from operators, but those are exactly the rules I want to use to get my data.
I know my code uses the premium Sandbox search; I will upgrade it to the premium Full Archive search once I have the right code.
from TwitterAPI import TwitterAPI

# keys and tokens from the Twitter developer portal
consumer_key = "xxxxxxxxxxxxx"
consumer_secret = "xxxxxxxxxxxxxxxxxxx"
access_token = "xxxxxxxxxxxxxxxxxxx"
access_token_secret = "xxxxxxxxxxxxxxxxx"

PRODUCT = '30day'
LABEL = 'MyLABELname'

api = TwitterAPI(consumer_key, consumer_secret, access_token, access_token_secret)

r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL),
                {'query': 'retweets_of:user.Tesla from:user.elonmusk Supercharger battery'})

for item in r:
    print(item['text'] if 'text' in item else item)
Does anyone know what the problem is with my code, or is there another way to use the retweets_of and from operators in a premium search? Is it also possible to add a count operator so that the output is a number rather than the full text of every tweet?
You should omit "user." in your query.
Also, by specifying "Supercharger battery" (which is perfectly fine) you require both words to appear in the search results. If you only need either word to be present, use "Supercharger OR battery" instead.
Finally, to specify a larger number of results, use the maxResults parameter (10 to 100).
Here is your example with all of the above:
r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL),
                {'query': 'retweets_of:Tesla from:elonmusk Supercharger OR battery',
                 'maxResults': 100})
Twitter's Premium Search doc may be helpful: https://developer.twitter.com/en/docs/tweets/search/api-reference/premium-search.html
How can I use the Newspaper library for websites that need authentication?
I'm using the newspaper3k library to download the HTML of several articles from different news sites (which is working just fine so far). However, since I need the full content, I would need to authenticate (username and password) before requesting the HTML. I would appreciate any pointers in the right direction!
I assume this has to happen before I use newspaper.build()?
(I just wanted to say at this point, this is the first time I am coding with python (or just generally coding anything) so any help at all would be great)
import os    # needed for the directory handling below
import time  # needed for the sleep below
import newspaper  # import newspaper library
from newspaper import news_pool

guardian = newspaper.build('https://www.theguardian.com/uk-news/all', language='en', memoize_articles=True)
telegraph = newspaper.build('https://www.telegraph.co.uk/news/uk/', language='en', memoize_articles=True)
dagbladet = newspaper.build('https://www.svd.se/sverige', language='sv', memoize_articles=True)
dagensnyheter = newspaper.build('https://www.dn.se/nyheter/sverige/', language='sv', memoize_articles=True)

allpapers = [guardian, telegraph, dagbladet, dagensnyheter]

for papers in allpapers:
    newpathpaper = r'/Users/articles/' + today + "/" + naming  # naming is just a variable from further up that gives the name of each newspaper
    if not os.path.exists(newpathpaper):
        os.makedirs(newpathpaper)

    # parsing, downloading and creating files for articles
    pointer = 0
    while papers.size() > pointer:
        papers_article = papers.articles[pointer]
        papers_article.download()
        if papers_article.download_state == 2:  # checking if the article has been downloaded
            time.sleep(2)
            papers_article.parse()
            print(papers_article.url)
            # extracting the publishing date so it is comparable
            published_today = papers_article.publish_date  # newspaper extractor
            published = str(published_today)[0:10]
            # writing the html
            if published == today:  # today was declared earlier
                f = open('articles/%s/%s/%s_article_%s.html' % (today, naming, naming, pointer), 'w+')  # writing html file
                f.write(papers_article.html)
                print("written successfully")
                count_writes += 1
            else:
                print("not from today")
        else:
            print("article %s" % pointer)
            print(papers_article.url)
            print("Has not downloaded!")
        pointer += 1
I want to download all historical tweets with certain hashtags and/or keywords for a research project. I got the premium Twitter API for that. I'm using the amazing TwitterAPI library to take care of auth and so on.
My problem now is that I'm not an expert developer, and I have some trouble understanding how the next token works and how to get all the tweets into a CSV.
What I want to achieve is to have all the tweets in one single CSV, without having to manually change the fromDate and toDate values. Right now I don't know how to get the next token or how to use it to chain requests.
So far I got here:
from TwitterAPI import TwitterAPI
import csv

SEARCH_TERM = 'my-search-term-here'
PRODUCT = 'fullarchive'
LABEL = 'here-goes-my-dev-env'

api = TwitterAPI("consumer_key",
                 "consumer_secret",
                 "access_token_key",
                 "access_token_secret")

r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL),
                {'query': SEARCH_TERM,
                 'fromDate': '200603220000',
                 'toDate': '201806020000'})

csvFile = open('2006-2018.csv', 'a')
csvWriter = csv.writer(csvFile)

for item in r:
    csvWriter.writerow([item['created_at'], item['user']['screen_name'],
                        item['text'] if 'text' in item else item])
I would be really thankful for any help!
Cheers!
First of all, TwitterAPI includes a helper class that will take care of this for you. TwitterPager works with many types of Twitter endpoints, not just Premium Search. Here is an example to get you started: https://github.com/geduldig/TwitterAPI/blob/master/examples/page_tweets.py
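As a rough sketch, reusing the api, PRODUCT, LABEL, SEARCH_TERM, and csvWriter objects from your script, the paged version could look like this:
from TwitterAPI import TwitterPager

pager = TwitterPager(api,
                     'tweets/search/%s/:%s' % (PRODUCT, LABEL),
                     {'query': SEARCH_TERM,
                      'fromDate': '200603220000',
                      'toDate': '201806020000'})

# TwitterPager requests each following page for you, so there is no manual
# handling of the 'next' token
for item in pager.get_iterator():
    csvWriter.writerow([item['created_at'], item['user']['screen_name'],
                        item['text'] if 'text' in item else item])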
But to answer your question, the strategy you should take is to put the request you currently have inside a while loop. Then,
1. Each request will return a next field which you can get with r.json()['next'].
2. When you are done processing the current batch of tweets and ready for your next request, you would include the next parameter set to the value above.
3. Finally, a request will eventually not include a next field in the returned JSON. At that point, break out of the while loop.
Something like the following.
next = ''
while True:
    r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL),
                    {'query': SEARCH_TERM,
                     'fromDate': '200603220000',
                     'toDate': '201806020000',
                     'next': next})
    if r.status_code != 200:
        break
    for item in r:
        csvWriter.writerow([item['created_at'], item['user']['screen_name'],
                            item['text'] if 'text' in item else item])
    json = r.json()
    if 'next' not in json:
        break
    next = json['next']
How can I get the public content of all the users tagged in a specific picture? Is that possible?
Use this API to get media:
https://api.instagram.com/v1/media/{media-id}?access_token=ACCESS-TOKEN
or
https://api.instagram.com/v1/media/shortcode/{short-code}?access_token=ACCESS-TOKEN
The JSON response will contain a users_in_photo field listing all the users tagged in the photo:
https://www.instagram.com/developer/endpoints/media/
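As a rough illustration using the requests library (MEDIA_ID and ACCESS_TOKEN are placeholders for your own values):
import requests

# placeholder values; substitute your own media ID and access token
MEDIA_ID = "MEDIA-ID"
ACCESS_TOKEN = "ACCESS-TOKEN"

resp = requests.get(
    "https://api.instagram.com/v1/media/%s" % MEDIA_ID,
    params={"access_token": ACCESS_TOKEN},
)
media = resp.json().get("data", {})
for tag in media.get("users_in_photo", []):
    print(tag["user"]["username"], tag["position"]["x"], tag["position"]["y"])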
Since this feature was added after Instagram deprecated their official client, the only way to get it is to use a maintained fork.
If you use Python, you can use this one; it lets you get the data you need.
Here a sample script:
# install the latest version of the maintained fork:
# sudo pip install --upgrade git+https://github.com/MabrianOfficial/python-instagram
from instagram.client import InstagramAPI

access_token = "YOUR-TOKEN"
api = InstagramAPI(access_token=access_token)

count = 33        # max count allowed
max_id = ''       # the most recent posts
hashtag = 'cats'  # sample hashtag
next_url = ''     # first iteration

while True:
    result, next_url = api.tag_recent_media(count, max_id, hashtag, with_next_url=next_url)
    for m in result:
        if m.users_in_photo:
            for uip in m.users_in_photo:
                print "user: {} -> {}".format(uip.user.username, uip.user.id)
                print "position : ({},{})".format(uip.position.x, uip.position.y)