Python: Facebook Graph API - pagination request using facebook-sdk - python-3.x

I'm trying to query Facebook for different information, for example the friends list. It works fine, but of course it only gives a limited number of results. How do I access the next batch of results?
import facebook
import json

ACCESS_TOKEN = ''

def pp(o):
    with open('facebook.txt', 'a') as f:
        json.dump(o, f, indent=4)

g = facebook.GraphAPI(ACCESS_TOKEN)
pp(g.get_connections('me', 'friends'))
The result JSON does give me paging cursors (before and after values), but where do I put them?

I'm exploring the Facebook Graph API through the facepy library for Python (it works on Python 3 too), but I think I can help.
TL;DR:
You need to append &after=YOUR_AFTER_CODE to the URL you've called (e.g. https://graph.facebook.com/v2.8/YOUR_FB_ID/friends/?fields=id,name), giving you a link like https://graph.facebook.com/v2.8/YOUR_FB_ID/friends/?fields=id,name&after=YOUR_AFTER_CODE, to which you should make a GET request.
You'll need requests in order to make the GET request to the Graph API using your user ID (I'm assuming you know how to find it programmatically) and a URL similar to the one below (see the URL variable).
import facebook
import json
import requests

ACCESS_TOKEN = ''
YOUR_FB_ID = ''
URL = ("https://graph.facebook.com/v2.8/{}/friends"
       "?access_token={}&fields=id,name&limit=50&after=").format(YOUR_FB_ID, ACCESS_TOKEN)

def pp(o):
    all_friends = []
    if 'data' in o:
        paging = o.get('paging', {})
        if 'next' in paging:
            # The API returns a ready-made URL for the next page.
            resp = requests.get(paging['next'])
            all_friends.append(resp.json())
        elif 'after' in paging.get('cursors', {}):
            # Or build the next-page URL yourself from the "after" cursor.
            new_url = URL + paging['cursors']['after']
            resp = requests.get(new_url)
            all_friends.append(resp.json())
        else:
            print("Something went wrong")
    # Do whatever you want with all_friends...
    with open('facebook.txt', 'a') as f:
        json.dump(o, f, indent=4)

g = facebook.GraphAPI(ACCESS_TOKEN)
pp(g.get_connections('me', 'friends'))
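If you want to walk through every page rather than just the next one, here is a minimal sketch (my own variation, not part of the original snippet) that keeps following paging['next'] until the Graph API stops returning it:

import requests

def get_all_friends(graph, user_id='me'):
    # Collect every page of the friends connection by following paging['next'].
    friends = []
    page = graph.get_connections(user_id, 'friends')
    while True:
        friends.extend(page.get('data', []))
        next_url = page.get('paging', {}).get('next')
        if not next_url:  # no more pages
            break
        page = requests.get(next_url).json()
    return friends

# all_friends = get_all_friends(g)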
Hope this helps!

Related

How to get a download link which requires checkboxes checking in additional dialog box

I want to download the last publicly available file from https://sam.gov/data-services/Exclusions/Public%20V2?privacy=Public
While trying to download manually, the real download links look like:
https://falextracts.s3.amazonaws.com/Exclusions/Public%20V2/SAM_Exclusions_Public_Extract_V2_22150.ZIP?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T143743Z&X-Amz-SignedHeaders=host&X-Amz-Expires=2699&X-Amz-Credential=AKIAY3LPYEEXWOQWHCIY%2F20220530%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=3eca59f75a4e1f6aa59fc810da8f391f1ebfd8ca5a804d56b79c3eb9c4d82e32
My function gets only the initial link, which refers to the real link:
import json
import requests
from operator import itemgetter

files_url = 'https://sam.gov/api/prod/fileextractservices/v1/api/listfiles?random=1653676394983&domain=Exclusions/Public%20V2&privacy=Public'

def get_file():
    response = requests.get(files_url, stream=True)
    links_resp = json.loads(response.text)
    links_dicts = [d for d in links_resp['_embedded']['customS3ObjectSummaryList'] if d['displayKey'].count('SAM_Exclus')]
    sorted_links = sorted(links_dicts, key=itemgetter('dateModified'), reverse=True)
    return sorted_links[0]['_links']['self']['href']

get_file()
Result:
'https://s3.amazonaws.com/falextracts/Exclusions/Public V2/SAM_Exclusions_Public_Extract_V2_22150.ZIP'
But by following the above link, I get Access denied
So I would appreciate any hints on how to get the real download links.
I've edited your code as little as possible so you can follow what changed. The requests library can convert the response to JSON itself, and imports that are not at the top of the file hurt readability.
import requests as req
from operator import itemgetter

files_url = "https://sam.gov/api/prod/fileextractservices/v1/api/listfiles?random=1653676394983&domain=Exclusions/Public%20V2&privacy=Public"
down_url = "https://sam.gov/api/prod/fileextractservices/v1/api/download/Exclusions/Public%20V2/{}?privacy=Public"

def get_file():
    response = req.get(files_url, stream=True).json()
    links_dicts = [d for d in response["_embedded"]["customS3ObjectSummaryList"]]
    sorted_links = sorted(links_dicts, key=itemgetter('dateModified'), reverse=True)
    key = sorted_links[0]['displayKey']
    down = req.get(down_url.format(key))
    if down.status_code != 200:
        return False
    print(key)
    with open(key, 'wb') as f:
        f.write(down.content)
    return True

get_file()
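Since the exclusion extracts can be fairly large, you may prefer to stream the download to disk in chunks rather than buffering the whole ZIP in memory. A minimal variation of the same idea (same down_url as above; the 1 MB chunk size is an arbitrary choice):

def download_file(key):
    # Stream the response and write it to disk in 1 MB chunks.
    with req.get(down_url.format(key), stream=True) as down:
        if down.status_code != 200:
            return False
        with open(key, 'wb') as f:
            for chunk in down.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)
    return True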

python3, Trying to get an output from my function I defined, need some guidance

I found a pretty cool ASN API tool that allows me to supply an AS number, and it will go out and pull down the subnets that relate to that ASN.
Here is rough but partial code. I am defining a function that takes ASNNUMBER (to which I will supply the number through another file).
When I call url here, it just gives me an n...
What I'm trying to do here, is append my str(ASNNUMBER) to the end of the ?q= parameter in the URL.
Once I do that, I'd like to display my results and output it to a file
import requests

def asnfinder(ASNNUMBER):
    print('n\n######## Running ASNFinder ########\n')
    url = 'https://api.hackertarget.com/aslookup?q=' + str(ASNNUMBER)
    response = requests.get(url)
The result I'd like to get is the output of the GET request I'm performing.
## Running ASNFinder
n
Try writing something like this:
import requests

def asnfinder(ASNNUMBER):
    print('\n\n######## Running ASNFinder ########\n')
    url = 'https://api.hackertarget.com/aslookup?q=' + str(ASNNUMBER)
    response = requests.get(url)
    data = response.text
    print(data)
    # Open the file for writing, not reading, so the response can be saved.
    with open('filename', 'w') as f:
        f.write(data)
It should work fine.
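If you also want the result back in Python rather than only printed and written to disk, a small hypothetical variant is to return the response text:

def asnfinder(ASNNUMBER):
    url = 'https://api.hackertarget.com/aslookup?q=' + str(ASNNUMBER)
    return requests.get(url).text

subnets = asnfinder(15169)  # example AS number, not from the question
print(subnets)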
P.S. If it helped ya, please make sure you mark this as the answer :)

How to download bulk amount of images from google or any website

Actually, I need to do a project on machine learning, and for that I want a lot of images for training. I searched for a solution to this problem but failed.
Can anyone help me solve this? Thanks in advance.
I used Google Images to download images using Selenium. It is just a basic approach.
from selenium import webdriver
import time
import urllib.request
import os
from selenium.webdriver.common.keys import Keys

key_words = 'puppies'          # use the search term you want images for
required_images_number = 100   # use the number of images you need

browser = webdriver.Chrome("path\\to\\the\\webdriverFile")
browser.get("https://www.google.com")
search = browser.find_element_by_name('q')
search.send_keys(key_words, Keys.ENTER)  # search for the required key words
elem = browser.find_element_by_link_text('Images')
elem.get_attribute('href')
elem.click()
value = 0
for i in range(20):
    # Scroll down the results page so more thumbnails get loaded.
    browser.execute_script("scrollBy(" + str(value) + ",+1000);")
    value += 1000
    time.sleep(3)
elem1 = browser.find_element_by_id('islmp')
sub = elem1.find_elements_by_tag_name("img")
try:
    os.mkdir('downloads')
except FileExistsError:
    pass
count = 0
for i in sub:
    src = i.get_attribute('src')
    try:
        if src is not None:
            src = str(src)
            print(src)
            count += 1
            urllib.request.urlretrieve(src,
                os.path.join('downloads', 'image' + str(count) + '.jpg'))
        else:
            raise TypeError
    except TypeError:
        print('fail')
    if count == required_images_number:
        break
check this for detailed explanation.
download driver here
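If you'd rather not download and hard-code the driver path yourself, the third-party webdriver-manager package (assuming it is installed with pip install webdriver-manager) can fetch a matching ChromeDriver for you; a minimal sketch:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# webdriver-manager downloads a ChromeDriver that matches your installed Chrome
# and returns its path, which is then passed to webdriver.Chrome.
browser = webdriver.Chrome(ChromeDriverManager().install())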
My tip to you: use an image search API. My favourite is the Bing Image Search API.
The following text is from "Send search queries using the REST API and Python".
Running the quickstart
To get started, set subscription_key to a valid subscription key for the Bing API service.
Python
subscription_key = None
assert subscription_key
Next, verify that the search_url endpoint is correct. At this writing, only one endpoint is used for Bing search APIs. If you encounter authorization errors, double-check this value against the Bing search endpoint in your Azure dashboard.
Python
search_url = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
Set search_term to look for images of puppies.
Python
search_term = "puppies"
The following block uses the requests library in Python to call out to the Bing search APIs and return the results as a JSON object. Observe that we pass in the API key via the headers dictionary and the search term via the params dictionary. To see the full list of options that can be used to filter search results, refer to the REST API documentation.
Python
import requests
headers = {"Ocp-Apim-Subscription-Key" : subscription_key}
params = {"q": search_term, "license": "public", "imageType": "photo"}
response = requests.get(search_url, headers=headers, params=params)
response.raise_for_status()
search_results = response.json()
The search_results object contains the actual images along with rich metadata such as related items. For example, the following line of code can extract the thumbnail URLs for the first 16 results.
Python
thumbnail_urls = [img["thumbnailUrl"] for img in search_results["value"][:16]]
Then use the PIL library to download the thumbnail images and the matplotlib library to render them in a 4 x 4 grid.
Python
%matplotlib inline
import matplotlib.pyplot as plt
from PIL import Image
from io import BytesIO

f, axes = plt.subplots(4, 4)
for i in range(4):
    for j in range(4):
        image_data = requests.get(thumbnail_urls[i + 4 * j])
        image_data.raise_for_status()
        image = Image.open(BytesIO(image_data.content))
        axes[i][j].imshow(image)
        axes[i][j].axis("off")
plt.show()
Sample JSON response
Responses from the Bing Image Search API are returned as JSON. This sample response has been truncated to show a single result.
JSON
{
    "_type": "Images",
    "instrumentation": {
        "_type": "ResponseInstrumentation"
    },
    "readLink": "images\/search?q=tropical ocean",
    "webSearchUrl": "https:\/\/www.bing.com\/images\/search?q=tropical ocean&FORM=OIIARP",
    "totalEstimatedMatches": 842,
    "nextOffset": 47,
    "value": [
        {
            "webSearchUrl": "https:\/\/www.bing.com\/images\/search?view=detailv2&FORM=OIIRPO&q=tropical+ocean&id=8607ACDACB243BDEA7E1EF78127DA931E680E3A5&simid=608027248313960152",
            "name": "My Life in the Ocean | The greatest WordPress.com site in ...",
            "thumbnailUrl": "https:\/\/tse3.mm.bing.net\/th?id=OIP.fmwSKKmKpmZtJiBDps1kLAHaEo&pid=Api",
            "datePublished": "2017-11-03T08:51:00.0000000Z",
            "contentUrl": "https:\/\/mylifeintheocean.files.wordpress.com\/2012\/11\/tropical-ocean-wallpaper-1920x12003.jpg",
            "hostPageUrl": "https:\/\/mylifeintheocean.wordpress.com\/",
            "contentSize": "897388 B",
            "encodingFormat": "jpeg",
            "hostPageDisplayUrl": "https:\/\/mylifeintheocean.wordpress.com",
            "width": 1920,
            "height": 1200,
            "thumbnail": {
                "width": 474,
                "height": 296
            },
            "imageInsightsToken": "ccid_fmwSKKmK*mid_8607ACDACB243BDEA7E1EF78127DA931E680E3A5*simid_608027248313960152*thid_OIP.fmwSKKmKpmZtJiBDps1kLAHaEo",
            "insightsMetadata": {
                "recipeSourcesCount": 0,
                "bestRepresentativeQuery": {
                    "text": "Tropical Beaches Desktop Wallpaper",
                    "displayText": "Tropical Beaches Desktop Wallpaper",
                    "webSearchUrl": "https:\/\/www.bing.com\/images\/search?q=Tropical+Beaches+Desktop+Wallpaper&id=8607ACDACB243BDEA7E1EF78127DA931E680E3A5&FORM=IDBQDM"
                },
                "pagesIncludingCount": 115,
                "availableSizesCount": 44
            },
            "imageId": "8607ACDACB243BDEA7E1EF78127DA931E680E3A5",
            "accentColor": "0050B2"
        }
    ]
}
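Note the nextOffset field in the response: the Image Search API also accepts offset and count query parameters, so you can page through more results by passing the returned nextOffset on the next call. A minimal sketch reusing the search_url, headers and search_term from the quickstart above:

# Fetch a further page of results using the nextOffset from the previous response.
params = {"q": search_term, "license": "public", "imageType": "photo",
          "count": 35, "offset": search_results.get("nextOffset", 0)}
response = requests.get(search_url, headers=headers, params=params)
response.raise_for_status()
more_results = response.json()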

How to retrieve all historical public tweets with Twitter Premium Search API in Sandbox version (using next token)

I want to download all historical tweets with certain hashtags and/or keywords for a research project. I got the Premium Twitter API for that. I'm using the amazing TwitterAPI to take care of auth and so on.
My problem now is that I'm not an expert developer and I have some issues understanding how the next token works, and how to get all the tweets in a csv.
What I want to achieve is to have all the tweets in one single csv, without having to manually change the dates of the fromDate and toDate values. Right now I don't know how to get the next token and how to use it to concatenate requests.
So far I got here:
from TwitterAPI import TwitterAPI
import csv

SEARCH_TERM = 'my-search-term-here'
PRODUCT = 'fullarchive'
LABEL = 'here-goes-my-dev-env'

api = TwitterAPI("consumer_key",
                 "consumer_secret",
                 "access_token_key",
                 "access_token_secret")

r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL),
                {'query': SEARCH_TERM,
                 'fromDate': '200603220000',
                 'toDate': '201806020000'
                 })

csvFile = open('2006-2018.csv', 'a')
csvWriter = csv.writer(csvFile)

for item in r:
    csvWriter.writerow([item['created_at'], item['user']['screen_name'], item['text'] if 'text' in item else item])
I would be really thankful for any help!
Cheers!
First of all, TwitterAPI includes a helper class that will take care of this for you. TwitterPager works with many types of Twitter endpoints, not just Premium Search. Here is an example to get you started: https://github.com/geduldig/TwitterAPI/blob/master/examples/page_tweets.py
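A minimal sketch of that helper, assuming the same api, PRODUCT, LABEL, SEARCH_TERM and csvWriter as in your snippet (TwitterPager requests the next page for you):

from TwitterAPI import TwitterPager

pager = TwitterPager(api,
                     'tweets/search/%s/:%s' % (PRODUCT, LABEL),
                     {'query': SEARCH_TERM,
                      'fromDate': '200603220000',
                      'toDate': '201806020000'})
for item in pager.get_iterator():
    if 'text' in item:
        csvWriter.writerow([item['created_at'], item['user']['screen_name'], item['text']])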
But to answer your question, the strategy is to put the request you currently have inside a while loop. Then:
1. Each request returns a next field, which you can read with r.json()['next'].
2. When you are done processing the current batch of tweets and are ready for the next request, include the next parameter set to that value.
3. Eventually a request will not include a next field in the returned JSON; at that point, break out of the while loop.
Something like the following.
next = ''
while True:
    r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL),
                    {'query': SEARCH_TERM,
                     'fromDate': '200603220000',
                     'toDate': '201806020000',
                     'next': next})
    if r.status_code != 200:
        break
    for item in r:
        csvWriter.writerow([item['created_at'], item['user']['screen_name'], item['text'] if 'text' in item else item])
    json = r.json()
    if 'next' not in json:
        break
    next = json['next']

Tweepy Search API Writing to File Error

Noob Python user:
I've created a file that extracts 10 tweets based on api.search (not the streaming API). I get results on screen, but cannot figure out how to parse the output to save it to CSV. My error is TypeError: expected a character buffer object.
I have tried using .join(str(x)) and get other errors.
My code is
import tweepy
import time
from tweepy import OAuthHandler
from tweepy import Cursor

# Consumer keys and access tokens, used for Twitter OAuth
consumer_key = ''
consumer_secret = ''
atoken = ''
asecret = ''

# The OAuth process that uses keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(atoken, asecret)

# Creates instance to execute requests to Twitter API
api = tweepy.API(auth)

MarSec = tweepy.Cursor(api.search, q='maritime security').items(10)
for tweet in MarSec:
    print " "
    print tweet.created_at, tweet.text, tweet.lang
    saveFile = open('MarSec.csv', 'a')
    saveFile.write(tweet)
    saveFile.write('\n')
    saveFile.close()
Any help would be appreciated. I've gotten my Streaming API to work, but am having difficulty with this one.
Thanks.
tweet is not a string or a character buffer. It's an object. Replace your line with saveFile.write(tweet.text) and you'll be good to go.
saveFile = open('MarSec.csv', 'a')
for tweet in MarSec:
    print " "
    print tweet.created_at, tweet.text, tweet.lang
    saveFile.write("%s %s %s\n" % (tweet.created_at, tweet.lang, tweet.text))
saveFile.close()
I just thought I'd put up another version for those who might want to save all
the attributes of a tweepy.models.Status object, if you're not yet sure which attributes of each tweet you want to save to file.
import json

search_results = []
for status in tweepy.Cursor(api.search, q=search_text).items(5000):
    search_results.append(status._json)

with open('search_results.json', 'w') as f:
    json.dump(search_results, f)
The first block will store the search results into a list of dictionaries, and the second block will output all the tweets into a json file.
Please beware, this might use up a lot of memory if the size of your search results is very big.
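If memory is a concern, one variation is to write each status as it arrives (one JSON object per line) instead of collecting everything in a list first; a minimal sketch of that approach:

import json

with open('search_results.jsonl', 'w') as f:
    for status in tweepy.Cursor(api.search, q=search_text).items(5000):
        # Write one tweet per line (JSON Lines) so nothing accumulates in memory.
        f.write(json.dumps(status._json) + '\n')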
This is Twitter's classic error code for when something is wrong with an image you are sending.
Try to find the images you are trying to upload and check their format.
The only thing I did was delete the images that my Windows media player couldn't read, and that's all! The script ran perfectly.
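If you want to filter out unreadable images programmatically before uploading, here is a minimal sketch using Pillow (the folder name is just an example) that skips any file PIL cannot verify:

import os
from PIL import Image

def find_valid_images(folder):
    # Return paths of images Pillow can read; report the ones it cannot.
    valid = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        try:
            with Image.open(path) as img:
                img.verify()  # raises an exception if the file is truncated or not an image
            valid.append(path)
        except Exception as e:
            print('Skipping %s: %s' % (path, e))
    return valid

# good_images = find_valid_images('images_to_upload')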
