I want a script that collects random tweets from Chicago without any keyword, runs automatically every 30 minutes, and collects tweets for a short window (for example, 20 milliseconds).
All the available code I have found requires keywords, and in most of it I can't define a geographic location.
Thanks for your help.
See these pages: An Introduction to Text Mining using Twitter Streaming API and Python, and also this one: run a python script every hour.
This is very doable. With Twitter's REST API a keyword is required; however, Twitter also provides a Streaming API which can filter tweets by either a keyword or a location. In your case, you would define the bounding box of Chicago in longitudes and latitudes, then supply it to Twitter's statuses/filter endpoint, documented here: https://developer.twitter.com/en/docs/tweets/filter-realtime/api-reference/post-statuses-filter.html. This endpoint has a locations parameter that you would use. It returns tweets as they are posted, so no timer is required.
You can use tweepy for this. Or, with TwitterAPI you would simply do something like this:
from TwitterAPI import TwitterAPI

# Authenticate with your Twitter app credentials
api = TwitterAPI(CONSUMERKEY, CONSUMERSECRET, ACCESSTOKENKEY, ACCESSTOKENSECRET)

# Bounding box for Chicago: west longitude, south latitude, east longitude, north latitude
r = api.request('statuses/filter', {'locations': '-87.9,41.6,-87.5,42.0'})

for item in r:
    print(item)
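If you would rather use tweepy, a rough sketch along the same lines might look like the following. This is only a sketch: tweepy's streaming interface has changed across versions, and it assumes a tweepy 4.x-style Stream subclass that takes the same four OAuth credentials, so check the docs for your installed version.
import tweepy

class ChicagoStream(tweepy.Stream):
    def on_status(self, status):
        # Called once for each tweet the stream delivers
        print(status.text)

stream = ChicagoStream(CONSUMERKEY, CONSUMERSECRET, ACCESSTOKENKEY, ACCESSTOKENSECRET)

# Same Chicago bounding box: west longitude, south latitude, east longitude, north latitude
stream.filter(locations=[-87.9, 41.6, -87.5, 42.0])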
I'm working with the Twitter API v2 through Tweepy, and I'm trying to get all of Elon Musk's tweets from the past year. I'm calling client.get_users_tweets("44196397", end_time="2022-03-31T11:59:59Z", exclude=['retweets', 'replies'], start_time="2021-03-31T11:59:59Z"), but this only gives me 10 results, and the limit parameter only lets me use 100 as the max; obviously Elon Musk has tweeted a lot more than that in the past year.
How do I get more? In the Tweepy documentation I saw a pagination parameter, and pagination as its own class, but I don't quite understand it (I'm assuming pagination is going to be the solution).
I tried reading the documentation but didn't understand much. The code above is the only relevant code I have; please let me know if you need anything else.
Any help would be appreciated, thanks!
The tweepy.Paginator is very easy to use. Basically, you give it the method and the arguments that you want to use, and you then just have to iterate through it.
paginator = tweepy.Paginator(
    client.get_users_tweets,              # The method you want to use
    "44196397",                           # Some argument for this method
    end_time="2022-03-31T11:59:59Z",      # Some argument for this method
    exclude=['retweets', 'replies'],      # Some argument for this method
    start_time="2021-03-31T11:59:59Z",    # Some argument for this method
    max_results=100,                      # How many tweets per page
    limit=5                               # How many pages to retrieve
)

for page in paginator:
    print(page)
    print(page.data)      # The tweets are here
    print(page.meta)      # The count etc. are here
    print(page.includes)  # The includes are here
If you only want the tweets, you can even ask Tweepy to flatten the pages:
paginator = tweepy.Paginator(
    client.get_users_tweets,              # The method you want to use
    "44196397",                           # Some argument for this method
    end_time="2022-03-31T11:59:59Z",      # Some argument for this method
    exclude=['retweets', 'replies'],      # Some argument for this method
    start_time="2021-03-31T11:59:59Z",    # Some argument for this method
    max_results=100                       # How many tweets asked per request
)

for tweet in paginator.flatten(limit=250):  # Total number of tweets to retrieve
    print(tweet)
And if you want as many tweets as possible, you can remove the limit.
But in that case, don't forget to handle the rate limit (see the Twitter docs):
paginator = tweepy.Paginator(
    client.get_users_tweets,              # The method you want to use
    "44196397",                           # Some argument for this method
    end_time="2022-03-31T11:59:59Z",      # Some argument for this method
    exclude=['retweets', 'replies'],      # Some argument for this method
    start_time="2021-03-31T11:59:59Z",    # Some argument for this method
    max_results=100                       # How many tweets asked per request
)

try:
    for tweet in paginator.flatten():  # Defaults to no limit
        print(tweet)
except tweepy.TooManyRequests:  # Raised by the v2 Client when you hit the rate limit (tweepy >= 4.0)
    print('Rate limit!')
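Alternatively, if I remember correctly, tweepy's Client accepts a wait_on_rate_limit flag that makes it sleep until the rate-limit window resets instead of raising; a minimal sketch, assuming that flag:
import tweepy

# With wait_on_rate_limit=True the client pauses automatically when the rate limit is hit
client = tweepy.Client(bearer_token=BEARER_TOKEN, wait_on_rate_limit=True)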
A few more points:
For Twitter API v1.1 endpoints, you will have to use tweepy.Cursor instead (a short sketch follows after these points).
Every limit that you set is a requested maximum; Twitter can send you less data.
I have seen many developers say that they ran into consistency errors when they tried to access Elon Musk's data. I don't know why, or even whether this is still the case, but keep it in mind.
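For completeness, here is a minimal tweepy.Cursor sketch for a v1.1 endpoint; the screen name and counts are just illustrative, and it assumes OAuth 1.0a user credentials for the v1.1 API:
import tweepy

# v1.1 API object authenticated with OAuth 1.0a user credentials
auth = tweepy.OAuthHandler(CONSUMERKEY, CONSUMERSECRET)
auth.set_access_token(ACCESSTOKENKEY, ACCESSTOKENSECRET)
api = tweepy.API(auth)

# Cursor handles the paging; items(500) caps the total number of tweets returned
for status in tweepy.Cursor(api.user_timeline, screen_name="elonmusk", count=200).items(500):
    print(status.text)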
Is it possible to obtain the URL from the Google search result page, given a keyword? I have a csv file that contains a lot of company names, and I want their websites, the ones that show up at the top of the Google search results. When I upload the csv file, it should fetch each company name/keyword and put it in the search field.
For example: "stack overflow" is one of the entries in my csv file; it should be fetched, put in the search field, and the best match/first URL from the search results should be returned, e.g. www.stackoverflow.com.
The returned result should be stored in the same file I uploaded, next to the keyword it was searched for.
I don't know much about these concepts, so any help will be very appreciated.
Thanks!
The google package has a dependency on beautifulsoup, which needs to be installed first.
Then install:
pip install google
search(query, tld='com', lang='en', num=10, start=0, stop=None, pause=2.0)
query : query string that we want to search for.
tld : tld stands for top level domain, meaning whether we want to search on google.com, google.in, or some other domain.
lang : lang stands for language.
num : Number of results we want.
start : First result to retrieve.
stop : Last result to retrieve. Use None to keep searching forever.
pause : Lapse to wait between HTTP requests. Too short a lapse may cause Google to block your IP; a longer lapse makes your program slower, but it is the safer and better option.
Return : A generator (iterator) that yields the found URLs. If the stop parameter is None, the iterator will loop forever.
The code below is a solution for your question.
import pandas
from googlesearch import search

df = pandas.read_csv('test.csv')
result = []

for i in range(len(df['keys'])):
    # stop=1 keeps only the first (top) result for each keyword
    for j in search(df['keys'][i], tld="com", num=10, stop=1, pause=2):
        result.append(j)

# Write the keywords and their matching URLs back to the same file
dict1 = {'keys': df['keys'], 'url': result}
df = pandas.DataFrame(dict1)
df.to_csv('test.csv')
I recently learned that Companies House has an API that gives access to company filing history, and I want to get data from the API and load it into a pandas dataframe.
I have set up an API account, but I am having difficulties with the python wrapper companies-house 0.1.2 https://pypi.org/project/companies-house/
from companies_house.api import CompaniesHouseAPI
ch = CompaniesHouseAPI('my_api_key')
This works, but when I try to get the data with get_company or get_company_filing_history, I seem to be passing incorrect parameters. I tried CompaniesHouseAPI.get_company('02627406') but get KeyError: 'company_number'. I am quite puzzled, as there is no example provided in the documentation. Please help me figure out what I should pass as the parameter(s) to both functions.
# this call errors with KeyError: 'company_number'
CompaniesHouseAPI.get_company('02627406')
I am not a python expert but want to learn by doing interesting projects. Please help. If you know how to get filing history from the Companies House API using another python wrapper, your solution is also welcome.
I recently wrote a blog post describing how to make your own wrapper and then use that to create an application that loads the data into a pandas dataframe as you described. You can find it here.
By creating your own wrapper class, you avoid the limitations of whichever library you have chosen. You may also learn a lot about calling an API from python and working with the response.
Here is a code example that does not need a Companies House-specific library.
import requests
import json

url = "https://api.companieshouse.gov.uk/search/companies?q={}"
query = "tesco"
api_key = "vLmk-4YxYS-QH8nMi8767zJSlcPlo3MKn41-d"  # Fake key - insert your key here

# Companies House uses HTTP basic auth with the API key as the username and an empty password
response = requests.get(url.format(query), auth=(api_key, ''))

json_search_result = response.text
search_result = json.JSONDecoder().decode(json_search_result)

for company in search_result['items']:
    print(company['title'])
Running this should give you the top 20 matches for the keyword "tesco" from the Companies House search function. Check out the blog post to see how you could adapt this to perform any function from the API.
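Since you mentioned loading the data into a pandas dataframe, here is a rough sketch of how you might flatten the filing-history response into one. The /company/{company_number}/filing-history path is the documented filing-history endpoint, but treat the field layout ('items' holding the list of filings) as an assumption and check it against a real response:
import requests
import pandas as pd

api_key = "YOUR_API_KEY"  # Replace with your Companies House key
company_number = "02627406"

# Filing history for a single company
url = f"https://api.companieshouse.gov.uk/company/{company_number}/filing-history"
response = requests.get(url, auth=(api_key, ''))

# 'items' is assumed to hold the list of filings; json_normalize flattens nested fields into columns
filings = response.json().get('items', [])
df = pd.json_normalize(filings)
print(df.head())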
I have an entity extraction task which needs KBs like Wikidata, Freebase, and DBpedia. Given their huge size, it is hard to download them and extract entities from them. Is there a python client that can make API calls to get the extractions from them, with unstructured text as input?
For DBpedia at least, you can use DBpedia Spotlight, something like this:
import requests

spotlight_url = 'http://api.dbpedia-spotlight.org/en/annotate?'
params = dict(text="Barack Obama was a president", confidence='0.2', support='10')
headers = {'Accept': 'application/json'}

# A plain requests.get works here; wrapping it in a retrying session is a good idea for bulk queries
resp = requests.get(url=spotlight_url, params=params, headers=headers)
results = resp.json()
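If memory serves, the JSON that comes back contains a Resources list whose entries carry the matched surface form and the DBpedia entity URI; a small sketch of pulling those out (verify the field names against a live response):
# Each resource is assumed to expose '@surfaceForm' (the matched text) and '@URI' (the DBpedia entity)
for resource in results.get('Resources', []):
    print(resource['@surfaceForm'], '->', resource['@URI'])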
If you were going to do loads of queries, you'd want a local install of the knowledge base in a triplestore, and a local install of Spotlight too.
I want to know where Quantopian gets its data from.
If I want to do an analysis on a stock market other than NYSE, will I get the data? If not, can I manually upload the data so that I can run my algorithms on it.
1.) Quantopian gets its data from several places and provides most of it online, although some sources are premium and require a subscription.
2.) Yes, you can get standard stock market data, and if you have something like a Bloomberg feed, another subscription, or something else you've built that you want to pull in, you can use fetcher.
The basic code is:
fetch_csv(url, pre_func=None, post_func=None, date_column='date',
          date_format='%m/%d/%y', timezone='UTC', symbol=None, **kwargs)
Here is an example for something like Dropbox:
def initialize(context):
    # Fetch data from a CSV file somewhere on the web.
    # Note that one of the columns must be named 'symbol' for
    # the data to be matched to the stock symbol.
    fetch_csv('https://dl.dropboxusercontent.com/u/169032081/fetcher_sample_file.csv',
              date_column='Settlement Date',
              date_format='%m/%d/%y')
    context.stock = symbol('NFLX')

def handle_data(context, data):
    record(Short_Interest=data.current(context.stock, 'Days To Cover'))
You can also get data for non-NYSE stocks, such as Nasdaq securities. Screens are also available by fundamentals (market, exchange, market cap); these screens can limit the stocks analyzed from the broad universe.
You can get stock data from Yahoo or other quant sites.
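For example, here is a minimal sketch of pulling daily prices from Yahoo Finance with the third-party yfinance package (not part of Quantopian; just one common way to fetch data that you could then serve to fetcher as a CSV):
import yfinance as yf

# Download a year of daily OHLCV data for one ticker and save it as a CSV
data = yf.download("NFLX", start="2021-01-01", end="2021-12-31")
data.to_csv("nflx_daily.csv")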