How would one utilise Twitter's Streaming API to retrieve Tweets from a specified country?

My goal is to retrieve all Tweets, or as large a proportion as I can, that originate from certain small countries such as Ireland, New Zealand, Lithuania, etc.
Twitter's Search API allows searching for statuses within a radius of a given lat/long. Twitter decides which results to return based on the geotag data attached to a tweet, or by reverse-geocoding the location in the user's profile.
The public status stream in the Streaming API can be filtered by geobox, but Twitter does not perform reverse-geocoding when returning these results, and my research so far indicates that very few people in these countries geotag their tweets.
Obviously, if I had access to the firehose stream, the Streaming API would be the way to go, as I could perform the reverse-geocoding myself. At the default access level, however, the random sample stream surfaces too few relevant users.
What benefit, if any, could I get from using the Streaming API? Or should I simply stick with the Search API, since I am unlikely to get any unique data from the stream?

You can pass the locations parameter to restrict the stream to geotagged tweets from a given bounding box.
http://dev.twitter.com/pages/streaming_api_methods#locations
With Twitter4J, use FilterQuery#locations().
http://twitter4j.org/en/javadoc/twitter4j/FilterQuery.html#locations(double[][])
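To illustrate, here is a minimal Python sketch of how the stream's geobox filter and a client-side bounding-box check might be combined. The coordinates are rough approximations for illustration only; a real deployment would use precise country polygons rather than rectangles.

```python
# Rough bounding boxes as (sw_lng, sw_lat, ne_lng, ne_lat).
# These coordinates are approximate -- for illustration only.
BOXES = {
    "ireland": (-10.5, 51.4, -5.5, 55.4),
    "new_zealand": (165.9, -47.3, 179.0, -34.4),
}

def locations_param(*countries):
    """Build the comma-separated value for the streaming `locations` filter."""
    parts = []
    for name in countries:
        parts.extend(str(c) for c in BOXES[name])
    return ",".join(parts)

def in_box(lng, lat, box):
    """Client-side re-check: the stream matches whole boxes, so tweets you
    receive may still need filtering against your exact box or polygon."""
    sw_lng, sw_lat, ne_lng, ne_lat = box
    return sw_lng <= lng <= ne_lng and sw_lat <= lat <= ne_lat
```

The string from `locations_param` is what you would send as the `locations` POST parameter; with Twitter4J, the same pairs go to `FilterQuery#locations()` as a `double[][]` of {southwest, northeast} points.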

Related

Using web scraping to receive all Twitter followers

I want to create giveaways which require the participants to follow the Twitter account of the giveaway creator.
My first idea was to use the Twitter API (endpoint: "/2/users/:id/followers"). This works fine for me; however, I always run into rate limits. The API allows me to send 15 requests every 15 minutes and returns a maximum of 1,000 users per request. Since many accounts have more than 15,000 followers, and since many requests happen at the same time (many users want to participate in a giveaway), this solution is not suitable for me.
My second idea was to use web scraping instead (e.g. node-fetch). I was following along with a tutorial; however, doing so I always run into the issue that Twitter uses random strings to name their HTML elements. As you can see in the picture, there is no defined class to grab the elements by.
So my main question is: how can I access these elements?
Random Follower of my Twitter Account
I also have a follow-up question regarding the effectiveness of this method. Assume I have multiple people who want to participate within a short amount of time (e.g. 10 people in 5 minutes), and they all need to follow a big Twitter account (e.g. 100k followers).
Is it efficient to scrape all 100k followers each time, or should I instead fetch the 100k followers once, save them to my database, and check each user against my database later?
As a side note, I am using Node.js and node-fetch; however, I have no problem switching frameworks. In addition, I think both the element selection and the performance considerations are universal.
Thanks for your help :)
Twitter is going to detect your server's excessive calls. There is a Twitter Developer Portal where you can request Elevated access, which may raise the limits for you.
https://developer.twitter.com
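On the efficiency question: at 15 requests per window and 1,000 users per request, you can pull at most 15,000 followers per 15 minutes, so fetching the follower list once, caching it, and checking each participant against the cache is usually the better design. Here is a minimal Python sketch of that approach, assuming the v2 response shape (`{"data": [...], "meta": {"next_token": ...}}`); `fetch_page` is a placeholder for whatever HTTP call you actually make:

```python
PAGE_SIZE = 1000  # documented max_results for /2/users/:id/followers

def fetch_all_followers(fetch_page):
    """Walk the cursor once and build a lowercase username cache.
    `fetch_page` is any callable returning the page dict, so the paging
    logic can be tested (or swapped out) without hitting Twitter."""
    followers, token = set(), None
    while True:
        page = fetch_page(max_results=PAGE_SIZE, pagination_token=token)
        followers.update(u["username"].lower() for u in page.get("data", []))
        token = page.get("meta", {}).get("next_token")
        if not token:
            return followers

def is_follower(username, cached):
    """Check a giveaway participant against the cached follower set."""
    return username.lower() in cached
```

With the cache stored in your database, ten participants in five minutes cost zero extra API calls; you only need to refresh the cache periodically, or when a participant fails the check.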

Track multiple contexts for the same bot

We have a bot that will be used by different customers, and depending on their database and sector of activity, the bot will give different answers and receive different inputs from users. Intents, etc., will be the same; for now we don't plan to make a custom bot for each customer.
What would be the best way to separate data per customer within Chatbase?
I'm not sure if we should use:
A new API key for each customer (do we have a limitation then?)
Differentiate them by the platform filter (seems inappropriate)
Differentiate them by the version filter (likewise, this feels a bit odd to me)
Use a custom event, though I'm not sure how
For example, in Dialogflow we pass the customer name/ID as a context parameter.
Thank you for your question. You listed the two workarounds I would suggest; I will detail the pros and cons:
New API key for each customer: this could become unwieldy, since you have to switch bots every time you want to look at a different customer's metrics. You should also create a general API key (bot) to which you send all messages in order to get aggregate metrics, which would mean making two API calls per message.
Differentiate by the version filter: this would be the preferred method; however, it could lengthen load times for your reports as your number of users grows. The advantage is that all of your metrics are in one place and are aggregated while sending only one API call per message.
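As a sketch of the version-filter approach, here is how a message payload might reuse the `version` field to carry the customer ID. The field names follow Chatbase's generic message API as documented; double-check them against the current docs before relying on this:

```python
import json
import time

def chatbase_payload(api_key, customer_id, user_id, message, intent=None):
    """Build a Chatbase /api/message body, using `version` for the customer
    ID so reports can be filtered per customer in one bot."""
    payload = {
        "api_key": api_key,
        "type": "user",                        # message from the end user
        "user_id": user_id,
        "time_stamp": int(time.time() * 1000), # milliseconds since epoch
        "platform": "Dialogflow",
        "message": message,
        "version": customer_id,                # customer ID doubles as the version filter
    }
    if intent:
        payload["intent"] = intent
    return json.dumps(payload)
```

The body would then be POSTed to the Chatbase message endpoint; everything else about your per-message reporting stays unchanged.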

Instagram API receiving unrelated media for specified hashtag

I'm currently making requests to the Instagram API. Specifically, the endpoint https://api.instagram.com/v1/tags/{tag-name}/media/recent?access_token=ACCESS-TOKEN (specified here).
Using the example tag doggo, I mostly receive responses whose JSON tags arrays contain doggo; however, I also receive media with empty tags entries:
Would anybody happen to know why this sort of situation might occur? The untagged media that slips in varies wildly in relevance to the specified tag, so I'm curious whether this is a bug or some Instagram algorithm inferring a media item's relevance to the tag.
Thanks!
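Whatever the cause, one defensive workaround is to re-filter the results client-side. A minimal sketch, assuming each media item carries a `tags` list as shown in the documented payload:

```python
def filter_by_tag(items, tag):
    """Drop media whose `tags` list doesn't actually contain the requested
    hashtag -- a workaround for the unrelated items the endpoint returns."""
    tag = tag.lower()
    return [m for m in items
            if tag in (t.lower() for t in m.get("tags", []))]
```

This discards both the empty-tags items and any media tagged with unrelated hashtags, at the cost of also discarding items Instagram considered relevant for reasons not visible in the payload.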

How to grab instagram users based on a hashtag?

Is there a way to grab Instagram users based on a specific hashtag?
I run contests based on reposting photos with a specified hashtag and then randomly pick a winner. I need a tool that can grab the usernames of those who reposted the photo and used that hashtag.
You can query Instagram using the API. There are official clients for both Python and Ruby.
You didn't specify what language/platform you are using, so I'll give you the generic approach.
Query instagram using the Tag Recent Media endpoint.
In the response, you will receive a user object that has the user's username, id, profile url, and so on. This should be enough to do what you are describing.
As far as tools go, there probably aren't great options that do things exactly how you want. If you just want a simple contest, you could use Statigram, but it's not free.
If you roll your own solution, I highly recommend you also do the following:
Implement a rate-limiting mechanism, such as a task queue, so you don't exceed your API quota (5,000 calls per hour for most endpoints). This is also useful for failures, network hiccups, etc.
Have users authenticate so you can use OAuth to extend your quota to 5,000 calls per user per hour, getting around #1.
Try the subscription API if there won't be many items. You can subscribe to a specific tag, and you will get a change notification. At that point, though, you need to retrieve the actual media item(s), and this can cost a lot of API calls depending on how frequently and at what volume these changes occur.
If your users don't have many photos, or the set of users is relatively small and known in advance, you can instead query each user's recent media and filter by hashtag in your own code.
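The core of the approach above, collecting usernames from the Tag Recent Media response and drawing a winner, can be sketched in a few lines of Python. The response shape (`{"data": [{"user": {"username": ...}}]}`) is an assumption based on the endpoint docs; adjust it to the actual payload:

```python
import random

def usernames_from_tag_media(response):
    """Collect distinct usernames from a Tag Recent Media response,
    preserving first-seen order so results are reproducible."""
    seen, names = set(), []
    for item in response.get("data", []):
        name = item.get("user", {}).get("username")
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names

def pick_winner(names, rng=random):
    """Randomly draw one entrant; `rng` is injectable for testing."""
    return rng.choice(names)
```

If the tag spans more pages than one response holds, you would accumulate usernames across paginated requests before drawing the winner.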

What sort of technologies were used to build NoHomophobes.com (real-time keyword tracker for Twitter)?

Do you think that they plugged directly into the Twitter API, or do they have some sort of backend which is what connects to the Twitter API directly instead? I didn't realize this kind of functionality was available to standard users.
Link: NoHomophobes.com
This site has a (short) piece about the technology used - it does seem like they're using the standard, public API:
"Using Twitter's API, tweets [...] were pulled, tracked and displayed
in real time
[...]
We couldn't simply pull every tweet ... A lot of research and testing
was conducted to determine which words and phrases to capture, as well
as what parameters the tweets had to follow in order to be funneled
onto the site"
Also, the site's own T&Cs mention:
This website contains a licensed real time display of Tweets
At a guess, they're effectively continually searching for certain terms in public tweets (as any Twitter client can do) and displaying the results.
Basically, the site uses the Twitter Streaming API, which allows a persistent connection with Twitter. As filtered tweets come through, it processes the data and delivers it to website users over WebSockets via a third-party service called Pusher.
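A rough sketch of the server side, then, might look like the following. The matching here only approximates the streaming API's `track` semantics (every term of a tracked phrase must appear somewhere in the tweet text), and the phrases are placeholders, not the site's actual term list:

```python
# Placeholder phrases -- the real site curated its own term list.
TRACKED_PHRASES = ["example phrase", "another term"]

def matches_track(text, phrases):
    """Approximate the streaming `track` filter: a tweet matches a phrase
    when every word of the phrase appears in the tweet, in any order."""
    words = set(text.lower().split())
    return any(all(term in words for term in phrase.lower().split())
               for phrase in phrases)

def handle_tweet(tweet, broadcast, phrases=TRACKED_PHRASES):
    """Forward matching tweets to the browser via a broadcast callable
    (e.g. a Pusher trigger or any WebSocket send)."""
    if matches_track(tweet.get("text", ""), phrases):
        broadcast(tweet)
```

In production the `broadcast` callable would be Pusher's trigger call (or a raw WebSocket send), and the tweets would arrive from a persistent statuses/filter connection rather than a list.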
