Twitter. Getting old tweets in home_timeline - search

Can anybody help me, please? How can I access the tweets in the home_timeline by date? Not by ID, but by date?

Twitter's ID algorithm is based on Snowflake, in which the high 41 bits (after an unused sign bit) encode a millisecond timestamp. You should be able to reverse the Snowflake layout and construct a few IDs to use as search boundaries (e.g. since_id / max_id).
https://github.com/twitter/snowflake
Do note that Twitter simply doesn't offer you old home_timeline responses. You can't go back a year, and probably not even more than a few days.
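
As a rough sketch of that idea in Python: Snowflake puts a millisecond timestamp (relative to Twitter's epoch, 1288834974657) in the bits above the 22 worker/sequence bits, so you can construct a synthetic ID for a given date and pass it as a since_id/max_id boundary. The helper name below is made up for illustration.

    from datetime import datetime, timezone

    TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch used by Twitter (2010-11-04)

    def datetime_to_snowflake(dt):
        # Put the millisecond offset in the high bits and zero the low
        # 22 bits (10 worker + 12 sequence), so the result sorts before
        # any real tweet created at the same instant.
        ms = int(dt.timestamp() * 1000) - TWITTER_EPOCH_MS
        return ms << 22

    # Every tweet created after this instant has a larger ID, so the
    # value can be used as since_id when requesting the timeline.
    print(datetime_to_snowflake(datetime(2015, 6, 1, tzinfo=timezone.utc)))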

Related

How to better implement a more complex sorting strategy

I have an application with posts. Those posts are shown in the home view in descending order by creation date.
I want to implement a more complex sorting strategy based on, for example, posts from users who have more posts, or posts that have more likes or views. Nothing complex, simple things, with some randomness in the picks. Let's say I take the 100 most-liked posts and pick 10 of them at random.
I don't want to do this in the same query, since I don't want to affect its performance. I am using MongoDB, and this would need $lookup, which isn't advisable in the most critical query of the app.
What would be the best approach to implement this?
I thought of doing all those calculations with, for example, AWS Lambda, or triggers in MongoDB Atlas, every 30 seconds, and storing the resulting information in the database, where a query could consume it (see the sketch below).
That way, every 30 seconds the first 30 posts, say, would be updated according to the criteria.
I don't really know whether this is a good approach or not. I need something simple, but it should be able to "mix" all the posts and show the ones that comply with the criteria first.
Thanks!
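
A minimal PyMongo sketch of the precompute approach described above: a job that periodically samples 10 of the 100 most-liked posts into a separate collection that the home query can read cheaply. The connection string, database name, and the posts/featured_posts/likes names are all assumptions.

    import time

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed connection string
    db = client["app"]  # assumed database name

    def refresh_featured_posts():
        # Top 100 posts by likes, then 10 of them picked at random.
        pipeline = [
            {"$sort": {"likes": -1}},
            {"$limit": 100},
            {"$sample": {"size": 10}},
        ]
        picked = list(db.posts.aggregate(pipeline))
        if picked:
            # Swap in the fresh selection for the home view to consume.
            db.featured_posts.delete_many({})
            db.featured_posts.insert_many(picked)

    while True:  # in production: a Lambda schedule, cron job, or Atlas trigger
        refresh_featured_posts()
        time.sleep(30)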

How to deal with the response time of a huge MongoDB database

I have a MongoDB database in which one of the collections has 2,300,000 documents and growing. Until the collection had 1,000,000 documents, the API response time was quick and the webpage loaded quickly; as soon as it crossed the 2,000,000 mark it started giving issues and took about 100 seconds to find and return the data. I don't know what to do about this sudden surge in data. Are there any practices I should follow to manage it and reduce the response time of the APIs?
The data I'm trying to fetch is filtered by date, and the query has to run through the entire collection to find the data for just one day.
I have searched a lot but have not been able to find a solution.
An index is probably the solution for you.
Can you provide an example of both a typical document and the query you run?
Are you retrieving (or do you really need) the whole documents, or just some fields of them?
Typically I would suggest creating an index on your date field, in descending order; it will certainly improve your search if it mostly concerns the more recent documents. I can help you achieve this if you need.
This doc will help you understand indexes and how to optimize queries.
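
For instance, a descending index on the date field and a one-day range query could look like this in PyMongo; the events collection and the date field name are assumptions.

    from datetime import datetime

    from pymongo import DESCENDING, MongoClient

    coll = MongoClient("mongodb://localhost:27017")["app"]["events"]  # assumed names

    # Descending index: the most recent documents are found first.
    coll.create_index([("date", DESCENDING)])

    # One-day range query: with the index this is a bounded index scan
    # (IXSCAN) instead of a full collection scan (COLLSCAN).
    cursor = coll.find({"date": {"$gte": datetime(2023, 5, 1),
                                 "$lt": datetime(2023, 5, 2)}})
    print(cursor.explain()["queryPlanner"]["winningPlan"])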

Phonetic Algorithm to search Usernames

I use DynamoDB to store user profiles. The primary key here is an ID. It is necessary that the key is an ID.
A user profile contains information such as the username, a set of friends, and so on.
So now here is the first problem: user A wants to search for user B by name. I don't want to do a full DynamoDB scan each time this happens.
Since I already have a Redis server, I thought I could just store username-ID pairs there.
So now the real problem: what do I search for?
For example, my username could be Eric1996. A friend of mine doesn't remember the last digits, so he just searches for Eric19.
Or maybe he just forgets the capital letter at the beginning and searches for eric1996. In another case he might misspell the name, like erik1996, erick1996, or erich1996.
I searched the topic a bit and learned that there is something called phonetic algorithms, which match words by how they sound. That would fix the examples above.
But would such algorithms work for other usernames as well? You know, some users come up with really 3x0tic names or just use random letters. I know a guy who calls himself something like dadddddx__7 online.
I assume this is much harder than a spelling corrector, since a user might have a name that is misspelled on purpose.
DynamoDB or Redis alone is the wrong tool for these requirements.
I would recommend using DynamoDB or Redis as your datastore, and Solr or Elasticsearch (or AWS's managed offerings such as Amazon CloudSearch) for search.
You can store your user profiles in DynamoDB and store the searchable fields in your search store (you can even store the full profiles there).
Then search features such as tolerance for spelling errors, or ranking friends based on some score, are easy to implement.
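
As a sketch of what the search store buys you: a fuzzy match in Elasticsearch tolerates the misspellings from the question (erik1996, erick1996, erich1996) out of the box. This assumes the 8.x Python client and made-up index/field names (users, username).

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed endpoint

    # Fuzzy matching tolerates small edit distances, so "erik1996"
    # still finds the profile stored under "Eric1996".
    resp = es.search(
        index="users",
        query={
            "match": {
                "username": {"query": "erik1996", "fuzziness": "AUTO"}
            }
        },
    )
    for hit in resp["hits"]["hits"]:
        print(hit["_source"]["username"], hit["_id"])

The truncated Eric19 case is a prefix search rather than a fuzzy one; a match_phrase_prefix query or an edge-n-gram analyzer on the username field would cover it.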

Is the twissandra data model an efficient one?

Help me, please.
I am new to the Cassandra world, so I need some advice.
I am trying to design a data model for a Cassandra DB.
In my project I have
- users, who can follow each other,
- articles, which can be related to many topics.
Each user can follow many topics.
So the goal is to build an aggregated feed where a user will get:
articles from all topics he follows +
articles from all friends he follows +
his own articles.
I have searched for similar tasks and found the twissandra example project.
As I understand it, in that example we store only the IDs of tweets in the timeline; when we need the timeline, we fetch the tweet IDs and then fetch each tweet by ID in a separate non-blocking request. After collecting all the tweets, we return the list to the user.
So my question is: is that efficient?
Making ~41 requests to the DB to get one page of tweets?
And my second question is about followers.
When someone creates a tweet, we get all of his followers and put the tweet ID into each of their timelines,
but what if a user has thousands of followers?
Does that mean that to create just one tweet we have to write (1 + followers_count) times to the DB?
twissandra is more of a toy example. It will work for some workloads, but you may have heavier ones that require partitioning the data further (breaking up huge rows).
Essentially, though, yes, it is fairly efficient. It can be made more so by including the content in the timeline, but depending on requirements that may be a bad idea (e.g. if you need deleting/editing). The writes should be a non-issue: 20k writes/sec/node is reasonable provided you have adequate systems.
If I understand your use case correctly, you will probably be fine with a twissandra-like schema, but be sure to test it with the expected workloads. Keep in mind that at a certain scale everything gets a little more complicated (i.e. if you expect millions of articles you will need further partitioning; see https://academy.datastax.com/demos/getting-started-time-series-data-modeling).
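
To make the read path concrete, here is a minimal sketch with the DataStax Python driver of the pattern described in the question: one partition read for a page of timeline IDs, then a non-blocking request per tweet. The keyspace, table, and column names are assumptions rather than twissandra's exact schema.

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("twissandra")  # assumed keyspace

    # 1) One partition read for a page of the user's timeline (tweet IDs).
    rows = session.execute(
        "SELECT tweet_id FROM timeline WHERE username = %s LIMIT 40",
        ("eric",),
    )

    # 2) Fan out one non-blocking request per tweet ID (~41 requests total).
    lookup = session.prepare("SELECT * FROM tweets WHERE tweet_id = ?")
    futures = [session.execute_async(lookup, (row.tweet_id,)) for row in rows]

    # 3) The requests run concurrently, so the total latency is roughly one
    # extra round trip rather than 40 sequential ones.
    tweets = [future.result().one() for future in futures]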

Azure Search: what if I have a lot of facets?

In a commercial application it is not uncommon to have hundreds of facets. Of course, not all products are flagged with all of them.
But when searching, I need to add a facet query-string parameter that lists all the facets I want to get back. As I don't know the list of relevant ones in advance, I have to pass all of them in the query.
This is not practical with more than a few facets.
Is there a way to solve this issue, or is it a limitation of the product?
The Azure Search doc:
https://msdn.microsoft.com/fr-fr/library/azure/dn798927.aspx
You are correct that this is a current limitation of Azure Search: you need to pass all the facets in the query string. Please know that we are aware of this; in fact, it can be an even bigger issue for customers who have so many parameters or facets in their query string that it exceeds the maximum size of the URL. For this reason, we are investigating what can be done to accommodate this.
I apologize that I do not yet have a date for when this will be available, other than to say it is on our short-term roadmap.
Liam
It looks like Azure Search now supports both GET and POST methods, and recommends using POST when the length of the URL would exceed the maximum of 2048 characters (1024 for just the query string).
https://learn.microsoft.com/en-us/rest/api/searchservice/search-documents
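
For illustration, the same search issued as a POST, with the facet list moved out of the URL and into the JSON body; the service name, index name, API key, and api-version value are placeholders.

    import requests

    url = ("https://YOUR-SERVICE.search.windows.net"
           "/indexes/products/docs/search?api-version=2020-06-30")  # placeholders
    headers = {
        "Content-Type": "application/json",
        "api-key": "YOUR-QUERY-KEY",  # placeholder
    }
    body = {
        "search": "*",
        # Hundreds of facets fit here without hitting any URL length limit.
        "facets": ["brand,count:20", "category", "color"],
    }

    resp = requests.post(url, headers=headers, json=body)
    resp.raise_for_status()
    print(resp.json().get("@search.facets"))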
