How to Design Searching Users and Friends using ElasticSearch?

In our app, we have users and users can have friends (think Facebook, relationship is bi-directional). We would like to be able to:
Have a site-wide search for users by name or username
Allow each user to search her friends by name or username
What would be the best approach to design this keeping in mind that:
A user can have up to 50k friends.
Users can change their names and usernames all the time

I am going to suggest another technology which I think will help you with this problem. You can check Neo4j (a graph database), which will help you model the user-friend relationships and traverse the graph easily.
You can also use Lucene as a separate index engine with Neo4j for full-text search. Check here.
The examples below could also be helpful.
Lucene Integration with Neo4j
Lucene Full Text Indexing with Neo4j
PS : I have no relationship with Neo4j.

Have documents like:
type:friendship
parties_name:[mark zuckerberg, bill gates]
parties_id:[1, 753634] (IDs are needed because many people may be named bill gates)
So there will be one such document for each friendship in your network, and when this particular mark zuckerberg updates his friendships (or his name), every document with parties_id:1 must be reindexed.
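The one-document-per-friendship idea can be sketched in plain Python. This is only an in-memory illustration, not Elasticsearch code: the helper names (friendship_doc, search_friends, rename_user) and the sample users are assumptions made for the example. It shows why a rename touches every friendship document the user appears in.

```python
def friendship_doc(user_a, user_b):
    """Build one document per friendship (an undirected edge)."""
    return {
        "type": "friendship",
        "parties_id": [user_a["id"], user_b["id"]],
        "parties_name": [user_a["name"], user_b["name"]],
    }


def search_friends(docs, user_id, text):
    """Return (id, name) of user_id's friends whose name contains text."""
    needle = text.casefold()
    hits = []
    for doc in docs:
        if user_id not in doc["parties_id"]:
            continue  # this friendship does not involve the searching user
        for pid, name in zip(doc["parties_id"], doc["parties_name"]):
            if pid != user_id and needle in name.casefold():
                hits.append((pid, name))
    return hits


def rename_user(docs, user_id, new_name):
    """Rewrite the name in every friendship document; return how many changed."""
    touched = 0
    for doc in docs:
        for i, pid in enumerate(doc["parties_id"]):
            if pid == user_id:
                doc["parties_name"][i] = new_name
                touched += 1
    return touched


# Hypothetical sample data: two different users share the same name.
mark = {"id": 1, "name": "mark"}
bill = {"id": 753634, "name": "bill gates"}
other_bill = {"id": 42, "name": "bill gates"}
docs = [friendship_doc(mark, bill), friendship_doc(mark, other_bill)]

print(search_friends(docs, 1, "BILL"))
```

With up to 50k friends per user, rename_user would rewrite up to 50k documents for a single name change, which is the main cost of this layout.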

Related

MongoDB, how to manage user related records

I'm currently trying to learn Node.js and MongoDB by building the server side of a web application that manages insurance documents for an insurance agent.
So let's say I'm the user: I sign in, then I start to add my customers and their insurances.
So I have two related collections, Customers and Insurances.
I have one more collection to store the users login data, let's call it Users.
I don't want the new users to see and modify the customers and the insurances of other users.
How can I "divide" every user related record, so that each user can work only with his data?
I figured I could add to every record the _id of the user who created it.
For example, I log in as myself and get my ID "001"; I could then add a field with this value to every customer and insurance.
That way I could filter every query by this ID.
Would that be a good idea? In my opinion this filtering is a waste of processing power for MongoDB.
If someone has any idea of a solution, or even a link to an article about it, it would be helpful.
Thank you.
This is more a general permissions problem than just a MongoDB question. Also, without knowing more about your schemas it's hard to give specific advice.
However, here are some approaches:
1) Embed sub-documents
Since MongoDB is a document store allowing you to store arbitrary JSON-like objects, you could simply store the customers and licenses wholly inside each user object. That way querying for a user would return their customers and licenses as well.
2) Denormalise
Common practice for NoSQL databases is to denormalise related data (i.e. duplicate the data). This might include embedding a sub-document that is a partial representation of your customers/licenses/whatever inside your user document. This has a similar benefit to the above solution in that it eliminates additional queries for sub-documents. It also has the same drawback of requiring more care to preserve data integrity.
3) Reference with foreign key
This is a more traditionally relational approach, and is basically what you're suggesting in your question. Depending on whether you want the reference to be bi-directional (both documents reference each other) or uni-directional (one document references the other), you can either store the user's ID as a foreign user_id field, or store an array of customer_ids and insurance_ids in the user document. In relational parlance this is sometimes described as "has many" or "belongs to" (the user has many customers, the customer belongs to a user).
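A minimal sketch of the uni-directional variant of approach 3, assuming every customer and insurance record carries a user_id field. The helpers scoped_filter and find_owned are hypothetical names, and find_owned is only an in-memory stand-in for a collection query. With a real driver such as pymongo, the same dictionary would be passed to Collection.find, and an index on user_id (Collection.create_index) should keep the filter cheap.

```python
def scoped_filter(user_id, extra=None):
    """Build a query filter that is always restricted to one owner."""
    query = dict(extra or {})
    # Set user_id last so caller-supplied criteria can never override it.
    query["user_id"] = user_id
    return query


def find_owned(records, query):
    """In-memory stand-in for a collection query: exact match on every key."""
    return [r for r in records if all(r.get(k) == v for k, v in query.items())]


# Hypothetical sample data: two agents each have a customer named Ann.
customers = [
    {"_id": "c1", "user_id": "001", "name": "Ann"},
    {"_id": "c2", "user_id": "002", "name": "Ann"},
    {"_id": "c3", "user_id": "001", "name": "Bob"},
]

print(find_owned(customers, scoped_filter("001", {"name": "Ann"})))
```

Routing every query through one helper like this means the ownership check cannot be forgotten in an individual request handler.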

Phonetic Algorithm to search Usernames

I got DynamoDB to store user profiles. The primary key here is an id. It is necessary that the key is an id.
A user profile contains information like his username, a set of friends,...
So now here is the first problem: user A wants to search for user B by his name. I don't want to do a full DynamoDB scan each time this happens.
Since I already have a Redis server, I thought I could just store username-id pairs there.
So now the real problem: what do I search for?
For example, my username could be Eric1996. A friend of mine doesn't remember the last digits, so he just searches for Eric19.
Or maybe he forgets the capital letter at the beginning and searches for eric1996. In another case he might misspell the name as erik1996, erick1996, or erich1996.
I searched that topic a bit and learned that there are so-called phonetic algorithms, which match words by how they sound. That would fix the example above.
But would such algorithms work for other usernames as well? You know some users come up with really 3x0tic names or just use random letters. I know a guy who calls himself something like dadddddx__7 online.
I assume this is much harder than a spelling corrector, since a user might have a name that is misspelled on purpose.
Neither DynamoDB nor Redis is the right tool for this requirement.
I would recommend keeping DynamoDB or Redis as your datastore and using Solr or Elasticsearch for search (or a managed AWS equivalent such as Amazon CloudSearch or the Amazon Elasticsearch Service).
You can store your user profiles in DynamoDB and store the searchable fields in your search store (you can even store full profiles in the search store).
Then search features such as tolerating spelling errors or ranking friends by some score are easy to implement.
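To make the kind of matching a search engine offers more concrete, here is a rough sketch in plain Python covering the three cases from the question: wrong capitalisation, a remembered prefix, and a small misspelling. It uses the standard library's difflib similarity ratio as a stand-in for a search engine's fuzzy matching; it is not a phonetic algorithm, and the function name search_usernames and the cutoff value are assumptions chosen for the example.

```python
import difflib


def normalize(username):
    """Case-insensitive form used for all comparisons."""
    return username.casefold()


def search_usernames(query, usernames, limit=5, cutoff=0.7):
    """Return prefix matches first, then close matches by similarity ratio."""
    q = normalize(query)
    by_norm = {normalize(u): u for u in usernames}

    prefix_hits = sorted(n for n in by_norm if n.startswith(q))
    fuzzy_hits = difflib.get_close_matches(q, list(by_norm), n=limit, cutoff=cutoff)

    ordered = []
    for n in prefix_hits + fuzzy_hits:
        if n not in ordered:  # a name can be both a prefix and a fuzzy hit
            ordered.append(n)
    return [by_norm[n] for n in ordered[:limit]]


# Hypothetical sample data taken from the usernames in the question.
usernames = ["Eric1996", "dadddddx__7", "alice"]

for attempt in ["Eric19", "eric1996", "erik1996", "erick1996"]:
    print(attempt, "->", search_usernames(attempt, usernames))
```

Because the comparison is purely character-based, it treats deliberately odd names like dadddddx__7 the same as ordinary ones, which is one reason edit-distance style matching tends to suit usernames better than sound-based matching.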

Storing information in a forum like web site

Suppose that we have a web site where each person has a profile and other people write comments on that person's profile (like the wall on Facebook). What is the best way to store the comments made for a person? I was thinking of a relational database with a field that holds all the comments for a person as one long string separated by some kind of delimiter, but I am not sure if this is the best way. Any ideas?
You'll have two separate tables, one for Users and one for Comments, with every entry having its own unique ID. The schema would look like:
Users (ID, name, mail, etc)
Comments (ID, for, from, time, content, etc)
Where for and from fields are User IDs.
You can use PostgreSQL, MySQL, SQLite, or even LevelDB if you want a simple key-value store. There are a lot of tutorials out there to get started with any of them.
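The two-table layout above can be sketched with Python's built-in sqlite3 module. The column names for_user_id, from_user_id and created_at are assumptions: they stand in for the answer's for, from and time fields, since FOR and FROM are reserved words in SQL. The wall function is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    mail TEXT
);
CREATE TABLE comments (
    id           INTEGER PRIMARY KEY,
    for_user_id  INTEGER NOT NULL REFERENCES users(id),  -- profile owner
    from_user_id INTEGER NOT NULL REFERENCES users(id),  -- comment author
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    content      TEXT NOT NULL
);
-- Lets the wall query find one profile's comments without a full scan.
CREATE INDEX comments_for_user ON comments (for_user_id);
""")

conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.executemany(
    "INSERT INTO comments (for_user_id, from_user_id, content) VALUES (?, ?, ?)",
    [(1, 2, "Nice profile!"), (1, 3, "Hello"), (2, 1, "Thanks")],
)


def wall(conn, user_id):
    """Return (author name, content) for every comment on a user's profile."""
    return conn.execute(
        """
        SELECT u.name, c.content
        FROM comments AS c
        JOIN users AS u ON u.id = c.from_user_id
        WHERE c.for_user_id = ?
        ORDER BY c.id
        """,
        (user_id,),
    ).fetchall()


print(wall(conn, 1))
```

One row per comment avoids the delimiter-separated string from the question, so a single comment can be added, edited, or deleted without rewriting the rest.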
The problem with Relational databases is that they do not scale well to super massive social networking sites. When your table starts to get huge the queries will start to take more and more time. If your site is going to be pretty small then a relational database is fine. I think that you may want to investigate "NoSql" databases.
Start here:
http://nosql-database.org/

Is MongoDb suited for my application?

I'm building an application on node.js that has users and products in a many-to-many relationship (one user has many products and the same product might belong to multiple users). Each user also has location info.
Mostly I need to do a lot of writes on the user's first visit (and a few writes on the following visits), and then I need to match users who, for instance, have the most products in common and return those common products. I may also want to match users by location (or sort them by matching location).
I'm using Postgres right now but I think I would be better off with Mongo in the long run. The problem is that I have never worked with a NoSQL DB (no fears ;) )
The question is, is the following "schema" suited for what is described above?
[user]{
    _id
    name
    age
    [location]{
        street
        town
        country
    }
}
[products]{
    _id
    name
    color
    [users]{
        user_id_1
        user_id_2
        user_id_3
    }
}
I think, because of the requirements, I'm better off this way than with embedding. Am I right? Do you think I should store the product IDs in the user document?
Thanks!!
Your data seems quite relational to me. I would not see a great advantage for MongoDB or NoSQL solutions. They work well for document-based solutions that aren't relational.
If you're having problems with scaling or performance, I would gather some measurements first. Don't assume a solution until you know what the root cause is. It could be node.js - who knows? Some people don't care much for it.
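To illustrate why the many-to-many relationship in the question fits a relational model, here is a sketch using Python's built-in sqlite3 module. The join table user_products and the helpers best_matches and shared_products are assumptions made for the example; the same SQL should carry over to Postgres with minor changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_products (
    user_id    INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, product_id)
);
""")
# Hypothetical sample data: user 1 shares two products with user 2, one with user 3.
conn.executemany(
    "INSERT INTO user_products VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (3, 12), (4, 99)],
)


def best_matches(conn, user_id):
    """Return (other_user_id, shared_count), most products in common first."""
    return conn.execute(
        """
        SELECT other.user_id, COUNT(*) AS shared
        FROM user_products AS mine
        JOIN user_products AS other
          ON other.product_id = mine.product_id
         AND other.user_id <> mine.user_id
        WHERE mine.user_id = ?
        GROUP BY other.user_id
        ORDER BY shared DESC, other.user_id
        """,
        (user_id,),
    ).fetchall()


def shared_products(conn, user_a, user_b):
    """Return the product ids that both users own."""
    rows = conn.execute(
        """
        SELECT product_id FROM user_products WHERE user_id = ?
        INTERSECT
        SELECT product_id FROM user_products WHERE user_id = ?
        ORDER BY product_id
        """,
        (user_a, user_b),
    ).fetchall()
    return [row[0] for row in rows]


print(best_matches(conn, 1))
print(shared_products(conn, 1, 2))
```

The self-join expresses "users with the most products in common" in one query, which is the kind of matching the question asks for.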

Full-text personalized search product

What full-text search technology is out there to support full-text personalized search?
For instance, contact search in your webmail provider of choice: it's full text but only searches your personal contacts and not the entire universe of contacts.
There are countless full-text search packages out there but I don't know how you could use most full-text search packages such that every user only sees a small subset of the universe of documents.
In the case of email, it's simple: use any popular search toolkit and build an index per user. It's simple because the indexes shouldn't overlap, or you'd be violating users' privacy. Also, overlap might skew figures like IDF. (You might be tempted to index emails sent to multiple users only once, but the security and privacy implications of that aren't worth it. Disk is cheap.)
If a common collection of documents should be indexed for personalized search, you're on your own, I'm afraid.
I would recommend building a Lucene index of all contacts with special fields like contact_list_id and usage_frequency. At search time, add each user's specific parameters to the query, i.e. text:"John smith" AND contact_list_id:"$current_user_id", ordered by usage_frequency. This way you have one optimized index with all the data compressed in one place, and it is still personalized by a field like usage_frequency or a more robust rank. Think of the index as a DB with highly effective text search.
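The single-index approach can be sketched in plain Python as a filter plus a sort. This is an in-memory illustration rather than Lucene code: search_contacts is a hypothetical helper, the field names follow the answer (spelled contact_list_id and usage_frequency here), and matching all query terms only approximates the phrase query from the answer.

```python
def search_contacts(index, current_user_id, text):
    """Search one shared index, restricted to the current user's contacts."""
    terms = text.casefold().split()
    hits = [
        doc
        for doc in index
        # Per-user restriction, the equivalent of contact_list_id:"$current_user_id".
        if doc["contact_list_id"] == current_user_id
        # Every query term must appear as a word in the contact text.
        and all(term in doc["text"].casefold().split() for term in terms)
    ]
    # Personalised ranking: most frequently used contacts first.
    return sorted(hits, key=lambda doc: doc["usage_frequency"], reverse=True)


# Hypothetical sample data: user u2 also has a contact named John Smith.
index = [
    {"text": "John Smith", "contact_list_id": "u1", "usage_frequency": 3},
    {"text": "John Smith Jr", "contact_list_id": "u1", "usage_frequency": 9},
    {"text": "John Smith", "contact_list_id": "u2", "usage_frequency": 50},
    {"text": "Jane Doe", "contact_list_id": "u1", "usage_frequency": 7},
]

for doc in search_contacts(index, "u1", "John smith"):
    print(doc["text"], doc["usage_frequency"])
```

The contact belonging to u2 never appears in u1's results even though it has the highest usage count, because the owner filter is applied before ranking.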
