So here's my deal.
I'm using Node with the Express framework. The website I'm working on grabs scraped data and stores it for each user. That data can then be displayed on the user's page whenever they want to access it, so the data will be scraped, put in a database or other storage (whichever I decide is best), and then pulled back out for the user.
I'm trying to figure out what the best database setup would be. There will potentially be large amounts of data per user, especially over long periods of time. I've read some stuff about using Redis to cache data like the user login info and other basics, and then using MongoDB for the big data. But I don't know, I'm new to database stuff, so I'm open to some new teachings and some ideas from the masters.
What would you guys suggest I do? I want it to be fast and able to handle multiple queries at the same time, but really, I have no idea what I'm talking about, so please help me.
What would you guys suggest I do?
This really depends on the nature of your data, how you model your domain and how you want to persist it. I would first try to figure out the basic model and, based on that, choose the most suitable database system. Don't jump to conclusions about caching with Redis when you don't even know whether you will need it in the first place.
The suggestion might also depend on how much time you want to spend on the database layer of your application. Some database systems provide more functionality than others, depending on their concepts. If you are a beginner, choose a single mainstream solution that is well documented and has an established community, like MongoDB or MySQL. It will cover all your needs from the beginning, so you won't end up managing a multitude of systems.
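To make "figure out the basic model first" a bit more concrete, here is a minimal sketch in plain Node. The record shape, the field names and the `latestForUser` helper are all assumptions made up for illustration, and an in-memory array stands in for whichever database you pick later.

```javascript
// Hypothetical shape of one scraped record; every field name is an assumption.
function makeScrapeRecord(userId, sourceUrl, payload, scrapedAt = new Date()) {
  return { userId, sourceUrl, payload, scrapedAt: scrapedAt.toISOString() };
}

// The one query the question describes: "show a user their own data".
// filter() returns a new array, so the sort does not reorder `records`.
function latestForUser(records, userId, limit = 10) {
  return records
    .filter((r) => r.userId === userId)
    .sort((a, b) => b.scrapedAt.localeCompare(a.scrapedAt)) // newest first
    .slice(0, limit);
}

const records = [
  makeScrapeRecord("alice", "https://example.com/a", { price: 10 }, new Date("2024-01-01T00:00:00Z")),
  makeScrapeRecord("bob", "https://example.com/b", { price: 7 }, new Date("2024-01-02T00:00:00Z")),
  makeScrapeRecord("alice", "https://example.com/a", { price: 12 }, new Date("2024-01-03T00:00:00Z")),
];

console.log(latestForUser(records, "alice", 1));
```

Once the records and the queries you actually need are written down like this, it is easier to judge whether a document store or a relational database fits, and whether a cache is needed at all.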
I have two separate cloud-based APIs that I am working on integrating. The two pieces of software don't talk to each other directly, so I am creating something in the middle to get them to communicate. I have had trouble finding examples or documentation on how exactly to do this. Does anyone know of any resources that could help me out?
My plan going in was to use a MERN Stack, running on a local server to do GET and POST requests to both APIs, use some mapping and logic to transpose the data into the correct format and send it to the other software. I do not have a client per se (other than myself) on my end, so I really will be skipping the React part of MERN, at least that is what I'm thinking. I'll be using Mongo to keep track of both sets of data for redundancy. I also considered using a LAMP Stack but felt that MERN would be faster in handling the data, and Mongo is more flexible in handling different data formats. If there is another process or technology that could help me that I'm not thinking of, I would be grateful to hear about it.
Has anyone encountered something like this before? Thank you.
As with most architecture questions, there's no completely right or wrong answer here. You could certainly design a well-built system for this purpose with either stack, even more so given that your front-end framework is not an important consideration. Instead, ask yourself questions like this:
Which stack do you have more experience with, and is this an appropriate time to learn a new set of technologies, or is it important to do the best work you're capable of right now (how important is time, cost, or quality in this case)?
Another generalization I'll stick my neck out for is a data-first approach: what sort of data are you dealing with from each cloud integration, and what kind of data do you need to support and/or create in order to make your system work? Mongo, being a NoSQL persistence layer, will allow you to change your data model and handle more varied data in a quicker and easier manner than a SQL solution will. This is a double-edged sword, however, as the lack of validation and of a strongly constrained (typed) data model will make your application harder to work with and debug as it grows. In short - how big might this application grow?
If you have a handy and familiar way to manage the three different data models you're dealing with (cloud service 1, cloud service 2, and your app) via MySQL, then that's a compelling reason to use it. However, if your style is to start dumping data into your database and you're comfortable with a more iterative approach (which may require more, albeit shorter rounds of refactoring), then Mongo with MERN may be the preferable choice.
Finally, will others ever be working on this application? If so, which language would you prefer to work in with them - PHP or JavaScript?
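As a rough illustration of the "mapping and logic" layer from the question, and of the point that a schemaless store leaves validation up to you, here is a small sketch. Both record shapes are hypothetical (service A sending `id`, `full_name`, `email_address`; service B expecting `externalId`, `firstName`, `lastName`, `email`), so treat the field names as placeholders for whatever the two real APIs use.

```javascript
// Maps a hypothetical service A record to a hypothetical service B record.
// Validation happens here because the database will not enforce it for you.
function mapAToB(a) {
  const problems = [];
  if (typeof a.id !== "string" && typeof a.id !== "number") {
    problems.push("id is missing");
  }
  if (typeof a.full_name !== "string" || a.full_name.trim() === "") {
    problems.push("full_name is missing");
  }
  if (typeof a.email_address !== "string" || !a.email_address.includes("@")) {
    problems.push("email_address is invalid");
  }
  if (problems.length > 0) return { ok: false, problems };

  // First word becomes the first name, the rest becomes the last name.
  const [firstName, ...rest] = a.full_name.trim().split(/\s+/);
  return {
    ok: true,
    record: {
      externalId: String(a.id),
      firstName,
      lastName: rest.join(" "),
      email: a.email_address.toLowerCase(),
    },
  };
}

console.log(mapAToB({ id: 42, full_name: "Ada Lovelace", email_address: "ADA@Example.com" }));
console.log(mapAToB({ id: 43 }));
```

Keeping the mapping in one pure function like this works the same with Mongo or MySQL underneath, which leaves the stack choice to the questions above.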
I want to know the following things so that I can fix my server architecture and make it more flexible.
Is it better to store home feed data [ex: Facebook homeFeed] in a variable for future manipulation, or to just fetch the data related to homeFeed and do all the manipulation at run time?
Please note that the home feed data set can contain anything. [ not developed yet ]
Is there any limit on the number of requests MongoDB can handle at any given time, which could create a delay in data processing?
Are node.js and MongoDB a good option for social network development?
If you know anything related to social network development then please share the pros and cons.
Is it better to store home feed data [ex: Facebook homeFeed] in a variable for future manipulation, or to just fetch the data related to homeFeed and do all the manipulation at run time?
You can (and sometimes should) pre-compute home feed data for certain users (for example those who are the most active). You don't store that in a variable though, you cache the results with something like Redis.
Generating the home feed on a "request" basis is also possible and good.
Both approaches require careful thinking about your system's architecture, performance, scalability, robustness, fault-tolerance, etc...
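Here is a minimal sketch of how the two approaches can be combined: build the feed on request, but keep the result cached for a while. A `Map` stands in for Redis so the example runs on its own, and `createFeedCache`, `buildFeed` and the TTL value are names and numbers assumed for illustration.

```javascript
// Cache-aside helper. A Map stands in for Redis; `now` is injectable so the
// expiry logic can be exercised without waiting.
function createFeedCache(buildFeed, ttlMs, now = () => Date.now()) {
  const cache = new Map(); // userId -> { feed, expiresAt }

  return {
    getFeed(userId) {
      const hit = cache.get(userId);
      if (hit && hit.expiresAt > now()) return hit.feed; // fresh cached copy
      const feed = buildFeed(userId); // the expensive "on request" path
      cache.set(userId, { feed, expiresAt: now() + ttlMs });
      return feed;
    },
    // Call when someone the user follows posts, so the next read rebuilds.
    invalidate(userId) {
      cache.delete(userId);
    },
  };
}

// Usage with a hypothetical feed builder that counts how often it runs.
let builds = 0;
const feedCache = createFeedCache((userId) => {
  builds += 1;
  return [`feed for ${userId}, build ${builds}`];
}, 60_000);

feedCache.getFeed("alice"); // builds the feed
feedCache.getFeed("alice"); // served from the cache
console.log(builds);
```

Pre-computing for the most active users then just means calling `getFeed` for them ahead of time instead of waiting for their request.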
Is there any limit on the number of requests MongoDB can handle at any given time, which could create a delay in data processing?
Yes. A MongoDB instance (or any other database) has limited resources. Look at the Sharding and Replication docs of MongoDB for more info about how to work with MongoDB at scale.
Are node.js and MongoDB a good option for social network development?
Node.js and MongoDB are a good combination for quick prototyping; you can get productive fairly quickly. Any language you are familiar with is a good choice here, since your focus seems to be on architecture. Go, Java and PHP are good candidates too.
In the real world, social networks are built with a lot more tools than that, since the teams use various programming languages, databases and frameworks depending on the task at hand.
I think it is important that I elaborate on where I am coming from so that you can understand my use case, please bear with me.
Background: I’m looking to migrate my app from CouchDB 1 to 2 and this migration is going to take a decent amount of work. I just want to double check that I’m not reinventing the wheel and make sure that there isn’t a better design to what I will elaborate on below, especially since CouchDB 2 appears to have some awesome new features.
Consider the following simplified use case for an app that allows students to submit quiz answers digitally. Each student should be able to submit her/his quiz answers and the teacher should be able to view all the answers. This design needs to work with PouchDB as PouchDB speaks directly to the DB and this saves us a lot of time as otherwise an elaborate set of APIs would need to be written.
My chosen design consists of one database per student and one database per teacher, i.e. a database per user. Only the owner of the database can edit her/his database and this is enforced via CouchDB roles. When a student submits an answer, it is synced with her/his database via PouchDB. The answers are then replicated to the teacher’s database. This in turn allows the students to quickly load their answers in the app and the teachers to load all the answers for all their students. Of course, there are views in the teacher databases that segment the answers by class, quiz, etc. so that the teacher doesn’t have to load the answers for all their students at once. If we didn’t have the teacher database then a teacher would need access to all the students’ databases and would have to sync with all of their students’ databases.
At first glance, the _replicator database appears to be the obvious way to replicate the data from the student databases to a single teacher database. The big gotcha is that when you use continuous replication, it consumes a file handle and a database connection, which means that you can very quickly starve a database of its resources. For example, if we have say 10,000 students in our database then we need 10,000 concurrent file handles and database connections just for the replications. This is pretty crazy considering that it is unlikely that even say 100 of these 10,000 students would be using the app simultaneously.
Instead, I developed a service that listens to the _db_updates feed and then only replicates a database when there is a change to that specific database. With this method, we only worry about consuming resources when there are changes and as a result we end up with plenty of free file handles and database connections.
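To illustrate the idea (this is only a sketch, not the actual service), the core logic can be reduced to: collect the names of databases that changed, then emit one-shot replication jobs for just those. The `student_` naming scheme and the `teacherDbFor` lookup are assumptions for illustration. The events follow the `{ db_name, type, seq }` shape of the _db_updates feed, and each job is a `{ source, target }` body of the kind accepted by `POST /_replicate`. A real deployment would likely need full database URLs and credentials there.

```javascript
// Collects changed databases from _db_updates events and turns them into
// non-continuous replication jobs, so resources are used only on change.
function createReplicationPlanner(teacherDbFor) {
  const dirty = new Set(); // student databases changed since the last flush

  return {
    // `event` follows the _db_updates shape: { db_name, type, seq }.
    onDbUpdate(event) {
      if (event.type === "updated" && event.db_name.startsWith("student_")) {
        dirty.add(event.db_name); // the Set collapses a burst of edits into one job
      }
    },
    // Returns one { source, target } body per changed database, then resets.
    flush() {
      const jobs = [...dirty].map((source) => ({ source, target: teacherDbFor(source) }));
      dirty.clear();
      return jobs;
    },
  };
}

// Usage with a hypothetical lookup that sends every student to one teacher.
const planner = createReplicationPlanner(() => "teacher_smith");
planner.onDbUpdate({ db_name: "student_ann", type: "updated", seq: "1-a" });
planner.onDbUpdate({ db_name: "student_ann", type: "updated", seq: "2-b" });
planner.onDbUpdate({ db_name: "student_raj", type: "updated", seq: "3-c" });
console.log(planner.flush());
```

With this shape, the number of open replications is bounded by how many databases changed recently rather than by the total number of students.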
I’ve briefly experimented with CouchDB 2 and it appears that the _replicator database is just as greedy with resources as it was in CouchDB 1.
Is this database-per-user design for both students and teachers the best solution or is there a better solution? If it is the best solution, is there a better way of replicating this data that doesn’t consume as many resources?
I've open sourced my solution, called Spiegel, which provides the missing piece: scalable CouchDB replication and change listening. Spiegel is currently being used in production with a db-per-user design and is efficiently handling the replication of over 10,000 databases for Quizster.
What is the best/easiest way to store data offline? I have a website that I only run locally (it's just for personal use), so I am not using any PHP or SQL. I have a lot of posts, each containing a date, a time, and a description that consists of a lot of text, and a few of them contain an audio file (there are very few audio files, so they may be stored separately from the rest). Now I want to make a website which can show these posts on request, but since I am not using either a server or a database I'm not sure how to store them. Use of any kind of framework or library is allowed, as long as I can use it without an internet connection.
Thanks.
EDIT: JSON is a good way to read data without a server-side language, but I don't know whether it's possible to write to a file without a server-side language, or how. To summarize: I want a database (for both storing and accessing) without the need for a server.
Easy way without setting up a web or database server is to use JSON files imo. The syntax is very easy to learn!
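For example, the posts could look something like this. It's only a sketch: the field names, the sample entries and the helper functions are made up for illustration, and the JSON text is inlined here so the snippet runs on its own (normally it would sit in a file such as a hypothetical `posts.json`).

```javascript
// Hypothetical content of a posts file; audio is stored as a separate file path.
const postsJson = `[
  { "date": "2015-03-01", "time": "18:30", "description": "First entry", "audio": null },
  { "date": "2015-03-02", "time": "09:15", "description": "Second entry", "audio": "audio/second.mp3" }
]`;

// Parses the JSON text and checks that it is a list of posts.
function loadPosts(jsonText) {
  const posts = JSON.parse(jsonText);
  if (!Array.isArray(posts)) throw new Error("expected an array of posts");
  return posts;
}

// Returns the posts for one date, e.g. to show them on request.
function postsOn(posts, date) {
  return posts.filter((p) => p.date === date);
}

// Returns the updated JSON text. How that text gets written back to disk
// is left open here, as it is in the question.
function addPost(posts, post) {
  return JSON.stringify([...posts, post], null, 2);
}

const posts = loadPosts(postsJson);
console.log(postsOn(posts, "2015-03-02"));
```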
Edit: If there is a better way to do this without DB setup / server-side languages, I'd like to hear it.
I am writing a nodejs / express / mongodb based web application and all is working great.
What I've made for learning purposes is a twitter clone.
People can tweet whatever they want and it will show up on their profile and in the stream of anyone who follows them.
The home page shows the posts of everyone you are following.
I want this stream to update automatically when someone they are following posts something new.
So they can just sit there on the home page and see all new posts come in in real time.
I've worked with Socket.IO in the past and loved it for its awesome simplicity of use.
But is Socket.IO an appropriate fit for this sort of situation?
Are there better options I can use? Perhaps a simpler approach like AJAX polling would be more efficient for scalability?
So basically what is the best to use for an application like this?
I need:
Realtime updates to the client
Scalability and efficiency
Thanks!
You have multiple options.
My first option would be RacerJS,
and then Socket.IO.
Get the data from MongoDB and send it through RacerJS or Socket.IO.
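Roughly, the push side looks like this. It is a sketch only: Node's built-in `EventEmitter` stands in for the real-time transport, and the in-memory `followers` map stands in for the follow data that would really come from MongoDB. `createStreamHub` and the `feed:` channel names are made up for illustration.

```javascript
const { EventEmitter } = require("node:events");

// followers: Map of author -> Set of follower ids (hypothetical in-memory
// follow graph; the real application would read this from MongoDB).
function createStreamHub(followers) {
  const hub = new EventEmitter();

  return {
    // Called once per connected client; returns an unsubscribe function.
    subscribe(userId, onPost) {
      const channel = `feed:${userId}`;
      hub.on(channel, onPost);
      return () => hub.off(channel, onPost);
    },
    // Called after a post is saved; pushes it to each follower's channel.
    publish(author, text) {
      const post = { author, text };
      for (const follower of followers.get(author) ?? []) {
        hub.emit(`feed:${follower}`, post);
      }
      return post;
    },
  };
}

// Usage: bob follows alice, so bob's home page receives her new post.
const streamHub = createStreamHub(new Map([["alice", new Set(["bob"])]]));
streamHub.subscribe("bob", (post) => console.log("bob sees:", post));
streamHub.publish("alice", "hello world");
```

Only followers who are currently connected receive anything, so the work per post scales with the author's online followers rather than with all users.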
RacerJS is a synchronization model built on top of ShareJS, which relies on an underlying technique called Operational Transformation. That technique is used for collaborative work on the same data in real time (like Google Docs), and ShareJS does a lot of work to let clients edit the same data at the same time. I guess it does not really apply to your case, because an activity stream like yours does not involve that kind of concurrent editing.
A good option would be Meteor