How can I extend the OPC UA address space into my SQL database with bidirectional mirroring, so that a value change on either end (the variable or the database) is immediately reflected on the other?
So if I have a table in a database whose values can be changed from outside (for example, rows can be added, deleted or updated), how would my node-opcua server be notified?
In OPC UA, every server follows a service-oriented architecture (SOA): the server processes a request only when a client calls one of its services.
In your case, you can achieve this by subscribing to data changes and monitoring the node that exposes the database table to clients. Subscribing to data changes is only possible once that node is exposed to the client.
Once a node is subscribed for data changes, the server needs two values from the client:
Sampling interval: how frequently the server should refresh the data from the source.
Publishing interval: how frequently the client asks the server for notifications.
Let's say, for example, that the sampling interval is 100 milliseconds and the publishing interval is 1 minute. The server then has to collect samples from the source (in your case, the database) every 100 milliseconds, but the client requests all of those collected samples only once a minute.
This way, the server stays updated with the changed values of the database table.
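The interplay of the two intervals can be modelled in plain Node.js. The following is a minimal sketch, not the node-opcua API. `readSource` is a hypothetical stand-in for the database read, and `monitorItem` is a made-up helper. The model samples on one timer, queues only changed values, and hands the queue over on a second timer:

```javascript
// Plain-Node model of a monitored item; NOT the node-opcua API.
// readSource() is a hypothetical synchronous stand-in for a database read.
function monitorItem(readSource, { samplingInterval, publishingInterval }, onPublish) {
  let lastValue;   // last sampled value, used to detect changes
  let queue = [];  // changed values collected since the last publish

  // Sampling: read the source every samplingInterval ms and queue only changes.
  const sampler = setInterval(() => {
    const value = readSource();
    if (value !== lastValue) {
      lastValue = value;
      queue.push({ value, sampledAt: Date.now() });
    }
  }, samplingInterval);

  // Publishing: deliver everything queued since the previous publish.
  const publisher = setInterval(() => {
    if (queue.length > 0) {
      onPublish(queue);
      queue = [];
    }
  }, publishingInterval);

  // Returns a function that stops both timers.
  return () => {
    clearInterval(sampler);
    clearInterval(publisher);
  };
}
```

With the numbers from the example (100 ms sampling, 60 000 ms publishing), a value that changes on every sample could produce up to 60 000 / 100 = 600 queued samples per publish. A real OPC UA server bounds this with the monitored item's queue size, which this sketch does not model.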
If the SDK supports multithreading, there is another way to achieve what the question asks.
In the server application, run the data source (i.e. database) object in its own thread.
Create a callback into the server application layer and initialize the data source object with this callback.
When data changes in the database, have the database trigger a call into the data source thread. If the change concerns the required data and the server needs to be informed, call the callback function that was initialized earlier.
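Node.js runs application code on a single event loop, so a separate thread is not strictly needed. The data source object only needs the callback. The following is a minimal sketch under that assumption. `DataSource`, `handleDatabaseChange`, the table names and the change shape are all hypothetical. How the change is detected is also left open: a trigger, PostgreSQL `LISTEN`/`NOTIFY`, or a poller would all work.

```javascript
// Hypothetical data-source wrapper: the server registers a callback once, and
// the data source invokes it only for changes the server cares about.
class DataSource {
  constructor(onRelevantChange, isRelevant = () => true) {
    this.onRelevantChange = onRelevantChange; // callback into the server layer
    this.isRelevant = isRelevant;             // filter for changes worth reporting
  }

  // Called by whatever detects a database change (trigger, NOTIFY, poller...).
  handleDatabaseChange(change) {
    if (this.isRelevant(change)) {
      this.onRelevantChange(change);
    }
  }
}

// Server side: keep the values exposed to clients in sync when the callback fires.
const exposedValues = new Map();
const source = new DataSource(
  (change) => exposedValues.set(change.key, change.value),
  (change) => change.table === "measurements" // hypothetical table of interest
);

source.handleDatabaseChange({ table: "measurements", key: "temp", value: 21.5 });
source.handleDatabaseChange({ table: "audit_log", key: "x", value: 1 }); // ignored
```

In node-opcua, the callback would update the variable backing the exposed node. That wiring is not shown here.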
I hope this answered your question.
Related
I am redesigning a project I built a year ago, when I was just starting to learn how to code. I used the MEAN stack back then and want to convert it to a PERN stack now. My AWS knowledge has also grown a bit, and I'd like to build on these new skills.
The application receives real-time data from an API, which I clean up, write to a database, and broadcast to connected clients.
To better conceptualize this question I will refer to the following items:
api-m1: receives the incoming data and passes it to my schema; I then send it to my socket-server.
socket-server: handles the WSS connection to the application's front-end clients. It also writes the data it gets from the scraper and api-m1 to a Postgres database. I would like to turn this into clusters eventually, since I am using Node.js, and I will incorporate Redis. Then I will run it behind an ALB using sticky sessions etc. for multiple EC2 instances.
RDS: Postgres table that socket-server writes incoming scraper and api-m1 data to. RDS is used to fetch the most recent stored data along with user profile config data. NOTE: the main RDS data table will have at most 120-150 UID records with 6-7 columns.
To help better visualize this see img below.
From a database perspective, what would be the quickest way to write my data to RDS, assuming that during peak times we get 20-40 records/s from api-m1 plus another 20-40 records/s from the scraper? At the end of each day I tear down the database using a Lambda function and start again (the data is only temporary and does not need to be kept for any prolonged period).
1. Should I INSERT each record using a SERIAL id, and then fetch the most recent rows by UID from the frontend?
2.a Should I UPDATE each UID, so I'd have a fixed N rows of data that I just search and update? (I can see this bottlenecking with my Postgres client.)
2.b Should I still use UPDATE but do BATCHED updates? What issues will I run into if I use multiple cluster processes? For example, will I run into concurrency problems where record XYZ gets a more recent value overwritten by an older one because I'm using batched UPDATEs with Node clusters?
My concern is that UPDATEs are slower than INSERTs, and I want to make this as fast as possible. This section of the application isn't CPU heavy, and the real-time data isn't that intensive.
To make my comments an answer:
You don't seem to need SQL semantics for anything here, so I'd just toss RDS and use e.g. Redis (or DynamoDB, I guess) for that data store.
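If you do stay on Postgres, option 2.b can be protected against the out-of-order problem. One way is a single upsert per batch that overwrites a row only when the incoming timestamp is newer. The following is a sketch, assuming a hypothetical table `latest(uid text PRIMARY KEY, value double precision, ts bigint)`, where `ts` is an epoch-millisecond timestamp taken when the reading arrives. `buildBatchUpsert` is a made-up helper. The returned `{ text, values }` pair could be passed to node-postgres as `pool.query(text, values)`.

```javascript
// Builds one parameterized Postgres upsert for a batch of readings.
// Assumed (hypothetical) table:
//   CREATE TABLE latest (uid text PRIMARY KEY, value double precision, ts bigint);
function buildBatchUpsert(readings) {
  // Keep only the newest reading per uid: Postgres rejects an
  // ON CONFLICT DO UPDATE that would touch the same row twice in one statement.
  const newest = new Map();
  for (const r of readings) {
    const prev = newest.get(r.uid);
    if (!prev || r.ts > prev.ts) newest.set(r.uid, r);
  }
  if (newest.size === 0) return null; // nothing to write

  const values = [];
  const tuples = [];
  for (const r of newest.values()) {
    const i = values.length;
    values.push(r.uid, r.value, r.ts);
    tuples.push(`($${i + 1}, $${i + 2}, $${i + 3})`);
  }

  // The WHERE clause turns a stale write into a no-op, so a slow cluster worker
  // cannot overwrite a row that already holds a newer reading.
  const text =
    "INSERT INTO latest (uid, value, ts) VALUES " + tuples.join(", ") +
    " ON CONFLICT (uid) DO UPDATE SET value = EXCLUDED.value, ts = EXCLUDED.ts" +
    " WHERE latest.ts < EXCLUDED.ts";
  return { text, values };
}
```

This guard is only as good as the timestamps. If different workers stamp readings with their own clocks, clock skew can still reorder values that arrive close together.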
I have an API that other microservices call to check whether a particular product exists in the inventory. The API takes only one parameter: the ID of the product.
The API is served through API Gateway backed by Lambda, and it simply queries a Postgres RDS instance for the product ID. If it finds the product, it returns the product information in the response. If it doesn't, it returns an empty response. The SQL is basically this:
SELECT * FROM inventory where expired = false and product_id = request.productId;
However, the problem is that many services are calling this particular API very heavily to check the existence of products. Not only that, the calls often come in bursts. I assume those services loop through a list of product IDs and check for their existence individually, hence the burst.
The number of concurrent calls to the API results in many queries to the database. The rate can burst beyond 30 queries per second, and there can be a few hundred thousand requests to fulfil. The queries are mostly the same except for the product ID in the WHERE clause. The column is indexed, and a query takes only 5-8 ms on average to complete. Still, the connection to the database occasionally times out when the rate gets too high.
I'm using Sequelize as my ORM, and the error I get when it times out is SequelizeConnectionAcquireTimeoutError. There is a good chance that the burst rate was too high and maxed out the pool too.
Some options I have considered:
Using a cache layer. But I have noticed that most of the time, 90% of the product IDs in the requests are not repeated. This would mean that 90% of the time it would be a cache miss and would still query the database.
Auto-scaling the database. But because the calls are bursty and I don't know when they may come, the autoscaling won't complete in time to avoid the timeouts. Moreover, the query is a very simple SELECT statement, and the CPU of the RDS instance hardly crosses 80% during the bursts, so I doubt scaling it would help much either.
What other techniques can I use to keep the database from being hit hard when the API receives bursts of calls that are mostly unique and difficult to cache?
Load a cache at boot time
You can load all the necessary columns into an in-memory data store (e.g. Redis). Every update to the database (e.g. by a cron job) must also update the cached data.
Problems: memory overhead, and the cost of keeping the cache updated
Limit DB calls
Create a buffer for IDs. Store n IDs and then make one query for all of them, or flush the buffer every m seconds, whichever comes first.
Problems: added client response time, and extra processing to map the query result back to each request
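The following is a minimal sketch of the cache idea. It uses a process-local `Map` instead of Redis so that it is self-contained. Instead of a cron job, it refreshes lazily: the whole table is loaded on first use (e.g. at a Lambda cold start) and reloaded when older than `maxAgeMs`. `loadAllProducts` is a hypothetical function that would run something like `SELECT ... FROM inventory WHERE expired = false`. The sketch assumes the table fits in memory. Because the full table is cached, an ID that is not in it is answered "not found" from memory, so the 90% unique IDs stop turning into database queries.

```javascript
// Process-local product cache (a stand-in for Redis in this sketch).
// loadAllProducts() is hypothetical and resolves to an array of rows with product_id.
function createProductCache(loadAllProducts, maxAgeMs) {
  let byId = new Map();
  let loadedAt = -Infinity;
  let loading = null; // shared promise so concurrent callers trigger a single reload

  async function ensureFresh() {
    if (Date.now() - loadedAt < maxAgeMs) return;
    if (!loading) {
      loading = (async () => {
        try {
          const rows = await loadAllProducts();
          // Build the new map fully, then swap it in.
          byId = new Map(rows.map((row) => [row.product_id, row]));
          loadedAt = Date.now();
        } finally {
          loading = null;
        }
      })();
    }
    await loading;
  }

  return {
    // Resolves to the product row, or null when the product is not in inventory.
    async get(productId) {
      await ensureFresh();
      return byId.get(productId) ?? null;
    },
  };
}
```

Each warm Lambda execution environment keeps its own copy of the cache. A shared store such as Redis avoids loading the table once per instance.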
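The buffer idea can be sketched as a small batcher. Callers ask for one ID and get a promise. IDs collected within `maxWaitMs`, or until `maxBatch` is reached, are resolved with one query. `queryMany` is hypothetical and would run something like `SELECT * FROM inventory WHERE expired = false AND product_id = ANY($1)`. The batch limits are arbitrary assumptions. Batching only helps in a long-running process that serves many requests concurrently. A Lambda execution environment handles one request at a time, so inside Lambda there would be nothing to batch.

```javascript
// Collects IDs from concurrent callers and resolves them with a single query.
// queryMany(ids) is hypothetical and resolves to the matching rows (with product_id).
function createBatcher(queryMany, { maxBatch = 100, maxWaitMs = 10 } = {}) {
  let pending = []; // { id, resolve, reject } for each waiting caller
  let timer = null;

  async function flush() {
    clearTimeout(timer);
    timer = null;
    const batch = pending;
    pending = [];
    if (batch.length === 0) return;
    try {
      const ids = [...new Set(batch.map((p) => p.id))]; // query each ID once
      const rows = await queryMany(ids);
      const byId = new Map(rows.map((row) => [row.product_id, row]));
      for (const p of batch) p.resolve(byId.get(p.id) ?? null); // null = not found
    } catch (err) {
      for (const p of batch) p.reject(err);
    }
  }

  // load(id) resolves to the product row or null.
  return function load(id) {
    return new Promise((resolve, reject) => {
      pending.push({ id, resolve, reject });
      if (pending.length >= maxBatch) flush();        // buffer full: query now
      else if (!timer) timer = setTimeout(flush, maxWaitMs); // else wait briefly
    });
  };
}
```

`maxWaitMs` is the extra response time the answer lists as a drawback. It trades a few milliseconds per request for far fewer database round trips.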
Change your database
Use a NoSQL database for this data. Based on this article and this one, I think choosing a NoSQL database is a better idea.
Problems: multiple data stores
Start with a covering index to handle your query. You might create an index like this for your table:
CREATE INDEX inv_lkup ON inventory (product_id, expired) INCLUDE (col, col, col);
List every column your SELECT returns in the index, either in the main list of indexed columns or in the INCLUDE clause (with SELECT *, that means every column of the table; INCLUDE requires PostgreSQL 11 or later). Then the DBMS can often satisfy your query entirely from the index, which is faster.
You could start using AWS Lambda throttling to handle this problem. But for that to work, the consumers of your API will need to retry when they get 429 responses. That might be very inconvenient.
Sorry to say, you may need to stop using Lambda. Ordinary web servers have good built-in mechanisms for managing bursty workloads.
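On the consumer side, the retry could look like the following sketch: retry on HTTP 429 with exponential backoff and random ("full") jitter, so that throttled callers do not all come back at the same instant. `callApi` is a hypothetical function resolving to `{ status, body }`. The retry count and base delay are arbitrary assumptions.

```javascript
// Retries a throttled call on HTTP 429 with exponential backoff plus full jitter.
// callApi() is hypothetical and resolves to an object with a numeric `status`.
async function callWithRetry(callApi, { retries = 5, baseDelayMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await callApi();
    // Return on success/other errors, or once the retry budget is used up.
    if (res.status !== 429 || attempt >= retries) return res;
    // Wait a random time between 0 and baseDelayMs * 2^attempt before retrying.
    const delay = Math.random() * baseDelayMs * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

Every consumer of the API would need logic like this, which is the inconvenience the answer points out.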
They have an incoming connection (TCP/IP listen) queue. Each new request lands in that queue, where it waits until the server software accepts the connection. When the server is busy, requests wait in that queue, and under high load they wait a bit longer. In Node.js's case, if you use clustering, there is just one incoming connection queue, and all the processes in the cluster share it.
The server software you run (to handle your API) has a pool of connections to your DBMS. That pool has a maximum number of connections in it. As your server software handles each request, it awaits a connection from the pool. If no connection is immediately available, request handling pauses until one becomes available. This, too, smooths out the requests to the DBMS. (Be aware that each process in a Node.js cluster has its own pool.)
Paradoxically, a smaller DBMS connection pool can improve overall performance, by avoiding too many concurrent SELECTs (or other queries) on the DBMS.
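The queuing a connection pool provides can be illustrated with a small counting semaphore. At most `limit` tasks run at once, and the rest wait in FIFO order. This is a sketch of the principle, not the internals of any pool library. With node-postgres the equivalent knob is the pool's `max` option, and with Sequelize it is `pool.max`, with `pool.acquire` controlling how long a request may wait before SequelizeConnectionAcquireTimeoutError is raised. `createLimiter` is a made-up name.

```javascript
// Counting semaphore: at most `limit` tasks run concurrently; others queue FIFO.
function createLimiter(limit) {
  let active = 0;
  const waiting = []; // resolve functions of callers waiting for a slot

  async function acquire() {
    if (active < limit) {
      active++;
      return;
    }
    // Wait until a finishing task hands its slot directly to us.
    await new Promise((resolve) => waiting.push(resolve));
  }

  function release() {
    const next = waiting.shift();
    if (next) next(); // pass the slot on; `active` stays the same
    else active--;
  }

  // run(task) waits for a slot, runs the async task, and always frees the slot.
  return async function run(task) {
    await acquire();
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

Wrapping each database query in `run(...)` with a small `limit` keeps the DBMS from being flooded during a burst. The excess requests wait in memory instead of opening new connections.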
This kind of server configuration can be scaled out behind a load balancer. It can also be scaled up with a server that has more cores and runs more Node.js cluster processes. Combined with auto scaling, an elastic load balancer setup can also add new server VMs when necessary.
I'm building a website where users enter input. After a specific amount of time, an algorithm has to run that takes the users' input stored in the database, creates some results for them, and stores the results in the database as well. The problem is that in Node.js I can't figure out where and how to implement this algorithm so that it runs after a specific amount of time, and only once (every few minutes or seconds).
The app is built with Node.js and Express.
For example, let's say I start the application. After 3 minutes the algorithm should run and take some data from the database, and once it has created some output, it should store that output back in the database.
What are the typical solutions for this (one is enough)? Thank you!
Let's say you have a user request that saves a URL to crawl and gets the listed products.
One of the simplest ways would be:
On each user request, create a record in a "tasks" table in the DB:
userId | urlToCrawl | dateAdded | isProcessing | ....
Then, in the main Node app, call setInterval(findAndProcessNewTasks, 60000), so that every minute (or whatever interval you need) it picks up all tasks that are not currently being worked on (where isProcessing is false).
findAndProcessNewTasks queries the DB, sets isProcessing to true, and runs your algorithm for every record that has not been processed yet.
Once the algorithm finishes, it removes the record from tasks (or sets another field, such as "finished", to true).
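The following is a minimal sketch of this loop. An in-memory array stands in for the "tasks" table so that the example is self-contained. `addTask`, `runAlgorithm` and the `result` field are hypothetical. With a real database, the claim step must be atomic so that two runs cannot grab the same task. If the database is PostgreSQL, this could be an `UPDATE ... WHERE id IN (SELECT ... FOR UPDATE SKIP LOCKED) RETURNING *`.

```javascript
// In-memory stand-in for the hypothetical "tasks" table.
const tasks = [];

function addTask(userId, urlToCrawl) {
  tasks.push({ userId, urlToCrawl, dateAdded: Date.now(), isProcessing: false, finished: false });
}

// Claims unprocessed tasks, runs the algorithm on each, and marks them finished.
// Returns how many tasks were picked up in this run.
async function findAndProcessNewTasks(runAlgorithm) {
  const claimed = tasks.filter((t) => !t.isProcessing && !t.finished);
  for (const t of claimed) t.isProcessing = true; // claim before doing slow work
  for (const t of claimed) {
    try {
      t.result = await runAlgorithm(t);
      t.finished = true;
    } catch (err) {
      console.error("task failed, it will be retried on the next run", err);
    }
    t.isProcessing = false;
  }
  return claimed.length;
}

// In the app: setInterval(() => findAndProcessNewTasks(runAlgorithm).catch(console.error), 60000);
```

The `isProcessing` flag also means that a run that is still busy when the next interval fires does not get its tasks picked up a second time.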
Depending on the load and the number of tasks, it may make sense to run your algorithm in a separate Node app.
Typically you would have a message bus (Kafka, RabbitMQ, etc.), with the main app just sending events and worker Node.js apps doing the actual job and inserting products into the DB.
This keeps the main app lightweight and allows the worker apps to scale.
From your question it's not clear whether you want to run the algorithm on the web server (perhaps processing input from multiple users) or on the client (processing the input from a particular user).
If the former, use setTimeout() (or something similar) in the main JavaScript file that creates the web server listener. Your server can then handle user input (via the app listener) while also running algorithms that look at the database.
If the latter, use setTimeout() (or something similar) in the JavaScript code that is loaded into the user's browser.
You may actually need some combination of the above: code running on the server to periodically do some processing on a central database, and code running in each user's browser to periodically refresh the user's display with new data pulled down from the server.
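For the server-side case, the following is a minimal sketch of the scheduling. `job` runs once after an initial delay (e.g. 3 minutes) and then again a fixed time after each run finishes. Chaining `setTimeout` instead of using `setInterval` means a slow run can never overlap the next one. `scheduleJob` and `job` are hypothetical names. `job` would read the users' input, run the algorithm and store the results.

```javascript
// Runs `job` after initialDelayMs, then intervalMs after each run completes.
// Returns a function that cancels any further runs.
function scheduleJob(job, initialDelayMs, intervalMs) {
  let timer = null;
  let stopped = false;

  async function tick() {
    try {
      await job(); // e.g. read inputs from the DB, run the algorithm, store results
    } catch (err) {
      console.error("job failed", err);
    }
    if (!stopped) timer = setTimeout(tick, intervalMs); // schedule only after finishing
  }

  timer = setTimeout(tick, initialDelayMs);
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}

// Next to app.listen(...): scheduleJob(runAlgorithm, 3 * 60 * 1000, 5 * 60 * 1000);
```

To run the algorithm exactly once, a single `setTimeout(runAlgorithm, 3 * 60 * 1000)` is enough.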
You might also want to implement a WebSocket and JSON-RPC interface between the client and the server. Then, rather than having the client poll the server for the results of your algorithm, you can have the client listen for events arriving on the WebSocket.
Hope that helps!
If I understand you correctly, I would just send the data to the client side while rendering the page and store it in some hidden tag (like input type="hidden"). Then I would run a script on the client side with setTimeout to display the data to the user.
I am writing an application where an external module/component updates a SQLite database with new data every few hundred milliseconds or so. My job is to write an application that queries that data and broadcasts it over sockets every few hundred milliseconds as well.
So currently I'm doing something like this with node, express, and socket.io:
const timer = setInterval(function () {
  db.all('SELECT * FROM cache', function (err, rows) {
    if (err) return console.error(err); // don't broadcast on a failed query
    io.emit('data', rows);
  });
}, 400);
But I feel like there should be a more direct approach to this, whereby I can maintain a socket connection directly to the database, and listen for changes "live", rather than having to do blind queries (even if the data may not have changed), and emit.
Maybe this is not supported by SQLite (which is fine, I think I have some flexibility in the storage system I'm using), but is what I'm asking at all possible?
Note that I don't have control over the database updating process, so I can't just emit the data I'm about to store in the database. That whole process is a black box C program and I ONLY have access to the database itself.
What you're looking for is commonly called pub/sub (short for publish and subscribe). Clients waiting for data connect to a server and subscribe to the kinds of events they want to receive. The data originators also connect to this server and publish events. The RPC-with-events model that Socket.IO gives you is very similar. The clients set up handlers for certain types of events, and the server fires these events with the appropriate data.
The problem is that pub/sub isn't typically implemented in a database. (Redis is an exception.) SQLite certainly has no capability for this. Since you can't modify the original application and only have access to the database file, there is no way to get pushed notifications out of it. What you need is to make your server an adapter that turns database polling into broadcast messages.
I do see a couple of problems with your setup, though. The first is that you are querying the database every 400 milliseconds regardless of whether the previous query has finished. Don't do that. What if your query takes 500 milliseconds? Now you have a second query piling up. What if those two queries are now slow because they are both running at the same time? Now you have 3, 4, 5, and then 100 queries piling up. Don't schedule your next query until the current one is done. Check out an implementation of throttle for this.
The next problem is that you are blindly sending all of the results to the client every time. I don't know what your application does, but I'm guessing that the results may overlap with those of the previous query. Does your database have timestamp columns? You could modify your query to use them. Or, modify your application to filter out the rows that haven't changed.
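Both fixes can be combined in a small polling loop, sketched below. The next query is scheduled only after the previous one finishes, and results are broadcast only when they differ from the last broadcast. `startPolling` is a made-up helper. `queryRows` is assumed to be a promise-returning version of the question's query, e.g. `util.promisify(db.all.bind(db))` applied to `'SELECT * FROM cache'` with the sqlite3 package. `emit` would be `(rows) => io.emit('data', rows)`. Comparing `JSON.stringify` snapshots is a simple stand-in for the timestamp-based filtering suggested above.

```javascript
// Polls without overlap and broadcasts only when the result set changed.
// queryRows() is a hypothetical promise-returning query; emit(rows) broadcasts.
function startPolling(queryRows, emit, intervalMs) {
  let lastSnapshot = null;
  let timer = null;
  let stopped = false;

  async function poll() {
    try {
      const rows = await queryRows();
      const snapshot = JSON.stringify(rows);
      if (snapshot !== lastSnapshot) { // skip identical results
        lastSnapshot = snapshot;
        emit(rows);
      }
    } catch (err) {
      console.error("poll failed", err);
    }
    // Schedule the next query only after this one has finished.
    if (!stopped) timer = setTimeout(poll, intervalMs);
  }

  poll();
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

If the table has a timestamp column, selecting only rows newer than the last seen timestamp would avoid both the full query and the full comparison.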
I'm building a REST web service that receives a request and must return "Ok" if the operation was completed correctly. How can I deal with the possibility of losing the connection while returning this "Ok" message?
For example, a system like Amazon SimpleDB.
1) It receives a request.
2) It processes the request (stores and replicates the content).
3) It returns a confirmation message.
If the connection is lost between steps 2 and 3, the client thinks the operation was not successful and submits it again.
Thanks!
A system I reviewed earlier this year had a process similar to this. The solution they implemented was to have the client reply to the commit message, which cleared a flag on the record. A periodic process ran every N minutes, and if it found an entry that was completed but not acknowledged by the client, it rolled that transaction back. This allowed a client to repost the transaction without ending up with two 'real' records committed on the server side.
In the event of the timeout scenario, you could do the following:
Send a client generated unique id with the initial request in a header.
If the client doesn't get a response, then it can resend the request with the same id.
The server can keep a list of ids successfully processed and return an OK, rather than repeating the action.
The only issue with this is that the server will need to eventually remove the client ids. So there would need to be a time window for the server to keep the ids before purging them.
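The following is a minimal in-memory sketch of the request-id scheme. The server remembers the outcome for each client-supplied id for `ttlMs` and replays it on a retry instead of repeating the action. `createIdempotentHandler`, `action` and the TTL value are hypothetical. The request id would come from a request header whose name is up to you. Storing the promise before awaiting it also covers a retry that arrives while the first attempt is still running. A real deployment would need a shared store (for example, a table with a unique constraint on the id) so that the ids survive restarts and are visible to every server instance.

```javascript
// Replays the result for a repeated request id instead of re-running the action.
// action(payload) is hypothetical; now() is injectable to make expiry testable.
function createIdempotentHandler(action, ttlMs, now = Date.now) {
  const seen = new Map(); // requestId -> { promise, expiresAt }

  function purgeExpired() {
    const t = now();
    for (const [id, entry] of seen) {
      if (entry.expiresAt <= t) seen.delete(id); // the time window for keeping ids
    }
  }

  return function handle(requestId, payload) {
    purgeExpired();
    const hit = seen.get(requestId);
    if (hit) return hit.promise; // retry: same outcome, no second side effect
    const promise = Promise.resolve().then(() => action(payload));
    seen.set(requestId, { promise, expiresAt: now() + ttlMs });
    promise.catch(() => seen.delete(requestId)); // a failed attempt may be retried
    return promise;
  };
}
```

The purge step is the time window the answer mentions. An id retried after `ttlMs` is treated as a new request.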
It depends on the type of web service. HTTP and REST are basically stateless by nature.
For example, in the SimpleDB case, suppose you're simply requesting the value for a given key and the client connection drops while the value is being returned. The client can then simply re-request the data later. That data is likely to have been cached by the DB engine or the operating system's disk cache anyway.
If you're storing or updating a value and the data is identical then quite often the database engines know the data hasn't changed and so the update won't take very long at all.
Even complex queries can run quicker the second time on some database engines.
In short, I wouldn't worry about it unless you can prove there is a performance problem. In which case, start caching the results of some recent queries yourself. Some REST based frameworks will do this for you. I suspect you won't even find it to be an issue in practice though.