Duplicate data inserts from multiple containers running the same service - node.js

I have a problem with service replicas processing the same data.
My code gets data from a socket and then inserts that data into a DB. The problem is that if I have 2 containers of the same service (and I have more than 2), they both insert the same data and I end up with duplicates in my DB.
Is there a way to tell only one of them to do the insert?
I'm using Docker and Kubernetes, but I'm still new to them.
function dataStream(data) { // gets the data from the socket
    const payload = formatPayload(data);
    addToDb(payload); // I want this to happen from only 1 service
    broadcast(payload);
}
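One common way to handle this (not from the original question, just a sketch assuming the DB is Postgres and the payload carries some unique event id; the names events, event_id and eventId are illustrative only) is to make the insert itself idempotent with a unique constraint, so it doesn't matter how many replicas try. If the data has no natural unique key, a distributed lock or leader election would be needed instead.

// Hypothetical sketch: assumes an "events" table with a UNIQUE constraint on event_id
// and the node-postgres (pg) driver.
import { Pool } from 'pg';

const pool = new Pool();

async function addToDbOnce(payload: { eventId: string }): Promise<boolean> {
    // ON CONFLICT DO NOTHING makes the insert idempotent: if another replica
    // already stored this event, the statement is a no-op.
    const result = await pool.query(
        'INSERT INTO events (event_id, body) VALUES ($1, $2) ON CONFLICT (event_id) DO NOTHING',
        [payload.eventId, JSON.stringify(payload)]
    );
    // rowCount is 1 only on the replica whose insert actually landed,
    // so the caller knows whether its insert won.
    return result.rowCount === 1;
}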

Related

Timed out fetching a new connection from the connection pool prisma

I have a NestJS scheduler that runs every hour.
I'm using multiple libraries to connect to the Postgres database through the NestJS app:
Prisma
Knex
I have a scheduler table that holds the URL to call and the datetime to run it at,
and a rule table that holds the tablename, columnname, logical operator (i.e. >, <, =, !=) and conditional operator (AND, OR).
Knex builds the query, which is stored in the database.
for (const t of schedules) {
    // this doesn't wait, so all the calls to the URLs fire simultaneously
    fetch("url").catch();
}
Each URL inserts records and can take 1, 2 or 3 hours depending on the URL,
but after a certain time
I'm getting a "Timed out fetching a new connection from the connection pool" Prisma error.
Is it because I'm using multiple clients to connect to the database?
You can configure the connection_limit and pool_timeout parameters by passing them in the connection string. You can set connection_limit to 1 to make sure that Prisma doesn't open additional database connections; this way you won't get timeout errors.
Increasing the pool timeout gives the query engine more time to process queries waiting in the queue.
Reference for the connection_limit and pool_timeout parameters: Reference.
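For example (a sketch only, with placeholder credentials and database name), the parameters go at the end of the PostgreSQL connection string that Prisma uses:

# hypothetical .env entry: at most 1 pooled connection, wait up to 20s for a free one
DATABASE_URL="postgresql://user:password@localhost:5432/mydb?connection_limit=1&pool_timeout=20"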

How to lock the azure sql database?

I am running an application that connects to an Azure SQL database. The problem is that the application connects to the database from multiple threads, causing race conditions. I want the connections to run sequentially.
I can't change the application. Is there a way to configure the database to allow only a single connection, and have any other connection wait until it can connect?
Single user mode is not available on Azure SQL Database.
However, you could consider adding retry logic to the application so it can retry X times, with the interval between retries increasing after each failed attempt.
public void HandleTransients()
{
    var connStr = "some database";
    var _policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(
        retryCount: 3,
        retryInterval: TimeSpan.FromSeconds(5));
    using (var conn = new ReliableSqlConnection(connStr, _policy))
    {
        // Do SQL stuff here.
    }
}

NodeJS/Express: ECONNRESET when doing multiples requests using Sequelize/Epilogue

I'm building a webapp using the following architecture:
a PostgreSQL database (called DB),
a NodeJS service (called DBService) using Sequelize to manipulate the DB and Epilogue to expose a REST interface via Express,
a NodeJS service called Backend serving as a backend and using DBService through REST calls
an AngularJS website called Frontend using Backend
Here are the versions I'm using:
PostgreSQL 9.3
Sequelize 2.0.4
Epilogue 0.5.2
Express 4.13.3
My DB schema is quite complex, containing 36 tables, and some of them contain a few hundred records. The DB is not written to very often; it's mostly read from.
But recently I created a script in Backend to run a complete check-up of the data contained in the DB: basically this script retrieves all data from all tables and performs some basic checks on it. Currently the script only reads from the database.
In order to run my script I had to remove the pagination limit of Epilogue by using the option pagination: false (see https://github.com/dchester/epilogue#pagination).
But now when I launch my script I randomly get this kind of error:
The request failed when trying to retrieve a uniquely associated objects with URL:http://localhost:3000/CallTypes/178/RendererThemes.
Code : -1
Message : Error: connect ECONNRESET 127.0.0.1:3000
The error appears randomly during the script execution: it's not always this URL that is returned, and not even always the same tables or relations. The error message before the code is a custom message returned by Backend.
The URL refers to DBService, but I don't see any error on that side, even with logging: console.log enabled in Sequelize and DEBUG=express:* to see what happens in Express.
I tried to put some setTimeout calls in my Backend script to slow it down, without any real change. I also tried tweaking different values like the PostgreSQL max_connections limit (I set the limit to 1000 connections), or Sequelize's maxConcurrentQueries and pool values, but without success yet.
I did not find where I can customize the connection pool of Express; maybe that would do the trick.
I assume the error comes from DBService, from the Express configuration, or from somewhere in the configuration of the DB (either in Sequelize/Epilogue or even in the PostgreSQL server itself), but since I don't see any error in any log I'm not sure.
Any idea how to solve it?
EDIT
After further investigation I may have found the answer, which is very similar to How to avoid a NodeJS ECONNRESET error?: I'm using my own RestClient object to make my HTTP requests, and this object was built as a singleton with this method:
var NodeRestClient : any = require('node-rest-client').Client;
...
static getClient() {
    if (RestClient.client == null) {
        RestClient.client = new NodeRestClient();
    }
    return RestClient.client;
}
Then I was always using the same object for all my requests, and when the process was too fast it created collisions... So I just removed the if (RestClient.client == null) test, and for now it seems to work.
If there is a better way to manage that, by closing requests or managing a pool, feel free to contribute :)
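For reference, here is a minimal sketch of the change described above (the RestClient and NodeRestClient names follow the snippet earlier in this question):

// Sketch of the fix described above: hand out a fresh client per call
// instead of sharing a single cached instance.
const NodeRestClient: any = require('node-rest-client').Client;

class RestClient {
    static getClient() {
        // No more "if (RestClient.client == null)" check: every caller gets
        // its own client, so parallel requests no longer collide.
        return new NodeRestClient();
    }
}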

Interacting with deepstream.io records server side

I've been doing some reading up on deepstream.io and so far I've discovered the following:
All records are stored in the same table (deepstream_records by default)
To interact with this data the client can be used, both client side (browser) and server side (node), but apparently it should not be used on the server side (node).
Questions:
How should I interact with records from the server side?
Is there something stopping me from changing records in the database?
Would changes to records in the database update client subscriptions?
Would this be considered bad practice?
Why are all the records stored in the same table?
Data example from RethinkDB:
{
  "_d": { },
  "_v": 0,
  "ds_id": "users/"
}, {
  "_d": { },
  "_v": 0,
  "ds_id": "users/admin"
}
Why are all the records stored in the same table?
server.set('storage', new RethinkDBStorageConnector({
    port: 5672,
    host: 'localhost',
    /* (Optional) A character that's used as part of the
     * record names to split them into a table and an id part, e.g.
     *
     * books/dream-of-the-red-chamber
     *
     * would create a table called 'books' and store the record under the name
     * 'dream-of-the-red-chamber'
     */
    splitChar: '/'
}));
server.start();
Did you perhaps not specify the splitChar? (It has no default.)
How should I interact with records on the server side?
To interact with this data you would create a node client that connects to your server using TCP (default port 6021). The server itself is a very efficient message broker that can distribute messages with low latency, and our recommendation is not to include any custom code that isn't necessary, even when using permissioning and dataTransforms.
https://deepstream.io/tutorials/core/transforming-data
You can see this explained in the FX provider example in the tutorials:
https://deepstream.io/tutorials/core/active-data-providers
And the tank game tutorial example:
https://github.com/deepstreamIO/ds-tutorial-tanks
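As a rough sketch (assuming the deepstream.io-client-js package and the default TCP port mentioned above; the record name is just an example), a server-side provider is simply a normal deepstream client running in node:

// Rough sketch: a server-side node process acting as a regular deepstream client.
const deepstream = require('deepstream.io-client-js');

const client = deepstream('localhost:6021').login({}, (success: boolean) => {
    if (!success) {
        throw new Error('deepstream login failed');
    }
    // Read and update the record through deepstream instead of touching the
    // database table directly, so subscribed clients are still notified.
    const record = client.record.getRecord('users/admin');
    record.whenReady(() => {
        record.set('lastSeen', Date.now());
    });
});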
Is there something stopping me from changing records in the database?
deepstream maintains its low latency by doing all its reads and writes against the cache. Writing to the database happens afterwards, asynchronously, so it doesn't introduce a performance hit. Because of this, changing a record directly in the database will not notify any of the subscribed users, and it can also break some of the logic used for merge handling...

How to manage centralized values in a sharded environment

I have an ASP.NET app being developed for Windows Azure. It's been deemed necessary that we use sharding for the DB to improve write times, since the app is very write-heavy but the data is easily isolated. However, I need to keep track of a few central variables across all instances, and I'm not sure of the best place to store that info. What are my options?
Requirements:
Must be durable, can survive instance reboots
Must be synchronized. It's incredibly important to avoid conflicting updates or at least throw an exception in such cases, rather than overwriting values or failing silently.
Must be reasonably fast (2000+ reads/writes per second)
I thought about writing a separate component to run on a worker role that simply reads/writes the values in memory and flushes them to disk every so often, but I figure there's got to be something already written for that purpose that I can appropriate in Windows Azure.
I think what I'm looking for is a system like Apache ZooKeeper, but I don't want to have to deal with installing the JRE during the worker role startup and all that jazz.
Edit: Based on the suggestion below, I'm trying to use Azure Table Storage using the following code:
var context = table.ServiceClient.GetTableServiceContext();
var item = context.CreateQuery<OfferDataItemTableEntity>(table.Name)
    .Where(x => x.PartitionKey == Name).FirstOrDefault();
if (item == null)
{
    item = new OfferDataItemTableEntity(Name);
    context.AddObject(table.Name, item);
}
if (item.Allocated < Quantity)
{
    allocated = ++item.Allocated;
    context.UpdateObject(item);
    context.SaveChanges();
    return true;
}
However, the context.UpdateObject(item) call fails with The context is not currently tracking the entity. Doesn't querying the context for the item initially add it to the context tracking mechanism?
Have you looked into SQL Azure Federations? It seems like exactly what you're looking for: sharding for SQL Azure.
Here are a few links to read:
http://msdn.microsoft.com/en-us/library/windowsazure/hh597452.aspx
http://convective.wordpress.com/2012/03/05/introduction-to-sql-azure-federations/
http://searchcloudapplications.techtarget.com/tip/Tips-for-deploying-SQL-Azure-Federations
What you need is Table Storage, since it matches all your requirements:
Durable: Yes, Table Storage is part of a Storage Account, which isn't tied to a specific Cloud Service or instance.
Synchronized: Yes, all instances read and write against the same Storage Account, so they all see the same data.
It's incredibly important to avoid conflicting updates: Yes, this is possible with the use of ETags.
Reasonably fast? Very fast, up to 20,000 entities/messages/blobs per second.
Update:
Here is some sample code that uses the new storage SDK (2.0):
var storageAccount = CloudStorageAccount.DevelopmentStorageAccount;
var table = storageAccount.CreateCloudTableClient()
    .GetTableReference("Records");
table.CreateIfNotExists();

// Add item.
table.Execute(TableOperation.Insert(new MyEntity() { PartitionKey = "", RowKey = "123456", Customer = "Sandrino" }));

// Two "users" retrieve the same entity (and with it the same ETag).
var user1record = table.Execute(TableOperation.Retrieve<MyEntity>("", "123456")).Result as MyEntity;
var user2record = table.Execute(TableOperation.Retrieve<MyEntity>("", "123456")).Result as MyEntity;

// User 1 updates first; the entity's ETag changes in storage.
user1record.Customer = "Steve";
table.Execute(TableOperation.Replace(user1record));

// User 2 still holds the old ETag, so this Replace fails with 412 Precondition Failed.
user2record.Customer = "John";
table.Execute(TableOperation.Replace(user2record));
First it adds the item 123456.
Then I'm simulating 2 users getting that same record (imagine they both opened a page displaying the record).
User 1 is fast and updates the item. This works.
User 2 still had the window open. This means he's working on an old version of the item. He updates the old item and tries to save it. This causes the following exception (this is possible because the SDK matches the ETag):
The remote server returned an error: (412) Precondition Failed.
I ended up with a hybrid cache / table storage solution. All instances track the variable via Azure caching, while the first instance spins up a timer that saves the value to table storage once per second. On startup, the cache variable is initialized with the value saved to table storage, if available.
