Round Robin for gRPC (nodejs) on kubernetes with headless service - node.js

I have 3 Node.js gRPC server pods and a headless Kubernetes service for the gRPC service (it returns all 3 pod IPs via DNS; tested with getent hosts from within a pod). However, all gRPC client requests always end up at a single server.
According to https://stackoverflow.com/a/39756233/2952128 (last paragraph), round robin per call should be possible as of Q1 2017. I am using grpc 1.1.2.
I tried passing {"loadBalancingPolicy": "round-robin"} as options to new Client(address, credentials, options) and using dns:///service:port as the address. If I understand the documentation/code correctly, this should be handed down to the C core and use the newly implemented round-robin channel creation (https://github.com/grpc/grpc/blob/master/doc/service_config.md).
Is this how the round-robin load balancer is supposed to work now? Is it already released with grpc 1.1.2?

After diving deep into the gRPC C core code and the Node.js adapter, I found that it works by using the option key "grpc.lb_policy_name". Therefore, constructing the gRPC client with
new Client(address, credentials, {"grpc.lb_policy_name": "round_robin"})
works.
Note that in my original question I also used round-robin instead of the correct round_robin.
I am still not completely sure how to set the serviceConfig from the service side with Node.js instead of using the client (channel) option override.

I'm not sure if this helps, but this discussion shows how to set the load-balancing strategy via grpc.service_config:
const options = {
    'grpc.ssl_target_name_override': ...,
    'grpc.lb_policy_name': 'round_robin', // <--- has no effect in grpc-js
    'grpc.service_config': JSON.stringify({ loadBalancingConfig: [{ round_robin: {} }] }), // <--- but this still works
};
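For completeness, a minimal sketch of what that looks like with a generated @grpc/grpc-js client pointed at a Kubernetes headless service (the service DNS name, port and the MyService constructor are placeholders, not from the original question):
const grpc = require('@grpc/grpc-js');

// 'dns:///' selects the DNS resolver, which returns all pod IPs of the headless service
const address = 'dns:///my-grpc-service.default.svc.cluster.local:50051';

const lbOptions = {
    'grpc.service_config': JSON.stringify({
        loadBalancingConfig: [{ round_robin: {} }] // spread calls across all resolved backends
    })
};

// MyService stands in for a client constructor generated from your .proto
const client = new MyService(address, grpc.credentials.createInsecure(), lbOptions);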

Related

Error: 14 UNAVAILABLE: No connection established when connecting to gRPC Service

I deployed a gRPC service to Cloud Run and was given a service url of the form:
https://customer-service-random-string-ue.a.run.app
I now attempted to connect to that service from localhost using the following code:
let client = new customer_proto.Customer("dns:customer-service-random-string-ue.a.run.app:50051", grpc.credentials.createInsecure());
but all I keep getting back is the following error:
Error: 14 UNAVAILABLE: No connection established
What can I do differently to make this work?
Thank you
We don't know the implementation of customer_proto.Customer so it's not possible to provide a definitive answer.
It's unlikely that the method expects the endpoint to be prefixed with dns:. I would expect it to want the Cloud Run service endpoint with an explicit port, since gRPC has no default port and (as I mentioned previously) Cloud Run maps services to :443, i.e.
customer-service-random-string-ue.a.run.app:443
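In Node.js that would look roughly like the sketch below, reusing the grpc and customer_proto objects from the question; note that Cloud Run only serves TLS, so SSL credentials are needed rather than createInsecure():
// Cloud Run terminates TLS on :443, so use SSL credentials instead of createInsecure()
let client = new customer_proto.Customer(
    'customer-service-random-string-ue.a.run.app:443',
    grpc.credentials.createSsl()
);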
A good tool that I use to test gRPC services is gRPCurl. Postman, which is quite common for REST, now supports gRPC as well.
If your service supports reflection, you can:
grpcurl customer-service-random-string-ue.a.run.app:443 list
Otherwise, you'll need to reference the service's proto file(s), the service and a method (possibly with data) to test:
grpcurl \
  -proto {your-proto} \
  customer-service-random-string-ue.a.run.app:443 \
  {package}.{Service}/{MethodName}
You can confirm that the host is defined and responds with:
nslookup customer-service-random-string-ue.a.run.app
telnet customer-service-random-string-ue.a.run.app 443

Random 21/42 seconds timeout in outgoing traffic on Azure Web Sites

I have an ASP.NET MVC 5 application running in the Azure German cloud as an Azure Web App (single instance - Standard S3 size).
I'm calling a non-Azure-hosted REST/SOAP service on a particular host, and the web requests either succeed promptly or time out after 21 / 42 seconds.
I've load tested the requests, and the percentage of requests timing out is between 20 and 80.
One particularly remarkable property of the timeouts is that they occur after exactly 21 or 42 seconds (this is serious, no reference to the Hitchhiker's Guide to the Galaxy intended).
Calling a different service from the web app works just fine, temporarily at least.
We've already checked the firewall of the non-Azure service: when the timeout occurs, not a single packet reaches the host.
This issue occurred once before, one year ago, and support was unable to tell what the cause was until the issue suddenly went away roughly two weeks after first occurring, so the ticket was closed as having fixed itself, but now it's back.
The code uses https://github.com/canton7/RestEase (which uses HttpClient underneath) and looks like this:
[Header("Content-Type", "application/json")]
public interface IApi
{
[Post("/Login")]
Task<LoginToken> Login([Body]LoginRequest request);
}
private static Dictionary<string, IApi> ApiClientsByHost = new Dictionary<string, IApi>();

private IApi GetApiForHost(string host)
{
    if (!ApiClientsByHost.TryGetValue(host, out var client))
    {
        lock (ApiClientsByHost)
        {
            if (!ApiClientsByHost.TryGetValue(host, out client))
            {
                ApiClientsByHost[host] = client = RestClient.For<IApi>(host);
            }
        }
    }
    return client;
}
var client = GetApiForHost("https://production/");
var loginToken = await client.Login(new LoginRequest { Username = username, Password = password });
By a different service, I mean using "https://testserver/" instead of "https://production/" (testserver is located in a different data center, with a different IP and all).
The API authentication passes a token via the query string, but the requests time out before a token can even be obtained.
The code caches the IApi instances to avoid the socket exhaustion problems that come with disposing HttpClients (but I've never actually run into port exhaustion).
Restarting the app does not resolve the issue, and the issue currently only occurs with production (but a year ago, when this issue occurred on production, we switched to testserver, which worked initially but after some time ran into the same problem).
EDIT: Found some explanation in the last answer as to where those magical 21 seconds are coming from.
EDIT: One workaround I've found is to set up an Azure VM running a proxy and configure defaultProxy to route requests through that VM.
That's TCP retransmission timing out: if the initial SYN gets no response, Windows retransmits it with exponential backoff (3 s + 6 s + 12 s ≈ 21 s before the connect attempt is abandoned), and two back-to-back attempts add up to roughly 42 seconds. It's odd that you are getting different values, though.

Meteor MongoDB subscription delivering data in 10 second intervals instead of live

I believe this is more of a MongoDB question than a Meteor question, so don't get scared if you know a lot about mongo but nothing about meteor.
Running Meteor in development mode, but connecting it to an external Mongo instance instead of using Meteor's bundled one, results in the same problem. This leads me to believe this is a Mongo problem, not a Meteor problem.
The actual problem
I have a Meteor project which continuously gets data added to the database and displays it live in the application. It works perfectly in development mode, but behaves strangely when built and deployed to production. It works as follows:
A tiny script running separately collects broadcast UDP packages and shoves them into a mongo collection
The Meteor application then publishes a subset of this collection so the client can use it
The client subscribes and live-updates its view
The problem here is that the subscription appears to only get data about every 10 seconds, while the UDP packages arrive and get shoved into the database several times per second. This makes the application behave strangely.
It is most noticeable on the collection of UDP messages, but not limited to it. It happens with every collection which is subscribed to, even those not populated by the external script
Querying the database directly, either through the mongo shell or through the application, shows that the documents are indeed added and updated as they are supposed to. The publication just fails to notice and appears to default to querying on a 10 second interval
Meteor uses oplog tailing on the MongoDB to find out when documents are added/updated/removed and update the publications based on this
Anyone with a bit more Mongo experience than me who might have a clue about what the problem is?
For reference, this is the dead simple publication function
/**
 * Publishes a custom part of the collection. See {@link https://docs.meteor.com/api/collections.html#Mongo-Collection-find} for args
 *
 * @returns {Mongo.Cursor} A cursor to the collection
 *
 * @private
 */
function custom(selector = {}, options = {}) {
    return udps.find(selector, options);
}
and the code subscribing to it:
Tracker.autorun(() => {
    // Params for the subscription
    const selector = {
        "receivedOn.port": port
    };
    const options = {
        limit,
        sort: {"receivedOn.date": -1},
        fields: {
            "receivedOn.port": 1,
            "receivedOn.date": 1
        }
    };
    // Make the subscription
    const subscription = Meteor.subscribe("udps", selector, options);
    // Get the messages
    const messages = udps.find(selector, options).fetch();
    doStuffWith(messages); // Not actual code. Just for demonstration
});
Versions:
Development:
node 8.9.3
mongo 3.2.15
Production:
node 8.6.0
mongo 3.4.10
Meteor uses two modes of operation to provide real-time updates on top of MongoDB, which has no built-in real-time features: poll-and-diff and oplog-tailing.
1 - Oplog-tailing
It works by reading MongoDB's replication log, the one it uses to synchronize secondary databases (the 'oplog'). This allows Meteor to deliver real-time updates across multiple hosts and scale horizontally.
It's more complicated, but provides real-time updates across multiple servers.
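(Side note, not from the original answer: oplog tailing only works when Meteor is pointed at a replica set and given access to its local database via MONGO_OPLOG_URL; if that is missing in production, Meteor silently falls back to poll-and-diff. A rough sketch of how it is usually enabled, with placeholder hosts and database names:)
# placeholder connection strings; adjust host, database and credentials to your deployment
MONGO_URL="mongodb://mongo-host:27017/myapp" \
MONGO_OPLOG_URL="mongodb://mongo-host:27017/local" \
node main.js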
2 - Poll and diff
The poll-and-diff driver works by repeatedly running your query (polling) and computing the difference between new and old results (diffing). The server will re-run the query every time another client on the same server does a write that could affect the results. It will also re-run periodically to pick up changes from other servers or external processes modifying the database. Thus poll-and-diff can deliver realtime results for clients connected to the same server, but it introduces noticeable lag for external writes.
(The default is 10 seconds, and this is what you are experiencing.)
This may or may not be detrimental to the application UX, depending on the application (eg, bad for chat, fine for todos).
This approach is simple and delivers easy-to-understand scaling characteristics. However, it does not scale well with lots of users and lots of data. Because each change causes all results to be refetched, CPU time and network bandwidth scale O(N²) with users. Meteor automatically de-duplicates identical queries, though, so if each user runs the same query the results can be shared.
You can tune poll-and-diff by changing the values of pollingIntervalMs and pollingThrottleMs.
You have to use the disableOplog: true option to opt out of oplog tailing on a per-query basis.
Meteor.publish("udpsPub", function (selector) {
return udps.find(selector, {
disableOplog: true,
pollingThrottleMs: 10000,
pollingIntervalMs: 10000
});
});
Additional links:
https://medium.baqend.com/real-time-databases-explained-why-meteor-rethinkdb-parse-and-firebase-dont-scale-822ff87d2f87
https://blog.meteor.com/tuning-meteor-mongo-livedata-for-scalability-13fe9deb8908
How to use pollingThrottle and pollingInterval?
It's a DDP (Websocket ) heartbeat configuration.
Meteor's real-time communication and live updates are performed using DDP (a JSON-based protocol which Meteor implemented on top of SockJS).
Both client and server can change data and react to those changes.
The DDP (WebSocket) protocol implements so-called PING/PONG messages (heartbeats) to keep WebSockets alive. The server sends a PING message to the client through the WebSocket, which then replies with a PONG.
By default, heartbeatInterval is configured at a little more than 17 seconds (17500 milliseconds).
Check here: https://github.com/meteor/meteor/blob/d6f0fdfb35989462dcc66b607aa00579fba387f6/packages/ddp-client/common/livedata_connection.js#L54
You can configure heartbeat time in milliseconds on server by using:
Meteor.server.options.heartbeatInterval = 30000;
Meteor.server.options.heartbeatTimeout = 30000;
Other Link:
https://github.com/meteor/meteor/blob/0963bda60ea5495790f8970cd520314fd9fcee05/packages/ddp/DDP.md#heartbeats

How to construct a subscription message for XMPP in NodeJS package?

I'm using node-xmpp-client package to connect to an XMPP service. The service publishes messages when it receives them from some external source. My goal:
Connect to the service
Get authenticated
Subscribe to some nodes that I'm interested in. (Node name is known)
Receive stanza from the node to know new message has come in and handle it.
I'm referring to the sample code here.
I have managed to connect to the service with the code below. Does this automatically authenticate me to the server? I don't receive any "authenticate" event. If it doesn't, how do I explicitly request authentication?
var client = new xmpp.Client({
    jid: 'someuser@somedomain.com',
    password: 'somepassword',
    host: 'somehost',
    port: 5222
})
Next, how do I subscribe to a publisher node? Should I do client.send(new xmpp.Message(..))? If so, how should the xmpp.Message be constructed? I can see the subscription IQ in the XMPP documentation but have difficulty mapping it back to node-xmpp's API.
<iq type='set'
    from='francisco@denmark.lit/barracks'
    to='pubsub.shakespeare.lit'
    id='sub1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <subscribe
        node='princely_musings'
        jid='francisco@denmark.lit'/>
  </pubsub>
</iq>
It's probably easiest to look at how I did this for xmpp-ftw-pubsub: https://github.com/xmpp-ftw/xmpp-ftw-pubsub/blob/master/lib/pubsub.js#L111-L137
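For a rough idea of what this looks like with node-xmpp-client itself, here is a sketch of building the same subscribe IQ with its stanza builder (the JIDs, pubsub service address and node name are placeholders, and the Stanza export may differ between versions, so verify against the node-xmpp-client version you use):
var xmpp = require('node-xmpp-client');

var client = new xmpp.Client({
    jid: 'someuser@somedomain.com',
    password: 'somepassword',
    host: 'somehost',
    port: 5222
});

// 'online' fires once the stream is connected and authenticated
client.on('online', function () {
    // equivalent of the <iq type='set'> ... <subscribe/> example above
    var iq = new xmpp.Stanza('iq', { type: 'set', to: 'pubsub.somedomain.com', id: 'sub1' });
    iq.c('pubsub', { xmlns: 'http://jabber.org/protocol/pubsub' })
      .c('subscribe', { node: 'princely_musings', jid: 'someuser@somedomain.com' });
    client.send(iq);
});

// published items arrive as message stanzas from the pubsub service
client.on('stanza', function (stanza) {
    if (stanza.is('message')) {
        console.log('received:', stanza.toString());
    }
});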

Using memcached failover servers in nodejs app

I'm trying to set up a robust memcached configuration for a nodejs app with the node-memcached driver, but it does not seem to use the specified failover servers when one server dies.
My local experiment goes as follows:
shell:
    memcached -p 11212
node:
    MC = require('memcached')
    c = new MC('localhost:11211', //this process does not exist
               {failOverServers: ['localhost:11212']})
    c.get('foo', console.log) //this will eventually time out
    c.get('foo', console.log) //repeat 5 or 6 times to exceed the retries number
    //wait until all the connection errors appear in the console
    //at this point, the failover server should be in use
    c.get('foo', console.log) //this still times out :(
Any ideas what we might be doing wrong?
It seems that the failover feature is somewhat buggy in node-memcached.
To enable failover you must set the remove option:
c = new MC('localhost:11211', //this process does not exist
           {failOverServers: ['localhost:11212'],
            remove: true})
Unfortunately, this is not going to work because of the following error:
[depricated] HashRing#replaceServer is removed.
[depricated] the API has no replacement
That is, when trying to replace a dead server with a replacement from the failover list, node-memcached outputs a deprecation error from the HashRing library (which, in turn, is maintained by the same author as node-memcached). IMHO, feel free to open a bug :-)
This can happen when your Node.js server is not getting any session ID from memcached.
Please check that memcached is configured properly in php.ini:
session.save_handler = 'memcache'
session.save_path = 'tcp://localhost:11212'
