Scaling socket.io broadcast - node.js

I want to broadcast a 1Kb message with socket.io (node.js framework), every 3 seconds to a large number of users. What is the best way to scale it (1 user = 1 'listener' with socket.on('periodicMessage',callback) )?
There is no other CPU usage (one read of an external database which is filled by an other external module every 3 seconds), so i am trying to know if a simple heroku server can broadcast a message to 10 000, 100 000, 1 million or more users.

We have easily scaled to tens of thousands of 'listeners' on a single node.js process. I am not sure how many you actually can scale to, given that each socket is a file descriptor, and the plain vanilla kernel can have 65K fd's for each process, no more.
CPU would not be a problem. If at all, upload bandwidth would be (1KB * 50K users / 3 sec = 50M/3sec = 16MB/s upstream. I never measured Heroku, so don't know if they sustain this. I suppose they do, but maybe they limit you, since they are paying Amazon for this, after all).

Related

Download multiple files: best performance-wise solution

I am writing an application which needs to read thousands (let's say 10000) of small (let's say 4KB) files from a remote location (e.g., an S3 bucket).
The single read from remote takes around 0.1 seconds and processing the file takes just few milliseconds (from 10 to 30). Which is the best solution in terms of performance?
The worse I can do is to download everything serially: 0.1*10000 = 1000 seconds (~16 minutes).
Compressing the files in a single big one would surely help, but it is not an option in my application.
I tried to spawn N threads (where N is the number of logical cores) that download and process 10000/N files each. This gave me 10000/N * 0.1 seconds. If N = 16, it takes indeed around 1 minute to complete
I also tried to spawn K*N threads (this helps sometimes on high latency systems, as GPUs) but I did not get much speed up for any value of K (I tried 2,3,4), maybe due to lot of context thread switching.
My general question is: how would you design such a similar system? Is 1 minute really the best I can achieve?
Thank you,
Giuseppe

What does BandwidthIn and BandwidthOut graph represent for a service?

I have a service and its bandwidth graph looks like this
What does it represent.? I am using tutum which shows me these graphs.!
Should I worry about it.? Please Explain! Any help is appreciated.!
Bandwidth is the the amount of data sent (Out) or received (In) in a period of time. Mbps stands for Mega bits per seconds, i.e., how many bits did you send or receive during that past whole second.
I am sure you heard about xxx Mpbs from your internet provider, in which case, it correspond to the maximum speed you can have, but you are not required to use the whole bandwidth all the time.
Same thing on Tutum, depending on your hosting provider / instance type you will also have a maximum Mbps bandwidth, but at any given t time, you are using YY Mbps out of your XX Mpbs maximum.
As the graph increase, it simply means that you send/receive more data, which can mean that you have a higher traffic or you are doing some kind of networking activity.

How to determine Redis memory leak?

Our redis servers are, since yesterday, gradually (200MB/hour) using more memory, while the amount of keys (330K) and their data (132MB redis-rdb-tools) stay about the same.
Output of redis-cli info shows 6.89G used memory?!
redis_version:2.4.10
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.6
process_id:3437
uptime_in_seconds:296453
uptime_in_days:3
lru_clock:1905188
used_cpu_sys:8605.03
used_cpu_user:1480.46
used_cpu_sys_children:1035.93
used_cpu_user_children:3504.93
connected_clients:404
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:7400076728
used_memory_human:6.89G
used_memory_rss:7186984960
used_memory_peak:7427443856
used_memory_peak_human:6.92G
mem_fragmentation_ratio:0.97
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:0
changes_since_last_save:1672
bgsave_in_progress:0
last_save_time:1403172198
bgrewriteaof_in_progress:0
total_connections_received:3616
total_commands_processed:127741023
expired_keys:0
evicted_keys:0
keyspace_hits:18817574
keyspace_misses:8285349
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:1619791
vm_enabled:0
role:slave
master_host:***BLOCKED***
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
db0:keys=372995,expires=372995
db6:keys=68399,expires=68399
The problem started when we updated our (.net) client code from BookSleeve 1.1.0.4 to ServiceStack v3.9.71 to prepare for an upgrade to Redis 2.8. But a lot of other stuff was updated to And our session state store (also redis, but with harbour client) does not show the same symptoms.
Where is all that Redis memory going? How can I troubleshoot it's usage?
Edit: I just restarted this instance and memory returned to 350M and is now climbing again. The top 10 largest objects are still the same size, ranging from 100K to 25M for nr 1. The amount of keys has dropped to 270K (330K earlier).
Here are some sources of "hidden" memory consumption in Redis:
Marc already mentioned the buffers maintained by the master to feed the slave. If a slave is lagging behind its master (because it runs on a slower box for instance), then some memory will be consumed on the master.
when long running commands are detected, Redis logs them in the SLOWLOG area, which takes some memory. You may want to use the SLOWLOG LEN command to check the number of records you have here.
communication buffers can also take memory. As far as I remember, with old versions of Redis (and 2.4 is quite old - you should really upgrade), it was unbounded, meaning that if you transfer a big object at a point, the communication buffer associated to this client connection will grow and never shrink. If there are many clients dealing occasionally with large objects, it could be a possible explanation. If you use commands retrieving very large data from Redis (in one shot), it can be an explanation as well. For instance, a simple KEYS * command applied on a Redis server storing millions of keys will consume a significant amount of memory.
You mentioned that you have objects as big as 25 MB. You have 404 client connections, if each of them needs to access such objects at a point in time, it will consume 10 GB of memory.

Give reads priority over writes in Elasticsearch

I have an EC2 server running Elasticsearch 0.9 with a nginx server for read/write access. My index has about 750k small-medium documents. I have a pretty continuous stream of minimal writes (mainly updates) to the content. The speeds/consistency I receive with search is fine with me, but I have some sporadic timeout issues with multi-get (/_mget).
On some pages in my app, our server will request a multi-get of a dozen to a few thousand documents (this usually takes less than 1-2 seconds). The requests that fail, fail with a 30,000 millisecond timeout from the nginx server. I am assuming this happens because the index was temporarily locked for writing/optimizing purposes. Does anyone have any ideas on what I can do here?
A temporary solution would be to lower the timeout and return a user friendly message saying documents couldn't be retrieved (however they still would have to wait ~10 seconds to see an error message).
Some of my other thoughts were to give read priority over writes. Anytime someone is trying to read a part of the index, don't allow any writes/locks to that section. I don't think this would be scalable and it may not even be possible?
Finally, I was thinking I could have a read-only alias and a write-only alias. I can figure out how to set this up through the documentation, but I am not sure if it will actually work like I expect it to (and I'm not sure how I can reliably test it in a local environment). If I set up aliases like this, would the read-only alias still have moments where the index was locked due to information being written through the write-only alias?
I'm sure someone else has come across this before, what is the typical solution to make sure a user can always read data from the index with a higher priority over writes. I would consider increasing our server power, if required. Currently we have 2 m2x-large EC2 instances. One is the primary and the replica, each with 4 shards.
An example dump of cURL info from a failed request (with an error of Operation timed out after 30000 milliseconds with 0 bytes received):
{
"url":"127.0.0.1:9200\/_mget",
"content_type":null,
"http_code":100,
"header_size":25,
"request_size":221,
"filetime":-1,
"ssl_verify_result":0,
"redirect_count":0,
"total_time":30.391506,
"namelookup_time":7.5e-5,
"connect_time":0.0593,
"pretransfer_time":0.059303,
"size_upload":167002,
"size_download":0,
"speed_download":0,
"speed_upload":5495,
"download_content_length":-1,
"upload_content_length":167002,
"starttransfer_time":0.119166,
"redirect_time":0,
"certinfo":[
],
"primary_ip":"127.0.0.1",
"redirect_url":""
}
After more monitoring using the Paramedic plugin, I noticed that I would get timeouts when my CPU would hit ~80-98% (no obvious spikes in indexing/searching traffic). I finally stumbled across a helpful thread on the Elasticsearch forum. It seems this happens when the index is doing a refresh and large merges are occurring.
Merges can be throttled at a cluster or index level and I've updated them from the indicies.store.throttle.max_bytes_per_sec from the default 20mb to 5mb. This can be done during runtime with the cluster update settings API.
PUT /_cluster/settings HTTP/1.1
Host: 127.0.0.1:9200
{
"persistent" : {
"indices.store.throttle.max_bytes_per_sec" : "5mb"
}
}
So far Parmedic is showing a decrease in CPU usage. From an average of ~5-25% down to an average of ~1-5%. Hopefully this can help me avoid the 90%+ spikes I was having lock up my queries before, I'll report back by selecting this answer if I don't have any more problems.
As a side note, I guess I could have opted for more balanced EC2 instances (rather than memory-optimized). I think I'm happy with my current choice, but my next purchase will also take more CPU into account.

How much data can I send through a socket.emit?

So I am using node.js and socket.io. I have this little program that takes the contents of a text box and sends it to the node.js server. Then, the server relays it back to other connected clients. Kind of like a chat service but not exactly.
Anyway, what if the user were to type 2-10k worth of text and try to send that? I know I could just try it out and see for myself but I'm looking for a practical, best practice limit on how much data I can do through an emit.
As of v3, socket.io has a default message limit of 1 MB. If a message is larger than that, the connection will be killed.
You can change this default by specifying the maxHttpBufferSize option, but consider the following (which was originally written over a decade ago, but is still relevant):
Node and socket.io don't have any built-in limits. What you do have to worry about is the relationship between the size of the message, number of messages being send per second, number of connected clients, and bandwidth available to your server – in other words, there's no easy answer.
Let's consider a 10 kB message. When there are 10 clients connected, that amounts to 100 kB of data that your server has to push out, which is entirely reasonable. Add in more clients, and things quickly become more demanding: 10 kB * 5,000 clients = 50 MB.
Of course, you'll also have to consider the amount of protocol overhead: per packet, TCP adds ~20 bytes, IP adds 20 bytes, and Ethernet adds 14 bytes, totaling 54 bytes. Assuming a MTU of 1500 bytes, you're looking at 8 packets per client (disregarding retransmits). This means you'll send 8*54=432 bytes of overhead + 10 kB payload = 10,672 bytes per client over the wire.
10.4 kB * 5000 clients = 50.8 MB.
On a 100 Mbps link, you're looking at a theoretical minimum of 4.3 seconds to deliver a 10 kB message to 5,000 clients if you're able to saturate the link. Of course, in the real world of dropped packets and corrupted data requiring retransmits, it will take longer.
Even with a very conservative estimate of 8 seconds to send 10 kB to 5,000 clients, that's probably fine in chat room where a message comes in every 10-20 seconds.
So really, it comes down to a few questions, in order of importance:
How much bandwidth will your server(s) have available?
How many users will be simultaneously connected?
How many messages will be sent per minute?
With those questions answered, you can determine the maximum size of a message that your infrastructure will support.
Limit = 1M (By Default)
Use this example config to specify your custom limit for maxHttpBufferSize:
const io = require("socket.io")(server, {
maxHttpBufferSize: 1e8, pingTimeout: 60000
});
1e8 = 100,000,000 : that can be good for any large scale response/emit
pingTimeout : When the emit was large then it take time and you need increase pingtime too
Read more from Socket.IO Docs
After these setting, if your problem is still remaining then you can check related proxy web server configs, like client_max_body_size, limit_rate in
nginx (if u have related http proxy to socketIO app) or client/server Firewall rules.
2-10k is fine, there arn't any enforced limits or anything, it just comes down to bandwidth and practicality.. 10k is small though in the grand scheme of things so you should be fine if that's somewhat of an upper bound for you.

Resources