GlusterFS extremely slow (creating 100 empty files takes ~2 minutes)

I am trying GlusterFS for the first time. I used this tutorial to get started:
https://www.digitalocean.com/community/tutorials/how-to-create-a-redundant-storage-pool-using-glusterfs-on-ubuntu-servers
I did not create a separate partition as suggested in the official docs.
Everything is up and running on EC2 (Ubuntu instances), but it is extremely slow: creating 100 empty files takes around 2 minutes:
time sudo touch file{1..100}
real 1m43.220s
user 0m0.008s
sys 0m0.004s
Am I missing something or doing something wrong?
Current setup: I am using 2 servers and a client, all in the same region, and the replication level is 2.
This is the command I used to create the pool:
sudo gluster volume create myvolume replica 2 transport tcp host1:/gluster-storage host2:/gluster-storage force
CORRECTION:
The client was in a different region. This is the speed when I create 100 files from a client inside the same region:
time sudo touch file{1..100}
real 0m1.237s
user 0m0.004s
sys 0m0.000s
I will leave the question open because I think 2 minutes to create 100 empty files is too long even if the client is in a different region (Oregon -> N. Virginia).

A stretched cluster (meaning across regions, in your case) is not supported as such.
Note that several rounds of communication happen back and forth between the nodes before the client gets a success reply, so every file operation pays the inter-node latency multiple times.
Try the same test within the same region (LAN) and compare the latency.
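To see why the round trips dominate, here is a rough back-of-the-envelope check (the round-trip count per create is an assumption, and ~75 ms is a typical Oregon to N. Virginia RTT, not a measurement):

ping -c 5 host1    # first, measure the client-to-brick round-trip time

# Assuming ~75 ms cross-region RTT and on the order of 10 sequential
# round trips per create (lookup, locking, create, attribute updates,
# unlock, against both replicas):
#   100 files * 10 RTTs * 0.075 s  ~=  75 s
# That is the right order of magnitude for the observed 1m43s, while
# the same math at a ~0.5 ms LAN RTT predicts well under a second,
# matching the 1.2 s result above.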


What should my expectations be regarding IPFS response times?

IPFS-SERVER
I have a go-ipfs daemon, configured with the standard ipfs "server" profile, running on a Linux server hosted by a large cloud provider.
IPFS-CLIENT
I have a go-ipfs daemon, configured with the ipfs "default" profile, running on a Windows 10 laptop at my SOHO, behind NAT.
Observation #1
When I "publish" via CLI or API (ipfs name publish...) the multihash of a small text file from "IPFS-SERVER" the command takes about 120 seconds to 150 seconds to complete.
When I "cat" via CLI or API (ipfs cat /ipns/multihash) from "IPFS-CLIENT" the command takes about 60 seconds to 120 seconds to complete.
Questions
Are these the typical or expected response times for these commands?
Are there tweaks that can be made to the ipfs config on the client and/or server to reduce these response times?
Observation #2
When I use the same setup but with a "private swarm" the response times are almost instantaneous.
Things I've Tried
I've tried adding "IPFS-SERVER" to "IPFS-CLIENT"'s bootstrap list, with no improvement.
I've tried a "swarm connect" from "IPFS-CLIENT" to "IPFS-SERVER", with no improvement.
I suspect being part of the "public swarm" comes with this performance hit, as the DHT is larger and so takes longer to traverse? Or is there some other mechanism at play here? Thank you!
First thing up is that you're measuring IPNS response times, not IPFS response times. There are some tradeoffs around the mutability property of IPNS that cause it to be slower than immutable IPFS.
I suspect being part of the "public swarm" comes with this performance hit, as the DHT is larger and so takes longer to traverse? Or is there some other mechanism at play here?
Yes, the reason the public swarm search takes longer is the DHT's performance. As of go-ipfs v0.5.0 the DHT algorithm is much more performant; however, the properties of the DHT depend on its members, and many of them are still pre-v0.5.0. As more people upgrade (or if there is some version bump to the DHT protocol that effectively forks away from old nodes), things should improve.
Are these the typical or expected response times for these commands?
Your measurements seem on the high end (I average about 30 seconds for IPNS publish/resolve, and 2 minutes in the outlier cases), but I'm not surprised by them. Note: ipfs cat /ipfs/Hash should be much faster than ipfs cat /ipns/Hash (unless you are running IPNS over PubSub and the publisher of /ipns/Hash has the data it references, e.g. /ipfs/Hash).
Are there tweaks that can be made to the ipfs config on the client and/or server to reduce these response times?
If you enable IPNS over PubSub (--enable-namesys-pubsub) on both the SERVER and CLIENT, your search times should DRASTICALLY improve. As a bonus, IPNS over PubSub (as of go-ipfs v0.5.0) will get even faster if you happen to already be connected to someone else who has the IPNS record (e.g. the publisher, or another IPNS over PubSub subscriber who has previously fetched that record).
If you don't want to enable IPNS over PubSub, you can also modify the settings for ipfs name resolve, such as setting --dht-record-count to a low number (e.g. 1 if you're not so picky about finding the latest version, or if the data updates infrequently) or setting --stream if you're OK with getting the latest records as you discover them.
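Concretely, the flags above are passed like this (the /ipns/ name is a placeholder for your own key's peer ID):

ipfs daemon --enable-namesys-pubsub

ipfs name resolve --dht-record-count=1 /ipns/QmYourPeerID
ipfs name resolve --stream /ipns/QmYourPeerID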

Cloud Run Qs :: max-instances + concurrency + threads (gunicorn thread)

(I'm learning Cloud Run and acknowledge this is not development or code related, but I'm hoping some GCP engineer can clarify this.)
I have a Python application running gunicorn + Flask... just a PoC for now, which is why the configuration is minimal.
My gcloud run deploy command has the following flags:
--max-instances 1
--concurrency 5
--memory 128Mi
--platform managed
The gunicorn_cfg.py file has the following configuration:
workers=1
worker_class="gthread"
threads=3
I'd like to know:
1) max-instances :: if I were to adjust this, does that mean a new physical server machine is provisioned whenever needed? Or does the service achieve that by pulling the container image and simply starting a new container instance (docker run ...) on the same physical machine, effectively sharing that machine with other container instances?
2) concurrency :: does one running container instance receive multiple concurrent requests (e.g. 5 concurrent requests processed by 3 running container instances)? Or does each concurrent request trigger the start of a new container instance (docker run ...)?
3) lastly, can I effectively reach concurrency > 5 by adjusting the gunicorn thread settings? E.g. 5x3=15 in this case, with 15 concurrent requests being served by 3 running container instances? If so, are there pros/cons to adjusting threads vs. adjusting Cloud Run concurrency?
additional info:
- It's an IO-intensive application (not CPU-intensive), simply grabbing the HTTP request and publishing to Pub/Sub.
thanks a lot
First of all, it's not appropriate on Stack Overflow to ask "cocktail questions" where you ask 5 things at a time. Please limit yourself to 1 question at a time in the future.
You're not supposed to worry about where containers run (physical machines, VMs, ...). --max-instances limits the number of container instances that your app is allowed to scale to. This is to prevent ending up with a huge bill if someone maliciously sends too many requests to your app.
This is documented at https://cloud.google.com/run/docs/about-concurrency. If you specify --concurrency=10, your container can be routed to have at most 10 in-flight requests at a time. So make sure your app can handle 10 requests at a time.
Yes, read the Gunicorn documentation. Test locally whether your settings let gunicorn handle 5 requests at the same time... Cloud Run's --concurrency setting just ensures you don't get more than 5 requests to 1 container instance at any given moment.
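To make the numbers line up, consider the posted config (a sketch; treating workers x threads as the per-instance capacity is the assumption here):

# gunicorn_cfg.py
# A gthread worker can serve workers * threads requests at a time.
# With workers=1, threads=3 one instance handles 1 * 3 = 3 concurrent
# requests, yet --concurrency 5 may route 5 to it, so 2 of them queue
# inside the container. Matching threads to --concurrency avoids that:
workers=1
worker_class="gthread"
threads=5   # >= the Cloud Run --concurrency value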
I also recommend reading the official docs more thoroughly before asking, and perhaps also the cloud-run-faq, which pretty much answers all of these.

Node.js WebSocket High Memory Usage

We currently have a production Node.js application that has been underperforming for a while. The application is a live bidding platform that also runs timed auctions. The system running live sales is perfect and works as required. The problem appears while running our timed sales, where each lot in a sale has a timer, the lots finish incrementally, and a bid within the last set time extends the timer by X seconds.
The issue shows up during the period when a timed sale is finishing (which can go on for hours), with 60 seconds between lots and extensions whenever users bid in the last 10 seconds. We connected via the devtools and I have taken heap memory exports to see what is going on, but all indications point to stream writable and the buffers. So my question is: what am I doing wrong? See below a screenshot of a heap memory export:
As you can see from the above, there is a lot of memory being used; at this point it was using 1473MB of physical RAM. We saw this rise very quickly (within 30 mins), and each increment seemed bigger than the last. When it hit 3.5GB it was growing at around 120MB each second; as it got higher, around 5GB, it was growing at 500MB per second. It reached around 6GB and then the worker crashed (it has a max heap size of 8GB), and we were a process down.
So let me tell you about the platform. It is, as I said, a bidding platform. It uses Node (v11.3.0) and is clustered using the built-in cluster library. It spawns 4 workers plus the main process (so 5 altogether). The system accepts bids, checks other bids, calculates who is winning, and pushes updates to the connected clients via Redis PUB/SUB, which each worker then broadcasts to its connected users.
All data is stored within Redis, and MySQL is used to refresh data into Redis, as Redis has performed 10x faster than MySQL did.
The way this works: on connection, a small session is created against the connection; this is then used to authenticate the user (via a message sent from the client). All message events are sent to a handler, which routes each to the correct command; these commands are all async functions and run asynchronously.
This is no issue at small scale, but once we had over 250 connections we saw the behaviour above, and we are unsure where to find a fix. We noticed when opening the top object that it was connected to buffer.js and stream_writable.js as well. I can also see that all references connect to system / JSArrayBufferData and refer back to these; there are lots of objects, and we have been unable to fix this issue.
We think one of the following:
We log to file in append mode, writing lots of information both to the console and to a file via fs.writeFile with the append flag. We did some research and saw that writing to the console can be a cause of this kind of behaviour (see the sketch after this list for a cheaper logging pattern).
It is the get-lots function, which outputs all the lots for the page (currently set to 50) every time an item finishes; so when a timer ends, it requests a full page load of all the items on that page instead of adding only the new lots.
There is something else happening here that we are unaware of; maybe an external library we are using is not releasing a reference.
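On the first hypothesis, a pattern worth trying (a sketch, not our actual code; the path and helper name are made up): open the log file once with a single append stream rather than calling fs.writeFile per line, and respect backpressure. Unacknowledged write() calls are buffered inside the stream, and those buffered chunks are exactly the kind of stream_writable/Buffer objects that pile up in a heap snapshot.

const fs = require('fs');

// One long-lived append stream instead of an fs.writeFile call per log line.
const logStream = fs.createWriteStream('./app.log', { flags: 'a' });

function log(line) {
  // write() returns false when the internal buffer is full; keep writing
  // anyway and the buffered chunks grow without bound.
  const ok = logStream.write(line + '\n');
  if (!ok) {
    logStream.once('drain', function () {
      // safe to resume heavy logging here
    });
  }
}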
I have listed the libraries of interest that we require here:
"bluebird": "^3.5.1", (For promisifying the redis library)
"colors": "^1.2.5", (Used on every console.log (we call logs for everything that happens this can be around 50 every few seconds.)
"nodejs-websocket": "^1.7.1", (Our websocket library)
"redis": "^2.8.0", (Our redis client)
Anyway, if there is anything painstakingly obvious, I would love to hear it, as everything I have followed online and in other Stack Overflow questions does not relate closely enough to the issue we are facing.

Run Node.js on a multi-core cluster cloud

Is there a service, framework, or any other way that would allow me to run Node.js for heavy computations while letting me choose the number of cores?
I'll be more specific: let's say I want to run some expensive computation for each of my users and I have 20000 users.
So I want to run the expensive computation for each user on a separate thread/core/computer, so I can finish the computation for all users faster.
But I don't want to deal with low level server configuration, all I'm looking for is something similar to AWS Lambda but for high performance computing, i.e., letting me scale as I please (maybe I want 1000 cores).
I did simulate this with AWS Lambda by having a "master" lambda that receives the data for all 20000 users and then calls a "computation" lambda for each user. The problem is, with AWS Lambda I can't make 20000 requests and wait for their callbacks at the same time (I get a request limit exceeded error).
With some setup I could use Amazon HPC, Google Compute Engine or Azure, but they only go up to 64 cores, so if I need more than that I'd still have to set up all the machines I need separately and orchestrate the communication between them with something like Open MPI, handling the different low-level setups for master and compute instances (accessing via ssh, etc.).
So is there any service where I can just paste my Node.js code, maybe choose the number of cores, and run it (without having to care about the OS or how many computers are in my cluster)?
I'm looking for something that can take that code:
var users = [...];

function expensiveCalculation(user) {
  // ...
  return ...;
}

users.forEach(function(user) {
  Thread.create(function() {
    save(user.id, expensiveCalculation(user));
  });
});
And run each thread on a separate core so they can run simultaneously (therefore finishing faster).
I think your problem is that you feel the need to process all 20000 inputs at once on the same machine. Have you looked into SQS from Amazon? Maybe you push those 20000 inputs into SQS and then have a cluster of servers pull from that queue and process each one individually.
With this approach you could add as many servers or processes, or as many AWS Lambda invocations, as you want. You could even use a combination of the 3 to see what's cheaper or faster. Adding resources only reduces the time it takes to complete the computations, and you wouldn't have to wait for 20000 requests to complete. The process could tell you when it completes a computation by sending a notification afterwards.
So basically, you could have a simple application that just grabbed 10 of these inputs at a time and ran your computation on them. After it finishes, it would delete them from SQS and send a notification somewhere (maybe SNS?) to tell the user or some other system that they are done. Then it would repeat the process.
After that you could scale the process horizontally, and you wouldn't need a supercomputer to process this. You could either have a cluster of EC2 instances each running several of these applications, or have a Lambda function invoked periodically to pull items out of SQS and process them (a sketch of such a worker follows).
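A minimal worker loop along those lines might look like this (a sketch using the Node aws-sdk; the queue URL and topic ARN are placeholders, and expensiveCalculation/save are the functions from the question):

const AWS = require('aws-sdk');

const sqs = new AWS.SQS({ region: 'us-east-1' });
const sns = new AWS.SNS({ region: 'us-east-1' });
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/users'; // placeholder
const TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:computations-done';   // placeholder

function poll() {
  sqs.receiveMessage({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 10, // grab 10 inputs at a time
    WaitTimeSeconds: 20      // long polling
  }, function (err, data) {
    if (err || !data.Messages) return setTimeout(poll, 1000);
    data.Messages.forEach(function (msg) {
      const user = JSON.parse(msg.Body);
      save(user.id, expensiveCalculation(user));
      // remove the processed input from the queue
      sqs.deleteMessage({ QueueUrl: QUEUE_URL, ReceiptHandle: msg.ReceiptHandle }, function () {});
    });
    // notify that this batch is done
    sns.publish({ TopicArn: TOPIC_ARN, Message: 'batch of ' + data.Messages.length + ' done' }, function () {});
    poll();
  });
}

poll();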
EDIT:
To get started using an EC2 instance, I would look at the docs here. To start with, I would pick the smallest, cheapest instance (t2.micro, I think) and leave everything at its default. There's no need to open any port other than the one for SSH.
Once it's set up and you log in, the first thing you need to do is run aws configure to set up your profile so that you can access AWS resources from the instance. After that, install Node and get your application on there using git or something. Once that's done, go to the EC2 console, and in your Actions menu there will be an option to create an image from the instance.
Once you create an image, you can go to Auto Scaling groups and create a launch configuration using that AMI. Then it'll let you specify how many instances you want to run (the CLI equivalent is sketched below).
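Roughly, the same steps from the CLI (all names, the AMI ID, and the availability zone are placeholders):

aws autoscaling create-launch-configuration \
  --launch-configuration-name worker-lc \
  --image-id ami-0123456789abcdef0 \
  --instance-type t2.micro

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name worker-asg \
  --launch-configuration-name worker-lc \
  --min-size 1 --max-size 10 \
  --availability-zones us-east-1a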
I feel like this could also be done more easily using their container service, but honestly I don't know how to use it yet.

Load testing bottleneck on Node.js with Google Compute Engine

I cannot figure out the cause of the bottleneck on this site: very bad response times once about 400 users are reached. The site is on Google Compute Engine, using an instance group with network load balancing. We created the project with Sails.js.
I have been doing load testing with Google Container Engine using Kubernetes, running the locust.py script.
The main results for one of the tests are:
RPS: 30
Spawn rate: 5 p/s
Total users: 1000
Avg response time: 27500 ms!! (27.5 seconds)
The response time is initially great, below one second, but when it reaches about 400 users the response time starts to jump massively.
I have tested the obvious factors that could influence that response time; results below:
Compute Engine instances
(2 x n1-standard-2, 200GB disk, 7.5GB RAM per instance):
Only about 20% CPU utilization
Outgoing network: 340k bytes/sec
Incoming network: 190k bytes/sec
Disk operations: 1 op/sec
Memory: below 10%
MySQL:
Max_used_connections: 41 (below the possible total)
Connection errors: 0
All other MySQL results also seem fine; no sign of a bottleneck there.
I tried the same test on a freshly created Sails.js project, and it did better, but still had terrible results: 5 seconds response time for about 2000 users.
What else should I test? What could be the bottleneck?
Are you doing any file reading/writing? This is a major obstacle in Node.js and will always cause some issues. Caching read files, or removing the need for such code, should be done as much as possible. In my own experience, serving files like images, CSS and JS through my Node server started causing trouble as the number of concurrent requests increased. The solution was to serve all of this through a CDN.
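For the caching suggestion, a minimal sketch (the plain http server and the file path are placeholders; a Sails/Express app would do the equivalent in its asset route):

const fs = require('fs');
const http = require('http');

// Cache the file in memory after the first read instead of hitting the
// disk on every request.
let cached = null;

function getAsset(cb) {
  if (cached) return cb(null, cached);                    // memory hit
  fs.readFile('./public/app.js', function (err, data) {   // disk, once
    if (err) return cb(err);
    cached = data;
    cb(null, cached);
  });
}

http.createServer(function (req, res) {
  getAsset(function (err, data) {
    if (err) { res.statusCode = 500; return res.end(); }
    res.setHeader('Content-Type', 'application/javascript');
    res.end(data);
  });
}).listen(3000);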
Another problem could be the MySQL driver. We had some problems with connections not being closed correctly (not using Sails.js, but I think it used the same driver at the time I encountered this), and they caused problems on the MySQL server, resulting in long delays when fetching data from the database. You should time/track the MySQL queries and make sure they aren't delayed.
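To time the queries, a wrapper like this works (a sketch assuming the common mysql driver; the pool settings and the 200 ms threshold are arbitrary):

const mysql = require('mysql');

const pool = mysql.createPool({
  host: 'localhost',        // placeholder connection settings
  user: 'app',
  password: 'secret',
  database: 'mydb',
  connectionLimit: 10
});

// Wrap pool.query to log anything slower than a threshold.
function timedQuery(sql, params, cb) {
  const start = Date.now();
  pool.query(sql, params, function (err, rows) {
    const ms = Date.now() - start;
    if (ms > 200) console.warn('slow query (' + ms + ' ms): ' + sql);
    cb(err, rows);
  });
}

timedQuery('SELECT 1', [], function (err, rows) { /* ... */ });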
Lastly, it could be some specific issue between Sails.js and Google Compute Engine. You should make sure there aren't any open issues on either about the problem you are experiencing.
