I'm a sysadmin and I managing 5 cluster of RIAK:
two of them are LXC containers on the same physical machine (3 nodes per cluster)
one of them are LXC containers located on different physical machines (6 nodes)
one of them are LXC containers located on different physical machines and XEN VMs (6 nodes)
and the last of them are VMware ESX VMs (3 nodes)
Our application works correctly on the first four clusters, but it doesn't work as we expected on the last one.
When we update a key and we retrieve this key in order to write it again, it has an old value (it doesn't have the first value that we wrote), for example:
The key has: lalala
We retrieve the key, and add lololo, so it should be lalala,lololo
We retrieve the key again, and try to add lelele, so it should be now: lalala,lololo,lelele, but when we retrieve it again, we only have: lalala,lelele
In the second write action, when we retrieve the key, we obtained a key with the old value. We set r, w, pr and rw to 3 to the REST requests, but it doesn't help.
All the configuration files are very similiar and we don't have any major differences in the disk I/O and network performance of the nodes of the clusters.
Did anyone have a similar issue?
Regards.
I resolved the issue. VMware have some issues with the clocks, they were synchronized. The servers doesn't have a direct internet connection, but they can use a HTTP proxy. I installed and configured htpdate on these servers in order to synchronize theirs clocks through HTTP headers.
Related
Let's say I have a cluster of 3 nodes for ScyllaDB in my local network (it can be AWS VPC).
I have my Java application running in the same local network.
I am concerned how to properly connect app to DB.
Do I need to specify all 3 IP addresses of DB nodes for the app?
What if over time one or several nodes die and get resurrected on other IPs? Do I have to manually reconfigure application?
How is it done properly in big real production cases with tens of DB servers, possibly in different data centers?
I would be much grateful for a code sample of how to connect Java app to multi-node cluster.
You need to specify contact points (you can use DNS names instead of IPs) - several nodes (usually 2-3), and driver will connect to one of them, and will discover the all nodes of the cluster after connection (see the driver's documentation). After connection is established, driver keeps the separate control connection opened, and via it receives the information about nodes that are going up & down, joining or leaving the cluster, etc., so it's able to keep information about cluster topology up-to-date.
If you're specifying DNS names instead of the IP addresses, then it's better to specify configuration parameter datastax-java-driver.advanced.resolve-contact-points as true (see docs), so the names will be resolved to IPs on every reconnect, instead of resolving at the start of application.
Alex Ott's answer is correct, but I wanted to add a bit more background so that it doesn't look arbitrary.
The selection of the 2 or 3 nodes to connect to is described at
https://docs.scylladb.com/kb/seed-nodes/
However, going forward, Scylla is looking to move away from differentiating between Seed and non-Seed nodes. So, in future releases, the answer will likely be different. Details on these developments at:
https://www.scylladb.com/2020/09/22/seedless-nosql-getting-rid-of-seed-nodes-in-scylla/
Answering the specific questions:
Do I need to specify all 3 IP addresses of DB nodes for the app?
No. Your app just needs one to work. But it might not be a bad idea to have a few, just in case one is down.
What if over time one or several nodes die and get resurrected on other IPs?
As long as your app doesn't stop, it maintains its own version of gossip. So it will see the new nodes being added and connect to them as it needs to.
Do I have to manually reconfigure application?
If you're specifying IP addresses, yes.
How is it done properly in big real production cases with tens of DB servers, possibly in different data centers?
By abstracting the need for a specific IP, using something like Consul. If you wanted to, you could easily build a simple restful service to expose an inventory list or even the results of nodetool status.
I have a very basic question but I have been searching the internet for days without finding what I am looking for.
I currently run one instance on AWS.
That instance has my node server and my database on it.
I would like to make use of ELB by separating the one machine that hosts both the server and the database:
One machine that is never terminated, which hosts the database
One machine that runs the basic node server, which as well is never terminated
A policy to deploy (and subsequently terminate) additional EC2 instances that run the server when traffic demands it.
First of all I would like to know if this setup makes sense.
Secondly,
I am very confused about the way this should work in practice:
Do all deployed instances run using the same volume or is a snapshot of the volume is used?
In general, how do I set such a system? Again, I searched the web and all of the tutorials and documentations are so generalized for every case that I cannot seem to figure out exactly what to do in my case.
Any tips? Links? Articles? Videos?
Thank you!
You would have an AutoScaling Group with a minimum size of 1, that is configured to use an AMI based on your NodeJS server. The AutoScaling Group would add/remove instances to the ELB as instances are created and deleted.
EBS volumes can not be attached to more than one instance at a time. If you need a shared disk volume you would need to look into the EFS service.
Yes you need to move your database onto a separate server that is not a member of the AutoScaling Group.
I have my virtual machine running Linux. I've created it via new "Resource manager". Then I added data disk to it.
Then I created new Virtual Machine. And I want it to use the same data disk attached to the first one (at least in read-only mode).
When I try to "attach existing disk" to this new machine I get this error:
Failed to attach existing disk 'DISK-NAME.vhd' to the virtual machine 'MACHINE-NAME'. Error: Failed to acquire lease while creating disk 'DISK-NAME.vhd' using blob with URI https://BLOB-URI-disk1.vhd. Blob is already in use.
How do I attach existing data disk which is in use by another machine to my current machine?
Simply, you can't.
A disk in Azure can only be attached to a single VM at a time. in order to attach it to another VM you need to disconnect it from the first.
If you need to have data shared amongst many machines, you could use Azure File shares which provides SMB 2.1 and SMB 3.0. Most modern Linux versions can connect to this quite seamlessly.
If you need block storage, i.e. sharing an actual disk, you would need to spin up a separate VM and use a protocol like iscsi (or NFS) to share that disk amongst multiple machines.
Maybe the "StorSimple" Solution from Microsoft Azure could be a way to go? I would describe it as SAN on Azure.
I have not tested it today, but it should be possible to connect several virtual machines to it, and share the files.
You can find more information in the documentation:
Azure StorSimple
Extending Michael's answer a bit, based on your comments under his answer:
If the goal is to provide data access when a VM goes down for some reason, and the data is on an attached disk, then you can detach the disk from the downed VM and reattach it to another VM (the detach breaks the lease, and the reattach creates a new lease). But be aware: This is a time-consuming process - it might take a minute or two for each operation. But you can certainly do it, and you can do it programmatically.
Regarding disk replication: Yes, Azure disks are triple-replicated (or replicated 6 times, if you enable georeplication). But logically, it's a single disk; it's replicated for durability, not for you to attach to different replicas.
Michael mentioned Azure File Service. Maybe it wasn't clear what that was but... there's no Virtual Machine involved with File Service - it's a durable-storage SMB service, with its own SLA unrelated to your VMs. You may attach to it from multiple VMs and read/write files as you would a locally-attached disk (which seems to be the problem you're trying to tackle).
Regarding replication of data across VM's: If you choose to go this route, and make physical copies yourself, it's strictly up to you how you do it - there is no "best way." But this is the type of thing database engines are built for (and you can imagine how complex they are, dealing with replication, journaling, errors, etc.).
I need to design a Clustered application which runs separate instances on 2 nodes. These nodes are both Linux VM's running on VMware. Both application instances need to access a database & a set of files.
My intention is that a shared storage disk (external to both nodes) should contain the database & files. The applications would co-ordinate (via RPC-like mechanism) to determine which instance is master & which one is slave. The master would have write-access to the shared storage disk & the slave will have read-only access.
I'm having problems determining the file system for the shared storage device, since it would need to support concurrent access across 2 nodes. Going for a proprietary clustered file system (like GFS) is not a viable alternative owing to costs. Is there any way this can be accomplished in Linux (EXT3) via other means?
Desired behavior is as follows:
Instance A writes to file foo on shared disk
Instance B can read whatever A wrote into file foo immediately.
I also tried using SCSI PGR3 but it did not work.
Q: Are both VM's co-located on the same physical host?
If so, why not use VMWare shared folders?
Otherwise, if both are co-located on the same LAN, what about good old NFS?
try using heartbeat+pacemaker, it has couple of inbuilt options to monitor cluster. Should have something to look for data too
you might look at an active/passive setup with "drbd+(heartbeat|pacemaker)" ..
drbd gives you a distributed block device over 2 nodes, where you can deploy an ext3-fs ontop ..
heartbeat|pacemaker gives you a solution to handle which node is active and passive and some monitoring/repair functions ..
if you need read access on the "passive" node too, try to configure a NAS too on the nodes, where the passive node may mount it e.g. nfs|cifs ..
handling a database like pgsq|mysql on a network attached storage might not work ..
Are you going to be writing the applications from scratch? If so, you can consider using Zookeeper for the coordination between master and slave. This will put the coordination logic purely into the application code.
GPFS Is inherently a clustered filesystem.
You setup your servers to see the same lun(s), build the GPFS filesystem on the lun(s) and mount the GPFS filesystem on the machines.
If you are familiar with NFS, it looks like NFS, but it's GPFS, A Clustered filesystem by nature.
And if one of your GPFS servers goes down, if you defined your environment correctly, no one is the wiser and things continue to run.
I'm trying to set up a Jenkins system where a certain program has to be run on a board on the network, accessed using telnet. We're talking about hundreds of such jobs here, therefore we will be setting up multiple boards. Therefore, each job has to be allocated a board, but the catch is that only one job can have a certain board at the same time, otherwise the program fails.
The solution I have right now is using a master-slave set-up where I connect to the same machine using SSH (so a master and multiple slaves on the same machine). Each of the slave nodes then has a label for the IP address the program has to telnet to. This works, scheduling wise, but it might cause issues because all nodes connect using SSH to the same machine. Connecting to the boards using SSH is not an option.
Is there any way to get the same functionality as above, but then without using SSH to connect to the same machine? So basically I want to be able to say: we have n available machines, when a job comes in give it one of those machines and pass it a label belonging to that machine (its IP address in this case); now there are n-1 machines left.
Mutual exclusion comes close, but does not allow the above functionality, and jobs waiting for a resource take up one of the executors of a node.
Thanks a lot!
I realize your problem is probably solved already years ago, but in case someone else is looking for the answer and runs into this.
You can use "Lockable resources" plugin and set the ip address as the name of the resource and use label such use test-board-ip.It is simple and easy to use.
Another possibility is to use "External resources dispatcher" plugin. It provides a bit more possibilities, but it has a bug that causes it to hang sometimes. And it seems there is no maintenance any more (last updates from 2013).
Maybe you should hava a look at the Lock and Latches Plugin. You are able to lock a resource with this plugin with only requireing the job to lock the board you want to.
https://wiki.jenkins-ci.org/display/JENKINS/Locks+and+Latches+plugin