GlusterFS noatime alternative - glusterfs

It seems noatime has been disabled for GlusterFS in the version I'm using. Installing directly via apt-get, I'm on v3.4.2. I've tried manually switching over to 3.7 and 3.8 via the PPAs, which worked fine, but the noatime flag still isn't accepted when mounting.
I'm running GlusterFS across several PHP servers. The vast majority of my load comes from GlusterFS. I'm rarely writing files, so I'm assuming the load is coming from gluster syncing access times across the servers. PHP mode, caching, etc. aside, what can I do to reduce the insane load caused by gluster?
Running gluster volume top www_storage open shows massive open counts for config.ini files. The highest value in read is about a quarter of the highest opens, and writes are tiny.
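If it helps to pin down where the load comes from, Gluster's built-in profiler breaks the work down per file operation. A minimal sketch, assuming the volume is named www_storage as above (standard gluster CLI commands, but double-check the syntax on your version):
gluster volume profile www_storage start   # begin collecting per-operation latency statistics
gluster volume top www_storage read        # compare read counts against the opens shown above
gluster volume profile www_storage info    # dump cumulative per-brick statistics
gluster volume profile www_storage stop    # stop profiling once you have what you need
The profile output shows, per brick, how much time is spent in each operation, which makes it easier to tell whether the opens themselves or the metadata traffic behind them dominates the load.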

Related

Weird "Stale file handle, errno=116" on remote cluster after dozens of hours running

I'm now running a simulation code called CMAQ on a remote cluster. I first ran a benchmark test in serial to see the performance of the software. However, the job always runs for dozens of hours and then crashes with the following "Stale file handle, errno=116" error message:
PBS Job Id: 91487.master.cluster
Job Name: cmaq_cctm_benchmark_serial.sh
Exec host: hs012/0
An error has occurred processing your job, see below.
Post job file processing error; job 91487.master.cluster on host hs012/0
Unknown resource type REJHOST=hs012.cluster MSG=invalid home directory '/home/shangxin' specified, errno=116 (Stale file handle)
This is very strange because I never modify my home directory, and "/home/shangxin/" is definitely my permanent directory, where the code lives...
Also, in the standard output .log file, the following message is always shown when the job fails:
Bus error
100247.930u 34.292s 27:59:02.42 99.5% 0+0k 16480+0io 2pf+0w
What does this message mean specifically?
I initially thought the job was exhausting the RAM and this was an out-of-memory issue. However, when I logged into the compute node during a run and checked memory usage with the "free -m" and "htop" commands, I saw that neither RAM nor swap usage ever exceeded 10%, so memory is not the problem.
Because I used "tee" to record the run to a log file, that file can grow to tens of thousands of lines and over 1 MB in size. To test whether this standard output was overwhelming the cluster, I ran the same job again without the log file. The new job still failed with the same "Stale file handle, errno=116" error after dozens of hours, so the standard output is not the reason either.
I also tried running the job in parallel on multiple cores; it still failed with the same error after dozens of hours.
I'm confident the code itself is fine because it finishes successfully on other clusters. The administrator of this cluster is looking into the issue but hasn't found the specific cause yet.
Has anyone ever run into this weird error? What should we do to fix this problem on the cluster? Any help is appreciated!
On academic clusters, home directories are frequently mounted via NFS on each node in the cluster to give you a uniform experience across all of the nodes. If this were not the case, each node would have its own version of your home directory, and you would have to take explicit action to copy relevant files between worker nodes and/or the login node.
It sounds like the NFS mount of your home directory on the worker node probably failed while your job was running. This isn't a problem you can fix directly unless you have administrative privileges on the cluster. If you need a workaround and cannot wait for the sysadmins to address the problem, you could:
Try using a different network drive on the worker node (if one is available). On clusters I've worked on, there is often scratch space or other NFS mounts directly under the root /. You might get lucky and find an NFS mount that is more reliable than your home directory.
Have your job work in a temporary directory local to the worker node and write all of its output files and logs there; at the end of the job, copy everything back to your home directory on the login node (see the sketch after this list). This can be awkward over ssh if your keys live in your home directory, and copying keys to the temporary directory is generally a bad idea unless you restrict access to them with file permissions.
Try getting assigned to a different node of the cluster. In my experience, academic clusters often have some nodes that are flakier than others. Depending on local settings, you may be able to request certain nodes directly, or request resources that are only available on stable nodes. If you can track which nodes are unstable and you find your job assigned to one of them, you can resubmit the job and then cancel the copy running on the unstable node.
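A minimal sketch of the second suggestion, assuming a PBS/Torque-style scheduler (as the job ID above suggests) and a node-local /scratch directory; the paths and the run script name are illustrative assumptions:
#!/bin/bash
#PBS -N cmaq_cctm_benchmark_serial
#PBS -l nodes=1:ppn=1
WORKDIR=/scratch/$USER/$PBS_JOBID          # node-local directory, assumed to exist on the worker
mkdir -p "$WORKDIR" && cd "$WORKDIR"
cp -r "$PBS_O_WORKDIR"/input .             # stage input from the submit directory once
./run_cctm.sh > run.log 2>&1               # run the model, logging to local disk instead of NFS
cp -r run.log output "$PBS_O_WORKDIR"/     # copy results back to the home directory only at the end
The point is simply that all heavy I/O during the run hits the local disk, and the NFS-mounted home directory is touched only at the start and end of the job.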
The easiest solution is to work with the cluster administrators, but I understand they don't always work on your schedule.

Unable to increase disk size on file system

I'm currently trying to log in to one of my instances on Google Cloud, but I'm unable to do so. Somehow the machine escaped my attention and its hard disk filled up completely. Of course I want to free some disk space and make sure the server can restart, but I'm facing some issues.
First off, I found the guide on increasing the size of the persistent disk (https://cloud.google.com/compute/docs/disks/add-persistent-disk). I followed it and already set the disk to 50 GB, which should be fine for now.
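For reference, the resize can also be done from the command line; a minimal sketch, where the disk name and zone are placeholders:
gcloud compute disks resize my-boot-disk --size 50GB --zone europe-west1-b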
However, at the filesystem level, because the disk is full I cannot make any SSH connection. The error is simply a timeout, caused by the fact that there is absolutely no space for the SSH daemon to write its log. Without any form of connection I cannot free disk space and/or run the "resize2fs" command.
Furthermore, I have already tried different approaches:
I don't seem to be able to change the boot disk to something else.
I created a snapshot and tried to increase the disk size on the new instance I created from that snapshot, but it has the same problem (the filesystem is stuck at 15 GB).
I am not allowed to mount the disk as an additional disk in another instance.
Currently I'm pretty much out of ideas. The important data on the disk was backed up, but I'd rather have the settings working again as well. Does anyone have any clues as to where to start?
[EDIT]
Currently I'm still trying out new things. I have also tried to run shutdown and startup scripts that remove /opt/* in order to free some temporary space, but the scripts either don't run or fail with an error I cannot capture. Working nearly blind is pretty frustrating, I must say.
The next step for me would be to try and get the snapshot locally. It should be doable using the bucket but I will let you know.
[EDIT2]
Getting a snapshot locally is not an option either, or so it seems. Images from Google Cloud instances can only be created or deleted, not downloaded.
I'm now out of ideas.
So I finally found the answer. These steps were taken:
In the GUI I increased the size of the disk to 50 GB.
In the GUI I detached the drive by deleting the machine, whilst ensuring that I did not throw away the original disk.
In the GUI I created a new machine with a sufficiently big hard disk.
On the command line (important!) I attached the disk to the newly created machine (the GUI option still has a bug...).
After that I could mount the disk as a secondary disk and perform all the operations I needed.
Keep in mind: by default, Google Cloud images do NOT use logical volume management, so pvresize/lvresize/etc. are not installed, and resize2fs might not work out of the box.
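A minimal sketch of the attach-and-rescue steps from the command line, assuming the original disk is named old-boot-disk, the new machine rescue-vm, and the data sits on the first partition (all names, zones and device paths are placeholders):
gcloud compute instances attach-disk rescue-vm --disk old-boot-disk --zone europe-west1-b
sudo lsblk                         # identify the newly attached device, e.g. /dev/sdb1
sudo mkdir -p /mnt/rescue
sudo mount /dev/sdb1 /mnt/rescue   # mount the full filesystem as a secondary disk
sudo rm -rf /mnt/rescue/var/tmp/*  # free whatever space can safely be deleted
sudo growpart /dev/sdb 1           # from cloud-guest-utils (may need installing); grow the partition to the new disk size
sudo resize2fs /dev/sdb1           # then grow the ext filesystem; no LVM involved, per the note above
sudo umount /mnt/rescue
After detaching the disk again (gcloud compute instances detach-disk), it can be attached to a fresh machine as its boot disk.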

Processing speed over mounted path

I have two scenarios.
Scenario 1: Machine A contains 1000 documents organised in folders. This folder on machine A is mounted on machine B. I process the documents within these folders on machine B and store the output in the mounted path on machine B.
Scenario 2: The documents on machine A are copied directly to machine B and processed there.
Scenario 2 is much faster than Scenario 1. I guess it's because no data transfer is happening over the network between the two machines. Is there a way I can keep using a mount and still get better performance?
Did you try enabling a cache? For NFS, see: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/fscachenfs.html. CIFS should have caching enabled by default (unless you disabled it).
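A minimal sketch of enabling FS-Cache for an NFS mount, assuming a systemd-based Red Hat-style client and an export at machineA:/data (export path and mount point are placeholders):
sudo yum install cachefilesd                       # userspace daemon backing FS-Cache
sudo systemctl enable --now cachefilesd
sudo mount -t nfs -o fsc machineA:/data /mnt/data  # the 'fsc' option turns on FS-Cache for this mount
Repeated reads of the same documents can then be served from the local cache instead of crossing the network each time.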
The other option would be to use something like Windows’ offline files, which copies files and folders between client and server in the background, so you don’t need to deal with it. The only thing I’ve found for linux is OFS.
But the performance depends on the size of the files and whether you read them randomly or sequentially. For instance, when I am encoding videos, I access the file directly over the network from my NFS share, because reading it remotely takes about as much time as copying it locally would. This way no additional time is "wasted" before the encoding, since the application can encode the stream as it arrives from the network.
So for large files you might want to switch the algorithm to sequential reads. Small files, on the other hand, which copy within seconds, could also be synced between server and client using rsync, BitTorrent Sync, Dropbox or one of the hundreds of other tools, and this is actually quite commonly done.
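A minimal sketch of the sync approach, assuming the documents live under /data/docs on machine A and rsync over SSH is available (host name and paths are placeholders):
rsync -az --delete machineA:/data/docs/ /local/docs/   # pull only new or changed files before processing
rsync -az /local/results/ machineA:/data/results/      # push the output back when done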

"Unstable" NFS mount point

First of all, this is the first time I'm posting a question on StackOverflow, so please don't kill me if I've done anything wrong.
Here's my issue:
We have a few dedicated servers with a well-known French provider. With one of those servers we recently acquired 5,000 GB of backup space which can be mounted via NFS, and that's what we've done.
The issue comes when backing up big files. Every night we back up several VMs running on that host, and we know for a fact that the backups are not being done properly (the file size differs a lot from one day to the next, plus we've checked the content of the backups and there's stuff missing).
So it seems the mount point is not stable and the backups are not being done properly. It looks like there are micro network cuts, and the hypervisor therefore finishes the current backup and starts with the next one.
This is how it's mounted right now:
xxx.xxx.xxx:/export/ftpbackup/xxx.ip-11-22-33.eu/ /NFS nfs auto,timeo=5,retrans=5,actimeo=10,retry=5,bg,soft,intr,nolock,rw,_netdev,mountproto=tcp 0 0
Any advice? Is there any parameter you would change?
We need to be sure that the NFS mount point is correctly working in order to have proper backups.
Thank you so much
By specifying "soft" as an option, you're saying that it's OK for the mount to be unreliable: the kernel may return an I/O error instead of retrying the I/O to completion when things take too long. Using a hard mount (omitting the "soft" option) instructs the kernel to keep retrying rather than returning I/O errors on timeouts.
This will fix your corrupted backups, but... your backup process will hang hard until the I/Os complete. An alternative is to use much longer timeout values.
You're using TCP for the mount protocol, but not for NFS itself. If your server supports it, consider adding "tcp" to the options line.
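A hedged example of what the fstab entry might look like with those changes: "soft" replaced by "hard", "tcp" added for the NFS transport, and a much longer timeout (timeo is in tenths of a second, so 600 means 60 seconds; the exact values are illustrative):
xxx.xxx.xxx:/export/ftpbackup/xxx.ip-11-22-33.eu/ /NFS nfs auto,hard,tcp,timeo=600,retrans=5,actimeo=10,retry=5,bg,intr,nolock,rw,_netdev,mountproto=tcp 0 0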

Linux disk partitioning and Nginx

In the Linux Bible I read that it is useful to install Linux across different partitions; for example, separating /var is beneficial because an attacker who fills the hard drive (the web pages being under /var/www/) cannot stop the whole OS, and the application living under /usr (Nginx, for example) keeps running. How can we do this?
Sorry for the question; I'm new to Linux. The first time I tried to access another partition (the D: drive in Windows), it asked me to mount it first (I had made a shortcut to a document on D:, and the shortcut didn't work until I mounted the partition). So does it make sense to create five partitions (/boot, /usr, /var, /home, /tmp) to run the OS?
Do web hosting providers use the same strategy?
Even if you split the disk into partitions, an attacker can still fill the logs and make the web service unstable. Logs are mostly, or by default, located in /var/log; some distros even keep the web server's log folder under /etc/webserver/log.
There are also upload-related flaws where PHP's upload features fill up the space limit on the /tmp folder.
So this alone will not protect you at all. You must look at security from another perspective.
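For the mechanics the question asks about, a minimal sketch of what /etc/fstab could look like with separate partitions; the device names and the hardening options (nodev, nosuid, noexec) are illustrative assumptions, not a prescription:
/dev/sda1  /boot  ext4  defaults                      0 2
/dev/sda2  /      ext4  defaults                      0 1
/dev/sda3  /usr   ext4  defaults                      0 2
/dev/sda5  /var   ext4  defaults,nodev                0 2
/dev/sda6  /home  ext4  defaults,nodev,nosuid         0 2
/dev/sda7  /tmp   ext4  defaults,nodev,nosuid,noexec  0 2
/dev/sda8  none   swap  sw                            0 0
Mounting /tmp with noexec is one mitigation for the upload-style abuse mentioned above, though, as the answer says, it is no substitute for looking at security more broadly.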
