How can I improve Glusterfs performance with small files?

How can I improve Glusterfs performance with small files? - glusterfs

I am new with glusterfs.
I have two glusterfs server with one volume called vol1. And the vol1 was was mounted with client servers ( using fuse ) which end users where uploading their data. Here is my issue:- Gulsterfs performance is really poor with smaller files. for an eg:- getting 20 sec to execute "ls -l | wc -l" against 4580 files ( each files less than 10 KB ). On the other hand I didn't have any issue with bigger files size.
It will take few second to execute the same command against the files if i copied those to my root volumes.
My Glusterfs server is running on cloud and both servers has 1GB connectivity. And the version i am using is glusterfs 3.7.16.
I really appreciate if anyone can guide me to improve the gulster performance with samllerfiles

There're some new tuning options in GlusterFs 3.9:
gluster volume set glustervol1 features.cache-invalidation on
gluster volume set glustervol1 features.cache-invalidation-timeout 600
gluster volume set glustervol1 performance.stat-prefetch on
gluster volume set glustervol1 performance.cache-invalidation on
#Only for SMB access
gluster volume set glustervol1 performance.cache-samba-metadata on
gluster volume set glustervol1 performance.md-cache-timeout 600
It could improve second access to dir, but first listing still very slow.
Details: http://blog.gluster.org/2016/11/announcing-gluster-3-9/

Related

I have installed gitlab on a VM , with the disk of 300GB however its showing only 50GB?

I have created a Linux vm on FreeNAS with an HDD size of 300GB but
In system Info it's showing 50GB in rootfs
here is the screenshot
enter image description here
How can i increase the 50GB to higher?

Thanks for informing.
As the question here I will post the solution if anyone is looking it
This requires you to be able to ssh into the instance using the root user account and that no services be running as users out of /home on the target machine.
The examples are from a default installation with no customation-you NEED to know what you're working with for volumes/partitions to not horribly break things.
By default, CentOS 7 uses XFS for the file system and Logical Volume Manager (LVM), creating 3 partitions: /,/home and
Step 1 - Copy /home Contents
To backup the contents of /home, do the following:
mkdir /temp
cp -a /home /temp/
Once that is finished at your back at the prompt, you can proceed to step 2.
Step 2 - Unmount the /home directory
umount -fl /home
Step 3 - Note the size of the home LVM volume
We run the lvs command to display the attributes of the LVM volumes
lvs
Sample output:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
home cl -wi-a----- 406.94g
root cl -wi-ao---- 50.00g
swap cl -wi-ao---- 7.81g
Step 4 - Remove the home LVM volume
lvremove /dev/cl/home
Step 5 - Resize the root LVM volume
Based on the output of lvs above, I can safely extend the root LVM by 406GiB.
lvextend -L+406G /dev/cl/root
Step 6 - Resize the root partition
xfs_growfs /dev/mapper/cl-root
Step 7 - Copy the /home contents back into the /home directory
cp -a /temp/home /
Step 8 - Remove the temporary location
rm -rf /temp
Step 9 - Remove the entry from /etc/fstab
Using your preferred text editor, ensure you open /etc/fstab and remove the line for
Step 10 - Don't miss this!
Run the following command to sync systemd up with the changes.
dracut --regenerate-all --force
Credit : troyfontaine(github)
Source : https://gist.github.com/troyfontaine/87091bd6a5c68f45dd62ced3d12bc377

gcloud instance disk space

I am trying to do some computing on cloud. For this I created a computing instance and then I attached an external storage with about 10TB. But it seemed that I did something wrong and I got only 200GB available for my datalab. Any comment will be helpful
To check this I used
df -h
and
sudo lsblk
Thanks.

As I can see from lsblk command, you have the right size of your datalab-pd disk.
But you can use only 196 Gb.
I think this may be because the file system does not occupy the entire disk space.
Need to extend the file system.
As an example if you have ext3 fs need to do:
- umount /dev/sdb # Unmount your disk
- e2fsck /dev/sdb # Check file system in your disk
- resize2fs /dev/sdb
resize2fs command without any parameters will extend filesystem to all free space on disk.
More info: https://access.redhat.com/articles/1196353

Options for storing many small images for fast batch access on Google Cloud?

We have a few datasets of small images, where each image is about 100KB, and there about 50K images per dataset (around 5GB each dataset). We typically use these datasets to batch-load each image incrementally into a memory of a Google VM instance in order to perform machine learning studies. This is done several times a day.
Currently, a few of us each have our own Google Persistent Disk attached to the VM with the datasets replicated on each. This is not ideal since they are pricey, however, data access is very fast which allows us to iterate on our studies fairly rapidly. We don't share one disk because of the inconvenience of having to manage read/write settings with Google disks when sharing.
Is there an alternative Google Cloud option to handle this use case? Google Buckets are too slow since it is reading many small files.

If your main interest is having rapid I/O your best bet is using an SSD for obvious reasons. Why I don't understand is why you don't want to share one disk. You can have one SSD attached to one of your instances as R/W for loading and modifying your datasets and mounting it read-only to the instances that need to fetch the data.
I'm not sure how faster will be this solution compared to using a bucket, though. I guess you are aware that gsutil has an option for multithreading transfers, which exponentially increases the data transfer speed, specially when transfering a lot of small files? The flag is -m
-m Causes supported operations (acl ch, acl set, cp, mv, rm, rsync,
and setmeta) to run in parallel. This can significantly improve
performance if you are performing operations on a large number of
files over a reasonably fast network connection.
gsutil performs the specified operation using a combination of
multi-threading and multi-processing, using a number of threads
and processors determined by the parallel_thread_count and
parallel_process_count values set in the boto configuration
file. You might want to experiment with these values, as the
best values can vary based on a number of factors, including
network speed, number of CPUs, and available memory.
Using the -m option may make your performance worse if you
are using a slower network, such as the typical network speeds
offered by non-business home network plans. It can also make
your performance worse for cases that perform all operations
locally (e.g., gsutil rsync, where both source and destination
URLs are on the local disk), because it can "thrash" your local
disk.
If a download or upload operation using parallel transfer fails
before the entire transfer is complete (e.g. failing after 300 of
1000 files have been transferred), you will need to restart the
entire transfer.
Also, although most commands will normally fail upon encountering
an error when the -m flag is disabled, all commands will
continue to try all operations when -m is enabled with multiple
threads or processes, and the number of failed operations (if any)
will be reported at the end of the command's execution.
If you want to go with the instance with R/W SSD and multiple read only clients see below:
One option is to set up an NFS on your SSD, one instance will act as the NFS server with R/W rights and the rest will have only read permissions. I will be using Ubuntu 16.04 but the process is similar in all distros:
1 - Install the required packages on both server and clients:
Server: sudo apt install nfs-kernel-server
Client: sudo apt install nfs-common
2 - Mount the disk SSD disk on the server (after formatting it to the filesystem you want to use):
Server:
jordim#instance-5:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 50G 0 disk <--- My extra SSD disk
sda 8:0 0 10G 0 disk
└─sda1 8:1 0 10G 0 part /
jordim#instance-5:~$ sudo fdisk /dev/sdb
(I will create a single primary ext4 partition)
jordim#instance-5:~$ sudo fdisk /dev/sdb
(create partition)
jordim#instance-5:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 50G 0 disk
└─sdb1 8:17 0 50G 0 part <- Newly created partition
sda 8:0 0 10G 0 disk
└─sda1 8:1 0 10G 0 part /
jordim#instance-5:~$ sudo mkfs.ext4 /dev/sdb1
(...)
jordim#instance-5:~$ sudo mkdir /mount
jordim#instance-5:~$ sudo mount /dev/sdb1 /mount/
Make a dir for your NFS share folder:
jordim#instance-5:/mount$ sudo mkdir shared
Now configure the exports on your server. Add the folder to share and the private IPs of the clients. Also you can tweak permissions here, use "ro" for "read only" or "rw" for read-write permissions.
jordim#instance-5:/mount$ sudo vim /etc/exports
(inside the exports file, note the IP is the private IP of the client instance):
/mount/share 10.142.0.5(ro,sync,no_subtree_check)
Now start the nfs service on the server:
root#instance-5:/mount# systemctl start nfs-server
Now to create the mountpoint on the client:
jordim#instance-4:~$ sudo mkdir -p /nfs/share
And mount the folder:
jordim#instance-4:~$ sudo mount 10.142.0.6:/mount/share /nfs/share
Now let's test it:
Server:
jordim#instance-5:/mount/share$ touch test
Client:
jordim#instance-4:/nfs/share$ ls
test
Also, see the mounts:
jordim#instance-4:/nfs/share$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 1.8G 0 1.8G 0% /dev
tmpfs 370M 9.9M 360M 3% /run
/dev/sda1 9.7G 1.5G 8.2G 16% /
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
tmpfs 370M 0 370M 0% /run/user/1001
10.142.0.6:/mount/share 50G 52M 47G 1% /nfs/share
There you go, now you have only one instance with a r/w disk and as many clients as you want with read only permissions.

Make a backup of the whole server that can be restored later

I have a server with the following disk structure:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 219G 192G 17G 93% /
tmpfs 16G 0 16G 0% /lib/init/rw
udev 16G 124K 16G 1% /dev
tmpfs 16G 0 16G 0% /dev/shm
/dev/sda2 508M 38M 446M 8% /boot
/dev/sdb1 2.7T 130G 2.3T 5% /media/3TB1
I am interested in making backup of the whole server on my local machine. When the time comes I want to be able to restore a new server from my local machine backup. What procedure do you recommend?
I tried rsync, but the indexing took extremely long so I aborted it. Than I used scp, and well, it is currently working. There is lots of symbolic links that weren't transferred to the local machine, and I worry I won't be able to restore it later on.

Since your sda isn't very large and a lot of it is used anyway, I'd create a complete backup of the block device. Your sdb, however, is very large and used only to a small part. Of that I'd create a file system backup.
Boot your server with a Ubuntu live CD and become root (sudo su -).
Attach your backup medium (I assume it's mounted as /mnt/backup/ in the following).
Create a block device backup of sda: cat /dev/sda > /mnt/backup/sda
Mount your sdb (I assume it's mounted as /media/3TB1/ in the following).
Create a file system backup of sdb: rsync -av /media/3TB1/ /mnt/backup/sdb/
For restoring the backup later:
Boot your server with a Ubuntu live CD and become root (sudo su -).
Attach your backup medium (I assume it's mounted as /mnt/backup/ in the following).
Restore the block device backup of sda: cat /mnt/backup/sda > /dev/sda
Mount your sdb (I assume it's mounted as /media/3TB1/ in the following).
Restore the file system backup of sdb: rsync -av /mnt/backup/sdb/ /media/3TB1/
There are more fancy ways of doing it for sure. But this routine worked for me lots of times.

A backup of that size should take a long time to copy over the internet in any case: rsync, cp , dd ..etc, the time taken to copy the file depends on your internet speed.
In my opinion, rsync is the way to go, but if you're not willing to wait that long for the download to complete (I wouldn't either) I highly suggest backing your disk up on another remote server, unless you don't plan on restoring it later since uploading would be a pain too (especially on ADSL).
You have a few options:
Ask your data center for disk redundancy.
A cheap and highly unrecommended solution is to backup your most important data on a file sharing web service, eg. Dropbox (As far as I remember they had a shell API for many tasks including uploading files, which can be used for automatic backups).
Wait for the download to finish.
Go with #Alfe's solution, which is pretty neat in my opinion.

Data deleted from /mnt directory after stop and start EC2 instance

Data deleted from /mnt directory after stop and start EC2 instance
[root#localhost opt]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 9.7G 1.4G 7.8G 15% /
none 1.9G 0 1.9G 0% /dev/shm
/dev/xvdb 394G 199M 374G 1% /mnt
I place my data in /mnt .I stop instance yesterday .
Afetr starting the instance today ,I didnt find any data form /mnt .
I have another from /opt .
How can i recover that data from /mnt .
If /mnt is Temporary mounting point .Then how can i use these all space

On EC2, /mnt directory is mounted to ephemeral storage.
After reboot or instance stop/start, all data is lost.
Please refer to this post.

It is a common misconception that a reboot/restart will wipe the ephemeral storage - this isn't true.
You can try it yourself and see.
What will is a stop/start - that actually deprovisions your VM, and then moves it to another host machine - which will have wiped ephemeral drive(s) and then starts it up with your root ebs (at least) attached. Stop/start and reboot are often conflated - but they are very different things here.

/mnt should really just be used for ephemeral data storage that is not critical if instance needs to be restarted. This is actually well suited for things such as local on-disk cache, temporary data storage, etc. as this ephemeral storage will oftentimes perform better from an I/O standpoint than say an EBS volume mount. Just understand that you should only place non-critical data there.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How can I improve Glusterfs performance with small files? - glusterfs

Related

I have installed gitlab on a VM , with the disk of 300GB however its showing only 50GB?

gcloud instance disk space

Options for storing many small images for fast batch access on Google Cloud?

Make a backup of the whole server that can be restored later

Data deleted from /mnt directory after stop and start EC2 instance

Categories

Resources