What's a good approach to mirroring a production environment's data for dev? We have one production server in place that mounts many SMB shares, which several scripts run against routinely.
We now have a separate server for development that we want to keep isolated for testing. How do I get sample data from all those SMB shares without copying them all? The dev server couldn't hold all that data, so I'm looking for something that could run routinely and just copy the first X files out of each directory.
The goal is to have the dev server be "safe" and not mount those same shares during testing.
For a development environment I like to have:
Known good data
Known (constructed) bad data
Random sample of live data
What I mean by "constructed" is data that I have put together in a certain way so I know exactly how it is bad.
In your case I'd keep my good and bad data in local directories and then write a little Bash script to copy sample data from the SMB shares to the local dev machine. Maybe run an `ls -t` on each of the shares so you can grab the newest files, save that output to a file, use `head` or some other utility to read the first N lines, and copy those files to your dev machine.
Pseudo code:
clear the data directory
copy known good data from a local directory
copy known bad data from a local directory
begin loop: for every SMB share
run `ls -t` and save the output to a file
run `head` or some other utility to get the first N lines (i.e. file names)
copy those files from the SMB share to the local data directory
end loop
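A minimal runnable sketch of that pseudo code, assuming the shares are already mounted locally; the directory names, share list and sample size N are placeholders for illustration:

```bash
#!/usr/bin/env bash
# Refresh the dev data directory: known data plus the newest N files from each share.
# DATA_DIR, GOOD_DIR, BAD_DIR, SHARES and N are assumptions - adjust to your setup.

DATA_DIR=/srv/devdata
GOOD_DIR=/srv/known_good
BAD_DIR=/srv/known_bad
SHARES=(/mnt/share1 /mnt/share2 /mnt/share3)
N=25

rm -rf "${DATA_DIR:?}"/*            # clear data directory
cp -r "$GOOD_DIR"/. "$DATA_DIR"/    # copy known good data
cp -r "$BAD_DIR"/.  "$DATA_DIR"/    # copy known bad data

for share in "${SHARES[@]}"; do
    dest="$DATA_DIR/$(basename "$share")"
    mkdir -p "$dest"
    # newest N entries on the share, copied only if they are regular files
    ls -t "$share" | head -n "$N" | while IFS= read -r name; do
        if [ -f "$share/$name" ]; then
            cp "$share/$name" "$dest/"
        fi
    done
done
```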
You could set up cron to execute this little script however often you want.
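For example, a crontab entry like this (script path and schedule are just an illustration) would refresh the sample every night:

```
# refresh the dev data at 02:00 every day
0 2 * * * /usr/local/bin/refresh-dev-data.sh >> /var/log/refresh-dev-data.log 2>&1
```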
Related
I have two scenarios.
Scenario 1: Machine A contains 1000 documents, stored as folders. The folder holding them on machine A is mounted on machine B. I process the documents inside these folders on machine B and store the output in the mounted path on machine B.
Scenario 2: The documents on machine A are copied directly to machine B and processed there.
Scenario 2 is much faster than Scenario 1. My guess is that this is because no data transfer is happening over the network between the two machines. Is there a way I can keep using a mount and still achieve better performance?
Did you try enabling a cache? For NFS there is FS-Cache: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/fscachenfs.html. CIFS should have caching enabled by default (unless you disabled it).
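Roughly, enabling FS-Cache for an NFS mount looks like this; the server name and paths are assumptions, and cachefilesd needs to be installed first:

```
# start the cache back end, then mount the export with the fsc option
service cachefilesd start
mount -t nfs -o fsc nfs-server:/export/docs /mnt/docs
```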
The other option would be to use something like Windows' offline files, which copies files and folders between client and server in the background, so you don't have to deal with it yourself. The only thing I've found for Linux is OFS.
But the performance depends on the size of the files and on whether you read them randomly or sequentially. For instance, when I am encoding videos I access the file directly over the network from my NFS server, because reading it over the network takes about as much time as reading and writing the file locally would. This way no additional time is "wasted" on the encoding, as the application can encode the stream as it comes in from the network.
So for large files you might want to change your algorithms to sequential reads. Small files, on the other hand, which are copied within seconds, could also be synced between server and client using rsync, BitTorrent Sync, Dropbox or one of the hundreds of other tools, and this is actually quite commonly done.
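As a minimal sketch of the rsync route (the host name and paths are hypothetical), a one-way sync of the document folders to the processing machine could look like:

```
# mirror the documents from machine A to a local working copy on machine B
rsync -a --delete machineA:/data/documents/ /local/documents/
```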
We're using PuTTY and an SSH connection to our web host. They back up our files daily onto their servers.
Since the backup files use a large amount of space, we now want to copy them to our own server daily via a cron job.
How do we have to set up the cron job?
If you know the backup file path and file name (e.g. backup_ddmmyyyy.tar.gz), you can simply scp that backup file from one server to the other.
Put this scp command inside a shell script, and configure it with the address of the other server and the location where you want to copy the file.
Since your backup files use a large amount of space, my guess is they are individually large as well, so using rsync over SSH instead of a plain scp might be a better option to compensate for network failures.
Once your script is working, you can add a cron job for it at an appropriate time, after the backups on the web host are guaranteed to be over.
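A minimal sketch, assuming the backups land in a known directory with a date-stamped name; the host name, paths and schedule are placeholders:

```bash
#!/usr/bin/env bash
# fetch-backup.sh - pull today's backup from the web host (paths are hypothetical)
TODAY=$(date +%d%m%Y)
rsync -av --partial -e ssh webhost:/backups/backup_${TODAY}.tar.gz /srv/backups/
```

And the matching crontab entry, scheduled after the host's backup window:

```
0 4 * * * /usr/local/bin/fetch-backup.sh >> /var/log/fetch-backup.log 2>&1
```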
I have a local Linux server that I'm using to back up two remote Windows 7 boxes over an IPsec VPN tunnel connection. I have the users' Documents folders shared on the remote PCs and have mounted those shares (CIFS) on my local Linux server.
I'm going to use a cron job to run rsync on my local Linux server to create backups of these folders and am currently considering the -avz args to accomplish this.
My question is this: does the -z arg do anything for me, since the mount is to a remote machine? As I understand it, -z compresses the data before sending it, which definitely makes sense if the job were being run from the remote PC, but it seems like I'm compressing data that has already been pulled through the network given my setup (which seems like it would increase the backup time by adding an unnecessary step).
What are your thoughts? Should I use -z given my setup?
Thanks!
It won't save you anything. To compress the file, rsync needs to read its contents (in blocks) and then compress them. Since reading the blocks already happens over the wire, before any compression, you save no bandwidth and gain a bit of overhead from the compression itself.
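So in this setup a plain archive copy is enough; something like the following, where the mount point and destination are placeholder paths:

```
# source is a locally mounted CIFS share, so -z only adds CPU work; leave it off
rsync -av /mnt/pc1-documents/ /backups/pc1-documents/
```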
We have a network of several machines, and we want to distribute a big directory (ca. 10 GB) to every box.
It is located on an NFS server and is mounted on all machines, so the first approach is to just use a normal cp to copy the files from the mounted directory to a local one. This is easy, but unfortunately there is no progress bar, because cp isn't really intended for network copies (or is it?).
Using scp is intended for copying across the network, but it encrypts everything and may therefore be slow.
Should one be faster, and if so, which: cp on the NFS mount or scp?
You can always use rsync: it can show you progress (with the --progress option) and is more lightweight than scp.
You can enable compression manually with -z.
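For example (the paths are placeholders), copying from the NFS mount to a local directory with a progress display:

```
# -a preserves attributes, --progress shows per-file progress; add -z only if it actually helps
rsync -a --progress /mnt/nfs/bigdir/ /local/bigdir/
```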
I have 4 computers in a network. I am curious whether anyone knows how I can make Python look for files in folders spread across the network, or whether I can create a mount point that includes folders from different computers. The reason is that I am running a script that needs to start some daemons on different computers. For instance, I have the following folders from which I need to run:
/temp on 10.18.2.25
/opt on 10.18.2.35
/var-temp on 10.18.4.12
/spam on 10.18.2.17
I am using the command os.system('exec .....') to launch them, but it only works in the current directory.
It sounds like you don't merely want to execute files stored on different machines on one machine, but on the machines they're stored on. Mounting won't help with that.
You'd need a daemon already running on the target machine that you can tell over the network to execute a file. xinetd is a common one.
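A rough sketch of such a service on one of the target machines; the service name, port and script path are hypothetical, and the script itself would start whatever daemon you need:

```
# /etc/xinetd.d/start-daemon  (hypothetical service)
service start-daemon
{
    type        = UNLISTED
    port        = 9999
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = root
    server      = /usr/local/bin/start-daemon.sh
}
```

Your controlling script can then trigger it by simply opening a TCP connection to that port on the remote machine.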