How to make rsync read SRC from STDIN?


I want to dump my MySQL database and make daily backups with rsync.
First approach I came up with is something like mysqldump -ufoo -pbar baz > /var/tmp/baz.sql && rsync /var/tmp/baz.sql /backup/ && rm /var/tmp/baz.sql.
Then I started to wonder if it is possible not to use the temporary file /var/tmp/baz.sql, but instead to pipe the output of mysqldump directly to rsync.
To be more specific, what I want is quite similar to the command line we use in Ubuntu to add a GPG key for apt: gpg --export --armor CE49EC21 | sudo apt-key add -, where the receiver of the pipe accepts a '-' argument meaning it will read from stdin. I suppose rsync doesn't have a similar argument, but I want to know whether there is a workaround.

That's right, it doesn't work this way, because rsync is made to transfer complete file trees from A to B.
Because of the way rsync works, it cannot read from a pipe: it calculates several checksums before deciding whether to transfer a particular file (or parts of it), and it does so in only two round trips (ping-pong steps).
That means the file has to be read several times. That would not work with a (potentially large) SQL dump arriving on a pipe, because the dump would have to be buffered somewhere, and that buffering is up to the user.
Actually storing the file is probably the best workaround, especially if the file changes only gradually from one run to the next.
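A sketch of that workaround, assuming bash and an installed rsync: write the dump to a private file created by mktemp (instead of the fixed name /var/tmp/baz.sql), rsync it, and remove it whether or not a step failed. dump_and_rsync and its argument order are hypothetical, not part of rsync or mysqldump.

```shell
#!/bin/bash
# Hypothetical helper: dump_and_rsync DEST_FILE COMMAND [ARGS...]
# Runs COMMAND, captures its stdout in a private temp file, rsyncs that file
# to DEST_FILE, then removes the temp file on success and on failure.
dump_and_rsync() {
    local dest tmp rc
    dest=$1
    shift
    # mktemp creates a uniquely named file that only we can read (mode 0600),
    # which matters for a database dump.
    tmp=$(mktemp) || return 1
    if "$@" > "$tmp"; then
        rsync "$tmp" "$dest"
        rc=$?
    else
        rc=$?   # exit status of the failed dump command
    fi
    rm -f "$tmp"
    return "$rc"
}
```

For the question's case that would be dump_and_rsync /backup/baz.sql mysqldump -ufoo -pbar baz. Passing the dump command as separate arguments, rather than as one string, avoids a second round of shell parsing.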

Related

Elegantly send local tarball and untar on remote end

All,
This might be a FAQ, but I can't get my search-fu to find it. Namely, I kind of want to do the "reverse" tar pipe. Usually a tar pipe is used to send a local folder to a remote location as a tarball in a single nice command:
tar zcvf - ~/MyFolder | ssh user@remote "cat > ~/backup/MyFolder.tar.gz"
(I hope I got that right. I typed it from memory.)
I'm wondering about the reverse situation. Let's say I locally have a tarball of a large directory and what I want to do is copy it (rsync? scp?) to a remote machine where it will live as the expanded file, i.e.,:
Local: sourcecode.tar.gz ==> send to Remote and untar ==>
Remote: sourcecode/
I want to do this because the "local" disk has inode pressure so keeping a single bigger file is better than many smaller files. But the remote system is one with negligible inode pressure, and it would be preferable to keep it as an expanded directory.
Now, I can think of various ways to do this with &&-command chaining and the like, but I figure there must be a way to do this with tar-pipes and rsync or ssh/scp that I am just not seeing.

You're most of the way there:
ssh user@remote "tar -C /parent/directory -xz -f-" < sourcecode.tar.gz
Here -f- tells tar to read the archive from stdin, and the -C flag changes to /parent/directory before extracting.
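The same idea wrapped in a function that also creates the target directory on the remote side. This is a sketch assuming bash locally and a bash-compatible login shell on the remote host (printf %q quotes for bash); push_and_untar and its arguments are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: push_and_untar TARBALL HOST REMOTE_PARENT_DIR
push_and_untar() {
    local tarball=$1 host=$2 parent=$3 q
    # Quote the remote path so spaces or shell metacharacters survive the
    # remote shell's parsing.
    printf -v q '%q' "$parent"
    # tar -f - reads the archive from stdin, which ssh fills from the local file.
    ssh "$host" "mkdir -p $q && tar -C $q -xzf -" < "$tarball"
}
```

For example: push_and_untar sourcecode.tar.gz user@remote /parent/directory.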

Find a string in Perforce file without syncing

Not sure if this is possible or not, but I figured I'd ask to see if anyone knows. Is it possible to find a file containing a string in a Perforce repository? Specifically, is it possible to do so without syncing the entire repository to a local directory first? (It's quite large - I don't think I'd have room even if I deleted lots of stuff - that's what the archive servers are for anyhow.)
There are any number of tools that can search through files in a local directory (I personally use Agent Ransack, but it's just one of many), but they won't search a remote Perforce directory, unless there's some (preferably free) tool I'm not aware of that has this capability, or maybe some hidden feature within Perforce itself.

p4 grep is your friend. From the Perforce blog:
'p4 grep' allows users to use simple file searches as well as regular
expressions to search through file contents of head as well as earlier
revisions of files stored on the server. While not every single option
of a standard grep is supported, the most important options are
available. Here is the syntax of the command according to 'p4 help
grep':
p4 grep [ -a -i -n -v -A after -B before -C context -l -L -t -s -F -G ] -e pattern file[revRange]...
See also, the manual page.
Update: Note that there is a limitation on the number of files that Perforce will search in a single p4 grep command. Presumably this is to help keep the load on the server down. This manifests as an error:
Grep revision limit exceeded (over 10000).
If you have sufficient Perforce permissions, you can use p4 configure to increase the dm.grep.maxrevs setting from its default of 10K to something larger, e.g. to set it to 1 million:
p4 configure set dm.grep.maxrevs=1M
If you do not have permission to change this, you can work around it by splitting the p4 grep into multiple commands over the subdirectories. You may need to split further into sub-subdirectories, etc., depending on your depot structure.
For example, this command can be used at a bash shell to search each subdirectory of //depot/trunk one at a time. It uses the p4 dirs command to obtain the list of subdirectories from the server.
for dir in $(p4 dirs "//depot/trunk/*"); do
    p4 grep -s -i -e the_search_string "$dir/..."
done
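A variant of the loop that reads one depot path per line, so subdirectory names containing spaces survive, and quotes the wildcard so the local shell never tries to expand it. This is a sketch; p4_grep_by_subdir is a hypothetical name, and the p4 flags are the ones used above.

```shell
#!/bin/bash
# Hypothetical helper: p4_grep_by_subdir DEPOT_ROOT PATTERN
# Runs p4 grep once per immediate subdirectory of DEPOT_ROOT so that each
# call stays under the server's dm.grep.maxrevs limit.
p4_grep_by_subdir() {
    local root=$1 pattern=$2 dir
    # "$root/*" is passed to the server unexpanded; read -r keeps each
    # returned depot path intact, including embedded spaces.
    p4 dirs "$root/*" | while IFS= read -r dir; do
        p4 grep -s -i -e "$pattern" "$dir/..."
    done
}
```

Usage: p4_grep_by_subdir //depot/trunk the_search_string > output.txt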

Actually, I solved this one myself: p4 grep does indeed do the trick. You have to narrow it down carefully before it'll work properly; on our server at least, you have to get it down to fewer than 10,000 files. I also had to redirect the output to a file instead of printing it to the console, by adding > output.txt, because there's a limit of 4096 characters per line in the console and the file paths are quite long.

It's not something you can do with the standard Perforce tools. One helpful command might be p4 print, but I don't think it's really faster than syncing.

This is a big if, but if you have access to the server, you can run Agent Ransack on the Perforce depot directory. Perforce stores all versioned files on disk; only the metadata is in a database.

Ideal way to use wget to download and install using temp directory?

I am trying to work out the proper process of installing with Wget, in this example I'll use Nginx.
# Download nginx to /tmp/ directory
wget http://nginx.org/download/nginx-1.3.6.tar.gz -r -P /tmp
# Extract nginx into /tmp/nginx directory
tar xzf nginx-1.3.6.tar.gz -C /tmp/nginx
# Configure it to be installed in opt
./configure --prefix=/opt/nginx
# Make it
make
# Make install
make install
# Clean up temp folder
rm -r /tmp/*
Is this the idealised process? Is there anything I can improve on?

First of all, you seem to be reinventing the wheel: if the problem you want to solve is automated packaging or building of software on target systems, there are myriad solutions available in the form of various package management systems, port builders, etc.
As for your shell script, there are a couple of things you should consider fixing:
Things like http://nginx.org/download/nginx-1.3.6.tar.gz or nginx-1.3.6.tar.gz are constants. Try to extract all constants into separate variables and use them to make the script a little easier to maintain, for example:
NAME=nginx
VERSION=1.3.6
FILENAME=$NAME-$VERSION.tar.gz
URL=http://nginx.org/download/$FILENAME
TMP_DIR=/tmp
INSTALL_PREFIX=/opt
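mkdir -p "$TMP_DIR/$NAME"
# No -r: a recursive download would save the file under $TMP_DIR/nginx.org/download/
wget -P "$TMP_DIR" "$URL"
tar -xzf "$TMP_DIR/$FILENAME" -C "$TMP_DIR/$NAME"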
You generally can't be 100% sure that wget exists on the target system. If you want to maximize portability, you can try to detect popular networking utilities, such as wget, curl, fetch or even lynx, links, w3m, etc.
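A sketch of such detection for the three most common tools, assuming their usual flags (wget -O, curl -f -L -o, FreeBSD fetch -o); fetch_url is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: fetch_url URL OUTPUT_FILE
# Uses the first download tool found on PATH.
fetch_url() {
    local url=$1 out=$2
    if command -v wget >/dev/null 2>&1; then
        wget -O "$out" "$url"
    elif command -v curl >/dev/null 2>&1; then
        # -f: fail on HTTP errors, -L: follow redirects
        curl -fL -o "$out" "$url"
    elif command -v fetch >/dev/null 2>&1; then
        fetch -o "$out" "$url"
    else
        echo "no download tool found (tried wget, curl, fetch)" >&2
        return 1
    fi
}
```

For example: fetch_url "$URL" "$TMP_DIR/$FILENAME" || exit 1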
Proper use of a temporary directory is a long, separate question, but generally you'll need to do three things:
One should somehow find out the temporary directory location. It's generally wrong to assume that /tmp is always a usable temporary directory: it may not be mounted, it may be non-writable, it may be a tmpfs filesystem that is full, and so on. Unfortunately, there's no portable and universal way to detect the temporary directory. The very least one should do is honor $TMPDIR, so that a user can point the script at a proper temporary directory. Another good idea is a set of heuristic checks that the desired location is writable (checking at least $TMPDIR, $HOME/tmp, /tmp and /var/tmp), has a decent amount of free space, etc.
One should create the temporary directory in a safe manner. On Linux systems, mktemp --tmpdir -d some-unique-identifier.XXXXXXXXX is usually enough. On BSD-based systems, more manual work is needed, as the default mktemp implementation is not particularly race-resistant.
One should clean up the temporary directory after use, not only on a successful exit but also in case of failure. This can be done with a signal trap and a cleanup function, for example:
# Cleanup: remove temporary files
# Start empty so an early signal doesn't make cleanup act on "/..." paths.
LOCAL_TMP=
cleanup()
{
    local rc=$?
    trap - EXIT
    # Generally, it's best to remove only the files that we
    # know we have created ourselves. Removal using recursive
    # rm is not really safe.
    if [ -n "$LOCAL_TMP" ]; then
        rm -f "$LOCAL_TMP/some-file-we-had-created"
        [ -d "$LOCAL_TMP" ] && rmdir "$LOCAL_TMP"
    fi
    exit "$rc"
}
trap cleanup HUP PIPE INT QUIT TERM EXIT
# Create a local temporary directory (abort if that fails)
LOCAL_TMP=$(mktemp --tmpdir -d some-unique-identifier.XXXXXXXXX) || exit 1
# Use $LOCAL_TMP here
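The first two points, finding a usable location and creating the directory safely, can be sketched as one function. This is an assumption-laden sketch: the candidate list follows the text above, the free-space check is left out, make_work_dir and the install-script prefix are hypothetical, and it passes a full template path to mktemp -d, which both GNU and BSD mktemp accept.

```shell
#!/bin/bash
# Hypothetical helper: make_work_dir
# Prints the path of a freshly created private directory inside the first
# existing, writable candidate location.
make_work_dir() {
    local base
    for base in "${TMPDIR:-}" "$HOME/tmp" /tmp /var/tmp; do
        # Skip empty, missing or non-writable candidates.
        [ -n "$base" ] && [ -d "$base" ] && [ -w "$base" ] || continue
        mktemp -d "$base/install-script.XXXXXXXXX" && return 0
    done
    echo "no writable temporary directory found" >&2
    return 1
}
```

LOCAL_TMP=$(make_work_dir) || exit 1 could then stand in for the mktemp line in the cleanup example.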
If you really want to use recursive rm, then using * to glob files is a bad practice. If your directory contains a very large number of files, * expands to so many arguments that rm fails with "Argument list too long". I might even say that using any globbing without a good excuse is generally a bad practice. The rm line in the question should be rewritten at least as:
rm -f /tmp/nginx-1.3.6.tar.gz
rm -rf /tmp/nginx
Removing all subdirectories in /tmp (as in /tmp/*) is a very bad practice on a multi-user system, as you'll either get permission errors (you won't be able to remove other users' files) or you'll potentially heavily disrupt other people's work by removing actively used temporary files.
Some minor polishing:
Modern tar implementations accept normal short UNIX options, i.e. tar -xvz rather than the old-style tar xvz.
Modern GNU tar (and, AFAIR, BSD tar too) doesn't really need any of the decompression flags, such as -z, -j, -y, etc., when extracting. It detects the archive/compression format itself, and tar -xf is sufficient to extract any .tar, .tar.gz or .tar.bz2 tarball.
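A quick way to see the auto-detection, assuming a reasonably recent GNU tar or bsdtar: build a gzip-compressed tarball and extract it with plain -xf. extract_tarball is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: extract_tarball ARCHIVE DEST_DIR
# No -z/-j flag: tar is expected to detect the compression from the file.
extract_tarball() {
    local archive=$1 dest=$2
    mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}
```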

That's the basic idea. You'll have to run the make install command as root (or the whole script, if you want). Your rm -r /tmp/* should be rm -r /tmp/nginx, because other programs might be using files in the /tmp directory.
It should also be noted that the chances that building from source like that will work without modifications for a decently sized project are fairly low. Generally you will find that you need to specify a library path explicitly, or that some code doesn't quite compile on your distribution.

Inject parameter in hardcoded tar command

I'm using a Linux software solution that uses the tar command to back up huge amounts of data.
The command hardcoded into the binary, which calls tar, is:
/bin/tar --exclude "/backup" --exclude / --ignore-failed-read -cvjf - /pbackup 2>>'/tar_err.log' | split -b 1000m - '/backup/temp/backup.tar.bz2'
There is no way to change the command, as it is hardcoded. It uses bzip2 to compress the data. I saw a strong performance improvement (up to 60%) when using the parameter --use-compress-prog=pbzip2, which utilizes all CPU cores.
I tried to trick the software by symlinking /bin/bzip2 to the pbzip2 binary; however, when monitoring the process, it still uses bzip2, as I think this is built into tar.
I know it is a tricky question, but is there any way to use pbzip2 without changing this command, which is called externally?
My system is Debian Squeeze.
Thanks very much!

Danger: ugly solution ahead; back up the binary before proceeding
First of all, check if the hardcoded string is easily accessible: use strings on your binary, and see if it displays the string you said (probably it will be in several pieces, e.g. /bin/tar, --exclude, --ignore-failed-read, ...).
If this succeeds, grab your hex editor of choice, open the binary and look for the hardcoded string; if it's split into several pieces, the one you need is the one containing /bin/tar. Overwrite tar with some arbitrary three-letter name, e.g. fkt (fake tar; a quick Google search didn't turn up any existing fkt program, so we should be safe). The replacement must be exactly three characters so the string keeps its length and nothing else in the binary moves.
The program should now call /bin/fkt instead of the regular tar.
Now put a script like this in /bin:
#!/bin/sh
# "$@" passes every original argument through unchanged, even ones containing spaces
/bin/tar --use-compress-prog=pbzip2 "$@"
Give it the name you chose before (fkt) and set its permissions correctly (mode 755, owned by root). This script just takes all the parameters it gets and calls the real tar, adding the parameter you need in front of them.
Another solution, which I suggested in the comments, would be to create a chroot just for the application, rename tar to some other name (realtar, maybe?) and install the script above as tar (obviously, you should then change /bin/tar inside the script to /bin/realtar).
If the program isn't updated very often and the trick works on the first try, I would probably go with the first solution; setting up and maintaining chroots is not fun.

Why not move /bin/tar to (say) /bin/tar-original,
and then create a script /bin/tar that does whatever you want it to do?
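One catch with either wrapper: the hardcoded command already passes -j inside -cvjf, and GNU tar may refuse to combine that with --use-compress-program, complaining about conflicting compression options. If yours does, a bash wrapper can remove the j before calling the real binary. This is a sketch assuming the real tar was moved to /bin/tar-original as suggested above and that j only appears in short-option clusters; REAL_TAR and pbzip2_tar are hypothetical names.

```shell
#!/bin/bash
# Hypothetical wrapper logic: swap tar's built-in bzip2 (-j) for pbzip2.
REAL_TAR=${REAL_TAR:-/bin/tar-original}

pbzip2_tar() {
    local arg
    local -a args=()
    for arg in "$@"; do
        case $arg in
            --*) ;;                          # long option: pass through untouched
            -*j*)
                arg=${arg//j/}               # drop j from a cluster like -cvjf
                [ "$arg" = - ] && continue   # a lone -j leaves nothing to pass on
                ;;
        esac
        args+=("$arg")
    done
    exec "$REAL_TAR" --use-compress-program=pbzip2 "${args[@]}"
}
```

Saved as the wrapper script, its last line would be pbzip2_tar "$@".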

Preventing harmful gzip files from being extracted

The directory where users' backups of their files are stored is one they can access and upload to.
If they get the naming scheme right and deliberately cause an error that makes the system try to restore the last 5 or so backups, they could potentially put files of their choosing onto the server by using a gzip file with absolute or traversal paths such as ../../../../../etc/passwd.
What checks can I perform programmatically in Bash to prevent this from happening?
The following command is what gets run as root (it runs as root because I use Webmin):
tar zxf /home/$USER/site/backups/$BACKUP_FILE -C /home/$USER/site/data/
Where $BACKUP_FILE is the name of the backup it's attempting to restore.
edit 1:
This is what I came up with so far. I am sure this way could be improved a lot:
CONTENTS=$(tar -ztvf /home/$USER/site/backups/$BACKUP_FILE | cut -c49-200)
for FILE in $CONTENTS; do
if [[ $FILE =~ \.\. ]] || [[ $FILE =~ ^\/ ]]; then
echo "Illegal characters in contents"
exit 1
fi
done
tar zxf /home/$USER/site/backups/$BACKUP_FILE -C /home/$USER/site/data/
exit 0
I am wondering whether disallowing a leading / and any .. will be enough. Also, is it normal for the file names to start around character 50 in the output of tar -ztvf?

Usually tar implementations strip a leading / and don't extract files with .., so all you need to do is check your tar manpage and don't use the -P switch.
Another thing tar should protect you from is symlink attacks: a user creates the file $HOME/foo/passwd, gets it backed up, removes it and instead symlinks $HOME/foo to /etc, then restores the backup. Signing the archive would not help with this, although running with user privileges would.
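If you still want an explicit pre-check like the one in the question's edit, listing names only with tar -tzf avoids depending on the column layout of the verbose listing, and checking whole path components avoids rejecting harmless names such as foo..bar. This is a sketch with hypothetical helper names; it assumes member names contain no newlines, and it does not address the symlink attack just described.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if NAME is relative and has no ".." component.
is_safe_member() {
    case $1 in
        /*) return 1 ;;       # absolute path
    esac
    case /$1/ in
        */../*) return 1 ;;   # ".." as a whole path component anywhere
    esac
    return 0
}

# Hypothetical helper: lists member names (one per line) and fails on the
# first unsafe one, or if tar cannot read the archive.
check_archive() {
    local listing member
    listing=$(tar -tzf "$1") || return 1
    while IFS= read -r member; do
        if ! is_safe_member "$member"; then
            printf 'Unsafe member in %s: %s\n' "$1" "$member" >&2
            return 1
        fi
    done <<EOF
$listing
EOF
    return 0
}
```

Usage: check_archive "/home/$USER/site/backups/$BACKUP_FILE" || exit 1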

Try restoring every backup as the backup's owner using su (su -c takes the whole command as a single string):
su "$username" -c 'tar xzvf ...'
(You might also need the -l option in some cases.)
Make sure that you really understand the program's requirements before running it as root. The process you are running needs no privilege other than read access to one directory and write access to another, so even the user account's privileges are overkill, never mind root's. Running it as root is just asking for trouble.
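su -c takes the whole command as one string that the target user's shell parses again, so a crafted $BACKUP_FILE name could inject extra commands. This sketch swaps in sudo -u instead, which passes each argument to tar unchanged; it assumes sudo is installed (root can normally use it without a password), and restore_as_owner is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: restore_as_owner OWNER ARCHIVE DEST_DIR
# Extracts with the owner's privileges; no shell re-parses the file names.
restore_as_owner() {
    local owner=$1 archive=$2 dest=$3
    sudo -u "$owner" -- tar -xzf "$archive" -C "$dest"
}
```

For example: restore_as_owner "$USER" "/home/$USER/site/backups/$BACKUP_FILE" "/home/$USER/site/data/"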

I'm assuming the backups are generated by your scripts and not by the user. (Extracting arbitrary user-created tar files is never a good idea.)
If your backups have to be stored in a directory writable by users, I would suggest digitally signing each backup file so that its integrity can be validated. You can then verify that a tar file is legitimate before using it for restoration.
An example using GPG:
gpg --armor --detach-sign backup_username_110217.tar.gz
That creates a signature file backup_username_110217.tar.gz.asc which can be used to verify the file using:
gpg --verify backup_username_110217.tar.gz.asc backup_username_110217.tar.gz
Note that to run that in a script, you'll need to create your keys without a passphrase. Otherwise, you'll have to store the password in your scripts as plain text which is a horrid idea.
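Tying the two steps together, this sketch refuses to extract unless the detached signature verifies. It assumes the .asc file sits next to the archive, as in the example above; restore_verified is a hypothetical name, and --batch only stops gpg from prompting.

```shell
#!/bin/bash
# Hypothetical helper: restore_verified ARCHIVE DEST_DIR
# Extracts ARCHIVE only if ARCHIVE.asc is a valid detached signature for it.
restore_verified() {
    local archive=$1 dest=$2
    if ! gpg --batch --verify "$archive.asc" "$archive"; then
        echo "signature check failed for $archive; not restoring" >&2
        return 1
    fi
    tar -xzf "$archive" -C "$dest"
}
```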
