Inject parameter into hardcoded tar command - Linux

I'm using a Linux software solution that uses the tar command to back up huge amounts of data.
The command hardcoded into the binary that calls tar is:
/bin/tar --exclude "/backup" --exclude / --ignore-failed-read -cvjf - /pbackup 2>>'/tar_err.log' | split -b 1000m - '/backup/temp/backup.tar.bz2'
There is no way to change the command, as it is hardcoded. It uses bzip2 to compress the data. I experienced a strong performance improvement (up to 60%) when using the parameter --use-compress-prog=pbzip2, which utilizes all CPU cores.
I tried to trick the software by symlinking /bin/bzip2 to the pbzip2 binary, but when monitoring the process it still uses bzip2, so I think the call is built into tar.
I know it is a tricky question but is there any way to utilize pbzip2 without changing this command that is externally called?
My system is Debian Squeeze.
Thanks very much!

Danger: ugly solution ahead; back up the binary before proceeding.
First of all, check if the hardcoded string is easily accessible: use strings on your binary, and see if it displays the string you said (probably it will be in several pieces, e.g. /bin/tar, --exclude, --ignore-failed-read, ...).
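For example, the check might look like this (the binary path is a placeholder for your software's actual location):
strings /path/to/backup-binary | grep /bin/tar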
If this succeeds, grab your hex editor of choice, open the binary and look for the hardcoded string; if it's split in several pieces, the one you need is the one containing /bin/tar; overwrite tar with some arbitrary three-letter name, e.g. fkt (fake tar; a quick Google search didn't turn up any result for /usr/bin/fkt, so we should be safe).
The program should now call your /bin/fkt instead of the regular tar.
Now, put in your /bin a script like this:
#!/bin/sh
# Pass every argument through to the real tar, adding the pbzip2 option.
exec /bin/tar --use-compress-prog=pbzip2 "$@"
Save it under the name you chose before (fkt) and set the permissions correctly (they should be 755 and owned by root). The script just takes all the parameters it gets and calls the real tar, adding the parameter you need in front of them.
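For example:
chown root:root /bin/fkt
chmod 755 /bin/fkt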
Another solution, which I suggested in the comments, is to create a chroot just for the application, rename tar to some other name (realtar, maybe?) and name the script above tar (obviously you should then change the /bin/tar inside the script to /bin/realtar).
If the program is not updated very often and the trick works on the first try, I would probably go with the first solution; setting up and maintaining chroots is not fun.

Why not move /bin/tar to (say) /bin/tar-original?
Then create a script /bin/tar that does whatever you want it to do.
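For this question, a minimal sketch of that script (same idea as the fkt wrapper above):
#!/bin/sh
# Hand all arguments to the relocated tar, injecting the pbzip2 option.
exec /bin/tar-original --use-compress-prog=pbzip2 "$@"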

Related

I am having trouble getting the ls command file back

I wanted to move /bin/ls to /root, but I typed a wrong directory:
mv /bin/ls /roo
Now I can't find the ls command file; how can I retrieve it?
First of all, why do you want to do that? Be careful with root privileges!
Unless you have an extremely good reason and know exactly what you're doing, don't move unix commands from /bin. For one thing, other OS components and libraries may depend on them and you could totally hose your system.
ls is invoked as a subprocess by various other programs to list files.
If you're sure that what you're showing here is exactly what you did to move it, recover it with:
mv /roo /bin/ls

"Spoof" File Extension In Bash

Is there a way to "spoof" the file extension of a file in bash for consumption by another program? I can think of doing some shell scripting and making lots of soft-links, but that isn't very scalable.
Let's imagine I have a program I'm trying to use that requires input files to be of a specific file extension, and it has no method of turning off this check.
You could make a fifo with the requisite extension and cat any other file type into it. So, if your crazy program needs to see files that end in .funky, you can do this:
mkfifo file.funky
cat someotherfile > file.funky &
someprogram file.funky
Create a symbolic link for each file you want to have a particular extension, then pass the name of the symlink to the command.
For example suppose you have files with names of the form *.foo and you need to refer to them with extensions of .bar:
for file in *.foo ; do
    ln -s "$file" "_$$_$file.bar"
done
I precede each symlink name with _$$_ to avoid the possibility of colliding with an existing file name (you don't want to do ln -s file.foo file.bar if file.bar already exists).
With a little more programming, your script can keep track of which symlinks it created and, if you like, clean them up after executing the command.
This assumes, as you stated in the question, that the command can't be forced to accept a different extension.
You could, without too much difficulty, create a wrapper script that replaces the command in question, creating the symlinks, invoking the command, and cleaning up after itself automatically.
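A minimal sketch of such a wrapper; someprogram and the extensions are the placeholders from this answer, and file names are assumed to contain no whitespace:
#!/bin/sh
# Give each argument a .bar alias, run the program, then clean up.
links=
for file in "$@" ; do
    ln -s "$file" "_$$_$file.bar"
    links="$links _$$_$file.bar"
done
someprogram $links   # unquoted on purpose: names contain no whitespace
rc=$?
rm -f $links         # remove only the symlinks we created
exit $rc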

How to make rsync read SRC from STDIN?

I want to dump my MySQL database and make daily backups with rsync.
First approach I came up with is something like mysqldump -ufoo -pbar baz > /var/tmp/baz.sql && rsync /var/tmp/baz.sql /backup/ && rm /var/tmp/baz.sql.
Then I started to wonder if it is possible not to use the temporary file /var/tmp/baz.sql, but instead to pipe the output of mysqldump directly to rsync.
To be more specific, what I want is quite similar to the command line we use to update the GPG key for apt in Ubuntu: gpg --export --armor CE49EC21 | sudo apt-key add -, where the receiver of the pipe supports the '-' argument indicating it will read from stdin. I suppose rsync doesn't have a similar argument, but I want to know if there is a workaround.
That's right, it doesn't work this way, because rsync is made to transfer complete file trees from A to B.
It cannot work with a pipe: rsync calculates several checksums before choosing to transfer a particular file (or parts of it), and it does so in only two iterations (ping-pong steps).
That means a file has to be read several times, which would not work with a (potentially large) SQL dump unless it were buffered somehow, and that buffering is up to the user.
Actually storing the file is probably the best workaround, especially as it is a file that only changes gradually.
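For instance, reusing the commands from the question; since the previous day's copy stays in /backup/, rsync can (at least when the destination is remote) transfer only the parts of the dump that changed:
mysqldump -ufoo -pbar baz > /var/tmp/baz.sql
rsync /var/tmp/baz.sql /backup/
rm /var/tmp/baz.sql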

Ideal way to use wget to download and install using temp directory?

I am trying to work out the proper process of installing with Wget; in this example I'll use Nginx.
# Download nginx to /tmp/ directory
wget http://nginx.org/download/nginx-1.3.6.tar.gz -r -P /tmp
# Extract nginx into /tmp/nginx directory
tar xzf nginx-1.3.6.tar.gz -C /tmp/nginx
# Configure it to be installed in opt
./configure --prefix=/opt/nginx
# Make it
make
# Make install
make install
# Clean up temp folder
rm -r /tmp/*
Is this the idealised process? Is there anything I can improve on?
First of all, you definitely seem to be reinventing the wheel: if the problem you want to solve is automated packaging / building of software on target systems, then there are myriad solutions available, in the form of various package management systems, port builders, etc.
As for your shell script, there are a couple of things you should consider fixing:
Strings like http://nginx.org/download/nginx-1.3.6.tar.gz or nginx-1.3.6.tar.gz are constants. Extract all constants into separate variables to make maintaining this script a little easier, for example:
NAME=nginx
VERSION=1.3.6
FILENAME=$NAME-$VERSION.tar.gz
URL=http://nginx.org/download/$FILENAME
TMP_DIR=/tmp
INSTALL_PREFIX=/opt
wget "$URL" -r -P "$TMP_DIR"
tar xzf "$FILENAME" -C "$TMP_DIR/nginx"
You generally can't be 100% sure that wget exists on target deployment system. If you want to maximize portability, you can try to detect popular networking utilities, such as wget, curl, fetch or even lynx, links, w3m, etc.
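A rough sketch of such detection, covering just the first two (using the variables defined above):
if command -v wget >/dev/null 2>&1; then
    wget -O "$TMP_DIR/$FILENAME" "$URL"
elif command -v curl >/dev/null 2>&1; then
    curl -L -o "$TMP_DIR/$FILENAME" "$URL"
else
    echo "error: neither wget nor curl found" >&2
    exit 1
fi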
Proper use of a temporary directory is a long separate question in itself, but generally you need to adhere to three things:
One should somehow find out the temporary directory location. Generally, it's wrong to assume that /tmp is always the temporary directory: it can be unmounted, it can be non-writable, it can be a tmpfs filesystem that is full, etc. Unfortunately, there's no portable and universal way to detect the temporary directory. The very least one should do is check the contents of $TMPDIR, so that a user can point the script to the proper temporary directory. Another possibly bright idea is a set of heuristic checks to make sure that it's possible to write to the desired location (checking at least $TMPDIR, $HOME/tmp, /tmp, /var/tmp), that there's a decent amount of space available, etc.
One should create the temporary directory in a safe manner. On Linux systems, mktemp --tmpdir -d some-unique-identifier.XXXXXXXXX is usually enough. On BSD-based systems, much more manual work is needed, as the default mktemp implementation is not particularly race-resistant.
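As a sketch, a somewhat more portable invocation passes an explicit template instead of relying on GNU's --tmpdir:
LOCAL_TMP=$(mktemp -d "${TMPDIR:-/tmp}/some-unique-identifier.XXXXXXXXX") || exit 1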
One should clean up the temporary directory after use. Cleaning should be done not only on a successful exit, but also in case of failure. This can be handled with a signal trap and a special cleanup callback, for example:
# Cleanup: remove temporary files
cleanup()
{
local rc=$?
trap - EXIT
# Generally, it's the best to remove only the files that we
# know that we have created ourselves. Removal using recursive
# rm is not really safe.
rm -f "$LOCAL_TMP/some-file-we-had-created"
[ -d "$LOCAL_TMP" ] && rmdir "$LOCAL_TMP"
exit $rc
}
trap cleanup HUP PIPE INT QUIT TERM EXIT
# Create a local temporary directory
LOCAL_TMP=$(mktemp --tmpdir -d some-unique-identifier.XXXXXXXXX)
# Use $LOCAL_TMP here
If you really want to use recursive rm, then using * to glob files is a bad practice. If your directory contains more than several thousand files, * will expand to too many arguments and overflow the shell's command-line buffer. I might even say that using any globbing without a good excuse is generally a bad practice. The rm line above should be rewritten at least as:
rm -f /tmp/nginx-1.3.6.tar.gz
rm -rf /tmp/nginx
Removing all subdirectories in /tmp (as in /tmp/*) is a very bad practice on a multi-user system, as you'll either get permission errors (you won't be able to remove other users' files) or you'll potentially heavily disrupt other people's work by removing actively used temporary files.
Some minor polishing:
POSIX-standard tar uses normal short UNIX options nowadays, i.e. tar -xvz, not tar xvz.
Modern GNU tar (and, AFAIR, BSD tar too) doesn't really need any decompression flags, such as -z, -j, or -y: it detects the archive/compression format itself, and tar -xf is sufficient to extract any .tar / .tar.gz / .tar.bz2 tarball.
That's the basic idea. You'll have to run the make install command as root (or the whole script if you want). Your rm -r /tmp/* should be rm -r /tmp/nginx because other commands might have stuff they're working on in the tmp directory.
It should also be noted that the chances that building from source like that will work with no modifications for a decently sized project are fairly low. Generally you will find you need to specify a path to a library explicitly, or some code doesn't quite compile correctly on your distribution.

How to update tar (NOT append)

I want to update an existing tar file with newer files.
At GNU, I read:
4.2.3 Updating an Archive
In the previous section, you learned how to use ‘--append’ to add a
file to an existing archive. A related operation is ‘--update’ (‘-u’).
The ‘--update’ operation updates a tar archive by comparing the date
of the specified archive members against the date of the file with the
same name. If the file has been modified more recently than the
archive member, then the newer version of the file is added to the
archive (as with ‘--append’).
However,
When I run my tar update command, the files are appended even though their modification dates are exactly the same. I want to ONLY append where modification dates of files to be tarred are newer than those already in the tar...
tar -uf ./tarfile.tar /localdirectory/ >/dev/null 2>&1
Currently, every time I update, the tar doubles in size...
The update you describe implies that the file within the archive is replaced. If the new copy is smaller than what's in the archive, it could be directly rewritten. If the new copy is larger, however, tar would have to zero the existing archive entry and append. Such updates would leave runs of '\0's or other unused bytes, so any normal computer user would want such sections removed, which would be done by "moving up" the bytes comprising the archive contents towards the start of the file (think C's memmove).
Such an in-place move operation, however, which would involve seek-read-seek-write cycles, is costly, especially in the context of tapes, the medium tar was originally designed for, i.e. devices whose seek performance is not comparable to a hard disk's. You'd wear out the tape rather quickly with such move operations. And of course, WORM devices don't support this move operation either.
If you do not want to use the -P switch, tar -u works correctly if the current directory is the parent of the one you are going to update and the path given in the tar command is not an absolute path.
For example:
Say we want to update the directory /home/blabla/Dir. We do it like this:
cd /home/blabla
tar -u -f tarfile.tar Dir
In general, the update must be made from the same place as the creation, so that the paths agree.
It is also possible:
cd /home/blabla/Dir
tar -u -f /path/to/tarfile.tar .
You may simply create (instead of update) the archive each time:
tar -cvpf tarfile.tar *
This will solve the problem of your archive doubling in size each time. But of course, it is generating the whole archive every time.
By default tar strips the leading / from member names, but it does this after deciding what needs to be updated.
Therefore if you are archiving an absolute path, you either need to cd / and use relative paths, or add the -P/--absolute-names option.
cd /
tar -uf "$OLDPWD/tarfile.tar" localdirectory/ >/dev/null 2>&1
tar -cPf tarfile.tar /localdirectory/ >/dev/null 2>&1
tar -uPf tarfile.tar /localdirectory/ >/dev/null 2>&1
However, the updated items will still be appended. A tar (tape archive) file cannot be modified except by appending.
Warning! When tar speaks about "dates" it means any date, and that includes the access time.
Should your files have been accessed in any such way (a simple ls -l is enough), then tar is right to do what it does!
You need to find another way to do what you want, probably using a sentinel file and checking whether its modification date is older than that of the files you wish to append.
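A rough sketch of that idea with GNU tar and find (the sentinel path is an example; create it once with touch before the first run):
find /localdirectory -type f -newer /var/lib/backup.sentinel -print0 | tar --null -T - -rf ./tarfile.tar
touch /var/lib/backup.sentinel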
