The directory in which users' backups are stored is one they can access and upload to.
If a user gets the naming scheme right and deliberately triggers an error that makes the system try to restore the last 5 or so backups, they could potentially place arbitrary files on the server by crafting an archive whose entries use paths such as ../../../../../etc/passwd, or whatever the case may be.
What checks can I perform in Bash to prevent this from happening programmatically?
The following command is what is run by root (it runs as root because I use Webmin):
tar zxf /home/$USER/site/backups/$BACKUP_FILE -C /home/$USER/site/data/
where $BACKUP_FILE is the name of the backup being restored.
Edit 1:
This is what I came up with so far. I am sure it could be improved a lot:
CONTENTS=$(tar -ztvf /home/$USER/site/backups/$BACKUP_FILE | cut -c49-200)
for FILE in $CONTENTS; do
if [[ $FILE =~ \.\. ]] || [[ $FILE =~ ^\/ ]]; then
echo "Illegal characters in contents"
exit 1
fi
done
tar zxf /home/$USER/site/backups/$BACKUP_FILE -C /home/$USER/site/data/
exit 0
I am wondering whether disallowing names that begin with / and rejecting .. will be enough. Also, is the filename always at the same column offset in the output of tar -ztvf, so that the cut above is safe?
Usually tar implementations strip a leading / and don't extract files with .., so all you need to do is check your tar manpage and don't use the -P switch.
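If you want a belt-and-braces check anyway, list the member names with -t alone (without -v you get just the names, one per line, so you don't have to guess at columns) and reject anything suspicious before extracting. A minimal sketch, assuming the same $USER and $BACKUP_FILE variables as in the question:
#!/bin/bash
ARCHIVE="/home/$USER/site/backups/$BACKUP_FILE"
# List member names only (no -v), one per line, and refuse anything
# that is absolute or contains a ".." path component.
while IFS= read -r name; do
    case "$name" in
        /*|../*|*/../*|*/..|..)
            echo "Refusing to restore: suspicious entry '$name'" >&2
            exit 1
            ;;
    esac
done < <(tar -tzf "$ARCHIVE")
tar -xzf "$ARCHIVE" -C "/home/$USER/site/data/"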
Another thing you need to protect against is symlink attacks: a user creates the file $HOME/foo/passwd, lets it be backed up, removes it, symlinks $HOME/foo to /etc, and then restores the backup. Signing the archive would not help with this, although running the restore with the user's own privileges would.
Try restoring every backup as the backup owner using su:
su "$username" -c 'tar xzvf ...'
(You might also need the -l option in some cases.)
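Applied to the command from the question, that might look something like this (a sketch; the outer double quotes expand the variables in root's shell, and -l gives the target user's own environment):
su -l "$USER" -c "tar zxf '/home/$USER/site/backups/$BACKUP_FILE' -C '/home/$USER/site/data/'"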
Make sure that you really understand why the program needs to run as root. The process you are running needs no privilege beyond read access to one directory and write access to another, so even the full user account's privileges are overkill, let alone root's. This is just asking for trouble.
I'm assuming the backups are generated by your scripts, and not by the user. (Extracting arbitrary user-created tar files is never a good idea.)
If your backups have to be stored in a directory writeable by users, I would suggest you digitally sign each backup file so that its integrity can be validated. You can then verify that a tar file is legitimate before using it for restoration.
An example using GPG:
gpg --armor --detach-sign backup_username_110217.tar.gz
That creates a signature file backup_username_110217.tar.gz.asc which can be used to verify the file using:
gpg --verify backup_username_110217.tar.gz.asc backup_username_110217.tar.gz
Note that to run that in a script, you'll need to create your keys without a passphrase. Otherwise, you'll have to store the password in your scripts as plain text which is a horrid idea.
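Tied back to the restore step from the question, the verification gate might look like this (a sketch, assuming the detached .asc signature sits next to each backup and root's keyring holds the signing key):
BACKUP="/home/$USER/site/backups/$BACKUP_FILE"
# Only restore if the detached signature checks out.
if gpg --verify "$BACKUP.asc" "$BACKUP"; then
    tar zxf "$BACKUP" -C "/home/$USER/site/data/"
else
    echo "Signature check failed for $BACKUP" >&2
    exit 1
fi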
All,
This might be a FAQ, but I can't get my search-fu to find it. Namely, I kind of want to do the "reverse" tar pipe. Usually a tar pipe is used to send a local folder to a remote location as a tarball in a single nice command:
tar zcvf - ~/MyFolder | ssh user@remote "cat > ~/backup/MyFolder.tar.gz"
(I hope I got that right. I typed it from memory.)
I'm wondering about the reverse situation. Let's say I locally have a tarball of a large directory, and what I want to do is copy it (rsync? scp?) to a remote machine where it will live as the expanded directory, i.e.:
Local: sourcecode.tar.gz ==> send to Remote and untar ==>
Remote: sourcecode/
I want to do this because the "local" disk has inode pressure so keeping a single bigger file is better than many smaller files. But the remote system is one with negligible inode pressure, and it would be preferable to keep it as an expanded directory.
Now, I can think of various ways to do this with &&-command chaining and the like, but I figure there must be a way to do this with tar-pipes and rsync or ssh/scp that I am just not seeing.
You're most of the way there:
ssh user@remote "tar -C /parent/directory -xz -f-" < sourcecode.tar.gz
Where -f- tells tar to extract from stdin, and the -C flag changes directory before untarring.
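If the target directory may not exist yet, you can create it in the same remote command (a small variation on the above; the path is only an example):
ssh user@remote "mkdir -p /parent/directory && tar -C /parent/directory -xz -f-" < sourcecode.tar.gz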
I want to dump my MySQL database and make daily backups with rsync.
First approach I came up with is something like mysqldump -ufoo -pbar baz > /var/tmp/baz.sql && rsync /var/tmp/baz.sql /backup/ && rm /var/tmp/baz.sql.
Then I started to wonder if it is possible not to use the temporary file /var/tmp/baz.sql, but instead to pipe the output of mysqldump directly to rsync.
To be more specific, what I want is quite similar to the command line we use to update the GPG key for apt in Ubuntu: gpg --export --armor CE49EC21 | sudo apt-key add -, where the receiver of the pipe supports the '-' argument indicating it will read from stdin. I suppose rsync doesn't have a similar argument, but I want to know if there is a workaround.
You're right, it doesn't work this way, because rsync is made to transfer complete files and file trees from A to B.
It cannot work with a pipe because rsync calculates several checksums before deciding whether to transfer a particular file (or parts of it), and it does so in only two iterations (ping-pong steps).
That means a file has to be read several times, which would not work with a (potentially large) SQL dump unless it were buffered somehow, and that buffering would be up to the user.
Actually storing the file is probably the best workaround, especially if the dump only changes gradually between runs.
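If you stick with the temporary-file route from the question, a slightly more careful version might look like this (a sketch; the credentials and paths are the placeholders from the question):
#!/bin/bash
set -e
DUMP=/var/tmp/baz.sql
# Dump to a temporary name first so a failed dump never clobbers the last good copy.
mysqldump -ufoo -pbar baz > "$DUMP.new"
mv "$DUMP.new" "$DUMP"
# Copy the dump into the backup location; the previous copy stays available there.
rsync -a "$DUMP" /backup/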
I am trying to work out the proper process of installing with Wget, in this example I'll use Nginx.
# Download nginx to /tmp/ directory
wget http://nginx.org/download/nginx-1.3.6.tar.gz -r -P /tmp
# Extract nginx into /tmp/nginx directory
tar xzf nginx-1.3.6.tar.gz -C /tmp/nginx
# Configure it to be installed in opt
./configure --prefix=/opt/nginx
# Make it
make
# Make install
make install
# Clean up temp folder
rm -r /tmp/*
Is this the idealised process? Is there anything I can improve on?
First of all, you definitely seem to be reinventing the wheel: if the problem you want to solve is automated packaging / building of software on target systems, then there are myriad solutions available, in the form of various package management systems, port builders, etc.
As for your shell script, there are a couple of things you should consider fixing:
Strings like http://nginx.org/download/nginx-1.3.6.tar.gz or nginx-1.3.6.tar.gz are constants. Try to extract all such constants into separate variables to make maintaining this script a little easier, for example:
NAME=nginx
VERSION=1.3.6
FILENAME=$NAME-$VERSION.tar.gz
URL=http://nginx.org/download/$FILENAME
TMP_DIR=/tmp
INSTALL_PREFIX=/opt
wget "$URL" -r -P "$TMP_DIR"
tar xzf "$FILENAME" -C "$TMP_DIR/nginx"
You generally can't be 100% sure that wget exists on target deployment system. If you want to maximize portability, you can try to detect popular networking utilities, such as wget, curl, fetch or even lynx, links, w3m, etc.
Proper practices on using a temporary directory is a long separate question, but, generally, you'll need to adhere to 3 things:
One should somehow find out the temporary directory location. Generally, it's wrong to assume that /tmp is always a temporary directory: it can be not mounted, it can be non-writable, it can be a tmpfs filesystem that is full, and so on. Unfortunately, there's no portable, universal way to detect the temporary directory. The very least one should do is check the contents of $TMPDIR so that a user can point the script at the proper temporary directory. Another reasonable idea is a set of heuristic checks to make sure that it's possible to write to the desired location (checking at least $TMPDIR, $HOME/tmp, /tmp and /var/tmp), that there is a decent amount of space available, etc.
One should create the temporary directory in a safe manner. On Linux systems, mktemp --tmpdir -d some-unique-identifier.XXXXXXXXX is usually enough. On BSD-based systems, much more manual work is needed, as the default mktemp implementation is not particularly race-resistant.
One should clean up the temporary directory after use. Cleaning should be done not only on a successful exit, but also in case of failure. This can be handled with a signal trap and a dedicated cleanup callback, for example:
# Cleanup: remove temporary files
cleanup()
{
local rc=$?
trap - EXIT
# Generally, it's the best to remove only the files that we
# know that we have created ourselves. Removal using recursive
# rm is not really safe.
rm -f "$LOCAL_TMP/some-file-we-had-created"
[ -d "$LOCAL_TMP" ] && rmdir "$LOCAL_TMP"
exit $rc
}
trap cleanup HUP PIPE INT QUIT TERM EXIT
# Create a local temporary directory
LOCAL_TMP=$(mktemp --tmpdir -d some-unique-identifier.XXXXXXXXX)
# Use $LOCAL_TMP here
If you really want to use recursive rm, then using * to glob files is a bad practice. If your directory had more than several thousand files, * would expand to too many arguments and overflow the shell's command-line buffer. I might even say that using any globbing without a good excuse is generally a bad practice. The rm line above should be rewritten at least as:
rm -f /tmp/nginx-1.3.6.tar.gz
rm -rf /tmp/nginx
Removing all subdirectories in /tmp (as in /tmp/*) is a very bad practice on a multi-user system, as you'll either get permission errors (you won't be able to remove other users' files) or you'll potentially heavily disrupt other people's work by removing actively used temporary files.
Some minor polishing:
POSIX-standard tar uses normal short UNIX options nowadays, i.e. tar -xvz, not tar xvz.
Modern GNU tar (and, AFAIR, BSD tar too) doesn't really need any of the decompression flags such as -z, -j or -y. It detects the archive/compression format itself, and tar -xf is sufficient to extract any of .tar / .tar.gz / .tar.bz2 tarballs.
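Putting several of those points together, a minimal end-to-end sketch might look like the following. The variable names and the use of wget and mktemp are the assumptions from above, and error handling is kept to the bare minimum:
#!/bin/sh
set -e

NAME=nginx
VERSION=1.3.6
FILENAME=$NAME-$VERSION.tar.gz
URL=http://nginx.org/download/$FILENAME
INSTALL_PREFIX=/opt/$NAME

# Remove the temporary directory on exit, successful or not.
cleanup() {
    rc=$?
    trap - EXIT
    cd /
    rm -f "$LOCAL_TMP/$FILENAME"
    rm -rf "$LOCAL_TMP/$NAME-$VERSION"
    [ -d "$LOCAL_TMP" ] && rmdir "$LOCAL_TMP"
    exit $rc
}
trap cleanup HUP PIPE INT QUIT TERM EXIT

LOCAL_TMP=$(mktemp -d "${TMPDIR:-/tmp}/build-$NAME.XXXXXXXX")

wget "$URL" -P "$LOCAL_TMP"
tar -xf "$LOCAL_TMP/$FILENAME" -C "$LOCAL_TMP"

cd "$LOCAL_TMP/$NAME-$VERSION"
./configure --prefix="$INSTALL_PREFIX"
make
make install    # usually has to run as root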
That's the basic idea. You'll have to run the make install command as root (or the whole script if you want). Your rm -r /tmp/* should be rm -r /tmp/nginx because other commands might have stuff they're working on in the tmp directory.
It should also be noted that the chances that building from source like that will work with no modifications for a decently sized project are fairly low. Generally you will find you need to specify a path to a library explicitly, or some code doesn't quite compile correctly on your distribution.
All of my code base is being stored in a subversion repository that I disperse amongst my load balanced Apache web servers, making it easy to check out code, run updates, and seamlessly get my code in development onto production.
One of the inconveniences that I'm sure there is an easy workaround for (other than executing a script upon every checkout) is getting the Linux permissions set (back) on files that are updated or checked out with Subversion. Our security team requires that the owner and group match what is set in httpd.conf, that all directories within the DocumentRoot receive permissions of 700, that all non-executable files (e.g. *.php, *.smarty, *.png) receive permissions of 600, and that all executable files (e.g. *.sh, *.pl, *.py) receive 700. All files must have owner and group set to apache:apache in order to be readable by the httpd service, since only the file owner is granted access by those permissions.
Every time I run an svn update or svn co, even though the files may not be newly created (i.e. svn update), I'm finding that the ownership of the files gets set to the account that is running the svn commands, and often the file permissions get set to something other than what they were originally (e.g. a .htm file is 600 before an update, but after an svn update it gets set to 755, or even 777).
What is the easiest way to bypass subversion's attempts at updating the file permissions and ownership? Is there something that can be done within the svn client, or on the Linux server to retain the original file permissions? I'm running RHEL5 (and now 6 on a few select instances).
The owner of the files will be set to the user that is running the svn command because of how svn implements the underlying update: it removes and replaces files that are updated, which causes the ownership to 'change' to the relevant user. The only way to prevent this is to actually perform the svn up as the user that the files are supposed to be owned by. If you want to ensure that they're owned by a particular user, then run the command as that user.
With regard to the permissions, svn is only obeying the umask settings of the account (probably something like 066). To ensure that the files are inaccessible to group and other accounts, issue umask 077 before performing the svn up; this ensures that the files are only accessible to the user account issuing the command.
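Combining both points, i.e. running the update as the owning user and with a restrictive umask, might look like this (a sketch; the path is only an example, and -s works around a nologin shell for the apache account):
su apache -s /bin/sh -c 'umask 077 && svn update /var/www/html'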
I'd also pay attention to the security issue of deploying the Subversion data into the web server's document root unless the .svn directories are secured.
You can store properties on a file in Subversion (see http://svnbook.red-bean.com/en/1.0/ch07s02.html). You're particularly interested in the svn:executable property, which will make sure that the executable permission is stored.
There's no general way to do this for all permissions, though. Subversion doesn't store ownership either - it assumes that, if you check something out, you own it.
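For the one permission bit Subversion does track, setting it looks like this (the file name is hypothetical):
svn propset svn:executable ON deploy.sh
svn commit -m "Mark deploy.sh as executable"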
You can solve this. Use setgid.
You have apache:apache running the server
Set group permissions on all files and directories; the server will read the files via its group
Set setgid on all directories - only on directories: setting this on files has a different function
Example ('2' is setgid):
chmod 2750
Make apache the group of all directories
What happens is
New files and directories created by any account will be owned by the apache group
New directories will inherit the setgid and thus preserve the structure without any effort
See https://en.wikipedia.org/wiki/Setuid#setuid_and_setgid_on_directories
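A sketch of applying that layout, assuming a document root of /var/www/site:
# Give everything the apache group
chgrp -R apache /var/www/site
# Directories: owner rwx, group rx, plus setgid so new content inherits the group
find /var/www/site -type d -exec chmod 2750 {} +
# Files: owner rw, group r (add execute bits separately for the few executables)
find /var/www/site -type f -exec chmod 640 {} +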
One thing you may consider doing is installing the svn binary outside your PATH and putting a replacement script (at, say, /usr/bin/svn, or whatever) in the PATH. The script would look something like this:
#!/bin/sh
# set umask, whatever else you need to do before svn commands
/opt/svn/svn "$@" # pass all arguments to the actual svn binary, stored outside the PATH
# run chmod, whatever else you need to do after svn commands
A definite downside is that you'll probably have to do some amount of parsing of the arguments passed to svn, i.e. so you can pass the same path to your chmod, not run chmod for most svn commands, etc.
There are also probably some security considerations here. I don't know what your deployment environment is like, but you should probably investigate that a bit further.
I wrote a small script that stores permissions and owner, executes your SVN command and restores permissions and owner.
It is probably not hacker-proof, but for private use it does the job.
svnupdate.sh:
#!/usr/bin/env bash
if [ $# -eq 0 ]; then
echo "Syntax: $0 <filename>"
exit
fi
IGNORENEXT=0
COMMANDS=''
FILES='';
for FILENAME in "$@"
do
if [[ $IGNORENEXT -gt 0 ]]; then
IGNORENEXT=0
else
case $FILENAME in
# global, shift argument if needed
--username|--password|--config-dir|--config-option)
IGNORENEXT=1
;;
--no-auth-cache|--non-interactive|--trust-server-cert)
;;
# update arguments, shift argument if needed
-r|--revision|--depth|--set-depth|--diff3-cmd|--changelist|--editor-cmd|--accept)
IGNORENEXT=1
;;
-N|--non-recursive|-q|--quiet|--force|--ignore-externals)
;;
*)
if [ -f "$FILENAME" ]; then
FILES="$FILES $FILENAME"
OLDPERM=$(stat -c%a "$FILENAME")
OLDOWNER=$(stat -c%U "$FILENAME")
OLDGROUP=$(stat -c%G "$FILENAME")
FILECOMMANDS="chmod $OLDPERM $FILENAME; chown $OLDOWNER:$OLDGROUP $FILENAME;"
COMMANDS="$COMMANDS $FILECOMMANDS"
echo "COMMANDS: $FILECOMMANDS"
else
echo "File not found: $FILENAME"
fi
;;
esac
fi
done
OUTPUT=$(svn update "$@")
RC=$?
echo "$OUTPUT"
if [[ ( $RC -eq 0 ) && ( $OUTPUT != Skipped* ) && ( $OUTPUT != "At revision"* ) ]]; then
bash -c "$COMMANDS"
ls -l $FILES
fi
I also had a similar problem.
I found a cool script: asvn (Archive SVN).
You can download it here:
https://svn.apache.org/repos/asf/subversion/trunk/contrib/client-side/asvn
Description:
Archive SVN (asvn) will allow the recording of file types not
normally handled by svn. Currently this includes devices,
symlinks and file ownership/permissions.
Every file and directory has a 'file:permissions' property set and
every directory has a 'dir:devices' and 'dir:symlinks' for
recording the extra information.
Run this script instead of svn with the normal svn arguments.
This blog entry (which helped me find the script) http://jon.netdork.net/2010/06/28/configuration-management-part-ii-setting-up-svn/ shows a simple usage.
I am trying to do a grep and then a sed to search for specific strings inside files, which are inside multiple tars, all inside one master tar archive. Right now, I modify the files by
First extracting the master tar archive.
Then extracting all the tars inside it.
Then doing a recursive grep and then sed to replace a specific string in files.
Finally packaging everything again into tar archives, and all the archives inside the master archive.
Pretty tedious. How do I do this automatically using shell scripting?
There isn't going to be much of an option except automating the steps you outline, for the reasons demonstrated by the caveats in the answer by Kimvais.
tar modify operations
The tar command has some options to modify existing tar files. They are, however, not appropriate for your scenario for multiple reasons, one of them being that it is the nested tarballs that need editing rather than the master tarball. So, you will have to do the work longhand.
Assumptions
Are all the archives in the master archive extracted into the current directory or into a named/created sub-directory? That is, when you run tar -tf master.tar.gz, do you see:
subdir-1.23/tarball1.tar
subdir-1.23/tarball2.tar
...
or do you see:
tarball1.tar
tarball2.tar
(Note that nested tars should not themselves be gzipped if they are to be embedded in a bigger compressed tarball.)
master_repackager
Assuming you have the subdirectory notation, then you can do:
for master in "$@"
do
tmp=$(pwd)/xyz.$$
trap "rm -fr $tmp; exit 1" 0 1 2 3 13 15
cat $master |
(
mkdir $tmp
cd $tmp
tar -xf -
cd * # There is only one directory in the newly created one!
process_tarballs *
cd ..
tar -czf - * # There is only one directory down here
) > new.$master
rm -fr $tmp
trap 0
done
If you're working in a malicious environment, use something other than xyz.$$ for the directory name. However, this sort of repackaging is usually not done in a malicious environment, and the chosen name based on process ID is sufficient to give everything a unique name. The use of tar -f - for input and output allows you to switch directories but still handle relative pathnames on the command line. There are likely other ways to handle that if you want. I also used cat to feed the input to the sub-shell so that the top-to-bottom flow is clear; technically, I could improve things by using ) > new.$master < $master at the end, but that hides some crucial information multiple lines later.
The trap commands make sure that (a) if the script is interrupted (signals HUP, INT, QUIT, PIPE or TERM), the temporary directory is removed and the exit status is 1 (not success) and (b) once the subdirectory is removed, the process can exit with a zero status.
You might need to check whether new.$master exists before overwriting it. You might need to check that the extract operation actually extracted stuff. You might need to check whether the sub-tarball processing actually worked. If the master tarball extracts into multiple sub-directories, you need to convert the 'cd *' line into some loop that iterates over the sub-directories it creates.
All these issues can be skipped if you know enough about the contents and nothing goes wrong.
process_tarballs
The second script is process_tarballs; it processes each of the tarballs on its command line in turn, extracting the file, making the substitutions, repackaging the result, etc. One advantage of using two scripts is that you can test the tarball processing separately from the bigger task of dealing with a tarball containing multiple tarballs. Again, life will be much easier if each of the sub-tarballs extracts into its own sub-directory; if any of them extracts into the current directory, make sure you create a new sub-directory for it.
for tarball in "$#"
do
# Extract $tarball into sub-directory
tar -xf $tarball
# Locate appropriate sub-directory.
(
cd $subdirectory
find . -type f -print0 | xargs -0 sed -i 's/name/alternative-name/g'
)
mv $tarball old.$tarball
tar -cf $tarball $subdirectory
rm -f old.$tarball
done
You should add traps to clean up here, too, so the script can be run in isolation from the master script above and still not leave any intermediate directories around. In the context of the outer script, you might not need to be so careful to preserve the old tarball before the new one is created (so rm -f $tarball instead of the move-and-remove), but treated in its own right, the script should be careful not to damage anything.
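A minimal sketch of such a trap, assuming old.$tarball is the only intermediate this script leaves behind:
for tarball in "$@"
do
    # If we are interrupted mid-iteration, remove the saved copy and bail out.
    trap 'rm -f "old.$tarball"; exit 1' HUP INT QUIT PIPE TERM
    # ... extract, edit and repackage as in the script above ...
    trap - HUP INT QUIT PIPE TERM
done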
Summary
What you're attempting is not trivial.
For debuggability, the job is split into two scripts that can be tested independently.
Handling the corner cases is much easier when you know what is really in the files.
You can probably sed the actual tar stream, as tar itself does not do any compression.
e.g.
zcat archive.tar.gz | sed -e 's/foo/bar/g' | gzip > archive2.tar.gz
However, beware that this will also replace foo with bar in filenames, usernames and group names, and it ONLY works if foo and bar are of equal length.