Tar a directory, but don't store full absolute paths in the archive - linux

I have the following command as part of a backup shell script:
tar -cjf site1.bz2 /var/www/site1/
When I list the contents of the archive, I get:
tar -tf site1.bz2
var/www/site1/style.css
var/www/site1/index.html
var/www/site1/page2.html
var/www/site1/page3.html
var/www/site1/images/img1.png
var/www/site1/images/img2.png
var/www/site1/subdir/index.html
But I would like to remove the /var/www/site1 part from the directory and file names within the archive, to simplify extraction and avoid a useless constant directory structure. You never know: I might need to extract the backed-up websites somewhere web data isn't stored under /var/www.
For the example above, I would like to have :
tar -tf site1.bz2
style.css
index.html
page2.html
page3.html
images/img1.png
images/img2.png
subdir/index.html
So that when I extract, files land in the current directory and I don't need to move them afterwards, and so that the sub-directory structure is preserved.
There are already many questions about tar and backups on Stack Overflow and elsewhere on the web, but most of them ask about dropping the entire sub-directory structure (flattening), or just adding or removing the initial / in the names (I don't know exactly what that changes when extracting), and nothing more.
After reading some of the solutions found here and there, as well as the manual, I tried:
tar -cjf site1.bz2 -C . /var/www/site1/
tar -cjf site1.bz2 -C / /var/www/site1/
tar -cjf site1.bz2 -C /var/www/site1/ /var/www/site1/
tar -cjf site1.bz2 --strip-components=3 /var/www/site1/
But none of them worked the way I want. Some do nothing, others no longer archive sub-directories.
This runs inside a backup shell script launched by cron, so I don't know for sure which user runs it, or what the PATH and the current directory are. Absolute paths are therefore required for everything, and I would prefer not to change the current directory, to avoid breaking something later in the script (it doesn't only back up websites, but also databases, then sends all of that over FTP, etc.).
How to achieve this?
Have I just misunderstood how the option -C works?

tar -cjf site1.tar.bz2 -C /var/www/site1 .
In the above example, tar will change to directory /var/www/site1 before doing its thing because the option -C /var/www/site1 was given.
From man tar:
OTHER OPTIONS
-C, --directory DIR
change to directory DIR
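As a quick way to check this behaviour, here is a minimal sketch that builds a throwaway tree and archives it with -C. The temporary directory and file names are made up for illustration, and gzip (-z) is used instead of bzip2 only to keep the sketch light on dependencies.

```shell
# Build a throwaway tree standing in for /var/www/site1 (illustrative names).
work=$(mktemp -d)
mkdir -p "$work/site1/images"
touch "$work/site1/index.html" "$work/site1/images/img1.png"

# -C makes tar change into the directory first; "." then means "everything in it".
tar -czf "$work/site1.tar.gz" -C "$work/site1" .

# Member names are relative to site1 and carry no absolute prefix.
listing=$(tar -tzf "$work/site1.tar.gz" | LC_ALL=C sort)
printf '%s\n' "$listing"
```

The members come out prefixed with ./ rather than with the absolute path.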

The option -C works; just for clarification I'll post 2 examples:
creation of a tarball without the full path:
full path /home/testuser/workspace/project/application.war and what we want is just project/application.war so:
tar -cvf output_filename.tar -C /home/testuser/workspace project
Note: there is a space between workspace and project; tar will replace full path with just project .
extraction of tarball with changing the target path (default to ., i.e current directory)
tar -xvf output_filename.tar -C /home/deploy/
tar will extract the tarball into the given path while preserving the paths stored at creation; in our example the file application.war will be extracted to /home/deploy/project/application.war.
/home/deploy: given on extract
project: given on creation of tarball
Note : if you want to place the created tarball in a target directory, you just add the target path before tarball name. e.g.:
tar -cvf /path/to/place/output_filename.tar -C /home/testuser/workspace project
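The create/extract pair above can be tried end to end with a small sketch like the following. The workspace, project and deploy directories are created under a temporary directory purely for illustration.

```shell
# Throwaway stand-ins for /home/testuser/workspace and /home/deploy.
work=$(mktemp -d)
mkdir -p "$work/workspace/project" "$work/deploy"
echo demo > "$work/workspace/project/application.war"

# Create: members are stored as project/..., relative to the -C directory.
tar -cf "$work/output_filename.tar" -C "$work/workspace" project

# Extract: -C picks the target directory; the stored project/ prefix is kept.
tar -xf "$work/output_filename.tar" -C "$work/deploy"

extracted=$(cd "$work/deploy" && find . -type f)
printf '%s\n' "$extracted"
```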

It seems that, up to tar v2.8.3, the -C option does not work consistently across platforms (OSes). -C is said to add the directory to the archive, but on Mac and Ubuntu the following stored the absolute path prefix inside the generated tar.gz file:
tar -czf target_path/file.tar.gz -C source_path/source_dir
Therefore the consistent and robust solution is to cd into source_path (the parent directory of source_dir) and run
tar -czf target_path/file.tar.gz source_dir
in your script. This removes the absolute path prefix from the directory structure of your generated tar.gz file.

One minor detail:
tar -cjf site1.tar.bz2 -C /var/www/site1 .
adds the files as
tar -tf site1.tar.bz2
./style.css
./index.html
./page2.html
./page3.html
./images/img1.png
./images/img2.png
./subdir/index.html
If you really want
tar -tf site1.tar.bz2
style.css
index.html
page2.html
page3.html
images/img1.png
images/img2.png
subdir/index.html
You should either
cd into the directory first
or run
tar -cjf site1.tar.bz2 -C /var/www/site1 $(ls -A /var/www/site1)
Note, though, that this breaks on file names containing spaces, because the shell word-splits the output of ls. Thanks @dragon788 and @Fonic.
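If names with spaces matter, one possible alternative to the ls -A substitution is to hand tar a NUL-separated list. This is only a sketch and assumes GNU find (for -printf) and GNU tar (for --null -T -); the directory and file names are invented for the demonstration.

```shell
# Throwaway tree with a space in a file name, a dotfile and a sub-directory.
work=$(mktemp -d)
mkdir -p "$work/site1/sub dir"
touch "$work/site1/my page.html" "$work/site1/.htaccess" "$work/site1/sub dir/index.html"

# %P prints each top-level entry relative to the start directory, NUL-terminated.
# tar reads that list from stdin (--null -T -) after changing directory with -C,
# and recurses into the listed directories itself.
find "$work/site1" -mindepth 1 -maxdepth 1 -printf '%P\0' |
  tar -czf "$work/site1.tar.gz" -C "$work/site1" --null -T -

listing=$(tar -tzf "$work/site1.tar.gz" | LC_ALL=C sort)
printf '%s\n' "$listing"
```

The members are stored without a leading ./ and the current directory of the calling script is never changed.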

The following command will create a root directory "." and put all the files from the specified directory into it.
tar -cjf site1.tar.bz2 -C /var/www/site1 .
If you want to put all files in the root of the tar file, @chinthaka is right. Just cd into the directory and do:
tar -czf target_path/file.tar.gz *
This will put all the (non-hidden) files in the cwd into the tar file as root-level entries.

Using the dot (.) leads to the creation of a folder named "." in the archive (on Ubuntu 16).
tar -cjf site1.bz2 -C /var/www/site1/ .
I dealt with this in more detail and prepared an example: a multi-line command that names each entry, plus an exclusion.
tar -cjf site1.bz2 \
-C /var/www/site1/ style.css \
-C /var/www/site1/ index.html \
-C /var/www/site1/ page2.html \
-C /var/www/site1/ page3.html \
--exclude='images/*.zip' \
-C /var/www/site1/ images/ \
-C /var/www/site1/ subdir/

If you want to archive a subdirectory and trim subdirectory path this command will be useful:
tar -cjf site1.bz2 -C /var/www/ site1

I found tar -cvf site1-$seqNumber.tar -C /var/www/ site1 to be a friendlier solution than tar -cvf site1-$seqNumber.tar -C /var/www/site1 . (notice the . in the second command), for the following reasons:
The tar file name no longer has to carry meaning, because the original folder name is kept as an archive entry.
The tar file name can therefore be used for other purposes, such as sequence numbers, periodic backups, etc.

Related

Archive all the files from source directory into a xyz.gz file and move that to target directory using UNIX shell script

Requirement: Archive files using UNIX shell script into .gz format without directory structure
I am using below command
tar -C source_dir -zcvf target_dir/xyz.gz source_dir
example:
tar -C /home/log -zcvf /home/archive/xyz.gz /home/log
here xyz.gz contains /home/log
It's creating xyz.gz file maintaining the directory structure. I want only files to be archive without directory structure.
You can try the following command:
$ cd /home/log
$ tar zcvf /home/archive/xyz.gz *
You can use the --transform option to strip leading path components from the archived file names using a sed expression:
tar -C /home/log -zcvf /home/archive/xyz.gz --transform 's_.*/__' /home/log
This however will also write an entry for each encountered directory. If you don't want that, you can use find to find only regular files and pass them to tar on stdin like this:
cd /home/log
find -type f -print0 | tar -zcvf /home/archive/xyz.gz --transform 's_.*/__' --verbatim-files-from --null -T -
Note that this may create multiple entries with the same name in the tar archive, if files with the same name exist in different subdirectories. Also you should probably use the conventional .tar.gz or .tgz extension for the compressed tar archive.
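The flattening approach can be tried on a small tree with a sketch like this one. It assumes GNU tar (for --transform and --null -T -); the log and archive directories are temporary stand-ins for /home/log and /home/archive.

```shell
# Throwaway tree: one file at the top level, one in a sub-directory.
work=$(mktemp -d)
mkdir -p "$work/log/app" "$work/archive"
touch "$work/log/system.log" "$work/log/app/app.log"

# Regular files only, NUL-separated. --transform removes everything up to the
# last slash, so each member is stored under its base name.
(cd "$work/log" && find . -type f -print0 |
  tar -czf "$work/archive/xyz.tar.gz" --transform 's_.*/__' --null -T -)

listing=$(tar -tzf "$work/archive/xyz.tar.gz" | LC_ALL=C sort)
printf '%s\n' "$listing"
```

Running the find and tar pair in a subshell keeps the cd from affecting the rest of a larger script.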

Staying in another folder, how can i tar specific files from another directory?

Thanks for your support,
I have the following folder structure on my linux laptop
/home
/A
/B
In folder "B", I have files of type *.csv, *.dat.
Now, from folder A, how can I create a tar file containing the *.csv files in folder B? I am running the command in folder A.
Here is the command I have tried, but it's not working.
In the /home/A folder, I am running the following command:
tar -cf /home/A/Sample1.tar -C /home/B/ZSBSDP4 *.csv
and also tried with this,
tar -cf /home/A/Sample1.tar -C /home/B/ZSBSDP4 --wildcards *.csv
For both of the commands, I get the following error,
tar: *.csv: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
In the tar file, I don't want to include the whole folder structure, and this is the reason I am using the -C (capital) option.
Moreover, the following command works but it tars all *.csv and *.dat files.
tar -cf /home/A/Sample1.tar -C /home/B/ZSBSDP4 .
You can edit the names in the tar command to remove the path. (Assuming that you have GNU tar.)
tar -cf /home/A/Sample1.tar --transform 's,.*/\([^/]*\),\1,' /home/B/ZSBSDP4/*.csv
Note that if you specify more source directories on the command, you could accidentally put more than one file with the same name in the tar file. Then when unpacking, the last one will overwrite those with the same name that precede it.
You can use the --exclude=PATTERN option:
tar -cf /home/A/Sample1.tar --exclude='*.dat' -C /home/B/ZSBSDP4 .
Other "local file selection" options listed in the man page: http://linux.die.net/man/1/tar
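The "Cannot stat" error in the question comes from the shell, not from tar: an unquoted *.csv is expanded in the directory the command is typed in (folder A), before -C ever takes effect. A minimal sketch of one way around that, using a subshell so the glob expands inside B, is shown below; the A and B directories here are temporary stand-ins.

```shell
# Throwaway stand-ins for /home/A and /home/B/ZSBSDP4.
work=$(mktemp -d)
mkdir -p "$work/A" "$work/B"
touch "$work/B/one.csv" "$work/B/two.csv" "$work/B/skip.dat"

cd "$work/A"    # we "are" in folder A, as in the question

# The glob has to expand inside B, so change directory in a subshell;
# the working directory of the outer shell stays A.
(cd "$work/B" && tar -cf "$work/A/Sample1.tar" -- *.csv)

listing=$(tar -tf Sample1.tar | LC_ALL=C sort)
printf '%s\n' "$listing"
```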

How do I tar a directory without retaining the directory structure?

I'm working on a backup script and want to tar up a file directory:
tar czf ~/backup.tgz /home/username/drupal/sites/default/files
This tars it up, but when I untar the resulting file, it includes the full file structure: the files are in home/username/drupal/sites/default/files.
Is there a way to exclude the parent directories, so that the resulting tar just knows about the last directory (files)?
Use the --directory option:
tar czf ~/backup.tgz --directory=/home/username/drupal/sites/default files
Hi, I have a better solution for when changing into the specified directory is not possible (Makefiles, etc.):
tar -cjvf files.tar.bz2 -C directory/contents/to/be/compressed .
Do not forget the dot (.) at the end !!
cd /home/username/drupal/sites/default/files
tar czf ~/backup.tgz *
Create a tar archive
tar czf $sourcedir/$backup_dir.tar --directory=$sourcedir WEB-INF en
Un-tar files on a local machine
tar -xvf $deploydir/med365/$backup_dir.tar -C $deploydir/med365/
Upload to a server
scp -r -i $privatekey $sourcedir/$backup_dir.tar $server:$deploydir/med365/
echo "File uploaded.. deployment folders"
Un-tar on server
ssh -i $privatekey $server tar -xvf $deploydir/med365/$backup_dir.tar -C $deploydir/med365/
To archive and gzip all .txt files from /home/myuser/workspace/zip_from/ into /home/myuser/workspace/zip_to/ without the directory structure of the source files, use the following command. Note that the shell expands *.txt in the current directory before tar runs, so run it from inside zip_from:
tar -P -cvzf /home/myuser/workspace/zip_to/mydoc.tar.gz --directory="/home/myuser/workspace/zip_from/" *.txt
If you want to tar files while keeping the structure but ignore it partially or completely when extracting, use the --strip-components argument when extracting.
In this case, where the full path is /home/username/drupal/sites/default/files, the following command would extract the tar.gz content without the full parent directory structure, keeping only the last directory of the path (e.g. files/file1).
tar -xzv --strip-components=5 -f backup.tgz
I've found this tip on https://www.baeldung.com/linux/tar-archive-without-directory-structure#5-using-the---strip-components-option.
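A scaled-down sketch of the same idea, with two leading components instead of five, might look like this. The directory layout is invented for the demonstration and lives under a temporary directory.

```shell
# Throwaway tree: src/sites/default/files/file1
work=$(mktemp -d)
mkdir -p "$work/src/sites/default/files" "$work/out"
echo data > "$work/src/sites/default/files/file1"

# The archive keeps two directories (sites/default) in front of files/.
tar -czf "$work/backup.tgz" -C "$work/src" sites/default/files

# Drop those two components on extraction, leaving files/file1.
tar -xzf "$work/backup.tgz" -C "$work/out" --strip-components=2

extracted=$(cd "$work/out" && find . -type f)
printf '%s\n' "$extracted"
```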
To build on nbt's and MaikoID's solutions:
tar -czf destination.tar.gz -C source/directory $(ls source/directory)
This solution:
Includes all files and folders in the directory
Does not include any of the directory structure (or .) in the final product
Does not require you to change directories.
However, it requires the directory to be given twice, so it may be most useful in another script. It may also be less efficient if there are a lot of files/folders in source/directory. Adjust the subcommand as necessary.
So for instance for the following structure:
|- source
| |- one
| `- two
`- working
the following command:
working$ tar -czf destination.tar.gz -C ../source $(ls ../source)
will produce destination.tar.gz where both one and two (and sub-files/-folders) are the first items.
This worked for me:
gzip -dc "<your_file>.tgz" | tar x -C <location>
For me -C or --directory did not work, I use this
cd source/directory/or/file
tar -cvzf destination/packaged-app.tgz *.jar
# this will put your current directory to what it previously was
cd -
Kindly use the below command to generate tar file without directory structure
tar -C <directoryPath> -cvzf <Path of the tar.gz file> filename1 filename2... filename N
eg:
tar -C /home/project/files -cvzf /home/project/files/test.tar.gz text1.txt text2.txt
tar -czf ~/backup.tgz -C /home/username/drupal/sites/default files
-C does the cd for you

tar: add all files and directories in current directory INCLUDING .svn and so on

I try to tar.gz a directory and use
tar -czf workspace.tar.gz *
The resulting tar includes .svn directories in subdirs but NOT in the current directory (as * gets expanded to only 'visible' files before it is passed to tar).
I tried
tar -czf workspace.tar.gz . instead, but then I get an error because '.' has changed while reading:
tar: ./workspace.tar.gz: file changed as we read it
Is there a trick so that * matches all files (including dot-prefixed) in a directory?
(using bash on Linux SLES-11 (2.6.27.19))
Don't create the tar file in the directory you are packing up:
tar -czf /tmp/workspace.tar.gz .
does the trick, except it will extract the files all over the current directory when you unpack. Better to do:
cd ..
tar -czf workspace.tar.gz workspace
or, if you don't know the name of the directory you were in:
base=$(basename "$PWD")
cd ..
tar -czf "$base.tar.gz" "$base"
(This assumes that you didn't follow symlinks to get to where you are and that the shell doesn't try to second guess you by jumping backwards through a symlink - bash is not trustworthy in this respect. If you have to worry about that, use cd -P .. to do a physical change directory. Stupid that it is not the default behaviour in my view - confusing, at least, for those for whom cd .. never had any alternative meaning.)
One comment in the discussion says:
I [...] need to exclude the top directory and I [...] need to place the tar in the base directory.
The first part of the comment does not make much sense - if the tar file contains the current directory, it won't be created when you extract file from that archive because, by definition, the current directory already exists (except in very weird circumstances).
The second part of the comment can be dealt with in one of two ways:
Either: create the file somewhere else - /tmp is one possible location - and then move it back to the original location after it is complete.
Or: if you are using GNU Tar, use the --exclude=workspace.tar.gz option. The string after the = is a pattern - the example is the simplest pattern - an exact match. You might need to specify --exclude=./workspace.tar.gz if you are working in the current directory contrary to recommendations; you might need to specify --exclude=workspace/workspace.tar.gz if you are working up one level as suggested. If you have multiple tar files to exclude, use '*', as in --exclude=./*.gz.
There are a couple of steps to take:
Replace * by . to include hidden files as well.
To create the archive in the same directory a --exclude=workspace.tar.gz can be used to exclude the archive itself.
To prevent the tar: .: file changed as we read it error when the archive is not yet created, make sure it exists (e.g. using touch), so the --exclude matches the archive filename. (It does not match if the file does not exist.)
Combined this results in the following script:
touch workspace.tar.gz
tar -czf workspace.tar.gz --exclude=workspace.tar.gz .
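The touch-then-exclude script can be exercised on a scratch directory with a sketch like the one below. It assumes GNU tar; the workspace contents are invented, and the pattern is written as ./workspace.tar.gz so that only the top-level archive is excluded.

```shell
# Throwaway workspace with a hidden .svn directory at the top level.
work=$(mktemp -d)
mkdir -p "$work/workspace/.svn"
touch "$work/workspace/.svn/entries" "$work/workspace/main.c"

(
  cd "$work/workspace" &&
  touch workspace.tar.gz &&   # archive exists before tar scans "."
  tar -czf workspace.tar.gz --exclude=./workspace.tar.gz .
)
status=$?

listing=$(tar -tzf "$work/workspace/workspace.tar.gz" | LC_ALL=C sort)
printf '%s\n' "$listing"
```

The listing should contain the hidden .svn entries but not the archive itself.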
If you really don't want to include the top directory in the tarball (and that's generally a bad idea):
tar czf workspace.tar.gz -C /path/to/workspace .
In the directory you want to compress (the current directory), try this:
tar -czf workspace.tar.gz --exclude='./*.gz' .
You can include the hidden directories by going back a directory and doing:
cd ..
tar czf workspace.tar.gz workspace
Assuming the directory you wanted to gzip was called workspace.
You can fix the . form by using --exclude:
tar -czf workspace.tar.gz --exclude=workspace.tar.gz .
Update: I added a fix for the OP's comment.
tar -czf workspace.tar.gz .
will indeed change the current directory, but why not place the file somewhere else?
tar -czf somewhereelse/workspace.tar.gz .
mv somewhereelse/workspace.tar.gz . # Update
Actually the problem is with the compression options. The trick is to pipe the tar result to a compressor instead of using the built-in options. Incidentally, that can also give you better compression, since you can set extra compression options.
Minimal tar:
tar --exclude='*.tar*' -cf workspace.tar .
Pipe to a compressor of your choice. This example is verbose and uses xz with maximum compression:
tar --exclude='*.tar*' -cv . | xz -9v >workspace.tar.xz
Solution was tested on Ubuntu 14.04 and Cygwin on Windows 7.
It's a community wiki answer, so feel free to edit if you spot a mistake.
Had a similar situation myself. I think it is best to create the tar elsewhere and then use -C to tell tar the base directory for the compressed files. Example:
tar -czf workspace.tar.gz -C <path_to_workspace> $(ls -A <path_to_workspace>)
This way there is no need to exclude your own tarfile. As noted in other comments, -A will list hidden files.
Yet another solution, assuming the number of items in the folder is not huge and assuming all names do not contain characters the shell interprets as delimiters (whitespace):
tar -czf workspace.tar.gz `ls -A`
(ls -A prints normal and hidden files but not "." and ".." as ls -a does.)
A good question. In ZSH you can use the globbing modifier (D), which stands for "dotfiles". Compare:
ls $HOME/*
and
ls $HOME/*(D)
This correctly excludes the special directory entries . and .. (dot and dot-dot). In Bash you can use .* to include the dotfiles explicitly:
ls $HOME/* $HOME/.*
But that includes . and .. as well, so it's not what you were looking for. I'm sure there's some way to make * match dotfiles in bash, too.
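For bash specifically, the dotglob shell option is one way to make * match dotfiles without picking up . and .. as well. The sketch below assumes the script is run by bash (shopt does not exist in plain sh), and the workspace contents are invented for the demonstration.

```shell
# Throwaway workspace with hidden and visible entries.
work=$(mktemp -d)
mkdir -p "$work/workspace/.svn"
touch "$work/workspace/.svn/entries" "$work/workspace/.hidden" "$work/workspace/visible.txt"

(
  cd "$work/workspace" &&
  shopt -s dotglob &&         # bash only: * now also matches dotfiles, never . or ..
  tar -czf "$work/workspace.tar.gz" -- *
)

listing=$(tar -tzf "$work/workspace.tar.gz" | LC_ALL=C sort)
printf '%s\n' "$listing"
```

Setting the option inside a subshell keeps it from leaking into the rest of the script, and writing the archive outside the workspace avoids having to exclude it.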
The problem with most of the solutions provided here is that the tar contains ./ at the beginning of every entry. This results in a . directory showing up when you open the archive in a GUI archive tool. So what I ended up doing is:
ls -1A | xargs -d "\n" tar cfz my.tar.gz
If you already have my.tar.gz in current directory you may want to grep this out:
ls -1A | grep -v my.tar.gz | xargs -d "\n" tar cfz my.tar.gz
Be aware that xargs has a limit on command-line length (see xargs --show-limits), so this solution will not work if the directory you are trying to tar has a very large number of entries (directories and files).
10 years later, you have an alternative to tar, illustrated with Git 2.30 (Q1 2021), which uses "git archive" to produce the release tarball instead of tar.
(You don't need Git 2.30 to apply that alternative)
See commit 4813277 (11 Oct 2020), and commit 93e7031 (10 Oct 2020) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit 63e5273, 27 Oct 2020)
Makefile: use git init/add/commit/archive for dist-doc
Signed-off-by: René Scharfe
Reduce the dependency on external tools by generating the distribution archives for HTML documentation and manpages using git commands instead of tar.
This gives the archive entries the same meta data as those in the dist archive for binaries.
So instead of:
tar cf ../archive.tar .
You can do using Git only:
git -C workspace init
git -C workspace add .
git -C workspace commit -m workspace
git -C workspace archive --format=tar --prefix=./ HEAD^{tree} > workspace.tar
rm -Rf workspace/.git
That was initially proposed because, as explained here, some exotic platform might have an old tar distribution with lacking options.
tar -czf workspace.tar.gz .??* *
Specifying .??* will include "dot" files and directories that have at least 2 characters after the dot. The down side is it will not include files/directories with a single character after the dot, such as .a, if there are any.
If disk space is not an issue, this could also be a very easy thing to do:
mkdir backup
cp -r ./* backup
tar -zcvf backup.tar.gz ./backup
Using find is probably the easiest way:
find . -mindepth 1 -maxdepth 1 -exec tar zcvf workspace.tar.gz {} \+
find . -mindepth 1 -maxdepth 1 will find all files/directories/symlinks/etc. in the current directory (-mindepth 1 keeps . itself out of the list, which would otherwise pull everything in a second time) and run the command specified by -exec. The {} in the command means "the file list goes here", and \+ means that the command will be run as:
tar zcvf workspace.tar.gz .file1 .file2 .dir3
instead of
tar zcvf workspace.tar.gz .file1
tar zcvf workspace.tar.gz .file2
tar zcvf workspace.tar.gz .dir3

go to path and then tar?

Can you have tar travel to a certain directory and then tar files relative to that directory? All while using one command (tar)?
For example instead of doing
cd /home/test/backups; tar zvPcf backup.tar.gz ../data/
I could do something like
tar -g '/home/test/backups/' zvPcf backup.tar.gz ../data/
see the -C option.
the tar man page gives this example :
tar -xjf foo.tar.bz2 -C bar/
extract bzipped foo.tar.bz2 after changing directory to bar
might be what you're looking for ...
Have you tried this:
tar zvPcf /home/test/backups/backup.tar.gz /home/test/backups/../data/
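You could try the following. With GNU tar, -C only affects the file names that come after it, so it has to precede ../data/; the archive name is still resolved from the directory you start in, hence the absolute path:
tar zvPcf /home/test/backups/backup.tar.gz -C '/home/test/backups/' ../data/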
See tar(1) man page.
-C, --directory DIR
change to directory DIR
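To see how -C interacts with a relative path such as ../data, a small sketch under stated assumptions: GNU tar, temporary stand-ins for /home/test/backups and /home/test/data, and no -P, so tar is expected to warn on stderr and store the members without the leading ../.

```shell
# Throwaway stand-ins for /home/test/backups and /home/test/data.
work=$(mktemp -d)
mkdir -p "$work/backups" "$work/data"
echo payload > "$work/data/file.txt"

# The archive path after -f is given as an absolute path here.
# -C comes before ../data, so ../data is resolved from inside backups/.
tar -czf "$work/backups/backup.tar.gz" -C "$work/backups" ../data

listing=$(tar -tzf "$work/backups/backup.tar.gz" | LC_ALL=C sort)
printf '%s\n' "$listing"
```

No cd is needed in the calling script, so its working directory stays whatever it was.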

Resources