Tarballing in Bash: Is there a way to only dereference links pointing outside the tar'd directory? - linux

I regularly use tar's -h flag to create tarballs that contain all the libraries symlinked to from the directory I am archiving, but it has the side effect of also dereferencing all the internal links within the directory, so their targets get archived twice.
For example, I have two very large datasets, and use a symbolic link to choose which one my test app uses, so I end up getting one of them twice. This makes the tarball way bigger than it needs to be.
So, is there any way to get tar to only dereference links if they point to a file that's not already included in the tarball?
Thanks.

Use find instead to build the list of files to include in the tarball, resolving every path with realpath so each symlink collapses to its target:
find . -exec realpath '{}' ';' | sort -u | tar -cf tarball.tar -T -
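If you also want links that stay inside the tree to be stored as links (so a big internal dataset is not archived twice), you can make the decision per path. A minimal sketch along the same lines, assuming GNU realpath and tar and that you run it from inside the directory (out.tar is just an illustrative name; filenames containing newlines would need -print0 handling):
find . \( -type f -o -type l \) | while IFS= read -r p; do
  t=$(realpath "$p")                      # fully resolved target
  case $t in
    "$PWD"/*) printf '%s\n' "$p" ;;       # resolves inside the tree: keep the link as a link
    *)        printf '%s\n' "$t" ;;       # resolves outside: store the dereferenced target
  esac
done | sort -u | tar -cf out.tar -T -
Note that external targets end up archived under their real (absolute) paths rather than at the link's location, just as with the one-liner above.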

Related

How to find the source of a copied file?

I have a file that I copied some time back, but I have forgotten its source. Is there a way to find the source of the copied file? I don't remember which terminal I used, so I can't try checking with Esc+P.
Command used: cp -rf $source/file $destination/file
Thanks in advance!
You could try history | grep your_filename.
A Linux system has a great many files (and if you count /proc/, the set can change at any moment), and some other process may be writing, creating, appending to, or truncating files at the same time (e.g. some crontab(1) job...).
Assume you do know some parent directory containing the source file. Suppose it is /home/foo.
Then, you might use find(1) and some hashing command like md5sum(1) to compute and collect the hash of every file.
Use the property that two files A and B with identical contents (the same sequence of bytes) have the same md5sum. The converse does not strictly hold (MD5 collisions exist), but an accidental collision is extremely unlikely in practice.
So run first
find /home/foo -type f -exec md5sum '{}' \; > /tmp/foo-md5
then compute the hash you want to find, keeping only the first field (md5sum prints the file name after the hash): seekingmd5=$(md5sum A | awk '{print $1}')
then grep "$seekingmd5" /tmp/foo-md5 will find the lines for files having the same md5 as your original A
Depending on your filesystem and hardware, this could take hours.
You could speed things up slightly by writing a C program using nftw(3) with MD5_Init(3) etc...
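Short of writing C, you can already get most of the win from the shell by hashing in parallel. A hedged sketch, assuming GNU findutils and coreutils:
find /home/foo -type f -print0 | xargs -0 -n32 -P"$(nproc)" md5sum > /tmp/foo-md5
This runs one md5sum per CPU core on batches of 32 files; for large trees the bottleneck is usually the disk, so the actual speedup depends on your hardware.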

Bash Scripting with xargs to BACK UP files

I need to copy a file from multiple locations to a BACKUP directory while retaining its directory structure. For example, I have a file "a.txt" at the following locations: /a/b/a.txt, /a/c/a.txt, /a/d/a.txt, /a/e/a.txt. I now need to copy this file from all of these locations to the backup directory /tmp/backup. The end result should be:
when I list /tmp/backup/a it should contain b/a.txt, c/a.txt, d/a.txt and e/a.txt.
For this, I had used the command: echo /a/*/a.txt | xargs -I {} -n 1 sudo cp --parent -vp {} /tmp/backup. This is throwing the error "cp: cannot stat '/a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt': No such file or directory"
The -I option is taking the complete input from echo as a single value instead of individual values (the way -n 1 would). If someone could help debug this issue rather than suggest an alternative command, that would be very helpful.
Use rsync with the --relative (-R) option to keep (parts of) the source paths.
I've used a wildcard for the source to match your example command rather than the explicit list of directories mentioned in your question.
rsync -avR /a/*/a.txt /tmp/backup/
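To answer the debugging question as asked: the culprit is not cp but the input xargs receives. echo prints all four expanded paths as one single line, and with -I {} xargs splits its input on newlines (your xargs evidently ignored the conflicting -n 1), so the entire line gets substituted for {} as one argument. Feeding one path per line fixes it. A sketch, assuming GNU tools (spelling out --parents; your --parent only worked as an abbreviation of it):
printf '%s\n' /a/*/a.txt | xargs -I{} sudo cp --parents -vp {} /tmp/backup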
Do the backups need to be exactly the same as the originals? In most cases, I'd prefer a little compression. [tar](https://man7.org/linux/man-pages/man1/tar.1.html) does a great job of bundling things including the directory structure.
tar cvzf /path/to/backup/tarball.tgz /source/path/
tar can't update compressed archives, though, so to use update mode you have to skip the compression:
tar uf /path/to/backup/tarball.tar /source/path/
This gives you versioning of a sort: update mode only appends files that have changed since they were archived, so the archive keeps both the before and after versions.
If you have time and cycles and still want the compression, you can decompress before and recompress after.
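A hedged sketch of that cycle, relying on gunzip recognizing .tgz as shorthand for .tar.gz:
gunzip /path/to/backup/tarball.tgz                 # leaves tarball.tar
tar uf /path/to/backup/tarball.tar /source/path/   # append files newer than their archived copies
gzip /path/to/backup/tarball.tar                   # leaves tarball.tar.gz
mv /path/to/backup/tarball.tar.gz /path/to/backup/tarball.tgz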

linux tar command -t option to show max-depth=1

A tar.gz file may contain many folders, and each of those folders may in turn contain many files and subfolders. I want to list only the first level of the archive's contents. How do I write this command?
For example, for this tar.gz file it should show only: auth, help, xa, install.txt, license.txt, release.txt, sqljdbc.jar, sqljdbcr.jar
Since tar t itself does not have an option to limit the depth to which the tarball contents are listed, you need to take the full listing and reduce it to what you want.
Since this means tar will list the full archive in any case, it will not be faster than a full listing.
tar tzf <tarball> | sed "s#/.*##" | sort -u
Note that, for all well-behaved tarballs, this will only give one entry, of the same name as the tarball.
Real-world example:
$ tar tjf gcc-5.2.0.tar.bz2 | sed "s#/.*##" | sort -u
gcc-5.2.0
Tarballs that splatter the extraction directory with files and subfolders are commonly called tarbombs.
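The same trick generalizes to other depths by cutting on the path separator instead of deleting everything after the first slash; a sketch for the first two levels:
tar tzf <tarball> | cut -d/ -f1-2 | sort -u
Entries shallower than the cut-off pass through unchanged.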

How to duplicate a folder exactly

I am trying to copy a filesystem for a device I am programming for. After a long time spent figuring out why the filesystem I was installing wasn't working, I found out that cp didn't get the job done. I used du -s to compare the size of the original filesystem with the one I had copied using cp -r, and it turns out they differ by about 150 bytes.
Something is telling me that symbolic links or some sort of kernel objects aren't being copied correctly.
Is it possible to copy a folder/file system exactly? If so how would I go about it?
Try doing this the straightforward way:
cp -a src target
From man cp:
-a, --archive
same as -dR --preserve=all
It preserves permissions, ownership, timestamps, symlinks...
I tried all of the commands here on my Linux box. The rsync approach proposed by @seanmcl turned out to be the right one, while the others failed to keep owners and/or some special files, or ran into permission errors. The exact command is:
$ sudo rsync -aczvAXHS --progress /var/www/html /var/www/backup
Just remember to use the bare directory names: a wildcard (/*) at the end of the source makes the shell skip the hidden files at the top level, and a trailing slash (/) changes rsync's behaviour to copying the directory's contents rather than the directory itself.
Another popular option is to use tar cf - source | (cd target && tar xf -). See this linuxdevcenter.com article.
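A variant of the same pipe that avoids the subshell, assuming GNU tar's -C option:
tar -C /path/to/source -cf - . | tar -C /path/to/target -xpf -
The -p asks the extracting tar to preserve permissions even when not running as root.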
The most accurate way I know of copying files is with cpio:
cd /path/to/source
find . -xdev -print0 | cpio -oa0V | (cd /path/to/target && cpio -imV)
Not really easy to use, but this is very precise, preserving timestamps, owners, permissions, special files.
rsync is the best way to copy a file system. There are myriad options that let you control exactly what is copied.
This is what I do, for example to duplicate directory A -> B:
$ mkdir B
$ cd A
$ cp -a ./ ../B
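Whichever tool you pick, it is worth verifying that the copy really is exact. A sketch using rsync as a checker (-n makes it a dry run, -c forces full checksum comparison, -i itemizes any difference in content, permissions, owners or times, and --delete also reports files that exist only on the target; empty output means the trees match):
rsync -anic --delete /path/to/A/ /path/to/B/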

Copying the .svn directories from a checkout to a non-checkout to make it a checkout

I have a large application in a production environment that I'm trying to move under version control. So, I created a new repo and imported the app, minus various directories and files that shouldn't be under version control. Now, I need to make the installed copy a checkout (but still retain the extra files). At this point, in a recent version of SVN, I'd simply do a checkout over top the existing directory using the --force option. But sadly, I have an ancient version of SVN, from before when the --force option was added (and can't yet upgrade... long story).
So, I checked out the app to another directory, and want to simply copy all of the .svn directories into the original directory, thus turning the original into a checkout whilst leaving the extra files alone. Now, maybe I'm just having a rough day and missing something that's in plain sight, but I can't seem to figure this out. Here are the approaches I've tried so far:
Use rsync: I can't seem to hit the right combination of include and exclude directives to recursively capture all the .svn directories but nothing else.
Use cp: I can't figure out a good way to have it hit all the .svn directories across and down through the whole app.
Use find with -exec cp: I'm running into trouble with the leading part of the pathnames of the found files messing up the destination paths. I can strip it using -printf '%P', but that output doesn't seem to feed into the {} replacement for -exec.
Use find with xargs to cp: I'm running into trouble with find sending over child directories before sending their parents. Unfortunately, find does not have a --breadth option.
Any thoughts out there?
Other info:
bash 3.0.0.14
rsync 2.6.3 p28
cp 5.2.1
svn 1.3.2
Use tar and find to capture the .svn dirs in a clean checkout, then untar into your app.
cd /tmp/
svn co XXX
cd XXX
find . -name .svn -print0 | tar cf /tmp/XXX.tar --null -T -
cd /to/your/app/
tar xf /tmp/XXX.tar
Edit: switched find/tar command to use NUL terminator for robustness in the face of filenames containing spaces. Original command was:
tar cf /tmp/XXX.tar $(find . -name .svn)
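Once the tarball is extracted over the application directory, svn status there makes a quick sanity check: the tree should now behave as a working copy, listing only the deliberately unversioned extras as '?'.
cd /to/your/app/ && svn status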
Can't you just make a checkout in a different directory, copy the extra files into it, and verify that everything is fine before switching to running the app from the new directory?
