Piping GUNZIP to GREP - linux

I'm trying to find all ZIP files in a specific folder, extract them using GUNZIP, and pipe the output to GREP to search within HTML files contained in these ZIP files.
I managed to do it with UNZIP (unzip -p), but many of the servers I will eventually run this search on over an SSH loop don't have ZIP/UNZIP installed, so I'm limited to GUNZIP, which seems to be installed by default on these old Linux servers.
Is there a way to pipe the output of a gunzip extraction (of more than one file, following a find -exec command) to grep, so that it searches inside these HTML files (not in their file names, but within the files themselves)?
That's how I've tried to do it so far, without success:
find /home/osboxes/project/ZIPs/*.zip -exec gunzip -l {} \;|grep 'pattern'
UNZIP has a -p option that can pipe the output and I get the needed result with it, but it seems that GUNZIP doesn't...
Can you think of a way to help me make it work?
Appreciated

gunzip -c writes the output to standard output; the original file is not affected. zcat also works, and it is the same as gunzip -c.
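For the original question, a minimal sketch of combining this with find and grep, assuming the archives are really gzip-compressed files (gunzip cannot read multi-file ZIP archives) and reusing the path from the question:
# Decompress every archive to stdout and grep the combined stream:
find /home/osboxes/project/ZIPs -name '*.gz' -exec gunzip -c {} \; | grep 'pattern'
# If zgrep is available (it usually ships with gzip), it does both steps in one command:
zgrep 'pattern' /home/osboxes/project/ZIPs/*.gz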

Related

Basic bash redirection

I'm learning the basics of bash and I'm having a little trouble.
I'm trying to figure out a one line command for extracting tar files in the working directory, without using variables, backticks or the command separator.
Suppose my tar file is called "example" and it resides in the working directory
grep "example" | tar -xf
However this doesn't place the output of grep after the -xf flag. I've tried other combinations with various other programs like rev and cat, but I still can't seem to get it right.
Thanks in advance!
The following should be able to achieve what you need:
cat *.tar | tar -xvf - -i
Replace *.tar with your selector.
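If only the one archive named "example" should be extracted, a hedged variant of the same idea that still avoids variables, backticks and command separators (assuming GNU or BSD find):
# Concatenate just the matching archive(s) and extract from stdin:
find . -maxdepth 1 -name 'example*.tar' -exec cat {} + | tar -xvf -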

Decompress .gz files from subfolders (recursively) into root folder

Here is the situation: I have a folder containing a lot of subfolders, some of which contain .gz compressed files (NOT tar, just compressed text files). I want to recursively decompress all these .gz files into the root folder, but I can't figure out the exact way to do it.
My folders look like this:
/folderX/subfolder1/file.gz
by using
gzip -c -d -r *.gz
I can probably extract all the files at once, but they will remain in their respective subfolders. I want them all in /folderX/
find -name *.gz
gives me the correct list of the files I am looking for, but I have no idea how to combine the two commands. Should I combine them in a script? Or is there a gzip feature I have missed that lets me decompress everything into the folder from which I run the command?
Thanks for the help!
You can use a while...done loop that iterates over the input:
find dirname -name '*.gz' | while read -r i; do gzip -c -d -r "$i"; done
You can also use xargs, which with its -0 switch has the added benefit of handling spaces (" ") in file names:
find dirname -name '*.gz' -print0 | xargs -0 -L 1 gzip -c -d -r
The "-print0" output all the files found separated by NULL character. The -0 switch of xargs rebuild the list parsing the NULL character and applies the "gzip..." command to each of them. Pay attention to the "-L 1" parameter which tells xargs to pass only ONE file at a time to gzip.

Unix/Bash: Redirect results of find command so files are used as input for other command

I've got a directory structure that contains many different files named foo.sql. I want to be able to cd into this directory & issue a command like the following:
find . -name "foo.sql" -exec mysql -uUserName -pUserPasswd < {} \;
where {} is the relative path to each foo.sql file. Basically, I want:
mysql -uUserName -pUserPasswd < path/to/foo.sql
to be run once for each foo.sql file under my subdirectory. I've tried Google & it hasn't been much help. Ideally this would be part of a UNIX shell script.
Thanks in advance, & sorry if it's been asked before.
The -exec option doesn't run a shell, so it can't process shell operators like redirection. Try this:
find . -name "foo.sql" -exec cat {} + | mysql -uUserName -pUserPasswd
cat {} will write the contents of all the files to the pipe, which will then be read by mysql.
Or, just to point out another approach:
find . | xargs cat | mysql etcetera
xargs is a generic pipe operation roughly equivalent to find's -exec. It has some advantages and some disadvantages, depending on what you're doing. I tend to use it because I'm often filtering the list of found files in an earlier pipeline stage before operating on them.
There are also other ways of assembling such command lines. One nice thing about Unix's generic toolkits is that there are usually multiple solutions, each with its own tradeoffs.
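If you really need mysql to run once per foo.sql file (for example, so a failure in one file doesn't abort the rest), a hedged sketch: have -exec start a small shell per file so the redirection happens inside it.
# One mysql invocation per file; the inner sh receives the path as $1:
find . -name "foo.sql" -exec sh -c 'mysql -uUserName -pUserPasswd < "$1"' _ {} \;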

pipe tar extract into tar create

I have a tar.gz right now, and I want to extract just a file or two from it, and pack/add those into a new tar.gz, all in one go. Of course I can just save to a temporary file and work with it, but the ABSOLUTE requirement is to do this all without having any intermediate file output, i.e. piping. In other words, what I would like is something like the following pseudo-code (obviously the syntax is incorrect)
tar -xvf first.tar.gz subdir1/file1 subdir2/file2 | tar cf - | gzip > second.tar.gz
Does anyone know the proper syntax for this? I have tried many variants, but to no avail.
I am also very open to the idea of using cpio, but again, I am stumped by how to get the syntax down properly, and from what I understand, cpio intakes only archives or filenames, not files.
Any help will be greatly appreciated.
EDIT: There is no particular filename pattern inside the tarball to extract. Given that the BSD and GNU tar can only search one pattern at a time, I'm not sure if it's even possible to use the include/exclude flags, respectively.
I am assuming that you are using or that you can get GNU tar.
You can use the --delete option to process one tar file to another. E.g.:
% tar cf x.tar a b c d
% tar tf x.tar
a
b
c
d
% cat x.tar | tar f - --delete b c > y.tar
% tar tf y.tar
a
d
%
Note that you can specify multiple names to delete. Then you just need to figure out how to specify all the files to get rid of on the command line, instead of the files to keep.
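Applied to the original first.tar.gz, an untested sketch of that idea (the names to drop are placeholders; --delete cannot read a compressed archive directly, so decompress on the way in and recompress on the way out):
gunzip -c first.tar.gz | tar f - --delete unwanted/file3 unwanted/file4 | gzip > second.tar.gz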
If you know the filename pattern that you are going to extract, try this:
tar zcf second.tar.gz --include='filepattern' @first.tar.gz
Here is an example showing the inclusion of multiple files:
% tar cf x.tar a b c d
% tar tf x.tar
a
b
c
d
% cat x.tar | tar cf - --include='a' --include='d' @- > y.tar
% tar tf y.tar
a
d
%
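Applied to the original question, this would look roughly like the following. Note this relies on bsdtar, which supports --include, reads entries from an existing archive via the @archive syntax, and auto-detects the gzip compression when reading; treat it as a sketch:
tar zcf second.tar.gz --include='subdir1/file1' --include='subdir2/file2' @first.tar.gz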
None of the above solutions worked for me; tar complained about creating an empty archive.
Instead I just used &&:
tar -xf first.tar.gz subdir1/file1 subdir2/file2 && tar -cvf second.tar --remove-files subdir1/file1 subdir2/file2
Where --remove-files is the option to remove the files after adding them to the archive.
Another method I found to work is:
tar -cf second.tar `tar -tf first.tar.gz /desired/directory`
Note that this keeps the entire directory context, so /desired/directory is still present in the new tar.
When unpacking, tar normally writes the unpacked files to disk, not the output stream. You can use -O or --to-stdout to have it write files out to stdout, but there won't be a break between files or any way to know when one ends and another begins.
In addition, tar's create option can only read files from disk, not from stdin. This makes sense because of the aforementioned problem of knowing when one file ends and another begins.
This means there is no way to do this from the command line the way you want.
However, I'm betting that you could write a perl or python script using libraries that you can get to operate strictly in-memory.

how do I check that two folders are the same in linux

I have moved a web site from one server to another and I copied the files using SCP
I now wish to check that all the files have been copied OK.
How do I compare the sites?
Count the files in each folder?
Get the total file size for each folder tree?
Or is there a better way to compare the sites?
Paul
Use diff with the recursive -r and quick -q options. It is the best and by far the fastest way to do this.
diff -r -q /path/to/dir1 /path/to/dir2
It won't tell you what the differences are (remove the -q option to see that), but it will very quickly tell you if all the files are the same.
If it shows no output, all the files are the same, otherwise it will list the files that are different.
If you were using scp, you could probably have used rsync.
rsync won't transfer files that are already up to date, so you can use it to verify a copy is current by simply running rsync again.
If you were doing something like this on the old host:
scp -r from/my/dir newhost:/to/new/dir
Then you could do something like
rsync -a --progress from/my/dir newhost:/to/new/dir
The '-a' is short for 'archive', which does a recursive copy and preserves permissions, ownership, etc. Check the man page for more info, as it can do a lot of clever things.
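To turn that into a pure verification pass after the copy, a hedged sketch with the same paths: -n does a dry run, -c compares files by checksum instead of size and modification time, and -i itemizes whatever would change.
# Lists any file that still differs between the two sides without copying anything:
rsync -anci from/my/dir newhost:/to/new/dir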
cd website
find . -type f -print | sort | xargs sha1sum
will produce a list of checksums for the files. You can then diff those to see if there are any missing/added/different files.
maybe you can use something similar to this:
find <original root dir> | xargs md5sum > original
find <new root dir> | xargs md5sum > new
diff original new
To add to the reply from Sidney:
It is not strictly necessary to filter with -type f or to produce hash codes.
In reply to zidarsk8: you don't need to sort, since find, like ls, lists the filenames alphabetically by default. It works for empty directories as well.
To summarize, the top 3 answers would be:
(P.S. It's nice to do a dry run with rsync.)
diff -r -q /path/to/dir1 /path/to/dir2
diff <(cd dir1 && find) <(cd dir2 && find)
rsync --dry-run -avh from/my/dir newhost:/to/new/dir
Make checksums for all files, for example using md5sum. If they're all the same for all the files and no file is missing, everything's OK.
If you used scp, you probably can also use rsync over ssh.
rsync -avH --delete-after 1.example.com:/path/to/your/dir 2.example.com:/path/to/your/
rsync does the checksums for you.
Be sure to use the -n option to perform a dry-run. Check the manual page.
I prefer rsync over scp or even local cp, every time I can use it.
If rsync is not an option, md5sum can generate md5 digests and md5sum --check will verify them.
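A minimal sketch of that workflow, assuming you generate the list on the source host and copy checksums.md5 to the destination host before checking:
# On the source host, from the site's root directory:
find . -type f -exec md5sum {} + > /tmp/checksums.md5
# On the destination host, from the copied site's root directory (prints only failures):
md5sum --quiet -c /tmp/checksums.md5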
Try diffing your directory recursively. You'll get a nice summary if something is different in one of the directories.
I have moved a web site from one server to another and I copied the files using SCP
You could do this with rsync, it is great if you just want to mirror something.
/Johan
Update: Seems like #rjack beat me to the rsync answer by 6 seconds :-)
I would add this as a comment to Douglas Leeder or Eineki, but sadly I don't have enough reputation to comment. Anyway, their answers are both great, except that they don't work for file names with spaces. To make that work, do
find [dir1] -type f -print0 | xargs -0 [preferred hash function] > [file1]
find [dir2] -type f -print0 | xargs -0 [preferred hash function] > [file2]
diff -y [file1] [file2]
Just from experimenting, I also like to use the -W ### argument on diff and output it to a file; that's easier to parse and understand than reading it in the terminal.
...when comparing two folders across a network drive or on separate computers
If comparing two folders on the same computer, diff is fine, as explained by the main answer.
However, if trying to compare two folders on different computers, or across a network, don't do that! If across a network, it will take forever since it has to actually transmit every byte of every file in the folder across the network. So, if you are comparing a 3 GB dir, all 3 GB have to be transferred across the network just to see if the remote dir and local dir are the same.
Instead, use a SHA256 hash. Hash the dir on one computer on that computer, and on the other computer on that computer. Here is how:
(From my answer here: How to hash all files in an entire directory, including the filenames as well as their contents):
# 1. First, cd to the dir in which the dir of interest is found. This is
# important! If you don't do this, then the paths output by find will differ
# between the two computers since the absolute paths to `mydir` differ. We are
# going to hash the paths too, not just the file contents, so this matters.
cd /home/gabriel # example on computer 1
cd /home/gabriel/dev/repos # example on computer 2
# 2. hash all files inside `mydir`, then hash the list of all hashes and their
# respective file paths. This obtains one single final hash. Sorting is
# necessary by piping to `sort` to ensure we get a consistent file order in
# order to ensure a consistent final hash result. Piping to awk extracts
# just the hash.
find mydir -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'
Example run and output:
$ find eclipse-workspace -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'
8f493478e7bb77f1d025cba31068c1f1c8e1eab436f8a3cf79d6e60abe2cd2e4
Do this on each computer, then ensure the hashes are the same to know if the directories are the same.
Note that the above commands ignore empty directories, file permissions, timestamps of when files were last edited, etc. For most cases though that's ok.
You can also use rsync to basically do this same thing for you, even when copying or comparing across a network.
