pipe specific list of gzipped log files into zgrep - linux

I'm having trouble getting at the lines in a bunch of gzipped Apache access log files. What I want is to pick out only the log files numbered 1 and 2, then grep through them and extract the lines containing specific text.
I originally got this to work just for the archives numbered 1, where "/pathname" was the text I was looking for:
zgrep /pathname/ access_*.log.1.gz
Since ls does not support regex, I came up with the following to get a listing from the current directory of the files I want:
find . -maxdepth 1 -type f -regex '\./access.+\.log\.[12]\.gz' -printf '%P\n'
find . -maxdepth 1 -type f -regex '\./access.+\.log\.[12]\.gz' | sed "s|^\./||"
My problem now is taking that file list output and zgrepping through the files to return lines within those files that match my text. Am I barking up the wrong tree here?

Try:
zgrep /pathname/ access_*.log.{1,2}.gz
Alternatively, use find -exec:
find . -maxdepth 1 -type f -regex '\./access.+\.log\.[12]\.gz' -exec zgrep /pathname/ {} \;
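If you'd rather keep the find pipeline from the question and feed its file list into zgrep, pair -print0 with xargs -0 so the hand-off survives unusual file names; a sketch along those lines:
find . -maxdepth 1 -type f -regex '\./access.+\.log\.[12]\.gz' -print0 |
  xargs -0 zgrep /pathname/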

I don't have Apache logs, so I'll use a similar, but not identical, pattern:
ls /var/log/*.[12].gz
The shell doesn't support regex, but it does support bracket globs like [123] or [1-3], as well as brace expansion like {1,2,3} and {1..3}, or even {o..w} and {066..091}.
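Brace expansion happens before any files are looked up, so you can preview what the shell generates with echo (the file names here are made up):
echo access_{1..3}.log    # access_1.log access_2.log access_3.log
echo {066..068}           # 066 067 068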

Related

See if directory is used as symlink target in Linux - recursively

I want to see if a symlink points to a directory in a specific dir, recursively.
Of course, I could use
find / -type l -ls 2>/dev/null | grep /targetpath
But I do not want to type all the (recursive) paths.
So I put all symlinks on my system into a file once.
find / -type l -ls 2>/dev/null >~/symlinks.txt
Then I list the directories recursively.
find /targetpath to start/ -maxdepth 2 -type d
And that is my question:
Can I pipe these paths from the last command to grep?
Grep should look in my file symlinks.txt and show the content of the matching lines (there could be more than one symlink pointing to the same directory).
I tried something like
find /targetpath to stat/ -maxdepth 2 -type d | xargs -0 -ifoo grep foo symlinks.txt
But it does not do what I expect.
Or is there another, better solution?
From man find:
-lname pattern
File is a symbolic link whose contents match shell pattern pattern. [...]
Try:
find / -lname '*/targetpath/*'
See find-all-symlinks-to-a-directory-and-change-target-to-another-directory.
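If you would rather keep the two-step approach from the question, the pipe can be made to work as well; a sketch, assuming the saved listing is in ~/symlinks.txt and no directory name contains a newline (the <( ) is bash process substitution):
grep -F -f <(find /targetpath -maxdepth 2 -type d) ~/symlinks.txt
grep -f reads one fixed-string pattern per line from the find output, so a single grep call checks every directory at once.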

"find" specific contents [linux]

I would like to go through all the files in the current directory (or sub-directories) and have it echo back the names of the files only if they contain certain words.
More detail:
find -type f -name "*hello*" will give me all file names that have "hello" in their names. But instead of that, I want to search through the files' contents, and if a file contains "hello", print out the name of that file.
Is there a way to approach this?
You can use GNU find and GNU grep like this:
find /path -type f -exec grep -Hi 'hello' {} +
This is efficient in that it doesn't invoke a separate grep instance for every file returned by find: with the + terminator, find passes the file names to grep in batches. It works under the assumption that find returns at least one file for grep to search. If you are unsure whether any files will match, a fool-proof alternative is xargs with the -r flag, which runs grep only if the pipe actually produced input (note the -print0/-0 pairing so unusual file names survive the hand-off):
find /path -type f -print0 | xargs -r0 grep -Hi 'hello'
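If you only want the file names rather than the matching lines, grep's -l option prints just the name of each file that contains a match; a small variant of the same command:
find /path -type f -exec grep -li 'hello' {} +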

find -exec doesn't recognize argument

I'm trying to count the total lines in the files within a directory. To do this I am trying to use a combination of find and wc. However, when I run find . -exec wc -l {}\;, I receive the error find: missing argument to -exec. I can't see any apparent issues, any ideas?
You simply need a space between {} and \;
find . -exec wc -l {} \;
Note that if there are any sub-directories below the current location, wc will generate an error message for each of them that looks something like this:
wc: ./subdir: Is a directory
To avoid that problem, you may want to tell find to restrict the search to files:
find . -type f -exec wc -l {} \;
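As a side note, GNU find can also hand many files to a single wc invocation by terminating -exec with + instead of \;, which spawns far fewer processes and gets you wc's total line as well (one total per batch, if the file list is very long):
find . -type f -exec wc -l {} +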
Another note: using the -exec option is a good idea. Too often, people pipe commands together expecting the same result, for instance:
find . -type f | xargs wc -l
The problem with piping commands in this manner is that it breaks as soon as any file name contains spaces. For instance, if a file were named "a b", wc would receive "a" and then "b" separately, and you would get two error messages: a: no such file and b: no such file.
Unless you know for a fact that your file names never contain spaces (or non-printable characters), when you do need to pipe commands together, tell all the tools in the pipeline to use the NUL character (\0) as the separator instead of whitespace. The previous command then becomes:
find . -type f -print0 | xargs -0 wc -l
With version 4.0 or later of bash, you don't need your find command at all:
shopt -s globstar
wc -l **/*
There's no simple way to skip directories, which, as pointed out by Gui Rava, you might want to do, unless you can tell files and directories apart by name alone. For example, maybe directories never have a . in their name, while all the files have at least one extension:
wc -l **/*.*
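If you do want to stick with globstar and still skip directories reliably, a small loop with a file test works; a sketch:
shopt -s globstar
for f in **/*; do
    [[ -f $f ]] && wc -l "$f"
done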

cat files in subdirectories using linux commands

I have the following directories:
P922_101
P922_102
.
.
Each directory, for instance P922_101 has following subdirectories:
140311_AH8MHGADXX 140401_AH8CU4ADXX
Each subdirectory, for instance 140311_AH8MHGADXX has the following files:
1_140311_AH8MH_P922_101_1.fastq.gz 1_140311_AH8MH_P922_101_2.fastq.gz
2_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_2.fastq.gz
And files in 140401_AH8CU4ADXX are:
1_140401_AH8CU_P922_101_1.fastq.gz 1_140401_AH8CU_P922_4001_2.fastq.gz
2_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_4001_2.fastq.gz
I want to do 'cat' for the files in the subdirectories in the following way:
cat 1_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_1.fastq.gz
1_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_101_1.fastq.gz > P922_101_1.fastq.gz
which means that files ending with _1.fastq.gz should be concatenated into a single file and files ending with _2.fastq.gz into another file.
It should be run for all files in the subdirectories of all directories. Could someone give a Linux solution to do this?
Since they're compressed, you should probably use gzip -dc (decompress and write to stdout):
find /somePath -type f -name "*.fastq.gz" -exec gzip -dc {} \; |
  tee -a /someOutFolder/out.txt
You can use find for this:
find /top/path -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > one_file
find /top/path -mindepth 2 -type f -name "*_2.fastq.gz" -exec cat {} \; > another_file
This will look for all the files under /top/path whose names match the pattern *_1.fastq.gz / *_2.fastq.gz and cat them into the desired file. -mindepth 2 makes find skip the first directory level, so files sitting directly in /top/path won't be matched.
Note that you may want zcat instead of cat for gz files, if you want the concatenated output uncompressed.
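The reason plain cat still works here is that the gzip format allows members to be concatenated: catting .gz files produces a valid multi-member .gz archive that zcat or gunzip will decompress in full. A quick illustration (file names made up):
cat a.fastq.gz b.fastq.gz > merged.fastq.gz
zcat merged.fastq.gz | head    # contents of both inputs, in order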
As you keep adding details in comments, let's see what else we can do:
Say you have the list of directories in a file directories_list, one per line:
while read -r directory
do
    find "$directory" -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > "$directory/output"
done < directories_list
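To get output files named after each sample directory, as in the question (P922_101_1.fastq.gz and so on), you can also loop over both read numbers; a sketch, assuming the P922_* directories sit under the current directory:
for dir in P922_*/; do
    sample=${dir%/}                      # e.g. P922_101
    for r in 1 2; do
        find "$dir" -mindepth 2 -type f -name "*_${r}.fastq.gz" \
            -exec cat {} + > "${sample}_${r}.fastq.gz"
    done
done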

Search for text files in a directory and append a (static) line to each of them

I have a directory with many subdirectories, and files with suffixes in those subdirectories (e.g. FileA-SuffixA, FileB-SuffixB, FileC-SuffixC, FileD-SuffixA, etc.).
How can I recursively search for files with a certain suffix, and append a user-defined line of text to those files? I feel like this is a job for grep and sed, but I'm not sure how I would go about doing it. I'm fairly new to scripting, so please bear with me.
You can do it like this:
find /where/to/search -type f -iname '*.SUFFIX' -exec sh -c 'echo "USER DEFINED STRING" >> "$1"' sh {} \;
find searches in the supplied path
-type f finds only files
-iname '*.SUFFIX' finds the .SUFFIXed names, case-insensitively
The sh -c wrapper is needed because a >> redirection written directly after -exec would be performed once by the calling shell, not once per file found.
find ./ -name "*suffix" -exec bash -c 'echo "line_to_add" >> "$1"' -- {} \;
Basically you use find to get the list of files, then use bash to append your line to each of them.
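If there are many files, those per-file bash -c invocations add up; the same idea can be batched so one shell handles a whole group of files (the loop iterates over the file names find passes in):
find . -type f -name "*suffix" -exec sh -c '
    for f; do echo "line_to_add" >> "$f"; done
' sh {} +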
