Grep zipped files for variables from .txt file - cygwin

I have zipped files like:
20191231_aaa.zip
20191231_bbb.zip
20191231_ccc.zip
20191230_aaa.zip
20191230_bbb.zip
20191230_ccc.zip
20191229_aaa.zip
20191229_bbb.zip
20191229_ccc.zip
...
I want to check whether the files
*aaa.zip and *bbb.zip
contain files
with a specified word in the name, like 'house',
for specified dates only, like 20191230, 20191220, 20191210, which are listed in dates.txt in the format:
20191230
20191220
20191210
I am stuck with this:
ls | xargs grep dates.txt | unzip -l | grep house

If I understand your question correctly, the below should work:
ls *.zip | fgrep -f dates.txt | xargs -I{F} sh -c 'unzip -l {F} | grep -q house && echo FOUND: {F}'
Do note that the above is a simplification, assuming "clean" *.zip filenames, i.e. without spaces or quotes.
Distilling it:
ls *.zip | fgrep -f dates.txt filters the file list, using the lines of dates.txt as fgrep patterns
| xargs -I{F} sh -c '...' takes each line read (a filename.zip) and executes sh -c '...', where the ... shell snippet refers to that filename.zip as {F}
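If the archive names might contain spaces, a minimal sketch that reads the filtered list line by line instead of letting xargs split it on whitespace:
ls *.zip | fgrep -f dates.txt | while IFS= read -r f; do
    # list the archive contents and report the zip if any member name contains "house"
    unzip -l "$f" | grep -q house && echo "FOUND: $f"
done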

Related

Copy files containing one word and not containing another / grep not working with for loop

I am new to Linux and got stuck when I tried to use piped grep or find commands. I need to find files that:
have the name pattern request_q_t.xml
contain "Phrase 1"
do not contain "word 2"
and copy each one to a specific location.
I tried a piped grep command to locate the files and then copy them.
for filename in $(grep --include=request_q*_t*.xml -li '"phrase 1"' $d/ | xargs grep -L '"word 2"')
do
echo "coping file: '$filename'"
cp $filename $outputpath
filefound=true
done
When I try this grep command on the command line it works fine
grep --include=request_q*_t*.xml -li '"phrase 1"' $d/ | xargs grep -L '"word 2"'
but I am getting an error in the for loop. For some reason the output of the grep command is
(Standard Input)
(Standard Input)
(Standard Input)
(Standard Input)
I am not sure what I am doing wrong.
What is the efficient way to do this? It is a huge filesystem I have to search in.
find . -name "request_q*_t*.xml" -exec sh -c "if grep -q phrase\ 1 {} && ! grep -q word\ 2 {} ;then cp {} /path/to/somewhere/;fi;" \;
You can use AWK for this in combination with xargs. The catch is that you have to read each file to the end to be sure it does not contain the excluded string, but you can terminate a file early as soon as that string is found:
awk '(FNR==1){if(a) print fname; fname=FILENAME; a=0}
/Phrase 1/{a=1}
/Word 2/{a=0;nextfile}
END{if(a) print fname}' request_q*_t*.xml \
| xargs -I{} cp "{}" "$outputpath"
If you want to store "Phrase 1" and "Word 2" in variables, you can use:
awk -v include="Phrase 1" -v exclude="Word 2" \
'(FNR==1){if(a) print fname; fname=FILENAME; a=0}
($0~include){a=1}
($0~exclude){a=0;nextfile}
END{if(a) print fname}' request_q*_t*.xml \
| xargs -I{} cp "{}" "$outputpath"
You can nest the $() constructs:
for filename in $( grep -L '"word 2"' $(grep --include=request_q*_t*.xml -li '"phrase 1"' $d/ ))
do
echo "coping file: '$filename'"
cp $filename $outputpath
filefound=true
done
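If the filenames may contain spaces, a NUL-delimited variant of the same idea is a safer sketch (this assumes GNU grep, xargs and cp for the -Z, -0/-r and -t options):
grep -rliZ --include='request_q*_t*.xml' '"phrase 1"' "$d"/ \
| xargs -0 -r grep -LZ '"word 2"' \
| xargs -0 -r cp -t "$outputpath"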

Count lines and text strings in multiple csv files in Linux and pipe to a csv file

How would I traverse through a folder containing multiple csv files and:
Count non-empty lines
Count lines containing:
a. "Time Out"
b. "Mortality"
c. "Init"
And pipe that to a csv file in the format:
Filename "Line Count" "Time Out" "Mortality" "Init"
on the command line in Linux?
Edit:
wc -l ./*.csv > result.txt
That got me the count of the lines, but I am unsure about finding the strings as stated above.
I edited according to shellcheck.net and got this:
#!/bin/bash
find ./ -type f -exec cat {} > /tmp/file.tmp \;
cmd /tmp/file.tmp | grep Time Out | grep Mortality | grep Init > /tmp/file2.tmp
wc -l /tmp/file.tmp
rm /tmp/file.tmp
You could use this script (it may not be optimal for periodic use):
#!/bin/bash
find ./ -type f -name "*.csv" -exec cat {} \; | grep "Time Out" | grep "Mortality" | grep "Init" > /tmp/file.tmp
wc -l /tmp/file.tmp
rm /tmp/file.tmp
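If the goal is the per-file CSV described in the question (filename, non-empty line count, and counts for the three phrases), a sketch along these lines may be closer; it assumes the csv files sit in the current directory, so adjust the glob or switch to find for a recursive run:
#!/bin/bash
printf '%s\n' 'Filename,"Line Count","Time Out","Mortality","Init"' > result.csv
for f in ./*.csv; do
    awk -v name="$f" '
        NF          { lines++ }       # count non-empty lines
        /Time Out/  { timeout++ }
        /Mortality/ { mortality++ }
        /Init/      { init++ }
        END { printf "%s,%d,%d,%d,%d\n", name, lines, timeout, mortality, init }
    ' "$f" >> result.csv
done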

Find files and display results in a specific format to a txt file

I currently have many text files over several directories that I am sorting and storing the results in a text file. The issue is not the sorting part but formatting the output that gets placed in the text file. I am looking to output in this format: file '/path/to/file1'. Currently it shows /path/to/file1. I want to do this all within one process (not have to run an additional loop or find to change the format).
$ target=~/tmp/shuf
$ destination=/filepath/
$ find $target -iname "*.txt" -type f | shuf | awk -F- '{printf("%s:%s\n", $0, $NF)}' | sort -t : -k 2 -s | cut -d : -f 1 | xargs -n1 basename | sed "s,^,$destination," > $destination/results.txt
Current results.txt:
/path/to/cs650-software_methodologies-fname_lname-001.txt
/path/to/s630-linux_research_paper-fname_lname-001.txt
Desired results.txt:
file '/path/to/cs650-software_methodologies-fname_lname-001.txt'
file '/path/to/s630-linux_research_paper-fname_lname-001.txt'
I find awk is often easier for this kind of formatting if you don't need substitutions. This also allows us to skip the basename call and leave that part to awk as well. Just note that this will not work if you have any single quotes in your actual filenames.
find $target -type f -iname "*.txt" \
| shuf \
| awk -F- '{printf("%s:%s\n", $0, $NF)}' \
| sort -t : -k 2 -s \
| cut -d : -f 1 \
| awk -F / '{printf("file '\''%s'\''\n", $0)}' \
> $destination/results.txt
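If you also want to keep the basename/$destination substitution from the original pipeline, the final awk can take the destination as a variable and print only the last path component (a sketch, assuming $destination ends with a slash as in the original sed):
find $target -type f -iname "*.txt" \
| shuf \
| awk -F- '{printf("%s:%s\n", $0, $NF)}' \
| sort -t : -k 2 -s \
| cut -d : -f 1 \
| awk -F/ -v dest="$destination" '{printf("file '\''%s%s'\''\n", dest, $NF)}' \
> $destination/results.txt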

Use grep for total count of string found inside file directory

I know that grep -c 'string' dir returns a list of file names and the number of times that string appeared in each respective file.
Is there any way to simply get the total count of the string appearing in the entire file directory using grep (or possibly manipulating this output)? Thank you.
BASH_DIR=$(awk -F "=" '/Bash Dir/ {print $2}' bash_input.txt)
FIND_COUNT=0
for f in "$BASH_DIR"/*.sh
do
f=$(basename $f)
#Read through job files
echo -e "$f: $(cat * | grep -c './$f')"
done
If you only want to look in files ending in .sh, use
grep -c pattern *.sh
or if you want it stored in a variable, use
n=$(grep -c xyz *.sh)
There are many ways to do this; one of them is by using awk:
grep -c 'string' dir | awk -F: '{ s+=$2 } END { print s }'
awk will get the number of occurrences in each file from the output of grep and print the sum.
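For example, to total a pattern across every file under a directory (a sketch assuming GNU grep for -r; note that -c counts matching lines, not individual occurrences):
grep -rc 'string' dir/ | awk -F: '{ s += $NF } END { print s }'
Using $NF instead of $2 keeps the sum correct even if a filename happens to contain a colon.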
You can use find with -exec cat and grep -c string:
find /etc/ -maxdepth 1 -type f -exec cat {} + | grep -c conf
139
So there are 139 occurrences of the string 'conf' in my /etc.
Mind you that I didn't want to search recursively; otherwise I would remove -maxdepth 1.
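If you want the total number of occurrences rather than the number of matching lines, a small variant of the same idea (again non-recursive, and assuming GNU grep for -o) would be:
find /etc/ -maxdepth 1 -type f -exec cat {} + | grep -o conf | wc -l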

How to count occurrences of a word in all the files of a directory?

I’m trying to count the occurrences of a particular word in a whole directory. Is this possible?
Say for example there is a directory with 100 files all of whose files may have the word “aaa” in them. How would I count the number of “aaa” in all the files under that directory?
I tried something like:
zegrep "xception" `find . -name '*auth*application*' | wc -l
But it’s not working.
grep -roh aaa . | wc -w
Grep recursively through all files and directories under the current dir, searching for aaa, and output only the matches, not the entire line. Then just use wc to count how many words there are.
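For example, to restrict the count to .txt files under the current tree, a sketch using GNU grep's --include option (and wc -l, since -o puts each match on its own line):
grep -roh --include='*.txt' aaa . | wc -l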
Another solution based on find and grep.
find . -type f -exec grep -o aaa {} \; | wc -l
Should correctly handle filenames with spaces in them.
Use grep in its simplest way. Try grep --help for more info.
To get the count of a word in a particular file:
grep -c <word> <file_name>
Example:
grep -c 'aaa' abc_report.csv
Output:
445
To get the count of a word in the whole directory:
grep -c -R <word>
Example:
grep -c -R 'aaa'
Output:
abc_report.csv:445
lmn_report.csv:129
pqr_report.csv:445
my_folder/xyz_report.csv:408
Let's use AWK!
$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency
This lists the frequency of each word occurring in the provided file. If you want to see the occurrences of your word, you can just do this:
$ cat your_file.txt | wordfrequency | grep yourword
To find occurrences of your word across all files in a directory (non-recursively), you can do this:
$ cat * | wordfrequency | grep yourword
To find occurrences of your word across all files in a directory (and its sub-directories), you can do this:
$ find . -type f | xargs cat | wordfrequency | grep yourword
Source: AWK-ward Ruby
find . -type f | xargs perl -p -e 's/ /\n/g' | grep aaa | wc -l
Cat the files together and grep the output: cat $(find /usr/share/doc/ -name '*.txt') | zegrep -ic '\<exception\>'
If you want 'exceptional' to match as well, don't use the '\<' and '\>' around the word.
How about starting with:
cat * | sed 's/ /\n/g' | grep '^aaa$' | wc -l
as in the following transcript:
pax$ cat file1
this is a file number 1
pax$ cat file2
And this file is file number 2,
a slightly larger file
pax$ cat file[12] | sed 's/ /\n/g' | grep 'file$' | wc -l
4
The sed converts spaces to newlines (you may want to include other space characters as well such as tabs, with sed 's/[ \t]/\n/g'). The grep just gets those lines that have the desired word, then the wc counts those lines for you.
Now there may be edge cases where this script doesn't work but it should be okay for the vast majority of situations.
If you wanted a whole tree (not just a single directory level), you can use something like:
( find . -name '*.txt' -exec cat {} ';' ) | sed 's/ /\n/g' | grep '^aaa$' | wc -l
There's also a grep regex syntax for matching words only:
# based on Carlos Campderrós solution posted in this thread
man grep | less -p '\<'
grep -roh '\<aaa\>' . | wc -l
For a different word matching regex syntax see:
man re_format | less -p '\[\[:<:\]\]'
