How can I use the cut command in awk? - linux

I would like to use ls -l file_path | cut -d ' ' -f 1 in an awk script. I need the permissions of a file and then want to use them as an index into an array. How can I do that?
For example, if the output of the command is this: -rw-rw-r--
then I would like to be able to do something like this: Array["-rw-rw-r--"]++
I tried to do this:
awk '{ system("ls -l " FILENAME "|cut -d ' ' -f 1") }' `find $1 -type f`
to get the permissions, but it doesn't work.

You would not write code like that as it's trying to use awk as a shell. It's like trying to dig a hole using a screwdriver while you're sitting in your backhoe. Instead you would write something like this with GNU tools:
find "$1" -type f -print0 | xargs -0 stat -c %A | awk '{arr[$0]++}'
or even:
find "." -type f -printf '%M\n' | awk '{arr[$0]++}'
Thanks to @CharlesDuffy for bug-fixes and inspiration in the comments.

Rather than using ls and parsing its output, use stat with getline:
awk '{cmd="stat -c %A " FILENAME; cmd | getline perm; close(cmd); arr[perm]++}'
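Note that none of the snippets above actually prints the counts; they only accumulate in arr. A minimal sketch of the complete pipeline, assuming GNU stat and that you want one line per distinct permission string:
find "$1" -type f -print0 | xargs -0 stat -c %A | awk '{arr[$0]++} END {for (p in arr) print p, arr[p]}'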

Related

Copy files containing a word and not containing other. / grep not working with for loop

I am new to Linux and got stuck when I tried to use piped grep or find commands. I need to find files that:
match the name pattern request_q*_t*.xml
contain "Phrase 1"
do not contain "word 2"
and copy them to a specific location.
I tried a piped grep command to locate the files and then copy them.
for filename in $(grep --include=request_q*_t*.xml -li '"phrase 1"' $d/ | xargs grep -L '"word 2"')
do
echo "coping file: '$filename'"
cp $filename $outputpath
filefound=true
done
When I tried this grep command on the command line it worked fine:
grep --include=request_q*_t*.xml -li '"phrase 1"' $d/ | xargs grep -L '"word 2"'
but I am getting an error in the for loop. For some reason the output of the grep command is:
(Standard Input)
(Standard Input)
(Standard Input)
(Standard Input)
I am not sure what I am doing wrong.
What is an efficient way to do it? It's a huge filesystem I have to search in.
find . -name "request_q*_t*.xml" -exec sh -c "if grep -q phrase\ 1 {} && ! grep -q word\ 2 {} ;then cp {} /path/to/somewhere/;fi;" \;
You can use AWK for this in combination with xargs. The problem is that you have to read each file completely to be sure it does not contain the excluded string, but you can terminate early as soon as that string is found:
awk '(FNR==1){if(a) print fname; fname=FILENAME; a=0}
/Phrase 1/{a=1}
/Word 2/{a=0;nextfile}
END{if(a) print fname}' request_q*_t*.xml \
| xargs -I{} cp "{}" "$outputpath"
If you want to store "Phrase 1" and "Word 2" in variables, you can use:
awk -v include="Phrase 1" -v exclude="Word 2" \
'(FNR==1){if(a) print fname; fname=FILENAME; a=0}
($0~include){a=1}
($0~exclude){a=0;nextfile}
END{if(a) print fname}' request_q*_t*.xml \
| xargs -I{} cp "{}" "$outputpath"
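The globs above only match files in the current directory. If the XML files live somewhere under $d (as in the original command), a hedged variant lets find supply the file list instead, assuming GNU find and file names without newlines:
find "$d" -type f -name 'request_q*_t*.xml' -exec awk -v include="Phrase 1" -v exclude="Word 2" \
    '(FNR==1){if(a) print fname; fname=FILENAME; a=0}
     ($0~include){a=1}
     ($0~exclude){a=0;nextfile}
     END{if(a) print fname}' {} + \
| xargs -I{} cp "{}" "$outputpath"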
You can nest the $() constructs:
for filename in $( grep -L '"word 2"' $(grep --include=request_q*_t*.xml -li '"phrase 1"' $d/ ))
do
echo "coping file: '$filename'"
cp $filename $outputpath
filefound=true
done

Using AWK to sort lines and columns together

I'm doing an assignment where I've been asked to find files under a certain size and sha1sum them. I've specifically been asked to provide the output in the following format:
Filename Sha1 File Size
all on one line. I'm not able to create additional files.
I've tried the following:
for listed in $(find /my/output -type f -size -20000c)
do
sha1sum "$listed" | awk '{print $1}'
ls -l "$listed" | awk '{print $9 $5}'
done
which gives me the required output fields, but not in the requested format, i.e.
sha1sum
filename filesize
Could anyone suggest a manner in which I'd be able to get all of this on a single line?
Thank you :)
If you use the stat command to avoid needing to parse the output of ls, you can simply echo all the values you need:
while IFS= read -r -d '' listed
do
echo "$listed" $(sha1sum "$listed") $(stat -c "%s" "$listed")
done < <(find /my/output -type f -size -20000c -print0)
Check your version of stat, though; the above is GNU. On OS X, e.g., it would be
stat -f "%z" "$listed"
With single pipeline:
find /my/path -type f -size -20000c -printf "%s " -exec sha1sum {} \; | awk '{ print $3,$2,$1 }'
Example output (from a local test) in the needed format FILENAME SHA1 FILESIZE:
./GSE11111/GSE11111_RAW.tar 9ed615fcbcb0b771fcba1f2d2e0aef6d3f0f8a11 25446400
./artwork_tmp 3a43f1be6648cde0b30fecabc3fa795a6ae6d30a 40010166

Use grep for total count of string found inside file directory

I know that grep -c 'string' dir returns a list of file names and the number of times that string appeared in each respective file.
Is there any way to simply get the total count of the string appearing in the entire file directory using grep (or possibly manipulating this output)? Thank you.
BASH_DIR=$(awk -F "=" '/Bash Dir/ {print $2}' bash_input.txt)
FIND_COUNT=0
for f in "$BASH_DIR"/*.sh
do
f=$(basename $f)
#Read through job files
echo -e "$f: $(cat * | grep -c './$f')"
done
If you only want to look in files ending in .sh, use
grep -c pattern *.sh
or if you want it stored in a variable, use
n=$(grep -c xyz *.sh)
There are many ways to do this, one of them is by using awk:
grep -c 'string' dir | awk -F: '{ s+=$2 } END { print s }'
awk will get the number of occurrences in each file from the output of grep and print the sum.
You can use find with -exec cat and grep -c string:
find /etc/ -maxdepth 1 -type f -exec cat {} + | grep -c conf
139
So there are 139 occurrences of the string 'conf' in my /etc.
Mind you, I didn't want to run it recursively; otherwise I would remove -maxdepth 1.
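Keep in mind that grep -c counts matching lines, not individual matches, so a line containing 'conf' twice still counts once. If every occurrence is needed, a hedged alternative is to print each match on its own line with grep -o and count those:
find /etc/ -maxdepth 1 -type f -exec cat {} + | grep -o conf | wc -l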

print search term with line count

Hello, bash beginner question. I want to look through multiple files, find the lines that contain a search term, count the number of unique lines in this list and then print to a text file:
the input file name
the search term used
the count of unique lines
so an example output line for file 'Firstpredictoroutput.txt' using search term 'Stop_gained' where there are 10 unique lines in the file would be:
Firstpredictoroutput.txt Stop_gained 10
I can get the unique count for a single file using:
grep 'Search_term' inputfile.txt | uniq -c | wc -l >> output.txt
But I don't know enough yet about implementing loops in pipelines using bash.
All my inputfiles end with *predictoroutput.txt
Any help is greatly appreciated.
Thanks in advance,
Rubal
You can write a function, say fun, and call it with two arguments: the filename and the pattern.
$ fun() { echo "$1 $2 `grep -c "$2" "$1"`"; }
$ fun input.txt Stop_gained
input.txt Stop_gained 2
You can use find:
find . -type f -exec sh -c "grep 'Search_term' {} | uniq -c | wc -l >> output.txt" \;
Although you can have issue with weird filenames. You can add more options to find, for example to treat only '.txt' files :
find . -type f -name "*.txt" -exec sh -c "grep 'Search_term' {} | uniq -c | wc -l >> output.txt" \;
q="search for this"
for f in *.txt; do echo "$f $q $(grep "$q" "$f" | uniq | wc -l)"; done > out.txt
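Tying it back to the original question (input files ending in predictoroutput.txt, and "unique lines" read as distinct matching lines, hence sort -u), a hedged sketch of the loop would be:
term='Stop_gained'
for f in *predictoroutput.txt; do
    printf '%s %s %s\n' "$f" "$term" "$(grep "$term" "$f" | sort -u | wc -l)"
done > output.txt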

Combining greps to make script to count files in folder

I need some help combining elements of scripts to form a read output.
Basically I need to get the user name from the folder structure listed below and count the number of lines in that user's folder for files of type *.ano.
This is shown in the extract below; note that the position of the username in the path is not always the same counting from the front.
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt
/home/user/Drive-backup/2011 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/3.ano
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.ano
awk -F/ '{print $(NF-2)}'
This will give me the username I need, but I also need to know how many non-blank lines there are in that user's folder for file type *.ano. I have the grep below that works, but I don't know how to put it all together so it can output a file that makes sense.
grep -cv '^[[:space:]]*$' *.ano | awk -F: '{ s+=$2 } END { print s }'
Example output needed
UserA 500
UserB 2
UserC 20
find /home -name '*.ano' | awk -F/ '{print $(NF-2)}' | sort | uniq -c
That ought to give you the number of "*.ano" files per user given your awk is correct. I often use sort/uniq -c to count the number of instances of a string, in this case username, as opposed to 'wc -l' only counting input lines.
Enjoy.
Have a look at wc (word count).
To count the number of *.ano files in a directory you can use
find "$dir" -iname '*.ano' | wc -l
If you want to do that for all directories in some directory, you can just use a for loop:
for dir in * ; do
echo "user $dir"
find "$dir" -iname '*.ano' | wc -l
done
Execute the bash script below from the folder
/home/user/Drive-backup/2010 Backup/2010 Account/Jan
and it will report the number of non-blank lines per user.
#!/bin/bash
#save where we start
base=$(pwd)
# get all top-level dirs, skip '.'
D=$(find . \( -type d ! -name . -prune \))
for d in $D; do
cd "$base"
cd "$d"
# search for all files named *.ano and count non-blank lines
sum=$(find . -type f -name '*.ano' -exec grep -cv '^[[:space:]]*$' {} \; | awk '{sum+=$0}END{print sum}')
echo "$d" "$sum"
done
This might be what you want (untested): requires bash version 4 for associative arrays
declare -A count
cd /home/user/Drive-backup
for userdir in */*/*/*; do
username=${userdir##*/}
lines=$(grep -cv '^[[:space:]]*$' "$userdir"/user.dir/*.ano | awk -F: '{sum += $2} END {print sum}')
(( count[$username] += lines ))
done
for user in "${!count[#]}"; do
echo $user ${count[$user]}
done
Here's yet another way of doing it (on Mac OS X 10.6):
find -x "$PWD" -type f -iname "*.ano" -exec bash -c '
ar=( "${#%/*}" ) # perform a "dirname" command on every array item
printf "%s\000" "${ar[#]%/*}" # do a second "dirname" and add a null byte to every array item
' arg0 '{}' + | sort -uz |
while IFS="" read -r -d '' userDir; do
# to-do: customize output to get example output needed
echo "$userDir"
basename "$userDir"
find -x "${userDir}" -type f -iname "*.ano" -print0 |
xargs -0 -n 500 grep -hcv '^[[:space:]]*$' | awk '{ s+=$0 } END { print s }'
#xargs -0 -n 500 grep -cv '^[[:space:]]*$' | awk -F: '{ s+=$NF } END { print s }'
printf '%s\n' '----------'
done
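For the example output requested earlier (one "username total" line per user), a compact hedged sketch that reuses the asker's awk field extraction, assuming GNU find and paths laid out as .../username/user.dir/file.ano:
find /home/user/Drive-backup -type f -name '*.ano' -print0 |
while IFS= read -r -d '' f; do
    # username is the third-from-last path component
    user=$(printf '%s\n' "$f" | awk -F/ '{print $(NF-2)}')
    # count non-blank lines in this file
    printf '%s %s\n' "$user" "$(grep -cv '^[[:space:]]*$' "$f")"
done |
awk '{sum[$1]+=$2} END {for (u in sum) print u, sum[u]}'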
