Using AWK to sort lines and columns together - linux

I'm doing an assignment where I've been asked to find files between a certain size and sha1sum them. I've specifically been asked to provide the output in the following format:
Filename Sha1 File Size
all on one line. I'm not able to create additional files.
I've tried the following:
for listed in $(find /my/output -type f -size -20000c)
do
sha1sum "$listed" | awk '{print $1}'
ls -l "$listed" | awk '{print $9 $5}'
done
which gives me the required output fields, but not in the requested format, i.e.
sha1sum
filename filesize
Could anyone suggest a manner in which I'd be able to get all of this on a single line?
Thank you :)

If you use the stat command to avoid needing to parse the output of ls, you can simply echo all the values you need:
while IFS= read -r -d '' listed
do
echo "$listed" $(sha1sum "$listed") $(stat -c "%s" "$listed")
done < <(find /my/output -type f -size -20000c -print0)
Check your version of stat, though; the above is GNU. On OS X, for example, it would be
stat -f "%z" "$listed"
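If you would rather sidestep the GNU/BSD stat differences entirely, here is a minimal sketch of the same loop that lets wc report the size instead (assuming the question's /my/output path and size limit; the tr strips the leading padding some wc implementations print):
while IFS= read -r -d '' listed
do
echo "$listed" $(sha1sum "$listed" | awk '{print $1}') $(wc -c < "$listed" | tr -d ' ')
done < <(find /my/output -type f -size -20000c -print0)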

With a single pipeline:
find /my/path -type f -size -20000c -printf "%s " -exec sha1sum {} \; | awk '{ print $3,$2,$1 }'
Example output (from a local test) in the required format FILENAME SHA1 FILESIZE:
./GSE11111/GSE11111_RAW.tar 9ed615fcbcb0b771fcba1f2d2e0aef6d3f0f8a11 25446400
./artwork_tmp 3a43f1be6648cde0b30fecabc3fa795a6ae6d30a 40010166
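Note that the awk reordering assumes the file name contains no spaces. If it might, a hedged variant of the same pipeline that keeps the whole name (still assuming GNU find for -printf) would be:
find /my/path -type f -size -20000c -printf "%s " -exec sha1sum {} \; | awk '{size=$1; sha=$2; sub(/^[^ ]+ +[^ ]+ +/, ""); print $0, sha, size}'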

Related

How can I use the cut command in awk?

I would like to use ls -l file_path | cut -d ' ' -f 1 in an awk script. I need the rights (permissions) of a file and then use them as an index into an array. How can I do that?
For example, if the output of the command is this: -rw-rw-r--
Then I would like to be able to do something like this: Array[-rw-rw-r--]++
I tried to do this:
awk '{ system("ls -l " FILENAME "|cut -d ' ' -f 1") }' `find $1 -type f`
to get the rights but it doesn't work.
You would not write code like that as it's trying to use awk as a shell. It's like trying to dig a hole using a screwdriver while you're sitting in your backhoe. Instead you would write something like this with GNU tools:
find "$1" -type f -print0 | xargs -0 stat -c %A | awk '{arr[$0]++}'
or even:
find "." -type f -printf '%M\n' | awk '{arr[$0]++}'
Thanks to @CharlesDuffy for bug fixes and inspiration in the comments.
Rather than using ls and parsing its output, use stat with getline:
awk '{cmd="stat -c %A " FILENAME; cmd | getline perm; close(cmd); arr[perm]++}'

How to find files that have long chains of zero bytes?

How can I find files that have long chains of consecutive 0s (zero bytes - 0x00) as a result of disk failure? For example, how can I find files that have more than 10000 zero bytes in sequence?
Sure, I can write a program using Java or other programming language, but is there a way to do it using more or less standard Linux command line tools?
Update
You can generate a test file with dd if=/dev/zero of=zeros bs=1 count=100000.
This may be a start:
find /some/starting/point -type f -size +10000c -exec \
perl -nE 'if (/\x0{10000}/) {say $ARGV; close ARGV}' '{}' +
To test for a single file, named filename:
if tr -sc '\0' '\n' < filename | tr '\0' Z | grep -qE 'Z{1000}'; then
# ...
fi
You can now use a suitable find command to filter relevant files for test.
For example, all *.txt files in PWD:
while read -rd '' filename;do
if tr -sc '\0' '\n' < "$filename" | tr '\0' Z | grep -qE 'Z{1000}'; then
# For example, simply print "$filename"
printf '%s\n' "$filename"
fi
done < <(find . -type f -name '*.txt' -print0)
Find and grep should work just fine:
grep -E "(\0)\1{1000}" <file name>
if it's a single file or a group of files in the same dir.
If you want to search throughout the system, there's:
find /dir/ -exec grep -E "(\0)\1{1000}" {} \; 2> /dev/null
This is very slow, though. If you're looking for something faster and can do without an exact count of the zeros (or a large number of them),
I'd suggest replacing the grep with grep 000000000* instead.
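If your GNU grep was built with PCRE support (-P), another option worth trying is to match the NUL run directly; -a treats binary files as text and -l just lists the matching file names (a sketch reusing the same size prefilter):
find /dir/ -type f -size +10000c -exec grep -laP '\x00{10000}' {} + 2> /dev/null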

Use grep for total count of string found inside file directory

I know that grep -c 'string' dir returns a list of file names and the number of times that string appeared in each respective file.
Is there any way to simply get the total count of the string appearing in the entire file directory using grep (or possibly manipulating this output)? Thank you.
BASH_DIR=$(awk -F "=" '/Bash Dir/ {print $2}' bash_input.txt)
FIND_COUNT=0
for f in "$BASH_DIR"/*.sh
do
f=$(basename $f)
#Read through job files
echo -e "$f: $(cat * | grep -c './$f')"
done
If you only want to look in files ending in .sh, use
grep -c pattern *.sh
or if you want it stored in a variable, use
n=$(grep -c xyz *.sh)
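Be aware that with more than one file grep -c prints one count per file rather than a grand total, and it counts matching lines, not occurrences. For a single number of occurrences across the directory, one sketch is to print each match on its own line and count those:
n=$(grep -o 'xyz' *.sh | wc -l)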
There are many ways to do this, one of them is by using awk:
grep -c 'string' dir | awk -F: '{ s+=$2 } END { print s }'
awk will get the number of occurrences in each file from the output of grep and print the sum.
You can use find with -exec cat and grep -c string:
find /etc/ -maxdepth 1 -type f -exec cat {} + | grep -c conf
139
So there are 139 occurrences of the string 'conf' in my /etc.
Mind you, I didn't want to run recursively; otherwise I would remove -maxdepth 1.
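If you did want the recursive total, a hedged variant combining the same ideas (grep -r descends into subdirectories, and the awk sums the per-file counts as in the answer above):
grep -rc conf /etc/ 2> /dev/null | awk -F: '{ s+=$NF } END { print s }'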

Use wc on all subdirectories to count the sum of lines

How can I count all lines of all files in all subdirectories with wc?
cd mydir
wc -l *
..
11723 total
man wc suggests wc -l --files0-from=-, but I do not know how to generate the list of all files as NUL-terminated names
find . -print | wc -l --files0-from=-
did not work.
You probably want this:
find . -type f -print0 | wc -l --files0-from=-
If you only want the total number of lines, you could use
find . -type f -exec cat {} + | wc -l
Perhaps you are looking for the -exec option of find.
find . -type f -exec wc -l {} \; | awk '{total += $1} END {print total}'
To count all lines for a specific file extension you can use:
find . -name '*.fileextension' | xargs wc -l
If you want it for two or more different types of files you can add the -o option:
find . -name '*.fileextension1' -o -name '*.fileextension2' | xargs wc -l
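As with other xargs pipelines here, this breaks on file names containing spaces or quotes. If you have GNU wc you can combine it with the earlier --files0-from approach instead; note the parentheses so -print0 applies to both -name tests, and that wc still appends its own total line (a sketch):
find . \( -name '*.fileextension1' -o -name '*.fileextension2' \) -print0 | wc -l --files0-from=-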
Another option would be to use a recursive grep:
grep -hRc '' . | awk '{k+=$1}END{print k}'
The awk simply adds the numbers. The grep options used are:
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines. (-c is specified by POSIX.)
-h, --no-filename
Suppress the prefixing of file names on output. This is the
default when there is only one file (or only standard input) to
search.
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.
The grep, therefore, counts the number of lines matching anything (''), so essentially just counts the lines.
I would suggest something like
find ./ -type f | xargs wc -l | cut -c 1-8 | awk '{total += $1} END {print total}'
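One caveat: wc itself emits a "total" line per xargs batch, and those get summed in as well; the cut is also redundant, since awk already takes the first field. A hedged tweak that skips the total lines (assuming no file path whose first token is literally "total", and file names without spaces, as in the original):
find ./ -type f | xargs wc -l | awk '$2 != "total" {total += $1} END {print total}'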
Based on ДМИТРИЙ МАЛИКОВ's answer:
Example for counting lines of java code with formatting:
one liner
find . -name '*.java' -exec wc -l {} \; | awk '{printf ("%3d: %6d %s\n",NR,$1,$2); total += $1} END {printf (" %6d\n",total)}'
awk part:
{
printf ("%3d: %6d %s\n",NR,$1,$2);
total += $1
}
END {
printf (" %6d\n",total)
}
example result
1: 120 ./opencv/NativeLibrary.java
2: 65 ./opencv/OsCheck.java
3: 5 ./opencv/package-info.java
190
Bit late to the game here, but wouldn't this also work? find . -type f | wc -l
This counts all lines output by the 'find' command. You can fine-tune the 'find' to show whatever you want. I am using it to count the number of subdirectories, in one specific subdir, in deep tree: find ./*/*/*/*/*/*/TOC -type d | wc -l . Output: 76435. (Just doing a find without all the intervening asterisks yielded an error.)

Combining greps to make script to count files in folder

I need some help combining elements of scripts to form a readable output.
Basically I need to get the user name from the folder structure listed below and count the number of lines in that user's folder for files of type *.ano.
This is shown in the extract below; note that the position of the username in the path is not always the same counting from the front.
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt
/home/user/Drive-backup/2011 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/3.ano
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.ano
awk -F/ '{print $(NF-2)}'
This will give me the username I need, but I also need to know how many non-blank lines there are in that user's folder for file type *.ano. I have the grep below that works, but I don't know how to put it all together so it can output a file that makes sense.
grep -cv '^[[:space:]]*$' *.ano | awk -F: '{ s+=$2 } END { print s }'
Example output needed
UserA 500
UserB 2
UserC 20
find /home -name '*.ano' | awk -F/ '{print $(NF-2)}' | sort | uniq -c
That ought to give you the number of "*.ano" files per user, given your awk is correct. I often use sort | uniq -c to count the number of instances of a string, in this case the username, as opposed to wc -l, which only counts input lines.
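If you need the output in the question's UserA 500 order, you could swap the columns with one more awk on top of the same pipeline (a sketch; this still counts files per user rather than lines, and assumes user names without spaces):
find /home -name '*.ano' | awk -F/ '{print $(NF-2)}' | sort | uniq -c | awk '{print $2, $1}'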
Enjoy.
Have a look at wc (word count).
To count the number of *.ano files in a directory you can use
find "$dir" -iname '*.ano' | wc -l
If you want to do that for all directories in some directory, you can just use a for loop:
for dir in * ; do
echo "user $dir"
find "$dir" -iname '*.ano' | wc -l
done
Execute the bash-script below from folder
/home/user/Drive-backup/2010 Backup/2010 Account/Jan
and it will report the number of non-blank lines per user.
#!/bin/bash
#save where we start
base=$(pwd)
# get all top-level dirs, skip '.'
D=$(find . \( -type d ! -name . -prune \))
for d in $D; do
cd $base
cd $d
# search for all files named *.ano and count their non-blank lines
sum=$(find . -type f -name '*.ano' -exec grep -cv '^[[:space:]]*$' {} \; | awk '{sum+=$0}END{print sum}')
echo $d $sum
done
This might be what you want (untested): requires bash version 4 for associative arrays
declare -A count
cd /home/user/Drive-backup
for userdir in */*/*/*; do
username=${userdir##*/}
lines=$(grep -cv '^[[:space:]]*$' "$userdir"/user.dir/*.ano | awk -F: '{sum += $NF} END {print sum}')
(( count[$username] += lines ))
done
for user in "${!count[@]}"; do
echo $user ${count[$user]}
done
Here's yet another way of doing it (on Mac OS X 10.6):
find -x "$PWD" -type f -iname "*.ano" -exec bash -c '
ar=( "${#%/*}" ) # perform a "dirname" command on every array item
printf "%s\000" "${ar[#]%/*}" # do a second "dirname" and add a null byte to every array item
' arg0 '{}' + | sort -uz |
while IFS="" read -r -d '' userDir; do
# to-do: customize output to get example output needed
echo "$userDir"
basename "$userDir"
find -x "${userDir}" -type f -iname "*.ano" -print0 |
xargs -0 -n 500 grep -hcv '^[[:space:]]*$' | awk '{ s+=$0 } END { print s }'
#xargs -0 -n 500 grep -cv '^[[:space:]]*$' | awk -F: '{ s+=$NF } END { print s }'
printf '%s\n' '----------'
done
