Extracting the user with the most amount of files in a dir - linux

I am currently working on a script that should receive a standard input, and output the user with the highest amount of files in that directory.
I've wrote this so far:
#!/bin/bash
while read DIRNAME
do
ls -l $DIRNAME | awk 'NR>1 {print $4}' | uniq -c
done
and this is the output I get when I enter /etc for an instance:
26 root
1 dip
8 root
1 lp
35 root
2 shadow
81 root
1 dip
27 root
2 shadow
42 root
Now obviously the root folder is winning in this case, but I don't want only to output this, i also want to sum the number of files and output only the user with the highest amount of files.
Expected output for entering /etc:
root
is there a simple way to filter the output I get now, so that the user with the highest sum will be stored somehow?

ls -l /etc | awk 'BEGIN{FS=OFS=" "}{a[$4]+=1}END{ for (i in a) print a[i],i}' | sort -g -r | head -n 1 | cut -d' ' -f2
This snippet returns the group with the highest number of files in the /etc directory.
What it does:
ls -l /etc lists all the files in /etc in long form.
awk 'BEGIN{FS=OFS=" "}{a[$4]+=1}END{ for (i in a) print a[i],i}' sums the number of occurrences of unique words in the 4th column and prints the number followed by the word.
sort -g -r sorts the output descending based on numbers.
head -n 1 takes the first line
cut -d' ' -f2 takes the second column while the delimiter is a white space.
Note: In your question, you are saying that you want the user with the highest number of files, but in your code you are referring to the 4th column which is the group. My code follows your code and groups on the 4th column. If you wish to group by user and not group, change {a[$4]+=1} to {a[$3]+=1}.

Without unreliable parsing the output of ls:
read -r dirname
# List user owner of files in dirname
stat -c '%U' "$dirname/" |
# Sort the list of users by name
sort |
# Count occurrences of user
uniq -c |
# Sort by higher number of occurrences numerically
# (first column numerically reverse order)
sort -k1nr |
# Get first line only
head -n1 |
# Keep only starting at character 9 to get user name and discard counts
cut -c9-

I have an awk script to read standard input (or command line files) and sum up the unique names.
summer:
awk '
{ sum[ $2 ] += $1 }
END {
for ( v in sum ) {
print v, sum[v]
}
}
' "$#"
Let's say we are using your example of /etc:
ls -l /etc | summer
yields:
0
dip 2
shadow 4
root 219
lp 1
I like to keep utilities general so I can reuse them for other purposes. Now you can just use sort and head to get the maximum result output by summer:
ls -l /etc | summer | sort -r -k2,2 -n | head -1 | cut -f1 -d' '
Yields:
root

Related

How to only display owner of file when using ls command with special edge case

My objective is to find all files in a directory recursively and display only the file owner name so I'm able to use uniq to count the # of files a user owns in a directory. The command I am using is the following:
command = "find " + subdirectory.directoryPath + "/ -type f -exec ls -lh {} + | cut -f 3 -d' ' | sort | uniq -c | sort -n"
This command successfully displays only the owner of the file of each line, and allows me to count of the # of times the owner names is repeated, hence getting the # of files they own in a subdirectory. Cut uses ' ' as a delimiter and only keeps the 3rd column in ls, which is the owner of the file.
However, for my purpose there is this special edge case, where I'm not able to obtain the owner name if the following occurs.
-rw-r----- 1 31122918 group 20169510233 Mar 17 06:02
-rw-r----- 1 user1 group 20165884490 Mar 25 11:11
-rw-r----- 1 user1 group 20201669165 Mar 31 04:17
-rwxr-x--- 1 user3 group 20257297418 Jun 2 13:25
-rw-r----- 1 user2 group 20048291543 Mar 4 22:04
-rw-r----- 1 14235912 group 20398346003 Mar 10 04:47
The special edge cases are the #s as the owner you see above. The current command Im using can detect user1,user2,and user3 perfectly, but because the numbers are placed all the way to the right, the command above doesn't detect the numbers, and simply displays nothing. Example output is shown here:
1
1 user3
1 user2
1
2 user1
Can anyone help me parse the ls output so I'm able to detect these #'s when trying to only print the file owner column?
cut -d' ' won't capture the third field when it contains leading spaces -- each space is treated as the separator of another field.
Alternatives:
cut -c
123456789X123456789X123456789X123456789X123456789L0123456789X0123
-rw-r----- 1 31122918 group 20169510233 Mar 17 06:02
-rw-r----- 1 user1 group 20165884490 Mar 25 11:11
The data you seek is between characters 15 and 34 on each line, so you can say
cut -c14-39
perl/awk: other tools are adept at extracting data out of a line. Try one of
perl -lane 'print $F[2]'
awk '{print $3}'
Don't try to parse the output of ls. Use the stat command.
find dirname ! -user root -type f -exec stat --format=%U {} + | sort | uniq -c | sort -n
%U prints the owner username.
Merging multiple spaces
tr -s ' '
Get file users
ls -hl | tr -s ' ' | cut -f 3 -d' '
ls -hl | awk '{print $3}'
sudo find ./ ! -user root -type f -exec ls -lh {} + | tr -s ' ' | cut -f 3 -d' ' | sort | uniq -c | sort -n
You can use the below command to display only the owner of a directory or a file.
stat -c "%U" /path/of/the/file/or/directory
If you also want to print the group of a file or directory you can use %G as well.
stat -c "%U %G" /path/of/the/file/or/directory

Check values in file is greater or equal in bash script

I have file.txt include:
2
10
60
90
now how can i check if numbers in that file is equal on greater than 50 end then do something. Something in my case is sending an email this part i have.
I have tried do this with awk but it does not work in script.
The following command will output the greatest value of your file:
sort -nr file.txt | head -1
Then just compare it to the value of your choice and voilĂ . Something like:
if [ `sort -nr file.txt | head -1` -ge 50 ]
then
<do something>
fi
Explanation:
sort -n sorts the file as numbers (otherwise 12 would be considered greater than 100).
sort -r reverse the sort (by default it displays lower numbers first, with -r it displays higher first).
head -1 displays only the first output.
This will serve your job.
$ awk 'FNR > 0 { if($1 > 50) print $1 }' <file>

Finding the Number of strings in a File

I'm trying to write a very small program that will check the number of sub strings in a large text file. All it will do is count the first 2000 lines of the text file, find any "TTT" sub-strings, count them, and set a variable to that total. I'm a bit new to shell, so any help would be amazingly appreciated!
#!/bin/bash
$counter=(head -2000 [file name] | grep TTT | grep -o TTT | wc -l)
echo $counter
For what it's worth you might awk better suited for this task:
awk -F"ttt" '{j=(NF-1)+j}END{print j}' filename
This will split each record in your file by delimiter "ttt". Then it counts the number of fields, subtracts one, and adds that to the total.
A file like:
ttt tttttt something
1 5 ttt
tt
one more ttt record
Would be split (visualizing with pipe delim) like:
| || something
1 5 |
tt
one more | record
Counting the number of fields per record:
4
2
1
2
Subtracting one from that:
3
1
0
1
Which totals to 5, which is how many "ttt" substrings are present.
To incorporate this into your script (and fixing your other issue):
#!/bin/bash
counter=$(awk -F"ttt" '{j=(NF-1)+j}END{print j}' filename)
echo $counter
The change here is that when we set a variable in Bash we don't include the $ sign at the front. Only in referencing the variable do we include the $.
You have some minor syntax errors there, probably you meant this:
counter=$(head -2000 [file name] | grep TTT | grep -o TTT | wc -l)
echo $counter
Notice the tiny changes I made there to make it work.
Btw the grep TTT in the middle is redundant, you can simply drop it, that is:
counter=$(head -2000 [file name] | grep -o TTT | wc -l)
grep can already do what you want: counter=$(grep -c TTT $infile). You can limit the number of hits (not lines) with -m NUM, --max-count=NUM, which makes grep stop at the end of the file OR when NUM occurrences are found.

Obtaining the total of coincidences with multiple pattern using grep command

I have a file in Linux contains strings:
CALLTMA
Starting
Starting
Ending
Starting
Ending
Ending
CALLTMA
Ending
I need the quantity of any string (FE. #Ending, # Starting, #CALLTMA). In my example I need obtaining:
CALLTMA : 2
Starting: 3
Ending : 4
I can obtaining this output when I execute 3 commands:
grep -i "Starting" "/myfile.txt" | wc -l
grep -i "Ending" "/myfile.txt" | wc -l
grep -i "CALLTMA" "/myfile.txt" | wc -l
I want to know if it is possible to obtain the same output using only one command.
I try running this command
grep -iE "CALLTMA|Starting|Ending" "/myfile.txt" | wc -l
But this returned the total of coincidences. I appreciate your help .
Use sort and uniq:
sort myfile.txt | uniq -c
The -c adds the counts to the unique lines. If you want to sort the output by frequency, add
| sort -n
to the end (and change to -nr if you want the descending order).
A simple awk way to handle this:
awk '{counts[$1]++} END{for (c in counts) print c, counts[c]}' file
Starting 3
Ending 4
CALLTMA 2
grep -c will work. You can put it all together in a short script:
for i in Starting CALLTMA Ending; do
printf "%-8s : %d\n" "$i" $(grep -c "$i" file.txt)
done
(to enter the search terms as arguments, just use the arguments array for the loop list, e.g. for i in "$#"; do)
Output
Starting : 3
CALLTMA : 2
Ending : 4

Find unique lines

How can I find the unique lines and remove all duplicates from a file?
My input file is
1
1
2
3
5
5
7
7
I would like the result to be:
2
3
sort file | uniq will not do the job. Will show all values 1 time
uniq has the option you need:
-u, --unique
only print unique lines
$ cat file.txt
1
1
2
3
5
5
7
7
$ uniq -u file.txt
2
3
Use as follows:
sort < filea | uniq > fileb
You could also print out the unique value in "file" using the cat command by piping to sort and uniq
cat file | sort | uniq -u
While sort takes O(n log(n)) time, I prefer using
awk '!seen[$0]++'
awk '!seen[$0]++' is an abbreviation for awk '!seen[$0]++ {print}', print line(=$0) if seen[$0] is not zero.
It take more space but only O(n) time.
I find this easier.
sort -u input_filename > output_filename
-u stands for unique.
you can use:
sort data.txt| uniq -u
this sort data and filter by unique values
uniq -u has been driving me crazy because it did not work.
So instead of that, if you have python (most Linux distros and servers already have it):
Assuming you have the data file in notUnique.txt
#Python
#Assuming file has data on different lines
#Otherwise fix split() accordingly.
uniqueData = []
fileData = open('notUnique.txt').read().split('\n')
for i in fileData:
if i.strip()!='':
uniqueData.append(i)
print uniqueData
###Another option (less keystrokes):
set(open('notUnique.txt').read().split('\n'))
Note that due to empty lines, the final set may contain '' or only-space strings. You can remove that later. Or just get away with copying from the terminal ;)
#
Just FYI, From the uniq Man page:
"Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'."
One of the correct ways, to invoke with:
#
sort nonUnique.txt | uniq
Example run:
$ cat x
3
1
2
2
2
3
1
3
$ uniq x
3
1
2
3
1
3
$ uniq -u x
3
1
3
1
3
$ sort x | uniq
1
2
3
Spaces might be printed, so be prepared!
uniq -u < file will do the job.
uniq should do fine if you're file is/can be sorted, if you can't sort the file for some reason you can use awk:
awk '{a[$0]++}END{for(i in a)if(a[i]<2)print i}'
sort -d "file name" | uniq -u
this worked for me for a similar one. Use this if it is not arranged.
You can remove sort if it is arranged
This was the first i tried
skilla:~# uniq -u all.sorted
76679787
76679787
76794979
76794979
76869286
76869286
......
After doing a cat -e all.sorted
skilla:~# cat -e all.sorted
$
76679787$
76679787 $
76701427$
76701427$
76794979$
76794979 $
76869286$
76869286 $
Every second line has a trailing space :(
After removing all trailing spaces it worked!
thank you
Instead of sorting and then using uniq, you could also just use sort -u. From sort --help:
-u, --unique with -c, check for strict ordering;
without -c, output only the first of an equal run

Resources