Getting the total size of a directory as a number with du

Using the command du, I would like to get the total size of a directory.
Output of the command du -c myfolder:
5454 kkkkk
666 aaaaa
3456788 total
I'm able to extract the last line, but not to remove the string total:
du -c myfolder | grep total | cut -d ' ' -f 1
Results in:
3456788 total
Desired result
3456788
I would like to have all the command in one line.

That's probably because the output is tab-delimited (and tab is the default delimiter of cut):
~$ du -c foo | grep total | cut -f1
4
~$ du -c foo | grep total | cut -d' ' -f1
4
The delimiter in the second command is a literal tab, not a space; to insert one at the shell prompt, type Ctrl+V, then Tab.
Alternatively, you could use awk to print the first field of the line ending with total:
~$ du -c foo | awk '/total$/{print $1}'
4

First off, you probably want to use tail -n1 instead of grep total ... Consider what happens if you have a directory named total! :-)
Now, let's look at the output of du with hexdump:
$ du -c tmp | tail -n1 | hexdump -C
00000000 31 34 30 33 34 34 4b 09 74 6f 74 61 6c 0a |140344K.total.|
That's the character 0x09 after the K; man ascii tells us:
011   9     09    HT  '\t' (horizontal tab)
It's a tab, not a space :-)
The tab character is already the default delimiter (this is specified in the POSIX spec, so you can safely rely on it), so you don't need -d at all.
So, putting that together, we end up with:
$ du -c tmp | tail -n1 | cut -f1
140344K

Why don't you use -s to summarize it? This way you don't have to grep "total", etc.
$ du .
24 ./aa/bb
...
# many lines
...
2332 .
$ du -hs .
2.3M .
Then, to get just the value, pipe to awk. This way you don't have to worry about the delimiter being a space or a tab:
du -s myfolder | awk '{print $1}'
From man du:
-h, --human-readable
print sizes in human readable format (e.g., 1K 234M 2G)
-s, --summarize
display only a total for each argument
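Combining the two options gives the human-readable total in one go (the 2.3M below is just the illustrative value from above):
$ du -hs myfolder | awk '{print $1}'
2.3M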

I would suggest using awk for this:
value=$(du -c myfolder | awk '/total/{print $1}')
This simply extracts the first field of the line that matches the pattern "total".
If it is always the last line that you're interested in, an alternative would be to use this:
value=$(du -c myfolder | awk 'END{print $1}')
The values of the fields in the last line are accessible in the END block, so you can get the first field of the last line this way.
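A quick illustration of that END-block behavior, using printf to fake two input records:
$ printf '1 a\n2 b\n' | awk 'END{print $1}'
2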

Related

Grepping inside a log with a threshold on the result

I have certain tags stored in a series of .log files, and I would like grep to show me only the values > 31, meaning different from 0 and higher than 31.
I have this code:
#! /bin/bash
-exec grep -cFH "tag" {} \; | grep -v ':[0-31]$' >> file.txt
echo < file.txt
Output:
I have this result from the grep:
/opt/logs/folder2.log:31
I was expecting it to return nothing when the result is 31 or less, but it still shows the result 31.
I have also tried adding:
| tail -n +31
but that didn't work.
[0-31] is a bracket expression matching a single character; it means "0 or 1 or 2 or 3 or 1", not the range 0 through 31.
To drop all lines with 0-9, 10-19, 20-29, 30, and 31, you could use the following:
... | grep -ve ':[0-9]$' -e ':[12][0-9]$' -e ':3[01]$'
or as single regex:
... | grep -v ':\([12]\?[0-9]\|3[01]\)$'
With extended grep:
... | grep -vE ':([12]?[0-9]|3[01])$'
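If the counts can exceed two digits, a numeric comparison is more robust than enumerating digit patterns; a sketch with awk, assuming the count is the last colon-separated field:
... | awk -F: '$NF > 31'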

Shell script to convert, trim and make it a single line

I have a command
pdftotext -f 3 -l 3 -x 205 -y 40 -W 180 -H 75 -layout input.pdf -
When run it produces output as below
[[_थी] 2206255388
नाव मीराबाई sad
पतीचे नाव dame
| घर क्रमांक Photo's |
|वय 51 लिंग महिला Available |
I need each line enclosed in double quotes and then joined into a single line separated by commas. How can I do that with a shell command?
As an example, you could modify the output of your command like that:
cat <<EOF | sed 's/\(.*\)/\"\1\"/g' | tr '\n' ',' | sed 's/.$//'
> foobar
> bar
> foo
> EOF
"foobar","bar","foo"
The first sed adds the double quotes, tr replaces each newline with a comma, and the last sed removes the trailing comma.
So, your command will be:
pdftotext -f 3 -l 3 -x 205 -y 40 -W 180 -H 75 -layout input.pdf - | sed 's/\(.*\)/\"\1\"/g' | tr '\n' ',' | sed 's/.$//'
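Equivalently, paste can do the joining in one step, since it only inserts the delimiter between lines and so leaves no trailing comma to clean up (a sketch, assuming GNU or BSD paste):
... | sed 's/.*/"&"/' | paste -s -d ',' -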

Extracting the user with the most amount of files in a dir

I am currently working on a script that reads a directory name from standard input and outputs the user with the highest number of files in that directory.
I've written this so far:
#!/bin/bash
while read DIRNAME
do
    ls -l "$DIRNAME" | awk 'NR>1 {print $4}' | uniq -c
done
and this is the output I get when I enter /etc for an instance:
26 root
1 dip
8 root
1 lp
35 root
2 shadow
81 root
1 dip
27 root
2 shadow
42 root
Now obviously root is winning in this case, but I don't want to output only this; I also want to sum the number of files and output only the user with the highest total.
Expected output for entering /etc:
root
Is there a simple way to filter the output I get now, so that the user with the highest sum is stored somehow?
ls -l /etc | awk 'BEGIN{FS=OFS=" "}{a[$4]+=1}END{ for (i in a) print a[i],i}' | sort -g -r | head -n 1 | cut -d' ' -f2
This snippet returns the group with the highest number of files in the /etc directory.
What it does:
ls -l /etc lists all the files in /etc in long form.
awk 'BEGIN{FS=OFS=" "}{a[$4]+=1}END{ for (i in a) print a[i],i}' sums the number of occurrences of unique words in the 4th column and prints the number followed by the word.
sort -g -r sorts the output in descending numerical order.
head -n 1 takes the first line.
cut -d' ' -f2 takes the second column, using a space as the delimiter.
Note: In your question, you are saying that you want the user with the highest number of files, but in your code you are referring to the 4th column which is the group. My code follows your code and groups on the 4th column. If you wish to group by user and not group, change {a[$4]+=1} to {a[$3]+=1}.
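Applied to the full pipeline, that change looks like this:
ls -l /etc | awk 'BEGIN{FS=OFS=" "}{a[$3]+=1}END{ for (i in a) print a[i],i}' | sort -g -r | head -n 1 | cut -d' ' -f2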
Without unreliably parsing the output of ls:
read -r dirname
# List user owner of files in dirname
stat -c '%U' "$dirname"/* |
# Sort the list of users by name
sort |
# Count occurrences of user
uniq -c |
# Sort by higher number of occurrences numerically
# (first column numerically reverse order)
sort -k1nr |
# Get first line only
head -n1 |
# Keep only starting at character 9 to get user name and discard counts
cut -c9-
I have an awk script that reads standard input (or command-line files) and sums the counts for each unique name.
summer:
awk '
    { sum[ $2 ] += $1 }
    END {
        for ( v in sum ) {
            print v, sum[v]
        }
    }
' "$@"
Let's say we are using your example of /etc:
ls -l /etc | awk 'NR>1 {print $4}' | uniq -c | summer
yields:
dip 2
shadow 4
root 219
lp 1
I like to keep utilities general so I can reuse them for other purposes. Now you can just use sort and head to get the maximum result output by summer:
ls -l /etc | awk 'NR>1 {print $4}' | uniq -c | summer | sort -r -k2,2 -n | head -1 | cut -f1 -d' '
Yields:
root
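For completeness, GNU find can print the owner directly, which avoids parsing ls at all (a sketch; -mindepth, -maxdepth and -printf are GNU extensions):
find /etc -mindepth 1 -maxdepth 1 -printf '%u\n' | sort | uniq -c | sort -k1nr | head -n1 | cut -c9-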

How to count all numbers in a file with awk?

I want to sum all the numbers that are in a file.
Example:
input -> Hi, this is 25 ...
input -> Lalala 21 or 29 what is ... 79?
The output should be the sum of all numbers: 154 (that is, 25+21+29+79).
From this beautiful answer by hek2mgl on how to extract the biggest number in a file, let's catch all the numbers in the file and sum them:
$ awk '{for(i=1;i<=NF;i++){sum+=$i}}END{print sum}' RS='$' FPAT='-{0,1}[0-9]+' file
154
This sets the record separator so that the whole block of text is a single record. Then, it sets FPAT so that every number (positive or negative) becomes a separate field:
FPAT #
A regular expression (as a string) that tells gawk to create the
fields based on text that matches the regular expression. Assigning a
value to FPAT overrides the use of FS and FIELDWIDTHS for field
splitting.
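FPAT is a gawk extension; with any POSIX awk you can get the same sum (ignoring negative signs) by turning every run of non-digits into a field separator, roughly:
$ awk '{gsub(/[^0-9]+/," "); for(i=1;i<=NF;i++) sum+=$i} END{print sum}' data
154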
With grep, paste and bc:
$ cat data
Hi, this is 25 ...
Lalala 21 or 29 what is ... 79?
$ grep -oP '\b\d+\b' data | paste -s -d '+' | bc
154
With grep and awk:
$ cat test.txt
Hi, this is 25 ...
Lalala 21 or 29 what is ... 79?
$ grep '[0-9]\+' -o test.txt | awk '{ sum+=$1} END {print sum}'
154

tr "[1-9]" "['01'-'09']" not working properly

I'm trying to cut only the date part from the output of ls -lrth | grep TRACK:
-rw-r--r-- 1 ins ins 0 Dec 3 00:00 TRACK_1_20121203_01010014.LOG
-rw-r--r-- 1 ins ins 0 Dec 3 00:00 TRACK_0_20121203_01010014.LOG
-rw-r--r-- 1 ins ins 0 Dec 13 15:10 TRACK_9_20121213_01010014.LOG
-rw-r--r-- 1 ins ins 0 Dec 13 15:10 TRACK_8_20121213_01010014.LOG
But, doing this:
ls -lrth | grep TRACK | tr "\t" " " | cut -d" " -f 9
only gives me the dates that are double digits, plus blanks for the single-digit ones:
13
13
So I tried something with the tr command, to translate all single-digit dates to double digits:
ls -lrth | grep TRACK | tr "\t" " " | tr "[1-9]" "['01'-'09']" | cut -d" " -f 9
But it's giving some weird results and evidently doesn't serve my purpose. Any ideas on how to get the correct output?
Don't parse ls output.
ls is a tool for interactively looking at file information. Its output is formatted for humans and will cause bugs in scripts. Use globs or find instead. Understand why: http://mywiki.wooledge.org/ParsingLs
I recommend this way:
If you want the modification time and the file path:
find . -name 'TRACK*' -printf '%t %p\n'
If you want only the date, a format like this works:
find . -name 'TRACK*' -printf '%TY-%Tm-%Td\n'
You could try another approach with something like
find . -name 'TRACK*' -exec stat -c %y {} \; | sort
You can add something like | cut -f1 -d' ' if you only need the date.
I guess this does suffice:
ls -lhrt | grep TRACK | awk '{print $6, $7, $8}'
That kind of substitution would be better handled through sed:
ls -lrth | grep TRACK | sed 's/ \+/ /g;s/ \([0-9]\) / 0\1 /g' | cut -d" " -f 7
As already said, never parse the output of ls!
Since you only want the modification time, the date command has a handy option for that: -r (see man date for more info).
Hence, you probably want this instead of your line:
for i in TRACK*; do date -r "$i"; done
I don't know how you want the format of the date, so play with the options, e.g.,
for i in TRACK*; do date -r "$i" "+%D"; done
(the formats are in man date).
Use stat to get information about a file.
Also, tr only does one-to-one character translation. It won't replace one-character sequences with two-character ones.
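For example, GNU stat prints the full modification timestamp with %y, and the date part is then trivial to cut (a sketch; the dates shown are those from the listing above):
$ stat -c '%y' TRACK_*.LOG | cut -d' ' -f1
2012-12-03
2012-12-03
2012-12-13
2012-12-13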
