How to perform the reverse of `xargs`?

I have a list of numbers that I want to reverse.
They are already sorted.
35 53 102 342
I want this:
342 102 53 35
So I thought of this:
echo $NUMBERS | ??? | tac | xargs
What's the ???
It should turn a space separated list into a line separated list.
I'd like to avoid having to set IFS.
Maybe I can use bash arrays, but I was hoping there's a command whose purpose in life is to do the opposite of xargs (maybe xargs is more than a one-trick pony as well!)

You can use printf for that. For example:
$ printf "%s\n" 35 53 102 342
35
53
102
342
$ printf "%s\n" 35 53 102 342|tac
342
102
53
35
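And to get back a single space-separated line, as in the question:
$ printf "%s\n" 35 53 102 342 | tac | xargs
342 102 53 35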

Another answer (easy to remember but not as fast as the printf method):
$ xargs -n 1 echo
e.g.
$ NUMBERS="35 53 102 342"
$ echo $NUMBERS | xargs -n 1 echo | tac | xargs
342 102 53 35
Here is the xargs manual entry for the -n option:
-n number
    Set the maximum number of arguments taken from standard input for
    each invocation of utility.  An invocation of utility will use less
    than number standard input arguments if the number of bytes
    accumulated (see the -s option) exceeds the specified size or there
    are fewer than number arguments remaining for the last invocation
    of utility.  The current default value for number is 5000.
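Since echo is the default utility in most xargs implementations (GNU and BSD included), the middle step can be shortened:
$ echo $NUMBERS | xargs -n 1 | tac | xargs
342 102 53 35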

An awk one-liner without tac:
awk '{NF++;while(NF-->1)print $NF}'
For example:
kent$ echo "35 53 102 342"|awk '{NF++;while(NF-->1)print $NF}'
342
102
53
35
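The same one-liner with the logic spelled out (identical behavior):
awk '{
    NF++                # add one extra empty field so the loop pre-test starts past the real last field
    while (NF-- > 1)    # test NF > 1, then decrement: $NF walks from the last real field down to $1
        print $NF
}'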

Another option is to use Bash string manipulation:
$ numbers="35 53 102 342"
$ echo "${numbers// /$'\n'}"
35
53
102
342
$ echo "${numbers// /$'\n'}" | tac
342
102
53
35
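And joined back into one line:
$ echo "${numbers// /$'\n'}" | tac | xargs
342 102 53 35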

Well, you could write:
echo $(printf '%s\n' $NUMBERS | tac)
where printf '%s\n' ... prints each of its arguments with a newline after it, and $( ... ) is command substitution, a built-in feature that makes xargs almost superfluous.
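For example, with the numbers from the question:
$ NUMBERS="35 53 102 342"
$ echo $(printf '%s\n' $NUMBERS | tac)
342 102 53 35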
However, I don't think you should avoid using arrays, IFS, and so on; they make scripts more robust in the face of bugs and/or unexpected input.

There are a lot of answers using tac, but in case you'd like to use sort, it's almost the same:
printf "%s\n" 1 2 3 4 5 10 12 | sort -rn
The n is important, as it makes the sort numeric; r reverses the order.
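Which prints:
12
10
5
4
3
2
1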

If you have sorted your list with sort, you might consider its -r reverse option.
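For example, with the numbers from the question:
$ printf "%s\n" 35 53 102 342 | sort -rn | xargs
342 102 53 35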

Another way to change spaces into newlines and the other way round is with tr:
echo 35 53 102 342|tr ' ' '\n'|tac|tr '\n' ' '
If data is not sorted, replace tac by sort -rn.
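Note that the final tr leaves a trailing space and no terminating newline; using xargs as the last stage avoids both:
echo 35 53 102 342|tr ' ' '\n'|tac|xargs
342 102 53 35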

Related

Merge every two rows into one and sum multiple entries

I am struggling a bit with the output, as I need to merge every second row with the first, sort, and add up all the duplicate entries.
sample output:
bittorrent_block(PCC)
127
default_384k(PCC)
28
default_384k(BWM)
28
bittorrent_block(PCC)
127
default_384k(PCC)
28
default_384k(BWM)
28
Convert every 2nd row into a column (expected):
bittorrent_block(PCC): 127
default_384k(PCC): 28
default_384k(BWM): 28
bittorrent_block(PCC): 127
default_384k(PCC): 28
default_384k(BWM): 28
Sum all duplicate entries (expected):
bittorrent_block(PCC): 254
default_384k(PCC): 56
default_384k(BWM): 56
These are the pieces of code I tried, and what I finally got:
zcat file.tar.gz | awk 'NR%2{v=$0;next;}{print $0,v}'
bittorrent_block(PCC)
default_384k(PCC)
default_384k(BWM)
default_mk1(PCC)
default_mk1_10m(PCC)
zcat file.tar.gz | awk 'NR%2{ prev = $0; next }{ print prev, $0 }'
127orrent_block(PCC)
28ault_384k(PCC)
28ault_384k(BWM)
Due to this, I am not able to sum up the duplicate values.
Please help.
I often find it easier to transform the input first and then process it. paste helps to convert consecutive lines into columns; then summing the numbers with awk becomes trivial:
$ <input paste -sd'\t\n' | awk '{sum[$1] += $2}END{for(s in sum) print s": "sum[s]}'
bittorrent_block(PCC): 254
default_384k(PCC): 56
default_384k(BWM): 56
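To see what the paste stage alone produces (with the sample input from the question):
$ <input paste -sd'\t\n'
bittorrent_block(PCC)	127
default_384k(PCC)	28
default_384k(BWM)	28
bittorrent_block(PCC)	127
default_384k(PCC)	28
default_384k(BWM)	28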
It seems like you have CRLF line endings in your file, so you'll have to strip them. Setting the field separator to \r makes $1 the line without its carriage return:
zcat file.tar.gz |
awk -F '\r' -v OFS=': ' '
NR % 2 { id = $1; next }
{ sum[id] += $1 }
END { for (id in sum) print id, sum[id] }
'
bittorrent_block(PCC): 254
default_384k(PCC): 56
default_384k(BWM): 56
Here is a Ruby one-liner to do that:
zcat file | ruby -e '$<.read.split(/\R/).
each_slice(2).
each_with_object(Hash.new {|h,k| h[k] = 0}) {
|(k,v), h| h[k] = h[k]+v.to_i
}.
each{|k,v| puts "#{k}: #{v}"}'
By splitting on \R this automatically handles either DOS or Unix line endings.
This might work for you (GNU sed, sort, and bash):
zcat file |
paste - - |
sort |
uniq -c |
sed -E 's/^ *(\S+) (.*)\t(\S+)/echo "\2 $((\1*\3))"/e'
Decompress file.
Join pairs of lines.
Sort.
Count duplicate lines.
Format and compute final sums.
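A sketch of the intermediate data the sed line consumes, assuming the sample input from the question (uniq -c prepends the count of each duplicated line):
$ zcat file | paste - - | sort | uniq -c
      2 bittorrent_block(PCC)	127
      2 default_384k(BWM)	28
      2 default_384k(PCC)	28
The sed command then rewrites each counted line into a shell command such as echo "bittorrent_block(PCC) $((2*127))", which the GNU-only e flag executes.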

Even after `sort`, `uniq` is still repeating some values

Reference file: http://snap.stanford.edu/data/wiki-Vote.txt.gz
(It is a gzip archive that contains a file called Wiki-Vote.txt)
The first few lines of the file contain the following (head -n 10 Wiki-Vote.txt):
# Directed graph (each unordered pair of nodes is saved once): Wiki-Vote.txt
# Wikipedia voting on promotion to administratorship (till January 2008).
# Directed edge A->B means user A voted on B becoming Wikipedia administrator.
# Nodes: 7115 Edges: 103689
# FromNodeId ToNodeId
30 1412
30 3352
30 5254
30 5543
30 7478
3 28
I want to find the number of nodes in the graph (although it's already given in the header). I ran the following command:
awk '!/^#/ { print $1; print $2; }' Wiki-Vote.txt | sort | uniq | wc -l
Explanation:
/^#/ matches all the lines that start with #, and !/^#/ matches those that don't.
awk '!/^#/ { print $1; print $2; }' Wiki-Vote.txt prints the first and second columns of all matched lines, each value on its own line.
| sort sorts the output.
| uniq should reduce the sorted output to unique values, but it doesn't.
| wc -l counts the resulting lines, and the count is wrong.
The result of the above command is 8491, which is not 7115 (as given in the header). I don't know why uniq repeats values. I can tell that it does, since awk '!/^#/ { print $1; print $2; }' Wiki-Vote.txt | sort -i | uniq | tail returns:
992
993
993
994
994
995
996
998
999
999
Which contains the repeated values. Please run the code and tell me that I am not the only one getting the wrong answer, and help me figure out why I'm getting what I am getting.
The file has DOS line endings: each line ends with a CR (\r) character before the newline.
You can inspect your tail output for example with hexdump -C, lines starting with # added by me:
$ awk '!/^#/ { print $1; print $2; }' ./wiki-Vote.txt | sort | uniq | tail | hexdump -C
00000000 39 39 32 0a 39 39 33 0a 39 39 33 0d 0a 39 39 34 |992.993.993..994|
# ^^ HERE
00000010 0a 39 39 34 0d 0a 39 39 35 0d 0a 39 39 36 0a 39 |.994..995..996.9|
# ^^ ^^
00000020 39 38 0a 39 39 39 0a 39 39 39 0d 0a |98.999.999..|
# ^^
0000002c
Because uniq sees distinct lines, one with a CR and one without, they are not removed. Remove the CR characters before piping. Note that sort | uniq is better written as sort -u.
$ awk '!/^#/ { print $1; print $2; }' ./wiki-Vote.txt | tr -d '\r' | sort -u | wc -l
7115
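To confirm the stray carriage returns up front, a quick check (counts the lines that contain a CR; a non-zero count means DOS line endings are present):
$ grep -c $'\r' wiki-Vote.txt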

Sum all the numbers in a file given by positional parameter

I want to sum all the numbers in a file (columns and lines) given by the first parameter, but my program shows sum=sum+$i instead of the numeric sum:
sum=0;
file=$1
for i in $file
do
sum=sum+$i;
done;
echo "The sum is: " $sum
Input file:
$cat file.txt
10 20 10
40
50
Expected output:
The sum is: 130
Maybe if there is an awk method to solve this?
Try this -
$cat file1.txt
10 20 10
40
50
$awk '{for(i=1;i<=NF;i++) {sum+=$i}} END {print sum}' file1.txt
130
OR
$xargs < file1.txt| tr ' ' + | bc
130
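The intermediate stages make the trick clear (same file1.txt): xargs joins all lines into one, tr turns the spaces into plus signs, and bc evaluates the expression.
$ xargs < file1.txt
10 20 10 40 50
$ xargs < file1.txt | tr ' ' +
10+20+10+40+50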
cat file.txt | xargs | sed -e 's/\ /+/g' | bc
You can also use a simple read and an array to sum the values, relying on word splitting to separate the values into an array via the default IFS (Internal Field Separator), e.g.
#!/bin/bash
declare -i sum=0
fn="${1:-/dev/stdin}" ## read from file as 1st argument (default stdin)
while read -r line; do ## read each line
a=( $line ) ## separate values into array
for i in ${a[@]}; do ## for each value in array
((sum += i)) ## add to sum
done
done <"$fn"
echo "sum: $sum"
Example Input File
$ cat dat/numfile.txt
10 20 10
40
50
Example Use/Output
$ bash sumnumfile.sh dat/numfile.txt
sum: 130
Another one for some awks (at least mawk and gawk), treating every non-digit character as the record separator so each number becomes its own record:
$ awk -v RS="[^0-9]" '{s+=$1}END{print s}' file
130

Insert a space after the second character followed by every three characters

I need to insert a space after two characters, followed by a space after every three characters.
Data:
97100101101102101
Expected Output:
97 100 101 101 102 101
Attempted Code:
sed 's/.\{2\}/& /3g'
In two steps:
$ sed -r -e 's/^.{2}/& /' -e 's/[^ ]{3}/& /g' <<< 97100101101102101
97 100 101 101 102 101
That is:
's/^.{2}/& /'
catch the first two chars in the line and print them back with a space after.
's/[^ ]{3}/& /g'
catch three consecutive non-space characters and print them back followed by a space.
With GNU awk:
$ echo '97100101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/," &","g",substr($0,3))}'
97 100 101 101 102 101
Note that unlike the currently accepted sed solution this will not add a blank char to the end of the line, e.g. using _ instead of a blank to make the issue visible:
$ echo '97100101101102101' | sed -r -e 's/^.{2}/&_/' -e 's/[^_]{3}/&_/g'
97_100_101_101_102_101_
$ echo '97100101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/,"_&","g",substr($0,3))}'
97_100_101_101_102_101
and it would work even if the input contained blank chars:
$ echo '971 0101101102101' | sed -r -e 's/^.{2}/& /' -e 's/[^ ]{3}/& /g'
97 1 010 110 110 210 1
$ echo '971 0101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/," &","g",substr($0,3))}'
97 1 0 101 101 102 101

Getting the total size of a directory as a number with du

Using the command du, I would like to get the total size of a directory
Output of command du myfolder:
5454 kkkkk
666 aaaaa
3456788 total
I'm able to extract the last line, but not to remove the string total:
du -c myfolder | grep total | cut -d ' ' -f 1
Results in:
3456788 total
Desired result
3456788
I would like to have all the command in one line.
That's probably because it's tab delimited (which is the default delimiter of cut):
~$ du -c foo | grep total | cut -f1
4
~$ du -c foo | grep total | cut -d' ' -f1
4
To insert a literal tab on the command line, use Ctrl+v, then Tab.
Alternatively, you could use awk to print the first field of the line ending with total:
~$ du -c foo | awk '/total$/{print $1}'
4
First off, you probably want to use tail -n1 instead of grep total ... Consider what happens if you have a directory named local? :-)
Now, let's look at the output of du with hexdump:
$ du -c tmp | tail -n1 | hexdump -C
00000000 31 34 30 33 34 34 4b 09 74 6f 74 61 6c 0a |140344K.total.|
That's the character 0x09 after the K; man ascii tells us:
011 9 09 HT '\t' (horizontal tab) 111 73 49 I
It's a tab, not a space :-)
The tab character is already the default delimiter (this is specified in the POSIX spec, so you can safely rely on it), so you don't need -d at all.
So, putting that together, we end up with:
$ du -c tmp | tail -n1 | cut -f1
140344K
Why don't you use -s to summarize it? This way you don't have to grep "total", etc.
$ du .
24 ./aa/bb
...
# many lines
...
2332 .
$ du -hs .
2.3M .
Then, to get just the value, pipe to awk. This way you don't have to worry about the delimiter being a space or a tab:
du -s myfolder | awk '{print $1}'
From man du:
-h, --human-readable
print sizes in human readable format (e.g., 1K 234M 2G)
-s, --summarize
display only a total for each argument
I would suggest using awk for this:
value=$(du -c myfolder | awk '/total/{print $1}')
This simply extracts the first field of the line that matches the pattern "total".
If it is always the last line that you're interested in, an alternative would be to use this:
value=$(du -c myfolder | awk 'END{print $1}')
The values of the fields in the last line are accessible in the END block, so you can get the first field of the last line this way.
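For example, with the du output from the question:
$ value=$(du -c myfolder | awk 'END{print $1}')
$ echo "$value"
3456788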
