Linux awk sort descending order not working

I have two files that I need to sort.
The command I'm using is:
cat first-in.txt | awk '{print $2}' | cut -d '/' -f 3 | cut -d '^' -f 1 | sort -b -t . -k 1,1nr -k 2,2nr -k 3,3r -k 4,4r -k 5,5r | uniq > first-out.txt
cat second-in.txt | awk '{print $2}' | cut -d '/' -f 3 | cut -d '^' -f 1 | sort -b -t . -k 1,1nr -k 2,2nr -k 3,3r -k 4,4r -k 5,5r | uniq > second-out.txt
The issue is: I need the output sorted in descending order, but right now only the second file sorts correctly; the first does not.
I would like to know what mistake I am making.
Files
All files, including the output, are here.
Thanks in advance.

I guess you mean this is wrong:
4.2.4
4.2.3
4.2.20
4.2.2
You want 4.2.20 to be higher than all of those, right?
You can fix that by changing the -k parameters of sort to treat all fields as numeric:
.... -k 1,1nr -k 2,2nr -k 3,3nr -k 4,4nr -k 5,5nr ....
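A quick sanity check of the corrected keys against the sample versions above (the data is fed in via printf here for illustration, rather than from the question's files):
$ printf '4.2.4\n4.2.3\n4.2.20\n4.2.2\n' | sort -b -t . -k 1,1nr -k 2,2nr -k 3,3nr
4.2.20
4.2.4
4.2.3
4.2.2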

On a GNU/Linux system you could use sort with the -V option:
sed -r 's|.*/([^/^]*).*$|\1|' infile | sort -Vr
Note that neither sed -r nor sort -V is standard.
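For example, with the same sample versions as above (GNU sort assumed):
$ printf '4.2.4\n4.2.3\n4.2.20\n4.2.2\n' | sort -Vr
4.2.20
4.2.4
4.2.3
4.2.2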

Related

Why does uniq -c command return duplicates in some cases?

I am trying to grep for words in a file that are not present in another file:
grep -v -w -i -r -f "dont_use_words.txt" "list_of_words.txt" >> inverse_match_words.txt
uniq -c -i inverse_match_words.txt | sort -nr
But I get duplicate values from the uniq command. Why is that?
I am wondering if it might be because grep differentiates between strings, say, "AAA" found in "GIRLAAA", "AAABOY", "GIRLAAABOY" and therefore, I end up with duplicates.
When I do a grep -F "AAA" all of them are returned though.
I'd appreciate if someone could help me out on this. I am new to Linux OS.
uniq eliminates all but one line in each group of consecutive duplicate lines. The conventional way to use it, therefore, is to pass the input through sort first. You're not doing that, so yes, it is entirely possible that (non-consecutive) duplicates will remain in the output.
Example:
grep -v -w -i -f dont_use_words.txt list_of_words.txt \
| sort -f \
| uniq -c -i \
| sort -nr

Why don't bc and xargs work together in one line?

I need help using xargs(1) and bc(1) in the same line. I can do it in multiple lines, but I really want to find a one-line solution.
Here is the problem: the following line will print the size of file.txt
ls -l file.txt | cut -d" " -f5
And, the following line will print 1450 (which is obviously 1500 - 50)
echo '1500-50' | bc
Trying to add those two together, I do this:
ls -l file.txt | cut -d" " -f5 | xargs -0 -I {} echo '{}-50' | bc
The problem is, it's not working! :)
I know that xargs is probably not the right command to use, but it's the only command I found that lets me decide where to put the argument coming from the pipe.
This is not the first time I've had issues with this kind of problem, so any help would be much appreciated.
Thanks
If you do
ls -l file.txt | cut -d" " -f5 | xargs -0 -I {} echo '{}-50'
you will see this output:
23
-50
This means that bc does not see a complete expression.
Just use -n 1 instead of -0:
ls -l file.txt | cut -d" " -f5 | xargs -n 1 -I {} echo '{}-50'
and you get
23-50
which bc will process happily:
ls -l file.txt | cut -d" " -f5 | xargs -n 1 -I {} echo '{}-50' | bc
-27
So your basic problem is that -0 expects \0-terminated strings, not lines. Hence the newline(s) left by the previous commands in the pipe garble the expression that bc sees.
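You can make the stray newline visible by feeding xargs -0 a line by hand (the size 23 is just an assumed value; GNU xargs takes the whole input, newline included, as one argument):
$ printf '23\n' | xargs -0 -I {} echo '[{}-50]'
[23
-50]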
This might work for you:
ls -l file.txt | cut -d" " -f5 | sed 's/.*/&-50/' | bc
In fact, you could remove the cut:
ls -l file.txt | sed -r 's/^(\S+\s+){4}(\S+).*/\2-50/' | bc
Or use awk:
ls -l file.txt | awk '{print $5-50}'
Parsing the output of the ls command is not the best idea (really).
You can use many other solutions, like:
find . -name file.txt -printf "%s\n"
or
stat -c %s file.txt
or
wc -c <file.txt
and you can use bash arithmetic to avoid unnecessary slow process forks, like:
find . -type f -print0 | while IFS= read -r -d '' name
do
    size=$(wc -c <"$name")
    s50=$(( size - 50 ))
    echo "the file=$name= size:$size minus 50 is: $s50"
done
Here is another solution, which uses only one external command, stat:
file_size=$(stat -c "%s" file.txt) # Get the file size
let file_size=file_size-50 # Subtract 50
If you really want to combine them into one line:
let file_size=$(stat -c "%s" file.txt)-50
The stat command gets you the file size in bytes. The syntax above is for Linux (I tested against Ubuntu). On the Mac the syntax is a little different:
let file_size=$(stat -f "%z" file.txt)-50

How to extract version from a single command line in linux?

I have a product which has a command called db2level whose output is given below
I need to extract 8.1.1.64 out of it. So far I came up with
db2level | grep "DB2 v" | awk '{print$5}'
which gave me the output v8.1.1.64",
Please help me fetch 8.1.1.64. Thanks.
grep is enough to do that:
db2level | grep -oP '(?<="DB2 v)[\d.]+(?=", )'
Just with awk:
db2level | awk -F '"' '$2 ~ /^DB2 v/ {print substr($2,6)}'
db2level | grep "DB2 v" | awk '{print$5}' | sed 's/[^0-9\.]//g'
This removes all but numbers and dots.
sed is your friend for general extraction tasks:
db2level | sed -n -e 's/.*tokens are "DB2 v\([0-9.]*\)".*/\1/p'
The sed command prints no lines (because of -n) except those where a replacement with the given regexp succeeds. The .* at the beginning and the end of the pattern ensure that the whole line is matched.
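For instance, against a hypothetical db2level output line (the real output was not included in the question, so this sample line is an assumption):
$ echo 'Informational tokens are "DB2 v8.1.1.64", "n041021", and FixPak "7".' | sed -n -e 's/.*tokens are "DB2 v\([0-9.]*\)".*/\1/p'
8.1.1.64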
Try grep with the -o option:
db2level | grep -E -o "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
Another sed solution
db2level | sed -n -e '/v[0-9]/{s/.*DB2 v//;s/".*//;p}'
This one doesn't rely on the number being in a particular format, just on it being in a particular place in the output.
db2level | grep -o "v[0-9.]*" | tr -d v
Try something like db2level | grep "DB2 v" | cut -d'"' -f2 | cut -d'v' -f2
cut splits the input into parts separated by the delimiter -d and outputs the field number given by -f.
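Using the same hypothetical db2level line as in the sed answer above:
$ echo 'Informational tokens are "DB2 v8.1.1.64", "n041021", and FixPak "7".' | cut -d'"' -f2 | cut -d'v' -f2
8.1.1.64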

How to specify more spaces for the delimiter using cut?

Is there any way to specify a field delimiter of multiple spaces for the cut command (something like " "+)?
For example: in the following string, I'd like to reach the value '3744'. What field delimiter should I use?
$ ps axu | grep jboss
jboss 2574 0.0 0.0 3744 1092 ? S Aug17 0:00 /bin/sh /usr/java/jboss/bin/run.sh -c example.com -b 0.0.0.0
cut -d' ' is not what I want, as it matches only one single space.
awk is not what I am looking for either; how can this be done with cut?
Thanks.
Actually awk is exactly the tool you should be looking into:
ps axu | grep '[j]boss' | awk '{print $5}'
or you can ditch the grep altogether since awk knows about regular expressions:
ps axu | awk '/[j]boss/ {print $5}'
But if, for some bizarre reason, you really can't use awk, there are other simpler things you can do, like collapse all whitespace to a single space first:
ps axu | grep '[j]boss' | sed 's/\s\s*/ /g' | cut -d' ' -f5
That grep trick, by the way, is a neat way to only get the jboss processes and not the grep jboss one (ditto for the awk variant as well).
The grep process will have a literal grep [j]boss in its command line, so it will not be caught by the grep itself, which is looking for the character class [j] followed by boss.
This is a nifty way to avoid the | grep xyz | grep -v grep paradigm that some people use.
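A quick way to convince yourself, using sleep as a stand-in process (what you see depends on what is running):
$ sleep 60 &
$ ps ax | grep 'sleep'      # usually shows the sleep process and the grep itself
$ ps ax | grep '[s]leep'    # shows only the real sleep process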
The awk version is probably the best way to go, but you can also use cut if you first squeeze the repeats with tr:
ps axu | grep jbos[s] | tr -s ' ' | cut -d' ' -f5
#        ^^^^^^^^^^^^   ^^^^^^^^^   ^^^^^^^^^^^^^
#             |             |             |
#             |             |       get 5th field
#             |             |
#             |      squeeze spaces
#             |
# avoid grep itself to appear in the list
I like to use the tr -s command for this:
ps aux | tr -s '[:blank:]' | cut -d' ' -f3
This squeezes all whitespace runs down to a single space. That way, telling cut to use a space as the delimiter is honored as expected.
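A quick illustration of the squeeze (the sample line is made up):
$ echo 'jboss     2574  0.0  0.0  3744' | tr -s '[:blank:]'
jboss 2574 0.0 0.0 3744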
I am going to nominate tr -s '[:blank:]' as the best answer.
Why do we want to use cut? It has the magic option that says "give me the third field and every field after it, omitting the first two fields":
cat log | tr -s '[:blank:]' | cut -d' ' -f 3-
I do not believe there is an equivalent for awk or Perl's split where we do not know how many fields there will be, i.e. output the 3rd field through field X.
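For example, a small sketch of grabbing the third field and everything after it:
$ echo 'one  two   three four  five' | tr -s '[:blank:]' | cut -d' ' -f 3-
three four five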
Shorter/simpler solution: use cuts (a "cut on steroids" I wrote):
ps axu | grep '[j]boss' | cuts 4
Note that cuts field indexes are zero-based, so the 5th field is specified as 4.
http://arielf.github.io/cuts/
And even shorter (not using cut at all) is:
pgrep jboss
One way around this is to go:
$ ps axu | grep jboss | sed 's/\s\+/ /g' | cut -d' ' -f3
to replace multiple consecutive spaces with a single one.
Personally, I tend to use awk for jobs like this. For example:
ps axu | grep jboss | grep -v grep | awk '{print $5}'
As an alternative, there is always perl:
ps aux | perl -lane 'print $F[3]'
Or, if you want to get all fields starting at field #3 (as stated in one of the answers above):
ps aux | perl -lane 'print "@F[3 .. $#F]"'
If you want to pick columns from a ps output, any reason to not use -o?
e.g.
ps ax -o pid,vsz
ps ax -o pid,cmd
Minimum column width allocated, no padding, only single space field separator.
ps ax --no-headers -o pid:1,vsz:1,cmd
3443 24600 -bash
8419 0 [xfsalloc]
8420 0 [xfs_mru_cache]
8602 489316 /usr/sbin/apache2 -k start
12821 497240 /usr/sbin/apache2 -k start
12824 497132 /usr/sbin/apache2 -k start
Pid and vsz given 10 char width, 1 space field separator.
ps ax --no-headers -o pid:10,vsz:10,cmd
      3443      24600 -bash
      8419          0 [xfsalloc]
      8420          0 [xfs_mru_cache]
      8602     489316 /usr/sbin/apache2 -k start
     12821     497240 /usr/sbin/apache2 -k start
     12824     497132 /usr/sbin/apache2 -k start
Used in a script:
oldpid=12824
echo "PID: ${oldpid}"
echo "Command: $(ps -ho cmd ${oldpid})"
Another way, if you must use the cut command:
ps axu | grep [j]boss | awk '$1=$1' | cut -d' ' -f5
On Solaris, replace awk with nawk or /usr/xpg4/bin/awk.
I still like the way Perl handles fields with white space.
First field is $F[0].
$ ps axu | grep dbus | perl -lane 'print $F[4]'
My approach is to store the PID in a file in /tmp, and to find the right process using the -S option of ssh. That might be a misuse, but it works for me.
#!/bin/bash
TARGET_REDIS=${1:-redis.someserver.com}
PROXY="proxy.somewhere.com"
LOCAL_PORT=${2:-6379}
if [ "$1" == "stop" ] ; then
    kill `cat /tmp/sshTunel${LOCAL_PORT}-pid`
    exit
fi
set -x
ssh -f -i ~/.ssh/aws.pem centos@$PROXY -L $LOCAL_PORT:$TARGET_REDIS:6379 -N -S /tmp/sshTunel$LOCAL_PORT  ## AWS DocService dev, DNS alias
# SSH_PID=$!  ## Only works with &
SSH_PID=`ps aux | grep sshTunel${LOCAL_PORT} | grep -v grep | awk '{print $2}'`
echo $SSH_PID > /tmp/sshTunel${LOCAL_PORT}-pid
A better approach might be to query for the SSH_PID right before killing it, since the file might be stale and would then kill the wrong process.
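A minimal sketch of that suggestion, reusing the variable names from the script above (this would replace the stop branch):
if [ "$1" == "stop" ] ; then
    SSH_PID=$(ps aux | grep "sshTunel${LOCAL_PORT}" | grep -v grep | awk '{print $2}')
    [ -n "$SSH_PID" ] && kill "$SSH_PID"
    exit
fi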

Sorting in bash

I have been trying to get the unique values in each column of a tab-delimited file in bash. So, I used the following command.
cut -f <column_number> <filename> | sort | uniq -c
It works fine and I can get the unique values in a column and its count like
105 Linux
55 MacOS
500 Windows
What I want to do is, instead of sorting by the column values (which in this example are OS names), sort by count, and possibly have the count in the second column of the output. So it would have to look like:
Windows 500
Linux 105
MacOS 55
How do I do this?
Use:
cut -f <col_num> <filename> \
    | sort \
    | uniq -c \
    | sort -r -k1 -n \
    | awk '{print $2" "$1}'
The sort -r -k1 -n sorts in reverse order, using the first field as a numeric value. The awk simply reverses the order of the columns. You can test the added pipeline commands thus (with nicer formatting):
pax> echo '105 Linux
55 MacOS
500 Windows' | sort -r -k1 -n | awk '{printf "%-10s %5d\n",$2,$1}'
Windows      500
Linux        105
MacOS         55
Mine:
cut -f <column_number> <filename> | sort | uniq -c | awk '{ print $2" "$1}' | sort
This will alter the column order (awk) and then just sort the output.
Hope this will help you
Using sed, based on tagged regular expressions:
cut -f <column_number> <filename> | sort | uniq -c | sort -r -k1 -n | sed 's/ *\([0-9]*\)[ ]*\(.*\)/\2 \1/'
The leading " *" strips the blanks that uniq -c puts in front of the count. It doesn't produce output in a neat format, though.
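For a single line, the swap looks like this:
$ echo '    105 Linux' | sed 's/ *\([0-9]*\)[ ]*\(.*\)/\2 \1/'
Linux 105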
