Apply bash script with awk commands to file - linux

I'm currently working on a bash script that applies a list of regexes to a list of links in order to clean up the file. At the moment I'm doing it all manually in Kate with find/replace, but having it as a script would be more convenient. Since I'm fairly new to bash scripting, I'm asking for help.
Example list of urls:
0: "/suburl0"
​
1: "/suburl1"
​
2: "/suburl2"
​
3: "/suburl3"
​
4: "/suburl4"
The script I currently have:
#!/bin/bash
awk '[^\x00-\x7F]+' $1 #there are non-ascii chars in the file, so clean it out
awk 'NF' $1 # remove non-character lines
awk '^[0-900]{0,3}: ' $1 #delete all those number infront of the link
awk '"' $1 # remove those quotation marks
awk '!seen[$0]++' $1 #remove duplicate lines
awk '{print "http://example.com/" $0}' $1 #prepend the full url to the suburl
The goal is to apply all those regexes to the file, so that the file ends up cleaned.
My guess is that I'm not redirecting the output of awk correctly, but when I tried to pipe it into the file, the file ended up as just empty lines.

A more-or-less translation of what you wanted, without restricting to awk:
cat "$1" \
| tr -cd '[:print:][:space:]' \
| grep . \
| sed -r 's/^[0-9]{1,3}: //' \
| tr -d '"' \
| sort -u \
| awk '{print "http://example.com" $0}'
Note that sort will change the order; I am assuming the order doesn't matter.
Also note that sed -r is GNU.
A slightly simplified and more portable version:
cat "$1" \
| tr -cd '[:graph:]\n' \
| grep . \
| tr -d '"' \
| sort -u \
| sed 's,^[0-9]*:,http://example.com,'
Output:
http://example.com/suburl0
http://example.com/suburl1
http://example.com/suburl2
http://example.com/suburl3
http://example.com/suburl4
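Since the question mentions that piping back into the file produced only empty lines: redirecting a pipeline's output into its own input file truncates the file before it is read. A minimal sketch of the usual workaround, writing to a temporary file first and then moving it over the original (reusing the portable pipeline above):
#!/bin/bash
tmp=$(mktemp) || exit 1
tr -cd '[:graph:]\n' < "$1" \
| grep . \
| tr -d '"' \
| sort -u \
| sed 's,^[0-9]*:,http://example.com,' > "$tmp" \
&& mv "$tmp" "$1"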


Using bash, I want to print a number followed by sizes of 2 paths on one line. i.e. output of 3 commands on one line

All three items should be separated by ":".
echo -n "10001:"; du -sch /abc/def/* | grep 'total' | awk '{ print $1 }'; du -sch /ghi/jkl/* | grep 'total' | awk '{ print $1 }'
I am getting the output as -
10001:61M
:101M
But I want the output as -
10001:61M:101M
This should work for you. The two key elements added are the
tr -d '\n'
which strips the newline character from the end of the output, and the added echo ":" to get the extra colon in there for formatting.
Hope this helps! Here's a link to the docs for the tr command:
https://ss64.com/bash/tr.html
echo -n "10001:"; du -sch /abc/def/* | grep 'total' | awk '{ print $1 }' | tr -d '\n'; echo ":" | tr -d '\n'; du -sch /ghi/jkl/* | grep 'total' | awk '{ print $1 }'
Save your values to variables, and then use printf:
printf '%s:%s:%s\n' "$first" "$second" "$third"
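For example, a minimal sketch of capturing the two sizes with command substitution (the du commands are taken from the question; the variable names are only illustrative):
first=10001
second=$(du -sch /abc/def/* | grep 'total' | awk '{ print $1 }')
third=$(du -sch /ghi/jkl/* | grep 'total' | awk '{ print $1 }')
printf '%s:%s:%s\n' "$first" "$second" "$third"
Command substitution strips the trailing newline from each du result, so no extra tr -d '\n' is needed.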

shell script match data from another text file

I have a shell script:
#!/bin/bash
var1=`cat log.json | grep "accountnumber" | awk -F ' ' '{print $1}'`
echo $var1
The output of the shell script is:
23466
283483
324932
87374
I want to match the above numbers against the ones stored in another file (the file format is below) and print their values.
23466=account-1
283483=account-2
324932=account-3
87374=account-4
127632=account-5
1324237=account-6
73642=account-7
324993284=account-8
.
.
4543454=account-200
Expected output:
account-1
account-2
account-3
account-4
A compact one-line solution can be:
join -t "=" <(sort bf) <(sort fa) | cut -d '=' -f 2
Here fa is a file containing the output of your bash script, and bf is the file that has the 23466=account-1 format.
The output is:
account-1
account-2
account-3
account-4
#!/bin/bash
for var1 in $(awk -F ' ' '/accountnumber/{print $1}' log.json)
do
    awk -F= '$1=="'"$var1"'"{print $2}' anotherfile
done
For a moment there was another answer that almost worked, which I think is much slicker than what I wrote, and probably faster and more efficient on large files too. Here it is, fixed:
awk -F ' ' '/accountnumber/{print $1}' log.json \
| sort -n \
| join -t= - accountfile \
| cut -d= -f2
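If join is not available, the same lookup can be done in a single awk pass over both files. A sketch, assuming the mapping file is named accountfile (as above) and that the account number is the first whitespace-separated field of the matching lines in log.json, as in the question's script:
awk 'NR==FNR { split($0, kv, "="); name[kv[1]] = kv[2]; next }   # first file: load number=account pairs
     /accountnumber/ && ($1 in name) { print name[$1] }          # second file: print the matching account
' accountfile log.json
This also avoids re-reading the mapping file once per account number, as the loop-based answer does.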

Extract last digits from each word in a string with multiple words using bash

Given a string with multiple words like below, all in one line:
first-second-third-201805241346 first-second-third-201805241348 first-second-third-201805241548 first-second-third-201705241540
I am trying to get the maximum number from the string; in this case the answer should be 201805241548.
I have tried using awk and grep, but I am only getting the answer as the last word in the string.
I am interested in how to get this accomplished.
echo 'first-second-third-201805241346 first-second-third-201805241348 first-second-third-201805241548 first-second-third-201705241540' |\
grep -o '[0-9]\+' | sort -n | tail -1
The relevant part is grep -o '[0-9]\+' | sort -n | tail -n 1.
Using a single GNU awk command:
s='first-second-third-201805241346 first-second-third-201805241348 first-second-third-201805241548 first-second-third-201705241540'
awk -F- -v RS='[[:blank:]]+' '$NF>max{max=$NF} END{print max}' <<< "$s"
201805241548
Or using grep + awk (if GNU awk is not available):
grep -Eo '[0-9]+' <<< "$s" | awk '$1>max{max=$1} END{print max}'
Another awk
echo 'first-...-201705241540' | awk -v RS='[^0-9]+' '$0>max{max=$0} END{print max}'
Gnarly pure bash:
n='first-second-third-201805241346 \
first-second-third-201805241348 \
first-second-third-201805241548 \
first-second-third-201705241540'
z="${n//+([a-z-])/;p=}"
p=0 m=0 eval echo -n "${z//\;/\;m=\$((m>p?m:p))\;};m=\$((m>p?m:p))"
echo $m
Output:
201805241548
How it works: This code constructs code, then runs it.
z="${n//+([a-z-])/;p=}" substitutes non-numbers with some pre-code
-- setting $p to the value of each number (useless on its own). At this point echo $z would output:
;p=201805241346 \ ;p=201805241348 \ ;p=201805241548 \ ;p=201705241540
Then substitute the added ;s with more code that sets $m to the
greater of $m and $p, which needs eval to run it -- the actual
code that the whole eval line runs looks like this:
p=0 m=0
m=$((m>p?m:p));p=201805241346
m=$((m>p?m:p));p=201805241348
m=$((m>p?m:p));p=201805241548
m=$((m>p?m:p));p=201705241540
m=$((m>p?m:p))
Print $m.
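For comparison, a plainer pure-bash loop gets the same result without generating code; a minimal sketch, assuming $s holds the input string (as in the gawk answer above) and that every word ends in its numeric part:
max=0
for word in $s; do
    num=${word##*-}              # strip everything up to the last '-'
    (( num > max )) && max=$num
done
echo "$max"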

how to sort lines where sorting starts at a delimiter (linux)

I want to sort a file which has a particular delimiter in every line. I want to sort the lines such that sorting starts from that delimiter and sorts only according to the numbers.
The file is like this:
adf234sdf:nzzs13245ekeke
zdkfjs:ndsd34352jejs
mkd45fei:znnd11122iens
The output should be:
mkd45fei:znnd11122iens
adf234sdf:nzzs13245ekeke
zdkfjs:ndsd34352jejs
Use the -t option to set the delimiter:
$ sort -t: -nk2,2 file
mkdfei:11122iens
adf234sdf:13245ekeke
zdkfjs:34352jejs
This can be an approach, based on this idea:
$ sed -r 's/([^:]*):([a-z]*)([0-9]*)(.*)/\1:\2-\3\4/g' a | sort -t- -k2,2 | tr -d '-'
mkdfei:aa11122iens
adf234sdf:tt13245ekeke
zdkfjs:aa34352jejs
By pieces:
$ sed -r 's/([^:]*):([a-z]*)([0-9]*)(.*)/\1:\2-\3\4/g' a
adf234sdf:tt-13245ekeke
zdkfjs:aa-34352jejs
mkdfei:aa-11122iens
$ sed -r 's/([^:]*):([a-z]*)([0-9]*)(.*)/\1:\2-\3\4/g' a | sort -t- -k2,2
mkdfei:aa-11122iens
adf234sdf:tt-13245ekeke
zdkfjs:aa-34352jejs
$ sed -r 's/([^:]*):([a-z]*)([0-9]*)(.*)/\1:\2-\3\4/g' a | sort -t- -k2,2 | tr -d '-'
mkdfei:aa11122iens
adf234sdf:tt13245ekeke
zdkfjs:aa34352jejs
So what we do is add a - character before the first number. Then we sort on the part of each line after that character, and finally delete the - again (tr -d '-').
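The same decorate-sort-undecorate idea can also be written with awk doing the decorating, which avoids relying on - never appearing in the data; a minimal sketch (file being the input file from the question):
awk -F: '{ key = $2; gsub(/[^0-9]/, "", key); print key "\t" $0 }' file \
| sort -n -k1,1 \
| cut -f2-
Each line is prefixed with its embedded number as a tab-separated key, the lines are sorted numerically on that key, and the key is cut away again.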
In gawk there is an asort function, and you could use:
gawk -f sort.awk data.txt
where data.txt is your input file, and sort.awk is
{
    line[NR]=$0;
    match($0,/:[^0-9]*([0-9]*)/,a)
    nn[NR]=a[1]" "NR
}
END {
    N=asort(nn);
    for (i=1; i<=N; i++) {
        split(nn[i],c," ")
        ind=c[2]
        print line[ind];
    }
}

Get Nth line from unzip -l

I have a jar file, and I need to execute the files in it on Linux.
So I need to get the result of the unzip -l command line by line.
I have managed to extract the file names with this command:
unzip -l package.jar | awk '{print $NF}' | grep com/tests/[A-Za-Z] | cut -d "/" -f3 ;
But I can't figure out how to obtain the file names one after another in order to execute them.
How can I do it, please?
Thanks a lot.
If all you need is the first row of that column, add a pipe and get the first line using head -1.
So your one-liner will look like:
unzip -l package.jar | awk '{print $NF}' | grep com/tests/[A-Za-Z] | cut -d "/" -f3 |head -1;
That will give you the first line.
Now combine head and tail to get the second line:
unzip -l package.jar | awk '{print $NF}' | grep com/tests/[A-Za-Z] | cut -d "/" -f3 |head -2 | tail -1;
That gets you the second line.
But from a scripting point of view this is not a good approach. What you need is a loop, as below:
for class in `unzip -l el-api.jar | awk '{print $NF}' | grep javax/el/[A-Za-Z] | cut -d "/" -f3`; do echo $class; done;
You can replace echo $class with whatever command you wish, and use $class to get the current class name.
HTH
Here is my attempt, which also takes into account Daddou's request to remove the .class extension:
unzip -l package.jar | \
awk -F'/' '/com\/tests\/[A-Za-z]/ {sub(/\.class/, "", $NF); print $NF}' | \
while read baseName
do
echo " $baseName"
done
Notes:
The awk command also handles the tasks of grep and cut
The awk command also handles the removal of the .class extension
The result of the awk command is piped into the while read... command
baseName represents the name of the class file, with the .class extension removed
Now, you can do something with that $baseName
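For instance, if the goal is to run each class (as the question suggests), the loop body could invoke java with the jar on the classpath. A sketch, assuming the classes live in the com.tests package and are runnable; the package prefix and invocation are assumptions, not something stated in the question:
unzip -l package.jar | \
awk -F'/' '/com\/tests\/[A-Za-z]/ {sub(/\.class/, "", $NF); print $NF}' | \
while read baseName
do
    # hypothetical invocation; adjust the package prefix and arguments as needed
    java -cp package.jar "com.tests.$baseName"
done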
