Using sed to fill blanks with 0s (zeros) - Linux

Does anyone know of a way to replace blanks with 0s? Here's what I'm trying to do:
Basically, I have a script that pulls an IP address and manipulates the address to make a port number out of it:
192.168.202.3 = Port 23
What I need is a sed command smart enough to add two 0s in front of the 3, making it a full value:
192.168.202.3 = Port 2003
or:
192.168.202.003 = Port 2003
The catch is that if the last octet already has three digits, I don't want it to add 0s:
192.168.202.254 = Port 2254
instead of:
192.168.202.254 = Port 200254
Any ideas on how to do it?
Relevant portion of the script:
# Retrieve local-ipv4 address from meta-data
GET http://169.254.169.254/latest/meta-data/local-ipv4 > /metadata
# Create a manipulated version of ipv4 to use as a port number
sed "s/192.168.20//" /metadata > /metaport
sed -i "s/\.//g" /metaport
If you have another way that doesn't use sed, I'm open to those suggestions as well!
Thanks!

I would prefer using awk for number manipulation rather than sed:
awk -F'.' '{printf "%03d%03d\n", $3, $4}' /metadata | cut -c3-6 > /metaport
Input IP:
192.168.202.3
192.168.202.23
192.168.202.254
Output Port:
2003
2023
2254
EDIT
A more concise awk-only solution that avoids the need for cut (suggested by Jonathan Leffler):
awk -F'.' '{printf "%d%03d\n", $3 % 10, $4}' /metadata > /metaport

If the input file contains only an IP address, then brute force and ignorance can do the job:
sed -e 's/\([0-9]\)\.\([0-9]\)$/& = Port \100\2/' \
-e 's/\([0-9]\)\.\([0-9][0-9]\)$/& = Port \10\2/' \
-e 's/\([0-9]\)\.\([0-9][0-9][0-9]\)$/& = Port \1\2/'
The first expression deals with 1 digit; the second with 2 digits; the third with 3.
Given input data:
192.168.202.3
192.168.203.13
192.168.202.003
192.168.202.254
the output is:
192.168.202.3 = Port 2003
192.168.203.13 = Port 3013
192.168.202.003 = Port 2003
192.168.202.254 = Port 2254
If you have a different input data format, you have to work harder to isolate the relevant section of the IP address, but you should really, really show what the input data looks like.
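The commands above append " = Port NNNN" to the address, while the question's script only needs the bare port number in /metaport. A minimal sketch of a port-only variant of the same three expressions follows, assuming one IPv4 address per input line; the function name ip_to_port is hypothetical, not part of the answer:

```shell
# ip_to_port: hypothetical helper. Reads dotted IPv4 addresses on stdin and
# prints the last digit of the third octet followed by the fourth octet
# padded to three digits, e.g. 192.168.202.3 -> 2003.
ip_to_port() {
  # Each expression replaces the whole line, so its result contains no dot
  # and the later expressions can no longer match it.
  sed -e 's/^.*\([0-9]\)\.\([0-9]\)$/\100\2/' \
      -e 's/^.*\([0-9]\)\.\([0-9][0-9]\)$/\10\2/' \
      -e 's/^.*\([0-9]\)\.\([0-9][0-9][0-9]\)$/\1\2/'
}

# Usage in the question's script:
# ip_to_port < /metadata > /metaport
```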

Just for fun, bash:
while IFS=. read -r a b c d; do
printf "%d%03d\n" "$((10#$c % 10))" "$((10#$d))"
done <<END
192.168.202.3
192.168.202.003
192.168.209.123
127.0.0.1
END
Output:
2003
2003
9123
0001

Given the description (only insert two zeros when the port has just 2 digits), the following should work:
sed -r '/Port [0123456789]{2}$/s/Port (.)/Port \100/'
So this only matches when Port is followed by exactly 2 digits at the end of the line. If it does match, it replaces the first digit with that digit followed by two zeros.
If you need to handle 3 digits, another expression that matches just 3 digits (and inserts a single zero) could be trivially added.
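A sketch of that 3-digit case, assuming the lines already end in "Port" followed by the number produced by the question's sed steps (e.g. Port 23 from .3, Port 213 from .13). The function name pad_port is hypothetical:

```shell
# pad_port: hypothetical helper. Pads the "Port N" at the end of each line
# to four digits by inserting zeros after the first digit.
pad_port() {
  # -E (extended regex) is accepted by GNU sed and BSD/macOS sed.
  # After the first expression, "Port 23" becomes "Port 2003", which no
  # longer ends in exactly 3 digits, so the second expression skips it.
  sed -E -e '/Port [0-9]{2}$/s/Port (.)/Port \100/' \
         -e '/Port [0-9]{3}$/s/Port (.)/Port \10/'
}
```

Ports that already have four digits, such as Port 2254, match neither address and pass through unchanged.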

Related

Need to remove subdomains from domains

I am trying to get the last 2 fields, counting from the right, with the cut command.
I have a large database of about 110 million domains and subdomains.
Like:
yahoo.com
mail.yahoo.com
a.yahoo.com
a.yahoo.co.uk
In simple words, I am trying to remove subdomains from domains.
echo a.yahoo.aa | cut -d '.' -f 2,3
yahoo.aa
but when I try
echo yahoo.aa | cut -d '.' -f 2,3
aa
It gives me only aa.
Required output:
yahoo.com
yahoo.com
yahoo.com
yahoo.co.uk
Edit: thanks to anubhava for the suggestion.
A TLD looks like:
xxxx.xx
xxx.xx
xx.xx
i.e. a ccTLD always has 2 characters in its last label.
It's a long solution, but I think it does what you want:
Executable file domain.awk:
#! /usr/bin/awk -f
BEGIN {
    FS="."
}
{
    ret = $NF
    if (NF >= 2 && (length($(NF - 1)) == 2 || length($(NF - 1)) == 3)) {
        ret = $(NF - 1) "." ret
        if (NF >= 3) {
            ret = $(NF - 2) "." ret
        }
    } else if (NF >= 2) {
        ret = $(NF - 1) "." ret
    }
    print ret
}
With a domains.lst file:
yahoo.com
mail.yahoo.com
a.yahoo.com
a.yahoo.co.uk
aus.co.au
Used like this:
./domain.awk domains.lst
Output:
yahoo.com
yahoo.com
yahoo.com
yahoo.co.uk
aus.co.au
Using the sample input you provided, and accepting your statement that a ccTLD always has 2 characters as the criterion for printing the last 3 instead of the last 2 segments of the input:
Using GNU grep for -o:
$ grep -Eo '[^.]+\.[^.]+(\.[^.]{2})?$' file
yahoo.com
yahoo.com
yahoo.com
yahoo.co.uk
or using any awk:
$ awk 'match($0,/[^.]+\.[^.]+(\.[^.]{2})?$/){print substr($0,RSTART)}' file
yahoo.com
yahoo.com
yahoo.com
yahoo.co.uk
Try:
echo a.yahoo.aa | awk -F'.' '{print $(NF-1)"."$NF}'
"large database for about 110 Million domains and subdomains."
Because of this, I suggest using sed here. Let file.txt contain:
yahoo.com
mail.yahoo.com
a.yahoo.com
then
sed 's/^.*\.\([^.]*\.[^.]*\)$/\1/' file.txt
Output:
yahoo.com
yahoo.com
yahoo.com
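Explanation: the regular expression spans the whole line (^ is the start, $ is the end). It uses a single capturing group containing zero or more (*) non-dots, followed by a literal dot (\.), followed by zero or more non-dots adjacent to the end of the line; the greedy ^.*\. in front of the group consumes everything up to the dot just before it. The whole line is replaced with the contents of that group. A line with only one dot, such as yahoo.com, does not match and is printed unchanged. Disclaimer: this solution assumes there is always at least one dot in each line.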
(tested in GNU sed 4.2.2)
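The sed command above always keeps the last two labels, so a.yahoo.co.uk would become co.uk. A sketch that adds the question's rule (keep three labels when the last label has exactly two characters) uses a first substitution plus sed's t command, which branches to the end of the script when that substitution succeeded. strip_subdomain is a hypothetical name; like the grep answer, it treats any two-character last label as a ccTLD, which is only an approximation:

```shell
# strip_subdomain: hypothetical helper. Reads host names on stdin and keeps
# the last three labels if the final label is two characters long,
# otherwise the last two labels.
strip_subdomain() {
  # First expression: optional prefix ending in a dot, then three labels
  # whose last one has exactly two characters (e.g. yahoo.co.uk).
  # "t" with no label jumps to the end of the script if it matched.
  # Second expression: optional prefix, then the last two labels.
  sed -E \
    -e 's/^(.*[.])?([^.]+[.][^.]+[.][^.]{2})$/\2/' \
    -e 't' \
    -e 's/^(.*[.])?([^.]+[.][^.]+)$/\2/'
}

# Usage: strip_subdomain < file.txt
```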
You are selecting only fields 2 and 3. You need to select from field 2 up to the end:
... | cut -d '.' -f 2-

How to count occurrences of a specific word in a group of files with bash/shell script

I have two text files, simple.txt and simple1.txt, with the following data in them:
simple.txt:
hello
hi hi hello
this
is it
simple1.txt:
hello hi
how are you
[]$ tr ' ' '\n' < simple.txt | grep -i -c '\bh\w*'
4
[]$ tr ' ' '\n' < simple1.txt | grep -i -c '\bh\w*'
3
These commands show the number of words that start with "h" for each file, but I want to display the total count, 7, i.e. the total for both files. Can I do this in a single command/shell script?
P.S.: I had to write two commands because tr does not take two file names.
Try this, the straightforward way:
cat simple.txt simple1.txt | tr ' ' '\n' | grep -i -c '\bh\w*'
This alternative requires no pipelines:
$ awk -v RS='[[:space:]]+' '/^h/{i++} END{print i+0}' simple.txt simple1.txt
7
How it works
-v RS='[[:space:]]+'
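This tells awk to treat each word as a record. (A regular expression as RS is supported by GNU awk and mawk; POSIX leaves a multi-character RS unspecified.)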
/^h/{i++}
For any record (word) that starts with h, we increment variable i by 1. (Unlike grep -i, this test is case-sensitive; use /^[hH]/ to match both cases.)
END{print i+0}
After we have finished reading all the files, we print out the value of i.
It is not the case that tr accepts only one filename; it does not accept any filename at all (it always reads from stdin). That's why, even in your solution, you didn't give tr a filename but used input redirection.
In your case, I think you can replace tr with fmt, which does accept filenames:
fmt -1 simple.txt simple1.txt | grep -i -c -w 'h.*'
(I also changed the grep a bit, because I personally find it more readable this way, but this is a matter of taste.)
Note that both solutions (mine and your original ones) would treat a run of non-space characters containing punctuation, for instance the string haaaa.hbbbbbb.hccccc, as a "single block", i.e. it would only add 1 to the count of "h"-words, not 3. Whether or not this is the desired behaviour is up to you to decide.
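If a string such as haaaa.hbbbbbb.hccccc should count as three words, one option (a sketch, not taken from the answers above) is to split on every run of non-letters with tr -cs, feeding it the files through cat. count_h_words is a hypothetical name:

```shell
# count_h_words: hypothetical helper. Prints the total number of words that
# start with h or H across all files given as arguments.
count_h_words() {
  # tr -c complements the set and -s squeezes repeats: every run of
  # non-letters becomes a single newline, leaving one word per line.
  cat -- "$@" | tr -cs '[:alpha:]' '\n' | grep -ci '^h'
}

# Usage:
# count_h_words simple.txt simple1.txt    # prints 7 for the sample files
```

The trade-off is that digits and punctuation now act as word separators, so a token like h2o would be counted as the word h.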

Find a pattern and print lines based on finding the first pattern (sed, awk, grep)

I have a rather large file. What all sections have in common is the hostname line that starts each section, for example:
HOSTNAME:host1
data 1
data here
data 2
text here
section 1
text here
part 4
data here
comm = 2
HOSTNAME:host-2
data 1
data here
data 2
text here
section 1
text here
part 4
data here
comm = 1
The above prints
As you can see above, within each section there are subsections marked by keywords or by lines that have specific values.
I'd like to use a one-liner to print the hostname for each section and then print whichever lines I want to extract under each hostname section.
Can you please help? I am currently using grep -C 10 HOSTNAME | grep -C pattern
but this assumes that there are 10 lines in each section. This is not an optimal way to do this; can someone show a better way? I also need to be able to print more than one line under each pattern that I find. So if I find data 1 and there are additional lines under it, I'd like to grab and print them.
So the output of the command would be like:
grep -C 10 HOSTNAME | grep 'data 1'
grep -C 10 HOSTNAME | grep -A 2 'data 1'
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
Besides grep, I use this sed command to print my output:
sed -r '/HOSTNAME|shared/!d' filename
The only problem with this sed command is that it only prints the lines that contain the patterns shared or HOSTNAME. I also need to specify how many lines to print under the line that matched the pattern shared. So I'd like to print HOSTNAME and give the number of lines to print under the second search pattern, shared.
Thanks
awk to the rescue!
$ awk -v lines=2 '/HOSTNAME/{c=lines} NF&&c&&c--' file
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
This prints the number of lines given by lines, starting with the HOSTNAME match itself, and skips empty lines.
If you want to specify a secondary keyword instead of a number of lines:
$ awk -v key='data 1' '/HOSTNAME/{h=1; print} h&&$0~key{print; h=0}' file
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
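The two awk variants above can be combined to match what the question asks for: print each HOSTNAME line, then a chosen number of lines starting at a secondary keyword. A sketch, where host_sections and its argument order are hypothetical and the keyword is treated as an awk regular expression:

```shell
# host_sections KEY N FILE: hypothetical helper. Prints every HOSTNAME: line
# and, within each section, the line matching KEY plus the lines after it,
# N lines in total (N counts the matching line itself).
host_sections() {
  awk -v key="$1" -v n="$2" '
    /^HOSTNAME:/ { print; c = 0; next }   # new section: reset the counter
    $0 ~ key     { c = n }                # keyword found: start counting
    c > 0        { print; c-- }           # print while the count lasts
  ' "$3"
}

# Usage: host_sections 'data 1' 2 file
```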
Here is a sed two-liner:
sed -n -r '/HOSTNAME/ { p }
/^\s+data 1/ {p }' hostnames.txt
It prints (p)
when the line contains a HOSTNAME
when the line starts with some whitespace (\s+) followed by your search criterion (data 1)
non-matching lines are not printed (due to the sed -n option)
Edit: Some remarks:
this was tested with GNU sed 4.2.2 under Linux
you don't need -r: if your sed version does not support it, replace the second pattern with /^.*data 1/
we can squash everything into one line with ;
Putting it all together, here is a revised version in one line, without the need for extended regex (i.e. without -r):
sed -n '/HOSTNAME/ { p } ; /^.*data 1/ {p }' hostnames.txt
The OP's requirements seem to be very unclear, but the following is consistent with one interpretation of what has been requested. More importantly, the program has no special requirements, and the code can easily be modified to meet a variety of requirements. In particular, both search patterns (the HOSTNAME pattern and the "data 1" pattern) can easily be parameterized.
The main idea is to print all lines in a specified subsection, or at least a certain number up to some limit.
If there is a limit on how many lines in a subsection should be printed, specify a value for limit, otherwise set it to 0.
awk -v limit=0 '
/^HOSTNAME:/ { subheader=0; hostname=1; print; next}
/^ *data 1/ { subheader=1; print; next }
/^ *data / { subheader=0; next }
subheader && (limit==0 || (subheader++ < limit)) { print }'
Given the lines provided in the question, the output would be:
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
(Yes, I know the variable 'hostname' in the awk program is currently unused, but I included it to make it easy to add a test to satisfy certain obvious requirements regarding the preconditions for identifying a subheader.)
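sed -n -e '/HOSTNAME/,+1p' -e '/Duplex/,+1p'
The simplest way to do it is to combine two sed expressions. The addr,+N form (a GNU sed extension) prints each matching line plus the N lines after it; adjust N to taste.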

grep using variable and regex

I am trying to grep a log file for entries within the last 24 hours. I came up with the following command:
grep "$(date +%F\ '%k')"\|"$(date +%F --date='yesterday')\ [$(date +%k)-23]" /path/to/log/file
I know regular expressions can be used in grep, but I am not very familiar with regex. As you can see, I am grepping for anything from today, or anything from yesterday at or after the current hour. This isn't working, and I am guessing it's due to the way I am trying to pass a command as a variable in grep's regex. I also wouldn't be opposed to using awk; with awk I came up with the following, but it is not checking the variables properly:
t=$(date +%F) | y=$(date +%F --date='yesterday') | hr=$(date +%k) | awk '{ if ($1=$t || $1=$y && $2>=$hr) { print $0 }}' /path/to/log/file
I would assume systime could be used with awk rather than setting variables, but I am not familiar with systime at all. Any suggestions for either command would be greatly appreciated! Oh, and here's the log format:
2012-12-26 16:33:16 SMTP connection from [127.0.0.1]:46864 (TCP/IP connection count = 1)
2012-12-26 16:33:16 SMTP connection from (localhost) [127.0.0.1]:46864 closed by QUIT
2012-12-26 16:38:19 SMTP connection from [127.0.0.1]:48451 (TCP/IP connection count = 1)
2012-12-26 16:38:21 SMTP connection from [127.0.0.1]:48451 closed by QUIT
2012-12-26 16:38:21 SMTP connection from [127.0.0.1]:48860 (TCP/IP connection count = 1)
Here's one way using GNU awk. Run like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
    time = systime()
}
{
    spec = $1 " " $2
    gsub(/[-:]/, " ", spec)
}
time - mktime(spec) < 86400
Alternatively, here's the one-liner:
awk 'BEGIN { t = systime() } { s = $1 " " $2; gsub(/[-:]/, " ", s) } t - mktime(s) < 86400' file
Also, the correct way to pass shell vars to awk is to use the -v flag. I've made a few adjustments to your awk command to show you what I mean, but I recommend against doing this:
awk -v t="$(date +%F)" -v y="$(date +%F --date='yesterday')" -v hr="$(date +%k)" '$1==t || $1==y && $2>=hr' file
Explanation:
Before awk starts processing the file, the BEGIN block is run. In this block we create a variable called time (t in the one-liner) and set it with the systime() function, which simply returns the current time as the number of seconds since the epoch. Then, for every line in your log file, awk creates another variable called spec (s in the one-liner) and sets it to the first and second fields separated by a single space. The - and : characters also need to be globally replaced with spaces for the mktime() function to work correctly (it expects "YYYY MM DD HH MM SS"), and this is done using gsub(). Then it's just a little arithmetic to test whether the datetime in the log file is within the last 24 hours (i.e. less than 86400 seconds ago). If the test is true, the line is printed. Maybe a little extra reading would help: see Time Functions and String Manipulation Functions in the GNU awk manual. HTH.
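Since the answer above relies on GNU awk's time functions, here is a sketch of an alternative that needs no time functions inside awk: a plain string comparison, which works because timestamps in the YYYY-MM-DD HH:MM:SS form sort chronologically as text. It assumes every log line starts with such a timestamp in local time and that GNU date is available to compute the cutoff; lines_since is a hypothetical name:

```shell
# lines_since CUTOFF FILE: hypothetical helper. Prints the lines of FILE whose
# leading "YYYY-MM-DD HH:MM:SS" timestamp is at or after CUTOFF (same format).
lines_since() {
  # ($1 " " $2) is a string, so >= compares the two values as text.
  awk -v cutoff="$1" '($1 " " $2) >= cutoff' "$2"
}

# Usage (GNU date's -d option computes "24 hours ago"):
# lines_since "$(date -d '24 hours ago' '+%F %T')" /path/to/log/file
```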

ping script and log output and cut with grep

I want to ping a bunch of locations one at a time, in order, so they don't time out.
For example, the input is: ping google.com -n 10 | grep Minimum >> output.txt
This produces the output: Minimum = 29ms, Maximum = 46ms, Average = 33ms
But there are extra spaces in front of it that I don't know how to cut off, and when it is written to the txt file it doesn't go onto a new line. What I am trying to do is set it up so I can copy and paste the input to ping a bunch of places, each one after the previous finishes, log the results in a .txt file, and number them so it would look like:
Server 1: Minimum = 29ms, Maximum = 46ms, Average = 33ms
Server 2: Minimum = 29ms, Maximum = 46ms, Average = 33ms
Server 3: Minimum = 29ms, Maximum = 46ms, Average = 33ms
Server 4: Minimum = 29ms, Maximum = 46ms, Average = 33ms
Well, first of all, ping on Linux limits the number of packets to send with -c, not -n.
Secondly, the output of ping is not Minimum = xx ms, Maximum = yy ms, Average = zz ms, but rtt min/avg/max/mdev = 5.953/5.970/5.987/0.017 ms
So basically, if you do something along the lines of:
for server in google.com yahoo.com
do
rtt=$(ping -c 2 "$server" | grep rtt)
echo "$server: $rtt" >> output.txt
done
You should achieve what you want.
[edit]
If Cygwin is your platform, the easiest way to strip the spaces would be either what people are suggesting, sed, or just | awk '{$1=$1; print}', which trims your line as well (it also squeezes repeated spaces inside the line).
I think you might be able to solve this using sed two times and a while loop at the end:
N=1; ping google.com -n 10 | grep Minimum | sed -r 's/(Average = [[:digit:]]+ms)/\1\n/g' | sed -r 's/[[:space:]]+(Minimum)/\1/g' | while read -r line; do echo "Server $N: $line"; N=$((N+1)); done >> output.txt
The steps:
The first sed fixes the newline issue:
Match the final part of the string after which you want a new line, in this case Average = [[:digit:]]+ms and put it into a group using the parenthesis
Then replace it with the same group (\1) and insert a newline character (\n) after it
The second sed removes the whitespace by matching the word Minimum together with all the whitespace in front of it and replacing the match with just the word Minimum
The final while statement loops over each line and adds Server "$N": in front of the ping results. $N is initialized to 1 at the start and increased by 1 after each line is read
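Because N is reset to 1 every time that command is pasted, each server would end up labelled "Server 1". A sketch that keeps one counter across several hosts, assuming Linux iputils ping (-c, and the rtt summary line described earlier); format_line and ping_all are hypothetical names:

```shell
# format_line N: hypothetical helper. Trims leading whitespace from each
# summary line on stdin and prefixes it with "Server N: ".
format_line() {
  sed "s/^[[:space:]]*/Server $1: /"
}

# ping_all HOST...: hypothetical helper. Pings each host in turn, waiting for
# each ping to finish, and appends one numbered summary line per host.
ping_all() {
  local n=1
  for server in "$@"; do
    # iputils ping prints its summary as "rtt min/avg/max/mdev = ...".
    ping -c 10 "$server" | grep '^rtt' | format_line "$n" >> output.txt
    n=$((n + 1))
  done
}

# Usage: ping_all google.com yahoo.com
```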
You can use sed to remove the first 4 spaces:
ping google.com -n 10 | grep Minimum | sed 's/^    //'
