Sort text file alphabetically in groups (Linux)


I know how to sort a text file alphabetically, but I'm trying to do more than sorting (i.e. grouping).
I'm trying to create a Unix shell script that formats the /etc/hosts file in my organization in the following format:
From:
Xsb ip
aabc ip
A2bc ip
Eexg ip
exx ip
Fxg ip
To:
### A
aabc ip
a2bc ip
### E
eexg ip
exx ip
### F
fxg ip
### X
xsb ip
Later I'll create another script to add new hostname lines, but for now I'm not sure what the most compact way to do this is. I thought I might need a for loop over all the initial letters of the hostnames, but I'd highly appreciate your expert advice on the shortest way.

The following script may do what you need (replace file with the name of your host list):
tr 'A-Z' 'a-z' < file | sort | awk '{
    first = substr($0, 1, 1)    # awk strings are 1-indexed
    if (first != last) {        # first letter changed: start a new group
        print "### " toupper(first)
        last = first
    }
    print
}'
tr converts letters to lowercase
sort sorts the text alphabetically
awk prints a '### X' header whenever the first letter of a line differs from that of the previous line
Hope this helps.
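A slightly more defensive variant, sketched under the assumption that the input uses the question's "hostname ip" layout (a real /etc/hosts puts the IP first, so the field numbers would need swapping). It lowercases only the hostname field, skips blank lines, and sorts bytewise with LC_ALL=C so the grouping does not depend on the locale. The function name group_hosts is only illustrative.

```shell
#!/bin/sh
# group_hosts: hypothetical helper name, not from the original answer.
# Reads "hostname ip" lines (file arguments or stdin) and prints them
# sorted, with a "### <LETTER>" header before each new initial letter.
group_hosts() {
    # Lowercase only the hostname (field 1); skip blank lines.
    awk 'NF { $1 = tolower($1); print }' "$@" |
    # Bytewise sort so the order does not depend on the current locale.
    LC_ALL=C sort |
    awk '{
        first = substr($1, 1, 1)        # awk strings are 1-indexed
        if (first != last) {            # a new initial letter starts a group
            print "### " toupper(first)
            last = first
        }
        print
    }'
}

# Example run with the sample data from the question.
group_hosts <<'EOF'
Xsb ip
aabc ip
A2bc ip
Eexg ip
exx ip
Fxg ip
EOF
```

With LC_ALL=C, a2bc sorts before aabc because digits precede letters in ASCII; drop LC_ALL=C if you prefer your locale's collation order.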

Related

Insert the result of a PostgreSQL "select" at the end of every line of a file

I have a file with some IPs in the second column
[postgres#hotname]$ cat /tmp/ips.txt
538954 10.20.30.1
130708 10.20.30.2
55300 10.20.30.3
47634 10.20.30.4
And I have a table with the names of the servers along with their IP addresses.
I can get the name for each IP with this:
[postgres#hotname]$ awk '{system("psql database -tc \"select b.des_servidor from cat_servidores a,ctl_servidores b where a.num_ip=\47"$2"\47 and a.idu_servidor = b.idu_servidor\"")}' /tmp/ips.txt
TestServer1
TestServer2
TestServer3
TestServer4
How can I insert the name of each server at the end of its line in the IP file, to get something like this:
[postgres#hotname]$ cat /tmp/ips.txt
538954 10.20.30.1 TestServer1
130708 10.20.30.2 TestServer2
55300 10.20.30.3 TestServer3
47634 10.20.30.4 TestServer4
I've tried awk and sed without success, and I read somewhere that this approach is not recommended.
Can someone help me?

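With the 'e' flag of GNU sed, which executes the pattern space as a shell command and replaces it with the command's output: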
sed -r 's/([^ ]* )(.*)/echo -n \0" " \;psql database -tc "select b.des_servidor from cat_servidores a,ctl_servidores b where a.num_ip=\\47\2\\47 and a.idu_servidor = b.idu_servidor"/e' /tmp/ips.txt
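If the 'e' flag feels too fragile, a plain shell loop can do the same job: read each "count ip" pair, look up the name, and write the three columns to a temporary file that then replaces the original. This is a sketch: lookup_name and append_names are hypothetical names, and lookup_name is a stub standing in for the psql call (psql's -A and -t flags give unaligned, tuples-only output, i.e. just the bare value).

```shell
#!/bin/sh
# lookup_name: hypothetical stand-in for the database query. A real version
# might look like this (untested; the IP is pasted into the SQL text, so only
# use it with trusted input; </dev/null keeps psql off the loop's stdin):
#   psql database -At -c "select b.des_servidor from cat_servidores a,
#     ctl_servidores b where a.num_ip='$1'
#     and a.idu_servidor = b.idu_servidor" </dev/null
lookup_name() {
    case $1 in
        10.20.30.1) echo TestServer1 ;;
        10.20.30.2) echo TestServer2 ;;
        *)          echo N/A ;;
    esac
}

# append_names FILE: add the looked-up name as a third column, in place.
append_names() {
    tmp=$(mktemp) || return 1
    while read -r count ip _; do
        printf '%s %s %s\n' "$count" "$ip" "$(lookup_name "$ip")"
    done < "$1" > "$tmp" && mv "$tmp" "$1"
}

# Example run on a throwaway copy of the sample data.
demo=$(mktemp)
printf '538954 10.20.30.1\n130708 10.20.30.2\n' > "$demo"
append_names "$demo"
cat "$demo"
```

Note that mv replaces the file with the temporary one, so the result gets mktemp's restrictive permissions rather than the original file's.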

Use sed or awk to replace line after match

I'm trying to create a little script that basically uses dig +short to find the IP of a website, and then pipe that to sed/awk/grep to replace a line. This is what the current file looks like:
#Server
123.455.1.456
246.523.56.235
So, basically, I want to search for the '#Server' line in a text file, and then replace the two lines underneath it with an IP address acquired from dig.
I understand some of the syntax of sed, but I'm really having trouble figuring out how to replace two lines underneath a match. Any help is much appreciated.

Based on the OP, it's not 100% clear exactly what needs to be replaced where, but here's a one-liner for the general case, using GNU sed and bash. Replace the two lines after "3" with standard input:
echo Hoot Gibson | sed -e '/3/{r /dev/stdin' -e ';p;N;N;d;}' <(seq 7)
Outputs:
1
2
3
Hoot Gibson
6
7
Note: sed's r command is opaquely documented (in Linux anyway). For more about r, see:
"5.9. The 'r' command isn't inserting the file into the text" in this sed FAQ.

Here's how to do it in awk:
newip=12.34.56.78
awk -v newip="$newip" '{
    if ($1 == "#Server") {
        l = NR
        print $0
    }
    else if (l > 0 && NR == l + 1) {
        print newip
    }
    else if (l == 0 || NR != l + 2) {
        print $0
    }
}' file > file.tmp
mv -f file.tmp file
Explanation:
pass $newip to awk
if the first field of the current line is #Server, let l = the current line number.
else if the current line is one past #Server, print the new IP.
else if the current line is not two past #Server, print the line.
overwrite the original file with the modified version.
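The same idea can be written with a skip counter, which makes the number of replaced lines a parameter. This is a sketch, not the answer's code: replace_after is a hypothetical helper, and the dig line uses example.com only as a placeholder domain.

```shell
#!/bin/sh
# replace_after: hypothetical helper. Prints FILE with the N lines that
# follow each line equal to MARKER replaced by the single line REPL.
# Usage: replace_after MARKER REPL N FILE
replace_after() {
    awk -v marker="$1" -v repl="$2" -v n="$3" '
        skip > 0 { skip--; next }             # drop the lines being replaced
        { print }                             # everything else passes through
        $0 == marker { print repl; skip = n } # after the marker, emit REPL
    ' "$4"
}

# In the real script the IP would come from dig, e.g.:
#   newip=$(dig +short example.com | head -n 1)
newip=12.34.56.78
demo=$(mktemp)
printf '#Server\n123.455.1.456\n246.523.56.235\nother\n' > "$demo"
replace_after '#Server' "$newip" 2 "$demo" > "$demo.tmp" && mv -f "$demo.tmp" "$demo"
cat "$demo"
```

The marker is compared as a whole line ($0 == marker) rather than as a regex, so characters such as "." or "#" in it need no escaping.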

find a pattern and print line based on finding the first pattern sed, awk grep

I have a rather large file. What all sections have in common is a hostname line that starts each section, for example:
HOSTNAME:host1
data 1
data here
data 2
text here
section 1
text here
part 4
data here
comm = 2
HOSTNAME:host-2
data 1
data here
data 2
text here
section 1
text here
part 4
data here
comm = 1
The above prints
As you can see above, within each section there are subsections marked by keywords or by lines with specific values.
I'd like to use a one-liner to print the hostname for each section and then print whichever lines I want to extract under each hostname section.
Can you please help? I am currently using grep -C 10 HOSTNAME | grep -C pattern,
but this assumes that there are 10 lines in each section. This is not an optimal way to do this; can someone show a better way? I also need to be able to print more than one line under each pattern that I find. So if I find data 1 and there are additional lines under it, I'd like to grab and print them too.
So the output of the command would look like this:
grep -C 10 HOSTNAME | grep data 1
grep -C 10 HOSTNAME | grep -A 2 data 1
HOSTNAME:Host1
data 1
HOSTNAME:Hoss2
data 1
Besides grep, I use this sed command to print my output:
sed -r '/HOSTNAME|shared/!d' filename
The only problem with this sed command is that it prints only the lines that contain the patterns shared and HOSTNAME. I also need to specify how many lines to print under the line that matched the pattern shared. So I'd like to print HOSTNAME and give the number of lines to print under the second search pattern, shared.
Thanks

awk to the rescue!
$ awk -v lines=2 '/HOSTNAME/{c=lines} NF&&c&&c--' file
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
This prints the number of lines given by lines, starting with the pattern match, and skips empty lines.
If you want to specify a secondary keyword instead of a number of lines:
$ awk -v key='data 1' '/HOSTNAME/{h=1; print} h&&$0~key{print; h=0}' file
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
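The two ideas can be combined into what the question asks for at the end: print each HOSTNAME line, every line containing a keyword, and a given number of lines after each keyword line, without spilling into the next host's section. A minimal sketch; section_grep is a hypothetical name, and the keyword is matched as a literal substring with index() rather than as a regex.

```shell
#!/bin/sh
# section_grep: hypothetical helper. For each "HOSTNAME:" section, prints the
# header line, every line containing KEY (a literal string), and up to N lines
# following each such KEY line, never running past the next HOSTNAME line.
# Usage: section_grep KEY N [FILE...]
section_grep() {
    key=$1 n=$2
    shift 2
    awk -v key="$key" -v n="$n" '
        /^HOSTNAME:/   { print; c = 0; next }  # new section: reset the window
        c > 0          { print; c--; next }    # still inside the N-line window
        index($0, key) { print; c = n }        # literal match opens a window
    ' "$@"
}

# Example: the keyword line plus 1 line after it, per host.
section_grep 'data 1' 1 <<'EOF'
HOSTNAME:host1
data 1
data here
data 2
HOSTNAME:host-2
data 1
data here
EOF
```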

Here is a sed two-liner:
sed -n -r '/HOSTNAME/ { p }
/^\s+data 1/ {p }' hostnames.txt
It prints (p)
when the line contains HOSTNAME
when the line starts with some whitespace (\s+) followed by your search criterion (data 1)
Non-matching lines are not printed (because of sed's -n option).
Edit: Some remarks:
This was tested with GNU sed 4.2.2 on Linux.
If your sed does not support -r, drop it and change the second pattern to /^.*data 1/ (\s is a GNU extension as well).
Everything can be squeezed onto one line by separating the commands with ;
Putting it all together, here is a revised version in one line, without the need for the extended regex ( i.e without -r):
sed -n '/HOSTNAME/ { p } ; /^.*data 1/ {p }' hostnames.txt

The OP requirements seem to be very unclear, but the following is consistent with one interpretation of what has been requested, and more importantly, the program has no special requirements, and the code can easily be modified to meet a variety of requirements. In particular, both search patterns (the HOSTNAME pattern and the "data 1" pattern) can easily be parameterized.
The main idea is to print all lines in a specified subsection, or at least a certain number up to some limit.
If there is a limit on how many lines in a subsection should be printed, specify a value for limit, otherwise set it to 0.
awk -v limit=0 '
/^HOSTNAME:/ { subheader=0; hostname=1; print; next}
/^ *data 1/ { subheader=1; print; next }
/^ *data / { subheader=0; next }
subheader && (limit==0 || (subheader++ < limit)) { print }'
Given the lines provided in the question, the output would be:
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
(Yes, I know the variable 'hostname' in the awk program is currently unused, but I included it to make it easy to add a test to satisfy certain obvious requirements regarding the preconditions for identifying a subheader.)

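sed -n -e '/HOSTNAME/p' -e '/data 1/,+2p' filename
The simplest way to do it is to combine two sed expressions: the first prints every HOSTNAME line, the second prints every line matching data 1 plus the 2 lines after it. The addr,+N address form is a GNU sed extension, and N must be given explicitly.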

Calling a bash builtin function with a parameter within awk

I have this command, which outputs 2 columns separated by ⎟. The first column is the number of occurrences, the second is the IP address, and the whole thing is sorted by ascending number of occurrences.
awk '{ips[$1]++} END {for (ip in ips) { printf "%5s %-1s %-3s\n", ips[ip], "⎟", ip}}' "${ACCESSLOG}" | sort -nk1
19 ⎟ 76.20.221.34
19 ⎟ 76.9.214.2
22 ⎟ 105.152.107.118
26 ⎟ 24.185.179.32
26 ⎟ 42.117.198.229
26 ⎟ 83.216.242.69
etc.
Now I would like to add a third column. In the bash shell, if you run, for instance:
host 72.80.99.43
you'll get:
43.99.80.72.in-addr.arpa domain name pointer pool-72-80-99-43.nycmny.fios.verizon.net.
So for every IP in the list, I want to show its associated host in the third column, and I want to do that from within awk, i.e. call host from awk and pass it the ip as a parameter. Ideally it would skip all the standard text and show only the hostname, like so: nycmny.fios.verizon.net.
So my final command would look like this:
awk '{ips[$1]++} END {for (ip in ips) { printf "%5s %-1s %-3s %20s\n", ips[ip], "⎟", ip, system( "host " ip )}}' "${ACCESSLOG}" | sort -nk1
Thanks

You wouldn't use system() here: it sends the command's output straight to stdout and returns only the exit status. Since you want to combine the shell command's output with your awk output, you'd build the command as a string and read its result into a variable with getline, e.g.:
awk '{ips[$1]++}
END {
for (ip in ips) {
cmd = "host " ip
if ( (cmd | getline host) <= 0 ) {
host = "N/A"
}
close(cmd)
printf "%5s %-1s %-3s %20s\n", ips[ip], "⎟", ip, host
}
}' "${ACCESSLOG}" | sort -nk1
I assume you can figure out how to use *sub() to get just the part of the host output you care about.
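For completeness, one way to do the sub() step: keep the first "domain name pointer" line of the host output, take its last field, and strip the first label. This is a sketch under stated assumptions: annotate_hosts and the RESOLVER variable are hypothetical (RESOLVER only exists so the lookup can be replaced by a stub), and the IPs from the log are passed to the shell unquoted, so it assumes the first field really is an IP.

```shell
#!/bin/sh
# annotate_hosts: hypothetical wrapper around the answer's awk program.
# RESOLVER (default: host) is an assumed hook for swapping in a stub command.
annotate_hosts() {
    awk -v resolver="${RESOLVER:-host}" '
    { ips[$1]++ }
    END {
        for (ip in ips) {
            cmd = resolver " " ip
            name = "N/A"
            # host can print several lines; keep the first PTR answer.
            while ((cmd | getline line) > 0) {
                if (line ~ / domain name pointer /) {
                    n = split(line, f, " ")
                    name = f[n]                # e.g. pool-72-80-99-43.nycmny.fios.verizon.net.
                    sub(/^[^.]*\./, "", name)  # drop the first label
                    break
                }
            }
            close(cmd)
            printf "%5s %-1s %-3s %s\n", ips[ip], "⎟", ip, name
        }
    }' "$@" | sort -nk1
}

# Usage (performs real DNS lookups):
#   annotate_hosts "${ACCESSLOG}"
```

For the example in the question, the last field pool-72-80-99-43.nycmny.fios.verizon.net. becomes nycmny.fios.verizon.net.; lookups that fail (no "domain name pointer" line) show N/A.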

Using sed to fill blanks with 0's (zero's)

Does anyone know of a way to replace blanks with 0's? Here's what I'm trying to do:
Basically, I have a script that pulls an IP address and manipulates it to make a port number out of it.
192.168.202.3 = Port 23
What I need is a sed command smart enough to add two 0's in front of the 3, making it a full value:
192.168.202.3 = Port 2003
or:
192.168.202.003 = Port 2003
The catch is that if the last octet already has three digits, I don't want it to add 0's:
192.168.202.254 = Port 2254
instead of:
192.168.202.254 = Port 200254
Any ideas on how to do it?
Relevant Portion of the script:
# Retrieve local-ipv4 address from meta-data
GET http://169.254.169.254/latest/meta-data/local-ipv4 > /metadata
# Create a manipulated version of ipv4 to use as a port number
sed "s/192.168.20//" /metadata > /metaport
sed -i "s/\.//g" /metaport
If you have another way without using sed, I'm open to those suggestions as well!
Thanks!

I would prefer using awk for number manipulation rather than sed
awk -F'.' '{printf "%03d%03d\n", $3, $4}' /metadata | cut -c3-6 > /metaport
Input IP:
192.168.202.3
192.168.202.23
192.168.202.254
Output Port:
2003
2023
2254
EDIT
A more concise awk-only solution that avoids the need for cut (suggested by Jonathan Leffler):
awk -F'.' '{printf "%d%03d\n", $3 % 10, $4}' /metadata > /metaport

If the input file contains only an IP address, then brute force and ignorance can do the job:
sed -e 's/\([0-9]\)\.\([0-9]\)$/& = Port \100\2/' \
-e 's/\([0-9]\)\.\([0-9][0-9]\)$/& = Port \10\2/' \
-e 's/\([0-9]\)\.\([0-9][0-9][0-9]\)$/& = Port \1\2/'
The first expression deals with 1 digit; the second with 2 digits; the third with 3.
Given input data:
192.168.202.3
192.168.203.13
192.168.202.003
192.168.202.254
the output is:
192.168.202.3 = Port 2003
192.168.203.13 = Port 3013
192.168.202.003 = Port 2003
192.168.202.254 = Port 2254
If you have a different input data format, you have to work harder to isolate the relevant section of the IP address, but you should really, really show what the input data looks like.

Just for fun, bash:
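while IFS=. read -r a b c d; do
# 10# forces base 10, so an octet such as 008 is not rejected as invalid octal
printf "%d%03d\n" "$((10#$c % 10))" "$((10#$d))"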
done <<END
192.168.202.3
192.168.202.003
192.168.209.123
127.0.0.1
END
2003
2003
9123
0001

Given the description (insert two zeros only when the port has just 2 digits), the following should work:
sed -r '/Port [0123456789]{2}$/s/Port (.)/Port \100/'
So this only matches when Port is followed by exactly 2 digits. If it does match, replace "Port" and the first digit with "Port", that digit, and two zeros.
If you need to handle 3 digits, another match section that does just 3 digits could be trivially added.
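The 3-digit case can be added as a second expression that inserts a single zero. A sketch, assuming (as this answer does) that each line ends in "Port" followed by the unpadded digits; pad_port is a hypothetical name. Because the first expression turns 2 digits into 4, the second one cannot fire on the same line.

```shell
#!/bin/sh
# pad_port: hypothetical helper. Pads "Port <digits>" at end of line to four
# digits by inserting zeros after the first digit:
#   2 digits -> insert "00"  (Port 23  -> Port 2003)
#   3 digits -> insert "0"   (Port 223 -> Port 2023)
#   4 digits -> unchanged    (Port 2254)
pad_port() {
    sed -E -e '/Port [0-9]{2}$/s/Port ([0-9])/Port \100/' \
           -e '/Port [0-9]{3}$/s/Port ([0-9])/Port \10/'
}

printf '%s\n' '192.168.202.3 = Port 23' '192.168.202.23 = Port 223' \
              '192.168.202.254 = Port 2254' | pad_port
```

In the replacements, \100 and \10 are the back-reference \1 followed by literal zeros, since sed only recognizes \1 through \9.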
