sed command usage with multiple patterns
I am using the sed command to search for multiple patterns.
The command works and prints the lines when it finds matches.
However, I need to do two things (here is the command I use):
sed -r '/pattern1|pattern2/!d' filename
A - Print the line containing the first pattern, then print not only
the line matching the second pattern but also the lines below it.
I would like to specify how many lines below the second pattern are printed.
B - Print the first pattern and then only a certain number of lines below
the second pattern, but omit the line containing the search pattern itself.
In short, I need to be able to specify the number of lines printed below
my second search pattern, and to omit the line containing the search pattern
as well if I decide to do so.
Hostname1
section1
a
section2
a
c
d
Hostname2
section1
a
section2
x
y
d
desired output:
Hostname1
section2
a
c
Hostname2
section2
x
y
# Create test file
(
cat << EOF
Hostname1
section1
a
section2
a
c
d
Hostname2
section1
a
section2
x
y
d
EOF
) > filename
# transformation
cat filename | grep -v "^ *$" | sed -e "s/\(Hostname\)/==\1/g" | sed -e "s/\(section\)/=\1/g" | tr '\n' '|' | tr '=' '\n' | sed -r '/Hostname1|Hostname2|section2/!d' | cut -d"|" -f-3 | tr '|' '\n' | grep -v "^ *$" | sed -e "s/\(Hostname\)/\n\1/g"
Explanation
# step 1: transform each section to one line, with "|" as delimiter:
cat filename | grep -v "^ *$" | sed -e "s/\(Hostname\)/==\1/g" | sed -e "s/\(section\)/=\1/g" | tr '\n' '|' | tr '=' '\n'
#Hostname1|
#section1|a|
#section2|a|c|d|
#
#Hostname2|
#section1|a|
#section2|x|y|d|
# step 2: keep only the first n+1 fields (cut -d"|" -f-3):
cat filename | grep -v "^ *$" | sed -e "s/\(Hostname\)/==\1/g" | sed -e "s/\(section\)/=\1/g" | tr '\n' '|' | tr '=' '\n' | sed -r '/Hostname1|Hostname2|section2/!d' | cut -d"|" -f-3
#Hostname1|
#section2|a|c
#Hostname2|
#section2|x|y
# step 3: transform to the wanted format:
cat filename | grep -v "^ *$" | sed -e "s/\(Hostname\)/==\1/g" | sed -e "s/\(section\)/=\1/g" | tr '\n' '|' | tr '=' '\n' | sed -r '/Hostname1|Hostname2|section2/!d' | cut -d"|" -f-3 | tr '|' '\n' | grep -v "^ *$" | sed -e "s/\(Hostname\)/\n\1/g"
#Hostname1
#section2
#a
#c
#
#Hostname2
#section2
#x
#y
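The same result can also be produced directly with awk instead of the pipeline above (a sketch, not the only way; the variables n and omit are illustrative: n is how many lines to print below section2, and omit=1 drops the section2 line itself, addressing part B of the question):

```shell
# Recreate the sample input from the question
printf '%s\n' Hostname1 section1 a section2 a c d \
              Hostname2 section1 a section2 x y d > filename

# n = number of lines to print below the second pattern;
# omit=1 suppresses the line containing the second pattern itself
awk -v n=2 -v omit=0 '
  /Hostname/ { print; next }                        # first pattern: always print
  /section2/ { count = n + (omit ? 0 : 1); if (omit) next }
  count > 0  { print; count-- }                     # print the next lines, then stop
' filename
```

With n=2 and omit=0 this prints Hostname1, section2, a, c, Hostname2, section2, x, y; with omit=1 the two section2 lines are dropped.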
Related
tr -c '[:alnum:]' '[\n*]' < 4300-0.txt | sort | uniq -c | sort -nr | head
The following command retrieves unique words along with the count. I'd like to retrieve punctuation marks along with the unique word counts.
What is the way to achieve this?
You could split your input with tee and extract punctuation and alphanumeric characters separately.
echo "Helo, world!" |
{
tee >(tr -c '[:alnum:]' '\n' >&3) |
tr -c '[:punct:]' '\n'
} 3>&1 |
sed '/^$/d' |
sort | uniq -c | sort -nr | head
should output:
1 world
1 Helo
1 !
1 ,
A short sed script also seems to work:
echo "Helo, world!
OK!" |
sed '
s/\([[:alnum:]]\+\)\([^[:alnum:]]\)/\1\n\2/g
s/\([[:punct:]]\+\)\([^[:punct:]]\)/\1\n\2/g
s/[^[:punct:][:alnum:]]/\n/g
' |
sed '/^$/d' |
sort | uniq -c | sort -nr | head
should output:
2 !
1 world
1 OK
1 Helo
1 ,
You can use [:punct:] to retrieve the punctuation marks.
And you can run:
tr -c '[:alnum:][:punct:]' '[\n*]' < 4300-0.txt | sort | uniq -c | sort -nr | head
It will print out the punctuation marks as well.
For example:
if you have in your txt file
aaa,
aaa
the output will be:
1 aaa
1 aaa,
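As a quick, self-contained check of that behaviour (GNU tr assumed for the '[\n*]' SET2 padding syntax):

```shell
# The comma is in [:punct:], so it survives the complement translation
# and "aaa," is counted as a distinct token from "aaa"
printf 'aaa,\naaa\n' |
  tr -c '[:alnum:][:punct:]' '[\n*]' |
  sort | uniq -c
```

Both aaa and aaa, come out as distinct tokens, each with count 1.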
I would like to count the occurrences of strings from a certain file using pipelines, without the awk and sed commands.
my_file content:
ls -al
bash
cat datoteka.txt
cat d.txt | sort | less
/bin/bash
terminal command that I use:
cat $my_file | cut -d ' ' -f 1 | tr '|' '\n' | xargs -r -L1 basename | sort | uniq -c | xargs -r -L1 sh -c 'echo $1 $0'
desired output:
bash 2
cat 2
less 1
ls 1
sort 1
In my case, I get:
bash 2
cat 2
ls 1
_sort 1 (not counted)
_less 1 (not counted)
The sort and less commands are not counted correctly because of the whitespace (marked here with _) in front of those two strings. How can I improve my code to remove this blank space before "sort" and "less"? Thanks in advance!
Update: Here is a second and longer example of an input file:
nl /etc/passwd
seq 1 10 | tr "\n" ","
seq 1 10 | tr -d 13579 | tr -s "\n "
seq 1 100 | split -d -a 2 -l10 - blabla-
uname -a | cut -d" " -f1,3
cut -d: -f1 /etc/passwd > fst
cut -d: -f3 /etc/passwd > scnd
ps -e | column
echo -n ABC | wc -m -c
cmp -s dat1.txt dat1.txt ; echo $?
diff dat1 dat2
ps -e | grep firefox
echo dat1 dat2 dat3 | tr " " "\n" | xargs -I {} -p ln -s {}
The problem with the code in the question, as you were aware, was with the cut statement. This replaces cut with a shell while loop that also includes the basename command:
$ tr '|' '\n' <my_file | while read cmd other; do basename "$cmd"; done | sort | uniq -c | xargs -r -L1 sh -c 'echo $1 $0'
bash 2
cat 2
less 1
ls 1
sort 1
Alternate Sorting
The above sorts the results alphabetically by the name of the command. If instead we want to sort in descending numerical order of number of occurrences, then:
tr '|' '\n' <file2 | while read cmd other; do basename "$cmd"; done | sort | uniq -c | xargs -r -L1 sh -c 'echo $1 $0' | sort -snrk2
Applying this command to the second input example in the question:
$ tr '|' '\n' <file2 | while read cmd other; do basename "$cmd"; done | sort | uniq -c | xargs -r -L1 sh -c 'echo $1 $0' | sort -snrk2
tr 4
cut 3
seq 3
echo 2
ps 2
cmp 1
column 1
diff 1
grep 1
nl 1
split 1
uname 1
wc 1
xargs 1
A pure-shell alternative splits each line on | and lets word splitting discard the stray whitespace:
while IFS='|' read -ra commands; do
for cmd in "${commands[@]}"; do
set -- $cmd # unquoted to discard irrelevant whitespace
basename $1
done
done < myfile |
sort |
uniq -c |
while read num cmd; do
echo "$cmd $num"
done
bash 2
cat 2
less 1
ls 1
sort 1
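Putting the pieces together as a runnable example (recreating the question's my_file; still no awk or sed involved):

```shell
# Recreate the sample input from the question
cat > my_file <<'EOF'
ls -al
bash
cat datoteka.txt
cat d.txt | sort | less
/bin/bash
EOF

# read trims the leading/trailing whitespace that broke cut
tr '|' '\n' < my_file |
  while read -r cmd _; do basename "$cmd"; done |
  sort | uniq -c |
  while read -r n c; do echo "$c $n"; done
```

This prints bash 2, cat 2, less 1, ls 1, sort 1, matching the desired output.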
After several greps I am able to produce a list of some "words" like this.
Everything starts as:
cat \path\verilargestructured.txt | grep option1 -B50 | grep option2 -A30 | grep option3 -A20 | grep "=host"
which results in a list with this structure:
part1.part2.part3.part4=host
part1.part2.part3.part4=host
...
part1.part2.part3.part4=host
I want to use sed or any other option in bash to trim that out to
part1.part2.part3.part4
or
part2.part3.part4
assuming partN is only alphanumeric (no special characters)
Thanks
With awk, you can specify multiple delimiters with the -F option and the output field separator with the OFS variable.
For example, awk -F '[.=]' '{print $2,$3,$4}' OFS=. will print only the second, third and fourth fields of your output, separated with dots.
cat \path\verilargestructured.txt | grep option1 -B50 | grep option2 -A30 | grep option3 -A20 | grep "=host" | awk -F '[.=]' '{print $2,$3,$4}' OFS=.
If I understand you correctly, then pipe it through
sed 's/=.*//'
...that will cut off the first = in each line and everything that comes after it. So, all in all,
cat \path\verilargestructured.txt | grep option1 -B50 | grep option2 -A30 | grep option3 -A20 | grep "=host" | sed 's/=.*//'
Alternatively, you could use cut:
cut -d = -f 1
Addendum: Going the cut route, to isolate all but part1, you could pipe it through yet another cut call
cut -d . -f 2-
As in
echo 'part1.part2.part3.part4=host' | cut -d = -f 1 | cut -d . -f 2-
Here -f 2- means "from the second field to the last." If you only wanted parts 2 and 3, you could use -f 2-3, and so forth. See man cut for details.
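For instance, checking the -f 2-3 form on the sample line:

```shell
# Strip "=host" first, then keep only fields 2 and 3 of the dotted name
echo 'part1.part2.part3.part4=host' | cut -d = -f 1 | cut -d . -f 2-3
# part2.part3
```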
Given a .txt file with space-separated words such as:
But where is Esope the holly Bastard
But where is
And the following command:
cat /pathway/to/your/file.txt | tr ' ' '\n' | sort | uniq -c | awk '{print $2"#"$1}'
I get the following output in my console:
1 Bastard
1 Esope
1 holly
1 the
2 But
2 is
2 where
How do I get this printed into myFile.txt?
I actually have 300,000 lines and nearly 2 million words, so it's better to output the result into a file.
EDIT: Used answer (by #Sudo_O):
$ awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" myfile.txt | sort > myfileout.txt
Your pipeline isn't very efficient; you should do the whole thing in awk instead:
awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" file > myfile
If you want the output in sorted order:
awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" file | sort > myfile
The actual output given by your pipeline is:
$ tr ' ' '\n' < file | sort | uniq -c | awk '{print $2"#"$1}'
Bastard#1
But#2
Esope#1
holly#1
is#2
the#1
where#2
Note: using cat is useless here; we can just redirect the input with <. The awk script doesn't make sense either: it just reverses the order of the words and their frequencies and separates them with a #. If we drop the awk script, the output is closer to the desired output (note the leading spaces, however, and that it's unsorted):
$ tr ' ' '\n' < file | sort | uniq -c
1 Bastard
2 But
1 Esope
1 holly
2 is
1 the
2 where
We could sort again and remove the leading spaces with sed:
$ tr ' ' '\n' < file | sort | uniq -c | sort | sed 's/^\s*//'
1 Bastard
1 Esope
1 holly
1 the
2 But
2 is
2 where
But as I mentioned at the start, let awk handle it:
$ awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" file | sort
1 Bastard
1 Esope
1 holly
1 the
2 But
2 is
2 where
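As a self-contained check of the awk version (note that a regexp record separator such as " |\n" is a GNU awk extension):

```shell
# Recreate the sample input, then count each space- or newline-separated word
printf 'But where is Esope the holly Bastard\nBut where is\n' > file
awk '{a[$1]++} END{for (k in a) print a[k], k}' RS=" |\n" file | sort
```

This reproduces the seven sorted count/word lines shown above.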
Just redirect the output to a file:
cat /pathway/to/your/file.txt | tr ' ' '\n' | sort | uniq -c | \
awk '{print $2"#"$1}' > myFile.txt
Just use shell redirection:
echo "test" > overwrite-file.txt
echo "test" >> append-to-file.txt
Tips
A useful command is tee, which allows you to redirect to a file and still see the output:
echo "test" | tee overwrite-file.txt
echo "test" | tee -a append-file.txt
Sorting and locale
I see you are working with an Asian script; you need to be careful with the locale used by your system, as the resulting sort order might not be what you expect:
* WARNING * The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
And have a look at the output of:
locale
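For example, forcing the C locale gives the traditional byte-value ordering, where all uppercase letters sort before any lowercase ones (a minimal illustration):

```shell
# Byte-value (traditional) order: uppercase before lowercase
printf 'b\nA\na\nB\n' | LC_ALL=C sort
# A
# B
# a
# b
```

In a typical UTF-8 locale the same input would instead sort case-insensitively.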
Using sed, how can I change the letter 'a' to 'A', but only where it appears as two or more consecutive letters? For example, from:
galaxy
ear
aardvak
Haaaaaaaaa
into
galaxy
ear
AArdvak
HAAAAAAAAA
You can do it using groups. If you have this file:
$ cat a
galaxy
ear
aardvak
Haaaaaaaaa
Ulaanbaatar
You can use this sed command:
$ sed 's/\(.\)\1\{1,\}/\U&/g' a
galaxy
ear
AArdvak
HAAAAAAAAA
UlAAnbAAtar
What happens here? If we have a character, "packed" in a group (\(.\)), and this group (\1) repeats itself one or more times (\1\{1,\}), then we replace the matched part (&) with its uppercased version (\U&).
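You can verify the command on a few of the sample lines without a separate file (GNU sed is required for \U):

```shell
# Only runs of two or more identical characters are uppercased
printf 'galaxy\naardvak\nUlaanbaatar\n' | sed 's/\(.\)\1\{1,\}/\U&/g'
# galaxy
# AArdvak
# UlAAnbAAtar
```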
EDIT
You can do this with:
sed 's/a\(a\+\)/A\U\1/;s/b\(b\+\)/B\U\1/;s/c\(c\+\)/C\U\1/;s/d\(d\+\)/D\U\1/;s/e\(e\+\)/E\U\1/;s/f\(f\+\)/F\U\1/;s/g\(g\+\)/G\U\1/;s/h\(h\+\)/H\U\1/;s/i\(i\+\)/I\U\1/;s/j\(j\+\)/J\U\1/;s/k\(k\+\)/K\U\1/;s/l\(l\+\)/L\U\1/;s/m\(m\+\)/M\U\1/;s/n\(n\+\)/N\U\1/;s/o\(o\+\)/O\U\1/;s/p\(p\+\)/P\U\1/;s/q\(q\+\)/Q\U\1/;s/r\(r\+\)/R\U\1/;s/s\(s\+\)/S\U\1/;s/t\(t\+\)/T\U\1/;s/u\(u\+\)/U\U\1/;s/v\(v\+\)/V\U\1/;s/w\(w\+\)/W\U\1/;s/x\(x\+\)/X\U\1/;s/y\(y\+\)/Y\U\1/;s/z\(z\+\)/Z\U\1/'
(Thanks to shelter)
Or with a pipe of sed:
function capitalize_consecutives () {
sed 's/a\(a\+\)/A\U\1/' |
sed 's/b\(b\+\)/B\U\1/' |
sed 's/c\(c\+\)/C\U\1/' |
sed 's/d\(d\+\)/D\U\1/' |
sed 's/e\(e\+\)/E\U\1/' |
sed 's/f\(f\+\)/F\U\1/' |
sed 's/g\(g\+\)/G\U\1/' |
sed 's/h\(h\+\)/H\U\1/' |
sed 's/i\(i\+\)/I\U\1/' |
sed 's/j\(j\+\)/J\U\1/' |
sed 's/k\(k\+\)/K\U\1/' |
sed 's/l\(l\+\)/L\U\1/' |
sed 's/m\(m\+\)/M\U\1/' |
sed 's/n\(n\+\)/N\U\1/' |
sed 's/o\(o\+\)/O\U\1/' |
sed 's/p\(p\+\)/P\U\1/' |
sed 's/q\(q\+\)/Q\U\1/' |
sed 's/r\(r\+\)/R\U\1/' |
sed 's/s\(s\+\)/S\U\1/' |
sed 's/t\(t\+\)/T\U\1/' |
sed 's/u\(u\+\)/U\U\1/' |
sed 's/v\(v\+\)/V\U\1/' |
sed 's/w\(w\+\)/W\U\1/' |
sed 's/x\(x\+\)/X\U\1/' |
sed 's/y\(y\+\)/Y\U\1/' |
sed 's/z\(z\+\)/Z\U\1/'
}
Then let it parse your file:
capitalize_consecutives < myfile
\U uppercases the occurrence. I believe this works only in GNU sed.