Unix/Linux Shell Grep to cut - linux

I have a file, say 'names', that looks like this:
first middle last userid
Brian Duke Willy willybd
...
Whenever I use the following:
line=`grep "willybd" /dir/names`
name=`echo $line | cut -f1-3 -d' '`
echo $name
It prints the following:
Brian Duke Willy willybd
Brian Duke Willy
My question is, how would I get it to print just "Brian Duke Willy" without first printing the original line that I cut?

The usual way to do this sort of thing is:
awk '/willybd/{ print $1, $2, $3 }' /dir/names
or, to be more specific
awk '$4 ~ /willybd/ { print $1, $2, $3 }' /dir/names
or
awk '$4 == "willybd" { print $1, $2, $3 }' /dir/names
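To see why the more specific forms matter, here is a small sketch. The third data row and the temporary file are assumptions made up for illustration; they are not part of the original file.

```shell
# Hypothetical sample data; the last row is invented to show a near-miss.
names_file=$(mktemp)
cat > "$names_file" <<'EOF'
first middle last userid
Brian Duke Willy willybd
Carl Max Moore willybd2
EOF

# Pattern anywhere on the line: also picks up willybd2.
loose=$(awk '/willybd/ { print $1, $2, $3 }' "$names_file")

# Exact comparison on the fourth field: only the intended user.
exact=$(awk '$4 == "willybd" { print $1, $2, $3 }' "$names_file")

printf 'loose match:\n%s\n' "$loose"
printf 'exact match:\n%s\n' "$exact"

rm -f "$names_file"
```

The string comparison is usually the safer choice when the user id is known in full.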

grep "willybd" /dir/names | cut "-d " -f1-3
The default delimiter for cut is tab, not space.
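A quick way to see the effect of the delimiter, as a sketch using a literal copy of the sample line:

```shell
line='Brian Duke Willy willybd'

# With the default tab delimiter the line contains no delimiter at all,
# so cut passes it through unchanged.
with_default=$(printf '%s\n' "$line" | cut -f1-3)

# With an explicit space delimiter only the first three fields are kept.
with_space=$(printf '%s\n' "$line" | cut -d' ' -f1-3)

echo "default delimiter: $with_default"
echo "space delimiter:   $with_space"
```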

Unless you need the intermediate variables, you can use
grep "willybd" /dir/names | cut -f1-3 -d' '
One of the beautiful features of Linux is that most commands can be used as filters: they read from stdin and write to stdout, which means you can "pipe" the output of one command into the next command. That's what the | character does. It's pronounced "pipe".
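If the result is still needed in a variable, the whole pipeline can be captured in one step. This is a minimal sketch; the temporary file stands in for /dir/names and is an assumption for the sake of a runnable example.

```shell
# Stand-in for /dir/names so the example can run anywhere.
names_file=$(mktemp)
cat > "$names_file" <<'EOF'
first middle last userid
Brian Duke Willy willybd
EOF

# grep selects the line, cut keeps the first three space-separated fields,
# and $(...) captures the final output of the pipeline.
name=$(grep "willybd" "$names_file" | cut -d' ' -f1-3)
echo "$name"

rm -f "$names_file"
```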

Related

Exact Match of Word using grep

I have data in file.txt as follows
BRAD CHICAGO|NORTH SAMSONCHESTER|
CORA|NEW ERICA|
CAMP LOGAN|KINGBERG|
NCHICAGOS|ESTING|
CHICAGO|MANKING|
OCREAN|CHICAGO|
CHICAGO PIT|BULL|
CHICAGO |NEWYORK|
Question 1:
I want to search for the exact match for word "CHICAGO" in first column and print second column.
Output should look like:
MANKING
NEWYORK
Question 2:
If multiple matches are found, can we limit the output to only one, so that the output will be only MANKING or NEWYORK?
I tried the following:
grep -E -i "^CHICAGO" file.txt | awk -F '|' '{print $2}'
but I am getting the output below:
MANKING
BULL
NEWYORK
Expected output for Question 1:
MANKING
NEWYORK
Expected output for Question 2:
MANKING
Here are some more ways:
Using grep and cut:
grep "^CHICAGO|" file.txt | cut -d'|' -f2
Using awk
awk -F"|" '/^CHICAGO\|/{print $2}' file.txt
For question 2 simply pipe it to head, i.e:
grep "^CHICAGO|" file.txt | cut -d'|' -f2 | head -n1
Similarly for the awk command.
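Note that the patterns above match only when CHICAGO is immediately followed by the pipe, so the row with a trailing space (CHICAGO |NEWYORK|) is skipped. A possible variant that tolerates trailing spaces is sketched below; the temporary file simply reproduces the sample data from the question.

```shell
data_file=$(mktemp)
cat > "$data_file" <<'EOF'
BRAD CHICAGO|NORTH SAMSONCHESTER|
CORA|NEW ERICA|
CAMP LOGAN|KINGBERG|
NCHICAGOS|ESTING|
CHICAGO|MANKING|
OCREAN|CHICAGO|
CHICAGO PIT|BULL|
CHICAGO |NEWYORK|
EOF

# "CHICAGO", optional trailing spaces, then a literal pipe ([|]).
all_matches=$(grep '^CHICAGO *[|]' "$data_file" | cut -d'|' -f2)

# Same pipeline, limited to the first hit for question 2.
first_match=$(grep '^CHICAGO *[|]' "$data_file" | cut -d'|' -f2 | head -n1)

printf 'all:\n%s\n' "$all_matches"
printf 'first: %s\n' "$first_match"

rm -f "$data_file"
```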
How about an awk solution?
awk -F'|' '$1 == "CHICAGO"{print $2}' file
to only print one output, exit once you have a match, i.e.
awk -F'|' '$1 == "CHICAGO"{print $2; exit}' file
Making that more generic, you can pass in a variable, i.e.
awk -v trgt="CHICAGO" -F'|' '{targ="^" trgt " *$"; if ( $1 ~ targ ) {print $2}}' file
The " *$" regex limits the match to zero or more trailing spaces, with no other characters after the target string. So this meets your criteria and skips CHICAGO PIT|BULL.
AND this can be further reduced to
awk -v trgt="CHICAGO" -F'|' '{ if ( $1 ~ "^" trgt " *$" ) {print $2}}' file
constructing the regex in place within the comparison.
So you could use more verbose variable names to "describe" how the regex is being constructed from the input and the regex "wrappers" (as in the 3rd example) OR, you can just combine the input variable with the regex syntax in place. That is just a matter of taste or documentation conventions.
You might want to include a comment to explain you are constructing a regex test that would look like the $1 ~ /^CHICAGO *$/.
IHTH
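One thing to keep in mind with the variable-based versions: the target is spliced into a regex, so characters such as . or * in it would be treated as regex syntax. A possible alternative, sketched here under the assumption that only trailing spaces need to be ignored, trims the first field and compares it as a plain string (the variable name key is made up for this example).

```shell
data_file=$(mktemp)
cat > "$data_file" <<'EOF'
NCHICAGOS|ESTING|
CHICAGO|MANKING|
OCREAN|CHICAGO|
CHICAGO PIT|BULL|
CHICAGO |NEWYORK|
EOF

# Copy $1, strip trailing spaces, then do an exact string comparison.
matches=$(awk -v trgt="CHICAGO" -F'|' '{
    key = $1
    sub(/ +$/, "", key)
    if (key == trgt) print $2
}' "$data_file")

echo "$matches"

rm -f "$data_file"
```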

replace sed command text inline

I have this file
file.txt
unknown#mail.com||unknown#mail.com||
unknown#mail2.com||unknown#mail2.com||
unknown#mail3.com||unknown#mail3.com||
unknown#mail4.com||unknown#mail4.com||
unknownpass
unknownpass2
unknownpass3
unknownpass4
How can I use the sed command to obtain this:
unknown#mail.com|unknownpass|unknown#mail.com|unknownpass|
unknown#mail2.com|unknownpass2|unknown#mail2.com|unknownpass2|
unknown#mail3.com|unknownpass3|unknown#mail3.com|unknownpass3|
unknown#mail4.com|unknownpass4|unknown#mail4.com|unknownpass4|
This might work for you (GNU sed):
sed ':a;N;/\n[^|\n]*$/!ba;s/||\([^|]*\)||\(\n.*\)*\n\(.*\)$/|\3|\1|\3|\2/;P;D' file
Slurp the first part of the file into pattern space and one of the replacements, substitute, print and delete the first line and then repeat.
Well, this does use sed anyway:
{ sed -n 5,\$p file.txt; sed 4q file.txt; } | awk 'NR<5{a[NR]=$0; next}
{$2=a[NR-4]; $4=a[NR-4]} 1' FS=\| OFS=\|
awk to the rescue!
awk 'BEGIN {FS=OFS="|"}
NR==FNR {if(NF==1) a[++c]=$1; next}
NF>4 {$2=a[FNR]; $4=$2; print}' file{,}
This is a two-pass algorithm: it caches the entries in the first pass and inserts them into the empty fields in the second. It assumes the number of items matches.
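The file{,} at the end relies on bash brace expansion to name the file twice. A sketch of the same idea with the file name spelled out twice, run against a shortened copy of the sample data (the temporary file is an assumption for illustration):

```shell
file=$(mktemp)
cat > "$file" <<'EOF'
unknown#mail.com||unknown#mail.com||
unknown#mail2.com||unknown#mail2.com||
unknownpass
unknownpass2
EOF

# Pass 1 (NR == FNR): remember every single-field line, i.e. the passwords.
# Pass 2: for the address lines, fill fields 2 and 4 from the cache.
merged=$(awk 'BEGIN { FS = OFS = "|" }
NR == FNR { if (NF == 1) a[++c] = $1; next }
NF > 4    { $2 = a[FNR]; $4 = $2; print }
' "$file" "$file")

echo "$merged"

rm -f "$file"
```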
Here is another approach with one pass, powered by tac wrapped awk
tac file |
awk 'BEGIN {FS=OFS="|"}
NF==1 {a[++c]=$1}
NF>4 {$2=a[c--]; $4=$2; print}' |
tac
I would combine the related lines with paste and reshuffle the elements with awk (I assume the related lines are exactly half a file away):
n=$(wc -l < file.txt)
paste -d'|' <(head -n $((n/2)) file.txt) <(tail -n $((n/2)) file.txt) |
awk '{ print $1, $6, $3, $6, "" }' FS='|' OFS='|'
Output:
unknown#mail.com|unknownpass|unknown#mail.com|unknownpass|
unknown#mail2.com|unknownpass2|unknown#mail2.com|unknownpass2|
unknown#mail3.com|unknownpass3|unknown#mail3.com|unknownpass3|
unknown#mail4.com|unknownpass4|unknown#mail4.com|unknownpass4|

Extract text with any command in linux shell

How do I extract the text from the following text and store it to the variables:
05:21-09:32, 14:21-19:30
Here, I want to store 05 in one variable, 21 in another, 09 in another and so on. All the values must be stored in an array or in separate variables.
I have tried:
k="05:21-09:32, 14:21-19:30"
part1=($k | awk -F"-" '{print $1}' | awk -F":" '{print $1}')
part2=($k | awk -F"-" '{print $2}' | awk -F":" '{print $1}')
part3=($k | awk -F"," '{print $2}' | awk -F":" '{print $1}')
part4=($k | awk -F"-" '{print $3}' | awk -F":" '{print $1}')
I need a more clear solution or short solution.
You can use read with the -a (array) option:
IFS=':-, ' read -ra my_arr <<< "05:21-09:32, 14:21-19:30"
The above code will split the input string on :, -, , and spaces:
$ echo "${my_arr[0]}" "${my_arr[1]}" "${my_arr[2]}" "${my_arr[3]}"
05 21 09 32
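If separate named variables are preferred over an array, the same IFS trick works with a plain read. This is a sketch; the variable names h1, m1 and so on are invented for the example, and a here-document is used instead of <<< so it also runs in shells without here-strings.

```shell
k="05:21-09:32, 14:21-19:30"

# IFS applies only to this read; each delimiter run separates one value.
IFS=':-, ' read -r h1 m1 h2 m2 h3 m3 h4 m4 <<EOF
$k
EOF

echo "first range:  $h1:$m1 to $h2:$m2"
echo "second range: $h3:$m3 to $h4:$m4"
```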
Your code has a number of problems.
You can't pipe the value of k to standard output with just $k -- you want something like printf '%s\n' "$k" or perhaps the less portable echo "$k"
Notice also the quoting in the expression above; without it, the shell will perform wildcard expansion and whitespace tokenization on the value
Spawning two Awk processes for a simple string substitution is excessive
Spawning a separate pipeline for each value you want to extract is inefficient; if at all possible, extract everything in one go.
Something like IFS=':-, '; set -- $k will assign the parts to $1 through $8 in one go.
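Because that form changes IFS for the rest of the script, a slightly more defensive sketch saves and restores it. The names old_ifs, count, first and fifth are assumptions for this example, and it assumes IFS was set to begin with (as it is in a fresh shell).

```shell
k="05:21-09:32, 14:21-19:30"

old_ifs=$IFS          # remember the current separators
IFS=':-, '
set -f                # no wildcard expansion while $k is unquoted
set -- $k             # split into the positional parameters
set +f
IFS=$old_ifs          # restore normal word splitting

count=$#
first=$1
fifth=$5
echo "$count parts; first=$first fifth=$fifth"
```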

How to set permanent alias

Can anybody help me to set following script as alias:
ps axu | awk '{print $2, $3, $4, $11}' | head -1 && ps axu | awk '{print $2, $3, $4, $11}' | sort -k3 -nr |head -20
I tried adding the line below to my .bashrc file:
alias abc='ps axu | awk '{print $2, $3, $4, $11}' | head -1 && ps axu | awk '{print $2, $3, $4, $11}' | sort -k3 -nr |head -20'
But I had no luck; I am getting the error below:
$abc
Usage: grep [OPTION]... PATTERN [FILE]... sort: read failed: /apps/: Is a directory Try 'grep --help' for more information.
Here's a tip: don't call ps twice: pipe the output to a group of commands. As a function, you'll have much less quoting grief.
abc() {
ps axu | awk '{print $2, $3, $4, $11}' | {
IFS= read -r header && echo "$header" # the first line
sort -k3 -nr | head -20 # all the rest
}
}
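The header-then-sort grouping can be tried on fixed input, which makes the behavior easier to check than live ps output. This is a sketch with invented sample rows:

```shell
# Made-up three-process listing; column 2 plays the role of %MEM.
report=$(printf 'PID MEM\n101 0.5\n102 3.0\n103 1.2\n' | {
    IFS= read -r header && echo "$header"   # pass the header through untouched
    sort -k2 -nr | head -2                   # sort only the remaining lines
})

echo "$report"
```

The read consumes exactly one line from the pipe, so sort sees everything after the header.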
The command ps is very configurable. These two commands are almost equivalent: your selection by awk, and a configured ps format:
ps axu | awk '{print $2, $3, $4, $11}'
ps axopid,pcpu,pmem,comm
Here the user-oriented u format was replaced by a custom format, o pid,pcpu,pmem,comm. It is similar, not identical, only because of the command name and some formatting. We will get to that a bit later.
If the command name is not a deal breaker, ps could even sort by some selected key with the k option, and selecting only 20 lines we get:
ps axopid,pcpu,pmem,comm k-pmem | head -20
This replaces all the selecting, sorting, and formatting of your initial command. That should be enough for all practical uses, I think.
But if you do need output identical to your original, we need to expand the command to show all args. Such output is very long for some commands and doesn't format well. Additionally, the awk processing you used could NOT be reproduced by plain ps. We need to cut the command part at the first space and, to get better formatting, we need some printf love.
All said, this gets exactly the same output (well, a bit better formatted):
ps axopid,pcpu,pmem,cmd k-pmem | head -20 | awk '
{gsub(/ .*/, "", $4); printf "%5s %4s %4s %-.50s\n", $1,$2,$3,$4}'
And, just making it a single line to make its use a little easier to copy/paste:
ps axopid,pcpu,pmem,cmd k-pmem | head -20 | awk '{gsub(/ .*/, "", $4); printf "%5s %4s %4s %-.50s\n", $1,$2,$3,$4}'
And so, the alias becomes only one line.
I hope you will be able to get the alias working.
I don't know where grep is coming from but your problem is that quotes don't nest like that.
When you stuck the single quoted awk scripts inside single quotes for the alias the quotes match-up incorrectly.
Replace each "inner" single quote with '\'' and it should work.
alias abc='ps axu | awk '\''{print $2, $3, $4, $11}'\'' | head -1 && ps axu | awk '\''{print $2, $3, $4, $11}'\'' | sort -k3 -nr |head -20'
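The '\'' idiom can be checked on a much smaller command. In this sketch a plain variable and eval stand in for the alias, since aliases are normally not expanded in scripts; the variable names are made up for the example.

```shell
# Outer single quotes, with each inner single quote written as '\''.
quoted='echo "a b c" | awk '\''{print $2}'\'''

# The stored text is now: echo "a b c" | awk '{print $2}'
echo "$quoted"

# Running it shows that awk received its program intact.
result=$(eval "$quoted")
echo "$result"
```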

Multisplitting in AWK

I would like to execute 2 splits using AWK (I have 2 field separators). The string of data I'm working on looks something like this:
data;digit&int&string&int&digit;data;digit&int&string&int&digit
As you can see the outer field separator is a semicolon, and the nested one is an ampersand.
What i'm doing with awk is (suppose that the String would be in a variable named test)
echo ${test} | awk '{FS=";"} {print $2}' | awk '{FS="&"} {print $3}'
This should catch the "String" word, but for some reason this is not working.
It seems like the second pipe is not being applied, as I see only the result of the first awk command.
Any advice?
use awk arrays
echo "$test" | awk -F';' '{split($2, arr, "&"); print(arr[3])}'
The other answers give working solutions, but they don't really explain the problem.
The problem is that setting FS inside a regular { ... } block of the awk script won't cause $1, $2, etc. to be re-calculated for the current line; so FS will be set for any later lines, but the very first line will already have been split by whitespace. To set FS before running the script, you can use a BEGIN block (which is run before the first line); or, you can use the -F command-line option.
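A two-line input makes the timing visible. This is a sketch with made-up data; the exact result for the first line of the "late" case depends on the awk implementation following the behavior described above.

```shell
input='a;b;c
d;e;f'

# FS assigned in a main block: too late for line 1, in effect from line 2 on.
late=$(printf '%s\n' "$input" | awk '{FS=";"} {print $2}')

# FS assigned in BEGIN: in effect before any line is read.
early=$(printf '%s\n' "$input" | awk 'BEGIN{FS=";"} {print $2}')

printf 'late:\n%s\n' "$late"
printf 'early:\n%s\n' "$early"
```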
Making either of those changes will fix your command:
echo "$test" | awk 'BEGIN{FS=";"} {print $2}' | awk 'BEGIN{FS="&"} {print $3}'
echo "$test" | awk -F';' '{print $2}' | awk -F'&' '{print $3}'
(I also took the liberty of wrapping $test in double-quotes, since unquoted parameter-expansions are a recipe for trouble. With your value of $test it would have been fine, but I make it a habit to always use double-quotes, just in case.)
Try this:
echo "$test" | awk -F'[;&]' '{print $4}'
I specify a multiple separator in -F'[;&]'
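If the position of the nested value should not depend on counting across both separators, the split() approach can be applied to every outer field. A sketch, with the names test_data, parts and thirds invented for this example:

```shell
test_data='data;digit&int&string&int&digit;data;digit&int&string&int&digit'

thirds=$(printf '%s\n' "$test_data" | awk -F';' '{
    for (i = 1; i <= NF; i++) {
        # split returns the number of pieces; skip groups with fewer than 3
        if (split($i, parts, "&") >= 3) print i ": " parts[3]
    }
}')

echo "$thirds"
```

This prints the outer field number next to the third nested value of each group that has one.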
