AWK: search a file against itself - linux

I have a file where the data is:
90|123456|.. some more fields
90|654321|... some more fields
.... some more lines starting with 90
91|123456|.. some more fields
91|654321|... some more fields
.... some more lines starting with 91
92|123456|.. some more fields
92|654321|... some more fields
.... some more lines starting with 92
The 2nd field is the key value for me, and the 1st field will have the values 90, 91 and 92.
90|keyvalue will always be there
91|keyvalue .. not mandatory
92|keyvalue .. not mandatory
The expected output, for all key values, is:
90|keyvalue [mandatory]
91|keyvalue --> print if it exists in the file
92|keyvalue --> print if it exists in the file
What I did was:
grep "^90" origfilename |awk -F '|' '{print $2}'> temp90.txt #this gives me all keyvalues
awk '{print "90|"$0"|"}' temp90.txt >> temp90-1.txt
awk '{print "91|"$0"|"}' temp90.txt >> temp90-1.txt
awk '{print "92|"$0"|"}' temp90.txt >> temp90-1.txt
grep -f temp90-1.txt origfilename
This gets me the output, but I don't think it is an efficient way to do it. How can I do this with a single awk, or some other way?

awk to the rescue!
$ awk -F'|' 'NR==FNR && /^90/ {k[$2]}
NR!=FNR && $2 in k{print}' file{,}
90|123456|.. some more fields
90|654321|... some more fields
91|123456|.. some more fields
91|654321|... some more fields
92|123456|.. some more fields
92|654321|... some more fields
Explanation: in the first scan, collect the keys; in the second scan, print the lines with a matching key. NR==FNR is only true while the first copy of the file is being read. Note that file{,} is brace-expanded by the shell to file file, which is what makes awk scan the input file twice.
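You can verify the expansion in the shell:
$ echo file{,}
file file
If your shell lacks brace expansion, just write the file name twice, i.e. awk -F'|' '...' file file.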

Related

Joining consecutive lines using awk

How can I join consecutive lines into a single line using awk? Currently I have this awk command:
awk -F "\"*;\"*" '{if (NR!=1) {print $2}}' file.csv
which removes the first line and gives:
44895436200043
38401951900014
72204547300054
38929771400013
32116464200027
50744963500014
I want to have this:
44895436200043 38401951900014 72204547300054 38929771400013 32116464200027 50744963500014
That's a job for tr:
# tail -n +2 prints the whole file from line 2 on
# tr '\n' ' ' translates newlines to spaces
tail -n +2 file | tr '\n' ' '
With awk, you can achieve this by changing the output record separator to " ":
# BEGIN{ORS= " "} sets the internal output record separator to a single space
# NR!=1 adds a condition to the default action (print)
awk 'BEGIN{ORS=" "} NR!=1' file
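Note that with ORS=" " the output ends in a trailing space with no final newline; if that matters, an END block can add one, since printf is not affected by ORS:
awk 'BEGIN{ORS=" "} NR!=1; END{printf "\n"}' file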
I assume you want to modify your existing awk so that it prints a horizontal, space-separated list instead of one word per row. You can replace the print $2 action in your command like this:
awk -F "\"*;\"*" 'NR!=1{u=u s $2; s=" "} END {print u}' file.csv
(Here u accumulates the values, and s is empty before the first value and a single space afterwards, so the list has no leading space.)
or replace the ORS (output record separator)
awk -F "\"*;\"*" -v ORS=" " 'NR!=1{print $2}' file.csv
or pipe output to xargs:
awk -F "\"*;\"*" 'NR!=1{print $2}' file.csv | xargs

How to get 1st field of a file only when 2nd field matches a string?

How to get 1st field of a file only when 2nd field matches a given string?
#cat temp.txt
Ankit pass
amit pass
aman fail
abhay pass
asha fail
ashu fail
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'*
gives no output
Another syntax with awk:
awk '$2 ~ /^fail$/{print $1}' input_file
This also drops the needless cat command.
^ matches the start of the string
$ matches the end of the string
Anchoring like this matches the pattern exactly, so fail will not also match e.g. failed.
Either:
your fields are not tab-separated, or
you have blanks at the end of the relevant lines, or
you have DOS line-endings, so there are CRs at the end of every line and therefore at the end of every $2 (see Why does my tool output overwrite itself and how do I fix it?).
With GNU cat you can run cat -Tev temp.txt to see tabs (^I), CRs (^M) and line endings ($).
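If CRs turn out to be the culprit, one fix (a sketch, assuming the fields really are tab-separated) is to strip the CR before comparing:
awk -F'\t' '{sub(/\r$/,"")} $2=="fail"{print $1}' temp.txt
The sub() rewrites $0, which makes awk re-split the fields, so $2 no longer carries the trailing CR.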
Your code seems to work fine when I remove the * at the end
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'
The other thing to check is whether your file is using tabs or spaces. My copy/paste of your data file copied spaces, so I needed this line:
cat temp.txt | awk '$2 == "fail" { print $1 }'
Another way of doing this is with grep:
cat temp.txt | grep fail$ | awk '{ print $1 }'

search for a string and after getting result cut that word and store result in variable

I have a file named abc.lst whose path I have stored in a variable. Each line contains several fields; I want to grep the second field, and within it cut the part from expdp to .dmp and store the result in a variable.
Example:
REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
Desired output:
expdp_TEST_P119_*_18112017.dmp
I have tried the command below:
FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp
REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
Test Results:
$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp
split() breaks $2 at every /, returning the number of pieces n, so arr[n] is the last path component (the file name). To save it in a variable:
myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )
The following awk may also help:
awk -F'/| ' '{print $6}' Input_file
OR
awk -F'/| ' '{print $6}' "$REFLIST_OP"
Explanation: this simply makes both space and / field separators (as per your shown Input_file) and then prints the 6th field of the line, which is the part the OP needs. Note that it relies on the path always having the same depth.
To see each field number and its value, you could use the following command too:
awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"
Using sed, with one of these regexes:
sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst captures the non-space characters after the last /, printing only the captured part.
sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst Same as above, but using a different separator, which avoids escaping the /; -r enables extended regexes, so the ( ) need no escaping.
sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst works in two steps: remove up to the last /, then remove from the first space to the end. (Maybe the easiest to read/understand.)
If you want to avoid external commands (sed) entirely, bash parameter expansion works:
myvar=$(<abc.lst)       # read the whole line
myvar=${myvar##*/}      # strip everything up to and including the last /
myvar=${myvar%% *}      # strip everything from the first space on
echo "$myvar"

Set an external variable in awk

I have written a script in which I want to count the number of columns of each line in data.txt. My problem is that I am unable to set x from inside the awk script.
Any help would be highly appreciated.
while read p; do
x=1;
echo $p | awk -F' ' '{x=NF}'
echo $x;
file="$x"".txt";
echo $file;
done <$1
data.txt file:
4495125 94307025 giovy115p#live.it 94307025.094307025 12443
stazla deva1a23#gmail.com 1992/.:\1
1447585 gioao_87#hotmail.it h1st#1
saknit tomboro#seznam.cz 1233 1990
Expected output:
5.txt
3.txt
3.txt
4.txt
My output:
1.txt
1.txt
1.txt
1.txt
You cannot get a variable that is set inside Awk back into the shell context; in your example, the value of NF assigned to x inside awk will not be reflected outside.
You need to use command-substitution ($(..)) syntax to capture the value of NF and use it later:
x=$(echo "$p" | awk '{print NF}')
Now x will contain the column count of each line. Note that you don't need -F' ', which is the default delimiter in awk.
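Plugged back into your loop, a sketch of the corrected script (keeping the $1 argument from your question):
while read -r p; do
    x=$(echo "$p" | awk '{print NF}')   # column count of this line
    file="$x.txt"
    echo "$file"
done < "$1"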
Besides, your requirement can be done fully within Awk itself:
awk 'NF{print NF".txt"}' file
Here the NF pattern ensures the action inside {..} is applied only to non-empty rows; then for each such row we print the field count and append the extension .txt to it.
Awk processes a line at a time -- processing each line in a separate Awk script inside a shell while read loop is horrendously inefficient. See also https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice
Maybe something like this:
awk '{ print >(NF ".txt") }' data.txt
to create a file with the five-column rows in 5.txt, the four-column ones in 4.txt, the three-column rows in 3.txt, etc, one file per unique column count.
The Awk variable NF contains the number of fields (by default, Awk splits fields on runs of whitespace; use -F to change to some other separator) and the expression (NF ".txt") simply produces a string concatenation of the number of fields with the suffix .txt, which we pass as a file name to the print redirection.
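If the data could contain very many distinct column counts, some awks run out of open file handles; a variant of the same idea (a sketch, not from the original answer) closes each file after writing:
awk '{ f = NF ".txt"; print >> f; close(f) }' data.txt
Using >> matters here: after close(), a plain > would truncate the file on every reopen.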
With bash:
while read p; do p=($p); echo "${#p[@]}.txt"; done < file
or shorter:
while read -a p; do echo "${#p[@]}.txt"; done < file
Output:
5.txt
3.txt
3.txt
4.txt

Add a variable to a column in a CSV file

I have a large file (~10GB) and I want to duplicate that file 10 times but each time add a variable to the first column:
for i in range(1, 11):
    var = (i - 1) * 1000
    # add var to the first column of the file and save the file as file(i).csv
So far I have tried:
#!/bin/bash
for i in {1..10}
do
t=1
j=$(( $i - t ))
s=1000
person_id=$(( j * s ))
awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1+$person_id} {print $0}' file.csv > file$i.csv
done
but there was no change in the column values.
Awk variables are different from shell variables.
Replace:
awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1+$person_id} {print $0}' file.csv > file$i.csv
With:
awk -F"," -v id="$person_id" 'BEGIN{OFS=","} NR>1{$1=$1+id} {print $0}' file.csv > "file$i.csv"
This uses the -v option to define an awk variable id whose value is the value of the shell variable person_id.
Because , is not a shell-active character, the code can be simplified. Also, changing the location of the definition of OFS can further shorten the code:
awk -F, -v id="$person_id" 'NR>1{$1+=id} 1' OFS=, file.csv > "file$i.csv"
Lastly, we replaced {print $0} with the cryptic shorthand 1. (This works because awk interprets 1 as a logical condition which it evaluates to true and, since no action was supplied, awk will perform the default action which is to print the line.)
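Putting it together, the whole loop collapses to (a sketch built from the corrected command above):
#!/bin/bash
for i in {1..10}; do
    person_id=$(( (i - 1) * 1000 ))
    awk -F, -v id="$person_id" 'NR>1{$1+=id} 1' OFS=, file.csv > "file$i.csv"
done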
