cat a.txt
a.b.c.d.e.google.com
x.y.z.google.com
rev a.txt | awk -F. '{print $2,$3}' | rev
This is showing:
e google
z google
But I want this output:
a.b.c.d.e.google
b.c.d.e.google
c.d.e.google
e.google
x.y.z.google
y.z.google
z.google
With your shown samples, please try the following awk code. It was written and tested in GNU awk but should work in any awk.
awk '
BEGIN{
  FS=OFS="."                    # split and rejoin fields on dots
}
{
  nf=NF
  for(i=1;i<(nf-1);i++){
    print                       # print the current, progressively shorter record
    $1=""                       # empty the leading label; awk rejoins the fields with OFS
    sub(/^[[:space:]]*\./,"")   # strip the dot the empty field leaves behind
  }
}
' Input_file
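Run against the sample a.txt, this should print (note that it keeps the .com suffix):
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com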
Here is one more awk solution. It keeps printing the record and stripping its leading label for as long as more than two dot-separated fields remain:
awk -F. '{while (!/^[^.]+\.[^.]+$/) {print; sub(/^[^.]+\./, "")}}' file
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using sed
$ sed -En 'p;:a;s/[^.]+\.(.*([^.]+\.){2}[[:alpha:]]+$)/\1/p;ta' input_file
This prints each line as-is, then repeatedly strips the leading label and prints again, for as long as the remainder still has at least two more dot-separated labels before the final alphabetic TLD.
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using bash:
IFS=.
while read -ra a; do
    for ((i=${#a[@]}; i>2; i--)); do
        echo "${a[*]: -i}"      # the last i elements, joined with the first character of IFS (.)
    done
done < a.txt
Gives:
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
(I assume the lack of d.e.google.com in your expected output is a typo?)
For a shorter and arguably simpler solution, you could use Perl.
To auto-split the line on the dot character into the @F array, and then print the range you want:
perl -F'\.' -le 'print join(".", @F[0..$#F-1])' a.txt
-F'\.' will auto-split each input line into the @F array. It splits on the given regular expression, so the dot needs to be escaped to be taken literally.
$#F is the index of the last element in the array, so @F[0..$#F-1] is the range of elements from the first one ($F[0]) to the penultimate one. If you wanted to leave out both "google" and "com", you would use @F[0..$#F-2], etc.
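For example, to leave out both labels (a quick sketch against the same a.txt; on perls older than 5.20 you may need an explicit -an for the auto-split to kick in):
perl -F'\.' -le 'print join(".", @F[0..$#F-2])' a.txt
should print:
a.b.c.d.e
x.y.z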
I have a file that has several lines of which one line is
-xxxxxxxx()xxxxxxxx
I want to add the contents of this line to a new file
I did this :
awk ' /^-/ {system("echo" $0 ">" "newline.txt")} '
but this does not work; it returns an error that says:
Unexpected token '('
I believe this is due to the () present in the line. How can I overcome this issue?
You need to add proper spaces!
With your erroneous awk ' /^-/ {system("echo" $0 ">" "newline.txt")} ', the shell command is essentially echo-xxxxxxxx()xxxxxxxx>newline.txt, which surely doesn't work. You need to construct a proper shell command inside the awk string and obey awk's string-concatenation rules, i.e. your intended script should look like this (which is still broken, because $0 is not properly quoted in the resulting shell command):
awk '/^-/ { system("echo " $0 " > newline.txt") }'
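A sketch of one way to quote the record for the shell, using awk's \047 escape for a single quote (this still breaks if the line itself contains single quotes):
awk '/^-/ { system("echo \047" $0 "\047 > newline.txt") }'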
However, if you really just need to echo $0 into a file, you can simply do:
awk '/^-/ { print $0 > "newline.txt" }'
Or even more simply
awk '/^-/' > newline.txt
This essentially applies the default action to all records matching /^-/. The default action is print, which prints the current record, so this script simply filters out the desired records; the > newline.txt redirection outside awk puts them into a file.
You don't need the system, echo commands, simply:
awk '/^-/ {print $1}' file > newfile
This will capture lines starting with -, but since it prints only the first field it truncates the line at the first space.
awk '/^-/ {print $0}' file > newfile
Would capture the entire line including spaces.
You could use grep also:
grep -o '^-.*' file > newfile
Captures any lines starting with -
grep -o '^-.*().*' file > newfile
Would be more specific and capture lines starting with - also containing ()
First of all, for the simple extraction of patterns from a file you do not need awk; it is overkill. grep is more than enough for the task:
INPUT:
$ more file
123
-xxxxxxxx()xxxxxxxx
abc
-xyxyxxux()xxuxxuxx
123
abc
123
command:
$ grep -oE '^-[^(]+\(\).*' file
-xxxxxxxx()xxxxxxxx
-xyxyxxux()xxuxxuxx
explanations:
Option: -o outputs only the matched pattern instead of the whole line (it can be removed if you want the full lines), and -E enables extended regular expressions
Regex: ^-[^(]+\(\).* selects lines that start with - and contain ()
You can redirect your output to a new_file by adding > new_file at the end of your command.
I have a file which contains file list. The file looks like this
$ cat filelist
D src/layouts/PersonAccount-Person Account Layout.layout
D src/objects/Case Account-Record List.object
I want to cut the first two columns and print only the file names along with their directory path. The list is dynamic, and the file names contain spaces, so I can't use space as the delimiter. How can I do this with awk?
The output should be like this
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
Can you try this once:
bash-4.4$ cat filelist |awk '{$1="";print $0}'
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
Or, if you want to remove two columns, it would be:
awk '{$1=$2="";print $0}'
This will produce the below output:
bash-4.4$ cat filelist |awk '{$1=$2="";print $0}'
Account Layout.layout
Account-Record List.object
Try this out:
awk -F" " '{$1=""; print $0}' filelist | sed 's/^ //c'
Here sed is used to remove the first space of the output line.
print only the file names along with their directory path
awk approach:
awk '{ sub(/^[[:space:]]*[^[:space:]][[:space:]]+/,"",$0) }1' filelist
The output:
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
----------
To extract only basename of the file:
awk -F'/' '{print $NF}' filelist
The output:
PersonAccount-Person Account Layout.layout
Case Account-Record List.object
This will do exactly what you want for your example :
sed -E 's/(.*)([ ][a-zA-Z0-9]+\/[a-zA-Z0-9]+\/[a-zA-Z0-9. -]+)/\2/g' filelist
Explanation:
It matches your path (including any spaces in it) and then replaces the whole line with that single match. Easy peasy lemon squeezy :)
Regards!
A simple grep
grep -o '[^[:blank:]]*/.*' filelist
That's zero or more non-blank characters, followed by a slash, followed by the rest of the string.
It will not match any lines that don't contain a slash.
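Run against the sample filelist, this should print:
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object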
Here is a portable POSIX shell solution:
#!/bin/sh
cat "$#" |while read line; do
echo "${line#* * }"
done
This loops over each line of the given input file(s) (or else standard input) and prints each line with everything up to and including the second space removed. The # expansion matches the shortest such prefix, so it is not greedy.
Unlike some of the other answers here, this will preserve spacing (if any) in the rest of the line.
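A quick illustration of the expansion with a hypothetical two-column line:
$ line='D M src/objects/Case Account-Record List.object'
$ echo "${line#* * }"
src/objects/Case Account-Record List.object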
If you want that as a one-liner:
while read L < filelist; do echo "${L#* * }"; done
This will fail if the uppermost directory's name starts with a space. To work around that, you need to peel away the leading ten characters (which I assume are static):
#!/bin/sh
cat "$@" |while read -r line; do
    echo "${line#??????????}"
done
As a one-liner, in bash, this can be simplified by using substrings:
while read L < filelist; do echo "${L:10}"; done
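For instance, with a hypothetical line whose first ten characters are fixed-width status columns:
$ L='D M 12345 src/path/file name.txt'
$ echo "${L:10}"
src/path/file name.txt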
I have two text files
file 1
number,name,account id,vv,sfee,dac acc,TDID
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7
file 2
number,account id,dac acc,TDID
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1
I want to compare those two text files. If the four columns of file 2 are present in file 1 and equal, I want output like this:
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
nawk -F"," 'NR==FNR {a[$1];next} ($1 in a)' file2.txt file1.txt.. this works good for comparing two single column in two files. i want to compare multiple column. any one have suggestion?
This awk one-liner works for multi-column on unsorted files:
awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' file1.txt file2.txt
In order for this to work, it is imperative that the first file used for input (file1.txt in my example) be the file that only has 4 fields like so:
file1.txt
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1
file2.txt
7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7
Output
$ awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' file1.txt file2.txt
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
Alternatively, you could use the following syntax, which more closely matches the one in your question but is not very readable IMHO:
awk -F, 'NR==FNR{a[$1,$2,$3,$4];next} ($1SUBSEP$3SUBSEP$6SUBSEP$7 in a)' file1.txt file2.txt
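The comma in an awk array subscript simply joins the values with the control character stored in SUBSEP ("\034" by default), which is why the two forms are equivalent. A tiny sketch to see it:
$ awk 'BEGIN { a["x","y"]; print (("x" SUBSEP "y") in a) }'
1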
TxtSushi looks like what you want. It lets you work with CSV files using SQL.
It's not an elegant one-liner, but you could do it with perl.
#!/usr/bin/perl
open A, $ARGV[0];
# index file 1 (the 7-column file) by its first field
while (my @a = split /,/, <A>) {
    $k{$a[0]} = [@a];
}
close A;
open B, $ARGV[1];
# print the stored file 1 line when the account id,
# dac acc and TDID fields of file 2 also match
while (my @b = split /,/, <B>) {
    print join(',', @{$k{$b[0]}}) if
        defined($k{$b[0]}) &&
        $k{$b[0]}->[2] == $b[1] &&
        $k{$b[0]}->[5] == $b[2] &&
        $k{$b[0]}->[6] == $b[3];
}
close B;
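Usage, assuming the script is saved as compare.pl (a hypothetical name) and the 7-column file is given first:
perl compare.pl file1 file2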
Quick answer: Use cut to split out the fields you need and diff to compare the results.
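A rough sketch of that idea, assuming the key columns are 1, 3, 6 and 7 as above, and using comm instead of diff so that only the keys common to both files are kept (this prints the keys, not the full file 1 lines):
cut -d, -f1,3,6,7 file1 | sort > keys1
sort file2 > keys2
comm -12 keys1 keys2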
Not really well tested, but this might work:
join -t, file1 file2 | awk -F, 'BEGIN{OFS=","} {if ($3==$8 && $6==$9 && $7==$10) print $1,$2,$3,$4,$5,$6,$7}'
(Of course, this assumes the input files are sorted).
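If they are not, a bash sketch that sorts them on the fly with process substitution (assuming the header lines have been removed):
join -t, <(sort file1) <(sort file2) | awk -F, 'BEGIN{OFS=","} {if ($3==$8 && $6==$9 && $7==$10) print $1,$2,$3,$4,$5,$6,$7}'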
This is neither efficient nor pretty, but it will get the job done. It is not the most efficient implementation, as it runs the data through several commands, but it does not read the entire file into RAM either, so it has some benefits over the simple scripting approaches.
sed -n '2,$p' file1 | awk -F, '{print $1 "," $3 "," $6 "," $7 " " $0 }' | \
sort | join file2 - |awk '{print $2}'
This works as follows:
sed -n '2,$p' file1 sends file1 to STDOUT without the header line
The first awk command prints the 4 "key fields" from file1, in the same format as they appear in file2, followed by a space and then the full original line
The sort command ensures that file1 is in the same order as file2
The join command joins file2 and standard input, writing only the records that have a matching record in file2
The final awk command prints just the original part of file1
In order for this to work you must ensure that file2 is sorted before running the command.
Running this against your example data gave the following result
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
EDIT
I note from your comments that you are getting a sorting error. If this error occurs when sorting file2 before running the pipeline command, you could split the file, sort each part, and then concatenate them back together.
Something like this would do that for you
mv file2 file2.orig
for i in 0 1 2 3 4 5 6 7 8 9
do
    grep "^${i}" file2.orig | sort > file2.$i
done
cat file2.[0-9] > file2
rm file2.[0-9] file2.orig
You may need to modify the values passed to the for loop if your file is not distributed evenly across the full range of leading digits.
The statistical package R handles processing multiple CSV tables really easily. See An Introduction to R or R for Beginners.