cat a.txt
a.b.c.d.e.google.com
x.y.z.google.com
rev a.txt | awk -F. '{print $2,$3}' | rev
This is showing:
e google
z google
But I want this output:
a.b.c.d.e.google
b.c.d.e.google
c.d.e.google
e.google
x.y.z.google
y.z.google
z.google
With your shown samples, please try the following awk code. It was written and tested in GNU awk but should work in any awk.
awk '
BEGIN{
  FS=OFS="."                    # split and rejoin fields on dots
}
{
  nf=NF
  for(i=1;i<(nf-1);i++){
    print                       # print the current, progressively shorter record
    $1=""                       # empty the leading label; awk rejoins the fields with OFS
    sub(/^[[:space:]]*\./,"")   # strip the dot the empty field leaves behind
  }
}
' Input_file
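Run against the sample a.txt, this should print (note that it keeps the .com suffix):
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com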
Here is one more awk solution. It keeps printing the record and stripping its leading label for as long as more than two dot-separated fields remain:
awk -F. '{while (!/^[^.]+\.[^.]+$/) {print; sub(/^[^.]+\./, "")}}' file
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using sed
$ sed -En 'p;:a;s/[^.]+\.(.*([^.]+\.){2}[[:alpha:]]+$)/\1/p;ta' input_file
This prints each line as-is, then repeatedly strips the leading label and prints again, for as long as the remainder still has at least two more dot-separated labels before the final alphabetic TLD.
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using bash:
IFS=.
while read -ra a; do
    for ((i=${#a[@]}; i>2; i--)); do
        echo "${a[*]: -i}"      # the last i elements, joined with the first character of IFS (.)
    done
done < a.txt
Gives:
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
(I assume the lack of d.e.google.com in your expected output is a typo?)
For a shorter and arguably simpler solution, you could use Perl.
To auto-split the line on the dot character into the @F array, and then print the range you want:
perl -F'\.' -le 'print join(".", @F[0..$#F-1])' a.txt
-F'\.' will auto-split each input line into the @F array. It splits on the given regular expression, so the dot needs to be escaped to be taken literally.
$#F is the index of the last element in the array, so @F[0..$#F-1] is the range of elements from the first one ($F[0]) to the penultimate one. If you wanted to leave out both "google" and "com", you would use @F[0..$#F-2], etc.
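For example, to leave out both labels (a quick sketch against the same a.txt; on perls older than 5.20 you may need an explicit -an for the auto-split to kick in):
perl -F'\.' -le 'print join(".", @F[0..$#F-2])' a.txt
should print:
a.b.c.d.e
x.y.z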
I have a file that has several lines of which one line is
-xxxxxxxx()xxxxxxxx
I want to add the contents of this line to a new file
I did this :
awk ' /^-/ {system("echo" $0 ">" "newline.txt")} '
but this does not work; it returns an error that says:
Unexpected token '('
I believe this is due to the () present in the line. How can I overcome this issue?
You need to add proper spaces!
With your erroneous awk ' /^-/ {system("echo" $0 ">" "newline.txt")} ', the shell command is essentially echo-xxxxxxxx()xxxxxxxx>newline.txt, which surely doesn't work. You need to construct a proper shell command inside the awk string and obey awk's string-concatenation rules, i.e. your intended script should look like this (which is still broken, because $0 is not properly quoted in the resulting shell command):
awk '/^-/ { system("echo " $0 " > newline.txt") }'
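A sketch of one way to quote the record for the shell, using awk's \047 escape for a single quote (this still breaks if the line itself contains single quotes):
awk '/^-/ { system("echo \047" $0 "\047 > newline.txt") }'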
However, if you really just need to echo $0 into a file, you can simply do:
awk '/^-/ { print $0 > "newline.txt" }'
Or even more simply
awk '/^-/' > newline.txt
This essentially applies the default action to all records matching /^-/. The default action is print, which prints the current record, so this script simply filters out the desired records; the > newline.txt redirection outside awk puts them into a file.
You don't need the system, echo commands, simply:
awk '/^-/ {print $1}' file > newfile
This will capture lines starting with -, but since it prints only the first field it truncates the line at the first space.
awk '/^-/ {print $0}' file > newfile
Would capture the entire line including spaces.
You could use grep also:
grep -o '^-.*' file > newfile
Captures any lines starting with -
grep -o '^-.*().*' file > newfile
Would be more specific and capture lines starting with - also containing ()
First of all, for the simple extraction of patterns from a file you do not need awk; it is overkill. grep is more than enough for the task:
INPUT:
$ more file
123
-xxxxxxxx()xxxxxxxx
abc
-xyxyxxux()xxuxxuxx
123
abc
123
command:
$ grep -oE '^-[^(]+\(\).*' file
-xxxxxxxx()xxxxxxxx
-xyxyxxux()xxuxxuxx
explanations:
Option: -o outputs only the matched pattern instead of the whole line (it can be removed if you want the full lines), and -E enables extended regular expressions
Regex: ^-[^(]+\(\).* selects lines that start with - and contain ()
You can redirect your output to a new_file by adding > new_file at the end of your command.
I have a file which contains file list. The file looks like this
$ cat filelist
D src/layouts/PersonAccount-Person Account Layout.layout
D src/objects/Case Account-Record List.object
I want to cut the first two columns and print only the file names along with their directory path. The list is dynamic, and the file names contain spaces, so I can't use space as the delimiter. How can I do this with awk?
The output should be like this
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
Can you try this once:
bash-4.4$ cat filelist |awk '{$1="";print $0}'
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
Or, if you want to remove two columns, it would be:
awk '{$1=$2="";print $0}'
This will produce the below output:
bash-4.4$ cat filelist |awk '{$1=$2="";print $0}'
Account Layout.layout
Account-Record List.object
Try this out:
awk -F" " '{$1=""; print $0}' filelist | sed 's/^ //c'
Here sed is used to remove the first space of the output line.
print only the file names along with their directory path
awk approach:
awk '{ sub(/^[[:space:]]*[^[:space:]][[:space:]]+/,"",$0) }1' filelist
The output:
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
----------
To extract only basename of the file:
awk -F'/' '{print $NF}' filelist
The output:
PersonAccount-Person Account Layout.layout
Case Account-Record List.object
This will do exactly what you want for your example :
sed -E 's/(.*)([ ][a-zA-Z0-9]+\/[a-zA-Z0-9]+\/[a-zA-Z0-9. -]+)/\2/g' filelist
Explanation:
It matches your path (including any spaces in it) and then replaces the whole line with that single match. Easy peasy lemon squeezy :)
Regards!
A simple grep
grep -o '[^[:blank:]]*/.*' filelist
That's zero or more non-blank characters, followed by a slash, followed by the rest of the string.
It will not match any lines that don't contain a slash.
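Run against the sample filelist, this should print:
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object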
Here is a portable POSIX shell solution:
#!/bin/sh
cat "$#" |while read line; do
echo "${line#* * }"
done
This loops over each line of the given input file(s) (or else standard input) and prints each line with everything up to and including the second space removed. The # expansion matches the shortest such prefix, so it is not greedy.
Unlike some of the other answers here, this will preserve spacing (if any) in the rest of the line.
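A quick illustration of the expansion with a hypothetical two-column line:
$ line='D M src/objects/Case Account-Record List.object'
$ echo "${line#* * }"
src/objects/Case Account-Record List.object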
If you want that as a one-liner:
while read L < filelist; do echo "${L#* * }"; done
This will fail if the uppermost directory's name starts with a space. To work around that, you need to peel away the leading ten characters (which I assume are static):
#!/bin/sh
cat "$@" |while read -r line; do
    echo "${line#??????????}"
done
As a one-liner, in bash, this can be simplified by using substrings:
while read L < filelist; do echo "${L:10}"; done
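For instance, with a hypothetical line whose first ten characters are fixed-width status columns:
$ L='D M 12345 src/path/file name.txt'
$ echo "${L:10}"
src/path/file name.txt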
I have two text files
file 1
number,name,account id,vv,sfee,dac acc,TDID
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7
file 2
number,account id,dac acc,TDID
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1
I want to compare those two text files. If the four columns of file 2 are present in file 1 and equal, I want output like this:
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
nawk -F"," 'NR==FNR {a[$1];next} ($1 in a)' file2.txt file1.txt.. this works good for comparing two single column in two files. i want to compare multiple column. any one have suggestion?
This awk one-liner works for multi-column on unsorted files:
awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' file1.txt file2.txt
In order for this to work, it is imperative that the first file used for input (file1.txt in my example) be the file that only has 4 fields like so:
file1.txt
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1
file2.txt
7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7
Output
$ awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' file1.txt file2.txt
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
Alternatively, you could use the following syntax, which more closely matches the one in your question but is not very readable IMHO:
awk -F, 'NR==FNR{a[$1,$2,$3,$4];next} ($1SUBSEP$3SUBSEP$6SUBSEP$7 in a)' file1.txt file2.txt
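The comma in an awk array subscript simply joins the values with the control character stored in SUBSEP ("\034" by default), which is why the two forms are equivalent. A tiny sketch to see it:
$ awk 'BEGIN { a["x","y"]; print (("x" SUBSEP "y") in a) }'
1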
TxtSushi looks like what you want. It lets you work with CSV files using SQL.
It's not an elegant one-liner, but you could do it with perl.
#!/usr/bin/perl
open A, $ARGV[0];
# index file 1 (the 7-column file) by its first field
while (my @a = split /,/, <A>) {
    $k{$a[0]} = [@a];
}
close A;
open B, $ARGV[1];
# print the stored file 1 line when the account id,
# dac acc and TDID fields of file 2 also match
while (my @b = split /,/, <B>) {
    print join(',', @{$k{$b[0]}}) if
        defined($k{$b[0]}) &&
        $k{$b[0]}->[2] == $b[1] &&
        $k{$b[0]}->[5] == $b[2] &&
        $k{$b[0]}->[6] == $b[3];
}
close B;
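Usage, assuming the script is saved as compare.pl (a hypothetical name) and the 7-column file is given first:
perl compare.pl file1 file2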
Quick answer: Use cut to split out the fields you need and diff to compare the results.
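A rough sketch of that idea, assuming the key columns are 1, 3, 6 and 7 as above, and using comm instead of diff so that only the keys common to both files are kept (this prints the keys, not the full file 1 lines):
cut -d, -f1,3,6,7 file1 | sort > keys1
sort file2 > keys2
comm -12 keys1 keys2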
Not really well tested, but this might work:
join -t, file1 file2 | awk -F, 'BEGIN{OFS=","} {if ($3==$8 && $6==$9 && $7==$10) print $1,$2,$3,$4,$5,$6,$7}'
(Of course, this assumes the input files are sorted).
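If they are not, a bash sketch that sorts them on the fly with process substitution (assuming the header lines have been removed):
join -t, <(sort file1) <(sort file2) | awk -F, 'BEGIN{OFS=","} {if ($3==$8 && $6==$9 && $7==$10) print $1,$2,$3,$4,$5,$6,$7}'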
This is neither efficient nor pretty, but it will get the job done. It is not the most efficient implementation, as it runs the data through several commands, but it does not read the entire file into RAM either, so it has some benefits over the simple scripting approaches.
sed -n '2,$p' file1 | awk -F, '{print $1 "," $3 "," $6 "," $7 " " $0 }' | \
sort | join file2 - |awk '{print $2}'
This works as follows:
sed -n '2,$p' file1 sends file1 to STDOUT without the header line
The first awk command prints the 4 "key fields" from file1, in the same format as they appear in file2, followed by a space and then the full original line
The sort command ensures that file1 is in the same order as file2
The join command joins file2 and standard input, writing only the records that have a matching record in file2
The final awk command prints just the original part of file1
In order for this to work you must ensure that file2 is sorted before running the command.
Running this against your example data gave the following result
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
EDIT
I note from your comments that you are getting a sorting error. If this error occurs when sorting file2 before running the pipeline command, you could split the file, sort each part, and then concatenate them back together.
Something like this would do that for you
mv file2 file2.orig
for i in 0 1 2 3 4 5 6 7 8 9
do
    grep "^${i}" file2.orig | sort > file2.$i
done
cat file2.[0-9] > file2
rm file2.[0-9] file2.orig
You may need to modify the values passed to the for loop if your file is not distributed evenly across the full range of leading digits.
The statistical package R handles processing multiple CSV tables really easily. See An Introduction to R or R for Beginners.