I need to find the difference between two files in Unix.
File 1:
1,column1
2,column2
3,column3
File 2:
1,column1
2,column3
3,column5
For each common column name in file 2, I need to find its index from file 1.
If a column from file 2 has no match in file 1, a default index value and the column name should be returned.
Output:
1,column1
3,column3
-1,column5
Can anyone help me do this with a Unix script?
Thanks,
William R
awk:
awk -F, 'NR==FNR{a[$2]=1; next;} ($2 in a)' file2 file1
grep+process substitution:
grep -f <(cut -d, -f2 file2) file1
EDIT for updated question:
awk:
awk -F, 'NR==FNR{a[$2]=$1;next} {if ($2 in a) print a[$2]","$2; else print "-1," $2}' file1 file2
# if match found in file1, print the index, else print -1
# (Also note that the input file order is reversed in this command, compared to earlier awk.)
grep:
cp file1 tmpfile                                                    # start from file1's entries
grep -v -f <(cut -d, -f2 file1) file2 | sed 's/.*,/-1,/' >> tmpfile # append file2-only entries with index -1
grep -f <(cut -d, -f2 file2) tmpfile                                # keep only columns that appear in file2
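A quick sanity check of the awk one-liner, recreating the question's sample files:

```shell
# Recreate the question's sample input
printf '1,column1\n2,column2\n3,column3\n' > file1
printf '1,column1\n2,column3\n3,column5\n' > file2

# Look up each file2 column's index in file1; print -1 when absent
awk -F, 'NR==FNR{a[$2]=$1;next}
         {if ($2 in a) print a[$2] "," $2; else print "-1," $2}' file1 file2
# prints:
# 1,column1
# 3,column3
# -1,column5
```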
Related
File A contains hash:plain
File B contains username:hash
needed output username:plain
Any way to do that in shell?
Use command substitution with cut:
echo $(cut -d: -f1 B):$(cut -d: -f2 A)
(Note this only works if each file contains exactly one line.)
Assuming the files aren't in the same order, that there are multiple lines per file, and you want lines with the same hash to be paired, a few ways:
$ join -1 1 -2 2 -t: -o 2.1,1.2 <(sort -k1,1 -t: filea) <(sort -k2,2 -t: fileb)
(Requires bash, zsh, ksh93, or another shell that understands <() redirection)
or
$ awk -F: -v OFS=: 'NR == FNR { hashes[$1] = $2; next }
$2 in hashes { print $1, hashes[$2] }' filea fileb
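To see the awk variant in action, here is a small check with made-up hash and username values (the data is hypothetical, the command is the one above):

```shell
# filea: hash:plain, fileb: username:hash (sample values)
printf 'abc123:hunter2\ndef456:letmein\n' > filea
printf 'alice:abc123\nbob:def456\n' > fileb

# First pass stores plain text keyed by hash; second pass joins on $2
awk -F: -v OFS=: 'NR == FNR { hashes[$1] = $2; next }
                  $2 in hashes { print $1, hashes[$2] }' filea fileb
# prints:
# alice:hunter2
# bob:letmein
```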
I need to print all the lines in a CSV file when 3rd field matches a pattern in a pattern file.
I have tried grep with no luck, because it matches against any field, not only the third.
grep -f FILE2 FILE1 > OUTPUT
FILE1
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832
FILE2
00567
98745
45486
54543
48349
96349
56485
19615
56496
39493
RIGHT OUTPUT
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346
WRONG OUTPUT
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567 <---- I don't want this to appear
sadsad,0,98745,1,gfdkjh,45346
I have already searched everywhere and tried different formulas.
EDIT: thanks to Wintermute, I managed to write something like this:
csvquote file1.csv > file1quoted.csv
awk -F '"' 'FNR == NR { patterns[$0] = 1; next } patterns[$6]' file2.csv file1quoted.csv | csvquote -u > result.csv
(Redirecting csvquote's output onto its own input file would truncate it, so write to a separate file.)
csvquote helps awk parse CSV files that contain quoted fields.
Thank you very much everybody, great community!
With awk:
awk -F, 'FNR == NR { patterns[$0] = 1; next } patterns[$3]' file2 file1
This works as follows:
FNR == NR { # when processing the first file (the pattern file)
patterns[$0] = 1 # remember the patterns
next # and do nothing else
}
patterns[$3] # after that, select lines whose third field
# has been seen in the patterns.
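A self-contained check of that command, using the question's own sample data:

```shell
# Recreate the question's sample files
cat > FILE1 <<'EOF'
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832
EOF
printf '00567\n98745\n' > FILE2

# Only lines whose *third* field is listed in FILE2 are printed;
# line 2 (00567 in the sixth field) is correctly skipped
awk -F, 'FNR == NR { patterns[$0] = 1; next } patterns[$3]' FILE2 FILE1
# prints:
# dasdas,0,00567,1,lkjiou,85249
# sadsad,0,98745,1,gfdkjh,45346
```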
Using grep and sed:
grep -f <( sed -e 's/^\|$/,/g' file2) file1
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346
Explanation:
We insert a comma at the beginning and at the end of each pattern from file2 (without changing the file itself), then grep as you were already doing. Note that this matches the pattern in any interior comma-delimited field, not only the third, so it is an approximation that happens to work for this data.
This can be a start
for i in $(cat FILE2); do grep "^[^,]*,[^,]*,$i," FILE1; done
sed 's#.*#/^[^,]*,[^,]*,&,/b#' FILE2 >/tmp/File2.sed
echo d >>/tmp/File2.sed
sed -f /tmp/File2.sed FILE1; rm /tmp/File2.sed
This is harder to do in a simple sed than in awk, but it should work if awk is not available. (Each generated /pattern/b command lets a matching line branch past the final d and get printed; a per-pattern /pattern/!d would wrongly require a line to match every pattern at once.)
The same with egrep (useful on huge files):
sed 's#.*#^[^,]*,[^,]*,&,#' FILE2 >/tmp/File2.egrep && egrep -f /tmp/File2.egrep FILE1; rm /tmp/File2.egrep
I have a csv file with data presented as follows
87540221|1356438283301|1356438284971|1356438292151697
87540258|1356438283301|1356438284971|1356438292151697
87549647|1356438283301|1356438284971|1356438292151697
I'm trying to save the first column to a new file (without the field separator), and then delete the first column from the main csv file along with the first field separator.
Any ideas?
This is what I have tried so far
awk 'BEGIN{FS=OFS="|"}{$1="";sub("|,"")}1'
but it doesn't work
This is simple with cut:
$ cut -d'|' -f1 infile
87540221
87540258
87549647
$ cut -d'|' -f2- infile
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
Just redirect into the file you want:
$ cut -d'|' -f1 infile > outfile1
$ cut -d'|' -f2- infile > outfile2 && mv outfile2 file
Assuming your original CSV file is named "orig.csv":
awk -F'|' '{print $1 > "newfile"; sub(/^[^|]+\|/,"")}1' orig.csv > tmp && mv tmp orig.csv
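A short check of that single-pass awk, with a cut-down version of the question's data:

```shell
printf '87540221|13|14\n87540258|15|16\n' > orig.csv

# First column goes to "newfile"; the rest replaces orig.csv
awk -F'|' '{ print $1 > "newfile"; sub(/^[^|]+\|/,"") } 1' orig.csv > tmp && mv tmp orig.csv

cat newfile   # 87540221 / 87540258
cat orig.csv  # 13|14 / 15|16
```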
GNU awk
awk '{$1="";$0=$0;$1=$1}1' FPAT='[^|]+' OFS='|' infile
Output
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
The pipe is a special regex symbol, and the sub function expects you to pass a regex. The correct awk command would be this:
awk 'BEGIN {FS=OFS="|"} {$1=""; sub(/\|/, "")} 1' file
OUTPUT:
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
1356438283301|1356438284971|1356438292151697
With sed :
sed 's/[^|]*|//' file.txt
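A quick check of the sed command against one of the question's sample lines:

```shell
printf '87540221|1356438283301|1356438284971\n' > file.txt

# Delete everything up to and including the first pipe
sed 's/[^|]*|//' file.txt
# prints: 1356438283301|1356438284971
```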
I need to write a script to count the number of tabs in each line of a file and print the output to a text file (e.g., output.txt).
How do I do this?
awk '{print gsub(/\t/,"")}' inputfile > output.txt
If you treat \t as the field delimiter, there will be one fewer \t than fields on each line:
awk -F'\t' '{ print NF-1 }' input.txt > output.txt
sed 's/[^\t]//g' input.txt | awk '{ print length }' > output.txt
Based on this answer.
This will give the total number of tabs in the file (od -c renders each tab as the two characters \t, so grep has to match a literal backslash followed by t):
od -c infile | grep -o '\\t' | wc -l > output.txt
This will give you number of tabs line by line:
awk '{print gsub(/\t/,"")}' infile > output.txt
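A small check of the per-line count (gsub returns the number of substitutions it made):

```shell
printf 'a\tb\tc\nno tabs here\nx\ty\n' > infile

awk '{print gsub(/\t/,"")}' infile
# prints:
# 2
# 0
# 1
```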
I need to merge two files with a Bash script.
File_1.txt
TEXT01 TEXT02 TEXT03 TEXT04
TEXT05 TEXT06 TEXT07 TEXT08
TEXT09 TEXT10 TEXT11 TEXT12
File_2.txt
1993.0
1994.0
1995.0
Result.txt
TEXT01 TEXT02 1993.0 TEXT03 TEXT04
TEXT05 TEXT06 1994.0 TEXT07 TEXT08
TEXT09 TEXT10 1995.0 TEXT11 TEXT12
File_2.txt needs to be merged at this specific position. I have tried different solutions with multiple do-while loops, but they have not worked so far.
awk '{
  getline s3 < "file1"
  printf "%s %s %s ",$1,$2,s3
  for(i=3;i<=NF;i++){
    printf "%s ",$i
  }
  print ""
}END{close("file1")}' file
# note: close() takes the file name used with getline, not the variable
output
# more file
TEXT01 TEXT02 TEXT03 TEXT04
TEXT05 TEXT06 TEXT07 TEXT08
TEXT09 TEXT10 TEXT11 TEXT12
$ more file1
1993.0
1994.0
1995.0
$ ./shell.sh
TEXT01 TEXT02 1993.0 TEXT03 TEXT04
TEXT05 TEXT06 1994.0 TEXT07 TEXT08
TEXT09 TEXT10 1995.0 TEXT11 TEXT12
Why, use cut and paste, of course! Give this a try:
paste -d" " <(cut -d" " -f 1-2 File_1.txt) File_2.txt <(cut -d" " -f 3-4 File_1.txt)
This was inspired by Dennis Williamson's answer, so if you like it, give that one a +1 too!
paste test1.txt test2.txt | awk '{print $1,$2,$5,$3,$4}'
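An end-to-end check of the paste+awk variant with the question's data (paste joins with a tab, and awk's default field splitting handles it):

```shell
printf 'TEXT01 TEXT02 TEXT03 TEXT04\nTEXT05 TEXT06 TEXT07 TEXT08\n' > test1.txt
printf '1993.0\n1994.0\n' > test2.txt

# $5 is the pasted year; reorder it into third position
paste test1.txt test2.txt | awk '{print $1,$2,$5,$3,$4}'
# prints:
# TEXT01 TEXT02 1993.0 TEXT03 TEXT04
# TEXT05 TEXT06 1994.0 TEXT07 TEXT08
```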
This is a solution without awk.
The interesting part is how to use file descriptors in the shell.
#!/bin/sh
exec 5<test2.txt # open test2.txt on file descriptor 5
while read ln
do
    read ln2 <&5
    # change these three lines as you wish:
    printf '%s ' "$(echo "$ln" | cut -d ' ' -f 1-2)"
    printf '%s ' "$ln2"
    echo "$ln" | cut -d ' ' -f 3-4
done < test1.txt
exec 5<&- # close fd 5
Since the question was tagged with 'sed', here's a variant of Vereb's answer using sed instead of awk:
paste File_1.txt File_2.txt | sed -r 's/( [^ ]* [^ ]*)\t(.*)/ \2\1/'
Or in pure sed ... :D
sed -r '/ /{H;d};G;s/^([^\n]*)\n*([^ ]* [^ ]*)/\2 \1/;P;s/^[^\n]*\n//;x;d' File_1.txt File_2.txt
Using perl, give file1 and file2 as arguments to:
#!/usr/local/bin/perl
open(TXT2, pop(@ARGV));
while (<>) {
    chomp($m = <TXT2>);
    s/^((\w+\s+){2})/$1$m /;
    print;
}