I have a big file like this:
79597700
79000364
79002794
79002947
And another big file like this:
79597708|11
79000364|12
79002794|11
79002947|12
79002940|12
Then I need the lines of the second file whose numbers appear in the first file, but keeping the second column, something like:
79000364|12
79002794|11
79002947|12
79002940|12
(The MSISDNs that appear in the first file and also appear in the second file, but I need to return both columns of the second file.)
Can anyone help me? grep does not work for me because it returns only the MSISDN without the second column,
and comm is not possible because the rows differ between the files.
Try this:
grep -f bigfile1 bigfile2
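One caveat worth hedging: grep -f treats each line of bigfile1 as an unanchored pattern, so a short MSISDN could also match inside a longer number. A quick sketch that anchors each pattern to the start of the line and the "|" separator (using the question's sample data):

```shell
# Sample data from the question
printf '%s\n' 79597700 79000364 79002794 79002947 > bigfile1
printf '%s\n' '79597708|11' '79000364|12' '79002794|11' '79002947|12' '79002940|12' > bigfile2

# Anchor each MSISDN: it must start the line and be followed by "|",
# so 79002794 can no longer match inside a longer number.
sed 's/^/^/; s/$/|/' bigfile1 > patterns.txt

grep -f patterns.txt bigfile2
```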
Using awk:
awk -F"|" 'FNR==NR{f[$0];next}($1 in f)' file file2
Source: return common fields in two files
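To see why this prints both columns, here is the same idiom run on the question's sample data (79002940 has no counterpart in the first file, so it is not printed):

```shell
# The question's sample data
printf '%s\n' 79597700 79000364 79002794 79002947 > file
printf '%s\n' '79597708|11' '79000364|12' '79002794|11' '79002947|12' '79002940|12' > file2

# FNR==NR is true only while the first file is read: every MSISDN is
# stored as a key of array f, and "next" skips to the next line.
# For the second file, "($1 in f)" is a bare condition with no action,
# so awk prints the entire matching line -- second column included.
awk -F'|' 'FNR==NR{f[$0]; next} ($1 in f)' file file2
```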
I have two files with 4 columns each. I am trying to compare the second column of the first file with the second column of the second file. I found on some websites how to do it, and it works, but I have a problem printing a new file containing the whole second file plus the 3rd and 4th columns from the first file. I have tried this syntax:
awk 'NR==FNR{label[$2]=$2;date[$2]=$3;date[$2]=$4;next}; ($2==label[$2]){print $0" "date[$2]}' file1 file2
I was only able to add the 4th column from the first file. Where do I make a mistake?
Could you please try the following? Since no samples were given, it is not tested.
awk 'NR==FNR{label[$2]=$2; date[$2]=$3 OFS $4; next} ($2==label[$2]){print $0" "date[$2]}' file1 file2
Basically you need to change date[$2]=$4 to date[$2]=$3 OFS $4 to get both the 3rd and 4th fields in the output.
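Since the question gave no samples, here is a quick sketch on made-up four-column files (names and values are hypothetical) showing that both the 3rd and 4th fields from file1 now get appended:

```shell
# Hypothetical sample data (the question provided none)
printf '%s\n' 'id1 key1 jan red' 'id2 key2 feb blue' > file1
printf '%s\n' 'r1 key1 c3 c4' 'r2 key3 c3 c4' 'r3 key2 c3 c4' > file2

# date[$2] holds "$3 OFS $4", so both fields are appended to matches;
# key3 has no entry in file1, so that line is skipped.
awk 'NR==FNR{label[$2]=$2; date[$2]=$3 OFS $4; next} ($2==label[$2]){print $0" "date[$2]}' file1 file2
```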
I have two Excel files with the common headings "StudentID" and "StudentName" in both files. I want to merge these two files into a third file containing all the records from the two, along with the common headings. How can I do this with Linux commands?
I assume these are CSV files, as it would be far more complicated with .xlsx files:
cp first_file.csv third_file.csv
tail -n +2 second_file.csv >> third_file.csv
The first line copies your first file into a new file called third_file.csv. The second line appends the content of the second file starting from its second line (skipping the header).
Due to your requirement to do this with "Linux commands" I assume that you have two CSV files rather than XLSX files.
If so, the Linux join command is a good fit for a problem like this.
Imagine your two files are:
# file1.csv
Student ID,Student Name,City
1,John Smith,London
2,Arthur Dent,Newcastle
3,Sophie Smith,London
and:
# file2.csv
Student ID,Student Name,Subjects
1,John Smith,Maths
2,Arthur Dent,Philosophy
3,Sophie Smith,English
We want to do an equality join on the Student ID field (or we could use Student Name; it doesn't matter, since both are common to the two files).
We can do this using the following command:
$ join -1 1 -2 1 -t, -o 1.1,1.2,1.3,2.3 file1.csv file2.csv
Student ID,Student Name,City,Subjects
1,John Smith,London,Maths
2,Arthur Dent,Newcastle,Philosophy
3,Sophie Smith,London,English
By way of explanation, this join command written as SQL would be something like:
SELECT `Student ID`, `Student Name`, `City`, `Subjects`
FROM `file1.csv`, `file2.csv`
WHERE `file1.Student ID` = `file2.Student ID`
The options to join mean:
The "SELECT" clause:
-o 1.1,1.2,1.3,2.3 means select the first file's first field, first file's second field, first file's third field, and the second file's third field.
The "FROM" clause:
file1.csv file2.csv, i.e. the two filename arguments passed to join.
The "WHERE" clause:
-1 1 means join from the 1st field from the Left table
-2 1 means join to the 1st field from the Right table (-1 = Left; -2 = Right)
Also:
-t, tells join to use the comma as the field separator
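One more caveat: join expects both inputs to already be sorted on the join field (the sample files above happen to be). For unsorted files, sort each input on that field first; a header line would need separate handling, so this sketch uses hypothetical headerless data:

```shell
# Hypothetical headerless CSVs, deliberately out of order
printf '%s\n' '2,Arthur Dent,Newcastle' '1,John Smith,London' > u1.csv
printf '%s\n' '2,Arthur Dent,Philosophy' '1,John Smith,Maths' > u2.csv

# Sort each input on the first (join) field, then join as before
sort -t, -k1,1 u1.csv > u1.sorted
sort -t, -k1,1 u2.csv > u2.sorted
join -t, -o 1.1,1.2,1.3,2.3 u1.sorted u2.sorted
```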
@Corentin Limier Thanks for the answer.
I was able to achieve the same in a similar way, as below.
Let's say the two files are a.xls and b.xls, and we want to merge them into a third file c.xls:
cat a.xls > c.xls && tail -n +2 b.xls >> c.xls
I have a document (.txt) composed like this:
info1: info2: info3: info4
And I want to show some information by column.
For example, the "info3" field holds different values, and I want to see only the lines that have "test" in the "info3" column.
I think I have to use sort, but I'm not sure.
Any idea ?
The previous answers assume that the third column is exactly equal to test. It looks like you were looking for columns whose value includes test, so we need awk's match function:
awk -F: 'match($3, "test")' file
You can use awk for this. Assuming your columns are delimited by : and column 3 has entries containing test, the command below lists only the lines with that value:
awk -F':' '$3=="test"' input-file
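Worth noting: in the sample line info1: info2: info3: info4 each field after a colon carries a leading space, so with -F':' the third field is actually " info3" and the exact comparison fails. A hedged workaround is to let the separator swallow the surrounding spaces:

```shell
# Hypothetical sample file in the question's format
printf '%s\n' 'info1: info2: test: info4' 'a: b: other: d' > sample.txt

# Treat ":" with optional surrounding spaces as the field separator,
# so $3 is "test" rather than " test"
awk -F' *: *' '$3=="test"' sample.txt
```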
Assuming that the spacing is consistent and you're looking for exactly test in the third column, use:
grep ".*:.*: test:.*" file.txt
Or, to take care of any spacing that might occur:
grep ".*:.*: *test *:.*" file.txt
I want to compare the length of the fields of a CSV file that contains 2 columns, and keep only the lines in which the length of the field in the second column exceeds that of the first column. For example, if I have the following CSV file:
ABRTYU;ABGTYUI
GHYUI;GTYIOKJ
RTYUIOJ;GHYU
I want to get as a result:
ABRTYU;ABGTYUI
GHYUI;GTYIOKJ
Like this?
kent$ awk -F';' 'length($2)>length($1)' file
ABRTYU;ABGTYUI
GHYUI;GTYIOKJ
I have a requirement where I get files from a source with varying numbers of delimited fields, and I need to normalize them to one standard number of delimited fields.
source file1:
AA,BB,CC,0,0
AC,BD,DB,1,0
EE,ER,DR,0,0
What I want to do is append an extra 3 zeros at the end of each row:
AA,BB,CC,0,0,0,0,0
AC,BD,DB,1,0,0,0,0
EE,ER,DR,0,0,0,0,0
The source file always contains fewer columns. Can anyone help with this?
Thanks In Advance
Try this; it will append the given string to each line of the mentioned file:
sed '1,$ s/$/,0,0,0/' infile > outfile
Here is what I tried:
sed can do it in place with the -i flag:
sed -i "s/$/,0,0,0/g" file
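Both sed answers assume every row is short by exactly three fields. If the input really mixes rows of different widths, an awk sketch that pads each row with zeros up to a fixed field count (8 here, an assumed target) handles it more generally:

```shell
# Hypothetical input mixing 5- and 6-field rows
printf '%s\n' 'AA,BB,CC,0,0' 'AC,BD,DB,1,0,0' > src.txt

# Assign 0 to every missing field up to field 8; assigning past NF
# makes awk rebuild the record with OFS. The trailing 1 prints it.
awk -F, -v OFS=, '{for(i=NF+1;i<=8;i++)$i=0} 1' src.txt
```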