Merging two Excel files into a third one with the common heading - linux

I have two Excel files with the common headings "StudentID" and "StudentName" in both files. I want to merge these two files into a third Excel file containing all the records from both, along with the common heading. How can I do this with Linux commands?

I assumed these are CSV files, as it would be far more complicated with .xlsx files.
cp first_file.csv third_file.csv
tail -n +2 second_file.csv >> third_file.csv
The first line copies your first file into a new file called third_file.csv. The second line appends the content of the second file starting from its second line (skipping the header).
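The same idea generalizes to more than two files. A minimal sketch, assuming the files are named student1.csv, student2.csv, ... (hypothetical names) and all share the same header row:
head -n 1 student1.csv > merged.csv                              # write the shared header once (hypothetical file names)
for f in student*.csv; do tail -n +2 "$f" >> merged.csv; done    # append every file without its header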

Due to your requirement to do this with "Linux commands" I assume that you have two CSV files rather than XLSX files.
If so, the Linux join command is a good fit for a problem like this.
Imagine your two files are:
# file1.csv
Student ID,Student Name,City
1,John Smith,London
2,Arthur Dent,Newcastle
3,Sophie Smith,London
and:
# file2.csv
Student ID,Student Name,Subjects
1,John Smith,Maths
2,Arthur Dent,Philosophy
3,Sophie Smith,English
We want to do an equality join on the Student ID field (or we could use Student Name; it doesn't matter, since both fields are common to the two files).
We can do this using the following command:
$ join -1 1 -2 1 -t, -o 1.1,1.2,1.3,2.3 file1.csv file2.csv
Student ID,Student Name,City,Subjects
1,John Smith,London,Maths
2,Arthur Dent,Newcastle,Philosophy
3,Sophie Smith,London,English
By way of explanation, this join command written as SQL would be something like:
SELECT `Student ID`, `Student Name`, `City`, `Subjects`
FROM `file1.csv`, `file2.csv`
WHERE `file1.Student ID` = `file2.Student ID`
The options to join mean:
The "SELECT" clause:
-o 1.1,1.2,1.3,2.3 means select the first file's first field, the first file's second field, the first file's third field, and the second file's third field.
The "FROM" clause:
file1.csv file2.csv, i.e. the two filename arguments passed to join.
The "WHERE" clause:
-1 1 means join from the 1st field from the Left table
-2 1 means join to the 1st field from the Right table (-1 = Left; -2 = Right)
Also:
-t, tells join to use the comma as the field separator
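Note that join expects both inputs to be sorted on the join field. The example files above happen to be in the same order, so it works, but in general you may need to sort first. A minimal sketch using process substitution (be aware the header line gets sorted in with the data, so strip it beforehand if that matters):
join -1 1 -2 1 -t, -o 1.1,1.2,1.3,2.3 <(sort -t, -k1,1 file1.csv) <(sort -t, -k1,1 file2.csv)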

@Corentin Limier Thanks for the answer.
I was able to achieve the same in a similar way, shown below.
Let's say we have two files a.xls and b.xls and want to merge them into a third file c.xls:
cat a.xls > c.xls && tail -n +2 b.xls >> c.xls

Related

Script that deletes any duplicate line and keeps the original order

Need to write a script that will process standard input and remove every duplicated line it finds, through to the end of standard input. Every copy of a duplicated line, including its first occurrence, must be deleted, whether the duplicates are adjacent or scattered through the input. The standard output should display only the lines (in the order they were received) that had no duplicate in the input.
For exemple we have the file test.txt containing the following :
Whatever
You
Want
You
To
Whatever
Have
Here
The output is supposed to have all duplicated lines deleted, with the order of the remaining lines the same as the input, like this:
Want
To
Have
Here
Note that we don't know in advance what the file contains (this is just an example). I tested many commands but couldn't find one that works and respects the requirements.
IMPORTANT NOTE: I need all occurrences of a duplicated line to be deleted,
not only the occurrences after the first one.
I'm not entirely sure what 'each copy, including the first occurrence, of a duplicated line will be deleted' is asking for, but I think you are just looking for:
awk '!a[$0]++'
or perhaps:
awk '!a[$1]++'
eg:
$ cat input
Whatever
You
Want
Whatever 1
You
To
Whatever 1
Have
Here
$ awk '!a[$0]++' input
Whatever
You
Want
Whatever 1
To
Have
Here
$ awk '!a[$1]++' input
Whatever
You
Want
To
Have
Here
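Note that awk '!a[$0]++' keeps the first occurrence of each line. If, per the important note above, every occurrence of a duplicated line must be removed, a two-pass sketch over the same file (here test.txt) counts occurrences first and then prints only the lines that appeared exactly once, preserving input order:
awk 'FNR==NR{count[$0]++; next} count[$0]==1' test.txt test.txt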

Export results of the tree command to a CSV file with columns

I need to save our directory structure (folders only) from the Linux server and export the results to a CSV file with columns.
What I tried, and what works best, is tree -d /path/folder/ -L 3 > file.csv
I tried to combine it with column, but my knowledge there is, well, limited.
Ideally I would list the first level of a directory in column A, the second level in column B and the last one in column C.
Assuming you want to list only the directories that are exactly three levels deep:
find . -mindepth 3 -maxdepth 3 -type d | sed 's:^\./::;s:/:,:g' > file.csv
But I generally don't see much benefit in translating a file structure into a CSV file; it doesn't seem particularly useful.

Combine first two columns of a single csv file into another column

So I have a large CSV file (gigabytes in size) with multiple columns, the first two of which are:
Invoice number|Line Item Number
I want a Unix/Linux/Ubuntu command which can merge these two columns into a new column, separated by ':'. For example, if the invoice number is 64789544 and the line item number is 234533, then the merged value should be
64789544:234533
Can this really be achieved? If yes, is it possible to add the merged column back to the source CSV file?
You can use the following sed command:
$ cat large.csv
Invoice number|Line Item Number|Other1|Other2
64789544|234533|abc|134
64744123|232523|cde|awc
$ sed -i.bak 's/^\([^|]*\)|\([^|]*\)/\1:\2/' large.csv
$ cat large.csv
Invoice number:Line Item Number|Other1|Other2
64789544:234533|abc|134
64744123:232523|cde|awc
Just be aware that it takes a backup of your input file (just in case), so you need to have enough space in your file system.
Explanations:
s/^\([^|]*\)|\([^|]*\)/\1:\2/ captures the first two fields of your CSV (delimited by |) and replaces the separator between them with :, using back references, which merges the two columns.
If you are sure about what you are doing, you can change -i.bak to -i to avoid taking a backup of the CSV file.
Perhaps with this simple sed
sed 's/|/:/' infile
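Regarding the second part of the question (adding the merged value back to the file), a minimal awk sketch could append it as a new last column instead of replacing the first two; large_with_merged.csv is a hypothetical output name:
awk -F'|' 'BEGIN{OFS="|"} {print $0, $1 ":" $2}' large.csv > large_with_merged.csv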

joining two csv files based on a column

I have 2 CSV files, as follows:
AllEmpployees.txt
EmpID,Name
QualifiedEmployeees.csv
Empid
Now I want to find the names of the qualified employees:
Empid,Name
I am using the following command:
join -t , -1 1 -2 1 QualifiedEmployeees.csv AllEmployees.txt
This results in zero records. I am sure that there is an intersection of employee IDs.
Reference : https://superuser.com/questions/26834/how-to-join-two-csv-files
Is it because the qualified employees file has only one column and there is no delimiter? Or am I doing something wrong?
Try this:
join -t "," <(dos2unix <QualifiedEmployeees.csv) <(dos2unix <AllEmpployees.txt)
If join is not working (not producing as many rows as you expect, or no rows at all), it is likely because your input is not sorted. From man join we see this:
When the default field delimiter characters are used, the files to be joined should be ordered in the collating sequence of sort(1), using the -b option, on the fields on which they are to be joined, otherwise join may not report all field matches. When the field delimiter characters are specified by the -t option, the collating sequence should be the same as sort(1) without the -b option.
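A minimal sketch of sorting both files on the fly before joining (assuming the join key is the first column of each file; the header rows get sorted along with the data, so drop them first if that matters):
join -t, <(sort QualifiedEmployeees.csv) <(sort -t, -k1,1 AllEmpployees.txt)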
awk -F, 'FNR==NR{a[$1];next}($1 in a){print $2}' QualifiedEmployeees.csv AllEmpployees.txt

grep in two files returning two columns

I have a big file like this:
79597700
79000364
79002794
79002947
And other big file like this:
79597708|11
79000364|12
79002794|11
79002947|12
79002940|12
Then I need the numbers from the second file that also appear in the first file, but with the second column included, something like:
79000364|12
79002794|11
79002947|12
79002940|12
(That is, the MSISDNs that appear in the first file and also appear in the second file, but I need to return both columns of the second file.)
Can anyone help me? A plain grep does not work for me because it returns only the MSISDN without the second column,
and comm is not possible because the corresponding rows differ between the files.
Try this:
grep -f bigfile1 bigfile2
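Note that grep -f treats each line of bigfile1 as a pattern that may match anywhere on a line of bigfile2. A slightly stricter sketch treats the patterns as fixed strings and requires whole-word matches, which avoids accidental substring hits:
grep -F -w -f bigfile1 bigfile2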
Using awk:
awk -F"|" 'FNR==NR{f[$0];next}($1 in f)' file file2
Source: return common fields in two files
