Linux CSV - Add a column from a CSV file to another CSV file

I'm struggling to create a CSV file from two other ones.
Here's what I need.
File I want (lots of other lines):
"AB";"A";"B";"C";"D";"E"
Files I have:
File 1:
"A";"B";"C";"D";"E"
File 2:
"AB";"C";"D";"E"
How can I simply add "AB" from File 2 to the 1st position of File 1, adding one ";"?
Thanks for your help.

You can use awk as shown below. This assumes that ; is used only as the field separator and does not appear anywhere else in the CSV file. For example, to extract the second field:
$ awk -F\; '{print $2}' file.csv
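To actually build the merged file, one option (a minimal sketch, assuming both files have the same number of lines in matching order, and using file1.csv / file2.csv as placeholder names) is to read File 2 first, remember its first field for each line, and prepend it to the corresponding line of File 1:
$ awk -F';' 'NR==FNR { ab[FNR] = $1; next }   # first pass (File 2): remember the "AB" field
             { print ab[FNR] ";" $0 }         # second pass (File 1): prepend it
            ' file2.csv file1.csv > merged.csv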

Related

Split flat file and add delimiter in Linux

I would like to know how to improve some code that I have.
My shell script reads a flat file and splits it into two files based on the first char of each line: header and detail. For the header the first char is 1 and for the detail it is 2. The split files do not include the first char.
The header is delimited by "|", and the detail is fixed-width, so I add the delimiter to it later.
What I want is to do this in one single awk, to avoid creating a tmp file.
For splitting the file I use an awk command, and for adding the delimiter another awk command.
This is what I have now:
Input=Input.txt
Header=Header.txt
DetailTmp=DetailTmp.txt
Detail=Detail.txt
#First I split in two files and remove first char
awk -v vFileHeader="$Header" -v vFileDetail="$DetailTmp" '/^1/ {f=vFileHeader} /^2/ {f=vFileDetail} {sub(/^./,""); print > f}' $Input
#Then, I add the delimiter to detail
awk '{OFS="|"};{print substr($1,1,10),substr($1,11,5),substr($1,16,2),substr($1,18,14),substr($1,32,4),substr($1,36,18),substr($1,54,1)}' $DetailTmp > $Detail
Any suggestion?
Input.txt file
120190301|0170117174|FRANK|DURAND|USA
2017011717400052082911070900000000000000000000091430200
120190301|0170117204|ERICK|SMITH|USA
2017011720400052082911070900000000000000000000056311910
Header.txt after splitting
20190301|0170117174|FRANK|DURAND|USA
20190301|0170117204|ERICK|SMITH|USA
DetailTmp.txt after splitting
017011717400052082911070900000000000000000000091430200
017011720400052082911070900000000000000000000056311910
017011727100052052911070900000000000000000000008250000
017011718200052082911070900000000000000000000008102500
017011726300052052911070900000000000000000000008250000
Detail.txt desired
0170117174|00052|08|29110709000000|0000|000000000009143020|0
0170117204|00052|08|29110709000000|0000|000000000005631191|0
0170117271|00052|05|29110709000000|0000|000000000000825000|0
0170117182|00052|08|29110709000000|0000|000000000000810250|0
0170117263|00052|05|29110709000000|0000|000000000000825000|0
Just combine the scripts:
$ awk -v OFS='|' '/^1/{print substr($0,2) > "header"}
/^2/{print substr($0,2,10),substr($0,11,5),... > "detail"}' file
However, you may be better off using FIELDWIDTHS on the detail file in a second pass.
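A filled-in version of that one-pass idea (a sketch that reuses the substr offsets from the original two-pass script, shifted by one because the record-type character is still present; the output file names are just illustrative):
$ awk -v OFS='|' '
    /^1/ { print substr($0,2) > "Header.txt" }
    /^2/ { print substr($0,2,10), substr($0,12,5), substr($0,17,2),
           substr($0,19,14), substr($0,33,4), substr($0,37,18),
           substr($0,55,1) > "Detail.txt" }
  ' Input.txt
For the second-pass approach, a GNU awk FIELDWIDTHS variant on the detail file could look like this (also a sketch; the $1 = $1 assignment forces the record to be rebuilt with OFS):
$ gawk 'BEGIN { FIELDWIDTHS = "10 5 2 14 4 18 1"; OFS = "|" } { $1 = $1; print }' DetailTmp.txt > Detail.txt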

How to split a file into two files based on a pattern?

In a file in Linux I have the following
123_test
234_test
abc_rest
cde_rest
and so on
Now I want to get two files in Linux. One that contains only records like below:
123_test
234_test
and a 2nd file like below:
abc_rest
cde_rest
I want to split the file based on what comes after the _, like _test or _rest.
Edited:
123_test
234_test
abc_rest
cde_rest
456_test
fgh_rest
How can I achieve that in Linux?
Can we use the split function for this?
You can use this single awk command for splitting:
awk '{ print > (/_test$/ ? "file1" : "file2") }' file
This awk command will copy all lines that end with _test to file1 and the remaining lines to file2.
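With the edited sample input, the two output files would contain:
file1:
123_test
234_test
456_test
file2:
abc_rest
cde_rest
fgh_rest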

Bash Script to export first columns in txt file to excel with Header

I would like to export the first column of my txt file to Excel along with a user-defined header.
Currently my txt file has the following information:
667869 667869
580083 580083
316133 316133
9020 9020
and I would like to export it to Excel with my own header. How could I do that in a bash script?
Using a for loop with sed, maybe this will help:
for file in /path/to/folder/*.txt ; do
    bname=$(basename "$file")
    pref=${bname%%.txt}
    # insert the header line at the top of the file, in place
    sed -i '1iCOL1,COL2' "$file"
done
This will add a header COL1,COL2 to each .txt file in the directory.
You can do something along these lines:
awk -v header="Col_1" 'NR==1 {print header} {print $1}' file
That is assuming that the separator between columns is a space and the fields do not have spaces.
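With the sample data above, that prints the header followed by the first column:
$ awk -v header="Col_1" 'NR==1 {print header} {print $1}' file
Col_1
667869
580083
316133
9020
Saving that output to a file with a .csv extension should let Excel open it directly.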

Generate record of files which have been removed by grep as a secondary function of primary command

I asked a question here to remove unwanted lines which contained strings which matched a particular pattern:
Remove lines containing string followed by x number of numbers
anubhava provided a good line of code which met my needs perfectly. This code removes any line which contains the string vol followed by a space and three or more consecutive numbers:
grep -Ev '\bvol([[:blank:]]+[[:digit:]]+){2}' file > newfile
The command will be used on a fairly large CSV file and will be initiated by crontab. For this reason, I would like to keep a record of the lines this command is removing, just so I can go back and check that the correct data is being removed. I guess it will be some sort of log containing the lines that did not make the final cut. How can I add this functionality?
Drop grep and use awk instead:
awk '/\<vol([[:blank:]]+[[:digit:]]+){2}/{print >> "deleted"; next} 1' file
The above uses GNU awk for word delimiters (\<) and will append every deleted line to a file named "deleted". Consider adding a timestamp too:
awk '/\<vol([[:blank:]]+[[:digit:]]+){2}/{print systime(), $0 >> "deleted"; next} 1' file
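The surviving lines still go to stdout, so redirect them to the new file as before. If a human-readable timestamp is preferred over epoch seconds, GNU awk's strftime() can be used instead (a sketch; the format string is just an example):
awk '/\<vol([[:blank:]]+[[:digit:]]+){2}/{print strftime("%Y-%m-%d %H:%M:%S"), $0 >> "deleted"; next} 1' file > newfile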

Read a CSV file in bash

I have a requirement to read a CSV file in shell. I am OK with a CSV file having a single line in a cell, but if a cell of the CSV file contains multiple lines, then I am unable to delimit the CSV file.
Filename Lines
/etc/hosts example.test.com
example2.test.com
/etc/resolv.conf nameserver dns.test.com
search test.com
I will take input from the user in a CSV file and have to add the given lines to the mentioned files. Here there are multiple lines in each cell of the CSV file, and if I try to cat the file, the output is laid out differently.
[user2#mon ~]$ cat test2.csv
"Filename","Lines"
"/etc/hosts","example.test.com"
,"example2.test.com"
"/etc/resolv.conf","nameserver dns.test.com"
,"search test.com"
Is there any way to read the multiple lines from that file, given that the number of lines is not the same every time?
This might be what you're after:
awk -F, '{ sub(/^"/, "", $1); sub(/"$/, "", $1);
sub(/^"/, "", $2); sub(/"$/, "", $2);
printf "%-20s %s\n", $1, $2;
}'
It may well be possible to compress the substitute operations with a bit more work. This is a fragile solution (most solutions not using code specialized for dealing with the CSV format are fragile); it fails horribly if a comma appears inside any of the quote-enclosed fields.
Applied to your data, it yields:
Filename Lines
/etc/hosts example.test.com
example2.test.com
/etc/resolv.conf nameserver dns.test.com
search test.com
Other possible tools to manipulate CSV format data reliably include:
Perl plus Text::CSV module
csvfix.
If this is not what you are looking for, please clarify the question.
Assuming your input is as basic as your example, you might be able to get away with simply doing:
sed 's/^,/ ,/' test2.csv | tr -d \" | column -s, -t
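On the sample test2.csv above, that pipeline produces roughly this aligned output:
Filename          Lines
/etc/hosts        example.test.com
                  example2.test.com
/etc/resolv.conf  nameserver dns.test.com
                  search test.com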
