Read a CSV file in bash - Linux

I have a requirement to read a CSV file in a shell script. I am fine when each cell of the CSV file contains a single line, but when a cell contains multiple lines I am unable to delimit the CSV file.
Filename            Lines
/etc/hosts          example.test.com
                    example2.test.com
/etc/resolv.conf    nameserver dns.test.com
                    search test.com
I take input from the user as a CSV file and have to add the given lines to the mentioned files. There are multiple lines in some cells of the CSV file, and when I cat the file the continuation lines come out as separate rows:
[user2#mon ~]$ cat test2.csv
"Filename","Lines"
"/etc/hosts","example.test.com"
,"example2.test.com"
"/etc/resolv.conf","nameserver dns.test.com"
,"search test.com"
Is there any way to read the multiple lines from that file, given that the number of lines per cell is not the same every time?

This might be what you're after:
awk -F, '{ sub(/^"/, "", $1); sub(/"$/, "", $1);
           sub(/^"/, "", $2); sub(/"$/, "", $2);
           printf "%-20s %s\n", $1, $2;
         }'
It may well be possible to compress the substitute operations if you spend more time bashing at the manual. As a solution this is fragile (most solutions that do not use code specialized for the CSV format are fragile); it fails horribly if a comma appears inside any of the quote-enclosed fields.
Applied to your data, it yields:
Filename             Lines
/etc/hosts           example.test.com
                     example2.test.com
/etc/resolv.conf     nameserver dns.test.com
                     search test.com
Other possible tools to manipulate CSV format data reliably include:
Perl plus Text::CSV module
csvfix.
If this is not what you are looking for, please clarify the question.

Assuming your input is as basic as your example, you might be able to get away with simply doing:
sed 's/^,/ ,/' test2.csv | tr -d \" | column -s, -t
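Since the stated goal is to append each Lines value to the file named in the last non-empty Filename cell, here is a minimal bash sketch of that step (an illustration only, assuming the fields never contain commas or quotes beyond the surrounding ones):
while IFS=, read -r file line; do
    file=${file//\"/}; line=${line//\"/}     # strip the surrounding quotes
    [ "$file" = "Filename" ] && continue     # skip the header row
    [ -n "$file" ] && current=$file          # an empty first field continues the previous file
    printf '%s\n' "$line" >> "$current"
done < test2.csv
Like the awk answer above, this breaks down the moment a field legitimately contains a comma; for anything less regular than the sample, a real CSV parser is the safer choice.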

Related

Linux CSV - Add a column from a CSV file to another CSV file

I'm struggling to create a CSV file from two other ones
Here's what I need.
The file I want (among many other lines):
"AB";"A";"B";"C";"D";"E"
Files I have:
File 1:
"A";"B";"C";"D";"E"
File 2:
"AB";"C";"D";"E"
How can I simply add "AB" from File 2 to the first position of File 1, adding one ";"?
Thanks for your help
You can use awk as below to pull out a single field. This assumes that the ; character is used only as the field separator and does not appear anywhere else in the CSV file.
$ awk -F\; '{print $2}' file.csv
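To actually prepend the "AB" field from File 2 to each line of File 1 with one extra ";", one possible sketch (assuming both files have the same number of lines in the same order; file1.csv and file2.csv are hypothetical names for File 1 and File 2) combines cut and paste:
# glue column 1 of file2.csv ("AB") in front of each line of file1.csv
paste -d';' <(cut -d';' -f1 file2.csv) file1.csv > merged.csv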

Bash script to export the first column of a txt file to Excel with a header

I would like to export the first column of my txt file into Excel along with a user-defined header.
Currently my txt file has the following information:
667869 667869
580083 580083
316133 316133
9020 9020
and I would like to export it to Excel with my own header. How could I do that in a bash script?
Using a for loop with sed, maybe this will help:
for file in /path/to/folder/*.txt ; do
    bname=$(basename "$file")
    pref=${bname%%.txt}            # bname and pref are set but not used below
    sed -i '1iCOL1,COL2' "$file"
done
This will add a header COL1,COL2 to each .txt file in the directory.
You can do something along these lines:
awk -v header="Col_1" 'NR==1 {print header} {print $1}' file
That assumes that the separator between columns is a space and that the fields themselves do not contain spaces.
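If the end goal is something Excel can open, it may be enough to redirect that output into a .csv file (first_column.csv is just a hypothetical name), since Excel reads CSV files directly:
awk -v header="Col_1" 'NR==1 {print header} {print $1}' file > first_column.csv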

Generate a record of the lines which have been removed by grep as a secondary function of the primary command

I asked a question here about removing unwanted lines that contained strings matching a particular pattern:
Remove lines containing string followed by x number of numbers
anubhava provided a good line of code which met my needs perfectly. This code removes any line which contains the string vol followed by a space and three or more consecutive numbers:
grep -Ev '\bvol([[:blank:]]+[[:digit:]]+){2}' file > newfile
The command will be used on a fairly large CSV file and will be initiated by crontab. For this reason, I would like to keep a record of the lines this command is removing, just so I can go back and check that the correct data is being removed - I guess it will be some sort of log containing the lines that did not make the final cut. How can I add this functionality?
Drop grep and use awk instead:
awk '/\<vol([[:blank:]]+[[:digit:]]+){2}/{print >> "deleted"; next} 1' file
The above uses GNU awk for word delimiters (\<) and will append every deleted line to a file named "deleted". Consider adding a timestamp too:
awk '/\<vol([[:blank:]]+[[:digit:]]+){2}/{print systime(), $0 >> "deleted"; next} 1' file
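If you would rather keep using grep, a two-pass sketch with the same pattern does the job, at the cost of reading the file twice (the output names here are just examples):
grep -E  '\bvol([[:blank:]]+[[:digit:]]+){2}' file > deleted.log   # lines being removed
grep -Ev '\bvol([[:blank:]]+[[:digit:]]+){2}' file > newfile       # lines kept
For a very large CSV run from cron, the single-pass awk version above is still the better fit.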

grep from an input file, multiple lines, while the input file has ^name

I would really appreciate some help with this:
I have a huge file; here is an example of how it is formatted:
name:lastname:email
I have an input file with lots of names, set out like this example:
edward
michael
jenny
I want to match the name column of the huge file against the names in the input file, but only if it is an exact match (case-insensitive).
Once it finds a match, I want it to output a .txt file with all of the matches.
I think I can give it a pattern something like ^Michael: for that.
Can anyone help me with this grep problem?
Sorry if I am not too clear; it's very late and I have been on this problem for ages.
"Centos 5, "grep -i -E -f file.txt /root/dir2search >out.txt"
file.txt containing
^michael:
^bobert:
^billy:
Doesn't find anything.
grep -i -E -f inputfile namesfile > outputfile will do what you want, if your input file consists of one input name per line, in the pattern you already suggested:
^Michael:
^Jane:
^Tom:
-i: case-insensitive matching
-E: regexp pattern matching (often the default, but I don't know how your environment is set up)
-f: read patterns from a file, one pattern per line
>: redirect the output to a file
To get the existing input file you described (space-separated names) into the new format, you could use:
sed -r 's/([^ ]+)[ $]?/^\1:\n/g;s/\n$//g' inputfile > newinputfile
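If your input file is already one name per line, as in the example in the question, a simpler sed sketch builds the same pattern file (& stands for the matched name; this assumes there are no blank lines):
sed 's/.*/^&:/' inputfile > newinputfile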

Fastest way to convert a tab-delimited file to CSV in Linux

I have a tab-delimited file that has over 200 million lines. What's the fastest way in linux to convert this to a csv file? This file does have multiple lines of header information which I'll need to strip out down the road, but the number of lines of header is known. I have seen suggestions for sed and gawk, but I wonder if there is a "preferred" choice.
Just to clarify, there are no embedded tabs in this file.
If you're worried about embedded commas then you'll need to use a slightly more intelligent method. Here's a Python script that takes TSV lines from stdin and writes CSV lines to stdout:
import sys
import csv
tabin = csv.reader(sys.stdin, dialect=csv.excel_tab)
commaout = csv.writer(sys.stdout, dialect=csv.excel)
for row in tabin:
    commaout.writerow(row)
Run it from a shell as follows:
python script.py < input.tsv > output.csv
If all you need to do is translate all tab characters to comma characters, tr is probably the way to go.
The blank space here is a literal tab:
$ echo "hello world" | tr "\\t" ","
hello,world
Of course, if you have embedded tabs inside string literals in the file, this will incorrectly translate those as well; but embedded literal tabs would be fairly uncommon.
perl -lpe 's/"/""/g; s/^|$/"/g; s/\t/","/g' < input.tab > output.csv
Perl is generally faster at this sort of thing than sed, awk, and Python.
If you want to convert the whole tsv file into a csv file:
$ cat data.tsv | tr "\\t" "," > data.csv
If you want to omit some fields:
$ cat data.tsv | cut -f1,2,3 | tr "\\t" "," > data.csv
The above command will convert the data.tsv file to a data.csv file containing only the first three fields.
sed -e 's/"/\\"/g' -e 's/<tab>/","/g' -e 's/^/"/' -e 's/$/"/' infile > outfile
Damn the critics, quote everything, CSV doesn't care.
<tab> is the actual tab character; \t didn't work for me. In bash, press Ctrl-V and then Tab to enter it.
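If you would rather not type a literal tab, and assuming your shell is bash, the $'...' quoting form can supply it in the same command:
sed -e 's/"/\\"/g' -e $'s/\t/","/g' -e 's/^/"/' -e 's/$/"/' infile > outfile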
@ignacio-vazquez-abrams's Python solution is great! For people who are looking to parse delimiters other than tab, the csv library actually allows you to set an arbitrary delimiter. Here is my modified version to handle pipe-delimited files:
import sys
import csv
pipein = csv.reader(sys.stdin, delimiter='|')
commaout = csv.writer(sys.stdout, dialect=csv.excel)
for row in pipein:
    commaout.writerow(row)
Assuming you don't want to change the header, and assuming you don't have embedded tabs:
# cat file
header header header
one two three
$ awk 'NR>1{$1=$1}1' OFS="," file
header header header
one,two,three
NR>1 skips the first header line. You mentioned you know how many lines of header you have, so use the correct number for your own case. With this, you also do not need to call any other external commands; a single awk command does the job.
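For example, with a hypothetical three-line header, the condition simply becomes NR>3:
awk 'NR>3{$1=$1}1' OFS="," file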
Another way, if you have blank columns and you care about that:
awk 'NR>1{gsub("\t",",")}1' file
Using sed:
sed '2,$y/\t/,/' file    # skip the 1-line header and translate (same as tr)
You can also use xsv for this:
xsv input -d '\t' input.tsv > output.csv
In my test on a 300MB tsv file, it was roughly 5x faster than the python solution (2.5s vs. 14s).
The following awk one-liner supports quoting and CSV-style quote-escaping (embedded quotes are doubled):
printf "flop\tflap\"" | awk -F '\t' '{ for(i = 1; i <= NF; i++) { gsub(/"/, "\"\"", $i); printf "\"%s\"", $i; if (i < NF) printf "," }; printf "\n" }'
gives
"flop","flap"""
Right-click the file, click rename, delete the 't' and put a 'c'. I'm actually not joking: most CSV parsers can handle tab delimiters. I had this issue just now, and for my purposes renaming worked just fine.
I think it is better not to cat the file, because that may create problems with a large file. A better way may be:
$ tr '\t' ',' < data.tsv > data.csv
The command reads its input from data.tsv and stores the result, comma separated, in data.csv.
