How to create a CSV file based on rows in a shell script? - linux

I have a text file /tmp/some.txt with the values below:
JOHN YES 6 6 2345762
SHAUN NO 6 6 2345748
I want to create a CSV file in the format below (i.e. based on rows, NOT based on columns):
JOHN,YES,6,6,2345762
SHAUN,NO,6,6,2345748
I tried the code below:
for i in `wc -l /tmp/some.txt | awk '{print $1}'`
do
awk 'NR==$i' /tmp/some.txt | awk '{print $1","$2","$3","$4","$5}' >> /tmp/some.csv
done
Here wc -l /tmp/some.txt | awk '{print $1}' gets the value 2 (i.e. 2 rows in the text file),
and for each row awk 'NR==$i' /tmp/some.txt | awk '{print $1","$2","$3","$4","$5}' should print the 5 fields, separated by commas, into the some.csv file.
When I execute each command separately it works, but when I make it a shell script I get an empty some.csv file.

@Kart: Could you please try the following.
awk '{$1=$1;} 1' OFS=, Input_file > output.csv
Here $1=$1 forces awk to rebuild each record, so the output field separator OFS (set to ,) replaces the original whitespace, and the trailing 1 prints every rebuilt line. I hope this helps you.

I suggest:
sed 's/[[:space:]]\+/,/g' /tmp/some.txt
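To actually write the CSV, redirect the same command's output to a file:
sed 's/[[:space:]]\+/,/g' /tmp/some.txt > /tmp/some.csv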

You almost got it. awk already processes the file row by row, so you don't need to iterate with the for loop.
So you just need to run:
awk '{print $1","$2","$3","$4","$5}' /tmp/some.txt >> /tmp/some.csv

With tr, squeezing (-s), and then transliterating space/tab ([:blank:]):
tr -s '[:blank:]' ',' <file.txt
With sed, substituting one or more space/tab with ,:
sed 's/[[:blank:]]\+/,/g' file.txt
With awk, replacing one or more space/tab with , using the gsub() function:
awk 'gsub("[[:blank:]]+", ",", $0)' file.txt
Example
% cat foo.txt
JOHN YES 6 6 2345762
SHAUN NO 6 6 2345748
% tr -s '[:blank:]' ',' <foo.txt
JOHN,YES,6,6,2345762
SHAUN,NO,6,6,2345748
% sed 's/[[:blank:]]\+/,/g' foo.txt
JOHN,YES,6,6,2345762
SHAUN,NO,6,6,2345748
% awk 'gsub("[[:blank:]]+", ",", $0)' foo.txt
JOHN,YES,6,6,2345762
SHAUN,NO,6,6,2345748

Related

Print lines not containing a period linux

I have a file with thousands of rows. I want to print the rows which do not contain a period.
awk '{print$2}' file.txt | head
I have used this to print the column I am interested in, column 2 (the file only has two columns).
I removed the head and then did:
awk '{print$2}' file.txt | grep -v "." | head
But I only get blank lines, not any actual values. I think it has included the spaces between the rows, but I am not sure.
Is there an alternative command?
As suggested by Jim, I did:
awk '{print$2}' file.txt | grep -v "\." | head
However the number of lines is greater than before. Is this expected? Also, my output is a vertical list of numbers with spaces in between them. Is this normal?
file.txt example below:
120.4 3
270.3 7.9
400.8 3.9
200.2 4
100.2 8.7
300.2 3.4
102.3 6
49.0 2.3
38.0 1.2
So the expected (and correct) output would be 3 lines, as there are 3 values in column 2 without a period:
$ awk '{print$2}' file.txt | grep -v "\." | head
3
4
6
However, when running the code as above, I instead get 5 lines; I think it is also counting the blank lines between the values:
$ awk '{print$2}' file.txt | grep -v "\." | head
3

4

6
You seldom need to use grep if you're already using awk.
This would print the second column on each line where that second column doesn't contain a dot:
awk '$2 !~ /\./ {print $2}'
But you also wanted to skip empty lines, or perhaps ones where the second column is not empty. So just test for that, too:
awk '$2 != "" && $2 !~ /\./ {print $2}'
(A more amusing version would be awk '$2 ~ /./ && $2 !~ /\./ {print $2}' )
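Run against the sample file from the question, either filter should print only the period-free values (output reproduced by hand, so treat it as illustrative):
$ awk '$2 != "" && $2 !~ /\./ {print $2}' file.txt
3
4
6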
As you said, grep -v "." gives you only blank lines. That's because the dot means "any character", and with -v, the only lines printed are those that don't contain, well, any characters.
grep is interpreting the dot as a regex metacharacter (the dot will match any single character). Try escaping it with a backslash:
awk '{print$2}' file.txt | grep -v "\." | head
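Alternatively, grep's -F flag treats the pattern as a fixed string instead of a regex, so the dot matches literally and needs no escaping:
awk '{print$2}' file.txt | grep -vF "." | head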
If I understand correctly, you can try this sed:
sed ':A;N;${s/.*/&\n/};/\n$/!bA;s/\n/ /g;s/\([^ ]*\.[^ ]* \)//g' file.txt
output
3
4
6

Reformat item with line feeds from a CSV file

I have a CSV and I would like to know how to replace newlines with -, just in the brothers column, with bash:
name,brothers,age,adress
------------------------
john,"marc
peter
paul
alex",18,street
thomas,mike,20,place
Awk is perfect for this (note that the three-argument match() used here is a GNU awk extension, so it requires gawk):
awk -v RS='^$' -v ORS= '{while ( match($0,/"[^"]+"/,a) ) {gsub(/\n/," ",a[0]); print substr($0,1,RSTART-1) a[0]; $0=substr($0,RSTART+RLENGTH)} print}' your.csv
outputs:
name,brothers,age,adress
------------------------
john,"marc peter paul alex",18,street
thomas,mike,20,place
Ungainly combo of csvtool, sed, & bash:
csvtool pastecol 2 1- \
input.csv \
<(csvtool col 2 input.csv | \
sed -n '/"/,/"/{:a;N;$!ba;s/\([^"]\)\n/\1-/g;};p') | \
csvtool trim r -
Output:
name,brothers,age,adress
------------------------
john,marc-peter-paul-alex,18,street
thomas,mike,20,place
Except for the sed part, it's not that bad. csvtool replaces column 2 with an edited copy. At the end it trims an extra comma that csvtool stuck in there.

linux command to get the last appearance of a string in a text file

I want to find the last appearance of a string in a text file with linux commands. For example
1 a 1
2 a 2
3 a 3
1 b 1
2 b 2
3 b 3
1 c 1
2 c 2
3 c 3
In such a text file, I want to find the line number of the last appearance of b, which is 6.
I can find the first appearance with
awk '/ b / {print NR;exit}' textFile.txt
but I have no idea how to do it for the last occurrence.
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
cat -n prints the file to STDOUT prepending line numbers.
grep greps out all lines containing "b" (you can use egrep for more advanced patterns or fgrep for faster grep of fixed strings)
tail -1 prints last line of those lines containing "b"
cut -f 1 prints first column, which is line # from cat -n
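Applied to the sample file from the question, the pipeline should print 6 (transcribed by hand, so illustrative):
$ cat -n textFile.txt | grep " b " | tail -1 | cut -f 1
6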
Or you can use Perl if you wish. It's very similar to what you'd do in awk, but frankly, I personally don't ever use awk if I have Perl handy; by design, Perl supports 100% of what awk can do as one-liners. YMMV:
perl -ne '{$n=$. if / b /} END {print "$n\n"}' textfile.txt
This can work:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
We check whether the second field contains "b" and record the line number. The variable is overwritten on every match, so by the time we finish reading the file it holds the last occurrence.
Test:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
6
Update based on sudo_O's advice:
$ awk '{if ($2=="b") a=NR} END{print a}' your_file
to avoid matching something like abc in the 2nd field.
This one is also valid (shorter; I keep the one above because it is the one I first thought of :D):
$ awk '$2=="b" {a=NR} END{print a}' your_file
Another approach if $2 is always grouped (may be more efficient than waiting until the end):
awk 'NR==1||$2=="b",$2=="b"{next} {print NR-1; exit}' file
or
awk '$2=="b"{f=1} f==1 && $2!="b" {print NR-1; exit}' file

Find value from one csv in another one (like vlookup) in bash (Linux)

I have already tried all the options I found online to solve my issue, but without a good result.
Basically I have two CSV files (pipe-separated):
file1.csv:
123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
file2.csv:
MAYOBAN|BANGOR|2400
MAYOBEL|BELLAVARY|2400
CORKKIN|KINSALE|2200
CORKCOR|CORK|2200
DUBLD11|DUBLIN 11|2100
I need a Linux bash script to find the value of pos. 3 in file2 based on the content of pos. 7 in file1.
Example:
file1, line1, pos 7: MAYOBAN
find MAYOBAN in file2, return pos 3 (2400)
the output should be something like this:
2400
2200
2200
etc...
Please help
Jacek
A simple approach, far from perfect:
DELIMITER="|"
for i in $(cut -f 7 -d "${DELIMITER}" file1.csv)
do
    grep "${i}" file2.csv | cut -f 3 -d "${DELIMITER}"
done
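One caveat (my note): grep "${i}" matches the key anywhere on the line, so a key that also occurs in another column of file2 would produce false hits. Anchoring to the start of the line is safer, assuming the lookup key is always file2's first field:
grep "^${i}|" file2.csv | cut -f 3 -d "${DELIMITER}"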
This will work, but since the input files must be sorted, the output order will be affected:
join -t '|' -1 7 -2 1 -o 2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)
The output would look like:
2200
2200
2400
which is useless. In order to have a useful output, include the key value:
join -t '|' -1 7 -2 1 -o 0,2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)
The output then looks like this:
CORKCOR|2200
CORKKIN|2200
MAYOBAN|2400
Edit:
Here's an AWK version:
awk -F '|' 'FNR == NR {keys[$7]; next} {if ($1 in keys) print $3}' file1.csv file2.csv
This loops through file1.csv and creates array entries for each value of field 7. Simply referring to an array element creates it (with a null value). FNR is the record number in the current file and NR is the record number across all files. When they're equal, the first file is being processed. The next instruction reads the next record, creating a loop. When FNR == NR is no longer true, the subsequent file(s) are processed.
So file2.csv is now processed and if it has a field 1 that exists in the array, then its field 3 is printed.
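Run against the sample files, this should print the values in file2's order (hand-checked against the data above, so illustrative):
$ awk -F '|' 'FNR == NR {keys[$7]; next} {if ($1 in keys) print $3}' file1.csv file2.csv
2400
2200
2200
Note that each matching file2 row prints exactly once, even if its key occurs on several file1 lines.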
You can use Miller (https://github.com/johnkerl/miller).
Starting from input01.txt
123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
and input02.txt
MAYOBAN|BANGOR|2400
MAYOBEL|BELLAVARY|2400
CORKKIN|KINSALE|2200
CORKCOR|CORK|2200
DUBLD11|DUBLIN 11|2100
and running
mlr --csv -N --ifs "|" join -j 7 -l 7 -r 1 -f input01.txt then cut -f 3 input02.txt
you will have
2400
2200
2200
Some notes:
-N to set input and output without header;
--ifs "|" to set the input fields separator;
-l 7 -r 1 to set the join fields of the input files;
cut -f 3 to extract the field named 3 from the join output.
cut -d\| -f7 file1.csv | while read line
do
    grep "$line" file2.csv | cut -d\| -f3
done

how to operate on the mathematical data in the first fields and assign the variable - linux

I have a file containing just 2 numbers, one number on each line:
4.1865E+02
4.1766E+02
I know it's something like BHF = ($1 from line 1 - $1 from line 2),
but I can't find the exact command.
How can I do a mathematical operation on them and save the result to a variable?
PS: This was obtained using:
sed -i -e '/^$/d' nodout15
sed -i -e 's/^[ \t]*//;s/[ \t]*$//' nodout15
awk ' {print $13} ' nodout15 > 15
mv 15 nodout15
sed -i -e '/^$/d' nodout15
sed -i -e 's/^[ \t]*//;s/[ \t]*$//' nodout15
sed -n '/^[0-9]\{1\}/p' nodout15 > 15
mv 15 nodout15
tail -2 nodout15 > 15
mv 15 nodout15
After all this I have these two numbers, and now I am not able to do the arithmetic. If possible, please tell me a short way to do it on the spot rather than going through all this jugglery. nodout is a file with columns of different lengths, so I am only interested in the 13th column. Since not all lines will be in the daughter file, empty lines are deleted. Then only the lines starting with a number are taken. Then the last two lines, as they show the final state. The difference between them will lead to a conditional statement, so I need to save it in a variable.
regards.
awk, using RS='' (paragraph mode) so that both lines form a single record and the two numbers become $1 and $2:
$ BHF=`awk -v RS='' '{print $1-$2}' input.txt`
$ echo $BHF
0.99
bc, with printf converting the E-notation into plain decimals that bc can subtract:
$ BHF=`cat input.txt | xargs printf '%f-%f\n' | bc`
$ echo $BHF
.990000
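If the goal is to avoid the whole sed/mv juggling from the PS, the extraction and the subtraction can be collapsed into a single awk pass. A sketch under my assumptions about nodout15 (the value sits in the 13th field, data lines start with a digit, and you want the second-to-last value minus the last):
# keep the last two qualifying values of field 13, then print their difference
BHF=$(awk '$13 ~ /^[0-9]/ {prev = cur; cur = $13} END {print prev - cur}' nodout15)
echo "$BHF"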
