linux append missing quotes to csv fields/header

I have the following csv file:
id,"path",score,"file"
1,"/tmp/file 1.csv",5,"file 1.csv"
2,"/tmp/file2.csv",15,"file2.csv"
I want to convert it to:
"id","path","score","file"
"1","/tmp/file 1.csv","5","file 1.csv"
"2","/tmp/file2.csv","15","file2.csv"
How can I do it using sed/awk or any other Linux tool?

This assumes that you want to quote every field, that the comma is the separator, that no field contains an embedded comma, and that there is no whitespace between the separator and the field (that case can be handled as well, but I left it out for brevity).
$ sed -e 's/^/"/' -e 's/$/"/' -e 's/,/","/g' -e 's/""/"/g' csv1 > csv2
It adds a " at the beginning (^) and end ($) of each line, replaces every , with ",", and finally collapses the doubled quotes ("") that this produces around fields that were already quoted.
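As a quick check, the same sed expressions can be run against the first two lines of the sample from the question. This is only a sketch: quote_all is a hypothetical wrapper name, and it relies on the assumptions stated above plus one more, namely that there are no empty unquoted fields (a,,b would come out as "a",","b").

```shell
# Hypothetical helper: quote every field of simple CSV read from stdin.
# Assumes no embedded commas and no empty unquoted fields.
quote_all() {
  sed -e 's/^/"/' -e 's/$/"/' -e 's/,/","/g' -e 's/""/"/g'
}

# First two lines of the sample file from the question.
result=$(printf '%s\n' \
  'id,"path",score,"file"' \
  '1,"/tmp/file 1.csv",5,"file 1.csv"' | quote_all)

printf '%s\n' "$result"
```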

Using Miller, if you run
mlr --csv --quote-all cat input.csv >output.csv
you will have
"id","path","score","file"
"1","/tmp/file 1.csv","5","file 1.csv"
"2","/tmp/file2.csv","15","file2.csv"

Related

How to convert an uneven tab separated file using sed?

How to convert an uneven TAB separated input file to CSV or PSV using sed command?
28828082-1 04/08/19 08:48 04/11/19 12:37 04/12/19 16:22 4/15-4/16 04/17/19 2 9 LCO W OIP 04/08/19 08:53 21 1 58.00 9 222 79 FEDX FEDXH SL3 484657064673 0410099900691041119 SMITHFIELD RI 02917 "41.890066 , -71.548680" YES
The above is one row. I tried using sed -r 's/^\s+//;s/\s+/|/g', but the result was not as expected.

gawk to the rescue!
$ awk -vFPAT='([^[:space:]]+)|("[^"]+")' -v OFS='|' '$1=$1' file
28828082-1|04/08/19|08:48|04/11/19|12:37|04/12/19|16:22|4/15-4/16|04/17/19|2|9|LCO|W|OIP|04/08/19|08:53|21|1|58.00|9|222|79|FEDX|FEDXH|SL3|484657064673|0410099900691041119|SMITHFIELD|RI|02917|"41.890066 , -71.548680"|YES
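This defines the field pattern (FPAT, a gawk extension) as either a run of non-space characters or a quoted value that may include spaces (but not escaped quotes), sets the output field separator to |, and assigns $1 to itself to force the record to be rebuilt. Because the assignment is used as the pattern, only lines where $1 is non-empty and non-zero are printed after the format change.
A better version would be ... '{$1=$1; print}', which prints every line regardless of the value of $1.
Of course, if all the field delimiters are tabs and the quoted strings don't include any tabs, it's much simpler.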
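For the simpler all-tabs case, a minimal sketch could look like the following. It uses plain awk rather than gawk's FPAT, tabs_to_pipes is a hypothetical name, and the input line is made up. The first field is deliberately 0 to show why '{$1=$1; print}' is safer than using '$1=$1' as a pattern, which would silently drop that line.

```shell
# Hypothetical helper: convert tab-separated input to pipe-separated output.
# Assumes tabs are the only delimiters and never occur inside quoted values.
tabs_to_pipes() {
  awk -F'\t' -v OFS='|' '{ $1 = $1; print }'
}

# The first field is 0; the line is still printed because print is unconditional.
converted=$(printf '0\t"41.890066 , -71.548680"\tYES\n' | tabs_to_pipes)
printf '%s\n' "$converted"
```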

Your question isn't clear but is this what you're trying to do?
$ printf 'now\t"is the winter"\tof\t"our discontent"\n' > file
$ cat file
now "is the winter" of "our discontent"
$ tr '\t' ',' < file
now,"is the winter",of,"our discontent"
$ tr '\t' '|' < file
now|"is the winter"|of|"our discontent"

Your initial attempt was very close:
sed 's/[[:space:]]\+/|/g' input.txt
Explanation:
[[:space:]] Match a single whitespace character such as space, tab, CR or newline.
\+ Match one or more of the preceding element (a GNU sed extension).
Update:
If you require two or more whitespace characters:
sed 's/[[:space:]]\{2,\}/|/g' input.txt
\{2,\} Match two or more of the preceding element.
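The difference matters when a quoted value contains single spaces. The sketch below assumes, hypothetically, that columns are separated by at least two spaces; the input line is a shortened, made-up variant of the row in the question, and squeeze_wide_gaps is an illustrative name.

```shell
# Hypothetical helper: turn every run of two or more whitespace characters
# into a single pipe, leaving single spaces inside values untouched.
squeeze_wide_gaps() {
  sed 's/[[:space:]]\{2,\}/|/g'
}

piped=$(printf 'SMITHFIELD  RI  02917  "41.890066 , -71.548680"  YES\n' | squeeze_wide_gaps)
printf '%s\n' "$piped"
```

If the real columns are separated by single spaces or tabs, this approach does not apply and the field-pattern approach is the better fit.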

Squeezing spaces between columns in Unix shell

I want the spaces to be removed between two columns.
After running a sql query from shell, I'm getting the output as below:
23554402243 0584940772;2TZ0584940772001U;
23554402272 0423721840;7TT0423721840001B;
23554402303 0110770863;BBTU500248822001Q;
23554402305 02311301;BTB02311301001J;
23554402563 0550503408;PPTU004984208001O;
23554402605 0457553223;Q0T0457553223001I;
23554367602 0454542427;TB8U501674990001V;
23554378584 0383071261;HTHU500374797001Y;
23554404965 059792244;ST3059792244005C;
23554405503 0571632586;QTO0571632586001D;
But the desired output should be like below:
23554400043 0117601738;22TU003719388001V;
23554402883 0823973229;TTT0823973229001C;
23554402950 024071080;MNT024071080001D;
23554405827 0415260614;TL20415260614001R;
23554405828 08119270800;TL2U003010407001G;
23554406553 011306895;VBT011306895001E;
23554406557 054121509;TL2054121509001M;
23554406563 065069209;TL2065069209005M;
23554409085 0803434328;QTO0803434328001B;
23553396219 062004063;G6T062004063001C;
Remember, there should be exactly one tab between the two columns in the desired output.

Assuming you need to squeeze the space between the columns:
The command below puts a single tab between the first two columns. Add the g flag to apply the change between all columns.
sed -r 's/\s+/\t/' inputfile
If -r is not available:
sed 's/\s\+/\t/' inputfile
Or, if you only need to squeeze each run of spaces into a single space:
tr -s ' ' < inputfile

Easy to do using this awk:
awk -v OFS='\t' '{$1=$1} 1' file
23554402243 0584940772;2TZ0584940772001U;
23554402272 0423721840;7TT0423721840001B;
23554402303 0110770863;BBTU500248822001Q;
23554402305 02311301;BTB02311301001J;
23554402563 0550503408;PPTU004984208001O;
23554402605 0457553223;Q0T0457553223001I;
23554367602 0454542427;TB8U501674990001V;
23554378584 0383071261;HTHU500374797001Y;
23554404965 059792244;ST3059792244005C;
23554405503 0571632586;QTO0571632586001D;
Alternatively this tr will also work:
tr -s ' ' < file | tr ' ' '\t'
or this sed:
sed -i.bak $'s/ \{1,\}/\t/g' file
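Because tabs and spaces look alike on screen, it can help to verify the result mechanically. The sketch below runs the awk command on one line from the question (the number of spaces in the gap is an assumption) and uses cut, which splits on tabs by default, to confirm that the second column arrives intact.

```shell
# One input line, with a run of spaces between the two columns (assumed width).
line='23554402243      0584940772;2TZ0584940772001U;'

# Rebuild the record with a tab as the output field separator.
out=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ $1 = $1 } 1')

# cut splits on tabs by default; field 2 is empty if more than one tab was emitted.
second=$(printf '%s\n' "$out" | cut -f2)
printf '%s\n' "$second"
```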

What about the following Perl one-liner?
perl -ne '/(.*?)\s+(.*)/; print "$1\t$2\n"' your_input_file

replace string in a file with a string from within the same file

I have a file like this (with tens of variables):
PLAY="play"
APPS="/opt/play/apps"
LD_FILER="/data/mysql"
DATA_LOG="/data/log"
I need a script that will output the variables into another file like this (with space between them):
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER}
Is it possible?

I would say:
$ awk -F= '{printf "%s=${%s} ", $1,$1} END {print ""}' file
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER} DATA_LOG=${DATA_LOG}
This loops through the file and prints the content before = in a format var=${var} together with a space. At the end, it prints a new line.
Note this leaves a trailing space at the end of the line. If this matters, we can check how to improve it.
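One possible way to avoid the trailing space is to print the separator before every entry except the first. This is a sketch, with join_vars as a hypothetical name:

```shell
# Hypothetical helper: emit NAME=${NAME} pairs on one line, space-separated,
# without a trailing space. Reads NAME="value" lines from stdin.
join_vars() {
  awk -F= '{ printf "%s%s=${%s}", (NR > 1 ? " " : ""), $1, $1 } END { print "" }'
}

joined=$(printf '%s\n' 'PLAY="play"' 'APPS="/opt/play/apps"' 'LD_FILER="/data/mysql"' | join_vars)
printf '%s\n' "$joined"
```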

< input sed -e 's/\(.*\)=.*/\1=${\1}/' | tr \\n \ ; echo

sed 's/\([^=]*\)=.*/\1=${\1}/;H;$!d
x;y/\n/ /;s/.//' YourFile
Your sample output excludes the last line (DATA_LOG), so if that is important:
sed '/DATA_LOG=/!{
s/\([^=]*\)=.*/\1=${\1}/;H
}
$!d
x;y/\n/ /;s/.//' YourFile

Sed: Replacing Date Format

I have a text file where there is a date string of "2014-06-01T03:11:00Z " in every line. I would like to replace that with "2014-06-01 03:11Z " using sed.
I've been trying to use this code but, it's failing me:
sed -i 's/[0-9]-[0-9]-[0-9]T[0-9]:[0-9]:[0-9]Z/[0-9]-[0-9]-[0-9] [0-9]:[0-9]Z/g' \
/home/aaron/grads/data/metars/${YMD}/latest.metars

Your digit sub-expressions only match a single digit, but the date contains 2 or 4 digits. A simple version that would match dates is:
sed -i 's/\([0-9]*-[0-9]*-[0-9]*\)T\([0-9]*:[0-9]*\):[0-9]*Z/\1 \2Z/g' \
/home/aaron/grads/data/metars/${YMD}/latest.metars
However, this matches zero or more digits at each position where digits are expected. You really want to insist on the correct number of digits in each segment. A more refined version is:
sed -i 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)T\([0-9]\{2\}:[0-9]\{2\}\):[0-9]\{2\}Z/\1 \2Z/g' \
/home/aaron/grads/data/metars/${YMD}/latest.metars
And since your sed supports -i without specifying a back-up suffix (so it is probably GNU sed), you can probably abbreviate that to:
sed -r -i 's/([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}):[0-9]{2}Z/\1 \2Z/g' \
/home/aaron/grads/data/metars/${YMD}/latest.metars
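Before editing the real file in place, the expression can be tried on a single line from stdin. The sketch below uses the refined basic-regex version without -i; fix_date is a hypothetical name and the input line is an arbitrary test string.

```shell
# Hypothetical helper: rewrite YYYY-MM-DDTHH:MM:SSZ as "YYYY-MM-DD HH:MMZ".
fix_date() {
  sed 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)T\([0-9]\{2\}:[0-9]\{2\}\):[0-9]\{2\}Z/\1 \2Z/g'
}

fixed=$(printf 'jgklj 2014-06-01T03:11:00Z jhgkjhvk\n' | fix_date)
printf '%s\n' "$fixed"
```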

Try this GNU sed command, which replaces every line that contains the date string with just the reformatted date:
sed -ri 's/^.*([0-9]{4})-([0-9]{2})-([0-9]{2})\w*([0-9]{2}):([0-9]{2}):([0-9]{2})(.)(.*)$/\1-\2-\3 \4:\5\7/g' file
Example:
$ cat aa
jgklj 2014-06-01T03:11:00Z jhgkjhvk
blaf 2015-12-08T03:15:02Z bvcjghj
$ sed -r 's/^.*([0-9]{4})-([0-9]{2})-([0-9]{2})\w*([0-9]{2}):([0-9]{2}):([0-9]{2})(.)(.*)$/\1-\2-\3 \4:\5\7/g' aa
2014-06-01 03:11Z
2015-12-08 03:15Z
To replace only the date and keep the rest of each line unchanged, run the command below.
sed -ri 's/^(.*)([0-9]{4})-([0-9]{2})-([0-9]{2})\w*([0-9]{2}):([0-9]{2}):([0-9]{2})(.)(.*)$/\1\2-\3-\4 \5:\6\8\9/g' file
Example:
$ cat aa
jgklj 2014-06-01T03:11:00Z jhgkjhvk
blaf 2015-12-08T03:15:02Z bvcjghj
$ sed -r 's/^(.*)([0-9]{4})-([0-9]{2})-([0-9]{2})\w*([0-9]{2}):([0-9]{2}):([0-9]{2})(.)(.*)$/\1\2-\3-\4 \5:\6\8\9/g' aa
jgklj 2014-06-01 03:11Z jhgkjhvk
blaf 2015-12-08 03:15Z bvcjghj

You can also use this method:
$ sed -r 's/^([^T]+).((.*):){1,2}.([^Z])/\1 \3/g'

how to edit a line using sed or awk in linux containing a certain number or string

My Stress.k file is as follows
180.4430
*INCLUDE
$# filename
*STRESS_INITIALIZATION
*END
I want it to be like
180.4430
*INCLUDE
$# filename
*STRESS_INITIALIZATION
*/home/hassan/534.k
*END
for that I used sed as follows
a="$(cat flow.k)"
sed -i -e '/*END/i \*/home/hassan/$a.k ' Stress.k
where flow.k contains only a single number (like 534.k or something). sed inserts the line before *END, but it does not use the value of a; it writes the literal text $a.k instead.
Please also tell me how to delete the second-to-last line, or the line containing a string such as hassan, so that I can delete it first and then insert my required line in the next step.
If possible, please also suggest alternatives.
Best regards

Bash does not expand variables inside single quotes; put the sed script in double quotes instead, e.g.
sed -i -e "/*END/i \*/home/hassan/$a.k " Stress.k

Use double quotes to allow the variable to be expanded.
sed -i -e "/*END/i \*/home/hassan/$a.k " Stress.k
To replace the string, do it as you read in the file:
a=$(sed 's/534/100/' flow.k)
To delete a line:
sed '/hassan/d' inputfile
To read a file into the stream after the current line:
sed '/foo/r filename' inputfile
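As an alternative, the delete and insert steps can be combined in awk, which sidesteps the quoting issue by passing the value in with -v. This is a sketch only: insert_before_end is a hypothetical name, the number is assumed to be a bare value such as 534, and the sample lines are an abbreviated version of Stress.k with a made-up stale entry.

```shell
# Hypothetical helper: drop any existing "hassan" line, then print the new
# include line immediately before *END. The number is passed as $1.
insert_before_end() {
  awk -v line="*/home/hassan/$1.k" '
    /hassan/ { next }          # delete a previously inserted line
    /^\*END/ { print line }    # emit the new line just before *END
    { print }                  # pass every other line through
  '
}

updated=$(printf '%s\n' '*STRESS_INITIALIZATION' '*/home/hassan/111.k' '*END' | insert_before_end 534)
printf '%s\n' "$updated"
```

The function reads stdin and writes stdout, so the original file is only replaced if you redirect the output to a temporary file and move it into place yourself.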
