Squeezing spaces between columns in Unix shell - linux

I want the run of spaces between the two columns to be replaced with a single tab.
After running a SQL query from the shell, I'm getting the output below:
23554402243 0584940772;2TZ0584940772001U;
23554402272 0423721840;7TT0423721840001B;
23554402303 0110770863;BBTU500248822001Q;
23554402305 02311301;BTB02311301001J;
23554402563 0550503408;PPTU004984208001O;
23554402605 0457553223;Q0T0457553223001I;
23554367602 0454542427;TB8U501674990001V;
23554378584 0383071261;HTHU500374797001Y;
23554404965 059792244;ST3059792244005C;
23554405503 0571632586;QTO0571632586001D;
But the desired output should be like below:
23554400043 0117601738;22TU003719388001V;
23554402883 0823973229;TTT0823973229001C;
23554402950 024071080;MNT024071080001D;
23554405827 0415260614;TL20415260614001R;
23554405828 08119270800;TL2U003010407001G;
23554406553 011306895;VBT011306895001E;
23554406557 054121509;TL2054121509001M;
23554406563 065069209;TL2065069209005M;
23554409085 0803434328;QTO0803434328001B;
23553396219 062004063;G6T062004063001C;
Remember, there should be only one tab between the two columns in the desired output.

Assuming you want to squeeze the whitespace between the columns down to a single tab:
This replaces the first run of whitespace on each line, i.e. the gap between the first two columns; add the g flag to apply the change between all the columns.
sed -r 's/\s+/\t/' inputfile
If -r is not available:
sed 's/\s\+/\t/'
Or, if you just need a single space in place of every run of spaces:
tr -s ' '
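A quick way to sanity-check the sed result on the sample data (assuming it is saved as inputfile; cat -A is a GNU coreutils option that shows tabs as ^I and line ends as $):
sed -r 's/\s+/\t/' inputfile | cat -A | head -n 2
23554402243^I0584940772;2TZ0584940772001U;$
23554402272^I0423721840;7TT0423721840001B;$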

Easy to do using this awk (reassigning $1 forces awk to rebuild the record with the output field separator, here a tab):
awk -v OFS='\t' '{$1=$1} 1' file
23554402243 0584940772;2TZ0584940772001U;
23554402272 0423721840;7TT0423721840001B;
23554402303 0110770863;BBTU500248822001Q;
23554402305 02311301;BTB02311301001J;
23554402563 0550503408;PPTU004984208001O;
23554402605 0457553223;Q0T0457553223001I;
23554367602 0454542427;TB8U501674990001V;
23554378584 0383071261;HTHU500374797001Y;
23554404965 059792244;ST3059792244005C;
23554405503 0571632586;QTO0571632586001D;
Alternatively this tr will also work:
tr -s ' ' < file | tr ' ' '\t'
or this sed (the $'...' quoting lets the shell turn \t into a literal tab before sed sees it):
sed -i.bak $'s/ \{1,\}/\t/g' file

What about the following Perl one-liner?
perl -ne '/(.*?)\s+(.*)/; print "$1\t$2\n"' your_input_file
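If you only need the first run of spaces turned into a tab, a slightly simpler variant is possible (a sketch; -p prints every line after the substitution, and lines without spaces pass through unchanged):
perl -pe 's/ +/\t/' your_input_file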

Related

linux append missing quotes to csv fields/header

I have the following csv file:
id,"path",score,"file"
1,"/tmp/file 1.csv",5,"file 1.csv"
2,"/tmp/file2.csv",15,"file2.csv"
I want to convert it to:
"id","path","score","file"
"1","/tmp/file 1.csv","5","file 1.csv"
"2","/tmp/file2.csv","15","file2.csv"
How can I do it using sed/awk or any other Linux tool?
Assuming that you want to quote all entries, that the comma is the separator, and that there are no white spaces between separator and entry (this can be handled as well, but for brevity I didn't include it):
$ cat csv1 | sed -e 's/^/\"/' -e 's/$/\"/' -e 's/,/\",\"/g' -e 's/\"\"/\"/g' > csv2
It adds " to the beginning ^ and end $ of each line, replaces every , with ",", and at the end collapses the doubled "" that this creates around already-quoted fields.
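For comparison, a rough GNU awk sketch (FPAT is gawk-specific; this assumes quoted fields never contain escaped quotes and that commas only appear inside already-quoted fields):
gawk -v FPAT='([^,]*)|("[^"]+")' -v OFS=',' '{
  for (i = 1; i <= NF; i++)               # walk every field, quoted or not
    if ($i !~ /^"/) $i = "\"" $i "\""     # wrap the unquoted ones in double quotes
  print
}' csv1 > csv2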
Using Miller, if you run
mlr --csv --quote-all cat input.csv >output.csv
you will have
"id","path","score","file"
"1","/tmp/file 1.csv","5","file 1.csv"
"2","/tmp/file2.csv","15","file2.csv"

remove a character on the last line where a specific word appears

We have the following example file.
We want to remove the , character from the last line on which the word topic appears.
more file
{"topic":"life_is_hard","partition":84,"replicas":[1006,1003]},
{"topic":"life_is_hard","partition":85,"replicas":[1001,1004]},
{"topic":"life_is_hard","partition":86,"replicas":[1002,1005]},
{"topic":"life_is_hard","partition":87,"replicas":[1003,1006]},
{"topic":"life_is_hard","partition":88,"replicas":[1004,1001]},
{"topic":"life_is_hard","partition":89,"replicas":[1005,1002]},
{"topic":"life_is_hard","partition":90,"replicas":[1006,1004]},
{"topic":"life_is_hard","partition":91,"replicas":[1001,1005]},
{"topic":"life_is_hard","partition":92,"replicas":[1002,1006]},
{"topic":"life_is_hard","partition":93,"replicas":[1003,1001]},
{"topic":"life_is_hard","partition":94,"replicas":[1004,1002]},
{"topic":"life_is_hard","partition":95,"replicas":[1005,1003]},
{"topic":"life_is_hard","partition":96,"replicas":[1006,1005]},
{"topic":"life_is_hard","partition":97,"replicas":[1001,1006]},
{"topic":"life_is_hard","partition":98,"replicas":[1002,1001]},
{"topic":"life_is_hard","partition":99,"replicas":[1003,1002]},
expected output
{"topic":"life_is_hard","partition":84,"replicas":[1006,1003]},
{"topic":"life_is_hard","partition":85,"replicas":[1001,1004]},
{"topic":"life_is_hard","partition":86,"replicas":[1002,1005]},
{"topic":"life_is_hard","partition":87,"replicas":[1003,1006]},
{"topic":"life_is_hard","partition":88,"replicas":[1004,1001]},
{"topic":"life_is_hard","partition":89,"replicas":[1005,1002]},
{"topic":"life_is_hard","partition":90,"replicas":[1006,1004]},
{"topic":"life_is_hard","partition":91,"replicas":[1001,1005]},
{"topic":"life_is_hard","partition":92,"replicas":[1002,1006]},
{"topic":"life_is_hard","partition":93,"replicas":[1003,1001]},
{"topic":"life_is_hard","partition":94,"replicas":[1004,1002]},
{"topic":"life_is_hard","partition":95,"replicas":[1005,1003]},
{"topic":"life_is_hard","partition":96,"replicas":[1006,1005]},
{"topic":"life_is_hard","partition":97,"replicas":[1001,1006]},
{"topic":"life_is_hard","partition":98,"replicas":[1002,1001]},
{"topic":"life_is_hard","partition":99,"replicas":[1003,1002]}
We tried to remove the , character from the last line containing the word topic with the following sed command, but it did not remove the comma:
sed -i '${s/,[[:blank:]]*$//}' file
sed (GNU sed) 4.2.2
In case you have control-M characters in your Input_file, then remove them first by doing:
tr -d '\r' < Input_file > temp && mv temp Input_file
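To quickly confirm whether carriage returns are actually there, something like this helps (cat -A shows a \r as ^M at the end of the line; purely a diagnostic, not part of the fix):
cat -A Input_file | tail -n 1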
Could you please try the following. From your question, what I understood is that you want to remove the comma from the very last line which has the string topic in it; if this is the case, then I am coming up with a tac + awk solution here.
tac Input_file |
awk '/topic/ && ++count==1{sub(/,$/,"")} 1' |
tac
Once you are happy with the above results, append > temp && mv temp Input_file to the above command too, to save the output into Input_file itself.
Explanation:
tac reads Input_file from the bottom line to the first and passes its output to awk, where I check for the first occurrence of topic and remove the trailing comma from that line; all other lines are simply printed. That output is passed to tac again to put the file back into its original order.
You should use the address $ (last line):
sed '$s/,$//' file
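If the file might carry Windows line endings, a GNU-sed-only variant of the same idea drops the comma while keeping an eventual carriage return (a sketch; \r and \? are GNU extensions):
sed -i '$ s/,\(\r\?\)$/\1/' file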
Using awk:
$ awk '{if(NR>1)print p;p=$0}END{sub(/,$/,"",p);print p}' file
Output:
...
{"topic":"life_is_hard","partition":98,"replicas":[1002,1001]},
{"topic":"life_is_hard","partition":99,"replicas":[1003,1002]}

how to count occurrences of a specific word in a group of files by bash/shell script

I have two text files, 'simple.txt' and 'simple1.txt', with the following data in them:
simple.txt--
hello
hi hi hello
this
is it
simple1.txt--
hello hi
how are you
$ tr ' ' '\n' < simple.txt | grep -i -c '\bh\w*'
4
$ tr ' ' '\n' < simple1.txt | grep -i -c '\bh\w*'
3
These commands show the number of words that start with "h" for each file, but I want to display the total count, 7, i.e. the total of both files. Can I do this in a single command/shell script?
P.S.: I had to write two commands as tr does not take two file names.
Try this, the straightforward way:
cat simple.txt simple1.txt | tr ' ' '\n' | grep -i -c '\bh\w*'
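Another option, assuming GNU grep: with -o every match is printed on its own line, so wc -l gives the combined count over both files directly:
grep -o -i '\bh\w*' simple.txt simple1.txt | wc -l
7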
This alternative requires no pipelines:
$ awk -v RS='[[:space:]]+' '/^h/{i++} END{print i+0}' simple.txt simple1.txt
7
How it works
-v RS='[[:space:]]+'
This tells awk to treat each word as a record.
/^h/{i++}
For any record (word) that starts with h, we increment variable i by 1.
END{print i+0}
After we have finished reading all the files, we print out the value of i.
It is not the case that tr accepts only one filename; it does not accept any filename at all (it always reads from stdin). That's why even in your solution, you didn't provide a filename for tr, but used input redirection.
In your case, I think you can replace tr by fmt, which does accept filenames:
fmt -1 simple.txt simple1.txt | grep -i -c -w 'h.*'
(I also changed the grep a bit, because I personally find it more readable this way, but this is a matter of taste).
Note that both solutions (mine and your original ones) would count a string of h-words joined by non-space characters - for instance the string haaaa.hbbbbbb.hccccc - as a "single block", i.e. it would only add 1 to the count of "h"-words, not 3. Whether or not this is the desired behaviour is up to you to decide.

replace string in a file with a string from within the same file

I have a file like this (tens of variables):
PLAY="play"
APPS="/opt/play/apps"
LD_FILER="/data/mysql"
DATA_LOG="/data/log"
I need a script that will output the variables into another file like this (with a space between them):
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER}
Is it possible?
I would say:
$ awk -F= '{printf "%s=${%s} ", $1,$1} END {print ""}' file
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER} DATA_LOG=${DATA_LOG}
This loops through the file and prints the content before = in a format var=${var} together with a space. At the end, it prints a new line.
Note this leaves a trailing space at the end of the line. If this matters, we can check how to improve it.
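If the trailing space does matter, one possible tweak is to print the separator before every item except the first (a sketch):
awk -F= '{printf "%s%s=${%s}", (NR>1 ? " " : ""), $1, $1} END {print ""}' file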
< input sed -e 's/\(.*\)=.*/\1=${\1}/' | tr \\n \ ; echo
sed 's/"\([^"]*"\)"/={\1}/;H;$!d
x;y/\n/ /;s/.//' YourFile
Your sample output excludes the last line (DATA_LOG), so if that is important:
sed '/DATA_LOG=/!{s/\(.*\)=.*/\1=${\1}/;H;}
$!d;x;y/\n/ /;s/.//' YourFile
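A plain bash loop is another option (a sketch, assuming every line has the simple form NAME="value" with nothing before the =):
while IFS='=' read -r name _; do
  printf '%s=${%s} ' "$name" "$name"   # emit NAME=${NAME} followed by a space
done < file
echo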

how to replace a specific record on a line containing a string with a number from another file using in-place editing with sed in linux

I have an input file like the following.
R sfst 1000.0000
$ new time step for mass scaled calculation
R dt2ms -4.000E-7
$ friction value for blank
R mue 0.120000
$ blankholder force
R bhf 2.0000E+5
$ simulation time
R endtime 0.150000
I want to change the value on the line containing 'mue'.
With the following I can read it, but I can't change it:
awk ' /mue/ { print $3 } ' input.txt
The value is to be taken from another file fric.txt.
fric.txt contains only numbers, one on each line.
fric.txt has data like
0.1234
0.234
0.0234
.
.
It should be noted that ONLY the FIRST instance needs to be replaced and that the format, i.e. the white spacing, should be kept constant.
Can anybody guide me doing this using sed or awk?
Try this command:
$ awk '/mue/ && !seen {getline $3 <"fric.txt"; seen=1} 1' input.txt
This might work for you:
sed '/\<mue\>/!d;=;s/.* \([^ ]\+\).*/\1/;R fric.txt' input.txt |
sed 'N;N;s|\n|s/|;s|\n|/|;s|$|/|;q' >temp.sed
sed -i -f temp.sed input.txt
You can do it with a sed inside the sed (assuming you want to take line 1 from fric.txt):
sed -r -i 's/(.*mue[ \t]+)[0-9.]+(.*)/\1'$(sed -n '1{p;q}' fric.txt)'\2/' input.txt
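If preserving the original spacing matters (assigning a field, as the getline answer above does, generally makes awk rebuild the line with single spaces), here is a rough awk sketch that reads the first number from fric.txt and substitutes it for the trailing number on the first mue line only:
awk 'NR==FNR {if (FNR==1) repl=$1; next}                              # first file: remember line 1 of fric.txt
     /mue/ && !done {sub(/[0-9.Ee+-]+[[:space:]]*$/, repl); done=1}   # replace only the last number on the first mue line
     1' fric.txt input.txt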
