GNU Awk - don't modify whitespaces - linux

I am using GNU Awk to replace a single character in a file. The file is a single line with varying whitespacing between "fields". After passing through gawk all the extra whitespacing is removed and I end up with single spaces. This is completely unintended and I need it to ignore these spaces and only change the one character I have targeted. I have tried several variations, but I cannot seem to get gawk to ignore these extra spaces.
Since I know this will come up, I read from the end of the line for replacement because the whitespacing is arbitrary/inconsistent in the source file.
Command:
gawk -i inplace -v new=3 'NF {$(NF-5) = new} 1' ~/scripts/tmp_beta_weather_file
Original file example:
2020-07-01 18:29:51.00 C M -11.4 28.9 29 9 23 5.5 000 0 0 00020 044013.77074 1 1 1 3 0 0
Result after command above:
2020-07-01 18:30:51.00 C M -11.8 28.8 29 5 23 5.5 000 0 0 00020 044013.77143 3 1 1 3 0 0

It might be easier with sed:
sed -E 's/([^ ]+)(( [^ ]+){5})$/3\2/' file
Test it first, then add -i for an in-place edit.
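For reference, the reason gawk collapses the spacing: assigning to any field forces awk to rebuild $0 by joining all fields with OFS, which defaults to a single space. sed, by contrast, edits the line as raw text. A minimal demonstration:

```shell
# Reassigning a field (even to itself) rebuilds the record with OFS,
# so runs of whitespace collapse to single spaces:
printf 'a   b   c\n' | awk '{ $1 = $1 } 1'
# prints: a b c

# sed leaves the untouched parts of the line exactly as they were:
printf 'a   b   c\n' | sed 's/a/X/'
# prints: X   b   c
```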

Related

How can I replace a specific character in a file where its position changes, in a bash command line or script?

I have the following file:
2020-01-27 19:43:57.00 C M -8.5 0.2 0 4 81 -2.9 000 0 0 00020 043857.82219 3 1 1 1 1 1
The character I need to change is the "3" near the end of the line (the field after "043857.82219"). The value of this character is dynamic, but always a single digit. I have tried a few things using sed but I can't come up with a way to account for the character changing position due to additional characters being added before that position.
This character is always at the same position from the END of the line, but not from the beginning. Meaning, the content to the left of this character may change and it may be longer, but this is always the 11th character and 6th digit from the end. It is easy to devise a way to cut it, or find it using tail, but I can't devise a way to replace it.
To be clear, the single digit character in question will always be replaced with another single digit character.
With GNU awk
$ cat file
2020-01-27 19:43:57.00 C M -8.5 0.2 0 4 81 -2.9 000 0 0 00020 043857.82219 3 1 1 1 1 1
$ gawk -i inplace -v new=9 'NF {$(NF-5) = new} 1' file
$ cat file
2020-01-27 19:43:57.00 C M -8.5 0.2 0 4 81 -2.9 000 0 0 00020 043857.82219 9 1 1 1 1 1
Where:
NF {$(NF-5) = new} means, when the line is not empty, replace the 6th-last field with the new value (9).
1 means print every record.
awk '{ $(NF-5) = ($(NF - 5) + 8) % 10; print }'
Given your input data, it produces:
2020-01-27 19:43:57.00 C M -8.5 0.2 0 4 81 -2.9 000 0 0 00020 043857.82219 1 1 1 1 1 1
The 3 has been mapped via 11 to 1. Pick your poison on how you assign the new value; the magic is $(NF - 5), which picks up the fifth column before the last one (or sixth from the end).
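The arithmetic in isolation, on a shortened version of the line (the +8 increment is just one arbitrary way to land on the desired digit):

```shell
# (3 + 8) % 10 == 1, so the sixth-from-last field becomes 1:
echo '00020 043857.82219 3 1 1 1 1 1' | awk '{ $(NF-5) = ($(NF-5) + 8) % 10; print }'
# prints: 00020 043857.82219 1 1 1 1 1 1
```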
You could try the following:
replace="x" # or whatever you want to replace
sed 's/\(.\)\(.\{10\}\)$/'"$replace"'\2/' file
The left portion of the sed command \(.\)\(.\{10\}\)$ matches a character, followed by ten characters, then anchored by the end of line.
Then the 1st character is replaced with the specified character and the following ten characters are reused.
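A quick check of that pattern against the tail of the sample line: the target digit is the 11th character from the end, i.e. one character followed by exactly ten more:

```shell
# Replace the 11th-from-last character with 9:
echo '043857.82219 3 1 1 1 1 1' | sed 's/\(.\)\(.\{10\}\)$/9\2/'
# prints: 043857.82219 9 1 1 1 1 1
```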
I'm gonna assume that the number that you're looking for is the same distance from the end, regardless of what comes before it:
rev ~/test.txt | awk '$6=<value to replace>' | rev
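For a single-digit value the double reversal is harmless, since a one-character field reads the same backwards. A sketch with a concrete value (two caveats: awk uses the assignment $6=... as a pattern, so a replacement value of 0 would suppress the line, and rebuilding the record collapses whitespace just as in the other awk answers):

```shell
echo '043857.82219 3 1 1 1 1 1' | rev | awk '$6=9' | rev
# prints: 043857.82219 9 1 1 1 1 1
```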
Using the bash shell, which should be the last option:
rep=10
read -ra var <<< '2020-01-27 19:43:57.00 C M -8.5 0.2 0 4 81 -2.9 000 0 0 00020 043857.82219 3 1 1 1 1 1'
for i in "${!var[@]}"; do printf '%s ' "${var[$i]/${var[-6]}/$rep}"; done
If it is in a file.
rep=10
read -ra var < file.txt
for i in "${!var[@]}"; do printf '%s ' "${var[$i]/${var[-6]}/$rep}"; done
Not the shortest and fastest way but it can be done...
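As an aside, bash 4.3+ allows negative array indices, so the loop can be skipped by assigning to the sixth-from-last element directly; a simpler sketch of the same idea (the rebuilt line is single-space separated, like the awk answers):

```shell
rep=9
read -ra var <<< '2020-01-27 19:43:57.00 C M -8.5 0.2 0 4 81 -2.9 000 0 0 00020 043857.82219 3 1 1 1 1 1'
var[-6]=$rep                 # sixth field from the end
printf '%s\n' "${var[*]}"
# prints: 2020-01-27 19:43:57.00 C M -8.5 0.2 0 4 81 -2.9 000 0 0 00020 043857.82219 9 1 1 1 1 1
```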

How to split a column which has multiple dots using Linux command line

I have a file which looks like this:
chr10:100013403..100013414,- 0 0 0 0
chr10:100027943..100027958,- 0 0 0 0
chr10:100076685..100076699,+ 0 0 0 0
I want output to be like:
chr10 100013403 100013414 - 0 0 0 0
chr10 100027943 100027958 - 0 0 0 0
chr10 100076685 100076699 + 0 0 0 0
So, I want the first column to be tab separated at the field delimiters : , and ..
I have used awk -F":|," '$1=$1' OFS="\t" file to separate the first column, but I am still struggling with the .. characters.
I tried awk -F":|,|.." '$1=$1' OFS="\t" file but this doesn't work.
The .. needs to be escaped:
awk -F':|,|\\.\\.' '$1=$1' OFS="\t" file
It is important to remember that when you assign a string constant as the value of FS, it undergoes normal awk string processing. For example, with Unix awk and gawk, the assignment FS = "\.." assigns the character string .. to FS (the backslash is stripped). This creates a regexp meaning “fields are separated by occurrences of any two characters.” If instead you want fields to be separated by a literal period followed by any single character, use FS = "\\..".
https://www.gnu.org/software/gawk/manual/html_node/Field-Splitting-Summary.html
If your Input_file is the same as the sample shown, then the following may help you too.
awk '{gsub(/:|\.+|\,/,"\t");} 1' Input_file
Here I am using awk's gsub function to globally substitute : , and .+ (which will match all the dots) with a TAB; then 1 prints each edited/non-edited line of Input_file. I hope this helps.
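Running the escaped-FS answer over one sample line shows the effect: with whitespace no longer acting as a separator, everything after the , lands in a single fourth field, and $1=$1 rejoins the fields with tabs:

```shell
printf 'chr10:100013403..100013414,- 0 0 0 0\n' |
  awk -F':|,|\\.\\.' '$1=$1' OFS='\t'
# prints: chr10<TAB>100013403<TAB>100013414<TAB>- 0 0 0 0
```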

how to remove only the first two leading spaces in all lines of a file

My input file is like:
*CONTROL_ADAPTIVE
$ adpfreq adptol adpopt maxlvl tbirth tdeath lcadp ioflag
0.10 5.000 2 3 0.0 0.0 0 0
I JUST want to remove the leading 2 spaces in all the lines.
I used
sed "s/^[ \t]*//" -i inputfile.txt
but it deletes all the leading whitespace from all the lines. I just want to shift the complete text in the file two positions to the left.
Any solutions to this?
You can specify that you want to delete two matches of the character set in the brackets:
sed -r -i "s/^[ \t]{2}//" inputfile.txt
See the output:
$ sed -r "s/^[ \t]{2}//" file
*CONTROL_ADAPTIVE
$ adpfreq adptol adpopt maxlvl tbirth tdeath lcadp ioflag
0.10 5.000 2 3 0.0 0.0 0 0
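If your sed lacks -r (older or strictly POSIX seds), the same edit can be written with a BRE interval, or with two literal spaces; a couple of equivalent sketches:

```shell
# BRE form of the {2} quantifier (no -r needed):
sed 's/^[ \t]\{2\}//' inputfile.txt
# Or, if the indent is always exactly two spaces:
sed 's/^  //' inputfile.txt
```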

Cutting Element in Unix Based on Column Value

Without a shell script, in a single line: what command can help you cut a row based on the column value?
For example:
In
118 Balboni,Steve 23
11 Baker,Doug 0
120 Armas,Tony 13
133 Allanson,Andy 5
158 Baines,Harold 13
33 Bando,Chris 1
44 Adduci,James 1
50 Aguayo,Luis 3
5 Allen,Rod 0
94 Anderson,Brady 1
If the 3rd column is not zero, how do I remove the row entirely in one statement? Is this possible in Unix?
Assuming that the question is really asking about 'if the third column is non-zero, do not print it' or (equivalently) 'only print the row if the third column is 0':
Using awk:
awk '$3 == 0' data
(If the third column is zero, print the input; otherwise, ignore it. You could add { print } after the 0 to make the action explicit.)
Using perl:
perl -nae 'print if $F[2] == 0' data
Using sed:
sed -n '/ 0$/p' data
Using grep:
grep '[^0-9]0$' input
This does the in-place replacement:
perl -i -F -pane 'undef $_ if($F[2]!=0)' your_file
tested:
> cat temp
118 Balboni,Steve 23
11 Baker,Doug 0
120 Armas,Tony 13
133 Allanson,Andy 5
158 Baines,Harold 13
33 Bando,Chris 1
44 Adduci,James 1
50 Aguayo,Luis 3
5 Allen,Rod 0
94 Anderson,Brady 1
>
>
> perl -i -F -pane 'undef $_ if($F[2]!=0)' temp
> cat temp
11 Baker,Doug 0
5 Allen,Rod 0
>
If you wish to print lines that have no third column as well as those in which the 3rd column is explicitly 0 (ie, if you consider a blank field to be zero), try:
awk '!$3'
If you do not want to print lines with only 2 columns, try:
awk 'NF>2 && !$3'
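The difference is easy to see on a tiny sample in which one line has no third column at all:

```shell
# An empty $3 is falsy, so !$3 keeps the short line too:
printf 'a b 0\nc d\ne f 5\n' | awk '!$3'
# prints: a b 0
#         c d

# Requiring NF>2 keeps only genuine zeros:
printf 'a b 0\nc d\ne f 5\n' | awk 'NF>2 && !$3'
# prints: a b 0
```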

Slice 3TB log file with sed, awk & xargs?

I need to slice several TB of log data, and would prefer the speed of the command line.
I'll split the file up into chunks before processing, but need to remove some sections.
Here's an example of the format:
uuJ oPz eeOO 109 66 8
uuJ oPz eeOO 48 0 221
uuJ oPz eeOO 9 674 3
kf iiiTti oP 88 909 19
mxmx lo uUui 2 9 771
mxmx lo uUui 577 765 27878456
The gaps between the first 3 alphanumeric strings are spaces. Everything after that is tabs. Lines are separated with \n.
I want to keep only the last line in each group.
If there's only 1 line in a group, it should be kept.
Here's the expected output:
uuJ oPz eeOO 9 674 3
kf iiiTti oP 88 909 19
mxmx lo uUui 577 765 27878456
How can I do this with sed, awk, xargs and friends, or should I just use something higher level like Python?
awk -F '\t' '
NR==1 {key=$1}
$1!=key {print line; key=$1}
{line=$0}
END {print line}
' file_in > file_out
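Fed the sample data (with literal tabs after the third alphanumeric string), this prints exactly the three expected lines:

```shell
printf 'uuJ oPz eeOO\t109\t66\t8\nuuJ oPz eeOO\t48\t0\t221\nuuJ oPz eeOO\t9\t674\t3\nkf iiiTti oP\t88\t909\t19\nmxmx lo uUui\t2\t9\t771\nmxmx lo uUui\t577\t765\t27878456\n' |
awk -F '\t' '
  NR==1   {key=$1}
  $1!=key {print line; key=$1}
  {line=$0}
  END {print line}
'
```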
Try this:
awk 'BEGIN{FS="\t"}
{if($1!=prevKey) {if (NR > 1) {print lastLine}; prevKey=$1} lastLine=$0}
END{print lastLine}'
It saves the last line and prints it only when it notices that the key has changed.
This might work for you (GNU sed):
sed ':a;$!N;/^\(\S*\s\S*\s\S*\)[^\n]*\n\1/s//\1/;ta;P;D' file
While two consecutive lines share the same first three space-separated fields, it deletes the earlier line of the pair, so only the last line of each group survives.
