Find a pattern and modify the next line without modifying the other contents of the file, preferably with Linux commands (sed, awk, etc.)

I have a file which looks like this:
# Hello, welcome to the world
# Trying to modify XXXXXX
# Some more random text
poly RANDOM LAYER{
20 25
18 2
1 5
1 2
5 6
}
poly RANDOM LAYER{
30 50
14 25
15 25
15 26
15 26
15 27
}
I would like to increment the values in the line immediately after each poly RANDOM LAYER line: add 10 to the first number (20+10=30) and 20 to the second number (25+20=45). The rest of the contents should stay the same.
This should be done for every line immediately following poly RANDOM LAYER.
The output should look like:
# Hello, welcome to the world
# Trying to modify XXXXXX
# Some more random text
poly RANDOM LAYER{
*30 45*
18 2
1 5
1 2
5 6
}
poly RANDOM LAYER{
*40 70*
14 25
15 25
15 26
15 26
15 27
}

If the specific leading white space is always 4 chars:
$ awk 'f{$1=" "$1+10; $2+=20; f=0} /RANDOM/{f=1} 1' file
# Hello, welcome to the world
# Trying to modify XXXXXX
# Some more random text
poly RANDOM LAYER{
30 45
18 2
1 5
1 2
5 6
}
poly RANDOM LAYER{
40 70
14 25
15 25
15 26
15 26
15 27
}
otherwise use:
$ awk 'f{fmt=$0; gsub(/[^[:space:]]+/,"%s",fmt); $0=sprintf(fmt,$1+10,$2+20); f=0} /RANDOM/{f=1} 1' file
as that will just reproduce in your output WHATEVER leading, trailing, or inter-field white space you have in your input.
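To see why the fmt/gsub version preserves spacing, here is a minimal sketch on a single made-up indented line: the line itself is turned into a printf format, so every run of whitespace survives untouched.

```shell
# Build a printf format from the line itself: every field becomes %s,
# so all original whitespace (leading, trailing, inter-field) is kept.
echo '    20   25' | awk '{
    fmt = $0
    gsub(/[^[:space:]]+/, "%s", fmt)    # fmt is now "    %s   %s"
    printf fmt "\n", $1 + 10, $2 + 20
}'
# prints "    30   45" with the original spacing intact
```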

You say (sed, awk, etc). Is perl part of etc?
perl -pe 's/(\d+)/$1+10/ge if($lastLineMatch); $lastLineMatch = m/poly RANDOM/; ' < file
Or if you want to add different values to the two numbers:
perl -pe 's/(\d+)(\D+)(\d+)/($1+10).$2.($3+20)/ge if($lastLineMatch); $lastLineMatch = m/poly RANDOM/; ' < file

Related

Efficient Reading of Input File

Currently, for a task, I am working with input files that describe matrix-related test cases (matrix multiplication). An example input file:
N M
1 3 5 ... 6 (M columns)
....
5 4 2 ... 1 (N rows)
I was using a simple read() to access them until now, but this is not efficient for large files of size > 10^2.
So I wanted to know: is there some way to use processes to do this in parallel?
I was also thinking of using multiple IO readers based on lines, so that each process could read a different segment of the file, but I couldn't find any helpful resources.
Thank you.
PS: Current code is using this:
io:fread(IoDev, "", "~d")
Did you consider using the re module? I did not run a performance test, but it may be efficient. In the following example I do not use the first "N M" line, so I did not put it in the matrix.txt file.
matrix file:
1 2 3 4 5 6 7 8 9
11 12 13 14 15 16 17 18 19
21 22 23 24 25 26 27 28 29
31 32 33 34 35 36 37 38 39
I made the conversion in the shell
1> {ok,B} = file:read_file("matrix.txt"). % read the complete file and store it in a binary
{ok,<<"1 2 3 4 5 6 7 8 9\r\n11 12 13 14 15 16 17 18 19\r\n21 22 23 24 25 26 27 28 29\r\n31 32 33 34 35 36 37 38 39">>}
2> {ok,ML} = re:compile("[\r\n]+"). % to split the complete binary into a list of binaries, one for each line
{ok,{re_pattern,0,0,0,
<<69,82,67,80,105,0,0,0,0,0,0,0,1,8,0,0,255,255,255,255,
255,255,...>>}}
3> {ok,MN} = re:compile("[ ]+"). % to split the line into binaries one for each integer
{ok,{re_pattern,0,0,0,
<<69,82,67,80,73,0,0,0,0,0,0,0,17,0,0,0,255,255,255,255,
255,255,...>>}}
4> % a function to split a line and convert each chunk into integer
4> F = fun(Line) -> Nums = re:split(Line,MN), [binary_to_integer(N) || N <- Nums] end.
#Fun<erl_eval.7.126501267>
5> Lines = re:split(B,ML). % split the file into lines
[<<"1 2 3 4 5 6 7 8 9">>,<<"11 12 13 14 15 16 17 18 19">>,
<<"21 22 23 24 25 26 27 28 29">>,
<<"31 32 33 34 35 36 37 38 39">>]
6> lists:map(F,Lines). % map the function to each line
[[1,2,3,4,5,6,7,8,9],
[11,12,13,14,15,16,17,18,19],
[21,22,23,24,25,26,27,28,29],
[31,32,33,34,35,36,37,38,39]]
7>
if you want to check the matrix size, you can replace the last line with:
[[NbRows,NbCols]|Matrix] = lists:map(F,Lines),
case (length(Matrix) == NbRows) andalso
lists:foldl(fun(X,Acc) -> Acc andalso (length(X) == NbCols) end,true,Matrix) of
true -> {ok,Matrix};
_ -> {error_size,Matrix}
end.
is there some way to use processes to do this in parallel.
Of course.
Also I was thinking of using multiple IO readers based on line, so
then each process could read different segments of the file but
couldn't find any helpful resources.
You don't seek to positions in a file by line, rather you seek to byte positions. While a file may look like a bunch of lines, a file is actually just one long sequence of characters. Therefore, you will need to figure out what byte positions you want to seek to in the file.
Check out file:position, file:pread.
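As a shell sketch of the same byte-position addressing that file:pread uses (the sample file and the offset 18 below are hypothetical, computed for this particular content):

```shell
# Create a small sample matrix file; its first line is 18 bytes long
# ("1 2 3 4 5 6 7 8 9" plus the newline).
printf '1 2 3 4 5 6 7 8 9\n11 12 13 14 15 16 17 18 19\n' > matrix.txt
# Read 10 bytes starting at byte offset 18, i.e. the start of line 2:
dd if=matrix.txt bs=1 skip=18 count=10 2>/dev/null
# prints "11 12 13 1" -- each worker can be handed its own (offset, length) pair
```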

Shifting column titles to right

I have a file which I want to process in bash or python.
The structure has 4 columns but only 3 column titles:
input.txt
1STCOLUMN 2NDCOLUMN THIRDCOLUMN
input1 12 33 45
input22 10 13 9
input4 2 23 11
input4534 3 1 1
I am trying to shift the column titles to the right and add a title of "INPUTS" for the first column (the input column).
Desired output: Adding the column title
Desired-output-step1.csv
INPUTS 1STCOLUMN 2NDCOLUMN THIRDCOLUMN
input1 12 33 45
input22 10 13 9
input4 2 23 11
input4534 3 1 1
I tried with sed:
sed -i '1iINPUTS, 1STCOLUMN, 2NDCOLUMN, THIRDCOLUMN' input.txt
But I would prefer not to type out the names of the existing columns.
How do I just insert the new title for the first column and have the other column titles shift to the right?
You can specify which line to modify using line numbers:
$ sed '1s/^/INPUTS /' ip.txt
INPUTS 1STCOLUMN 2NDCOLUMN THIRDCOLUMN
input1 12 33 45
input22 10 13 9
input4 2 23 11
input4534 3 1 1
here, 1 indicates that you want to apply the s command only to the 1st line
s/^/INPUTS / inserts text at the start of the line; you'll have to adjust the spacing as needed
Instead of counting and testing the spaces, you can let column -t do the padding and formatting job:
sed '1s/^/INPUTS /' ip.txt|column -t
This will give you:
INPUTS 1STCOLUMN 2NDCOLUMN THIRDCOLUMN
input1 12 33 45
input22 10 13 9
input4 2 23 11
input4534 3 1 1
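If you would rather do it in a single awk pass, an equivalent sketch (the sample ip.txt is recreated inline; pipe through column -t as above for aligned output):

```shell
# Recreate a hypothetical sample: header plus one data row
printf '1STCOLUMN 2NDCOLUMN THIRDCOLUMN\ninput1 12 33 45\n' > ip.txt
# Prepend INPUTS to the header (line 1) and print every line unchanged
awk 'NR == 1 { $0 = "INPUTS " $0 } 1' ip.txt
# first line is now: INPUTS 1STCOLUMN 2NDCOLUMN THIRDCOLUMN
```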

How can I use awk to modify a column based on the first column?

I have data like this, and I need to automate a simple task: make the second value of each row become the same as the first value of the next row, like this:
First Second
1 2
4 6
10 12
25 28
30 35
Become
First Second
1 4
4 10
10 25
25 30
30 35
$ awk 'NR==1; NR>2{print p[1], $1} {split($0,p)} END{print p[1], p[2]}' file
First Second
1 4
4 10
10 25
25 30
30 35
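Annotated, the same one-liner reads as follows (the sample file is recreated inline; p[] holds the previous row's fields):

```shell
# Recreate the sample input
printf 'First Second\n1 2\n4 6\n10 12\n25 28\n30 35\n' > data.txt
awk '
    NR == 1                       # print the header line as-is
    NR > 2  { print p[1], $1 }    # previous 1st field, current 1st field
            { split($0, p) }      # remember this row for the next one
    END     { print p[1], p[2] }  # the last row is emitted unchanged
' data.txt
```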
It should be noted that your expected output is wrong: you cannot know the 35 because that row has not been read yet:
$ awk 'NR > 2 {print $1} NR > 1 {printf $1 "\t"}' file
1 4
4 10
10 25
25 30
30

How to generate 3 natural number that sum to 60 using awk

I am trying to write an awk script that generates 3 natural numbers summing to 60. I am trying with the rand function but I've got a problem getting them to sum to 60.
Here is one way:
awk -v n=60 'BEGIN{srand(); a=int(rand()*n); b=int(rand()*(n-a)); c=n-a-b; print a,b,c}'
The idea is:
generate random number a: 0 <= a < 60
generate random number b: 0 <= b < 60-a
c = 60 - a - b
Here, I set a variable n=60 to make it easy if you need a different sum.
If we run this one-liner 10 times, we get output:
kent$ awk 'BEGIN{srand();for(i=1;i<=10;i++){a=int(rand()*60);b=int(rand()*(60-a));c=60-a-b;print a,b,c}}'
46 7 7
56 1 3
26 15 19
14 12 34
44 6 10
1 36 23
32 1 27
41 0 19
55 1 4
54 1 5
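A quick sanity check (a sketch, not part of the original answer) that the construction always hits the target sum, whatever srand() produces:

```shell
# Run the generator 20 times and verify a+b+c is always exactly n
awk -v n=60 'BEGIN {
    srand()
    for (i = 1; i <= 20; i++) {
        a = int(rand() * n); b = int(rand() * (n - a)); c = n - a - b
        if (a + b + c != n) { print "FAIL"; exit 1 }
    }
    print "OK: all 20 triples sum to", n
}'
# prints "OK: all 20 triples sum to 60"
```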

How to set an arbitrary seed to the --random-sort option of Linux SORT?

In man page of SORT, it says you can set a random source like:
$ sort some.txt --random-sort --random-source=/dev/urandom
I want to feed the output of a command to it as the source, like:
$ sort some.txt --random-sort --random-source=`date +"%m%d%H%M"`
But this only says:
open failed: 11021103: No such file or directory
How can I do this?
Here's a simple python script that takes a seed and outputs random bytes:
> cat rand_bits.py
import random
import sys

# Seed from the command line if given, otherwise fall back to a fixed default
if len(sys.argv) > 1:
    rng = random.Random(int(sys.argv[-1]))
else:
    rng = random.Random(0xBA5EBA11)

try:
    while True:
        # write one raw byte at a time (Python 3: use the binary buffer)
        sys.stdout.buffer.write(bytes([rng.getrandbits(8)]))
except (IOError, KeyboardInterrupt):
    pass
You can just feed those bytes straight into sort:
> sort <(seq 25) -R --random-source=<(python rand_bits.py 5)
8
2
4
7
10
19
17
11
3
20
14
18
1
16
25
12
5
21
24
23
22
9
15
13
6
By the way, the random source can be any file, but it had better be long enough!
> sort <(seq 25) -R --random-source=<(date +"%m%d%H%M")
sort: /dev/fd/12: end of file
> sort <(seq 25) -R --random-source=/dev/sda1
3
13
24
5
10
16
4
17
12
18
14
2
6
15
23
21
19
11
9
1
20
25
22
8
7
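A related trick if you just want a repeatable seed without a helper script: pre-generate a deterministic byte stream from the seed string and use that file as the source (this sidesteps the end-of-file error above, and assumes GNU sort with -R support; yes is a sketch of a seed source, deterministic but low-entropy):

```shell
# A few KB of deterministic bytes derived from the seed string
yes 11021103 | head -c 4096 > seed.bytes
# The same seed file always yields the same shuffle
seq 5 | sort -R --random-source=seed.bytes
seq 5 | sort -R --random-source=seed.bytes   # identical output
```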