Bash script - check if consecutive numbers in a string are above a value

I am echoing some data from an Oracle DB cluster via a bash script. Currently, the output from SQL*Plus captured into a variable in the script is:
11/12 0 0 0 0 0 0 1 0 1 0 5 4 1 0 0 0 0 0 0 0 0 0 0 0
What I'd like to be able to do is evaluate that string of numbers, excluding the first field (the date), to see if any 6 consecutive numbers are above a certain value, let's say 10.
I only want the logic to return true if all 6 consecutive values were above "10".
So for example, if the output was:
11/12 0 0 8 10 5 1 1 0 8 10 25 40 6 2 0 0 0 0 0 0 0 0 0 0
The logic should return false/null/zero, anything I can handle negatively.
But if the string looked like this:
11/12 0 0 0 0 5 9 1 0 1 10 28 10 12 19 15 11 6 7 0 0 0 0
Then it would return true/1, etc.
Is there any bash component that I can make use of to do this? I've been stuck on this part for a while now.

For variety, here is a solution not depending on awk:
#!/usr/bin/env bash
contains() {
    local nums=$* count=0 threshold=10 limit=6 i
    for i in ${nums#* }; do
        if (( i >= threshold )); then
            (( ++count >= limit )) && return 0
        else
            count=0
        fi
    done
    return 1
}

output="11/12 0 0 0 0 5 9 1 0 1 10 28 10 12 19 15 11 6 7 0 0 0 0"
if contains "$output"; then
    echo "Yaaay!"
else
    echo "Noooo!"
fi

Say your string is in $S, then
echo $S | awk '{
    L = 0; threshold = 10; reqLength = 6
    for (i = 2; i <= NF; ++i) {
        if ($i >= threshold) {
            L += 1
            if (L >= reqLength) {
                exit(1)
            }
        } else {
            L = 0
        }
    }
}'
would do it ($? will be 1 if you have enough consecutive numbers exceeding your threshold).
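A usage sketch (assuming the threshold of 10 and run length of 6 from the question): since the awk program exits 1 when a qualifying run is found, the shell test is inverted with "!":

```shell
# the question's "true" example; exit status 1 here means a run WAS found
S="11/12 0 0 0 0 5 9 1 0 1 10 28 10 12 19 15 11 6 7 0 0 0 0"
if ! echo "$S" | awk '{
        L = 0
        for (i = 2; i <= NF; ++i)
            if ($i >= 10) { if (++L >= 6) exit 1 } else L = 0
    }'; then
    found=yes
else
    found=no
fi
echo "$found"
```

With the sample string above this prints yes (the run 10 28 10 12 19 15 is 6 long); with the question's negative example it prints no.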

Related

How to filter a matrix based on another column

I want to filter a matrix file using a column from another file.
I have 2 tab-separated files; one contains a matrix. I want to filter my matrix file (FileA) based on the first column of FileB: if the headers (column names) of FileA are present in the first column of FileB, I want to keep those columns in a new file. All solutions I could find were based on filtering rows, not fields. Any help is appreciated. Thanks!
FileA
A B C D E F G H I J K L M N
R1 0 0 0 0 0 0 0 0 0 1 0 0 1 1
R2 1 1 0 1 0 0 0 0 1 0 1 0 0 0
R3 0 0 0 0 0 0 0 0 0 0 0 0 0 1
R4 1 1 0 1 0 0 0 1 0 1 0 1 0 0
R5 0 0 0 0 1 0 1 0 1 0 1 0 1 0
FileB
A Green
B Purple
K Blue
L Blue
Z Green
M Purple
N Red
O Red
U Red
My expected output is:
ExpectedOutput
A B K L M N
R1 0 0 0 0 1 1
R2 1 1 1 0 0 0
R3 0 0 0 0 0 1
R4 1 1 0 1 0 0
R5 0 0 1 0 1 0
Oh, what the heck, I'm not sure having you post an R script is really going to make any difference other than satisfying my need to be pedantic so here y'go:
$ cat tst.awk
NR == FNR {
    outFldNames2Nrs[$1] = ++numOutFlds
    next
}
FNR == 1 {
    $0 = "__" FS $0
    for (inFldNr = 1; inFldNr <= NF; inFldNr++) {
        outFldNr = outFldNames2Nrs[$inFldNr]
        out2inFldNrs[outFldNr] = inFldNr
    }
}
{
    printf "%s", $1
    for (outFldNr = 1; outFldNr <= numOutFlds; outFldNr++) {
        inFldNr = out2inFldNrs[outFldNr]
        if (inFldNr) {
            printf "%s%s", OFS, $inFldNr
        }
    }
    print ""
}
$ awk -f tst.awk fileB fileA
__ A B K L M N
R1 0 0 0 0 1 1
R2 1 1 1 0 0 0
R3 0 0 0 0 0 1
R4 1 1 0 1 0 0
R5 0 0 1 0 1 0
I'm using the term "field name" to apply to the letter at the top of each column ("field" in awk). Try to figure the rest out for yourself from looking at the man pages and adding "prints" if/when useful and then feel free to ask questions if you have any.
I added __ at the front of your header line so you'd have the same number of columns in every line of output; that makes it easier to pass along to other tools for further manipulation, but it's easy to tweak the code not to do that if you don't like it.
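If you'd rather drop the __ placeholder entirely, one simple tweak (a sketch, not part of the original answer) is to strip it from the first output line with sed, shown here on a captured copy of the sample output:

```shell
# a copy of the script's sample output, including the "__" header padding
out='__ A B K L M N
R1 0 0 0 0 1 1
R2 1 1 1 0 0 0'

# remove the "__ " padding from the header (first) line only
stripped=$(printf '%s\n' "$out" | sed '1s/^__ //')
printf '%s\n' "$stripped"
```

Against the real files this would be: awk -f tst.awk fileB fileA | sed '1s/^__ //'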
As @EdMorton mentions, bash may not be a suitable tool for manipulating a complex data structure such as a table, from a maintainability and robustness point of view.
Here is a bash script example just for information:
#!/bin/bash
declare -A seen
declare -a ary include
while read -r alpha color; do
    seen["$alpha"]=1
done < FileB
while read -r -a ary; do
    if (( nr++ == 0 )); then                # handle header line
        echo -n " "
        for (( i = 0; i < ${#ary[@]}; i++ )); do
            alpha="${ary[$i]}"
            if [[ ${seen["$alpha"]} = 1 ]]; then
                echo -n " $alpha"
                include[$((i+1))]=1
            fi
        done
    else
        echo -n "${ary[0]}"
        for (( i = 1; i < ${#ary[@]}; i++ )); do
            if [[ ${include[$i]} = 1 ]]; then
                echo -n " ${ary[$i]}"
            fi
        done
    fi
    echo
done < FileA
If Python is an option, you can instead say something like:
import pandas as pd

dfb = pd.read_csv("./FileB", sep=r"\s+", header=None)
vb = [x[0] for x in dfb.values.tolist()]
dfa = pd.read_csv("./FileA", sep=r"\s+")
va = dfa.columns.tolist()
print(dfa[sorted(set(va) & set(vb))])
Output:
A B K L M N
R1 0 0 0 0 1 1
R2 1 1 1 0 0 0
R3 0 0 0 0 0 1
R4 1 1 0 1 0 0
R5 0 0 1 0 1 0

How to delete the first subset of each set of column in a data file?

I have a data file with more than 40000 columns. In the header, each column's name begins with c1, c2, ..., cn, and each set of c has one or several subsets; for example, c1 has 2 subsets. I need to delete the first column (subset) of each set of c. For example, if the input looks like:
input:
c1.20022 c1.31012 c2.44444 c2.87634 c2.22233 c3.00444 c3.44444
1 1 0 1 0 0 0 1
2 0 1 0 0 1 0 1
3 0 1 0 0 1 1 0
4 1 0 1 0 0 1 0
5 1 0 1 0 0 1 0
6 1 0 1 0 0 1 0
I need the output be like:
c1.31012 c2.87634 c2.22233 c3.44444
1 0 0 0 1
2 1 0 1 1
3 1 0 1 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
7 1 0 0 0
Any suggestion please?
Update: if there is no space between the digits in a row (which is the real situation of my data set), then what should I do? I mean that my real data looks like this:
input:
c1.20022 c1.31012 c2.44444 c2.87634 c2.22233 c3.00444 c3.44444
1 1010001
2 0100101
3 0100110
4 1010010
5 1010010
6 1010010
and output:
c1.31012 c2.87634 c2.22233 c3.44444
1 0001
2 1011
3 1010
4 0000
5 0000
6 0000
7 1000
Perl solution: It first reads the header line, uses a regex to extract the column name before a dot, and keeps a list of column numbers to keep. It then uses the indices to print only the wanted columns from the header and remaining lines.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my @header = split ' ', <>;
my $last = q();
my @keep;
for my $i (0 .. $#header) {
    my ($prefix) = $header[$i] =~ /(.*)\./;
    if ($prefix eq $last) {
        push @keep, $i + 1;
    }
    $last = $prefix;
}
unshift @header, q();
say join "\t", @header[@keep];

while (<>) {
    my @columns = split;
    say join "\t", @columns[@keep];
}
Update:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my @header = split ' ', <>;
my $last = q();
my @keep;
for my $i (0 .. $#header) {
    my ($prefix) = $header[$i] =~ /(.*)\./;
    if ($prefix eq $last) {
        push @keep, $i;
    }
    $last = $prefix;
}
say join "\t", @header[@keep];

while (<>) {
    my ($line_number, $all_digits) = split;
    my @digits = split //, $all_digits;
    say join "\t", $line_number, join q(), @digits[@keep];
}

How to extract only the odd-position values of a string?

I have a string Data="0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0";
I want to extract only the values at the odd positions of Data.
I mean a new string with the value: 0 0 0 0 0 0 0 0
You could do this:
var Data = "0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0";
var output = string.Join(" ", Data
    .Split(' ')
    .Select((s, i) => new { s, i })
    .Where(w => w.i % 2 != 0)
    .Select(s => s.s));
Output will be:
0 0 0 0 0 0 0 0
You could also do this:
private IEnumerable<string> GetOdd(string data)
{
    var split = data.Split(' ');
    for (int i = 0; i < split.Length; i++)
    {
        if (i % 2 != 0)
            yield return split[i];
    }
}
And then call the function like this:
var output = string.Join(" ", GetOdd(Data));

How to select only those lines which have the same string in all columns (excluding the first) in Linux

I have a text file with 200 columns like:
sample1 0 12 11 23 12
sample2 3 16 89 12 0
sample3 0 0 0 0 0
sample4 33 22 0 0 0
sample5 0 0 0 0 0
And I want only those lines which have only 0 from column 2 to 6. The desired output is:
sample3 0 0 0 0 0
sample5 0 0 0 0 0
Like this, for example:
$ awk '!$2 && !$3 && !$4 && !$5 && !$6' file
sample3 0 0 0 0 0
sample5 0 0 0 0 0
Which is the same as:
$ awk '!($2 || $3 || $4 || $5 || $6)' file
sample3 0 0 0 0 0
sample5 0 0 0 0 0
As per your comment
that is for example but i want to do that from column 2 to 200th
This can be a way:
$ awk '{for (i=2;i<=200;i++) if ($i) {next}}1' file
sample3 0 0 0 0 0
sample5 0 0 0 0 0
Note that $i refers to the field at position i. $i is true when it holds a "true" value, so $i is false when it is 0 (or empty).
Based on that approach, we loop through all the values. If one value is true, meaning not 0, we execute next, which stops analyzing the current line. Otherwise (the 2nd to 200th columns all being 0 or empty), next is never reached, so awk evaluates the 1, which triggers the default action {print $0}.
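The hard-coded 200 can be avoided by looping to NF instead, so the same filter works for any number of columns (a sketch on a trimmed-down sample):

```shell
# same idea, generalized to however many columns each line actually has
out=$(printf '%s\n' 'sample1 0 12 11 23 12' 'sample3 0 0 0 0 0' 'sample5 0 0 0 0 0' |
    awk '{ for (i = 2; i <= NF; i++) if ($i) next } 1')
printf '%s\n' "$out"
```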

Convert column pattern

I have this kind of file:
1 0 1
2 0 3
2 1 2
3 0 3
4 0 1
4 1 1
4 2 1
4 3 1
5 0 1
8 0 1
10 0 1
11 0 1
The RS separator is an empty line by default.
If there is a double blank line, we have to substitute one of them with a record $1 0 0, where $1 is the number from the preceding $1 0 * record, increased.
If the separator is an empty line + 1 more empty line, we have to increase $1 by 1.
If the separator is an empty line + 2 more empty lines, we have to increase $1 by 2.
...
and I need to get this output:
1 0 1
2 0 3
2 1 2
3 0 3
4 0 1
4 1 1
4 2 1
4 3 1
5 0 1
6 0 0
7 0 0
8 0 1
9 0 0
10 0 1
11 0 1
Thanks in advance!
awk 'NF{f=0;n=$1;print;next}f{print ++n " 0 0"}{print;f=1}' ./infile
Output
$ awk 'NF{f=0;n=$1;print;next}f{print ++n " 0 0"}{print;f=1}' ./infile
1 0 1
2 0 3
2 1 2
3 0 3
4 0 1
4 1 1
4 2 1
4 3 1
5 0 1
6 0 0
7 0 0
8 0 1
9 0 0
10 0 1
11 0 1
Explanation
NF{f=0;n=$1;print;next}: if the current line has data, unset flag f, save the number in the first field to n, print the line and skip the rest of the script
{print;f=1}: We only reach this action if the current line is blank. If so, print the line and set the flag f
f{print ++n " 0 0"}: We only execute this action if the flag f is set which only happens if the previous line was blank. If we enter this action, print the missing fields with an incremented n
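As a quick self-contained check of that blank-line logic (the 5 -> 8 gap from the question, written as a triple blank separator); note that the blank lines themselves pass through to the output, which you may want to squeeze out afterwards:

```shell
# gap of two numbers between 5 and 8, encoded as three blank lines
out=$(printf '5 0 1\n\n\n\n8 0 1\n' |
    awk 'NF{f=0;n=$1;print;next}f{print ++n " 0 0"}{print;f=1}')
printf '%s\n' "$out"
```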
You can try something like this. The benefit of this approach is that your input file need not have empty lines for the missing numbers.
awk -v RS="" -v ORS="\n\n" -v OFS="\n" '
BEGIN{getline; col=$1;line=$0;print line}
$1==col{print $0;next }
($1==col+1){print $0;col=$1;next}
{x=$1;y=$0; col++; while (col < x) {print col" 0 0";col++};print y;next}' file
Input File:
[jaypal:~/Temp] cat file
1 0 1
2 0 3
2 1 2
3 0 3
4 0 1
4 1 1
4 2 1
4 3 1
5 0 1
8 0 1
10 0 1
11 0 1
Script Output:
[jaypal:~/Temp] awk -v RS="" -v ORS="\n\n" -v OFS="\n" '
BEGIN{getline; col=$1;line=$0;print line}
$1==col{print $0;next }
($1==col+1){print $0;col=$1;next}
{x=$1;y=$0; col++; while (col < x) {print col" 0 0";col++};print y;next}' file
1 0 1
2 0 3
2 1 2
3 0 3
4 0 1
4 1 1
4 2 1
4 3 1
5 0 1
6 0 0
7 0 0
8 0 1
9 0 0
10 0 1
11 0 1
