How can I average a value to each staff member? (calculated-columns)

I have a table like this:
ID | Money | Staff1 | Staff2 | Staff3 | Staff4
----------------------------------------------
1  | 200   | John   | Peter  | Mary   | John
2  | 300   | John   | Peter  | Mary   |
and I need to calculate the average money of each staff in the same table.
ID | Staff1 | Staff2 | Staff3 | Staff4
--------------------------------------
1  | 50     | 50     | 50     | 50
2  | 100    | 100    | 100    | NULL

A solution to this could be a subquery that works out how many staff columns are populated for each ID. We calculate this with CASE expressions joined by the addition (+) operator. Be careful not to quote the 0s and 1s in the THEN/ELSE branches, or you will end up with strings instead of integers.
In its simplest form you can write the SQL as follows:
SELECT
    a.ID,
    a.Money / a.stcnt AS 'Each'
FROM
    (SELECT
         b.ID,
         b.Money,
         CASE WHEN Staff1 > '0' THEN 1 ELSE 0 END
       + CASE WHEN Staff2 > '0' THEN 1 ELSE 0 END
       + CASE WHEN Staff3 > '0' THEN 1 ELSE 0 END
       + CASE WHEN Staff4 > '0' THEN 1 ELSE 0 END
         AS stcnt
     FROM staffmoney b) a
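One caveat with this first form: if none of the four staff columns is populated, stcnt is 0 and the division fails. A hedged guard, assuming your dialect supports the standard NULLIF function (most do), is to divide by NULLIF(a.stcnt, 0), which yields NULL instead of raising a divide-by-zero error:
SELECT
    a.ID,
    a.Money / NULLIF(a.stcnt, 0) AS 'Each'  -- NULL rather than an error when stcnt = 0
FROM
    (...) a  -- same subquery as above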
However, if you need a separate column for each staff member, even though the figures will be identical, you can add another series of CASE expressions to your top-level SELECT, which will look like this:
SELECT
    a.ID,
    CASE WHEN a.stcnt > 0 THEN a.Money / a.stcnt ELSE NULL END AS 'Staff1',
    CASE WHEN a.stcnt > 1 THEN a.Money / a.stcnt ELSE NULL END AS 'Staff2',
    CASE WHEN a.stcnt > 2 THEN a.Money / a.stcnt ELSE NULL END AS 'Staff3',
    CASE WHEN a.stcnt > 3 THEN a.Money / a.stcnt ELSE NULL END AS 'Staff4'
FROM
    (SELECT
         b.ID,
         b.Money,
         CASE WHEN Staff1 > '0' THEN 1 ELSE 0 END
       + CASE WHEN Staff2 > '0' THEN 1 ELSE 0 END
       + CASE WHEN Staff3 > '0' THEN 1 ELSE 0 END
       + CASE WHEN Staff4 > '0' THEN 1 ELSE 0 END
         AS stcnt
     FROM staffmoney b) a
This is not very efficient; my strongest advice would be to remodel your data so that you can calculate from rows rather than columns, but this will get you going with what you have, if that is what you require.
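For reference, a minimal sketch of that remodel (the staff_assignment table and its columns are illustrative, not taken from your schema): store one row per ID/staff pair, and the per-person share falls out of a simple GROUP BY:
-- one row per assignment instead of four staff columns
CREATE TABLE staff_assignment (ID INT, Staff VARCHAR(50));

SELECT m.ID, m.Money / COUNT(*) AS Each
FROM staffmoney m
JOIN staff_assignment s ON s.ID = m.ID
GROUP BY m.ID, m.Money;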
Good luck!

Related

Difference between 2 consecutive values in Kusto

I have the following script:
let StartTime = datetime(2022-02-18 10:10:00 AM);
let EndTime = datetime(2022-02-18 10:15:00 AM);
MachineEvents
| where Timestamp between (StartTime .. EndTime)
| where Id == "00112233" and Name == "Higher"
| top 2 by Timestamp
| project Timestamp, Value
I got back the two most recent rows (Values 15457.083 and 15451.433).
What I am trying to achieve after that is to check whether the last Value received (in this example 15451.433) is less than 30,000. If that condition is true, I should then check the difference between the last two consecutive values (in this case: 15451.433 - 15457.083). If the difference is < 0 I should return true, otherwise false (in other words, the result should be a boolean value instead of a double).
datatable(Timestamp:datetime, Value:double)
[
datetime(2022-02-18 10:15:00 AM), 15457.083,
datetime(2022-02-18 10:14:00 AM), 15451.433,
datetime(2022-02-18 10:13:00 AM), 15433.333,
datetime(2022-02-18 10:12:00 AM), 15411.111
]
| top 2 by Timestamp
| project Timestamp, Value
| extend nextValue=next(Value)
| extend finalResult = iff(Value < 30000, nextValue - Value < 0, false)
| top 1 by Timestamp
| project finalResult
Output:
finalResult
1
You can use the prev() function (or next()) to process the values in the other rows.
...
| extend previous = prev(Value)   // the column is Value; KQL identifiers are case-sensitive
| extend diff = Value - previous
| extend isPositive = diff > 0
You might need to use serialize if you don't have something like top that already does that for you.
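For example, a minimal sketch built on the same MachineEvents columns as above (order by, like top, produces a serialized row set, so prev() is allowed; an explicit | serialize would do the same):
MachineEvents
| where Id == "00112233" and Name == "Higher"
| order by Timestamp asc
| extend previous = prev(Value)
| extend diff = Value - previous
| extend isPositive = diff > 0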

Calculate average of 1kb windows

My files looks like the following:
18 1600014 + CAA 0 3
18 1600017 - CTT 0 1
18 1600019 - CTC 0 1
18 1600020 + CAT 0 3
18 1600031 - CAA 0 1
18 1600035 - CAT 0 1
...
I am trying to calculate the average of column 6 in windows that each cover a range of 1000 in column 2. So 1600001-1601000, 1601001-1602000, etc. My values go from 1600000 to 1700000. Is there any way to do this in one step? My initial thought was to use grep to sort the values, but that would require many different commands. I know you can calculate an average with awk, but can you iterate over each window?
Desired output would be something like this:
1600001-1601000 3.215
1601001-1602000 3.141
1602001-1603000 3.542
You can use GNU awk to gather the counts and sums. If I understand your problem correctly, you need something like this:
BEGIN {
    mod = 1000
    PROCINFO["sorted_in"] = "#ind_num_asc"   # GNU awk: iterate windows in numeric order
}
{
    k = int(($2 - 1) / mod)    # window index; the -1 keeps boundary values such as 1601000 inside the 1600001-1601000 window
    sum[k] += $6
    cnt[k]++
}
END {
    for (k in sum)
        printf("%d-%d\t%6.3f\n", k * mod + 1, (k + 1) * mod, sum[k] / cnt[k])
}
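Assuming the script is saved as windows.awk (a file name chosen here for illustration), it can be run like this; gawk specifically is required because of PROCINFO["sorted_in"]:
gawk -f windows.awk input.txt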

Counting number of rows depending on more than 1 column condition

I have a data file like this
H1 H2 H3 E1 E2 E3 C1 C2 C3
0 0 0 0 0 0 0 0 1
1 0 0 0 1 0 0 0 1
0 1 0 0 1 0 1 0 1
Now I want to count the rows where H1,H2,H3 have the same pattern as E1,E2,E3. For example, I want to count the number of times H1,H2,H3 and E1,E2,E3 are both 010 or both 000.
I tried to use this code, but it doesn't really work:
awk -F "" '!($1==0 && $2==1 && $3==0 && $4==0 && $5==1 && $6==0)' file | wc -l
Something like
$ awk '$1$2$3 == $4$5$6' input | wc -l
2
What does it do?
$1$2$3 == $4$5$6 checks whether the string formed by columns 1, 2 and 3 equals the string formed by columns 4, 5 and 6. When the comparison is true, awk takes its default action of printing the entire line, and wc takes care of counting those lines.
Or, if you want a complete awk solution, you can write (the +0 makes awk print 0 rather than an empty line when no rows match):
$ awk '$1$2$3 == $4$5$6 {count++} END {print count+0}' input
2
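One caveat about the bare concatenation: $1$2$3 == $4$5$6 can produce false positives once fields are longer than one character (for example, fields 1, 01, 0 and 10, 1, 0 both concatenate to "1010"). With single-digit 0/1 columns, as here, it is safe; for the general case, joining with a separator avoids the ambiguity. A sketch:
awk '($1 "," $2 "," $3) == ($4 "," $5 "," $6) {count++} END {print count+0}' input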

How to loop an awk command on every column of a table and output to a single output file?

I have a multi-column file composed of single units: 1s, 2s and 3s. There are a lot of repeats of a unit in each column, and sometimes it switches from one to another. I want to count how many times this switch happens in every column. For example, in column 1 the value switches from 1 to 2 to 3 to 1, so there are 3 switches and the output should be 3. The second column is 2s the entire way down, so there are 0 changes and the output is 0.
My input file has 4000 columns so it is impossible to do it by hand. The file is space separated.
For example:
Input:
1 2 3 1 2
1 2 2 1 3
1 2 3 1 2
2 2 2 1 2
2 2 2 1 2 ......
3 2 2 1 2
3 2 2 1 1
1 2 2 1 1
1 2 2 1 2
1 2 2 1 1
Desired output:
3 ## column 1 switch times
0 ## column 2 switch times
3 .....
0
5
I was using:
awk '{print $1}' <inputfile> | uniq | wc -l
awk '{print $2}' <inputfile> | uniq | wc -l
awk '{print $3}' <inputfile> | uniq | wc -l
....
This executes one column at a time. It gives me the output "4" for the first column, and afterwards I just calculate 4 - 1 = 3 to get my desired output. But is there a way to write this awk command as a loop that executes on each column and outputs to one file?
Thanks!
awk tells you how many fields there are in a given row via the variable NF, so you can create two arrays to keep track of the information you need. One array keeps the value of the previous row in the given column. The other counts the number of switches in the given column. You also keep track of the maximum number of columns seen (and set the counts for newly seen columns to zero, so that columns with 0 switches are still printed in the output at the end). Finally, make sure you don't count the transition from an empty string to a non-empty string, which happens when a column is encountered for the first time.
If, in fact, the file uniformly has the same number of columns, that only affects the first row of data. If subsequent rows have more columns than the first line, the script adds them. If a column stops appearing for a bit, I've assumed it should resume where it left off (as if the missing entries held the same value as before). You can decide on different algorithms; a gap could instead count as two transitions (from number to blank, then from blank to number), in which case you have to modify the counting code. Or, perhaps more sensibly, you could decide that irregular numbers of columns are simply not allowed, in which case you can bail out early if the number of columns in the current row is not the same as in the previous row (beware blank lines, or are they outlawed too?).
And you won't try writing the whole program on one line, because it would be incomprehensible and it really isn't necessary.
awk '{ if (NF > maxNF)
{
for (i = maxNF + 1; i <= NF; i++)
count[i] = 0;
maxNF = NF;
}
for (i = 1; i <= NF; i++)
{
if (col[i] != "" && $i != col[i])
count[i]++;
col[i] = $i;
}
}
END {
for (i = 1; i <= maxNF; i++)
print count[i];
}' data-file-with-4000-columns
Given your sample data (with the dots removed), the output from the script is as requested:
3
0
3
0
5
This alternative data file with jagged rows:
1 2 3 1 2
1 2 2 1 3
1 2 3 1 2
2 2 2 1 2
2 2 2 1 2 1 1 1
3 2 2 1 2 2 1
3 2 2 1 1
1 2 2 1 1 2 2 1
1 2 2 1
1 2 2 1 1 3
produces the output:
3
0
3
0
3
2
1
0
That is correct according to the rules I formulated, but if you decide you want different rules to cover the data, you can end up with different answers.
If you used printf("%d\n", count[i]); in the final loop, you'd not need to set the count values to zero in a loop. You pays your money and takes your pick.
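That variant would look roughly like this; printf's %d coerces an unset count[i] (an empty string) to 0:
END {
    for (i = 1; i <= maxNF; i++)
        printf("%d\n", count[i]);
}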
Use a loop, keeping one array for each column's current value and another array for the corresponding count:
awk '{for(i=0;i<5;i++) if(c[i]!=$(i+1)) {c[i]=$(i+1); t[i]++}} END{for(i=0;i<5;i++)print t[i]-1}' filename
Note that this assumes the column values are never zero. If you do have zero values, just initialise the array c to some unique sentinel value that will not be present in the file.
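A sketch of that initialisation, where "impossible" stands for any placeholder string you know cannot appear in the file:
awk 'BEGIN {for (i = 0; i < 5; i++) c[i] = "impossible"}
     {for (i = 0; i < 5; i++) if (c[i] != $(i+1)) {c[i] = $(i+1); t[i]++}}
     END {for (i = 0; i < 5; i++) print t[i] - 1}' filename
The first row always registers as one change against the sentinel, hence the t[i] - 1, exactly as in the original one-liner.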
Coded out for ease of viewing; SaveColX and CountColX should really be arrays. I'd also print the column number itself in the results, at least for checking :-)
BEGIN {
    SaveCol1 = " "
    CountCol1 = 0
    CountCol2 = 0
    CountCol3 = 0
    CountCol4 = 0
    CountCol5 = 0
}
{
    if (SaveCol1 == " ") {
        SaveCol1 = $1
        SaveCol2 = $2
        SaveCol3 = $3
        SaveCol4 = $4
        SaveCol5 = $5
        next
    }
    if ($1 != SaveCol1) {
        CountCol1++
        SaveCol1 = $1
    }
    if ($2 != SaveCol2) {
        CountCol2++
        SaveCol2 = $2
    }
    if ($3 != SaveCol3) {
        CountCol3++
        SaveCol3 = $3
    }
    if ($4 != SaveCol4) {
        CountCol4++
        SaveCol4 = $4
    }
    if ($5 != SaveCol5) {
        CountCol5++
        SaveCol5 = $5
    }
}
END {
    print CountCol1
    print CountCol2
    print CountCol3
    print CountCol4
    print CountCol5
}

How to select rows in which columns two and three are not equal to each other and to 0 or 1? (with awk)

I have a file like this:
AX-75448119 0 1
AX-75448118 0.45 0.487179
AX-75474642 0 0
AX-75474643 0.25 0.820513
AX-75448113 1 0
AX-75474641 1 1
and I want to select the rows where columns 2 and 3 are not both equal to each other and to 0 or 1. (That is, if columns 2 and 3 are the same but equal to 0.5, or any other number except 0 and 1, I still want that row.)
so the output would be:
AX-75448119 0 1
AX-75448118 0.45 0.487179
AX-75474643 0.25 0.820513
AX-75448113 1 0
I know how to write the command that selects the rows where columns 2 and 3 are equal to each other and equal to 0 or 1, which is this:
awk '$2=$3==1 || $2=$3==0' test.txt | wc -l
but I want exactly the opposite: to select every row that is not in the output of the above command!
Thanks, I hope I was able to explain what I want.
This might work for you, if I got your requirements right:
awk ' $2 != $3 { print; next } $2 == $3 && $2 != 0 && $2 != 1 { print }' INPUTFILE
This might also work for you:
awk '($2==0 || $2==1) && ($3==0 || $3==1) && $2==$3{next}1' file
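For what it's worth, the same filter can be written positively instead of as a skip rule; this one-liner should be equivalent:
awk '$2 != $3 || ($2 != 0 && $2 != 1)' file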
