How to return the number of values greater than X with multiple criteria - excel-formula

I am looking for a formula that returns the count of values greater than 20 after applying two criteria.
I have a table with 3 fields:
Field A: 18, 18, 19, 19, 21, 21, 44, 55, 55, 56, 61, 61, 75, 76, 86
Field B: 1, 4, 1, 5, 1, 6, 3, 1, 2, 1, 1, 3, 1, 1, 1
Field C: 5, 2, 14, 7, 38, 1, 100, 76, 32, 65, 83, 20, 17, 41, 88
I have two criteria:
Criteria1: 18, 55, 61, 75, 86 (this is an array)
Criteria2: 1
Steps:
Step 1 - Apply Criteria_1 to Field_A
Step 2 - Apply Criteria_2 to Field_B
Step 3 - Return the number of values in Field_C greater than 20
Regards,
Elio Fernandes

=SUM(ISNUMBER(MATCH(A1:A15, {18,55,61,75,86}, 0)) * (B1:B15 = 1) * (C1:C15 > 20))
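Enter it as an array formula with Ctrl+Shift+Enter.
This uses the property that, when multiplied, TRUE counts as 1 and FALSE counts as 0. The product is 1 only for rows that meet all three conditions, so SUM returns the number of such rows.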
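The counting logic of the formula can be traced outside Excel. The following is a minimal Python sketch, not part of the original answer; the variable names are chosen here for illustration, and the lists are the sample data from the question.

```python
# Sample data from the question (Field A, Field B, Field C).
field_a = [18, 18, 19, 19, 21, 21, 44, 55, 55, 56, 61, 61, 75, 76, 86]
field_b = [1, 4, 1, 5, 1, 6, 3, 1, 2, 1, 1, 3, 1, 1, 1]
field_c = [5, 2, 14, 7, 38, 1, 100, 76, 32, 65, 83, 20, 17, 41, 88]

criteria_1 = {18, 55, 61, 75, 86}  # allowed values for Field A
criteria_2 = 1                     # required value for Field B
threshold = 20                     # Field C must be greater than this

# Each comparison yields a bool; multiplying bools gives 1 only when
# all three are True, which mirrors the TRUE/FALSE product in the formula.
count = sum(
    (a in criteria_1) * (b == criteria_2) * (c > threshold)
    for a, b, c in zip(field_a, field_b, field_c)
)
print(count)  # 3
```

For the sample data, the three matching rows are those with Field A equal to 55, 61 and 86.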

Related

transform integer value patterns in a column to a group

DataFrame
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3]})
df
Expected output
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3],'group':[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 4, 100, 5, 5, 5, 5]})
df
I need to transform the dataframe into the expected output. I am after a pattern that treats each 1 as the start of a new group, where a group consists of a single 1 followed by n zeroes. If the group criterion is not met, the row should be assigned to group 100.
I tried something along the lines of:
bs=df[df.occurance.eq(1).any(1)&df.occurance.shift(-1).eq(0).any(1)].squeeze()
bs
Even when broken down, this could only boolean-select the start of each group and nothing more.
Any help?
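Create a mask that flags each 1 that is immediately followed by another 1. Exclude the flagged rows, take the cumulative sum of occurance over the remaining rows with Series.cumsum, and finally use Series.reindex with fill_value=100 to restore the excluded rows with the value 100: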
m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)
df['group'] = df.loc[~m, 'occurance'].cumsum().reindex(df.index, fill_value=100)
print (df)
    occurance  value  group
0           1     45      1
1           0      3      1
2           0      2      1
3           0     12      1
4           1     14      2
5           0     32      2
6           0      1      2
7           0      1      2
8           0      6      2
9           0      4      2
10          1      9      3
11          0     32      3
12          1     78    100
13          1     96      4
14          0     12      4
15          0      6      4
16          0      3      4
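To make the mask and cumulative-sum steps easier to follow, here is a minimal sketch of the same logic using plain Python lists instead of pandas. It is an illustration only, not the answer's code; the names following, mask and running are chosen here.

```python
occurance = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# Equivalent of shift(-1): each row sees the next row's value;
# the last row has no successor.
following = occurance[1:] + [None]

# Mask: a 1 immediately followed by another 1.
mask = [cur == 1 and nxt == 1 for cur, nxt in zip(occurance, following)]

# Running count of group starts over unmasked rows only;
# masked rows receive the fill value 100.
group = []
running = 0
for value, masked in zip(occurance, mask):
    if masked:
        group.append(100)
    else:
        running += value
        group.append(running)

print(group)
# [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 100, 4, 4, 4, 4]
```

The result matches the group column printed by the pandas version above.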

create new dataframe based upon max value in one column and corresponding value in a second column

I have a dataframe created by extracting data from a source (network wireless controller).
Dataframe is created off of a dictionary I build. This is basically what I am doing (a sample to show structure - not the actual dataframe):
df = pd.DataFrame({'AP-1': [30, 32, 34, 31, 33, 35, 36, 38, 37],
'AP-2': [30, 32, 34, 80, 33, 35, 36, 38, 37],
'AP-3': [30, 32, 81, 31, 33, 101, 36, 38, 37],
'AP-4': [30, 32, 34, 95, 33, 35, 103, 38, 121],
'AP-5': [30, 32, 34, 31, 33, 144, 36, 38, 37],
'AP-6': [30, 32, 34, 31, 33, 35, 36, 110, 37],
'AP-7': [30, 87, 34, 31, 111, 35, 36, 38, 122],
'AP-8': [30, 32, 99, 31, 33, 35, 36, 38, 37],
'AP-9': [30, 32, 34, 31, 33, 99, 88, 38, 37]}, index=['1', '2', '3', '4', '5', '6', '7', '8', '9'])
df1 = df.transpose()
This works fine.
Note about the data. Columns 1,2,3 are 'related'. They go together. Same for columns 4,5,6 and 7,8,9. I will explain more shortly.
Columns 1, 4, 7 are client counts. Columns 2, 5, 8 are channel utilization on the 5 GHz spectrum. Columns 3, 6, 9 are channel utilization on the 2.4 GHz spectrum.
I take a reading every 5 minutes, so the sample above represents three consecutive readings.
What I want is two new dataframes, two columns each, constructed as follows:
Examine the 5 GHz columns (here 2, 5, 8). Whichever has the highest value becomes column 1 in the new dataframe. Column 2 is the client count from the same reading. In other words, if column 2 is the highest of columns 2, 5, 8, then the second column of the new dataframe takes the value from column 1. If column 8 is the highest, it takes the value from column 7. I want the index of the new dataframes to be the same as the original: the AP name.
I want to do this for all rows in the 'main' dataframe. I want two new dataframes, so I will repeat this exact procedure for the 5 GHz columns and for the 2.4 GHz columns (3, 6, 9), in each case also grabbing the corresponding client count value for the second column of the new dataframe.
What I have tried:
First I broke the main dataframe into three: df_cc has all the client count columns, df_5Ghz has the 5 GHz columns, and df_24Ghz has the 2.4 GHz columns, using this:
# df1 is the transposed frame from the sample above (AP names as index, readings 1-9 as columns)
# create client count only dataframe
df_cc = df1[df1.columns[::3]]
print(df_cc)
print()
# create 5Ghz channel utilization only dataframe
df_5Ghz = df1[df1.columns[1::3]]
print(df_5Ghz)
print()
# create 2.4Ghz channel utilization only dataframe
df_24Ghz = df1[df1.columns[2::3]]
print(df_24Ghz)
print()
This works.
I thought I could then reference the main dataframe, but I don't know how.
Then I found this:
extract column value based on another column pandas dataframe
The query option looked great, but I don't know the value in advance. I first need to discover the max value of the 2.4 and 5 GHz columns respectively, then grab the corresponding client count value. That is why I first created dataframes containing only the 2.4 and 5 GHz values, thinking I could get the max value of each row and then do a lookup on the main dataframe (or use the client-count-only dataframe I created), but I do not know how to realize this idea.
Any assistance would be greatly appreciated.
You can get what you want in 3 steps:
# connection between columns
mapping = {'2': '1', '5': '4', '8': '7'}
# 1. column with highest value among 5GHz values (pandas series)
df2 = df1.loc[:, ['2', '5', '8']].idxmax(axis=1)
df2.name = 'highest value'
# 2. column with client count corresponding to the highest value (pandas series)
df3 = df2.apply(lambda x: mapping[x])
df3.name = 'client count'
# 3. build result using 2 lists of columns (pandas dataframe)
# s is each lookup series in turn (named s to avoid shadowing the original df)
df4 = pd.DataFrame(
    {s.name: [
        df1.loc[idx, col]
        for idx, col in zip(s.index, s.values)]
     for s in [df2, df3]},
    index=df1.index)
print(df4)
Output:
      highest value  client count
AP-1             38            36
AP-2             38            36
AP-3             38            36
AP-4             38           103
AP-5             38            36
AP-6            110            36
AP-7            111            31
AP-8             38            36
AP-9             38            88
I suspect, though I am not sure, that it would be easier (and faster to compute) to solve this without pandas, using just built-in Python data types: dictionaries and lists.
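As a rough illustration of that idea, here is a minimal sketch using only dictionaries and lists. It assumes, as in the question, that each AP's readings come in triples of (client count, 5 GHz utilization, 2.4 GHz utilization); the helper name busiest and the dictionary names are hypothetical.

```python
# Sample readings from the question: three triples per AP.
readings = {
    'AP-1': [30, 32, 34, 31, 33, 35, 36, 38, 37],
    'AP-2': [30, 32, 34, 80, 33, 35, 36, 38, 37],
    'AP-3': [30, 32, 81, 31, 33, 101, 36, 38, 37],
    'AP-4': [30, 32, 34, 95, 33, 35, 103, 38, 121],
    'AP-5': [30, 32, 34, 31, 33, 144, 36, 38, 37],
    'AP-6': [30, 32, 34, 31, 33, 35, 36, 110, 37],
    'AP-7': [30, 87, 34, 31, 111, 35, 36, 38, 122],
    'AP-8': [30, 32, 99, 31, 33, 35, 36, 38, 37],
    'AP-9': [30, 32, 34, 31, 33, 99, 88, 38, 37],
}


def busiest(values, offset):
    """Return (highest utilization, client count from the same reading).

    offset is 1 for the 5 GHz value and 2 for the 2.4 GHz value
    within each (client count, 5 GHz, 2.4 GHz) triple.
    """
    starts = range(0, len(values), 3)  # index of each triple's client count
    best = max(starts, key=lambda s: values[s + offset])
    return values[best + offset], values[best]


top_5ghz = {ap: busiest(vals, 1) for ap, vals in readings.items()}
top_24ghz = {ap: busiest(vals, 2) for ap, vals in readings.items()}

for ap, (util, clients) in top_5ghz.items():
    print(ap, util, clients)
```

When two readings tie for the highest utilization, max returns the first one, which is the same tie-breaking rule idxmax uses in the pandas version.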

Image.frombytes not writing squares

I have a numpy array:
[[12 13 12 5 6 5 14 4 6 11 11 10 8 11 8 11 7 8 0 0 0]
[ 5 14 4 6 11 11 10 8 11 8 11 8 11 8 11 7 8 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 8 11 8 11 8 11 7 8 0 0]
[ 5 14 4 6 11 11 10 7 8 0 0 0 0 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 11 10 8 11 8 11 8 11 8 11 8 11 8 11 7 8]
[ 5 14 4 6 11 10 8 11 10 8 11 10 8 11 10 7 8 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 7 8 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 11 10 1 11 1 11 7 8 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 10 10 1 11 1 11 1 11 7 8 0 0 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 8 11 7 8 0 0 0 0 0 0]
[ 5 14 4 6 11 10 8 11 10 8 11 10 8 11 10 8 11 7 7 0 0]]
And a colors dictionary:
{0: (0, 0, 0), 1: (17, 17, 17), 2: (34, 34, 34), 3: (51, 51, 51), 4: (68, 68, 68), 5: (85, 85, 85), 6: (102, 102, 102), 7: (119, 119, 119), 8: (136, 136, 136), 9: (153, 153, 153), 10: (170, 170, 170), 11: (187, 187, 187), 12: (204, 204, 204), 13: (221, 221, 221), 14: (238, 238, 238)}
I'm trying to pass the array through the dictionary and then write the resulting colors to a .png file as 10x10 blocks. So far I have:
rows = []
for row in arr:
    for j in range(10):
        for col in row:
            for i in range(10):
                rows.extend(colors[col])
rows = bytes(rows)
img = Image.frombytes('RGB', (110, 120), rows)
img.save("generated.png")
But this writes an image that has lines instead of the 10x10 blocks I was trying to write. It seems to me as though the blocks are shifted somehow, but I can't figure out how to un-shift them. Why is this behavior happening?
I believe you only need to change the size parameter to obtain the result you want. Replacing this line should correct the error:
# img = Image.frombytes('RGB', (110, 120), rows)
img = Image.frombytes('RGB', (210, 110), rows)
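Size should be a 2-tuple of the width and height of the image in pixels. The array has 21 columns and 11 rows, and each cell is expanded to 10x10 pixels, so the rows buffer you are creating describes an image that is (210, 110) pixels. You are reading it into an image that is (110, 120) pixels, which starts a new image row every 110 pixels instead of every 210 and produces the shifted lines.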
Here is a working example:
from PIL import Image
array = [
[12, 13, 12, 5, 6, 5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 7, 8, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8],
[5, 14, 4, 6, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 10, 7, 8, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 1, 11, 1, 11, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 1, 11, 1, 11, 1, 11, 7, 8, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 7, 7, 0, 0],
]
colors = {
0: (0, 0, 0),
1: (17, 17, 17),
2: (34, 34, 34),
3: (51, 51, 51),
4: (68, 68, 68),
5: (85, 85, 85),
6: (102, 102, 102),
7: (119, 119, 119),
8: (136, 136, 136),
9: (153, 153, 153),
10: (170, 170, 170),
11: (187, 187, 187),
12: (204, 204, 204),
13: (221, 221, 221),
14: (238, 238, 238)
}
rows = []
for row in array:
    for _ in range(10):
        for col in row:
            for _ in range(10):
                rows.extend(colors[col])
rows = bytes(rows)
img = Image.frombytes('RGB', (210, 110), rows)
img.save("generated.png")
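One way to avoid hard-coding the size is to derive it from the grid itself. The sketch below is a hedged illustration using only the standard library; the helper name expand and the small 2 x 3 grid are hypothetical and stand in for the real array and color dictionary.

```python
BLOCK = 10  # each cell becomes a BLOCK x BLOCK square of pixels


def expand(cells, colors, block=BLOCK):
    """Return ((width, height), data) for a grid of color indices."""
    data = bytearray()
    for row in cells:
        for _ in range(block):                 # repeat each grid row `block` times
            for cell in row:
                data.extend(colors[cell] * block)  # `block` RGB pixels per cell
    width = len(cells[0]) * block              # columns -> pixel width
    height = len(cells) * block                # rows -> pixel height
    return (width, height), bytes(data)


# Hypothetical 2-row, 3-column grid for illustration.
cells = [[0, 1, 2], [2, 1, 0]]
colors = {0: (0, 0, 0), 1: (17, 17, 17), 2: (34, 34, 34)}

size, data = expand(cells, colors)
# An RGB buffer must hold exactly width * height * 3 bytes.
assert len(data) == size[0] * size[1] * 3
print(size)  # (30, 20)
```

The returned size and data can then be passed to Image.frombytes('RGB', size, data), so the width always comes from the number of columns and the height from the number of rows.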

Reverse a rolling total based on historic data

Say I have a list of rolling x-day page view totals. That is, each data point is the sum of the previous x days of page views, but I do not have each individual day's page view total. Would it be possible to get the individual values?
For example, say someone gathers the following page view metrics:
{4 days before Day 1: {1,2,3,8}, Day 1: 4, Day 2: 2, Day 3: 5, Day 4: 2, Day 5: 9, Day 6: 8, Day 7: 10, Day 8: 10, Day 9: 7, Day 10: 6}
They provide me with the following list of 5-day running totals:
{Day 1: 18 (1+2+3+8+4), Day 2: 19 (2+3+8+4+2), Day 3: 22 (3+8+4+2+5), Day 4: 21 (etc.), Day 5: 22, Day 6: 26, Day 7: 34, Day 8: 39, Day 9: 44, Day 10: 41}
Would it be possible for me to take only the second dataset and determine at least some of the values in the first dataset?
In your example, the history
{1, 2, 3, 8, 4, 2, 5, 2, 9, 8, 10, 10, 7, 6}
gives the following 5-day running totals:
{18, 19, 22, 21, 22, 26, 34, 39, 44, 41}
But so would the history:
{3, 8, 1, 3, 3, 4, 11, 0, 4, 7, 12, 16, 5, 1}
So no, in general you can't reconstruct any of the values.
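...Unless you have five days in a row with no views, giving you a zero in the list of running totals. Page views cannot be negative, so a zero total means all five of those days are zero. Each consecutive pair of totals differs by exactly one day entering the window and one day leaving it, so from that known window you can reconstruct the entire history before and after.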
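Both claims can be checked with a short Python sketch. It is illustrative only: the function names are chosen here, the reconstruction assumes page views are non-negative integers, and the example history containing five zero days is made up for the demonstration.

```python
def running_totals(history, window=5):
    """Sum of each run of `window` consecutive days."""
    return [sum(history[i:i + window])
            for i in range(len(history) - window + 1)]


# The two histories from the answer produce identical running totals.
history_a = [1, 2, 3, 8, 4, 2, 5, 2, 9, 8, 10, 10, 7, 6]
history_b = [3, 8, 1, 3, 3, 4, 11, 0, 4, 7, 12, 16, 5, 1]
assert running_totals(history_a) == running_totals(history_b)


def reconstruct(totals, window=5):
    """Recover the daily values when some running total is zero.

    Assumes daily counts are non-negative, so a zero total means
    every day in that window is zero.
    """
    k = totals.index(0)  # raises ValueError if no total is zero
    history = [None] * (len(totals) + window - 1)
    for i in range(k, k + window):
        history[i] = 0
    # totals[i + 1] - totals[i] == history[i + window] - history[i]
    for i in range(k, len(totals) - 1):      # walk forward from the zero window
        history[i + window] = history[i] + totals[i + 1] - totals[i]
    for i in range(k - 1, -1, -1):           # walk backward from the zero window
        history[i] = history[i + window] - (totals[i + 1] - totals[i])
    return history


# Made-up history with five consecutive zero days.
example = [2, 0, 3, 1, 0, 0, 0, 0, 0, 4, 1, 2]
print(reconstruct(running_totals(example)) == example)  # True
```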

combine several lines in a CSV file into a single line based on a certain condition

I am trying to read a CSV file in this format
COL1, COL2
5, 25
5, 67
5, 89
3, 55
3, 8
3, 109
3, 12
3, 45
3, 663
80, 34
80, 5
and combine COL2 for all entries having the same COL1 in a single line such that the first column indicates the number of columns that follow. So for the sample given above, the output file should look like this:
3, 25, 67, 89
6, 55, 8, 109, 12, 45, 663
2, 34, 5
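A solution using awk (note that for (k in a) visits the keys in an unspecified order, so the output lines are not guaranteed to follow the input order):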
$ awk 'NR>1{a[$1]=a[$1]", "$2;c[$1]++}END{for (k in a) print c[k] a[k]}' file
3, 25, 67, 89
6, 55, 8, 109, 12, 45, 663
2, 34, 5
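If the output must follow the input order, one alternative is to group consecutive rows explicitly. The following Python sketch is offered as an illustration rather than a replacement for the awk answer; the function name combine is hypothetical, and it assumes rows with the same COL1 value are adjacent, as in the sample.

```python
import csv
from itertools import groupby


def combine(lines):
    """Yield one output line per run of consecutive rows sharing COL1."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    for _, rows in groupby(reader, key=lambda row: row[0]):
        values = [row[1].strip() for row in rows]
        # First field is the number of values that follow.
        yield ", ".join([str(len(values))] + values)


sample = """COL1, COL2
5, 25
5, 67
5, 89
3, 55
3, 8
3, 109
3, 12
3, 45
3, 663
80, 34
80, 5
""".splitlines()

for line in combine(sample):
    print(line)
```

To process a real file, an open file object can be passed to combine in place of the sample list.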
