I am trying to use the groupby function with a condition and I am not sure how to get this to work.
Here is what my data looks like:
generated_id timestamp direction date hour
0 1 1590394859141 forward 2020-05-25 04:20:59.141000-04:00 4
1 2 1599758616945 forward 2020-09-10 13:23:36.945000-04:00 13
2 3 1599759625509 backward 2020-09-10 13:40:25.509000-04:00 13
I need to get the count of "forward" direction values for each hour. Based on the data above, I should get one "forward" value for hour 4 and one "forward" value for hour 13.
I am trying to use this:
daily_sum = daily_df.groupby("hour")['direction'].count().reset_index()
Direction can also be backward, so I only need to focus on forward.
How do I do this?
daily_sum = daily_df[daily_df['direction'] == 'forward']\
.groupby("hour")['direction'].count().reset_index()
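As a rough, self-contained sketch of the filter-then-group approach above, the three sample rows from the question can be rebuilt by hand (the `forward_count` column name is only an illustrative choice, not something from the question). The second variant is an alternative that is sometimes handy: it counts a boolean flag per hour instead of filtering first, so an hour that only has "backward" rows would still show up with a count of 0.

```python
import pandas as pd

# Sample rows rebuilt from the question (only the columns needed here).
daily_df = pd.DataFrame({
    "generated_id": [1, 2, 3],
    "direction": ["forward", "forward", "backward"],
    "hour": [4, 13, 13],
})

# Variant 1: keep only "forward" rows, then count per hour.
filtered = (
    daily_df[daily_df["direction"] == "forward"]
    .groupby("hour")["direction"]
    .count()
    .reset_index(name="forward_count")  # hypothetical column name
)

# Variant 2: flag "forward" rows as True/False and sum the flags per hour.
# Hours without any "forward" row are kept, with a count of 0.
flagged = (
    daily_df["direction"].eq("forward")
    .groupby(daily_df["hour"])
    .sum()
    .reset_index(name="forward_count")
)

print(filtered)
print(flagged)
```

Both variants give one "forward" for hour 4 and one for hour 13 on this sample.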
Is there any way to not show an axis label if its value is zero?
Suppose the table is like the one below:
Vehicles Sold per Brand  jun-21  jul-21  ago-21  sept-21
Opel                     2       4       3       5
Renoult                  6       3       8       1
Ferrari                  0       0       0       0
Mercedes                 1       1       6       4
Seat                     2       0       4       2
Others                   12      11      15      16
If I do not want Ferrari to appear on the axis of the graph, what should I do?
I know that I can hide that column if it should not be shown in the graph. I cannot use that approach because the data is highly dynamic and I don't want to go and hide it every time.
Could somebody help?
Many thanks in advance
So, quick and dirty:
But I would instead produce a table of numbers in which any row that should not be included is removed, and then build the chart from the remaining 5 rows so there is no gap. I will let you work on that.
So, I did that as well, but I will let you figure out how to control the legend:
The trick is to use LARGE(), but you may need to wrap it in IF() to handle zeros better...
I have a very large dataset with over 400,000 rows and growing. I understand that you are not supposed to use iterrows to modify a pandas data frame. However, I'm a little lost on what I should do in this case, since I'm not sure I could use .loc or some rolling filter to modify a data frame in the way I need to. I'm trying to figure out if I can take a data frame and average each range of consecutive rows in which a condition is met. For example:
Condition  Temp.  Pressure
1          8      20
1          7      23
1          8      22
1          9      21
0          4      33
0          3      35
1          9      21
1          11     20
1          10     22
For the ranges where the condition is == 1, the output dataframe would look like this:
Condition  Avg. Temp.  Avg. Pressure
1          8           21.5
1          10          21
Has anyone attempted something similar that can put me on the right path? I was thinking of using something like this:
df = pd.read_csv(csv_file)
for index, row in df.iterrows():
    if row['Condition'] == 1:
        # start index = first value that equals 1
        pass
    else:
        # end index & calculate rolling average of range
        window = end - start
        new_df = df.rolling(window).mean()
I know that my code isn't great, I also know I could brute force it doing something similar as I have shown above, but as I said it has a lot of rows and continues to grow so I need to be efficient.
TRY:
result = df.groupby((df.Condition != df.Condition.shift()).cumsum()).apply(
    lambda x: x.rolling(len(x)).mean().dropna()).reset_index(drop=True)
print(result.loc[result.Condition.eq(1)]) # filter by required condition
OUTPUT:
Condition Temp. Pressure
0 1.0 8.0 21.5
2 1.0 10.0 21.0
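A possibly simpler variant of the same idea, sketched here on the sample data from the question: since a rolling mean over the full length of a group is just that group's mean, a plain groupby mean on the run key should give the same numbers. The `run` name is only an illustrative label for the grouping key, and the column names are assumed to match the table in the question.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Condition": [1, 1, 1, 1, 0, 0, 1, 1, 1],
    "Temp.": [8, 7, 8, 9, 4, 3, 9, 11, 10],
    "Pressure": [20, 23, 22, 21, 33, 35, 21, 20, 22],
})

# A new run starts whenever Condition differs from the previous row;
# the cumulative sum of those change points labels each consecutive run.
run = (df["Condition"] != df["Condition"].shift()).cumsum().rename("run")

# Average every run, then keep only the runs where Condition is 1.
means = df.groupby(run).mean()
result = means[means["Condition"] == 1].reset_index(drop=True)

print(result)
```

This stays vectorized, so it should scale to a few hundred thousand rows without iterating over them one by one.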
I have been struggling with a problem with my data frame, built in pandas, which currently looks like this:
MyDataFrame:
Index Status Value
0 A 10
1 A 8
2 A 5
3 B 9
4 B 5
5 A 1
6 B 2
7 A 3
8 A 5
9 A 1
The desired output would be:
Index Status Value
0 A 10
1 B 9
2 A 1
3 B 2
4 A 5
So far I have tried to use range and while conditions to filter; however, if I use a conditional like:
for i in range(len(Status)):
    if Status[i] == "A":
        print(Value[i])
    if Status[i] == "B":
        break
(The code above is more an example of what I have been trying in order to reach my goal. I tried to use .iloc and range with while, but maybe in the wrong way, I don't know.)
The desired output isn't printed.
One thing that complicates this filtering process is that MyDataFrame changes every time that I run the script since it uses another base of data to create this DataFrame.
I believe that I'm missing something simple, but it has been almost a week and I can't figure it out.
Thanks in advance for all your answers and support.
Let us try using shift with cumsum to create the groupby key; then it is just groupby + agg:
out = df.groupby(df.Status.ne(df.Status.shift()).cumsum()).agg({'Status':'first','Value':'max'})
Out[14]:
Status Value
Status
1 A 10
2 B 9
3 A 1
4 B 2
5 A 5
Very close to @BEN_YO's answer:
grp = (df['Status'] != df['Status'].shift()).cumsum()
df.loc[df.groupby(grp)['Value'].idxmax()]
Output:
Status Value
Index
0 A 10
3 B 9
5 A 1
6 B 2
8 A 5
Create the groups using shift and inequality with cumsum, then groupby and find the index of the max value of 'Value' with idxmax, and filter the dataframe using loc.
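For anyone who wants to run both answers end to end, here is a minimal sketch on the sample data from the question. The `run` label and the intermediate variable names are only illustrative choices; the rest follows the shift/cumsum idea described above.

```python
import pandas as pd

# MyDataFrame rebuilt from the question.
df = pd.DataFrame({
    "Status": list("AAABBABAAA"),
    "Value": [10, 8, 5, 9, 5, 1, 2, 3, 5, 1],
})

# True wherever Status differs from the previous row (the first row counts
# as a change); the cumulative sum numbers each consecutive run: 1,1,1,2,2,3,4,5,5,5.
changed = df["Status"] != df["Status"].shift()
run = changed.cumsum().rename("run")

# Approach 1: aggregate each run (first Status, max Value).
by_agg = (
    df.groupby(run)
    .agg(Status=("Status", "first"), Value=("Value", "max"))
    .reset_index(drop=True)
)

# Approach 2: keep the original row holding the max Value of each run.
by_idxmax = df.loc[df.groupby(run)["Value"].idxmax()]

print(by_agg)
print(by_idxmax)
```

The first approach renumbers the rows, while the second keeps the original index labels, which can be useful if you need to trace the rows back to the source data.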
So, I have this problem: I would like to find the average of a column by using the OR function to check criteria from adjacent columns. I tried putting OR into the AVERAGEIF function, which failed, and I also tried "AVERAGE(IF(OR(", which again did not return the correct result. I thought it was a simple thing that could be done easily, but I don't know why it doesn't work. My table is something like this:
ID: Rate Check 1 Check 2 Check 3
1 5 1 1 1
2 3 1 1
3 2 1
4 4
5 5 1 1
6 3
7 4 1
I would like to find the average of the Rate column by checking whether there is any value in the Check 1, Check 2 or Check 3 columns, so in the above case I would get the average of all rows except those with ID 4 and 6. Is this possible without using a helper column?
You can use SUMPRODUCT()
=SUMPRODUCT(((C2:C8<>"")+(D2:D8<>"")+(E2:E8<>"")>0)*(B2:B8<>"")*B2:B8)/SUMPRODUCT(--((C2:C8<>"")+(D2:D8<>"")+(E2:E8<>"")>0)*(B2:B8<>""))
If your first ID starts in A2, use this formula (edited to handle empty values in the "Rate" column):
=AVERAGE(IF(MMULT(LEN(C2:E8)*LEN(B2:B8),ROW(INDIRECT("1:"&COLUMNS($C$1:$E$1)))),B2:B8))
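As a quick way to sanity-check what either formula should return, here is a small sketch of the same OR logic outside Excel, using the sample table from the question. Which Check column each 1 sits in is an assumption on my part (the posted table does not make that clear), but it does not affect the result because only "any check filled" matters.

```python
from statistics import mean

# (ID, Rate, Check 1, Check 2, Check 3); None stands for an empty cell.
# The exact column placement of the 1s is assumed, not taken from the question.
rows = [
    (1, 5, 1, 1, 1),
    (2, 3, 1, 1, None),
    (3, 2, 1, None, None),
    (4, 4, None, None, None),
    (5, 5, 1, 1, None),
    (6, 3, None, None, None),
    (7, 4, 1, None, None),
]

# Keep a rate when it is present and at least one check cell is filled.
included = [
    rate
    for _id, rate, *checks in rows
    if rate is not None and any(c is not None for c in checks)
]

average_rate = mean(included)
print(included, average_rate)
```

With IDs 4 and 6 excluded, the remaining rates are 5, 3, 2, 5 and 4, so the expected average is 19 / 5 = 3.8.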
My data looks something like this:
Name Event Result
Bob 1 0
Mary 1 1
Sue 2 0
Tom 1 0
Dick 2 1
Harry 1 1
Mary 2 0
Sue 2 1
Dick 1 1
etc...
Names repeat, Event is the Event type, and Result is whether the event was successful or not (0, 1). What I want to end up with is a cluster bar chart with four bars to each name:
Event 1 # of Success
Event 1 # of Fail
Event 2 # of Success
Event 2 # of Fail
I figure I'll probably want to put this in a clustered stacked bar in the future, but if I can get the simple cluster going I can figure it out. A link to a good tutorial on event based charts would be appreciated. I'll keep searching and post back what I find. Thanks in advance!
Not sure if this will fit your needs completely, but it might be the quickest way to visualize the data:
Select your posted data, go to the Insert tab, select PivotChart (it hides behind PivotTable) and insert it as a new sheet.
Then put the Event and Result columns in the row field, put the Event column again in the value field, and set it to use Count instead of Sum. You then get the result below.