Groupby in Python without aggregation

Groupby on python without aggegation - python-3.x

I am having a problem selecting the observations for each unique index.
I would like to extract the data from each group into a table like the following:
[Image: example table]
The table contains 12 unique months, repeated 10 times for 10 years. I would like to group by month and see the distribution of each of the 12 months, so the final table would have 12 columns (one per month) and 10 rows (one per year).
I am thinking of starting with the groupby function and using a for loop to print out the different groups:
data1 = data.groupby(by='month')
# Each iteration yields the month value and the sub-DataFrame for that month
for name, group in data1:
    print(name)
    print(group)
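Since the goal is a 10 × 12 table (one row per year, one column per month) rather than an aggregate, a reshape with `DataFrame.pivot` may fit better than looping over the groups. This is a minimal sketch, assuming the frame has columns named `year`, `month` and `value` (hypothetical names; the real column names are not shown in the question) and exactly one row per year/month pair:

```python
import pandas as pd

# Hypothetical sample: 10 years x 12 months, one value per (year, month).
# The column names 'year', 'month' and 'value' are assumptions.
data = pd.DataFrame(
    [(y, m, y * 100 + m) for y in range(2010, 2020) for m in range(1, 13)],
    columns=["year", "month", "value"],
)

# Reshape without aggregating: years become rows, months become columns.
# pivot() raises a ValueError if any (year, month) pair occurs more than once.
wide = data.pivot(index="year", columns="month", values="value")

print(wide.shape)  # (10, 12)
```

If some year/month pairs are duplicated, `pivot_table` with an explicit `aggfunc` is the usual alternative, but that reintroduces aggregation.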

Related

Count specific column values by year in another date column in Excel

I have an excel table with two columns and several hundred rows. One column lists a single-digit code that represents an event, the other the date of that event. I'm attempting to count the number of times a particular event (say "1") occurred in a particular year (say 2017).
I normally use "COUNTIFS" for two or more criteria, but I'm stumped by the date format. I can't seem to get it to work in the formula. In this case, I'm using the yyyy-mm-dd format. How can I perform this operation?
A small example of the table is as follows:
    A      B
1   Event  Date
2   1      2016-09-12
3   1      2019-10-11
4   3      2017-03-24
5   2      2016-05-25
6   3      2017-08-02
7   1      2018-10-11

Just use:
=COUNTIFS(A:A,1,B:B,">="&DATE(2017,1,1),B:B,"<="&DATE(2017,12,31))
Edit: if your dates are stored as text that looks like dates, you can use SUMPRODUCT and YEAR instead:
=SUMPRODUCT((A2:A7=1)*(YEAR(B2:B7)=2017))
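For comparison, the same two-criteria count can be sketched in pandas (a swapped-in tool, not Excel) on the sample table from the question. The inclusive date range used by COUNTIFS and the direct year test used by the SUMPRODUCT/YEAR version select the same rows; the function names below are hypothetical:

```python
import pandas as pd

# Sample rows from the question's table (columns A and B).
df = pd.DataFrame({
    "Event": [1, 1, 3, 2, 3, 1],
    "Date": pd.to_datetime(["2016-09-12", "2019-10-11", "2017-03-24",
                            "2016-05-25", "2017-08-02", "2018-10-11"]),
})

def count_events(frame, event, year):
    """COUNTIFS-style: event criterion plus an inclusive Jan 1 - Dec 31 range."""
    start = pd.Timestamp(year, 1, 1)
    end = pd.Timestamp(year, 12, 31)
    in_range = frame["Date"].between(start, end)  # inclusive on both ends
    return int(((frame["Event"] == event) & in_range).sum())

def count_events_by_year(frame, event, year):
    """SUMPRODUCT/YEAR-style: compare the year component directly."""
    return int(((frame["Event"] == event) & (frame["Date"].dt.year == year)).sum())

print(count_events(df, 1, 2017))  # 0
print(count_events(df, 3, 2017))  # 2
```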

Excel sum based on lookup of code and values in another table

Given 2 named tables in Excel 2013 (or higher):
tblInvoice
ID InvRef Total
1 I/123 45
2 I/234 8
tblDeliveries
ID InvRef Amt
1 I/123 10
2 I/123 15
3 I/123 20
4 I/234 5
5 I/234 3
How can we get tblInvoice[Total] to compute automatically using an Excel formula? In pseudocode:
tblInvoice[Total] = SUM(tblDeliveries[Amt] WHERE tblDeliveries[InvRef] MATCHES InvRef)
I have tried this Excel formula in tblInvoice[Total], but it returns an incorrect value:
=SUMPRODUCT(SUMIF(tblDeliveries[InvRef],[InvRef],tblDeliveries[Amt]))
I also tried swapping the first and second arguments. This produces a different amount, but it is still incorrect:
=SUMPRODUCT(SUMIF([InvRef],tblDeliveries[InvRef],tblDeliveries[Amt]))
If relevant, it is assumed that there is a 1:N relationship from tblInvoice[InvRef]:tblDeliveries[InvRef] and that tblInvoice[InvRef] is UNIQUE.

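The syntax is incorrect for what you require. Use the @ (this row) specifier so that the criterion is the current row's InvRef:
=SUMPRODUCT(SUMIF(tblDeliveries[InvRef],[@InvRef],tblDeliveries[Amt]))
The @ is the crucial difference. Without it, [InvRef] refers to the entire InvRef column, so SUMIF returns one sum per invoice and SUMPRODUCT adds them together, giving the grand total of all deliveries (53 for the sample data) on every row.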
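The lookup-and-sum that the formula performs can be sketched in pandas (a swapped-in tool, not Excel) with the two sample tables: sum `Amt` per `InvRef` in the deliveries table, then look up each invoice's own `InvRef`. With the sample data this reproduces the expected `Total` column (45 and 8):

```python
import pandas as pd

tbl_invoice = pd.DataFrame({"ID": [1, 2], "InvRef": ["I/123", "I/234"]})
tbl_deliveries = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "InvRef": ["I/123", "I/123", "I/123", "I/234", "I/234"],
    "Amt": [10, 15, 20, 5, 3],
})

# Per-row SUMIF equivalent: total Amt per InvRef in the deliveries table...
amt_by_ref = tbl_deliveries.groupby("InvRef")["Amt"].sum()

# ...then look up each invoice's own InvRef (0 if it has no deliveries).
tbl_invoice["Total"] = tbl_invoice["InvRef"].map(amt_by_ref).fillna(0).astype(int)

print(tbl_invoice)
```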

Pandas, groupby/Grouper on month ignoring the year

I have the following data in a Pandas df:
index;Aircraft_Registration;issue;Leg_Number;Departure_Time;Departure_Date;Arrival_Time;Arrival_Date;Departure_Airport;Arrival_Airport
0;XXA;0;QQ464;01:07:00;2013-12-01;03:33:00;2013-12-01;JFK;AMS
1;XXA;0;QQQ445;06:08:00;2013-12-01;12:02:00;2013-12-01;AMS;CPT
2;XXA;0;QQQ446;13:04:00;2013-12-01;13:13:00;2013-12-01;JFK;SID
3;XXA;0;QQ446;14:17:00;2013-12-01;20:15:00;2013-12-01;SID;FRA
4;XXA;0;QQ453;02:02:00;2013-12-02;13:09:00;2013-12-02;JFK;BJL
5;XXA;0;QQ150;05:47:00;2018-12-03;12:37:00;2018-03-03;KAO;AMS
6;XXA;0;QQ457;15:09:00;2018-11-03;17:51:00;2018-03-03;AMS;AGP
7;XXA;0;QQ457;08:34:00;2018-12-03;22:47:00;2018-03-03;AGP;JFK
8;XXA;0;QQ458;03:34:00;2018-12-03;23:59:00;2018-03-03;ATL;BJL
9;XXA;0;QQ458;06:26:00;2018-10-04;07:01:00;2018-03-04;BJL;AMS
I want to group this data by month, ignoring the year, so that I end up with 12 new DataFrames, each containing the events for one month regardless of the year.
I tried the following:
sort = list(df.groupby(pd.Grouper(freq='M', key='Departure_Date')))
This returns a list with one DataFrame per month and year, in this case 60 groups, many of which are empty because there is no data for that month.
My expected result is a list of 12 DataFrames, one for each month (January, February, etc.).

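I think you need dt.month for month numbers 1-12, or dt.strftime('%B') for month names January-December (Departure_Date must be a datetime column, e.g. converted with pd.to_datetime):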
sort = list(df.groupby(df['Departure_Date'].dt.month))
Or:
sort = list(df.groupby(df['Departure_Date'].dt.strftime('%B')))
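A runnable sketch using two columns of the sample data above. The `.dt` accessor only works on datetime columns, so `Departure_Date` is parsed with `pd.to_datetime` first (an assumption: the question does not show the column's dtype). Because `groupby` only creates groups for months that actually occur, the dictionary comprehension over 1-12 is one way to get exactly 12 frames, empty ones included:

```python
import pandas as pd

# Subset of the question's data; only the columns needed here.
df = pd.DataFrame({
    "Leg_Number": ["QQ464", "QQQ445", "QQQ446", "QQ446", "QQ453",
                   "QQ150", "QQ457", "QQ457", "QQ458", "QQ458"],
    "Departure_Date": ["2013-12-01", "2013-12-01", "2013-12-01", "2013-12-01",
                       "2013-12-02", "2018-12-03", "2018-11-03", "2018-12-03",
                       "2018-12-03", "2018-10-04"],
})
# The .dt accessor needs a datetime dtype, so parse the strings first.
df["Departure_Date"] = pd.to_datetime(df["Departure_Date"])

# Group on the month number only, ignoring the year.
groups = {month: frame for month, frame in df.groupby(df["Departure_Date"].dt.month)}
print(len(groups))  # 3 (only October, November and December occur)

# Always 12 frames (January = 1 ... December = 12), empty where there is no data.
by_month = {m: df[df["Departure_Date"].dt.month == m] for m in range(1, 13)}
print({m: len(frame) for m, frame in by_month.items() if len(frame)})  # {10: 1, 11: 1, 12: 8}
```

If month names are used instead (`dt.strftime('%B')`), note that `groupby` sorts the keys by default, so the groups come out alphabetically (April, August, ...) rather than in calendar order.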

DAX: Getting all rows with a column value that appears X times?

I have a table where each row is a product sold in a store. A product has two relevant columns, EAN and Store. How do I find all products (that have the same EAN) that appear in all Stores? For example, if Store A has products with EANs 1, 2 and 3, Store B has 2, 3 and 4 and Store C has 2, 3 and 5 (9 rows in total), how can I get all rows where EAN is 2 or 3 (6 rows in total)?
Thanks in advance!

Add a calculated column to your model, then use this as a filter.
=calculate(counta(product[EAN]),filter(all(product),product[EAN] = EARLIER(product[EAN])))
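You can then compare this value with the total number of stores, which you can get with a DISTINCTCOUNT measure on product[Store].
Rows where the two values are equal belong to products that appear in every store (this assumes each EAN appears at most once per store).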
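The same logic (count the stores each EAN appears in, then compare with the total number of stores) can be sketched in pandas as a swapped-in illustration, using the example from the question. Counting distinct stores with `nunique` rather than counting rows also avoids over-counting when an EAN is listed twice in the same store:

```python
import pandas as pd

# Example from the question: 9 rows across stores A, B and C.
product = pd.DataFrame({
    "Store": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "EAN": [1, 2, 3, 2, 3, 4, 2, 3, 5],
})

# Per-row number of distinct stores carrying the same EAN
# (similar to the calculated column, but counting stores instead of rows).
stores_per_ean = product["EAN"].map(product.groupby("EAN")["Store"].nunique())

# Total number of stores (the DISTINCTCOUNT part).
total_stores = product["Store"].nunique()

in_all_stores = product[stores_per_ean == total_stores]
print(in_all_stores)  # 6 rows: EAN 2 and 3 in stores A, B and C
```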

excel 2013 pivot table count sums greater than and less than

I have a large spreadsheet in Excel 2013 with student records. Each row corresponds to one student registered in one course. The spreadsheet spans 5 years of student records. I am trying to create a pivot table that shows me the distinct count of students who have 6 or more courses as well as those with fewer than 6 courses.
One row has the following fields (and many more): Student Number, Academic Year, Course ID and Calculated Field (described below).
The pivot table should count unique student courses (i.e. John Doe in Course A). I have a calculated field in my main data that combines Academic Year (e.g. 2015), Student Number (e.g. 345987) and Course ID (e.g. 195100) into a value like AY2015SN345987CS195100. So, if student 345987 takes 7 different courses in 2015, I want that to count as 7. I then create my pivot table with Academic Year and Student Number as rows and Distinct Count of Calculated Field as values.
I have created a pivot table that calculates all distinct student courses into something like this:
Year # of Students
+2015 501
+2014 640
+2013 465
...
If I expand my pivot table a bit more to individual student number rows, it looks like this:
Year # of Students
2015        501
  345987    7
  123765    5
...
I can also apply a value filter to Student Number (distinct count of courses greater than or equal to 6), which satisfies one of my criteria (6 or more) and gives something like this:
Year 6 or More
2015 356
2014 458
2013 290
I can also filter and get those with less than 6 courses.
However, what I really want is to show, in a single pivot table, the distinct count of students with 6 or more courses in a year and the distinct count of students with fewer than 6 courses.
The final product would look something like this:
Year 6 or More Less than 6
2015 356 145
2014 458 182
2013 290 175

Data summarization with greater than and less than – Excel-Formula & PivotTable
Assuming the data is located in range B6:D176 (adjust the range as required), with the following fields as described by the user:
Student : Student Number
Year : Academic Year
Course ID
Key : Calculated Field
Objectives :
Within each year in the database, classify the student population into two groups:
a. Students with 6 or more courses
b. Students with less than 6 courses
For each year, summarize both groups, showing the total number of students and the total number of student\courses in each group.
I'm not sure a PivotTable can perform all the calculations needed, so I propose using working fields to do the calculations and then a PivotTable to summarize the results.
Working Fields :
Key : Leave the Course ID out of this calculation so that the field contains only the Year\Student combination.
Enter this formula in E7, then copy it down to the last record:
=CONCATENATE("AY",$C7,"SN",$B7)
AY.SN.CS.Cnt : Number of records for this Year\Student, i.e. how many courses the student took that year. Enter this formula in F7, then copy it down to the last record:
=COUNTIF($E$6:$E$176,$E7)
AY.SN.Cnt : 1 on the first record of each Year\Student and 0 on the rest, so summing it counts each student once per year. Enter this formula in G7, then copy it down to the last record:
=1*(COUNTIF($E$6:$E7,$E7)=1)
AY.SN.CS >= 6 : 1 on every record of a Year\Student with 6 or more courses, so summing it gives the total courses in that group. Enter this formula in H7, then copy it down to the last record:
=1*($F7>=6)
AY.SN.CS < 6 : 1 on every record of a Year\Student with less than 6 courses, so summing it gives the total courses in that group. Enter this formula in I7, then copy it down to the last record:
=1*($F7<6)
AY.SN >= 6 : 1 only on the first record of each Year\Student with 6 or more courses, so summing it gives the number of such students in a year. Enter this formula in J7, then copy it down to the last record:
=1*($F7>=6)*$G7
AY.SN < 6 : 1 only on the first record of each Year\Student with less than 6 courses, so summing it gives the number of such students in a year. Enter this formula in K7, then copy it down to the last record:
=1*($F7<6)*$G7
Fig. 1
The working fields can be hidden if the user prefers.
Then create a PivotTable as shown in the figure below.
Fig. 2
The PivotTable shows that in 2015 there are:
3 students with 6 or more courses (AY.SN >= 6), with a total of 22 courses (AY.SN.CS >= 6)
3 students with less than 6 courses (AY.SN < 6), with a total of 10 courses (AY.SN.CS < 6)
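Taken together, the working fields and the PivotTable count courses per Year\Student, classify each student as "6 or more" or "less than 6", and then, per year and group, count students and sum courses. A pandas sketch of that logic (a swapped-in tool, not the answer's Excel method), run on a small hypothetical dataset since the real records are not shown; the column names `Student`, `Year` and `Course` are assumptions. Unlike the COUNTIF working field, which counts records, it counts distinct courses, matching the question's "Distinct Count" requirement:

```python
import pandas as pd

# Hypothetical records: one row per student per course per academic year.
rows = (
    [(1, 2015, c) for c in range(7)]      # student 1: 7 courses in 2015
    + [(2, 2015, c) for c in range(6)]    # student 2: 6 courses in 2015
    + [(3, 2015, c) for c in range(2)]    # student 3: 2 courses in 2015
    + [(1, 2014, c) for c in range(5)]    # student 1: 5 courses in 2014
    + [(4, 2014, c) for c in range(6)]    # student 4: 6 courses in 2014
)
data = pd.DataFrame(rows, columns=["Student", "Year", "Course"])

# Like AY.SN.CS.Cnt: distinct courses per Year\Student.
per_student = (
    data.groupby(["Year", "Student"])["Course"]
    .nunique()
    .rename("Courses")
    .reset_index()
)

# Classify each Year\Student into one of the two groups.
per_student["Group"] = per_student["Courses"].ge(6).map(
    {True: "6 or more", False: "Less than 6"}
)

# Per year and group: number of students and total courses.
summary = (
    per_student.groupby(["Year", "Group"])
    .agg(Students=("Student", "nunique"), Courses=("Courses", "sum"))
    .unstack("Group", fill_value=0)
)
print(summary)
```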
