I have the input data below. What I want is to aggregate the duration field (field3) whenever field1 and field2 are repeated.
I tried a HashMap, but in my case there are two keys and one value!
Input file:
Date Company Duration
20161014 IBM 234
20161014 IBM 132
20161014 DELL 223
20161014 DELL 23
20161014 DELL 12
20161015 IBM 122
20161015 IBM 654
20161015 IBM 347
20161015 IBM 997
20161015 DELL 666
Needed output:
Date Company Total duration
20161014 IBM 366
20161015 IBM 2120
20161014 DELL 258
20161015 DELL 666
Thanks
You can use groupingBy with a stream, grouping on a composite (date, company) key and summing the durations:
items.stream().collect(Collectors.groupingBy(
        item -> Arrays.asList(item.getDate(), item.getCompany()),
        Collectors.summingInt(Item::getDuration)));
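For context, here is a minimal runnable sketch of that approach. The Item class and its accessor names are assumptions, since the original post doesn't show them:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DurationAggregator {
    // Hypothetical row type; field and accessor names are guesses.
    record Item(String date, String company, int duration) {
        String getDate() { return date; }
        String getCompany() { return company; }
        int getDuration() { return duration; }
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("20161014", "IBM", 234),
                new Item("20161014", "IBM", 132),
                new Item("20161014", "DELL", 223));

        // Group rows by the (date, company) pair and sum each group's durations.
        Map<List<String>, Integer> totals = items.stream().collect(
                Collectors.groupingBy(
                        i -> Arrays.asList(i.getDate(), i.getCompany()),
                        Collectors.summingInt(Item::getDuration)));

        // Prints [20161014, IBM] -> 366 and [20161014, DELL] -> 223
        totals.forEach((key, total) -> System.out.println(key + " -> " + total));
    }
}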
I need to make a diagram that shows the lines of different ceramic firing schedules. I want them plotted in one diagram, on a time-relative axis, so that the different durations are shown correctly. I don't seem to be able to achieve this.
What I have is the following:
First table (headings are Dutch: 'Temp. per uur' = temperature per hour, 'Stooktemp.' = firing temperature, 'Stooktijd' = firing time, 'Cum.' = cumulative):
Pendelen
Temp. per uur   Stooktemp.   Stooktijd 4   Stooktijd Cum.4
95              120          1:15:47       1,26
205             537          2:02:03       3,30
80              620          1:02:15       4,33
150             1075         3:02:00       7,37
50              1196         2:25:12       9,79
10              1196         0:10:00       9,95
Total                        9:57:17
Second table:
Pendelen
Temp. per uur   Stooktemp.   Stooktijd 5   Stooktijd Cum.5
140             540          3:51:26       3,86
65              650          1:41:32       5,55
140             1095         3:10:43       8,73
50              1222         2:32:24       11,27
Total                        11:16:05
The lines shown in the diagram should represent the 'Stooktijd Cum.' for both programs 4 and 5 (which is a cumulative total of the time needed to fire the kiln up from its previous temperature in the schedule). One should be able to see in the diagram that program 5 takes more time to reach its end temperature.
What I achieved is nothing more than a diagram with two lines, but plotted only at the 'Stooktijd Cum.4' points from program 4. The image shows a screenshot of this diagram.
But as you can see, this doesn't look like program 5 takes more time to reach its end. I would like it to show something like this:
Create this table (read each row as a cumulative-hours / temperature pair; the first five rows trace program 5, the remaining rows program 4):
p4        p5
0         10
3.86      540
5.55      650
8.73      1095
11.27     1222
0         0
1.26      120
3.3       537
4.33      620
7.37      1075
9.79      1196
9.95      1196
Select all > F11 > Design > Change Chart Type > Scatter with Straight Lines and Markers. An XY scatter uses the actual cumulative-hour values on the x-axis, so program 5's longer schedule is drawn to scale (a plain line chart would treat the times as evenly spaced categories).
Here's my tryout:
Please share whether it works or not. (:
So I have two DataFrames: Historic and Applet.
Historic contains a list of all courses my school offered in the past, and Applet is all courses that my school currently offers.
I want to merge the two DataFrames so that any items in my Applet DataFrame that don't exist in Historic are added, and any that do exist overwrite their copies in Historic (some courses may have updated information and should overwrite their historic entries with that information).
I'm currently using Historic.combine_first(Applet) to merge the two on their indexes. However, I want the duplicate entries to overwrite their Historic counterparts, not just create a duplicate entry.
Code:
def update2(self):
    historic = pd.read_csv('course_history.txt', header=None, sep='"', encoding='ISO-8859-1',
                           names=['Course_ID', 'Course_Title', 'Professor_Name', 'Meeting_Time',
                                  'Enrollment', 'Room', 'Year', 'Term', 'Credit'],
                           index_col=[0, 6, 7])
    winnet = pd.DataFrame(self.data, columns=['Course_ID', 'Course_Title', 'Professor_Name',
                                              'Meeting_Time', 'Enrollment', 'Room', 'Year',
                                              'Term', 'Credit'])
    winnet.set_index(['Course_ID', 'Year', 'Term'], inplace=True)
    historic3 = historic.combine_first(winnet)
Historic DataFrame:
Course_ID Year Term ...
AC 230 01 2020-21 May Accounting Systems Crouse, Justin D. ... ROOM NULL 1.00
AC 429 01 2020-21 May CPA Review Sommermeyer, Eric ... ROOM NULL 1.00
ART 150 01 2020-21 May 20th-Century Art, Media, & Design Fedeler, Barbara J. ... ROOM NULL 1.00
ART 208 01 2020-21 May Photography I Payne, Thomas R. ... ROOM NULL 1.00
PSY 222 01 2018-19 FA Cognitive Psychology Eslick Watkins, A ... ROOM NULL 1.00
Applet DataFrame:
Course_ID Year Term
PSY 101 01 2018-19 FA Introduction to Psychology Bane, C T H 9:35AM-11:15AM 40/44/0 LH 330 1.00
PSY 101 02 2018-19 FA Introduction to Psychology Eslick Watkins, A T H 1:00PM-2:40PM 40/43/0 SC 134 1.00
PSY 210 10 2018-19 FA Child Development Munir, S T H 9:35AM-11:15AM 30/10/0 LH 327 0.50
PSY 211 20 2018-19 FA Adolescent Development Munir, S T H 1:00PM-2:40PM 30/6/0 LH 330 0.50
PSY 222 01 2018-19 FA Cognitive Psychology Eslick Watkins, A T H 9:35AM-11:15AM 30/24/0 LH 324 1.00
You can use concat then drop_duplicates (note the keyword is keep, not method):
cols = ['Course_ID', 'Year', 'Term']  # the columns that identify a duplicate
combined = pd.concat([Applet, Historic])
combined = combined.drop_duplicates(subset=cols, keep='first')
Because Applet comes first in the concat, keep='first' retains the Applet version of any course that appears in both frames. Note that subset refers to columns, so if those keys live in the index, call reset_index() first.
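For illustration, here is a minimal runnable sketch of this approach on toy frames (the column values here are made up, not the real schema):

import pandas as pd

historic = pd.DataFrame({'Course_ID': ['AC 230', 'PSY 222'],
                         'Title': ['Accounting Systems', 'Cognitive Psychology (old)']})
applet = pd.DataFrame({'Course_ID': ['PSY 222', 'PSY 101'],
                       'Title': ['Cognitive Psychology (new)', 'Intro to Psychology']})

# Applet rows come first, so keep='first' prefers the current offering
# whenever a Course_ID appears in both frames.
combined = pd.concat([applet, historic])
combined = combined.drop_duplicates(subset=['Course_ID'], keep='first')
print(combined)

As a design note, combine_first keeps the calling frame's values wherever both frames have data, so Applet.combine_first(Historic) (rather than the reverse, as in the question) should also let the current entries win on overlapping indexes.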
I have two item codes, 555 and 777, that are the same item (Pen). If they are the only items a customer has bought, I would like to flag just those customers. Example below:
Name CustomerID Item Name Item # Desired Result
Bob 1 Tape 111
Bob 1 Tape 111
Bob 1 Pen 555
Greg 3 Pen 555 Check
Jim 4 Tape 111
Jim 4 Pen 555
Tom 7 Tape 111
Tom 7 Stapler 222
Jack 8 Pen 777 Check
Zach 9 Pen 555
Zach 9 Paper 333
Zach 9 Stapler 222
Zach 9 Tape 111
=IF(OR(AND(B1:B3,D2=555),AND(B1:B3,B2=777)),"Check","")
is what I have tried, but it just marks any row with 555 or 777.
Use:
=IF(AND(OR(D2={555,777}),COUNTIF(B:B,B2)=1),"Check","")
If you know that the customers are sorted, you could try something like:
=IF(AND(OR(D2=555,D2=777),AND(B2<>B1,B2<>B3)),"Check","")
unless you also want to check customers who bought both 555 and 777.
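To cover that case too, one hedged variant (my own suggestion, not from the original answers) counts whether the customer has any row that is neither 555 nor 777:

=IF(AND(OR(D2={555,777}),COUNTIFS($B:$B,B2,$D:$D,"<>555",$D:$D,"<>777")=0),"Check","")

If the count of non-pen rows for that customer is zero, pens are all they ever bought, so customers like Greg (only 555) and Jack (only 777) get flagged even if they bought several pens.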
AND(B1:B3,D2=555) probably isn't doing what you want: B1:B3 isn't a logical test (it's just a range, which always evaluates as 'true'), so the formula is effectively only checking whether D2=555.
Dataset Sample
I have a data set like the attached picture, where I want only the observations that have the same numsecur every year.
How do I do this with SAS PROC SQL? Would this be easier to do in Stata? If so, what procedure can I use?
You look like a new user to Stack Overflow. Welcome. Your question is getting downvoted for at least three reasons:
1) It's not really clear what you want from your description of the problem and the data you're providing.
2) You haven't shown any attempts at what you've tried.
3) Providing your data as a picture is not great. It's most helpful to provide the data in a form that's easy for others to consume in their programs. After all, you're asking for our help; make it easier for us to help you. If you included something like the following, we would just have to copy and paste to create your dataset:
DATA test;
INPUT ID YEAR EXEC SUM;
DATALINES;
1573 1997 50 1080
1581 1997 51 300
1598 1996 54 80
1598 1998 54 80
1598 1999 54 80
1602 1996 55 112.6
1602 1997 55 335.965
;
RUN;
That being said, the following MAY give you what you're looking for, but it's only a guess, as I'm not sure this is really what you're asking:
proc sql noprint;
  create table testout as
  select *, count(*) as cnt
  from test
  group by sum
  having cnt > 1;
quit;
Are you asking: show all rows where the same SUM is used or something else?
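If the requirement is instead "keep only the IDs whose SUM is identical across all of their years," a hedged PROC SQL sketch (my guess at the intent, reusing the test dataset above):

proc sql noprint;
  create table same_sum as
  select *
  from test
  group by id
  having min(sum) = max(sum) and count(*) > 1;
quit;

PROC SQL remerges the group-level MIN/MAX back onto the detail rows, so this keeps every row of an ID only when that ID's SUM never changes.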
Assuming I understand your question correctly, you would like to keep the observations from the same company/individual only if the company has the same numsecur every year. So here is what I would try using Stata:
input ID YEAR EXEC SUM
1573 1997 50 1080 //
1581 1997 51 300 //
1598 1996 54 80 //
1598 1998 54 80 //
1598 1999 54 80 //
1602 1996 55 112.6 //
1602 1997 55 335.965 //
1575 1997 50 1080 //
1575 1998 51 1080 //
1595 1996 54 80 //
1595 1998 54 30 //
1595 1999 54 80 //
1605 1996 55 112.6 //
1605 1997 55 335.965 //
end
bysort ID SUM: gen drop=cond(_N==1, 0,_n)
drop if drop==0
The results show (based on my data):
ID YEAR EXEC SUM drop
1. 1575 1997 50 1080 1
2. 1575 1998 51 1080 2
3. 1595 1999 54 80 1
4. 1595 1996 54 80 2
5. 1598 1996 54 80 1
6. 1598 1998 54 80 2
7. 1598 1999 54 80 3
I have a question very similar to a previous post:
Merging two files by a single column in unix
but I want to merge my data based on two columns (the orders are the same, so no need to sort).
Example:
subjectid subID2 name age
12 121 Jane 16
24 241 Kristen 90
15 151 Clarke 78
23 231 Joann 31
subjectid subID2 prob_disease
12 121 0.009
24 241 0.738
15 151 0.392
23 231 1.2E-5
And the output to look like
subjectid SubID2 prob_disease name age
12 121 0.009 Jane 16
24 241 0.738 Kristen 90
15 151 0.392 Clarke 78
23 231 1.2E-5 Joann 31
When I use join, it only considers the first column (subjectid) and repeats the SubID2 column.
Is there a way of doing this with join or some other way, please? Thank you.
The join command doesn't have an option to use more than one field as the joining criterion, so you will have to add some intelligence into the mix. Assuming your files have a FIXED number of fields on each line, you can use something like this:
join f1 f2 | awk '{print $1" "$2" "$6" "$3" "$4}'
provided that the field counts are as given in your examples (field 5 is the repeated subID2, so it is dropped, and field 6 is the probability, printed right after the two keys). Otherwise, you need to adjust the scope of print in the awk command by adding or removing some fields.
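If you want join to genuinely key on both columns, a common trick is to fuse the first two fields into a single key and split them apart again afterwards. A sketch, assuming single-space-delimited files, that the underscore doesn't occur in the key fields, and f1/f2 named as above:

join <(sed 's/ /_/' f1) <(sed 's/ /_/' f2) | sed 's/_/ /'

Each inner sed replaces only the first space on a line, turning 'subjectid subID2' into one composite field that join can compare; the final sed restores the space. As with plain join, both files must list the keys in the same sorted order.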
If the orders are identical, you could still merge by a single column and specify the format of which columns to output, like:
join -o '1.1 1.2 2.3 1.3 1.4' file_a file_b
as described in join(1).