I'm trying to write an Excel formula to return a value from the table below:
Q Y Mean 1 2 3 4 5 6 7 8 9
1 4 1301 <1183 1183 1233 1283 1333 1383 1433 1483 1533
2 4 1306 <1189 1189 1239 1289 1339 1389 1439 1489 1539
3 4 1317 <1200 1200 1250 1300 1350 1400 1450 1500 1550
4 4 1333 <1214 1214 1264 1314 1364 1414 1464 1514 1564
1 5 1346 <1225 1225 1275 1325 1375 1425 1475 1525 1575
2 5 1360 <1235 1235 1285 1335 1385 1435 1485 1535 1585
3 5 1372 <1245 1245 1295 1345 1395 1445 1495 1545 1595
4 5 1390 <1255 1255 1305 1355 1405 1455 1505 1555 1605
1 6 1403 <1266 1266 1316 1366 1416 1466 1516 1566 1616
2 6 1416 <1276 1276 1326 1376 1426 1476 1526 1576 1626
3 6 1425 <1285 1285 1335 1385 1435 1485 1535 1585 1635
4 6 1426 <1291 1291 1341 1391 1441 1491 1541 1591 1641
I want to be able to select a year, then a quarter, and then, according to a pupil's score, return the corresponding standard nine (stanine) figure from the top line.
What's the best way to do this? I've tried INDEX and MATCH functions without success.
One strategy for a multiple-criteria lookup like this is to concatenate the indices into a single unique key. That key lets you find the correct row for a given year/quarter combination.
The second piece is using INDEX to return an entire row from your table of scores; MATCH can then locate the score within that row. Once you have the score's position, a final INDEX returns the stanine from the header row.
The end result is an INDEX-MATCH-INDEX-MATCH. It made more sense to me when I split the formulas into separate cells, but I have combined them below.
Here is what I started with. I added the ID column that combines the year/quarter.
Formula in D3 = =B3&"-"&C3, copied down to the end.
Cells C17 and C18 are inputs.
Cell C19 = =C17&"-"&C18
Cell C20 (Score) is an input.
Cell C21 is the messy one which combines the logic described above: =INDEX(F2:N2,MATCH(C20,INDEX(F3:N14,MATCH(C19,D3:D14,0),),1))
Here is that formula expanded with color so you can see what is going on:
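For readers more comfortable outside Excel, the same two-stage lookup can be sketched in Python. This is a hypothetical recreation, not part of either answer: the key format, the cutoff lists (only the first two rows of the table are reproduced), and the function name are mine. The concatenated quarter/year key selects a row of cutoffs, and a binary search plays the role of the approximate MATCH.

```python
import bisect

# Hypothetical recreation of the table: "quarter-year" key -> the lower
# bounds of stanines 2..9; any score below the first bound is stanine 1.
cutoffs = {
    "1-4": [1183, 1233, 1283, 1333, 1383, 1433, 1483, 1533],
    "2-4": [1189, 1239, 1289, 1339, 1389, 1439, 1489, 1539],
}

def stanine(quarter, year, score):
    # bisect_right counts how many lower bounds the score reaches,
    # which maps scores below 1183 to stanine 1 and caps out at 9.
    return bisect.bisect_right(cutoffs[f"{quarter}-{year}"], score) + 1
```

For example, a Q1/Y4 score of 1301 (the row's mean) lands in stanine 4, just as the approximate MATCH would return the fourth column of that row.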
Say we have the data laid out as above, in rows 3:14.
We place the desired quarter in A1 and the desired year in B1, and in C1 enter:
=INDEX(C1:C14,SUMPRODUCT(--($A$3:$A$14=$A$1)*($B$3:$B$14=$B$1)*(ROW(3:14))))
and copy across.
I'm relatively new to DataFrames in Python and running into an issue I can't track down.
I have a DataFrame with the following column layout; print(list(df.columns.values)) returns:
['iccid', 'system', 'last_updated', '01.01', '02.01', '03.01', '04.01', '05.01', '12.01', '18.01', '19.01', '20.01', '21.01', '22.01', '23.01', '24.01', '25.01', '26.01', '27.01', '28.01', '29.01', '30.01', '31.01']
Normally there should be a column for each day of a specific month; in the example above it is December 2022. Sometimes days are missing, which isn't a problem.
I first tried to collect the relevant columns by filtering them:
# Keep only the columns whose names contain a date (e.g. '01.12')
data_columns = [col for col in df.columns if '.' in col]
Now comes the issue:
Sometimes the "system" column can also be empty, so I need to put the iccid into the system value:
df.loc[df['system'] == 'Nicht benannt!', 'system'] = df.loc[df['system'] == 'Nicht benannt!', 'iccid'].iloc[0]
df.loc[df['system'] == '', 'system'] = df.loc[df['system'] == '', 'iccid'].iloc[0]
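One detail worth flagging in the snippet above: assigning `.iloc[0]` writes the first matching iccid into every unnamed row, which later collapses all of those SIMs into a single group. A minimal sketch of assigning each row its own iccid instead (the two-row frame is made up for illustration):

```python
import pandas as pd

# Two SIMs, both without a proper system name; assigning the aligned
# Series keeps them distinct, whereas .iloc[0] would stamp '111' into
# both rows.
df = pd.DataFrame({'iccid': ['111', '222'],
                   'system': ['Nicht benannt!', '']})
mask = df['system'].isin(['Nicht benannt!', ''])
df.loc[mask, 'system'] = df.loc[mask, 'iccid']
```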
grouped = df.groupby('system').sum(numeric_only=False)
Then I tried to create the needed 'data_usage' column:
grouped['data_usage'] = grouped[data_columns[-1]]
grouped.reset_index(inplace=True)
That line should normally give me only the value of the last date column in the dataframe (a workaround that also didn't work as expected).
What I am actually trying to get is the sum of all columns which contain a date in their name, added to a new column named data_usage.
The issue I'm having: systems that don't have an initial system value end up with a data_usage of 120000 (the value represents the megabytes used), while according to the SQLite file that system only used 9000 MB in total in that particular month.
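For reference, the stated goal — summing every date-named column into data_usage and then aggregating per system — can be sketched as follows. The tiny frame and its values are made up; the column names are assumed from the question. Note that if the daily columns are cumulative counters rather than per-day increments (the example data below increase monotonically), summing them will overstate the monthly usage.

```python
import pandas as pd

# Made-up sample: two rows for one system, one row for another.
df = pd.DataFrame({'system': ['U-O-51', 'U-O-51', 'L-O-382'],
                   '01.12': [2, 3, 1],
                   '02.12': [32, 30, 26]})

# Sum the date-named columns row-wise, then aggregate per system.
date_cols = [c for c in df.columns if '.' in c]
df['data_usage'] = df[date_cols].sum(axis=1)
grouped = df.groupby('system', as_index=False)['data_usage'].sum()
```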
For example, the SQLite file contains this row:
iccid                system          last_updated  06.02  08.02
8931080320014183316  Nicht benannt!  2023-02-06    1196   1391
whereas in the dataframe I get the following result:
8931080320014183316 48129.0
I can't find the issue and would be very happy if someone could point me in the right direction.
Here are some example data as requested:
iccid,system,last_updated,01.12,02.12,03.12,04.12,05.12,06.12,07.12,08.12,09.12,10.12,11.12,12.12,13.12,14.12,15.12,16.12,17.12,18.12,19.12,20.12,21.12,22.12,23.12,28.12,29.12,30.12,31.12
8945020184547971966,U-O-51,2022-12-01,2,32,179,208,320,509,567,642,675,863,1033,1055,1174,2226,2277,2320,2466,2647,2679,2713,2759,2790,2819,2997,3023,3058,3088
8945020855461807911,L-O-382,2022-12-01,1,26,54,250,385,416,456,481,506,529,679,772,802,832,858,915,940,1019,1117,1141,1169,1193,1217,1419,1439,1461,1483
8945020855461809750,C-O-27,2022-12-01,1,123,158,189,225,251,456,489,768,800,800,800,800,800,800,2362,2386,2847,2925,2960,2997,3089,3116,3448,3469,3543,3586
8931080019070958450,L-O-123,2022-12-02,0,21,76,313,479,594,700,810,874,1181,1955,2447,2527,2640,2897,3008,3215,3412,3554,3639,3698,3782,3850,4741,4825,4925,5087
8931080453114183282,Nicht benannt!,2022-12-02,0,6,45,81,95,98,101,102,102,102,102,102,102,103,121,121,121,121,149,164,193,194,194,194,194,194,194
8931080894314183290,C-O-16 N,2022-12-02,0,43,145,252,386,452,532,862,938,1201,1552,1713,1802,1855,2822,3113,3185,3472,3527,3745,3805,3880,3938,4221,4265,4310,4373
8931080465814183308,L-O-83,2022-12-02,0,61,169,275,333,399,468,858,1094,1239,1605,1700,1928,2029,3031,4186,4333,4365,4628,4782,4842,4975,5265,5954,5954,5954,5954
8931082343214183316,Nicht benannt!,2022-12-02,0,52,182,506,602,719,948,1129,1314,1646,1912,1912,1912,1912,2791,3797,3944,4339,4510,4772,4832,5613,5688,6151,6482,6620,6848
8931087891314183324,L-O-119,2022-12-02,0,19,114,239,453,573,685,800,1247,1341,1341,1341,1341,1341,1341,1341,1341,1341,1341,1341,1341,1341,1423,2722,3563,4132,4385
We have a sensor that measures some physical process and sends the value via PWM. Because the sensor discretizes the data, when reading the PWM signal we should only see the value in discrete steps. For example, when the signal varies from 200 to 400, we will not see every possible value between 200 and 400, but rather values like 200, 225, 250, ..., 375, 400, indicating a discretization step of 25.
However, our PWM reader is noisy, resulting in values like 198, 224, 261, 275, etc. What is the correct way to estimate the discretization step and the parameters of the noise, assuming the noise is Gaussian?
I suppose we could just enumerate all the possible steps and, for each candidate, calculate the remainders of the values divided by that step. The discretization step should then be the candidate whose remainders produce the best fit to a Gaussian; for the goodness of fit, I suppose we can take the least sum of squared deviations of the remainders from the Gaussian.
The obvious problem with this naive approach is that both step and step/2 may produce the same fit. That can be taken care of, I suppose, by reducing the possible step range based on some guess. But perhaps this naive approach based on remainders is not the proper way to deal with the problem?
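One way to sketch the enumeration idea while sidestepping the step-vs-step/2 tie: score each candidate step q by the resultant length |mean(exp(2πi·v/q))|. Under Gaussian noise with standard deviation σ the expected score is exp(-2π²σ²/q²), so the true step scores strictly higher than its divisors (their phase noise is amplified), while multiples of the step cancel out. This is only a sketch under those assumptions; the function and variable names are mine, and the candidate range must stay well below the spread of the data, or trivially large steps will win.

```python
import numpy as np

def estimate_step(values, candidates):
    # For each candidate step q, measure how tightly the values cluster
    # around multiples of q: the resultant length of the phases 2*pi*v/q.
    values = np.asarray(values, dtype=float)
    scores = [abs(np.exp(2j * np.pi * values / q).mean()) for q in candidates]
    return candidates[int(np.argmax(scores))]

# Synthetic check: true step 31 (roughly what the histogram below suggests),
# Gaussian read noise with sigma = 2.
rng = np.random.default_rng(0)
vals = 31 * rng.integers(20, 140, size=300) + rng.normal(0, 2, size=300)
step = estimate_step(vals, np.arange(5.0, 80.0, 0.25))
```

Once the step s is known, σ can be estimated directly from the residuals v − s·round(v/s).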
In our case a typical histogram of the data points is given below. As we suspect that the step and the parameters of the noise depend on environmental conditions, we prefer not to mix data from different runs, which limits the amount of data.
value count
682 2
775 8
776 14
807 7
838 3
868 1
869 2
900 1
931 3
993 2
1024 2
1055 2
1117 3
1179 2
1210 1
1241 1
1272 2
1303 1
1334 1
1365 2
1397 1
1427 1
1428 2
1458 1
1521 2
1551 1
1552 1
1583 1
1614 1
1645 1
1676 1
1707 3
1738 1
1769 1
1800 2
1831 1
1862 3
1893 1
1924 2
1955 1
1986 2
2047 1
2048 2
2079 1
2111 1
2141 1
2142 1
2173 1
2204 2
2234 1
2235 2
2266 1
2297 1
2325 1
2359 2
2390 3
2483 3
2514 2
2545 1
2575 1
2607 2
2638 3
2669 1
2731 4
2762 2
2794 2
2823 1
2825 1
2854 1
2858 1
2889 1
2918 2
2980 3
3011 2
3042 2
3071 1
3073 1
3104 2
3134 2
3135 1
3197 3
3228 1
3259 1
3287 1
3289 1
3290 2
3321 2
3351 1
3352 2
3383 1
3414 1
3445 4
3506 1
3508 3
3539 1
3570 3
3601 2
3630 1
3632 1
3661 1
3663 2
3694 2
3723 1
3725 2
3754 2
3756 2
3785 16
3787 7
3818 45
3820 1
3822 1
3825 2
3849 3
3880 2
3911 1
3942 3
3971 1
3973 2
4002 1
4004 2
4035 5
4095 1
4097 3
4128 2
4152 1
4157 2
4191 2
4220 1
4222 1
4251 1
4253 3
4276 1
4313 1
4315 4
4344 1
4346 3
4375 3
4377 4
I have two dataframes.
First one:
Date B
2021-12-31 NaN
2022-01-31 500
2022-02-28 540
Second one:
Date A
2021-12-28 520
2021-12-31 530
2022-01-20 515
2022-01-31 529
2022-02-15 544
2022-02-25 522
I want to concatenate both dataframes based on year and month, and the resulting dataframe should look like this:
Date A B
2021-12-28 520 NaN
2021-12-31 530 NaN
2022-01-20 515 500
2022-01-31 529 500
2022-02-15 544 540
2022-02-25 522 540
You need a left merge on the month period:
df2.merge(df1,
left_on=pd.to_datetime(df2['Date']).dt.to_period('M'),
right_on=pd.to_datetime(df1['Date']).dt.to_period('M'),
suffixes=(None, '_'),
how='left'
)
Then drop(columns=['key_0', 'Date_']) if needed.
Output:
key_0 Date A Date_ B
0 2021-12 2021-12-28 520 2021-12-31 NaN
1 2021-12 2021-12-31 530 2021-12-31 NaN
2 2022-01 2022-01-20 515 2022-01-31 500.0
3 2022-01 2022-01-31 529 2022-01-31 500.0
4 2022-02 2022-02-15 544 2022-02-28 540.0
5 2022-02 2022-02-25 522 2022-02-28 540.0
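Put together as a self-contained script, rebuilding the question's two frames inline (assuming a reasonably recent pandas):

```python
import pandas as pd

# The question's two frames. B is float so the missing December value is NaN.
df1 = pd.DataFrame({'Date': ['2021-12-31', '2022-01-31', '2022-02-28'],
                    'B': [float('nan'), 500, 540]})
df2 = pd.DataFrame({'Date': ['2021-12-28', '2021-12-31', '2022-01-20',
                             '2022-01-31', '2022-02-15', '2022-02-25'],
                    'A': [520, 530, 515, 529, 544, 522]})

# Left merge on the month period, then drop the helper columns.
out = (df2.merge(df1,
                 left_on=pd.to_datetime(df2['Date']).dt.to_period('M'),
                 right_on=pd.to_datetime(df1['Date']).dt.to_period('M'),
                 suffixes=(None, '_'),
                 how='left')
          .drop(columns=['key_0', 'Date_']))
```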
I have to use groupby() on a dataframe in Python 3.x. The column name is origin; based upon the origin, I have to find the destination with the maximum number of occurrences.
Sample df is like:
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay origin dest
0 2013 1 1 517 515 2 830 819 11 EWR IAH
1 2013 1 1 533 529 4 850 830 20 LGA IAH
2 2013 1 1 542 540 2 923 850 33 JFK MIA
3 2013 1 1 544 545 -1 1004 1022 -18 JFK BQN
4 2013 1 1 554 600 -6 812 837 -25 LGA ATL
5 2013 1 1 554 558 -4 740 728 12 EWR ORD
6 2013 1 1 555 600 -5 913 854 19 EWR FLL
7 2013 1 1 557 600 -3 709 723 -14 LGA IAD
8 2013 1 1 557 600 -3 838 846 -8 JFK MCO
9 2013 1 1 558 600 -2 753 745 8 LGA ORD
You can use the following to count the occurrences of the other column for each origin:
df.groupby(['origin'])['dest'].size().reset_index()
origin dest
0 EWR 3
1 JFK 3
2 LGA 4
You can use aggregate functions to make your life simpler and plot graphs from the result as well.
fun = {'dest': {'Count': 'count'}}
df = df.groupby(['origin', 'dest']).agg(fun).reset_index()
df.columns=df.columns.droplevel(1)
df
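Note that recent pandas versions reject the nested-dict renaming above (it was removed in pandas 1.0). To answer the literal question — the destination with the most occurrences per origin — a mode-based aggregation is a simpler sketch; the sample frame below is made up so each origin has a clear winner:

```python
import pandas as pd

# Made-up sample: EWR flies ORD most often, JFK flies MIA most often.
df = pd.DataFrame({'origin': ['EWR', 'EWR', 'EWR', 'JFK', 'JFK', 'JFK'],
                   'dest':   ['ORD', 'ORD', 'IAH', 'MIA', 'MIA', 'BQN']})

# mode() returns the most frequent value(s); iloc[0] picks one on ties.
top_dest = (df.groupby('origin')['dest']
              .agg(lambda s: s.mode().iloc[0])
              .reset_index(name='top_dest'))
```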
This is a relatively common question, so I don't want to be voted down for asking something that has been asked before. I will explain the steps I took using Stack Overflow and other sources, so that you can see that I made attempts to solve it before asking.
I have a set of values as below:
O P Q "R" Z
6307 586 92.07 1.34
3578 195 94.83 6.00
3147 234 93.08 4.29
3852 227 94.43 15.00
3843 171 95.74 5.10
3511 179 95.15 7.18
6446 648 90.87 1.44
4501 414 91.58 0.38
3435 212 94.19 6.23
I want to take the average of the first six values in column "R" and then put that average in the sixth row of column Z, like this:
O P Q "R" Z
6307 586 92.07 1.34
3578 195 94.83 6.00
3147 234 93.08 4.29
3852 227 94.43 15.00
3843 171 95.74 5.10
3511 179 95.15 7.18 6.49
6446 648 90.87 1.44
4501 414 91.58 0.38
3435 212 94.19 6.23
414 414 91.58 3.49
212 212 94.19 11.78
231 231 93.44 -1.59 3.6
191 191 94.59 2.68
176 176 91.45 .75
707 707 91.96 2.68
792 420 90.95 0.75
598 598 92.15 7.45
763 763 90.66 -4.02
652 652 91.01 3.75
858 445 58.43 2.30 2.30
I used the following formula that I found:
=AVERAGE(OFFSET(R1510,COUNTA(R:R)-6,0,6,1))
but I received an answer that was different from what I obtained by simply taking the average of the six previous cells, as in:
=AVERAGE(R1505:R1510)
I then tried the following formula from a Stack Overflow conversation (excel averaging every 10 rows) that was tangentially similar to what I wanted:
=AVERAGE(INDEX(R:R,1+6*(ROW()-ROW($B$1))):INDEX(R:R,10*(ROW()-ROW($B$1)+1)))
but I was unable to get an answer that resembled what I got from a plain
=AVERAGE(R1517:R1522)
I also found another approach in the following, but was unable to adapt the references correctly (F3 to R1510, for example):
=AVERAGE(OFFSET(F3,COUNTA($R$1510:$R$1517)-1,,-6,))
Doing so gave me a negative number, -6.95, for a clearly positive set of data.
Put this in Z1 and copy down:
=IF(MOD(ROW(),6)=0,AVERAGE(INDEX(R:R,ROW()-5):INDEX(R:R,ROW())),"")
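For comparison, a sketch of the same every-six-rows average in Python, using the first twelve R values from the question's table:

```python
import pandas as pd

# Column "R" values from the question; each block of six rows gets one mean.
r = pd.Series([1.34, 6.00, 4.29, 15.00, 5.10, 7.18,
               1.44, 0.38, 6.23, 3.49, 11.78, -1.59])
block_means = r.groupby(r.index // 6).mean()
```

The two block means, 6.485 and 3.62, match the 6.49 and 3.6 shown in the desired output above.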