Rearrange and regroup stacked data - excel

I have Excel data as follows:
Mon 34
Mon 76
Mon 86
Tue 24
Tue 34
Tue 66
Wed 88
Wed 89
Wed 87
Is there a way with a formula to rewrite this data as follows:
Mon Tue Wed
34 24 88
76 66 89
86 66 87

Assuming 76 is in B2 (so the data starts in A1), insert a column on the left and a row above. Label the columns (say ID, day and value). In A2 enter 1 and series fill down to A4, giving 1, 2, 3. Then select A2:A4 and fill down as far as the data goes, so that the 1, 2, 3 pattern repeats for each day.
Build a PivotTable with ID for ROWS, day for COLUMNS and value for VALUES.
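This won't give quite the result you show, because the expected output in your question does not match the data sample (the Tue values in the data are 24, 34 and 66).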
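To make the regrouping logic concrete outside Excel, here is a minimal Python sketch of the same idea: group the stacked values by their label, then read the groups side by side. The function name `unstack` and the hard-coded sample are illustrative assumptions, not part of the PivotTable approach itself.

```python
def unstack(pairs):
    """Turn stacked (label, value) pairs into a header row plus value rows."""
    columns = {}
    for label, value in pairs:
        # dicts keep insertion order, so labels stay in first-seen order
        columns.setdefault(label, []).append(value)
    header = list(columns)
    # zip stops at the shortest group, so uneven groups are truncated
    rows = [list(row) for row in zip(*columns.values())]
    return header, rows


stacked = [
    ("Mon", 34), ("Mon", 76), ("Mon", 86),
    ("Tue", 24), ("Tue", 34), ("Tue", 66),
    ("Wed", 88), ("Wed", 89), ("Wed", 87),
]
header, rows = unstack(stacked)
print(header)
for row in rows:
    print(row)
```

The position of each value within its group plays the role of the ID column in the PivotTable.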

Related

Sort rows by row value (top to bottom)

There is a lotto draw (5 numbers) on each row. I have a formula which calculates the most frequent numbers along with their number of draws. Is it possible to sort results that have the same number of draws by row position? That is, a number drawn in the top rows should rank higher than one drawn in the bottom rows, treating the row number as the value. How can I do that?
Formula used:
=LET(flatten, TEXTSPLIT(TEXTJOIN(";",,A1:F27),,";"), numUq, UNIQUE(flatten), matches, XMATCH(flatten,numUq),SORT(HSTACK(numUq, DROP(FREQUENCY(matches, UNIQUE(matches)),-1)),2,-1))
In the example screenshot number 35 and number 13 have equal draws count, but 13 should be before 35.
Data:
A B C D E F
18 35 31 13 37 10
43 47 36 13 6 19
6 12 6 35 14 1
43 24 45 7 21 16
37 39 44 24 12 40
39 8 34 28 49 46
27 44 15 46 45 12
22 0 10 5 28 28
4 7 23 6 44 41
30 22 47 13 29 29
37 9 26 44 39 10
30 17 21 20 41 22
43 35 0 22 13 9
14 22 42 20 32 21
13 38 48 6 14 2
11 47 20 20 23 6
22 26 1 25 45 31
27 39 6 44 3 24
22 45 34 17 5 13
16 23 20 7 30 16
25 21 7 34 1 35
32 34 1 9 10 32
23 35 11 3 6 12
5 30 4 20 33 15
26 10 8 28 16 11
21 14 3 38 10 42
16 3 26 48 30 28
Here it is on a small part of the data (A1:F3). I have added a third column holding the average row of each unique number, and sorted first on frequency, then on row average:
=LET(range,A1:F3,uniques,UNIQUE(TOCOL(range)),rows,SEQUENCE(ROWS(range)),
avrow,BYROW(uniques,LAMBDA(uniq,SUM((range=uniq)*rows/SUM(--(range=uniq))))),
freq,DROP(FREQUENCY(range,uniques),-1),
SORTBY(HSTACK(uniques,freq,avrow),freq,-1,avrow,1))
Can 6 really occur twice in the same draw? Maybe not, but it doesn't affect the answer.
EDIT
Here is a version based on your original formula:
=LET(range,A1:F27,
flatten, TEXTSPLIT(TEXTJOIN(";",,range),,";"),
numUq, UNIQUE(flatten),
rows,SEQUENCE(ROWS(range)),
matches, XMATCH(flatten,numUq),
avrow,BYROW(numUq,LAMBDA(u,SUM((range=--u)*rows/SUM(--(range=--u))))),
freq,DROP(FREQUENCY(matches, UNIQUE(matches)),-1),
SORTBY(HSTACK(numUq,freq,avrow),freq,-1,avrow,1))
The sorting is based on number of appearances and average row, but you could use other measures like row of first appearance if you wanted to.
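For anyone who wants to check the ranking logic outside Excel, the following is a minimal Python sketch of the same idea: count how often each number appears, compute the average row it appears on, and sort by count descending, then by average row ascending. The function name `rank_draws` is hypothetical, and the sample uses only the first three rows of the data.

```python
from collections import defaultdict


def rank_draws(rows):
    """Return (number, count, average_row) tuples, most frequent first.

    Ties on count are broken by the lower average row number.
    """
    seen_on = defaultdict(list)
    for row_number, row in enumerate(rows, start=1):
        for value in row:
            seen_on[value].append(row_number)
    stats = [(value, len(r), sum(r) / len(r)) for value, r in seen_on.items()]
    # sorted() is stable, so full ties keep first-appearance order
    return sorted(stats, key=lambda s: (-s[1], s[2]))


sample = [
    [18, 35, 31, 13, 37, 10],
    [43, 47, 36, 13, 6, 19],
    [6, 12, 6, 35, 14, 1],
]
ranking = rank_draws(sample)
for number, count, average_row in ranking[:3]:
    print(number, count, round(average_row, 2))
```

Swapping the third tuple element for `min(r)` would rank ties by row of first appearance instead of average row.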
Different approach:
=LET(data,A1:F27,
a,TOCOL(data),
b,MMULT(--(TRANSPOSE(a)=a),SEQUENCE(COUNTA(a),,1,0)),
c,TOCOL(IF(ISNUMBER(data),MAX(ROW(data)+1)-ROW(data)^99)),
d,MMULT(--(TRANSPOSE(a)=a),c),
s,SORTBY(HSTACK(a,b),b,-1,d,1),
UNIQUE(s))
a "flattens" the data using TOCOL.
b creates a "countif" of the drawn values in a using MMULT.
c returns, for each value found, the maximum row number of the data + 1 minus the row number of that value ^99.
^99 is used because I want the number to be higher if it is found in the first row only than if it is found in every row except the first.
d returns a "sumif" of the calculated row values of c against the values of a.
We then only need a and b for the list (combined using HSTACK), sorted by the count b descending and then by the sumif d ascending using SORTBY. UNIQUE removes the repeated rows.
This will sort it as you illustrated.
If it's a tie (36 and 19 in the data), the value that comes first in the row is shown first.

Horizontal SUMIFS with two vertical criteria

I am given the following sales table, which provides the sales that each employee made. Instead of the employee's name it holds their ID, and each ID may have more than one row.
To map the ID back to the name, I have a lookup table with each employee's name and ID.
Sales Table:
Year ID North South West East
2020 A 58 30 74 72
2020 A 85 40 90 79
2020 B 9 82 20 5
2020 B 77 13 49 21
2020 C 85 55 37 11
2020 C 29 70 21 22
2021 A 61 37 21 42
2021 A 22 39 2 34
2021 B 62 55 9 72
2021 B 59 11 2 37
2021 C 41 22 64 47
2021 C 83 18 56 83
ID table:
ID Name
A Allison
B Brandon
C Chris
I am trying to sum up each employee's sales by a given year, and aggregate all their transactions by their name (rather than ID), so that my result looks like the following:
Result:
Report 2021
Allison 258
Brandon 307
Chris 414
I want the user to be able to select the year, and the report would automatically sum up each person's sales by the year and their name.
Any ideas on how I can accomplish this?
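With FILTER (the formulas assume the sales table is in A1:F13, the ID table in I1:J4, the selected year in N2 and the names listed from N3 down):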
=SUM(FILTER($C$2:$F$13,($B$2:$B$13=INDEX($I$2:$I$4,MATCH(N3,$J$2:$J$4,0)))*($A$2:$A$13=$N$2)))
With SUMPRODUCT:
=SUMPRODUCT($C$2:$F$13*($B$2:$B$13=INDEX($I$2:$I$4,MATCH(N3,$J$2:$J$4,0)))*($A$2:$A$13=$N$2))
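As a cross-check of the expected totals, here is a small Python sketch of the same two-criteria sum: keep the rows whose year matches, map each ID to a name, and add up all four region columns. The names `SALES`, `NAMES` and `report` are illustrative assumptions; the figures are copied from the tables in the question.

```python
SALES = [
    # (year, id, north, south, west, east)
    (2020, "A", 58, 30, 74, 72), (2020, "A", 85, 40, 90, 79),
    (2020, "B", 9, 82, 20, 5), (2020, "B", 77, 13, 49, 21),
    (2020, "C", 85, 55, 37, 11), (2020, "C", 29, 70, 21, 22),
    (2021, "A", 61, 37, 21, 42), (2021, "A", 22, 39, 2, 34),
    (2021, "B", 62, 55, 9, 72), (2021, "B", 59, 11, 2, 37),
    (2021, "C", 41, 22, 64, 47), (2021, "C", 83, 18, 56, 83),
]
NAMES = {"A": "Allison", "B": "Brandon", "C": "Chris"}


def report(sales, names, year):
    """Sum every region column per employee name for one year."""
    totals = {name: 0 for name in names.values()}
    for row_year, employee_id, *regions in sales:
        if row_year == year:
            totals[names[employee_id]] += sum(regions)
    return totals


print(report(SALES, NAMES, 2021))
```

For 2021 this reproduces the totals shown in the question's Result table.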

Creating missed records - Hive/PySpark

I need to identify previously available records and duplicate them whenever the corresponding month's record is not available.
Here is the table structure:
metrics table:
Metric_id Frequency Metrics_results
293 Monthly 151
293 Monthly 152
293 Monthly 153
294 quarterly 173
294 quarterly 174
295 Annually 195
Metrics_results table:
Metrics_results Month year value
151 Jan 2020 98
152 Feb 2020 98
153 Mar 2020 99
173 Dec 2019 87
174 Mar 2020 86
195 Jan 2020 90
Join metrics and metrics_results table on Metrics_results column for every month and average the value per month.
Expected results:
Metric_id Metrics_results Month year value Flag
293 151 Jan 2020 98 Existing
293 152 Feb 2020 98 Existing
293 153 Mar 2020 99 Existing
294 173 Dec 2019 87 Existing
294 173 Jan 2020 87 Copied
294 173 Feb 2020 87 Copied
294 174 Mar 2020 86 Existing
295 195 Jan 2020 90 Existing
295 195 Feb 2020 90 Copied
295 195 Mar 2020 90 Copied
For the metrics which are evaluated monthly, there will be a corresponding record in the metrics_results table for every month. For the ones which are evaluated quarterly (Mar, Jun, Sep, Dec) or annually (Jan), only the selected months' records are available in metrics_results. For such metrics, the previous available month's record has to be copied over whenever the current month's value is missing, so that it can be used in the averaging.
Eg:
For metric id = 294, there is no record for Jan 2020 or Feb 2020. In this case the Dec 2019 record has to be copied for Jan 2020 and Feb 2020, with the month changed accordingly, as that is the last value available.
For metric id = 295, there is no record for any month other than Jan 2020. This Jan 2020 record must be copied, with the month replaced, for the rest of the year.
I'm looking for a solution either in hive query or in PySpark. Any ideas or suggestions will be appreciated.
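The fill-forward rule described above can be sketched in plain Python before it is translated to Hive or PySpark. This is only an illustration under several assumptions: the two tables are already joined on Metrics_results, the reporting window ends at Mar 2020 (as in the expected results), and the names `month_index` and `fill_missing_months` are hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_index(month, year):
    """Map a (month, year) pair to one increasing integer."""
    return year * 12 + MONTHS.index(month)


def fill_missing_months(records, end_month, end_year):
    """records: (metric_id, result_id, month, year, value) tuples.

    For each metric, walk month by month from its first record to the
    end of the window, copying the last seen record into any gap.
    """
    by_metric = {}
    for rec in records:
        by_metric.setdefault(rec[0], {})[month_index(rec[2], rec[3])] = rec
    end = month_index(end_month, end_year)
    filled = []
    for metric_id in sorted(by_metric):
        existing = by_metric[metric_id]
        last = None
        for idx in range(min(existing), end + 1):
            if idx in existing:
                last = existing[idx]
                filled.append(last + ("Existing",))
            else:
                year, month_pos = divmod(idx, 12)
                filled.append((metric_id, last[1], MONTHS[month_pos],
                               year, last[4], "Copied"))
    return filled


joined = [
    (293, 151, "Jan", 2020, 98), (293, 152, "Feb", 2020, 98),
    (293, 153, "Mar", 2020, 99), (294, 173, "Dec", 2019, 87),
    (294, 174, "Mar", 2020, 86), (295, 195, "Jan", 2020, 90),
]
result = fill_missing_months(joined, "Mar", 2020)
for row in result:
    print(row)
```

In PySpark the same idea is usually expressed by generating the full month range per metric and then taking the last non-null record over an ordered window, but that translation is left open here.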

Panel data - Creating a date variable from year and weeknumber as string

I have a question about panel data for historical prices. I am trying to create a date variable from an Excel file which contains the year and the week number as a string. Is there a way to convert the available information (year and week number as strings) into dates that Stata or Excel can recognise? Thanks very much.
year weeknum Price 1 Price 2
1890 2nd week in Jan 76 90
1890 3rd week in Jan 76 90
1890 4th week in Jan 76 90
1890 2nd week in Feb 76 90
1890 3rd week in Feb 76 90
1890 4th week in Feb 76 90
1890 2nd week in March 76 90
1890 3rd week in March 80 94
1890 4th week in March 80 94
1890 5th week in March 80 94
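One way to think about the conversion is to parse the ordinal and the month name out of the string and pick a representative day for that week. The Python sketch below assumes, purely for illustration, that week n of a month starts on day 1 + 7 × (n − 1); the question does not say how the weeks were defined, so that rule and the function name `week_label_to_date` are assumptions. Note also that Excel's own date serials start in 1900, so 1890 dates may have to stay as text there.

```python
import re
from datetime import date

MONTH_NUMBERS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def week_label_to_date(year, label):
    """Convert e.g. (1890, '3rd week in March') to a date.

    Assumption: week n begins on day 1 + 7 * (n - 1) of the month.
    """
    match = re.match(r"(\d+)(?:st|nd|rd|th) week in (\w+)",
                     label.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised week label: {label!r}")
    week = int(match.group(1))
    # Only the first three letters matter, so 'Mar' and 'March' both work
    month = MONTH_NUMBERS[match.group(2)[:3].lower()]
    return date(year, month, 1 + 7 * (week - 1))


print(week_label_to_date(1890, "2nd week in Jan"))
print(week_label_to_date(1890, "5th week in March"))
```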

Excel - Summing up consecutive values based on drop-down selection

I have values like:
Month Price
Jan 10
Feb 20
Mar 30
............
Dec 50
I have a dropdown for selecting the month.
If the user picks Feb,
then the sum displayed should be 30 (Jan + Feb).
Please help me out! I have tried a lot of Excel functions and ended up frustrated.
Very interesting idea.
Formula in B1 (the selected month is in A1, the Month and Price lists are in columns D and E):
=SUM(INDIRECT("E1:E"&MATCH(A1,D:D,0)))
Hope this will help you.
A B C D E
Feb 30 Month Price
Jan 10
Feb 20
Mar 30
Apr 40
May 50
Jun 60
Jul 70
Aug 80
Sep 90
Oct 100
Nov 110
Dec 120
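The formula finds the position of the selected month and sums every price from the top of the list down to that position. A minimal Python sketch of the same running-total idea follows; the names `PRICES` and `sum_through` are illustrative, and the prices are the ones from the sample layout above.

```python
PRICES = [
    ("Jan", 10), ("Feb", 20), ("Mar", 30), ("Apr", 40),
    ("May", 50), ("Jun", 60), ("Jul", 70), ("Aug", 80),
    ("Sep", 90), ("Oct", 100), ("Nov", 110), ("Dec", 120),
]


def sum_through(prices, selected):
    """Add prices from the first month up to and including `selected`."""
    total = 0
    for month, price in prices:
        total += price
        if month == selected:
            return total
    raise ValueError(f"month not found: {selected!r}")


print(sum_through(PRICES, "Feb"))
```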
