There is lotto draw (5 numbers) on each row. I have formula which calculates the most frequient numbers with their number of draws. Is it possible in end result to sort same number of draws results by row value. This means that if number is drawn on top rows will have grater value than those on bottom rows. Considering number of row to be a value. How is that possible?
Formula used:
=LET(flatten, TEXTSPLIT(TEXTJOIN(";",,A1:F27),,";"), numUq, UNIQUE(flatten), matches, XMATCH(flatten,numUq),SORT(HSTACK(numUq, DROP(FREQUENCY(matches, UNIQUE(matches)),-1)),2,-1))
In the example screenshot number 35 and number 13 have equal draws count, but 13 should be before 35.
Data:
A
B
C
D
E
F
18
35
31
13
37
10
43
47
36
13
6
19
6
12
6
35
14
1
43
24
45
7
21
16
37
39
44
24
12
40
39
8
34
28
49
46
27
44
15
46
45
12
22
0
10
5
28
28
4
7
23
6
44
41
30
22
47
13
29
29
37
9
26
44
39
10
30
17
21
20
41
22
43
35
0
22
13
9
14
22
42
20
32
21
13
38
48
6
14
2
11
47
20
20
23
6
22
26
1
25
45
31
27
39
6
44
3
24
22
45
34
17
5
13
16
23
20
7
30
16
25
21
7
34
1
35
32
34
1
9
10
32
23
35
11
3
6
12
5
30
4
20
33
15
26
10
8
28
16
11
21
14
3
38
10
42
16
3
26
48
30
28
Link to file
Here it is on a bit of the data. Here I have added a third column based on the average row of each unique number and sorted first on frequency then on row average:
=LET(range,A1:F3,uniques,UNIQUE(TOCOL(range)),rows,SEQUENCE(ROWS(range)),
avrow,BYROW(uniques,LAMBDA(uniq,SUM((range=uniq)*rows/SUM(--(range=uniq))))),
freq,DROP(FREQUENCY(range,uniques),-1),
SORTBY(HSTACK(uniques,freq,avrow),freq,-1,avrow,1))
Can 6 really occur twice in the same draw? Maybe not, but it doesn't affect the answer.
EDIT
Here is a version based on your original formula:
=LET(range,A1:F27,
flatten, TEXTSPLIT(TEXTJOIN(";",,A1:F27),,";"),
numUq, UNIQUE(flatten),
rows,SEQUENCE(ROWS(range)),
matches, XMATCH(flatten,numUq),
avrow,BYROW(numUq,LAMBDA(numUq,SUM((range=--numUq)*rows/SUM(--(range=--numUq))))),
freq,DROP(FREQUENCY(matches, UNIQUE(matches)),-1),
SORTBY(HSTACK(numUq,freq,avrow),freq,-1,avrow,1))
Full Dataset
The sorting is based on number of appearances and average row, but you could use other measures like row of first appearance if you wanted to.
Different approach:
=LET(data,A1:F27,
a,TOCOL(data),
b,MMULT(--(TRANSPOSE(a)=a),SEQUENCE(COUNTA(a),,1,0)),
c,TOCOL(IF(ISNUMBER(data),MAX(ROW(data)+1)-ROW(data)^99)),
d,MMULT(--(TRANSPOSE(a)=a),c),
s,SORTBY(HSTACK(a,b),b,-1,d,1),
UNIQUE(s))
a "flattens" the data using TOCOL.
b creates a "countif" of the drawn values in a using MMULT.
c returns the maximum row value of the data + 1 minus the row value of each value found ^99.
^99 because I want the number to be higher if it would be found in the first row only versus if it was found in each row except the first.
d returns a "sumif" of the calculated row values of c against the values of a.
We than only need a and b for the list using HSTACK, but we need them sorted by the count b descending and sorted by the sumif d ascending using SORTBY.
This will sort it as you illustrated it.
If it's a tie (36 and 19 in the data) it will show the first in row first.
I have the below data
df1
Hema shiva Ishan
0 22 30 33
1 34 32 21
2 20 12 14
3 26 14 18
4 12 28 17
5 30 11 22
6 18 15 18
7 19 18 19
8 22 20 32
I wanted to take ratio of first column value with rest of the columns , eg first column should divide by 22 , 2nd column 30 and 3rd columns by 33 .
The answer is below .
Please help me if I missing something
Just divide the first row by the DF:
df.iloc[0] / df
I have a time series as a dataframe. The first column is the week number, the second are values for that week. The first week (22) and the last week (48), are the lower and upper bounds of the time series. Some weeks are missing, for example, there is no week 27 and 28. I would like to resample this series such that there are no missing weeks. Where a week was inserted, I would like the corresponding value to be zero. This is my data:
week value
0 22 1
1 23 2
2 24 2
3 25 3
4 26 2
5 29 3
6 30 3
7 31 3
8 32 7
9 33 4
10 34 5
11 35 4
12 36 2
13 37 3
14 38 10
15 39 5
16 40 7
17 41 10
18 42 11
19 43 15
20 44 9
21 45 13
22 46 5
23 47 6
24 48 2
I am wondering if this can be achieved in Pandas without creating a loop from scratch. I have looked into pd.resample, but can't achieve the results I am looking for.
I would set week as index, reindex with fill_value option:
start, end = df['week'].agg(['min','max'])
df.set_index('week').reindex(np.arange(start, end+1), fill_value=0).reset_index()
Output (head):
week value
0 22 1
1 23 2
2 24 2
3 25 3
4 26 2
5 27 0
6 28 0
7 29 3
8 30 3
I have two dirs base and to_move. There are 10 files in base, which are named
0 1 2 3 4 5 6 7 8 9, and 3 files, 0 1 2, in to_move. What I want is to move the 3 files in to_move to base, with their names changed to 10 11 12.
Inside the dir to_move, I run the command
tmp=$(ls);for item in ${tmp[#]};do dst=$((item+10));echo $dst $item;done
what I got is
10 0
11 1
12 2
11 1
20 10
21 11
22 12
23 13
24 14
25 15
26 16
27 17
28 18
29 19
12 2
30 20
31 21
32 22
33 23
34 24
35 25
36 26
37 27
38 28
13 3
14 4
15 5
16 6
17 7
18 8
19 9
This makes no sense to me, it seems $(($item+10)) has some weird effects on $item.
Why this happens? And how can I modify the command to get this output?
10 0
11 1
12 2
I need to create a simple regression with the Data Analysis Toolpack. The thing is, the range for Y and X input is always different. To illustrate what I'm trying to say, here's an example of the table I need to work on:
A B C D E F G H I J K L
1 Y T T1 T2 T3 T4 T5 T6 T7 T8 T9 T10
2 19 1
3 13 2 19
4 14 3 13 19
5 16 4 14 13 19
6 17 5 16 14 13 19
7 16 6 17 16 14 13 19
8 20 7 16 17 16 14 13 19
9 10 8 20 16 17 16 14 13 19
10 20 9 10 20 16 17 16 14 13 19
11 11 10 20 10 20 16 17 16 14 13 19
12 11 11 11 20 10 20 16 17 16 14 13 19
13 14 12 11 11 20 10 20 16 17 16 14 13
14 15 13 14 11 11 20 10 20 16 17 16 14
15 17 14 15 14 11 11 20 10 20 16 17 16
16 10 15 17 15 14 11 11 20 10 20 16 17
17 4 16 10 17 15 14 11 11 20 10 20 16
18 15 17 4 10 17 15 14 11 11 20 10 20
19 6 18 15 4 10 17 15 14 11 11 20 10
20 10 19 6 15 4 10 17 15 14 11 11 20
21 16 20 10 6 15 4 10 17 15 14 11 11
22 16 10 6 15 4 10 17 15 14 11
23 16 10 6 15 4 10 17 15 14
24 16 10 6 15 4 10 17 15
25 16 10 6 15 4 10 17
26 16 10 6 15 4 10
27 16 10 6 15 4
28 16 10 6 15
29 16 10 6
30 16 10
31 16
In this example, The Y input would be range A12:A21, that's because the first entry in the last column of the table (the "19" in cell L12) is in row 12 AND The last entry in the first column of the table (the "16" in cell A21) is in row 21; furthermore, The X input would be region B12:L21 for the same reasons.
After doing the first regression, I need to delete two columns out of the table and afterwards do ANOTHER regression. So if, for example I need to delete Columns J and L, the table would look like this:
A B C D E F G H I J
1 Y T T1 T2 T3 T4 T5 T6 T7 T9
2 19 1
3 13 2 19
4 14 3 13 19
5 16 4 14 13 19
6 17 5 16 14 13 19
7 16 6 17 16 14 13 19
8 20 7 16 17 16 14 13 19
9 10 8 20 16 17 16 14 13 19
10 20 9 10 20 16 17 16 14 13
11 11 10 20 10 20 16 17 16 14 19
12 11 11 11 20 10 20 16 17 16 13
13 14 12 11 11 20 10 20 16 17 14
14 15 13 14 11 11 20 10 20 16 16
15 17 14 15 14 11 11 20 10 20 17
16 10 15 17 15 14 11 11 20 10 16
17 4 16 10 17 15 14 11 11 20 20
18 15 17 4 10 17 15 14 11 11 10
19 6 18 15 4 10 17 15 14 11 20
20 10 19 6 15 4 10 17 15 14 11
21 16 20 10 6 15 4 10 17 15 11
22 16 10 6 15 4 10 17 14
23 16 10 6 15 4 10 15
24 16 10 6 15 4 17
25 16 10 6 15 10
26 16 10 6 4
27 16 10 15
28 16 6
29 10
30 16
And now the regression would be with inputs Y (A11:A21) because the first entry in the last column of the table ("19" in cell J11) is in row 11 AND The last entry in the first column of the table ("16" in cell A21) is in row 21. Likewise the X input would be (B11:J21) for the same reasons.
I have tried in a hundred different ways, but no luck. This is the closest I've been to creating what I need, but I'm still lost since it won't execute the regression:
Sub Prueba1()
Range("A1").Select
Selection.End(xlToRight).Select
Selection.End(xlDown).Select
Selection.End(xlToLeft).Select
Application.Run "ATPVBAEN.XLAM!Regress", Range(Selection, Selection.End(xlDown)).Select, _
Range(Selection.Offset(, 1), Selection.End(xlToRight)).Select, False, False, , Range("S1") _
, False, False, False, False, , False
End Sub
This User Defined Function (aka UDF) will return the range into your Application.Run "ATPVBAEN.XLAM!Regress" as a parameter.
Function regress_range()
Dim strAddr As String, c As Long
With Worksheets("Sheet4") '<~~set this worksheet name!
With .Cells(1, 1).CurrentRegion
Set regress_range = .Range(.Cells(.Cells(1, .Columns.Count).End(xlDown).Row, 1), _
.Cells(Application.Match(1E+99, .Columns(1)), .Columns.Count))
End With
End With
End Function
You need to make sure that it is properly referencing the correct worksheet in the third line.
This would become part of the run command like,
Application.Run "ATPVBAEN.XLAM!Regress", regress_range(), False, False, , Range("S1") _
, False, False, False, False, , False
I'm still concerned how Range("S1") may change (i.e. shift right) if columns are deleted from the regression range. Additionally, it has no explicitly referenced parent worksheet.
Output starting at your original data block:
$A$12:$L$21
$A$11:$J$21