With pandas, I'm trying to rename the unnamed columns in a dataframe using the values in the first row of data.
My dataframe:
id  store  unnamed: 1  unnamed: 2  windows  unnamed: 3  unnamed: 4
0   B1     B2          B3          B1       B2          B3
1   2      c           12          15       15          14
2   4      d           35          14       14          87
My wanted result:
id  store_B1  store_B2  store_B3  windows_B1  windows_B2  windows_B3
0   B1        B2        B3        B1          B2          B3
1   2         c         12        15          15          14
2   4         d         35        14          14          87
I don't know how I can match the column name with the value in my data. Thanks for your help. Regards
You can use df.columns.where to make unnamed: columns NaN, then convert it to a Series and use ffill:
df.columns = pd.Series(df.columns.where(~df.columns.str.startswith('unnamed:'))).ffill() + np.where(~df.columns.isin(['id','col2']), ('_' + df.iloc[0].astype(str)).tolist(), '')
Output:
>>> df
   id store_B1 store_B2 store_B3 windows_B1 windows_B2 windows_B3
0   0       B1       B2       B3         B1         B2         B3
1   1        2        c       12         15         15         14
2   2        4        d       35         14         14         87
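For readers who find the one-liner hard to follow, here is a step-by-step sketch of the same idea: carry the last named header forward over the unnamed columns, then append the first-row value. The helper name rename_from_first_row and its keep and prefix parameters are made up for this illustration, and the dataframe is rebuilt from the question's data.

```python
import pandas as pd


def rename_from_first_row(df, keep=("id",), prefix="unnamed:"):
    """Hypothetical helper: build '<last named header>_<first-row value>' names."""
    new_cols = []
    last_named = None
    for col, first_value in zip(df.columns, df.iloc[0]):
        if not str(col).startswith(prefix):
            last_named = col  # remember the most recent real header
        if col in keep:
            new_cols.append(col)  # columns such as 'id' are left unchanged
        else:
            new_cols.append(f"{last_named}_{first_value}")
    out = df.copy()
    out.columns = new_cols
    return out


# Dataframe from the question (values copied from the post).
df = pd.DataFrame(
    [
        [0, "B1", "B2", "B3", "B1", "B2", "B3"],
        [1, 2, "c", 12, 15, 15, 14],
        [2, 4, "d", 35, 14, 14, 87],
    ],
    columns=["id", "store", "unnamed: 1", "unnamed: 2",
             "windows", "unnamed: 3", "unnamed: 4"],
)

renamed = rename_from_first_row(df)
print(list(renamed.columns))
```

This assumes the first column is a named one; if the frame started with an unnamed column, the sketch would produce a "None_..." name for it.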
I would like to do something similar to Oracle's LISTAGG in Webi. Below are my queries.
Query 1
Id M1 ; columns
1 10
2 20
3 30
4 40
5 50
Query 2
Id D1 ; column
1 A11
1 A12
1 A13
2 A21
2 A22
2 A23
2 A24
3 A31
Wanted outcome, merging Query 1 and Query 2 by Id:
Id M1 New Column
1 10 A11;A12;A13
2 20 A21;A22;A23;A24
3 30 A31
4 40
5 50
I can get as far as the table below. I then use NoFilter to keep the values intact when a filter is applied. However, column F2 shows "#MULTIVALUE". I can get NoFilter to work with one query, but with two queries like this it doesn't work. Any suggestions for addressing this?
Id M1 F1 (Measure) F2
1 10 A11;A12;A13 =NoFilter([F1])
1 10 A11;A12;A13
1 10 A11;A12;A13
2 20 A21;A22;A23;A24
2 20 A21;A22;A23;A24
2 20 A21;A22;A23;A24
2 20 A21;A22;A23;A24
3 30 A31
4 40
5 50
I wonder if anyone could show me how to achieve this.
Many thanks for your help,
Andre
I have a requirement to create a GROUP_ID based on the information in two other fields. All rows with the same ID_1 value must share one GROUP_ID, and likewise all rows with the same ID_2 value must share one GROUP_ID. The GROUP_ID values need not be contiguous.
ID_1 ID_2 GROUP_ID
X1 10 1
X1 20 1
Y1 30 2
Y2 30 2
A1 100 3
A1 200 3
B1 200 3
B1 200 3
B1 300 3
B1 300 3
C1 300 3
C1 400 3
I am using PySpark and tried to solve this in Spark SQL using window functions (see below), but was unable to produce the desired output. Please let me know if there is an efficient way to solve this. My dataset has more than 100M rows.
RowNum ID_1 ID_2 ID_1_1 ID_2_1 GROUP_ID
1 X1 10 1 1 1
2 X1 20 1 1 1
3 Y1 30 3 3 3
4 Y2 30 4 3 3
5 A1 100 5 5 5
6 A1 200 5 5 5
7 B1 200 7 5 5
8 B1 200 7 5 5
9 B1 300 7 7 5
10 B1 300 7 7 5
11 C1 300 11 7 7
12 C1 400 11 11 7
Where
ID_1_1 = First(ROWNUM) over (Partition by ID_1 order by RowNum)
ID_2_1 = First(ID_1_1) over (Partition by ID_2 order by ID_1_1)
Group_ID = First(ID_2_1) over (Partition by ID_1_1 order by ID_2_1)
Using the above approach, rows 11 and 12 get a GROUP_ID of 7 instead of 5.
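One way to read the requirement is that the groups are connected components: two rows belong together whenever they are linked through a chain of shared ID_1 or ID_2 values, which a fixed number of window passes cannot follow. The sketch below shows that reading with a small in-memory union-find in plain Python. The function name assign_group_ids is made up for this illustration, and this is not a solution for 100M rows; at that scale a distributed connected-components routine (for example the one in the GraphFrames package) would be the thing to look at.

```python
def assign_group_ids(rows):
    """rows: list of (id_1, id_2) pairs. Returns one group label per row."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag the values so an ID_1 value can never collide with an ID_2 value.
    for id_1, id_2 in rows:
        union(("ID_1", id_1), ("ID_2", id_2))

    labels = {}
    result = []
    for id_1, _ in rows:
        root = find(("ID_1", id_1))
        # Number the components in order of first appearance.
        result.append(labels.setdefault(root, len(labels) + 1))
    return result


# Sample rows from the question.
rows = [
    ("X1", 10), ("X1", 20), ("Y1", 30), ("Y2", 30),
    ("A1", 100), ("A1", 200), ("B1", 200), ("B1", 200),
    ("B1", 300), ("B1", 300), ("C1", 300), ("C1", 400),
]
group_ids = assign_group_ids(rows)
print(group_ids)
```

On the sample data this puts the C1 rows in the same group as A1 and B1, because C1 shares 300 with B1, which in turn shares 200 with A1.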
I have a large spreadsheet that needs a bit of fine-tuning. Column A contains the numbers 1-4000, listed sequentially. Column B contains some of those same numbers, but many are skipped (for example: 1, 5, 6, 7, 11, 12, 25...); those numbers are not "spaced out" relative to the numbers in column A. Column C contains text entries related to the Column B values. I need to add a formula in column D that will evaluate whether the number in A1 (for example) matches the number in B1, and if it does match it should place the value of C1 in D1. Likewise, if A2 matches B2, then the value of C2 should be placed in D2.
I tried writing some formulas similar to this, but they become far too large when you have to take into account every cell combination: =IF(A1=E1,F1). If I can just get the column B cells to "space themselves out" so that they match up with the cells in column A, that would get me where I need to be. Of course, the Column C data values would need to "follow along" with the Column B values when they are spaced out.
Any ideas on how I can make this happen?
If the data is like this:
A B C
---------------
1 1 Red
2 5 Blue
3 6 Red
4 7 Yellow
5 11 Black
6 12 Green
7 25 Yellow
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
And the desired result is this:
A    B    C        D
--------------------------
1    1    Red      Red
2    5    Blue
3    6    Red
4    7    Yellow
5    11   Black    Blue
6    12   Green    Red
7    25   Yellow   Yellow
8
9
10
11                 Black
12                 Green
13
14
15
16
17
18
19
20
21
22
23
24
25                 Yellow
We can accomplish that using an INDEX-MATCH based formula starting in D1 and auto filling down:
=IFERROR(INDEX(C:C,MATCH(A1,B:B,0)),"")
It will look like this:
A B C D
-----------------------------------------------------------
1 1 Red =IFERROR(INDEX(C:C,MATCH(A1,B:B,0)),"")
2 5 Blue =IFERROR(INDEX(C:C,MATCH(A2,B:B,0)),"")
3 6 Red =IFERROR(INDEX(C:C,MATCH(A3,B:B,0)),"")
4 7 Yellow =IFERROR(INDEX(C:C,MATCH(A4,B:B,0)),"")
5 11 Black =IFERROR(INDEX(C:C,MATCH(A5,B:B,0)),"")
6 12 Green =IFERROR(INDEX(C:C,MATCH(A6,B:B,0)),"")
7 25 Yellow =IFERROR(INDEX(C:C,MATCH(A7,B:B,0)),"")
8 =IFERROR(INDEX(C:C,MATCH(A8,B:B,0)),"")
9 =IFERROR(INDEX(C:C,MATCH(A9,B:B,0)),"")
10 =IFERROR(INDEX(C:C,MATCH(A10,B:B,0)),"")
11 =IFERROR(INDEX(C:C,MATCH(A11,B:B,0)),"")
12 =IFERROR(INDEX(C:C,MATCH(A12,B:B,0)),"")
13 =IFERROR(INDEX(C:C,MATCH(A13,B:B,0)),"")
14 =IFERROR(INDEX(C:C,MATCH(A14,B:B,0)),"")
15 =IFERROR(INDEX(C:C,MATCH(A15,B:B,0)),"")
16 =IFERROR(INDEX(C:C,MATCH(A16,B:B,0)),"")
17 =IFERROR(INDEX(C:C,MATCH(A17,B:B,0)),"")
18 =IFERROR(INDEX(C:C,MATCH(A18,B:B,0)),"")
19 =IFERROR(INDEX(C:C,MATCH(A19,B:B,0)),"")
20 =IFERROR(INDEX(C:C,MATCH(A20,B:B,0)),"")
21 =IFERROR(INDEX(C:C,MATCH(A21,B:B,0)),"")
22 =IFERROR(INDEX(C:C,MATCH(A22,B:B,0)),"")
23 =IFERROR(INDEX(C:C,MATCH(A23,B:B,0)),"")
24 =IFERROR(INDEX(C:C,MATCH(A24,B:B,0)),"")
25 =IFERROR(INDEX(C:C,MATCH(A25,B:B,0)),"")
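To make the lookup logic explicit outside the spreadsheet, here is a small Python sketch of what the formula does for each row: look up the column A value among the column B values and return the matching column C text, or an empty string when there is no match. The function name lookup_column_d is made up for this illustration.

```python
def lookup_column_d(col_a, col_b, col_c):
    """Mimic =IFERROR(INDEX(C:C,MATCH(A,B:B,0)),"") for every row of column A."""
    first_match = {}
    for key, value in zip(col_b, col_c):
        # MATCH with match type 0 returns the first exact match, so keep the first.
        first_match.setdefault(key, value)
    # IFERROR(..., "") becomes a default of "" when the key is missing.
    return [first_match.get(a, "") for a in col_a]


# Sample data from the tables above.
col_a = list(range(1, 26))
col_b = [1, 5, 6, 7, 11, 12, 25]
col_c = ["Red", "Blue", "Red", "Yellow", "Black", "Green", "Yellow"]

col_d = lookup_column_d(col_a, col_b, col_c)
for a, d in zip(col_a, col_d):
    print(a, d)
```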
Is there a way to use a reference to a cell containing a text string as the array criteria in the following formula?
=SUM(SUMIF(A5:A10,{1,22,3},E5:E10))
So instead of typing {1,22,3} into the formula, "1, 22, 3" is entered in cell A2 and the formula becomes
=SUM(SUMIF(A5:A10,A2,E5:E10))
I have tried this but get 0 as a result (refer to C16):
A B C D E F G H
1 Tree
2 {1,22,3} 1
3 22
4 Tree Profit 3
5 1 105
6 2 96
7 1 105
8 1 75
9 2 76.8
10 1 45
11
12 330 =SUM(SUMIF(A5:A10,{1,22,3},B5:B10))
13
14 330 =SUMPRODUCT(SUMIF(A5:A10,E2:E3,B5:B10))
15
16 0 =SUM(SUMIF(A5:A10,A2,B5:B10))
17 NB: Custom Format "{"#"}" on Cell A2 I enter 1,22,3 so it displays {1,22,3}
Ok so after some further searching (see Excel string to criteria) and trial and error I have come up with the following solution.
Using Name Manager I created a defined name called GetList, which refers to:
=EVALUATE(Sheet1!$A$3)
NB: Cell A3 has this formula in it: =TEXT(A2,"{#}")
I then used the following formula:
=SUMPRODUCT(SUMIF($A$5:$A$12,GetList,$B$5:$B$12))
which gives the desired result of 321 as per the other two formulas (see D12 below).
If anyone can suggest a better solution then feel free to do so.
Thanks to Dennis regarding the table in my original post.
A B C D E
1 Tree
2 1,22,3 1
3 {1,22,3} =TEXT(A2,"{#}") 22
4 Tree Profit 3
5 11 105
6 22 96
7 1 105
8 3 75
9 2 76.8
10 1 45
11
12 321 =SUMPRODUCT(SUMIF($A$5:$A$12,GetList,$B$5:$B$12))
13
14 321 =SUM(SUMIF(A5:A10,{1,22,3},B5:B10))
15
16 321 =SUMPRODUCT(SUMIF(A5:A10,E2:E3,B5:B10))
17
18 0 =SUM(SUMIF(A5:A10,A2,B5:B10))
19 NB: Custom Format "{"#"}" on Cell A2 I enter 1,22,3 so it displays {1,22,3}
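As a cross-check of the 321 result, the calculation that SUM(SUMIF(...)) performs with an array criterion can be sketched outside Excel in Python: parse the criteria text into a list, compute one conditional sum per criterion, and add those up. The function name sum_for_criteria is made up for this illustration; the Tree and Profit values are the ones in rows 5 to 10 of the second table.

```python
def sum_for_criteria(criteria_text, trees, profits):
    """Sum the profits whose tree number appears in a '1,22,3' style string."""
    # Strip optional braces and spaces, then split into integers.
    criteria = [int(part) for part in criteria_text.strip("{} ").split(",")]
    # One conditional sum per criterion (like SUMIF), then the total (like SUM).
    return sum(
        sum(p for t, p in zip(trees, profits) if t == c)
        for c in criteria
    )


trees = [11, 22, 1, 3, 2, 1]
profits = [105, 96, 105, 75, 76.8, 45]

total = sum_for_criteria("1,22,3", trees, profits)
print(total)
```

Passing the text with braces, "{1,22,3}", gives the same total, which mirrors what the EVALUATE step does when it turns the displayed text into an array constant.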