Adding B where A is same [in excel] - excel

I want to add up the values in rows where a variable is the same.
sample.csv:
A B
corn 56
apple 43
banana 54
corn 87
mango 63
apple 67
corn 30
I want to add the values of B where A is the same and store the result in another column, as follows:
corn 173
apple 110
banana 54
mango 63
Can I do this in Excel? If so, what is the formula? I searched a lot but couldn't find a solution.

Try this:
Select an empty cell.
Go to Data - Data Tools - Consolidate
Select the range (both columns)
Press Add
Tick "Left column"
Press OK
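For readers who want to check the totals outside Excel, here is a minimal Python sketch of the same idea: sum column B grouped by the label in column A. The function name `sum_by_key` is made up for this illustration, and the rows are the sample data from the question.

```python
# Sample rows from the question: (A, B)
rows = [
    ("corn", 56),
    ("apple", 43),
    ("banana", 54),
    ("corn", 87),
    ("mango", 63),
    ("apple", 67),
    ("corn", 30),
]


def sum_by_key(rows):
    """Sum the second field of each row, grouped by the first field.

    A plain dict keeps insertion order, so labels appear in the
    order they are first seen in the data.
    """
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


totals = sum_by_key(rows)
for name, total in totals.items():
    print(name, total)
```

Each label ends up with a single total, listed in order of first appearance.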

Related

Look closest value in column C from column A Excel array formula

I have numbers in column A and their corresponding codes in column B. For each number in column C, I want to find the closest value in column A and show its code from column B.
I use the following array formula in D2. It works for almost all cases, but it returns the wrong value for the last number in column C (38909). The closest value to 38909 in column A is 38909 itself, whose code is PUK, but the formula returns JIM, which belongs to 3890947021.
Could somebody help me fix the formula so that it matches all cases in this table? Thanks.
The formula I have is:
=IF($C2="","",IFERROR(LOOKUP(9.99999999999999E+307,
SEARCH(IF(LEN($A$2:$A$15)>LEN($C2),"|"
&LEFT($A$2:$A$15,LEN($C2)),"|"
&$A$2:$A$15),"|"&$C2),
$B$2:$B$15),"NOT FOUND"))
Table
[A]         [B]   [C]        [D]         [E]
CC          CODE  NUMBERS    RESULT NOW  RESULT EXPECTED
237         CMR   18763044   JAM         JAM
230         MUS   187635     JAM         JAM
61          AUS   23092      MUS         MUS
31          NLD   3162       NLD         NLD
599         ANT   38050      NOT FOUND   NOT FOUND
358         FIN   33         FRA         FRA
33751       FRA   49185      NOT FOUND   NOT FOUND
65          SGP   51078      NOT FOUND   NOT FOUND
1721        SXM   1246       BRB         BRB
1876        JAM   389094702  JIM         JIM
81          JPN   38909      JIM         PUK
124622      BRB
38909       PUK
3890947021  JIM
Update
After some hours I was able to get this formula that works if the values in column A are sorted ascending.
=IF($C2="","",IFERROR(
LOOKUP(9.99999999999999E+307,
SEARCH(
IF(OR($A$2:$A$15=$C2),"|"&$A$2:$A$15,
IF(LEN($A$2:$A$15)>LEN($C2),
"|"&LEFT($A$2:$A$15,LEN($C2)),
"|"&$A$2:$A$15)),
"|"&$C2),$B$2:$B$15),"NOT FOUND"))
Thanks for your comments.
If I understand your formula correctly, it does exactly what it should. The problem is that 38909 is a prefix of 3890947021. For 38909 your formula has two possible matches, so it returns the last one. If you change the order in your columns A and B like this:
[A] [B]
CC CODE
237 CMR
230 MUS
61 AUS
31 NLD
599 ANT
358 FIN
33751 FRA
65 SGP
1721 SXM
1876 JAM
81 JPN
124622 BRB
3890947021 JIM
38909 PUK
you'll get PUK for both numbers 389094702 and 38909 in column C. The only way I see is to add another condition to the formula that checks whether there is an exact match in column A, but that still wouldn't solve your problem if the order were reversed.
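To make the order dependence concrete, here is a hedged Python sketch of one possible order-independent rule. It is an interpretation inferred from the RESULT EXPECTED column, not the asker's formula: a value in A is a candidate when it is a prefix of the number or the number is a prefix of it, the candidate with the longest shared prefix wins, and ties go to the candidate closest in length (so an exact match beats a longer value). The name `closest_code` is hypothetical.

```python
# Columns A and B from the question, kept as strings for prefix matching.
TABLE = [
    ("237", "CMR"), ("230", "MUS"), ("61", "AUS"), ("31", "NLD"),
    ("599", "ANT"), ("358", "FIN"), ("33751", "FRA"), ("65", "SGP"),
    ("1721", "SXM"), ("1876", "JAM"), ("81", "JPN"), ("124622", "BRB"),
    ("38909", "PUK"), ("3890947021", "JIM"),
]


def closest_code(number, table=TABLE):
    """Return the code of the best prefix match, or "NOT FOUND".

    Assumed rule (inferred from the expected results, not stated
    in the question): longest shared prefix first, then the
    smallest difference in length.
    """
    best_key, best_code = None, "NOT FOUND"
    for cc, code in table:
        if cc.startswith(number) or number.startswith(cc):
            overlap = min(len(cc), len(number))
            key = (overlap, -abs(len(cc) - len(number)))
            if best_key is None or key > best_key:
                best_key, best_code = key, code
    return best_code


for n in ["18763044", "33", "38050", "389094702", "38909"]:
    print(n, closest_code(n))
```

Because the winner is chosen by a score rather than by position, reordering the rows of the table does not change the result.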

sort pyspark dataframe within groups

I would like to sort column "time" within each "id" group.
The data looks like:
id time name
132 12 Lucy
132 10 John
132 15 Sam
78 11 Kate
78 7 Julia
78 2 Vivien
245 22 Tom
I would like to get this:
id time name
132 10 John
132 12 Lucy
132 15 Sam
78 2 Vivien
78 7 Julia
78 11 Kate
245 22 Tom
I tried
df.orderBy(['id','time'])
But I don't need to sort "id".
I have two questions:
Can I sort "time" only within the same "id"? If so, how?
Would sorting only "time" be more efficient than using orderBy() to sort both columns?
This is exactly what windowing is for.
You can create a window partitioned by the "id" column and sorted by the "time" column. Next you can apply any function on that window.
# Create a Window
from pyspark.sql.window import Window
w = Window.partitionBy(df.id).orderBy(df.time)
Now apply any function over this window.
For example, let's say you want to create a column with the time delta between consecutive rows within the same group:
import pyspark.sql.functions as f
df = df.withColumn("timeDelta", df.time - f.lag(df.time,1).over(w))
I hope this gives you an idea. Effectively you have sorted your dataframe using the window and can now apply any function to it.
If you just want to view your result, you could find the row number and sort by that as well.
df.withColumn("order", f.row_number().over(w)).sort("order").show()
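As a plain-Python illustration of what "sorted by time within each id, without sorting id" means (this is not PySpark, just a sketch on the sample rows), the groups can be kept in order of first appearance while each group is ordered by time. The helper name `sort_within_groups` is made up for this example.

```python
# Sample rows from the question: (id, time, name)
rows = [
    (132, 12, "Lucy"),
    (132, 10, "John"),
    (132, 15, "Sam"),
    (78, 11, "Kate"),
    (78, 7, "Julia"),
    (78, 2, "Vivien"),
    (245, 22, "Tom"),
]


def sort_within_groups(rows):
    """Sort by time inside each id group, keeping groups in the
    order in which each id first appears in the input."""
    first_seen = {}
    for row in rows:
        # Remember the position at which each id shows up first.
        first_seen.setdefault(row[0], len(first_seen))
    return sorted(rows, key=lambda r: (first_seen[r[0]], r[1]))


result = sort_within_groups(rows)
for row in result:
    print(*row)
```

The ids stay in the order 132, 78, 245, and only "time" is ordered inside each group.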

Ranking Dates Based on Another Column - Spotfire

Does anyone know of a way to get around the Spotfire limitation on using the OVER function to rank or order dates in a custom expression?
As background, I am trying to number each lease in the data below as 1, 2, 3, etc. For example, since 63 appears twice in the left column, I would like to return a 1 and a 2 to identify the two different leases, starting on 1/1/2016 and 8/1/2016. Then a 1 and a 2 for 72, a 1 for 140, and so on. Unfortunately, OVER functions can only be used with aggregation methods, and I don't know of another method that produces the result I am looking for.
Tenant Lease_From Lease_To Tenant_status
63 1/1/2016 1/31/2017 Current
63 8/1/2017 7/31/2018 Current
72 10/1/2016 7/31/2017 Current
72 8/1/2017 7/31/2018 Current
140 2/1/2017 7/31/2018 Current
149 8/1/2016 7/31/2017 Current
149 8/1/2017 7/31/2018 Current
156 1/15/2017 3/31/2018 Current
156 4/1/2018 3/31/2019 Current
Use this:
Rank([Lease_From], [Tenant])
Gives this as the result:
Tenant Lease_From Lease_To Tenant_status Rank([Lease_From], [Tenant])
63 1/1/2016 1/31/2017 Current 1
63 8/1/2017 7/31/2018 Current 2
72 10/1/2016 7/31/2017 Current 1
72 8/1/2017 7/31/2018 Current 2
140 2/1/2017 7/31/2018 Current 1
149 8/1/2016 7/31/2017 Current 1
149 8/1/2017 7/31/2018 Current 2
156 1/15/2017 3/31/2018 Current 1
156 4/1/2018 3/31/2019 Current 2
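For anyone checking the numbers outside Spotfire, here is a small Python sketch of the same idea: an ascending rank of Lease_From computed separately for each Tenant. It is only an illustration of the grouping logic, and the function name `rank_within_group` is an assumption of this sketch, not a Spotfire API.

```python
from collections import defaultdict
from datetime import datetime

# (Tenant, Lease_From) pairs from the table above
rows = [
    (63, "1/1/2016"), (63, "8/1/2017"),
    (72, "10/1/2016"), (72, "8/1/2017"),
    (140, "2/1/2017"),
    (149, "8/1/2016"), (149, "8/1/2017"),
    (156, "1/15/2017"), (156, "4/1/2018"),
]


def rank_within_group(rows):
    """Return, for each row, 1 + the number of earlier lease start
    dates that belong to the same tenant."""
    dates_by_tenant = defaultdict(list)
    parsed = []
    for tenant, lease_from in rows:
        date = datetime.strptime(lease_from, "%m/%d/%Y")
        dates_by_tenant[tenant].append(date)
        parsed.append((tenant, date))
    return [
        1 + sum(1 for d in dates_by_tenant[tenant] if d < date)
        for tenant, date in parsed
    ]


ranks = rank_within_group(rows)
print(ranks)
```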
Please consider @blakeoft's answer as the correct one!
That said, as an FYI, First() is considered an aggregation method, and OVER statements can be included inside an If(), so you can accomplish the same thing with an expression like:
If([Lease_From] = First([Lease_From]) OVER ([Tenant]), 1, 2)
When you combine If() and OVER in this way, you can get some really cool and powerful visualizations, but you lose the ability to mark data effectively. This is because the expression is evaluated in the context of the If() rather than the OVER; in other words, all rows are considered instead of only the selected ones.
You can get around this with some black magic (AKA data functions), but it's a bit contrived.
Again, in this situation, Rank() is absolutely the correct solution.

How to transpose multiple columns to multiple rows excel?

Current data set:
14 72 73
54 75 66
98 87 65
Desired outcome:
14 72 73 54 75 66 98 87 65
I want to flatten multiple rows and columns into a single row. Does anyone have an =OFFSET formula to do this?
If 14 is in A2 then in say E2 and copied down something like:
=E1&","&A2&","&B2&","&C2
will concatenate the values into a single cell (with a comma as the delimiter). To split these out, copy column E, use Paste Special > Values over the top, and then apply Text to Columns to it, with comma as the delimiter.
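The index arithmetic behind this kind of reshaping can be sketched in Python. This is an illustration only, not an Excel formula: for output position k (counting from 0) in a grid with n columns, the source row is k // n and the source column is k % n. The name `flatten_row_major` is made up for the example.

```python
# Sample data from the question, one inner list per row.
grid = [
    [14, 72, 73],
    [54, 75, 66],
    [98, 87, 65],
]


def flatten_row_major(grid):
    """Flatten a rectangular grid into one row, reading it
    left to right and then top to bottom."""
    n_cols = len(grid[0])
    total = len(grid) * n_cols
    # Position k maps back to row k // n_cols, column k % n_cols.
    return [grid[k // n_cols][k % n_cols] for k in range(total)]


flat = flatten_row_major(grid)
print(*flat)
```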

Excel date/product count to specified limit

Column A "Sales Dates", Column B "=A2-A1" for "Date Diff", Column C "Customer Name", Column D "Item", Column E "Items Ordered Count"
My issue is I have to do a running 30 day total for each customer to see that specific items are not being ordered above "x" number within any 30-day period.
Does anyone have any ideas?
I may not be fully understanding your question, but I don't think you can do what you ask in Excel. This might be a situation where a database that supports SQL would come in handy.
The best I can come up with in Excel is a Pivot Table, with the customers as rows, dates as columns (grouped by month), and the sum of Items Ordered in the data area. Then apply conditional formatting to the data area to highlight values above your limit.
Perhaps if you provide some sample data & output I can come up with something more like what you need.
The formula would look something like this:
{=SUM(IF((A$2:A2>=A2-29)*(D$2:D2=D2),E$2:E2,0))}
It should be entered into cell F2 and copied down to the last row of your data. I pasted in a test spreadsheet below so you can see where things go (sorry for the formatting--hopefully it will look better if you paste it into Excel).
IMPORTANT: This is an array formula, so after you type in the formula (without the braces {}), you must press Ctrl-Shift-Enter instead of just Enter.
What does the formula do? It does two loops:
First, it loops through all the Sales Dates from the beginning of the log to the current row and checks if each date is between the date of the current row and 29 days earlier (which makes a 30-day window). (By "current row" I mean the row where the formula is located.)
Second, it loops through all the Items from the beginning of the log to the current row and checks if there is a match with the Item of the current row.
For any row where both checks are true (the "*" in the formula acts as an "and"), Items Ordered Count is added to the sum; otherwise zero is added. So, when it's finished, each row shows the total number of items ordered in the past 30 days for that item. To restrict the total to the same customer as well, a third check such as (C$2:C2=C2) can be multiplied in.
HTH,
-Dan
Sales Dates  Date Diff  Customer Name  Item   Items Ordered Count  30-Day Count
1/1/2009     0          dfsadf         11336  70                   70
1/2/2009     1          asdfd          10218  121                  121
1/3/2009     1          fsdfjkfl       10942  101                  101
1/6/2009     3          slkdjflsk      13710  80                   80
1/7/2009     1          slkdjls        10480  127                  127
1/9/2009     2          sdjjf          11336  143                  213
1/11/2009    2          woieuriwe      11501  84                   84
1/14/2009    3          owqieyurtn     10191  78                   78
1/15/2009    1          weisd          10480  113                  240
1/16/2009    1          woieuriwe      12024  133                  133
1/17/2009    1          vkcjl          13818  125                  125
1/20/2009    3          sdflkj         11336  128                  341
1/23/2009    3          jnbkdl         10480  141                  381
1/25/2009    2          pqcvnlz        10480  137                  518
1/27/2009    2          hwodkjgfh      12878  80                   80
1/28/2009    1          zjdnfg;pwlkd   10942  123                  224
1/31/2009    3          zlkdjnf;psod   13173  93                   93
2/2/2009     2          zlknpdodfg     11336  119                  390
2/4/2009     2          zjhdfpwskjh    12004  57                   57
2/5/2009     1          asdfd          10218  121                  121
2/8/2009     3          fsdfjkfl       10942  101                  224
2/11/2009    3          slkdjflsk      13710  80                   80
2/14/2009    3          slkdjls        10480  127                  405
2/16/2009    2          sdjjf          11336  143                  390
2/18/2009    2          woieuriwe      11501  84                   84
2/21/2009    3          owqieyurtn     10191  78                   78
2/24/2009    3          weisd          10480  113                  240
2/25/2009    1          woieuriwe      12024  133                  133
2/27/2009    2          vkcjl          13818  125                  125
2/28/2009    1          sdflkj         11336  128                  390
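The logic of the array formula can also be sketched in Python, which may help when checking the 30-Day Count column by hand. This is a minimal sketch, not a replacement for the spreadsheet: it assumes rows are in ascending date order, matches on Item only (as the formula does), and uses a hypothetical function name `rolling_30_day`. The sample rows are a subset of the table above.

```python
from datetime import datetime, timedelta

# (Sales Date, Item, Items Ordered Count): a subset of the table above
rows = [
    ("1/1/2009", "11336", 70),
    ("1/7/2009", "10480", 127),
    ("1/9/2009", "11336", 143),
    ("1/15/2009", "10480", 113),
    ("1/20/2009", "11336", 128),
    ("2/2/2009", "11336", 119),
]


def rolling_30_day(rows):
    """For each row, sum the counts of all rows up to and including
    it that have the same item and a date no more than 29 days
    earlier (a 30-day window ending on the row's own date)."""
    parsed = [
        (datetime.strptime(date, "%m/%d/%Y"), item, count)
        for date, item, count in rows
    ]
    totals = []
    for i, (date, item, _) in enumerate(parsed):
        window_start = date - timedelta(days=29)
        totals.append(sum(
            count
            for d, it, count in parsed[: i + 1]
            if d >= window_start and it == item
        ))
    return totals


totals = rolling_30_day(rows)
print(totals)
```

Adding a customer field to each tuple and comparing it in the same condition would give a per-customer total instead.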
