How to get latest 3 months data by default from complete data set - apache-spark

I have a full-year data set. I have built a Power BI report on it and scheduled it.
I need to show the last 3 months of data every time.
Column a column b column c
a 1 2019-01-01
b 2 2019-02-01
c 3 2019-03-01
d 4 2019-04-01
e 5 2019-05-01
I am trying to get the last 3 months of data from the table above using a Hive query, without hard-coding the month name or month number in the WHERE condition,
for example by using a date function like this:
select add_month( month, max(month(COLUMN C)),-3) from tableA

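The function is add_months. Compare the column against a fixed reference date such as current_date; comparing columnc with add_months(columnc,-3) is always true and filters nothing.
select * from tableA where columnc > add_months(current_date, -3)
If columnc is not already a date, cast it to a string and convert it with to_date (this assumes yyyy-MM-dd values):
select * from tableA where to_date(cast(columnc as string)) > add_months(current_date, -3)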
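The filter logic can be checked outside Hive. The following is a minimal sketch, assuming the sample rows from the question are loaded into an in-memory SQLite database through Python's sqlite3 module. SQLite's date(x, '-3 months') stands in for Hive's add_months(x, -3), and the cutoff is anchored on the latest date in the table (an assumption, so that the result does not depend on when the code runs). The column names a and b are illustrative.

```python
import sqlite3

# Sample rows from the question (column names a and b are illustrative).
rows = [
    ("a", 1, "2019-01-01"),
    ("b", 2, "2019-02-01"),
    ("c", 3, "2019-03-01"),
    ("d", 4, "2019-04-01"),
    ("e", 5, "2019-05-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (a TEXT, b INTEGER, columnc TEXT)")
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)", rows)

# Keep rows newer than three months before the latest date in the table.
# SQLite's date(x, '-3 months') plays the role of Hive's add_months(x, -3).
query = """
    SELECT a, b, columnc
    FROM tableA
    WHERE columnc > date((SELECT max(columnc) FROM tableA), '-3 months')
    ORDER BY columnc
"""
recent = conn.execute(query).fetchall()
conn.close()

print(recent)
```

With the sample data the latest date is 2019-05-01, so the cutoff is 2019-02-01 and only rows c, d and e remain.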

Related

Excel two column and condition

I have two columns with values like the ones below, where w represents weeks and d represents days.
Column1 Column2
in 1w3d out 1w5d
in 2w0d out 3w0d
in 24w2d out 23w0d
in 1w0d out 1w2d
Basically what I want to do is to check if both dates in Column 1 and Column 2 are between 1 week and 24 weeks. Both column values must be within the range.
Thus the output would be
Column1 Column2 Column3
in 1w3d out 1w5d Between 1 week and 24 weeks
in 2w0d out 3w0d Between 1 week and 24 weeks
in 24w2d out 23w0d Not between
in 1w0d out 1w2d Not between
I have an IF statement for this, written in Excel like so:
=IF(AND(A2>"in 1w0d",A2<"in 24w0d",B2>"out 1w0d",B2<"out 24w0d"),1,0)
However, this does not give the desired results.
Convert each value to a number of days first, then compare the numbers:
Column3=MID(A2,SEARCH(" ",A2),SEARCH("w",A2)-SEARCH(" ",A2))*7+(MID(A2,LEN(A2)-1,1))
Column4=MID(B2,SEARCH(" ",B2),SEARCH("w",B2)-SEARCH(" ",B2))*7+(MID(B2,LEN(B2)-1,1))
Column5=IF(AND(C2>7,C2<24*7,D2>7,D2<24*7),1,0)
You may need C2>=7 etc., depending on whether the boundary values should count as in range.
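The same conversion can be sketched in Python to check the logic against the sample rows. This is only an illustration: the function names to_days and between are hypothetical, and the strict bounds (more than 7 days, fewer than 168) mirror the Column5 formula.

```python
import re

def to_days(text):
    """Convert a string such as 'in 1w3d' to a total number of days."""
    match = re.search(r"(\d+)w(\d+)d", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    weeks, days = (int(group) for group in match.groups())
    return weeks * 7 + days

def between(col1, col2, low=7, high=24 * 7):
    """Return 1 if both durations lie strictly between low and high days, else 0."""
    return 1 if all(low < to_days(value) < high for value in (col1, col2)) else 0

# Sample rows from the question.
rows = [
    ("in 1w3d", "out 1w5d"),
    ("in 2w0d", "out 3w0d"),
    ("in 24w2d", "out 23w0d"),
    ("in 1w0d", "out 1w2d"),
]
flags = [between(a, b) for a, b in rows]
print(flags)
```

Comparing numbers instead of text avoids the problem in the original IF statement, where "in 2w0d" and "in 24w0d" are compared character by character.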

Calculate 3 month Average on the base of CustomerID

I am trying to calculate the three-month average sales per customerid in Excel. I tried the AVERAGEIFS function, but it did not help.
A B C
Orderdate sales customerid
5/15/2019 7 1
5/15/2019 48.5 1
4/15/2019 92.94 1
3/17/2019 102.85 1
3/18/2019 49 1
3/18/2019 119.95 1
2/18/2019 58.96 1
1/20/2019 14.6 1
5/16/2019 17 6
4/15/2019 148.5 6
4/12/2019 912.94 6
3/17/2019 102.85 6
9/18/2018 22.34 6
Formula I tried: =AVERAGEIFS(B:B,C:C,C2)
output expected:
customerid average(3 months)
1 49.48
6 359.48
Start with today's date and the date 3 months ago, so that the range stays dynamic.
Remember to change the cell format from General to Date; otherwise the cell shows a serial number such as 43563.
Next, use the date as part of the filter.
You should now have the most recent 3 months of data.
Copy the filtered data into a new spreadsheet.
Next step: get the distinct customer IDs.
Last step: use the AVERAGEIF function on the filtered data, with each distinct customer ID as the criterion.
Done!
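The steps above (filter to the last 3 months, then average per customer) can be sketched in Python to verify the expected numbers. This is a hedged illustration, not the Excel workflow itself: the helper months_ago is hypothetical, and the as-of date 2019-07-08 is an assumption chosen so that the cutoff falls in early April 2019.

```python
import calendar
from collections import defaultdict
from datetime import date

def months_ago(d, n):
    """Return the date n months before d, clamping the day to the month length."""
    index = d.year * 12 + (d.month - 1) - n
    year, month0 = divmod(index, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)

# (order date, sales, customer id) from the question's sample data.
orders = [
    (date(2019, 5, 15), 7, 1), (date(2019, 5, 15), 48.5, 1),
    (date(2019, 4, 15), 92.94, 1), (date(2019, 3, 17), 102.85, 1),
    (date(2019, 3, 18), 49, 1), (date(2019, 3, 18), 119.95, 1),
    (date(2019, 2, 18), 58.96, 1), (date(2019, 1, 20), 14.6, 1),
    (date(2019, 5, 16), 17, 6), (date(2019, 4, 15), 148.5, 6),
    (date(2019, 4, 12), 912.94, 6), (date(2019, 3, 17), 102.85, 6),
    (date(2018, 9, 18), 22.34, 6),
]

as_of = date(2019, 7, 8)        # assumed "today"; use date.today() in practice
cutoff = months_ago(as_of, 3)   # start of the 3-month window

# Collect the sales inside the window for each customer.
sales_by_customer = defaultdict(list)
for order_date, sales, customer in orders:
    if order_date >= cutoff:
        sales_by_customer[customer].append(sales)

averages = {c: round(sum(v) / len(v), 2) for c, v in sales_by_customer.items()}
print(averages)
```

With these assumptions the averages come out as 49.48 for customer 1 and 359.48 for customer 6, matching the expected output in the question.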

Convert sql to excel formula

I have 2 tables in excel.
Table 1
Item Quantity_Required Quantity_Remaining
A 5
B 10
C 3
Table 2
Source Item Quantity
1 A 2
2 A 1
1 B 5
My result should be to fill in Quantity_Remaining column in Table 1
Table 1
Item Quantity_Required Quantity_Remaining
A 5 2
B 10 5
C 3 3
The logic in SQL code is as follows.
SELECT A.Item,
       A.Quantity_Required,
       A.Quantity_Required - COALESCE(B.Quantity, 0) AS Quantity_Remaining
FROM Table1 A
LEFT JOIN
    (SELECT Item,
            SUM(Quantity) AS Quantity
     FROM Table2
     GROUP BY Item) B
ON A.Item = B.Item
I need pointers on how to translate this to Excel.
Assuming Table 1 has its headers in row 1 (Item in column A, Quantity_Required in column B) and the Table 2 data has Item in B8:B10 and Quantity in C8:C10, you can use this formula in C2 and copy it down:
=$B2-SUMPRODUCT(($A2=$B$8:$B$10)*($C$8:$C$10))
The SUMPRODUCT part finds the cells in B8:B10 that match A2 and adds up their values from column C; that total is then subtracted from Quantity_Required.
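To make the formula's logic concrete, here is a small Python sketch of the same calculation on the sample tables. The function name remaining is hypothetical; an item with no matching rows in Table 2 simply has nothing subtracted, just as SUMPRODUCT returns 0 when nothing matches.

```python
# Table 1: item -> quantity required.
required = {"A": 5, "B": 10, "C": 3}

# Table 2: (source, item, quantity).
table2 = [(1, "A", 2), (2, "A", 1), (1, "B", 5)]

def remaining(item, qty_required, source_rows):
    """Subtract the summed Table 2 quantities for this item from the requirement."""
    used = sum(qty for _source, src_item, qty in source_rows if src_item == item)
    return qty_required - used

result = {item: remaining(item, qty, table2) for item, qty in required.items()}
print(result)
```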

How to find values in one column in another column with multiple values

I have an Excel sheet like this:
A B START DATE END DATE
1 10 01-jan-2016 02-jan-2016
2 11 01-jan-2051 02-feb-2061
3 1 04-mar-2016 07-mar-2016
4 1 08-mar-2016 10-mar-2016
5 5 01-mar-2016 03-dec-2016
6 5 03-nov-2016 31-dec-4712
I am new to Excel. I want to highlight or extract the values of column A that also appear in column B, along with the start date and end date of the rows where they appear.
That is, the result should look like this:
A start_date end_date
1 04-mar-2016 07-mar-2016
1 08-mar-2016 10-mar-2016
5 01-mar-2016 03-dec-2016
5 03-nov-2016 31-dec-4712
Can anyone please suggest something?
In E2 enter:
=IF(COUNTIF(A:A,B2)>0,"X","")
and copy down. Then filter the table on column E to show only the rows marked "X".
You can hide any unwanted columns after that.
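The COUNTIF test is a membership check: a row is kept when its column B value occurs anywhere in column A. A minimal Python sketch on the sample data (dates kept as plain strings, variable names illustrative) shows which rows survive the filter:

```python
# (A, B, start date, end date) rows from the question.
rows = [
    (1, 10, "01-jan-2016", "02-jan-2016"),
    (2, 11, "01-jan-2051", "02-feb-2061"),
    (3, 1, "04-mar-2016", "07-mar-2016"),
    (4, 1, "08-mar-2016", "10-mar-2016"),
    (5, 5, "01-mar-2016", "03-dec-2016"),
    (6, 5, "03-nov-2016", "31-dec-4712"),
]

# Every value that appears in column A, as a set for fast lookup.
a_values = {a for a, _b, _start, _end in rows}

# Keep the rows whose column B value is found in column A.
matches = [(b, start, end) for _a, b, start, end in rows if b in a_values]

for match in matches:
    print(*match)
```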

How to get the latest date with same ID in Excel

I want to get the record with the most recent date for each ID, as the same ID can have different dates (the values to pick were marked in bold in the original post). Below is sample data; the original data consists of 10,000 records.
ID Date
5 25/02/2014
5 7/02/2014
5 6/12/2013
5 25/11/2013
5 4/11/2013
3 5/05/2013
3 19/02/2013
3 12/11/2012
1 7/03/2013
2 24/09/2012
2 7/09/2012
4 6/12/2013
4 19/04/2013
4 31/03/2013
4 26/08/2012
What I would do is, in column B, use this formula and fill down (this assumes the ID and the date sit together in one cell in column A, with a single-character ID):
=LEFT(A1,1)
In column C:
=DATEVALUE(MID(A1,2,99))
Then filter column B to a specific value of interest and sort by column C to order those values by date.
Edit: Even easier, do a two-level sort by B, then by C from newest to oldest. The first row for each B value is the newest.
Do you need a programmatic or formula-only solution, or can you use a workflow? If a workflow will do, then how about this:
Construct a pivot table of your data.
Make the Row Labels the ID.
Make the Values Max of Date.
The resulting table is your answer.
Row Labels Max of Date
1 07/03/13
2 24/09/12
3 05/05/13
4 06/12/13
5 25/02/14
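For comparison, the same "max of date per ID" grouping can be sketched in Python. This is only an illustration of the logic, assuming the dates are day/month/year strings as in the sample; the variable names are hypothetical.

```python
from datetime import datetime

# (ID, date as dd/mm/yyyy) from the question's sample data.
records = [
    (5, "25/02/2014"), (5, "7/02/2014"), (5, "6/12/2013"),
    (5, "25/11/2013"), (5, "4/11/2013"),
    (3, "5/05/2013"), (3, "19/02/2013"), (3, "12/11/2012"),
    (1, "7/03/2013"),
    (2, "24/09/2012"), (2, "7/09/2012"),
    (4, "6/12/2013"), (4, "19/04/2013"), (4, "31/03/2013"), (4, "26/08/2012"),
]

# Track the most recent date seen so far for each ID.
latest = {}
for rec_id, text in records:
    parsed = datetime.strptime(text, "%d/%m/%Y").date()
    if rec_id not in latest or parsed > latest[rec_id]:
        latest[rec_id] = parsed

for rec_id in sorted(latest):
    print(rec_id, latest[rec_id].strftime("%d/%m/%Y"))
```

Parsing the text into real dates matters here: comparing the raw strings would order "7/03/2013" after "25/02/2014".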
