I have two attributes, A and B, as below. I want to fetch the latest record from each set of duplicate records based on creation time.
I tried doing it with the MAX function, but FlexibleSearch does not seem to support the MAX function. Any help would be appreciated.
Table : ABC
A B
100 11
100 11
100 11
200 12
200 12
300 13
Result:
100 11
200 12
300 13
You can do it using the MAX function. Use it as given below:
SELECT max({modifiedtime}) FROM {Product}
If you want to select the rows with MAX(creationTime) but DISTINCT by column A in plain SQL, then you can try something like this:
SELECT t1.*
FROM ABC t1
JOIN (
    SELECT A, MAX(creationTime) AS maxDateTime
    FROM ABC
    GROUP BY A
) t2
  ON t1.A = t2.A
 AND t1.creationTime = t2.maxDateTime
I have a dataframe. Structure:
SEQ product_name prod_cost non-prd_cost mgmt grand_total
1 prod1 100 200 20 320
2 prod2 200 400 30 630
3 prod3 300 500 40 840
4 prod4 100 300 50 450
I want to calculate a SUMPRODUCT (as in Excel) based on a condition. The condition is based on product_name.
Let's say I want to calculate a variable called
sumprod_prod1_prd_prod3_mgmt = SUMPRODUCT(SEQ 1-4,product_name='prod1'_prod_cost and 'prod3'_mgmt)/2 = 100+40=140
How can I do this in pandas?
I am a bit confused by your question, since the Excel SUMPRODUCT function returns the sum of the products of corresponding ranges or arrays, and you seem to want the sum of a single combination.
To get the desired value:
sumprod_prod1_prd_prod3_mgmt = df[df['product_name'] == 'prod1']['prod_cost'].values[0] + df[df['product_name'] == 'prod3']['mgmt'].values[0]
This solution gives a single result for the specified values. If you need a solution which provides the same functionality as excel, please update your question and example to better define what you are looking for.
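For completeness, here is a self-contained sketch of that same lookup, assuming the sample frame from the question (all names taken from the example above):
import pandas as pd

df = pd.DataFrame({
    "SEQ": [1, 2, 3, 4],
    "product_name": ["prod1", "prod2", "prod3", "prod4"],
    "prod_cost": [100, 200, 300, 100],
    "non-prd_cost": [200, 400, 500, 300],
    "mgmt": [20, 30, 40, 50],
    "grand_total": [320, 630, 840, 450],
})

# prod_cost of prod1 plus mgmt of prod3 -> 100 + 40 = 140
prod1_cost = df.loc[df["product_name"] == "prod1", "prod_cost"].iloc[0]
prod3_mgmt = df.loc[df["product_name"] == "prod3", "mgmt"].iloc[0]
sumprod_prod1_prd_prod3_mgmt = prod1_cost + prod3_mgmt
print(sumprod_prod1_prd_prod3_mgmt)  # 140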
Hi, I have a data table like this:
Group CODE Quantity Value FLAG
A AP 10 100 1
A AM 10 200 1
A AP 03 50 2
B AM 50 150 2
I want to check if the column values for Group & CODE are equal and the FLAG values are different. If so, I want to subtract the Value with FLAG 2 from the Value with FLAG 1 and write the result to a new column called Delta.
If there is no matching row for the Group & CODE values, then just write "0" to the new Delta column.
The output data will look like this:
Group CODE Quantity Value FLAG Delta
A AP 10 100 1 50
A AM 10 200 1 0
A AP 03 50 2 0
B AM 50 150 2 0
I would like to do this in Alteryx. Thanks for your support and help!
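The underlying logic is a join on Group & CODE followed by a conditional subtraction; here is a rough pandas sketch of that same logic, purely as a reference for the expected result (not an Alteryx workflow, column names taken from the sample above):
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "A", "A", "B"],
    "CODE": ["AP", "AM", "AP", "AM"],
    "Quantity": [10, 10, 3, 50],
    "Value": [100, 200, 50, 150],
    "FLAG": [1, 1, 2, 2],
})

# Value of the FLAG 2 row for each (Group, CODE) pair, assuming at most one such row per pair
flag2_value = (df[df["FLAG"] == 2]
               .set_index(["Group", "CODE"])["Value"]
               .rename("flag2_value"))

df = df.join(flag2_value, on=["Group", "CODE"])

# Delta on the FLAG 1 row when a FLAG 2 partner exists; 0 everywhere else
df["Delta"] = ((df["FLAG"] == 1) * (df["Value"] - df["flag2_value"])).fillna(0).astype(int)
print(df.drop(columns="flag2_value"))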
I have a large data set with millions of records, which looks something like this:
Movie Likes Comments Shares Views
A 100 10 20 30
A 102 11 22 35
A 104 12 25 45
A *103* 13 *24* 50
B 200 10 20 30
B 205 *9* 21 35
B *203* 12 29 42
B 210 13 *23* *39*
Likes, comments, etc. are rolling totals and they are supposed to increase. If there is a drop in any of these for a movie, then it is bad data that needs to be identified.
My initial thought was to group by movie and then sort within each group. I am using DataFrames in Spark 1.6 for processing, and this does not seem to be achievable since there is no sorting within the grouped data in a DataFrame.
Building something for outlier detection could be another approach, but because of time constraints I have not explored it yet.
Is there any way I can achieve this?
Thanks!!
You can use the lag window function to bring the previous values into scope:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

// partition by movie and order by whatever temporal/sequence column you have
val windowSpec = Window.partitionBy('Movie).orderBy('maybesometemporalfield)

dataset.withColumn("lag_likes", lag('Likes, 1) over windowSpec)
  .withColumn("lag_comments", lag('Comments, 1) over windowSpec)
  .show
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-functions.html#lag
Another approach would be to assign a row number (if there isn't one already), lag that column, then join each row to its previous row, to allow you to do the comparison.
HTH
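If you are working with the Python API instead, a rough PySpark sketch of the same lag-and-compare idea looks like this (the ordering column event_time is a stand-in, since the post does not name one; dataset is the input DataFrame from the snippet above):
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# replace event_time with whatever defines row order within each movie
w = Window.partitionBy("Movie").orderBy("event_time")

flagged = (dataset
           .withColumn("lag_likes", F.lag("Likes", 1).over(w))
           .withColumn("lag_comments", F.lag("Comments", 1).over(w))
           # a row is suspect when any rolling total is lower than its previous value
           .withColumn("bad_row",
                       (F.col("Likes") < F.col("lag_likes")) |
                       (F.col("Comments") < F.col("lag_comments"))))

flagged.filter("bad_row").show()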
Customer# Date Qty Cost
12 1/2/2013 3 500
12 1/3/2013 5 200
12 1/4/2013 4 200
13 1/5/2013 1 150
14 1/6/2013 2 110
14 1/7/2013 1 110
15 1/8/2013 1 110
I have a table similar to the one above (with millions of records and 26 columns). I would like to create two tables based on this one: the first table should show the first order of each customer with its associated columns, and the second should show the data for the second order of each customer (if a customer doesn't have one, the row will be null).
The result I am looking for:
Table one- First order
Customer#, Date , Qty, Cost
12 , 1/2/2013, 3, 500
13 , 1/5/2013, 1, 150
14 , 1/6/2013, 2, 110
15 , 1/8/2013, 1, 110
Table two- second order table
Customer#, Date , Qty , Cost
12 , 1/3/2013, 5, 200
14 , 1/7/2013, 1 , 110
The formula I tried, which failed to work:
=INDEX(B:D,MATCH(A3,A:A,0))
I would appreciate it if someone could share ideas on how to use the INDEX and MATCH functions in Excel to solve this.
I was able to solve the issue above using Tableau. I just used the Index() function to calculate the rank based on order date and ID, and filtered by that rank to get the first- and second-order tables.
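In case a scripted equivalent of that rank-and-filter idea is useful, here is a pandas sketch using the same column names as the sample (not the Excel or Tableau solution itself):
import pandas as pd

df = pd.DataFrame({
    "Customer#": [12, 12, 12, 13, 14, 14, 15],
    "Date": pd.to_datetime(["1/2/2013", "1/3/2013", "1/4/2013",
                            "1/5/2013", "1/6/2013", "1/7/2013", "1/8/2013"]),
    "Qty": [3, 5, 4, 1, 2, 1, 1],
    "Cost": [500, 200, 200, 150, 110, 110, 110],
})

# rank each customer's orders by date: 1 = first order, 2 = second order, ...
df["order_rank"] = df.sort_values("Date").groupby("Customer#").cumcount() + 1

first_orders = df[df["order_rank"] == 1].drop(columns="order_rank")
second_orders = df[df["order_rank"] == 2].drop(columns="order_rank")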
Consider these values:
company_ID 3yr_value
1 10
2 20
3 30
4 40
5 50
I have this statement in my query, and my goal is to compute the percent rank of the value 50 within the group:
round(((percent_rank() over (partition by bb.company_id order by bb.3yr_value)) * 100))
In Excel, this is equivalent to:
=percentrank(b1:b5,b5)
BUT what I need is an equivalent of this: =PERCENTRANK(B1:B4,B5) -- notice that I don't include B5 in the range that gets evaluated. I'm out of options and have already consulted Google, but it seems I still can't find the solution. I always end up including B5 in my query.
I'm using PostgreSQL.