PIVOT and sum two columns - pivot

I hope that I formulated the DDL + DML and my question without mistakes.
I built this query:
CREATE TABLE #TEHOODOTRECHESH
(
    VendorCode INT
    , VendorName NVARCHAR(50)
    , CheckDate DATETIME
    , CheckSum DECIMAL(10,2)
    , ObjType INT
)

INSERT INTO #TEHOODOTRECHESH (VendorCode, VendorName, CheckDate, CheckSum, ObjType)
VALUES
    (1, 'AAA', '20130101', 40, 18),
    (1, 'AAA', '20130101', 60, 18),
    (1, 'AAA', '20130101', 40, 19),
    (2, 'BBB', '20130303', 50, 18),
    (2, 'BBB', '20130601', 10, 18),
    (2, 'BBB', '20130604', 20, 19)
SELECT *
FROM
(
    SELECT
        CASE WHEN [ObjType] = '18' THEN N'tr' ELSE N'tz' END AS 'DT',
        CARDCODE,
        CardName,
        YEAR(DocDueDate) AS 'year',
        LEFT(DATENAME(MONTH, DocDueDate), 3) AS [month],
        DocTotal AS 'Amount'
    FROM TEHOODORRECHESH
    WHERE DocStatus = 'O'
) AS monthsum
PIVOT
(
    SUM(Amount)
    FOR [month] IN (jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec)
) AS SUMPIVOT
I want to sum the document rows of ObjType 18 / 19 (there are only these two types) and sum each ObjType by month.
Is there a way, when the ObjType is 19, to show the numbers with a minus operator (-), or in brackets ()?
When I write ORDER BY with only the vendor code it works, but if I add ORDER BY VendorCode, ObjType it does not work. Why?
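A minimal sketch against the sample temp table above (not the real TEHOODORRECHESH query, whose column names differ): negate the amount with a CASE expression before pivoting, so ObjType 19 rows enter the SUM with a minus sign.
SELECT *
FROM
(
    SELECT
        VendorCode,
        VendorName,
        LEFT(DATENAME(MONTH, CheckDate), 3) AS [month],
        CASE WHEN ObjType = 19 THEN -CheckSum ELSE CheckSum END AS Amount  -- ObjType 19 becomes negative
    FROM #TEHOODOTRECHESH
) AS monthsum
PIVOT
(
    SUM(Amount)
    FOR [month] IN (jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec)
) AS SUMPIVOT
ORDER BY VendorCode;
Showing the negatives in brackets is a display-formatting concern that is usually better handled in the reporting layer; also note that after the PIVOT only the grouping columns (here VendorCode and VendorName) are available to ORDER BY.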

Related

Calculating %diff between 2 values in Presto

I have a field of type double called values.
I want to calculate the % difference between the last value and the one before the last value. For example, given:
value
10
2
4
2
the output should be: -50%
How can I do this in Presto?
If you have a field for ordering (otherwise the result is not guaranteed), you can use the lag window function:
-- sample data
WITH dataset ("values", date) AS (
    VALUES (10, now()),
           (4, now() + interval '1' hour),
           (2, now() + interval '2' hour)
)
-- query
select ("values" - l) * 100.0 / l as value
from (
    select "values",
           lag("values") over (order by date) as l,
           date
    from dataset
)
order by date desc
limit 1
Output:
value
-50.0
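The ORDER BY date DESC with LIMIT 1 keeps only the most recent row, and lag supplies the value immediately before it, so the single returned row is the percentage change between the last value and the one before last.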

the sum function cannot work with string values

I'm working with DAX in Power BI. I have a column with 80,000 string values.
70% of these values are "European Desk", and I want to show this percentage. Since it's a string value, I don't understand how to do it with DAX.
Any advice?
The measure you are looking for is:
% European Desk = DIVIDE(
    CALCULATE(
        COUNT('Table'[String]),
        'Table'[String] = "European Desk"
    ),
    COUNT('Table'[String])
)
With CALCULATE you can change the filter context for the COUNT() aggregation.
You can apply this formula to e.g. this table:
Table = DATATABLE(
    "Index", INTEGER,
    "String", STRING,
    {
        {1, "European Desk"},
        {2, "European Desk"},
        {3, "European Desk"},
        {4, "African Desk"},
        {5, "Asian Desk"}
    }
)
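For this sample table the measure should return 3 / 5 = 0.6, i.e. 60% once the result is formatted as a percentage.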

force 0 when blank with addcolumn summarize excel 2016 dax for more accurate averages

I have one table that has three columns: lisa, customer, and activity_type. I want to count the number of rows by customer and activity type, and then average those counts over all customers by activity type.
If this were a table in SQL, I'd do:
SELECT
    lisa,
    customer,
    activity_type,
    AVG(ct)
FROM
(
    SELECT
        lisa,
        customer,
        activity_type,
        CASE WHEN s.ct IS NULL THEN 0 ELSE s.ct END AS ct
    FROM
    (
        SELECT *
        FROM
            (SELECT DISTINCT lisa, customer FROM TABLE),
            (SELECT DISTINCT activity_type FROM TABLE)
    )
    LEFT JOIN
    (
        SELECT
            lisa,
            customer,
            activity_type,
            COUNT(*) AS ct
        FROM TABLE
        GROUP BY 1, 2, 3
    ) s
) s
But it's DAX, which is infinitely harder. I tried:
=
AVERAGEX(
ADDCOLUMNS(
CROSSJOIN( VALUES( Query1[customer] ), VALUES( Query1[activity_type] ) ),
"C", CALCULATE( COUNTA( Query1[engagio_activity_id] ) + 0 )
),
IF( [C] = BLANK(), 0, [C] )
)
and
=
AVERAGEX(
ADDCOLUMNS(
SUMMARIZE( Query1[lisa], Query1[activity_type] ),
"C", CALCULATE( COUNTA( Query1[engagio_activity_id] ) + 0 )
),
IF( [C] = BLANK(), 0, [C] )
)
But try as I might, I still get a result where the blanks are not treated as 0 in the aggregate rows, such as the "no" row in the picture above. That roll-up amount ignores the blanks when calculating the averages. When I put the cross join into DAX Studio, I forced the 0s.
So it's a mystery to me where the 0s went.
I think you are overcomplicating this.
Average =
VAR totalCustomers = COUNTROWS( ALL( Query1[customer] ) ) // this gives you the total # of customers
RETURN
    DIVIDE(
        COUNT( Query1[engagio_activity_id] ) + 0, // the +0 forces the count to always return something
        totalCustomers
    )
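Evaluated in a visual that groups by activity_type, this should divide the activity count in the current filter context by the total number of customers, so customers with no rows for that activity type effectively contribute 0 to the average, which is what forcing the blanks to 0 was meant to achieve.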

Presto / AWS Athena query, historicized table (last value in aggregation)

I've got a table split into a static part and a history part. I have to create a query which groups by a series of dimensions, including year and month, and does some aggregations. One of the values that I need to project is the value of the last tuple of the history table matching the given year / month pair.
The history table has validity_date_start and validity_date_end columns, and the latter is NULL if the row is up to date.
This is the query I've done so far (using temporary tables for ease of reproduction):
SELECT
    time.year,
    time.month,
    t1.name,
    FIRST_VALUE(t2.value1) OVER (ORDER BY t2.validity_date_start DESC) AS value, -- take the last valid t2 part for the month
    (CASE WHEN t1.id = 1 AND time.date >= timestamp '2020-07-01 00:00:00' THEN 27
        ELSE CASE WHEN t1.id = 1 AND time.date >= timestamp '2020-03-01 00:00:00' THEN 1
            ELSE CASE WHEN t1.id = 2 AND time.date >= timestamp '2020-05-01 00:00:00' THEN 42 END
        END
    END) AS expected_value
FROM
    (SELECT year(ts.date) year, month(ts.date) month, ts.date FROM (
        (VALUES (SEQUENCE(date '2020-01-01', current_date, INTERVAL '1' MONTH))) AS ts(ts_array)
        CROSS JOIN UNNEST(ts_array) AS ts(date)
    ) GROUP BY ts.date) time
CROSS JOIN (VALUES (1, 'Hal'), (2, 'John'), (3, 'Jack')) AS t1 (id, name)
LEFT JOIN (VALUES
    (1, 1, timestamp '2020-01-03 10:22:33', timestamp '2020-07-03 23:59:59'),
    (1, 27, timestamp '2020-07-04 00:00:00', NULL),
    (2, 42, timestamp '2020-05-29 10:22:31', NULL)
) AS t2 (id, value1, validity_date_start, validity_date_end)
    ON t1.id = t2.id
    AND t2.validity_date_start <= (CAST(time.date AS timestamp) + interval '1' month - interval '1' second)
    AND (t2.validity_date_end IS NULL OR t2.validity_date_end >= (CAST(time.date AS timestamp) + interval '1' month - interval '1' second)) -- last_day_of_month (Athena doesn't have the fn)
GROUP BY time.date, time.year, time.month, t1.id, t1.name, t2.value1, t2.validity_date_start
ORDER BY time.year, time.month, t1.id
value and expected_value should match, but they don't (value is always empty). I've evidently misunderstood how FIRST_VALUE(...) OVER(...) works.
Could you please help me?
Thank you very much!
I've eventually found out what I was doing wrong here.
In the documentation it is written:
The partition specification, which separates the input rows into different partitions. This is analogous to how the GROUP BY clause separates rows into different groups for aggregate functions.
This led me to think that, since I already had a GROUP BY statement, the partition specification was useless. It is not: in general, if you want to get the datum for a given group, you have to specify it in the PARTITION BY clause too (or better, the dimensions that you're projecting in the SELECT part).
SELECT
    time.year,
    time.month,
    t1.name,
    FIRST_VALUE(t2.value1) OVER (PARTITION BY (time.year, time.month, t1.name) ORDER BY t2.validity_date_start DESC) AS value, -- take the last valid t2 part for the month
    (CASE WHEN time.date >= timestamp '2020-07-01 00:00:00' AND t1.id = 1 THEN 27
        ELSE CASE WHEN time.date >= timestamp '2020-05-01 00:00:00' AND t1.id = 2 THEN 42
            ELSE CASE WHEN time.date >= timestamp '2020-03-01 00:00:00' AND t1.id = 1 THEN 1 END
        END
    END) AS expected_value
FROM
    (SELECT year(ts.date) year, month(ts.date) month, ts.date FROM (
        (VALUES (SEQUENCE(date '2020-01-01', current_date, INTERVAL '1' MONTH))) AS ts(ts_array)
        CROSS JOIN UNNEST(ts_array) AS ts(date)
    ) GROUP BY ts.date) time
CROSS JOIN (VALUES (1, 'Hal'), (2, 'John'), (3, 'Jack')) AS t1 (id, name)
LEFT JOIN (VALUES
    (1, 1, timestamp '2020-03-01 10:22:33', timestamp '2020-07-03 23:59:59'),
    (1, 27, timestamp '2020-07-04 00:00:00', NULL),
    (2, 42, timestamp '2020-05-29 10:22:31', NULL)
) AS t2 (id, value1, validity_date_start, validity_date_end)
    ON t1.id = t2.id
    AND t2.validity_date_start <= (CAST(time.date AS timestamp) + interval '1' month - interval '1' second)
    AND (t2.validity_date_end IS NULL OR t2.validity_date_end >= (CAST(time.date AS timestamp) + interval '1' month - interval '1' second)) -- last_day_of_month (Athena doesn't have the fn)
GROUP BY time.date, time.year, time.month, t1.id, t1.name, t2.value1, t2.validity_date_start
ORDER BY time.year, time.month, t1.id

Getting weekly and daily averages of timestamp data

I currently have data in a Spark DataFrame that is formatted as such:
Timestamp    Number
---------    ------
M-D-Y        3
M-D-Y        4900
The timestamp data is in no way uniform or consistent (i.e., I could have one value present on March 1, 2015, and the next value in the table be for September 1, 2015; also, I could have multiple entries per date).
So I wanted to do two things:
Calculate the number of entries per week. I essentially want a new table that represents the number of rows whose timestamp falls within each week. If there are multiple years present, I would ideally want to average the values per year to get a single value.
Average the number column for each week. So for every week of the year, I would have a value that represents the average of the number column (0 if there is no entry within that week).
Parsing dates is relatively easy using built-in functions, by combining unix_timestamp and simple type casting:
sqlContext.sql(
"SELECT CAST(UNIX_TIMESTAMP('March 1, 2015', 'MMM d, yyyy') AS TIMESTAMP)"
).show(false)
// +---------------------+
// |_c0 |
// +---------------------+
// |2015-03-01 00:00:00.0|
// +---------------------+
With the DataFrame DSL, the equivalent code would be something like this:
import org.apache.spark.sql.functions.unix_timestamp
unix_timestamp($"date", "MMM d, yyyy").cast("timestamp")
To fill missing entries you can use different tricks. The simplest approach is to use the same parsing logic as above. First let's create a few helpers:
def leap(year: Int) = {
  ((year % 4 == 0) && (year % 100 != 0)) || (year % 400 == 0)
}

def weeksForYear(year: Int) = (1 to 52).map(w => s"$year $w")

def daysForYear(year: Int) = (1 to { if (leap(year)) 366 else 365 }).map(
  d => s"$year $d"
)
and example reference data (here for weeks but you can do the same thing for days):
import org.apache.spark.sql.functions.{year, weekofyear}
val exprs = Seq(year($"date").alias("year"), weekofyear($"date").alias("week"))
val weeks2015 = Seq(2015)
  .flatMap(weeksForYear _)
  .map(Tuple1.apply)
  .toDF("date")
  .withColumn("date", unix_timestamp($"date", "yyyy w").cast("timestamp"))
  .select(exprs: _*)
Finally you can transform the original data:
val df = Seq(
  ("March 1, 2015", 3), ("September 1, 2015", 4900)
).toDF("Timestamp", "Number")

val dfParsed = df
  .withColumn("date", unix_timestamp($"timestamp", "MMM d, yyyy").cast("timestamp"))
  .select(exprs :+ $"Number": _*)
Merge and aggregate:
import org.apache.spark.sql.functions.{avg, count}

weeks2015.join(dfParsed, Seq("year", "week"), "left")
  .groupBy($"year", $"week")
  .agg(count($"Number"), avg($"Number"))
  .na.fill(0)
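The left join keeps every week from the reference data, so weeks with no matching rows come through with NULL count/average, and na.fill(0) turns those into the 0 values the question asks for.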
