How to pass column value to "TABLESAMPLE"? - apache-spark

I have a table which I want to sample using "TABLESAMPLE":
SELECT colA, colB FROM DATA TABLESAMPLE (3 PERCENT);
Instead of doing the above, I want to do something like this:
SELECT colA, colB FROM DATA TABLESAMPLE (colB PERCENT);
The percentage is constant and stored in colB.
How can I do this?
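One possible workaround, sketched under the assumption that TABLESAMPLE only accepts a literal percentage: read the constant from colB first, then build the query text around it. The helper name build_sample_query is hypothetical, and the commented-out lines assume an existing SparkSession named spark.

```python
def build_sample_query(percent, table="DATA", columns=("colA", "colB")):
    """Return a SELECT ... TABLESAMPLE query with the percentage inlined as a literal."""
    pct = float(percent)
    # Reject values that cannot be a valid sampling percentage.
    if not 0 <= pct <= 100:
        raise ValueError(f"percent must be between 0 and 100, got {percent!r}")
    return f"SELECT {', '.join(columns)} FROM {table} TABLESAMPLE ({pct:g} PERCENT)"


# With PySpark (assumption: `spark` is an existing SparkSession):
#   pct = spark.sql("SELECT colB FROM DATA LIMIT 1").first()[0]
#   sample_df = spark.sql(build_sample_query(pct))
print(build_sample_query(3))
```

Since the percentage is the same on every row, reading it from a single row is enough. The DataFrame API's sample(fraction=pct / 100) may be an alternative if building SQL text is undesirable.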

Related

Pivot table, Excel Online, show all months in time period as columns even if values are empty? (recreating Query from Google Sheets)

I have a Google Sheets spreadsheet with a query formula and I'm trying to recreate this in Excel. I thought Power Query would be the right place, but it doesn't seem to be available in 365 online.
I tried building a pivot table and got somewhat close, but it's missing the months that have no data.
This is my Google Sheets formula
where F != 0
group by E
order by sum(F) desc
label sum(F) ''",0)
The formula spits out something like this:
(I included the headers in my screenshot, but they are not part of the query formula)
In my pivot table, this is the output
I have a helper table of all the months in the time period selected (they're dynamic based on the starting month the user selects), but I don't see a way of referencing that in my sheet.
Before I go buy a subscription to get the desktop version of Excel to work with Power Query, is there something I can do here to get the pivot table to do what I want?
Any help is appreciated.
Interesting approach... see if this is what you are looking for:
Assume you are given a data set like this:
Formula:
=LAMBDA(RANGE,
  LAMBDA(REVENUE,DATE,VALUE,
    LAMBDA(UREVENUE,UDATE,VALUE,
      LAMBDA(PIVOT,
        HSTACK(
          VSTACK("REVENUE",UREVENUE),
          BYROW(PIVOT,LAMBDA(ROW,IF(NOT(ISNUMBER(INDEX(ROW,1))),"TOTAL",SUM(ROW)))),
          PIVOT
        )
      )(
        MAKEARRAY(COUNTA(UREVENUE)+1,COUNTA(UDATE),LAMBDA(ROW,COL,
          IFS(
            ROW=1,INDEX(UDATE,COL),
            TRUE,SUM(FILTER(VALUE,(REVENUE=INDEX(UREVENUE,ROW-1))*(DATE=INDEX(UDATE,COL)),0))
          )
        ))
      )
    )(UNIQUE(REVENUE),UNIQUE(DATE),IF(VALUE="",0,VALUE))
  )(INDEX(RANGE,,1),INDEX(RANGE,,2),INDEX(RANGE,,3))
)($A$2:$C$19)
This formula mainly uses MAKEARRAY() with INDEX() to recreate the output of the Google Sheets QUERY():
1. Name $A$2:$C$19 as RANGE with LAMBDA().
2. Name columns 1, 2, and 3 of RANGE as REVENUE, DATE, and VALUE with LAMBDA().
3. Get the unique values of REVENUE and DATE with UNIQUE().
4. Replace empty values in VALUE with 0 using IF().
5. Name the unique values UREVENUE and UDATE, and rebind VALUE to its zero-filled version, with LAMBDA().
6. Build the pivot sums of REVENUE by DATE with MAKEARRAY(), using COUNTA(UREVENUE)+1 as the number of rows and COUNTA(UDATE) as the number of columns.
7. The +1 in the row count makes room for a header row above the pivoted data.
8. Inside MAKEARRAY(), fill every cell with IFS() according to the two conditions below.
9. When the ROW index is 1, the cell belongs to the header row, so return a value of UDATE; INDEX() picks which value to show according to the COL index.
10. When the ROW index is not 1, FILTER() the data in VALUE by the matching UREVENUE and UDATE entries. As in step 9, INDEX() determines which entries are being referenced; SUM() then reduces the FILTER() result to a single value for the cell.
11. Name the result of MAKEARRAY() as PIVOT with LAMBDA().
12. Get the SUM() of each row in PIVOT with BYROW().
13. Stack the outputs with VSTACK() and HSTACK() to form the result.
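For readers who find the nested LAMBDA() calls hard to follow, the same pivot logic can be sketched in Python. This is only an illustration, not a replacement for the formula: the function name and the sample rows are hypothetical, and the month list is passed in explicitly (like the helper table of months mentioned in the question) so that months without data still get a column.

```python
from collections import defaultdict


def pivot_with_all_months(rows, months):
    """rows: list of (revenue, month, value); months: every column to show."""
    sums = defaultdict(int)
    categories = []
    for revenue, month, value in rows:
        if revenue not in categories:
            categories.append(revenue)  # like UNIQUE(): keep first-seen order
        sums[(revenue, month)] += value or 0  # empty values count as 0
    table = [["REVENUE", "TOTAL", *months]]  # header row
    for revenue in categories:
        # A month with no matching rows falls back to 0 instead of disappearing.
        cells = [sums.get((revenue, m), 0) for m in months]
        table.append([revenue, sum(cells), *cells])
    return table


# Hypothetical sample data; FEB has no rows at all.
rows = [("Ads", "JAN", 10), ("Ads", "MAR", 5), ("Sales", "JAN", 7), ("Sales", "JAN", None)]
result = pivot_with_all_months(rows, ["JAN", "FEB", "MAR"])
for line in result:
    print(line)
```

The inner list comprehension plays the role of SUM(FILTER(...)) in the formula: one sum per (revenue, month) cell, with 0 when nothing matches.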
I just noticed that you mentioned another table that selects which months to display, so I updated the formula. I also separated the header part from MAKEARRAY(), which hopefully makes the whole thing easier to follow:
=LAMBDA(DATARANGE,SELECTED,HEADERS,
  LAMBDA(REVENUE,DATE,VALUE,SELECTED,
    LAMBDA(UREVENUE,VALUE,
      LAMBDA(PIVOT,HEADERS,
        LAMBDA(TOTAL,
          VSTACK(HEADERS,HSTACK(UREVENUE,TOTAL,PIVOT))
        )(BYROW(PIVOT,LAMBDA(ROW,SUM(ROW))))
      )(
        MAKEARRAY(COUNTA(UREVENUE),COUNTA(SELECTED),LAMBDA(ROW,COL,
          SUM(FILTER(VALUE,(REVENUE=INDEX(UREVENUE,ROW))*(DATE=INDEX(SELECTED,COL)),0))
        )),
        HSTACK(HEADERS,TRANSPOSE(SELECTED))
      )
    )(UNIQUE(REVENUE),IF(VALUE="",0,VALUE))
  )(INDEX(DATARANGE,,1),INDEX(DATARANGE,,2),INDEX(DATARANGE,,3),TRIM(TEXTSPLIT(SELECTED,,",")))
)($A$2:$C$19,$F$1,HSTACK("REVENUE","TOTAL"))
1. Name $A$2:$C$19 as DATARANGE, $F$1 as SELECTED, and HSTACK("REVENUE","TOTAL") as HEADERS with LAMBDA(); the two texts in HEADERS become the titles of output columns 1 and 2.
2. Name columns 1, 2, and 3 of DATARANGE as REVENUE, DATE, and VALUE. TEXTSPLIT() SELECTED into an array, apply TRIM() to the array to remove any extra spacing, and rebind the result to SELECTED with LAMBDA().
3. Get the unique values of REVENUE with UNIQUE(), and replace empty values in VALUE with 0 using IF().
4. Name the unique values as UREVENUE, and rebind VALUE to its zero-filled version, with LAMBDA().
5. Complete HEADERS by HSTACK()-ing it with TRANSPOSE(SELECTED).
6. Build the pivot sums of REVENUE by DATE with MAKEARRAY(), using COUNTA(UREVENUE) as the number of rows and COUNTA(SELECTED) as the number of columns.
7. Inside MAKEARRAY(), fill every cell with FILTER(): filter the data in VALUE by the matching UREVENUE and SELECTED entries, using INDEX() to pick the entries used as criteria, then SUM() the FILTER() result into a single value for the cell. Name the result of MAKEARRAY() as PIVOT with LAMBDA().
8. Get the SUM() of each row in PIVOT with BYROW(), and name the result as TOTAL with LAMBDA().
9. Stack the outputs with VSTACK() and HSTACK() to form the final result.
A few SORT() calls are added to rearrange the output if needed, controlled by the values of SORTREVENUE, SORTTOTAL, and SORTPIVOT.
In the first LAMBDA(), pass "ASC" or "DESC" to activate the corresponding SORT(), or leave the argument blank to turn that SORT() off.
Note that the SORTTOTAL sort overrides the SORTREVENUE sort.
=LAMBDA(DATARANGE,SELECTED,HEADERS,SORTREVENUE,SORTTOTAL,SORTPIVOT,
  LAMBDA(REVENUE,DATE,VALUE,SELECTED,MONTHS,
    LAMBDA(UREVENUE,SELECTED,
      LAMBDA(PIVOT,HEADERS,
        LAMBDA(TOTAL,
          VSTACK(
            HEADERS,
            LAMBDA(OUTPUT,ASC,DESC,
              IF(OR(ASC,DESC),SORT(OUTPUT,2,IF(DESC,-1,ASC)),OUTPUT)
            )(HSTACK(UREVENUE,TOTAL,PIVOT),SORTTOTAL="ASC",SORTTOTAL="DESC")
          )
        )(BYROW(PIVOT,LAMBDA(ROW,SUM(ROW))))
      )(
        MAKEARRAY(COUNTA(UREVENUE),COUNTA(SELECTED),LAMBDA(ROW,COL,
          SUM(FILTER(VALUE,(REVENUE=INDEX(UREVENUE,ROW))*(DATE=INDEX(SELECTED,COL)),0))
        )),
        HSTACK(HEADERS,TRANSPOSE(SELECTED))
      )
    )(
      LAMBDA(REVENUE,ASC,DESC,
        IF(OR(ASC,DESC),SORT(REVENUE,,IF(DESC,-1,ASC)),REVENUE)
      )(UNIQUE(REVENUE),SORTREVENUE="ASC",SORTREVENUE="DESC"),
      LAMBDA(TEXT,NUM,ASC,DESC,
        IF(OR(ASC,DESC),
          XLOOKUP(SORT(XLOOKUP(SELECTED,TEXT,NUM),,IF(DESC,-1,ASC)),NUM,TEXT),
          SELECTED
        )
      )(INDEX(MONTHS,,1),INDEX(MONTHS,,2),SORTPIVOT="ASC",SORTPIVOT="DESC")
    )
  )(
    INDEX(DATARANGE,,1),
    INDEX(DATARANGE,,2),
    INDEX(DATARANGE,,3),
    TRIM(TEXTSPLIT(UPPER($F$1),,",")),
    HSTACK({"JAN";"FEB";"MAR";"APR";"MAY";"JUN";"JUL";"AUG";"SEP";"OCT";"NOV";"DEC"},SEQUENCE(12))
  )
)($A$2:$C$19,$F$1,HSTACK("REVENUE","TOTAL"),"DESC","DESC","DESC")

Concatenates the values of string expressions in azure synapse serverless pool

I would like to write a query in a serverless pool that concatenates string values from multiple rows into a single row of comma-separated values. When I use the COALESCE function, I get the following error, which I am unable to fix: "Queries referencing variables are not supported in distributed processing mode"
Input rows :
A
B
C
A
B
Output row (Looking for distinct values only while creating a list like below)
A,B,C
You can use the STRING_AGG() function to concatenate values from multiple rows into a single comma-separated row.
Get the distinct values of the column, then apply STRING_AGG to the result, as below.
select STRING_AGG(col1, ',') output_col1 from (select distinct col1 from #tb1) a
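The distinct-then-aggregate idea behind that query can be illustrated with a small Python sketch. The function name is hypothetical. Unlike the SQL query, which makes no promise about the order of the joined values unless an ordering clause such as WITHIN GROUP (ORDER BY ...) is added, this sketch keeps first-seen order.

```python
def string_agg_distinct(values, separator=","):
    """Join distinct values in first-seen order, like STRING_AGG over SELECT DISTINCT."""
    distinct = dict.fromkeys(values)  # dict keys keep insertion order and drop repeats
    return separator.join(distinct)


print(string_agg_distinct(["A", "B", "C", "A", "B"]))  # A,B,C
```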

How to select N columns from a PowerBI table?

I have a table with three columns A, B, and C in Power BI, from which I want to select columns A and B to create a new table.
I'm looking for the equivalent of this pandas expression:
table_2 = table1[["A","B"]]
or from SQL:
SELECT A,B FROM table1;
But I'm not finding the equivalent function on Power BI; the SELECTCOLUMNS function is used to create a new column as stated in the docs:
Adds calculated columns to the given table or table expression.
The SELECTCOLUMNS function works fine for this. It allows you to create more complex calculated columns, but you can simply use the column itself as the calculation definition.
This should do the trick:
SELECTCOLUMNS(table1, "A", table1[A], "B", table1[B])

Pivot table: How to show value instead of calculation

I have input data as below
Who|Day|Eat
Papa|Monday|Milk
Mama|Monday|Water
Papa|Friday|Bread
I want to get a summary table like the one below. I'm trying a pivot table, but it doesn't show the values; it shows their count.
Name|Monday|Friday|GrandTotal
Mama|Water||1
Papa|Milk|Bread|2
I don't think you'll be able to get the exact output as you'd like it using a pivot table. The two options would be:
1. Use a pivot table, but change the layout: place "Who" and "Eat" as row fields and "Day" as the column field, then put either "Who" or "Eat" in the values field.
2. Build the summary table manually with vlookup or index(match()). I would do this by creating a lookup key (Who&Day, e.g. PapaMonday, PapaFriday), then doing the appropriate lookup. Let's say this lookup key sits in a column before the data table (so the key is in column A, Who in column B, Day in column C, and Eat in column D). The summary would look something like:
Column A|Column B|Column C|Column D
Name|Monday|Friday|Grand Total
Mama|vlookup(A2&B1,sheet1!A:D,4,0) or index(sheet1!D:D,match(A2&B1,sheet1!A:A,0))|vlookup(A2&C1,sheet1!A:D,4,0) or index(sheet1!D:D,match(A2&C1,sheet1!A:A,0))|{Number of days}-countblank(B2:C2)
and so on for the remaining rows. This is a pretty brute-force method, however.
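The lookup-key approach can be sketched in Python to show what each cell ends up holding. This is an illustrative sketch only: the function name is hypothetical, and a missing Who&Day combination is rendered as an empty cell.

```python
def pivot_values(records, days):
    """records: list of (who, day, eat). Returns a header row plus one row per person."""
    # Lookup key is Who&Day, e.g. "PapaMonday" -> "Milk".
    lookup = {f"{who}{day}": eat for who, day, eat in records}
    names = sorted({who for who, _, _ in records})
    table = [["Name", *days, "GrandTotal"]]
    for name in names:
        cells = [lookup.get(f"{name}{day}", "") for day in days]
        # Grand total = number of days minus the blank cells.
        table.append([name, *cells, sum(1 for c in cells if c != "")])
    return table


records = [("Papa", "Monday", "Milk"), ("Mama", "Monday", "Water"), ("Papa", "Friday", "Bread")]
for row in pivot_values(records, ["Monday", "Friday"]):
    print("|".join(str(c) for c in row))
```

With the sample data from the question, this prints the header row followed by Mama|Water||1 and Papa|Milk|Bread|2.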

Select a subset of a table for further processing

In Microsoft Excel, how can I compute a range (portion of a column), based on the values in another column of the same table, returning the result in an Array form for further processing by other functions?
In SQL, what I mean is "SELECT field1 FROM table WHERE field2=value".
The selected results will be fed (twice) to FREQUENCY(), to compute the number of distinct entries in "field1". That is: given an existing table like this:
Box Date
1 07/01/12
13 07/01/12
13 07/01/12
27 07/18/12
13 07/18/12
55 07/18/12
I want to produce a resulting table like this:
Boxes Date
2 07/01/12
3 07/18/12
Note that "13" is only counted once in the first date ("distinct"), but it's still counted again in the second date.
I already have an expression that does the right thing for the whole of the table,
=SUM(N(FREQUENCY(Box,Box)>0))
where "Box" is a named range of the first table, consisting of the whole Box column. (Using the same range/array/list as the data and the bins for FREQUENCY is a stupefyingly subtle trick actually contained in the Excel help but -- alas! -- by no means adequately explained.)
But I want (several) subsets, one for each date. I want to expand my "SUM(N(FREQUENCY…" expression to act only on the rows of the first table whose Date column matches the Date column of the row being computed. That is, again resorting to SQL,
SELECT COUNT(DISTINCT t1.Box), t2.Date
FROM `t1` JOIN `t2` USING (Date)
GROUP BY t2.Date
I can even build a pivot table of the interesting values (which gets me counts in its cells), then use a parallel, date-indexed column of
=COUNTIF(…)
to reduce each row of counts down to a single count of uniques for that date. But this requires me to update the pivot table to notice new data in the base table, and then to drag-expand the column of answers to include the new date (or suffer ugly value error markers). So something more automatic, less fussily manual, would be sweet.
This probably wasn't available when you asked, but Excel 2013 has Distinct Count as a summarize-by option in a PivotTable (it shows up when the source data is added to the Data Model).
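For reference, the result the question asks for (distinct boxes per date) can be expressed as a short Python sketch, which may help when checking the PivotTable output. The function name is hypothetical; the rows are the sample data from the question.

```python
from collections import defaultdict


def distinct_boxes_per_date(rows):
    """rows: iterable of (box, date). Returns {date: number of distinct boxes}."""
    seen = defaultdict(set)
    for box, date in rows:
        seen[date].add(box)  # a set ignores repeated boxes within the same date
    return {date: len(boxes) for date, boxes in seen.items()}


rows = [(1, "07/01/12"), (13, "07/01/12"), (13, "07/01/12"),
        (27, "07/18/12"), (13, "07/18/12"), (55, "07/18/12")]
print(distinct_boxes_per_date(rows))
```

Box 13 is counted once for 07/01/12 and once again for 07/18/12, matching the expected table of 2 and 3.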
