How to efficiently pivot data with coalesce filtering

We have a dataset we want to pivot. The resulting pivot column set has roughly 60 columns, but the catch is that each row has a plan type: if PlanID 1 doesn't have a value for a property, we want to fall back to the PlanID 2 value.
DECLARE @testTable TABLE
(
ID INT PRIMARY KEY IDENTITY(1,1),
[ColID] INT NOT NULL,
[PropertyName] NVARCHAR(50) NOT NULL,
[Value] NVARCHAR(MAX) NOT NULL,
[PlanID] INT NOT NULL
)
DECLARE @parentTable TABLE
(
[ColID] INT PRIMARY KEY
)
INSERT INTO @parentTable ([ColID])
select 200
union all
select 300
union all
select 400
INSERT INTO @testTable ([ColID], [PropertyName], [Value], [PlanID])
select 200, 'Prop1', 343, 1
union all
select 200, 'Prop1', 444, 2
union all
select 200, 'Prop2', 555, 2
union all
select 300, 'Prop2', 111, 2
select parent.[ColID],
COALESCE(VT_A.[Prop1],VT_F.[Prop1]) AS "Prop1",
COALESCE(VT_A.[Prop2],VT_F.[Prop2]) AS "Prop2"
from
(
select [ColID] from @parentTable
) parent
left join
(
select ColID,[Prop1],[Prop2]
from
(
select ColID, PropertyName, [Value]
FROM @testTable
WHERE PlanID = 1
) as sourcetable
pivot
(
min([Value]) for PropertyName in ([Prop1],[Prop2])
) as pivottable
) VT_A
on VT_A.ColID = parent.ColID
left join
(
select ColID,[Prop1],[Prop2]
from
(
select ColID, PropertyName, [Value]
FROM @testTable
WHERE PlanID = 2
) as sourcetable
pivot
(
min([Value]) for PropertyName in ([Prop1],[Prop2])
) as pivottable
) VT_F
on VT_F.ColID = parent.ColID
So we're code-generating this as a view with 60 columns, but the data table has roughly 300,000 rows, and the view's performance is poor. It would be filtered by a range of [ColID]s, but even filtered down to 30,000 records it performs poorly.
Is there a better way to structure such a query?
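One possible restructuring (a sketch only, built on the two-property sample above and not tested at 60-column scale) is to scan the table once and use conditional aggregation instead of one PIVOT subquery per plan, letting COALESCE prefer PlanID 1 over PlanID 2 per property:
-- Sketch: single pass with conditional aggregation; one MAX(CASE ...)
-- pair per property replaces the per-plan PIVOT branches.
select parent.[ColID],
COALESCE(
MAX(CASE WHEN t.PlanID = 1 AND t.PropertyName = 'Prop1' THEN t.[Value] END),
MAX(CASE WHEN t.PlanID = 2 AND t.PropertyName = 'Prop1' THEN t.[Value] END)
) AS "Prop1",
COALESCE(
MAX(CASE WHEN t.PlanID = 1 AND t.PropertyName = 'Prop2' THEN t.[Value] END),
MAX(CASE WHEN t.PlanID = 2 AND t.PropertyName = 'Prop2' THEN t.[Value] END)
) AS "Prop2"
from @parentTable parent
left join @testTable t
on t.[ColID] = parent.[ColID]
group by parent.[ColID]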

Rename a file column name in Azure Synapse

I have a file with a poor naming convention that I would like to clean up before using it in Azure Synapse. Is it possible to rename the column in the WITH block?
SELECT TOP 10 *
FROM OPENROWSET(
BULK 'path_to_file.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
FIRSTROW = 2)
WITH (
[ORDER ID] varchar(50)
) as rows
I could use an alias in the select but was hoping to clean it up before that.
SELECT [ORDER ID] as order_id
And I could wrap this in a view; I was just hoping there's a way to rename earlier.
Yes, it is possible to rename columns in the WITH block; the name you provide there will override the column name read from the file (even with HEADER_ROW set to TRUE).
There's a caveat though. You will have to either provide names for all of your columns:
SELECT TOP 10 *
FROM OPENROWSET
(BULK 'path_to_file.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = true)
WITH
(
your_column_name_1 varchar(50),
...
your_column_name_N varchar(50)
)
AS rows
...or pick the ones you want to keep and/or rename using their ordinal number:
SELECT TOP 10 *
FROM OPENROWSET
(BULK 'path_to_file.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = true)
WITH
(
your_column_name_1 varchar(50) 1,
your_column_name_4 varchar(50) 4
)
AS rows
You can also override the names with a subquery / derived table, e.g.
SELECT *
FROM (
SELECT TOP 100 *
FROM OPENROWSET (
BULK 'some path',
FORMAT = 'CSV',
PARSER_VERSION ='2.0',
FIRSTROW = 2
) AS [result]
) x ( col1, col2 )
I think this is more compact than the WITH clause, where, as far as I can tell, you have to specify all columns, all data types and all ordinals. Unfortunately it won't let you put the column list after the [result] alias.

Insert new rows, continue existing rowset row_number count

I'm attempting to perform some sort of upsert operation in U-SQL, where I pull data every day from a file and compare it with yesterday's data, which is stored in a table in Data Lake Storage.
I have created an ID column in the table in DL using row_number(), and it is this "counter" I wish to continue when appending new rows to the old dataset. E.g.
The last inserted row in the DL table could look like this:
ID | Column1   | Column2
---+-----------+--------
10 | SomeValue | 1
I want the next rows to have the following ascending ids:
11 | SomeValue | 1
12 | SomeValue | 1
How would I go about making sure that the next X rows continue the ID count incrementally, with each new row increasing the ID column by 1?
You could use ROW_NUMBER and then add it to the max value from the original table (i.e. using CROSS JOIN and MAX). A simple demo of the technique:
DECLARE @outputFile string = @"\output\output.csv";
@originalInput =
SELECT *
FROM ( VALUES
( 10, "SomeValue 1", 1 )
) AS x ( id, column1, column2 );
@newInput =
SELECT *
FROM ( VALUES
( "SomeValue 2", 2 ),
( "SomeValue 3", 3 )
) AS x ( column1, column2 );
@output =
SELECT id, column1, column2
FROM @originalInput
UNION ALL
SELECT (int)(x.id + ROW_NUMBER() OVER()) AS id, column1, column2
FROM @newInput
CROSS JOIN ( SELECT MAX(id) AS id FROM @originalInput ) AS x;
OUTPUT @output
TO @outputFile
USING Outputters.Csv(outputHeader:true);
My results: the two new rows continue the count, taking ids 11 and 12.
You will have to be careful if the original table is empty and add some additional conditions / null checks; one option is sketched below.
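For instance (an untested sketch, reusing the rowsets from above): seed the MAX with a literal 0 row, so that an empty @originalInput still produces exactly one row for the CROSS JOIN and the first new id becomes 1.
// Sketch: the VALUES row guarantees MAX always sees at least one value,
// so the CROSS JOIN never collapses to zero rows on an empty input.
@output =
SELECT id, column1, column2
FROM @originalInput
UNION ALL
SELECT (int)(x.id + ROW_NUMBER() OVER()) AS id, column1, column2
FROM @newInput
CROSS JOIN (
SELECT MAX(id) AS id
FROM (
SELECT id FROM @originalInput
UNION ALL
SELECT * FROM ( VALUES ( 0 ) ) AS z ( id )
) AS seeded
) AS x;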

SubSelect MDX Query as filtered list of main query

Hi all
I want to write an MDX query equivalent to this SQL:
select a, b, sum(x)
from table1
where b = 'True' and a in (select distinct c from table2 where c is not null and d = 'True')
group by a,b
I tried something like this:
SELECT
NON EMPTY { [Measures].[X] } ON COLUMNS,
NON EMPTY { [A].[Name].[Name]
*[B].[Name].[Name].&[True]
} ON ROWS
FROM
(
SELECT
{ ([A].[Name].[Name] ) } ON 0
FROM
( SELECT (
{EXCEPT([C].[Name].ALLMEMBERS, [C].[Name].[ALL].UNKNOWNMEMBER) }) ON COLUMNS
FROM
( SELECT (
{ [D].[Name].&[True] } ) ON COLUMNS
FROM [CUBE]))
)
But it returns the sum of x from the subquery.
What should it look like?
Does X's measure group have a relationship with the D dimension? If so, the following code should just work:
Select
[Measures].[X] on 0,
Non Empty [A].[Name].[Name].Members * [B].[Name].&[True] on 1
From [CUBE]
Where ([D].[Name].&[True])
If you have a many-to-many relationship, you need an extra measure (say Y):
Select
[Measures].[X] on 0,
Non Empty NonEmpty([A].[Name].[Name].Members,[Measures].[Y]) * [B].[Name].&[True] on 1
From [CUBE]
Where ([D].[Name].&[True])

Unpivot and Pivot does not return data

I'm trying to return data as columns.
I've written this unpivot and pivot query:
select StockItemCode, barcode, barcode2
from
(
select StockItemCode, col + cast(seq as varchar(20)) col, value
from
(
select
(select min(StockItemCode)
from RTLBarCode t2
where t.StockItemCode = t2.StockItemCode) StockItemCode,
cast(BarCode as varchar(20)) barcode,
row_number() over(partition by StockItemCode order by StockItemCode) seq
from RTLBarCode t
) d
unpivot( value for col in (barcode) ) unpiv
) src
pivot ( max(value) for col in (barcode, barcode2) ) piv;
But the problem is that only the "Barcode2" field returns a value (the barcode field returns null when in fact there is a value).
SAMPLE DATA
I have a table called RTLBarCode.
It has a field called Barcode and a field called StockItemCode.
For StockItemCode = 10 I have 2 rows with Barcode values of 5014721112824 and 0000000019149.
Can anyone see where I am going wrong?
Many thanks
You are indexing your barcode in unpiv: appending seq produces col values barcode1 and barcode2.
But then you are pivoting on barcode instead of barcode1, so no value is found and the aggregate returns null.
The correct statement would be:
select StockItemCode, barcode1, barcode2 from
(
select StockItemCode, col+cast(seq as varchar(20)) col, value
from
(
select
(select min(StockItemCode)from RTLBarCode t2 where t.StockItemCode = t2.StockItemCode) StockItemCode,
cast(BarCode as varchar(20)) barcode,
row_number() over(partition by StockItemCode order by StockItemCode) seq
from RTLBarCode t
) d
unpivot(value for col in (barcode)) unpiv
) src
pivot (max(value) for col in (barcode1, barcode2)) piv
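If the unpivot/pivot round trip only exists to spread the (at most two) barcodes per StockItemCode into columns, a simpler alternative is conditional aggregation (a sketch under the same assumptions about RTLBarCode):
-- Sketch: number each StockItemCode's barcodes, then spread the first
-- two into columns with MAX(CASE ...) instead of unpivot + pivot.
select StockItemCode,
max(case when seq = 1 then barcode end) barcode1,
max(case when seq = 2 then barcode end) barcode2
from
(
select StockItemCode,
cast(BarCode as varchar(20)) barcode,
row_number() over(partition by StockItemCode order by BarCode) seq
from RTLBarCode
) d
group by StockItemCode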

How to search for a table with an undefined column

In Cassandra, you can add a new column to a table and use it like this:
cqlsh:mysite> CREATE TABLE mytable (
url timeuuid,
PRIMARY KEY (url)
);
cqlsh:mysite> ALTER TABLE mytable ADD tag_tagX text;
cqlsh:mysite> INSERT INTO mytable ( url ) VALUES ( now() );
cqlsh:mysite> SELECT * from mytable;
url | tag_tagx
--------------------------------------+----------
ad47de80-8a2c-11e4-8ab4-eb66c236961e | null
(1 rows)
cqlsh:mysite> CREATE INDEX ON mytable(tag_tagX);
cqlsh:mysite> SELECT * FROM mytable WHERE tag_tagX = null;
code=2200 [Invalid query] message="Unsupported null value for indexed column tag_tagx"
Since Cassandra allows INSERTs of rows without specifying some columns, how can we SELECT the rows in which this column was never set?
You cannot select for values of null in Cassandra.
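A common workaround (a sketch; the 'untagged' sentinel value is an invented placeholder) is to write an explicit sentinel instead of leaving the column unset, so the secondary index has a concrete value to match on:
-- Sketch: store a sentinel value rather than leaving tag_tagX unset,
-- then query for the sentinel instead of for null.
INSERT INTO mytable ( url, tag_tagX ) VALUES ( now(), 'untagged' );
SELECT * FROM mytable WHERE tag_tagX = 'untagged';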
