Possible to interleave a new table into a secondary index table? - google-cloud-spanner

I'm gonna guess no, but secondary indexes seem a lot like tables in that you can select from them directly with FORCE_INDEX and even JOIN on them:
JOIN MyTable#{FORCE_INDEX=anIndexToUseFromMyTable} AS myTable
So maybe you can create a new table interleaved into an index?
Example
CREATE TABLE Foo (
primaryId STRING(64) NOT NULL,
secondaryId STRING(64) NOT NULL,
modifiedAt TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
) PRIMARY KEY (primaryId);
-- Index we would like to interleave into for another table
CREATE INDEX FooSecondaryIdIndex ON Foo(secondaryId);
-- interleave this table into the index above
-- and support DELETE CASCADE
CREATE TABLE Bar (
secondaryId STRING(64) NOT NULL,
extraData STRING(64) NOT NULL,
modifiedAt TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
) PRIMARY KEY (secondaryId),
INTERLEAVE IN PARENT Foo#{FORCE_INDEX=FooSecondaryIdIndex} ON DELETE CASCADE;

Well... it doesn’t look like that is supported:
Error parsing Spanner DDL statement: CREATE TABLE Bar ( secondaryId STRING(64) NOT NULL, extraData STRING(64) NOT NULL, modifiedAt TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true), ) PRIMARY KEY (secondaryId), INTERLEAVE IN PARENT Foo#{FORCE_INDEX=FooSecondaryIdIndex} ON DELETE CASCADE : Syntax error on line 6, column 25: Expecting 'EOF' but found '#'

Related

Postgres: complex foreign key constraints

I have this schema
CREATE TABLE public.item (
itemid integer NOT NULL,
itemcode character(100) NOT NULL,
itemname character(100) NOT NULL,
constraint PK_ITEM primary key (ItemID)
);
create unique index ak_itemcode on Item(ItemCode);
CREATE TABLE public.store (
storeid character(20) NOT NULL,
storename character(80) NOT NULL,
constraint PK_STORE primary key (StoreID)
);
CREATE TABLE public.storeitem (
storeitemid integer NOT NULL,
itemid integer NOT NULL,
storeid character(20) NOT NULL,
constraint PK_STOREITEM primary key (ItemID, StoreID),
foreign key (StoreID) references Store(StoreID),
foreign key (ItemID) references Item(ItemID)
);
create unique index ak_storeitemid on StoreItem (StoreItemID);
And here is the data on those tables
insert into Item (ItemID, ItemCode,ItemName)
Values (1,'abc','abc');
insert into Item (ItemID, ItemCode,ItemName)
Values (2,'def','def');
insert into Item (ItemID, ItemCode,ItemName)
Values (3,'ghi','ghi');
insert into Item (ItemID, ItemCode,ItemName)
Values (4,'lmno','lmno');
insert into Item (ItemID, ItemCode,ItemName)
Values (5,'xyz','xyz');
insert into Store (StoreID, StoreName)
Values ('B1','B1');
insert into StoreItem (StoreItemID, StoreID, ItemID)
Values (1,'B1',1);
insert into StoreItem (StoreItemID, StoreID, ItemID)
Values (2,'B1',2);
insert into StoreItem (StoreItemID, StoreID, ItemID)
Values (3,'B1',3);
Now I created this new table
CREATE TABLE public.szdata (
storeid character(20) NOT NULL,
itemcode character(100) NOT NULL,
textdata character(20) NOT NULL,
constraint PK_SZDATA primary key (ItemCode, StoreID)
);
I want the foreign key constraints set up so that an insert fails when the record's store and item combination is not in StoreItem. For example, this must fail:
insert into SZData (StoreID, ItemCode, TextData)
Values ('B1', 'xyz', 'text123');
and this must pass
insert into SZData (StoreID, ItemCode, TextData)
Values ('B1', 'abc', 'text123');
How do I achieve this with table constraints rather than complex triggers?
I would prefer a solution without triggers. The SZData table has a single purpose: accepting input from the outside world.
Also, database import and export must not be affected.
I figured out that calling a function from a CHECK constraint solves this.
The function is_storeitem does the validation. I believe the same approach can be used for even more complex validations.
create or replace function is_storeitem(pItemcode nchar(40), pStoreId nchar(20)) returns boolean as $$
select exists (
select 1
from public.storeitem si, public.item i, public.store s
where si.itemid = i.itemid and i.itemcode = pItemcode and s.Storeid = pStoreId and s.storeid = si.storeid
);
$$ language sql;
create table SZData
(
StoreID NCHAR(20) not null,
ItemCode NCHAR(100) not null,
TextData NCHAR(20) not null,
constraint PK_SIDATA primary key (ItemCode, StoreID),
foreign key (StoreID) references Store(StoreID),
foreign key (ItemCode) references Item(ItemCode),
CONSTRAINT ck_szdata_itemcode CHECK (is_storeitem(Itemcode,StoreID))
);
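This works on PostgreSQL 9.6 or later. Note that a CHECK constraint is only evaluated when a row in SZData is inserted or updated, so later changes to StoreItem are not re-validated, and the PostgreSQL documentation warns that CHECK constraints referencing other tables can cause a dump and restore to fail.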
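To see what the lookup inside is_storeitem returns for the sample data, here is a minimal sketch that replays the same EXISTS query from Python against an in-memory SQLite database. SQLite stands in for PostgreSQL purely for illustration (an assumption), the columns are plain TEXT/INTEGER rather than padded character types, and the helper name is_storeitem_py is hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    itemid   INTEGER PRIMARY KEY,
    itemcode TEXT NOT NULL UNIQUE,
    itemname TEXT NOT NULL
);
CREATE TABLE store (
    storeid   TEXT PRIMARY KEY,
    storename TEXT NOT NULL
);
CREATE TABLE storeitem (
    storeitemid INTEGER NOT NULL UNIQUE,
    itemid      INTEGER NOT NULL REFERENCES item(itemid),
    storeid     TEXT NOT NULL REFERENCES store(storeid),
    PRIMARY KEY (itemid, storeid)
);
INSERT INTO item VALUES
    (1,'abc','abc'), (2,'def','def'), (3,'ghi','ghi'),
    (4,'lmno','lmno'), (5,'xyz','xyz');
INSERT INTO store VALUES ('B1','B1');
INSERT INTO storeitem VALUES (1,1,'B1'), (2,2,'B1'), (3,3,'B1');
""")


def is_storeitem_py(conn, itemcode, storeid):
    """Hypothetical helper: True if the (itemcode, storeid) pair exists in storeitem."""
    row = conn.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM storeitem si
            JOIN item i  ON si.itemid = i.itemid
            JOIN store s ON s.storeid = si.storeid
            WHERE i.itemcode = ? AND s.storeid = ?
        )
        """,
        (itemcode, storeid),
    ).fetchone()
    return bool(row[0])


# 'abc' is stocked in store B1; 'xyz' exists as an item but is not in storeitem.
print(is_storeitem_py(conn, "abc", "B1"))
print(is_storeitem_py(conn, "xyz", "B1"))
```

The second call is the case the question wants rejected: the item code and the store each exist on their own, so the two plain foreign keys are satisfied, and only the combined lookup catches it.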

Azure Data Flow creating / managing keys for identity relationships

Curious to find out what the best way is to generate relationship identities through ADF.
Right now, I'm consuming JSON data that does not have any identity information. This data is then transformed into multiple database sink tables with relationships (1..n, etc.). Due to FK constraints on some of the destination sink tables, these relationships need to be "built up" one at a time.
This approach seems a bit kludgy, so I'm looking to see if there are other options that I'm not aware of.
Note that I need to include the Surrogate key generation for each insert. If I do not do this, based on output database schema, I'll get a 'cannot insert PK null' error.
Also note that I turn IDENTITY_INSERT ON/OFF for each sink.
I would tend to take more of an ELT approach and use the native JSON abilities in Azure SQL DB, i.e. OPENJSON. You could land the JSON in a table in Azure SQL DB using ADF (e.g. a Stored Proc activity) and then call another stored proc to process the JSON, something like this:
-- Setup
DROP TABLE IF EXISTS #tmp
DROP TABLE IF EXISTS import.City;
DROP TABLE IF EXISTS import.Region;
DROP TABLE IF EXISTS import.Country;
GO
DROP SCHEMA IF EXISTS import
GO
CREATE SCHEMA import
CREATE TABLE Country ( CountryKey INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL UNIQUE )
CREATE TABLE Region ( RegionKey INT IDENTITY PRIMARY KEY, CountryKey INT NOT NULL FOREIGN KEY REFERENCES import.Country, RegionName VARCHAR(50) NOT NULL UNIQUE )
CREATE TABLE City ( CityKey INT IDENTITY(100,1) PRIMARY KEY, RegionKey INT NOT NULL FOREIGN KEY REFERENCES import.Region, CityName VARCHAR(50) NOT NULL UNIQUE )
GO
DECLARE @json NVARCHAR(MAX) = '{
"Cities": [
{
"Country": "England",
"Region": "Greater London",
"City": "London"
},
{
"Country": "England",
"Region": "West Midlands",
"City": "Birmingham"
},
{
"Country": "England",
"Region": "Greater Manchester",
"City": "Manchester"
},
{
"Country": "Scotland",
"Region": "Lothian",
"City": "Edinburgh"
}
]
}'
SELECT *
INTO #tmp
FROM OPENJSON( @json, '$.Cities' )
WITH
(
Country VARCHAR(50),
Region VARCHAR(50),
City VARCHAR(50)
)
GO
-- Add the Country first (has no foreign keys)
INSERT INTO import.Country ( CountryName )
SELECT DISTINCT Country
FROM #tmp s
WHERE NOT EXISTS ( SELECT * FROM import.Country t WHERE s.Country = t.CountryName )
-- Add the Region next including Country FK
INSERT INTO import.Region ( CountryKey, RegionName )
SELECT t.CountryKey, s.Region
FROM #tmp s
INNER JOIN import.Country t ON s.Country = t.CountryName
-- Now add the City with FKs
INSERT INTO import.City ( RegionKey, CityName )
SELECT r.RegionKey, s.City
FROM #tmp s
INNER JOIN import.Country c ON s.Country = c.CountryName
INNER JOIN import.Region r ON s.Region = r.RegionName
AND c.CountryKey = r.CountryKey
SELECT * FROM import.City;
SELECT * FROM import.Region;
SELECT * FROM import.Country;
This is a simple test script designed to show the idea and should run end-to-end but it is not production code.
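The ordering idea in the script (load parents first, then resolve each generated key by name when loading the children) can also be sketched outside the database. The following is a rough illustration in Python using the standard sqlite3 module in place of Azure SQL DB, which is an assumption made only so the example is self-contained; the table and column names mirror the script above, and the JSON sample is shortened.

```python
import json
import sqlite3

# Shortened version of the sample document from the script above.
doc = json.loads("""{"Cities": [
  {"Country": "England",  "Region": "Greater London", "City": "London"},
  {"Country": "England",  "Region": "West Midlands",  "City": "Birmingham"},
  {"Country": "Scotland", "Region": "Lothian",        "City": "Edinburgh"}
]}""")
rows = doc["Cities"]

# SQLite stands in for Azure SQL DB here; INTEGER PRIMARY KEY auto-generates keys.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Country (CountryKey INTEGER PRIMARY KEY,
                      CountryName TEXT NOT NULL UNIQUE);
CREATE TABLE Region  (RegionKey INTEGER PRIMARY KEY,
                      CountryKey INTEGER NOT NULL REFERENCES Country,
                      RegionName TEXT NOT NULL UNIQUE);
CREATE TABLE City    (CityKey INTEGER PRIMARY KEY,
                      RegionKey INTEGER NOT NULL REFERENCES Region,
                      CityName TEXT NOT NULL UNIQUE);
""")

# 1. Countries first: they have no foreign keys. Duplicates are skipped.
conn.executemany(
    "INSERT OR IGNORE INTO Country (CountryName) VALUES (?)",
    [(r["Country"],) for r in rows],
)
# 2. Regions next, looking up the generated CountryKey by country name.
conn.executemany(
    """INSERT OR IGNORE INTO Region (CountryKey, RegionName)
       SELECT CountryKey, ? FROM Country WHERE CountryName = ?""",
    [(r["Region"], r["Country"]) for r in rows],
)
# 3. Cities last, looking up the generated RegionKey by region name.
conn.executemany(
    """INSERT OR IGNORE INTO City (RegionKey, CityName)
       SELECT RegionKey, ? FROM Region WHERE RegionName = ?""",
    [(r["City"], r["Region"]) for r in rows],
)
conn.commit()

# Join back through the generated keys to confirm the relationships.
loaded = conn.execute(
    """SELECT co.CountryName, re.RegionName, ci.CityName
       FROM City ci
       JOIN Region re  ON re.RegionKey = ci.RegionKey
       JOIN Country co ON co.CountryKey = re.CountryKey
       ORDER BY ci.CityName"""
).fetchall()
print(loaded)
```

Because the database generates the keys and the children look them up by natural name, no surrogate keys have to be produced in the pipeline itself.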

nested map in cassandra data modelling

I have the following requirement for my dataset and need to understand which data type I should use and how to save my data accordingly:
CREATE TABLE events (
id text,
evntoverlap map<text, map<timestamp,int>>,
PRIMARY KEY (id)
)
evntoverlap = {
'Dig1': {{'2017-10-09 04:10:05', 0}},
'Dig2': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0}},
'Dig3': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0},{'2017-10-09 04:11:05', 0}}
}
This gives an error:
Error from server: code=2200 [Invalid query] message="Non-frozen collections are not allowed inside collections: map<text, map<timestamp, int>>"
How should I store this type of data in a single column? Please suggest a data type and the insert command for it.
Thanks,
This is a limitation of Cassandra: you can't nest a collection (or UDT) inside another collection without making it frozen. So you need to freeze one of the collections, either the nested one:
CREATE TABLE events (
id text,
evntoverlap map<text, frozen<map<timestamp,int>>>,
PRIMARY KEY (id)
);
or top-level:
CREATE TABLE events (
id text,
evntoverlap frozen<map<text, map<timestamp,int>>>,
PRIMARY KEY (id)
);
See documentation for more details.
CQL collections are limited to 64kb, so if you put things like maps inside maps you might hit that limit. With frozen maps especially, you are deserializing the entire map, modifying it, and reinserting it. You might be better off with a
CREATE TABLE events (
id text,
evnt_key text,
value map<timestamp, int>,
PRIMARY KEY ((id), evnt_key)
);
Or even a
CREATE TABLE events (
id text,
evnt_key text,
evnt_time timestamp,
value int,
PRIMARY KEY ((id), evnt_key, evnt_time)
);
It would be more efficient and safer, while giving additional benefits such as being able to order the evnt_time values in ascending or descending order.
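To make the fully flattened layout concrete, here is a small sketch in Python that turns a nested structure like evntoverlap into one (id, evnt_key, evnt_time, value) tuple per inner entry, matching the columns of the last table. The function name flatten_events and the id "e1" are hypothetical, and no Cassandra driver is involved: the tuples are simply what you would bind to an INSERT for that table. Since a map holds one value per key, a repeated timestamp under the same key (as in the Dig3 sample) would collapse to a single entry.

```python
from datetime import datetime

# Nested data in the shape the question describes: key -> {timestamp -> int}.
evntoverlap = {
    "Dig1": {"2017-10-09 04:10:05": 0},
    "Dig2": {"2017-10-09 04:11:05": 0, "2017-10-09 04:15:05": 0},
}


def flatten_events(event_id, nested):
    """Hypothetical helper: yield one (id, evnt_key, evnt_time, value) tuple per inner entry."""
    for evnt_key, by_time in nested.items():
        for ts, value in by_time.items():
            evnt_time = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
            yield (event_id, evnt_key, evnt_time, value)


# Sorting mirrors the clustering order (evnt_key, then evnt_time) within one partition.
rows = sorted(flatten_events("e1", evntoverlap))
for row in rows:
    print(row)
```

Each tuple becomes its own row, so a single event can be added or changed without rewriting a whole collection.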

Azure SQL db query external Azure SQL DB - PPDwManagedToNativeInteropException

I have one Azure SQL server with several databases. I need to be able to query across these databases, and so far I have solved this with external tables. A challenge with this solution is that external tables do not support all of the data types that ordinary tables do.
According to the following article, the solution for incompatible data types is to use similar ones in the external table.
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-data-types#unsupported-data-types.
DDL for table in DB1
CREATE TABLE [dbo].[ActivityList](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Registered] [datetime] NULL,
[RegisteredBy] [varchar](50) NULL,
[Name] [varchar](100) NULL,
[ak_beskrivelse] [ntext] NULL,
[ak_aktiv] [bit] NULL,
[ak_epost] [bit] NULL,
[Template] [text] NULL,
CONSTRAINT [PK_ActivityList] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
DDL for external table in DB2
CREATE EXTERNAL TABLE [dbo].[NEMDBreplicaActivityList]
(
[ID] [int] NOT NULL,
[Registered] [datetime] NULL,
[RegisteredBy] [varchar](50) NULL,
[Name] [varchar](100) NULL,
[ak_beskrivelse] [nvarchar](4000) NULL,
[ak_aktiv] [bit] NULL,
[ak_epost] [bit] NULL,
[Template] [varchar](900) NULL
)
WITH (DATA_SOURCE = [DS],SCHEMA_NAME = N'dbo',OBJECT_NAME = N'ActivityList')
Querying the external table NEMDBreplicaActivityList produces the following error
Error retrieving data from
server.database.windows.net.db1. The
underlying error message received was:
'PdwManagedToNativeInteropException ErrorNumber: 46723, MajorCode:
467, MinorCode: 23, Severity: 16, State: 1, ErrorInfo: ak_beskrivelse,
Exception of type
'Microsoft.SqlServer.DataWarehouse.Tds.PdwManagedToNativeInteropException'
was thrown.'.
I have tried defining the ak_beskrivelse column as other external table legal datatypes, such as varchar, with the same result.
Sadly I'm not allowed to edit the data type of columns in the db1 table.
I assume that the error is related to the data type. Any ideas how to fix it?
I solved a similar problem to this by creating a view over the source table which cast the text value as varchar(max), then pointed the external table to the view.
So:
CREATE VIEW tmpView
AS
SELECT CAST([Value] AS VARCHAR(MAX)) AS [Value]
FROM [Sourcetable];
Then:
CREATE EXTERNAL TABLE [dbo].[tmpView]
(
[Value] VARCHAR(MAX) NULL
)
WITH (DATA_SOURCE = [myDS],SCHEMA_NAME = N'dbo',OBJECT_NAME = N'tmpView')
Creating the view and casting the text value worked perfectly for me 😊
Thank you!
Created view vw_TestReport:
SELECT CAST([Report Date] AS VARCHAR(MAX)) AS [Report Date]
FROM dbo.TestReport
And created external table from view:
CREATE EXTERNAL TABLE [dbo].[TestReport](
[Report Date] [varchar](max) NULL
)
WITH (DATA_SOURCE = [REFToDB],SCHEMA_NAME = N'dbo',OBJECT_NAME = N'vw_TestReport')

Bad distributed join plan: result table shard keys do not match

We are very new to memsql/mysql and we are trying to play around with a memsql installation.
It is installed on a CentOS7 virtual machine and we are running version 5.1.0 of MemSQL.
We are receiving the following error from one of our queries:
ERROR 1889 (HY000): Bad distributed join plan: result table shard keys do not match. Please contact MemSQL support at support@memsql.com.
We have two tables:
CREATE TABLE `MyObjects` (
`Id` INT NOT NULL AUTO_INCREMENT,
`Name` VARCHAR(128) NOT NULL,
`Description` VARCHAR(256) NULL,
`Boolean` BIT NOT NULL,
`Int8` TINYINT NOT NULL,
`Int16` SMALLINT NOT NULL,
`Int32` MEDIUMINT NOT NULL,
`Int64` INT NOT NULL,
`Float` DOUBLE NOT NULL,
`DateCreated` TIMESTAMP NOT NULL,
SHARD KEY (`Id`),
PRIMARY KEY (`Id`)
);
CREATE TABLE `MyObjectDetails` (
`MyObjectId` INT,
`Int32` MEDIUMINT NOT NULL,
SHARD KEY (`MyObjectId`),
INDEX (`MyObjectId`)
);
And here is the SQL we are executing and getting the error.
memsql> SELECT mo.`Id`,mo.`Name`,mo.`Description`,mo.`Boolean`,mo.`Int8`,mo.`Int16`,
mo.`Int32`,mo.`Int64`,mo.`Float`,mo.`DateCreated`,mods.`MyObjectId`,
mods.`Int32` FROM
( SELECT
mo.`Id`,mo.`Name`,mo.`Description`,mo.`Boolean`,mo.`Int8`,
mo.`Int16`,mo.`Int32`,mo.`Int64`,mo.`Float`,mo.`DateCreated`
FROM `MyObjects` mo LIMIT 10 ) AS mo
LEFT JOIN `MyObjectDetails` mods ON mo.`Id` = mods.`MyObjectId` ORDER BY `Name` DESC;
ERROR 1889 (HY000): Bad distributed join plan: result table shard keys do not match. Please contact MemSQL support at support@memsql.com.
Does anyone know why we are receiving this error, and if there is a possible change we can make to help alleviate this issue?
The one thing we do know is that it has something to do with the inner SELECT: if I pull it out and do a plain join, the query works, but then we only get 10 rows in total from the join. What we are attempting is to get the top 10 rows from the main table and include all of their details from the right-hand table.
We also tried changing the MyObjectDetails table to have an empty SHARD KEY, but that resulted in the same error.
SHARD KEY()
We also added an auto-incrementing Id column to the details table and put the shard on that column, and yet still received the same error.
Thanks in advance for any help.
UPDATE:
I contacted MemSQL by email (huge props to their customer service, by the way: very fast response time, less than a couple of hours).
Based on what Mike said, I changed the table to a REFERENCE table and removed the SHARD KEY part of the CREATE TABLE statement. Once I did this, I was able to run the queries. I am not 100% sure what ramifications this will have, but it fixed my issue at hand. Thanks.
CREATE REFERENCE TABLE `MyObjects` (
`Id` INT NOT NULL AUTO_INCREMENT,
`Name` VARCHAR(128) NOT NULL,
`Description` VARCHAR(256) NULL,
`Boolean` BIT NOT NULL,
`Int8` TINYINT NOT NULL,
`Int16` SMALLINT NOT NULL,
`Int32` MEDIUMINT NOT NULL,
`Int64` INT NOT NULL,
`Float` DOUBLE NOT NULL,
`DateCreated` TIMESTAMP NOT NULL,
PRIMARY KEY (`Id`)
);
Thanks to Mike Gallegos for looking into this, adding a summary of his answer here:
The error message here is bad, but the reason for the error is that MemSQL does not currently support a distributed left join where the left side (the Limit subquery in this case) has a LIMIT operator. If you cannot rewrite the query to do the limit after the join, then you could change the MyObjects table to a reference table to work around the issue.
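Moving the LIMIT to after the join changes what the query returns, which is the difference the question describes. The sketch below shows only that semantic difference, using Python's sqlite3 module on a single node; it is not MemSQL and says nothing about its distributed planner. The tables are cut down to the columns needed, and the row counts (20 objects, 3 details each) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyObjects (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE MyObjectDetails (MyObjectId INTEGER, Int32 INTEGER NOT NULL);
""")
# 20 objects, each with 3 detail rows (illustrative data only).
conn.executemany(
    "INSERT INTO MyObjects VALUES (?, ?)",
    [(i, f"obj{i:02d}") for i in range(1, 21)],
)
conn.executemany(
    "INSERT INTO MyObjectDetails VALUES (?, ?)",
    [(i, d) for i in range(1, 21) for d in range(3)],
)

# LIMIT inside the derived table: 10 objects, each with all of its details.
limit_before_join = conn.execute("""
    SELECT mo.Id, mods.Int32
    FROM (SELECT Id, Name FROM MyObjects ORDER BY Id LIMIT 10) AS mo
    LEFT JOIN MyObjectDetails mods ON mo.Id = mods.MyObjectId
""").fetchall()

# LIMIT after the join: only 10 joined rows in total.
limit_after_join = conn.execute("""
    SELECT mo.Id, mods.Int32
    FROM MyObjects mo
    LEFT JOIN MyObjectDetails mods ON mo.Id = mods.MyObjectId
    ORDER BY mo.Id
    LIMIT 10
""").fetchall()

print(len(limit_before_join), len({row[0] for row in limit_before_join}))
print(len(limit_after_join), len({row[0] for row in limit_after_join}))
```

With this data the first form returns 30 rows covering 10 objects, while the second returns 10 rows covering only the first few objects, so the rewrite is only an option when that change in results is acceptable.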
