What does this error mean in Azure Data Factory?

I have 6 tables in an Azure data warehouse and 2 tables in an Azure database. I want to run a pipeline that copies data from the table SalesLT.ProductCategory in the warehouse to the table dbo.DimProductCategory in the database.
When I run it, I get this error:
{ "errorCode": "2200", "message": "ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed. Please search error to get more details.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=Violation of PRIMARY KEY constraint 'PK__DimProdu__3224ECEE2FD4E7AD'. Cannot insert duplicate key in object 'dbo.DimProductCategory'. The duplicate key value is (1).\r\nThe statement has been terminated.,Source=.Net SqlClient Data Provider,SqlErrorNumber=2627,Class=14,ErrorCode=-2146232060,State=1,Errors=[{Class=14,Number=2627,State=1,Message=Violation of PRIMARY KEY constraint 'PK__DimProdu__3224ECEE2FD4E7AD'. Cannot insert duplicate key in object 'dbo.DimProductCategory'. The duplicate key value is (1).,},{Class=0,Number=3621,State=0,Message=The statement has been terminated.,},],'", "failureType": "UserError", "target": "productcategory", "details": [] }
Here is the structure of the DimProductCategory table:
create table DimProductCategory
(
ProductCategoryID int not null primary key,
name nvarchar(max)
)
I tried dropping the primary key from the DimProductCategory table, but I still got the same error.

Related

SELECT comment FROM comments_by_video WHERE uuid = 'with id 357c33b4-9054-a5e1- 8da8-d9e38294fac1';

I created a table with
CREATE TABLE comments_by_video (
videoid uuid,
userid uuid,
comment text,
PRIMARY KEY(videoid, commentid));
and copied the table.
I executed the query below
SELECT comment FROM comments_by_video WHERE userid = 'with id 357c33b4-9054-a5e1- 8da8-d9e38294fac1';
and got this error.
InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid STRING constant (with id 357c33b4-9054-a5e1- 8da8-d9e38294fac1) for "userid" of type uuid"
PLEASE HELP!
First error: UUIDs are written as-is, without quotes and without the extra "with id" text: 357c33b4-9054-a5e1-8da8-d9e38294fac1
Second error: your condition is on a column that isn't the partition key. That requires a full table scan and won't work at scale. In Cassandra, the table structure is modeled around the queries, so you'll need a table whose partition key is userid.
I recommend reading the first chapters of this free book to understand how Cassandra works.
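To make the first point concrete, here is a minimal Python sketch that cleans up the value from the question and builds the query text with an unquoted UUID. The helper name uuid_literal is made up for illustration, and the query filters on videoid because that is the partition key in the schema from the question. With a real driver you would normally bind the value as a parameter instead of formatting it into the string.

```python
import uuid


def uuid_literal(raw: str) -> str:
    """Return a canonical UUID string usable as an unquoted CQL literal.

    Hypothetical helper, not part of any Cassandra driver.
    """
    # Remove stray whitespace (the value in the question contains a space).
    cleaned = "".join(raw.split())
    # uuid.UUID raises ValueError if the text is not a well-formed UUID.
    return str(uuid.UUID(cleaned))


literal = uuid_literal("357c33b4-9054-a5e1- 8da8-d9e38294fac1")

# The UUID goes into the WHERE clause without quotes.
query = f"SELECT comment FROM comments_by_video WHERE videoid = {literal};"
print(query)
```

Passing the original text including "with id" to uuid_literal raises ValueError, which mirrors the server rejecting it as an invalid constant for a uuid column.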

Azure Data Factory Copy Data using XML Source

Let's assume I have a simple XML source file which I've mapped to a corresponding sink in my SQL Server database.
<Date Date="2020-03-13Z">
<Identification>
<Identifier>Maverick</Identifier>
</Identification>
<Pilot HomeAirport="New York">
<AirportICAOCode>USA</AirportICAOCode>
</Pilot>
</Date>
And then the schema:
CREATE TABLE pilots (
identifier VARCHAR(20),
ICAO_code VARCHAR(3)
)
I created a stored procedure in my SQL Server database that takes the user-defined table type pilots_type, which corresponds to the schema above, as input so that my data is merged correctly.
But the pipeline fails when run, with this error:
{
"errorCode": "2200",
"message": "ErrorCode=UserErrorInvalidPluginType,'Type=Microsoft.DataTransfer.Common.Shared.PluginNotRegisteredException,Message=Invalid type 'XmlFormat' is provided in 'format'. Please correct the type in payload and retry.,Source=Microsoft.DataTransfer.ClientLibrary,'",
"failureType": "UserError",
"target": "Sink XML",
"details": []
}
Here the source is a blob that contains the XML.
Is XML not supported as a source after all?
XML is supported as a source.
I ran the same test with your sample XML file and SQL table, and it succeeded.
I created a Table Type named ct_pilot_type:
CREATE TYPE ct_pilot_type AS TABLE(
identifier nvarchar(MAX),
ICAO_code nvarchar(MAX)
)
I created the stored procedure named spUpsertPolit:
CREATE PROCEDURE spUpsertPolit
@polit ct_pilot_type READONLY
AS
BEGIN
MERGE [dbo].[pilot_airports] AS target_sqldb
USING @polit AS source_tblstg
ON (target_sqldb.identifier = source_tblstg.identifier)
WHEN MATCHED THEN
UPDATE SET
identifier = source_tblstg.identifier,
ICAO_code = source_tblstg.ICAO_code
WHEN NOT MATCHED THEN
INSERT (
identifier,
ICAO_code
)
VALUES (
source_tblstg.identifier,
source_tblstg.ICAO_code
);
END
I set the sink in the Copy activity:
I set the mapping:
It copied successfully:
The result shows:
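For readers who want to see which XML nodes end up in which columns, here is a small Python sketch that flattens the sample document from the question into one row. It is only an illustration outside Data Factory: the pairing of Identifier with identifier and AirportICAOCode with ICAO_code is an assumption based on the names in the question, and extract_pilot_row is a made-up helper.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = """<Date Date="2020-03-13Z">
  <Identification>
    <Identifier>Maverick</Identifier>
  </Identification>
  <Pilot HomeAirport="New York">
    <AirportICAOCode>USA</AirportICAOCode>
  </Pilot>
</Date>"""


def extract_pilot_row(xml_text: str) -> dict:
    """Flatten one <Date> document into a row for the sink table.

    The path-to-column pairing here is an assumption, not taken from
    the Copy activity mapping itself.
    """
    root = ET.fromstring(xml_text)
    return {
        "identifier": root.findtext("Identification/Identifier"),
        "ICAO_code": root.findtext("Pilot/AirportICAOCode"),
    }


row = extract_pilot_row(SAMPLE)
print(row)
```

A row of this shape is what the table type parameter of the stored procedure receives, one entry per source record.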

Data Factory V2 Query Azure Table Storage but use a lookup Value

I have a SQL watermark table which contains the last date in my destination table.
My source data comes from an Azure Storage table, where the date-time is stored as a string.
I set the date-time format in the watermark table to match the format in Azure Table storage.
I created a Lookup activity and a Copy activity.
If I hard-code the date into the source query, it works fine: CreatedAt ge '2019-03-06T14:03:11.000Z'
But obviously I don't want to hard-code this value; I want to use the date from the lookup.
But when I replace the hard-coded date with the lookup value
CreatedAt ge 'activity('LookupWatermarkOld').output'
I get an error
{
"errorCode": "2200",
"message":"ErrorCode=FailedStorageOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A
storage operation failed with the following error 'The remote server returned an error: (400) Bad Request.'.,Source=,
''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (400) Bad Request.,
Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=Syntax
error at position 42 in 'CreatedAt ge 'activity('LookupWatermarkOld').output''.\nRequestId:8c65ced9-b002-0051-79d9-d41d49000000\nTime:2019-03-07T11:35:39.0640233Z,,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,'",
"failureType": "UserError",
"target": "CopyMentions"
}
Can anyone help me with this? How do you use the Lookup value in an Azure Table query?
Check this out:
1) Lookup activity. Query field:
SELECT MAX(WatermarkColumnName) as LastId FROM TableName;
Also, make sure that the "First row only" option is checked.
2) In the Copy Data activity, use a query. Query field:
@concat('SELECT * FROM TableName as s WHERE s.WatermarkColumnName > ''', activity('LookupActivity').output.firstRow.LastId, '''')
Finally I got some help on this, and it works with
CreatedAt gt '@{activity('LookupWatermarkOld').output.firstRow.WaterMarkValue}'
WaterMarkValue is the column name from the SQL lookup table.
The Lookup output is a result set rather than a single value, so you have to pick the column from firstRow.
And wrap the expression in single quotes so the result is used as a string value.
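As a rough illustration of what the interpolated expression evaluates to, the Python sketch below builds the same filter string from a dictionary shaped like the Lookup output. This only mimics the string handling and is not Data Factory code. The dictionary contents and the function name build_filter are assumptions for the example; the column name WaterMarkValue and the sample date come from the thread.

```python
def build_filter(lookup_output: dict, column: str = "CreatedAt") -> str:
    """Build an Azure Table filter from a Lookup-style output (illustrative only)."""
    # Pick the single value out of the first row, as the expression does.
    value = lookup_output["firstRow"]["WaterMarkValue"]
    # Wrap the value in single quotes so the filter compares against a string.
    return f"{column} gt '{value}'"


# Assumed shape of the Lookup output when "First row only" is checked.
lookup_output = {"firstRow": {"WaterMarkValue": "2019-03-06T14:03:11.000Z"}}
print(build_filter(lookup_output))
```

Inserting the whole output object instead of the single value is what produced the syntax error in the question, because the filter no longer contained a quoted date string.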
For recent ADF v2:
Use the watermark/lookup output value in a parameter.
Example: ParamUserCount = @{activity('LookupActivity').output.count}
or for output function
and you can use it in a query as
Example: "select * from userDetails where usercount = {$ParamUserCount}"
Make sure you enclose the query in " " so it is set as a string, and enclose the parameter in the query in { }.

Unable to query on partition key in DynamoDB with boto3

I have one table, TestTable, with partition key TestColumn.
Input dates:
from_date= "2017-04-20T16:31:54.451071+00:00"
to_date = "2018-04-20T16:31:54.451071+00:00"
When I query the date with an equality condition, it works.
key_expr = Key('TestColumn').eq(to_date)
query_resp = table.query(KeyConditionExpression=key_expr)
But when I use a between condition, it does not work.
key_expr = Key('TestColumn').between(from_date, to_date)
query_resp = table.query(KeyConditionExpression=key_expr)
Error:
Unknown err_msg while querying dynamodb: An error occurred (ValidationException) when calling the Query operation: Query key condition not supported
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
DynamoDB Query will return data from one and only one partition, meaning you have to supply a single partition key in the request.
KeyConditionExpression: The condition that specifies the key value(s) for items to be retrieved by the Query action. The condition must perform an equality test on a single partition key value.
You can optionally use a BETWEEN operator on a sort key (but you still have to supply a single partition key).
If you use a Scan instead, you can pass a FilterExpression that uses the BETWEEN operator on TestColumn.
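A minimal sketch of the Scan approach, written against the low-level client interface, where the filter is passed as a FilterExpression string. The helper names build_scan_params and scan_between are made up for this example, and the sketch assumes TestColumn is stored as a string attribute, so BETWEEN is a string comparison. The loop follows LastEvaluatedKey because a Scan can return its results in several pages.

```python
def build_scan_params(table_name, attribute, low, high):
    """Build keyword arguments for a low-level DynamoDB Scan call."""
    return {
        "TableName": table_name,
        # Placeholders keep attribute names and values out of the expression text.
        "FilterExpression": "#attr BETWEEN :low AND :high",
        "ExpressionAttributeNames": {"#attr": attribute},
        # Assumption: the attribute is a string ("S") attribute.
        "ExpressionAttributeValues": {":low": {"S": low}, ":high": {"S": high}},
    }


def scan_between(client, table_name, attribute, low, high):
    """Scan the table and collect every item whose attribute is in [low, high]."""
    params = build_scan_params(table_name, attribute, low, high)
    items = []
    while True:
        page = client.scan(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Continue from where the previous page stopped.
        params["ExclusiveStartKey"] = last_key
```

With boto3 installed and credentials configured, this could be called as scan_between(boto3.client("dynamodb"), "TestTable", "TestColumn", from_date, to_date). Keep in mind that a Scan reads the whole table and only filters afterwards, so it gets expensive on large tables.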

Bad distributed join plan: result table shard keys do not match

We are very new to memsql/mysql and we are trying to play around with a memsql installation.
It is installed on a CentOS7 virtual machine and we are running version 5.1.0 of MemSQL.
We receive the following error from one of the queries we are attempting:
ERROR 1889 (HY000): Bad distributed join plan: result table shard keys do not match. Please contact MemSQL support at support@memsql.com.
We have two tables:
CREATE TABLE `MyObjects` (
`Id` INT NOT NULL AUTO_INCREMENT,
`Name` VARCHAR(128) NOT NULL,
`Description` VARCHAR(256) NULL,
`Boolean` BIT NOT NULL,
`Int8` TINYINT NOT NULL,
`Int16` SMALLINT NOT NULL,
`Int32` MEDIUMINT NOT NULL,
`Int64` INT NOT NULL,
`Float` DOUBLE NOT NULL,
`DateCreated` TIMESTAMP NOT NULL,
SHARD KEY (`Id`),
PRIMARY KEY (`Id`)
);
CREATE TABLE `MyObjectDetails` (
`MyObjectId` INT,
`Int32` MEDIUMINT NOT NULL,
SHARD KEY (`MyObjectId`),
INDEX (`MyObjectId`)
);
Here is the SQL we are executing, followed by the error it returns:
memsql> SELECT mo.`Id`,mo.`Name`,mo.`Description`,mo.`Boolean`,mo.`Int8`,mo.`Int16`,
mo.`Int32`,mo.`Int64`,mo.`Float`,mo.`DateCreated`,mods.`MyObjectId`,
mods.`Int32` FROM
( SELECT
mo.`Id`,mo.`Name`,mo.`Description`,mo.`Boolean`,mo.`Int8`,
mo.`Int16`,mo.`Int32`,mo.`Int64`,mo.`Float`,mo.`DateCreated`
FROM `MyObjects` mo LIMIT 10 ) AS mo
LEFT JOIN `MyObjectDetails` mods ON mo.`Id` = mods.`MyObjectId` ORDER BY `Name` DESC;
ERROR 1889 (HY000): Bad distributed join plan: result table shard keys do not match. Please contact MemSQL support at support@memsql.com.
Does anyone know why we are receiving this error, and if there is a possible change we can make to help alleviate this issue?
The one thing we do know is that it has something to do with the inner select: if I pull it out and do the join directly, it works, but then we only get 10 rows in total from the join. What we want is the top 10 rows from the main table, together with all of their details from the right-hand table.
We also tried changing the MyObjectDetails table to have an empty SHARD KEY, but that resulted in the same error.
SHARD KEY()
We also added an auto-incrementing Id column to the details table and put the shard key on that column, and still received the same error.
Thanks in advance for any help.
UPDATE:
I contacted MemSQL by email (huge props to their customer service, by the way: very fast response time, less than a couple of hours).
Based on what Mike said, I changed the table to a REFERENCE table and removed the SHARD KEY part of the CREATE TABLE statement. Once I did this, I was able to run the queries. I am not 100% sure what ramifications this will have, but it fixed the issue at hand. Thanks.
CREATE REFERENCE TABLE `MyObjects` (
`Id` INT NOT NULL AUTO_INCREMENT,
`Name` VARCHAR(128) NOT NULL,
`Description` VARCHAR(256) NULL,
`Boolean` BIT NOT NULL,
`Int8` TINYINT NOT NULL,
`Int16` SMALLINT NOT NULL,
`Int32` MEDIUMINT NOT NULL,
`Int64` INT NOT NULL,
`Float` DOUBLE NOT NULL,
`DateCreated` TIMESTAMP NOT NULL,
PRIMARY KEY (`Id`)
);
Thanks to Mike Gallegos for looking into this. Here is a summary of his answer:
The error message here is bad, but the reason for the error is that MemSQL does not currently support a distributed left join where the left side (the Limit subquery in this case) has a LIMIT operator. If you cannot rewrite the query to do the limit after the join, then you could change the MyObjects table to a reference table to work around the issue.
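Moving the LIMIT after the join changes what the query returns, which is why the reference table workaround matters for this use case. The Python sketch below shows the difference on a tiny data set using SQLite from the standard library as a stand-in. It does not reproduce the MemSQL error or its distributed planner; the tables are cut down to the columns needed, the sample rows are invented, and an ORDER BY is added inside the subquery only to make the result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyObjects (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE MyObjectDetails (MyObjectId INTEGER, Int32 INTEGER NOT NULL);
INSERT INTO MyObjects VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO MyObjectDetails VALUES (1, 10), (1, 11), (1, 12), (2, 20);
""")

# LIMIT inside the subquery: the first 2 objects, each with all of its details.
limit_before_join = conn.execute("""
SELECT mo.Id, mods.Int32
FROM (SELECT Id, Name FROM MyObjects ORDER BY Id LIMIT 2) AS mo
LEFT JOIN MyObjectDetails mods ON mo.Id = mods.MyObjectId
""").fetchall()

# LIMIT after the join: only 2 joined rows in total.
limit_after_join = conn.execute("""
SELECT mo.Id, mods.Int32
FROM MyObjects mo
LEFT JOIN MyObjectDetails mods ON mo.Id = mods.MyObjectId
ORDER BY mo.Id, mods.Int32
LIMIT 2
""").fetchall()

print(len(limit_before_join), len(limit_after_join))
```

The first query returns one row per detail of the selected objects, while the second cuts off the joined rows themselves, so it only matches the intent when each object has at most one detail row.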
