I see that the consistency level changes to ALL when there is a read timeout.
The Cassandra session used has the following main properties:
Load balancing policy - DCAwareRoundRobinPolicy(localDc, usedHostPerRemoteDc = 3, allowRemoteDcForLocalConsistencyLevel = true)
Retry policy - DefaultRetryPolicy
Query options - QueryOptions with the consistency level set to ConsistencyLevel.LOCAL_QUORUM
The exception below is observed when the query is made:
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ALL (5 responses were required but only 4 replica responded)
at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:88) ~[cassandra-driver-core-3.1.4.jar!/:?]
at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:25) ~[cassandra-driver-core-3.1.4.jar!/:?]
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) ~[cassandra-driver-core-3.1.4.jar!/:?]
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245) ~[cassandra-driver-core-3.1.4.jar!/:?]
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68) ~[cassandra-driver-core-3.1.4.jar!/:?]
This could be an issue that has re-occurred in Cassandra 3.11.0, which is the version currently in use. It could also be related to:
https://issues.apache.org/jira/browse/CASSANDRA-7868
https://issues.apache.org/jira/browse/CASSANDRA-7947
Could these have re-occurred? Please provide your feedback.
I am new to Azure Data Factory and I am looking to copy CSV data into tables that have a foreign key relationship. Here are my tables:
Customer table
CREATE TABLE [dbo].[Customer]
(
[Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY, -- Primary Key column
[CustomerNumber] NVARCHAR(50) NOT NULL,
[FirstName] NVARCHAR(50) NOT NULL,
[LastName] NVARCHAR(50) NOT NULL,
[CreatedOn] datetime,
[CreatedBy] NVARCHAR(255),
[ModifiedOn] datetime,
[ModifiedBy] NVARCHAR(255)
);
GO
-- Insert rows into table 'Customer' in schema '[dbo]'
INSERT INTO [dbo].[Customer]
VALUES
(
NEWID(),'Tom123', 'Tom', 'Shehu',GETDATE(),'test',GETDATE(),'admin'
),
(
NEWID(),'Harol234', 'Harold', 'Haoxa',GETDATE(),'test',GETDATE(),'admin'
),
(
NEWID(),'Peter345', 'Peter', 'Begu',GETDATE(),'test',GETDATE(),'admin'
),
(
NEWID(),'Marlin09', 'Marlin', 'Hysi',GETDATE(),'test',GETDATE(),'admin'
)
GO
Product Table
CREATE TABLE [dbo].[Product]
(
[Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY, -- Primary Key column
[Name] NVARCHAR(50) NOT NULL,
[ErpNumber] NVARCHAR(50) NOT NULL,
[Description] NVARCHAR(50) NOT NULL,
[CreatedOn] datetime,
[CreatedBy] NVARCHAR(255),
[ModifiedOn] datetime,
[ModifiedBy] NVARCHAR(255)
);
GO
-- Insert rows into table 'Product' in schema '[dbo]'
INSERT INTO [dbo].[Product]
VALUES
(
NEWID(), 'EI500CMZ', 'EI500CMZ','7-Day test product',GETDATE(),'Tom',GETDATE(),'Tom'
),
(
NEWID(), 'ST0SMX', 'ST0SMX','7-Day heavy duty product',GETDATE(),'Tom',GETDATE(),'Tom'
),
(
NEWID(), 'EH30MZ', 'EH30MZ','Electronic water test product',GETDATE(),'Tom',GETDATE(),'Tom'
)
CustomerProduct table
CREATE TABLE [dbo].[CustomerProduct]
(
[Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY, -- Primary Key column
[CustomerId] UNIQUEIDENTIFIER NOT NULL,
[ProductId] UNIQUEIDENTIFIER NOT NULL,
[Name] NVARCHAR(255) NOT NULL,
[CreatedOn] datetime,
FOREIGN KEY(CustomerId) REFERENCES Customer(Id),
FOREIGN KEY(ProductId) REFERENCES Product(Id)
);
GO
Below is my CSV file data:
CustomerNumber,ErpNumber,Name
Tom123,EI500CMZ,EI500CMZ2340
Harol234,ST0SMX,ST0SMX74770
Peter345,EH30MZ,EH30MZ00234
Now I want to insert data into my third table, CustomerProduct, but I don't understand how the "CustomerId", "ProductId", and "Name" values will get inserted.
In the CSV data above I have "CustomerNumber" and "ErpNumber", but during insertion the corresponding "CustomerId" and "ProductId" should go into the table.
I don't understand how to do this.
So far I have done the following in Azure Data Factory:
Created a blob storage account. Added a container in blob storage and uploaded my CSV file.
Created a linked service of type Azure blob storage called "CustomerProductInputService" that will talk to blob storage
Created a linked service of type Azure SQL database called "CustomerProductOutputService" that will communicate with the "CustomerProduct" table.
Created a dataset of type Azure Blob. This will receive the data from "CustomerProductInputService".
Created a dataset of type Azure SQL Database.
Now I am stuck at the copy activity. I don't understand how to create a pipeline for this scenario and insert the data into the CustomerProduct table.
As I explained, I have "CustomerNumber" and "ErpNumber" in the CSV file, but I want to insert "CustomerId" and "ProductId" into my "CustomerProduct" table.
Can anybody help me?
You can insert the CustomerProduct data from the CSV into the table using a Data Flow activity, with lookup transformations to get the CustomerId and ProductId from the Customer and Product tables respectively.
Source:
Add 3 source transformations in the data flow: one for the CSV source file, one for the Customer table, and one for the Product table.
a) Source1 (CSV): Create a CSV dataset for Source1 to read the input file data.
b) Source2 (CustomerTable): Connect Source2 to the Customer table to get all of its existing data.
• As we only need the Id and CustomerNumber columns from the Customer table, add a select transformation (Customer) after Source2 to keep only the required columns.
c) Source3 (ProductTable): Connect Source3 to the Product table to pull all the existing data from dbo.Product.
• Add a select transformation (Product) after Source3 to keep only the required columns Id and ErpNumber.
Add a lookup transformation to Source1 (CSV) with the primary stream as the CSV source, the lookup stream as Customer (the Source2 select transformation), and the lookup condition as CSV column "CustomerNumber" equals (==) Customer table column "CustomerNumber".
As the lookup behaves like a left join here, it includes all columns from Source1 and the lookup columns from Source2 in the select list (which includes duplicate columns).
a) So, use a select transformation (CustomerSelectList) to keep only the required columns in the output. Also rename the column "Id" pulled from the Customer table to CustomerId to match the sink table.
Add another lookup transformation after the select (CustomerSelectList) to get the data from the Product table.
a) Select the primary stream as CustomerSelectList (select transformation) and the lookup stream as Product (the select of Source3).
b) Use the lookup condition CSV source column "ErpNumber" equals (==) Product table column "ErpNumber".
Again, use a select transformation to ignore the other columns and keep only the required ones, renaming the column "Id" from the Product table to ProductId.
Add a derived column transformation after the select (CustomerProductSelectList) to add the new columns Id and CreatedOn.
a) Id: as this is a UNIQUEIDENTIFIER in the sink table, we can add an expression that generates the id using UUID() in the derived column.
b) CreatedOn: add an expression that supplies the current timestamp to the sink table.
Finally, add a sink transformation to insert the data into the CustomerProduct table.
Add this data flow to a pipeline and run the pipeline to insert the data.
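For reference, the lookup-and-insert logic described above is roughly equivalent to the following T-SQL sketch. It assumes the CSV has been staged into a hypothetical table dbo.CustomerProductStaging holding the columns CustomerNumber, ErpNumber, and Name; it only illustrates the mapping, it is not a replacement for the data flow.
INSERT INTO dbo.CustomerProduct (Id, CustomerId, ProductId, Name, CreatedOn)
SELECT NEWID(),      -- generated Id, matching the derived column step
       c.Id,         -- CustomerId looked up via CustomerNumber
       p.Id,         -- ProductId looked up via ErpNumber
       s.Name,
       GETDATE()     -- CreatedOn, matching the derived column step
FROM dbo.CustomerProductStaging s   -- hypothetical staging table for the CSV
LEFT JOIN dbo.Customer c ON c.CustomerNumber = s.CustomerNumber
LEFT JOIN dbo.Product p ON p.ErpNumber = s.ErpNumber;
-- Note: like the lookups above, these joins behave as left joins; CSV rows with
-- no matching Customer or Product surface as NULLs, which the NOT NULL
-- CustomerId/ProductId columns in the sink would then reject.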
First, you would need to identify a key connection between Customer and Product. Next, create a pipeline in Data Factory with two sources, "Product" and "Customer", apply the ADF Join and Alter Row transformations, and sink the result to dbo.CustomerProduct.
In Spark SQL I am trying to join multiple tables that are already in place.
However, I need to use a function that takes user input, fetches details from two other tables, and then uses that result in the join.
The query is something like the following:
select t1.col1,t1.col2,t2.col3,cast((t1.value * t3.value) from table1 t1
left join table2 t2 on t1.col = t2.col
left join fn_calculate (value1, value2) as t3 on t1.value = t3.value
Here fn_calculate is a function that takes value1 and value2 as parameters and returns a table of rows (in SQL Server it is a table-valued function).
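For context, in plain SQL the same shape could be written by inlining the body of fn_calculate as a derived table. This is only a sketch: the table and column names inside the derived table (calc_source, param1, param2) are hypothetical, since the body of fn_calculate is not shown here.
select t1.col1,
       t1.col2,
       t2.col3,
       t1.value * t3.value
from table1 t1
left join table2 t2
  on t1.col = t2.col
left join (
    -- hypothetical body of fn_calculate, with value1/value2 substituted in
    select value
    from calc_source
    where param1 = 'value1'
      and param2 = 'value2'
) t3
  on t1.value = t3.value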
I am trying to do this using a Hive generic UDF that takes the input parameters and returns the DataFrame, like below:
public String evaluate(DeferredObject[] arguments) throws HiveException {
    // expect exactly one non-null argument
    if (arguments.length != 1) {
        return null;
    }
    if (arguments[0].get() == null) {
        return null;
    }
    // run a query against table A using the supplied values
    DataFrame dataFrame = sqlContext
            .sql("select * from A where col1 = value and col2 = value2");
    javaSparkContext.close();
    // currently this just returns a string literal, not the DataFrame itself
    return "dataFrame";
}
Or do I need to use a Scala function like the one below?
static class Z extends scala.runtime.AbstractFunction0<DataFrame> {
    @Override
    public DataFrame apply() {
        // TODO Auto-generated method stub
        return sqlContext.sql("select * from A where col1 = value and col2 = value2");
    }
}
I have encountered a problem with nullable column comparison.
If some columns are Option[T], I wonder how Slick translates operations like === on these columns to SQL.
There are two possibilities: a null value (which is None in Scala) and a non-null value.
In the case of a null value, the SQL should use IS instead of =.
However, Slick doesn't handle it correctly in the following case.
Here is the code (with an H2 database):
object Test1 extends Controller {
  case class User(id: Option[Int], first: String, last: String)

  class Users(tag: Tag) extends Table[User](tag, "users") {
    def id = column[Int]("id", O.Nullable)
    def first = column[String]("first")
    def last = column[String]("last")
    def * = (id.?, first, last) <> (User.tupled, User.unapply)
  }

  val users = TableQuery[Users]

  def find_u(u: User) = DB.db.withSession { implicit session =>
    users.filter(x => x.id === u.id && x.first === u.first && x.last === u.last).firstOption
  }

  def t1 = Action {
    DB.db.withSession { implicit session =>
      DB.createIfNotExists(users)
      val u1 = User(None, "123", "abc")
      val u2 = User(Some(1232), "123", "abc")
      users += u1
      val r1 = find_u(u1)
      println(r1)
      val r2 = find_u(u2)
      println(r2)
    }
    Ok("good")
  }
}
I printed out the SQL. The following is the result for the first find_u:
[debug] s.s.j.J.statement - Preparing statement: select x2."id", x2."first", x2."last" from "users" x2 where (
(x2."id" = null) and (x2."first" = '123')) and (x2."last" = 'abc')
Notice that (x2."id" = null) is incorrect here. It should be (x2."id" is null).
Update:
Is it possible to compare only the non-null fields automatically and ignore the null columns?
E.g. in the case of User(None,"123","abc"), only do where (x2."first" = '123') and (x2."last" = 'abc').
Slick uses three-valued logic. This shows when nullable columns are involved. In that regard it does not adhere to Scala semantics, but uses SQL semantics. So (x2."id" = null) is indeed correct under these design decisions. To do a strict NULL check use x.id.isEmpty. For a strict comparison do
(if(u.id.isEmpty) x.id.isEmpty else (x.id === u.id))
Update:
To compare only when the user id is non-null use
(u.id.isEmpty || (x.id === u.id)) && ...
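In SQL terms, for User(None,"123","abc") the intent of the two patterns above is roughly the following (a sketch of the semantics only, not necessarily the literal statements Slick emits):
-- strict comparison: the row matches only when id is also NULL
select x2."id", x2."first", x2."last"
from "users" x2
where (x2."id" is null) and (x2."first" = '123') and (x2."last" = 'abc')

-- compare only when u.id is non-null: the id predicate drops out for None
select x2."id", x2."first", x2."last"
from "users" x2
where (x2."first" = '123') and (x2."last" = 'abc')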
I need help with inserting data from one table into another.
Table definitions are:
create table reg
(id int,
datum datetime,
status nvarchar(1)
)
create table gate
(sifra int,
mbr int,
datumin datetime,
datumout datetime
)
The data in table reg is:
id     datum                    status
46627  2014-05-22 12:55:02.000  I
46628  2014-05-22 18:55:02.000  O
49875  2014-08-11 18:55:02.000  O
49877  2014-09-11 18:55:02.000  I
49889  2014-09-03 18:50:02.000  O
I tried something like this, but it failed:
insert into gate values(
(select id from reg), (select id from reg),(select datum from reg where status = 'I'),
(select datum from reg where status = 'O'))
Any ideas how to manage this?
Your selects cannot return more than one record, or else you get this error. I don't know for sure what you are trying to do, but if I were to guess, I think you want something like this:
INSERT INTO gate (sifra, mbr, datumin, datumout)
SELECT id,
       id,
       CASE WHEN status = 'I' THEN datum ELSE NULL END,
       CASE WHEN status = 'O' THEN datum ELSE NULL END
FROM reg
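As a quick check (a sketch only, using the sample reg data above), the corrected statement produces one gate row per reg row, with the other datetime column left NULL:
SELECT sifra, mbr, datumin, datumout FROM gate;
-- sifra  mbr    datumin                  datumout
-- 46627  46627  2014-05-22 12:55:02.000  NULL
-- 46628  46628  NULL                     2014-05-22 18:55:02.000
-- 49875  49875  NULL                     2014-08-11 18:55:02.000
-- 49877  49877  2014-09-11 18:55:02.000  NULL
-- 49889  49889  NULL                     2014-09-03 18:50:02.000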