Can ClickHouse execute a whole subquery on the remote server with the remote function?

db.table_1 was created on server 10.1.4.160 with the following DDL:
CREATE TABLE db.table_1
(
`column1` String,
`column2` String,
`value1` Int32,
`value2` Int32
)
ENGINE = MergeTree
ORDER BY (column1, column2)
SETTINGS index_granularity = 8192
I run a subquery using the remote function from another server (e.g. 10.1.4.159):
select column1, sum(value1) as value1_sum from (
select * from remote('10.1.4.160:9001', `db.table_1`) where value1 > 0
) group by column1 order by value1_sum
Querying system.query_log on 10.1.4.160:
select query from system.query_log where type='QueryFinish' and has(databases, 'db') order by event_time desc
I find that the query executed on the remote server is:
SELECT `column1`, `value1` FROM `db`.`table_1` WHERE `value1` > 0
Finally, my question: can I make the whole subquery execute on the remote server, and if so, how?
The whole subquery is:
SELECT `column1`, sum(`value1`) AS `value1_sum` FROM `db`.`table_1` WHERE `value1` > 0 GROUP BY `column1` ORDER BY `value1_sum` ASC

Related

Getting error while running an sql script in ADW

I am getting the following error:
Insert values statement can contain only constant literal values or variable references.
These are the statements that produce the error:
INSERT INTO val.summary_numbers (metric_name, metric_val, dt_create) VALUES ('Total IP Enconters',
(SELECT
count(DISTINCT encounter_id)
FROM prod.encounter
WHERE encounter_type = 'Inpatient')
,
(SELECT min(mod_loadidentifier)
FROM ccsm.stg_demographics_baseline)
);
INSERT INTO val.summary_numbers (metric_name, metric_val, dt_create) VALUES ('Total 30d Readmits',
(SELECT
count(DISTINCT encounter_id)
FROM prod.encounter_attr
WHERE
attr_name = 'day_30_readmit' AND attr_value = 1)
,
(SELECT min(mod_loadidentifier)
FROM ccsm.stg_demographics_baseline));
Change your query like this:
insert into val.summary_numbers
select
'Total IP Enconters',
(select count(distinct encounter_id)
from prod.encounter
where encounter_type = 'Inpatient'),
(select min(mod_loadidentifier)
from ccsm.stg_demographics_baseline)
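The INSERT ... SELECT form above can be tried out locally with a small Python sketch. It uses an in-memory SQLite database as a stand-in, so it does not reproduce the ADW error itself (SQLite accepts subqueries in VALUES); it only shows the rewritten statement shape. The schema prefixes are dropped, and the column types and sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stand-in; table layouts and sample data are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounter (encounter_id INTEGER, encounter_type TEXT);
    CREATE TABLE stg_demographics_baseline (mod_loadidentifier TEXT);
    CREATE TABLE summary_numbers (metric_name TEXT, metric_val INTEGER, dt_create TEXT);
    INSERT INTO encounter VALUES
        (1, 'Inpatient'), (1, 'Inpatient'), (2, 'Inpatient'), (3, 'Outpatient');
    INSERT INTO stg_demographics_baseline VALUES ('2017-01-02'), ('2017-01-01');
""")

# INSERT ... SELECT: the scalar subqueries live in the SELECT list,
# not inside a VALUES clause.
conn.execute("""
    INSERT INTO summary_numbers (metric_name, metric_val, dt_create)
    SELECT 'Total IP Encounters',
           (SELECT count(DISTINCT encounter_id)
              FROM encounter
             WHERE encounter_type = 'Inpatient'),
           (SELECT min(mod_loadidentifier)
              FROM stg_demographics_baseline)
""")

summary = conn.execute("SELECT * FROM summary_numbers").fetchall()
print(summary)
```

Naming the target columns explicitly, as in this sketch, keeps the statement valid if the table later gains columns.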
When using the ADW service, I would recommend that you consider using the CTAS operation, possibly combined with a RENAME. The RENAME is a metadata operation, so it is fast, and the CTAS runs in parallel, whereas the INSERT INTO will be row by row.
You may still have a data-related issue, which is hard to diagnose without the CREATE TABLE statement.
Thanks

Cassandra CQL alternative to OR in WHERE clause

Here's the code I used to create the table:
CREATE TABLE test.packages (
packageuuid timeuuid,
ruserid text,
suserid text,
timestamp int,
PRIMARY KEY (ruserid, suserid, packageuuid, timestamp)
);
and then I create a materialized view:
CREATE MATERIALIZED VIEW test.packages_by_userid
AS SELECT * FROM test.packages
WHERE ruserid IS NOT NULL
AND suserid IS NOT NULL
AND TIMESTAMP IS NOT NULL
AND packageuuid IS NOT NULL
PRIMARY KEY (ruserid, suserid, timestamp, packageuuid)
WITH CLUSTERING ORDER BY (packageuuid DESC);
I want to be able to search for packages sent between two IDs
so I would need something like this:
SELECT * FROM test.packages_by_userid WHERE ((ruserid = '1' AND suserid = '2') OR (suserid = '1' AND ruserid = '2')) AND timestamp > 1496601553;
How would I accomplish something like this with CQL?
I've searched a bit but I can't figure it out.
I'm willing to change the structure of the table if it will make something like this possible.
If it's doable without a materialized view that would also be good.
Use an IN clause (note that this also matches rows where ruserid and suserid hold the same value):
SELECT * FROM test.packages_by_userid WHERE ruserid IN ( '1', '2') AND suserid IN ( '1','2') AND timestamp > 1496601553;
Note: keep the IN list small. A large IN list across partitions can cause GC pauses and heap pressure, which slows overall performance.
In practical terms this means you’re waiting on a single coordinator node to give you a response; it’s keeping all those queries and their responses on the heap, and if one of those queries fails, or the coordinator fails, you have to retry the whole thing.
If the IN list spans many partitions, run a separate query for each partition (ruserid) with executeAsync instead:
SELECT * FROM test.packages_by_userid WHERE ruserid = '1' AND suserid IN ( '1','2') AND timestamp > 1496601553;
SELECT * FROM test.packages_by_userid WHERE ruserid = '2' AND suserid IN ( '1','2') AND timestamp > 1496601553;
Learn More : https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/
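The one-query-per-partition pattern can be sketched in Python as follows. This is only an illustration of the fan-out and merge logic: query_partition is a hypothetical in-memory stand-in for a single-partition SELECT, and a thread pool plays the role of the driver's asynchronous execution. The sample rows are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: (ruserid, suserid, timestamp) rows.
ROWS = [
    ("1", "2", 1496601600),
    ("2", "1", 1496601700),
    ("1", "2", 1496601000),  # older than the cutoff, filtered out
    ("3", "1", 1496601800),  # partition that is never queried
]

def query_partition(ruserid, suserids, min_ts):
    """Stand-in for one single-partition SELECT (hypothetical helper)."""
    return [r for r in ROWS
            if r[0] == ruserid and r[1] in suserids and r[2] > min_ts]

def fetch_for_users(ruserids, suserids, min_ts):
    """Issue one query per partition key concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=len(ruserids)) as pool:
        futures = [pool.submit(query_partition, ru, suserids, min_ts)
                   for ru in ruserids]
        merged = []
        for future in futures:
            # A failure here affects only this partition's query,
            # so only that query needs to be retried.
            merged.extend(future.result())
    return sorted(merged, key=lambda r: r[2])

result = fetch_for_users(["1", "2"], {"1", "2"}, 1496601553)
print(result)
```

Because each partition has its own future, a failed query can be retried on its own instead of repeating the whole multi-partition read.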
Since you always search for both sender and receiver, I'd model this with the following table layout:
CREATE TABLE test.packages (
ruserid text,
suserid text,
timestamp int,
packageuuid timeuuid,
PRIMARY KEY ((ruserid, suserid), timestamp)
);
In this way, for each pair of sender/receiver you need to run two queries, one for each partition:
SELECT * FROM packages WHERE ruserid='1' AND suserid='2' AND timestamp > 1496601553;
SELECT * FROM packages WHERE ruserid='2' AND suserid='1' AND timestamp > 1496601553;
This is IMHO the best solution because, remember, in Cassandra you start from your queries and build your table models on that, never the reverse.

Postgres sorting on timestamp works on mac but not linux

Using Postgres 9.4
I have a posts table which relates to a users table. I'm querying for two users and 3 of their most recent posts.
SELECT
"users"."id" AS "id",
"posts"."id" AS "posts__id",
"posts"."created_at" AS "posts__created_at"
FROM (
SELECT * FROM accounts
WHERE TRUE
ORDER BY "id" ASC
LIMIT 2
) AS "users"
LEFT JOIN LATERAL (
SELECT * FROM posts
WHERE "users".id = posts.author_id
ORDER BY "created_at" DESC, "id" DESC
LIMIT 3
) AS "posts" ON "users".id = "posts".author_id
On mac, the order is as expected.
"2016-04-17 18:49:15.942"
"2016-04-15 03:29:31.212"
"2016-04-13 15:07:15.119"
I get descending order on created_at, which is a timestamptz. However, when run on my travis build, which is Ubuntu, the ordering is stable, but neither ascending nor descending....
"2016-04-15 03:29:31.212"
"2016-04-13 15:07:15.119"
"2016-04-17 18:49:15.942"
I made sure to create the databases with the same LC_COLLATE = en_US.UTF-8, with no luck. Why on earth isn't the ordering working on travis?
To solve this, add an ORDER BY clause to the outer query, below your existing statements, i.e.
SELECT
"users"."id" AS "id",
"posts"."id" AS "posts__id",
"posts"."created_at" AS "posts__created_at"
FROM (
SELECT * FROM accounts
WHERE TRUE
ORDER BY "id" ASC
LIMIT 2
) AS "users"
LEFT JOIN LATERAL (
SELECT * FROM posts
WHERE "users".id = posts.author_id
ORDER BY "created_at" DESC, "id" DESC
LIMIT 3
) AS "posts" ON "users".id = "posts".author_id
order by posts.created_at desc
The order of output in Postgres (and many other DBMSs) is not guaranteed without an ORDER BY clause.
While you do have ORDER BY clauses, they are inside subqueries; you need an ORDER BY on the outer query.
You may need to order the outer query too, because the row order produced by the join between the two inner queries is not guaranteed, even when each of them is ordered.
SELECT
"users"."id" AS "id",
"posts"."id" AS "posts__id",
"posts"."created_at" AS "posts__created_at"
FROM (
SELECT * FROM accounts
WHERE TRUE
ORDER BY "id" ASC
LIMIT 2
) AS "users"
LEFT JOIN LATERAL (
SELECT * FROM posts
WHERE "users".id = posts.author_id
ORDER BY "created_at" DESC, "id" DESC
LIMIT 3
) AS "posts" ON "users".id = "posts".author_id
order by "posts"."created_at" DESC
This is because the actual sort order depends on both the order of id in the first table and the order of created_at and id in the second one prior to joining them. This means the order of the first table can produce unexpected results when computing the selected values from the joined table.
To fix the sort order, you should sort the final result set by relevant columns as well.
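The effect of sorting the final result set can be tried with a small Python sketch. It uses an in-memory SQLite database rather than Postgres, and since SQLite has no LATERAL, the three-most-recent-posts filter is written as a correlated IN subquery instead. The table contents are made up, and sorting by user id before created_at is an assumption about the desired output.

```python
import sqlite3

# In-memory SQLite stand-in with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, created_at TEXT);
    INSERT INTO accounts VALUES (1), (2);
    INSERT INTO posts VALUES
        (10, 1, '2016-04-15 03:29:31.212'),
        (11, 1, '2016-04-13 15:07:15.119'),
        (12, 1, '2016-04-17 18:49:15.942'),
        (13, 1, '2016-04-01 00:00:00.000'),
        (20, 2, '2016-04-16 12:00:00.000');
""")

# The inner ORDER BY ... LIMIT only decides WHICH posts are kept per user.
# The outer ORDER BY is what fixes the order of the returned rows.
rows = conn.execute("""
    SELECT u.id, p.id, p.created_at
    FROM accounts AS u
    LEFT JOIN posts AS p
      ON p.author_id = u.id
     AND p.id IN (SELECT id FROM posts
                  WHERE author_id = u.id
                  ORDER BY created_at DESC, id DESC
                  LIMIT 3)
    ORDER BY u.id ASC, p.created_at DESC, p.id DESC
""").fetchall()

for row in rows:
    print(row)
```

With the outer ORDER BY in place, the row order no longer depends on how the engine happens to execute the join.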

using MobileService.GetSyncTable PullAsync Trouble

I am trying to get the Azure backend with offline sync to work for my app.
It looks like PullAsync is not populating my local table though.
My Table is set up like this:
private IMobileServiceSyncTable<Familie> FamilienTable = App.MobileService.GetSyncTable<Familie>(); // offline sync
Some code first. Initializing the table like this:
if (!App.MobileService.SyncContext.IsInitialized)
{
var store = new MobileServiceSQLiteStore("localGarden.db");
store.DefineTable<Familie>();
await App.MobileService.SyncContext.InitializeAsync(store);
}
I can see that the table is created and it is on my local system later (already tried setting up a new table, too).
And a little later in my code this is called:
try
{
await FamilienTable.PullAsync(null, FamilienTable.CreateQuery());
}
catch (Exception ex)
{
errorString = "Pull failed: " + ex.Message +
"\n\nIf you are still in an offline scenario, " +
"you can try your Pull again when connected with your Mobile Service.";
}
if (errorString != null)
{
MessageDialog d = new MessageDialog(errorString);
await d.ShowAsync();
}
I set this up originally based on https://azure.microsoft.com/en-us/documentation/articles/mobile-services-xamarin-ios-get-started-offline-data/ and adapted it for a Win10 app. This mainly uses Microsoft.Azure.Mobile.Client.SQLiteStore 2.0.1.
I know that I need to replace the null in the PullAsync with a string for incremental updates in the future.
Using fiddler I discovered that there are 2 calls to my node.js powered API on Azure during the pullAsync:
[removed].azurewebsites.net/tables/Familie?$skip=0&$top=50&__includeDeleted=true (which returns a JSON with the 6 rows I have in my table) and immediately after that [removed].azurewebsites.net/tables/Familie?$skip=6&$top=50&__includeDeleted=true, which returns an empty JSON.
My local table stays empty though.
Can someone tell me, if that behavior is by design, explain why I get 2 calls and give me an idea where to look for the reason why my local table is not populated?
Thank you very much!
Additional Info: Output of LoggingHandler:
CREATE TABLE IF NOT EXISTS [Familie] ([id] INTEGER PRIMARY KEY, [Name] TEXT, [Deleted] BOOLEAN, [Version] DATETIME, [createdAt] DATETIME, [updatedAt] DATETIME)
CREATE TABLE IF NOT EXISTS [__operations] ([id] TEXT PRIMARY KEY, [kind] INTEGER, [state] INTEGER, [tableName] TEXT, [tableKind] INTEGER, [itemId] TEXT, [item] TEXT, [createdAt] DATETIME, [sequence] INTEGER, [version] INTEGER)
CREATE TABLE IF NOT EXISTS [__errors] ([id] TEXT PRIMARY KEY, [httpStatus] INTEGER, [operationVersion] INTEGER, [operationKind] INTEGER, [tableName] TEXT, [tableKind] INTEGER, [item] TEXT, [rawResult] TEXT)
CREATE TABLE IF NOT EXISTS [__config] ([id] TEXT PRIMARY KEY, [value] TEXT)
BEGIN TRANSACTION
INSERT OR IGNORE INTO [__config] ([id]) VALUES (#p0)
The thread 0x2fcc has exited with code 0 (0x0).
UPDATE [__config] SET [value] = #p0 WHERE [id] = #p1
COMMIT TRANSACTION
SELECT * FROM [__operations] ORDER BY [sequence] DESC LIMIT 1
SELECT COUNT(1) AS [count] FROM [__operations]
Pulling changes from remote server
'Gemüsebeetplaner.exe' (CoreCLR: CoreCLR_UWP_Domain): Loaded 'C:\Users\Jens\documents\visual studio 14\Projects\Gemüsebeetplaner\Gemüsebeetplaner\bin\x86\Debug\AppX\System.Linq.Queryable.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
SELECT * FROM [__operations] WHERE ([tableName] = #p1) LIMIT 0
SELECT COUNT(1) AS [count] FROM [__operations] WHERE ([tableName] = #p1)
Pulling changes from remote server
SELECT * FROM [__operations] WHERE ([tableName] = #p1) LIMIT 0
SELECT COUNT(1) AS [count] FROM [__operations] WHERE ([tableName] = #p1)
SELECT * FROM [Familie] ORDER BY [Name]
Additional Info 2:
[Screenshots of the tables]
This is the log with parameters enabled:
Pulling changes from remote server
SELECT * FROM [__operations] WHERE ([tableName] = #p1) LIMIT 0
#p1:Familie
SELECT COUNT(1) AS [count] FROM [__operations] WHERE ([tableName] = #p1)
#p1:Familie
{
"count": 0
}
Pulling changes from remote server
SELECT * FROM [__operations] WHERE ([tableName] = #p1) LIMIT 0
#p1:Familie
SELECT COUNT(1) AS [count] FROM [__operations] WHERE ([tableName] = #p1)
#p1:Familie
{
"count": 0
}
SELECT * FROM [Familie] ORDER BY [Name]
The thread 0x24c8 has exited with code 0 (0x0).
The thread 0x1474 has exited with code 0 (0x0).
The thread 0x1a40 has exited with code 0 (0x0).
Class
public class Familie
{
[JsonProperty(PropertyName = "Id")]
public int Id { get; set; }
[JsonProperty(PropertyName = "Name")]
public string Name { get; set; }
}
There will always be at least 2 calls when doing a Pull operation, because the client has no way of knowing the server batch size. In your example, the server sent 6 records, so the client needs to do another query to find out if there are more records.
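The two requests can be modeled with a short Python sketch of skip/top paging. This is a simplified model and not the SDK's actual implementation; fetch_page and pull_all are hypothetical names, and the server is simulated by a plain list.

```python
def fetch_page(server_rows, skip, top):
    """Hypothetical stand-in for GET /tables/Familie?$skip=<skip>&$top=<top>."""
    return server_rows[skip:skip + top]

def pull_all(server_rows, top=50):
    """Request pages until one comes back empty; return rows and call count."""
    local, calls, skip = [], 0, 0
    while True:
        page = fetch_page(server_rows, skip, top)
        calls += 1
        if not page:
            # Only an empty page tells the client there is nothing left,
            # because it cannot know the server's own page size.
            break
        local.extend(page)
        skip += len(page)
    return local, calls

# Six server rows, as in the question: one data page plus one empty page.
pulled, calls = pull_all([{"id": i} for i in range(6)])
print(len(pulled), calls)
```

In this model a page shorter than $top is not treated as the last one, since the server may cap pages below the requested size; that is why the extra request with $skip=6 is made.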
What is the symptom of your local table not being populated? Do you get empty results when you do a query of FamilienTable?
Does the first call return results? If so, there must be something strange in how the local database is being populated. Try adding a logging SQLite store to your app, which will log all local database statements. Here's a sample: https://github.com/Azure-Samples/app-service-mobile-dotnet-todo-list-files/blob/master/src/client/MobileAppsFilesSample/Helpers/LoggingHandler.cs

Azure Stream Analytics - Compiling query failed

When I try to use the LAST function (https://msdn.microsoft.com/en-us/library/azure/mt421186.aspx), I get the following error:
Compiling query failed.
SELECT
deviceId
,System.TimeStamp as timestamp
,avg(events.externaltemp) as externaltemp
,LAST(System.Timestamp) OVER (PARTITION BY deviceId LIMIT DURATION(second, 1) when [externaltemp] is not null ) as Latest
INTO
[powerBI]
FROM
[EventHub] as events timestamp by [timestamp]
GROUP BY deviceId, TumblingWindow(second,1)
My last function looks very similar to the one in the msdn sample, so I'm not sure why there is a problem.
You are using [externaltemp] in your query, but it is not included in the GROUP BY; that is the reason. The LAST function also does not allow aggregates inside it, so the following wouldn't work either:
LAST(System.Timestamp) OVER (PARTITION BY deviceId LIMIT DURATION(second, 1) when avg([externaltemp]) is not null ) as Latest
It can be achieved by splitting the query into two steps, like this:
with DeviceAggregates
as
(
SELECT
System.TimeStamp as [Timestamp],
deviceId,
avg(events.externaltemp) as [externaltemp]
FROM
[EventHub] as events timestamp by [timestamp]
GROUP BY
deviceId,
TumblingWindow(second,1)
),
DeviceAggregatesWithLast as
(
select
*,
last([Timestamp]) over (partition by deviceId limit duration(second,1) when [externaltemp] is not null) [LastTimeThereWasANonNullTemperature]
from
DeviceAggregates
)
select *
INTO
[powerBI]
from
DeviceAggregatesWithLast

Resources