Apache Kudu does not natively support range deletes or updates - apache-kudu

Clarification requested on Kudu.
In the Kudu guides the following is stated:
Row delete and update operations must also specify the full primary key of the row to be changed. Kudu does not natively support range deletes or updates.
The first part makes sense. However, using Impala via Hue I can easily issue commands like the following, which seem to contradict the highlighted part of the quote:
delete from metrics_001 where (value >= 400 and value <= 600);
update metrics_001 set value = value + 1000 where (value >= 600 and value <= 800);
which execute as expected.
Does the statement mean that it is Impala that makes this possible? I could not find anything about it in the documentation. I must be missing something elementary.

Impala first scans Kudu for the records that meet the filter criteria, and then it sends back to Kudu the individual delete/update operations for each key that was found.
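In other words, Kudu itself never executes a range delete or update; it only ever receives single-row operations keyed by the full primary key. Conceptually (a sketch only, assuming a primary key column named id, which the question doesn't show), the delete above behaves like:
-- 1. Impala scans Kudu with the range predicate pushed down:
SELECT id FROM metrics_001 WHERE value >= 400 AND value <= 600;
-- 2. Impala then sends Kudu one full-primary-key delete per matching row,
--    conceptually equivalent to:
DELETE FROM metrics_001 WHERE id = 101;
DELETE FROM metrics_001 WHERE id = 102;
-- ...one operation per key returned by the scan.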

Related

Copying data in and out of Snowflake via Azure Blob Storage

I'm trying to copy into blob storage and then copy out of blob storage. The copy into works:
copy into 'azure://my_blob_url.blob.core.windows.net/some_folder/MyTable'
from (select *
from MyTable
where condition = 'true')
credentials = (azure_sas_token = 'my_token');
But the copy out fails:
copy into MyTable
from 'azure://my_blob_url.blob.core.windows.net/some_folder/MyTable'
credentials = (azure_sas_token = 'my_token');
The error is:
SQL Compilation error: Function 'EXTRACT' not supported within a COPY.
Weirdly enough, it worked once and hasn't worked since. I'm at a loss; nothing I search for turns up details on this error.
I know there's an approach I could take using stages, but I don't want to for a number of reasons, and even when I try with stages the same error presents itself.
Edit:
The cluster key definition is:
cluster by (idLocal, year(_ts), month(_ts), substring(idGlobal, 0, 1));
where idLocal and idGlobal are VARCHARs and _ts is a TIMESTAMPTZ.
I think I've seen this before with a cluster key on the table (which I don't think is supported with COPY INTO), where the EXTRACT function shown in the error is part of the CLUSTER BY on the table.
This is a bit of a hunch, but assuming this isn't occurring for all your tables, hopefully it points you toward the table configuration as the thing to investigate, and perhaps that helps.
Alex, can you try a different function in the cluster key on your target table, like date_trunc('day', _ts)?
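Something along these lines (just a sketch, reusing the MyTable name from your COPY statements; as far as I know year() and month() are shorthand for EXTRACT, which would explain the function named in the error):
alter table MyTable cluster by (idLocal, date_trunc('day', _ts), substring(idGlobal, 0, 1));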
thanks
Chris

Increase column's size in Oracle DB tables with Knex.js

I have a Password column in a table, stored in Oracle DB 11g.
In order to store hashed passwords in it, I need to increment its size from 25 to 60 or 100 BYTE.
I do not want to do this manually; I hope I can find a script or something else using Knex.js (something like migrations or seeds).
Thank you.
The correct term for what you want to do is "increase", not "increment". It looks like Knex.js supports changing the default DDL action for columns (which is to create them) to alter, via the alter method: http://knexjs.org/#Schema-alter
In theory, it should work something like this:
knex.schema.alterTable('user', function (t) {
  t.string('password', 100).alter();
});
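For reference, the manual DDL this is meant to spare you from running by hand would be something along these lines (a sketch; "user" and "password" are just the placeholder names from the snippet above):
ALTER TABLE "user" MODIFY ("password" VARCHAR2(100 BYTE));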
I must admit, the following verbiage in the documentation for this method has me a little concerned:
Alter is not done incrementally over older column type so if you like to add notNull and keep the old default value, the alter statement must contain both .notNull().defaultTo(1).alter().
I'm not sure what that means at the end of the day. Just be sure to test this in development before trying it in production!

U-SQL nested query performance

I have a U-SQL query that runs fine on its own against 400M records in a managed table.
But during development, I don't want to run it against all records every time, so I pop a where clause in, run it for a tiny subset of the data, and it completes in around 2 minutes (at 5 AUs), writing the results out to a TSV in my data lake.
Happy with that.
However, I now want to use it as the source for a second query and further processing.
So I create a view with the original USQL (minus the where clause).
Then, to test, a new script:
'Select * from MyView WHERE <my original test filter>'.
Now I was expecting that to execute in around the same time as the original raw query. But instead I got to 4 minutes, only 10% through the plan, and cancelled - something is not right.
I'm no expert at reading job graphs, but...
The original script kicks off with 2x 'Extract Combine partition' vertices, both reading a couple of hundred MBs; my select on the saved view is reading over 100 GB!
So it is not taking the where clause into account at all at this stage.
Obviously this shows how little I yet understand about how DLA works behind the scenes!
Would someone please help me understand (a) what is going on and (b) a path forward to get the behavior I need?
Currently I'm having a play with stored procedures to store the first result in a table and then run the second query against that, but that just seems overkill compared with 'traditional' SQL Server?!
All pointers & hints appreciated!
Many Thanks
Original Base Query:
CREATE VIEW IF NOT EXISTS Play.[M3_CycleStartPoints]
AS
//#BASE =
SELECT ROW_NUMBER() OVER (PARTITION BY A.[CTNNumber] ORDER BY A.[SeqNo]) AS [CTNCycleNo],
       A.[CTNNumber], A.[SeqNo], A.[BizstepDescription], A.[ContainerStatus], A.[FillStatus]
FROM [Play].[RawData] AS A
LEFT OUTER JOIN
(
    SELECT [CTNNumber], [SeqNo]+1 AS [SeqNo], [FillStatus], [ContainerStatus], [BizstepDescription]
    FROM [Play].[RawData]
    WHERE [FillStatus] == "EMPTY" AND [AssetUsage] == "CYLINDER"
) AS B
ON A.[CTNNumber] == B.[CTNNumber] AND A.[SeqNo] == B.[SeqNo]
WHERE (
    (A.[FillStatus] == "FULL" AND
     A.[AssetUsage] == "CYLINDER" AND
     B.[CTNNumber] == A.[CTNNumber])
    OR (A.[SeqNo] == 1)
);
//AND A.[CTNNumber] == "BE52XH7";
//Only used to test when running script as stand-alone & output to tsv
Second Query:
SELECT *
FROM [Play].[M3_CycleStartPoints]
WHERE [CTNNumber] == "BE52XH7";
Ok, I think I've got this, or at least in part.
Table-valued functions (http://www.sqlservercentral.com/articles/U-SQL/146839/) allow passing an argument into what was the view and returning the result.
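As a rough, untested sketch of what that looks like for the query above (the Fn suffix, the parameter name, and the output path are my own inventions), the view body moves into a function that takes the CTN number as an argument, so the filter becomes part of the compiled query:
CREATE FUNCTION IF NOT EXISTS Play.[M3_CycleStartPointsFn](@CTNNumber string)
RETURNS @result
AS
BEGIN
    @result =
        SELECT ROW_NUMBER() OVER (PARTITION BY A.[CTNNumber] ORDER BY A.[SeqNo]) AS [CTNCycleNo],
               A.[CTNNumber], A.[SeqNo], A.[BizstepDescription], A.[ContainerStatus], A.[FillStatus]
        FROM [Play].[RawData] AS A
        LEFT OUTER JOIN
        (
            SELECT [CTNNumber], [SeqNo]+1 AS [SeqNo], [FillStatus], [ContainerStatus], [BizstepDescription]
            FROM [Play].[RawData]
            WHERE [FillStatus] == "EMPTY" AND [AssetUsage] == "CYLINDER"
        ) AS B
        ON A.[CTNNumber] == B.[CTNNumber] AND A.[SeqNo] == B.[SeqNo]
        WHERE A.[CTNNumber] == @CTNNumber
          AND ((A.[FillStatus] == "FULL" AND A.[AssetUsage] == "CYLINDER" AND B.[CTNNumber] == A.[CTNNumber])
               OR A.[SeqNo] == 1);
END;
// Then, in a later (or separate) script, only the requested CTN should be read:
@rows =
    SELECT *
    FROM Play.[M3_CycleStartPointsFn]("BE52XH7") AS S;
OUTPUT @rows TO "/Play/CycleStartPoints.tsv" USING Outputters.Tsv();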
Would be interested in finding some reading material around this subject still though.
Coming from a T-SQL world, it seems there are some fundamental differences I'm still tripping over.

How to get generated key in Oracle 12c with MyBatis

I used an identity column in Oracle 12c:
col1 NUMBER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1)
I use Spring + MyBatis. How do I get the generated value? useGeneratedKeys doesn't seem to work.
Thanks!
Anyway, the question has been asked and answered here.
You might not have noticed that the generated value is not returned by the insert statement; it is stored in the input parameter object, in the property named by keyProperty.

Parameterized job using Uno-choice plugin

I'm using the Uno-Choice plugin to select parameter values based on previous selections.
(This plugin helped me reduce the parameter count; I can reuse the same parameter for multiple platforms based on the platform selection.)
I use Groovy scripts to select the parameter values.
But it takes too much time to load parameters.
Is there any way to speed up this process?
I had faced similar issues, and I was also using Groovy scripts to call shell scripts. I did the following things to reduce the time:
When you click on 'Build with Parameters', all the tasks (scripts) run at once.
Use else conditions properly.
Also use a fallback script.
For example, you have parameters such as:
1) country
2) state
3) city
where each parameter depends on the previous values.
1) Try to only display content on the Jenkins front-end (e.g. via a cat command).
2) Call a script only if the previous parameter matches valid values.
3) Keep on-the-fly scripts to a minimum.
4) Optimize delays/sleeps according to your load time.
5) Remove any browser extensions in Chrome/Firefox.
6) Try using the same page in incognito mode.
7) If options are invalid, throw an 'invalid option' without going into any computation.
8) Uninstall plugins which are not required.
I will add more suggestions as I find them.
I would also ask that you update this if you find any way to optimize the time.

Resources