Does Trino/Presto support PL/SQL?

Does Trino/Presto support PL/SQL-style programming, i.e., can data returned by one SQL statement be passed on to another SQL statement for processing? Does it support constructs such as FOR loops, WHILE loops, etc.?

Related

How to check length of text field of Cassandra table

There is one field 'name' in our Cassandra database whose data type is 'text'.
How do I retrieve the rows where the length of the 'name' field is greater than some number, using a Cassandra query?
As was pointed out in the comments, it's easy to add a user-defined function and use it to retrieve the length of the text field, but the catch is that you can't use a user-defined function in the WHERE condition (see CASSANDRA-8488).
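For illustration, a minimal sketch of such a function (the keyspace, table, and column names are hypothetical, and user-defined functions must also be enabled in cassandra.yaml):

CREATE FUNCTION IF NOT EXISTS myks.text_len (input text)
    RETURNS NULL ON NULL INPUT
    RETURNS int
    LANGUAGE java
    AS 'return input.length();';

-- usable in the select list:
SELECT name, myks.text_len(name) FROM myks.users;

-- but rejected in the WHERE clause (CASSANDRA-8488):
-- SELECT name FROM myks.users WHERE myks.text_len(name) > 10;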
Even if it were possible, a query with only this condition would be a bad query for Cassandra, as it would need to go through all the data in the database and filter it. For such tasks, tools like Spark are usually used: you can read the data via the Spark Cassandra Connector and apply the necessary filtering conditions. This still involves reading all of the data from the database and then filtering it, so it will be considerably slower than normal CQL queries, but at least it is automatically parallelized.
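A minimal sketch of that Spark approach in Scala, using the Spark Cassandra Connector (the contact point, keyspace, table, and length threshold are all hypothetical):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, length}

val spark = SparkSession.builder()
  .appName("name-length-filter")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .getOrCreate()

// Full-table read through the connector, then a length filter;
// this scans everything, but the work is parallelized across executors.
val users = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "myks", "table" -> "users"))
  .load()

users.filter(length(col("name")) > 10).show()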

Is there any way to pass a U-SQL script a parameter from a C# program?

I'm using U-SQL with a table in Azure Data Lake Analytics. Is there any way to pass a list of partition keys generated in a C# program to the U-SQL script then have the script return all the elements in those partitions?
Do you want to run the C# code on your dev box and pass values to a U-SQL script, or run C# code inside your U-SQL script? Your description is not clear. Based on your question title, I will answer the first question.
Passing values as parameters from a C# program: The ADLA SDK (unlike Azure Data Factory) does not yet provide a parameter model for U-SQL scripts (please file a request at http://aka.ms/adlfeedback; I know it is on our backlog already, but external customer demand helps with prioritization).
However, it is fairly easy to add your parameter values by prepending DECLARE statements like the following at the beginning of the script and having the script refer to them as variables.
DECLARE @param = new SqlArray<int>( 1, 2, 3, 4 ); // 1, 2, 3, 4 were calculated in your C# code (I assume you have int partition keys).
Then you should be able to use the array in a predicate (e.g., @param.Contains(partition_col)). That will not (yet; we have a work item for it) trigger partition elimination, though.
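A minimal sketch of the prepend step from the C# side (the file name query.usql and the submission step are placeholders; the generated DECLARE follows the pattern above):

using System;
using System.IO;

class SubmitWithParams
{
    static void Main()
    {
        // Hypothetical: these keys come from your own C# logic.
        int[] keys = { 1, 2, 3, 4 };

        // Build the DECLARE prologue following the pattern above.
        string prologue =
            "DECLARE @param = new SqlArray<int>( " + string.Join(", ", keys) + " );\n";

        // Prepend it to the stored script; 'query.usql' is a placeholder path.
        string script = prologue + File.ReadAllText("query.usql");

        // Submit 'script' through the ADLA SDK, PowerShell, or REST as usual.
        Console.WriteLine(script);
    }
}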
If you want partition elimination, you will have to have a fixed set of parameter values and use them in an IN clause. E.g., if you want to check up to 3 months, you would write the query predicate as:
WHERE partition_col IN (@p1, @p2, @p3);
And you prepend definitions for @p1, @p2, and @p3, possibly duplicating values for the parameters you do not need.
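For example, with hypothetical month values, the prepended definitions could look like this (the third parameter duplicates the second because only two months are needed):

DECLARE @p1 int = 201601;
DECLARE @p2 int = 201602;
DECLARE @p3 int = 201602; // duplicate of @p2; this slot is not needed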

U-SQL - Execution-related questions

I wrote multiple U-SQL scripts whose output is stored in Azure Data Lake; based on this, I have a few questions.
How can we run dependent jobs in U-SQL?
How do we execute a statement based on some condition, like:
If RecordCount > 0 then
insert into table1
endif
How can we schedule U-SQL jobs?
Can we write multiple scripts and call them from a main script?
During script execution, the compiler prepares and compiles the code, which takes almost 30-40 seconds. How can we bundle the compiled code and create the ADF pipeline?
You can schedule and orchestrate U-SQL jobs with Azure Data Factory or by writing your own scheduler with one of the SDKs (PowerShell, C#, Java, Node.js, Python).
U-SQL supports two ways for conditional execution:
If your conditional can be evaluated at compile time, e.g., when you pass a parameter value or check for the existence of a file, you can use the IF statement.
If your conditional can only be determined during the execution of the script, then you can use the WHERE clause as wBob outlines in his comment.
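A minimal sketch of the first, compile-time option (the trigger file is hypothetical; the table mirrors the asker's table1):

IF FILE.EXISTS("/input/trigger.csv") THEN
    @rows =
        EXTRACT id int,
                name string
        FROM "/input/trigger.csv"
        USING Extractors.Csv();

    INSERT INTO dbo.table1
    SELECT * FROM @rows;
END;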
As wBob mentions, you can encapsulate most U-SQL statements in procedures and then call them from other scripts/procedures, or you can write your own inclusion/orchestration mechanism if you need script-file reuse.
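A minimal sketch of the procedure approach (the database name, paths, and schema are hypothetical):

CREATE PROCEDURE IF NOT EXISTS MyDb.dbo.CopyData()
AS BEGIN
    @rows =
        EXTRACT id int,
                name string
        FROM "/input/data.csv"
        USING Extractors.Csv();

    OUTPUT @rows
    TO "/output/data.csv"
    USING Outputters.Csv();
END;

// From another script:
MyDb.dbo.CopyData();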
There is currently no way to reuse and submit just the compiled code, since compilation depends on exact information such as which files are present and the statistics of the accessed data.

Performance of parameterized queries for different DBs

A lot of people know that it is important to use parameterized queries to prevent SQL injection attacks.
Parameterized queries are also much faster in SQLite and Oracle when doing online transaction processing, because the query optimizer doesn't have to reparse every SQL statement before executing it. I've seen SQLite become 3 times faster with parameterized queries, and Oracle can become 10 times faster in some extreme cases with a lot of concurrency.
How about other databases like MySQL, MS SQL Server, DB2, and PostgreSQL?
Is there a comparable difference in performance between parameterized queries and literal queries?
With respect to MySQL, MySQLPerformanceBlog reported some benchmarks of queries per second with non-prepared statements, prepared statements, and query-cached statements. Their conclusion is that prepared statements are actually 14.5% faster than non-prepared statements on MySQL. Follow the link for details.
Of course the ratio varies based on the query.
Some people suppose that there's some overhead because you're making an extra round trip from the client to the RDBMS: one to prepare the query, and a second to pass parameters and execute it.
But the reality is that these are assumptions made without actually measuring. I've never heard of prepared statements being slower in any brand of database.
I've nearly always seen an increase in speed, but generally only the first time. After the plans are loaded and cached, I would surmise that the various DB engines will behave the same for either type.
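As a concrete illustration of the prepare-once/execute-many pattern (C# with ADO.NET against SQL Server; the table, column names, and connection string are hypothetical):

using System;
using System.Data;
using System.Data.SqlClient;

class PreparedDemo
{
    static void Main()
    {
        using (var conn = new SqlConnection("...your connection string..."))
        {
            conn.Open();
            using (var cmd = new SqlCommand(
                "SELECT name FROM users WHERE id = @id", conn))
            {
                cmd.Parameters.Add("@id", SqlDbType.Int);
                cmd.Prepare(); // ask the server to parse/plan the statement once

                foreach (var id in new[] { 1, 2, 3 })
                {
                    cmd.Parameters["@id"].Value = id; // only the value changes
                    using (var reader = cmd.ExecuteReader())
                    {
                        while (reader.Read())
                            Console.WriteLine(reader.GetString(0));
                    }
                }
            }
        }
    }
}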

Multi-threading access to SubmitChanges() (LINQ to SQL)

I am using Visual Studio 2010 Beta 2.
In a Parallel.For loop I execute the same method with different parameter values. After execution, the processed data must be stored in the database.
But I got an exception that says I cannot work with the same data context from different threads.
So the question is: how do I work with a data context and SubmitChanges() from multiple threads?
I would recommend creating a thread-safe structure for storing your results. Once your Parallel.For has completed, you can read the results out of that structure and push them into your LINQ to SQL DataContext.
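A minimal sketch of that pattern (inputs, Process, MyEntity, and MyDataContext are hypothetical stand-ins for your own data, per-item work, and LINQ to SQL types):

using System.Collections.Concurrent;
using System.Threading.Tasks;

// Collect results in a thread-safe structure during the parallel loop...
var results = new ConcurrentBag<MyEntity>();

Parallel.For(0, inputs.Length, i =>
{
    // Each iteration only touches the thread-safe bag, never the DataContext.
    results.Add(Process(inputs[i]));
});

// ...then, back on a single thread, one DataContext performs all the inserts.
using (var db = new MyDataContext())
{
    db.MyEntities.InsertAllOnSubmit(results);
    db.SubmitChanges();
}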
