Why does Presto show the cardinality function as a "Python function"?

But this cardinality function doesn't exist in Python, though:
https://prestodb.io/docs/current/search.html?q=cardinality+function

Related

Understanding pandas_udf

The pandas_udf page in the PySpark documentation has the following paragraph:
The user-defined functions do not support conditional expressions or short circuiting in boolean expressions and it ends up with being executed all internally. If the functions can fail on special rows, the workaround is to incorporate the condition into the functions.
Can somebody explain what this means? It seems to be saying that the UDF does not support conditional statements (if/else blocks), and then suggests that the workaround is to include the if/else condition in the function body. This does not make sense to me. Please help.
I read something similar in Learning Spark - Lightning-Fast Data Analytics
In Chapter 5 (User-Defined Functions) it talks about evaluation order and null checking in Spark SQL.
If your UDF can fail when dealing with NULL values, it's best to move that logic inside the UDF itself, just as the quote you provided says.
Here's the reasoning behind it:
Spark SQL (this includes the DataFrame API and the Dataset API) does not guarantee the order of evaluation of subexpressions. For example, the following query does not guarantee that the s IS NOT NULL clause is executed prior to strlen(s):
spark.sql("SELECT s FROM test1 WHERE s IS NOT NULL AND strlen(s) > 1")
Therefore, to perform proper null checking, it is recommended that you make the UDF itself null-aware and do the null check inside the UDF.
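For illustration, here is a minimal Scala sketch of that advice; the strlen logic and the test1 table are just the placeholder names from the query above, and a running SparkSession is assumed:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("null-aware-udf").getOrCreate()

// The null check lives inside the UDF body, so the result is correct
// no matter which subexpression Spark SQL evaluates first.
spark.udf.register("strlen", (s: String) => if (s == null) 0 else s.length)

// The filter no longer relies on "s IS NOT NULL" running before strlen(s).
spark.sql("SELECT s FROM test1 WHERE strlen(s) > 1").show()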

What does it mean to "run" a function in Alloy?

My understanding is that functions in Alloy return a value. However, I noticed that you can run a function using a run command the same way you would run a predicate. What does running a function mean and how is this functionality used in Alloy?
In this respect, you can think of a function as being just like a predicate: it's a constraint, and when you run it, Alloy finds an instance that makes the constraint true. In this case, the instance will include the arguments to the function, the values of the signatures and fields, and the function's result.
Running a function is used, like running a predicate, to give you a better understanding by showing you sample executions. Think of it as running test cases, but without having to write the tests :-)

If Python dictionaries are ordered, why can't I index them?

The question says it all. Python dictionaries have been insertion-ordered since Python 3.6 (in the CPython implementation), and it became a language guarantee in 3.7. I can even d.popitem(). So why can't I index them by position, i.e. d[3]? Or can I?

How to use values (as Column) in function (from functions object) where Scala non-SQL types are expected?

I'd like to understand how I can dynamically add a number of days to a given timestamp. I tried something similar to the example shown below. The issue is that the second argument is expected to be of type Int, but in my case it is a Column. How do I unbox this / get the actual value? (The code examples below might not be 100% correct, as I'm writing this from the top of my head; I don't have the actual code with me at the moment.)
myDataset.withColumn("finalDate",date_add(col("date"),col("no_of_days")))
I tried casting:
myDataset.withColumn("finalDate",date_add(col("date"),col("no_of_days").cast(IntegerType)))
But this did not help either. So how can this be solved?
I did find a workaround by using selectExpr:
myDataset.selectExpr("date_add(date,no_of_days) as finalDate")
While this works, I still would like to understand how to get the same result with withColumn.
withColumn("finalDate", expr("date_add(date,no_of_days)"))
The above syntax should work.
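As a rough, self-contained sketch of how the expr approach above fits together (the sample data is made up to mirror the question's date and no_of_days columns):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{expr, to_date}

val spark = SparkSession.builder.appName("date-add-demo").getOrCreate()
import spark.implicits._

// Toy data mirroring the question: a date column plus a per-row day count.
val myDataset = Seq(("2020-01-01", 5), ("2020-06-15", 30))
  .toDF("date", "no_of_days")
  .withColumn("date", to_date($"date"))

// expr() hands the whole expression to Spark SQL's parser, so both
// arguments of date_add can be columns; no Scala Int is needed.
myDataset.withColumn("finalDate", expr("date_add(date, no_of_days)")).show()

If memory serves, Spark 3.0+ also adds a date_add(Column, Column) overload to the functions object, which would make the original withColumn call work directly.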
I think it's not possible directly, as you'd have to use two separate, similar-looking type systems: Scala's and Spark SQL's.
What you call a workaround using selectExpr is probably the only way to do it, as it keeps you within a single type system (Spark SQL's); since the parameters are all defined in Spark SQL's "realm", that's the only possible way.
myDataset.selectExpr("date_add(date,no_of_days) as finalDate")
BTW, you've just shown me another way in which SQL support differs from the Dataset's Query DSL: the source of the parameters to functions, which can come only from structured data sources, only from Scala, or from a mixture thereof (as in UDFs and UDAFs). Thanks!

How do I get the terminal size?

In bash there are two environment variables, COLUMNS and LINES, that store the number of columns and rows of the terminal. I have been trying to obtain that information in Haskell.
Since, unlike Ruby, Haskell's runtime doesn't compute that by default, I resorted to calling stty size. However, calling this command from Haskell with
readProcess "stty" ["size"] ""
results in the following run-time error:
readCreateProcess: stty "size" (exit 1): failed
What would be a good way to retrieve such information?
I would try the terminal-size package (module System.Console.Terminal.Size), which in turn is based on Get Terminal width Haskell.
