Passing URL parameters to Apache Zeppelin paragraph - apache-spark

I need to pass request parameters to a specified Zeppelin paragraph and have them available to the Spark context. To be honest, this is proving a real nightmare. I can write some JS in the %angular interpreter to retrieve the query parameters, but as z.angularBind("myparam", "value") currently only works in the Spark interpreter (Scala), I can't use this.
My next thought was to retrieve the Paragraph and/or Notebook object; I'm thinking it must hold a reference somewhere to the URL that invoked it. However, all you can easily get from the InterpreterContext is the paragraphId/noteId.
Can anyone point me in the right direction?

You can pass parameters through dynamic forms. Create the parameters as dynamic forms in your notebook. To pass a value for the dynamic form, use the following request body:
{
    "params": {
        "formLabel1": "value1",
        "formLabel2": "value2"
    }
}
Doc: https://zeppelin.apache.org/docs/0.7.2/rest-api/rest-notebook.html#run-a-paragraph-synchronously
Note that you can pass params only when you want to run a single paragraph.
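For illustration, a minimal sketch of calling that endpoint from Python (the host, note ID, and paragraph ID below are placeholders, not values from the question):

import requests

# Placeholder host and IDs -- substitute your own.
ZEPPELIN = "http://localhost:8080"
note_id = "2A94M5J1Z"
paragraph_id = "20150210-015259_1403135953"

# Run the paragraph synchronously, passing values for its dynamic forms.
resp = requests.post(
    "{}/api/notebook/run/{}/{}".format(ZEPPELIN, note_id, paragraph_id),
    json={"params": {"formLabel1": "value1", "formLabel2": "value2"}},
)
print(resp.json())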

Related

How to retrieve nested output from XCom using taskflow syntax in Airflow

Well, I know this seems to be possible, I just don't know how. To begin with, I am using traditional operators (without the @task decorator), but I am interested in the XComArgs output format returned by these operators, which can be used in downstream tasks. Below is a sample example:
from airflow.operators.bash import BashOperator
from airflow.operators.dummy import DummyOperator

task_1 = DummyOperator(
    task_id='task_1'
)  # pushes {"data": {"foo": [{"cmd": "ls"}]}} to XCom

task_2 = BashOperator(
    task_id='task_2',
    # Does not give what I need and returns null:
    bash_command=task_1.output['return_value']['data']['foo'][0]['cmd'],
    # bash_command="{{ ti.xcom_pull(task_ids='task_1', key='return_value')['data']['foo'][0]['cmd'] }}"  # gives what I need
)
In this example, the pure Jinja templating (the commented-out line) gives me what I need, but the new syntax using XComArgs does not. I have tried setting render_template_as_native_obj=True in the DAG configuration, but it does not change anything. I want to use the .output format, which returns an XComArgs object; it does return the complete dict, but I have not been able to access nested keys like the above. I have also tried converting the string to JSON and various other combinations, but nothing seems to work.
Unfortunately, retrieving nested values from XComArgs is a limitation of the TaskFlow API.
The TaskFlow API uses __getitem__ to override the XCom key to use. In your example, the key ends up being "cmd" rather than the value of what cmd represents in that nested object. You'll have to use the original ti.xcom_pull() method until that limitation is addressed.
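A minimal sketch of that workaround, with the nested lookup done inside the Jinja template (task IDs and payload shape taken from the question):

from airflow.operators.bash import BashOperator

# The nested lookup happens at template-render time, so the
# XComArgs key-override behavior described above never kicks in.
task_2 = BashOperator(
    task_id='task_2',
    bash_command="{{ ti.xcom_pull(task_ids='task_1')['data']['foo'][0]['cmd'] }}",
)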

How to pass session parameters with python to snowflake?

The below code is my attempt at passing a session parameter to Snowflake through Python. This is part of an existing codebase which runs in AWS Glue, and the only part of the following that doesn't work is the session_parameters.
I'm trying to understand how to add session parameters from within this code. Any help in understanding what is going on here is appreciated.
sf_credentials = json.loads(CACHE["SNOWFLAKE_CREDENTIALS"])
CACHE["sf_options"] = {
    "sfURL": "{}.snowflakecomputing.com".format(sf_credentials["account"]),
    "sfUser": sf_credentials["user"],
    "sfPassword": sf_credentials["password"],
    "sfRole": sf_credentials["role"],
    "sfDatabase": sf_credentials["database"],
    "sfSchema": sf_credentials["schema"],
    "sfWarehouse": sf_credentials["warehouse"],
    "session_parameters": {
        "QUERY_TAG": "Something",
    },
}
In AWS CloudWatch, I can see that the parameter was sent with the other options. In Snowflake, the parameter was never set.
I can add more detail where necessary; I just wasn't sure which details are needed.
It turns out that there is no need to specify that a given parameter is a session parameter when you are using the Spark Connector. So instead:
sf_credentials = json.loads(CACHE["SNOWFLAKE_CREDENTIALS"])
CACHE["sf_options"] = {
    "sfURL": "{}.snowflakecomputing.com".format(sf_credentials["account"]),
    "sfUser": sf_credentials["user"],
    "sfPassword": sf_credentials["password"],
    "sfRole": sf_credentials["role"],
    "sfDatabase": sf_credentials["database"],
    "sfSchema": sf_credentials["schema"],
    "sfWarehouse": sf_credentials["warehouse"],
    "QUERY_TAG": "Something",
}
Works perfectly.
I found this in the Snowflake documentation for using the Spark Connector, in the section on setting session parameters.
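For illustration, a rough sketch of how these options are typically handed to the Spark connector (the SparkSession and the table name below are assumptions, not from the original post):

# Assumes an existing SparkSession named `spark`; the table name is a placeholder.
SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"

df = (
    spark.read.format(SNOWFLAKE_SOURCE)
    .options(**CACHE["sf_options"])  # QUERY_TAG rides along as an ordinary option
    .option("dbtable", "MY_TABLE")
    .load()
)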

Is there a way to pass a parameter to google bigquery to be used in their "IN" function

I'm currently writing an app that accesses Google BigQuery via their "@google-cloud/bigquery": "^2.0.6" library. In one of my queries I have a WHERE clause in which I need to pass a list of IDs. If I use UNNEST like in their example and pass an array of strings, it works fine.
https://cloud.google.com/bigquery/docs/parameterized-queries
However, I have found that UNNEST can be really slow, and I just want to use IN on its own and pass in a string list of IDs. No matter what format of string list I send, the query returns null results. I think this is because of the way they convert parameters in order to avoid SQL injection. I have to use a parameter because I myself want to avoid SQL injection attacks on my app. If I pass just one ID it works fine, but if I pass a list it blows up, so I figure it has something to do with formatting; I know my format is correct in terms of what IN would normally expect, i.e. IN ('', '').
Has anyone been able to just pass a param to IN and have it work, i.e. IN (@idParam)?
We declare params like this at the beginning of the script:
DECLARE var_country_ids ARRAY<INT64> DEFAULT [1,2,3];
and use like this:
WHERE IF(var_country_ids IS NOT NULL, p.country_id IN UNNEST(var_country_ids), TRUE) AND ...
As you can see, we allow for NULL as well as the array notation. We don't see issues with speed.
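For comparison, here is a minimal sketch of binding the list as an array parameter client-side (shown with the Python client for illustration; the project, dataset, and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table; the array parameter keeps the query injection-safe.
query = """
    SELECT id
    FROM `my_project.my_dataset.my_table`
    WHERE id IN UNNEST(@ids)
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ArrayQueryParameter("ids", "STRING", ["id1", "id2"]),
    ]
)
rows = client.query(query, job_config=job_config).result()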

Passing sets of properties and nodes as a POST statement with KOA-NEO4J or BOLT

I am building a REST API which connects to a Neo4j instance. I am using the koa-neo4j library as the basis (https://github.com/assister-ai/koa-neo4j-starter-kit). I am a beginner with all these technologies, but thanks to some help from this forum I have the basic functionality working. For example, the below code allows me to create a new node with the label "metric" and set the name and dateAdded properties.
URL:
/metric?metricName=Test&dateAdded=2/21/2017
index.js
app.defineAPI({
    method: 'POST',
    route: '/api/v1/imm/metric',
    cypherQueryFile: './src/api/v1/imm/metric/createMetric.cyp'
});
createMetric.cyp
CREATE (n:metric {
    name: $metricName,
    dateAdded: $dateAdded
})
RETURN ID(n) AS id
However, I am struggling to see how I can approach more complicated examples. How can I handle situations where I don't know beforehand how many properties will be added when creating a new node, or where I want to create multiple nodes in a single POST? Ideally I would like to be able to pass something like JSON as part of the POST which would contain all of the nodes, labels, and properties that I want to create. Is something like this possible? I tried using the below Cypher query and passing a JSON string in the POST body, but it didn't work.
UNWIND $props AS properties
CREATE (n:metric)
SET n = properties
RETURN n
Would I be better off switching to the Neo4j REST API instead of the Bolt protocol and the koa-neo4j framework? From my research I thought it was better to use Bolt, but I want to have a REST API as the middle layer between my front and back end, so I am willing to change over if this will be easier in the long term.
Thanks for the help!
Your Cypher syntax is bad in a couple of ways.
UNWIND only accepts a collection as its argument, not a string.
SET n = properties is only legal if properties is a map, not a string.
This query should work for creating a single node (assuming that $props is a map containing all the properties you want to store with the newly created node):
CREATE (n:metric $props)
RETURN n
If you want to create multiple nodes, then this query (essentially the same as yours) should work (but only if $prop_collection is a collection of maps):
UNWIND $prop_collection AS props
CREATE (n:metric)
SET n = props
RETURN n
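For illustration, here is roughly how $prop_collection needs to be shaped on the client side, sketched with the official Python driver (the connection details and property values are placeholders):

from neo4j import GraphDatabase

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# $prop_collection must arrive as a collection of maps
# (a list of dicts), not as a JSON string.
prop_collection = [
    {"name": "metric_one", "dateAdded": "2/21/2017"},
    {"name": "metric_two", "dateAdded": "2/22/2017"},
]

with driver.session() as session:
    result = session.run(
        "UNWIND $prop_collection AS props "
        "CREATE (n:metric) SET n = props RETURN n",
        prop_collection=prop_collection,
    )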
I too have faced difficulties when trying to pass complex types as arguments to Neo4j. This has to do with type conversions between JS and Cypher over Bolt, and there is not much one can do except file an issue in the official neo4j JavaScript driver repo. koa-neo4j uses the official driver under the hood.
One way to go about such scenarios in koa-neo4j is to use JavaScript to manipulate the arguments before they are sent to Cypher:
https://github.com/assister-ai/koa-neo4j#preprocess-lifecycle
It is also possible to further manipulate the results of a Cypher query using the postProcess lifecycle hook:
https://github.com/assister-ai/koa-neo4j#postprocess-lifecycle

node-postgres: how to prepare a statement without executing the query?

I want to create a "prepared statement" in Postgres using the node-postgres module. I want to create it without binding it to parameters, because the binding will take place inside a loop.
In the documentation I read:
query(object config, optional function callback) : Query
If _text_ and _name_ are provided within the config, the query will result in the creation of a prepared statement.
I tried
client.query({"name":"mystatement", "text":"select id from mytable where id=$1"});
but when I try passing only the text & name keys in the config object, I get an exception:
(translated) the message says it is binding 0 parameters but the prepared statement expects 1
Is there something I am missing? How do you create/prepare a statement without binding it to specific values, in order to avoid re-preparing the statement in every step of a loop?
I just found an answer to this issue from the author of node-postgres:
With node-postgres the first time you issue a named query it is parsed, bound, and executed all at once. Every subsequent query issued on the same connection with the same name will automatically skip the "parse" step and only rebind and execute the already planned query.
Currently node-postgres does not support a way to create a named, prepared query and not execute the query. This feature is supported within libpq and the client/server protocol (used by the pure javascript bindings), but I've not directly exposed it in the API. I thought it would add complexity to the API without any real benefit.
Since named statements are bound to the client in which they are created, if the client is disconnected and reconnected or a different client is returned from the client pool, the named statement will no longer work (it requires re-parsing).
You can use pg-prepared for that:
var prep = require('pg-prepared')

// First, prepare the statement without binding parameters
var item = prep('select id from mytable where id=${id}')

// Then execute the query, binding parameters inside the loop
// (for...of iterates the values 1, 2, 3; for...in would yield the indices)
for (const i of [1, 2, 3]) {
  client.query(item({id: i}), function (err, result) { /* ... */ })
}
Update: Reading your question again, here's what I believe you need to do: you need to pass a "values" array as well.
Just to clarify: where you would normally "prepare" your query, just build the config object you pass to query(), without the values array. Then, where you would normally "execute" the query, set the values array on the object and pass it to query(). The first time around, the driver will do the actual prepare for you; for the rest of the iterations it will simply bind and execute.
