hudi delta streamer job via apache livy - apache-spark

How do I pass the --props file and the --source-class option in a POST request to the Livy API? This is the spark-submit command I use:
spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4 \
--master yarn \
--deploy-mode cluster \
--conf spark.sql.shuffle.partitions=100 \
--driver-class-path $HADOOP_CONF_DIR \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
--table-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field tst \
--target-base-path /user/hive/warehouse/stock_ticks_mor \
--target-table test \
--props /var/demo/config/ \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \

I have converted your configs into a JSON file that can be passed to the Livy API:
{
  "className": "org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer",
  "proxyUser": "root",
  "driverCores": 1,
  "executorCores": 2,
  "executorMemory": "1G",
  "numExecutors": 4,
  "queue": "default",
  "name": "stock_ticks_mor",
  "file": "hdfs://tmp/hudi-utilities-bundle_2.12-0.8.0.jar",
  "conf": {
    "spark.sql.shuffle.partitions": "100",
    "spark.jars.packages": "org.apache.hudi:hudi-spark-bundle_2.12:0.8.0,org.apache.spark:spark-avro_2.12:3.0.2",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.task.cpus": "1",
    "spark.executor.cores": "1"
  },
  "args": [
    "--source-class", "org.apache.hudi.utilities.sources.JsonKafkaSource"
  ]
}
You can submit this JSON to the Livy endpoint like this (the @ prefix tells curl to read the request body from the file):
curl -H "X-Requested-By: admin" -H "Content-Type: application/json" -X POST -d @config.json http://localhost:8999/batches
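As a rough sketch of how the remaining DeltaStreamer options could be passed, the snippet below builds the same kind of request body in Python. Application options such as --props and --source-class go into the "args" list as alternating flag and value strings. The helper name build_livy_batch is hypothetical, and the option values are copied from the spark-submit command in the question (including the --props path, which appears truncated there).

```python
import json


def build_livy_batch(jar, class_name, app_args, conf=None):
    """Return a request body for Livy's POST /batches endpoint."""
    body = {"file": jar, "className": class_name, "args": list(app_args)}
    if conf:
        body["conf"] = dict(conf)
    return body


# Application options are flattened into alternating flag/value strings.
delta_streamer_args = [
    "--table-type", "MERGE_ON_READ",
    "--source-class", "org.apache.hudi.utilities.sources.JsonKafkaSource",
    "--source-ordering-field", "tst",
    "--target-base-path", "/user/hive/warehouse/stock_ticks_mor",
    "--target-table", "test",
    "--props", "/var/demo/config/",  # path as written in the question
    "--schemaprovider-class", "org.apache.hudi.utilities.schema.FilebasedSchemaProvider",
]

payload = build_livy_batch(
    jar="hdfs://tmp/hudi-utilities-bundle_2.12-0.8.0.jar",
    class_name="org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer",
    app_args=delta_streamer_args,
    conf={"spark.sql.shuffle.partitions": "100"},
)

# Write this output to config.json, or send it directly with an HTTP client.
print(json.dumps(payload, indent=2))
```

Keeping the flags and values as separate list items means Livy hands them to the main class unchanged, so no shell quoting is involved.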


Passing in VM arguments to Spark application via Argo

I have a Spark application that is triggered from an Argo workflow YAML via a Dockerized image.
The Argo workflow YAML is:
kind: Workflow
metadata:
  generateName: test-argo-spark
  namespace: argo
spec:
  entrypoint: sparkapp
  templates:
  - name: sparkapp
    container:
      name: main
      args: [
        "/opt/spark/bin/spark-submit \
        --master k8s://https://kubernetes.default.svc \
        --deploy-mode cluster \
        --conf spark.kubernetes.container.image=/test-spark:latest \
        --conf spark.driver.extraJavaOptions='-Divy.cache.dir=/tmp -Divy.home=/tmp' \
        --conf \
        --conf spark.jars.ivy=/tmp/.ivy \
        --conf spark.kubernetes.driverEnv.HTTP2_DISABLE=true \
        --conf spark.kubernetes.namespace=argo \
        --conf spark.executor.instances=1 \
        --conf \
        --packages org.postgresql:postgresql:42.1.4 \
        --conf \
        --conf spark.kubernetes.authenticate.driver.serviceAccountName=default \
        --class SparkMain \
      image: /test-spark:latest
      imagePullPolicy: IfNotPresent
      resources: {}
This calls spark-submit, which invokes the jar file that contains the application code.
The Java code is:
import org.apache.spark.sql.SparkSession;

public class SparkMain {
    public static void main(String[] args) {
        SparkSession spark = SparkHelper.getSparkSession("SparkMain_Application");
        System.out.println("Spark Java appn " + spark.logName());
    }
}
The Spark session is being created as follows:
public static SparkSession getSparkSession(String appName) {
    // Read the mode from a JVM system property; default to cluster when it is not set.
    String sparkMode = System.getProperty("spark_mode");
    if (sparkMode == null) {
        sparkMode = "cluster";
    }
    if (sparkMode.equalsIgnoreCase("cluster")) {
        return createSparkSession(appName);
    } else if (sparkMode.equalsIgnoreCase("local")) {
        return createLocalSparkSession(appName);
    } else {
        throw new RuntimeException("Invalid spark_mode option " + sparkMode);
    }
}
As shown here, there is a system property that needs to be passed in:
String sparkMode = System.getProperty("spark_mode");
Can anyone tell me how to pass these VM arguments from the Argo YAML when calling spark-submit?
Also, how can multiple properties be passed to the program?
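One option that may work here: JVM system properties for the driver are normally supplied through the Spark setting spark.driver.extraJavaOptions, which the workflow above already uses for the Ivy directories. Several properties can share that one value as space-separated -Dkey=value flags. The sketch below only builds the argument; the helper names are hypothetical and spark_mode=cluster is an example value.

```python
import shlex


def java_options(props):
    """Render a dict as space-separated -Dkey=value JVM flags."""
    return " ".join(f"-D{key}={value}" for key, value in props.items())


def driver_conf_args(props):
    """Return the spark-submit arguments that set the driver's JVM options."""
    return ["--conf", f"spark.driver.extraJavaOptions={java_options(props)}"]


# Keep the existing Ivy settings and add the application's own property.
props = {"ivy.cache.dir": "/tmp", "ivy.home": "/tmp", "spark_mode": "cluster"}
args = driver_conf_args(props)

# shlex.join quotes the value so the spaces survive a shell command line.
print(shlex.join(args))
```

Because a configuration key holds a single value, the new property should be appended to the existing extraJavaOptions entry rather than given in a second --conf for the same key. If executor code also reads the property, spark.executor.extraJavaOptions can be set the same way.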

dataproc create cluster gcloud equivalent command in python

How do I replicate the following gcloud command in Python?
gcloud beta dataproc clusters create spark-nlp-cluster \
--region global \
--metadata 'PIP_PACKAGES=google-cloud-storage spark-nlp==2.5.3' \
--worker-machine-type n1-standard-1 \
--num-workers 2 \
--image-version 1.4-debian10 \
--initialization-actions gs://dataproc-initialization-actions/python/ \
--optional-components=JUPYTER,ANACONDA \
Here is what I have so far in Python:
cluster_data = {
    "project_id": project,
    "cluster_name": cluster_name,
    "config": {
        "gce_cluster_config": {"zone_uri": zone_uri},
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-1"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-1"},
    },
}
cluster = dataproc.create_cluster(
    request={"project_id": project, "region": region, "cluster": cluster_data}
)
I'm not sure how to convert these gcloud flags to Python:
--metadata 'PIP_PACKAGES=google-cloud-storage spark-nlp==2.5.3' \
--initialization-actions gs://dataproc-initialization-actions/python/ \
You can try this:
cluster_data = {
    "project_id": project,
    "cluster_name": cluster_name,
    "config": {
        "gce_cluster_config": {
            "zone_uri": zone_uri,
            # metadata is a string-to-string map; the value mirrors the gcloud --metadata flag
            "metadata": {"PIP_PACKAGES": "google-cloud-storage spark-nlp==2.5.3"},
        },
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-1"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-1"},
        # initialization_actions is a list with one entry per script
        "initialization_actions": [
            {"executable_file": "gs://dataproc-initialization-actions/python/"}
        ],
        "endpoint_config": {"enable_http_port_access": True},
    },
}
For more options, see the GCP cluster config reference.
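The remaining flags from the gcloud command can probably be mapped in a similar way. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the field names image_version and optional_components are taken from the Dataproc SoftwareConfig message, and passing the components as the strings "JUPYTER" and "ANACONDA" assumes the client accepts enum names (the Component enum from the client library could be used instead).

```python
def parse_metadata_flag(flag_value):
    """Turn a gcloud --metadata value ('K1=V1,K2=V2') into a dict."""
    metadata = {}
    for pair in flag_value.split(","):
        # Split on the first '=' only, so values such as 'pkg==1.0' stay intact.
        key, _, value = pair.partition("=")
        metadata[key.strip()] = value
    return metadata


def software_config(image_version, optional_components):
    """Build the software_config part of a cluster config dict."""
    return {
        "image_version": image_version,
        "optional_components": list(optional_components),
    }


metadata = parse_metadata_flag("PIP_PACKAGES=google-cloud-storage spark-nlp==2.5.3")
software = software_config("1.4-debian10", ["JUPYTER", "ANACONDA"])

# These fragments would be merged into cluster_data["config"] before the request:
config_fragment = {
    "gce_cluster_config": {"metadata": metadata},
    "software_config": software,
}
print(config_fragment)
```

Splitting on the first equals sign matters here because the pip requirement spark-nlp==2.5.3 contains further equals signs.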

Sort does not work on Text query with parse-server

The parse-server documentation is a bit outdated:
Try this query:
curl -X GET \
-H "X-Parse-Application-Id: ${APPLICATION_ID}" \
-H "X-Parse-REST-API-Key: ${REST_API_KEY}" \
-G \
--data-urlencode 'where={"name":{"$text":{"$search":{"$term":"Milk"}}}}' \
--data-urlencode 'order="$score"' \
--data-urlencode 'key="$score"' \
It returns this error:
{
  "code": 102,
  "error": "Invalid parameter for query: key"
}
Question: how do you sort by "score" when doing a full-text search with Parse if the documented query parameters do not work?
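One thing worth trying, sketched below: the error names the parameter key, and the REST query parameter for selecting fields is keys (plural). The sketch also passes $score without the extra quotation marks. Treat both points as assumptions to verify against your parse-server version; the helper name text_search_params is hypothetical.

```python
import json
from urllib.parse import urlencode


def text_search_params(term):
    """Build query parameters for a full-text search ordered by score."""
    where = {"name": {"$text": {"$search": {"$term": term}}}}
    return {
        "where": json.dumps(where),
        "order": "$score",  # sort by the text-search score
        "keys": "$score",   # 'keys' (plural) selects the returned fields
    }


params = text_search_params("Milk")

# URL-encoded form, suitable for appending to the class endpoint after '?'.
query_string = urlencode(params)
print(query_string)
```

With curl, the same parameters would go in the --data-urlencode options, for example 'order=$score' and 'keys=$score'.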

Livy always run on local mode

I am trying to run a PySpark (or Spark) job via the Livy server with "spark.master=yarn".
What I have done:
1) In spark-defaults.conf:
spark.master yarn
spark.submit.deployMode client
2) In livy.conf:
livy.spark.master = yarn
livy.spark.deployMode = client
3) I send request via CURL with "conf": {"spark.master": "yarn"}
curl -X POST -H "Content-Type: application/json" localhost:8998/batches --data '{"file": "hdfs:///user/grzegorz/", "name": "MY", "conf": {"spark.master": "yarn"} }'
{"id":3,"state":"running","appId":null,"appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":["stdout: ","\nstderr: "]}
This is what I always get in the logs; note that spark-submit is still started with spark.master=local:
18/01/02 14:45:07.880 qtp1758624236-28 INFO BatchSession$: Creating batch session 3: [owner: null, request: [proxyUser: None, file: hdfs:///user/grzegorz/, name: MY, conf: spark.master -> yarn]]
18/01/02 14:45:07.883 qtp1758624236-28 INFO SparkProcessBuilder: Running '/usr/local/share/spark/spark-2.0.2/bin/spark-submit' '--name' 'MY' '--conf' 'spark.master=local' 'hdfs:///user/grzegorz/'
I hope somebody has an idea how to get past this. Thank you in advance.
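When checking several attempts, it can help to pull the effective settings out of the SparkProcessBuilder log line instead of reading it by eye. The sketch below is a small diagnostic helper, assuming the log keeps the single-quoted argument format shown above; the function name submitted_conf is hypothetical.

```python
import re


def submitted_conf(log_line):
    """Collect the --conf key=value pairs from a SparkProcessBuilder log line."""
    # Every spark-submit argument appears in single quotes in the log line.
    tokens = re.findall(r"'([^']*)'", log_line)
    conf = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "--conf":
            key, _, val = value.partition("=")
            conf[key] = val
    return conf


line = (
    "18/01/02 14:45:07.883 qtp1758624236-28 INFO SparkProcessBuilder: Running "
    "'/usr/local/share/spark/spark-2.0.2/bin/spark-submit' '--name' 'MY' "
    "'--conf' 'spark.master=local' 'hdfs:///user/grzegorz/'"
)

print(submitted_conf(line))
```

For the line above, the helper reports spark.master as local, which confirms that the yarn value from the request did not reach spark-submit.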

How to enable gzip for yii2?

Do I need to add new rules to .htaccess, or add code to index.php in Yii2?
My site is on shared hosting.
I want to compress only .css and .js files. I don't want to compress all responses.
You can make it work by attaching event handlers to yii\web\Response in index.php:
$application = new yii\web\Application($config);
$application->on(yii\web\Application::EVENT_BEFORE_REQUEST, function (yii\base\Event $event) {
    $event->sender->response->on(yii\web\Response::EVENT_BEFORE_SEND, function ($e) { /* handler body missing from the original post */ });
    $event->sender->response->on(yii\web\Response::EVENT_AFTER_SEND, function ($e) { /* handler body missing from the original post */ });
});
I added the following rules to .htaccess:
<IfModule mod_filter.c>
    AddOutputFilterByType DEFLATE "application/atom+xml" \
                                  "application/javascript" \
                                  "application/json" \
                                  "application/ld+json" \
                                  "application/manifest+json" \
                                  "application/rdf+xml" \
                                  "application/rss+xml" \
                                  "application/schema+json" \
                                  "application/vnd.geo+json" \
                                  "application/" \
                                  "application/x-font-ttf" \
                                  "application/x-javascript" \
                                  "application/x-web-app-manifest+json" \
                                  "application/xhtml+xml" \
                                  "application/xml" \
                                  "font/eot" \
                                  "font/opentype" \
                                  "image/bmp" \
                                  "image/svg+xml" \
                                  "image/" \
                                  "image/x-icon" \
                                  "text/cache-manifest" \
                                  "text/css" \
                                  "text/html" \
                                  "text/javascript" \
                                  "text/plain" \
                                  "text/vcard" \
                                  "text/vnd.rim.location.xloc" \
                                  "text/vtt" \
                                  "text/x-component" \
                                  "text/x-cross-domain-policy"
</IfModule>
Configuration list:
