Where is a list of *all* Spark property keys? - apache-spark

Where is a list of all (valid, built-in) Spark properties?
The list of Available Properties in the official Spark documentation does not include all (valid, built-in) properties for the current stable version of Spark (2.4.4 as of 2020-01-22). An example is spark.sql.shuffle.partitions, which defaults to 200. Unfortunately, properties like this one do not appear to be accessible via any of sparkConf.getAll(), sparkConf.toDebugString(), or sql("SET -v").
Rather, built-in defaults appear to be accessible only by explicit name (i.e. sparkConf.get("foo")). However, this does not help me since the exact property name must be already known, and I need to survey properties that I don't already know about for debugging/optimization/support purposes.

You can use:
sql("SET -v").show(500,false)
This gives you a nearly complete list that excludes the internal properties.
+-----------------------------------------------------+---------+
|key                                                  |value    |
+-----------------------------------------------------+---------+
|spark.sql.adaptive.enabled                           |false    |
|spark.sql.adaptive.shuffle.targetPostShuffleInputSize|67108864b|
|spark.sql.autoBroadcastJoinThreshold                 |10485760 |
|spark.sql.avro.compression.codec                     |snappy   |
|spark.sql.avro.deflate.level                         |-1       |
...

I don't think this is the complete answer, but it can help. It will show more properties than your alternatives; at the very least, it will show options modified by middleware such as Livy.
Set this parameter:
spark.logConf=true
Now all of your session configuration will be written to the YARN log at INFO level. Run yarn logs -applicationId <your app id> and search for spark.app.name= to find your session properties.
One drawback is that you will see the property values only after the job has been executed.

Related

How to build an app like google pdf viewer? [duplicate]

So the idea is to make an encryption software which will work only on .txt files and apply some encryption functions on it and generate a new file. To avoid the hassle of user having to drag-and-drop the file, I have decided to make an option similar to my anti-virus here.
I want to learn how to make these for various OS, irrespective of the architecture :)
What are these menus called? I mean the proper name, so that next time I can refer to them in a more articulate way.
How to make these?
My initial understanding:
What I think it will do is: pass the file as an argument to the main() method and then leave the rest of the processing to me :)
Probably not exactly the answer you were hoping for, but it seems that this is a rather complicated matter. Anyway, I'll share what I know about it and it will hopefully prove enough to (at least) get you started.
Unfortunately, the easiest way to create a context menu using Java is editing the Registry. I'll try to summarize the milestones of the overall requirements and steps to achieve our objective.
<UPDATE>
See at the end of the post for links to sample code and a working demo.
</UPDATE>
What needs to be done
We need to edit the Registry adding an additional entry (for our java-app) in the context menus of the file-types we are interested in (e.g. .txt, .doc, .docx).
We need to determine which entries in Registry to edit, because our targeted file-extensions might be associated with another 'Class' (I couldn't test it on XP, but on Windows 7/8 this seems to be the case). E.g. instead of editing ...\Classes\.txt we might need to edit ...\Classes\txtfile, which the .txt Class is associated with.
We need to specify the path to the installed jre (unless we can be sure that the directory containing javaw.exe is in the PATH variable).
We need to insert the proper keys, values and data under the proper Registry nodes.
We need a java-app packaged as a .JAR file, with a main method expecting a String array containing one value that corresponds to the path of the file we need to process (well, that's the easy part - just stating the obvious).
All this is easier said than done (or is it the other way around ?), so let's see what it takes to get each one done.
First of all, there are some assumptions we'll be making for the rest of this post (for the sake of simplicity/clarity/brevity and the like).
Assumptions
We assume that the target file-category is .TXT files - the same steps could be applied for every file-category.
If we want the changes (i.e. context-menus) to affect all users, we need to edit Registry keys under HKCR\ (e.g. HKCR\txtfile), which requires administrative privileges.
For the sake of simplicity, we assume that only the current user's settings need to be changed, thus we will have to edit keys under HKCU\Software\Classes (e.g. HKCU\Software\Classes\txtfile), which does not require administrative privileges.
If one chooses to go for system-wide changes, the following modifications are necessary:
In all REG ADD/DELETE commands, replace HKCU\Software\Classes\... with HKCR\... (do not replace it in REG QUERY commands).
Have your application run with administrative privileges. Two options here (that I am aware of):
Elevate your running instance's privileges (can be more complicated with the latest Windows versions, due to UAC). There are plenty of resources online and here on SO; this one seems promising (but I haven't tested it myself).
Ask the user to explicitly run your app "As administrator" (using right-click -> "Run as administrator" etc).
We assume that only simple context-menu entries are needed (as opposed to a context-submenu with more entries).
After some (rather shallow) research, I have come to believe that adding a submenu in older versions of Windows (XP, Vista) would require more complex stuff (ContextMenuHandlers etc). Adding a submenu in Windows 7 or newer is considerably easier. I described the process in the relevant part of this answer (working demo provided ;)).
That said, let's move on to...
Getting things done
You can edit the Registry by issuing commands of the form REG Operation [Parameter List], with operations including ADD, DELETE and QUERY (more on that later).
In order to execute the necessary commands, we can use a ProcessBuilder instance. E.g.
String[] cmd = {"REG", "QUERY", "HKCR\\.txt", "/ve"};
new ProcessBuilder(cmd).start();
// Executes: REG QUERY HKCR\.txt /ve
Of course, we will probably want to capture and further process the command's return value, which can be done via the respective Process' getInputStream() method. But that falls into scope "implementation details"...
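As a rough illustration of capturing a command's output through getInputStream(), here is a minimal sketch. The class name CommandRunner and the method runAndCapture are hypothetical names chosen for this example; treating a non-zero exit code as a failure and decoding the output with the platform default charset are assumptions that may need adjusting for REG on a given Windows setup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

// Hypothetical helper (not part of the original answer).
final class CommandRunner {

    /** Runs a command, merges stderr into stdout, and returns everything it printed. */
    static String runAndCapture(String... cmd) {
        try {
            Process process = new ProcessBuilder(cmd)
                    .redirectErrorStream(true) // one stream to read, so no pipe is left unread
                    .start();
            StringBuilder output = new StringBuilder();
            // Assumption: the platform default charset matches what the command prints.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                // Assumption: any non-zero exit code is treated as a failure here.
                throw new IllegalStateException(
                        "Command exited with code " + exitCode + ": " + output);
            }
            return output.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for command", e);
        }
    }
}
```

On Windows, a call such as CommandRunner.runAndCapture("REG", "QUERY", "HKCR\\.txt", "/ve") would then return the text that REG printed, ready for parsing.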
"Normally" we would have to edit the .txt file-class, unless it is associated with another file-class. We can test this, using the following command:
// This checks the "Default" value of key 'HKCR\.txt'
REG QUERY HKCR\.txt /ve
// Possible output:
(Default) REG_SZ txtfile
All we need to do is parse the above output and find out whether the default value is empty or contains a class name. In this example we can see that the associated class is txtfile, so we need to edit node HKCU\Software\Classes\txtfile.
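The parsing step could look roughly like the sketch below. RegQueryParser and parseDefaultClass are hypothetical names, and the code assumes the output has the shape shown above (a line containing REG_SZ followed by the data). It does not handle any localized placeholder text that REG may print for an unset default value, so that case should be checked on the target system.

```java
import java.util.Optional;

// Hypothetical helper (not part of the original answer).
final class RegQueryParser {

    /**
     * Extracts the data of the default value from the output of "REG QUERY <key> /ve".
     * Returns an empty Optional when no REG_SZ line is found or its data is blank.
     */
    static Optional<String> parseDefaultClass(String regOutput) {
        for (String line : regOutput.split("\\R")) {
            int typeIndex = line.indexOf("REG_SZ");
            if (typeIndex < 0) {
                continue; // blank lines and the line naming the key
            }
            String data = line.substring(typeIndex + "REG_SZ".length()).trim();
            return data.isEmpty() ? Optional.empty() : Optional.of(data);
        }
        return Optional.empty();
    }
}
```

If the result is present (txtfile in the example), the node to edit is HKCU\Software\Classes\ followed by that class name; otherwise the extension's own class (.txt) would be the one to edit.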
Specifying the jre path (more precisely the path to javaw.exe) falls outside the scope of this answer, but there should be plenty of ways to do it (I don't know of one I would 100% trust though).
I'll just list a few off the top of my head:
Looking for the environment variable 'JAVA_HOME' (System.getenv("JAVA_HOME");) or the running JVM's 'java.home' system property (System.getProperty("java.home");).
Looking in the Registry for a value like HKLM\Software\JavaSoft\Java Runtime Environment\<CurrentVersion>\JavaHome.
Looking in predefined locations (e.g. C:\Program Files[ (x86)]\Java\).
Prompting the user to point it out in a JFileChooser (not very good for the non-experienced user).
Using a program like Launch4J to wrap your .JAR into a .EXE (which eliminates the need of determining the path to 'javaw.exe' yourself).
Latest versions of Java (1.7+ ?) put a copy of javaw.exe (and other utilities) on the path, so it might be worth checking that as well.
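Two of the options listed above (the running JVM's home directory and the JAVA_HOME environment variable) can be combined in a small lookup. This is only a sketch: JavawLocator, javawUnder and locate are hypothetical names, and the assumption that javaw.exe sits in the bin subdirectory of the Java home does not hold for every installation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper (not part of the original answer).
final class JavawLocator {

    /** Builds <javaHome>/bin/javaw.exe without checking that the file exists. */
    static Path javawUnder(String javaHome) {
        return Paths.get(javaHome, "bin", "javaw.exe");
    }

    /** Tries the running JVM's home first, then the JAVA_HOME environment variable. */
    static Optional<Path> locate() {
        String[] candidates = {
            System.getProperty("java.home"), // system property of the current JVM
            System.getenv("JAVA_HOME")       // environment variable, may be unset
        };
        for (String home : candidates) {
            if (home == null || home.isEmpty()) {
                continue;
            }
            Path javaw = javawUnder(home);
            if (Files.isRegularFile(javaw)) {
                return Optional.of(javaw);
            }
        }
        return Optional.empty(); // caller falls back to another strategy, e.g. asking the user
    }
}
```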
3. So, after collecting all necessary data, comes the main part: inserting the required values into the Registry. After completing this step, our HKCU\Software\Classes\txtfile-node should look like this:
HKCU
|_____Software
      |_____Classes
            |_____txtfile
                  |_____Shell
                        |_____MyCoolContextMenu: [Default] -> [Display name for my menu-entry]
                              |_____Command: [Default] -> [<MY_COMMAND>]*
*: in this context, a '%1' denotes the file that was right-clicked.
Based on how you addressed step (1.2), the command could look like this:
"C:\Path\To\javaw.exe" -jar "C:\Path\To\YourApp.jar" "%1"
Note that javaw.exe is usually in ...\jre\bin\ (but not always only there - recently I've been finding it in C:\Windows\System32\ as well).
Still being in step (1.3), the commands we need to execute, in order to achieve the above structure, look as follows:
REG ADD HKCU\Software\Classes\txtfile\Shell\MyCoolContextMenu /ve /t REG_SZ /d "Click for pure coolness" /f
REG ADD HKCU\Software\Classes\txtfile\Shell\MyCoolContextMenu\Command /ve /t REG_SZ /d "\"C:\Path\To\javaw.exe\" -jar \"C:\Path\To\Demo.jar\" \"%%1\"" /f
// Short explanation:
REG ADD <Path\To\Key> /ve /t REG_SZ /d "<MY_COMMAND>" /f
REG ADD           : Edit the Registry, adding a key/value.
<Path\To\Key>     : Edit this key (creates the key plus any missing parent key-nodes).
/ve               : Create a no-name (default) value
/t REG_SZ         : of type REG_SZ (string).
/d "<MY_COMMAND>" : Set the data for this value. Here: a command to be executed.
/f                : No confirmation.
Implementation Considerations:
It is probably a good idea to check if our target class (e.g. txtfile), does already have a context-menu entry named "MyCoolContextMenu", or else we might be overriding an existing entry (which will not make our user very happy).
Since the data part of the value (the part that comes after /d and before /f) needs to be enclosed in "", keep in mind that you can escape " inside the string as \".
You also need to escape the %1 so that it is stored in the Registry value as-is (escape it like: %%1).
It is a good idea to provide your user with an option to "un-register" your context-menu entry.
The un-registering can be achieved by means of the command:
REG DELETE HKCU\Software\Classes\txtfile\Shell\MyCoolContextMenu /f
Omitting the /f at the end of the commands may prompt the "user" (in this case your app) for confirmation, in which case you need to use the Process' getOutputStream() method to output "Yes" in order for the operation to be completed.
We can avoid that unnecessary interaction, using the force flag (/f).
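The registration and un-registration commands above can be assembled as argument lists for ProcessBuilder, roughly as sketched below. ContextMenuCommands and its methods are hypothetical names. The file placeholder is left as a parameter because whether it has to be written as %1 or %%1 depends on how the command reaches REG, and how ProcessBuilder quotes an argument that itself contains double quotes on Windows is something to verify on the target system rather than take from this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the original answer); it only builds argument lists.
final class ContextMenuCommands {

    // Per-user classes root, as assumed throughout the post.
    private static final String CLASSES_ROOT = "HKCU\\Software\\Classes\\";

    /** Data for the Command key: "<javaw>" -jar "<jar>" "<placeholder>". */
    static String launchCommand(String javawPath, String jarPath, String filePlaceholder) {
        return String.format("\"%s\" -jar \"%s\" \"%s\"", javawPath, jarPath, filePlaceholder);
    }

    /** REG ADD <key> /ve /t REG_SZ /d <data> /f */
    static List<String> addDefaultValue(String keyPath, String data) {
        return Arrays.asList("REG", "ADD", CLASSES_ROOT + keyPath,
                "/ve", "/t", "REG_SZ", "/d", data, "/f");
    }

    /** REG DELETE <key> /f */
    static List<String> deleteKey(String keyPath) {
        return Arrays.asList("REG", "DELETE", CLASSES_ROOT + keyPath, "/f");
    }
}
```

Each returned list can be handed to new ProcessBuilder(list); for example, addDefaultValue("txtfile\\Shell\\MyCoolContextMenu", "Click for pure coolness") corresponds to the first REG ADD command shown earlier.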
Almost there!
Finding ourselves at step (2), we should by now have the following:
A context-menu entry registered for our files in category txtfile (note that it is not restricted to .TXT files, but applies to all files perceived by the system as "txtfiles").
Upon clicking that entry, our java-app should be run and its main() method passed a String array containing the path to the right-clicked .TXT file.
From there, our app can take over and do its magic :)
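For completeness, the receiving side might look like the following sketch. ContextMenuEntryPoint and targetFile are hypothetical names, and the strict check for exactly one argument is an assumption based on the command registered above, which passes only the path of the right-clicked file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical entry point (not the demo project linked in the post).
final class ContextMenuEntryPoint {

    /** Validates the single argument supplied by the context-menu command. */
    static Path targetFile(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException(
                    "Expected exactly one file path, got " + args.length + " arguments");
        }
        Path file = Paths.get(args[0]);
        if (!Files.isRegularFile(file)) {
            throw new IllegalArgumentException("Not a regular file: " + file);
        }
        return file;
    }

    public static void main(String[] args) {
        Path file = targetFile(args);
        System.out.println("Processing " + file.toAbsolutePath());
        // Application-specific work (e.g. the encryption step from the question) starts here.
    }
}
```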
That's (almost) all, folks !
Sorry for the long post. I hope it turns out to be of use to someone.
I'll try to add some demo-code soon (no promises though ;)).
UPDATE
The demo is ready !
I created a tiny demo-project.
Here is the source code.
Here is a ready-to-go JARred App.

Using bind variable in DevCenter, getting error "Invalid amount of bind variables"

How to use bind variables in a select statement.
When I am using it directly it is retrieving the values as below.
select event_hour
from stage_insight.insight_hourly_ts
where tag_id='UP247490.UPSYSCPWLV001A'
LIMIT 1;
How to use it dynamically?
select event_hour
from stage_insight.insight_hourly_ts
where tag_id = ? ;
For the second one, an error is displayed saying there is a wrong amount of bind variables.
I am working with DataStax DevCenter. So, here I am trying to fetch the values directly from CassandraDB.
ResponseError: Invalid amount of bind variables
    at FrameReader.readError (D:\EACApp\eac-app-management\node_modules\cassandra-driver\lib\readers.js:326:15)
    at Parser.parseBody (D:\EACApp\eac-app-management\node_modules\cassandra-driver\lib\streams.js:194:66)
    at Parser._transform (D:\EACApp\eac-app-management\node_modules\cassandra-driver\lib\streams.js:137:10)
    at Parser.Transform._read (_stream_transform.js:205:10)
    at Parser.Transform._write (_stream_transform.js:193:12)
    at writeOrBuffer (_stream_writable.js:352:12)
    at Parser.Writable.write (_stream_writable.js:303:10)
    at Protocol.ondata (_stream_readable.js:719:22)
    at Protocol.emit (events.js:315:20)
cqlsh> show version
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
It isn't possible to use bind variables in DevCenter since they are only available when using prepared statements programmatically.
If you are using bind variables in your Node.js app, my best guess is that you are passing the query parameters incorrectly, although it's hard to say since you haven't provided enough information about your issue. In fact, the information you provided in your original question does not match what you've stated in the comments section.
Since you're new to Stack Overflow, a friendly suggestion: learn how to ask good questions. The general guidance is that you (a) provide a good summary of the problem that includes software/component versions, the full error message + full stack trace; (b) describe what you've tried to fix the problem, details of investigation you've done; and (c) minimal sample code that replicates the problem.
In your case, you need to provide:
the CQL table schema
the driver you're using + version
minimal sample code
If you don't provide sufficient information in your questions, you are less likely to get help from forums, or you may not get the right answers. Cheers!

display only specific resources by type with kusto in Resource Graph Explorer

I have an issue with showing specific resources with an Azure Kusto query.
What I want is to write a Kusto query that shows only database resources and server resources in Azure.
I have written the following query for databases:
resources
| where type in ("microsoft.sql/servers/databases","microsoft.dbforpostgresql/servers","microsoft.azuredata/postgresinstances","microsoft.dbformariadb/servers","microsoft.dbformysql/flexibleservers","microsoft.dbformysql/servers","microsoft.dbforpostgresql/flexibleservers","microsoft.dbforpostgresql/servergroups","microsoft.kusto/clusters/databases","microsoft.sql/managedinstances/databases","microsoft.synapse/workspaces/sqldatabases","ravenhq.db/databases","microsoft.documentdb/databaseaccounts")
| summarize Amount=count() by type
But when I execute the query, it shows me two databases even though I have only created one. The extra one is a "master" database, which should not be included because there is only one resource in the resource group.
I have also tried the following query:
resources
| where type contains "database" | distinct type
| summarize Amount=count() by type
But then the issue is that it doesn't include the databases that don't have the word "database" in the type name, for example "microsoft.azuredata/postgresinstances".
So the question is: how do I write a query that shows ALL the databases on my dashboard?
The second part of the question, which is similar to the first, is how to show all the servers.
I have tried the following queries:
resources
| where split(type,"/")[array_length(split(type,"/"))] contains "servers"
It gave me no results even though I had a server.
Then I tried:
resources
| where type contains "/server" | distinct type
| summarize Amount=count() by type
That didn't work because it also returned all the database resources containing the word "server".
I have tried to look through Microsoft's documentation, but cannot figure out what to do.
If you don't want the master databases (which are the databases that store system-level data in SQL databases), you can simply filter them out:
resources
| where type in ("microsoft.sql/servers/databases","microsoft.dbforpostgresql/servers","microsoft.azuredata/postgresinstances","microsoft.dbformariadb/servers","microsoft.dbformysql/flexibleservers","microsoft.dbformysql/servers","microsoft.dbforpostgresql/flexibleservers","microsoft.dbforpostgresql/servergroups","microsoft.kusto/clusters/databases","microsoft.sql/managedinstances/databases","microsoft.synapse/workspaces/sqldatabases","ravenhq.db/databases","microsoft.documentdb/databaseaccounts")
| where type != "microsoft.sql/servers/databases" or name != "master"
| summarize Amount=count() by type
Regarding the 2nd question, this should work since the has operator will only match whole tokens (and a slash separates tokens):
resources | where type has "servers"

How to use a scalar stored in 'let' in a where clause with '!contains' in Kusto Query Language

I have a problem that bothers me, even though I think the solution must be super simple.
I have to build a query with Kusto Query Language for my Azure Analytics log analyzer metric.
I want to make this script work for the latest app version and for the second-latest app version.
This is the query code to get the latest app version as a scalar.
customEvents
| where client_OS contains "Android"
| summarize max(application_Version)
Now my understanding would be that I could store this in a let and use it later on to get the second-latest app version, like this:
let latestVersion = customEvents
| where client_OS contains "Android"
| summarize max(application_Version);
customEvents
| where client_OS contains "Android" and application_Version !contains latestVersion
|summarize max(application_Version)
But unfortunately the compiler won't let me use a scalar with !contains. I have to use a string.
Is there any way for me to make a string out of this, so I can use it?
Or do you have any other good way to retrieve the second-highest value from the application_Version column?
I created this according to how I would do it in SQL, but it seems that Kusto is a bit different.
I hope you can help me fix this, enlighten me, and enhance my Kusto skills.
Best regards,
Maverick
latestVersion is not a scalar. To make it a scalar, you have to surround the query with toscalar(...).
In any case, if you want to find the top 2 items, there's a much more efficient way to do it:
customEvents
| where client_OS contains "Android"
| top 2 by application_Version desc

Spark: how to get all configuration parameters

I'm trying to find out what configuration parameters my Spark app is executing with. Is there a way to get all parameters, including the default ones?
E.g. if you execute "set;" on a Hive console, it'll list the full Hive configuration. I'm looking for an analogous action/command for Spark.
UPDATE:
I've tried the solution proposed by karthik manchala. I'm getting these results. As far as I know, these are not all parameters. E.g. this one spark.shuffle.memoryFraction (and a lot more) is missing.
scala> println(sc.getConf.getAll.deep.mkString("\n"));
(spark.eventLog.enabled,true)
(spark.dynamicAllocation.minExecutors,1)
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS,...)
(spark.repl.class.uri,http://...:54157)
(spark.tachyonStore.folderName,spark-46d43c17-b0b3-4b61-a017-a186075849ca)
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES,http://...)
(spark.driver.host,...l)
(spark.yarn.jar,local:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/spark/lib/spark-assembly.jar)
(spark.yarn.historyServer.address,http://...:18088)
(spark.dynamicAllocation.executorIdleTimeout,60)
(spark.serializer,org.apache.spark.serializer.KryoSerializer)
(spark.authenticate,false)
(spark.fileserver.uri,http://...:33681)
(spark.app.name,Spark shell)
(spark.dynamicAllocation.maxExecutors,30)
(spark.dynamicAllocation.initialExecutors,3)
(spark.ui.filters,org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter)
(spark.driver.port,46781)
(spark.shuffle.service.enabled,true)
(spark.master,yarn-client)
(spark.eventLog.dir,hdfs://.../user/spark/applicationHistory)
(spark.app.id,application_1449242356422_80431)
(spark.driver.appUIAddress,http://...:4040)
(spark.driver.extraLibraryPath,/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native)
(spark.dynamicAllocation.schedulerBacklogTimeout,1)
(spark.shuffle.service.port,7337)
(spark.executor.id,<driver>)
(spark.jars,)
(spark.dynamicAllocation.enabled,true)
(spark.executor.extraLibraryPath,/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native)
(spark.yarn.am.extraLibraryPath,/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native)
You can do the following:
sparkContext.getConf().getAll();
