Configuring Hortonworks Data Platform Sandbox 2.6.5 from the command line - apache-spark

I am building a demo/training environment for one of our products, which works with Hive & Spark. I am using HDP 2.6.5, and if I configure the Hive settings I need (primarily these: ACID Settings) through the Ambari GUI, it works fine. But I want to automate this, and setting these in hive-site.xml is not working (I have found many copies of this file, so it could simply be that I am editing the wrong one?).
How can I make from the command line the same changes I make in Dashboard -> Hive -> Configs?
Where are these changes stored? I am sure I have missed something obvious in the docs, but I can't find it.
Thanks!

@Leigh K You should check out the Ambari REST API to make changes to Hive. Ambari stores configuration in its own database and regenerates hive-site.xml on each node when it pushes configs out, which is why hand-editing the file does not stick. I did not find a quick link to the official documentation, but this post goes into detail using Pig:
https://markobigdata.com/2018/07/22/adding-service-to-hdp-using-rest-api/2/
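
For example, Ambari ships a helper script that wraps this REST API. A minimal sketch, assuming an Ambari 2.6 install where the script path, the admin/admin credentials, and the cluster name Sandbox all match the sandbox defaults (verify each against your setup):

# set one of the Hive ACID properties through Ambari's config API
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
  set localhost Sandbox hive-site "hive.support.concurrency" "true"

# read the config back to confirm the change was recorded
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
  get localhost Sandbox hive-site

After setting the properties, restart the affected Hive services (through the UI or the REST API) so Ambari pushes the regenerated hive-site.xml out to the nodes.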

Related

Setting Up Databricks Connect

After running databricks-connect configure, when I run databricks-connect test I get "The system cannot find the path specified." and then nothing happens: no error, nothing. Please help me resolve this. Since there is no error message, I am also hard pressed to know what to google.
Update: I resolved this by matching the Java versions. The Databricks runtime on the cluster is 6.5, and on checking the documentation it said Java 1.8.0_252, so I installed a version close to that and it is working now (both JDK and JRE work).
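If anyone hits the same symptom, a quick way to check which JVM databricks-connect will pick up (a sketch; the JDK path is a Linux example, adjust for your OS):

# confirm the client JVM matches the cluster runtime's Java (1.8.x for DBR 6.5)
java -version
echo $JAVA_HOME
# if it differs, point JAVA_HOME at a JDK 8 install, e.g.:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# then rerun the connectivity test
databricks-connect test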
There is still a caveat, though. For tables that live in a data lake, I am still unable to make it work with
sparklyr::spark_read_parquet(sc = sc, path = "/.../parquet_table", header = TRUE, memory = FALSE)
It does work for tables in the "default" database in Databricks. Not sure if this is just my case, but I am tired of all the tweaking I have been doing for the past week lol. Please comment if anyone has been able to get this working!
One hint is that you have JDK 15; as noted in the update above, the client's Java version needs to match the cluster runtime's (Java 8 here).

How to implement spark.ui.filter

I have a spark cluster set up on 2 CentOS machines. I want to secure the web UI of my cluster (master node). I have made a BasicAuthenticationFilter servlet. I am unable to understand:
How should I use spark.ui.filters to secure my web UI?
Where should I place the servlet/jar file?
Kindly help.
I also needed to solve this security problem to prevent unauthorized access to the Spark standalone UI. I eventually fixed it after some searching on the web; the procedure is:
Code and compile a Java filter that implements standard HTTP basic authentication. I referred to this blog: http://lambda.fortytools.com/post/26977061125/servlet-filter-for-http-basic-auth
Package the filter class as a jar file and put it in $SPARK_HOME/jars/.
Add config lines to $SPARK_HOME/conf/spark-defaults.conf:
spark.ui.filters xxx.BasicAuthFilter # the full class name
spark.xxx.BasicAuthFilter.params user=foo,password=cool,realm=some
The username and password must be supplied to access the Spark UI; "realm" is insignificant, whatever you type.
Restart the master and all worker processes and test that it works; the commands are sketched below.
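Putting steps 1-4 together, the commands look roughly like this (a sketch: BasicAuthFilter.java, the xxx package, and a standalone cluster managed by the sbin scripts are assumptions carried over from the steps above):

# 1. compile the filter against the servlet API bundled with Spark
javac -d . -cp "$SPARK_HOME/jars/*" BasicAuthFilter.java

# 2. package the class and drop the jar where Spark's daemons can load it
jar cf basicauthfilter.jar xxx/BasicAuthFilter.class
cp basicauthfilter.jar "$SPARK_HOME/jars/"

# 3. append the two config lines shown above to $SPARK_HOME/conf/spark-defaults.conf

# 4. restart the standalone master and workers
"$SPARK_HOME/sbin/stop-all.sh"
"$SPARK_HOME/sbin/start-all.sh"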
Hi, place the jar file on every node in the folder /opt/spark/conf/. Then, in a terminal, do the following:
Navigate to the directory /usr/local/share/jupyter/kernels/pyspark/
Edit the file kernel.json
Add the following to PYSPARK_SUBMIT_ARGS: --jars /opt/spark/conf/filterauth.jar --conf spark.ui.filters=authenticate.MyFilter
Here, filterauth.jar is the jar file you created, and authenticate.MyFilter is <package name>.<class name>.
Hope this answers your query. :)
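For reference, the resulting kernel.json might look like this (a sketch: the display_name and argv entries are typical PySpark kernel boilerplate rather than your exact file, and filterauth.jar / authenticate.MyFilter are the example names from the steps above; note PYSPARK_SUBMIT_ARGS must end with pyspark-shell):

{
  "display_name": "PySpark",
  "language": "python",
  "argv": ["python", "-m", "ipykernel", "-f", "{connection_file}"],
  "env": {
    "PYSPARK_SUBMIT_ARGS": "--jars /opt/spark/conf/filterauth.jar --conf spark.ui.filters=authenticate.MyFilter pyspark-shell"
  }
}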

How can I see the log of Spark job server task?

I deployed the Spark job server according to https://github.com/spark-jobserver/spark-jobserver. Then I created a job server project and uploaded it to the job server. While the job is running, how can I see the logs?
It looks like it's not possible to see the logs while a job is running. I browsed through the source code and couldn't find any reference to such a feature, and it's clearly not part of the UI. It seems your only option is to view the logs after running a job; by default they are stored in /var/log/job-server, which you probably already know.
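You can at least follow the log files as they are written (a sketch; the exact file names under /var/log/job-server depend on how your deployment's logging is configured):

# follow whatever the job server writes while a job runs
tail -f /var/log/job-server/*.log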

Cassandra Nagios plugins

I am trying to monitor Cassandra using the Nagios Cassandra plugins by following this link:
http://assets.nagios.com/downloads/nagiosxi/docs/Monitoring_Apache_Cassandra_Databases_with_Nagios_XI.pdf
I do not see Core Config Manager, as I am using Nagios 3.3.1. How do we configure Cassandra-specific checks using Nagios Core 3.3.1? Can anyone who has done this point me to a good resource, please?
Thank you in advance.
The Nagios Core version 3 manual can be found here: http://library.nagios.com/library/products/nagioscore/manuals/
Instead of performing steps 5 and 6 in the PDF you attached, you will have to add the host, the command, and the service check to the Nagios .cfg files manually (see the sketch after this list):
Create the host object for the server using this page: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#host
Create a command object for the new Cassandra check, using this: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#command
Create the service object, the details are found here: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#service
Remember that the items in 'red' on those pages are required in the object entries.
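Concretely, the three definitions might look like the sketch below; the host name, address, and check_cassandra plugin command are placeholders to adapt from the PDF's earlier steps:

define host {
    use        linux-server
    host_name  cassandra01
    address    192.168.1.50
}

define command {
    command_name  check_cassandra
    command_line  $USER1$/check_cassandra.sh -H $HOSTADDRESS$
}

define service {
    use                  generic-service
    host_name            cassandra01
    service_description  Cassandra Status
    check_command        check_cassandra
}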

Cassandra Installation Issue

I followed the steps to install the latest Apache Cassandra build. Upon first startup (./cassandra -f), I get this:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/cassandra/thrift/CassandraDaemon
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.thrift.CassandraDaemon
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:334)
Could not find the main class: org.apache.cassandra.thrift.CassandraDaemon. Program will exit.
I exported the JAVA_HOME path, etc. What am I doing wrong? I should note that I am on an Ubuntu Lucid machine.
The first thing you should do is set the CASSANDRA_HOME environment variable to the Cassandra root directory.
Then run cassandra -f again and everything should run smoothly. (Cassandra checks the CASSANDRA_HOME environment variable to find the lib folder it needs to run the daemon.)
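A minimal sketch, assuming the tarball was unpacked to /opt/apache-cassandra (adjust the path to your install):

# point Cassandra at its root so it can locate lib/ and conf/
export CASSANDRA_HOME=/opt/apache-cassandra
cd "$CASSANDRA_HOME/bin"
./cassandra -f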
If using Ubuntu (Lucid), use the tutorial here: http://dustyreagan.com/installing-cassandra-on-ubuntu-linux/, which is based on the Debian package. Building from Git didn't work for me.
This answer may help if you don't strictly need to build from source and are starting with a fresh Cassandra install rather than upgrading an existing one.
I had the same problem when building from source. To get around it, I used a development build from the "Latest Builds (Hudson)" link here: http://cassandra.apache.org/download/
The next problem you'll encounter is that no keyspaces are set up on a fresh install. One way around that is to use the last release, 0.6.3. That solution didn't work for me, because I wanted to use Pycassa, which needs 0.7.
So what I had to do was the following steps:
Fire up a JMX console. Personally, I'm not on the same machine as the server running Cassandra, so I needed to use SSH tunnels, like this:
jconsole -J-DsocksProxyHost=localhost -J-DsocksProxyPort=1080
Then use this funky-looking URL to connect:
service:jmx:rmi:///jndi/rmi://my.hostname.com:8080/jmxrmi
Then, on the left side:
expand org.apache.cassandra.service
expand StorageService
expand Operations
select loadSchemaFromYAML
At the top right, click the loadSchemaFromYAML button to invoke it.
You can use the same steps to add new keyspaces during development, once you figure out what you want your schema to look like. But the above steps only work if you have no data, so you would have to remove all your data with rm -r /var/lib/cassandra/* after taking down the server. (Of course, there are more involved steps you can take to migrate data without destroying it.)
I realize you didn't ask about creating keyspaces, but on a trunk version of Cassandra, if you're just getting started, that's the very next problem you'll hit. I just spent a day solving it and hope this helps.
Can you provide more details? Are you using Ubuntu's OpenJDK 6?
Also, you don't have to build from source. Just get the binary from the following URL:
http://mirror.nexcess.net/apache/cassandra/1.1.2/apache-cassandra-1.1.2-bin.tar.gz
