Apache Spark, Error on sbt/assembly

I tried to install Apache Spark on my local Linux Mint machine, but when I run
sbt assembly
I get an error like the one below:
How do I fix this? Please advise.

From your picture, that is the Spark prebuilt binary folder.
This means you don't need to build anything; just use it by running bin/spark-shell or bin/spark-submit.
If you want to compile Spark yourself, you need to download the source code instead.
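To illustrate the "just use it" route: you could drop a small PySpark script next to the prebuilt folder and run it with bin/spark-submit. This sanity check is a hypothetical illustration, not part of the original answer:

# example_app.py -- hypothetical sanity check for a prebuilt Spark folder;
# run it from the Spark directory with: bin/spark-submit example_app.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sanity-check").getOrCreate()
print(spark.range(1000).count())  # a trivial job; should print 1000
spark.stop()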

Related

Can't run Apache Spark on Mac

I installed Apache Spark via Homebrew, but when I try to use spark-submit, the error shown in the picture below happens.

How do I connect to Cassandra with the C++ driver?

I am trying to connect to Cassandra from C++ on a Linux machine (‘titan’) using CMake. I haven't found anything useful on the internet.
Can anyone help me with the steps, please?
There are packages available for RHEL/CentOS and Ubuntu on the C++ driver Installation page. You can find the download links for the dependencies on the same page.
You'll need to install the dependencies and runtime library. See the Installation instructions for details.
Once you've installed the driver, try running the sample code for connecting to a Cassandra cluster in the C++ driver Getting started guide. Cheers!
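The Getting started sample itself is in C++; purely for illustration, here is the same cluster -> session -> query flow using the DataStax Python driver (pip install cassandra-driver). The contact point is an assumption:

# Minimal sketch of connecting to a Cassandra cluster; the C++ driver's
# Getting started sample follows the same cluster -> session -> query shape.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # assumed contact point; use your node's address
session = cluster.connect()
row = session.execute("SELECT release_version FROM system.local").one()
print("Connected; Cassandra release:", row.release_version)
cluster.shutdown()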

Can PySpark work without Spark?

I have installed PySpark standalone/locally (on Windows) using
pip install pyspark
I was a bit surprised that I can already run pyspark on the command line or use it in Jupyter notebooks, and that it does not need a proper Spark installation (e.g. I did not have to do most of the steps in this tutorial: https://medium.com/@GalarnykMichael/install-spark-on-windows-pyspark-4498a5d8d66c ).
Most of the tutorials that I run into say one needs to "install Spark before installing PySpark". That would agree with my view of PySpark being basically a wrapper over Spark. But maybe I am wrong here - can someone explain:
what is the exact connection between these two technologies?
why is installing PySpark enough to make it run? Does it actually install Spark under the hood? If yes, where?
if you install only PySpark, is there something you miss (e.g. I cannot find the sbin folder, which contains e.g. the script to start the history server)?
As of v2.2, executing pip install pyspark will install Spark.
If you're going to use PySpark, it's clearly the simplest way to get started.
On my system Spark is installed inside my virtual environment (miniconda) at lib/python3.6/site-packages/pyspark/jars
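You can confirm the location from Python itself; a small sketch (the exact path will differ per environment):

# Print where the pip-installed PySpark package (and its bundled jars) lives.
import os
import pyspark

spark_home = os.path.dirname(pyspark.__file__)
print("PySpark package:", spark_home)  # e.g. .../site-packages/pyspark
print("Bundled Spark jars:", os.path.join(spark_home, "jars"))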
The PySpark installed by pip is a subset of the full Spark distribution; you can find most of the PySpark Python files in spark-3.0.0-bin-hadoop3.2/python/pyspark. So if you'd like to use the Java or Scala interfaces, or deploy a distributed system with Hadoop, you must download the full Spark release from the Apache Spark site and install it.
A pip install of PySpark includes a Spark installation. If installed through pip3, you can find it with pip3 show pyspark; for example, for me it is at ~/.local/lib/python3.8/site-packages/pyspark.
This is a standalone local configuration, so it can't be used for managing clusters the way a full Spark installation can.
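To see that the pip install really is a self-contained local runtime, a minimal sketch that runs Spark in-process with the local[*] master (no cluster scripts involved):

# Everything runs inside this Python process; no separate Spark download needed.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")  # all local cores, local mode only
         .appName("pip-only-check")
         .getOrCreate())
print("Spark version:", spark.version)
print("Master:", spark.sparkContext.master)
spark.stop()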

Do I need Hadoop on my Windows machine to connect to HBase running on Linux?

Do I need Hadoop on my Windows machine to connect to HBase running on Ubuntu with Hadoop?
My HBase is running fine on my Ubuntu machine, and I can connect to it with Eclipse on the same machine (I am using Kundera to connect to HBase). Now I want to connect to HBase from my Windows 7 Eclipse IDE. Do I need to install Hadoop on Windows to connect to the remote HBase on Ubuntu? When I tried, I got something like this:
Failed to locate the winutils binary in the hadoop binary path
IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
All you need are the Hadoop and HBase jars, plus a Configuration object initialized
with:
1. hbase.zookeeper.quorum (the ZooKeeper hosts, if it's a cluster) and other connection details
2. hbase.zookeeper.property.clientPort
3. zookeeper.znode.parent
Then get a connection with that config object.
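That answer describes the Java client's Configuration object (Kundera sits on top of the same client). Purely as an illustration of reaching a remote HBase without a local Hadoop install, here is a sketch using happybase, a Python client that goes through the HBase Thrift gateway instead of ZooKeeper; it assumes hbase thrift start is running on the Ubuntu box, and the host, port, and table name are made up:

# Hypothetical sketch: reach a remote HBase via its Thrift gateway;
# no local Hadoop installation is required on the client side.
import happybase

connection = happybase.Connection("ubuntu-host", port=9090)  # assumed host/port
print(connection.tables())  # smoke test: list the existing tables

table = connection.table("my_table")  # assumed table name
table.put(b"row-1", {b"cf:col": b"value"})  # write one cell
print(table.row(b"row-1"))  # read it back
connection.close()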
This problem usually occurs with Hadoop 2.x.x versions. One option is to build the Windows distribution for your Hadoop version.
Refer to this link:
http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os
But before building, try to use the zip file given in this link:
http://www.srccodes.com/p/article/39/error-util-shell-failed-locate-winutils-binary-hadoop-binary-path
Extract the zip file and paste the files under hadoop-common-2.2.0/bin into your $HADOOP_HOME/bin directory.
Note: for me this worked even for Hadoop 2.5.
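The null in null\bin\winutils.exe means HADOOP_HOME was never set; after extracting the zip, make sure the variable points at the folder whose bin contains winutils.exe. A hypothetical check from Python (the path is an assumption):

# Verify the layout the Hadoop client expects on Windows:
# %HADOOP_HOME%\bin\winutils.exe must exist.
import os

os.environ["HADOOP_HOME"] = r"C:\hadoop"  # assumed extract location
winutils = os.path.join(os.environ["HADOOP_HOME"], "bin", "winutils.exe")
print(winutils, "exists:", os.path.exists(winutils))  # True once the files are in place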

How do I install Hadoop and Hive on Ubuntu Linux in VirtualBox?

I am using Windows 7 and would like to learn Hive and Hadoop, so I installed Ubuntu 13.04 in my VirtualBox VM. When I go to download Hadoop and Hive, the URL below offers multiple files. Could you please help me install Hive in the Ubuntu box, or point me to the steps?
http://mirror.tcpdiag.net/apache/hadoop/common/hadoop-1.1.2/
hadoop-1.1.2-1.i386.rpm
hadoop-1.1.2-1.i386.rpm.mds
hadoop-1.1.2-1.x86_64.rpm
hadoop-1.1.2-1.x86_64.rpm.mds
hadoop-1.1.2-bin.tar.gz
hadoop-1.1.2-bin.tar.gz.mds
hadoop-1.1.2.tar.gz
hadoop-1.1.2.tar.gz.mds
hadoop_1.1.2-1_i386.deb
hadoop_1.1.2-1_i386.deb.mds
hadoop_1.1.2-1_x86_64.deb
hadoop_1.1.2-1_x86_64.deb.mds
Since you are new to both Hadoop and Hive, you are better off going ahead with their .tar.gz archives, IMHO. In case things don't go smoothly, you don't have to do the entire uninstall-and-reinstall routine again and again. Just download hadoop-1.1.2.tar.gz, unzip it, keep the unzipped folder at some convenient location, and proceed with the configuration. If you want some help with the configuration, you can visit this post; I have tried to explain the complete procedure with all the details.
Configuring Hive is quite straightforward. Download the .tar.gz file and unpack it just like you did with Hadoop, then follow the steps shown here.
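For illustration, the download-and-unpack step scripted in Python (most people would just use wget and tar; the target directory is an assumption):

import os
import tarfile
import urllib.request

# Fetch the Hadoop 1.1.2 tarball from the mirror listed above.
URL = "http://mirror.tcpdiag.net/apache/hadoop/common/hadoop-1.1.2/hadoop-1.1.2.tar.gz"
urllib.request.urlretrieve(URL, "hadoop-1.1.2.tar.gz")

# Unpack it and keep the folder at some convenient location.
with tarfile.open("hadoop-1.1.2.tar.gz", "r:gz") as tar:
    tar.extractall(os.path.expanduser("~/hadoop"))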
i386: Compiled for a 32-bit architecture
x86_64: Compiled for a 64-bit architecture
.rpm: Red Hat Package Manager file
.deb: Debian Package Manager file
.tar.gz: GZipped archive of the source files
bin.tar.gz: GZipped archive of the compiled source files
.mds: Checksum file
A Linux Package Manager is (sort of) like an installer in Windows. It automatically collects the necessary dependencies. If you download the source files you have to link (and/or compile) all the dependencies yourself.
Since you're on Ubuntu, which is a Debian-based Linux distribution, and you don't seem to have much experience in a Linux environment, I would recommend downloading the .deb file for your architecture. Ubuntu will automatically launch the package manager when you open a .deb file, if I remember correctly.
1. Install Hadoop as a single-node cluster setup.
2. Install Hive after that; Hive requires Hadoop to be preinstalled.
Hadoop requires at least Java 1.6, and for a single-node setup you need SSH installed on your machine; the rest of the steps are easy (a quick pre-flight check is sketched below).
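A quick pre-flight check for those prerequisites, as a sketch:

# Check the single-node prerequisites mentioned above: Java (1.6+) and SSH.
import shutil
import subprocess

for tool in ("java", "ssh"):
    print(tool, "->", shutil.which(tool) or "NOT FOUND")

subprocess.run(["java", "-version"])  # prints the installed Java version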
Go to this link and download the hadoop-1.1.2.tar.gz file (59 MB), then install it:
http://mirror.tcpdiag.net/apache/hadoop/common/stable/
Likewise, if you want to install Hive, go to the official site and download the stable version from there.
