DataStax Bulk Loader for Apache Cassandra isn't installing on Windows

I'm trying to install DataStax Bulk Loader on my Windows machine in order to import a JSON file into a Cassandra database. I followed the installation instructions from the official website, which just say to unpack the folder. Running dsbulk from any directory in cmd prints the following result: "dsbulk" is not internal or external command, executable program, or batch file. However, I have added C:\DSBulk\dsbulk-1.7.0\bin to the PATH variable. Has anyone faced this problem, and if so, what did you do? Thanks :D

Change into the bin/ directory where you unzipped the package. For example:
C:\> cd C:\DSBulk\dsbulk-1.7.0\bin
Then run dsbulk.cmd from there.
NOTE: Make sure you have both the classpath and Java home set in your environment. Cheers!
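
Before changing directories, it can help to check what the environment actually resolves. The following is a minimal Python sketch, not part of DSBulk: the helper name diagnose_command is hypothetical, and it only reports whether a command name is found on PATH and whether JAVA_HOME is set.

```python
import os
import shutil


def diagnose_command(name, env=None):
    """Report whether `name` resolves on PATH and whether JAVA_HOME is set."""
    env = os.environ if env is None else env
    # shutil.which applies the same lookup rules as the shell (including
    # PATHEXT on Windows, so "dsbulk" can match "dsbulk.cmd").
    resolved = shutil.which(name, path=env.get("PATH", ""))
    return {
        "command": name,
        "resolved_to": resolved,            # None: the command was not found
        "java_home": env.get("JAVA_HOME"),  # None: JAVA_HOME is unset
    }


if __name__ == "__main__":
    for key, value in diagnose_command("dsbulk").items():
        print(f"{key}: {value}")
```

If resolved_to is None in a freshly opened Command Prompt, the PATH entry is probably not the one the shell sees.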

Related

How to use cassandra-loader on ubuntu

I want to use cassandra-loader on Ubuntu 14.04.
I have Cassandra installed on my machine, along with the other prerequisites required for the loader.
I am following this link:
https://github.com/brianmhess/cassandra-loader
I downloaded the cassandra-loader tool, but when I try to run any cassandra-loader command I get "cassandra-loader: command not found".
Please advise if I am missing anything or need to install another prerequisite as well.

The README says:
To run cassandra-loader, simply run the cassandra-loader executable (e.g., located at build/cassandra-loader)
So wherever you see cassandra-loader on its own, use build/cassandra-loader instead, or just copy build/cassandra-loader somewhere in your PATH and use it.

Done. I just needed to run the chmod command on the utility for it to work properly.
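
The answer above does not say which chmod mode was used. Assuming the missing permission was the execute bit (the usual cause of a downloaded tool refusing to run), a small Python sketch of the equivalent of chmod a+x could look like this. The function name ensure_executable is hypothetical.

```python
import os
import stat


def ensure_executable(path):
    """Add the user/group/other execute bits to `path`; return True if changed."""
    mode = os.stat(path).st_mode
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if mode & exec_bits == exec_bits:
        return False  # already executable by everyone, nothing to do
    os.chmod(path, mode | exec_bits)
    return True
```

For example, ensure_executable("build/cassandra-loader") would be run from the directory where the tool was downloaded.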

Find the Source/Bin Folder of Cassandra Installed by Homebrew

I am having difficulty finding the source of Cassandra on my Mac, where it was installed using Homebrew.
We were asked to use CQL files to populate tables, and I have checked several times that the CQL file really is stored in the listed location. However, I receive the following error message saying the file or directory cannot be found. Could anyone advise, please? Thanks!
cqlsh:stockwatcher> source '/Users/UserName/Downloads/insertusers.cql';
Could not open '/Users/UserName/Downloads/insertusers.cql':
[Errno 2] No such file or directory: '/Users/UserName/Downloads/insertusers.cql'

I am not a Mac user, but did you try whereis cassandra? If that does not work, try find / -name cassandra. If that does not work either, look for currently running Java apps and check how the java executable was invoked; the command line has details such as which libraries are included, from which the path can be worked out.
But the error message looks more like a permissions issue.
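
Errno 2 is the "no such file or directory" code, so one thing worth ruling out is that some component of the path does not exist exactly as typed. A minimal Python sketch of that check follows. The helper name explain_missing_path is hypothetical, and it only inspects the filesystem, not cqlsh itself.

```python
import os


def explain_missing_path(path):
    """Return (deepest existing ancestor, first missing component) for `path`.

    The second element is None when the full path exists.
    """
    path = os.path.abspath(os.path.expanduser(path))
    current, missing = path, None
    while not os.path.exists(current):
        parent, name = os.path.split(current)
        if parent == current:  # reached a root that does not exist; stop
            break
        current, missing = parent, name
    return current, missing


if __name__ == "__main__":
    print(explain_missing_path("~/Downloads/insertusers.cql"))
```

If the second element is a directory name rather than the file name, the problem is earlier in the path than the file itself.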

Spark on windows 10 not working

I'm trying to get Spark working on Windows 10. When I try to run spark-shell I get this error:
'Spark\spark-2.0.0-bin-hadoop2.7\bin..\jars""\ is not recognized as an internal or external command,operable program or batch file.
Failed to find Spark jars directory. You need to build Spark before running this program.
I am using a pre-built Spark for Hadoop 2.7 or later. I have installed Java 8, Eclipse Neon, Python 2.7 and Scala 2.11, and got winutils for Hadoop 2.7.1, and I still get this error.
When I downloaded Spark it came as a tgz; when extracted there was another tgz inside, so I extracted that too and then got all the bin folders and so on. I need to access spark-shell. Can anyone help?
EDIT:
The solution I ended up using:
1) Virtual box
2) Linux mint

I got the same error while building Spark. You can move the extracted folder to C:\
See:
http://techgobi.blogspot.in/2016/08/configure-spark-on-windows-some-error.html

You are probably giving the wrong folder path to the Spark bin directory.
Just open the command prompt and change directory to the bin folder inside the Spark folder.
Type spark-shell to check.
See: Spark on win 10

"On Windows, I found that if it is installed in a directory that has a space in the path (C:\Program Files\Spark) the installation will fail. Move it to the root or another directory with no spaces."
OR
If you have installed Spark under “C:\Program Files (x86)..” replace 'Program Files (x86)' with Progra~2 in the PATH env variable and SPARK_HOME user variable.
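
To see quickly whether any entry is affected by the space problem described above, a short Python sketch can list the PATH-style entries that contain a space. The function name entries_with_spaces is hypothetical, and the ";" separator is the Windows convention.

```python
def entries_with_spaces(path_value, sep=";"):
    """Return the entries of a PATH-style string that contain a space."""
    entries = (entry.strip() for entry in path_value.split(sep))
    return [entry for entry in entries if " " in entry]
```

The same function can be applied to the value of SPARK_HOME by passing it as a single-entry string.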

Installing Apache Spark on linux

I am installing Apache Spark on Linux. I have already downloaded Java, Scala and Spark, and they are all in the Downloads folder inside my home folder, at /home/alex/Downloads/X where X is scala, java or spark (that is literally what the folders are called).
I got Scala to work, but when I try to run Spark by typing ./bin/spark-shell it says:
/home/alex/Downloads/spark/bin/saprk-class: line 100: /usr/bin/java/bin/java: Not a directory
I have already added the paths by editing .bashrc with sudo gedit ~/.bashrc:
# JAVA
export JAVA_HOME=/home/alex/Downloads/java
export PATH=$PATH:$JAVA_HOME/bin
# scala
export SCALA_HOME=/home/alex/Downloads/scala
export PATH=$PATH:$SCALA_HOME/bin
# spark
export SPARK_HOME=/home/alex/Downloads/spark
export PATH=$PATH:$SPARK_HOME/bin
When I type sbt/sbt package in the spark folder, it also says no such file or directory was found. What should I do from here?

It seems you have a few issues. First, your JAVA_HOME does not point to a directory containing Java. Second, when running sbt in Spark you should run ./sbt/sbt (or, in newer versions, ./build/sbt). While you can download Java and Scala by hand, you may find that your system packages are sufficient (make sure to get JDK 7 or later).
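
The "/usr/bin/java/bin/java: Not a directory" message suggests the JAVA_HOME in effect was /usr/bin/java, which is a file rather than a JDK directory. A small Python sketch of that sanity check, under the assumption that a usable JAVA_HOME contains bin/java, might be as follows. The function name java_home_problem is hypothetical.

```python
import os


def java_home_problem(java_home):
    """Return None if `java_home` looks usable, else a short problem description."""
    if not java_home:
        return "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return f"{java_home} is not a directory"
    java_bin = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java_bin):
        return f"{java_bin} does not exist"
    return None


if __name__ == "__main__":
    print(java_home_problem(os.environ.get("JAVA_HOME")) or "JAVA_HOME looks fine")
```

Run it in the same shell that launches spark-shell, since a new terminal is needed for .bashrc changes to take effect.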

Furthermore, after using system packages as Holden points out, in Linux you may use the command whereis to make sure of the right paths.
Finally, the following link may prove useful:
http://www.tutorialspoint.com/apache_spark/apache_spark_installation.htm
Hope this helps.

Note: It looks like there may be a misspelling in a file or directory name. The error message refers to saprk-class rather than spark-class:
/home/alex/Downloads/spark/bin/saprk-class: line 100: /usr/bin/java/bin/java: Not a directory
That could be a configuration issue only, but it is worth checking whether the file is called spark-class elsewhere, in case the mismatch is causing related issues.

Why does spark-submit and spark-shell fail with "Failed to find Spark assembly JAR. You need to build Spark before running this program."?

I was trying to run spark-submit and I get
"Failed to find Spark assembly JAR.
You need to build Spark before running this program."
When I try to run spark-shell I get the same error.
What do I have to do in this situation?

On Windows, I found that if it is installed in a directory that has a space in the path (C:\Program Files\Spark) the installation will fail. Move it to the root or another directory with no spaces.

Your Spark package doesn't include compiled Spark code. That's why you get this error message from the spark-submit and spark-shell scripts.
You have to download one of the pre-built versions listed under "Choose a package type" on the Spark download page.
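
One rough way to tell a source-only download from a pre-built one is to look for the bundled JARs. The sketch below assumes the usual layouts: a jars directory in Spark 2.x and later, or a spark-assembly JAR under lib in 1.x releases. The function name has_prebuilt_jars is hypothetical, and the check is only a heuristic.

```python
import glob
import os


def has_prebuilt_jars(spark_home):
    """Return True if `spark_home` contains bundled Spark JARs.

    Assumed layouts: <home>/jars/*.jar (Spark 2.x and later) or
    <home>/lib/spark-assembly*.jar (Spark 1.x).
    """
    base = glob.escape(spark_home)  # protect against [ ] * ? in the path
    patterns = [
        os.path.join(base, "jars", "*.jar"),
        os.path.join(base, "lib", "spark-assembly*.jar"),
    ]
    return any(glob.glob(pattern) for pattern in patterns)
```

A False result for the directory SPARK_HOME points to is consistent with the "You need to build Spark" message.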

Try running mvn -DskipTests clean package first to build Spark.

If your Spark binaries are in a folder whose name contains spaces (for example, "Program Files (x86)"), it won't work. I changed it to "Program_Files", and then the spark-shell command worked in cmd.

In my case, I installed Spark with pip3 install pyspark on macOS, and the error was caused by an incorrect SPARK_HOME variable. It works when I run a command like the one below:
PYSPARK_PYTHON=python3 SPARK_HOME=/usr/local/lib/python3.7/site-packages/pyspark python3 wordcount.py a.txt

1) Go to SPARK_HOME. Note that your SPARK_HOME variable should not include /bin at the end. Add /bin only when you add it to PATH, like this: export PATH=$SPARK_HOME/bin:$PATH
2) Run export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g" to allot more memory to Maven.
3) Run ./build/mvn -DskipTests clean package and be patient. It took my system 1 hour and 17 minutes to finish this.
4) Run ./dev/make-distribution.sh --name custom-spark --pip. This is just for Python/PySpark. You can add more flags for Hive, Kubernetes, etc.
5) Running pyspark or spark-shell will now start PySpark and Spark respectively.

If you have downloaded the binary and are getting this exception, check whether your SPARK_HOME path contains spaces, like "apache spark"/bin.
Just removing the spaces will make it work.

Just to add to @jurban1997's answer:
If you are running Windows, make sure that the SPARK_HOME and SCALA_HOME environment variables are set up correctly. SPARK_HOME should be pointing to {SPARK_HOME}\bin\spark-shell.cmd

For a Windows machine with the pre-built version as of today (21.01.2022):
To check all the edge cases you may have and avoid tedious guesswork about what exactly is not configured properly:
1) Find spark-class2.cmd and open it with a text editor.
2) Inspect the arguments of the commands starting with call or if exist by typing them into Command Prompt, like this:
   a) Open Command Prompt. (For PowerShell you need to print the variable another way.)
   b) Copy-paste %SPARK_HOME%\bin\ as is and press Enter.
   c) If you see something like bin\bin in the path displayed, then you have appended /bin to your %SPARK_HOME% environment variable.
3) Now add the path to spark/bin to your PATH variable, or it will not find the spark-submit command.
4) Try out and correct every path variable that the script in this file uses, and you should be good to go.
5) After that, enter spark-submit ... You may now run into the missing Hadoop winutils.exe, in which case you can get the tool and place it where spark-submit.cmd is located.
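
The bin\bin symptom described above comes from a SPARK_HOME value that already ends in bin. A minimal Python sketch for detecting and trimming that suffix, using Windows path rules via ntpath, might look like this. The function name strip_trailing_bin is hypothetical.

```python
import ntpath


def strip_trailing_bin(spark_home):
    """Return `spark_home` without a trailing bin component (Windows path rules)."""
    trimmed = spark_home.rstrip("\\/")  # drop trailing separators first
    head, tail = ntpath.split(trimmed)
    return head if tail.lower() == "bin" else trimmed
```

If the returned value differs from the current SPARK_HOME, the variable most likely has bin appended and should be set to the returned value instead.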

Spark installation
For a Windows machine:
1) Download spark-2.1.1-bin-hadoop2.7.tgz from https://spark.apache.org/downloads.html
2) Unzip it, place the spark folder on the C:\ drive, and set the environment variable.
3) If you don't have Hadoop, create a Hadoop folder with a bin folder inside it. Download the winutils file from https://codeload.github.com/gvreddy1210/64bit/zip/master, copy winutils.exe into Hadoop\bin, and set the environment variable for C:\hadoop\bin.
4) Create a tmp\hive folder on the C:\ drive and give full permissions to this folder, like this:
C:\Windows\system32>C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive
5) Open Command Prompt, first run winutils.exe from C:\hadoop\bin>, then navigate to C:\spark\bin> and run spark-shell.
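
The steps above do not name the environment variable. Assuming it is HADOOP_HOME (the variable Hadoop uses to locate bin\winutils.exe on Windows), a small Python sketch of the check could be as follows. The function name winutils_status is hypothetical, and the exists parameter is only there so the check can be exercised without a real Windows filesystem.

```python
import os


def winutils_status(env, exists=os.path.isfile):
    """Describe whether winutils.exe is where HADOOP_HOME says it should be."""
    hadoop_home = env.get("HADOOP_HOME")  # assumption: this is the variable in use
    if not hadoop_home:
        return "HADOOP_HOME is not set"
    candidate = hadoop_home.rstrip("\\/") + "\\bin\\winutils.exe"
    return "ok" if exists(candidate) else f"missing: {candidate}"


if __name__ == "__main__":
    print(winutils_status(os.environ))
```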
