cql-import tool not present in sqoop 1.4.6 - cassandra

I am currently stuck on a data migration: I want to migrate data from an Oracle database to Cassandra.
I have the following tools installed on Linux:
DSE 4.8
Hadoop 2.7.3
Sqoop 1.4.6
I am not sure why my Sqoop version does not have cql-import or any other Cassandra-related commands.
These are the available commands I can see in the "sqoop help" output:
Available commands:
codegen
create-hive-table
eval
export
help
import
import-all-tables
import-mainframe
job
list-databases
list-tables
merge
metastore
version
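One way to check a given Sqoop build for a particular tool is to scan its help output. The sketch below assumes the tool name is the first word on its line, as in the listing above; `has_sqoop_tool` is a hypothetical helper name, not something Sqoop provides.

```shell
#!/bin/sh
# has_sqoop_tool NAME: reads help text on stdin and succeeds only if NAME
# is the first word of some line. (Hypothetical helper, not part of Sqoop.)
has_sqoop_tool() {
  awk -v tool="$1" '$1 == tool { found = 1 } END { exit !found }'
}

# Demo on a few lines copied from the help output in the question.
sample_help='Available commands:
codegen
export
import
version'

for tool in import cql-import; do
  if printf '%s\n' "$sample_help" | has_sqoop_tool "$tool"; then
    echo "$tool: listed"
  else
    echo "$tool: not listed"
  fi
done
```

Against a real installation the same check would read `sqoop help | has_sqoop_tool cql-import`.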
I have searched the net and found the following links to the latest Sqoop versions, but the cql-import tool is missing from all of them.
https://www-eu.apache.org/dist/sqoop/
http://mirrors.ibiblio.org/apache/sqoop/1.4.6/
It would be very helpful if anyone has a link to a Sqoop version that supports Cassandra data migration commands such as "cql-import".
Edited:
One more point to add: I configured Hadoop and Sqoop manually.
Thanks in advance

Related

Installing Spark 2 on CDH 5.* with RPM?

I have a Cloudera CDH 5.11 cluster installed from RPM packages (we don't want to use Cloudera Manager or parcels). Has anyone found/built Spark 2 RPM packages for CDH? It seems Cloudera only ships Spark 2 as parcels.
You won't. For now, the doc "Spark 2 Known Issues" clearly states:
Package Install is not Supported
The Cloudera Distribution of Apache Spark 2 is only installable as a parcel.
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#ki_package_install
The best way is to run Spark on YARN instead of using Spark Master/Worker. You are then free to use any Spark version you like, independent of what the vendor ships.
What you do need to do is package the Spark History Server so that you can look at jobs after they finish. And if you want to use Dynamic Allocation, you need the Spark Shuffle Service configured in YARN.
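As a rough illustration of the Spark-side settings this involves, the sketch below writes a `spark-defaults.conf` fragment that enables event logging for the History Server, Dynamic Allocation, and the external shuffle service. The property names are standard Spark configuration keys; the function name `write_spark_defaults` and the log directory `hdfs:///spark-history` are assumptions made for the example. It does not cover the YARN side, where the NodeManagers still have to be configured to run the shuffle service.

```shell
#!/bin/sh
# write_spark_defaults FILE LOG_DIR: writes a spark-defaults.conf fragment.
# (Hypothetical helper; LOG_DIR is whatever directory the History Server reads.)
write_spark_defaults() {
  conf_file=$1
  log_dir=$2
  cat > "$conf_file" <<EOF
# Event logs written by applications and read by the History Server
spark.eventLog.enabled            true
spark.eventLog.dir                $log_dir
spark.history.fs.logDirectory     $log_dir
# Dynamic Allocation relies on the external shuffle service
spark.dynamicAllocation.enabled   true
spark.shuffle.service.enabled     true
EOF
}

# Demo: write the fragment to a temporary file and show it.
demo_conf=$(mktemp)
write_spark_defaults "$demo_conf" "hdfs:///spark-history"   # assumed path
cat "$demo_conf"
```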
It looks like I can't comment on a question, so please excuse my posting this as an answer.
Is it possible to install the Spark 2 parcel on an RPM-installed cluster using CM?
As of CDH 6.0, Spark 2 is included as RPMs. Problem solved.

Unable to connect to spark-sql cli

I am using the CDH 5.5.7 QuickStart VM, which has Spark 1.6.0 running. I am trying to start the spark-sql CLI, but it fails.
According to this link, issuing the spark-sql command should open the CLI, but I get the error below.
[cloudera@quickstart ~]$ spark-sql
-bash: spark-sql: command not found
I have also tried the following and get a similar error:
[cloudera@quickstart ~]$ ./bin/spark-sql
-bash: ./bin/spark-sql: No such file or directory
Any help is much appreciated.
This probably will not work in Cloudera's distribution of Spark.
I think they have not shipped spark-sql since CDH 5.4.
spark-sql is not included, either because CDH Spark doesn't have the Thrift service or for some other reason.
I can't find confirmation in the online documentation, but my CDH 5.8 has spark-sql in neither the Spark 1.6 nor the Spark 2.0 parcel.
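To see which Spark launchers a particular installation actually puts on the PATH, a small check along these lines can help. It is only a sketch: the list of launcher names is an assumption about what a Spark distribution might ship, and `list_spark_launchers` is a hypothetical helper name.

```shell
#!/bin/sh
# list_spark_launchers: reports, for each candidate launcher name, where it
# resolves on the current PATH, or that it is not found.
# (Hypothetical helper; the candidate names are an assumption.)
list_spark_launchers() {
  for tool in spark-shell spark-submit spark-sql pyspark; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $(command -v "$tool")"
    else
      echo "$tool: not found"
    fi
  done
}

list_spark_launchers
```

If spark-sql is reported as not found while spark-shell resolves, the launcher is absent from the distribution rather than merely missing from the current directory.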

How to upgrade Apache Spark version

Currently, I have Spark 1.5.0 installed on AWS, set up with the spark-ec2.sh script.
Now I want to upgrade Spark to 1.5.1. How do I do this? Is there an upgrade procedure, or do I have to build the cluster from scratch using the spark-ec2 script? In that case I would lose all my existing configuration.
Please advise.
Thanks
1.5.1 has the same configuration fields as 1.5.0. I am not aware of any automation tools, but the upgrade should be trivial: copying $SPARK_HOME/conf over should suffice. Back up the old files nevertheless.
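A minimal sketch of that copy-and-back-up step, assuming the old and new releases are unpacked in separate directories. The function name `carry_over_conf` and the directory layout are hypothetical, and the demo runs on throwaway temporary directories rather than a real cluster.

```shell
#!/bin/sh
# carry_over_conf OLD_HOME NEW_HOME BACKUP_DIR
# Copies OLD_HOME/conf to BACKUP_DIR first, then into NEW_HOME/conf.
# Files with the same name in NEW_HOME/conf are overwritten.
# (Hypothetical helper; all three paths are chosen by the caller.)
carry_over_conf() {
  old_home=$1
  new_home=$2
  backup_dir=$3
  if [ ! -d "$old_home/conf" ]; then
    echo "no conf directory under $old_home" >&2
    return 1
  fi
  mkdir -p "$backup_dir" "$new_home/conf"
  cp -R "$old_home/conf/." "$backup_dir/"      # back up the old files
  cp -R "$old_home/conf/." "$new_home/conf/"   # carry them into the new release
}

# Demo with temporary directories standing in for the two Spark homes.
work=$(mktemp -d)
mkdir -p "$work/spark-1.5.0/conf" "$work/spark-1.5.1"
echo '# sample setting' > "$work/spark-1.5.0/conf/spark-env.sh"
carry_over_conf "$work/spark-1.5.0" "$work/spark-1.5.1" "$work/conf-backup"
ls "$work/spark-1.5.1/conf" "$work/conf-backup"
```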

How to connect Zeppelin to Spark 1.5 built from the sources?

I pulled the latest source from the Spark repository and built locally. It works great from an interactive shell like spark-shell or spark-sql.
Now I want to connect Zeppelin to my Spark 1.5, following this install manual. I published the custom Spark build to the local Maven repository and set the custom Spark version in the Zeppelin build command. The build finished successfully, but when I try to run basic things like sc inside a notebook, it throws:
akka.ConfigurationException: Akka JAR version [2.3.11] does not match the provided config version [2.3.4]
Version 2.3.4 is set in pom.xml and spark/pom.xml, but simply changing them won’t even let me get a build.
If I rebuild Zeppelin with the standard -Dspark.version=1.4.1, everything works.
Update 2016-01
Spark 1.6 support has landed on master and is available under the -Pspark-1.6 profile.
Update 2015-09
Spark 1.5 support has landed on master and is available under the -Pspark-1.5 profile.
Work on supporting Spark 1.5 in Apache Zeppelin (incubating) was done under PR apache/incubator-zeppelin#269, which will land on master soon.
For now, building from Spark_1.5 branch with -Pspark-1.5 should do the trick.

Installing Hive and sqoop on Windows (Cygwin)

Can someone help me with the steps to install Hive and Sqoop on Cygwin? I have already installed Hadoop 0.20.2 and the latest stable HBase (0.94.1) on Cygwin, and both are working well.
Typically a Hadoop distribution includes both. Inspect the directories containing the Hadoop binaries and see if you discover the bin files. For example, sqoop is simply named sqoop and is executable.

Resources