Input/output error while copying from hadoop file system to local - linux

hadoop fs -copyToLocal /paulp /abcd (I want to copy the folder paulp from the Hadoop file system to the abcd folder on my local machine.)
But the output of that command looks like this: copyToLocal: mkdir `/abcd': Input/output error
I am using Ubuntu 14.04 and Hadoop 2.7.1 ...
Can you provide an apt solution to this?
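As a first check (a sketch only; the target path below is an example, not from the question), try copying into a directory your user can write to, since creating /abcd at the filesystem root usually needs root privileges, and a persistent Input/output error points at the local disk rather than at HDFS:
mkdir -p ~/abcd                        # create a local target the current user owns
hadoop fs -copyToLocal /paulp ~/abcd   # copy the HDFS folder into it
dmesg | tail                           # if the I/O error persists, look here for local disk errors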

Related

Cloudera Quick Start VM lacks Spark 2.0 or greater

To test and learn Spark functions, developers need the latest Spark version, since the APIs and methods from before version 2.0 are obsolete and no longer work in newer releases. This poses a bigger challenge, and developers are forced to install Spark manually, which wastes a considerable amount of development time.
How do I use a later version of Spark on the Quickstart VM?
No one else should waste the setup time that I wasted, so here is the solution.
SPARK 2.2 Installation Setup on Cloudera VM
Step 1: Download a QuickStart VM from the link:
Prefer the VMware platform as it is easy to use; in any case, all the options are viable.
The entire tar file is around 5.4 GB. You need to provide a business email ID, as personal email IDs are not accepted.
Step 2: The virtual environment requires around 8 GB of RAM, so please allocate sufficient memory to avoid performance glitches.
Step 3: Please open the terminal and switch to root user as:
su root
password: cloudera
Step 4: Cloudera provides Java version 1.7.0_67, which is old and does not meet our needs. To avoid Java-related exceptions, please install Java with the following commands:
Downloading Java:
wget -c --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
Switch to the /usr/java/ directory with the “cd /usr/java/” command.
cp the downloaded Java tar file to the /usr/java/ directory.
Untar it with “tar -zxvf jdk-8u131-linux-x64.tar.gz”.
Open the profile file with the command “vi ~/.bash_profile”
Export JAVA_HOME to point to the new Java directory:
export JAVA_HOME=/usr/java/jdk1.8.0_131
Save and Exit.
To apply the change, run the following command in the shell:
source ~/.bash_profile
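Put together, the Java steps above look roughly like this (a sketch; it assumes the tar file was downloaded to the home directory):
cp ~/jdk-8u131-linux-x64.tar.gz /usr/java/     # adjust the source path to wherever wget saved the file
cd /usr/java/
tar -zxvf jdk-8u131-linux-x64.tar.gz
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_131' >> ~/.bash_profile
source ~/.bash_profile
$JAVA_HOME/bin/java -version                   # should report 1.8.0_131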
The Cloudera VM provides Spark 1.6 by default. However, the 1.6 APIs are old and do not match production environments, so we need to download and install Spark 2.2 manually.
Switch to /opt/ directory with the command:
cd /opt/
Download spark with the command:
wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz
Untar the spark tar with the following command:
tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
We need to define some environment variables as default settings:
Please open a file with the following command:
vi /opt/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh
Paste the following configurations in the file:
SPARK_MASTER_IP=192.168.50.1
SPARK_EXECUTOR_MEMORY=512m
SPARK_DRIVER_MEMORY=512m
SPARK_WORKER_MEMORY=512m
SPARK_DAEMON_MEMORY=512m
Save and exit
We need to start spark with the following command:
/opt/spark-2.2.0-bin-hadoop2.7/sbin/start-all.sh
Export SPARK_HOME:
export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7/
Change the permissions of the directory:
chmod 777 -R /tmp/hive
Try “spark-shell”, it should work.
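If spark-shell does not come up cleanly, two quick checks (a sketch; the paths match the install above):
/opt/spark-2.2.0-bin-hadoop2.7/bin/spark-submit --version   # should report version 2.2.0
jps                                                         # the standalone Master and Worker processes should be listed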

Unable to get disk usage on mounted folder with CYGWIN

I have installed CYGWIN on a 2008R2 server and have some disks which I have mounted to folders as below (example):
l:\mounted\mounted_hd1
l:\mounted\mounted_hd2
l:\mounted\mounted_hd3
I have data and additional folders under the mountpoints (example):
l:\mounted\mounted_hd1\photos
l:\mounted\mounted_hd2\backup_data
l:\mounted\mounted_hd3\data
When I run the following command: C:\cygwin\bin\df -k /cydrive/L/mounted/mounted_hd1
I get the following:
/usr/bin/df: cannot stat '..': No such file or directory
/usr/bin/df: no file systems processed
However, when I run C:\cygwin\bin\df -k /cydrive/L/mounted
I get the size of the disk L: which was created for mounting the disks...
Why am I not able to run df -k on the mounted folders?
Thanks for your help!
As far as I can tell, in the x86 version of Cygwin, the df -k command for determining disk usage cannot run properly against child folders of disks that are mounted as folders in Disk Management.
I was able to resolve my issue by updating Cygwin to the x64 version:
Determine the version of Cygwin installed - via cmd - run:
uname -a
The x86 version will contain: i686
The x64 version will contain: x86_64
If you require the above functionality, as I did in my specific use case, then install the latest x64 version from:
https://cygwin.com/
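A shorter variant of the check above, run from the Cygwin terminal (a sketch):
uname -m    # prints i686 on the x86 build and x86_64 on the x64 build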

Is it necessary to install hadoop in /usr/local?

I am trying to build a hadoop cluster with four nodes.
The four machines are from my school's lab, and I found that their /usr/local directories are mounted from the same shared disk, which means their /usr/local contents are identical.
The problem is that I cannot start the DataNode on the slaves because the Hadoop files are always the same (like tmp/dfs/data).
I am planning to configure and install Hadoop in another directory such as /opt.
The problem is that almost all the installation tutorials ask us to install it in /usr/local, so I was wondering: will there be any bad consequences if I install Hadoop somewhere else, like /opt?
Btw, I am using Ubuntu 16.04
As long as HADOOP_HOME points to where you extracted the hadoop binaries, then it shouldn't matter.
You'll also want to update PATH in ~/.bashrc, for example.
export HADOOP_HOME=/path/to/hadoop_x.yy
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
For reference, I have some configuration files inside of /etc/hadoop.
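If the configuration files live outside the install directory, as with the /etc/hadoop layout mentioned above, you can point Hadoop at them explicitly (a sketch, not required for a default layout):
export HADOOP_CONF_DIR=/etc/hadoop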
(Note: Apache Ambari makes installation easier)
It is not at all necessary to install Hadoop under /usr/local. That location is generally used when you install a single-node Hadoop cluster (although it is not mandatory). As long as you have the following variables specified in .bashrc, any location should work.
export HADOOP_HOME=<path-to-hadoop-install-dir>
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
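For example, a per-node layout under /opt might look like this (a sketch; the archive name and target path are placeholders, not taken from the question):
sudo mkdir -p /opt/hadoop
sudo tar -xzf hadoop-2.7.1.tar.gz -C /opt/hadoop --strip-components=1   # extract the downloaded release into /opt/hadoop
echo 'export HADOOP_HOME=/opt/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
source ~/.bashrc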

How to copy hadoop examples jar from local to hadoop environment?

I'm still newbie with Hadoop.
I've downloaded a cloudera VM image of hadoop and it did not contain hadoop-examples.jar.
I want to manually copy hadoop-examples.jar (I got it from somewhere), which is currently on my local disk, to the Hadoop environment, specifically to usr/jars,
So that if I run hadoop jar usr/jars/hadoop-examples.jar wordcount words.txt out it will properly run the jar.
Thanks!
To copy a file from the local filesystem to an HDFS location, use the command:
hdfs dfs -put /local_disk_path/hadoop-examples.jar /usr/jars/
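To confirm the copy (a sketch using the same HDFS path as above):
hdfs dfs -ls /usr/jars/    # hadoop-examples.jar should appear in the listing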

How to move data to another folder in memsql

Working with a MemSQL cluster as the primary storage design, data files are installed by default in a location like the following on CentOS 6.x:
/var/lib/memsql-ops/data/installs/MI9dfcc72a5b044f2694b5f7028803a21e
Is there any way to relocate the data path to another folder on the same machine?
This is not the best way, but it works. I just reinstalled MemSQL into another directory:
sudo mkdir /data/memsql
sudo ./install.sh --root-dir /data/memsql
In this case MemSQL Ops will still be in /var/lib/memsql-ops, but all nodes will be installed into the /data/memsql directory (look at the symlink /var/lib/memsql), and all data will live inside that directory too.
P.S. You can find additional installation options with the memsql-ops agent-install --help command.
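To double-check the relocation afterwards (a sketch, assuming the paths above):
ls -l /var/lib/memsql    # should be a symlink pointing into /data/memsql
df -h /data/memsql       # shows which disk now holds the node data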
