HDFS + create symbolic link between HDFS folder and local filesystem folder - linux

I searched on Google but could not find it:
is it possible to create a link between an HDFS folder and a local folder?
Example:
we want to create a link between folder_1 in HDFS and the local folder /home/hdfs_mirror.
HDFS folder:
su hdfs
$ hdfs dfs -ls /hdfs_home/folder_1
Linux local folder:
ls /home/hdfs_mirror

I do not think it is possible.
This is because we are talking about two different File Systems (HDFS and Local FileSystem).
If we want to keep syncing the local data directory to an HDFS directory, then we need to make use of tools like Apache Flume.
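A minimal one-way sync sketch, assuming the paths from the question and a simple periodic push (e.g. from cron) with hdfs dfs -put rather than a full Flume agent:
# hypothetical sketch: mirror the local folder into HDFS on a schedule
hdfs dfs -mkdir -p /hdfs_home/folder_1
hdfs dfs -put -f /home/hdfs_mirror/* /hdfs_home/folder_1/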

Related

How to transfer data from local file system (linux) to a Hadoop Cluster made on Google Cloud Platform

I am a beginner in Hadoop. I made a Hadoop cluster (one master and two slaves) on Google Cloud Platform.
I accessed the master of the cluster from the local file system (Linux) using: ssh -i key key@public_ip_of_master
Then I did sudo su - inside the cluster because the Hadoop functions only appear while being root.
Then I started HDFS using start-dfs.sh and start-all.sh
Now the problem is that I want to transfer files from the local Linux file system to the Hadoop cluster and vice versa using the following command (running the command inside the cluster while being root):
root@master:~# hdfs dfs -put /home/abas1/Desktop/chromFa.tar.gz /Hadoop_File
The problem is that the local path /home/abas1/Desktop/chromFa.tar.gz is never recognized and I cannot figure out what to do.
I am sure I am missing something trivial but I do not know what it is. I have to use either -copyFromLocal or -put.
local path is never recognized
That is not a Hadoop problem, then. You are on the master node (over SSH), as the root user. There is a /root folder with files, and probably no /home/abas1.
In other words, run ls -l /home to see what local files are available.
To upload files from that terminal session, you will first want to SCP them to the master server from the other machine.
Exit the SSH session
scp -i key /home/abas1/Desktop/chromFa.tar.gz root@master-ip:/tmp
ssh -i key root@master-ip
Then you can do this:
hdfs dfs -mkdir /Hadoop_File
ls -l /tmp | grep chromFa # for example, to check the file
hdfs dfs -put /tmp/chromFa.tar.gz /Hadoop_File/
Hadoop functions only appear while being root.
Please do not use root for interacting with Hadoop services. Create unique user accounts for HDFS, YARN, Zookeeper, etc. with restricted permissions like you would for any other Unix process.
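A minimal sketch of that setup, with illustrative user and group names (hdfs, yarn, hadoop) rather than any particular distribution's layout:
# hypothetical sketch: dedicated unprivileged service accounts instead of root
sudo groupadd hadoop
sudo useradd -g hadoop -m hdfs
sudo useradd -g hadoop -m yarn
# then run the daemons and HDFS commands as those users, for example:
sudo -u hdfs hdfs dfs -ls /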
Using DataProc will do this for you... And you can still SSH into it, so you should really consider using it instead of a manual GCE cluster.

Trying to move a csv file from local file system to hadoop file system

I am trying to copy a csv file from my local file system to Hadoop, but I am not able to do it successfully. I am not sure which permissions I need to change. As I understand it, the hdfs superuser does not have access to /home/naya/dataFiles/BlackFriday.csv
hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp
# Error: put: Permission denied: user=naya, access=WRITE, inode="/tmp":hdfs:supergroup:drwxr-xr-x
sudo -u hdfs hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp
# Error: put: `/home/naya/dataFiles/BlackFriday.csv': No such file or directory
Any help is highly appreciated. I want to do it via the command line utility. I can do it via Cloudera Manager from the Hadoop side, but I want to understand what's happening behind the commands.
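A rough sketch of what is happening behind the two errors, assuming the usual Cloudera layout (the /user/naya home directory and the chown step are assumptions, not part of the question):
# first error: naya has no write permission on the HDFS /tmp, which is owned by hdfs:supergroup
# second error: the hdfs user likely cannot read inside naya's local home directory, so the local path is reported as missing
sudo -u hdfs hdfs dfs -mkdir -p /user/naya          # create an HDFS home for naya
sudo -u hdfs hdfs dfs -chown naya:naya /user/naya
hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /user/naya/   # now run as naya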

how to create ini file as HDFS on linux

I am new to Linux. The Cloudera documentation mentions creating a sentry-provider.ini file on Cloudera CDH 5.4 as an HDFS file. I am not finding a good article on how to create an ini file on Linux.
I am trying to configure Apache Sentry on a Cloudera setup to have role-based security on Hive metadata.
how to create ini file as HDFS on linux?
The simple way is: you can create this "sentry-provider.ini" file locally (in a Linux terminal):
vi sentry-provider.ini
Then put the content specified at this link into the file by pressing i and then pasting the content.
After this, put the file onto the HDFS file system using the command:
hdfs dfs -copyFromLocal sentry-provider.ini etc/sentry/
Remember that the path etc/sentry/ is a path on HDFS relative to your user's home directory, which is typically /user/username/.
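A quick sketch of the full sequence, assuming the relative HDFS path from the answer (so it resolves to /user/<your_username>/etc/sentry/):
hdfs dfs -mkdir -p etc/sentry          # create the target directory under your HDFS home
hdfs dfs -copyFromLocal sentry-provider.ini etc/sentry/
hdfs dfs -ls etc/sentry/               # verify the file landed where expected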

Can't copy local file in linux to hadoop

I just installed Hadoop on a Linux VM. Now I am following my guide book to copy a file from the local file system to Hadoop (the file is saved on the VM desktop). Here is what I did:
hdfs dfs -copyFromLocal filename.csv /user/root
However, I received message saying
"copyFromLocal: 'filename.csv': no such file or directory"
Can anyone tell me what went wrong and what I should do to make it right?
Thanks!
You need to be in your Desktop folder (the one containing your file) so the command can find the file:
cd /root/Desktop
There are two methods for placing a file from the local host onto Hadoop's HDFS:
1) copyFromLocal - as you have used
2) put - hadoop dfs -put yourfilepath(local) hdfspath
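A short sketch with the file name from the question, assuming the file sits on the root user's desktop:
cd /root/Desktop
hdfs dfs -copyFromLocal filename.csv /user/root
# or, with an absolute local path so the current directory does not matter
hdfs dfs -put /root/Desktop/filename.csv /user/root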

Command to store File on HDFS

Introduction
A Hadoop NameNode and three DataNodes have been installed and are running. The next step is to store a file on HDFS. The following commands have been executed:
hadoop fs -copyFromLocal ubuntu-14.04-desktop-amd64.iso
copyFromLocal: `.': No such file or directory
and
hadoop fs -put ubuntu-14.04-desktop-amd64.iso
put: `.': No such file or directory
without success.
Question
Which command needs to be issued in order to store a file on HDFS?
If no destination path is provided, hadoop will try to copy the file into your HDFS home directory. In other words, if you're logged in as utrecht, it will try to copy ubuntu-14.04-desktop-amd64.iso to /user/utrecht.
However, this folder doesn't exist by default (you can normally check the DFS via a web browser).
To make your command work, you have two choices:
copy it elsewhere (/ works, but putting everything there may lead to complications in the future)
create the directory you want with hdfs dfs -mkdir /yourFolderPath
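A brief sketch of the second option, using the utrecht user from the example (on many clusters creating a directory under /user requires the HDFS superuser, which is an assumption here):
hdfs dfs -mkdir -p /user/utrecht       # may need to be run as the hdfs superuser
hadoop fs -copyFromLocal ubuntu-14.04-desktop-amd64.iso /user/utrecht/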
