Can't copy local file in Linux to Hadoop

I just installed Hadoop on a Linux VM. Now I am following my guide book to copy a file from the local filesystem to Hadoop (the file is saved on the VM desktop). Here is what I did:
hdfs dfs -copyFromLocal filename.csv /user/root
However, I received a message saying:
"copyFromLocal: 'filename.csv': no such file or directory"
Can anyone tell me what went wrong and what I should do to make it right?
Thanks!

You need to be in your Desktop folder (the folder containing your file) so that the relative path resolves:
cd /root/Desktop
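From there, the original command should find the file; a minimal sketch (filename and target directory taken from the question):
hdfs dfs -copyFromLocal filename.csv /user/root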

There are two methods for placing a file from the local host into Hadoop's HDFS; both are shown below with an absolute local path:
1) copyFromLocal - as you have used
2) put - hadoop dfs -put <local-file-path> <hdfs-path>
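For example, using an absolute path so the local file is found regardless of the current directory (a sketch; the Desktop path and target directory are taken from the question and answer above):
hdfs dfs -copyFromLocal /root/Desktop/filename.csv /user/root
# or equivalently
hdfs dfs -put /root/Desktop/filename.csv /user/root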

Related

Trying to move a csv file from local file system to hadoop file system

I am trying to copy a csv file from my local file system to Hadoop, but I am not able to do it successfully. I am not sure which permissions I need to change. As I understand it, the hdfs superuser does not have access to /home/naya/dataFiles/BlackFriday.csv.
hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp
# Error: put: Permission denied: user=naya, access=WRITE, inode="/tmp":hdfs:supergroup:drwxr-xr-x
sudo -u hdfs hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp
# Error: put: `/home/naya/dataFiles/BlackFriday.csv': No such file or directory
Any help is highly appreciated. I want to do it via the command line utility. I can do it via Cloudera Manager from the Hadoop side, but I want to understand what's happening behind the commands.
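The first error is an HDFS-side permission problem (user naya cannot write to the HDFS /tmp, which is owned by hdfs), while the second is likely local: running as the hdfs user, the shell probably cannot read inside /home/naya. A common workaround, sketched here as an assumption rather than taken from the post, is to stage the file somewhere world-readable locally, upload it as the HDFS superuser, and hand ownership back:
# stage the file on the *local* disk where the hdfs user can read it
cp /home/naya/dataFiles/BlackFriday.csv /tmp/BlackFriday.csv
# upload as the HDFS superuser, then hand ownership back to naya
sudo -u hdfs hdfs dfs -put /tmp/BlackFriday.csv /tmp
sudo -u hdfs hdfs dfs -chown naya:naya /tmp/BlackFriday.csv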

HDFS + create symbolic link between an HDFS folder and a local filesystem folder

I searched on Google but did not find it:
is it possible to create a link between an HDFS folder and a local folder?
Example:
we want to link folder_1 in HDFS to the local folder /home/hdfs_mirror
HDFS folder:
su hdfs
$ hdfs dfs -ls /hdfs_home/folder_1
Linux local folder:
ls /home/hdfs_mirror
I do not think it is possible.
This is because we are talking about two different file systems (HDFS and the local filesystem).
If you want to keep the local data directory in sync with an HDFS directory, you need to make use of a tool such as Apache Flume.
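As a lighter-weight alternative (an assumption, not part of the original answer), a one-way sync can be approximated with a periodically scheduled copy that overwrites changed files:
# run from cron or a loop; paths taken from the question above
hdfs dfs -mkdir -p /hdfs_home/folder_1
hdfs dfs -put -f /home/hdfs_mirror/* /hdfs_home/folder_1/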

How to create an ini file on HDFS on Linux

I am new to Linux. The Cloudera documentation mentions creating a sentry-provider.ini file on Cloudera CDH 5.4 as an HDFS file. I am not finding a good article on how to create an ini file on Linux.
I am trying to configure Apache Sentry on a Cloudera setup to have role-based security on Hive metadata.
How do I create the ini file as an HDFS file on Linux?
The simple way is: create the sentry-provider.ini file locally (in a Linux terminal):
vi sentry-provider.ini
Then press i to enter insert mode and paste in the content specified at this link.
After this, put the file onto the HDFS filesystem using:
hdfs dfs -copyFromLocal sentry-provider.ini etc/sentry/
Remember that etc/sentry/ is a relative path on HDFS, resolved from your user's home directory, which is typically /user/<username>/.
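If the target directory does not already exist on HDFS, it can be created first and the upload verified afterwards (a sketch; the directory name is taken from the answer above):
hdfs dfs -mkdir -p etc/sentry
hdfs dfs -copyFromLocal sentry-provider.ini etc/sentry/
# should list /user/<username>/etc/sentry/sentry-provider.ini
hdfs dfs -ls etc/sentry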

Command to store File on HDFS

Introduction
A Hadoop NameNode and three DataNodes have been installed and are running. The next step is to provide a file to HDFS. The following commands have been executed:
hadoop fs -copyFromLocal ubuntu-14.04-desktop-amd64.iso
copyFromLocal: `.': No such file or directory
and
hadoop fs -put ubuntu-14.04-desktop-amd64.iso
put: `.': No such file or directory
without success.
Question
Which command needs to be issued in order to store a file on HDFS?
If no destination path is provided, Hadoop will try to copy the file into your HDFS home directory. In other words, if you're logged in as utrecht, it will try to copy ubuntu-14.04-desktop-amd64.iso to /user/utrecht.
However, this folder doesn't exist out of the box (you can normally check the DFS contents via a web browser).
To make your command work, you have two choices (see the sketch after this list):
copy it elsewhere (/ works, but putting everything there may lead to complications in the future)
create the directory you want with hdfs dfs -mkdir /yourFolderPath
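A minimal sketch of the second choice (the home directory name is taken from the answer above; creating it may require HDFS superuser rights):
hdfs dfs -mkdir -p /user/utrecht
hadoop fs -put ubuntu-14.04-desktop-amd64.iso /user/utrecht/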

PDI Hadoop file browser shows no file list

I have a single-instance Hadoop cluster configured to run on some IP address (instead of localhost) on CentOS Linux. I was able to execute an example MapReduce job correctly, which tells me that the Hadoop setup appears to be fine.
I have also added a couple of data files to HDFS under the "/data" folder, and they are visible through the "dfs" command:
bin/hadoop dfs -ls /data
I am trying to connect to this HDFS system from PDI/Kettle. In the HDFS file browser, if I enter the HDFS connection parameters incorrectly, e.g. an incorrect port, it says it cannot connect to the HDFS server. If instead I enter all parameters correctly (server, port, user, password) and click 'connect', it does not give an error, meaning it is able to connect. But the file list shows only "/".
It doesn't show the /data folder. What could be going wrong?
I've already tried the following:
tried chmod 777 on the data files using "bin/hadoop dfs -chmod -R 777 /data"
tried using the root and also the hdfs Linux user in the PDI file browser
tried adding the data files in another location
re-formatted HDFS several times and added the data files again
copied the hadoop-core jar file from the Hadoop installation to PDI's extlib
but it still does not list files in the PDI browser. I cannot see anything in the PDI log either... Need quick help... thanks!
-abhay
I got past this issue. On Windows, PDI was not logging anything in the log file. I tried the same thing on Linux, and the log showed that it was missing an Apache library, commons-configuration. I downloaded the latest version, put it under the extlib/pentaho folder, and it worked!
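In shell terms, the fix described above amounts to something like this (the jar version and PDI install path are assumptions; adjust to your setup):
# copy the downloaded commons-configuration jar into PDI's extlib/pentaho folder, then restart PDI/Spoon
cp commons-configuration-1.10.jar /opt/data-integration/extlib/pentaho/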
