how to write text to file in HDFS without appending [duplicate]

how to write text to file in HDFS without appending [duplicate] - linux

I am using
hdfs dfs -put myfile mypath
and for some files I get
put: 'myfile': File Exists
does that mean there is a file with the same name or does that mean the same exact file (size, content) is already there?
how can I specify an -overwrite option here?
Thanks!

put: 'myfile': File Exists
Means,the file named "myfile" already exists in hdfs. You cannot have multiple files of the same name in hdfs
You can overwrite it using hadoop fs -put -f /path_to_local /path_to_hdfs

You can overwrite your file in hdfs using -f command.For example
hadoop fs -put -f <localfile> <hdfsDir>
OR
hadoop fs -copyFromLocal -f <localfile> <hdfsDir>
It worked fine for me. However -f command won't work in case of get or copyToLocal command. check this question

A file with the same name exists at the location you're trying to write to.
You can overwrite by specifying the -f flag.

Just updates to this answer, in Hadoop 3.X the command a bit different
hdfs dfs -put -f /local/to/path hdfs://localhost:9870/users/XXX/folder/folder2

Related

How to upload files to hdfs web page from terminal?

I just started hadoop and doing hdfs configuration. I have done all the steps but this last part of uploading the file is not working.
I used this to make my directory, it works
hadoop fs -mkdir /user/syed
But when i do :
hadoop fs -copyFromLocal /home/syed/hadoopMR/input input
OR
hadoop fs -copyFromLocal /home/syed/hadoopMR/input input
OR
hadoop fs -copyFromLocal /home/syed/hadoopMR/input/file.txt user/syed/input
OR
hadoop fs -copyFromLocal /home/syed/hadoopMR/input/file.txt input
OR
hdfs dfs -put /home/syed/hadoopMR/input/file.txt input
It only creates an empty 'input' directory and when i specify 'file.txt' it gives nothing.
This is being shown on the terminal:
copyFromLocal: File /user/syed/input/input/fileA.txt.COPYING could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
Can you please guide me, i have a home task and i have already put so many hours into finding the solution.

The data nodes are missing. When i format the name-node it gets assigned a new ID but the datanode does not get formatted and still has the old id.
stop-dfs.sh // if hdfs running
hdfs namenode -format
rm -rvf /home/syed/hdfs/data/*
// the path of data node is in your home/hadoop/etc/hadoop/hdfs-site.xml
start-dfs.sh
AND it should work !

You are missing / before user. That is why you're unable to see the file. Try the below commands.
To make directory use this command.
hadoop fs -mkdir /user/syed
To make the input directory in hadoop path.
hadoop fs -mkdir /user/syed/input
You can list the local path to check if the file is present.
ls /home/syed/hadoopMR/input/
Then use the following command to put the file to hadoop path.
hdfs dfs -put /home/syed/hadoopMR/input/file.txt /user/syed/input
Hope this answers your question.

How to read csv files stored on adls path without downloading it locally

Command to find file is as below :
hdfs dfs -ls {adls file location path}
command to read listed file

you can read a file from hdfs like below. here is a good tutorial.
hdfs dfs -cat <path>

Is it possible to untar a tar.gz file on HDFS and put it in different HDFS folder without bringing it to local systems

I have employee_mumbai.tar.gz file inside this I have name.json and salary.json.
And the tar.gz is present in HDFS location. Is it possible to untar/Unzip the gzip file and put the json files in HFDS folder without bringing it to a local file system.
N.B:
Please remember it is not a text file and both json file unique information.
Please let me know if it can be achieved to read the both file separately in different data frame directly too in spark.

This worked for me:
hdfs dfs -cat /data/<data.gz> | gzip -d | hdfs dfs -put - /data/

Error while decompress a file from Local Linux to HDFS

This command works fine in Local linux
gzip -d omega_data_path_2016-08-10.csv.gz
I would like to decompress a file with extension .csv.gz to HDFS location.
I tried the below command and i get this error
[cloudera#client08 localinputfiles]$ gzip -d omega_data_path_2016-08-10.csv.gz | hadoop dfs -put /user/cloudera/inputfiles/
gzip: omega_data_path_2016-08-10.csv already exists; do you wish to overwrite (y or n)? DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
put: `/user/cloudera/inputfiles/': No such file or directory
Could someone help me to fix this?

To make gzip write the output on standard output, use -c flag.
So the command would be,
gzip -dc omega_data_path_2016-08-10.csv.gz | hdfs dfs -put - /user/cloudera/omega_data_path_2016-08-10.csv

How to find Hadoop hdfs directory on my system?

How to find Hadoop HDFS directory on my system?
I need this to run following command -
hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>
In this command I don't knon my hdfs-dir.
Not sure if its helpful or not but I ran following command and got this output -
hdfs dfs -ls
-rw-r--r-- 3 popeye hdfs 127162942 2016-04-01 19:47 .
In hdfs-site.xml, I found following entry -
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop/hdfs/data</value>
<final>true</final>
</property>
I tried to run following command but it gives error -
[root#sandbox try]# hdfs dfs -copyFromLocal 1987.csv /hadoop/hdfs/data
copyFromLocal: `/hadoop/hdfs/data': No such file or directory
FYI - I am doing all this on hortonworks sandbox on azure server.

Your approach is wrong or may be understanding is wrong
dfs.datanode.data.dir, is where you want to store your data blocks
If you type hdfs dfs -ls / you will get list of directories in hdfs. Then you can transfer files from local to hdfs using -copyFromLocal or -put to a particular directory or using -mkdir you can create new directory
Refer below link for more information
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html

If you run:
hdfs dfs -copyFromLocal foo.txt bar.txt
then the local file foo.txt will be copied into your own hdfs directory /user/popeye/bar.txt (where popeye is your username.) As a result, the following achieves the same:
hdfs dfs -copyFromLocal foo.txt /user/popeye/bar.txt
Before copying any file into hdfs, just be certain to create the parent directory first. You don't have to put files in this "home" directory, but (1) better to not clutter "/" with all sorts of files, and (2) following this convention will help prevent conflicts with other users.

As per the first answer, I am elaborating it in detailed for Hadoop 1.x -
Suppose, you are running this script on pseudo distribution model, you will probably get one or two list of users(NameNodes) illustrated -
on our fully distribution model, first you have the administrator rights to perform these things and there will be N number of list of NameNodes(users).
So now we move to our point -
First reach to your Hadoop home directory and from there run this script -
bin/hadoop fs -ls /
Result will like this -
drwxr-xr-x - xuiob78126arif supergroup 0 2017-11-30 11:20 /user
so here xuiob78126arif is my name node(master/user) and the NameNode(user) directory is -
/user/xuiob78126arif/
now you can go to your browser and search the address -
http://xuiob78126arif:50070
and from there you can get the Cluster Summary, NameNode Storage, etc.
Note : the script will provide results only in one condition, if at least any file or directory exist in DataNode otherwise you will get -
ls: Cannot access .: No such file or directory.
so, in that case you first put any file by bin/hadoop fs -put <source file full path>
and there after run the bin/hadoop fs -ls / script.
and now I hope, you have get a bit on your issue, thanks.

To locate HDFS directory and make sure you are drill down to directory where hadoop is installed. If bin/hadoop fs -ls / shows no o/p means "/" is hdfs directory. use mkdir to create a new dir for map-reduce job [ eg: hdfs fs -mkdir /user/<local_username> ]. After this put or copyfromLocal commands will work.
I learned this the hardway. I hope someone who is stucked like me finds this helpful

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

how to write text to file in HDFS without appending [duplicate] - linux

I am using hdfs dfs -put myfile mypath and for some files I get put: 'myfile': File Exists does that mean there is a file with the same name or does that mean the same exact file (size, content) is already there? how can I specify an -overwrite option here? Thanks!

put: 'myfile': File Exists Means,the file named "myfile" already exists in hdfs. You cannot have multiple files of the same name in hdfs You can overwrite it using hadoop fs -put -f /path_to_local /path_to_hdfs

You can overwrite your file in hdfs using -f command.For example hadoop fs -put -f <localfile> <hdfsDir> OR hadoop fs -copyFromLocal -f <localfile> <hdfsDir> It worked fine for me. However -f command won't work in case of get or copyToLocal command. check this question

A file with the same name exists at the location you're trying to write to. You can overwrite by specifying the -f flag.

Just updates to this answer, in Hadoop 3.X the command a bit different hdfs dfs -put -f /local/to/path hdfs://localhost:9870/users/XXX/folder/folder2

Related

How to upload files to hdfs web page from terminal?

How to read csv files stored on adls path without downloading it locally

Is it possible to untar a tar.gz file on HDFS and put it in different HDFS folder without bringing it to local systems

Error while decompress a file from Local Linux to HDFS

How to find Hadoop hdfs directory on my system?

Categories

Resources