How to get max directory by its name from HDFS? - linux

Below is the directory structure of my HDFS (Hadoop 2.6.0):
/user/cloudera/output_files/file_date_2016-12-27/outputfile.txt
/user/cloudera/output_files/file_date_2016-12-28/outputfile.txt
/user/cloudera/output_files/file_date_2016-12-29/outputfile.txt
..
I would like to get the latest ("max") output directory, by name, from a parent HDFS directory:
OUTPUT_HDFS_DIR=/user/cloudera/output_files
latest_output_dir= hdfs dfs -ls -d $OUTPUT_HDFS_DIR/* | sort -n | tail -1
echo $latest_output_dir    # This line is printing
latest_date_dir=$(basename "$latest_output_dir")
echo $latest_date_dir      # This line is not printing. Getting an empty line.
Output of the above shell script:
[cloudera@client09 scripts]$ bash latest_dir.sh
drwxrwx--- - cloudera cloudera 0 2017-04-19 13:35 /user/cloudera/output_files/file_date_2016-12-29
I am expecting $latest_date_dir to be printed as file_date_2016-12-29, but it is not displayed.
Could someone help me fix this issue?

Change the following line:
latest_output_dir= hdfs dfs -ls -d $OUTPUT_HDFS_DIR/* | sort -n | tail -1
to:
latest_output_dir=`hdfs dfs -ls -d $OUTPUT_HDFS_DIR/* | sort -n | tail -1`
Explanation: your command is executed, but its output is not assigned to the variable; it just goes to stdout, which is why you see the listing printed once. The suggested change uses command substitution to capture the output and assign it to the variable.
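For completeness, here is a minimal sketch of the whole corrected script, using the modern $(...) form of command substitution instead of backticks (directory layout taken from the question; the awk step is an extra safeguard that pulls just the path out of the listing line):

#!/bin/bash
OUTPUT_HDFS_DIR=/user/cloudera/output_files

# No space after '=': capture the last line of the sorted listing
latest_output_dir=$(hdfs dfs -ls -d "$OUTPUT_HDFS_DIR"/* | sort -n | tail -1)
echo "$latest_output_dir"

# The listing line ends with the full path, so $NF (the last field) isolates it
latest_path=$(echo "$latest_output_dir" | awk '{print $NF}')
latest_date_dir=$(basename "$latest_path")
echo "$latest_date_dir"    # expected: file_date_2016-12-29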

Related

Copy specific word from a file to another file using shell script

I am new to shell scripting.
My folder structure is in the format below; every folder contains one file named note.json. I want to extract a specific field, such as "user", from note.json. I tried this for a single file and it works, but it shows unnecessary data, and I also need it in loop form (going into every folder and doing the same). Can anybody help me out?
My folder structure:
drwxr-xr-x - zeppelin hdfs 0 2020-06-01 16:20 /user/zeppelin/notebook/2FBC2M3K2
drwxr-xr-x - zeppelin hdfs 0 2020-05-20 18:01 /user/zeppelin/notebook/2FBDEKUGP
drwxr-xr-x - zeppelin hdfs 0 2020-05-26 20:32 /user/zeppelin/notebook/2FBDXNZRC
drwxr-xr-x - zeppelin hdfs 0 2020-05-26 21:00 /user/zeppelin/notebook/2FBEAGZEE
drwxr-xr-x - zeppelin hdfs 0 2020-05-25 14:18 /user/zeppelin/notebook/2FBGXSHZR
drwxr-xr-x - zeppelin hdfs 0 2020-05-20 14:31 /user/zeppelin/notebook/2FBHCNKJP
drwxr-xr-x - zeppelin hdfs 0 2020-06-02 17:34 /user/zeppelin/notebook/2FBJCZ212
I tried the following command for a single folder:
$ cat note.json | grep "user"
"user": "Ayan.Paul",
"data": "org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [Ayan.Paul] does not have [USE] privilege on [snt_mmedata_upload_prd]\n\tat org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)\n\tat org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)\n\tat org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:324)\n\tat org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:265)\n\tat org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)\n\tat org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)\n\tat org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:718)\n\tat org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:801)\n\tat org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)\n\tat org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)\n\tat org.apache.zeppelin.scheduler.Job.run(Job.java:188)\n\tat org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\n\tat java.lang.Thread.run(Thread.java:745)\nCaused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [Ayan.Paul] does not have [USE] privilege on [snt_mmedata_upload_prd]\n\tat org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)\n\tat org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)\n\tat org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)\n\tat org.apache.hive.service.cli.operation.Operation.run(Operation.java:247)\n\tat org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)\n\tat org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)\n\tat org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)\n\tat org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)\n\tat org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)\n\tat org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)\n\tat org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)\n\tat org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)\n\tat org.apache.thrift.server.TServlet.doPost(TServlet.java:83)\n\tat org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:208)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:707)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\t... 3 more\nCaused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException:Permission denied: user [Ayan.Paul] does not have [USE] privilege on [snt_mmedata_upload_prd]\n\tat org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:483)\n\tat org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1330)\n\tat org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1094)\n\tat org.apache.hadoop.hive.ql.Driver.compile(Driver.java:705)\n\tat org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1863)\n\tat org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1810)\n\tat org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1805)\n\tat org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)\n\
As said above, if the file is JSON-structured, the best and cleanest way is to use jq.
Otherwise, if this line always stays the same, you can try:
cat note.json | grep "\"user\":" | cut -d":" -f2 | sed 's/\"//g' | sed 's/,//g' | sed 's/ //g'
where
grep "\"user\":" - takes the line you wanted
cut -d":" -f2 - takes the second field, using ":" as the separator
sed 's/\"//g' - removes the double quotes
sed 's/,//g' - removes the commas
sed 's/ //g' - removes spaces, just in case (you don't have to use it)
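On the sample line from the question, the pipeline reduces "user": "Ayan.Paul", step by step down to Ayan.Paul.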
If you need a loop around it, let's say:
folder_path='/path/to/myfolder'
files_in_folder=$(ls ${folder_path})
for file in ${files_in_folder}
do
    if [[ ${file} == "note.json" ]]
    then
        cat ${folder_path}/${file} | grep "\"user\":" | cut -d":" -f2 | sed 's/\"//g' | sed 's/,//g' | sed 's/ //g' > ${new_file_path}
    fi
done
(here ${new_file_path} is a placeholder for wherever you want the result written)
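For the record, the jq route mentioned above is a one-liner per file. A minimal sketch, assuming note.json is valid JSON with a top-level "user" key and the files are on the local filesystem (for files in HDFS you would pipe hdfs dfs -cat into jq instead):

for f in /path/to/folders/*/note.json; do
    jq -r '.user' "$f"    # -r prints the raw string without quotes
done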
If you know that the note.json file always has "user" at the beginning of a line, then you can grep for that. It also sounds like you want the value of the "user" JSON field; jq is the right tool for parsing that properly. Below is the "cheap and dirty" way of stripping out the extra characters. (We'll stick with a loop because you're probably doing other things for each file...)
for file in $(find . -name note.json); do
    grep "^.user" "$file" | cut -c 10- | tr -d '",'
done
If you want help with using jq to parse JSON, just ask a different question showing a "note.json" file and your attempt at parsing it!

Linux commands to get Latest file depending on file name

I am new to Linux. I have a folder with many files in it, and I need to get the latest file by its name. Example: I have 3 files, RAT_20190111.txt, RAT_20190212.txt and RAT_20190321.txt. I need a Linux command to move the latest file, here RAT_20190321.txt, to a specific directory.
If the file pattern remains the same, then you can try the command below:
mv $(ls RAT*|sort -r|head -1) /path/to/directory/
As pointed out by @wwn, there is no need to use sort: since the file names are lexicographically sortable, ls already lists them in order, so the command becomes:
mv $(ls RAT*|tail -1) /path/to/directory
The following command works.
ls -p | grep -v '/$' | sort | tail -n 1 | xargs -d '\n' -r -I {} mv -- {} /path/to/directory
The command lists the entries (ls -p appends / to directories, which grep -v '/$' filters out), sorts them, takes the last name, and moves that file to the required directory; -I {} makes xargs place the file name before the destination.
Hope it helps.
Use the command below:
cp `ls | tail -n 1` /data...
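As an aside, here is a sketch that avoids parsing ls output entirely, relying on the fact that a shell glob expands in sorted order (the RAT_*.txt pattern is taken from the question; the negative array index needs bash 4.3 or later):

files=(RAT_*.txt)                          # glob expands in lexicographic order
mv -- "${files[-1]}" /path/to/directory/   # last element = newest name

On older bash versions, "${files[${#files[@]}-1]}" selects the last element instead.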

Linux find file over ssh whose filename is partially known

I have a list of files:
XX_1
XX_2
XX_3
whose numbers keep incrementing every single time I run the program.
I want to find the latest file using a Linux command. I tried:
find . -maxdepth 1 -name "*XX_*" -print
but this gives me all the files with XX_. I just want XX_3 and need to save the output that I get using this command to a variable so that I can copy the file. How do I do that?
I tried:
var=$(ssh pi@192.168.0.101 ls -1 FlightLog* | sort -t_ -k2 -nr | head -1)
ssh pi@192.168.0.101 sftp "$var"
And I got the following error:
/Users/ykathur2/bin/GetFile.sh: line 3: var: command not found
ssh: Could not resolve hostname flightlog_88.dat: Name or service not known
Couldn't read packet: Connection reset by peer
Please help!
How about this
$ ls -1 XX*
XX_1
XX_2
XX_3
$ ls -1 XX* | sort -t_ -k2 -nr | head -1
XX_3
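As for the copying part of the question: the "var: command not found" error suggests the real script had a space around the = (shell assignments must not have one), and sftp was then handed the file name where it expects a host. A sketch using scp instead of sftp (host and file pattern taken from the question; the remote command is quoted so the glob expands on the Pi):

latest=$(ssh pi@192.168.0.101 'ls -1 FlightLog* | sort -t_ -k2 -nr | head -1')
scp "pi@192.168.0.101:$latest" .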

display the most recent file in a directory using linux shell

I need to display the content of the most recent file in a directory. In my case it is a log file that gets generated on each execution, so I need to display the newest one.
The command:
ls -Art | tail -n 1
outputs the right name, i.e. the most recent file in the directory. My goal is to output the content of this file by further piping.
How can I do this?
cat `ls -Art | tail -n1`
or,
ls -Art | tail -n 1 | xargs cat
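Both forms break on file names containing spaces. With GNU xargs you can make the second variant tolerant of spaces by splitting on newlines only:

ls -Art | tail -n 1 | xargs -d '\n' cat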

How to get the latest filename alone in a directory?

I am using
ls -ltr /homedir/mydirectory/work/ |tail -n 1|cut -d ' ' -f 10
But this is a very crude way of getting the desired result, and it is also unreliable.
The output I get on simply executing
ls -ltr /homedir/mydirectory/work/ |tail -n 1
is
-rw-r--r-- 1 user pusers 1764 Apr 1 12:06 firstfile.xml
So here I get the file name.
But if the output of the above command happens to be like
-rw-r--r-- 100 user pusers 1764 Apr 1 12:06 firstfile.xml
the first command fails! Understandably so, as I am cutting the result at the 10th space-separated field, which no longer holds once the link count is wider.
So how can this be refined?
Why do you use the -l flag for ls if you don't need it? Make ls simply output the file names if you don't need more information, instead of trying to "parse" its non-uniform output (and torturing poor text-processing utilities in the process...).
LAST_MODIFIED_FILE=`ls -tr | tail -n 1`
If you really want to achieve this using your method, then use awk instead of cut:
ls -ltr /var/log/ |tail -n 1| awk '{print $9}'
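A slightly sturdier variant prints the last field instead of the 9th, so it keeps working when a wide link count (as in the question) shifts the columns; it still breaks on file names containing spaces, as does any ls parsing:

ls -ltr /homedir/mydirectory/work/ | tail -n 1 | awk '{print $NF}'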
Extending user529758's answer, you can restrict the result to a given file-name pattern; use the command below:
ls -tr Filename* | tail -n 1
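If GNU find is available, you can sidestep ls parsing altogether. This sketch prints the name of the most recently modified regular file in the directory:

find /homedir/mydirectory/work/ -maxdepth 1 -type f -printf '%T@ %f\n' | sort -n | tail -n 1 | cut -d' ' -f2-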
