Databricks notebook: use magic commands for several lines

I may be missing the obvious, but:
I am using the Databricks community edition notebook.
I am trying to use several %fs lines within the same cell
Is this possible... ?
I tried this, as cell content:
%fs rm /FileStore/tables/file.txt
%fs ls /FileStore/tables/
and also this:
%%fs
rm /FileStore/tables/file.txt
ls /FileStore/tables/
...and just in case...
%fs
rm /FileStore/tables/file.txt
ls /FileStore/tables/
Having the rm and ls commands in different cells works, but is there a way to have them both in the same cell...?

You can't do that with %fs: it treats everything after the first command as arguments to that command. The same goes for the other magic commands.
If you want to execute multiple commands in one cell, use the dbutils.fs functions from Python or Scala:
dbutils.fs.rm("/FileStore/tables/file.txt")
dbutils.fs.ls("/FileStore/tables/")

Related

Databricks - How to remove files , directories based on regular expression

I had a lot of files in Databricks and wanted to clean them up. Some of the files have a prefix such as "tweets1".
How can I delete files by prefix, with something like a Linux glob pattern? I tried the following command, and it didn't work.
dbutils.fs.rm("/tweets1*",recurse=True)
You can go with the classic bash.
Inside your cell type:
%sh
rm -rf path/to/your/folder/tweets1*
When I have to perform some complex operation which I already know how to do with bash I use it directly inside the cell.
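The reason the bash version works is that the shell expands tweets1* into the matching names before rm ever runs, whereas the question reports that passing the same pattern to dbutils.fs.rm did nothing. A minimal sketch of the pattern on a throwaway directory follows; the scratch directory and file names are made up for illustration. On Databricks, %sh runs on the driver's local file system, and DBFS is typically exposed there under /dbfs, which is an assumption worth checking on your cluster.

```shell
# Sketch only: scratch directory and file names are hypothetical.
workdir=$(mktemp -d)
touch "$workdir/tweets1_a.json" "$workdir/tweets1_b.json" "$workdir/other.json"
mkdir "$workdir/tweets1_dir"

# The shell expands tweets1* into the matching names before rm runs.
# Quote the directory part so spaces survive; leave the * unquoted.
rm -rf -- "$workdir"/tweets1*

# Only the file without the prefix should be left.
remaining=$(ls "$workdir")
echo "left after cleanup: $remaining"
rm -rf -- "$workdir"
```

Running ls with the same pattern first is a cheap way to see what rm -rf is about to delete.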

How to create a shell script

I am trying to create a shell script to remove certain files from a directory. How would I be able to achieve this?
Can I write the standard commands in a script as follows:
#!/bin/sh
rm -f /directory/of/file/file1.txt
rm -f /directory/of/file/file2.txt
rm -f /directory/of/file/file3.txt
rm -f /directory/of/file/file4.txt
Or is there a specific way to delete files in a shell script.
This is my first question here, so please bear with me as I do not know all the rules.
Thanks in advance :)
Edit:
Thanks for all the answers in a short matter of time, I really appreciate it.
Forgot to mention this will be executed by root cron (crontab -e) every Tuesday and Friday at 5 PM.
Do I still need to chmod +x the file if root is executing the file?
Your question can be split into a few points:
You can use those commands to delete the specific files (if you have the permissions).
Make sure the shell script file that runs the rm commands is executable: chmod +x file_name.sh
To delete a folder's contents but not the folder itself, the command should be: rm -r /path/to/dir/*
Yes, you can. However, if you don't have permission to delete the files, the rm statement may fail with an error. Handle that error and you are good to go.
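One way to "handle that error" is to drop -f (which silences missing files), loop over the names, and report each failure. This is a rough sketch rather than the only way to do it; the scratch directory stands in for /directory/of/file and the file names are the ones from the question.

```shell
#!/bin/sh
# Sketch only: the scratch directory stands in for /directory/of/file.
dir=$(mktemp -d)
touch "$dir/file1.txt" "$dir/file2.txt"

failed=0
for name in file1.txt file2.txt file3.txt; do
    target="$dir/$name"
    # rm -f would hide a missing file, so check first and report it.
    if [ ! -e "$target" ]; then
        echo "skip: $target does not exist" >&2
        continue
    fi
    if ! rm -- "$target"; then
        echo "error: could not remove $target" >&2
        failed=1
    fi
done

# rmdir only succeeds on an empty directory, so it doubles as a check.
if rmdir "$dir"; then emptied=yes; else emptied=no; fi
echo "failed=$failed emptied=$emptied"
```

On the cron part of the question: a crontab entry such as 0 17 * * 2,5 /path/to/cleanup.sh (path hypothetical) executes the file directly, so it needs the execute bit even when root runs it. An entry of the form 0 17 * * 2,5 /bin/sh /path/to/cleanup.sh passes the file to sh as an argument and works without chmod +x.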

Dealing with spaces in directory names in Bash

Disclaimer: I am very new to Bash scripting (and Linux in general), so forgive me for a stupid question.
A friend of mine gave me a script which makes a backup copy of certain files onto Dropbox. Here's the code in full:
#!/bin/sh
DATE=`date +%Y-%m-%d`
tarname='backup-'$DATE'.tar.gz'
cd ~/
directoriesToBack='.bashrc Desktop/School/ Desktop/Research\ Project'
tar -X ~/Desktop/My\ Programs/scripts/crons/exclude.txt -zcvf $tarname $directoriesToBack
mv $tarname ~/Dropbox
The variable directoriesToBack obviously contains the directories to be copied. Exclude.txt is a text file of files which are not to be backed up.
If I try to run this script, I get an error because of Desktop/Research Project: my computer looks for the directory Desktop/Research instead. I've tried to use double quotes instead of single quotes, and to replace \ with an ordinary space, but these tries didn't work. Does anyone know how I can make a backup of a directory with spaces in its name?
Don't try to do this with strings. It will not work and it will cause pain. See "I'm trying to put a command in a variable, but the complex cases always fail!" for various details and discussion.
Use an array instead.
#!/bin/bash
DATE=$(date +%Y-%m-%d)
tarname=backup-$DATE.tar.gz
cd ~/
directoriesToBack=(.bashrc Desktop/School "Desktop/Research Project")
tar -X ~/Desktop/My\ Programs/scripts/crons/exclude.txt -zcvf "$tarname" "${directoriesToBack[@]}"
I also fixed the quoting of variables/etc. and used $() instead of backticks for the date command execution (as $() can be nested and generally has better semantics and behaviour).
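Arrays need bash, which is why the shebang changed from /bin/sh. If the script has to stay plain sh, the positional parameters can play the same role, since "$@" expands to one argument per entry just as "${array[@]}" does. A minimal sketch under that assumption follows; the scratch directory stands in for the home directory, and the -X exclude file is left out to keep it short.

```shell
# Sketch only: a scratch directory stands in for the home directory.
home=$(mktemp -d)
mkdir -p "$home/Desktop/School" "$home/Desktop/Research Project"
touch "$home/.bashrc"

# Each quoted word becomes exactly one positional parameter.
set -- .bashrc Desktop/School "Desktop/Research Project"
nargs=$#

# "$@" expands to one argument per parameter, spaces preserved.
tarname="backup-$(date +%Y-%m-%d).tar.gz"
tar -C "$home" -zcf "$home/$tarname" "$@"

# List the archive to confirm the directory with a space made it in.
listing=$(tar -tzf "$home/$tarname")
echo "$nargs paths archived"
rm -rf -- "$home"
```

The catch is that set -- overwrites the script's own arguments, so this only fits scripts that do not need them afterwards.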
Please run the script and show the EXACT error message. I suspect that what is going wrong is not what you think it is, and that the variable directoriesToBack does not contain what you think it does.
cd Desktop/"Research Project" (with quotation marks)
You'll find that many languages use quotes to keep text containing spaces together as a single value.

Assign directory to variable in a source file

I am building a source file with some aliases to executable files (these are working just fine) and assigning directories to variables in order to get to the directory quicker, with less typing. For example, if I source example.source:
#!/usr/bin/bash
mydir="/path/to/some/dir"
I can get to /path/to/some/dir with
cd $mydir
However, I am not able to use tab completion to navigate through sub-directories as I would when typing the complete path. That is, if I press the tab key to complete the variable, I get cd $mydir but not cd $mydir/ (I have to delete the trailing space and manually type the slash / to see the next sub-directories). I hope the question is understandable. Is there any workaround for this?
EDIT: the linux distribution I'm using is Slackware Linux 3.2.31.c x86_64 GenuineIntel GNU/Linux
EDIT2: GNU bash, version 4.2.37(2)-release
Apparently this feature is starting to be implemented in bash 4.3, release 26-Feb-2014 09:25.
Reading the NEWS file in bash 4.3 I found this:
i. The word completion code checks whether or not a filename containing a shell variable expands to a directory name and appends `/' to the word as appropriate. The same code expands shell variables in command names when performing command completion.
Unfortunately I cannot do a de novo installation of bash (because I'm working on a server) but I hope this can help others.
If I understand your question, then I believe it can be solved by putting this at the top of your example.source. This will list the directory contents every time you cd.
#!/usr/bin/bash
# Make cd change directories and then list the contents
function cd() {
    # "$@" keeps each argument intact, including paths with spaces
    builtin cd "$@" || return
    ls
}
mydir="/path/to/some/dir"
cd $mydir
My other suggestion is to try to put cd within your alias. Something like this:
mydir="cd /path/to/some/dir"
$mydir

How to directly overwrite with 'unexpand' (spaces-to-tabs conversion)?

I'm trying to use something along the lines of
unexpand -t 4 *.php
but am unsure how to write this command to do what I want.
Weirdly,
unexpand -t 4 file.php > file.php
gives me an empty file. (i.e. overwriting file.php with nothing)
I can specify multiple files okay, but don't know how to then overwrite each file.
I could use my IDE, but there are ~67000 runs of four spaces to be replaced across 200 files, and this will take a while.
I expect that the answers to my question(s) will be standard unix fare, but I'm still learning...
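You can very seldom use output redirection to replace the input: the shell truncates the output file before the command starts, so the command ends up reading an empty file. Replacing in place works only with commands that support it internally (since they then do the basic steps themselves). From the shell level, it's far better to work in two steps, like so: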
Do the operation on foo, creating foo.tmp
Move (rename) foo.tmp to foo, overwriting the original
This will be fast. It will require a bit more disk space, but if you do both steps before continuing to the next file, you will only need as much extra space as the largest single file, which should not be a problem.
Sketch script:
for a in *.php
do
    unexpand -t 4 "$a" > "$a-notab"
    mv "$a-notab" "$a"
done
You could do better (error-checking, and so on), but that is the basic outline.
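For what "doing better" might look like, here is a minimal sketch with the error checking added: the original is replaced only if unexpand succeeded, and a failed run leaves it untouched. The scratch directory, the sample file, and the .tmp suffix are placeholders for illustration.

```shell
# Sketch only: scratch directory and sample file are hypothetical.
workdir=$(mktemp -d)
printf '    x=1\n        y=2\n' > "$workdir/sample file.php"

for f in "$workdir"/*.php; do
    [ -e "$f" ] || continue        # the glob matched nothing
    tmp="$f.tmp"
    # Replace the original only if unexpand succeeded.
    if unexpand -t 4 "$f" > "$tmp"; then
        mv -- "$tmp" "$f"
    else
        echo "unexpand failed for $f; original left untouched" >&2
        rm -f -- "$tmp"
    fi
done

# Leading runs of four spaces should now be tabs.
converted=$(cat "$workdir/sample file.php")
rm -rf -- "$workdir"
```

Two things to be aware of. With GNU coreutils, -t also turns on -a, so aligned runs of spaces inside a line can become tabs as well; --first-only limits the conversion to leading whitespace. And mv gives the file the permissions and ownership of the temporary file; if that matters, writing the content back with cat "$tmp" > "$f" keeps the original file's attributes.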
Here's the command I used:
for p in $(find . -iname "*.js")
do
unexpand -t 4 $(dirname $p)/"$(basename $p)" > $(dirname $p)/"$(basename $p)-tab"
mv $(dirname $p)/"$(basename $p)-tab" $(dirname $p)/"$(basename $p)"
done
This version changes all files within the directory hierarchy rooted at the current working directory.
In my case, I only wanted to make this change to .js files; you can omit the -iname clause from find if you wish, or use different args to cast your net differently.
My version wraps filenames in quotes, but it doesn't use quotes around 'interesting' directory names that appear in the paths of matching files.
To get it all on one line, add a semicolon after lines 1, 3, & 4.
This is potentially dangerous, so make a backup or use git before running the command. If you're using git, you can verify that only whitespace was changed with git diff -w.
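If directory names with spaces are a concern, one option is to let find pass the paths straight to a small inline script, so the shell never splits them into words. This is a sketch of that alternative, not the command used above; the scratch tree and file names are invented for the demonstration.

```shell
# Sketch only: scratch tree with a space in a directory name.
root=$(mktemp -d)
mkdir -p "$root/my lib/sub"
printf '    x=1;\n' > "$root/my lib/sub/app.js"
printf '    keep\n' > "$root/notes.txt"

# find hands each path to the inner shell as a separate argument,
# so spaces in file or directory names never cause word splitting.
find "$root" -type f -iname '*.js' -exec sh -c '
    for p in "$@"; do
        unexpand -t 4 "$p" > "$p-tab" && mv -- "$p-tab" "$p"
    done
' sh {} +

# The .js file is converted; the .txt file is left alone.
js=$(cat "$root/my lib/sub/app.js")
txt=$(cat "$root/notes.txt")
rm -rf -- "$root"
```

The && means mv only runs when unexpand succeeded, so a failed conversion does not overwrite the original. The same backup-or-git advice applies.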
