Creating a symbolic link in a Databricks FileStore folder

I would like to create a symbolic link, as you would in a Linux environment with the command ln -s.
Unfortunately I can't find anything similar in a Databricks FileStore.
It also seems that an ln operation is not a member of dbutils.
Is there a way to do this, maybe differently?
Thanks a lot for your help.

FileStore is located on DBFS (the Databricks File System), which is backed by either S3 or ADLS, and neither of those has a notion of a symlink. You have a choice: rename the file, copy it, or modify your code to generate the correct file name from an alias.
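For example, here is a minimal sketch of the copy or rename alternatives using dbutils from a notebook (the paths below are placeholders, not real files):

# Run inside a Databricks notebook, where dbutils is available.
source = "dbfs:/FileStore/data/report_2024_01.csv"   # placeholder: the "real" file
alias = "dbfs:/FileStore/data/report_latest.csv"     # placeholder: the name you wanted to symlink

# Option 1: keep a second physical copy under the alias name
dbutils.fs.cp(source, alias)

# Option 2: rename (move) the file to the alias name instead
# dbutils.fs.mv(source, alias)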

Related

Databricks cli - dbfs commands to copy files

I'm working on the deployment of the Purview ADB Lineage Solution Accelerator. In step 3 of the Install OpenLineage on Your Databricks Cluster section, the author asks you to run the following in PowerShell to upload the init script and jar to DBFS using the Databricks CLI.
dbfs mkdirs dbfs:/databricks/openlineage
dbfs cp --overwrite ./openlineage-spark-*.jar dbfs:/databricks/openlineage/
dbfs cp --overwrite ./open-lineage-init-script.sh dbfs:/databricks/openlineage/open-lineage-init-script.sh
Question: Do I correctly understand the above code as follows? If that is not the case, I would like to know exactly what the code is doing before running it.
1. The first line creates a folder openlineage in the root directory of DBFS.
2. It is assumed that you are running the PowerShell command from the location where the .jar and open-lineage-init-script.sh files are located.
3. The second and third lines copy the jar and .sh files from your local directory to dbfs:/databricks/openlineage/ on DBFS.
1. dbfs mkdirs is the equivalent of UNIX mkdir -p, i.e., under the DBFS root it will create a folder named databricks, and inside it another folder named openlineage, and it will not complain if these directories already exist.
2. and 3. Yes. Files/directories not prefixed with dbfs:/ refer to your local filesystem. Note that you can copy from DBFS to local or vice versa, or between two DBFS locations, just not between two local paths.
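If you prefer to stay inside a notebook rather than use the CLI, a rough equivalent with dbutils looks like the sketch below (the local paths and jar file name are assumptions for illustration; file:/ refers to the driver node's local filesystem, so the files would have to be there first):

# Notebook-side sketch of the same three CLI commands.
dbutils.fs.mkdirs("dbfs:/databricks/openlineage")  # like `dbfs mkdirs`: creates parents, no error if it exists
dbutils.fs.cp("file:/tmp/openlineage-spark.jar",   # hypothetical local path and jar name
              "dbfs:/databricks/openlineage/openlineage-spark.jar")
dbutils.fs.cp("file:/tmp/open-lineage-init-script.sh",
              "dbfs:/databricks/openlineage/open-lineage-init-script.sh")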

Pyspark: list files by file type in a directory

I want to list files by file type in a directory. The directory has .csv, .pdf, etc. file types, and I want to list all the .csv files.
I am using the following command
dbutils.fs.ls("/mnt/test-output/*.csv")
I am expecting to get a list of all the csv files in that directory, but I am getting the following error in Databricks:
java.io.FileNotFoundException: No such file or directory: /test-output/*.csv
Try using a shell cell with %sh. You can access DBFS and the mnt directory from there, too.
%sh
ls /dbfs/mnt/*.csv
Should get you a result like
/dbfs/mnt/temp.csv
%fs is a shortcut to dbutils and its access to the file system. dbutils doesn't support all Unix shell functions and syntax, so that's probably the issue you ran into. Notice also that when running the %sh cell we access DBFS through the /dbfs/ mount path.
I think you're mixing up DBFS and the local file system. Where is /mnt/test-output/*.csv?
If you're trying to read from DBFS, then it will work.
Can you try running dbutils.fs.ls("/") to ensure that /mnt exists in DBFS?
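Another option that stays in Python: dbutils.fs.ls does not expand wildcards, so list the directory and filter the results yourself. A small sketch, using the mount path from the question:

# dbutils.fs.ls returns FileInfo objects; filter on the path suffix instead of globbing.
files = dbutils.fs.ls("/mnt/test-output/")
csv_files = [f.path for f in files if f.path.endswith(".csv")]
for p in csv_files:
    print(p)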

libffi-d78936b1.so.6.0.4: cannot open shared object file Error on AWS Lambda function

I am trying to deploy a Python Lambda package with the watson_developer_cloud SDK. Cryptography is one of the many dependencies this package has. I have built the package on a Linux machine. My package includes the hidden file .libffi-d78936b1.so.6.0.4 too, but it is still not accessible to my Lambda function. I keep getting the 'libffi-d78936b1.so.6.0.4: cannot open shared object file' error.
I have built my packages on a Vagrant server, using the instructions from here: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example-deployment-pkg.html#with-s3-example-deployment-pkg-python
Exact error:
Unable to import module 'test_translation': libffi-d78936b1.so.6.0.4: cannot open shared object file: No such file or directory
On a side note, as explained in this solution, I have already created my package using zip -r9 $DIR/lambda_function.zip . instead of *, but it is still not working for me.
Any direction is highly appreciated.
The libffi-d78936b1.so.6.0.4 file is in a hidden folder named .libs_cffi_backend.
So, to include this hidden folder in your Lambda zip, you should do something like:
zip -r ../lambda_function.zip * .[^.]*
That will create a zip file named lambda_function.zip in the parent directory, containing all files in the current directory (the first *) and everything starting with a dot, but not .. itself (the .[^.]* pattern).
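A quick way to confirm the hidden folder actually ended up in the archive is to list its entries with Python's standard zipfile module (a sketch; the zip path matches the command above):

import zipfile

# Print any entries that belong to the hidden cffi folder or the missing .so file.
with zipfile.ZipFile("../lambda_function.zip") as zf:
    for name in zf.namelist():
        if ".libs_cffi_backend" in name or "libffi" in name:
            print("found:", name)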
In a situation like this, I would invest some time setting up a local SAM environment so you can:
1 - Debug your Lambda
2 - Check what is being packaged and the file hierarchy
https://docs.aws.amazon.com/lambda/latest/dg/test-sam-cli.html
Alternatively you can remove this import and instrument your lambda function to print some of the files and directories it "sees".
I strongly recommend giving SAM a try though, since it will make not only this debugging much easier, but also any further testing you need to do down the road. Lambdas are tricky to debug.
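If you do go the instrumentation route, a minimal handler along these lines (the handler name is just an example; use whatever you configured) will show what the function actually sees at runtime:

import os

def lambda_handler(event, context):
    # List what was unpacked next to the handler, including hidden entries.
    here = os.path.dirname(os.path.abspath(__file__))
    entries = sorted(os.listdir(here))
    print("\n".join(entries))
    return {"entries": entries}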
A little late, and I would comment on Frank's answer, but I don't have enough reputation.
I was including the hidden directory .libs_cffi_backend in my deployment package, but for some reason Lambda could not find the libffi-d78936b1.so.6.0.4 file located within it.
After copying this file into the same 'root' level directory as my Lambda handler, it was able to load the dependency and execute.
Also, make sure all the files in the deployment package are readable: chmod -R 644 .

Using linux, how can I create a device file outside of the /dev directory that will give me permission to open and read it?

I am using Linux kernel 3.8 on Ubuntu 13.04. I am curious whether there is a way to do this. Also, I would like to know if there is some generic reason why this doesn't work for me. Thanks.
Device files are created with mknod. Permissions and the owner can be changed with chmod and chown, respectively. If the device file already exists, you may want to create a symbolic link to it instead using ln -s.
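For illustration, the same steps are available from Python's os module; a sketch, assuming you run it as root (or with CAP_MKNOD) and using the null device's major/minor numbers as the example:

import os
import stat

path = "/home/user/mynull"  # example location outside /dev

# Create a character device node; major 1, minor 3 is the null device.
os.mknod(path, mode=stat.S_IFCHR | 0o666, device=os.makedev(1, 3))

# Make sure an unprivileged user can open and read it.
os.chmod(path, 0o666)

# If the device already exists under /dev, a symlink is usually enough instead.
os.symlink("/dev/null", "/home/user/null-link")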

Is there a system variable that stores the location of a user's Dropbox?

At least in the Linux version of Dropbox, the user can choose which folder becomes their Dropbox. Is there a simple way to get this location programmatically?
Maybe this will give you a clue. http://wiki.dropbox.com/TipsAndTricks/TextBasedLinuxInstall
The Dropbox configuration is stored in a sqlite database in your ~/.dropbox folder.
I found a small script which edits the Dropbox folder location:
http://dl.dropboxusercontent.com/u/119154/permalink/dropboxdir.py
http://cmdlinetips.com/2012/05/how-to-change-dropboxs-default-directory-location/
The most straightforward way to force Dropbox to use another folder location is a symbolic link.
Stop the dropbox daemon and simply replace the default folder with a symlink:
> mv ~/Dropbox /other/dropbox/folder
> ln -s /other/dropbox/folder ~/Dropbox
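If you need the location programmatically rather than by moving the folder, newer Dropbox clients also write it to ~/.dropbox/info.json; a sketch, assuming that file exists and uses the "personal"/"path" layout (this may differ between client versions):

import json
import os

# Read the Dropbox folder location from the client's info.json, if present.
info_path = os.path.expanduser("~/.dropbox/info.json")
with open(info_path) as f:
    info = json.load(f)

print(info.get("personal", {}).get("path"))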
