Cannot find '/dbfs/databricks-datasets' in my notebook [duplicate] - databricks

I'm trying to read a Delta log file on a Databricks Community Edition cluster (DBR 7.2).
df = spark.range(100).toDF("id")
df.show()
df.repartition(1).write.mode("append").format("delta").save("/user/delta_test")

with open('/user/delta_test/_delta_log/00000000000000000000.json', 'r') as f:
    for l in f:
        print(l)
Getting file not found error:
FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<command-1759925981994211> in <module>
----> 1 with open('/user/delta_test/_delta_log/00000000000000000000.json','r') as f:
2 for l in f:
3 print(l)
FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
I have tried adding /dbfs/ and dbfs:/ prefixes, but nothing worked; I'm still getting the same error.
with open('/dbfs/user/delta_test/_delta_log/00000000000000000000.json', 'r') as f:
    for l in f:
        print(l)
But using dbutils.fs.head I was able to read the file.
dbutils.fs.head("/user/delta_test/_delta_log/00000000000000000000.json")
'{"commitInfo":{"timestamp":1598224183331,"userId":"284520831744638","userName":"","operation":"WRITE","operationParameters":{"mode":"Append","partitionBy":"[]"},"notebook":{"","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputBytes":"1171","numOutputRows":"100"}}}\n{"protocol":{"minReaderVersi...etc
How can we read/cat a DBFS file in Databricks with Python's open method?

By default, this data is on DBFS, and your code needs to understand how to access it. Python doesn't know about it - that's why it's failing.
But there is a workaround - DBFS is mounted to the nodes at /dbfs, so you just need to prepend it to your file name: instead of /user/delta_test/_delta_log/00000000000000000000.json, use /dbfs/user/delta_test/_delta_log/00000000000000000000.json.
Update: on Community Edition, in DBR 7+, this mount is disabled. The workaround is to use the dbutils.fs.cp command to copy the file from DBFS to a local directory, like /tmp or /var/tmp, and then read from it:
dbutils.fs.cp("/file_on_dbfs", "file:///tmp/local_file")
Please note that if you don't specify a URI scheme, the path refers to DBFS by default; to refer to a local file you need to use the file:// prefix (see docs).
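Putting it together, a minimal sketch of the workaround for the Delta log file from the question (the local /tmp path is just an example):
# Copy the Delta log file from DBFS to the driver's local filesystem,
# then read it with plain Python I/O.
dbutils.fs.cp(
    "/user/delta_test/_delta_log/00000000000000000000.json",
    "file:///tmp/00000000000000000000.json",
)

with open("/tmp/00000000000000000000.json", "r") as f:
    for line in f:
        print(line)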

Related

Issue while trying to read a text file in Databricks using Local File APIs rather than Spark API

I'm trying to read a small txt file which is added as a table to the default db on Databricks. While trying to read the file via Local File API, I get a FileNotFoundError, but I'm able to read the same file as Spark RDD using SparkContext.
Please find the code below:
with open("/FileStore/tables/boringwords.txt", "r") as f_read:
for line in f_read:
print(line)
This gives me the error:
FileNotFoundError Traceback (most recent call last)
<command-2618449717515592> in <module>
----> 1 with open("dbfs:/FileStore/tables/boringwords.txt", "r") as f_read:
2 for line in f_read:
3 print(line)
FileNotFoundError: [Errno 2] No such file or directory: 'dbfs:/FileStore/tables/boringwords.txt'
Whereas I have no problem reading the file using SparkContext:
boring_words = sc.textFile("/FileStore/tables/boringwords.txt")
set(i.strip() for i in boring_words.collect())
And as expected, I get the result for the above block of code:
Out[4]: {'mad',
'mobile',
'filename',
'circle',
'cookies',
'immigration',
'anticipated',
'editorials',
'review'}
I was also referring to the DBFS documentation here to understand the Local File API's limitations, but found no lead on the issue.
Any help would be greatly appreciated. Thanks!
The problem is that you're using the open function, which works only with local files and doesn't know anything about DBFS or other file systems. To get this working, you need to use the DBFS local file API and prepend the /dbfs prefix to the file path: /dbfs/FileStore/...:
with open("/dbfs/FileStore/tables/boringwords.txt", "r") as f_read:
for line in f_read:
print(line)
Alternatively, you can simply use Spark's built-in CSV reader:
df = spark.read.csv("dbfs:/FileStore/tables/boringwords.txt")
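If you go the DataFrame route, here is a small follow-up sketch (assuming the default _c0 column name that spark.read.csv assigns when there is no header) to get the same set of words as the RDD version:
# Collect the single column into a set of stripped words,
# mirroring the SparkContext.textFile() approach from the question.
df = spark.read.csv("dbfs:/FileStore/tables/boringwords.txt")
boring_words = {row._c0.strip() for row in df.collect()}
print(boring_words)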
Alternatively, we can use dbutils to list the files, for example:
files = dbutils.fs.ls('/FileStore/tables/')
for fi in files:
    print(fi.path)

How to import a text file in Databricks

I am trying to write a text file with some text and load the same text file in Databricks, but I am getting an error.
Code
# write a file to DBFS using Python I/O APIs
with open("/dbfs/FileStore/tables/test_dbfs.txt", 'w') as f:
    f.write("Apache Spark is awesome!\n")
    f.write("End of example!")

# read the file
with open("/dbfs/tmp/test_dbfs.txt", "r") as f_read:
    for line in f_read:
        print(line)
Error
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/test_dbfs.txt'
The /dbfs mount doesn't work on Community Edition with DBR >= 7.x - it's a known limitation.
To work around this limitation, you need to work with files on the driver node and upload or download them using the dbutils.fs.cp command (docs). So your write will look as follows:
# write a file to the local filesystem using Python I/O APIs
with open('/tmp/local-path', 'w') as f:
    f.write("Apache Spark is awesome!\n")
    f.write("End of example!")

# upload the file to DBFS
dbutils.fs.cp('file:/tmp/local-path', 'dbfs:/FileStore/tables/test_dbfs.txt')
and reading from DBFS will look as follows:
# copy the file from DBFS to the local filesystem
dbutils.fs.cp('dbfs:/tmp/test_dbfs.txt', 'file:/tmp/local-path')

# read the file locally
with open("/tmp/local-path", "r") as f_read:
    for line in f_read:
        print(line)

Python: FileNotFoundError: [Errno 2] No such file or directory - How to add file to the right directory

I'm new to Python! I've seen many issues related to this problem but can't find the right way of doing it.
I want to import a picture and change it.
My code is:
from PIL import Image, ImageFilter
import os
root_dir= os.path.dirname(os.path.abspath(r'C:\Users\User\eclipse-workspace\Practice Python CS50 2019\images\Mario.png'))
before = Image.open('Mario.png')
after=before.filter(ImageFilter.BLUR)
after.save("MarioBLUR.png")
The error I'm getting is:
Traceback (most recent call last):
File "C:\Users\User\eclipse-workspace\Practice Python CS50 2019\src\Class 6\blur.py", line 5, in
before = Image.open('Mario.png')
File "C:\Users\User\anaconda3\lib\site-packages\PIL\Image.py", line 2809, in open
fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'Mario.png'
My windows location for this picture is: C:\Users\User\Downloads\Mario.png
My eclipse location is: C:\Users\User\eclipse-workspace\Practice Python\images\Mario.png
How to add this picture to the right directory to make sure I won't have this issue anymore?
You only need the directory path and not the filename in os.path.dirname, for example:
root_dir = os.path.dirname('C:/Users/User/eclipse-workspace/Practice Python CS50 2019/images/')
before = Image.open(os.path.join(root_dir, 'Mario.png'))
should work fine
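For reference, a quick illustration (with hypothetical paths) of what os.path.dirname returns in the two cases:
import os

# Passing a full file path strips the filename:
os.path.dirname('C:/Users/User/images/Mario.png')  # 'C:/Users/User/images'

# Passing a directory path with a trailing slash strips only the slash:
os.path.dirname('C:/Users/User/images/')            # 'C:/Users/User/images'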

PySpark: IOError: [Errno 2] No such file or directory

My PySpark code runs directly on a Hadoop cluster. But when I open this file, it gives me this error: IOError: [Errno 2] No such file or directory:
with open("/tmp/CIP_UTILITIES/newjsonfile.json", "w") as fp:
json.dump("json_output", fp)
For cases like this, when you are working with files, it is better to use the pathlib module.
Can you run a debugger to see where this path is actually pointing?
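A minimal sketch of that suggestion, assuming the error comes from the /tmp/CIP_UTILITIES directory not existing yet (a common cause when opening a file for writing):
import json
from pathlib import Path

out = Path("/tmp/CIP_UTILITIES/newjsonfile.json")
out.parent.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing

with out.open("w") as fp:
    json.dump("json_output", fp)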

Python: Cannot Create Zip File

I simply copy-and-pasted this code from a Python tutorial website, but the code won't work. What's missing? I am using version 3.4.3. Thank you.
import zipfile

# Create zip file
print("Creating zip archive")
zf = zipfile.ZipFile("python_zip_file.zip", mode="w")
try:
    # Add file to our zip
    zf.write("zippy2.py")
finally:
    print("closing")
    zf.close()
Traceback (most recent call last):
File "/Users/Cindy/Documents/Python/Zip.py", line 9, in <module>
zf.write("zippy2.py")
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 1326, in write
st = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'zippy2.py'
# Add file to our zip
zf.write("zippy2.py")
You should have a file named zippy2.py in the folder.
Since you just copied the code, you might not have the file that is mentioned in it. Create a file named zippy2.py in the same folder and check.
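For instance, a small sketch (with hypothetical file contents) that creates zippy2.py first, so the write call has something to add:
import zipfile

# Create the file the tutorial code expects to exist
with open("zippy2.py", "w") as f:
    f.write("print('hello from zippy2')\n")

# Now the original zipping code succeeds
with zipfile.ZipFile("python_zip_file.zip", mode="w") as zf:
    zf.write("zippy2.py")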
Try learning with this:
#!/usr/bin/env python
import zipfile

print("Creating zip archive")
zip = zipfile.ZipFile('Archive.zip', 'w')  # Archive.zip is the name of the zip file
zip.write('file.txt')   # file.txt should be in the current working directory
zip.write('file1.txt')  # file1.txt too
zip.close()
