My PySpark code runs directly on a Hadoop cluster, but when I open this file it gives me the error IOError: [Errno 2] No such file or directory:
with open("/tmp/CIP_UTILITIES/newjsonfile.json", "w") as fp:
    json.dump("json_output", fp)
For cases like this, when you are working with files, it is better to use the pathlib module.
Can you run the debugger to see where this path is actually pointing?
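For example, a minimal sketch with pathlib (using the path from the question; the payload written here is just a placeholder) that prints where the path resolves, checks that the parent directory exists, and creates it before writing:
import json
from pathlib import Path

target = Path("/tmp/CIP_UTILITIES/newjsonfile.json")

print(target.resolve())        # where the path actually points
print(target.parent.exists())  # does /tmp/CIP_UTILITIES exist?

# create the missing parent directory, then write
target.parent.mkdir(parents=True, exist_ok=True)
with target.open("w") as fp:
    json.dump({"example": "data"}, fp)  # placeholder payload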
I need to automate some boring stuff; one such task is unzipping all the zip files in the current directory.
This is my code:
import os
import zipfile

directory = 'D:\\Python ds and alg by mostafa'
for file in os.listdir(directory):
    if file.endswith('.zip'):
        zipfile.ZipFile(file).extractall(directory)
However, when I run this code I get this error:
Traceback (most recent call last):
File "D:/Python Automation Files/extract_zip_files.py", line 7, in <module>
zipfile.ZipFile(file).extractall(directory)
File "C:\Python310\lib\zipfile.py", line 1247, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '08_Logical_and_physical_Data_Structures.zip'
The problem seems to be that you try to extract the file '08_Logical_and_physical_Data_Structures.zip', which is not located in the same folder as your script (it is in the directory you defined). os.listdir finds it because there you search in the correct directory, but the names it returns are relative, so in the line where you extract the file you don't tell Python that it lives in that directory and Python looks next to the script instead. It should work if you change your code to:
import os
import zipfile

directory = 'D:\\Python ds and alg by mostafa'
for file in os.listdir(directory):
    if file.endswith('.zip'):
        zipfile.ZipFile(directory + '\\' + file).extractall(directory)
Or, to be safe, you could use os.path.join(directory, file), which inserts the correct separator for you.
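For example, a minimal variant of the loop above using os.path.join:
import os
import zipfile

directory = 'D:\\Python ds and alg by mostafa'
for file in os.listdir(directory):
    if file.endswith('.zip'):
        # os.path.join builds the full path with the right separator for the platform
        zipfile.ZipFile(os.path.join(directory, file)).extractall(directory)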
Edit: I just noticed it in your traceback. You try to extract the file:
D:/Python Automation Files/08_Logical_and_physical_Data_Structures.zip
but your code should extract:
D:\\Python ds and alg by mostafa\\08_Logical_and_physical_Data_Structures.zip
I am trying to read a Delta log file on a Databricks Community Edition cluster (Databricks Runtime 7.2).
df = spark.range(100).toDF("id")
df.show()
df.repartition(1).write.mode("append").format("delta").save("/user/delta_test")

with open('/user/delta_test/_delta_log/00000000000000000000.json','r') as f:
    for l in f:
        print(l)
I am getting a file-not-found error:
FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<command-1759925981994211> in <module>
----> 1 with open('/user/delta_test/_delta_log/00000000000000000000.json','r') as f:
2 for l in f:
3 print(l)
FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
I have tried adding /dbfs/ and dbfs:/, but nothing worked out; I still get the same error.
with open('/dbfs/user/delta_test/_delta_log/00000000000000000000.json','r') as f:
    for l in f:
        print(l)
But using dbutils.fs.head I was able to read the file:
dbutils.fs.head("/user/delta_test/_delta_log/00000000000000000000.json")
'{"commitInfo":{"timestamp":1598224183331,"userId":"284520831744638","userName":"","operation":"WRITE","operationParameters":{"mode":"Append","partitionBy":"[]"},"notebook":{"","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputBytes":"1171","numOutputRows":"100"}}}\n{"protocol":{"minReaderVersi...etc
How can we read/cat a DBFS file in Databricks with the Python open method?
By default, this data is on DBFS, and your code needs to understand how to access it. Python's built-in open doesn't know about DBFS - that's why it's failing.
But there is a workaround - DBFS is mounted on the nodes at /dbfs, so you just need to prepend it to your file name: instead of /user/delta_test/_delta_log/00000000000000000000.json, use /dbfs/user/delta_test/_delta_log/00000000000000000000.json
Update: on Community Edition, in DBR 7+, this mount is disabled. The workaround is to use the dbutils.fs.cp command to copy the file from DBFS to a local directory, such as /tmp or /var/tmp, and then read from it:
dbutils.fs.cp("/file_on_dbfs", "file:///tmp/local_file")
Please note that if you don't specify a URI scheme, the path refers to DBFS by default; to refer to a local file you need the file:// prefix (see the docs).
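Putting this together for the delta log file from the question, a sketch could look like this (the local name /tmp/delta_log_0.json is just an example):
# copy the file from DBFS to the driver's local filesystem
dbutils.fs.cp("/user/delta_test/_delta_log/00000000000000000000.json", "file:///tmp/delta_log_0.json")

# read the local copy with plain Python file I/O
with open("/tmp/delta_log_0.json", "r") as f:
    for line in f:
        print(line)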
I am trying to write a text file with some text and load the same text file in Databricks, but I am getting an error.
Code
#write a file to DBFS using Python I/O APIs
with open("/dbfs/FileStore/tables/test_dbfs.txt", 'w') as f:
f.write("Apache Spark is awesome!\n")
f.write("End of example!")
# read the file
with open("/dbfs/tmp/test_dbfs.txt", "r") as f_read:
for line in f_read:
print(line)
Error
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/test_dbfs.txt'
The /dbfs mount doesn't work on Community Edition with DBR >= 7.x - it's a known limitation.
To work around this limitation you need to work with files on the driver node and upload or download them using the dbutils.fs.cp command (docs). So your writing will look like the following:
#write a file to the local filesystem using Python I/O APIs
with open('/tmp/local-path', 'w') as f:
    f.write("Apache Spark is awesome!\n")
    f.write("End of example!")

# upload file to DBFS
dbutils.fs.cp('file:/tmp/local-path', 'dbfs:/FileStore/tables/test_dbfs.txt')
and reading from DBFS will look like the following:
# copy file from DBFS to the local filesystem
dbutils.fs.cp('dbfs:/tmp/test_dbfs.txt', 'file:/tmp/local-path')

# read the file locally
with open("/tmp/local-path", "r") as f_read:
    for line in f_read:
        print(line)
I'm new to Python! I have seen many issues related to this problem but can't find the right way of doing it.
I want to import a picture and change it.
My code is:
from PIL import Image, ImageFilter
import os
root_dir= os.path.dirname(os.path.abspath(r'C:\Users\User\eclipse-workspace\Practice Python CS50 2019\images\Mario.png'))
before = Image.open('Mario.png')
after=before.filter(ImageFilter.BLUR)
after.save("MarioBLUR.png")
The error I'm getting is:
Traceback (most recent call last):
File "C:\Users\User\eclipse-workspace\Practice Python CS50 2019\src\Class 6\blur.py", line 5, in
before = Image.open('Mario.png')
File "C:\Users\User\anaconda3\lib\site-packages\PIL\Image.py", line 2809, in open
fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'Mario.png'
My windows location for this picture is: C:\Users\User\Downloads\Mario.png
My eclipse location is: C:\Users\User\eclipse-workspace\Practice Python\images\Mario.png
How to add this picture to the right directory to make sure I won't have this issue anymore?
You only need the directory path, not the filename, in os.path.dirname, and you then have to use that directory when opening the image. For example:
root_dir = os.path.dirname('C:/Users/User/eclipse-workspace/Practice Python CS50 2019/images/')
before = Image.open(os.path.join(root_dir, 'Mario.png'))
should work fine. (os.path.dirname returns the directory without a trailing slash, so use os.path.join rather than plain string concatenation to put the separator back.)
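Alternatively, if the images folder keeps its position relative to the script (two levels above src\Class 6, as in the path used in the question's code), you could build the path from the script's own location rather than relying on the current working directory. A rough sketch, assuming that layout:
import os
from PIL import Image, ImageFilter

# folder containing this script, e.g. ...\Practice Python CS50 2019\src\Class 6
script_dir = os.path.dirname(os.path.abspath(__file__))
# assumed layout: the images folder sits two levels above the script
image_path = os.path.join(script_dir, '..', '..', 'images', 'Mario.png')

before = Image.open(image_path)
after = before.filter(ImageFilter.BLUR)
after.save("MarioBLUR.png")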
I need to move files from my PC to a network location; however, when I execute the script I get an error. I have tested this on my PC with a different local folder as the destination and it works perfectly.
Here is my code which I got, and modified slightly, from https://thispointer.com/python-how-to-move-files-and-directories/ (giving credit to the author):
import shutil, os, glob, time

def moveAllFilesinDir(srcDir, dstDir):
    # Check if both paths are directories
    if os.path.isdir(srcDir) and os.path.isdir(dstDir):
        # Iterate over all the files in the source directory
        for filePath in glob.glob(srcDir + '\\*'):
            # Move each file to the destination directory
            if os.path.getctime(filePath) != os.path.getmtime(filePath):
                shutil.move(filePath, dstDir)
    else:
        print("srcDir & dstDir should be Directories")

sourceDir = r"C:\Folder A"
destDir = r"\\Server\Folder B"
moveAllFilesinDir(sourceDir, destDir)
Any help will be highly appreciated.
Update
I forgot to mention that I am making use of Remote Desktop to access the server.
Errors I receive:
FileNotFoundError: [WinError 67] The network name cannot be found.
FileNotFoundError: [Errno 2] No such file or directory