Writing/appending a text file from Databricks to Azure ADLS Gen1 - python-3.x

I want to write a kind of log file back to Azure ADLS Gen1.
I can write (but not append) using
dbutils.fs.put(filename, "random text")
but I can't append to it using
with open("/dbfs/mnt/filename.txt", "a") as f:
    f.write("random text")
It gives me this error:
1 with open("/dbfs/mnt/filename.txt", "a") as f:
----> 2 f.write("append values")
OSError: [Errno 95] Operation not supported
Alternatively, I tried logging.basicConfig(filename='dbfs:/mnt/filename.txt', filemode='w'),
but it doesn't look like it writes to that path.
Can anyone help, please?

Append mode ('a'): opens the file for writing. The file is created if it does not exist. The handle is positioned at the end of the file, so the data being written is inserted after the existing data.
file = open("myfile.txt", "a")  # append mode
file.write("Today \n")

You can't do that with a file on a DBFS mount; see https://kb.databricks.com/en_US/dbfs/errno95-operation-not-supported.
You may need to work out logic that reads the existing file from the data lake in Python and writes it back with the appended content.
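A minimal sketch of that read-then-rewrite logic, reusing the dbutils.fs.put call from the question (the /mnt/filename.txt path is the one from the question; treat this as a workaround, and note that it rewrites the whole file on every call, so it is only practical for small log files):
# Hedged workaround: read the current contents through the /dbfs FUSE mount
# (reads are supported even where appends are not), then rewrite the file.
new_line = "random text\n"
existing = ""
try:
    with open("/dbfs/mnt/filename.txt", "r") as f:
        existing = f.read()
except FileNotFoundError:
    pass  # first write: the file does not exist yet

# Overwrite the whole file with the old contents plus the appended line.
dbutils.fs.put("/mnt/filename.txt", existing + new_line, overwrite=True)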

Related

How to open index.html file in databricks or browser?

I am trying to open an index.html file through Databricks. Can someone please let me know how to deal with it? I am trying to use GX (Great Expectations) with Databricks, and currently Databricks stores this file here: dbfs:/great_expectations/uncommitted/data_docs/local_site/index.html. I want to send the index.html file to stakeholders.
I suspect that you need to copy the whole folder, as there should be images, etc. The simplest way to do that is to use the Databricks CLI fs cp command to access DBFS and copy the files to local storage, like this:
databricks fs cp -r 'dbfs:/.....' local_name
To open the file directly in the notebook, you can use something like this (note that dbfs:/ should be replaced with /dbfs/):
with open("/dbfs/...", "r") as f:
    data = "".join([l for l in f])
displayHTML(data)
but this will break links to images. Alternatively you can follow this approach to display Data docs inside the notebook.
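If the copy has to happen from inside a notebook rather than through the CLI, a rough equivalent of the fs cp -r step is dbutils.fs.cp with recurse=True (a sketch assuming the data_docs path from the question; the broken-image-links caveat above still applies when rendering with displayHTML):
# Copy the whole local_site folder (index.html plus images, css, etc.)
# from DBFS to driver-local storage.
dbutils.fs.cp(
    "dbfs:/great_expectations/uncommitted/data_docs/local_site/",
    "file:/tmp/data_docs/local_site/",
    recurse=True,
)

# Open the index page through the local path, as in the snippet above.
with open("/tmp/data_docs/local_site/index.html", "r") as f:
    displayHTML(f.read())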

How to give dynamic expression path for file location (Wildcard file paths) in Azure data factory?

Every day I get a single Excel file in my data lake. My container name in the data lake is 'odoo'. The Excel files get stored in a folder called 'odoo', and below is the name of the file:
report_2022-01-20.xlsx
I am using a dataflow, and I want to pick up each day's file using a wildcard path. Below is the dynamic expression I am trying, but with no success:
/odoo/@concat('report_', string(formatDateTime(utcNow(), 'yyyy-MM-dd')), '.xlsx')
Can anyone advise me on how to write the correct expression? I am a newbie to ADF.
Your expression looks fine. In your dataset, browse the container/folder and add the expression in the file path to get the file dynamically.
Source file:
@concat('report_', string(formatDateTime(utcNow(), 'yyyy-MM-dd')), '.xlsx')
Can you try: @concat('/odoo/report_', formatDateTime(utcNow(), 'yyyy-MM-dd'), '.xlsx')
The string() might be causing the issue.
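As a side check outside ADF, the same naming pattern can be reproduced in Python to confirm what the expression should resolve to (this is only an illustration of the expected file name, not part of the ADF expression):
# 'yyyy-MM-dd' in formatDateTime(utcNow(), ...) corresponds to strftime('%Y-%m-%d'),
# so the expression should resolve to names like report_2022-01-20.xlsx.
from datetime import datetime, timezone

expected_name = "report_" + datetime.now(timezone.utc).strftime("%Y-%m-%d") + ".xlsx"
print("/odoo/" + expected_name)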

file transfer from DBFS to Azure Blob Storage

I need to transfer the files in the below dbfs file system path:
%fs ls /FileStore/tables/26AS_report/customer_monthly_running_report/parts/
To the below Azure Blob
dbutils.fs.ls("wasbs://"+blob.storage_account_container+"#"
+ blob.storage_account_name+".blob.core.windows.net/")
What series of steps should I follow? Please suggest.
The simplest way would be to load the data into a dataframe and then to write that dataframe into the target.
df = spark.read.format(format).load("dbfs:/FileStore/tables/26AS_report/customer_monthly_running_report/parts/*")
df.write.format(format).save("wasbs://" + blob.storage_account_container + "@" + blob.storage_account_name + ".blob.core.windows.net/")
You will have to replace "format" with the source file format and the format you want in the target folder.
Keep in mind that if you do not want to do any transformations to the data but just move it, it will most likely be more efficient not to use PySpark but to use the AzCopy command-line tool instead. You can also run that in Databricks with the %sh magic command if needed.
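If the files only need to be moved unchanged, another option that stays inside the notebook is dbutils.fs.cp with recurse=True. A sketch, assuming the wasbs container is already reachable through the blob.* variables from the question (the 'parts/' target folder name is an assumption):
# Plain copy, no Spark read/write and no format change.
source = "dbfs:/FileStore/tables/26AS_report/customer_monthly_running_report/parts/"
target = ("wasbs://" + blob.storage_account_container + "@"
          + blob.storage_account_name + ".blob.core.windows.net/parts/")

dbutils.fs.cp(source, target, recurse=True)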

Python on Chrome OS through Linux: Cannot Export DataFrame [duplicate]

I am trying to write a DataFrame to a .csv file:
now = datetime.datetime.now()
date = now.strftime("%Y-%m-%d")
enrichedDataDir = "/export/market_data/temp"
enrichedDataFile = enrichedDataDir + "/marketData_optam_" + date + ".csv"
dbutils.fs.ls(enrichedDataDir)
df.to_csv(enrichedDataFile, sep='; ')
This throws the following error:
IOError: [Errno 2] No such file or directory:
'/export/market_data/temp/marketData_optam_2018-10-12.csv'
But when I do
dbutils.fs.ls(enrichedDataDir)
Out[72]: []
There is no error! When I go one directory level higher:
enrichedDataDir = "/export/market_data"
dbutils.fs.ls(enrichedDataDir)
Out[74]:
[FileInfo(path=u'dbfs:/export/market_data/temp/', name=u'temp/', size=0L)
FileInfo(path=u'dbfs:/export/market_data/update/', name=u'update/', size=0L)]
This works, too. That means I really do have all the folders I want to access. But I don't know why the .to_csv call throws the error. I have also checked the permissions, which are fine!
The main problem was that I am using Microsoft Azure Data Lake Store to store those .csv files, and for whatever reason it is not possible to write to Azure Data Lake Store through df.to_csv.
Because I was trying to use df.to_csv, I was using a Pandas DataFrame instead of a Spark DataFrame.
I changed to
from pyspark.sql import *
df = spark.createDataFrame(result,['CustomerId', 'SalesAmount'])
and then write to csv via the following lines
from pyspark.sql import *
df.coalesce(2).write.format("csv").option("header", True).mode("overwrite").save(enrichedDataFile)
And it works.
Here is a more general answer.
If you want to load a file from DBFS into a Pandas dataframe, you can use this trick.
Copy the file from dbfs:/ to file:/:
%fs cp dbfs:/FileStore/tables/data.csv file:/FileStore/tables/data.csv
Read the data from the file:/ path:
data = pd.read_csv('file:/FileStore/tables/data.csv')
Thanks
Have you tried opening the file first? (Replace the last row of your first example with the code below.)
from os import makedirs
makedirs(enrichedDataDir)
with open(enrichedDataFile, 'w') as output_file:
    df.to_csv(output_file, sep='; ')
Check the permissions on the SAS token you used for the container when you mounted this path. If it starts with "sp=racwdlmeopi", then you have a SAS token with immutable storage; your token should start with "sp=racwdlmeop".

Failed to save a file in azure data lake from azure data bricks

I'm trying to save string content into Azure Data Lake as XML content.
A string variable contains the XML content shown below:
<project>
<dateformat>dd-MM-yy</dateformat>
<timeformat>HH:mm</timeformat>
<useCDATA>true</useCDATA>
</project>
I have used the code below to write the file to the data lake:
xmlfilewrite = "/mnt/adls/ProjectDataDecoded.xml"
with open(xmlfilewrite, "w") as f:
    f.write(project_processed_var)
It throws the following error:
No such file or directory: '/mnt/adls/ProjectDataDecoded.xml'
I'm able to access the data lake using the above mount point, but I'm unable to do so with the open function above.
Can anyone help me?
The issue is solved.
In Databricks, when you have a mount point to Azure Data Lake, you need to prefix the path with "/dbfs" before passing it to the open function.
The issue is solved by using the code below:
xmlfilewrite = "/dbfs/mnt/adls/ProjectDataDecoded.xml"
with open(xmlfilewrite, "w") as f:
    f.write(project_processed_var)
You could try using the Spark-XML library. Convert your string to a dataframe where each row denotes one project. Then you can write it to ADLS in this way.
df.select("dateformat", "timeformat","useCDATA").write \
.format('xml') \
.options(rowTag='project', rootTag='project') \
.save('/mnt/adls/ProjectDataDecoded.xml')
Here is how you can include an external library: https://docs.databricks.com/libraries.html#create-a-library
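A sketch of the "convert your string to a dataframe" step mentioned above, assuming project_processed_var holds the XML shown in the question; the column names match the select() in the snippet above:
# Parse the three fields out of the XML string with the standard library
# and build a one-row Spark dataframe for the Spark-XML writer above.
import xml.etree.ElementTree as ET
from pyspark.sql import Row

root = ET.fromstring(project_processed_var)
df = spark.createDataFrame([Row(
    dateformat=root.findtext("dateformat"),
    timeformat=root.findtext("timeformat"),
    useCDATA=root.findtext("useCDATA"),
)])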
