How do I determine the physical file server for a DFS path?

Given a DFS path, how can I programmatically determine which physical file server it maps to?

Use the NetDfsGetInfo() Win32 API.

PowerShell:
Get-DfsnFolderTarget -Path "\\dfsnamespace\dfsshare"
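If you want to call it programmatically rather than from PowerShell, below is a minimal ctypes sketch of NetDfsGetInfo() at information level 3, which returns the server/share targets behind a DFS link. It assumes a Windows host; the example path at the bottom is hypothetical and error handling is kept to a minimum.

import ctypes
from ctypes import wintypes

netapi32 = ctypes.WinDLL("Netapi32.dll")

class DFS_STORAGE_INFO(ctypes.Structure):
    _fields_ = [("State", wintypes.ULONG),
                ("ServerName", wintypes.LPWSTR),
                ("ShareName", wintypes.LPWSTR)]

class DFS_INFO_3(ctypes.Structure):
    _fields_ = [("EntryPath", wintypes.LPWSTR),
                ("Comment", wintypes.LPWSTR),
                ("State", wintypes.DWORD),
                ("NumberOfStorages", wintypes.DWORD),
                ("Storage", ctypes.POINTER(DFS_STORAGE_INFO))]

def dfs_targets(dfs_path):
    """Return the (server, share) pairs backing a DFS path, via NetDfsGetInfo level 3."""
    buf = ctypes.c_void_p()
    status = netapi32.NetDfsGetInfo(dfs_path, None, None, 3, ctypes.byref(buf))
    if status != 0:  # 0 == NERR_Success
        raise OSError("NetDfsGetInfo failed with status %d" % status)
    try:
        info = ctypes.cast(buf, ctypes.POINTER(DFS_INFO_3)).contents
        return [(info.Storage[i].ServerName, info.Storage[i].ShareName)
                for i in range(info.NumberOfStorages)]
    finally:
        netapi32.NetApiBufferFree(buf)

# Hypothetical namespace path:
# print(dfs_targets(r"\\dfsnamespace\dfsshare"))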

Related

Entering a proper path to files on DBFS

I uploaded files to DBFS:
/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv
I tried to access them with pandas, but I always get an error saying that such files don't exist.
I tried to use the following paths:
/dbfs/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv
dbfs/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv
dbfs:/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv
./FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv
Funnily enough, when I check them with dbutils.fs.ls, I can see all the files.
I found this solution and already tried it: Databricks dbfs file read issue
I moved the files to a new folder:
dbfs:/new_folder/
and tried to access them from there, but it still didn't work for me. The only difference was that the files had been copied to a different place.
I also checked the documentation: https://docs.databricks.com/data/databricks-file-system.html
I use Databricks Community Edition.
I don't understand what I'm doing wrong or why this is happening.
I don't have any other ideas.
The /dbfs/ mount point isn't available on the Community Edition (that's a known limitation), so you need to do what is recommended in the linked answer:
dbutils.fs.cp(
    'dbfs:/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv',
    'file:/tmp/file_name.csv')
and then use /tmp/file_name.csv as the input path for pandas functions. If you need to write something to DBFS, do it the other way around: write to a local file under /tmp/... and copy that file to DBFS.
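As a concrete sketch of the full round trip in a Community Edition notebook (the paths are the ones from the question; the output file name is a hypothetical example):

import pandas as pd

# Read: copy from DBFS to the driver's local disk, then open with pandas
dbutils.fs.cp(
    'dbfs:/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv',
    'file:/tmp/file_name.csv')
df = pd.read_csv('/tmp/file_name.csv')

# Write: save locally first, then copy the result back to DBFS
df.to_csv('/tmp/output.csv', index=False)  # 'output.csv' is a hypothetical name
dbutils.fs.cp('file:/tmp/output.csv', 'dbfs:/FileStore/output.csv')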

How to find the location where Spark stores all the config values

I am new to Spark and learning its internals. One of the things that eludes me is how the Spark session gets all of its config properties. I want to collect all the Spark config values, including the default ones. I can easily find the ones explicitly set on the Spark session, as well as those from the spark-defaults.conf file, by running a small piece of code like the one below:
configurations = spark.sparkContext.getConf().getAll()
for item in configurations: print(item)
My question is: where does Spark pick up the rest of the default parameters? Is there a default location that the Spark distribution maintains internally that is not exposed to the user?
For example, spark.conf.get("spark.sql.adaptive.enabled") gives me "true" as output. But if I do a find from the root directory, I can't seem to find that value in any physical file on the server:
find / -name "*.*" 2>/dev/null -exec grep -li "spark.sql.adaptive.enabled" '{}' \;
returns nothing :(
Any assistance would be much appreciated.
Regards,
Pankaj
See the Spark source code: default values that are not in the conf file are embedded in Spark's own code.
For Spark SQL, the default conf values are defined in the SQLConf class.
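A quick way to see those code-level defaults without digging through the source is to ask a running session for them. This is a minimal sketch: SET -v lists the Spark SQL configuration keys together with their current values and descriptions, and spark.conf.get falls back to the default baked into the code when a key was never set explicitly.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Spark SQL confs, including defaults that never appear in spark-defaults.conf
spark.sql("SET -v").show(truncate=False)

# Resolves from the in-code default if it is not set anywhere else
print(spark.conf.get("spark.sql.adaptive.enabled"))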

How can I change the path to a new partition on a second SSD?

I created a partition on my second SSD. To access it, I need to type a very long path.
/media/ivan/8845bdd5-64ba-45be-90f6-83fd38ca946b
How can I change this path to something shorter?
You can create a symbolic link wherever you like using the Linux ln command.
For more details, read this guide.
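If you prefer to script it, the same idea in Python via os.symlink (the short link name ~/ssd2 below is just a hypothetical example; ln -s target link does the equivalent from the shell):

import os

# The long mount path from the question
target = "/media/ivan/8845bdd5-64ba-45be-90f6-83fd38ca946b"
# A short, convenient name for it (hypothetical; pick whatever you like)
link = os.path.expanduser("~/ssd2")

os.symlink(target, link)  # afterwards ~/ssd2 points at the long mount path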

Azure Databricks dbfs with python

In Azure Databricks I get different results for the DBFS directory listing simply by adding a colon to the path.
Can anybody explain to me why this happens?
With dbutils, you can only use "dbfs:/" paths.
If you do not specify "dbfs:/" at the start of your path, it will simply auto-add it.
dbutils.fs.ls('pathA')
--> dbfs:/pathA
is exactly the same as
dbutils.fs.ls('dbfs:/pathA')
but if you write dbfs/ without the ':', the prefix is still silently prepended:
dbutils.fs.ls('dbfs/pathB')
--> dbfs:/dbfs/pathB
This means your dbfs/ is treated as a folder named dbfs at the root of dbfs:/
To avoid confusion, always put dbfs:/ at the start of your path.
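A small sketch you can paste into a notebook to see the behaviour for yourself (pathA and pathB are hypothetical folder names, as above):

# Both calls list the same DBFS directory; the prefix is implied in the first one
dbutils.fs.ls('pathA')
dbutils.fs.ls('dbfs:/pathA')

# This call lists dbfs:/dbfs/pathB instead, i.e. a folder literally named 'dbfs'
# at the root of the DBFS filesystem
dbutils.fs.ls('dbfs/pathB')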

Spark: create a temp directory structure on each node

I am working on a Spark Java wrapper that uses third-party libraries, which read files from a hard-coded directory named "resdata" relative to where the job executes. I know this is twisted, but I will try to explain.
When I execute the job, it tries to find the required files in a path something like the one below:
/data/Hadoop/yarn/local//appcache/application_xxxxx_xxx/container_00_xxxxx_xxx/resdata
I am assuming it is looking for the files in the container's current working directory and, under that, for a directory named "resdata". At this point I don't know how to point the current directory to any path on HDFS or the local filesystem.
So I am looking for options to create a directory structure similar to what the third-party libraries expect, and to copy the required files there. I need to do this on each node. I am working on Spark 2.2.0.
Please help me achieve this.
I just found the answer: put all the files under a resdata directory and zip it, say as resdata.zip, then pass the archive using the "--archives" option. Each node will then have the directory resdata.zip/resdata/file1, etc.
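As a sketch of how that can be wired up programmatically (shown in PySpark for brevity, assuming a YARN cluster and a hypothetical HDFS location for the archive), the --archives flag corresponds to the spark.yarn.dist.archives property; the archive is extracted into each container's working directory, giving the ./resdata.zip/resdata/file1 layout described above:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("wrapper-job")                                         # hypothetical app name
         .config("spark.yarn.dist.archives", "hdfs:///tmp/resdata.zip")  # hypothetical path
         .getOrCreate())

# Inside every YARN container the shipped files are reachable relative to the
# working directory, e.g. ./resdata.zip/resdata/file1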
