Is "root" part of the "directory"? - node.js

When parsing a path, Node.js treats the root as part of the directory:
/home/user/dir/file.txt
┌─────────────────────┬────────────┐
│          dir        │    base    │
├──────┬              ├──────┬─────┤
│ root │              │ name │ ext │
"  /    home/user/dir / file  .txt "
└──────┴──────────────┴──────┴─────┘
C:\\path\\dir\\file.txt
┌─────────────────────┬────────────┐
│          dir        │    base    │
├──────┬              ├──────┬─────┤
│ root │              │ name │ ext │
" C:\      path\dir   \ file  .txt "
└──────┴──────────────┴──────┴─────┘
Is that actually so? I am developing a new library and wondering whether I should treat the root as part of the directory or not.
Admittedly, "directory" is a fuzzy term. According to the definition,
In computing, a directory is a file system cataloging structure which
contains references to other computer files, and possibly other
directories.
Wikipedia
Nothing there answers my question. What else do we know?
When we use the cd command (short for "change directory"), we specify a path relative to the current location. That has nothing to do with the root.
The cd command works within a specific drive (at least on Windows). This indirectly suggests that the root and the directory can be combined but are separate to begin with.
More precise terms are "absolute path of the directory" and "relative path of the directory". But what is the "directory" itself?
On Windows, the name of the storage device can differ between computers, but that does not affect the file structure inside the storage. Again, the root and the directory are separate in this case.
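For comparison, Python's pathlib draws the same line. A minimal sketch (this is not Node.js; it uses only the standard pathlib module, split_path is a made-up helper, and the two sample paths are the ones from the diagrams above) showing that the directory of a file includes the root, while the root also stays available as a separate component:

```python
from pathlib import PurePosixPath, PureWindowsPath

def split_path(path):
    """Return the root, dir, base, name and ext of a pure path."""
    return {
        "root": path.anchor,       # drive plus root, e.g. "/" or "C:\\"
        "dir": str(path.parent),   # directory part, root included
        "base": path.name,
        "name": path.stem,
        "ext": path.suffix,
    }

posix_parts = split_path(PurePosixPath("/home/user/dir/file.txt"))
windows_parts = split_path(PureWindowsPath(r"C:\path\dir\file.txt"))

print(posix_parts)
print(windows_parts)
```

In both cases dir starts with root, which matches the path.parse layout, and the root can still be read on its own by code that needs it.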

Related

What does the "/" mean in VSCode file formatting?

I have started using VSCode and was wondering what the / meant when I click on a file (see attached screenshot). Is it simply the full path of the file that I've clicked on? Thanks!
It means a subfolder inside the main folder you have created.
I think you may be referring to the feature where a folder with a single subfolder is shown on the same line.
For example, let's consider the following structure:
X/
├─ a/
│  ├─ a1/
│  │  └─ a1.txt
│  └─ a2/
│     └─ a2.txt
└─ b/
   └─ b1/
      └─ b1.txt
Since b1 is the only folder under b, the explorer window will show b and b1 on a single line (b/b1).
eth-sig-util is a child folder of the metamask folder, which is its parent.
To access the child, we have to use '/' after the parent folder.

Spark glob filter to match a specific nested partition

I'm using PySpark, but I guess this applies to Scala as well.
My data is stored on S3 in the following structure:
main_folder
└── year=2022
    └── month=03
        ├── day=01
        │   ├── valid=false
        │   │   └── example1.parquet
        │   └── valid=true
        │       └── example2.parquet
        └── day=02
            ├── valid=false
            │   └── example3.parquet
            └── valid=true
                └── example4.parquet
(For simplicity there is only one file in each folder and only two days. In reality there can be thousands of files and many days/months/years.)
The files under the valid=true and valid=false partitions have completely different schemas, and I only want to read the files in the valid=true partition.
I tried using the glob filter, but it fails with AnalysisException: Unable to infer schema for Parquet. It must be specified manually. This is a symptom of having no data (so no files matched).
spark.read.parquet('s3://main_folder', pathGlobFilter='*valid=true*')
I noticed that something like this works:
spark.read.parquet('s3://main_folder', pathGlobFilter='*example4*')
However, as soon as I try to use a slash or match anything above the bottom level, it fails:
spark.read.parquet('s3://main_folder', pathGlobFilter='*/example4*')
spark.read.parquet('s3://main_folder', pathGlobFilter='*valid=true*example4*')
I did try replacing the * with ** in all locations, but it didn't work.
pathGlobFilter seems to work only on the final file name. For subdirectories you can try the following, although it may bypass partition discovery. To keep partition discovery, add the basePath property as a load option:
spark.read.format("parquet")\
.option("basePath","s3://main_folder")\
.load("s3://main_folder/*/*/*/valid=true/*")
However, I am not sure whether you can combine path wildcards with pathGlobFilter if you want to match on both subdirectories and file names.
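The difference between the two approaches can be imitated with Python's standard library. This is only a sketch: it does not call Spark, the helper names are made up for illustration, and it assumes that pathGlobFilter is compared against the file name alone while the wildcards in the load path are matched one directory level at a time.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Relative paths of the files from the question's layout.
FILES = [
    "year=2022/month=03/day=01/valid=false/example1.parquet",
    "year=2022/month=03/day=01/valid=true/example2.parquet",
    "year=2022/month=03/day=02/valid=false/example3.parquet",
    "year=2022/month=03/day=02/valid=true/example4.parquet",
]

def filter_by_file_name(files, pattern):
    """Imitate a filter that only sees the last path component."""
    return [f for f in files if fnmatch(PurePosixPath(f).name, pattern)]

def filter_by_path_glob(files, pattern):
    """Imitate a path glob where each * covers one directory level."""
    return [f for f in files if PurePosixPath(f).match(pattern)]

# The partition name never appears in the file name, so nothing matches.
by_name_partition = filter_by_file_name(FILES, "*valid=true*")
# A pattern on the file name itself does match.
by_name_file = filter_by_file_name(FILES, "*example4*")
# A path glob can select on the directory level instead.
by_path = filter_by_path_glob(FILES, "*/*/*/valid=true/*")

print(by_name_partition, by_name_file, by_path, sep="\n")
```

Under these assumptions the file-name filter returns nothing for the partition pattern, which would be consistent with the "Unable to infer schema" error in the question, while the path glob keeps only the two valid=true files.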
Reference:
https://simplernerd.com/java-spark-read-multiple-files-with-glob/
https://spark.apache.org/docs/latest/sql-data-sources-parquet.html

Rust Workspaces do not respect individual project targets defined in .cargo/config.toml

Consider the following project directory:
root/
├── project_one/
│   ├── .cargo/
│   │   └── config.toml
│   ├── Cargo.toml
│   └── target.json
└── Cargo.toml
root/Cargo.toml is the workspace manifest with members = [ "project_one" ]
project_one/Cargo.toml is the project manifest
project_one/target.json (an avr-atmega328p target specification) defines target properties used by rustc(?)
root/project_one/.cargo/config.toml lists target = "target.json" under [build]
Problem
project_one does not compile using the configured target unless I delete root/Cargo.toml.
The build fails with an error message indicating that information for a correct target platform is missing (error: language item required, but not found: 'eh_personality').
Potential Solution
At the time of writing there has been a very recent PR merged into the rust-lang master branch: https://github.com/rust-lang/cargo/pull/9030
Are there any other solutions to avoid having to wait for a fix?

Terraform modules: correct references of variables?

I'm writing a terraform script to create an EKS cluster with its worker nodes on AWS. First time doing it so I'm a bit confused.
Here is the folder organisation:
├─── Int AWS Account
│    ├─── variables.tf
│    ├─── eks-cluster.tf (refers to the modules)
│    ├─── others
│
├─── Prod AWS Account
│    ├─── (will be the same as Int, with different settings in variables)
│
├─── ReadMe.md
│
├─── data sources
│
├─── Modules
│    ├─── cluster.tf
│    ├─── worker-nodes.tf
│    ├─── worker-nodes-sg.tf
I am a bit confused about how to use and pass variables. Right now, I refer to ${var.name} in the module folder. In eks-cluster.tf, I either set a direct value such as name = blabla (which I mostly avoid), or refer to a variable again and keep a variables file in the account folder.
Is that correct?
I'm not sure I understand your question correctly, but in general you want your module files to use variables only, since modules are meant to be generic so that you can easily include them in different environments.
When including the module in eks_cluster_int.tf or eks_cluster_prod.tf, you then pass values for all variables defined in the module itself. This way you can use environment-specific values with the same module.
module "cluster" {
  source = "..."
  var1   = "value1"                   # passing a value directly
  var2   = "${var.int_specific_var}"  # can be defined in variables.tf of the environment
  ...
}
Does this answer your question?

Read all files in a nested folder in Spark

If we have a folder named folder containing only .txt files, we can read them all using sc.textFile("folder/*.txt"). But what if folder contains further folders named by date, such as 03, 04, ..., which in turn contain some .log files? How do I read these in Spark?
In my case, the structure is even more nested & complex, so a general answer is preferred.
If the directory structure is regular, let's say something like this:
folder
├── a
│   ├── a
│   │   └── aa.txt
│   └── b
│       └── ab.txt
└── b
    ├── a
    │   └── ba.txt
    └── b
        └── bb.txt
you can use a * wildcard for each level of nesting, as shown below:
>>> sc.wholeTextFiles("/folder/*/*/*.txt").map(lambda x: x[0]).collect()
[u'file:/folder/a/a/aa.txt',
u'file:/folder/a/b/ab.txt',
u'file:/folder/b/a/ba.txt',
u'file:/folder/b/b/bb.txt']
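When the nesting depth is known but varies between datasets, the pattern can be built instead of typed by hand. A small sketch in plain Python; nested_glob is a hypothetical helper, not part of Spark, and it assumes every file sits at exactly the same depth below the base folder:

```python
def nested_glob(base, depth, file_pattern="*.txt"):
    """Build a glob with one * per directory level below base."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    parts = [base.rstrip("/")] + ["*"] * depth + [file_pattern]
    return "/".join(parts)

# Two directory levels below /folder, as in the tree above.
pattern = nested_glob("/folder", 2)
print(pattern)
```

The resulting string can be passed to sc.wholeTextFiles or sc.textFile in place of the literal pattern.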
Spark 3.0 provides the recursiveFileLookup option to load files from nested subfolders.
val df= sparkSession.read
.option("recursiveFileLookup","true")
.option("header","true")
.csv("src/main/resources/nested")
This recursively loads the files from src/main/resources/nested and its subfolders.
If you only want files under directories whose names start with "a", you can also use
sc.wholeTextFiles("/folder/a*/*/*.txt") or sc.wholeTextFiles("/folder/a*/a*/*.txt")
since * can be used as a wildcard within a name.
sc.wholeTextFiles("/directory/201910*/part-*.lzo") returns one (file name, content) pair per matched file rather than the individual lines.
If you want to load the contents of all matched files line by line, use
sc.textFile("/directory/201910*/part-*.lzo")
and enable recursive directory reading:
sc._jsc.hadoopConfiguration().set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
Tip: the Scala syntax differs from Python. In Scala, use the following setting:
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")

Resources