Can anyone explain to me why this holds true:
Normalize a string path, taking care of '..' and '.' parts.
When multiple slashes are found, they're replaced by a single one;
when the path contains a trailing slash, it is preserved. On Windows
backslashes are used.
Example:
path.normalize('/foo/bar//baz/asdf/quux/..')
// returns '/foo/bar/baz/asdf'
When I would expect it to return
'/foo/bar/baz/asdf/quux'
This is from the Node Documentation
http://nodejs.org/api/path.html#path_path_normalize_p
Edit
After running some tests I know "why" this is happening, but I do not understand the logic behind it.
Below are three examples with their input and output.
/foo/bar//baz/asdf/quux/.. -> /foo/bar/baz/asdf
/foo/bar//baz/asdf/quux/. -> /foo/bar/baz/asdf/quux
/foo/bar//baz/asdf/quux/ -> /foo/bar/baz/asdf/quux/
So for the original I can see that the double period ".." removes the final folder, and the single period "." removes the trailing slash. I understand that when including files from parent folders you prefix a path with ../. I am assuming that you can actually place this anywhere within a path, although I currently see little point in placing it, say, mid-path.
A double dot (..) means the parent directory, as is standard on Linux. So /foo/bar//baz/asdf/quux/.. simply selects the parent directory of /foo/bar//baz/asdf/quux, which normalizes to /foo/bar/baz/asdf.
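The cases from the question can be checked directly in Node. This is a small sketch, not part of the original answer: it uses path.posix so the output is the same on every platform, adds one extra mid-path input for illustration, and the variable names are made up.

```javascript
// Use the POSIX flavour explicitly so results do not depend on the host OS.
const path = require('path').posix;

const inputs = [
  '/foo/bar//baz/asdf/quux/..', // '..' steps back out of quux
  '/foo/bar//baz/asdf/quux/.',  // '.' means "this directory" and is dropped
  '/foo/bar//baz/asdf/quux/',   // a trailing slash is preserved
  '/foo/bar/../baz',            // '..' also works in the middle of a path
];

const normalized = inputs.map((p) => path.normalize(p));

for (let i = 0; i < inputs.length; i++) {
  console.log(`${inputs[i]} -> ${normalized[i]}`);
}
```

A mid-path '..' usually appears when a base directory is concatenated with a relative path such as ../sibling/file, which is exactly the situation normalize is meant to clean up.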
It's a weird problem
to_be_stripped="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120"
And two strings below:
s1="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120\\[Content_Types].xml"
s2="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120\\_rels\\.rels"
When I use the command below:
s1.strip(to_be_stripped)
s2.strip(to_be_stripped)
I get these outputs:
'[Content_Types].x'
'_rels\\.'
If I use lstrip(), they will be:
'[Content_Types].xml'
'_rels\\.rels'
These are the right outputs.
However, if we replace every ProjectKnown with zeus_pipeline:
to_be_stripped="D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120"
And:
s2="D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120\\_rels\\.rels"
s2.lstrip(to_be_stripped) will be '.rels'
If I use / instead of \\, nothing goes wrong. I am wondering why this problem happens.
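strip isn't meant to remove an exact substring. Rather, you give it a string, and every character in that string is treated as a member of a set; characters from that set are removed from the start and end of the string being stripped until a character outside the set is reached.
In your case, the variable to_be_stripped contains the characters m and l, so those are stripped from the end of s1. However, it doesn't contain the character x, so the stripping stops there and no characters beyond that are removed. The zeus_pipeline case works the same way: to_be_stripped then also contains _, so lstrip keeps going through _rels\ and only stops at the ., leaving '.rels'.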
Check out this question. The accepted answer is probably more extensive than you need - I like another user's suggestion of using replace instead of strip. This would look like:
s1.replace(to_be_stripped, "")
The code shown below is taken from the examples used to explain path.resolve() on https://nodejs.org/api/path.html
path.resolve('/foo/bar', './baz');
// Returns: '/foo/bar/baz'
path.resolve('/foo/bar', '/tmp/file/');
// Returns: '/tmp/file'
path.resolve('wwwroot', 'static_files/png/', '../gif/image.gif');
// If the current working directory is /home/myself/node,
// this returns '/home/myself/node/wwwroot/static_files/gif/image.gif'
I noticed that all dots are just omitted.
./baz is converted to baz in the first example.
../gif/image.gif is converted to /gif/image.gif in the 3rd example.
Then, why bother writing these dots?
What would happen if these dots didn't exist in the two examples?
Thx!
The path.resolve() method resolves a sequence of path segments into an absolute path.
It works by processing the sequence of paths from right to left, prepending each path in turn until an absolute path has been constructed. The resulting path is normalized, and trailing slashes are removed unless the path resolves to the root directory.
If no path segments are given as parameters, then the absolute path of the current working directory is used.
The arguments are a series of paths or path segments that are resolved together to form an absolute path.
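To see what the dots actually do, it helps to compare the documented examples with variants that leave them out. The following is a rough sketch rather than anything from the Node documentation: it passes a hypothetical base directory explicitly (instead of relying on the real working directory) and uses path.posix so the results are the same on every platform.

```javascript
const path = require('path').posix;

// Hypothetical base directory standing in for the current working directory.
const cwd = '/home/myself/node';

// './baz' and 'baz' are equivalent: '.' just means "this directory".
const dotSlash = path.resolve('/foo/bar', './baz');
const plain = path.resolve('/foo/bar', 'baz');

// '..' is not merely dropped: it also removes the preceding 'png' segment.
const withDots = path.resolve(cwd, 'wwwroot', 'static_files/png/', '../gif/image.gif');

// Without the dots, 'gif' ends up inside 'png'.
const withoutDots = path.resolve(cwd, 'wwwroot', 'static_files/png/', 'gif/image.gif');

// A leading slash makes the segment absolute, so everything to its left is discarded.
const leadingSlash = path.resolve(cwd, 'wwwroot', 'static_files/png/', '/gif/image.gif');

console.log({ dotSlash, plain, withDots, withoutDots, leadingSlash });
```

So the single dot is optional in the first example, while the double dot in the third example changes the result: it is what turns .../static_files/png/gif/image.gif into .../static_files/gif/image.gif.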
According to the documentation:
The path.join() method joins all given path segments together using
the platform-specific separator as a delimiter, then normalizes the
resulting path.
Zero-length path segments are ignored. If the joined path string is a
zero-length string then '.' will be returned, representing the current
working directory.
path.join('/foo', 'bar', 'baz/asdf', 'quux', '..');
// Returns: '/foo/bar/baz/asdf'
path.join('foo', {}, 'bar');
// Throws 'TypeError: Path must be a string. Received {}'
A TypeError is thrown if any of the path segments is not a string.
Am I missing something? Why is:
path.join('/foo', 'bar', 'baz/asdf', 'quux', '..');
// Returns: '/foo/bar/baz/asdf'
Ignoring 'quux' and '..' ?
They are not zero-length, are they?
I even played around in the REPL.
path.join isn't ignoring the last two parameters. It takes the parameters you pass in and returns a normalized path as a string.
What's actually going on here is that it builds the path from left to right, giving /foo/bar/baz/asdf/quux/.., and the last parameter (..) instructs path.join to 'go back a directory'. So your final result will be: /foo/bar/baz/asdf
Part 1: what do you expect to happen when you provide an object instead of a string? To cut it short: it doesn't make sense and hence doesn't work.
Part 2: since .. means "up one directory", it clears the last part of the path, so it seems not to have any effect. Actually, it isn't ignored; the last two parameters simply cancel each other out.
With regard to path.join('foo', {}, 'bar');, {} represents an empty object, not a string (empty or not). Therefore, it is an invalid parameter for path.join().
With regard to path.join('/foo', 'bar', 'baz/asdf', 'quux', '..');, .. refers to a parent directory.
Through experimentation in a terminal, you will find that...
/foo/bar/baz/asdf/quux/.. is equivalent to /foo/bar/baz/asdf
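The same behaviour can be reproduced in a short script. This is only an illustrative sketch: the variable names are invented, path.posix is used so the separators are the same on every platform, and the exact error message text varies between Node versions, so only the error type is checked.

```javascript
const path = require('path').posix;

// 'quux' is appended and then cancelled by '..' during normalization.
const withParent = path.join('/foo', 'bar', 'baz/asdf', 'quux', '..');

// Without '..', 'quux' stays in the result.
const withoutParent = path.join('/foo', 'bar', 'baz/asdf', 'quux');

// This is what "zero-length segments are ignored" refers to: an empty string.
const emptySegment = path.join('/foo', '', 'bar');

// A non-string segment is rejected with a TypeError.
let joinError = null;
try {
  path.join('foo', {}, 'bar');
} catch (err) {
  joinError = err;
}

console.log(withParent, withoutParent, emptySegment, joinError instanceof TypeError);
```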
Azure docs:
Avoid blob names that end with a dot (.), a forward slash (/), or a
sequence or combination of the two.
I cannot avoid such names due to legacy s3 compatibility and so I must encode them.
How should I encode such names?
I don't want to use base64 since that will make it very hard to debug when looking in azure's blob console.
Go has https://golang.org/pkg/net/url/#QueryEscape but it has this limitation:
Go's implementation of url.QueryEscape (specifically, the private
shouldEscape function) escapes all characters except the
following: letters, decimal digits, '-', '_', '.', '~'.
I don't think there's any universal solution to handle this outside your application scope. Within your application scope, you can use ANY encoding, so it comes down to personal preference how you like your data to be laid out. There is no "right" way to do this.
Regardless, I believe you should go for these properties:
Conversion MUST be bidirectional and without conflicts in your expected file name space
DO keep file names without ending dots unencoded
With dot-ending files, DO encode just the conflicting dots, keeping the original name readable.
This keeps most (the non-conflicting) files short, with their original and hopefully meaningful names. Should you ever be able to rename or phase out the conflicting files, you can just remove the conversion logic without restructuring all stored data and their URLs.
I'll suggest 2 examples for this. Let's say you have these files:
/someParent/normal.txt
/someParent/extensionless
/someParent/single.
/someParent/double..
Use special subcontainers
You could remove N dots from end of filename and translate them to subcontainer name "dot", "dotdot" etc.
The resulting URLs would look like:
/someParent/normal.txt
/someParent/extensionless
/someParent/dot/single
/someParent/dotdot/double
When reading you can remove the "dot"*N folder level and append N dots back to file name.
Obviously this assumes you don't ever need to have such "dot" folders as data themselves.
This is preferred if stored files can come in with any extension but you can make some assumptions on folder structure.
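A minimal sketch of this subcontainer mapping could look like the following. The function names are hypothetical, and the code assumes what the text above assumes: no real folder is ever named "dot", "dotdot", and so on, and every file name contains at least one character that is not a dot.

```javascript
// Matches folder names made only of repeated "dot": dot, dotdot, dotdotdot, ...
const DOT_FOLDER = /^(?:dot)+$/;

// Hypothetical helper: move N trailing dots into a "dot"*N folder level.
function encodeWithDotFolder(name) {
  const slash = name.lastIndexOf('/');
  const parent = name.slice(0, slash + 1); // includes the trailing '/', or '' if none
  const file = name.slice(slash + 1);
  const trailing = file.match(/\.+$/);
  if (!trailing) return name; // non-conflicting names stay untouched
  const n = trailing[0].length;
  return parent + 'dot'.repeat(n) + '/' + file.slice(0, -n);
}

// Hypothetical helper: reverse the mapping when reading.
function decodeWithDotFolder(name) {
  const parts = name.split('/');
  if (parts.length < 2) return name;
  const folder = parts[parts.length - 2];
  if (!DOT_FOLDER.test(folder)) return name;
  const n = folder.length / 3; // "dot" has three characters
  const file = parts.pop();
  parts.pop(); // drop the artificial folder level
  parts.push(file + '.'.repeat(n));
  return parts.join('/');
}

console.log(encodeWithDotFolder('/someParent/double..'));
console.log(decodeWithDotFolder('/someParent/dotdot/double'));
```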
Use discardable artificial extension
Since the conflict is at the end you could just append a never-used dummy extension to given files. For example "endswithdots", but you could choose something more suitable depending on what the expected extensions are:
/someParent/normal.txt
/someParent/extensionless
/someParent/single.endswithdots
/someParent/double..endswithdots
When reading, if the file extension is "endswithdots", remove the "endswithdots" part from the end of the file name.
This is preferred if your data could have any container structure but you can make some assumptions on incoming extensions.
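The artificial-extension variant is even shorter. Again, this is just a sketch under the stated assumption that no real file ever uses the "endswithdots" extension; the function names are hypothetical.

```javascript
// Dummy extension assumed never to occur in real incoming file names.
const MARKER = 'endswithdots';

// Hypothetical helper: append the marker only to names that end with a dot.
function encodeWithMarker(name) {
  return name.endsWith('.') ? name + MARKER : name;
}

// Hypothetical helper: strip the marker again, leaving the original trailing dots.
function decodeWithMarker(name) {
  return name.endsWith('.' + MARKER) ? name.slice(0, -MARKER.length) : name;
}

console.log(encodeWithMarker('/someParent/double..'));
console.log(decodeWithMarker('/someParent/double..endswithdots'));
```

Because only dot-ending names are touched, removing the conversion later leaves every other stored name valid as it is.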
I would suggest against Base64 or other full-name encoding as it would make file names notably longer and lose any meaningful details the file names may contain.
What is the difference (if any) between path.normalize(your_path) and path.resolve(your_path)?
I know path.resolve(...) can accept multiple arguments, but is the behavior with a single argument the same as calling path.normalize()?
EDIT: If they are supposed to behave the same way, I don't understand the purpose of exposing the path.normalize(...) function when you can simply pass the path into path.resolve(...). Or maybe it's for documentation purposes. For example, the documentation for path.resolve(...) says:
... The resulting path is normalized, and ...
Maybe exposing path.normalize(...) makes it easier to explain what "normalized" means? I don't know.
path.normalize gets rid of the extra ., .., etc. in the path. path.resolve resolves a path into an absolute path. Example (my current working directory was /Users/mtilley/src/testing):
> path.normalize('../../src/../src/node')
'../../src/node'
> path.resolve('../../src/../src/node')
'/Users/mtilley/src/node'
In other words, path.normalize is "What is the shortest path I can take that will take me to the same place as the input", while path.resolve is "What is my destination if I take this path."
Note however that path.normalize() is much more context-independent than path.resolve(). Had path.normalize() been context-dependent (i.e. if it had taken into consideration the current working directory), the result in the example above would've been ../node, because that's the shortest path one could take from /Users/mtilley/src/testing to /Users/mtilley/src/node.
Ironically, this means that path.resolve() produces a relative path in absolute terms (you could execute it anywhere, and it would produce the same result), whereas path.normalize() produces an absolute path in relative terms (you must execute it in the path relative to which you want to calculate the absolute result).
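The difference can be made concrete with a short script. This is a sketch only: it passes the working directory from the example above explicitly as a hypothetical base, so it behaves the same wherever it is run, and it uses path.posix for platform-independent output. path.relative is added purely to show the "shortest path from here" result mentioned above.

```javascript
const path = require('path').posix;

// Hypothetical working directory, matching the one in the example above.
const cwd = '/Users/mtilley/src/testing';
const input = '../../src/../src/node';

// normalize only simplifies the string; it never looks at any directory.
const normalized = path.normalize(input);

// resolve anchors the path to a base directory and returns an absolute path.
const resolved = path.resolve(cwd, input);

// relative gives the context-dependent shortest route from cwd to the target.
const relative = path.relative(cwd, resolved);

console.log({ normalized, resolved, relative });
```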
From the docs:
Another way to think of resolve is as a sequence of cd commands in a shell.
Links to path.resolve and path.normalize in the documentation. I mostly don't want to just provide links in an answer but the Node.js docs are very decent.