Checking that a provided string is a single path component - rust

I have a function that accepts a string which will be used to create a file with that name (e.g. f("foo") will create a /some/fixed/path/foo.txt file). I'd like to prevent users from mistakenly passing strings with / separators that would introduce additional sub-directories. Since PathBuf::push() accepts strings with multiple components (and, confusingly, so does PathBuf::set_file_name()) it doesn't seem possible to prevent pushing multiple components onto a PathBuf without a separate check first.
Naively, I could do a .contains() check:
assert!(!name.contains("/"), "name should be a single path element");
But obviously that's not cross-platform. There is path::is_separator() so I could do:
name.chars().any(std::path::is_separator)
Alternatively, I looked at Path for any sort of is_single_component() check or similar. I could check that file_name() equals the whole path:
let name = Path::new(name);
assert_eq!(Some(name.as_os_str()), name.file_name(),
           "name should be a single path element");
or that iterating over the path yields one element:
assert_eq!(Path::new(name).iter().count(), 1,
           "name should be a single path element");
I'm leaning towards this last approach, but I'm just curious if there's a more idiomatic way to ensure pushing a string onto a PathBuf will just add one path component.

If you are fine with limiting yourself to path names that are valid UTF-8, I suggest this succinct implementation:
fn has_single_component(path: &str) -> bool {
    !path.contains(std::path::is_separator)
}
In contrast to your Path-based approaches, it will stop at the first separator found, and it's easy to read.
Note that testing whether a path only consists of a single component is a rather uncommon thing to do, so there isn't a standard way of doing it.
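For completeness, here is a small sketch of how the suggested helper behaves on a few inputs (the empty-string caveat is my own addition, not from the answer above):

```rust
use std::path::Path;

// Returns true when pushing `path` onto a PathBuf would add at most one component.
fn has_single_component(path: &str) -> bool {
    !path.contains(std::path::is_separator)
}

fn main() {
    assert!(has_single_component("foo"));
    assert!(has_single_component("foo.txt"));
    assert!(!has_single_component("foo/bar"));
    // Caveat: the empty string passes this check even though it has no
    // components at all, so you may want to reject it separately:
    assert_eq!(Path::new("").iter().count(), 0);
}
```

On Windows, `std::path::is_separator` matches both `/` and `\`, so the check stays cross-platform.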

Can you set multiple (different) tags with the same value?

For some of my projects, I have had to use the viper package to use configuration.
The package requires you to add the mapstructure:"fieldname" to identify and set your configuration object's fields correctly, but I have also had to add other tags for other purposes, leading to something looking like the following :
type MyStruct struct {
    MyField string `mapstructure:"myField" json:"myField" yaml:"myField"`
}
As you can see, it is quite redundant for me to write tag:"myField" for each of my tags, so I was wondering if there was any way to "bundle" them up and reduce the verbosity, with something like this: mapstructure,json,yaml:"myField"
Or is it simply not possible, and you must specify every tag separately?
Struct tags are arbitrary string literals. Data stored in struct tags may look like whatever you want them to be, but if you don't follow the conventions, you'll have to write your own parser / processing logic. If you follow the conventions, you may use StructTag.Get() and StructTag.Lookup() to easily get tag values.
The conventions do not support "merging" multiple tags, so just write them all out.
The conventions, quoted from reflect.StructTag:
By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax.
See related question: What are the use(s) for tags in Go?

Recommended data structure to store a changeable sequence with a number

I am trying to build an FP tree, and I am quite confused about which data structure I should use to record a prefix path and its occurrence. A prefix path is a sequence recording an item set, like ('coffee','milk','bear'), and its occurrence is an int. I'll post two requirements for the data structure below so that you don't need to go deep into FP-trees:
The occurrence of a prefix path needs to be searched frequently, so maybe a dict like {prefix_path: occurrence} is the best way to store them.
The prefix path needs to be updated (re-ranked and filtered) in a conditional FP tree.
I have searched other people's work on GitHub and found that people use {tuple(['coffee','milk','bear']): occurrence} or {frozenset(['coffee','milk','bear']): occurrence} to do so. However, when the prefix path is updated, they need to change the tuple or frozenset into a list and then change it back. I think this is not very pythonic.
I am wondering if there is a better way to store prefix path with its occurrence.
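For reference, here is a sketch of the tuple-keyed pattern the question describes, with the update done by rebuilding the keys in one pass rather than mutating them in place. The function name and the ordering scheme are illustrative, not from any library:

```python
# Prefix paths as tuple keys (hashable), occurrences as values.
prefix_paths = {
    ("coffee", "milk", "bear"): 3,
    ("coffee", "milk"): 5,
}

def rerank_and_filter(paths, order, min_support=0):
    """Re-rank each prefix path by a given item order, drop items not in
    the order, and merge counts of paths that collapse to the same key."""
    updated = {}
    for path, count in paths.items():
        kept = tuple(sorted((i for i in path if i in order), key=order.index))
        if kept:
            updated[kept] = updated.get(kept, 0) + count
    return {p: c for p, c in updated.items() if c >= min_support}

order = ["milk", "coffee"]  # e.g. items ranked by frequency; "bear" is filtered out
print(rerank_and_filter(prefix_paths, order))
# {('milk', 'coffee'): 8}
```

Since tuples are immutable, "updating" a key necessarily means building a new dict; doing it as a single comprehension-style pass at least avoids the tuple-to-list-and-back churn the question complains about.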

How can I store a format string template outside of my source code?

When translating, messages can be in different languages and have format parameters. I want to be able to do this where the template can be stored in a file:
static PATTERN: &'static str = r"Hello {inner};";

/// in some implementation
fn any_method(&self) -> String {
    format!(PATTERN, inner = "world")
}
That's not possible. Format strings must be actual literal strings.
The next best approach would be some kind of dynamic string format library. Or, failing that, you could always use str::replace if your needs aren't too complex.
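If the templates really are as simple as the example, a str::replace sketch might look like this (the placeholder syntax and the render helper are assumptions for illustration):

```rust
// Minimal runtime templating via str::replace; fine when the template
// only contains a handful of known placeholders.
fn render(template: &str, inner: &str) -> String {
    template.replace("{inner}", inner)
}

fn main() {
    // In a real program this string could come from std::fs::read_to_string
    // at runtime, which a format! literal never can.
    let pattern = "Hello {inner};";
    assert_eq!(render(pattern, "world"), "Hello world;");
    println!("{}", render(pattern, "world"));
}
```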
This is definitely possible and trivial to do using the include_str macro that has been available in the standard library since version 1.0.0. The following example was tested with rustc 1.58.1:
Contents of src/main.rs:
fn main() {
    println!(include_str!("hello-world.tmpl"), inner = "world");
}
Contents of src/hello-world.tmpl
Hello {inner}
This works because include_str injects the contents of the template file as a string literal before println, format, and friends have a chance to evaluate their arguments. Note that the path passed to include_str is resolved relative to the file containing the invocation (so from src/main.rs, a template at src/hello-world.tmpl is just "hello-world.tmpl"), and a named placeholder like {inner} needs a matching named argument. This approach only works when the format template you want to include is available during macro expansion, like it is in your example. If it's not, then you should consider other options like the ones suggested by @DK.
As an added bonus: You can also define format strings in source code locations other than the site where they are used by defining them as macros.

How to encode blob names that end with a period?

Azure docs:
Avoid blob names that end with a dot (.), a forward slash (/), or a sequence or combination of the two.
I cannot avoid such names due to legacy s3 compatibility and so I must encode them.
How should I encode such names?
I don't want to use base64 since that will make it very hard to debug when looking in azure's blob console.
Go has https://golang.org/pkg/net/url/#QueryEscape but it has this limitation:
From Go's implementation of url.QueryEscape (specifically, the shouldEscape private function): it escapes all characters except the following: alphabetic, decimal digits, '-', '_', '.', '~'.
I don't think there's any universal solution to this outside your application scope. Within your application scope, you can do ANY encoding, so it comes down to personal preference for how you like your data to be laid out. There is no "right" way to do this.
Regardless, I believe you should go for these properties:
Conversion MUST be bidirectional and without conflicts in your expected file name space
DO keep file names without ending dots unencoded
With dot-ending files, DO encode just the conflicting dots, keeping the original name readable.
This would keep most (the non-conflicting) files short and with the original intuitive or hopefully meaningful names and should you ever be able to rename or phase out the conflicting files just remove the conversion logic without restructuring all stored data and their urls.
I'll suggest two examples for this. Let's say you have these files:
/someParent/normal.txt
/someParent/extensionless
/someParent/single.
/someParent/double..
Use special subcontainers
You could remove the N dots from the end of the filename and translate them into a subcontainer name: "dot", "dotdot", etc.
The resulting URLs would look like:
/someParent/normal.txt
/someParent/extensionless
/someParent/dot/single
/someParent/dotdot/double
When reading, you can remove the "dot"*N folder level and append N dots back to the file name.
Obviously this assumes you don't ever need to have such "dot" folders as data themselves.
This is preferred if stored files can come in with any extension but you can make some assumptions on folder structure.
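A sketch of this subcontainer scheme in Go (since the question mentions Go). The folder naming and function names are illustrative, and the sketch handles bare filenames only, not full container paths:

```go
package main

import (
	"fmt"
	"strings"
)

// encode moves N trailing dots into an artificial "dot"/"dotdot"/... folder.
func encode(name string) string {
	trimmed := strings.TrimRight(name, ".")
	n := len(name) - len(trimmed)
	if n == 0 {
		return name // no trailing dots: keep the name unencoded
	}
	return strings.Repeat("dot", n) + "/" + trimmed
}

// decode reverses encode, leaving non-encoded names untouched.
func decode(path string) string {
	parts := strings.SplitN(path, "/", 2)
	if len(parts) != 2 {
		return path
	}
	dir, file := parts[0], parts[1]
	n := len(dir) / 3
	if n == 0 || strings.Repeat("dot", n) != dir {
		return path // folder isn't an artificial "dot"*N container
	}
	return file + strings.Repeat(".", n)
}

func main() {
	fmt.Println(encode("single."))       // dot/single
	fmt.Println(encode("double.."))      // dotdot/double
	fmt.Println(decode("dotdot/double")) // double..
}
```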
Use discardable artificial extension
Since the conflict is at the end you could just append a never-used dummy extension to given files. For example "endswithdots", but you could choose something more suitable depending on what the expected extensions are:
/someParent/normal.txt
/someParent/extensionless
/someParent/single.endswithdots
/someParent/double..endswithdots
On reading, if the file extension is "endswithdots", remove that part from the end of the filename.
This is preferred if your data could have any container structure but you can make some assumptions on incoming extensions.
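The dummy-extension scheme is even simpler to sketch; "endswithdots" is the marker from above, and any suffix that can never occur naturally in your data works the same way:

```go
package main

import (
	"fmt"
	"strings"
)

const marker = "endswithdots"

// encode appends the dummy extension only to conflicting (dot-ending) names.
func encode(name string) string {
	if strings.HasSuffix(name, ".") {
		return name + marker
	}
	return name
}

// decode strips the dummy extension; the preceding dot belongs to the
// original name, so only the marker itself is removed.
func decode(name string) string {
	if strings.HasSuffix(name, "."+marker) {
		return strings.TrimSuffix(name, marker)
	}
	return name
}

func main() {
	fmt.Println(encode("normal.txt"))            // normal.txt
	fmt.Println(encode("double.."))              // double..endswithdots
	fmt.Println(decode("double..endswithdots"))  // double..
}
```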
I would suggest against Base64 or other full-name encoding as it would make file names notably longer and lose any meaningful details the file names may contain.

Copy and transform a file using Node.js

I want to copy some files using Node.js. Basically, this is quite easy, but I have two special requirements I need to fulfill:
I need to parse the file's content and replace some placeholders by actual values.
The file name may include a placeholder as well, and I need to replace this as well with an actual value.
So, while this is basically not a complex task, I guess there are various ways you could solve it. E.g., it would be nice if I could use a template engine to do the replacements, but then I'd need to have the complete file as a string. I'd prefer a stream-based approach, but then, how should I do the replacing?
You see, lots of questions, and I am not able to decide which way to go.
Any hints, ideas, best practices, ...?
Or - is there a module yet that does this task?
You can write your own solution without reading the entire file into memory. fs.readFile() should only be used when you are 100% sure that the files are no larger than a buffer chunk (typically 8 KB or 16 KB).
The simplest solution is to create a readable stream, attach a data event listener, and iterate over the buffer character by character. If you have a placeholder like ${label}: when you find ${, set a flag to true and begin storing the label name. When you find } and the flag is true, you've got the complete label; set the flag back to false and reset the temporary label string to "".
You don't need any template engine or extra module.
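The state machine described above might look like this as a chunk-by-chunk replacer you could call from a data event listener or wrap in a stream.Transform (the factory name and values map are illustrative):

```javascript
// Scan for "${", buffer the label, emit the mapped value at "}".
// State lives in the closure, so placeholders split across chunks still work.
function makeReplacer(values) {
  let inLabel = false;
  let label = "";
  return function replaceChunk(chunk) {
    let out = "";
    for (let i = 0; i < chunk.length; i++) {
      const ch = chunk[i];
      if (!inLabel && ch === "$" && chunk[i + 1] === "{") {
        inLabel = true;
        i++; // skip the "{"
      } else if (inLabel && ch === "}") {
        out += values[label] ?? "";
        inLabel = false;
        label = "";
      } else if (inLabel) {
        label += ch;
      } else {
        out += ch;
      }
    }
    return out;
  };
}

const replace = makeReplacer({ name: "world" });
console.log(replace("Hello ${na") + replace("me}!")); // Hello world!
```

One caveat this sketch ignores: if "${" itself is split between two chunks, the two-character lookahead misses it; handling that needs one character of carry-over between chunks.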
If the whole file can be safely loaded into memory (isn't crazy big), then the library fs-jetpack might be a very good tool for this use case.
const jetpack = require("fs-jetpack");

const src = jetpack.cwd("path/to/source/folder");
const dst = jetpack.cwd("path/to/destination");

src.find({ matching: "*" }).forEach((path) => {
  const content = src.read(path);
  const transformedContent = transformTheFileHoweverYouWant(content);
  const transformedPath = transformThePath(path);
  dst.write(transformedPath, transformedContent);
});
The example code is synchronous, but you can easily write an async equivalent.
