Copy and transform a file using Node.js

I want to copy some files using Node.js. Basically, this is quite easy, but I have two special requirements I need to fulfill:
I need to parse the file's content and replace some placeholders with actual values.
The file name may contain a placeholder as well, which I also need to replace with an actual value.
So while this is basically not a complex task, I guess there are various ways to solve it. E.g., it would be nice if I could use a template engine to do the replacements, but then I'd need the complete file as a string. I'd prefer a stream-based approach, but then how should I do the replacing?
You see, lots of questions, and I am not able to decide which way to go.
Any hints, ideas, best practices, ...?
Or is there already a module that does this task?

You can write your own solution without reading the entire file. fs.readFile() should only be used when you are 100% sure that the files are no longer than a buffer chunk (typically 8KB or 16KB).
The simplest solution is to create a readable stream, attach a data event listener, and iterate over the buffer, reading character by character. Say you have a placeholder like ${label}: when you find ${, set a flag to true and begin storing the label name. When you find } and the flag is true, you've finished reading the label; set the flag back to false and reset the temporary label string to "".
You don't need any template engine or extra module.
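For illustration, here is a minimal sketch of that approach, assuming placeholders of the form ${label} and a caller-supplied values object (both names are my assumptions, not an existing module):

// A minimal sketch of the character-by-character approach described above.
const fs = require("fs");

function copyWithPlaceholders(srcPath, dstPath, values) {
  const input = fs.createReadStream(srcPath, { encoding: "utf8" });
  const output = fs.createWriteStream(dstPath);

  let inPlaceholder = false; // the "flag" from the description
  let label = "";            // the temporary label string
  let prev = "";             // hold one char back to detect "${"

  input.on("data", (chunk) => {
    for (const ch of chunk) {
      if (inPlaceholder) {
        if (ch === "}") {          // placeholder finished
          output.write(values[label] ?? "");
          inPlaceholder = false;
          label = "";
        } else {
          label += ch;             // keep collecting the label name
        }
      } else if (prev === "$" && ch === "{") {
        inPlaceholder = true;      // "${" found, start collecting
        prev = "";
      } else {
        if (prev) output.write(prev);
        prev = ch;
      }
    }
  });

  input.on("end", () => {
    if (prev) output.write(prev); // flush the held-back character
    output.end();
  });
}

The same values lookup can be applied to the destination file name before opening the write stream, which covers the second requirement as well.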

If the whole file can be safely loaded into memory (isn't crazy big), then the library fs-jetpack might be a very good tool for this use case.
const jetpack = require("fs-jetpack");

const src = jetpack.cwd("path/to/source/folder");
const dst = jetpack.cwd("path/to/destination");

src.find({ matching: "*" }).forEach((path) => {
  const content = src.read(path);
  const transformedContent = transformTheFileHoweverYouWant(content);
  const transformedPath = transformThePath(path);
  dst.write(transformedPath, transformedContent);
});
The example code is synchronous, but you can easily make an async equivalent.
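For reference, an async equivalent might look roughly like this (a sketch using fs-jetpack's promise-returning *Async variants of the same methods; the transform functions are the same placeholders as above):

const jetpack = require("fs-jetpack");

const src = jetpack.cwd("path/to/source/folder");
const dst = jetpack.cwd("path/to/destination");

src.findAsync({ matching: "*" }).then((paths) =>
  Promise.all(paths.map(async (path) => {
    const content = await src.readAsync(path);
    // Same hypothetical transform functions as in the synchronous example.
    await dst.writeAsync(transformThePath(path), transformTheFileHoweverYouWant(content));
  }))
);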


How to encode blob names that end with a period?

Azure docs:
Avoid blob names that end with a dot (.), a forward slash (/), or a
sequence or combination of the two.
I cannot avoid such names due to legacy S3 compatibility, so I must encode them.
How should I encode such names?
I don't want to use Base64, since that will make it very hard to debug when looking in Azure's blob console.
Go has https://golang.org/pkg/net/url/#QueryEscape but it has this limitation:
Go's implementation of url.QueryEscape (specifically, the
shouldEscape private function) escapes all characters except the
following: alphabetic characters, decimal digits, '-', '_', '.', '~'.
I don't think there's any universal solution to handle this outside your application scope. Within your application scope, you can do ANY encoding, so it comes down to personal preference how you like your data to be laid out. There is no "right" way to do this.
Regardless, I believe you should go for these properties:
Conversion MUST be bidirectional and without conflicts in your expected file name space
DO keep file names without ending dots unencoded
With dot-ending files, DO encode just the conflicting dots, keeping the original name readable.
This would keep most (the non-conflicting) file names short, with their original, intuitive, hopefully meaningful names; and should you ever be able to rename or phase out the conflicting files, you can just remove the conversion logic without restructuring all stored data and their URLs.
I'll suggest two approaches. Let's say you have these files:
/someParent/normal.txt
/someParent/extensionless
/someParent/single.
/someParent/double..
Use special subcontainers
You could remove N dots from the end of the filename and translate them into a subcontainer name: "dot", "dotdot", etc.
The resulting URLs would look like:
/someParent/normal.txt
/someParent/extensionless
/someParent/dot/single
/someParent/dotdot/double
When reading, you can remove the "dot"*N folder level and append N dots back to the file name.
Obviously this assumes you don't ever need to have such "dot" folders as data themselves.
This is preferred if stored files can come in with any extension but you can make some assumptions on folder structure.
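A minimal sketch of this scheme in JavaScript (the helper names are hypothetical; it relies on the assumption above that "dot" folders never occur naturally):

// Encode: move N trailing dots into a "dot" * N subcontainer segment.
function encodeDotName(path) {
  const slash = path.lastIndexOf("/");
  const dir = path.slice(0, slash + 1);
  let name = path.slice(slash + 1);
  let n = 0;
  while (name.endsWith(".")) { name = name.slice(0, -1); n++; }
  return n === 0 ? path : dir + "dot".repeat(n) + "/" + name;
}

// Decode: strip the "dot" * N segment and restore the N trailing dots.
function decodeDotName(path) {
  const parts = path.split("/");
  const seg = parts[parts.length - 2];
  if (!seg || !/^(dot)+$/.test(seg)) return path;
  parts.splice(parts.length - 2, 1);
  return parts.join("/") + ".".repeat(seg.length / 3);
}

// encodeDotName("/someParent/double..") === "/someParent/dotdot/double"
// decodeDotName("/someParent/dotdot/double") === "/someParent/double.."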
Use discardable artificial extension
Since the conflict is at the end, you could just append a never-used dummy extension to such files. For example "endswithdots", but you could choose something more suitable depending on what the expected extensions are:
/someParent/normal.txt
/someParent/extensionless
/someParent/single.endswithdots
/someParent/double..endswithdots
On reading, if the file extension is "endswithdots", remove that part from the end of the filename.
This is preferred if your data could have any container structure but you can make some assumptions on incoming extensions.
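Again as a hypothetical JavaScript sketch (assuming, per the note above, that "endswithdots" never appears as a real extension):

function encodeTrailingDots(path) {
  // Only names that actually end with a dot get the dummy extension.
  return path.endsWith(".") ? path + "endswithdots" : path;
}

function decodeTrailingDots(path) {
  return path.endsWith(".endswithdots")
    ? path.slice(0, -"endswithdots".length)
    : path;
}

// encodeTrailingDots("/someParent/double..") === "/someParent/double..endswithdots"
// decodeTrailingDots("/someParent/double..endswithdots") === "/someParent/double.."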
I would suggest against Base64 or other full-name encodings, as they would make file names notably longer and lose any meaningful details the names may contain.

Converting an ASTNode into code

How does one convert an ASTNode (or at least a CompilationUnit) into a valid piece of source code?
The documentation says that one shouldn't use toString, but doesn't mention any alternatives:
Returns a string representation of this node suitable for debugging purposes only.
CompilationUnits have rewrite, but that one does not work for ASTs created by hand.
Formatting options would be nice to have, but I'd basically be satisfied with anything that turns arbitrary ASTNodes into semantically equivalent source code.
In JDT the normal way for AST manipulation is to start with a basic CompilationUnit and then use a rewriter to add content. Then ASTRewriteAnalyzer / ASTRewriteFormatter should take care of creating formatted source code. Creating a CU just containing a stub type declaration shouldn't be hard, so that's one option.
If that doesn't suit your needs, you may want to experiment with directly calling the internal org.eclipse.jdt.internal.core.dom.rewrite.ASTRewriteFlattener.asString(ASTNode, RewriteEventStore). If you're not editing existing files, you can probably ignore the events collected in the RewriteEventStore and just use the returned String.

Capybara: Should I get rid of extracted constants or keep them?

I was wondering about some best practices regarding extraction of selectors to constants. As a general rule, it is usually recommended to extract magic numbers and string literals to constants so they can be reused, but I am not sure if this is really a good approach when dealing with selectors in Capybara.
At the moment, I have a file called "selectors.rb" which contains the selectors that I use. Here is part of it:
SELECTORS = {
  checkout: {
    checkbox_agreement: 'input#agreement-1',
    input_billing_city: 'input#billing\:city',
    input_billing_company: 'input#billing\:company',
    input_billing_country: 'input#billing\:country_id',
    input_billing_firstname: 'input#billing\:firstname',
    input_billing_lastname: 'input#billing\:lastname',
    input_billing_postcode: 'input#billing\:postcode',
    input_billing_region: 'input#billing\:region_id',
    input_billing_street1: 'input#billing\:street1',
    ....
  }
}
In theory, I put my selectors in this file, and then I could do something like this:
find(SELECTORS[:checkout][:input_billing_city]).click
There are several problems with this:
If I want to know the selector that is used, I have to look it up
If I change the name in selectors.rb, I could forget to change it somewhere else in the file, which will result in find(nil).click
With the example above, I can't use this selector with fill_in(SELECTORS[:checkout][:input_billing_city]), because it requires an ID, name or label
There are probably a few more problems with this, so I am considering getting rid of the constants. Has anyone been in a similar spot? What is a good way to deal with this situation?
Someone mentioned the SitePrism gem to me: https://github.com/natritmeyer/site_prism
A Page Object Model DSL for Capybara
SitePrism gives you a simple, clean and semantic DSL for describing
your site using the Page Object Model pattern, for use with Capybara
in automated acceptance testing.
It is very helpful in that regard and I have adjusted my code accordingly.

automapper - simplest option to only write to destination property if the source property is different?

NOTE: The scenario is using two Entity Framework models to sync data between two databases, but I'd imagine this is applicable to other scenarios. One could try tackling this on the EF side as well (like in this SO question), but I wanted to see if AutoMapper could handle it out of the box.
I'm trying to figure out if AutoMapper can (easily :) compare the source and dest values (when using it to sync to an existing object) and do the copy only if the values are different (based on Equals by default, potentially passing in a Func, like if I decided to do String.Equals with StringComparison.OrdinalIgnoreCase for some particular pair of values). At least for my scenario, I'm fine if it's restricted to just the TSource == TDest case (I'll be syncing over int's, string's, etc, so I don't think I'll need any type converters involved)
Looking through the samples and tests, the closest thing seems to be conditional mapping (src\UnitTests\ConditionalMapping.cs), and I would use the Condition overload that takes the Func (since the other overload isn't sufficient, as we need the dest information too). That certainly looks on the surface like it would work fine (I haven't actually used it yet), but I would end up with specifying this for every member (although I'm guessing I could define a small number of actions/methods and at least reuse them instead of having N different lambdas).
Is this the simplest available route (outside of changing AutoMapper) for getting a 'only copy if source and dest values are different' or is there another way I'm not seeing? If it is the simplest route, has this already been done before elsewhere? It certainly feels like I'm likely reinventing a wheel here. :)
Chuck Norris (formerly known as Omu? :) already answered this, but via comments, so just answering and accepting to repeat what he said.
@James Manning you would have to inherit ConventionInjection, override
the Match method and write there: return c.SourceProp.Name ==
c.TargetProp.Name && c.SourceProp.Value != c.TargetProp.Value, and
afterwards use it: target.InjectFrom(source);
In my particular case, since I had a couple of other needs for it anyway, I just customized the EF4 code generation to include the check for whether the new value is the same as the current value (for scalars) which takes care of the issue with doing a 'conditional' copy - now I can use Automapper or ValueInject or whatever as-is. :)
For anyone interested in the change: when you get the default *.tt file, the simplest way to make this change (at least that I could tell) was to find the 2 lines like:
if (ef.IsKey(primitiveProperty))
and change both to be something like:
if (ef.IsKey(primitiveProperty) || true) // we always want the setter to include checking for the target value already being set

How to make this Groovy string search code more efficient?

I'm using the following groovy code to search a file for a string, an account number. The file I'm reading is about 30MB and contains 80,000-120,000 lines. Is there a more efficient way to find a record in a file that contains the given AcctNum? I'm a novice, so I don't know which area to investigate, the toList() or the for-loop. Thanks!
AcctNum = '1234567890'  // must be a String for contains() to work
if (testfile.exists())
{
    lines = testfile.readLines()
    words = lines.toList()
    for (word in words)
    {
        if (word.contains(AcctNum)) { done = true; match = 'YES'; break }
        chunks += 1
        if (done) { break }
    }
}
Sad to say, I don't even have Groovy installed on my current laptop - but I wouldn't expect you to have to call toList() at all. I'd also hope you could express the condition in a closure, but I'll have to refer to Groovy in Action to check...
Having said that, do you really need it split into lines? Could you just read the whole thing using getText() and then just use a single call to contains()?
EDIT: Okay, if you need to find the actual line containing the record, you do need to call readLines() but I don't think you need to call toList() afterwards. You should be able to just use:
for (line in lines)
{
    if (line.contains(AcctNum))
    {
        // Grab the results you need here
        break;
    }
}
When you say efficient, you usually have to decide which direction you mean: whether it should run quickly, or use as few resources (memory, ...) as possible. Often both lie on opposite sides and you have to pick a trade-off.
If you want the search to be memory-friendly, I'd suggest reading the file line by line instead of all at once, which I suspect readLines does (I may be wrong there, but in other languages something like readLines reads the whole file into an array of strings).
If you want it to run quickly I'd suggest, as already mentioned, reading in the whole file at once and looking for the given pattern. Instead of just checking with contains you could use indexOf to get the position and then read the record as needed from that position.
I should have explained it better: if I find a record with the AcctNum, I extract other information from the record... so I thought I needed to split the file into multiple lines.
If you control the format of the file you are reading, the solution is to add an index.
In fact, this is how databases are able to locate records so quickly.
But for 30 MB of data, I think a modern computer with a decent hard drive should do the trick, instead of overcomplicating the program.
