yaml library which supports writing multiple documents - node.js

I have a small Node.js app which generates two YAML files.
I want to merge them into one file, so that I have a single file with two document nodes. Something like:
---
yaml1
---
yaml2
I tried using the npm package yaml but to no avail.
Browsing through the docs of js-yaml, I cannot find how to achieve this.
Any help is appreciated.

YAML has been designed so that it is easy to merge multiple documents in a stream. Quoting the spec:
Concatenating two YAML streams requires both to use the same character encoding. In addition, it is necessary to separate the last document of the first stream and the first document of the second stream. This is easily ensured by inserting a document end marker between the two streams. Note that this is safe regardless of the content of either stream. In particular, either or both may be empty, and the first stream may or may not already contain such a marker.
The document end marker is ... (followed by a newline). Joining the contents of both files with this marker between them will do the trick. This works because YAML allows a document to be ended by multiple document end markers. The directives end marker (---) you use, on the other hand, always starts a document, so it is not safe to join the documents with it: the second document may already start with one, which would create an empty document in between.
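In Node.js, for example, this can be as simple as concatenating the two generated files with the marker in between; no YAML library is needed. A sketch (the file names are placeholders):

import { readFileSync, writeFileSync } from "node:fs";

// Read both generated YAML files.
const first = readFileSync("first.yaml", "utf8");
const second = readFileSync("second.yaml", "utf8");

// '...' ends the first document regardless of its content, so this
// is safe even if second.yaml already starts with '---'.
const separator = first.endsWith("\n") ? "...\n" : "\n...\n";
writeFileSync("merged.yaml", first + separator + second);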

Related

How to load only changed portion of YAML file in Ruamel

I am using the ruamel.yaml library to load and process a YAML file.
The YAML file can get updated after I have called
yaml.load(yaml_file_path)
So, I need to call load() on the same YAML file multiple times.
Is there a way/optimization parameter to pass to loader to load only the new entries in the YAML file?
There is no such facility currently built into ruamel.yaml.
If a file consists of multiple YAML documents, you can optimize loading by splitting the file on the document marker (---). This is fairly trivial, and you can then load a single document from start to finish.
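A minimal sketch of that idea, assuming the marker never occurs inside a block scalar, and caching each document by a hash of its text so that only changed documents are reparsed:

import hashlib
from ruamel.yaml import YAML

yaml = YAML(typ='safe')
_cache = {}  # SHA-256 of a document's text -> its parsed data

def split_on_marker(text):
    # Split a multi-document stream on lines starting with '---'.
    docs, chunk = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith('---') and chunk:
            docs.append(''.join(chunk))
            chunk = []
        chunk.append(line)
    if chunk:
        docs.append(''.join(chunk))
    return docs

def load_changed(path):
    # Reparse only documents whose text changed since the last call;
    # unchanged documents come straight from the cache.
    with open(path) as fp:
        text = fp.read()
    result = []
    for doc in split_on_marker(text):
        key = hashlib.sha256(doc.encode()).hexdigest()
        if key not in _cache:
            _cache[key] = yaml.load(doc)
        result.append(_cache[key])
    return result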
If you only want to reload parts of a document, things get more difficult. If anchors and aliases are involved, there is no easy way to do this, as an updated part may contain an alias that needs a (non-updated) anchor definition from elsewhere in the document. If there are no such aliases, and you know the structure of your file and have a way to determine what got updated, you can do partial loads and update your data structure. You would need to do some parsing of the YAML document yourself, but if you only use a subset of YAML's possibilities, this is often possible.
E.g. if you know that you only have simple scalar keys in the root-level mapping of a YAML document, you can parse the document and extract the non-indented strings that are followed by the value indicator (:). Any such string that is not in your "old" data structure is a new key, and its value should be parsed (i.e. the YAML document content until the next non-indented string); a sketch of this follows below.
The above is far less trivial to do for data that is added anywhere other than at the root level (whether in a mapping or a sequence).
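A sketch of the root-level-keys approach from the previous two paragraphs (it assumes a root-level mapping with simple plain-scalar keys containing no colon, and no document markers in the file):

import re

# A non-indented plain-scalar key followed by the value indicator ':'.
ROOT_KEY = re.compile(r'^(?P<key>[^\s#-][^:\n]*):', re.MULTILINE)

def new_root_sections(text, old_data):
    # Return {key: yaml_text} for every root-level key that is not
    # already present in the previously loaded mapping old_data.
    matches = list(ROOT_KEY.finditer(text))
    new = {}
    for i, m in enumerate(matches):
        if m.group('key') in old_data:
            continue
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        new[m.group('key')] = text[m.start():end]
    return new

Each returned chunk can then be fed to yaml.load() on its own and merged into the existing data structure.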
Since the YAML specification provides no way to indicate the complexity of a document (i.e. whether it includes anchors, aliases, tags, etc.), none of this is easy to build into ruamel.yaml itself.
Without specific information on the format of your YAML document and on what can get updated, specific implementation details cannot be given. I assume, however, that you will not update and write out the loaded data; if that is so, make sure to use
yaml = YAML(typ='safe')
whenever possible, as this will get you much faster loading times than the default round-trip loader provides.

How to convert CSV to JSON using template via Azure Logic App

Is it possible to convert CSV to JSON in an Azure Logic App using a built-in/managed/third-party template, without using an Azure Function?
The posts below use an Azure Function, which is generated automatically; however, I cannot find anything like the template they mention. Ideally, no Azure Function would be required.
http://blogs.recneps.org/post/Processing-a-flat-file-with-Azure-Logic-Apps
https://social.msdn.microsoft.com/Forums/en-US/e0ea1adc-1979-44df-a4d1-52290338bc78/transform-csv-in-logic-app?forum=azurelogicapps
The page below offers no CSV-to-JSON transform:
https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-enterprise-integration-liquid-transform
I will admit this is not my proudest work, but it seems to work fairly well. I was able to turn a CSV file from my OneDrive into JSON objects.
// Updated with fewer variables and fewer split, set, and replace actions
[Screenshots: the input CSV and the resulting output (second object shown); the first object and the last one need to be purged.]
How? With a lot of steps that could probably be removed or merged. Using split and replace actions, I could single out each line and, step by step, build a JSON object. I was going for an array at first, but eventually it was not that hard to make it into a JSON object. I'm not entirely sure how it works with null values.
This is probably not the best way to handle this. The drawbacks are that it takes a lot of actions, the first object is the headers and needs to be removed, and there will also be a very last object that is just null (which is fine).
[Screenshot: the entire Logic App schema, with concurrency set to 1.]
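For reference, the same header-row-plus-split idea expressed in TypeScript rather than Logic App actions (a sketch that assumes a simple CSV with no quoted fields or embedded commas):

function csvToJson(csv: string): Record<string, string>[] {
  // Split into lines, dropping empty ones (this also drops the
  // trailing null object mentioned above).
  const lines = csv.split(/\r?\n/).filter((l) => l.length > 0);
  // The first line holds the column names, i.e. the "first object".
  const headers = lines[0].split(",");
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row: Record<string, string> = {};
    headers.forEach((h, i) => (row[h] = cells[i] ?? ""));
    return row;
  });
}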

Using a list for a feature in an ML model

I want to run a machine learning algorithm on some data, so I'm exporting the data into a file first.
But one of my features for the text I'm classifying is a list of tags, and each text can have multiple tags, e.g. ["mystery", "thriller"].
Is it recommended that, when I write to my CSV file to export the data, I write that entire list as one of the features for my data (the "tags" feature)?
Or is it better to make a separate feature for each tag? The only problem then is that most examples will only have one tag, so the other feature columns for those will be blank.
So it seems like writing this list of tags as one feature makes the most sense, but then, when parsing it for training, would I still treat every element of that list as its own feature or not?
If you do it as a single feature, just make sure to use some delimiter to separate the tags that won't occur in any of the tags and also isn't a comma (as that will mess with the CSV format); something like | would probably do fine. When you go to build your models and read in that list of tags, you can then split on that delimiter. In Java this would look like:
String[] tagList = inputString.split("\\|"); // split() takes a regex, so a literal | must be escaped
I'm sure most languages will have a similar method to do this.

How to parse single file for different outputs

Does somebody know how to parse each line from a file and send it to different outputs? For example: the input is a log file, and the outputs are Elasticsearch indices with different templates. I need to parse every line and save it into the first index, and some of the lines, which have a promo code (like ?promo=wteaewfsthser), I need to put into another index as well. I think it's possible with two Logstash instances (correct me if I'm wrong, please), but I want to know whether it is possible with a single Logstash instance and one configuration file.
Thanks,
Igor
Sounds like you're looking for clone. Note that only the filters that are present after the clone{} will be run on the cloned event.
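A minimal pipeline sketch of that approach (field names, patterns, and index names are made up; the clone filter sets the cloned event's type to the clone name):

filter {
  # Clone only the lines that carry a promo code; the original event
  # passes through unchanged.
  if [message] =~ /promo=/ {
    clone { clones => ["promo"] }
  }
}

output {
  if [type] == "promo" {
    elasticsearch { index => "promo-%{+YYYY.MM.dd}" }
  } else {
    elasticsearch { index => "logs-%{+YYYY.MM.dd}" }
  }
}

With a single instance and one configuration file, every line ends up in the main index, and the promo lines additionally end up in the promo index.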

GridFS: Clean out all unreferenced files

I have just moved towards storing things in GridFS in MongoDB. During testing, I noticed many files are being created but not deleted properly. I have a collection users, which has a field avatar; it contains the ObjectId of the file.
Now I'd like to have some command I could use to remove all the files and chunks that are not referenced there. Is it possible to do that with one query? Maybe 'map-reduce'?
Also, I am not sure how to properly delete GridFS files in node-mongodb-native.
"Now I'd like to have some command I could use to remove all the files and chunks that are not referenced there."
The key term here is "referenced". MongoDB does not have joins, and therefore it has no concept of "references".
Maybe 'map-reduce'?
Map/Reduce is a query tool, not a data modification tool. The same is true of the newer Aggregation Framework.
What you will have to do is loop through your files and check the references for each one individually. You will then be able to delete those files.
Take a look at some documented examples on how to issue those deletions.
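A sketch using the current node-mongodb-native driver's GridFSBucket (the connection string and database name are placeholders; users/avatar are the names from the question):

import { MongoClient, GridFSBucket } from "mongodb";

async function removeOrphanedFiles(): Promise<void> {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  try {
    const db = client.db("mydb");
    const bucket = new GridFSBucket(db);

    // All file ids currently referenced by a user document.
    const referenced = await db.collection("users").distinct("avatar");

    // Loop over the files collection and delete unreferenced files;
    // bucket.delete() removes the file document and all its chunks.
    for await (const file of bucket.find({ _id: { $nin: referenced } })) {
      await bucket.delete(file._id);
    }
  } finally {
    await client.close();
  }
}

Beware of races: a file that has been uploaded but not yet referenced would look orphaned, so in production you would only delete files older than some grace period.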
