I am trying to maintain a hosts file in a nonstandard location using Puppet's host resource type. Since it is a nonstandard hosts file, its content is not "prefetched" by Puppet, and so it is not possible to do something like purging entries.
To work around the issue, I want to remove the file before Puppet applies any changes to it. However, I don't want to remove the file every time Puppet runs, but only if there is actually something to change. Is there a way to apply configurations for the resource only if something is going to change?
Right now, I define hosts via Hiera and use the create_resources function to produce the desired host resources.
create_resources(host,$host_entries)
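For reference, $host_entries comes from Hiera data shaped roughly like this (the hostnames and addresses are just examples):

```yaml
host_entries:
  'db01.example.com':
    ip: '10.0.0.10'
    target: '/nonstandard/hosts'
  'web01.example.com':
    ip: '10.0.0.20'
    target: '/nonstandard/hosts'
```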
To make sure that there are no other entries, my simplest idea is to ensure the file doesn't exist right before applying the host configuration:
file { '/nonstandard/hosts':
ensure => absent,
}
By doing so, the hosts file will always be removed, even when there is nothing to change, which will be the case in 99 percent of the runs.
So what options do I have to remove the file only when create_resources(host, $host_entries) will actually change something?
Or maybe there is a different, simpler approach?
Is there a way to apply configurations for the resource only if something is going to change?
Not in a general sense. What you could do instead is write a custom fact that provides the list of hostnames defined in your custom hosts file (only the hostnames are needed). Based on the value of that fact and your Hiera data, you can then generate Host resources with ensure => absent for the hosts you have no definitions for. That does, however, assume that all of the hosts that should be listed in the file are known to you from Hiera data.
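A rough sketch of that approach, assuming Puppet 4+ with stdlib and a hypothetical custom fact named nonstandard_hosts that returns the hostnames currently present in the file:

```puppet
# $facts['nonstandard_hosts'] is a hypothetical custom fact returning an
# array of hostnames currently present in /nonstandard/hosts.
$managed = keys($host_entries)
$stale   = $facts['nonstandard_hosts'] - $managed

# Remove the entries we no longer manage...
$stale.each |String $name| {
  host { $name:
    ensure => absent,
    target => '/nonstandard/hosts',
  }
}

# ...and (re)create the ones we do.
create_resources(host, $host_entries)
```

The fact itself would be a few lines of Ruby that read the file and extract the hostname column.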
Related
I am using the ruamel.yaml library to load and process a YAML file.
The YAML file can get updated after I have called
yaml.load(yaml_file_path)
So, I need to call load() on the same YAML file multiple times.
Is there a way/optimization parameter to pass to loader to load only the new entries in the YAML file?
There is no such facility currently built into ruamel.yaml.
If a file consists of multiple YAML documents, you can optimize the loading by splitting the file on the document marker (---). This is fairly trivial, and you can then load a single document from start to finish.
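The splitting step can be sketched with plain string handling; each chunk would then be handed to the YAML loader separately, so unchanged documents can be skipped on reload. Note this naive split does not handle a `---` inside a block scalar:

```python
def split_documents(text):
    """Split a multi-document YAML stream on '---' document markers.

    Each returned string can be loaded independently (e.g. with
    ruamel.yaml's YAML().load), so only changed documents need reloading.
    """
    docs, current = [], []
    for line in text.splitlines(keepends=True):
        if line.rstrip("\n") == "---":
            if current:                      # flush the finished document
                docs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:                              # trailing document without marker
        docs.append("".join(current))
    return docs

print(split_documents("a: 1\n---\nb: 2\n---\nc: 3\n"))
# ['a: 1\n', 'b: 2\n', 'c: 3\n']
```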
If you only want to reload parts of a document things get more difficult. If there are anchors and aliases involved, there is no easy way to do this as you may need a (non-updated) anchor definition in an updated part that needs an alias. If there are no such aliases, and you know the structure of your file, and have a way to determine what got updated, you can do partial loads and update your data structure. You would need to do some parsing of the YAML document, but if you only use a subset of YAML possibilities, this is often possible.
E.g. if you know that you only have simple scalar keys at the root level mapping of a YAML document, you can parse the document and extract non-indented strings that are followed by the value indicator. Any such string that is not in your "old" data structure is a new key and its value should be parsed (i.e. the YAML document content until the next non-indented string).
The above is far less trivial to do for any added data that is not added at the root level (whether mapping or sequence).
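The root-level scan described above can be sketched as follows. This assumes simple scalar keys at the root mapping, no anchors or aliases, and no flow-style root mapping; each returned entry is the raw YAML text you would then feed to the loader:

```python
def is_root_key_line(line):
    """True for non-indented lines that look like a root-level 'key:' entry."""
    if not line or line[0] in " \t#\n":
        return False
    stripped = line.rstrip("\n")
    return ": " in stripped or stripped.endswith(":")

def new_root_entries(text, known_keys):
    """Return {key: raw_yaml_text} for root-level entries not in known_keys.

    Each entry's text runs until the next root-level key, so it can be
    parsed on its own and merged into the previously loaded data.
    """
    lines = text.splitlines(keepends=True)
    starts = [i for i, ln in enumerate(lines) if is_root_key_line(ln)]
    entries = {}
    for n, i in enumerate(starts):
        key = lines[i].split(":", 1)[0]
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        if key not in known_keys:
            entries[key] = "".join(lines[i:end])
    return entries

doc = "a: 1\nb:\n  x: 2\nc: 3\n"
print(new_root_entries(doc, {"a"}))
# {'b': 'b:\n  x: 2\n', 'c': 'c: 3\n'}
```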
Since there is no indication within the YAML specification of the complexity of a YAML document (i.e. whether it includes anchors, aliases, tags, etc.), none of this is easy to build into ruamel.yaml itself.
Without specific information on the format of your YAML document and what can get updated, specific implementation details cannot be given. I assume, however, that you will not update and write out the loaded data; if that is so, make sure to use
yaml = YAML(typ='safe')
when possible as this will get you much faster loading times than the default round-trip loader provides.
I have defined a config file in my puppet manifest and I need to use an .erb template so I can load in dynamic parameters.
The problem, however, is that the application insists on changing a couple of lines in that file before Puppet runs. Such lines cannot easily be discovered and put into the template (for example, a build number that increments). If I tell Puppet to refresh the service when the config file changes, then my service gets restarted on every Puppet run, which isn't good.
Is there any way I could use an .erb template with Puppet but tell it not to care whether specific lines in it change? I'm not sure if this is possible or even if it's going to work, but it would be good to know.
Cheers
You can use either a file_line resource from the stdlib module or an Augeas lens to tell Puppet which lines you want in the config file. Those lines will be managed by Puppet, and the rest of the file will remain unchanged.
If you don't like file_line or Augeas (one is a bit of a hack and the other is difficult to figure out), you might have to create a custom fact to inform the master of the current state of the file. The master could then apply logic to update that content only if necessary.
Granted, that's not much more intuitive or maintainable than the aforementioned methods.
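For example, with file_line from stdlib, a single setting can be pinned while the rest of the file (including the lines the application rewrites) is left alone. The path and setting name here are illustrative:

```puppet
file_line { 'myapp listen port':
  path  => '/etc/myapp/app.conf',      # hypothetical config file
  line  => "listen_port=${listen_port}",
  match => '^listen_port=',            # replace an existing line if present
}
```

Only a change to this one line triggers a refresh, so a service subscribed to the resource is not restarted when the application touches other lines.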
I would like to know how I can load some files in a specific order. For instance, I would like to load my files according to their timestamp, in order to make sure that subsequent data updates are replayed in the proper order.
Let's say I have 2 types of files: deal info files and risk files.
I would like to load T1_Info.csv, then T1_Risk.csv, T2_Info.csv, T2_Risk.csv, and so on.
I have tried to implement a comparator, as described on Confluence, but it seems that the loadInstructions file takes priority: it orders the info files and the risk files independently (loading T1_Info.csv, T2_Info.csv and then T1_Risk.csv, T2_Risk.csv...).
Do I have to implement a custom file loader, or is it possible using an AP configuration?
The loading of the files based on load instructions is done in
com.quartetfs.tech.store.csv.impl.CSVDataModelFactory.load(List<FileLoadDescriptor>). The FileLoadDescriptor list you receive is created directly from the load instructions files.
What you can do is create a simple instructions file with two entries, one for deal info and one for risk. Your custom implementation of CSVDataModelFactory will then be called with a list of two items. In your custom implementation, scan the directory where the files are, sort them in the order you want them to be parsed, and call super.load() with the list of FileLoadDescriptor objects you created from the directory scan.
If you also want to load files that are placed in this folder later, you have to add to your load instructions a line that matches all files; that will make the super.load() implementation create a directory watcher (you should then maybe override createDirectoryWatcher() so that it does not watch the files already present in the folder when load is called).
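Independent of the quartetfs API, the sorting step inside such a custom factory could look like the comparator below. The T&lt;n&gt;_Info.csv / T&lt;n&gt;_Risk.csv naming is taken from the question; the class is only a sketch of the ordering logic:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TimestampOrder {

    /** Orders by the numeric T<n> prefix first, then Info before Risk. */
    static final Comparator<String> BY_TIMESTAMP_THEN_TYPE =
        Comparator.<String>comparingInt(TimestampOrder::timestamp)
                  .thenComparingInt(name -> name.contains("_Risk") ? 1 : 0);

    /** "T12_Info.csv" -> 12; assumes the T<number>_ prefix from the question. */
    static int timestamp(String name) {
        return Integer.parseInt(name.substring(1, name.indexOf('_')));
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>(List.of(
            "T2_Risk.csv", "T1_Risk.csv", "T2_Info.csv", "T1_Info.csv"));
        files.sort(BY_TIMESTAMP_THEN_TYPE);
        System.out.println(files);
        // [T1_Info.csv, T1_Risk.csv, T2_Info.csv, T2_Risk.csv]
    }
}
```

In the real factory you would apply this comparator to the file names found by the directory scan before building the FileLoadDescriptor list.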
How can I store elasticsearch settings+mappings in one file (like schema.xml for Solr)? Currently, when I want to make a change to my mapping, I have to delete my index settings and start again. Am I missing something?
I don't have a large data set as of now. But in preparation for a large amount of data that will be indexed, I'd like to be able to modify the settings and somehow reindex without starting completely fresh each time. Is this possible and if so, how?
These are really multiple questions disguised as one. Nevertheless:
How can I store elasticsearch settings+mappings in one file (like schema.xml for Solr)?
First, note that you don't have to specify mappings for many types, such as dates, integers, or even strings (when the default analyzer is OK for you).
You can store settings and mappings in various ways in Elasticsearch < 1.7:
In the main elasticsearch.yml file
In an index template file
In a separate file with mappings
Currently, when I want to make a change to my mapping, I have to delete my index settings and start again. Am I missing something?
You have to reindex data when you change the mapping for an existing field. Once your documents are indexed, the engine needs to reindex them to use the new mapping.
Note that you can update some index settings, such as number_of_replicas, "on the fly".
I'd like to be able to modify the settings and some how reindex without starting completely fresh each time. Is this possible and if so, how?
As said: you must reindex your documents, if you want to use a completely new mapping for them.
If you are adding a mapping rather than changing an existing one, you can update the mappings, and new documents will pick them up as they are indexed.
Since Elasticsearch 2.0:
It is no longer possible to specify mappings in files in the config directory.
Find the documentation link here.
It's also no longer possible to store index templates within the config location (path.conf) under the templates directory.
The path.conf location (/etc/default/elasticsearch by default on Ubuntu) now stores only environment variables, including heap size and file descriptors.
You need to create your templates with curl.
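For example (the template name, index pattern, and mapping body are illustrative, and the exact JSON shape depends on your Elasticsearch version; this is the 2.x style):

```sh
curl -XPUT 'http://localhost:9200/_template/logs_template' -d '
{
  "template": "logs-*",
  "settings": { "number_of_shards": 1 },
  "mappings": {
    "log": {
      "properties": {
        "timestamp": { "type": "date" }
      }
    }
  }
}'
```

Any index whose name matches the pattern then picks up these settings and mappings at creation time.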
If you are really desperate, you could create your indexes, back up your data directory, and then use that as your "template" for new Elasticsearch clusters.
So I'm new to XUL.
As a language it seems easy enough, and I'm already pretty handy with JavaScript, but the thing I can't wrap my mind around is the way you access resources from manifest files or from XUL files. So I did the 'Getting started with XULRunner' tutorial... https://developer.mozilla.org/en/getting_started_with_xulrunner
and I'm more confused than ever... so I'm hoping someone can set me straight.
Here is why... (you may want to open the tutorial for this).
The manifest file, the prefs.js, and the XUL file all refer to a package called 'myapp', which, if everything I've read thus far on MDN can be trusted, means that inside the chrome directory there must be either a JAR file or a directory called myapp, but there is neither. The root directory of the whole app is called myapp, but I called mine something completely different and it still worked.
When I placed the content folder inside another folder called 'foo' and changed all references from 'myapp' to 'foo' (thus, I thought, creating a 'foo' package), a popup informed me that it couldn't find 'chrome://foo/content/main.xul', though that's exactly where it was.
Also, the XUL file links to a stylesheet inside 'chrome://global/skin/', which doesn't exist. Yet something is overriding any inline styling I try to apply to the button. And when I create a CSS file and point the URL to it, the program doesn't even run.
Can someone please explain what strange magic is going on here... I'm very confused.
When you register a content folder in a chrome.manifest you must use the following format:
content packagename uri/to/files/ [flags]
The uri/to/files/ may be absolute or relative to the location of the manifest. That is, the name of the containing folder does not have to match your package name; the point is to tell chrome how to resolve URIs of the following form:
chrome://packagename/content/...
The packagename simply creates a mapping to the location of the files on disk (wherever that may be).
The chrome protocol defines a logical package structure; it simply maps one URL to another. The structure on disk might be entirely different, and the files might not even be located on disk. When the protocol handler encounters an address like chrome://foo/content/main.xul, it checks: "Do we have a manifest entry somewhere that defines the content mapping for package foo?" If it then finds content foo file:///something/, it doesn't care whether that URL refers to a file; it simply resolves main.xul relative to file:///something/, which results in file:///something/main.xul. That will be the URL from which the data is read in the end. But you could also map a chrome package to another chrome URL, a jar URL, or something else (theoretically you could even use http, but that is forbidden for security reasons).
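As a concrete illustration (the paths are hypothetical):

```
# chrome.manifest
content foo file:///opt/myapp/chrome/content/

# With that entry in place:
#   chrome://foo/content/main.xul  ->  file:///opt/myapp/chrome/content/main.xul
```

The folder could be named anything; only the package name in the manifest entry has to match the chrome:// URL.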
If you look into the Firefox/XULRunner directory, you will see another chrome.manifest there (in Firefox 4/5 it is located inside the omni.jar file). That's where the mappings for the global package are defined, for example.