Why would one use "resource collectors"? - puppet

I was looking at the OpenStack modules on Puppet Forge. These modules make use of "resource collectors" so I was reading up on "resource collectors" here: https://docs.puppet.com/puppet/latest/reference/lang_collectors.html
I still cannot figure out why one would need to use a resource collector.
Here's an example where the OpenStack/puppet-keystone module uses a resource collector:
if !is_service_default($memcache_servers) or !is_service_default($cache_memcache_servers) {
Service<| title == 'memcached' |> -> Anchor['keystone::service::begin']
}
I guess this would do resource ordering, causing the memcached service resource to be applied before the keystone::service::begin anchor. I don't really know what an Anchor is; I am guessing it's used for resource ordering?

Resource collectors have several uses:
They realize virtual resources, or they collect exported ones, depending on the form of the collector. The realize() function for virtual resources plays in this space too, but there is no way other than collectors to collect exported resources. See also below.
They can be used in chain expressions, such as your example demonstrates, to set ordering constraints.
They can be used to override resource parameters.
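For instance, a minimal sketch of the first and third uses (all resource names and tags here are hypothetical):

```puppet
# A virtual resource is not managed until something realizes it;
# this collector realizes every virtual User tagged 'admins':
@user { 'alice':
  ensure => present,
  tag    => 'admins',
}
User <| tag == 'admins' |>

# A collector can also override parameters on every matching
# resource, wherever in the manifest set it was declared:
File <| tag == 'webcontent' |> { owner => 'www-data' }
```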
In each of these uses, collectors have properties that at times make them particularly convenient, among them:
Collectors operate over all matching resources in the catalog, including any that have not yet been declared at the time the collector expression is evaluated, regardless of the locus or scope of the declarations.
Collectors support filter predicates (e.g. title == 'memcached') that help fine tune which resources are collected. This can be particularly useful when combined with tags.
Collectors can collect zero resources, and that's ok.
The last seems to be the point of the specific example you presented: since there can be at most one Service having title == 'memcached', the overall expression causes that service to be synced before Anchor['keystone::service::begin'] if it is included in the catalog, but it has no effect if no such service is declared, independent of manifest parse order. I don't think there is any other parse-order-independent way to accomplish this.

Related

In Bitbucket Pipelines, can manual pipelines present descriptions in the web interface?

In Bitbucket Pipelines, for manually run pipelines (i.e. "custom pipelines") where you use the web UI to set variables, is it possible to insert any documentation? For example, so that the UI presents a description above or alongside the input form for a variable? (Or are you limited only to being able to name the pipeline and optionally give the variables each a default value and a set of allowed values?)
I don't want other users of the same pipeline (from the web UI) to misinterpret what a keyword expects, or indeed what the pipeline will do, and I doubt they will always refer to the source code itself to find comments.
Not that I know of.
The documentation only describes the name, default, and allowed-values attributes: https://support.atlassian.com/bitbucket-cloud/docs/configure-bitbucket-pipelinesyml/#variables
But I think the crux of your issue boils down to a general programming problem: adequately naming variables. Using comments to make up for bad variable names is among the top ten programming bad practices.
Variable names should always be unambiguous and self-explanatory for anyone reading them.
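For reference, a minimal sketch of what the schema does let you declare (the pipeline and variable names here are made up; the pipeline name itself is the only label the UI shows):

```yaml
pipelines:
  custom:
    deploy-to-environment:
      - variables:
          - name: TARGET_ENV
            default: staging
            allowed-values:
              - staging
              - production
      - step:
          script:
            - echo "Deploying to $TARGET_ENV"
```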

Terraform plan --refresh-only return False Drift with empty tags in deployed resources

While using terraform plan --refresh-only to detect drifted resources, the command returns a plan in which one or more resources have changed externally. All of them mark the tags attribute with an empty map ({}):
# aws_iam_role.lalala_poweruser has changed
~ resource "aws_iam_role" "lalala_poweruser" {
      id   = "lalala_poweruser"
      name = "lalala_poweruser"
    + tags = {}
      # (11 unchanged attributes hidden)
  }
To emphasize: the + tags = {} line shouldn't be marked with a + symbol, as no one changed this property in the deployed infrastructure.
What I've tried so far:
Adding the tags attribute with an empty map ({}) to each resource
Adding the tags attribute with real tags to each resource
Adding default_tags to the provider configuration
The endgame of detecting drift using this command is to programmatically detect any change in a CI/CD pipeline before deploying new changes. If there is a real drift, we want to know about it, and we want to stop the deployment. Also, something helpful about this command is that when a real drift is detected, it returns the diff, so we can log the information and take action.
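That gating step can be sketched in a generic YAML CI config (the job name is hypothetical). terraform plan -detailed-exitcode exits 0 when there are no changes, 1 on error, and 2 when the plan contains changes, so detected drift fails the job:

```yaml
drift-check:
  script:
    - terraform init -input=false
    # exit code 2 (drift detected) fails this step and stops the deployment
    - terraform plan -refresh-only -detailed-exitcode
```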
More info about this process from the Terraform docs:
Manage Resource Drift
In Terraform's architecture, one of the responsibilities for a provider is to read the current state of each remote object already under management, to compare it with the result of the most recent apply, and produce a new object which reconciles the two.
Part of that responsibility is to deal with situations where values might appear different in Terraform but are effectively the same as far as the remote system is concerned. For each detected change, the provider must broadly place it into one of the following two categories:
"Drift": the remote system state is different from the previous run state in a way that is functionally significant.
"Normalization": the remote system returned some data in a slightly different shape than was previously returned, but it still means the same thing. Some typical examples of this are: case-insensitive strings where the remote API always returns them in lowercase, or JSON strings where the remote API always returns them minified regardless of the input.
In the case you've shown here, it seems like this provider is not making a correct classification: there is no functional difference between tags = {} and omitting that argument altogether, and so the provider should've classified this as normalization and preserved the fact that you didn't set it, rather than reporting this as drift.
You mentioned that you've already tried setting tags = {} explicitly in your configuration but that didn't fix the problem. Unfortunately that suggests a second opposite bug in the provider: during its create/update step it seems to be normalizing an empty map to be the same as null and so the provider disagrees with itself about what is the normalized form for this argument. If that is true then I don't think there will be anything you can do within your Terraform module to avoid the problem; the only recourse will be to fix the bug in the provider codebase.
The AWS provider team is tracking various bugs of this type. That issue says that provider release v3.70.0 fixed a number of these bugs and so I would suggest making sure you're running at least that version as a first step. However there is a comment describing a similar problem with aws_iam_policy, and these two resource types are closely related and so probably share the same bugs in this regard. If you are already running the latest version of the provider then I think the best thing to do here is to report this by adding another comment to that same GitHub issue, so the provider team will be aware of it.
I'm sorry I don't have an answer that can be implemented only within your Terraform module.

How can I split up resource creation into different modules with Terraform?

I would like to split up resource creation using different modules, but I have been unable to figure out how.
For this example I want to create a web app (I'm using the azurerm provider). I've done so by adding the following to a module:
resource "azurerm_app_service" "test" {
  name                = "${var.app_service_name}"
  location            = "${var.location}"
  resource_group_name = "${var.resource_group_name}"
  app_service_plan_id = "${var.app_service_plan_id}"

  app_settings {
    SOME_SETTING = "${var.app_setting}"
  }
}
It works, but I'd like to have a separate module for applying the app settings, so:
Module 1 creates the web app
Module 2 applies the app settings
Potential module 3 applies something else to the web app
Is this possible? I tried splitting it up (having two modules defining the web app, but only one of them containing the app settings), but I get an error stating that the web app already exists, so it doesn't seem to understand that I try to manipulate the same resource.
Details on how it's going to be used:
I am going to provide a UI for the end user on which he/she can choose a stack of resources needed and tick a range of options desired for that person's project, along with filling out required parameters for the infrastructure.
Once done and submitted the parameters are applied to a Terraform template. It is not feasible to have a template for each permutation of options, so it will have to include different modules depending of the chosen options.
For example: if a user ticks web app, Cosmos DB and application insights the Terraform template will include these modules (using the count trick to create conditions). In this example I'll need to pass the instrumentation key from application insights to the web app's application settings and this is where my issue is.
If the user didn't choose application insights I don't want a setting for the web app, and that is why I need to gradually build up a Terraform resource. Also, depending on the type of database the user chose, different settings will be added to the web app's settings.
So my idea is to create a module to apply certain application settings. I don't know if this is possible (or a better way exists), hence my question.
The best way for you to do this would be to wrap Terraform with a script in Bash or whatever scripting language you want (e.g. Python). Then create a template (in Bash, or in Python with Jinja2) to generate the resource with whatever options the customer selected for the settings, run the template to generate your Terraform code, and then apply it.
I've done this with S3 buckets quite a bit. In Terraform 0.12, you can also generate templates within Terraform itself.
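A minimal sketch of that templating approach using only Python's stdlib (the resource shape mirrors the question; the function and variable names are made up):

```python
from string import Template

# Hypothetical template for an azurerm_app_service resource; the
# app_settings block is emitted only when the user's selected
# options actually require settings.
RESOURCE_TMPL = Template("""\
resource "azurerm_app_service" "$name" {
  name                = "$name"
  location            = var.location
  resource_group_name = var.resource_group_name
  app_service_plan_id = var.app_service_plan_id
$settings_block}
""")

def render_app_service(name, settings=None):
    """Render a .tf resource, including app_settings only when needed."""
    if settings:
        lines = "\n".join(f'    {k} = "{v}"' for k, v in settings.items())
        block = f"  app_settings {{\n{lines}\n  }}\n"
    else:
        block = ""
    return RESOURCE_TMPL.substitute(name=name, settings_block=block)

print(render_app_service("webapp", {"APPINSIGHTS_KEY": "abc123"}))
```

The generated text would then be written to a .tf file in the customer's composition folder before running terraform apply.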
I know this is 5 months old, but I think part of the situation here is that the way you describe splitting it up does not exactly fit how modules are intended to be used. For one thing, you cannot "build up a resource" dynamically, as Terraform is by design declarative. You can only define a resource and then dynamically provide its predefined inputs (including its count, to activate or deactivate it). Secondly, modules are in no way necessary to turn portions of the stack on and off via configuration. Modules are simply a way of grouping sets of resources together for containment and reuse. Any dynamism available to you with a stack consisting of a complex hierarchy of modules would be equally available with essentially the same code all in a giant monolithic blob; the problem with the monolith is just that it would be a mess, and you wouldn't be able to reuse pieces of it for other things. Finally, using a module to provide settings is not really what modules are for. Yes, theoretically you could create a module with a null_data_source and then use it purely as some kind of shim to provide settings, but that would likely be a convoluted, unnecessary approach to something better done by simply providing a variable the way you have already shown.
You probably will need to wrap this in some kind of Bash (etc.) script at the top level, as mentioned in other answers, and this is not a limitation of Terraform. For example, once you have your modules, you would want to keep all currently applied stacks (for each customer or whatever) in some kind of composition repository. How will you create the composition stacks for those customers after they fill out the setup form? You will have to do that with some file-creation automation, which is not what Terraform is there for. Terraform is there to execute stacks that exist. It's not a limitation of Terraform that you have to create the .tf files with an external text editor to begin with, and it's not a limitation that in a situation like this you would use some external scripting to dynamically create the composition stacks for the customer; it's just part of the way you would use automation to get things ready for Terraform to do its job of applying the stacks.
So, you cannot avoid this external script tooling, and you would probably use it to create the folders for the customer specific composition stacks (that refer to your modules), populate the folders with default files, and to create a .tfvars file based on the customers input from the form. Then you could go about this multiple ways:
You have the .tfvars file be the only difference between customer composition stacks. Whatever modules you do or don't want to use would be activated/deactivated by the "count trick" you mention, given variables from .tfvars. This way has the advantage of being easy to reason about as all customer composition stacks are the same thing just configured differently.
You could have the tooling actually insert the module definitions that you want into the files of the composition stacks. This will create more concise composition stacks, with fewer janky "count tricks", but every customer's stack would be its own weird snowflake.
As far as splitting up the modules, be aware that there is a whole art/science/philosophy about this. I refer you to this resource on IAC design patterns and this talk on the subject (relevant bit starts at 19:15). Those are general context on the subject.
For your particular case, you would basically want to put all the smallest divisible functional chunks (those that might be turned on or off) into their own modules, each to be referenced by higher-level consuming modules. You mention it not being feasible to have a module for every permutation; again, that is thinking about it wrong. You would aim for something that would be more of a tree of modules and combinations. At the top level you would have (Bash etc.) tooling that creates the new customer's composition folder and their .tfvars file, and drops in the same composition stack that will be the top of the "module tree". Each module that represents an optional part of the customer's stack will take a count. Those modules will either have functional resources inside or will be intermediate modules, themselves instantiating a configurable set of alternative submodules containing resources.
But it will be up to you to sit down and think through the design of a "decision tree", implemented as a hierarchy of modules, which covers all the permutations that are not feasible to create separate monolithic stacks for.
TFv12 dynamic nested blocks would help you specifically with the one aspect of having or not having a declared block like app_settings. Since you cannot "build up a resource" dynamically, the only alternative in a case like that used to be an intermediate module that declares the resource in multiple ways (with and without the app_settings block), with one chosen via the "count trick" or other input configuration. That sort of thing is just not necessary now that dynamic blocks exist.
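For the instrumentation-key case specifically, a sketch of the settings-as-data idea in 0.12+ syntax (the variable names are made up; in 0.12+ the azurerm provider takes app_settings as a map argument, so a conditional merge() avoids needing a separate module at all):

```hcl
variable "enable_app_insights" {
  type    = bool
  default = false
}

variable "instrumentation_key" {
  type    = string
  default = ""
}

resource "azurerm_app_service" "test" {
  name                = var.app_service_name
  location            = var.location
  resource_group_name = var.resource_group_name
  app_service_plan_id = var.app_service_plan_id

  # Base settings, plus the Application Insights key only when selected.
  app_settings = merge(
    { SOME_SETTING = var.app_setting },
    var.enable_app_insights ? { APPINSIGHTS_INSTRUMENTATIONKEY = var.instrumentation_key } : {}
  )
}
```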

run puppet resource types only when file has changed

I feel like this is a very common scenario, but yet I haven't found a solution that meets my needs. I want to create something that works like below. (Unfortunately, set_variable_only_if_file_was_changed doesn't exist.)
file { 'file1':
  path   => "fo/file1",
  ensure => present,
  source => "puppet:///modules/bar/file1",
  set_variable_only_if_file_was_changed => $file_change = true
}

if $file_change {
  file { 'new file1':
    ...
  }
  exec { 'do something1':
    ...
  }
  package { 'my package':
    ...
  }
}
I have seen how an exec can be suppressed using refreshonly => true, but with files it seems harder. I would like to avoid putting the code into a .sh file and executing it through an exec.
Puppet does not have the feature you describe. Resources that respond to refresh events do so in a manner appropriate to their type; there is no mechanism to define custom refresh behavior for existing resources. There is certainly no way to direct which resources will be included in a node's catalog based on the results of applying other resources -- as your pseudocode seems to suggest -- because the catalog is completely built before anything in it is applied.
Furthermore, that is not nearly as constraining as you might suppose if you're coming to Puppet with a substantial background in scripting, as many people do. People who think in terms of scripting need to make a mental adjustment to Puppet, because Puppet's DSL is definitely not a scripting language. Rather, it is a language for describing the construction of a machine state description (a catalog). It is important to grasp that such catalogs describe details of the machine's target state, not the steps required to take it from its current state to some other state. (Even Execs, a possible exception, are most effective when used in that way.) It is very rare to be unable to determine a machine's intended state from its identity and initial state. It is uncommon even for it to be difficult to do so.
How you might achieve the specific configuration you want depends on the details. Often, the answer lies in user-defined facts. For example, instead of capturing whether a given file was changed during catalog application, you provide a fact that you can interrogate during catalog building to evaluate whether that file will be changed, and adjust the catalog details appropriately. Sometimes you don't need such tests in the first place, because Puppet takes action on a given resource only when changes to it are needed.
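For the refresh-capable part of the original pseudocode, the conventional pattern is a subscription rather than a variable (paths and commands here are hypothetical):

```puppet
file { 'file1':
  ensure => file,
  path   => '/foo/file1',
  source => 'puppet:///modules/bar/file1',
}

# Runs only when File['file1'] changes during this catalog run;
# otherwise the exec is skipped entirely.
exec { 'do something1':
  command     => '/usr/bin/do-something',
  refreshonly => true,
  subscribe   => File['file1'],
}
```

This covers the exec case; for the file and package resources in the pseudocode, there is no refresh-triggered behavior, which is where the fact-based approach above comes in.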

standard way of encoding pagination info on a restful url get?

I think my question would be better explained with a couple of examples...
GET http://myservice/myresource/?name=xxx&country=xxxx&_page=3&_page_len=10&_order=name asc
that is, on the one hand I have conditions ( name=xxx&country=xxxx ) and on the other hand I have parameters affecting the query ( _page=3&_page_len=10&_order=name asc )
now, I thought about using some special prefix ("_" in this case) to avoid collisions between conditions and parameters (what if my resource has an "order" property?)
is there some standard way to handle these situations?
--
I found this example (just to pick one)
http://www.peej.co.uk/articles/restfully-delicious.html
GET http://del.icio.us/api/peej/bookmarks/?tag=mytag&dt=2009-05-30&start=1&end=2
but in this case condition fields are already defined (there is no start nor end property)
I'm looking for some general solution...
--
edit, a more detailed example to clarify
Each item is completely independent from the others... let's say that my resources are customers, and that (luckily) I have a couple million of them in my db.
so the url could be something like
http://myservice/customers/?country=argentina,last_operation=2009-01-01..2010-01-01
It should give me all the customers from argentina that bought anything in the last year
Now I'd like to use this service to build a browse page, or to fill a combo with ajax, for example, so the idea was to add some metadata to control what info I should get
to build the browse page I would add
http://...,_page=1,_page_len=10,_order=state,name
and to fill an autosuggest combo with ajax
http://...,_page=1,_page_len=100,_order=state,name,name=what_ever_type_the_user*
to fill the combo with the first 100 customers matching what the user typed...
my question was whether there is some standard (written or not) way of encoding this kind of stuff in a RESTful url manner...
While there is no standard, Web API Design (by Apigee) is a great book of advice when creating Web APIs. I treat it as a sort of standard, and follow its recommendations whenever I can.
Under "Pagination and partial response" they suggest (page 17):
Use limit and offset
We recommend limit and offset. It is more common, well understood in leading databases, and easy for developers.
/dogs?limit=25&offset=50
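A minimal, framework-free sketch of how that limit/offset convention maps onto a collection (the dataset and function names are made up):

```python
from urllib.parse import urlparse, parse_qs

def paginate(items, limit=25, offset=0):
    """Return one page of items using the limit/offset convention."""
    return items[offset:offset + limit]

def page_from_url(items, url):
    """Parse limit/offset out of a query string and return that page."""
    qs = parse_qs(urlparse(url).query)
    limit = int(qs.get("limit", ["25"])[0])
    offset = int(qs.get("offset", ["0"])[0])
    return paginate(items, limit, offset)

dogs = [f"dog-{i}" for i in range(100)]
print(page_from_url(dogs, "/dogs?limit=25&offset=50")[0])  # prints dog-50
```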
There's no standard or convention which defines a way to do this, but using underscores (one or two) to denote meta-info isn't a bad idea. This is what's used to specify member variables by convention in some languages.
Note:
I started writing this as a comment to my previous answer. Then I was going to add it as an edit, but I think it belongs as a separate answer instead, since it takes a completely different approach.
The more that I have been thinking about this, I think that you really have two different resources that you have to deal with:
A page of resources
Each resource that is collected into the page
I may have missed something (could be... I've been guilty of misinterpretation). Since a page is a resource in its own right, the paging meta-information is really an attribute of the resource so placing it in the URL isn't necessarily the wrong approach. If you consider what can be cached downstream for a page and/or referred to as a resource in the future, the resource is defined by the paging attributes and the query parameters so they should both be in the URL. To continue with my entirely too lengthy response, the page resource would be something like:
http://.../myresource/page-10/3?name=xxx&country=yyy&order=name&orderby=asc
I think that this gets to the core of your original question. If the page itself is a resource, then the URI should describe the page so something like page-10 is my way of saying "a page of 10 items" and the next portion of the page is the page number. The query portion contains the filter.
The other resource names each item that the page contains. How the items are identified should be controlled by what the resources are. I think that a key question is whether the result resources stand on their own or not. How you represent the item resources differs based on this concept.
If the item representations are only appropriate when in the context of the page, then it might be appropriate to include the representation inline. If you do this, then identify them individually and make sure that you can retrieve them using either URI fragment syntax or an additional path element. It seems that the following URLs should result in the fifth item on the third page of ten items:
http://.../myresource/page-10/3?...#5
http://.../myresource/page-10/3/5?...
The largest factor in deciding between these two is how strongly coupled the individual item is with the page. The fragment syntax is considerably more binding than the path element IMHO.
Now, if the item resources are free-standing and the page is simply the result of a query (which I think is likely the case here), then the page resource should be an ordered list of URLs for each item resource. The item resource should be independent of the page resource in this case. You might want to use a URI that is based on the identifying attribute of the item itself. So you might end up with something like:
http://.../myresource/item/42
http://.../myresource/item/307E8599-AD9B-4B32-8612-F8EAF754DFDB
The key deciding factor is whether the items are freestanding resources or not. If they are not, then they are derived from the page URI. If they are freestanding, then they should be defined by their own URIs and should be included in the page resource as links instead.
I know that the RESTful folk tend to dislike the usage of HTTP headers, but has anyone actually looked into using HTTP ranges to solve pagination? I wrote an ISAPI extension a few years back that included pagination information along with other non-property information in the URI, and I never really liked the feel of it. I was thinking about doing something like:
GET http://...?name=xxx&country=xxxx&_orderby=name&_order=asc HTTP/1.1
Range: pageditems=20-29
...
This puts the result set parameters (e.g., _orderby and _order) in the URI and the selection as a Range header. I have a feeling that most HTTP implementations would screw this up though especially since support for non-byte ranges is a MAY in RFC2616. I started thinking more seriously about this after doing a bunch of work with RTSP. The Range header in RTSP is a nice example of extending ranges to handle time as well as bytes.
I guess another way of handling this is to make a separate request for each item on the page as an individual resource in its own right. If your representation allows for this, then you might want to consider it. It is more likely that intermediate caching would work very well with this approach. So your resources would be defined as:
myresource/name=xxx;country=xxx/orderby=name;order=asc/20/
myresource/name=xxx;country=xxx/orderby=name;order=asc/21/
myresource/name=xxx;country=xxx/orderby=name;order=asc/22/
myresource/name=xxx;country=xxx/orderby=name;order=asc/23/
myresource/name=xxx;country=xxx/orderby=name;order=asc/24/
I'm not sure if anyone has tried something like this or not. This would make URIs constructible, which is always a useful property IMHO. The bonus to this approach is that the individual responses could be cached, and the server is free to optimize the handling of collecting pages of items in the most efficient way. The basic idea is to have the client specify the query in the URI along with the index of the item that it wants to retrieve. There is no need to push the idea of a "page" into the resource, or even to make it visible. The client can iteratively retrieve objects until its page is full or it receives a 404.
There is a downside, of course... the HTTP server and infrastructure have to support pipelining, or the cost of creating and destroying connections might kill the idea outright.
