I have a boolean variable that I set at a very early stage. At some later point I want to run an exec command depending on whether this variable is true or false, something like this:
exec { 'update-to-latest-core':
command => "do something",
user => 'root',
refreshonly => true,
path => [ "/bin/bash", "sbin/" , "/usr/bin/", "/usr/sbin/" ],
onlyif => 'test ${latest} = true',
notify => Exec["update-to-latest-database"],
}
So this command doesn't work (onlyif => ['test ${latest} = true'],). I tried several other ways too, but none of them worked. Something so simple cannot be so hard to do. Can someone help me with this, and also explain the rules behind getting commands to execute inside an onlyif clause? (Also, I cannot use an if clause at a higher level because I have other dependencies.)
Since $latest is a Puppet variable, it is not useful to defer checking its value until the catalog is applied. Technically, in fact, you cannot do so: Puppet will interpolate the variable's value into the Exec resource's catalog representation, so there is no variable left by the time the catalog is applied, only a literal.
Since the resource in question is notified by another resource, you must not suppress it altogether. If the code presented in your question does not work, then it is likely because the interpolated value of $latest is different than you expect -- for example, '1' instead of 'true'. Running the agent in --debug mode should show you the details of the command being executed.
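One quick way to see why the check might fail is to run the interpolated test command by hand. Here is a shell sketch with hypothetical interpolated values (the exact value depends on how $latest was set):
test true = true; echo $?   # prints 0 -> onlyif passes and the exec runs
test 1 = true; echo $?      # prints 1 -> onlyif fails and the exec is skipped
test "" = true; echo $?     # prints 1 -> an empty interpolation also fails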
Alternatively, you could approach it a different way:
exec { 'update-to-latest-core':
command => $latest ? { true => "do something", default => '/bin/true' },
user => 'root',
refreshonly => true,
path => [ "/bin/bash", "sbin/" , "/usr/bin/", "/usr/sbin/" ],
notify => Exec["update-to-latest-database"],
}
That will execute (successfully, and with no external effect) whenever $latest is false, and it will run your do something command whenever $latest is true. The choice of command is made during catalog building.
I wrote a bash script that finds CSV files in specified folders and pipes them into logstash with the correct config file. However when running this script I run into the following error, saying that the shutdown process is stalled, causing an infinite loop until I manually stop it with ctrl+c:
[2018-03-22T08:59:53,833][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.2.3"}
[2018-03-22T08:59:54,211][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2018-03-22T08:59:57,970][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-03-22T08:59:58,116][INFO ][logstash.pipeline ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0xf6851b3 run>"}
[2018-03-22T08:59:58,246][INFO ][logstash.agent ] Pipelines running {:count=>1, :pipelines=>["main"]}
[2018-03-22T08:59:58,976][INFO ][logstash.outputs.file ] Opening file {:path=>"/home/kevin/otrs_customer_user"}
[2018-03-22T09:00:06,471][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>0, "stalling_thread_info"=>{["LogStash::Filters::CSV", {"separator"=>";", "columns"=>["IOT", "OID", "SUM", "XID", "change_by", "change_time", "city", "company", "company2", "create_by", "create_time", "customer_id", "email", "fax", "first_name", "id", "inst_city", "inst_first_name", "inst_last_name", "inst_street", "inst_zip", "last_name", "login", "mobile", "phone", "phone2", "street", "title", "valid_id", "varioCustomerId", "zip"], "id"=>"f1c74146d6672ca71f489aac1b4c2a332ae515996657981e1ef44b441a7420c8"}]=>[{"thread_id"=>23, "name"=>nil, "current_call"=>"[...]/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:90:in `read_batch'"}]}}
[2018-03-22T09:00:06,484][ERROR][logstash.shutdownwatcher ] The shutdown process appears to be stalled due to busy or blocked plugins. Check the logs for more information.
[2018-03-22T09:00:11,438][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>0, "stalling_thread_info"=>{["LogStash::Filters::CSV", {"separator"=>";", "columns"=>["IOT", "OID", "SUM", "XID", "change_by", "change_time", "city", "company", "company2", "create_by", "create_time", "customer_id", "email", "fax", "first_name", "id", "inst_city", "inst_first_name", "inst_last_name", "inst_street", "inst_zip", "last_name", "login", "mobile", "phone", "phone2", "street", "title", "valid_id", "varioCustomerId", "zip"], "id"=>"f1c74146d6672ca71f489aac1b4c2a332ae515996657981e1ef44b441a7420c8"}]=>[{"thread_id"=>23, "name"=>nil, "current_call"=>"[...]/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:90:in `read_batch'"}]}}
When I run the same file and the same config manually with bash logstash -f xyz.config < myfile.config it works as desired and the process gets properly terminated. In the bash script I'm basically using the exact command and I run into the error above.
I also noticed that the problem appears to be random and not every time on the same file and config.
My config consists of a stdin input, a csv filter, and (for testing) an output in JSON format to a file (I also removed stdout{}).
Does anybody have an idea why my process stalls during script execution? Or if not, is there maybe a way to tell logstash to shutdown when it's stalled?
Sample config:
input {
stdin {
id => "${LS_FILE}"
}
}
filter {
mutate {
add_field => { "foo_type" => "${FOO_TYPE}" }
add_field => { "[#metadata][LS_FILE]" => "${LS_FILE}"}
}
if [@metadata][LS_FILE] == "contacts.csv" {
csv {
separator => ";"
columns =>
[
"IOT",
"OID",
"SUM",
"XID",
"kundenid"
]
}
if [kundenid]{
mutate {
update => { "kundenid" => "n-%{kundenid}" }
}
}
}
}
output {
if [@metadata][LS_FILE] == "contacts.csv" {
file{
path => "~/contacts_file"
codec => json_lines
}
}
}
Sample script:
LOGSTASH="/customer/app/logstash-6.2.3/bin/logstash"
for file in $(find "$TARGETPATH" -name "*.csv") # Loop each CSV file in the given path
do
if [[ $file = *"foo"* ]]; then
echo "Importing $file"
export LS_FILE=$(basename $file)
bash $LOGSTASH -f $CFG_FILE < $file # Starting logstash
echo "file $file imported."
fi
done
I export environment variables in the bash script and set them as metadata in the logstash configs to perform some conditionals for different input files. The output to JSON in a file is just for testing purposes.
Logstash tries to perform various steps when you try to shut it down, such as:
Stop all input, filter and output plugins
Process all in-flight events
Terminate the Logstash process
and there are various factors which make the shutdown process very unpredictable, such as:
An input plugin receiving data at a slow pace.
A slow filter, like a Ruby filter executing sleep(10000) or an Elasticsearch filter that is executing a very heavy query.
A disconnected output plugin that is waiting to reconnect to flush in-flight events.
From Logstash documentation,
Logstash has a stall detection mechanism that analyzes the behavior of
the pipeline and plugins during shutdown. This mechanism produces
periodic information about the count of inflight events in internal
queues and a list of busy worker threads.
You can use the --pipeline.unsafe_shutdown flag when starting Logstash to force-terminate the process in case of a stalled shutdown. When --pipeline.unsafe_shutdown isn't enabled, Logstash continues to run and produce these reports periodically, which is why the problem appears to be random in your case.
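For example, the call in your script could look roughly like this (a sketch reusing the variable names from your script):
# Force Logstash to exit even if a plugin stalls the shutdown (in-flight events may be lost)
bash "$LOGSTASH" -f "$CFG_FILE" --pipeline.unsafe_shutdown < "$file"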
Remember that Unsafe shutdowns, force-kills of the Logstash process, or crashes of
the Logstash process for any other reason may result in data loss
(unless you’ve enabled Logstash to use persistent queues).
In Cloudformation I have two stacks (one nested).
Nested stack "ec2-setup":
{
"AWSTemplateFormatVersion" : "2010-09-09",
"Parameters" : {
// (...) some parameters here
"userData" : {
"Description" : "user data to be passed to instance",
"Type" : "String",
"Default": ""
}
},
"Resources" : {
"EC2Instance" : {
"Type" : "AWS::EC2::Instance",
"Properties" : {
"UserData" : { "Ref" : "userData" },
// (...) some other properties here
}
}
},
// (...)
}
Now in my main template I want to refer to the nested template presented above and pass a bash script using the userData parameter. Additionally, I do not want to inline the content of the user data script, because I want to reuse it for a few EC2 instances (so I do not want to duplicate the script each time I declare an EC2 instance in my main template).
I tried to achieve this by setting the content of the script as the default value of a parameter:
{
"AWSTemplateFormatVersion": "2010-09-09",
"Parameters" : {
"myUserData": {
"Type": "String",
"Default" : { "Fn::Base64" : { "Fn::Join" : ["", [
"#!/bin/bash \n",
"yum update -y \n",
"# Install the files and packages from the metadata\n",
"echo 'tralala' > /tmp/hahaha"
]]}}
}
},
(...)
"myEc2": {
"Type": "AWS::CloudFormation::Stack",
"Properties": {
"TemplateURL": "s3://path/to/ec2-setup.json",
"TimeoutInMinutes": "10",
"Parameters": {
// (...)
"userData" : { "Ref" : "myUserData" }
}
But I get the following error while trying to launch the stack:
"Template validation error: Template format error: Every Default
member must be a string."
The error seems to be caused by the fact that the declaration { Fn::Base64 (...) } is an object, not a string (although it results in a base64-encoded string).
Everything works OK if I paste my script directly into the parameters section (as an inline script) when calling my nested template (instead of referring to a string set as a parameter):
"myEc2": {
"Type": "AWS::CloudFormation::Stack",
"Properties": {
"TemplateURL": "s3://path/to/ec2-setup.json",
"TimeoutInMinutes": "10",
"Parameters": {
// (...)
"userData" : { "Fn::Base64" : { "Fn::Join" : ["", [
"#!/bin/bash \n",
"yum update -y \n",
"# Install the files and packages from the metadata\n",
"echo 'tralala' > /tmp/hahaha"
]]}}
}
but I want to keep the content of the userData script in a parameter/variable to be able to reuse it.
Is there any chance to reuse such a bash script without the need to copy/paste it each time?
Here are a few options on how to reuse a bash script in user-data for multiple EC2 instances defined through CloudFormation:
1. Set default parameter as string
Your original attempted solution should work, with a minor tweak: you must declare the default parameter as a string, as follows (using YAML instead of JSON makes it possible/easier to declare a multi-line string inline):
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
myUserData:
Type: String
Default: |
#!/bin/bash
yum update -y
# Install the files and packages from the metadata
echo 'tralala' > /tmp/hahaha
(...)
Resources:
myEc2:
Type: AWS::CloudFormation::Stack
Properties:
TemplateURL: "s3://path/to/ec2-setup.yml"
TimeoutInMinutes: 10
Parameters:
# (...)
userData: !Ref myUserData
Then, in your nested stack, apply any required intrinsic functions (Fn::Base64, as well as Fn::Sub which is quite helpful if you need to apply any Ref or Fn::GetAtt functions within your user-data script) within the EC2 instance's resource properties:
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
# (...) some parameters here
userData:
Description: user data to be passed to instance
Type: String
Default: ""
Resources:
EC2Instance:
Type: AWS::EC2::Instance
Properties:
UserData:
"Fn::Base64":
"Fn::Sub": !Ref userData
# (...) some other properties here
# (...)
2. Upload script to S3
You can upload your single Bash script to an S3 bucket, then invoke the script by adding a minimal user-data script in each EC2 instance in your template:
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
# (...) some parameters here
ScriptBucket:
Description: S3 bucket containing user-data script
Type: String
ScriptKey:
Description: S3 object key containing user-data script
Type: String
Resources:
EC2Instance:
Type: AWS::EC2::Instance
Properties:
UserData:
"Fn::Base64":
"Fn::Sub": |
#!/bin/bash
aws s3 cp s3://${ScriptBucket}/${ScriptKey} - | bash -s
# (...) some other properties here
# (...)
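To publish the shared script you could do something like the following (the bucket and key are placeholders, and the instances need IAM permission to read the object, e.g. via an instance profile):
# Upload the shared user-data script once; every stack/instance then references it
aws s3 cp ./user-data.sh s3://my-script-bucket/scripts/user-data.sh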
3. Use preprocessor to inline script from single source
Finally, you can use a template-preprocessor tool like troposphere or your own to 'generate' verbose CloudFormation-executable templates from more compact/expressive source files. This approach will allow you to eliminate duplication in your source files - although the templates will contain 'duplicate' user-data scripts, this will only occur in the generated templates, so should not pose a problem.
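A typical workflow with such a preprocessor might look like this sketch (generate-template.py stands in for whatever generator you use; only validate-template is a real AWS CLI call):
# Render the verbose template from a compact source, then sanity-check it
python generate-template.py --source stacks/app.src.yml > build/app-template.json
aws cloudformation validate-template --template-body file://build/app-template.json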
You'll have to look outside the template to provide the same user data to multiple templates. A common approach here would be to abstract your template one step further, or "template the template". Use the same method to create both templates, and you'll keep them both DRY.
I'm a huge fan of cloudformation and use it to create most all my resources, especially for production-bound uses. But as powerful as it is, it isn't quite turn-key. In addition to creating the template, you'll also have to call the cloudformation API to create the stack, and provide a stack name and parameters. Thus, automation around the use of cloudformation is a necessary part of a complete solution. This automation can be simplistic (a bash script, for example) or sophisticated. I've taken to using ansible's cloudformation module to automate "around" the template, be it creating a template for the template with Jinja, or just providing different sets of parameters to the same reusable template, or doing discovery before the stack is created; whatever ancillary operations are necessary. Some folks really like troposphere for this purpose - if you're a pythonic thinker you might find it to be a good fit. Once you have automation of any kind handling the stack creation, you'll find it's easy to add steps to make the template itself more dynamic, or assemble multiple stacks from reusable components.
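As a minimal illustration of that "automation around the template", even a couple of shell commands are enough to feed the same reusable template different parameter sets (the stack names, URL, and parameters below are made up):
# Create two stacks from one reusable template, varying only the parameters
aws cloudformation create-stack --stack-name app-staging \
    --template-url https://s3.amazonaws.com/my-bucket/ec2-setup.json \
    --parameters ParameterKey=Env,ParameterValue=staging
aws cloudformation create-stack --stack-name app-prod \
    --template-url https://s3.amazonaws.com/my-bucket/ec2-setup.json \
    --parameters ParameterKey=Env,ParameterValue=prod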
At work we use cloudformation quite a bit and are tending these days to prefer a compositional approach, where we define the shared components of the templates we use, and then compose the actual templates from components.
The other option would be to merge the two stacks, using conditionals to control the inclusion of the defined resources in any particular stack created from the template. This works OK in simple cases, but the combinatorial complexity of all those conditions tends to make this a difficult solution in the long run, unless the differences are really simple.
Actually, I found one more solution beyond those already mentioned. This solution is on the one hand a little "hackish", but on the other hand I found it to be really useful for the "bash script" use case (and also for other parameters).
The idea is to create an extra stack - a "parameters stack" - which will output the values. Since stack outputs are not limited to literal strings (as parameter default values are), we can define the entire base64-encoded script as a single output from the stack.
The drawback is that every stack needs to define at least one resource, so our parameters stack also needs to define at least one resource. The solution to this issue is either to define the parameters in another template which already defines an existing resource, or to create a "fake resource" which will never be created because of a Condition which will never be satisfied.
Here I present the solution with a fake resource. First we create our new parameters-stack.json as follows:
{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "Outputs/returns parameter values",
"Conditions" : {
"alwaysFalseCondition" : {"Fn::Equals" : ["aaaaaaaaaa", "bbbbbbbbbb"]}
},
"Resources": {
"FakeResource" : {
"Type" : "AWS::EC2::EIPAssociation",
"Condition" : "alwaysFalseCondition",
"Properties" : {
"AllocationId" : { "Ref": "AWS::NoValue" },
"NetworkInterfaceId" : { "Ref": "AWS::NoValue" }
}
}
},
"Outputs": {
"ec2InitScript": {
"Value":
{ "Fn::Base64" : { "Fn::Join" : ["", [
"#!/bin/bash \n",
"yum update -y \n",
"# Install the files and packages from the metadata\n",
"echo 'tralala' > /tmp/hahaha"
]]}}
}
}
}
Now in the main template we first declare our parameters stack and later we refer to the output from that parameters stack:
{
"AWSTemplateFormatVersion": "2010-09-09",
"Resources": {
"myParameters": {
"Type": "AWS::CloudFormation::Stack",
"Properties": {
"TemplateURL": "s3://path/to/paramaters-stack.json",
"TimeoutInMinutes": "10"
}
},
"myEc2": {
"Type": "AWS::CloudFormation::Stack",
"Properties": {
"TemplateURL": "s3://path/to/ec2-setup.json",
"TimeoutInMinutes": "10",
"Parameters": {
// (...)
"userData" : {"Fn::GetAtt": [ "myParameters", "Outputs.ec2InitScript" ]}
}
}
}
}
Please note that one can create up to 60 outputs in one stack file, so it is possible to define 60 variables/parameters per single stack file using this technique.
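If you want to check what the parameters stack actually exposes before wiring it into the main template, you can inspect its outputs from the CLI (the stack name here is just an example):
# List the outputs of the deployed parameters stack, including ec2InitScript
aws cloudformation describe-stacks --stack-name my-parameters-stack --query 'Stacks[0].Outputs'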
I'm trying to write a script in bash for an AWS Auto Scaling group. The idea is that even if an instance is terminated, the Auto Scaling group reinstalls the instance and all its packages from tags, with the package name as the tag key and the package version number as the tag value.
Here is the LaunchConfiguration from the AWS CloudFormation template:
"WorkerLC": {
"Type" : "AWS::AutoScaling::LaunchConfiguration",
"Properties" : {
"ImageId": {"Ref" : "SomeAMI"},
"InstanceType" : "m3.medium",
"SecurityGroups" : [{"Ref": "SecurityGroup"}],
"UserData" : {
"Fn::Base64": {
"Fn::Join": [ "", [
{"Fn::Join": ["", ["Engine=", {"Ref": "Env"},".app.net"," \n"]]},
{"Fn::Join": ["", [
"#!/bin/bash\n",
"cd /app/\n",
"./worker-install-package.sh"
]]}
]]
}
}
}
}
And I want to take the values from the tags of the AutoScalingGroup, like this:
"Worker": {
"Type" : "AWS::AutoScaling::AutoScalingGroup",
"Properties": {
"LaunchConfigurationName": {"Ref": "Worker"},
"LoadBalancerNames": [{"Ref": "WorkerELB"}],
"AvailabilityZones": {"Ref": "AZs"},
"MinSize" : "1",
"MaxSize" : "1",
"HealthCheckGracePeriod": 300,
"Tags" : [
{"Key": "WorkersScalingGroup", "Value": {"Fn::Join": ["", ["Offering-", {"Ref": "Env"} "-Worker-1"]]}, "PropagateAtLaunch": true},
{"Key": "EIP", "Value": {"Ref": "WorkerIP"}, "PropagateAtLaunch": true},
{"Key": "Environment", "Value": {"Ref": "Env"}, "PropagateAtLaunch": true}
]
}
}
So, now for the hard part. In the user data I try to find tags with the text "worker", because I have a couple of types of instances and each one comes with a different couple of packages.
It is the first time I have written something in bash.
Here is worker-install-package.sh:
#read tag for the installed package
EC2_REGION='us-east-1'
AWS_ACCESS_KEY='xxxxx'
AWS_SECRET_ACCESS_KEY='xxxxxxxxxxxxxxxxxx'
InstanceID=`/usr/bin/curl -s http://169.254.169.254/latest/meta-data/instance-id`
PackageName=`/opt/aws/apitools/ec2/bin/ec2-describe-tags -O $AWS_ACCESS_KEY -W $AWS_SECRET_ACCESS_KEY --filter resource-id=$InstanceID --filter key='worker' | cut -f5`
while read line
if [ "$PackageNmae" = "worker" ]; then
sudo -- sh -c "./install-package.sh ${PackageName} ${Value}"
/opt/aws/apitools/ec2/bin/ec2-create-tags $InstanceID -O $AWS_ACCESS_KEY -W $AWS_SECRET_ACCESS_KEY --tag "worker-${PackageName}"=$Value
fi
done
I have two questions. First, am I doing this the right way? And second, how can I get the value of the package tag (it is the version number of the package)?
Thanks!
First of all, as a best practice don't include your AWS keys in your script. Instead attach a role to your instance at launch (this can be done in the launch configuration of your autoscaling group).
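For example, with a role attached the tag lookup needs no keys at all; here is a rough AWS CLI sketch (it assumes the role allows ec2:DescribeTags and that the tag key really is 'worker'):
#!/bin/bash
# Read this instance's 'worker' tag value using the instance profile credentials
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
PACKAGE_VERSION=$(aws ec2 describe-tags --region us-east-1 \
    --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=worker" \
    --query 'Tags[0].Value' --output text)
echo "worker package version: $PACKAGE_VERSION"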
Secondly, what you do is one way to go, and it can definitely work. Another way (proper but slightly more complex) to achieve this would be to use a tool like puppet or AWS opsworks.
However, I don't really get what you are doing in your script, which seems overcomplicated for this purpose: why don't you include your package name in your userdata script? If this is only a matter of agility when it comes to changing/updating the script, you can outsource this script to an S3 bucket and have the instances download / execute it at creation time. This way you don't need to read from the tags.
That being said, and more as a comment, if you do want to keep reading tags, then I don't really understand your script. If you do need help on the script, please provide more details in that sense (e.g. debug samples, etc.):
When you evaluate PackageName, does this work?
PackageName=`/opt/aws/apitools/ec2/bin/ec2-describe-tags -O $AWS_ACCESS_KEY -W $AWS_SECRET_ACCESS_KEY --filter resource-id=$InstanceID --filter key='worker' | cut -f5`
Not sure why you filter with "key=worker" and not "WorkersScalingGroup".
Then you call the below if condition:
if [ "$PackageNmae" = "worker" ]; then
(I assume there is a typo here and it should be PackageName), and right below you call:
"worker-${PackageName}"
which would give "worker-worker"?
I am running the following filter in a logstash config file:
filter {
if [type] == "logstash" {
grok {
match => {
"message" => [
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:mymessage}, reason:%{GREEDYDATA:reason}",
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:mymessage}"
]
}
}
}
}
It kind of works:
it does identify and carve out variables "timestamp", "severity", "instance", "mymessage", and "reason"
Really, what I wanted was for the text which is now %{mymessage} to become %{message}, but when I add any sort of mutate command to this grok it stops working (BTW, should there be a log that tells me what is breaking? I didn't see it... ironic for a logging solution not to have verbose logging).
Here's what I tried:
filter {
if [type] == "logstash" {
grok {
match => {
"message" => [
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:mymessage}, reason:%{GREEDYDATA:reason}",
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:mymessage}"
]
}
mutate => {
replace => [ "message", "%{mymessage}"]
remove => [ "mymessage" ]
}
}
}
}
So in summary I'd like to understand:
Are there log files I can look at to see why/where a failure is happening?
Why would my mutate commands illustrated above not work?
I also thought that if I never used the mymessage variable, but instead just referred to message as the variable, then maybe it would automatically truncate message to just the matched pattern, but that appeared to append the results instead... what is the correct behaviour?
Using the overwrite option is the best solution, but I thought I'd address a couple of your questions directly anyway.
It depends on how Logstash is started. Normally you'd run it via an init script that passes the -l or --log option. /var/log/logstash would be typical.
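For example (flag meanings vary a little between Logstash versions, so treat this as a sketch):
# Start Logstash with an explicit log file so filter errors are captured
bin/logstash -f /etc/logstash/conf.d/mypipeline.conf -l /var/log/logstash/logstash.log
# Older releases also accept -t / --configtest to check the config without running it
bin/logstash -f /etc/logstash/conf.d/mypipeline.conf -t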
mutate is a filter of its own, not a part of grok. You could've done it like this (or used rename instead of replace + remove):
grok {
...
}
mutate {
replace => [ "message", "%{mymessage}" ]
remove => [ "mymessage" ]
}
I'd do it a different way. For what you're trying to do, the overwrite option might be more apt.
Something like this:
grok {
overwrite => "message"
match => {
"message" => [
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:message}, reason:%{GREEDYDATA:reason}",
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:message}"
]
}
}
This'll replace 'message' with the 'grokked' bit.
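If you want to see the effect before touching your real pipeline, you can feed one made-up sample line through an inline config from the shell (the log line below is invented; adjust it to your format):
echo '[2016-01-01T00:00:00][ERROR][node-1]Update failed, reason:disk full' | \
bin/logstash -e '
  input { stdin { } }
  filter {
    grok {
      overwrite => [ "message" ]
      match => { "message" => "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:message}, reason:%{GREEDYDATA:reason}" }
    }
  }
  output { stdout { codec => rubydebug } }'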
I know that doesn't directly answer your question - about all I can say is that when you start logstash, it writes to STDOUT - at least on the version I'm using - which I'm capturing and writing to a file. In there, it reports some of the errors.
There's a -l option to logstash that lets you specify a log file to use - this will usually show you what's going on in the parser, but bear in mind that if something doesn't match a rule, it won't necessarily tell you why it didn't.
Why isn't the "tidy" resource removing files on a new provision. I have the following:
package {'apache2':
ensure => present,
before => [
File["/etc/apache2/apache2.conf"],
File["/etc/apache2/envvars"]
],
}->
#Remove the conf files in the conf.d directory except the charset.
tidy { 'tidy_apache_conf':
path => '/etc/apache2/conf.d/',
recurse => 1,
backup => true,
matches => [
'localized-error-pages',
'other-vhosts-access-log',
'security'
],
}
On provisioning, the files specified in the matches attribute aren't removed. However, by specifying a "file" resource, I see the desired results.
$unwanted_apache_conf = [
'/etc/apache2/conf.d/localized-error-pages',
'/etc/apache2/conf.d/other-vhosts-access-log',
'/etc/apache2/conf.d/security'
]
package {'apache2':
ensure => present,
before => [
File["/etc/apache2/apache2.conf"],
File["/etc/apache2/envvars"]
],
}->
file { $unwanted_apache_conf:
ensure => absent
}
Why isn't the tidy resource removing the files? The tidy resource should be generating a file resource for each file matched. Am I missing an attribute in the tidy resource, or just missing the concept entirely? Is there any way to see what the file resources the tidy resource is generating look like? Thanks for any input.
This is because, for about 5 years, the Tidy resource has not supported notify mechanisms:
https://projects.puppetlabs.com/issues/3924
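As for inspecting what tidy actually generates, running Puppet in debug (and optionally noop) mode is the usual way to see the file resources it creates; a rough sketch:
# Show what tidy would purge without changing anything
puppet apply --debug --noop /path/to/manifest.pp
# or, on an agent-managed node:
puppet agent --test --debug --noop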