Forward logs with Logstash running in an EC2 instance to Amazon Elasticsearch Service

I'm currently running Logstash on an EC2 instance with the default Amazon Linux AMI, attempting to send logs to an AWS Elasticsearch Service domain. If I use the standard 'elasticsearch' output, I can send unsigned data to the AWS ES domain, but I'm trying to set up a production-ready framework, and everything I've read recommends the AWS Labs Logstash output plugin here ( https://github.com/awslabs/logstash-output-amazon_es ).
I can confirm the plugin is installed, but when I run Logstash with the conf file below, I get the message 'Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties', and no data appears at my Elasticsearch endpoint's '/_search?pretty=true' URL when I refresh after making a stdin entry.
input {
    stdin {
    }
}
output {
    amazon_es {
        hosts => ["https://search-secretstuff.es.amazonaws.com"]
        region => "xxxxx"
        aws_access_key_id => 'xxxxxx'
        aws_secret_access_key => 'xxxxxx'
        index => "prod-logs-%{+YYYY.MM.dd}"
        template => "/etc/logstash/mappings/es6-template.json"
    }
}
In addition to using stdin, I've tried using a file input, e.g.:
input {
    file {
        path => "/var/log/amazon/ssm/errors.log"
    }
}
The template I'm using is below, as per the accepted answer for this post (Logstash conf error - amazon_es)
{
"template" : "logstash-*",
"version" : 60001,
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"dynamic_templates" : [ {
"message_field" : {
"path_match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text",
"norms" : false
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text", "norms" : false,
"fields" : {
"keyword" : { "type": "keyword", "ignore_above": 256 }
}
}
}
} ],
"properties" : {
"#timestamp": { "type": "date"},
"#version": { "type": "keyword"},
"geoip" : {
"dynamic": true,
"properties" : {
"ip": { "type": "ip" },
"location" : { "type" : "geo_point" },
"latitude" : { "type" : "half_float" },
"longitude" : { "type" : "half_float" }
}
}
}
}
}
}
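A stdout output added next to the amazon_es block would at least confirm whether events are making it through the pipeline at all (a minimal debugging sketch, not part of the config above):
output {
    stdout {
        codec => rubydebug
    }
}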
Does anything in the configuration jump out as a potential pain point? I've tried a number of iterations of both the template file and the logstash.conf file, and now feel like I'm beating my head against the wall to no avail.

Related

Error in reading json using aws-python-sdk

I am registering a thing using the register_thing SDK call and passing my CSR in the parameters:
f = open("template.json", "r")
print(f.read())
template = json.dumps(f.read())
#template1 = f.read()
f.close()
response = client.register_thing(templateBody=template, parameters={"ThingName": thing_name,
"CSR": csr.decode('UTF-8')
})
But I am getting an error when reading the JSON.
I checked my JSON file and found no problem:
{
"Parameters" : {
"ThingName" : {
"Type" : "String"
},
"SerialNumber" : {
"Type" : "String"
},
"Location" : {
"Type" : "String",
"Default" : "WA"
},
"CSR" : {
"Type" : "String"
}
},
"Resources" : {
"thing" : {
"Type" : "AWS::IoT::Thing",
"Properties" : {
"ThingName" : {"Ref" : "ThingName"},
"AttributePayload" : { "version" : "v1", "serialNumber" : {"Ref" : "SerialNumber"}}
}
},
"certificate" : {
"Type" : "AWS::IoT::Certificate",
"Properties" : {
"CertificateSigningRequest": {"Ref" : "CSR"},
"Status" : "ACTIVE"
}
},
"policy" : {
"Type" : "AWS::IoT::Policy",
"Properties" : {
"PolicyDocument": "{\"Version\": \"2012-10-17\",\"Statement\": [{\"Effect\": \"Allow\",\"Action\": [\"iot:*\"],\"Resource\": [\"*\"]}]}"
}
}
}
}
Can someone give me a hint about what is wrong here?
Check how the template is being read. In your code f.read() is called twice: the print(f.read()) call consumes the file, so the second f.read() returns an empty string and json.dumps just encodes that empty string. Note also that templateBody expects the JSON text itself, so wrapping the contents in json.dumps() only re-encodes them as a quoted JSON string. Read the file once and pass its contents directly:
# Read the provisioning template once; the file already contains JSON,
# so its contents can be passed to templateBody as-is.
with open("template.json", "r") as f:
    template = f.read()
which will give you the correct template string for register_thing.

Mongo replica sets load management

Currently I have two Node applications which use a MongoDB replica set (1 primary and 6 secondaries). The read queries of one application put load on MongoDB and affect the performance of the other application. So, I want to divide the secondary nodes such that one application uses the primary and 4 secondaries and the other application uses the primary and the other 2 secondaries. I don't want the load from one application to affect the other. How do I achieve this?
You can set different tags per application and read from SECONDARY members per defined tag, for example:
app1--> db.collection.find({}).readPref( "secondary", [ { "app": "app1" } ] )
app2--> db.collection.find({}).readPref( "secondary", [ { "app": "app2" } ] )
And have your replica set configured as follows:
{
"_id" : "myrs",
"version" : 2,
"members" : [
{
"_id" : 0,
"host" : "host1:27017",
"tags" : {
"app": "app1"
}
}, {
"_id" : 1,
"host" : "host2:27017",
"tags" : {
"app": "app1"
}
}, {
"_id" : 2,
"host" : "host3:27017",
"tags" : {
"app": "app1"
}
}, {
"_id" : 3,
"host" : "host4:27017",
"tags" : {
"app": "app2"
}
}, {
"_id" : 4,
"host" : "host4:27017",
"tags" : {
"app": "app2"
}
}
]
}
How to add the tags:
conf = rs.conf();
conf.members[0].tags = { "app": "app1" };
conf.members[1].tags = { "app": "app1" };
conf.members[2].tags = { "app": "app1" };
conf.members[3].tags = { "app": "app2" };
conf.members[4].tags = { "app": "app2" };
rs.reconfig(conf);
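If the applications set their read preference at connection time rather than per query, the same tag sets can go in each application's connection string. A minimal Node.js sketch (driver version, hosts, database name and replica set name are placeholders):
const { MongoClient } = require('mongodb');

// app1 reads only from secondaries tagged { "app": "app1" };
// app2 would use readPreferenceTags=app:app2 instead
const uri = 'mongodb://host1:27017,host2:27017,host3:27017/mydb' +
    '?replicaSet=myrs&readPreference=secondary&readPreferenceTags=app:app1';

const client = new MongoClient(uri);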

using terms in elasticsearch js

Because of the wonderful documentation Elasticsearch has, I can't figure out the proper syntax to search for a term. This is my code:
let checkuser = await client.search({
    index: "users",
    type: "my_users",
    body: {
        query: {
            term: {
                email: req.body.email
            }
        }
    }
});
I want to search for an object that has a key-value pair of 'email' with a certain email, but it has to match the exact email I wrote: if it's a@mail.com, then ab@mail.com should not match. I know I need to use terms, but when I write it like that it doesn't work. What's wrong with my syntax?
P.S. this is my index mapping:
"users" : {
"mappings" : {
"jobix_users" : {
"properties" : {
"confirmed" : {
"type" : "boolean"
},
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"firstName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"jobNotification" : {
"type" : "boolean"
},
"jobTitle" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"password" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"userName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
You are using the default mapping, which means the standard tokenizer is applied when you index documents.
As you can see in the mapping, the email field has two interpretations:
text
keyword
In the text version, the standard tokenizer splits the value and stores the resulting tokens in your index. This means you can search for the term alex or the term mail.com, but not for the full address. If you want to match the whole email, your query should look like this:
{
    "query": {
        "term": {
            "email.keyword": req.body.email
        }
    }
}
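With the JavaScript client from the question, the same query would look like this (a sketch that keeps the original index and type names):
let checkuser = await client.search({
    index: "users",
    type: "my_users",
    body: {
        query: {
            term: {
                "email.keyword": req.body.email // exact match against the keyword sub-field
            }
        }
    }
});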
Elasticsearch also has a special uax_url_email tokenizer for URLs and emails; I would recommend using this tokenizer for the email field.
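A sketch of what creating the index with such an analyzer could look like through the JavaScript client (the index, type and analyzer names here are only examples; the index has to be recreated and the data reindexed for this to take effect):
await client.indices.create({
    index: "users",
    body: {
        settings: {
            analysis: {
                analyzer: {
                    email_analyzer: {
                        type: "custom",
                        tokenizer: "uax_url_email", // keeps whole emails and URLs as single tokens
                        filter: ["lowercase"]
                    }
                }
            }
        },
        mappings: {
            my_users: {
                properties: {
                    email: { type: "text", analyzer: "email_analyzer" }
                }
            }
        }
    }
});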

I can't filter or aggregate the documents I have saved to Elasticsearch via Logstash

I guess the problem may be related to my logstash.conf, but I don't know exactly what to do. I found excellent tutorials explaining how to do it using only Elasticsearch, but in my case all data will come from Node.js via Logstash.
I searched for how to enable fielddata, but I couldn't figure out how to do it in my logstash.conf. Should I create an index template? If so, how?
The context is that I want to log every time a user accesses our application and then bill him/her according to the number of accesses per month.
logstash.conf
input {
    tcp {
        port => 5000
        type => cpfTipo
    }
}
filter {
    json {
        source => "message"
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "mycostumer_indice"
        document_type => "cpfTipo"
    }
}
Attempt to filter:
1)
curl -XGET http://127.0.0.1:9200/mycostumer_indice/cpfTipo/_search -d '{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter":
{
"term": {
"username": "a"
}
}
]
}
}
}
}'
{"error":{"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [filtered]","line":3,"col":21}],"type":"parsing_exception","reason":"no [query] registered for [filtered]","line":3,"col":21},"status":400}demetrio#nodejs ~/tool
Attempts to aggregate:
1)
curl -XGET http://127.0.0.1:9200/mycostumer_indice/cpfTipo/_search -d '{
{
"aggs" : {
"message" : {
"terms" : {
"field" : "cpfTipo",
"size" : 5
}
}
}
}'
{"error":{"root_cause":[{"type":"json_parse_exception","reason":"Unexpected character ('{' (code 123)): was expecting double-quote to start field name\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#3ce63313; line: 2, column: 2]"}],"type":"json_parse_exception","reason":"Unexpected character ('{' (code 123)): was expecting double-quote to start field name\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#3ce63313; line: 2, column: 2]"},"status":500}
2)
curl -XPOST 'http://127.0.0.1:9200/mycostumer_indice/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_username": {
"terms": {
"field": "username"
}
}
}
}'
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [username] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "mycostumer_indice",
"node" : "-em7X-ssT3SL2JBtfs0VTQ",
"reason" : {
"type" : "illegal_argument_exception",
"reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [username] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
}
],
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [username] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
},
"status" : 400
}
How the mycostumer index appears:
curl http://127.0.0.1:9200/mycostumer_indice/cpfTipo/_search?pretty
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "mycostumer_indice",
"_type" : "cpfTipo",
"_id" : "AVrxUi5cIZDJUBCguFI8",
"_score" : 1.0,
"_source" : {
"password" : "a",
"#timestamp" : "2017-03-21T14:42:54.466Z",
"port" : 56012,
"#version" : "1",
"host" : "127.0.0.1",
"message" : "{\"username\":\"a\",\"password\":\"a\"}",
"type" : "cpfTipo",
"username" : "a"
}
}
]
}
}
In Node.js:
var express = require('express');
var bodyParser = require('body-parser');
var Client = require('node-rest-client').Client;
var expressWinston = require('express-winston');
var winston = require('winston');
require('winston-logstash');
var client = new Client();
var Logstash = require('logstash-client');
var app = express();
expressWinston.requestWhitelist.push('body');
expressWinston.responseWhitelist.push('body')
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({
extended: true
}));
var port = process.env.PORT || 3000;
var router = express.Router();
var tokenRoute = router.route('/token');
tokenRoute.post(function (req, res) {
    var user = {
        username: req.body.username,
        password: req.body.password
    };
    logstash.send(user);
Your first search query uses the deprecated filtered query; simply replace it with a bool query and you're good:
curl -XGET http://127.0.0.1:9200/mycostumer_indice/cpfTipo/_search -d '{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "username": "a"
        }
      }
    }
  }
}'
Your second query has one opening brace too many at the beginning; use this one instead:
curl -XGET http://127.0.0.1:9200/mycostumer_indice/cpfTipo/_search -d '{
  "aggs" : {
    "message" : {
      "terms" : {
        "field" : "cpfTipo",
        "size" : 5
      }
    }
  }
}'
Your third query fails because you're trying to aggregate on username which is a text field. You should change the mapping of that field to use the keyword type instead.
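If the index was created by Elasticsearch's default dynamic mapping (strings become text with a keyword sub-field), the terms aggregation may already work against username.keyword without changing the mapping — a sketch:
curl -XPOST 'http://127.0.0.1:9200/mycostumer_indice/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "group_by_username": {
      "terms": {
        "field": "username.keyword"
      }
    }
  }
}'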

MongoDB update $pull document from multiple arrays

I have a document with the following structure (simplified):
{
    "containers": [
        {
            "containerId": 1,
            "components": ["component1", "component2"]
        },
        {
            "containerId": 2,
            "components": ["component3", "component1"]
        }
    ]
}
How would one write a query that removes "component1" from BOTH containers? Is this possible?
So far I've tried {"$pullAll": { "containers.$.component": ["component1"]}}, a similar query with $pull, and setting multi: true, but I always end up removing the component from the first array only (I'm using .update()).
EDIT: Raw data ahead!
{
"_id" : ObjectId("53a056cebe56154c99dc950b"),
"_embedded" : {
"click" : {
"items" : [],
"_links" : {
"self" : {
"href" : "http://localhost/v1/click"
}
}
},
"container" : {
"_links" : {
"self" : {
"href" : "http://localhost/v1/container"
}
},
"items" : [
{
"name" : "Container test",
"uriName" : "Container_test",
"description" : "this is a test container",
"containerId" : "CONTAINER TEST+SITE_TEST",
"component" : [
"ANOTHER COMPONENT+SITE_TEST",
"ANOTHER COMPONENT+SITE_TEST",
"SARASA+SITE_TEST"
],
"_links" : {
"self" : {
"href" : "http://localhost/v1/container/CONTAINER TEST+SITE_TEST"
}
}
},
{
"name" : "sasasa",
"uriName" : "sasasa",
"description" : "container description",
"containerId" : "SASASA+SITE_TEST",
"component" : [
"ANOTHER COMPONENT+SITE_TEST",
"COMPONENT+SITE_TEST",
"FAFAFA+SITE_TEST",
"SARASA+SITE_TEST"
],
"_links" : {
"self" : {
"href" : "/v1/container/SASASA+SITE_TEST"
}
}
}
]
}
},
"name" : "SITE_TEST",
"siteId" : "SITE_TEST",
"url" : "/v1/site"
}
Ok, so what I'm trying to do is remove the component "SARASA+SITE_TEST" from the two containers. I'm using Robomongo to test the queries. I've tried db.site.update({"_embedded.container.items.component": "SARASA+SITE_TEST"},{"$pullAll": { "_embedded.container.items.component": ["SARASA+SITE_TEST"]}}, {multi: true}) and it didn't work; previously I tried db.site.update({"_embedded.container.items.component": "SARASA+SITE_TEST"},{"$pull": { "_embedded.container.items.$.component": "SARASA+SITE_TEST"}}, {"multi": true}) and it didn't work either. I assume Robomongo exposes the mongo driver directly; I didn't try to run this from the command line.
(the document is a "site", that's why my queries start with db.site)
I had a similar problem and I tried $pullAll and it worked.
https://docs.mongodb.org/manual/reference/operator/update/pullAll/
I tried the simplified version of your data and $pull works:
> db.testcoll.insert({"containers": {"containreId": 1, "components": ["component1", "component2"]}})
> db.testcoll.insert({"containers": {"containreId": 2, "components": ["component3", "component1"]}})
> db.testcoll.find()
{ "_id" : ObjectId("53a8428ca2696f063b5c51eb"), "containers" : { "containreId" : 1, "components" : [ "component1", "component2" ] } }
{ "_id" : ObjectId("53a8429ea2696f063b5c51ec"), "containers" : { "containreId" : 2, "components" : [ "component3", "component1" ] } }
> db.testcoll.update({"containers.components": "component1"}, {$pull: {"containers.components": "component1"}}, {multi: true})
> db.testcoll.find()
{ "_id" : ObjectId("53a8428ca2696f063b5c51eb"), "containers" : { "components" : [ "component2" ], "containreId" : 1 } }
{ "_id" : ObjectId("53a8429ea2696f063b5c51ec"), "containers" : { "components" : [ "component3" ], "containreId" : 2 } }
