Apache Cassandra schema design - cassandra

I have the following setup:
I have a CF items and a CF keywords.
Each item has zero, one or more keywords, stored in columns.
Each keyword has one or more items, stored in columns.
It looks like this:
items {
dl { name => DELL6400, keyword:1 => computer, keyword:2 => DELL, keyword:3 => topseller }
hp { name => HP12345, keyword:1 => computer, keyword:2 => HP }
no { name => Nokia8210, keyword:1 => phone, keyword:2 => NOKIA }
}
// here I store keys of the items only,
// in reality I have denormalized most of items columns
keywords{
computer { webpage => www.domain.com/computer , item:dl => dl , item:hp => hp }
DELL { webpage => www.domain.com/dell , item:dl => dl }
topseller { webpage => www.domain.com/top , item:dl => dl }
HP { webpage => www.domain.com/hp , item:hp => hp }
NOKIA { webpage => www.domain.com/nokia , item:no => no }
phone { webpage => www.domain.com/phone , item:no => no }
}
When I add a new item, I add a "webpage" column in keywords if necessary.
When I remove an item, I remove the column "item:xx" as well.
The question is how to avoid "empty" keywords, such as these after I remove the Nokia item "no":
keywords{
...
NOKIA { webpage => www.domain.com/nokia }
phone { webpage => www.domain.com/phone }
}
I could count the slice item:*, but because of eventual consistency this is probably the wrong approach.

You can add a CounterColumn (http://wiki.apache.org/cassandra/Counters) to keywords CF. Increment it when adding an item to the keyword, and decrement on removal:
keywords{
computer { webpage => www.domain.com/computer , count => 2 , item:dl => dl , item:hp => hp }
....
}
When reading a row with count == 0, just treat it as deleted. You shouldn't actually delete the 'webpage' column when you read a row with count == 0, since there might be a concurrent add operation.
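The counter idea above can be sketched with a small in-memory model. This is only an illustration in Python, not Cassandra client code: the class `KeywordIndex` and its method names are hypothetical, and the membership check before each increment is a simplification that a real counter column would not give you.

```python
class KeywordIndex:
    """In-memory stand-in for the keywords CF (hypothetical, not a Cassandra client)."""

    def __init__(self):
        # keyword -> {"webpage": url, "count": int, "items": set of item keys}
        self.rows = {}

    def add_item(self, keyword, item_key, webpage):
        row = self.rows.setdefault(
            keyword, {"webpage": webpage, "count": 0, "items": set()}
        )
        if item_key not in row["items"]:
            row["items"].add(item_key)
            row["count"] += 1  # increment the counter when an item is added

    def remove_item(self, keyword, item_key):
        row = self.rows.get(keyword)
        if row and item_key in row["items"]:
            row["items"].remove(item_key)
            row["count"] -= 1  # decrement on removal; the webpage column stays

    def live_keywords(self):
        # Rows whose count is 0 are treated as deleted at read time.
        return sorted(k for k, row in self.rows.items() if row["count"] > 0)


index = KeywordIndex()
index.add_item("computer", "dl", "www.domain.com/computer")
index.add_item("computer", "hp", "www.domain.com/computer")
index.add_item("NOKIA", "no", "www.domain.com/nokia")
index.add_item("phone", "no", "www.domain.com/phone")
index.remove_item("NOKIA", "no")
index.remove_item("phone", "no")
print(index.live_keywords())
```

After removing item "no", the NOKIA and phone rows still hold their webpage column but are skipped on read because their count is 0. As far as I know, Cassandra counter columns have to live in a dedicated counter column family, so in practice the count would probably sit in a separate CF keyed by keyword rather than next to the webpage column.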

This is interesting, but I thought about another way: denormalizing the "webpage" column, e.g.:
keywords{
computer { webpage:dl => www.domain.com/computer , item:dl => dl ,
webpage:hp => www.domain.com/computer , item:hp => hp }
DELL { webpage:dl => www.domain.com/dell , item:dl => dl }
topseller { webpage:dl => www.domain.com/top , item:dl => dl }
HP { webpage:hp => www.domain.com/hp , item:hp => hp }
NOKIA { webpage:no => www.domain.com/nokia , item:no => no }
phone { webpage:no => www.domain.com/phone , item:no => no }
}
In that case, when I delete item:xx, I delete webpage:xx as well, and the row is removed automatically (it becomes a ghost) once no columns are left. However, I am still not sure this is such a bright idea.

Logstash aggregation return empty message

I have a testing environment to try out some Logstash plugins before moving to production.
For now, I am using Kiwi Syslog Generator to generate some syslog messages for testing.
The fields I have are as follows:
@timestamp
message
+ elastic metadata
Starting from these basic fields, I start filtering my data.
The first thing is to add a new field based on the timestamp and message, as follows:
input {
syslog {
port => 514
}
}
filter {
prune {
whitelist_names =>["timestamp","message","newfield", "message_count"]
}
mutate {
add_field => {"newfield" => "%{@timestamp}%{message}"}
}
}
The prune is just there to avoid processing unwanted data.
This works fine, as I am getting a new field with those 2 values.
The next step was to run some aggregation based on specific content of the message, such as whether the message contains logged in or logged out.
To do this, I used the aggregate filter:
grok {
match => {
"message" => [
"(?<[@metadata][event_type]>logged out)",
"(?<[@metadata][event_type]>logged in)",
"(?<[@metadata][event_type]>workstation locked)"
]
}
}
aggregate {
task_id => "%{message}"
code => "
map['message_count'] ||= 0; map['message_count'] += 1;
"
push_map_as_event_on_timeout => true
timeout_timestamp_field => "@timestamp"
timeout => 60
inactivity_timeout => 50
timeout_tags => ['_aggregatetimeout']
}
}
This worked as expected, but I am having a problem here. When the aggregation times out, the only field populated for the specific aggregation is message_count.
As you can see in the above screenshot, newfield and message (the one on the far left, sorry it didn't fit in the screenshot) are both empty.
For demonstration and testing purposes that is absolutely fine, but it will become unmanageable if I get hundreds of syslog messages per second without knowing which message that message_count refers to.
I am struggling here and I don't know how to solve this issue. Can somebody please help me understand how I can fill newfield with the content of the message that it refers to?
This is my whole Logstash configuration, to make it easier.
input {
syslog {
port => 514
}
}
filter {
prune {
whitelist_names =>["timestamp","message","newfield", "message_count"]
}
mutate {
add_field => {"newfield" => "%{@timestamp}%{message}"}
}
grok {
match => {
"message" => [
"(?<[@metadata][event_type]>logged out)",
"(?<[@metadata][event_type]>logged in)",
"(?<[@metadata][event_type]>workstation locked)"
]
}
}
aggregate {
task_id => "%{message}"
code => "
map['message_count'] ||= 0; map['message_count'] += 1;
"
push_map_as_event_on_timeout => true
timeout_timestamp_field => "@timestamp"
timeout => 60
inactivity_timeout => 50
timeout_tags => ['_aggregatetimeout']
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash_index"
}
stdout {
codec => rubydebug
}
csv {
path => "C:\Users\adminuser\Desktop\syslog\syslogs-%{+yyyy.MM.dd}.csv"
fields => ["timestamp", "message", "message_count", "newfield"]
}
}
push_map_as_event_on_timeout => true
When you use this and a timeout occurs, it creates a new event using the contents of the map. If you want fields from the original messages to be in the new event, you have to add them to the map. For the task_id there is a shorthand notation to do this using the timeout_task_id_field option on the filter; otherwise you have to add them explicitly:
map['newfield'] ||= event.get('newfield');
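To make the point concrete, here is a rough Python model of what happens to the map. It is not Logstash code: the function `aggregate` and its `carry_fields` parameter are made up for illustration, and timeouts are reduced to "emit every map at the end".

```python
def aggregate(events, carry_fields=()):
    """Group events by message (the task_id) and build one map per task."""
    maps = {}
    for event in events:
        task_map = maps.setdefault(event["message"], {})
        task_map["message_count"] = task_map.get("message_count", 0) + 1
        for field in carry_fields:
            # Same idea as: map['newfield'] ||= event.get('newfield')
            if task_map.get(field) is None:
                task_map[field] = event.get(field)
    # On timeout each map becomes a new event; nothing else is copied over.
    return list(maps.values())


events = [
    {"message": "user logged in", "newfield": "t1 user logged in"},
    {"message": "user logged in", "newfield": "t2 user logged in"},
    {"message": "user logged out", "newfield": "t3 user logged out"},
]

without_fields = aggregate(events)
with_fields = aggregate(events, carry_fields=("newfield",))
print(without_fields)
print(with_fields)
```

The first call yields events that carry only message_count, which matches the empty newfield and message seen in the question. The second call also carries newfield, taken from the first event of each task, because the value is only set while it is still missing.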

What is the correct way to map and translate cookies in logstash?

My input is a log from IIS server with cookies included. I want my output (elasticsearch) to have a field like this:
"cookies": {
"cookie_name": "cookie_value"
}
Also for some cookies I want their values to be replaced with some other values from a dictionary.
Basically, I think the following filter config solves my problem:
kv {
source => "cookie"
target => "cookies"
trim => ";"
include_keys => [ "cookie_name1","cookie_name2" ]
}
translate {
field => "cookies.cookie_name1"
destination => "cookies.cookie_name1"
dictionary_path => "/etc/logstash/dict.yaml"
override => "true"
fallback => "%{cookies.cookie_name1}"
}
The problem is that I don't know if it’s the right way to do this, and whether it will work at all (especially the cookies.cookie_name part).
The correct way to do this is:
kv {
source => "cookie"
target => "cookies"
field_split => ";+"
include_keys => [ "cookie_name1","cookie_name2" ]
}
translate {
field => "[cookies][cookie_name1]"
destination => "[cookies][cookie_name1]"
dictionary_path => "/etc/logstash/dict.yaml"
override => "true"
fallback => "%{[cookies][cookie_name1]}"
}
https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#logstash-config-field-references
https://www.elastic.co/guide/en/logstash/7.4/plugins-filters-kv.html
https://www.elastic.co/guide/en/logstash/7.4/plugins-filters-translate.html
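For readers who want to check the intended result outside Logstash, the following Python sketch mimics the two steps: split the cookie string into a nested "cookies" object, then replace one value from a dictionary while falling back to the original value. The helper names `parse_cookies` and `translate` and the sample data are assumptions made for this illustration only.

```python
def parse_cookies(cookie_header, include_keys):
    """Split 'a=1; b=2' into a dict, keeping only the listed keys."""
    cookies = {}
    for pair in cookie_header.split(";"):
        key, sep, value = pair.strip().partition("=")
        if sep and key in include_keys:
            cookies[key] = value
    return cookies


def translate(event, path, dictionary):
    """Replace the nested field at path in place; unknown values stay unchanged."""
    *parents, leaf = path
    node = event
    for name in parents:
        node = node.get(name, {})
    if leaf in node:
        node[leaf] = dictionary.get(node[leaf], node[leaf])
    return event


event = {"cookie": "cookie_name1=abc; cookie_name2=xyz; other=1"}
event["cookies"] = parse_cookies(event["cookie"], {"cookie_name1", "cookie_name2"})
translate(event, ["cookies", "cookie_name1"], {"abc": "translated"})
print(event["cookies"])
```

The path list ["cookies", "cookie_name1"] plays the role of the [cookies][cookie_name1] field reference. One caveat, as I read the kv filter documentation linked above: field_split is treated as a set of single-character delimiters, so ";+" would also split on "+", and field_split_pattern is the option that takes a regular expression.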

Array iteration with position in puppet

I'm planning to implement the possibility to add multiple ssh keys per user.
For a single key, I used:
if ($sshkey) {
ssh_authorized_key { $resourcename:
ensure => 'present',
type => 'ssh-rsa',
key => '$sshkey',
user => $title,
require => User[$title],
}
}
For multiple keys, I thought that this might work:
if ($sshkeyarray != []) {
$sshkeyarray.each |String $singlesshkey| {
ssh_authorized_key { $resourcename:
ensure => 'present',
type => 'ssh-rsa',
key => '$singlesshkey',
user => $title,
require => User[$title],
}
}
}
But the resourcename can only be used once, so I want to give names like "resourcename_1" for the first ssh key and "resourcename_n" for the n-th key.
How can I do this? Can I get the position of the singlesshkey in the array and add it to the resourcename?
As described in the Puppet documentation for the each function, you can do this:
$sshkeyarray.each |$index, String $singlesshkey| {
ssh_authorized_key { "${resourcename}_${index}":
ensure => 'present',
type => 'ssh-rsa',
key => $singlesshkey,
user => $title,
require => User[$title],
}
}
Notice that there's no need to test for an empty array either. Looping over an empty array causes nothing to happen anyway.

How to create multilingual menu link programmatically in Drupal 7

I'm trying to create a menu link programmatically, but it's not working when the source language is anything other than English. Here is my code.
$language_list = language_list();
foreach ($language_list as $language_code => $language_object) {
$menu_item = array(
'link_title' => t('Fruit'),
'menu_name' => 'menu-main-footer',
'customized' => 1,
'link_path' => $custom_path,
'language' => $language_code,
'weight' => 30,
);
menu_link_save($menu_item);
}
Does anyone have an idea about this?
I changed my code, and it works for me.
// Create menu translation set.
$menu_translation_set = i18n_translation_set_create('menu_link');
// Create a translated menu link for every language defined on the site.
$language_list = language_list();
foreach ($language_list as $language_code => $language_object) {
// Add Fruit link in menu-main-footer.
// 'change-fruit' is node title.
$fruit_path = drupal_get_normal_path('change-fruit', $language_code);
if (!menu_link_get_preferred($fruit_path, 'menu-main-footer')) {
$menu_item = array(
'link_title' => t('fruit'),
'menu_name' => 'menu-main-footer',
'customized' => 1,
'link_path' => $fruit_path,
'language' => $language_code,
'weight' => 30,
'i18n_tsid' => $menu_translation_set->tsid,
);
menu_link_save($menu_item);
$menu_translation_set->add_item($menu_item, $language_code);
$menu_translation_set->save();
}
}
This may be helpful to others.
I had to migrate an old menu to a new one along with its localized translations, so here is what I did:
$old_name = 'menu-old';
$new_name = 'menu-new';
$old_menu = menu_load($old_name);
if ($old_menu) { // menu_load() returns FALSE when the menu does not exist
$old_mlids = db_query("SELECT mlid from {menu_links} WHERE menu_name=:menu_name", array(':menu_name' => $old_name))->fetchAll();
if(!empty($old_mlids)){
// Clean existing items in new menu.
$new_mlids = db_query("SELECT mlid from {menu_links} WHERE menu_name=:menu_name", array(':menu_name' => $new_name))->fetchAll();
if(!empty($new_mlids)){
foreach($new_mlids as $record){
menu_link_delete($record->mlid);
}
}
// Copy old to new menu.
foreach($old_mlids as $record){
$old_menu_item = menu_link_load($record->mlid);
$new_menu_item_config = array(
'link_title' => $old_menu_item['link_title'],
'link_path' => $old_menu_item['link_path'],
'menu_name' => $new_name,
'customized' => 1,
'weight' => $old_menu_item['weight'],
'expanded' => $old_menu_item['expanded'],
'options' => $old_menu_item['options'],
);
$new_menu_item = $new_menu_item_config;
menu_link_save($new_menu_item);
// Migrate translations.
$languages = language_list('enabled')[1];
foreach($languages as $lang_code => $language_object){
if ($lang_code == language_default('language')) {
continue;
}
$translation_value = i18n_string_translate('menu:item:'.$old_menu_item['mlid'].':title', $old_menu_item['link_title'], array('langcode' => $lang_code));
if($translation_value != $old_menu_item['link_title']){
i18n_string_translation_update('menu:item:'.$new_menu_item['mlid'].':title', $translation_value, $lang_code, $old_menu_item['link_title']);
}
}
}
}
// Delete old menu.
menu_delete(array('menu_name' => $old_name));
}

Puppet - How to use $facts['name'] in notify {}

Question
How can I use $facts['fact name'] in notify?
Issue
The code below is OK.
$virt = $facts['virtual']
notify { "I'm using a value !${virt}! ": }
Notice: I'm using a value !vmware!
However, the code below prints what looks like all the facts.
notify { "I'm using a value $facts['virtual'] ": }
Notice: I'm using a value {architecture => amd64, augeas => {version => 1.4.0}, augeasversion => 1.4.0, bios_release_date => 09/30/2014, bios_vendor => Phoenix Technologies LTD, bios_version => 6.00, blockdevice_fd0_size => 0, blockdevice_sda_model => Virtual disk, blockdevice_sda_size => 107374182400, blockdevice_sda_vendor => VMware, blockdevice_sdb_model => Virtual disk, blockdevice_sdb_size => 536870912000, blockdevice_sdb_vendor => VMware, blockdevice_sr0_model => VMware IDE CDR10, blockdevice_sr0_size => 1073741312, blockdevice_sr0_vendor => NECVMWar, blockdevices => fd0,sda,sdb,sr0, boardmanufacturer => Intel Corporation, boardproductname => 440BX Desktop Reference Platform, chassisassettag => No A ...... (a lot)
Kindly help me to get the same result as in the first example, not by using ${::virtual} but by using $facts['virtual'].
You do it like this:
notify { "I'm using a value ${facts['virtual']}": }
