Filebeat has high read IO on SATA, is there any way to slow down the collecting rate?

I have Filebeat set up to collect log data from a SATA disk, but it causes high read I/O of around 20 MB/s. Below is my Filebeat config; it outputs to a Kafka cluster. Is there any way to slow down Filebeat's collection rate to reduce the read I/O on the disk, or any config change I can make to lower the I/O?
type: log
encoding: plain
scan_frequency: 3s
- /usr/share/filebeat/logs/*/*.log
- /usr/share/filebeat/logs/*/*.error
fields_under_root: true
multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
multiline.negate: true
multiline.match: after
multiline.timeout: 5s
tail_files: false
symlinks: false
backoff: 1s
max_backoff: 10s
backoff_factor: 2
harvester_limit: 0
### Harvester closing options
close_inactive: 20m
close_renamed: false
close_removed: false
close_eof: false
close_timeout: 0
### State options
clean_inactive: 0
clean_removed: true
enabled: true
hosts: '${KAFKA_HOSTS:kafka:9092}'
topic: 'topic'
reachable_only: true


How to configure and run Reaper to repair Cassandra on Linux (CentOS environment)

I'm trying to install and run Reaper 1.4 on my CentOS VM. I followed the same installation steps as in the given video, but I still can't get Reaper to start. Can anyone please point me to proper, complete documentation? However, I have read and followed
Below are my cassandra-reaper.yaml settings:
segmentCountPerNode: 16
repairParallelism: DATACENTER_AWARE
repairIntensity: 0.9
scheduleDaysBetween: 7
repairRunThreadCount: 15
hangingRepairTimeoutMins: 30
storageType: cassandra
enableCrossOrigin: true
incrementalRepair: false
blacklistTwcsTables: false
enableDynamicSeedList: true
repairManagerSchedulingIntervalSeconds: 10
activateQueryLogger: false
jmxConnectionTimeoutInSeconds: 5
useAddressTranslator: false
# purgeRecordsAfterInDays: 30
# numberOfRunsToKeepPerUnit: 10
# 7100
#10.X.X.X: 7199
# 7200
# 7300
# 7400
# 7500
# 7600
# 7700
# 7800
username: *****
password: *****
type: default
- type: http
port: 8080
- type: http
port: 8081
appenders: []
clusterName: "dc1"
contactPoints: ["10.X.X.1","10.X.X.2","10.X.X.3","10.X.X.4","10.X.X.5"]
#contactPoints: [""]
keyspace: "reaper_db"
type: tokenAware
shuffleReplicas: true
type: dcAwareRoundRobin
usedHostsPerRemoteDC: 0
allowRemoteDCsForLocalConsistencyLevel: false
type: plainText
username: cass
password: cass
type: jdk
enabled: false
initialDelayPeriod: PT15S
periodBetweenPolls: PT10M
timeBeforeFirstSchedule: PT5M
scheduleSpreadPeriod: PT6H
- keyspace1
- keyspace2
sessionTimeout: PT10M
iniConfigs: ["classpath:shiro.ini"]
log from /var/log/cassandra-reaper/reaper.log
INFO [main] i.c.ReaperApplication - initializing runner thread pool with 15 threads
INFO [main] i.c.ReaperApplication - initializing storage of type: cassandra
INFO [main] c.d.d.core - DataStax Java driver 3.5.0 for Apache Cassandra
INFO [main] c.d.d.c.GuavaCompatibility - Detected Guava >= 19 in the classpath, using modern compatibility layer
INFO [main] c.d.d.c.ClockFactory - Using native clock to generate timestamps.
INFO [main] c.d.d.c.NettyUtil - Found Netty's native epoll transport in the classpath, using it
INFO [main] o.a.s.c.ReflectionBuilder - An instance with name 'authc' already exists. Redefining this object as a new instance of type org.apache.shiro.web.filter.authc.PassThruAuthenticationFilter
log from /var/log/cassandra-reaper.err
at org.yaml.snakeyaml.scanner.ScannerImpl.fetchMoreTokens(
at org.yaml.snakeyaml.scanner.ScannerImpl.checkToken(
at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockMappingValue.produce(
at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(
at org.yaml.snakeyaml.parser.ParserImpl.getEvent(
at com.fasterxml.jackson.dataformat.yaml.YAMLParser.nextToken(
... 11 more
ls: cannot access server/target/cassandra-reaper-*.jar: No such file or directory
io.dropwizard.configuration.ConfigurationParsingException: /etc/cassandra-reaper/cassandra-reaper.yaml has an error:
* Malformed YAML at line: 27, column: 11; while scanning for the next token; found character '\t' that cannot start any token; in 'reader', line 27, column 1:
clusterName: "dc1"
at [Source: (ByteArrayInputStream); line: 26, column: 10]
at io.dropwizard.configuration.ConfigurationParsingException$
at io.dropwizard.cli.ConfiguredCommand.parseConfiguration(
at io.cassandrareaper.ReaperApplication.main(
Caused by: com.fasterxml.jackson.dataformat.yaml.snakeyaml.error.MarkedYAMLException: while scanning for the next token; found character '\t' that cannot start any token; in 'reader', line 27, column 1:
clusterName: "dc1"
Malformed YAML at line: 27, column: 11; while scanning for the next token; found character '\t' that cannot start any token; in 'reader', line 27, column 1:
clusterName: "dc1"
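You need to remove any tab characters from your YAML file and replace them with spaces (for example, 4 spaces per tab, used consistently). The error points at line 27, where the 'clusterName: "dc1"' line starts with a tab.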
See the answer here for why this is common when manipulating YAML files.
A YAML file cannot contain tabs as indentation
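To find the offending tabs before restarting Reaper, a small script can help. This is only a minimal sketch: the function names find_tabs and replace_leading_tabs are made up for illustration, the sample text (including the "cassandra:" parent key) is a stand-in rather than the real file, and only tabs at the start of a line are rewritten.

```python
def find_tabs(text):
    """Return 1-based (line, column) pairs for every tab character in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if char == "\t":
                hits.append((line_number, column))
    return hits


def replace_leading_tabs(text, spaces_per_tab=4):
    """Replace each tab at the start of a line with spaces_per_tab spaces.

    Tabs that appear later in a line (for example inside a value) are left
    alone, because changing them could alter the data.
    """
    fixed_lines = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip("\t")
        tab_count = len(line) - len(stripped)
        fixed_lines.append(" " * (spaces_per_tab * tab_count) + stripped)
    return "".join(fixed_lines)


# Illustrative sample only; the second line is indented with a tab.
sample = 'cassandra:\n\tclusterName: "dc1"\n'
print(find_tabs(sample))
print(replace_leading_tabs(sample), end="")
```

To use it on the real file, read /etc/cassandra-reaper/cassandra-reaper.yaml into a string, check the positions reported by find_tabs, and write the result of replace_leading_tabs back only after keeping a backup copy.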

Why do I see a spike of steps per second in tensorflow training initially?

Hi tensorflow experts,
I see the following training speed profile using dataset API and prefetching of 128, 256, 512, or 1024 batches (each of 128 examples):
INFO:tensorflow:Saving checkpoints for 0 into
INFO:tensorflow:loss = 0.969178, step = 0
INFO:tensorflow:global_step/sec: 70.3812
INFO:tensorflow:loss = 0.65544295, step = 100 (1.422 sec)
INFO:tensorflow:global_step/sec: 178.33
INFO:tensorflow:loss = 0.47716027, step = 200 (0.560 sec)
INFO:tensorflow:global_step/sec: 178.626
INFO:tensorflow:loss = 0.53073615, step = 300 (0.560 sec)
INFO:tensorflow:global_step/sec: 132.039
INFO:tensorflow:loss = 0.4849593, step = 400 (0.757 sec)
INFO:tensorflow:global_step/sec: 121.437
INFO:tensorflow:loss = 0.4055175, step = 500 (0.825 sec)
INFO:tensorflow:global_step/sec: 122.379
INFO:tensorflow:loss = 0.28230205, step = 600 (0.817 sec)
INFO:tensorflow:global_step/sec: 122.163
INFO:tensorflow:loss = 0.4917924, step = 700 (0.819 sec)
INFO:tensorflow:global_step/sec: 122.509
The initial spike of 178 steps per second is reproducible across multiple runs and different prefetching amounts. I am trying to understand the underlying multi-threading mechanism that causes it.
Additional information:
My CPU usage peaks at 1800% on a 48-core machine. My GPU usage is consistently at only 9%. It's surprising that neither of these is exhausted, so I am wondering if the mutex in queue_runner is preventing the CPU processing from realizing its full potential, as described here?
[update] I also observed the same spike when I use prefetch_to_device(gpu_device, ..), with similar buffer sizes. Surprisingly, prefetch_to_device only slows things down, by about 10%.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into
INFO:tensorflow:loss = 1.3881096, step = 0
INFO:tensorflow:global_step/sec: 52.3374
INFO:tensorflow:loss = 0.48779136, step = 100 (1.910 sec)
INFO:tensorflow:global_step/sec: 121.154
INFO:tensorflow:loss = 0.3451385, step = 200 (0.827 sec)
INFO:tensorflow:global_step/sec: 89.3222
INFO:tensorflow:loss = 0.37804496, step = 300 (1.119 sec)
INFO:tensorflow:global_step/sec: 80.4857
INFO:tensorflow:loss = 0.49938473, step = 400 (1.242 sec)
INFO:tensorflow:global_step/sec: 79.1798
INFO:tensorflow:loss = 0.5120025, step = 500 (1.263 sec)
INFO:tensorflow:global_step/sec: 81.2081
It's common to see a spike in steps per second at the start of a training run, because the CPU has had time to fill up the prefetch buffer before the first step. Your steady-state steps per second are very reasonable compared to the start, but the low CPU usage might indicate a bottleneck.
The first question is whether or not you are using the Dataset API in combination with the Estimator. From your terminal output I suspect you are; if not, I would start by changing your code to use the Estimator class. If you are already using the Estimator class, then make sure you are following the best performance practices as documented here.
If you are doing all of the above already, then there is a bottleneck in your pipeline. Due to the low CPU usage, I would guess you are experiencing an I/O bottleneck. You might have your dataset on a slow medium (hard drive), or you aren't using a serialized format and are saturating the IOPS (again of a hard drive or network storage). In either case, start by using a serialized data format such as TFRecords, and upgrade your storage to an SSD or to multiple hard drives in RAID 0, 1, or 10, your pick.
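The buffer effect described above can be illustrated with a toy model. This is a sketch under stated assumptions, not TensorFlow's actual implementation: the function name windowed_rates is made up, the buffer is assumed to start full, and the rates of 120 and 180 steps per second are hypothetical values picked to resemble the log, not measurements.

```python
def windowed_rates(buffer_size, produce_rate, consume_rate, window=100, n_windows=12):
    """Return steps/sec for each window of `window` training steps.

    Toy model: the prefetch buffer starts full, the input pipeline refills it
    at produce_rate batches/sec, and the training loop could consume
    consume_rate batches/sec if it never had to wait for data.
    """
    rates = []
    now = 0.0
    window_start = 0.0
    for step in range(window * n_windows):
        if step < buffer_size:
            available_at = 0.0  # batch was already sitting in the buffer
        else:
            # batches beyond the initial buffer arrive one per 1/produce_rate sec
            available_at = (step - buffer_size + 1) / produce_rate
        # wait for the batch if needed, then spend one step of compute on it
        now = max(now, available_at) + 1.0 / consume_rate
        if (step + 1) % window == 0:
            rates.append(window / (now - window_start))
            window_start = now
    return rates


# Hypothetical numbers: 256 prefetched batches, pipeline at 120/s, trainer at 180/s.
for index, rate in enumerate(windowed_rates(256, 120.0, 180.0)):
    print(f"steps {index * 100}-{index * 100 + 99}: {rate:.1f} steps/sec")
```

In this model the early windows run at the trainer's speed while the buffer drains, and later windows settle at the input pipeline's speed. A larger buffer only lengthens the initial burst; it does not raise the steady-state rate.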

Graphite storage schema not working

I have configured the following storage schema in Graphite's /etc/carbon/storage-schemas.conf file, with the assumption that it would let me keep data at 60s precision for 365 days. However, when I read the data back using whisper-fetch, I get 60s precision for only one week of data. Any idea if I need to set this up in another file, or am I missing something?
Storage schema
retentions = 60s:365d
Whisper info
whisper-info memory-buffered.wsp
maxRetention: 31536000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 855412
Archive 0
retention: 86400
secondsPerPoint: 10
points: 8640
size: 103680
offset: 52
Archive 1
retention: 604800
secondsPerPoint: 60
points: 10080
size: 120960
offset: 103732
Archive 2
retention: 31536000
secondsPerPoint: 600
points: 52560
size: 630720
offset: 224692
Your output shows that it's not using the schema you claim. The most likely answer is that the Whisper file was created before changing the schema. In this case you need to either delete the file (and let it get created again) or use to apply the new schema.
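The mismatch can be checked by turning the retentions string into the archive layout it should produce and comparing that with the whisper-info output. The parser below is a simplified sketch written for this comparison, not Whisper's own code; parse_retentions and UNIT_SECONDS are illustrative names, and every value is assumed to carry a unit suffix.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(token):
    """Convert a value such as '60s' or '365d' to seconds (unit suffix required)."""
    token = token.strip()
    return int(token[:-1]) * UNIT_SECONDS[token[-1]]


def parse_retentions(definition):
    """Turn 'precision:duration,...' into a list of (secondsPerPoint, points)."""
    archives = []
    for part in definition.split(","):
        precision, duration = part.split(":")
        seconds_per_point = to_seconds(precision)
        points = to_seconds(duration) // seconds_per_point
        archives.append((seconds_per_point, points))
    return archives


# What the configured schema should produce.
expected = parse_retentions("60s:365d")

# (secondsPerPoint, points) copied from the whisper-info output in the question.
on_disk = [(10, 8640), (60, 10080), (600, 52560)]

print("expected:", expected)
print("on disk: ", on_disk)
print("match:   ", expected == on_disk)
```

The file on disk has three archives, equivalent to a retentions value of 10s:1d,60s:7d,600s:365d, while the configured schema would give a single archive of 525600 points at 60 seconds each.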

Graphite importing historical data only for 1 day

I'm trying to import 60 days of historical data at one point per hour, but only the last 24 hours are imported successfully. Configuration below:
Storage schema in Graphite /etc/carbon/storage-schemas.conf
pattern = .*
retentions = 5m:15d,15m:1y,1h:10y,1d:100y
Storage aggregation /etc/carbon/storage-aggregation.conf
pattern = .*
xFilesFactor = 0.0
aggregationMethod = sum
Restarting carbon-cache and removing the old Whisper data does not solve the problem.
I checked .wsp files with
# whisper-info /var/lib/graphite/whisper/ran/3g/newerlang.wsp
maxRetention: 3153600000
xFilesFactor: 0.0
aggregationMethod: sum
fileSize: 1961584
Archive 0
retention: 1296000
secondsPerPoint: 300
points: 4320
size: 51840
offset: 64
Archive 1
retention: 31536000
secondsPerPoint: 900
points: 35040
size: 420480
offset: 51904
Archive 2
retention: 315360000
secondsPerPoint: 3600
points: 87600
size: 1051200
offset: 472384
Archive 3
retention: 3153600000
secondsPerPoint: 86400
points: 36500
size: 438000
offset: 1523584
Any idea if I need to set this up in another file or am I missing something?
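One way to reason about where backfilled points should land is to model how a Whisper update picks an archive. The sketch below assumes, based on my reading of Whisper's update logic, that a point goes into the highest-precision archive whose retention still covers the point's age; archive_for_age is an illustrative helper, not a Whisper function.

```python
# (secondsPerPoint, retention in seconds) copied from the whisper-info output above.
ARCHIVES = [
    (300, 1296000),
    (900, 31536000),
    (3600, 315360000),
    (86400, 3153600000),
]


def archive_for_age(age_seconds, archives=ARCHIVES):
    """Return (archive index, secondsPerPoint) for a point of the given age.

    Assumption: the first (highest-precision) archive whose retention covers
    the age receives the write. Returns None if no archive covers it.
    """
    max_retention = archives[-1][1]
    if age_seconds < 0 or age_seconds >= max_retention:
        return None
    for index, (seconds_per_point, retention) in enumerate(archives):
        if retention >= age_seconds:
            return index, seconds_per_point
    return None


DAY = 86400
for days in (1, 10, 30, 60):
    print(f"{days:>2} days old ->", archive_for_age(days * DAY))
```

Under this model, points older than 15 days go to the 15-minute archive rather than being dropped, so these retentions alone would not be expected to cut the import off at 24 hours.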

Opscenter won't use a separate cassandra cluster

We're using CloudFormation to automate the setup and teardown of several Cassandra clusters we use for load testing. During a load test, we use OpsCenter to monitor our throughput. What I've found is that storing the OpsCenter data in our test's target cluster skews our nodes' data ownership information. As a result, I'd like to move OpsCenter and the agent data to its own node. I have a single c3.4xl set up with a single Cassandra instance and OpsCenter. I have the following configuration files.
opscenter server
seed_hosts =,,,,,,,,
seed_hosts =
api_port = 9160
cat /var/lib/datastax-agent/conf/address.yaml
However in the agents I see this in the logs in /var/log/datastax-agent/agent.log.
INFO [thrift-init] 2014-11-03 14:33:41,069 Connected to Cassandra cluster: usergrid
INFO [thrift-init] 2014-11-03 14:33:41,071 in execute with client org.apache.cassandra.thrift.Cassandra$Client#6deebf54
INFO [thrift-init] 2014-11-03 14:33:41,072 Using partitioner: org.apache.cassandra.dht.Murmur3Partitioner
INFO [pdp-loader] 2014-11-03 14:33:41,072 Attempting to load stored metric values.
ERROR [pdp-loader] 2014-11-03 14:33:41,092 There was an error when attempting to load stored rollups.
me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:Keyspace 'OpsCenter' does not exist)
at me.prettyprint.cassandra.connection.client.HThriftClient.getCassandra(
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(
at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(
at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(
at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(
at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(
at clj_hector.core$execute_query.doInvoke(core.clj:201)
at clojure.lang.RestFn.invoke(
at clj_hector.core$get_column_range.doInvoke(core.clj:298)
at clojure.lang.RestFn.invoke(
at opsagent.cassandra$scan_pdps$fn__1051.invoke(cassandra.clj:182)
at opsagent.cassandra$scan_pdps.invoke(cassandra.clj:181)
at opsagent.cassandra$process_pdp_row$fn__1060.invoke(cassandra.clj:199)
at opsagent.cassandra$process_pdp_row.invoke(cassandra.clj:197)
at opsagent.cassandra$process_pdp_row.invoke(cassandra.clj:195)
at opsagent.cassandra$load_pdps_with_retry$fn__1066.invoke(cassandra.clj:213)
at opsagent.cassandra$load_pdps_with_retry.invoke(cassandra.clj:210)
at opsagent.cassandra$setup_cassandra$f__388__auto____1094$fn__1095$f__388__auto____1102.invoke(cassandra.clj:357)
Caused by: InvalidRequestException(why:Keyspace 'OpsCenter' does not exist)
at org.apache.cassandra.thrift.Cassandra$
at org.apache.thrift.TServiceClient.receiveBase(
at org.apache.cassandra.thrift.Cassandra$Client.recv_set_keyspace(
at org.apache.cassandra.thrift.Cassandra$Client.set_keyspace(
at me.prettyprint.cassandra.connection.client.HThriftClient.getCassandra(
... 22 more
Generally this would indicate that the client cannot connect to the storage Cassandra node. However, from the agent node, I can execute the following command.
cassandra-cli -h
From there I can describe the keyspace, which works.
[default#unknown] describe OpsCenter;
WARNING: CQL3 tables are intentionally omitted from 'describe' output.
See for details.
Keyspace: OpsCenter:
Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Durable Writes: true
Options: [us-east:1]
Column Families:
ColumnFamily: bestpractice_results
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.IntegerType)
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
ColumnFamily: events
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Column Metadata:
Column Name: success
Validation Class: org.apache.cassandra.db.marshal.BooleanType
Column Name: action
Validation Class: org.apache.cassandra.db.marshal.LongType
Column Name: level
Validation Class: org.apache.cassandra.db.marshal.LongType
Column Name: time
Validation Class: org.apache.cassandra.db.marshal.LongType
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
ColumnFamily: events_timeline
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.LongType
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
ColumnFamily: pdps
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
ColumnFamily: rollups300
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
ColumnFamily: rollups60
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
ColumnFamily: rollups7200
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
ColumnFamily: rollups86400
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
ColumnFamily: settings
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.BytesType
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.BytesType
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
This signals to me that the target Cassandra node is up and running, and has the keyspace + column families. It also indicates I don't have any sort of network firewall issues between the agent -> cassandra. I'm at a loss to explain why I'm receiving this error message. Am I still missing something in my configuration, or is this a bug?
Cassandra: 1.2.19
Opscenter: 5.0.1
DS Agent: 5.0.1
Any help would be greatly appreciated!
Here is the agent log. Note that my IPs have changed since this is a new environment. It appears that it's trying to connect to, which is NOT the EC2 IP that's set in my settings of Not sure where that's coming from, but it's not what is set on the OpsCenter server.
Sorry for the gist, but I've exceeded the character limit.
If you look through the logs of the OpsCenter instance, you should see this:
exceptions.Exception: Storing data in a separate cluster is only supported when managing DSE clusters.
Though in the OpsCenter docs it has this:
[storage_cassandra] seed_hosts
Used when using a different cluster for OpsCenter storage. A Cassandra
seed node is used to determine the ring topology and obtain gossip
information about the nodes in the cluster. This should be the same
comma-delimited list of seed nodes as the one configured for your
Cassandra or DataStax Enterprise cluster by the seeds property in the
cassandra.yaml configuration file.
So you would think it's possible but apparently it's not.
