I'm new to Icinga and Nagios, and I'd appreciate any thoughts.
Can I configure Icinga to write to or append to an RRD file? I need to capture all the historical data that Icinga collects over time. Or is there any other way for me to get all the statistics data?
Yes. There are already tools that collect the performance data from Icinga/Nagios, save it in RRD format, and plot graphs from that data.
Examples: pnp4nagios and nagiosgraph
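If you just want the raw performance data handed off for RRD storage, the usual first step is enabling perfdata processing in the core config. A minimal sketch, assuming a classic nagios.cfg/icinga.cfg (1.x-style) setup; the command definitions and paths depend on your install and on which pnp4nagios mode you use:

    # nagios.cfg / icinga.cfg: hand perfdata to a processing command
    process_performance_data=1
    # these commands must be defined in your object config; pnp4nagios
    # ships example definitions that feed its RRD storage
    service_perfdata_command=process-service-perfdata
    host_perfdata_command=process-host-perfdata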
I have explored the cassandra-stress tool a bit using a YAML profile, and it is working fine. I just wanted to know: is there any way to specify the location of an external CSV file in the YAML profile, so that cassandra-stress inserts that data into the Cassandra table?
In other words, instead of random data, I want to see cassandra-stress test results for a specific data load on this data model.
Standard cassandra-stress doesn't have such functionality, but you can use the NoSQLBench tool that was recently open-sourced by DataStax. It also uses YAML to describe workloads, but it's much more flexible, and it has a number of functions for sampling data from CSV files.
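For context, a standard profile-based run looks like the line below (profile path, op name, and counts are placeholders); the YAML profile controls the schema and generated value distributions, but not external CSV input:

    # standard cassandra-stress with a YAML profile -- generated data only
    cassandra-stress user profile=./myprofile.yaml "ops(insert=1)" n=1000000 -rate threads=50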
P.S. There is also a separate Slack workspace for this project (to get an invite, fill out this form).
In a project that I am working on, we are planning to migrate from Cassandra to another technology.
The problem is how to get all of the data out of Cassandra (we are talking about 4M-8M records).
I thought I should try exporting the data to a CSV file and then importing it into the other database.
To export data to CSV you can start with the COPY TO command. If that does not work, the simple Java program described in this can help you with a bigger set of data.
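For example (keyspace, table, and file name are placeholders):

    -- in cqlsh: export a whole table to CSV with a header row
    COPY mykeyspace.mytable TO 'export.csv' WITH HEADER = TRUE;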
But more importantly, you should understand the data model of the other technology before importing data into it; you may need to change your data model.
You can also look at other tools like https://github.com/brianmhess/cassandra-loader. I have imported/exported data on the order of hundreds of millions of records using this application.
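Usage is roughly like this (host, keyspace/table, and columns are placeholders; check the project README for the exact flags supported by your version):

    # bulk-load a CSV with cassandra-loader
    cassandra-loader -f data.csv -host 127.0.0.1 -schema "mykeyspace.mytable(id, name, value)"
    # the companion cassandra-unloader exports a table back to CSV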
A very simple question...
I have downloaded a very large .csv file (around 3.7 GB) and now wish to open it, but Excel can't seem to manage this.
How do I open this file?
Clearly I am missing a trick!
Please help!
There are a number of other Stack Overflow questions addressing this problem, such as:
Excel CSV. file with more than 1,048,576 rows of data
The bottom line is that you're getting into database territory with files of that size. The best solution I've found is BigQuery on Google Cloud Platform. It's super cheap, astonishingly fast, and it automatically detects schemas on most CSVs. The downside is that you'll have to learn SQL to do even the simplest things with the data.
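As a rough sketch of how little it takes (dataset and table names are placeholders; this assumes the bq CLI is installed and a GCP project is configured):

    # create a dataset, then load the CSV with schema auto-detection
    bq mk mydataset
    bq load --autodetect --source_format=CSV mydataset.mytable ./large.csv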
Can you not tell Excel to open only the first 10 lines of the file?
That would let you inspect the format and then use some database functions on the contents.
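If Excel can't be coaxed into that, a few lines of Python with pandas will preview the file without loading all 3.7 GB into memory (the file name is a placeholder):

    import pandas as pd

    # read only the first 10 rows to inspect columns and formats
    preview = pd.read_csv("large.csv", nrows=10)
    print(preview)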
Another thing that affects whether you can open a large file like this is your computer's resources and capacity. That's a huge file, and you need a lot of on-disk swap space (page file, in Windows terms) plus memory to open it. So one option is to find another computer with more memory and resources, or to increase your swap space; on Windows, just search for how to increase your page file.
This is a common problem. The typical solutions are:
Insert your .csv file into a SQL database such as MySQL or PostgreSQL.
Process your data using Python or R (see the sketch below).
Find a data hub for your data, for example Acho Studio.
The problem with the first solution is that you'll have to design a table schema and find a server to host the database, and you'll also need to write server-side code to maintain or change the database. The problem with Python or R is that naively running processes on GBs of data will put a lot of stress on your local computer. A data hub is much easier, but its costs may vary.
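To keep the Python route from exhausting memory, you can stream the file in chunks instead of loading it all at once. A minimal pandas sketch (file name and chunk size are placeholders):

    import pandas as pd

    total_rows = 0
    # process the CSV 100,000 rows at a time so memory use stays bounded
    for chunk in pd.read_csv("large.csv", chunksize=100_000):
        total_rows += len(chunk)   # replace with your real per-chunk processing
    print(total_rows)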
I have a huge 20 GB CSV file to copy into Cassandra, and of course I need to handle the error cases (if the server or the transfer/load application crashes).
I need to be able to restart the processing (on the same node or another) and continue the transfer without starting the CSV file from the beginning.
What is the best and easiest way to do that?
Using the cqlsh COPY command? Using Flume or Sqoop? Or a native Java application, or Spark...?
Thanks a lot
If it were me, I would split the file.
First, I would pick a preferred way to load any CSV data, ignoring the issues of huge file size and error handling. For example, I would use a Python script with the native driver and test that it can insert from a tiny CSV file with real data.
Then I would write a script to split the file into manageable chunks, however you define them. I would try a few chunk sizes to find a size that loads in about a minute. Maybe you will need hundreds of chunks for 20 GB, but probably not thousands.
Then I would split the whole file into chunks of that size and loop over the chunks, logging how it is going. On an error of any kind, fix the problem and just start loading again from the last chunk that loaded successfully, as found in the log file (see the sketch below).
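A minimal sketch of that split-and-resume loop in Python. All names here are placeholders, and load_chunk is a stub you would replace with your actual driver calls:

    import csv
    import os

    CHUNK_ROWS = 100_000                 # tune so one chunk loads in about a minute
    PROGRESS_LOG = "loaded_chunks.log"   # records which chunks finished

    def load_chunk(rows):
        """Stub: insert these rows with your preferred loader
        (e.g. the Python driver, or by shelling out to cqlsh COPY)."""
        ...

    def already_loaded():
        # read back the chunk numbers that completed on a previous run
        if not os.path.exists(PROGRESS_LOG):
            return set()
        with open(PROGRESS_LOG) as f:
            return {int(line) for line in f if line.strip()}

    def main(path="big.csv"):
        done = already_loaded()
        chunk, chunk_no = [], 0
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader)                       # skip the header row
            for row in reader:
                chunk.append(row)
                if len(chunk) == CHUNK_ROWS:
                    if chunk_no not in done:   # resume: skip finished chunks
                        load_chunk(chunk)
                        with open(PROGRESS_LOG, "a") as log:
                            log.write(f"{chunk_no}\n")
                    chunk, chunk_no = [], chunk_no + 1
            if chunk and chunk_no not in done:  # trailing partial chunk
                load_chunk(chunk)
                with open(PROGRESS_LOG, "a") as log:
                    log.write(f"{chunk_no}\n")

    if __name__ == "__main__":
        main()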
Here are two considerations that I would try first, since they are simple and well contained:
cqlsh COPY has been vastly improved in 2.1.13, 2.2.5, 3.0.3 and 3.2+. If you do consider using it, make sure you are on one of those versions or newer.
Another option is Brian Hess's cassandra-loader, which is an effective way of bulk loading to and from CSV files.
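With one of those newer versions, the import side looks roughly like this (keyspace, table, and file names are placeholders; run HELP COPY in your cqlsh to confirm which options your version supports):

    -- in cqlsh: import with a header row, writing failed rows to a side file
    COPY mykeyspace.mytable FROM 'big.csv'
      WITH HEADER = TRUE AND ERRFILE = 'import_errors.csv';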
I think cqlsh COPY doesn't handle the case of an application crash on its own, so why not combine the two solutions above: split the file into several manageable chunks, and use the cqlsh COPY command to import each one?
I would like to modify my RRD file. In particular, I would like to modify the data of one or more data sources at a precise timestamp.
I tried to do this with the rrdtool update command, but without success.
Can you help me?
RRD is an insert-only database: it has no built-in capability for modifying data after it has been inserted. What you can do is use rrdtool dump to convert the RRD file to XML format, modify the data there, and then use rrdtool restore to recreate the RRD file.
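For example (file names are placeholders):

    # dump the binary RRD to editable XML
    rrdtool dump data.rrd > data.xml
    # ... edit the <row> values at the timestamps you care about ...
    # rebuild a new RRD from the edited XML
    rrdtool restore data.xml fixed.rrd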
RRDTool databases do not support updates. This is because they progressively summarise the data into the RRAs, and so, as time moves on, they no longer have the original raw data, only the summarised data.
In addition, and more importantly, RRD data are subject to normalisation. This converts the original time/value pair to an adjusted value on the precise time-interval boundary. In other words, if your interval is 5 min, then a sample submitted at 12:01 will be apportioned proportionately into the 12:00 and 12:05 intervals. So you cannot store an exact time.
If you are looking to store a series of events with precise times and values, with the original data available and updateable indefinitely, then RRDTool is not an appropriate choice of database. Look at MySQL, PostgreSQL, or another RDBMS.