graylog2 not showing any data - graylog2

I'm new to Graylog2. I'm using it to analyze logs stored in Elasticsearch.
I completed the setup successfully by following this guide: http://www.richardyau.com/?p=377
However, I indexed my logs into Elasticsearch under an index named "xg-*", and I'm not sure why that index hasn't shown up in Graylog2.
When I check the indices status in the Graylog2 web interface, it shows only the "graylog2_0" index, not mine.
Can someone please explain what the reason behind this is?
Elasticsearch indices details:
[root@xg bin]# curl http://localhost:9200/_cat/indices?pretty
green open graylog2_0 4 0 0 0 576b 576b
yellow open xg-2015.12.12 5 1 56 0 335.4kb 335.4kb
[root@xg bin]#
Graylog2 Web indices details:

Graylog doesn't support other indexing schemes than its own. If you want to use Graylog to analyze your data, you also have to ingest it through Graylog.
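For illustration, here is a minimal sketch of what "ingesting it through Graylog" can look like, assuming you have created a GELF UDP input listening on port 12201 (the input type and port are assumptions, not part of the original setup):

# Hypothetical example: send one GELF-formatted message to a Graylog
# GELF UDP input on localhost:12201 (input type and port are assumed).
import json
import socket

message = {
    "version": "1.1",
    "host": "xg",
    "short_message": "test log line ingested through Graylog",
    "level": 6,
}

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(json.dumps(message).encode("utf-8"), ("localhost", 12201))
sock.close()

Messages ingested this way land in Graylog's own index set (the graylog2_0 index from your listing), where they become searchable from the web interface.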

Related

PutCassandraQL 'Cassandra Contact point' format / error in Nifi

I am using NiFi 1.1.1 and am trying to put data using a PutCassandraQL processor, but I am getting
com.datastax.driver.core.exceptions.TransportException: [null] Cannot connect
while trying this value for the contact points: node-0:servername.com:PORT, node-2:servername.com:PORT, node-3:servername.com:PORT
Edit: I also tried
node-0.servername.com:9042,node-2.servername.com:9042,node-3.servername.com:9042
as given in the documentation. Can someone tell me the cause of this error, with an example of the proper way to specify the 'Cassandra Contact Points' in NiFi?
The Cassandra Contact Points property expects a hostname or IP followed by a colon and then the port number where the Cassandra node is listening. So if you have 3 nodes at:
node-0.servername.com:9042
node-2.servername.com:9042
node-3.servername.com:9042
Your Contact Points setting would be:
node-0.servername.com:9042,node-2.servername.com:9042,node-3.servername.com:9042
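As a side check outside of NiFi, you can verify that those contact points are reachable with the DataStax Python driver; this is only a sketch, the hostnames are the ones from the question, and it assumes the cassandra-driver package is installed:

# Connectivity sanity check with the DataStax Python driver
# (pip install cassandra-driver). Note that, unlike the NiFi property,
# the driver takes the hostnames and the port as separate arguments.
from cassandra.cluster import Cluster

cluster = Cluster(
    contact_points=[
        "node-0.servername.com",
        "node-2.servername.com",
        "node-3.servername.com",
    ],
    port=9042,
)
session = cluster.connect()
print(session.execute("SELECT release_version FROM system.local").one())
cluster.shutdown()

If this connects, the hosts and port are fine and the problem is in the NiFi property value; if it fails the same way, the nodes are not reachable from that machine at all.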

How to delete Elasticsearch indices periodically?

I create indices on a daily basis to store search history, and I use those indices for suggestions in my application, which lets me suggest based on history as well.
Now I only need to keep the last 10 days of history. Is there any feature in Elasticsearch that allows me to create and delete indices periodically?
I don't know if Elasticsearch has a built-in feature like that, but you can achieve what you want with Curator and a cron job.
An example curator command is:
Curator 3.x syntax [deprecated]:
curator --host <IP> delete indices --older-than 10 --prefix "index-prefix-" --time-unit days --timestring '%Y-%m-%d'
Curator 5.1.1 syntax:
curator_cli --host <IP> --port <PORT> delete_indices --filter_list '[{"filtertype":"age","source":"creation_date","direction":"older","unit":"days","unit_count":10},{"filtertype":"pattern","kind":"prefix","value":"index-prefix-"}]'
Run this command daily with a cron job to delete indices older than 10 days whose names start with index-prefix- and that live on the Elasticsearch instance at <IP>:<PORT>.
For more curator command-line options, see: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/singleton-cli.html
For more on cron usage, see:
http://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/
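If Curator is not an option, the same idea can be sketched with the official Elasticsearch Python client; this is only an illustration (not the Curator approach above) and assumes the client is installed and that the daily indices follow the index-prefix-%Y-%m-%d naming used in the commands above:

# Delete indices older than 10 days, assuming the naming scheme
# "index-prefix-YYYY-MM-DD" used in the curator examples above.
from datetime import datetime, timedelta

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
cutoff = datetime.utcnow() - timedelta(days=10)

for name in es.indices.get(index="index-prefix-*"):
    try:
        day = datetime.strptime(name, "index-prefix-%Y-%m-%d")
    except ValueError:
        continue  # skip indices that don't match the naming scheme
    if day < cutoff:
        es.indices.delete(index=name)

Scheduled daily (again via cron), this does the same job, but Curator is the more battle-tested option.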
The only thing I can think of is using date math:
https://www.elastic.co/guide/en/elasticsearch/reference/current/date-math-index-names.html
In Sense you can do this:
DELETE <logs-{now%2Fd-10d}>
This does not work nicely in curl, though, due to URL encoding. You can do something like this with curl:
curl -XDELETE 'localhost:9200/<logs-%7Bnow%2Fd-10d%7D>'
Both examples remove the index that is exactly 10 days old. They don't help you delete everything older than 10 days; I don't think that is possible this way, and there is no trigger mechanism or anything like that in Elasticsearch.
So I would stick to a cron job in combination with Curator, but you do have this option as well.
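For reference, the percent-encoding used in the curl form can be generated like this (just an illustration of the escaping, nothing Elasticsearch-specific):

# Show how a date-math index name must be percent-encoded for use in a URL.
from urllib.parse import quote

index_expr = "<logs-{now/d-10d}>"
print(quote(index_expr, safe=""))  # -> %3Clogs-%7Bnow%2Fd-10d%7D%3E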

Processing large JSONs using Logstash - not working and not printing any error

I started using Logstash (on Windows); my main use case is passing Logstash a large JSON document (10 MB), filtering it somehow, and writing it out to Elasticsearch.
For now, I don't really care about the JSON filtering (I'll care once I get this working). I want the file to pass through Logstash and reach my Elasticsearch.
The client that feeds Logstash uses a TCP connection.
My simple Logstash configuration file looks like this:
input {
  tcp {
    port => 7788
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    codec => "json"
  }
  stdout {
    codec => rubydebug
  }
}
This does work for me on small json inputs like:
{"foo":"bar", "bar": "foo"}
I can see Logstash working and passing the data to Elasticsearch, and everything is OK.
Also, when using the default codec ("text") it worked, but not as expected.
My problem starts when the inputs are large jsons.
Assuming I have a 10 MB JSON document - what do I need to do with it so Logstash will be able to handle it over TCP as JSON? Should the file be indented or not? What encoding should I use before I convert it into bytes? What codec and settings should my Logstash have?
BTW, when I use curl and push the large JSON directly to Elasticsearch, it works, so there is no problem with the JSON itself.
Is there any way I can get some better tracing or at least know why I fail?
I found out that the problem wasn't the length but the lack of a trailing newline - so all I needed to do was add a newline to my log files.
BTW, there is no 4K length limit - At least not when working with TCP.
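To make the fix concrete, here is a minimal sketch (assuming the TCP input from the configuration above is listening on localhost:7788) that sends a JSON document followed by a trailing newline:

# Send a JSON document to the Logstash TCP input with a trailing newline,
# so the json codec knows where the event ends. Assumes localhost:7788.
import json
import socket

payload = {"foo": "bar", "bar": "foo"}  # stand-in for the large document

with socket.create_connection(("localhost", 7788)) as sock:
    sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")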

Excel in SSIS: How to import a column that may have more than 255 characters when DT_NTEXT causes failures?

OK, so my latest project requires loading an Excel 2007 spreadsheet into a SQL Server table. I'm working in SSIS 2008R2. Based on some stuff I found on the internet, I opened the Excel source in Advanced editor and changed the datatype of the long column to DT_NTEXT, so that it wouldn't truncate it. Then I made the database column VARCHAR(MAX). This runs correctly in debug mode on my laptop.
Then I deployed it to the development server and attempted to load the same test file. It failed with the following error messages:
Error: Code: 0xC0208265
Source: Main Data Flow Task Get Main Data [1]
Description: Failed to retrieve long data for column "DESCR".
End Error
Error: Code: 0xC020901C
Source: Main Data Flow Task Get Main Data [1]
Description: There was an error with output column "DESCR" (72) on output "Excel Source Output" (9). The column status returned was: "DBSTATUS_UNAVAILABLE".
End Error
Error: Code: 0xC0209029
Source: Main Data Flow Task Get Main Data [1]
Description: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "output column "DESCR" (72)" failed because error code 0xC0209071 occurred, and the error row disposition on "output column "DESCR" (72)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
End Error
Searching for information about the error, I found about a million sites offering the same three suggested solutions:
Add 'IMEX=1' to the extended properties of the connection string.
It was already there.
Change the TypeGuessRows key in the registry.
This was set to zero on the server, which I understand to mean that it should look at the entire file. Nevertheless, I changed it to 8 to match my laptop. The same error occurred when I ran it again. Then I changed it to 1,763, which is more than the number of rows in the spreadsheet. It still gave the same error. So, I put it back to zero. (There's a 1,900-character value in the first row of my test file, so it shouldn't really matter how many it checks, in this case.)
Change the datatype to DT_WSTR(4000) in the source.
The column is supposed to have up to 10,000 characters, so I'm not sure this would be a good idea even if it worked. However, I tried it anyway. This time it gave me a truncation error. I changed the truncation error disposition to "ignore failure" and it loaded the data, but truncated the value to 255 characters. I have verified that the length is 4000 and doesn't get changed when I save the file, but it's still truncating at 255 characters.
I have no idea what else to look at. Any help would be appreciated.
UPDATE 1/29: The package, without any changes, works correctly when running on the pre-production server. It still fails when running on the development server. Both servers have the same version of SSIS (including minor version numbers) as well as the same versions of Windows, Access and Excel. I do not know how to explain this, nor do I know how to tell if it would work in production.
I created a new package with similar non-functional requirements (Excel 2007 file, SSIS 2008, SQL Server 2008 R2, VARCHAR(MAX) target column) and it worked just fine after deployment into the database server. My package:
Metadata at the Excel Source component's output (checked using Advanced Editor): DT_NTEXT
Derived Column component between source and destination to cast to non-unicode from unicode using (DT_TEXT,1252)
Metadata at the OLE DB Destination component's input (checked using Advanced Editor): DT_TEXT
Target Column data type: VARCHAR(MAX)
I do not explicitly use the extended property IMEX in the connection string.
I executed the package by right-clicking on it at the database server, and it loaded a file with a few thousand characters per record into the table without truncation. Hope this helps.
I faced this issue while importing an Excel file with a field containing more than 255 characters, and I solved it using Python.
Simply import the Excel file into a pandas DataFrame and calculate the length of the string value in each row.
Then sort the DataFrame by that length in descending order. This enables SSIS to allocate the maximum space for that field, since it scans the first 3 rows to decide on storage:
# Read the sheet, compute the length of the long text column, and sort the
# rows so the longest value comes first before writing the cleaned file.
import pandas as pd
from pandas import ExcelWriter

df = pd.read_excel(f, sheet_name=0, skiprows=1)   # f is the source .xlsx path
df = df.drop(df.columns[[0]], axis=1)             # drop the first (unused) column
df['length'] = df['Item Description'].str.len()   # length of the long text field
df.sort_values('length', ascending=False, inplace=True)
writer = ExcelWriter('Clean/Cleaned_' + f[5:])    # write the sorted copy to Clean/
df.to_excel(writer, sheet_name='Billing', index=False)
writer.save()

Using indexed types for ElasticSearch in Titan

I currently have a VM running Titan over a local Cassandra backend and would like the ability to use ElasticSearch to index strings using CONTAINS matches and regular expressions. Here's what I have so far:
After titan.sh is run, a Groovy script is used to load in the data from separate vertex and edge files. The first stage of this script loads the graph from Titan and sets up the ES properties:
config.setProperty("storage.backend","cassandra")
config.setProperty("storage.hostname","127.0.0.1")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","db/es")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")
The second part of the script sets up the indexed types:
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make();
The third part loads in the data from the CSV files, this has been tested and works fine.
My problem is, I don't seem to be able to use the ElasticSearch functions when I do a Gremlin query. For example:
g.E.has("property",CONTAINS,"test")
returns 0 results, even though I know this field contains the string "test" for that property at least once. Weirder still, when I change CONTAINS to something that isn't recognised by ElasticSearch, I get a "no such property" error. I can also perform exact string matches and any numerical comparisons, including greater than or less than; however, I suspect the default indexing method is being used instead of ElasticSearch in those cases.
Due to the lack of errors when I try to run a more advanced ES query, I am at a loss on what is causing the problem here. Is there anything I may have missed?
Thanks,
Adam
I'm not quite sure what's going wrong in your code. From your description everything looks fine. Can you try the following script (just paste it into your Gremlin REPL):
config = new BaseConfiguration()
config.setProperty("storage.backend","inmemory")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","/tmp/es-so")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")
g = TitanFactory.open(config)
g.makeKey("name").dataType(String.class).make()
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make()
g.makeLabel("knows").make()
g.commit()
alice = g.addVertex(["name":"alice"])
bob = g.addVertex(["name":"bob"])
alice.addEdge("knows", bob, ["property":"foo test bar"])
g.commit()
// test queries
g.E.has("property",CONTAINS,"test")
g.query().has("property",CONTAINS,"test").edges()
The last 2 lines should return something like e[1t-4-1w][4-knows-8]. If that works and you still can't figure out what's wrong in your code, it would be good if you could share your full code (e.g. on GitHub or in a Gist).
Cheers,
Daniel
