I am processing HTTP logs and converting querystring parameters to fields.
kv {
    source => "uriQuerystring"
    field_split => "&"
    target => "uriQuerystringKeys"
}
However, because callers use mixed-case parameter names, I end up with numerous duplicates,
e.g. uriQuerystringKeys.apiKey, uriQuerystringKeys.ApiKey, uriQuerystringKeys.APIKey.
What do I need to do in my Logstash configuration to convert all of these field names to lowercase?
I see there's an open issue requesting this feature in Logstash, but it's incomplete. It includes a suggestion to execute some Ruby code directly, but that code appears to convert all fields, not just those with a certain prefix.
Here's a prior answer that contains the basic code you would need.
You can see a conditional inside the loop, which you could use to enforce the prefix limitations on the fields.
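As a minimal sketch of that idea (not the code from the linked answer), the key-lowercasing step can be written as a plain Ruby helper. The name lowercase_keys and the sample hash are hypothetical; the commented-out filter assumes the event.get / event.set API of Logstash 5.0 and later, and restricts the change to the uriQuerystringKeys target rather than touching every field.

```ruby
# Hypothetical helper: returns a copy of `params` with every key lowercased.
# If two keys collide after lowercasing (apiKey vs ApiKey), the later one wins.
def lowercase_keys(params)
  params.each_with_object({}) do |(key, value), result|
    result[key.to_s.downcase] = value
  end
end

# Inside Logstash, the same logic could be wired up roughly like this
# (an untested sketch; assumes the kv target is a hash field):
#
#   ruby {
#     code => '
#       keys = event.get("uriQuerystringKeys")
#       if keys.is_a?(Hash)
#         event.set("uriQuerystringKeys",
#                   keys.each_with_object({}) { |(k, v), h| h[k.to_s.downcase] = v })
#       end
#     '
#   }

# Stand-alone demonstration with made-up query parameters.
sample = { "apiKey" => "abc", "PageSize" => "10" }
puts lowercase_keys(sample).inspect
```

It may also be worth checking whether your version of the kv filter documents a transform_key option (with a "lowercase" value), which would avoid the Ruby step entirely.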
I am working on a function that analyzes data (based on some domain-specific logic) in protobufs. When the function finds an issue, I want to include the path to the offending field, including the indexes for the repeated fields.
For example, given the protobuf below:
proto = ECS(
    service=[
        Service(),
        Service(
            capacity_provider_strategy=[
                ServiceCapacityProviderStrategyItem(base=1),
                ServiceCapacityProviderStrategyItem(base=2),
            ]
        )
    ]
)
Let's assume that the offending field is field = proto.service[1].capacity_provider_strategy[0].
How would I, given only the field, produce ecs.service[1].capacity_provider_strategy[0] in a general way?
Please note that I am looking for a way to produce the path mentioned above based solely on the supplied field, since the logic that produces the error message is decoupled from the analyzing logic. I realize that I could keep track of the indexes of the repeated fields in the analyzing logic, but this would put more overhead on the analyzing function.
I have a string of accepted file extensions, something like "JPG,PNG,TXT". The string can be whatever I want. I am using Reactive Extensions, so I filter files using Where(). For now I am using
Where(e => e.FullPath.Contains(filtering))
But it only works when there is only one extension. Any idea how to make it work dynamically? Where will be called only once! I am writing in C#.
This looks to be a LINQ question rather than a JavaScript (RxJS) one. However, as LINQ seems comparable to the array methods in JS, I shall attempt to answer.
First convert the string of extensions into an array. This only needs to be done once:
var filterExtensions = filtering.Split(',');
Then the condition would be:
Where(e => filterExtensions.Any(
    extension => e.FullPath.Contains(extension)
))
I am fairly new to U-SQL and am trying to run a U-SQL script in Azure Data Lake Analytics to process a Parquet file using the Parquet extractor. I am getting the error below and can't find a way around it.
Error - Change the identifier to use at least one lower case letter. If that is not possible, then escape that identifier (for example: '[ACTIVITY]'), or embed it in a CSHARP() block (e.g CSHARP(ACTIVITY)).
Unfortunately, all the fields generated in the Parquet file are capitalized and I don't want to escape these identifiers. I have tried wrapping the identifier in a CSHARP block, and that fails as well (E_CSC_USER_RESERVEDKEYWORDASIDENTIFIER: Reserved keyword CSHARP is used as an identifier.) Is there any way I could extract the Parquet file? Thanks for your help!
Code Snippet:
SET @@FeaturePreviews = "EnableParquetUdos:on";
@var1 =
    EXTRACT ACTIVITY string,
            AUTHOR_NAME string,
            AFFLIATION string
    FROM "adl://xxx.azuredatalakestore.net/Abstracts/FY2018_028"
    USING Extractors.Parquet();
@var2 =
    SELECT *
    FROM @var1
    ORDER BY ACTIVITY ASC
    FETCH 5 ROWS;
OUTPUT @var2
TO "adl://xxx.azuredatalakestore.net/Results/AbstractsResults.csv"
USING Outputters.Csv();
Based on your description, you are trying to write something like
EXTRACT ALLCAPSNAME int FROM "/data.parquet" USING Extractors.Parquet();
In U-SQL, we reserve all caps identifiers so we can add new keywords in the future without invalidating old scripts.
To work around this, you just have to quote (escape) the name, as in any other SQL dialect:
EXTRACT [ALLCAPSNAME] int FROM "/data.parquet" USING Extractors.Parquet();
Note that this does not change the name of the field. It is just the syntax for addressing the field.
Also note that in most SQL communities it is considered a best practice to always quote identifiers to avoid reserved keyword clashes.
If all fields in the Parquet file are all caps, you will have to quote them all... In a future update you will be able to say EXTRACT * FROM … for Parquet (and Orc) files, but you still will need to quote the columns when you refer to them explicitly.
I am new to Logstash and grok filters. I am trying to parse a string from an Apache access log with a grok filter in Logstash, where the username appears in the access log in the following format:
name1.name2.name3.namex.id
I want to build a new field called USERNAME that contains name1.name2.name3.namex with the id stripped off. I have it working, but the problem is that the number of names is variable. Sometimes there are 3 names (lastname.firstname.middlename) and sometimes there are 4 names (lastname.firstname.middlename.suffix, e.g. SMITH.GEORGE.ALLEN.JR).
%{WORD:lastname}.%{WORD:firstname}.%{WORD:middle}.%{WORD:id}
When there are 4 or more names, it does not parse correctly. I was hoping someone could help me out with the right grok filter. I am probably missing something pretty simple.
You could use two patterns, adding another one that matches when there are 4 fields:
%{WORD:lastname}.%{WORD:firstname}.%{WORD:middle}.%{WORD:suffix}.%{WORD:id}
But in this case, you're creating fields that it sounds like you don't even want.
How about a pattern that splits off the ID, leaving everything in front of it, perhaps:
%{DATA:name}.%{INT}
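To see how that pattern behaves with a variable number of names, here is a rough Ruby regular-expression equivalent. It is only a sketch: it assumes the id is always numeric, the sample usernames and the id 12345 are made up, and split_username is a hypothetical helper. Unlike the grok pattern above, it escapes the dot (an unescaped dot matches any character) and anchors the match to the whole string.

```ruby
# Rough equivalent of %{DATA:name}\.%{INT:id}, anchored to the whole string.
# DATA expands to a lazy .*? and INT to an optionally signed run of digits.
USERNAME_PATTERN = /\A(?<name>.*?)\.(?<id>[+-]?[0-9]+)\z/

# Hypothetical helper: splits "name1.name2...namex.id" into name and id.
# Returns nil when the string does not end in ".<digits>".
def split_username(raw)
  match = USERNAME_PATTERN.match(raw)
  return nil unless match

  { name: match[:name], id: match[:id] }
end

# Made-up sample values with three and four name parts.
["SMITH.GEORGE.ALLEN.12345", "SMITH.GEORGE.ALLEN.JR.12345"].each do |raw|
  p split_username(raw)
end
```

Because the lazy match stops only at the final dot followed by digits, the same pattern covers three names, four names, or more without extra alternatives.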
I have a parameter called Analyst Group in this format:
[Dimension].[Analyst Group].&[Nl.Workplace.Foundation]
I want to pass this parameter to another report to filter data. It's a multi-value parameter. But the other report only accepts it in this format: [KanBan].[Analyst Group].&[Nl.Workplace.Foundation]
So I'm trying to isolate the "Nl.Workplace.Foundation" part, so that I can do the following in the Go To Report parameter expression: ="[KanBan].[Analyst Group].&["& --Isolated analyst group-- &"]" to create the desired format.
So what I need is to extract the part between .&[ and ]
But I really have no idea how to isolate that part of the string.
Found a solution! If I just use Parameter.label instead of Parameter.value, it automatically does what I want!
A different solution has already been found, but I will still answer the initial question in case it helps.
"So what I need is to extract the part between .&[ and ]"
You could use a regex.
This may not be the fastest way, but it can handle most situations.
So let's assume you have a string containing:
[Dimension].[Analyst Group].&[Nl.Workplace.Foundation]
And you want to get the following string:
Nl.Workplace.Foundation
Just use the following expression:
=System.Text.RegularExpressions.Regex.Match("[Dimension].[Analyst Group].&[Nl.Workplace.Foundation]", "\.&\[(?<NWF>[^]]+)\]").Groups("NWF").Value
In the expression, replace the input string with your dynamic values, like for example:
=System.Text.RegularExpressions.Regex.Match(Fields!Dimension.Value & "." & Fields!AnalystGroup.Value, "\.&\[(?<NWF>[^]]+)\]").Groups("NWF").Value
I'm keeping the formula as simple as possible so that you can easily adapt it, for example to handle the case where the input string has no match (with the above expression, it will return #Error).
You could do this by adding an IIF() or, better, by using a custom function that you can reuse in several places and that will shorten your expression.