How to specify more than one grokCustomPatterns in Athena?

I'm trying to use Grok expressions in Athena, mostly as a tool to debug Grok expressions in AWS Glue Classifiers.
This works:
CREATE EXTERNAL TABLE example_grok (
myColumn string
)
ROW FORMAT SERDE
'com.amazonaws.glue.serde.GrokSerDe'
WITH SERDEPROPERTIES (
'input.format'='(%{WORD:header},%{WORD:file_type},%{GREEDYDATA:head_rest})|(%{DETAILS:det},%{WORD:icp_number},%{GREEDYDATA:det_rest})',
'input.grokCustomPatterns' = 'DETAILS DET'
)
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://my-secret-bucket/path/';
I would like to specify several custom patterns, but the documentation doesn't have an example, and none of the delimiters that I have tried, either inside or outside of the string, have worked.
For example, these do NOT work:
New line delimited (with no leading spaces, those are just for this post):
'input.grokCustomPatterns' =
'POSTFIX_QUEUEID [0-9A-F]{7,12}
HEADER HDR'
As a "json" array:
'input.grokCustomPatterns' = ['POSTFIX_QUEUEID [0-9A-F]{7,12}','HEADER HDR']
With multiple entries:
'input.grokCustomPatterns'='HEADER (HDR)',
'input.grokCustomPatterns'='POSTFIX_QUEUEID [0-9A-F]{7,12}',
Any assistance is appreciated.

AWS responded to the documentation improvement that I requested. A literal \n separates patterns.
To include multiple pattern entries into the input.grokCustomPatterns expression, use the newline escape character (\n) to separate them, as follows:
'input.grokCustomPatterns'='INSIDE_QS ([^\"])\nINSIDE_BRACKETS ([^\]])').
Grok Serde
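As a minimal sketch of the rule quoted above, the following Python helper builds the property value from a list of patterns. The helper name grok_custom_patterns is hypothetical, and the two patterns are the ones from the question. The separator is the two characters backslash and n, not a real line break.

```python
def grok_custom_patterns(patterns):
    """Join (name, regex) pairs with a literal backslash-n separator."""
    # "\\n" is two characters (backslash, n); a real newline is what failed in the question.
    return "\\n".join(f"{name} {regex}" for name, regex in patterns)


patterns = [
    ("POSTFIX_QUEUEID", "[0-9A-F]{7,12}"),
    ("HEADER", "HDR"),
]
value = grok_custom_patterns(patterns)

# Emit the line to paste into WITH SERDEPROPERTIES (...)
print(f"'input.grokCustomPatterns' = '{value}'")
```

This prints a single-line property whose patterns are separated by \n. Patterns that themselves contain single quotes would need extra escaping, which this sketch does not attempt.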

Related

How to maintain quotes while exploding json in spark-sql

I have a column in string format like below:
["name": "XXX","active": true,"locale": "EN","Channel":["1","2"]]
I would like to explode them like below in Spark SQL (preserving the quotes in string values).
This is the code I used:
SELECT EXPLODE(from_json(col, 'map<string, string>'))
FROM XXX;
I am not able to preserve the quotes in "XXX" and "EN" after exploding.
This is what I want:
key      value
name     "XXX"
active   true
locale   "EN"
Channel  [1,2]
The quotes are part of the JSON representation of the data and not the data itself. If there were embedded quotes in the data it would look like:
"\"SOME DATA\""
If you need quotes around string values, you can always concatenate them onto the specific columns, for example with concat: https://spark.apache.org/docs/latest/api/sql/index.html#concat
Alternatively, you can use get_json_object, which allows you to extract specific parts of a JSON object. https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.functions.get_json_object.html
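The point that the quotes belong to the JSON text rather than to the data can be seen in a minimal plain-Python sketch. The standard json module stands in for Spark here, and the input is assumed to be a JSON object written with braces rather than the bracketed form shown in the question. Parsing drops the quotes, and re-serialising each value puts them back on strings only.

```python
import json

# Assumption: the column holds a JSON object (braces), unlike the bracketed text in the post.
raw = '{"name": "XXX", "active": true, "locale": "EN", "Channel": ["1", "2"]}'

# Parsing consumes the quotes; the values become plain Python objects.
parsed = json.loads(raw)

# Re-serialising each value restores the JSON representation, quotes included.
rows = [(key, json.dumps(value, separators=(",", ":"))) for key, value in parsed.items()]

for key, value in rows:
    print(key, value)
```

Here name and locale come back with their quotes, while active stays a bare true. This only illustrates the idea; it is not a Spark SQL solution.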

Need guidance with Regular Expression in Python

I need help with one of my current tasks, in which I am trying to pick only the table names out of a query using Python.
Say the query looks like this:
Create table a.dummy_table1
as
select a.dummycolumn1,a.dummycolumn2,a.dummycolumn3 from dual
I am passing this query into Python using StringIO and then reading only the strings that start with "a" and contain "_", like below:
table_list = set(re.findall(r'\ba\.\w+', str(data)))
Here data is the dataframe into which I have parsed the query using StringIO.
In table_list I am getting the output below:
a.dummy_table1
a.dummycolumn1
a.dummycolumn2
whereas the expected output is:
a.dummy_table1
Please let me know how this can be done. I have tried the regular expression above, but it is not working properly.
Any help would be highly appreciated.
Your current regex string r"\ba\.\w+" simply matches any string which:
Begins with "a" at a word boundary (the "\ba" part)
Followed by a period (the "\." part)
Followed by 1 or more word characters, i.e. letters, digits, or underscores (the "\w+" part).
If I've understood your problem correctly, you are looking to extract from str(data) any string fragments which match this pattern instead:
Begins with "a"
Followed by a period
Followed by 1 or more word characters
Followed by an underscore
Followed by 1 or more word characters
Thus, the regular expression should have "_\w+" added to the end to match criteria 4 and 5, so that names without an underscore are rejected:
table_list = set(re.findall(r"\ba\.\w+_\w+", str(data)))
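A minimal, self-contained check of both patterns against the query from the question (the query text is held in a plain string here instead of being parsed into a dataframe, which is an assumption made to keep the sketch runnable):

```python
import re

# The query from the question, as a plain string.
query = """Create table a.dummy_table1
as
select a.dummycolumn1,a.dummycolumn2,a.dummycolumn3 from dual"""

# Original pattern: every "a.<word>" fragment, tables and columns alike.
original = set(re.findall(r"\ba\.\w+", query))

# Revised pattern: only fragments with an underscore after the first word characters.
with_underscore = set(re.findall(r"\ba\.\w+_\w+", query))

print(sorted(original))
print(sorted(with_underscore))
```

The revised pattern keys on the underscore, not on the position after "Create table", so a column named like a.some_column would still be picked up.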

Is there a line continuation character for strings in Terraform?

I am writing some long SQL queries in a Terraform file, the query would be like:
"CREATE TABLE example_${var.name} (id string, name string..........................................)"
To make the query readable, I would like to format it like the following, spread across multiple lines:
CREATE TABLE example_${var.name} (
id string,
name string,
................................
)
Is there a line continuation character that lets a long single-line string be written across multiple lines, the way a backslash \ can be used in Python for long strings?
I have tried using a heredoc, but it does not work when running the query. Thanks
It sounds like your goal is to have a long SQL query defined in Terraform, but across multiple lines so you don't need to scroll horizontally to infinity and beyond.
In my team we use a heredoc to achieve this, although you said it's not possible in your case.
Another idea my team uses when a heredoc isn't possible is to join an array of strings.
E.g.
locals {
  sql = join(",", [
    "id string",
    "name string",
    "address string",
    "renter string",
    "profession string"
  ])
}
Results in
> local.sql
id string,name string,address string,renter string,profession string
I hope I've understood your question correctly but if not please let me know.
PS: There's an open issue for multiline strings in Terraform
To write a multi-line string in Terraform, use the heredoc string syntax:
locals {
sql = <<EOT
CREATE TABLE example_${var.name} (
id string,
name string,
................................
)
EOT
}

nodejs how to replace ; with ',' to make an sql query

I have a query that looks like this:
INSERT INTO table VALUES ('47677;2019;2019;10T-1001-10010AS;A05;International;TieLineKoman-KosovoB;L_KOM-KOSB;2018;NULL;NULL;;NULL;Tieline;NULL;10XAL-KESH-----J;0;3')
that is produced by parsing a csv file.
The query is not in a valid form, I have to replace all semicolons with the string ',' (comma inside single quotes). What I want to get is:
('47677','2019','2019','10T-1001-10010AS','A05','International','TieLineKoman-KosovoB','L_KOM-KOSB','2018','NULL','NULL','','NULL','Tieline','NULL','10XAL-KESH-----J','0','3')
I have tried to do this in many different ways, but I end up with backslashes added in my string. This is what I get:
"INSERT INTO AllocatedEICDetail VALUES ('47677\\',\\'2019\\',\\'2019\\',\\'10T-1001-10010AS\\',\\'A05\\',\\'International\\',\\'TieLineKoman-KosovoB\\',\\'L_KOM-KOSB\\',\\'2018\\',\\'NULL\\',\\'NULL\\',\\'\\',\\'NULL\\',\\'Tieline\\',\\'NULL\\',\\'10XAL-KESH-----J\\',\\'0\\',\\'3')"
Any ideas how to do this properly without having the backslashes added?
Thank you!
// The string you have:
const string = '47677;2019;2019;10T-1001-10010AS;A05;International;TieLineKoman-KosovoB;L_KOM-KOSB;2018;NULL;NULL;;NULL;Tieline;NULL;10XAL-KESH-----J;0;3';
// The string you need (each ; becomes the three characters ','):
const targetString = string.replace(/;/g, "','");
You pass replace a small regex between the forward slashes, here just ';', and give it the 'g' (global) flag so that every instance is replaced. The second argument is the replacement; it is written in double quotes so that the single quotes around the comma are part of the replacement text. Wrapping the result in (' and ') then gives the VALUES tuple.

How to make all the tab indentation to align together in Vim

I saw the following code in the DataMapper documentation, where Serial, String, and so on are all aligned. Can I do the same thing in Vim?
class Post
  include DataMapper::Resource
  property :id,         Serial    # An auto-increment integer key
  property :title,      String    # A varchar type string, for short strings
  property :body,       Text      # A text block, for longer string data.
  property :created_at, DateTime  # A DateTime, for any date you might like.
end
You can try the Tabularize.vim plugin. Run this command:
:Tabularize /:\w\+,\|#/
What you need is a pattern that matches the delimiters:
:\w\+, matches the symbol plus its trailing comma (":id,", ":title,", ...)
# matches the comment sign
