Fastest way to save NLog logs to Azure Table Storage

I have set up a POC that uses NLog to save log messages to Application Insights and Azure Table Storage. I have used the AzureTableStorageNLogTarget NuGet package by Harouny, but I'm not quite sure if it's the fastest way to save to Table Storage.
I need the log messages to be saved in batches for performance.
Does Harouny do that, or should I use some fork? JDetmar says his target writes in batches, but does Harouny's also do it? If anybody knows, that would be nice.
In the meantime I'll try to benchmark it myself to find out, and I'll post my findings once I know more.

As the code is open-source, you could easily check that.
Locate the targets in both repos
Check if they override a Write method that accepts multiple logEvents (list or array), e.g. override void Write(IList<AsyncLogEventInfo> logEvents)
Results:
JDetmar: overrides Write for both single and multiple logEvents, so it writes in batches when used with an async wrapper (see code).
Harouny: overrides Write only for a single logEvent, not for multiple, so there is no batched writing (see code).
Please note that Harouny is no longer maintained (see the readme).

Related

LOGSTREAMID parameter for VSAM

I'm trying to alter a VSAM file to write logs for any update operation.
I perform the updates through a CICS transaction.
Can anyone give me an idea how I can immediately save all updates to the log stream file?
To get update log records written by CICS for VSAM file updates, you will need to configure the recovery attributes for that VSAM file. The type of file, how the file is accessed (RLS or non-RLS) and the types of log records required determine which options can be set and where to set them.
To keep it simple, if you set the recovery attributes in the ICF catalog definition for the VSAM data set with RECOVERY(ALL) and LOGSTREAMID(your_logstream_name), then before and after images will be written. Depending on what the log records are needed for, also consider using the LOGREPLICATE(YES) option instead, or as well.
Be careful turning recovery on: records (or CIs) in the file will be locked until the transaction making the updates completes. This could lead to deadlocks and rollbacks if multiple transactions make multiple updates to the file concurrently. Also, if the file is an ESDS, there are further complexities.
Make sure the general log stream or model log stream has been created so that CICS has, or can create, somewhere to write the log records.
I'd also recommend reading more on the recovery options available so that only the log records needed are written. You can find more info on CICS logging here.

Azure Cosmos DB Python SDK : Query items from change feed using checkpoints?

Newbie to Cosmos DB... please shed some light.
@Matias Quaranta - thank you for the samples.
From the official samples it seems like the Change feed can be queried either from the beginning or from a specific point in time.
options["startFromBeginning"] = True
or
options["startTime"] = time
What other options does the QueryItemsChangeFeed method support?
Does it support querying from a particular check point within a partition?
Glad the samples are useful. In theory, the concept of "checkpoints" does not exist in the Change Feed. "Checkpoints" basically means that you store the last processed batch or continuation after every execution, in case your process halts.
When the process starts again, you can take your stored continuation and use it.
This is what the Change Feed Processor Library and our Azure Cosmos DB Trigger for Azure Functions do for you internally.
To pass the continuation in Python, you can use options['continuation'], and you should be able to get it from the 'x-ms-continuation' response header.
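As a rough sketch (assuming the older pydocumentdb-style client that exposes QueryItemsChangeFeed; client, collection_link, saved_continuation, handle and save_checkpoint are placeholders, not part of the official samples), resuming from a stored continuation could look like this:

# Sketch only: resume the change feed from a previously persisted continuation.
options = {}
options["partitionKeyRangeId"] = "0"          # the change feed is read per partition key range
options["continuation"] = saved_continuation  # the "checkpoint" stored by a previous run

for doc in client.QueryItemsChangeFeed(collection_link, options):
    handle(doc)                               # process each changed document

# persist the new continuation for the next run
save_checkpoint(client.last_response_headers.get("x-ms-continuation"))

On the very first run you would set options["startFromBeginning"] = True (or options["startTime"]) instead of passing a continuation.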
Referring to the sample code ReadFeedForTime, I have tried options["startTime"], but it doesn't work; the response is the same as the list of documents starting from the beginning.

How to transfer data from Kafka to Cassandra using NiFi?

I want to collect data from Kafka using NiFi and write it to Cassandra. I created a flow like this for that purpose.
My database connection configuration is like this:
This is my configuration for my ConvertJsonToSQL processor:
I encounter the following error on my ConvertJsonToSQL processor.
ConvertJSONToSQL[id=d25a7e27-0167-1000-2d9a-2c969b33482a] ConvertJSONToSQL[id=d25a7e27-0167-1000-2d9a-2c969b33482a] failed to process session due to null; Processor Administratively Yielded for 1 sec: java.lang.NullPointerException
Note: I added the dbschema driver jar to the NiFi library.
What do you think I should do to solve this problem?
Based on the available information it is difficult to troubleshoot the error; the most likely reason for ConvertJSONToSQL to fail is invalid JSON. Just one point from the documentation:
The incoming FlowFile is expected to be "flat" JSON message, meaning that it consists of a single JSON element and each field maps to a simple type.
I cannot see what you did in the AttributesToJSON processor, but I believe Twitter will typically return a nested JSON, and you might not have flattened it enough.
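To illustrate what "flat" means here (plain Python with made-up field names, just to show the shape; inside NiFi the flattening itself would typically be done with a processor such as FlattenJson or JoltTransformJSON rather than custom code):

# Illustration only: a nested, tweet-like record vs. the flat shape ConvertJSONToSQL expects.
nested = {"id": 1, "text": "hello", "user": {"name": "alice", "followers": 10}}

def flatten(record, parent_key="", sep="_"):
    # Collapse nested dicts into a single level: {"user": {"name": ...}} -> {"user_name": ...}
    items = {}
    for key, value in record.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

flat = flatten(nested)
# {"id": 1, "text": "hello", "user_name": "alice", "user_followers": 10}
# every field now maps to a simple type, i.e. one column in the target table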
A simple, generic way to troubleshoot this is to start the processors from the top and inspect the queue before/after each processor until you see something you don't expect.
With this you should be able to pinpoint the problem exactly, and if needed you can use the information discovered in this way to create a reproducible example and ask a more detailed question.

Flink fish tagging

I wonder if anyone has any experience with fish tagging* Flink batch runs.
*Just as a fish can be tagged and have its movement tracked, stamping log events with a common tag or set of data elements allows the complete flow of a transaction or a request to be tracked. We call this Fish Tagging.
source
Specifically, I would like to make sure that a batch ID is added to each line of the log which has anything to do with that particular batch execution. This will allow me to track batches in Kibana.
I don't see how using log4j's MDC would propagate through multiple Flink nodes, and using a system property lookup to inject an ID through VM params would not allow me to run batches concurrently (would it?).
Thanks in advance for any pointers

Is it possible to retrieve the list of files when a DataFrame is written, or have Spark store it somewhere?

With a call like
df.write.csv("s3a://mybucket/mytable")
I obviously know where files/objects are written, but because of S3's eventual consistency guarantees, I can't be 100% sure that getting a listing from that location will return all (or even any) of the files that were just written. If I could get the list of files/objects Spark just wrote, then I could prepare a manifest file for a Redshift COPY command without worrying about eventual consistency. Is this possible, and if so, how?
The spark-redshift library can take care of this for you. If you want to do it yourself you can have a look at how they do it here: https://github.com/databricks/spark-redshift/blob/1092c7cd03bb751ba4e93b92cd7e04cffff10eb0/src/main/scala/com/databricks/spark/redshift/RedshiftWriter.scala#L299
EDIT: I avoid further worry about consistency by using df.coalesce(fileCount) to output a known number of file parts (for Redshift you want a multiple of the number of slices in your cluster). You can then check how many files are listed in the Spark code and also how many files were loaded in Redshift via stl_load_commits.
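A minimal PySpark sketch of that idea (the path and fileCount are placeholders; pick a count that is a multiple of your Redshift slices):

# Sketch: write a known, fixed number of part files so the output can be checked
# without relying on an eventually consistent S3 listing. Values are examples only.
fileCount = 8  # e.g. a multiple of the slices in the Redshift cluster

(df.coalesce(fileCount)       # reduce to (at most) fileCount partitions / part files
   .write
   .mode("overwrite")
   .csv("s3a://mybucket/mytable"))

You can then compare that expected count against what stl_load_commits reports after the COPY.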
It's good to be aware of the consistency risks; you can hit them in listings through delayed visibility of newly created objects and through deleted objects still being found.
AFAIK, you can't get a list of the files created, as tasks can generate whatever they want into the task output dir, which is then marshalled (via listing and copy) into the final output dir.
In the absence of a consistency layer atop S3 (S3mper, S3Guard, etc.), you can read & spin for "a bit" to allow the shards to catch up. I have no good idea of what a good value of "a bit" is.
However, if you are calling df.write.csv(), you may have been caught by listing inconsistencies within the committer used to propagate task output to the job dir, as that's done in S3A via list + copy.
