ETW - Why does an ETL file sometimes not have a valid manifest schema?

I have configured a collector set that also collects an ETW trace. The result is an ETL file.
Sometimes when I get an ETL file I do not have a valid manifest, and all I see in the events are GUIDs. I can dump an event and see its content, but I cannot see all the columns, and so on...
Sometimes it is all good: provider names are valid, as is the data. The only difference is that when it works, the event providers also include ProviderName/ManifestData.
How can I set it up to always have manifest data?


Creating custom metric descriptors continually results in HTTP 500

I think I've broken my project's custom metrics.
Earlier yesterday, I was playing around with the cloud monitoring api, and I created a metric descriptor and added some time series data to it using the latest python3 cloud monitoring library create_time_series call. Satisfied with the results, I deleted the descriptor using the library, which threw an error as I had incorrectly passed in the descriptor's name. I called it again with the correct name, and it succeeded, but now every call to create_time_series on this project fails with an HTTP 500. The included error message simply says to "Try again in a few seconds," which I have, to no avail.
I have verified that I can create time series data on other projects of mine, and it works as expected. The API Explorer available in Google's API documentation for metrics also gets an HTTP 500 back on calls to this project, but works fine on others. CURLing requests yields the same results.
My suspicion is that I erroneously deleted the custom.googleapis.com endpoint in its entirety, and that is why I am unable to create new metric descriptors/time series data. Is there a way to view the state of this endpoint, or recreate it?
It is not possible to delete the data stored in your Google Cloud project, but deleting the metric descriptor renders the data inaccessible. Also, per the data retention policy, this data is deleted when it expires.
To delete your custom metric descriptor, call the metricDescriptors.delete method. You can follow the steps in this guide.
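The question uses the Python client, but for illustration, a minimal sketch of the same call via the .NET client (Google.Cloud.Monitoring.V3); the project and metric names below are placeholders:

using Google.Cloud.Monitoring.V3;

// Resource name format: projects/{project}/metricDescriptors/{type}
var client = MetricServiceClient.Create();
client.DeleteMetricDescriptor(
    "projects/my-project/metricDescriptors/custom.googleapis.com/my_metric");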
You are calling CreateMetricDescriptor every time you call CreateTimeSeries. Some or all of these calls specify no metric labels, and these calls are therefore overwriting the metric descriptor with one that has no labels. The calls to CreateTimeSeries, on the other hand, do specify metric labels, causing the metric labels to be auto-added to the descriptor.
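A hedged sketch of the fix, again via the .NET client (metric type, label, and project are placeholders): create the descriptor once, with every label declared, and do not call CreateMetricDescriptor from the write path.

using Google.Api;
using Google.Cloud.Monitoring.V3;

var client = MetricServiceClient.Create();
var descriptor = new MetricDescriptor
{
    Type = "custom.googleapis.com/my_metric",  // placeholder metric type
    MetricKind = MetricDescriptor.Types.MetricKind.Gauge,
    ValueType = MetricDescriptor.Types.ValueType.Double,
    // Declare every label up front so later writes never "widen" the descriptor.
    Labels =
    {
        new LabelDescriptor
        {
            Key = "environment",
            ValueType = LabelDescriptor.Types.ValueType.String,
        },
    },
};
client.CreateMetricDescriptor("projects/my-project", descriptor);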
Custom metric names typically begin with custom.googleapis.com/, which differs from the built-in metrics.
When you create a custom metric, you define a string identifier that represents the metric type. This string must be unique among the custom metrics in your Google Cloud project and it must use a prefix that marks the metric as a user-defined metric. For Monitoring, the allowable prefixes are custom.googleapis.com/ and external.googleapis.com/prometheus. The prefix is followed by a name that describes what you are collecting. For details on the recommended way to name a custom metric, see Naming conventions.

Bringing incremental data in from REST APIs into SQL Azure

My needs are the following:
- Need to fetch data from a 3rd-party API into SQL Azure.
- The APIs will be queried every day for incremental data and may require pagination, as by default any API response gives only the top N records.
- The API also needs an auth token to work, which is the first call made before we start downloading data from the endpoints.
Due to the last two reasons, I've opted for a Function App triggered daily rather than Data Factory, which can also query web APIs.
Is there a better way to do this?
Also I am thinking of pushing all JSON into Blob store and then parsing data from the JSON into SQL Azure. Any recommendations?
How long does it take to call all of the pages? If it is under ten minutes, then my recommendation would be to build an Azure Function that queries the API and inserts the JSON data directly into a SQL database.
Azure Function
Azure Functions are very cost-effective. The first million executions are free. If it takes longer than ten minutes, then have a look at Durable Functions. For handling pagination, we have plenty of examples. Your exact solution will depend on the API you are calling and the language you are using. Here is an example in C# using HttpClient. Here is one for Python using Requests. For both, the pattern is similar: get the total number of pages from the API, set a variable to that value, and loop over the pages, getting and saving your data in each iteration. If the API won't provide the max number of pages, then loop until you get an error. Pro tip: make sure to specify an upper bound for those loops. Also, if your API is flaky or has intermittent failures, consider using a graceful retry pattern such as exponential backoff.
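Since the linked examples aren't shown here, a minimal sketch of that loop in C# with HttpClient; the endpoint, query parameters, and end-of-data check are assumptions you'd adapt to the real API:

using System.Net.Http;

// Hypothetical paged endpoint; top-level program sketch.
var http = new HttpClient();
http.DefaultRequestHeaders.Add("Authorization", "Bearer <token>");

const int maxPages = 1000;  // pro tip applied: always bound the loop
for (var page = 1; page <= maxPages; page++)
{
    var response = await http.GetAsync(
        $"https://api.example.com/records?page={page}&pageSize=100");
    if (!response.IsSuccessStatusCode)
        break;  // some APIs signal "no more pages" with an error status

    var json = await response.Content.ReadAsStringAsync();
    if (json == "[]")
        break;  // or inspect a hasMore/totalPages field if the API has one

    // TODO: save this page (straight to SQL, or to Blob for later parsing).
}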
Azure SQL JSON Indexed Calculated Columns
You mentioned storing your data as JSON files in a storage container. Are you sure you need that? If so, then you could create an external table link between the storage container and the database. That has the advantage of not having the data take up any space in the database. However, if the JSON will fit in the database, I would highly recommend dropping that JSON right into the SQL database and leveraging indexed calculated columns to make querying the JSON extremely quick.
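For example, a hedged sketch of that pairing (table, column, and JSON path are hypothetical): store the raw JSON and index a computed column extracted from it, so lookups never re-parse the JSON.

using Microsoft.Data.SqlClient;

var connectionString = "<your-azure-sql-connection-string>";

// One-time setup: raw JSON column plus an indexed computed column.
const string setup = @"
    CREATE TABLE dbo.ApiData (
        Id INT IDENTITY PRIMARY KEY,
        Payload NVARCHAR(MAX) NOT NULL,
        CustomerId AS CAST(JSON_VALUE(Payload, '$.customerId') AS INT)
    );
    CREATE INDEX IX_ApiData_CustomerId ON dbo.ApiData (CustomerId);";

using var conn = new SqlConnection(connectionString);
conn.Open();
new SqlCommand(setup, conn).ExecuteNonQuery();

// Per page of API results: just insert the raw JSON.
var insert = new SqlCommand(
    "INSERT INTO dbo.ApiData (Payload) VALUES (@json)", conn);
insert.Parameters.AddWithValue("@json", "{ \"customerId\": 42 }");
insert.ExecuteNonQuery();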
Using this pairing should provide incredible performance-per-penny value! Let us know what you end up using.
Maybe you can create a timed task with SQL Server Agent.
SQL Server Agent -- New Job -- Steps -- New Step:
In the Command box, put in your "Import JSON documents from Azure Blob Storage" SQL statements, for example.
Schedules -- New Schedule:
Set the execution time.
But I think an Azure Function is better for this. Azure Functions is a solution for easily running small pieces of code, or "functions," in the cloud. You can write just the code you need for the problem at hand, without worrying about a whole application or the infrastructure to run it. Functions can make development even more productive, and you can use your development language of choice, such as C#, F#, Node.js, Java, or PHP.
It is more intuitive and efficient.
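For illustration, a minimal daily timer-triggered function in C# (the function name and CRON schedule are placeholders; this one would run at 02:00 UTC):

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class DailyImport
{
    // Six-field Functions CRON expression: every day at 02:00 UTC.
    [FunctionName("DailyImport")]
    public static void Run([TimerTrigger("0 0 2 * * *")] TimerInfo timer, ILogger log)
    {
        log.LogInformation("Daily import started at {now}", DateTime.UtcNow);
        // TODO: get the auth token, page through the API, write to SQL.
    }
}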
Hope this helps.
If you could set the default top-N value in your API, then you could use a Web Activity in Azure Data Factory to call your REST API and get the response data. Then configure the response data as the input of a Copy Activity (@activity('ActivityName').output) and the SQL database as the output. Please see this thread: Use output from Web Activity call as variable.
The Web Activity supports authentication properties for your access token.
Also I am thinking of pushing all JSON into Blob store and then parsing data from the JSON into SQL Azure. Any recommendations?
Well, if you could dump the data into Blob storage, then Azure Stream Analytics is the perfect choice for you.
You could run a daily job to select or parse the JSON data with ASA SQL, then dump the data into the SQL database. Please see this official sample.
One thing to consider for scale would be to parallelize both the query and the processing. This helps if there is no ordering requirement, or if processing all records would take longer than the 10-minute function timeout. Or if you want to do some tweaking/transformation of the data in flight, or if you have different destinations for different types of data. Or if you want to be insulated from a failure - e.g., your function fails halfway through processing and you don't want to re-query the API. Or if you get the data a different way and want to start processing at a specific step in the process (rather than running from the entry point). All sorts of reasons.
I'll caveat here to say that the best degree of parallelism vs. complexity is largely up to your comfort level and requirements. The example below is somewhat of an 'extreme' example of decomposing the process into discrete steps and using a function for each one; in some cases it may make sense to combine specific steps into a single one rather than splitting them. Durable Functions can also make orchestrating this easier.
- A timer-driven function queries the API to understand the depth of pages required, or queues up additional pages to a second function that actually makes the paged API call.
- That function then queries the API and writes to a scratch area (like Blob), or drops each row into a queue to be written/processed (e.g., a storage queue, since they're cheap and fast, or a Service Bus queue if multiple parties are interested, e.g., pub/sub).
- If writing to a scratch blob, a blob-triggered function reads the blob and queues up individual writes to a queue (e.g., a storage queue, which would be cheap and fast for something like this).
- Another queue-triggered function actually handles writing the individual rows to the next system in line, SQL or whatever.
You'll get some parallelization out of that, plus the ability to start from any step in the process with a correctly-formatted message. If your processors encounter bad data, things like poison queues/dead-letter queues would help with exception cases, so instead of your entire process dying, you can manually remediate the bad data.
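To make the last step concrete, a sketch of a queue-triggered writer in C# (the queue name and connection setting name are illustrative):

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class RowWriter
{
    // One message = one row queued by the paging function. Failed
    // messages are retried and eventually moved to the poison queue
    // ("rows-to-write-poison") for manual remediation.
    [FunctionName("RowWriter")]
    public static void Run(
        [QueueTrigger("rows-to-write", Connection = "StorageConnection")] string rowJson,
        ILogger log)
    {
        log.LogInformation("Writing row: {row}", rowJson);
        // TODO: parse rowJson and insert into SQL (or the next system in line).
    }
}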

Azure Stream Analytics - no output events

I have a problem with an Azure Stream Analytics job. The job monitor shows incoming input events (from Event Hub), but there are no output events or errors. The job is really simple, just writing every input to Azure Blob storage:
SELECT * FROM input
Any suggestions what could be wrong?
Update!
It was a bug in Azure Stream Analytics, and it has already been fixed by Microsoft.
Did you try to include an INTO clause?
SELECT *
INTO [output]
FROM [input]
Since you have verified that events are coming into the system, it's likely that the job is encountering an error during processing or while writing to the output. Make sure that your input fields are in the set of supported data types, and use a CAST statement if they aren't. To home in on the root cause, you may also want to pick a field or two to project instead of using SELECT *.
You mentioned that there aren't any errors but make sure to check the following sources of troubleshooting/diagnostic information:
Top-level status of your job (Processing, Degraded, etc.). Definition for each status is here: http://azure.microsoft.com/en-us/documentation/articles/stream-analytics-developer-guide/
Use the "Test Connection" button on your inputs and outputs to verify connectivity
Check the "Diagnosis" value for your inputs and outputs and click the name of the input/output for more detail, if applicable
Look in the Operations Logs for any Warnings or Errors

Get an entry ID for log4net ADONetAppender

I am using log4net in a web app, and I log all page errors to SQL Server. I was wondering if there is any way to retrieve the entry ID it generates. I'm going off of the documentation found here:
http://logging.apache.org/log4net/release/config-examples.html
I want to use this ID as a reference number I can show to a customer so that they may contact customer support to lookup in the system and not have to go through a log file.
Apart from writing your own appender as floyddotnet suggested you could consider:
Use a GUID. You can easily generate it in your application, and it will serve most of your purposes. Drawback: it may be inconvenient for customers if they try to read it to your support staff over the phone. If you only have email support, then this is maybe not an issue.
Consider creating an incident number outside of the logging framework. A quick call to a stored procedure that returns an ID that you save in a nullable field in your log table.
A combination of the above: use a GUID, and after logging call a stored procedure that creates an incident and returns the ID (sketched below).
Writing an appender that returns the ID creates a dependency between your application and appenders that you normally do not have: Log4net was designed with a clear separation between logging and writing the log messages somewhere. The appender that you need would affect that separation.
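A hedged sketch of that third option; the logger name, stored procedure, and parameter names are hypothetical:

using System;
using System.Data;
using Microsoft.Data.SqlClient;
using log4net;

public static class IncidentLogger
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(IncidentLogger));

    public static int LogWithIncident(Exception ex, string connectionString)
    {
        var reference = Guid.NewGuid();
        Log.Error($"[{reference}] Unhandled page error", ex);

        // Create the incident outside the logging framework.
        using var conn = new SqlConnection(connectionString);
        conn.Open();
        using var cmd = new SqlCommand("dbo.CreateIncident", conn)
        {
            CommandType = CommandType.StoredProcedure
        };
        cmd.Parameters.AddWithValue("@LogGuid", reference);
        return (int)cmd.ExecuteScalar();  // the number you show to the customer
    }
}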
Since the ID is generated by the database and not by log4net, I don't believe this information is available to you.
What I've done with log4net in such situations is to include a datetime stamp in the message, down to the millisecond, and present that to the user as a reference number. You can then do a simple SQL query to get to the message in the log table.
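A minimal sketch of that approach (the logger name and table/column names are arbitrary):

using System;
using log4net;

var log = LogManager.GetLogger("WebApp");
try
{
    // ... page code ...
    throw new InvalidOperationException("demo failure");
}
catch (Exception ex)
{
    // Millisecond timestamp doubles as the customer-facing reference number.
    var reference = DateTime.UtcNow.ToString("yyyy-MM-dd HH:mm:ss.fff");
    log.Error($"[Ref {reference}] Page error", ex);
    // Support can later run:
    //   SELECT * FROM [Log] WHERE [Message] LIKE '%Ref 2024-05-01 12:34:56.789%'
}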
I'm not sure it's possible, but you could write your own appender for log4net and store this information in the log4net context.
How to write an appender for log4net:
http://www.alteridem.net/2008/01/10/writing-an-appender-for-log4net/
Context-Description:
http://logging.apache.org/log4net/release/manual/contexts.html
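A minimal sketch of such an appender (the context property name "EntryId" is arbitrary):

using System;
using log4net.Appender;
using log4net.Core;

public class IdCapturingAppender : AppenderSkeleton
{
    protected override void Append(LoggingEvent loggingEvent)
    {
        // Generate the entry ID here (or retrieve the one the database
        // produced) and expose it via the log4net context so the caller
        // can read it back after the log call returns.
        var entryId = Guid.NewGuid();
        log4net.ThreadContext.Properties["EntryId"] = entryId;

        // ... write loggingEvent together with entryId to your store ...
    }
}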

log4net - per user logging

Please help me with this query about using log4net.
I am using log4net in my web application. I am facing issues configuring log4net to log errors at the user level.
That is, if user X logs in, I'd like to create a file named X.log, and all errors for user X should be written to X.log. Similarly, if user Y logs in, the log file should be named Y.log. The most important point to note is that they could be logged in concurrently.
I tried creating log files whose names are built dynamically as soon as the user logs in. The issue is that if the users are not using the application at the same time, the log files are created with the correct names and written to as expected; but if both users have active sessions, a log file is created only for the user who logged in FIRST, and the second user's errors are recorded in the first user's log file.
Please help me with this.
There has to be a better solution than this one, but you can change the log4net configuration from code and even decide which config file to load - so you can do it in code, which is not as nice as editing an XML file.
So what you would need to do, which is highly inadvisable, is to create the log4net configuration each time you call the logger static class, and do what's needed based on the calling user.
Again... it doesn't feel right!
(and it will probably perform poorly).
Another, BETTER solution is to log everything to a database (log4net supports it) with a user column, and then produce the per-user logs from the DB...
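If you go the database route, a small sketch: stamp each event with the current user via a per-thread context property, and map %property{user} to a User column in your AdoNetAppender configuration (the property name "user" is arbitrary):

using log4net;

public static class UserLogContext
{
    // Call this when the user logs in (e.g., in your auth handler).
    public static void SetCurrentUser(string userName)
    {
        // ThreadContext is per-thread, so concurrent sessions don't
        // overwrite each other the way a global/static value would --
        // which is exactly the bug described in the question.
        ThreadContext.Properties["user"] = userName;
    }
}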
