Windows - see active ETW sessions so that I can close one of them - etw

I am working with the Event Tracing for Windows (ETW) API, and from time to time my application fails to close the ETW trace controller session after opening it.
Basically, I call ::StartTrace([out] handle...) and do not close that handle when I'm finished with it (a session is closed by calling ::StopTrace()).
I'm looking for a tool that shows me the active sessions so I can close the leftover one manually. Without it, I have to restart my PC so that the controller session is closed at shutdown.
Also, in the same ETW area (on Win 7), I understand that I should be able to see the data layouts for public MOF descriptions using wbemtest.exe. There I am supposed to enter
- Connect -> Namespace = \\root\wmi\EventTrace
to see the MOF data, but I get "The RPC server is unavailable". I used the default values on that screen: IWBemLocator(Namespaces), How to interpret password = null, Authentication level = packet.
In the credentials area there are User and Password fields (which I tried), but there is another empty field, Authority. Is there a way to see the MOF data? I ran this elevated under Win 7.

You can use the command logman query -ets to see a list of currently running Trace Event Sessions.
For example, on Windows 10, you will see something like this:
C:\>logman query -ets

Data Collector Set                        Type     Status
-------------------------------------------------------------------------------
AppModel                                  Trace    Running
FaceRecoTel                               Trace    Running
FaceUnlock                                Trace    Running
LwtNetLog                                 Trace    Running
Microsoft Security Client WMI Providers   Trace    Running
NtfsLog                                   Trace    Running
TileStore                                 Trace    Running
WiFiSession                               Trace    Running
SCM                                       Trace    Running
UserNotPresentTraceSession                Trace    Running
CldFltLog                                 Trace    Running
SHS-05042018-095434-7-5f                  Trace    Running
WDSC-05042018-095434-7-20                 Trace    Running
Diagtrack-Listener                        Trace    Running
8696EAC4-1288-4288-A4EE-49EE431B0AD9      Trace    Running
Cloud Files Diagnostic Event Listener     Trace    Running

The command completed successfully.
If you have created your own session, for example by using Microsoft.Diagnostics.Tracing.Session.TraceEventSession, you will have given the session a unique name, and if it is running, you should see it in the list.
To stop an existing session, run this as an administrator:
logman stop <SessionName> -ets
There are also some PowerShell cmdlets that can do similar things.
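If you want to script the lookup, the following is a minimal Python sketch, not a definitive tool. It assumes the English-language table layout shown above (a dashed separator line, then one session per line ending in a Type and a Status column). The helper names parse_logman_sessions and stop_command are made up for this illustration.

```python
import subprocess
import sys


def parse_logman_sessions(output):
    """Extract session names from the text printed by `logman query -ets`.

    Assumes the English table layout: a dashed separator line, then one
    session per line whose last two columns are Type and Status.
    """
    names = []
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if not in_table:
            # The table body starts right after the dashed separator.
            in_table = stripped.startswith("-----")
            continue
        if not stripped or stripped.startswith("The command completed"):
            break
        # Session names may contain spaces, so split from the right.
        parts = stripped.rsplit(None, 2)
        if len(parts) == 3:
            names.append(parts[0])
    return names


def stop_command(session_name):
    """Build the argument list for `logman stop <SessionName> -ets`."""
    return ["logman", "stop", session_name, "-ets"]


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(
        ["logman", "query", "-ets"],
        capture_output=True, text=True, check=True,
    )
    for name in parse_logman_sessions(result.stdout):
        print(name)
    # To stop one of them (needs an elevated prompt):
    # subprocess.run(stop_command("MySession"), check=True)
```

Splitting from the right keeps names such as "Microsoft Security Client WMI Providers" intact; a localized Windows installation would need a different end-of-table check.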

The QueryAllTraces function retrieves the properties and statistics for all event tracing sessions started on the computer for which the caller has permissions to query.
May I suggest posting the second part of your question as a separate question?

The tracelog command-line utility that comes with the Windows SDK lets you do the same thing as QueryAllTraces with the tracelog -l command.

Related

Why my console application is buffering log and socket messages?

Background:
I have a Python (console) application which includes a socket server. This application receives messages from a 3rd party client (start and stop messages from certain Process A) to control a recording data task (like start and stop recording). You can think of it as receiving messages via sockets to start and stop recording data from the same Process A for about 5 minutes. The 3rd party client sends messages for nearly 2 hours and then stops, and at the end, the Python application will be producing a group of files per session.
This application runs 24/7 (unattended, on a Windows 10 desktop machine) with a logging console open. I have noticed that sometimes (I haven't identified a pattern), after it has been running for 4 or 5 days, I access the system remotely using TeamViewer and the last message shown in the console window is from 1-2 days earlier. But once I click on the console or press a key in it, I receive a full batch of messages from the sessions missed during those days; the start and stop messages are thus received "simultaneously", leading to rubbish data files.
The code:
This is the socket server part of the code. I know I'm setting a buffer of 1024 bytes, but in normal operation this buffer should not need to be full for the data to be read.
with conn:
    # display client information
    logger.info('Connected with ' + addr[0] + ':' + str(addr[1]))
    while self.enable:
        # now keep talking with the client
        data = conn.recv(1024)
        if data:
            self.data_cb(data)
        else:
            logger.debug("no data, closing connection.")
            break
Question:
What is leading to this buffering behaviour?
Could it be...
the 3rd party client?
my Python application?
Something in Windows network stuff?
Has anyone experienced something like this?
Any idea is really appreciated, as I have no clue why this is happening. Thanks.
Edit - Additional info:
The application is running on a real desktop machine (no virtual machine)
The application has been able to work continuously for almost a month (just stopped for valid external reasons, power outage, version update, etc)
The last time I accessed it through TeamViewer, I noticed that the app hadn't received messages for a day (it had been running for 4 days at that time), BUT I assumed it was for another reason and planned to go to the site and check (because something similar had happened before). I accessed it the next day, and it was the same. But on the third day, I clicked on the console to review the messages, and instantly the whole batch of messages from the previous 2 days appeared in the log.
The app has now been running for 2 weeks, and I did not access the PC through TeamViewer during the last 4 days, in case accessing it could prevent the issue from occurring.
TL;DR
The selection feature of the Command Prompt window somehow prevents the application from printing logging messages and/or reading data from the socket (both happen in the same thread).
Well, I found the cause of this buffering behaviour, but I am not sure whether it is a known thing or not (it was not known to me, so I will later post a specific question about that selection feature).
When I checked the system today, I found that the console messages were frozen at 3 days earlier, so I clicked on the console window and hit a key, and all the messages for those 3 days were shown at once. That made me suspect the selection feature of the console output.
I started the application as usual and followed these steps:
1. I selected a part of the content in the application console.
2. Using another console, I connected from a dummy client using ncat (at this point the expected "client connected" message didn't show up).
3. I sent dummy messages (they didn't show up either).
4. I ended the ncat connection (CTRL-C).
5. I clicked on the application console and hit a key.
Voila! All the logging messages (regarding connection and data) appeared, and all the messages that I had sent using ncat were received as one big message.
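The "one big message" part is also how TCP behaves in general: recv() returns whatever bytes have accumulated, not one client message per call. Below is a minimal sketch of splitting such a batch back into individual messages. It assumes the client ends each message with a newline, which is an assumption on my part and may not match the 3rd party protocol; LineFramer is a name made up for this example.

```python
class LineFramer:
    """Reassemble delimiter-terminated messages from arbitrary recv() chunks."""

    def __init__(self, delimiter=b"\n"):
        self._buffer = bytearray()
        self._delimiter = delimiter

    def feed(self, chunk):
        """Add received bytes and return the list of complete messages.

        Any trailing partial message stays buffered until more data arrives.
        """
        self._buffer.extend(chunk)
        *complete, rest = bytes(self._buffer).split(self._delimiter)
        self._buffer = bytearray(rest)
        return complete


if __name__ == "__main__":
    framer = LineFramer()
    # Two whole messages and the start of a third arriving in one chunk.
    for message in framer.feed(b"start\nstop\nsta"):
        print(message)
```

In the receive loop, each conn.recv(1024) result would be passed to feed(), and the callback would be invoked once per returned message rather than once per chunk.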
EDIT: Didn't need to create a question, it's a known "feature". There are good questions here, here and here. The last one shows how to disable this "feature".
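For reference, here is a minimal sketch of turning the selection (QuickEdit) mode off from inside the Python process itself, using the Win32 console API through ctypes. It is a sketch under assumptions rather than the exact method from the linked answers: the function names are made up here, and the constants are the documented values of ENABLE_QUICK_EDIT_MODE and ENABLE_EXTENDED_FLAGS.

```python
import ctypes
import sys

STD_INPUT_HANDLE = -10            # (DWORD)-10 in the Win32 headers
ENABLE_QUICK_EDIT_MODE = 0x0040
ENABLE_EXTENDED_FLAGS = 0x0080


def without_quick_edit(mode):
    """Return a console input mode with the QuickEdit bit cleared."""
    # ENABLE_EXTENDED_FLAGS must be set for the QuickEdit change to apply.
    return (mode & ~ENABLE_QUICK_EDIT_MODE) | ENABLE_EXTENDED_FLAGS


def disable_quick_edit():
    """Disable QuickEdit for this process's console; True on success."""
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [ctypes.c_ulong]
    kernel32.GetStdHandle.restype = ctypes.c_void_p
    kernel32.GetConsoleMode.argtypes = [ctypes.c_void_p,
                                        ctypes.POINTER(ctypes.c_ulong)]
    kernel32.SetConsoleMode.argtypes = [ctypes.c_void_p, ctypes.c_ulong]

    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)
    mode = ctypes.c_ulong()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # e.g. no console attached, or input is redirected
    return bool(kernel32.SetConsoleMode(handle, without_quick_edit(mode.value)))


if __name__ == "__main__":
    print("QuickEdit disabled:", disable_quick_edit())
```

Calling disable_quick_edit() once at startup, before the socket server starts, should keep a stray click in the window from pausing the process; it only affects the console the process is attached to.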

Python 3.7 Windows 10 Service Not Working: Error starting service: The service did not respond to the start or control request in a timely fashion

I have written a simple Python 3.7 Windows service and installed it successfully. Now I am facing this error:
"Error starting service: The service did not respond to the start or control request in a timely fashion."
Please help me to fix this error.
Thanks
One of the most common errors from Windows when starting your service is Error 1053: The service did not respond to the start or control request in a timely fashion. This can occur for multiple reasons, but there are a couple of things to check when you get it:
Make sure your service is actually stopping: Note the main method has an infinite loop. The template above will break the loop if the stop event occurs, but that will only happen if you call win32event.WaitForSingleObject somewhere within that loop, setting rc to the updated value.
Make sure your service actually starts: Same as the first one. If your service starts and does not get stuck in the infinite loop, it will exit, terminating the service.
Check that your system and user PATH contain the necessary routes: The DLL path is extremely important for your Python service, as it's how the script interfaces with Windows to operate as a service. Additionally, if the service is unable to load Python, you are also hooped. Check by typing echo %PATH% in both a regular console and a console running with Administrator privileges to ensure all of your paths have been loaded.
Give the service another restart: Changes to your PATH variable may not kick in immediately. It's a Windows thing.
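The PATH check can also be scripted. The following is a rough sketch that reports which directories are missing from a PATH string. The function name missing_path_entries is made up for this example, and which directories actually matter depends on your installation (the Python directory used below is only an assumption).

```python
import os
import sys


def missing_path_entries(path_value, required_dirs, sep=";"):
    """Return the entries of required_dirs that do not appear in path_value.

    Comparison ignores case, surrounding quotes and trailing slashes,
    which is how Windows effectively treats PATH entries.
    """
    def norm(entry):
        return entry.strip().strip('"').rstrip("\\/").lower()

    present = {norm(entry) for entry in path_value.split(sep) if entry.strip()}
    return [d for d in required_dirs if norm(d) not in present]


if __name__ == "__main__":
    # Assumption: the interpreter's own directory should be on PATH.
    required = [os.path.dirname(sys.executable)]
    missing = missing_path_entries(os.environ.get("PATH", ""), required,
                                   sep=os.pathsep)
    print("Missing from PATH:", missing or "nothing")
```

Running it from both a regular and an elevated console gives the same comparison as the echo %PATH% check; note that a service may see a different PATH than either console.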

Azure Service Fabric Activation Error 7148

I have a service fabric cluster which hosts numerous applications. One of the applications has a service type where the service is created, runs for a bit, and then is deleted. Everything works great, but the cluster virtually always has its state set to error because there will be a few of these in the "Unhealthy evaluations" section.
Error event: SourceId='System.Hosting', Property='CodePackageActivation:Code:EntryPoint'.
There was an error during CodePackage activation.The service host terminated with exit code:7148
I've wrapped both the program's main and RunAsync in exception handlers, but never see anything in analytics. Is there any way to look up what exit code 7148 means? Thanks.
7148 is a general error code that indicates that something failed in SF in the process of setting up or activating your service's host process. So that's the reason that you're not seeing any errors or exceptions - your code is never getting a chance to run.
Examples of things I've seen that led to 7148:
The exe was not actually a windows exe due to corruption
The service's manifest had a reference to a cert or some other pre-req like an endpoint that was incorrectly configured (like a port that was already in use or the wrong thumbprint for a cert)
Something blew up inside Windows that caused the process creation to fail, like a failure to correctly configure host networking for a container
Most of the time when I see this, I have to look at the Windows error logs to see what's really happening. The SF folks are also trying to capture more common causes of failures and report them as better health errors rather than relying on 7148.

IBM Cognos Report Studio: "The connection closed before the request is processed."

We are consuming TM1 cubes with Report Studio through Framework Manager.
Quite often when I am trying to come up with new solutions to my challenges in Report Studio, I get an error when I run the report, and then the server goes down. Then I have to restart the dispatchers (Cognos Administration -> Status -> System -> Right Click on the server -> Test Dispatchers -> Right Click on the server -> Start Dispatchers).
The error message that I get is:
The connection closed before the request is processed. If you are
using WebSphere Application Server, to reduce the frequency of this
error, increase the Persistent Timeout parameter for the Web container
transport chains in the administrative console. Increase the time in
10-15 second intervals until the error no longer or rarely occurs.
We are not using WebSphere, but Tomcat (default with the installation).
-> Increasing the connection timeout interval on WebSphere is thus not applicable
-> The timeout interval in the Tomcat config seems to be 60 seconds (60000 ms)
More importantly: the error message shows immediately (after 1 second) when I run the report
-> This indicates to me that it happens regardless of any timeout interval setting
Additional info: The error message almost always comes when I manually and dynamically attempt to build MUNs. However, sometimes (I don't know when or why) it shows the MUN that I've created and tells me that it is invalid, which is WAY better for debugging.
Any suggestions on why this is happening and how to fix it would be greatly appreciated!
Edit 1: http://www.linkedin.com/groups/Product-Cognos-BI-1011-Cognos-3917273.S.143157206
This post states (almost at the bottom) that
When the Cognos BI report ask for a field that does not exist, the TM1
Application disconnects the connection. And the Cognos BI Report will
timeout.
Is this true? If so, why am I sometimes told that my MUN is invalid, whereas other times the connection is closed and the server shuts down? Is it because in those cases even Report Studio thinks that my MUN is valid and tries to get it from the TM1 server?
And additionally: Is it possible to change this behavior for the TM1 server?
Edit 2: Or change the BI server behavior so that it does not shut down when the TM1 connection is disconnected, but rather show an error of some kind?
Thanks again!
Edit 3: Okay, so I did some checking with the TM1 top utility (http://pic.dhe.ibm.com/infocenter/ctm1/v9r5m0/index.jsp?topic=%2Fcom.ibm.swg.im.cognos.tm1_op.9.5.1.doc%2Ftm1_op_id6961UsingtheTM1TopUtility_N160F47.html).
When a normal report is run, a new thread is shown in the monitoring list. This thread then disappears when I stop the BI server dispatchers, or automatically after approximately 5 minutes of idle time without any reports being run (according to the TM1 Top log dump).
Likewise, when the error occurs, a new thread is shown in the list. However, it disappears after a short second (probably because the BI server dispatchers are shut down).
I have therefore concluded that it is safe to assume (?) that the request reaches the TM1 server, and that TM1 returns something back (or simply closes the connection, as suggested in the LinkedIn post that I referenced in my first edit). And hence, that it is safe to assume that this is something that has to be fixed on the BI server side (?).
The question is therefore more likely: Is it possible to change the BI server behavior so that it does not shut down when the TM1 server returns something invalid or closes the connection, and rather show some kind of error message instead?
Thanks for any input!

How to trace IIS worker process requests

I need to be able to monitor requests from IIS w3wp processes.
How can I see IIS worker process Requests?
To trace all requests currently executing in IIS worker processes:
1. Open a command window, type logman start <SessionName> -p "IIS: Request Monitor" -ets and press ENTER.
   Event Tracing for Windows prints to the screen details about the trace session you just started, including the name of the session, the file name where the trace data will be collected (<SessionName>.etl by default), and whether or not the command was successful.
2. Allow the trace session to run until you have reproduced the problem or until your sites have processed enough requests to produce a manageable data set.
3. From the command prompt, type logman stop <SessionName> -ets and press ENTER.
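The start/stop sequence above can be wrapped in a small script. This is a minimal Python sketch, assuming the "IIS: Request Monitor" provider named in the steps is registered on the machine. The session name IisRequests and the helper function names are hypothetical.

```python
import subprocess
import sys

IIS_PROVIDER = "IIS: Request Monitor"  # provider name taken from the steps above


def start_trace_command(session_name, provider=IIS_PROVIDER):
    """Build the argument list for `logman start <name> -p <provider> -ets`."""
    return ["logman", "start", session_name, "-p", provider, "-ets"]


def stop_trace_command(session_name):
    """Build the argument list for `logman stop <name> -ets`."""
    return ["logman", "stop", session_name, "-ets"]


if __name__ == "__main__" and sys.platform == "win32":
    session = "IisRequests"  # hypothetical session name
    subprocess.run(start_trace_command(session), check=True)
    try:
        input("Reproduce the problem, then press ENTER to stop the trace...")
    finally:
        # Always stop the session so it does not stay active in the background.
        subprocess.run(stop_trace_command(session), check=True)
```

Passing the arguments as a list avoids quoting problems with the space and colon in the provider name. The script needs to run from an elevated prompt.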
I'm not as experienced on Windows as on Linux, so Ravindra's answer seems interesting (is this just scheduling a particular Event Viewer style session, or is it actually logging something deeper?).
As you particularly ask about 'IIS worker process Requests' you have two options.
GUI
Open inetmgr, go to the root server level, go to Worker Processes and double-click the worker process of your choice. A new screen will load and you will see anything that worker is currently processing.
Command-line
Rather than just giving you a single command to copy and paste, I'd point you to this article, which is a great starter: http://www.iis.net/learn/get-started/getting-started-with-iis/getting-started-with-appcmdexe
The particular command you want is under the section 'INSPECTING CURRENTLY EXECUTING REQUESTS'

Resources