I have virtual machine which has passthrough infiniband nic. I am testing inifinband functionality using hello world program. I am new in this world so may need help to understand following error
I have install openmpi on ubuntu using apt-get command
spatel#ib-1:~$ mpirun -V
mpirun (Open MPI) 4.0.3
Infiniband nic
spatel#ib-1:~$ lspci -nn | grep -i mell
00:05.0 Infiniband controller [0207]: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] [15b3:101c]
My hello world program
spatel#ib-1:~$ mpirun -np 2 ./mpi_hello_world
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: ib-1
Device name: mlx5_0
Device vendor ID: 0x02c9
Device vendor part ID: 4124
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default. The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.
Local host: ib-1
Local adapter: mlx5_0
Local port: 1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: ib-1
Local device: mlx5_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI failed an OFI Libfabric library call (fi_endpoint). This is highly
unusual; your job may behave unpredictably (and/or abort) after this.
Local host: ib-1
Location: mtl_ofi_component.c:629
Error: Unspecified error (256)
--------------------------------------------------------------------------
Hello world from processor ib-1, rank 0 out of 2 processors
Hello world from processor ib-1, rank 1 out of 2 processors
[ib-1:65704] 1 more process has sent help message help-mpi-btl-openib.txt / no device params found
[ib-1:65704] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[ib-1:65704] 1 more process has sent help message help-mpi-btl-openib.txt / ib port not selected
[ib-1:65704] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[ib-1:65704] 1 more process has sent help message help-mtl-ofi.txt / OFI call fail
It throws bunch of warning and error so not sure what i should understand, does it use ib interface to run this job?
UPDATE
After suggested by #Gilles Gouaillardet in comment i have compiled ompi with ucx and now i am seeing following output during hello_world prog
spatel#ib-1:~$ /home/spatel/ompi/bin/mpirun -np 2 ./hello_world_ucx --mca opal_common_ucx_opal_mem_hooks 1
--------------------------------------------------------------------------
PMIx was unable to find a usable compression library
on the system. We will therefore be unable to compress
large data streams. This may result in longer-than-normal
startup times and larger memory footprints. We will
continue, but strongly recommend installing zlib or
a comparable compression library for better user experience.
You can suppress this warning by adding "pcompress_base_silence_warning=1"
to your PMIx MCA default parameter file, or by adding
"PMIX_MCA_pcompress_base_silence_warning=1" to your environment.
--------------------------------------------------------------------------
Hello world from processor ib-1, rank 0 out of 2 processors
Hello world from processor ib-1, rank 1 out of 2 processors
Now to test my infiniband network i created similar another vm ib-2 with inifinband nic to see hello_world using RDMA for communication.
/home/spatel/ompi/bin/mpirun --host ib-1,ib-2 -np 2 ./hello_world_ucx --mca opal_common_ucx_opal_mem_hooks 1
Same time i run tcpdump on ibs5 interface which is my Infiniband nic but i see no activity and notice MPI messages still using traditional nic eth0 for communication. how do i make sure it use only infiniband for MPI (i don't have any IP configure on ib nic)
I'm using tpm2-tools version 3.1.3 on Raspberry Pi CM4. It turns out it doesn't have tpm2_flushcontext which was added in v4.0.
Is there any other tools in tpm2-tools 3.1.3 that can remove contexts? My goal is to free up some memory in TPM as it returned the following error:
$ tpm2_rc_decode 0x902
error layer
hex: 0x0
identifier: TSS2_TPM_RC_LAYER
description: Error produced by the TPM
format 0 warning code
hex: 0x02
name: TPM2_RC_OBJECT_MEMORY
description: out of memory for object contexts
I want to reset SRK.
It can also be seen as a Factory Reset of the TPM.
I tried tpm2_clear but it doesn't work.
Machine: VMWare Workstation
# tpm2_getcap properties-variable
TPM2_PT_PERSISTENT:
ownerAuthSet: 0
endorsementAuthSet: 0
lockoutAuthSet: 1
reserved1: 0
disableClear: 0
inLockout: 0
tpmGeneratedEPS: 1
reserved2: 0
TPM2_PT_STARTUP_CLEAR:
phEnable: 0
shEnable: 1
ehEnable: 1
phEnableNV: 1
reserved1: 0
orderly: 0
# tpm2_clear -c p
WARNING:esys:src/tss2-esys/api/Esys_Clear.c:282:Esys_Clear_Finish() Received TPM Error
ERROR:esys:src/tss2-esys/api/Esys_Clear.c:97:Esys_Clear() Esys Finish ErrorCode (0x00000185)
ERROR: Esys_Clear(0x185) - tpm:handle(1):hierarchy is not enabled or is not correct for the use
ERROR: Unable to run tpm2_clear
In VMWare, phEnable is not set even after Cold-Start.
Machine: HP EliteBook 850 G5
~# tpm2_getcap properties-variable
TPM2_PT_PERSISTENT:
ownerAuthSet: 0
endorsementAuthSet: 0
lockoutAuthSet: 1
reserved1: 0
disableClear: 0
inLockout: 0
tpmGeneratedEPS: 0
reserved2: 0
TPM2_PT_STARTUP_CLEAR:
phEnable: 1
shEnable: 0
ehEnable: 1
phEnableNV: 1
reserved1: 0
orderly: 1
# tpm2_clear -c p
WARNING:esys:src/tss2-esys/api/Esys_Clear.c:282:Esys_Clear_Finish() Received TPM Error
ERROR:esys:src/tss2-esys/api/Esys_Clear.c:97:Esys_Clear() Esys Finish ErrorCode (0x000009a2)
ERROR: Esys_Clear(0x9A2) - tpm:session(1):authorization failure without DA implications
ERROR: Unable to run tpm2_clear
# tpm2_clear -c o
ERROR: Unexpected handle - TPM2_RH_OWNER
ERROR: Unknown or unsupported handle, got: "o"
ERROR: Cannot make sense of object context "o"
ERROR: Invalid lockout authorization
ERROR: Unable to run tpm2_clear
Is there any way SRK reset?
You're on the right track, tpm2_clear clears the owner hierarchy, that is the SRK and all its child keys.
According to the command specification (sec. 24.6) there are multiple reasons why tpm2_clear could fail.
1. The platform hierarchy is disabled
This error is quite subtle because it is not mentioned explicitly in the command description for TPM2_Clear. By default, TPM2_Clear operates on the platform hierarchy. However, the platform hierarchy can be disabled (phEnable bit clear) via the command TPM2_HierarchyControl:
tpm2_hierarchycontrol -C p phEnable clear
Any future use of the platform hierarchy should result in the return code TPM2_RC_HANDLE = 0x0000010B. However, there is no TPM command to re-enable the platform hierarchy. Architecture specification (Sec 13.3):
When phEnable is CLEAR, a _TPM_Init is required to SET it.
It seems you need to reset your TPM (toggling the hardware reset signal or power off) to re-enable the platform hierarchy.
If this does not solve your problem, see the next potential issue.
2. TPM2_Clear Command is disabled
This is probably not your problem, because it would yield another error (return code TPM_RC_DISABLED = 0x0000120).
The TPM2_Clear command can be disabled (disableClear bit set). This is done via the command TPM2_ClearControl. To enable clearing, call tpm2_clearcontrol -Cp c. Like tpm2_clear, tpm2_clearcontrol requires platform authorization.
When I run the following .d script with DTrace for Linux:
#!/usr/sbin/dtrace -s
syscall::open:entry
{
#[ustack()] = count();
}
I get many errors of the following kind:
dtrace: error on enabled probe ID 2 (ID 320864: syscall:x64:open:entry): invalid address (0xfffd) in action #2
dtrace: error on enabled probe ID 2 (ID 320864: syscall:x64:open:entry): invalid address (0xfffd) in action #2
dtrace: error on enabled probe ID 2 (ID 320864: syscall:x64:open:entry): invalid address (0xfffd) in action #2
What should I do to fix them?
You should try a later dtrace release. I believe this was fixed - the stack walk code had to keep on being rewritten due to erraticness of compilers, distros and 32 vs 64 bit kernels.
When I use the web application, the application logs me out. I think it might be an IIS recycle.
EventViewer Message:
.NET Runtime version 2.0.50727.4927 - Fatal Execution Engine Error (000007FEF582FA42) (80131506)
----------
Faulting application name: w3wp.exe, version: 7.5.7600.16385, time stamp: 0x4a5bd0eb
Faulting module name: mscorwks.dll, version: 2.0.50727.4927, time stamp: 0x4a27466f
Exception code: 0xc0000005
Fault offset: 0x00000000006be81f
Faulting process id: 0x%9
Faulting application start time: 0x%10
Faulting application path: %11
Faulting module path: %12
Report Id: %13
-------------
Fault bucket , type 0
Event Name: APPCRASH
Response: Not available
Cab Id: 0
Problem signature:
P1: w3wp.exe
P2: 7.5.7600.16385
P3: 4a5bd0eb
P4: mscorwks.dll
P5: 2.0.50727.4927
P6: 4a27466f
P7: c0000005
P8: 00000000006be81f
P9:
P10:
Attached files:
These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\AppCrash_w3wp.exe_6a41af6fc5f73afd65a4b62225f4f0ff51ba820_60e9d666
Analysis symbol:
Rechecking for solution: 0
Report Id: d745615a-e67c-11df-83c0-d8d385b73c58
Report Status: 4
I analyzed the crash dump with windbg but I dont know how can I solve and what is problem:
0:056> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
Unable to load image C:\Windows\assembly\NativeImages_v2.0.50727_64\mscorlib\9a017aa8d51322f18a40f414fa35872d\mscorlib.ni.dll, Win32 error 0n2
*** WARNING: Unable to verify checksum for mscorlib.ni.dll
Unable to load image C:\Windows\assembly\NativeImages_v2.0.50727_64\System.Web.RegularE#\bf11731ff6e75c72e9939a05151e7484\System.Web.RegularExpressions.ni.dll, Win32 error 0n2
*** WARNING: Unable to verify checksum for System.Web.RegularExpressions.ni.dll
Unable to load image C:\Windows\assembly\NativeImages_v2.0.50727_64\System.Web\d753bba0990df9a19883f05d5b681d3b\System.Web.ni.dll, Win32 error 0n2
*** WARNING: Unable to verify checksum for System.Web.ni.dll
Unable to load image C:\Windows\assembly\NativeImages_v2.0.50727_64\System.Data\46a0336046744a9f29986b208b8d38d4\System.Data.ni.dll, Win32 error 0n2
*** WARNING: Unable to verify checksum for System.Data.ni.dll
Unable to load image C:\Windows\winsxs\amd64_microsoft.windows.gdiplus_6595b64144ccf1df_1.1.7600.16385_none_2b4f45e87195fcc4\GdiPlus.dll, Win32 error 0n2
*** WARNING: Unable to verify timestamp for GdiPlus.dll
Unable to load image C:\Windows\assembly\NativeImages_v2.0.50727_64\System\247913fa7ae6fcf04ea33d28d24ab611\System.ni.dll, Win32 error 0n2
*** WARNING: Unable to verify checksum for System.ni.dll
GetPageUrlData failed, server returned HTTP status 500
URL requested: http://watson.microsoft.com/StageOne/w3wp_exe/7_5_7600_16385/4a5bd0eb/mscorwks_dll/2_0_50727_4927/4a27466f/c0000005/006be81f.htm?Retriage=1
FAULTING_IP:
mscorwks!COMCryptography::_GetKeyParameter+24f
000007fe`f5dde81f 418b4514 mov eax,dword ptr [r13+14h]
EXCEPTION_RECORD: ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 000007fef5dde81f (mscorwks!COMCryptography::_GetKeyParameter+0x000000000000024f)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 0000000000000014
Attempt to read from address 0000000000000014
PROCESS_NAME: w3wp.exe
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
EXCEPTION_PARAMETER1: 0000000000000000
EXCEPTION_PARAMETER2: 0000000000000014
READ_ADDRESS: 0000000000000014
FOLLOWUP_IP:
mscorwks!COMCryptography::_GetKeyParameter+24f
000007fe`f5dde81f 418b4514 mov eax,dword ptr [r13+14h]
MOD_LIST: <ANALYSIS/>
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
MANAGED_STACK: !dumpstack -EE
No export dumpstack found
MANAGED_BITNESS_MISMATCH:
Managed code needs matching platform of sos.dll for proper analysis. Use 'x64' debugger.
ADDITIONAL_DEBUG_TEXT: Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]
LAST_CONTROL_TRANSFER: from 000007fef3a0bf50 to 000007fef5dde81f
FAULTING_THREAD: ffffffffffffffff
DEFAULT_BUCKET_ID: NOSOS
PRIMARY_PROBLEM_CLASS: NOSOS
BUGCHECK_STR: APPLICATION_FAULT_NOSOS_NULL_CLASS_PTR_DEREFERENCE_INVALID_POINTER_READ_WRONG_SYMBOLS_CALL_STACKIMMUNE
STACK_TEXT:
00000000`00000000 00000000`00000000 w3wp.exe+0x0
SYMBOL_NAME: w3wp.exe
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: w3wp
IMAGE_NAME: w3wp.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 4a5bd0eb
STACK_COMMAND: ** Pseudo Context ** ; kb
FAILURE_BUCKET_ID: NOSOS_c0000005_w3wp.exe!Unknown
BUCKET_ID: X64_APPLICATION_FAULT_NOSOS_NULL_CLASS_PTR_DEREFERENCE_INVALID_POINTER_READ_WRONG_SYMBOLS_CALL_STACKIMMUNE_w3wp.exe
Followup: MachineOwner
I solved this problem.
Solution Steps:
First I open ControlPanel> ActionCenter> Problem Reports
I saw list of problems. and my IIS Crash problem.
I entered item detail and save it is dumps.
I downloaded Windbg then open this dump with it.
and enter command !analyze -v
Windbg analized and show a text like this:
GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/StageOne/w3wp_exe/7_5_7600_16385/4a5bd0eb/mscorwks_dll/2_0_50727_4927/4a27466f/c0000005/006be81f.htm?Retriage=1
FAULTING_IP:
mscorwks!COMCryptography::_GetKeyParameter+24f
000007fe`f5dde81f 418b4514 mov eax,dword ptr [r13+14h]
EXCEPTION_RECORD: ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 000007fef5dde81f (mscorwks!COMCryptography::_GetKeyParameter+0x000000000000024f)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 0000000000000014
Attempt to read from address 0000000000000014
PROCESS_NAME: w3wp.exe
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
EXCEPTION_PARAMETER1: 0000000000000000
EXCEPTION_PARAMETER2: 0000000000000014
READ_ADDRESS: 0000000000000014
FOLLOWUP_IP:
mscorwks!COMCryptography::_GetKeyParameter+24f
000007fe`f5dde81f 418b4514 mov eax,dword ptr [r13+14h]
MOD_LIST: <ANALYSIS/>
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
MANAGED_STACK: !dumpstack -EE
No export dumpstack found
MANAGED_BITNESS_MISMATCH:
Managed code needs matching platform of sos.dll for proper analysis. Use 'x64' debugger.
ADDITIONAL_DEBUG_TEXT: Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]
LAST_CONTROL_TRANSFER: from 000007fef3a0bf50 to 000007fef5dde81f
FAULTING_THREAD: ffffffffffffffff
DEFAULT_BUCKET_ID: NOSOS
PRIMARY_PROBLEM_CLASS: NOSOS
BUGCHECK_STR: APPLICATION_FAULT_NOSOS_NULL_CLASS_PTR_DEREFERENCE_INVALID_POINTER_READ_WRONG_SYMBOLS_CALL_STACKIMMUNE
STACK_TEXT:
00000000`00000000 00000000`00000000 w3wp.exe+0x0
SYMBOL_NAME: w3wp.exe
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: w3wp
IMAGE_NAME: w3wp.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 4a5bd0eb
STACK_COMMAND: ** Pseudo Context ** ; kb
FAILURE_BUCKET_ID: NOSOS_c0000005_w3wp.exe!Unknown
BUCKET_ID: X64_APPLICATION_FAULT_NOSOS_NULL_CLASS_PTR_DEREFERENCE_INVALID_POINTER_READ_WRONG_SYMBOLS_CALL_STACKIMMUNE_w3wp.exe
WATSON_STAGEONE_URL:
Followup: MachineOwner
0:056> .exr 0xffffffffffffffff
ExceptionAddress: 000007fef5dde81f (mscorwks!COMCryptography::_GetKeyParameter+0x000000000000024f)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 0000000000000014
Attempt to read from address 0000000000000014
So I added this code to Decrypt Method: if (String.IsNullOrEmpty(value)) return String.Empty;
public static string Decrypt(string value)
{
SymmetricAlgorithm algorithm = SymmetricAlgorithm.Create();
ICryptoTransform decryptor = algorithm.CreateDecryptor(EncryptionKey, EncryptionVector);
// I control value
**if (String.IsNullOrEmpty(value))
return String.Empty;**
byte[] encryptedBytes = Convert.FromBase64String(value);
MemoryStream memoryStream = new MemoryStream(encryptedBytes);
CryptoStream cryptoStream = new CryptoStream(memoryStream, decryptor, CryptoStreamMode.Read);
...
}
problem was solved.
I know I'm late, but I just debug a similar problem with WinDbg. I finally managed to find the cause of the problem.
It's a reported bug at microsoft:
http://connect.microsoft.com/VisualStudio/feedback/details/330926/cryptostream-flushfinalblock-fatal-on-64-bit-os-if-bytearray-is-null
I just add this to the discussion as a lead for others who search the web.
Tess Ferrandez has some great tutorials and information on how to use DebugDiag and WinDbg to nail down why this is happening:
If it is broken, fix it you should
There's also a lab to walk you through analysing worker process crashes:
.NET Debugging Demos Lab 5: Crash
.NET Debugging Demos Lab 2: Crash - Review
I ran into exactly the same symptoms, and the real reason was that I accidentally created an infinite recursion, which in turn caused a stack overflow. Please note that you need to restart the app pool after correcting the error.
The ASP.NET worker process is crashing with Access Violation. This is usually a result of dereferencing a NULL or an invalid pointer. Attempting to access a null reference in C# normally generates a managed exception which ASP.NET is capable of catching, I would assume that your web app is using COM interop or is invoking unmanaged (C++) code that crashes.
Unfortunately, that's about as much as we can tell you from the info above. You will need to debug your process to understand the exact cause of the crash.