Access Violation when trying to run MIT Scheme under Windows 10

I've been trying to install MIT Scheme on a 64-bit Windows 10 installation, but whenever I try to start the program I get the following error message:
>>The system has trapped within critical section "band load".
>>The trap is an ACCESS_VIOLATION trap.
>>Successful recovery is unlikely.
I'm then presented with the option to attempt recovery, but the program crashes with another ACCESS_VIOLATION.
I have already tried installing it in different directories and drives, setting the heap size, running with and without --edit, and running it in multiple compatibility modes.
GDB reports this:
Thread 1 received signal SIGSEGV, Segmentation fault.
0x779f2a4c in win32u!NtUserMessageCall () from C:\WINDOWS\SysWOW64\win32u.dll
Is there a way to fix this problem, or should I give up and try another implementation?
Thank you for your help!

Related

Code working on Windows but launch failures on Linux

First and foremost: I am completely unable to create an MCVE, as I can only reproduce this when running the full code; any attempt to measure or replicate the error in a simpler environment makes it disappear. TL;DR: I suspect it's not a code problem but a configuration problem.
I have a piece of CUDA code that does some mathematics in its kernels. I have a Windows machine (Win10 x64, GTX 1050, CUDA 9.2) and an Ubuntu 17.04 machine (2x GTX 1080 Ti, CUDA 9.1).
My code runs fine on the Windows machine. The kernels are long-running (~700 ms per call for big samples), so I needed to increase the TDR value in Windows. The code also (for now) forces everything onto a single GPU, the first one, selected with cudaSetDevice(0).
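(For context, the device selection and the timing that motivated the TDR change look roughly like this; a minimal sketch with a placeholder kernel, not the real code.)

#include <cstdio>
#include <cuda_runtime.h>

__global__ void longKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;  // stand-in for the real math
}

int main() {
    cudaSetDevice(0);  // pin all work to the first GPU

    const int n = 1 << 20;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    // Time the call with CUDA events; runs near 700 ms per call are
    // what forced the TDR timeout increase on Windows.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    longKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.1f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}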
When I copy the same input data and code to the Linux machine (I am using git, so it is the same code), I get either
an illegal memory access was encountered
or
unspecified launch failure
in my error checking after the GPU call.
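(That error checking follows the usual pattern of testing both the launch and the subsequent synchronization; this is a minimal sketch with a placeholder kernel, not my exact code.)

#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernelStub(float *out) { out[threadIdx.x] = 1.0f; }  // placeholder

int main() {
    float *d_out = nullptr;
    cudaMalloc(&d_out, 32 * sizeof(float));

    kernelStub<<<1, 32>>>(d_out);

    // A launch-configuration error shows up immediately...
    cudaError_t err = cudaGetLastError();
    // ...while faults raised during execution (such as "an illegal
    // memory access was encountered") surface at synchronization.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return 0;
}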
If I change the kernel so that, instead of doing the math, it just writes a number to the output, it executes properly. Other CUDA code (different functions that I have) works fine too. All this leads me to think that the problem is not with the code itself, nor with the general configuration of the drivers/environment variables, but somewhere else.
I read that xorg.conf can affect the timeout of the kernels. I generated an xorg.conf (I had none) and removed the devices from it, as suggested here. I am connecting to the server remotely and have no monitor plugged in. This changed nothing in the behavior; my kernels still error out.
My question is: what else should I look at? What Linux-specific configuration should I examine to pinpoint the cause of the kernel halts?
The error did indeed turn out to be an illegal memory access.
It was caused by the fact that sizeof(unsigned long) is machine specific: my Linux machine returns 8, while my Windows machine returns 4. Because this code is called from MATLAB, and MATLAB (like some other high-level languages, such as Python) defines variable sizes in bits (e.g., uint32(1)), there was a mismatch on the Linux machine when doing memcpys. It turned out this happened in a variable that is an index, so the kernels were reading garbage (due to the bad memcpy) and then trying to access another array at that location, creating an illegal memory access.
Too specific? Yeah.
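(A minimal sketch of the mismatch and the fix, with hypothetical buffer names; the real code moves MATLAB uint32 data into a CUDA index buffer.)

#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // LP64 Linux: sizeof(unsigned long) == 8.
    // LLP64 Windows: sizeof(unsigned long) == 4, same as MATLAB's uint32.
    printf("sizeof(unsigned long) = %zu\n", sizeof(unsigned long));
    printf("sizeof(uint32_t)      = %zu\n", sizeof(uint32_t));

    // MATLAB hands over 32-bit indices, e.g. from uint32(...).
    const int n = 4;
    uint32_t h_idx[n] = {0, 1, 2, 3};

    // Copying n * sizeof(unsigned long) bytes would read past the end
    // of h_idx on Linux and fill the device buffer with garbage indices,
    // which the kernel then uses to address another array: the
    // "illegal memory access". The fix is a fixed-width type:
    uint32_t *d_idx = nullptr;
    cudaMalloc(&d_idx, n * sizeof(uint32_t));
    cudaMemcpy(d_idx, h_idx, n * sizeof(uint32_t), cudaMemcpyHostToDevice);

    cudaFree(d_idx);
    return 0;
}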

Matlab parpool fl::except on Linux - not enough inodes

Edit 2/5/17: After some work with our system department, it seems that this happens when Linux is low on inodes. My question is, therefore: why are the two related, and how could I have known that from the error message?
Problem Details:
I run MATLAB R2016b on Linux (CentOS 6.3), and for the past couple of days I have kept getting an unfamiliar error when I try to open a parallel pool. Specifically, writing
parpool(3);
yields as always
>>starting parallel pool (parpool) using the local profile ....
But then, after a short while, I get
>> Caught unexpected fl::except::IInternalException
and it crashes. (The double 'I' in Internal is intentional).
Thanks.

CUDA device seems to be blocked

I'm running a small CUDA application: the QuickSort benchmark algorithm (see here). I have a dual-GPU system with an NVIDIA GTX 660 (device 0) and an 8600 GTS (device 1).
Under Windows 8 and Visual Studio, the application compiles and runs flawlessly on device 0. Under Linux (Ubuntu 12.04 LTS), the app compiles with nvcc and gcc but suddenly stops in its tracks, returning an "unspecified launch failure".
I have two issues:
After this error, my GPU cannot perform some other operations; e.g., running the SDK example bandwidthTest blocks on the first data transfer, while deviceQuery continues to work fine. How can I reset my GPU? I've already tried cudaDeviceReset(), but it doesn't help.
How can I find out what is going wrong under Linux? Does anyone have a clue, or has anyone seen this before?
Thanks in advance for your help!
Using the nvidia-smi utility you can reset the GPU, if it is a supported device (see its --gpu-reset option).
In my knowledge and experience, "unspecified launch failure" usually refers to a segmentation fault. Have you specified the right GPU to use? Try cuda-memcheck to see if there is any out-of-bounds memory access.
In my experience, XID 31 was always caused by accessing a bad pointer (i.e., a memory access violation).
I'd pursue this trail first. Run your application under cuda-memcheck, like this: cuda-memcheck your_app args, and see if it finds any bad memory accesses.
Also try stepping through the code with cuda-gdb or Nsight Eclipse Edition.
I've found that using
cuda-memcheck -b ...
prevents the device from locking up.

Porting Unix Ada app to Linux: Seg fault before program begins

I am an intern who was offered the task of porting a test application from Solaris to Red Hat. The application is written in Ada. It works just fine on the Unix side. I compiled it on the Linux side, but now it is giving me a seg fault. I ran the debugger to see where the fault was and got this:
Warning: In non-Ada task, selecting an Ada task.
=> runtime tasking structures have not yet been initialized.
<non-Ada task> with thread id 0b7fe46c0
process received signal "Segmentation fault" [11]
task #1 stopped in _dl_allocate_tls
at 0870b71b: mov edx, [edi] ;edx := [edi]
This seg fault happens before any calls are made or anything is initialized. I have been told that 'tasks' in Ada get started before the rest of the program, and the problem could be with a task that is running.
But here is the kicker. This program just generates some code for another program to use. The OTHER program, when compiled under Linux, gives me the same kind of seg fault with the same kind of error message. This leads me to believe there might be some little tweak I can use to fix all of this, but I just don't have enough knowledge about Unix, Linux, and Ada to figure this one out all by myself.
This is a total shot in the dark, but tasks can blow up like this at startup if they try to allocate too much local memory on the stack. Your main program can safely use the system stack, but tasks have to have their stacks allocated at startup from dynamic memory, so typically your runtime has a default stack size for tasks. If your task tries to allocate a large array, it can easily blow past that limit. I've had it happen to me before.
There are multiple ways to fix this. One way is to move all your task-local data into package global areas. Another is to dynamically allocate it all.
If you can figure out how much memory would be enough, you have a couple more options. You can make the task a task type, and then use a
for My_Task_Type_Name'Storage_Size use Some_Huge_Number;
statement. You can also put a pragma Storage_Size (Some_Huge_Number); inside the task definition, but I think the "for" statement is preferred.
Lastly, with GNAT you can also change the default task stack size with the -d flag to gnatbind.
Off the top of my head: if the code was used on SPARC machines and you're now running on an x86 machine, you may be running into endianness problems.
It's not much help, but it is a common gotcha when going multi-platform.
Hunch: the linking step didn't go right. Perhaps the wrong run-time startup library got linked in?
(How likely are we to find out what the real trouble was, months after the question was asked?)

Apache w/mod_rails segmentation fault

I am running Redmine on Apache 2 with mod_rails (Passenger) 2.0.3 and Enterprise Ruby 1.8.6. Every so often I get a segfault from Apache when I try to log in. Does anyone know how I can debug this issue? I see something like this in Apache's error.log:
[Mon Jan 19 17:09:48 2009] [notice] child pid 8714 exit signal Segmentation fault (11)
The only way I can get the application to work after that is to restart the whole system (restarting only Apache doesn't help).
First steps are:
1. Find out where the core file is being left on your system (enable core dumps if necessary).
2. Run file(1) on the resulting core file. This will probably say "... generated by httpd", but it's as well to check.
3. Fire up gdb against the executable name from (2) and the core file from (1), and start digging. The command where (or bt) is a good place to start: this will give you a stack trace at the time the process dumped core.
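For example (the core file name and executable path are illustrative; yours will differ):

file core.8714                     (step 2: confirm which binary dumped core)
gdb /usr/sbin/apache2 core.8714    (step 3: load the core against the executable)
(gdb) bt                           (print the stack trace at the crash)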
It sounds like you don't have a mass of C coding experience, so good luck! Tracking down this kind of error can be a real dog. You can try posting the stack trace from (3) here, but don't hold your breath whilst waiting for an answer. At best, the failing function name might be a good string to feed to Google.
I ran into a similar issue with a segfault (11). I found the following question on ServerFault, which offered an upgrade as a solution.
I was running an older version of Ubuntu and had the segfault problem. A do-release-upgrade brought my system to Ubuntu 11.10, and the problem magically went away.
