"No Kernel!" error Azure ML compute JupyterLab - azure

When using JupyterLab on an Azure ML compute instance, I occasionally run into an issue where it reports that the network connection is lost.
I have confirmed that the compute instance is still running.
The notebook itself can still be edited and saved, so the computer/VM is definitely running.
My internet connection is also fully functional.
In the top right corner, next to the now-blank kernel indicator circle, it says "No Kernel!"

We can't repro the issue; can you give us more details? One possibility is that the kernel has a bug and hangs (possibly due to installed extensions or widgets), or that the resources on the machine are exhausted and the kernel dies. What VM type are you using? If it's a small VM, you may have run out of resources.
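If you can open a terminal on the compute instance, a quick way to check for resource exhaustion (a rough sketch using standard Linux tools, nothing Azure-specific) is:
free -h                                  # available memory; a dying kernel is often one that was OOM-killed
df -h /                                  # free space on the OS disk
top -b -n 1 | head -n 15                 # any process pinning the CPU
sudo dmesg | grep -i "killed process"    # evidence of the OOM killer, if any (may need sudo)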

Having troubleshot the connection, I found that you can force a reconnect by using Kernel > Restart Kernel (if you wait long enough, a few minutes or so, it will reconnect on its own).
Based on my own experience this seems to be a fairly common issue, but I still spent a few minutes figuring it out. Hope this helps others who are using this setup.

Check your browser console for any language-pack loading errors.
Part of our team had this issue this week. The root cause for us was that some pt-BR language packs were not loading correctly; once the affected team members changed the page/browser language to en-US, the problem was solved.

I have been dealing with the same issue. After some research into this problem, I learned that my firewall was blocking JupyterLab, Jupyter, and the terminal; allowing access to them solved the issue.

Related

Determining Website Crash Time on Linux Server

2.5 months ago, I was running a website on a Linux server to do a user study on 3 variations of a tool. All 3 variations ran on the same website. While I was conducting my user study, the website (i.e., the process hosting the website) crashed. In my sleep-deprived state, I unfortunately did not record when the crash happened. However, I now need to know a) when the crash happened, and b) for how long the website was down until I brought it back up. I only have a rough timeframe for when the crash happened and how long it was down, but I need to pinpoint this information as precisely as possible to do some time-on-task analyses with my user study data.
The server runs Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-165-generic x86_64) and has been minimally set up to run our website. As such, it is unlikely that any utilities aside from those that came with the OS have been installed. Similarly, it is unlikely that any additional setup has been done. For example, I tried looking at the history of commands used in the hope that HISTTIMEFORMAT had previously been set so that I could see timestamps. This ended up not being the case; while I can now see timestamps for commands, setting HISTTIMEFORMAT is not retroactive, meaning I can't get accurate timestamps for the commands I ran 2.5 months ago. That all being said, if you have an idea that you think might work, I'm willing to try (as long as it doesn't break our server)!
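(For completeness, turning timestamps on for future sessions only takes the following, assuming bash, but it only helps going forward:)
export HISTTIMEFORMAT="%F %T "   # add to ~/.bashrc so it survives new sessions
history | tail                   # commands run from now on get accurate date/time stamps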
It is also worth mentioning that I currently do not know if it's possible to see a remote desktop or something of the like; I've just been ssh'ing in and using the terminal to interact with the server.
I've been bouncing ideas off friends and colleagues, and we all feel that there must be SOMETHING we could use to pinpoint when the server went down (e.g., network activity logs showing spikes around the time the user study began as well as when the website was revived, a log of previous/no-longer-running processes, etc.). Unfortunately, none of us know enough about Linux logs or commands to really dig deep into this very specific issue.
In summary:
I need a timestamp for either when the website crashed or when it was revived. It would be nice to have both (or to otherwise determine how long the website was down), but this is not strictly necessary.
I'm guessing only a "native" Linux command will be useful, since nothing new/special has been installed on our server. Otherwise, any additional command/tool/utility will have to work retroactively.
It may or may not be possible to get a remote desktop working with the server (e.g., to use some tool that has a GUI you interact with to help get some information)
My colleagues and I have that sense of "there must be SOMETHING we could use" among the various logs or system information, such as network activity, process start times, etc., but none of us know enough about Linux to dig deep without some help.
Any ideas for what I can try to help figure out at least when the website crashed (if not also for how long it was down)?
A friend of mine pointed me to the journalctl command, which reads the systemd journal: a timestamped log of system and service events kept independently of shell history and HISTTIMEFORMAT. For me, its logs went as far back as October 7, and they contained enough information to determine both when my Node.js server initially went down and when I revived it.
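For anyone in the same spot, the kinds of journalctl queries that helped look like this (the dates and the unit name are placeholders; adjust to however your process was launched):
journalctl --list-boots                                             # every boot the journal knows about, with first/last entry times
journalctl --since "2019-10-07" --until "2019-10-09" -o short-iso   # everything logged in a date window
journalctl -u myapp.service -o short-iso                            # logs for one systemd unit, if the site ran as a service
journalctl -o short-iso | grep -i node                              # otherwise, grep the whole journal for the process name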

gitlab runner errors occasionally

I have GitLab set up with runners on a dedicated VM (24 GB RAM, 12 vCPUs, and a fairly low runner concurrency of 6).
Everything worked fine until I added more browser tests (11 at the moment).
These tests are in the browser-test stage and start properly.
My problem is that they sometimes succeed and sometimes fail, with seemingly random errors.
Sometimes a test cannot resolve the host, other times it is unable to find an element on the page...
If I rerun the failed tests, they always go green.
Does anyone have an idea of what is going wrong here?
BTW, I've checked: this dedicated VM is not overloaded...
I have resolved all my initial issues (not yet tested under full machine load), but I've decided to post some of my experiences.
First of all, I was experimenting with gitlab-runner concurrency (to speed things up), and it turned out that this very quickly filled my storage space. So for anybody running into storage shortages, I suggest installing this package.
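If your runners use the Docker executor (an assumption on my part; adjust to your own setup), the quickest way to see and reclaim that space by hand is:
docker system df                 # how much space images, containers, build cache and volumes are using
docker system prune              # remove stopped containers, dangling images and unused networks
docker system prune --volumes    # also remove unused volumes; make sure nothing you need lives there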
Secondly, I was using runner cache and artifacts, which in the end were cluttering my tests a bit, and I believe that was the root cause of my problems.
My observations:
If you want to take advantage of the cache in gitlab-runner, remember that by default it is only accessible on the host where the runner starts, and that the cache is retrieved on top of your project checkout, meaning it overrides files from your project.
Artifacts are a little more flexible, because they are stored on and fetched from your GitLab installation. You should develop your own naming convention (using variables) for them, to control what is fetched/cached between stages and to make sure everything works as you would expect.
Cache and artifacts should be used in your tests with caution and understanding, because they can introduce a ton of problems if not used properly...
Side note:
Although my VM was not overloaded, lags in storage were causing timeouts in the network and ultimately in Dusk when running multiple gitlab-runners concurrently...
Update as of 2019-02:
I have finally tested this under full load, and I can confirm that my earlier side note about machine load is more than true.
After tweaking Linux parameters to handle heavy load (max open files, connections, sockets, timeouts, etc.) on the hosts running gitlab-runners, all concurrent tests pass green, without any strange, occasional errors.
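The exact values depend on your workload, but the kind of knobs I mean look like this (illustrative numbers only, not recommendations):
ulimit -n 65535                          # max open files for the session running the runner
sysctl -w fs.file-max=2097152            # system-wide open-file limit
sysctl -w net.core.somaxconn=4096        # allow a longer TCP accept queue
sysctl -w net.ipv4.tcp_fin_timeout=15    # release closed connections sooner
To make them permanent, put the sysctl values in /etc/sysctl.conf and the file limit in /etc/security/limits.conf.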
Hope this helps anybody configuring gitlab-runners...

MS Access 2016 program pops Stack errors and Security Warnings on non-developer PCs

I read all the rules on asking good questions here; I hope this will suffice.
I am having problems with an Access 2016 .ACCDE database.
The program runs fine on my machine. When I try to run it on my friends' machines (either the .ACCDE or .ACCDB version), it won't load, and instead it pops Out Of Stack Space errors and the Security Notice.
So, here's the set up:
The program was written in Access 2016. It is a Front End/Back End design. It's not a very big program: 16 tables, 41 forms, and 51 code modules.
I use the FMS Access Analyzer to help make sure my code is clean, so the quality of the program is good to very good.
Prior versions of the program ran fine on all machines. I made several changes and improvements and moved it to the \Documents folder. Now we are having problems.
Machine 'A' (Development PC): New Win 10, 8GB RAM, Full MS Access (not runtime).
Machine 'B': Newish laptop, 2GB RAM, lots of disk, Access 2016 Runtime. It ran prior versions of the program fine but is now throwing errors.
Machine 'C': Newish desktop, 8GB RAM, lots of free disk, full Access (not runtime). It also ran prior versions of the program fine but is now throwing errors.
Initially, the opening form would pop an error saying that the On Load event caused an Out Of Stack Space error. The user says,
"Still happens after a fresh reboot. It does NOT happen with other .accde files." Both the A and B machines are showing the same errors.
I made many changes but could not cure the Out Of Stack Space error. Finally, I switched to an AutoExec macro instead of a startup form. The AutoExec macro then caused Error 3709 and aborted. Machine B was at 49% CPU and 60% memory, and its micro SD drive had 5.79GB used and 113GB free.
I deleted the macro and went back to the startup form; still no luck.
I asked if he got an MS Security error. He said, "Yes, Microsoft Access Security Notice. Figuring it's just a general warning since it lets me go ahead and open the file. The directory where we have the program (C:\Documents\Condor) was already a Trusted Location on my work machine."
So, does this sound like a Security error?
Is it a problem to have the program in the \Documents folder?
Okay, there's a lot going on in this post, so as a sanity check I would suggest getting back to basics: working just with the .accdb under a full license, does it throw any errors at all?
An aside: with the runtime, an error = crash... usually it just rolls over and closes without any message.
Another aside: you don't need an .accde for the runtime, since the runtime can't change the design anyway; you would only need an .accde if there are full-license users you want to keep out of design view.
You have to be sure that the runtime/.accde machines have exactly the same path to the back end as your full-license machine, because the path is stored in the front end.
But sanity-checking the .accdb on the full-license machine is the first step in debugging this... if that is not all okay, it must be dealt with first.
I'm sorry, I thought I had posted that the problem was resolved. The table links broke because, as you pointed out, one person's This PC\Documents\whatever folder is different from anyone else's (C:\Users\KentH\Documents\whatever vs. C:\Users\JohnT\Documents\whatever).
Thank you for your time and suggestions. Broken table links can cause the stack error, for sure, and that can be caused by trying to put the program someplace other than the C:\Programs folder.
D'oh!

Cygwin intermittently loses its mapped drives in /cygdrive

So, I have a collection of Windows Server 2016 virtual machines that are used to run some tests in pairs. To perform these tests, I copy a selection of scripts and files from the network onto the machine before running the tests.
I'm basically using a set of scripts that have existed around here since before my time, and while I would like to use other methods, so much of our infrastructure relies on these scripts that overhauling the system would be a colossal task.
First up, I sort out the mapped drives with
net use X: \\network\location1 /user:domain\user password
net use Y: \\network\location2 /user:domain\user password
and so on
Soon after, I use rsync to copy files from a location in /cygdrive/y/somewhere to /cygdrive/c/somewhere_else.
During the rsync, I get errors that "files have vanished" (I'm currently unable to post the exact error; I will edit this later to include it). When I check what's currently in the /cygdrive directory, all I see is /cygdrive/c, and everything else has disappeared.
I've tried making a symbolic link to /cygdrive/y in a different location, I've tried adding /persistent:yes to the net use command, and I've changed the power settings on the network card so it doesn't sleep. None of these work.
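To at least pin down when the drives drop, one option (a minimal sketch for a Cygwin bash shell; the log path is arbitrary) is to record the visible mounts in a loop while rsync runs:
while true; do
    date '+%F %T'          # timestamp each sample
    ls /cygdrive           # drive letters Cygwin can currently see
    net use                # the Windows view of the mappings (OK / Disconnected / Unavailable)
    sleep 30
done >> /cygdrive/c/drive_watch.log 2>&1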
I'm currently looking into the settings for the virtual machines themselves, but I have some doubts, as we have other virtual Windows machines that do not seem to have this issue.
Has anyone heard of anything similar and/or know of a decent method to troubleshoot this?
Right, so I've been working on this all day and finally noticed a positive change, though since my systems are in VMware's vCloud this may not work for everyone. It was simply a matter of having the VM turned off and upgrading the Virtual Hardware Version to the latest version. I did notice, though, that upon a restart, one of the first messages that comes up says the computer is "disabling group policies".
I did a bit of research into this and found that Windows 8 and 10 (with no mention of Windows Server) both automatically update Group Policy in the background, disconnecting and reconnecting mapped drives in order to recreate them.
It's possible that changing the Group Policy drive-mapping action from "Replace" (delete and recreate) to "Update" would fix this issue, and that the Virtual Hardware upgrade happened to resolve it in a similar manner.
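To check whether a background Group Policy refresh lines up with the moments the drives vanish (a quick check, not a fix), run gpresult /r in the affected session from time to time; it reports "Last time Group Policy was applied", which you can compare against the output of net use (each mapping shows as OK, Disconnected, or Unavailable) and against the times rsync reports vanished files:
gpresult /r
net use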

Azure Server Inaccessible

One of my 10 Azure VMs running Windows has suddenly become inaccessible! The Azure Management Console shows the state of this VM as "running", but the Dashboard shows no server activity since my last RDP logout 16 hours ago. I tried restarting the instance with no success; it is still inaccessible (no RDP access, hosted application down, unsuccessful ping...).
I changed the instance size from A5 to A6 using the management portal and everything went back to normal. The Windows Event Viewer showed no errors except the unexpected shutdown today, after my instance size change. Nothing was logged between my RDP logout yesterday and the system startup today after changing the size.
I can't afford to have a server down for 16 hours! Luckily, this one was the development server.
How can I know what went wrong? Anyone faced a similar issue with Azure?
Thanks
There is no easy way to troubleshoot this without capturing the VM in the stuck state.
It sounds like you followed the recommended steps, i.e.:
- Check the VM is running (portal/PowerShell/CLI).
- Check the endpoints are valid.
- Restart the VM.
- Force a redeployment by changing the instance size.
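For anyone hitting this today on Resource Manager VMs, a rough sketch of the same checks with the current Azure CLI looks like this (the resource group and VM names are placeholders):
az vm get-instance-view -g myGroup -n myVM --query instanceView.statuses   # power and provisioning state
az vm boot-diagnostics get-boot-log -g myGroup -n myVM                     # console output, if boot diagnostics are enabled
az vm restart -g myGroup -n myVM                                           # plain restart
az vm redeploy -g myGroup -n myVM                                          # move the VM to a new host (the modern equivalent of the resize trick)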
To understand why it happened, it would be necessary to leave the VM in the stuck state and open a support case to investigate.
There is work underway to make both self-service diagnosis and redeployment easier for situations like this.
Apparently nothing was wrong! After the reboot, the machine was installing updates to complete the restart. In my panic, I rebooted it again, stopped it, started it again, and even changed its configuration, thinking it was dead, while in fact it was only installing updates.
Too bad that we cannot disable the automatic reboot or estimate how long it will take to complete.
