I have a GitLab setup with runners on a dedicated VM (24 GB RAM, 12 vCPUs, and a very low runner concurrency of 6).
Everything worked fine until I added more browser tests - 11 at the moment.
These tests are in the browser-test stage and start properly.
My problem is that they sometimes succeed and sometimes fail, with seemingly random errors.
Sometimes a test cannot resolve the host, other times it is unable to find an element on the page...
If I rerun the failed tests, they always go green.
Does anyone have an idea of what is going wrong here?
BTW... I've checked, and this dedicated VM is not overloaded...
I have resolved all of my initial issues (not yet tested under full machine load); however, I've decided to post some of my experiences.
First of all, I was experimenting with gitlab-runner concurrency (to speed things up), and it turned out that it filled my storage space really quickly. So for anybody experiencing storage shortages, I suggest installing this package
Secondly, I was using the runner cache and artifacts, which in the end were cluttering my tests a bit, and I believe that was the root cause of my problems.
My observations:
If you want to take advantage of the cache in gitlab-runner, remember that by default it is accessible only on the host where the runner starts, and remember that the cache is retrieved on top of your installation, meaning it overrides files from your project.
Artifacts are a little more flexible, because they are stored in and fetched from your GitLab installation. You should develop your own naming convention (using variables) for them, to control what is fetched/cached between stages and to make sure everything works as you would expect.
Cache and artifacts in your tests should be used with caution and understanding, because they can introduce tons of problems if not used properly...
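For illustration, a minimal .gitlab-ci.yml fragment showing both mechanisms; apart from the browser-test stage, the job names, paths, and commands are assumptions, not taken from my pipeline:

build:
  stage: build
  script:
    - composer install
  cache:
    key: "$CI_COMMIT_REF_SLUG"       # one cache per branch; host-local by default
    paths:
      - vendor/
  artifacts:
    name: "build-$CI_COMMIT_REF_SLUG-$CI_COMMIT_SHA"   # naming convention built from CI variables
    paths:
      - vendor/
    expire_in: 1 hour

browser-test:
  stage: browser-test
  dependencies:
    - build                          # fetch exactly this job's artifacts, nothing stale
  script:
    - php artisan dusk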
Side note:
Although my VM was not overloaded, certain lags in storage were causing timeouts in the network, and finally in Dusk, when running multiple gitlab-runners concurrently...
Update as of 2019-02:
Finally, I have tested this under full load, and I can confirm that my earlier side note about machine overload is more than true.
After tweaking the Linux parameters that handle big loads (max open files, connections, sockets, timeouts, etc.) on the hosts running gitlab-runners, all concurrent tests pass green, without any strange, occasional errors.
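For reference, the kind of tweaks I mean look like the following; the values are examples to adapt, not recommendations:

# raise the open-file limit for the user running gitlab-runner:
echo "gitlab-runner soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "gitlab-runner hard nofile 65536" | sudo tee -a /etc/security/limits.conf

# network-related kernel parameters:
sudo sysctl -w net.core.somaxconn=4096        # larger accept queue under load
sudo sysctl -w net.ipv4.tcp_fin_timeout=15    # recycle closing connections faster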
Hope this helps anybody configuring gitlab-runners...
For about 18 months now I've been working in Node, and for the last 6 months I've been slowly migrating my existing WordPress websites to NextJS.
To date, I've been deploying to production manually: I log into my production server, check out the latest release from GitHub, build, and do a pm2 restart.
Even though the above workflow seems to be the most commonly documented one around the internet, it has always felt a little wrong to me.
Recently, I found myself in a situation where I needed to customise some 3rd-party code. So my main code now has a line in package.json that says
{
  ...
  "dependencies": {
    ...
    "react-share": "file:../react-share/react-share-4.4.1.tgz",
    ...
  },
  ...
}
which implies that I'm going to check out my custom react-share, build it somewhere on the production server, change this line to point to wherever I put it, and then rebuild.
Also, I'm using Prisma, which means that every time I deploy, before I build, I need to run npx prisma generate to create the client.
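For context, here is the whole manual procedure written down as the script I effectively run by hand (the repository paths, branch, and pm2 process name are placeholders):

# build the customised dependency and package it as a tarball:
cd ~/react-share && npm run build && npm pack    # emits react-share-4.4.1.tgz

# then deploy the main app:
cd ~/my-app
git pull origin main
npm install            # resolves the file: dependency above
npx prisma generate    # regenerate the Prisma client before building
npm run build
pm2 restart my-app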
This now all seems really, really wrong.
I don't know how a "simple" CI/CD environment might look, but whatever it looks like, it feels like overkill. It's just me doing development, and my production environment is a single EC2 server sitting behind AWS CloudFront.
It seems to me that I should be doing something more than, or different from, what I'm currently doing, in service of someday moving to a CI/CD model, if/when I have a whole team working on this, or enough users that I have multiple load-balanced servers and need production to be continually up.
In particular, it feels like I shouldn't be building on the production server.
Are there any intermediary steps I can/should be taking for faster, less error-prone, lower-downtime deployment to a single EC2 instance for Next/Node apps - something between manually deploying as I do now and a full CI/CD setup? Or are my only choices to keep doing what I'm doing or to go research how to do CI/CD?
You're approaching the initial stages of what is technically called DevOps, if you're not there already, as it appears from your context. What you're asking about is a broad topic - which is an understatement - and explaining everything here would be almost like writing an article about it, at the very least.
However, I'll give you a brief overview of how to approach this.
I don't know how a "simple" CI/CD environment might look, but whatever it looks like, it feels like overkill.
Simplicity and complexity are relative terms. A system that is complicated for one person might be simple for another. CI/CD doesn't define any laws that you need to follow in order to create a perfect deployment procedure, because everyone's deployment requirements are unique (at some point).
To put it in bullet points, what you need to figure out before you start setting up CI/CD is:
The sequence of steps your deployment procedure needs in order to deploy your latest version. As you have stated, you've been deploying manually, which means you already know your steps. All you need to do is fine-tune each step so that it requires no manual intervention when executed automatically by the CI program.
Choose a CI program, like Travis CI, Circle CI, or, if you're using GitHub, its own GitHub Actions; you can read their documentation for more details. Your CI program will be responsible for executing your deployment steps, which you'll describe to it in whichever format it understands (mostly .yml); there is a sketch of such a file after these points.
The CI program will execute your steps on your behalf, based on a condition you provide (such as code being pushed to the prod branch). It will execute the commands on a machine (like your EC2 instance); specifically, a GitHub Actions runner will be responsible for running your commands on your machine. The runner should be set up beforehand on the instance you intend to deploy your code to. More details on runners can be found in the relevant documentation.
Since the runner will actually execute the commands on your machine, make sure that all required commands and parameters, including the relevant files and directories, are accessible to the runner program, at least from a permissions point of view. For example, running your npx prisma generate command requires that the npx command is available and executable on the system, and that the folders in which the command will create/read/update/delete files are accessible to the runner program. The same goes for all the other commands.
Get your hands dirty with bash scripting as well.
If your steps contain dynamic info - like the package.json dependency entry you mentioned that needs to be updated - then a custom bash script that updates it automatically will help, for instance. There are, however, several other ways, depending on the specific nature of the dynamic changes.
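To make these points concrete, here is a rough sketch of what such a workflow file could look like in your case; the branch name, the npm steps, and the pm2 process name are assumptions to adapt, not a prescription:

# .github/workflows/deploy.yml (illustrative)
name: deploy
on:
  push:
    branches: [prod]            # deploy whenever the prod branch is pushed
jobs:
  deploy:
    runs-on: self-hosted        # the runner installed on your EC2 instance
    steps:
      - uses: actions/checkout@v4
      # example of scripting a "dynamic" step: point package.json at the
      # locally built tarball (npm pkg set requires npm 7.24+):
      - run: npm pkg set "dependencies.react-share=file:../react-share/react-share-4.4.1.tgz"
      - run: npm install
      - run: npx prisma generate
      - run: npm run build
      - run: pm2 restart my-app # "my-app" is a placeholder process name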
The above points are a huge (by huge, I mean astronomically huge) oversimplification of the way CI/CD pipelines are set up. But I hope you get the idea of it, at least.
In particular, it feels like I shouldn't be building on the production server.
Your feeling is legitimate. You should replicate your production environment (including deployment procedures) in a separate development environment that is as close to production as possible, so that all your experiments, development, and testing happen separately from production; after successful evaluation in the development environment, deploy to production. Steps like building will most likely be done in both environments, as building is something your program needs regardless of the environment it runs in. Your future team will appreciate this separation of environments.
if/when I have a whole team working on this, or sufficient users that I have multiple load-balanced servers and need production to be continually up.
Again, this small statement is in itself a proper domain of IT, known as System Design, in which, to put it simply, you or your team create an architecture for your whole system that supports your business requirements and scales as your audience grows - something a simple Stack Overflow Q&A won't suffice to explain.
Therefore,
or go research how to do CI/CD?
is what I'd recommend, and what you should also feel is the right way forward after reading everything above.
Useful references to begin with (I'm not endorsing any particular resources; you can search for other relevant/better ones too):
GitHub Actions self-hosted runners
System Design - Getting started
Bash scripting
Development, Staging, Production
Can you suggest a way of separating learning/trying things out from production work on the same computer? I am at a point where I know a lot of JS and have production-ready skills, while I sometimes still need to probe or try out simpler stuff or basics. I presume that a lot of engineers are in a similar place.
This is the situation I am facing right now.
I wanted to install Redis and configure it while trying out something interesting.
In a separate project I needed another clean Redis configuration and installation.
On the front-end side, I tried out and installed a few npm packages globally.
At some point I installed Python 3.4; now I require 3.6.
At some point I installed nginx and configured it; now I need another configuration and to wipe the previous one out.
If I start a big project right now, I feel like my computer will eventually let me down because of the several attempts I've previously made.
Et cetera - all of this creates friction in both my learning and my exploration.
Now, it crosses my mind to use separate VirtualBox installations for trying things out, but this answer is trivial; please suggest something else.
P.S.: I am using Linux Mint.
You can install and use Docker, which is also trivial;
however, if your environment is Linux, you can use LXC.
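As a minimal sketch using LXD's lxc client (assuming LXD is installed; the image and container names are just examples):

lxc launch ubuntu:22.04 redis-sandbox            # disposable container for experiments
lxc exec redis-sandbox -- apt-get update
lxc exec redis-sandbox -- apt-get install -y redis-server
# ...experiment freely, then throw the whole thing away:
lxc delete --force redis-sandbox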
There isn't really a single good answer to this sort of question, of course, but some things that are generally a good idea are:
Use git repos to keep your source "backed up" (obviously your local PC should not be the git server). Commit your changes all the time; if you can't hold your breath for as long as the timespan between two commits, then you're doing it wrong (or you may have asthma - see a doctor).
Always build your project with not just multiple, but a variable number of "deployments" in mind. That means not hardcoding absolute paths, database names/ports/hostnames, and things like that. If your project needs database/API credentials, they should live in a config file of sorts (or in the env); that config file should be stored outside the codebase and shouldn't be checked into your git repos (though there can of course be a config template in there).
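For example, a sketch of that pattern (the file path and variable names are made up):

# /etc/myproject/config.env -- lives outside the codebase, never committed:
DB_HOST=localhost
DB_PORT=5432
DB_USER=myproject
DB_PASS=change-me

# load it into the environment before starting the app:
set -a; . /etc/myproject/config.env; set +a
node server.js        # the app reads DB_HOST etc. from the env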
Always have at least two deployments of any project actually deployed. Next to the (obvious) "live"/"production" deployment, which your clients/users use, you want a "dev" version for yourself where you can freely shit the bed, and for bigger projects you may well want several. Each deployment has its own database and its own copy of the code/assets.
It can be useful to deploy everything inside Podman or Docker containers; that makes it easier to have a near-identical system in both development and production (in case those are different servers), but it may be too much overhead for you.
Have a method (maybe a script) that makes it very easy to deploy updates from your git repo or dev deployment to the production deployment. Based on your description, I'm guessing that if a client tells you she wants some minor cosmetic changes, you make them straight on the live version; very convenient and fast, but a horrible practice. Once you switch from that workflow to having a separate dev deployment, you'll feel slowed down by it (which you are), but if you optimize that workflow over time you'll get to the point where you can still deploy cosmetic changes in a minute or so while having fully separated deployments. It is worth the time investment.
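Such a script can start out as small as this sketch (the path, build command, and restart command are placeholders to adapt):

#!/usr/bin/env bash
# deploy.sh -- promote a tested revision to the production deployment
set -euo pipefail
cd /srv/myproject-prod
git fetch origin
git checkout "${1:?usage: deploy.sh <tag-or-branch>}"
npm ci && npm run build      # replace with your project's build steps
pm2 restart myproject        # or however your service gets restarted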
Have a personal devtools git repo or something similar. You're likely using an IDE such as VS Code? Back up your VS Code user config in that repo, and update it reasonably frequently. Use a text editor, Photoshop or another image editor, etc.? Same deal. You hear that ticking sound? That's the bomb that's been placed on your motherboard. It might go off tonight, it might not go off for years, but you never know - always expect that it could be today or tomorrow, so have your stuff backed up externally and/or on offline media.
There's a lot more but those are some of the basics that spring to mind.
I thought Docker was only for containerizing your app with all its installation files and configuration before pushing to production.
Docker is useful whenever you need to configure a runtime environment in an isolated manner. Production, local development, and other environments all need the same runtime, and all benefit from the runtime definition and isolation that Docker provides. Arguably, Docker is even more useful in workstation-centric development than it is in production.
I wanted to install redis and configure it while trying out something interested.
Instead of installing Redis on your OS directly, run the preexisting Docker image for Redis.
In a separate project I needed another clean redis configuration and installation.
Instantiate the Docker image again, and now you have two isolated Redis servers running locally.
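For example (the container names and host ports are arbitrary):

docker run -d --name redis-project-a -p 6379:6379 redis   # first project
docker run -d --name redis-project-b -p 6380:6379 redis   # second, fully isolated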
In front-end side I tried and installed a few npm packages globally.
Run your npm code within a Node.js Docker container.
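Something along these lines (the image tag is just an example):

# run npm inside a throwaway node container, mounting the current project:
docker run -it --rm -v "$PWD":/app -w /app node:20 npm install
docker run -it --rm -v "$PWD":/app -w /app -p 3000:3000 node:20 npm start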
At some point I installed python 3.4 now require 3.6
Different versions of Python are a great use case for Docker containers, which are tagged with specific Python versions.
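For instance, both of these images exist on Docker Hub:

docker run -it --rm python:3.4 python --version   # the old project's interpreter
docker run -it --rm python:3.6 python --version   # the new requirement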
At some point I installed nginx and configured it, now need another configuration and wipe the previous one out,
Nginx also has a very useful official container.
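A sketch of running two differently configured nginx instances side by side (the config file names and host ports are made up):

docker run -d --name web-old -p 8080:80 -v "$PWD/old.conf":/etc/nginx/nginx.conf:ro nginx
docker run -d --name web-new -p 8081:80 -v "$PWD/new.conf":/etc/nginx/nginx.conf:ro nginx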
If I start a big project right now I feel like my computer will eventually let me down due to several attempts I previously done
Yeah, it gets messy quickly. That's why Docker is such a great solution. Give every project dedicated services and use docker-compose to simplify the networking and building components. Fight the temptation to use one Docker container for more than one service; instead, stitch the containers together with Docker networks.
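A minimal docker-compose.yml sketch of that one-service-per-container layout (the image tags and ports are illustrative):

services:
  app:
    image: node:20
    working_dir: /app
    volumes:
      - .:/app
    command: npm start
    ports:
      - "3000:3000"
    depends_on:
      - redis
  redis:
    image: redis    # its own container; reachable from app at hostname "redis"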
Read https://docs.docker.com/get-started/overview/ to get started with docker.
When using the JupyterLab found within an Azure ML compute instance, every now and then I run into an issue where it says that the network connection is lost.
I have confirmed that the compute instance is still running.
The notebook itself can be edited and saved, so the computer/VM is definitely running.
Of course, the internet connection is fully functional.
In the top right corner, next to the now-blank circle, it says "No Kernel!"
We can't repro the issue - can you give us more details? One possibility is that the kernel has bugs and hangs (which could be due to installed extensions or widgets), or that the resources on the machine are exhausted and the kernel dies. What VM type are you using? If it's a small VM, you may have run out of resources.
Having troubleshot this on the internet, I found that you can force a reconnect (if you wait long enough, a few minutes or so, it will reconnect on its own) by using Kernel > Restart Kernel.
Based on my own experience, this seems to be a fairly common issue, but I did spend a few minutes figuring it out. Hope this helps others who are using this.
Check your browser console for any language-pack loading errors.
Part of our team had this issue this week. The root cause for us was some pt-br language packs not loading correctly; once the affected team members changed the page/browser language to en-us, the problem was solved.
I have been dealing with the same issue. After some research into the problem, I learnt that my firewall was blocking JupyterLab, Jupyter, and the terminal; allowing access to them solved the issue.
I've been working with Arango for a few months now within a local, single-node development environment that regularly gets restarted for maintenance reasons. About 5 or 6 times now my development database has become corrupted after a controlled restart of my system. When it occurs, the corruption is subtle in that the Arango daemon seems to start ok and the database structurally appears as expected through the web interface (collections, documents are there). The problems have included the Foxx microservice system failing to upload my validated service code (generic 500 service error) as well as queries using filters not returning expected results (damaged indexes?). When this happens, the only way I've been able to recover is by deleting the database and rebuilding it.
I'm looking for advice on how to debug this issue - such as what to look for in log files, server configuration options that may apply, etc. I've read most of the development documentation, but only skimmed over the deployment docs, so perhaps there's an obvious setting I'm missing somewhere to adjust reliability/resilience? (this is a single-node local instance).
Thanks for any help/advice!
Please note that issues like this should rather be discussed on GitHub.
I've recently started to notice really annoying problems with VisualSVN (+ server) and/or TortoiseSVN. The problem occurs on multiple (2) machines, both running Windows 7 x64.
The VisualSVN Server is running on Windows XP SP3.
What happens is that after, say, 1, 2, or 3 files (or a few more, but almost always at the same file), the commit just hangs on transferring data, with a speed of 0 bytes/sec.
I can't find any error logs on the server. I also requested a 45-day trial of the Enterprise Server for its logging capabilities, but there are no errors there either.
Accessing the repository disk itself is fast; I can search/copy/paste on that disk/SVN repository disk just fine.
The VisualSVN Server also does not use excessive amounts of memory or CPU, which stays around 0-3%.
Both the server's and TortoiseSVN's memory footprints move/change, which would indicate that at least "something" is happening.
Committing with Eclipse (a different project (PHP), a different repository on the server) works great: no slowdowns, almost instant commits, whether with 1 file or 50. The Eclipse plugin I use is Subclipse.
I am currently quite stuck on this problem, and it is preventing us from working with SVN right now.
[edit 2011-09-08 15:57]
I've noticed that it goes extremely slowly on 'large' files, for instance a 1700 KB .resx (binary) or a 77 KB .h source (text) file. 'Small' files < 10 KB go almost instantly.
[edit 2011-09-08 16:08]
I've just added the code to code.google.com to see if the problem is on my end or the server's end. Adding to Google Code goes just fine, with no hangs at all: 2.17 MB transferred in 2 minutes and 37 seconds.
I've found and fixed the problem. It appears to have been a faulty NIC: speedtest.net showed ~1 Mbit, and swapping in a different NIC pushed this to the maximum of 60 Mbit, solving my commit problems.