Kafka, Linux/Docker and IntelliJ on Windows 10

I'm going down the 100DaysKafka path. It appears that the Confluent platform only runs on Linux via Docker. I do my Java development using IntelliJ and Windows 10. Is this a dead-end waste of time or can IntelliJ hook into the running Linux Kafka instance? Thanks!

via Docker
This is false. Confluent Platform doesn't require Docker; it just doesn't support Windows, meaning some of its tools, like the Schema Registry, KSQL, and Control Center, don't offer startup scripts for Windows.
That doesn't mean it's not possible to run Kafka or ZooKeeper themselves, which sounds like all you want - see How to install Kafka on Windows?
You don't need IntelliJ to "hook into" anything. That's also not really proper terminology unless you are planning on actually contributing to the Kafka source code. If you're just writing a client, it makes a TCP connection, which works fine over localhost.
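For instance, here's a minimal sketch of a producer you could run from IntelliJ on Windows, assuming the broker on the Linux/Docker side is reachable at localhost:9092 (adjust to whatever port your container publishes); the topic name "test" is just a placeholder:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LocalhostProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumed address: point this at the host/port your Docker container publishes.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // Plain TCP connection to the broker; nothing IntelliJ-specific is involved.
            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // "test" is a placeholder topic; create it first or substitute your own.
                producer.send(new ProducerRecord<>("test", "key", "hello from Windows"));
                producer.flush();
            }
        }
    }

If that send succeeds, your IDE is already as "hooked into" the Linux broker as a client ever needs to be.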

How does RunKit make their virtual servers?

There are many websites providing cloud coding, such as Cloud9 and repl.it. They must use server virtualisation technologies. For example, Cloud9's workspaces are powered by Docker Ubuntu containers; every workspace is a fully self-contained VM (see details).
I would like to know if there are other technologies for building sandboxed environments. For example, RunKit seems to have a lightweight solution:
It runs a completely standard copy of Node.js on a virtual server created just for you. Every one of npm's 300,000+ packages are pre-installed, so try it out
Does anyone know how RunKit achieves this?
You can see more in "Tonic is now RunKit - A Part of Stripe!" (see discussion)
we attacked the problem of time traveling debugging not at the application level, but directly on the OS by using the bleeding edge virtualization tools of CRIU on top of Docker.
The details are in "Time Traveling in Node.js Notebooks"
we were able to take a different approach thanks to an ambitious open source project called CRIU (which stands for checkpoint and restore in user space).
The name says it all. CRIU aims to give you the same checkpointing capability for a process tree that virtual machines give you for an entire computer.
This is no small task: CRIU incorporates a lot of lessons learned from earlier attempts at similar functionality, and years of discussion and work with the Linux kernel team. The most common use case of CRIU is to allow migrating containers from one computer to another
The next step was to get CRIU working well with Docker
Part of that setup is being open-sourced, as mentioned in this Hacker News thread.
It uses linux containers, currently powered by Docker.

Datastax Enterprise - 5.0 Best Practices - Installation

I am currently evaluating the DataStax Enterprise 5 installation for my production system. There are many methods available for installation. When we choose the runinstaller unattended method provided by DSE, using an option file, it provides two modes:
1. Services-based - Needs root permission; binaries are installed in /usr/share/dse and /etc/dse.
2. No-services - Does not need root; binaries can be installed in a custom location, equivalent to a tarball-based installation but without services.
I have the following questions:
Is there any best practice on which method is best suited for a production installation (in short, are there any problems with running a no-services runinstaller installation)?
Is there a way to modify the runinstaller services-based installation to point to a DSE home other than /usr/share/dse and /etc/dse, something like /Cassandra owned by the cassandra user?
Any other best practices on a method of installation which is currently live in production without any issues?
Regards
Any of the methods specified here are fine for production installations.
Not that I know of; you might want to look at using the tarball installation if you need this level of configuration.
There are a whole lot of things you need to think about when planning a cluster for DSE 5. I would start by looking at this list here.
I'm an OpsCenter developer who works on the Lifecycle Manager feature, so I'm more than a bit biased... but I think that OpsCenter LifeCycle Manager is an excellent way to install and manage DSE if you don't already have something like Chef or Ansible that you use enterprise-wide. It's a graphical webapp with a RESTful API in case you need to do any scripting of it. It deploys DSE using deb/rpm packages over SSH and can configure pretty much every DSE option there is.
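As a rough illustration of what scripting against that REST API could look like (the host, the port 8888, and the /api/v2/lcm/jobs path here are assumptions on my part; check the LCM API docs for the real endpoints):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class LcmJobsSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint: host, port and path are assumptions, not the documented API.
            URL url = new URL("http://opscenter.example.com:8888/api/v2/lcm/jobs");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // raw JSON describing install/configure jobs
                }
            } finally {
                conn.disconnect();
            }
        }
    }
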
As to your other questions:
Services vs no-services installations: You probably want a services-based installation. It behaves more like a "normal" linux service that can be managed with the 'service' command. A no-services install is primarily useful if you don't have root access because of very tight security policies in your org, and if you choose to go that route you'll need to decide how you want to manage DSE startup and shutdown (for clean reboots, for example).
The DSE installer can probably handle non-standard paths, but I'm not familiar enough with the details. LCM can handle some non-standard paths but not all of them (DSE will always be installed to the standard locations, for example). If you want to very tightly control every aspect of the install, tarball is your best choice. That's a lot of complexity, though; do you REALLY need to control every path?
The OpsCenter Best Practice service is probably the best list of recommended things to do in Prod, and is very easy to turn on for LCM-managed clusters. But even if you don't use LCM, I recommend you set up OpsCenter so you can use the Best Practice Service.
You can find the OpsCenter install steps at: https://docs.datastax.com/en/latest-opsc/opsc/LCM/opscLCMOverview.html.

Windows event logs to Flume

I've installed a Cloudera Flume node (0.9.4) on my Windows 2003 server and it appears to be running. However, I'm stuck as to the next steps to take to send the Windows server's event log data to the master node. My master node is located on a Linux machine. What next steps are needed to connect my Windows Flume node to the master node?
thanks,
Ralph.
I'm baffled as to why this seems to be the only decent "open-source" (if not community-developed) solution, but after a few research efforts over the last several years, I've repeatedly come up with NXLog as the best option for handling Windows event logs in a primarily *nix-based environment.
NXLog has a special input module for this purpose called im_msvistalog. I've been using this with NXLog Community Edition and it works well so far. (FYI, I'm shipping Windows logs directly to Solr.)
I presume that there just aren't that many people using tools of this flavor (i.e., Apache Flume, Solr, Java, typically Linux-based tools) for handling Windows event logs. :-) I'd like to know why if anyone cares to chime in. I guess people with Windows infrastructure they care about will just have something like a centralized Windows Event Viewer that operates as a syslog daemon would in a *nix environment?
If this solution doesn't work for you, you can also try querying the Windows event logs using the Windows Events Command Line Utility. I haven't yet had to resort to that since everything I've needed has been available using that NXLog input module I mentioned above.
You need to connect the Windows Event Log to Flume. I haven't tried this, but I suggest you try a tool such as KiwiSyslog to turn Windows events into syslog. You then configure Flume with a syslog source and tell KiwiSyslog to send the events there.
BTW, Flume 0.9.4 is very old. I suggest you change to a recent Apache Flume as that is where the active support (largely by Cloudera staff) is.

Collectd server not writing down received client data

I have a pretty strange problem with Collectd. I'm not new to Collectd; I used it for a long time on CentOS-based boxes, but now we have Ubuntu 12.04 LTS boxes, and I have a really strange issue.
So, I'm using version 5.2 on Ubuntu 12.04 LTS. Two boxes residing on Rackspace (maybe important, but I'm not sure). The network plugin is configured using two local IPs, without any firewall in between and without any security (just to try to set up a simple client-server scenario).
On both machines collectd writes to its configured folders as it should, but the server machine doesn't write the data received from the client.
I troubleshot with tcpdump, and I can clearly see the UDP traffic and collectd data, including the hostname and plugin names from my client machine, received on the server, but they are never flushed to the appropriate folder (as configured in collectd). I'm also running everything as root, to rule out permission problems.
Does anyone have any idea or similar experience with this? Or maybe some idea of what I could do to troubleshoot this, besides crawling the internet (I think I clicked on every sensible link Google gave me in the last two days) and checking the network layer (which looks fine)?
And just a small note: exactly the same thing happened with the official 4.10.2 version from Ubuntu's repo. After trying to troubleshoot it for hours, I moved on to upgrading to version five.
I'd suggest trying out the quite generic troubleshooting procedure based on the csv and logfile plugins, as described in this answer. As everything seems to be fine locally, follow this procedure on the server, activating only the network plugin (in addition to logfile, csv and possibly rrdtool).
So, after finding no way to fix this, I upgraded Ubuntu to 12.04.2 LTS (3.2.0-24-virtual) and it just started working fine, without any intervention.

Free Linux Cluster Build for Small Scale Research

I need to build a small cluster for my research. It's pretty humble and I'd like to build a cluster just with my other 3 laptops at home.
I'm writing in C++. My MPI code is ready; I can simulate it using Visual Studio 2010 and it works fine. Now I want to see the real thing.
I want to do it for free (I'm a student). I have Ubuntu installed and I wonder:
if I can build a cluster using Ubuntu. I couldn't find a clear answer to that on the net.
if not, is there a free Linux distro that I can use for building a cluster?
I also wonder whether I have to install Ubuntu, or whichever Linux distro the host machine runs, on all the other laptops. Will any other Linux distribution (like openSUSE) work with the one on the host machine? Or do all of them have to be the same distro?
Thank you all.
In principle, any Linux distro will work in the cluster, and also in principle, they can all be different distros. In practice, it'll be enormously easier with everything the same, and if you get a distribution which already has a lot of your tools set up for you, it'll go much more quickly.
To get started, something like the Bootable Cluster CD should be fairly easy -- I've not used it myself yet, but know some who have. It'll let you boot up a small cluster without overwriting anything on the host computer, which lets you get started very easily. Other distributions of software for clusters include Rocks and Oscar. A technical discussion on building a cluster can be found here.
I also liked PelicanHPC when I used it a few years back. I was more successful getting it to work than with Rocks, but it is much less popular.
http://pareto.uab.es/mcreel/PelicanHPC/
Just to get a cluster up and running is actually not very difficult for the situation you're describing. Getting everything installed and configured just how you want it though can be quite challenging. Good luck!
