Pyro4 for non-network inter-process communication - python-3.x

I'm currently using Pyro4 to create daemons that host services, which are simply objects that can be called from other daemon-hosted objects or scripts. The objects take quite a long time to initialise, so I need to keep them alive rather than re-running a script that creates them each time I need to call them.
The implementation is beautifully simple, the client code executes quickly enough for my requirements, and it is easy to extend functionality. However, Pyro4 is explicitly made for Python programs running over a network, and I am just running these daemons internally within a single server. There do not seem to be Python packages that handle both daemonisation and communication between daemons in the clean way Pyro4 does.
My question: is Pyro4 the right fit for my needs, or is there a more standard way of dealing with this use case?

Many inter-process communication protocols use "the network" even when running on a single machine. "Network" connections on the local loopback adapter (IPv4 addresses 127.0.0.0/8 and IPv6 ::1) are particularly fast, as traffic to them usually doesn't go over a physical network interface at all.
Also, are you aware that Pyro4 also supports communicating over Unix domain sockets? Those are a purely local system resource.
All in all, the phrase "Pyro4 is explicitly made for python programs running over a network" is untrue. I definitely intended Pyro4 to be used between processes even on a single computer. If there's something in particular about Pyro4 that you think is not suitable for this purpose, please point it out so it can be improved!
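For illustration, a minimal sketch of a Pyro4 daemon served over a Unix domain socket rather than TCP (the socket path and object id here are arbitrary choices, not anything Pyro4 prescribes):
    import Pyro4

    @Pyro4.expose
    class ExpensiveService(object):
        # Imagine slow initialisation happening here, once.
        def ping(self):
            return "pong"

    # unixsocket= makes Pyro4 listen on a local socket file; no TCP/IP involved.
    daemon = Pyro4.Daemon(unixsocket="/tmp/example_service.sock")
    uri = daemon.register(ExpensiveService(), objectId="example.service")
    print(uri)  # PYRO:example.service@./u:/tmp/example_service.sock
    daemon.requestLoop()
A client in another process then connects using the "./u:" location form of the URI:
    import Pyro4

    proxy = Pyro4.Proxy("PYRO:example.service@./u:/tmp/example_service.sock")
    print(proxy.ping())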

Related

What is a container? And gVisor?

I am trying to understand what containers are and what their purpose is.
I am a little bit confused. When I started to read about them, I saw that they rely on Linux namespaces (is that true?), a way to isolate the processes within the container from the other processes on the machine, and I got the impression that their main purpose is security.
For instance, let's say that I own a server that runs multiple services, and I don't want a single hacked service to be able to compromise the whole system. So I put each service inside a container that makes the service unable to interfere with the other processes on the machine, e.g. to kill them or to play with their memory, and in that way eliminate the risk.
But later I saw other purposes, like being able to ship the app easily, or something like that. So what is their main purpose? I also read that if their main purpose is security, they have a problem: because they run directly on the host kernel (again, is that true?), an exploit like Dirty COW was (or would be) able to break out of the container and corrupt the machine. So I ended up reading about gVisor, which from what I understood tries to secure containers, and in some cases succeeds. So what does gVisor do differently that allows it to secure containers? Is gVisor a container itself, or just a runtime environment for containers?
Finally, I always see comparisons between containers and VMs, and I wonder why. When should I use which?
I don't know if anything that I wrote is correct, and I would be glad if you pointed out my mistakes and answered my questions. Yes, I know there are a lot of them, and I am sorry, but thanks!
This answer is not guaranteed to be concise; anyone is welcome to point out my mistakes.
It might be a little vague, because many people mix these concepts nowadays.
1. LXC
When I first got to know these concepts, "container" still meant LXC, a long-standing technique in Linux. IMHO, a container is a complete process that does not simulate a kernel. The difference between a container and a normal process is that the container gets an isolated view of the system via kernel namespaces (with resource limits enforced by cgroups), as if it were in a new operating system. But in fact the containers still share the host kernel (you are right), so people do worry about security, especially when you want to deploy in a public cloud (I don't see people using LXC directly on public clouds yet).
Despite the potential insecurity, the convenience and light weight (fast boot, small memory footprint) of containers seem to outweigh the drawbacks in most security-insensitive situations. Tools like Docker and Kubernetes make large-scale deployment and management more efficient.
2. Virtual Machine & Hardware-assisted virtualization
In contrast to containers, the term "virtual machine" denotes another category of isolated execution environment. Considering that most VMs leverage hardware-acceleration techniques like VT-x, I will assume you are talking about hardware-assisted virtualization. A virtual machine usually contains a full guest kernel.
(See Doug Chamberlain's diagram of the VT-x privilege modes.)
The Intel VT-x technique provides two modes, root mode (privileged) and non-root mode (unprivileged). Each mode has its own ring 0 through ring 3 (i.e., non-root ring 3, non-root ring 0, root ring 3, root ring 0). The whole virtual machine runs in non-root mode, and the hypervisor (the VMM, e.g. KVM) runs in root mode.
In the classic QEMU+KVM setup, QEMU runs in root ring 3, and KVM runs in root ring 0.
The strong isolation and the existence of a guest kernel make virtual machines more secure and compatible. But, of course, the price is performance and efficiency (slower boot, etc.).
3. Container-based Virtualization
People want the isolation of hardware-assisted virtualization but don't want to give up the convenience of containers, so a hybrid solution is an intuitive next step.
There are two typical solutions at present: Kata Containers and gVisor.
Kata Containers tries to slim down the whole virtual machine stack to make it more lightweight. There is still a Linux kernel inside it, and it is still a virtual machine, just a lighter one.
gVisor claims to be a secure container runtime, but it still leverages hardware virtualization techniques (or ptrace, if you don't want virtualization). It has a component called the Sentry, which runs both in non-root ring 0 and in root ring 3. The Sentry does part of the guest kernel's job but is much smaller than Linux. If the Sentry cannot finish a request itself, it proxies the request down to the host kernel.
The reason most people believe gVisor is somewhat more secure is that it achieves "defense in depth": more layers of indirection lead people to believe it is more secure. This is usually true, but again, it is not a guarantee.

Get notifications for socket events in kernel

For local TCP connections and Unix domain connections, I need to find the client process that initiated the connection, in the most efficient way possible.
I currently have a solution for this implemented in a driver that intercepts both the accept and connect syscalls, matches them, and finds the client process.
Now I am trying to achieve the same without my own driver.
Different approaches I tried:
A Linux auditing system plugin that intercepts the appropriate syscalls and resolves the client in a similar manner. This method has three problems:
Each intercepted event flows from the kernel to the auditd process, then to some other intermediate process, and only then to my plugin.
The audit rules apply to all the plugins and to the audit log; I can't define my own set of filtering rules, which may cause extreme inefficiency on machines that use the audit system.
It requires auditd to be installed on the machine (which is not the default for many Linux distributions).
Imitating netstat/lsof: in order to find the process that holds a connection, they iterate over the entire procfs mount. This makes sense, because after connection establishment each connection could belong to more than one process.
This approach is extremely inefficient if I want to do it for each connection establishment (see the sketch below).
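For reference, the procfs walk these tools perform looks roughly like this Python sketch (Linux-specific; it maps a socket inode, as found in e.g. /proc/net/tcp, to the pids holding it):
    import os

    def pids_for_socket_inode(inode):
        # Walk every /proc/<pid>/fd and look for "socket:[<inode>]" links.
        target = "socket:[%d]" % inode
        pids = []
        for pid in filter(str.isdigit, os.listdir("/proc")):
            fd_dir = "/proc/%s/fd" % pid
            try:
                for fd in os.listdir(fd_dir):
                    if os.readlink(os.path.join(fd_dir, fd)) == target:
                        pids.append(int(pid))
                        break
            except OSError:  # process exited, or permission denied
                continue
        return pids
Doing this full walk on every connection establishment is exactly what makes the approach so costly.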
I thought about trying one of the LSM (Linux Security Modules) solutions, but I am currently unfamiliar with them. Is there an LSM that can pass kernel events to a running application for a "decision"?
I would be happy to hear suggestions and other comments that could lead me to a reasonable solution for my original goal.
Found an option that suits my needs for Unix domain sockets only: if running code on the server side is a possibility (as in my case), one can use getsockopt in the following manner:
getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cr, &len)
More details at http://welz.org.za/notes/on-peer-cred.html
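The same check can be done from Python; a minimal sketch of the idea (the socket path is an arbitrary choice, and SO_PEERCRED is Linux-specific):
    import os
    import socket
    import struct

    PATH = "/tmp/example.sock"  # illustrative socket path
    if os.path.exists(PATH):
        os.unlink(PATH)

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(PATH)
    server.listen(1)

    conn, _ = server.accept()
    # struct ucred on Linux is three native ints: pid, uid, gid.
    creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                            struct.calcsize("3i"))
    pid, uid, gid = struct.unpack("3i", creds)
    print("peer pid=%d uid=%d gid=%d" % (pid, uid, gid))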

What's the easiest way to send messages to my Linux daemon app?

What's the easiest way to send messages to my Linux daemon app? Is it possible that myapp foo bar can invoke a callback instead of starting a new process? What's the standard way to communicate with a daemon? Is my most reasonable choice to stick the PID in /var/run and create a named pipe or a socket?
What is the standard way for apps run on the command line to communicate with their daemon process? I'm assuming it's impossible to ask Linux to call a callback when I type in myapp foo bar?
What is the standard way for apps run on the command line to communicate with their daemon process?
There are a number of ways to do this:
Dropboxes and Signals
A variety of locations are used to store "pid files" containing a daemon's process ID number: /var/run/<app>/<app>.pid, /var/run/<app>.pid (thanks @Adam Katz for the edit), /run/<app>/<app>.pid, /run/<app>.pid (see Askubuntu: Why has /var/run been migrated to /run?).
When the pid of the daemon process is known, a command line program (which runs as a separate process) can communicate with the daemon in these ways:
Writing something into a prearranged place. This could be an ordinary file, a database table, or any convenient spot readable by the server.
Sending a signal to the daemon. The common way to do this locally is with the kill system call, int kill(pid_t pid, int sig);.
Old School Example: The server multiplexing daemon xinetd would reread its configuration file after receiving SIGHUP.
The send-a-signal methodology has largely been superseded by the inotify API, whereby a process can subscribe to file system events. Still, signals are useful when you don't want the daemon to act on every file change, since not every change may leave the file in a valid state, as when modifying a system config file.
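A minimal sketch of the signal approach in Python (the daemon name and config path here are hypothetical):
    import signal
    import time

    CONFIG_PATH = "/etc/exampled/exampled.conf"  # hypothetical config file

    def reload_config(signum, frame):
        # A real daemon would reparse CONFIG_PATH here.
        print("SIGHUP received, rereading", CONFIG_PATH)

    signal.signal(signal.SIGHUP, reload_config)

    while True:         # the daemon's main loop
        time.sleep(60)  # sleep is interrupted when a signal arrives
The command line side is then just kill -HUP $(cat /var/run/exampled.pid), using whichever pid file location the daemon writes.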
FIFO or Pipe
A fifo or pipe is simply a special file that blocks processes reading it until some other process has written to it. You can make a named pipe/fifo in the file system with mkfifo. The only tricky thing about this is that pipes should generally be opened unbuffered, e.g. with open() as opposed to fopen(). Scripting languages sometimes provide a facility for pipe-friendly reading/writing: Perl has a line-buffered mode set with $|=1 that is useful for pipes.
More commonly, you see anonymous pipes all the time on the command line, with the | symbol separating commands that are executed as separate processes.
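A minimal sketch of a FIFO-reading daemon loop in Python (the FIFO path is an arbitrary choice):
    import os

    FIFO_PATH = "/tmp/exampled.fifo"  # hypothetical control pipe
    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)

    while True:
        # open() blocks until some client opens the FIFO for writing;
        # reopen after EOF so the daemon keeps serving new writers.
        with open(FIFO_PATH) as fifo:
            for line in fifo:
                print("command received:", line.strip())
A client then just writes to the file, e.g. echo reload > /tmp/exampled.fifo.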
Sockets
What about something newer, like MySQL? The MySQL database system consists of a command line client, mysql, and a server, mysqld, and may also serve other clients either on the local machine or over the internet.
MySQL communicates using a socket. The server listens on a socket for connections and forks new processes, giving a socket descriptor to the child for processing. When the child is done processing, it can exit.
There are UNIX sockets and Internet sockets, with different namespaces. One guide to programming sockets in C on Linux is the sockets chapter of The GNU C Library manual.
No-wait I/O is an alternative to forking off processes. This is done in C with the select() system call, which lets a process wait for an event on one or more files (including sockets) or a timeout. The GNU C Library docs include a no-wait I/O socket server example.
NodeJS is a server-side runtime for the JavaScript language, built as a single-threaded server around no-wait I/O, and it shows that these techniques are still relevant in modern systems.
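A minimal no-wait I/O echo server sketch in Python, using the selectors module (the Python analogue of select(); the address and port are arbitrary):
    import selectors
    import socket

    sel = selectors.DefaultSelector()
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 4444))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

    while True:
        for key, _ in sel.select():  # wait for any registered socket
            if key.fileobj is server:
                conn, _ = server.accept()  # new client connection
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = key.fileobj.recv(1024)
                if data:
                    key.fileobj.sendall(data)  # echo back
                else:
                    sel.unregister(key.fileobj)
                    key.fileobj.close()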
"Callbacks"
I'm assuming it's impossible to ask Linux to call a callback when I type in myapp foo bar?
Maybe. But it might be too much work to be worth the trouble.
When you type myapp foo bar into "Linux", you aren't really typing it into Linux: you are typing it into your command shell, which is a program running in its own process, separate from everything else.
So, unless the functionality you want is built into the command shell, there is normally no way for that command shell to send messages to your other program without starting a new process.
The default command shell for many (but not all) Linux systems is /bin/bash. To communicate with a daemon that listens to sockets from bash we would need a socket opening procedure in bash. And, sure enough, it exists!
One can listen to a socket in bash. This is the basis for a daemon:
From: Simple Socket Server in Bash? answer by dtmilano:
Q: I have a simple debugging solution which writes things to 127.0.0.1:4444 and I'd like to be able to simply bind up a port from bash and print everything that comes across. Is there an easy way to do this?
A:
$ exec 3<>/dev/tcp/127.0.0.1/4444
$ cat <&3
One can also open a socket from bash to a remote process, i.e. communicate with a daemon:
From: TCP connection, bash only
we learn exec 5<>"/dev/tcp/${SERVER}/${PORT}" redirects a TCP link to file descriptor 5 for both input and output.
Not all bash builds are compiled with TCP support; it is apparently Linux-distribution dependent, at least according to a comment on this answer by William Pursell.
There are other shells besides bash. Many shells were developed back in the early *nix days: the Korn shell ksh, the C shell csh, the Bourne shell sh, the Almquist shell ash. Wikipedia keeps a list of shells. These shells each have their own advantages and disadvantages, and they are not entirely compatible with each other's syntax!
Fast forward about 30 years, and there aren't so many in common use now.
But an important feature exists here: each user can choose his own login shell. See the chsh command.
Where I am going with this is that if bash doesn't support the communication you need, you could set up a command shell in which special messages can be sent without opening a new process. This might save you a few milliseconds and usually isn't worth it, but nothing is stopping you. You might even set up an ipython command shell, as suggested in https://stackoverflow.com/a/209670/103081, and Python can import most anything you need for socket communication with your specialized daemon.
There are too many separate sub-questions in your post to answer them all in a single compact response. Also, some of them are taste comparisons, I mean "what's the best way?", and that kind of question is usually disliked on SO. Despite this, I'll try to provide at least basic references you can continue with.
What's the easiest way to send messages to my Linux daemon app?
There is no such way unless you are already using some framework that provides one. Daemons are too different: some require only a small set of fixed messages, while some require complex reconfiguration and on-the-fly modification of internal data. The main set of approaches is:
Unix signals are really the simplest way to deliver a protected (with minor simplification, only the same user or the superuser can send them), idempotent (repetition does not change the meaning), non-parameterized message; the available message set is very limited (~50 freely usable signals in Linux). They are delivered asynchronously but can be converted into internal message queue entries in the target application. Their main use is for simple messages such as "stop gracefully", "reread your configs", or "turn on debug messages".
Unix domain sockets are the basic IPC transport for any interaction (except the fastest cases, where shared memory is better). They support datagrams (like UDP), byte streams (like TCP), and message streams (like SCTP or SPX). A listening socket is represented in the file system by a path known to clients; Unix file permissions regulate access to it, or you can check peer credentials explicitly. Virtually any protocol can be built over such sockets, but only clients in the same running OS can connect; such sockets aren't automatically visible even from the host OS in a virtualization scenario. (A sketch follows this list.)
Non-pipe-style IPC: shared memory and semaphores can be used for high-load interaction, but they require strong usage discipline. There are two versions of these interfaces (SysV and POSIX).
Internet domain sockets provide the same transports as Unix domain sockets and allow access from any remote client (not necessarily on the same OS), but they don't provide access checks automatically, so client authentication is needed; transport protection (SSL/TLS and so on) is often needed as well.
Various frameworks are provided over socket interfaces which implement some parts of the needed functionality, such as registration, endpoint publishing, server discovery, and so on. For local interaction, the usual starting point nowadays is D-Bus, which is message-oriented and allows named servers and typed messages, either on the system-wide bus or on a per-user session bus.
Various network-wide message transports have been developed in recent years (AMQP, ZeroMQ, Apache Kafka, etc.). They are designed for use in distributed networks, but often come without any authentication.
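As a concrete illustration of the Unix-domain-socket approach above, a minimal Python control-socket sketch (the socket path and the command set are arbitrary choices):
    import os
    import socket

    SOCK_PATH = "/tmp/exampled.sock"  # hypothetical control socket
    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCK_PATH)
    os.chmod(SOCK_PATH, 0o600)  # Unix file permissions gate access
    server.listen(1)

    running = True
    while running:
        conn, _ = server.accept()
        with conn:
            command = conn.recv(1024).decode().strip()
            if command == "status":
                conn.sendall(b"running\n")
            elif command == "stop":
                conn.sendall(b"stopping\n")
                running = False
A client simply connects with socket.socket(socket.AF_UNIX) to the same path and sends one of the agreed commands.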
The main issue with a self-made daemon is that you must attach any of these interfaces explicitly and provide a framework to listen for, decode, and process incoming messages. If your application already has to be message-driven, you can put it into an environment that is message-driven by design (Erlang, J2EE...), unless you are scared of rewriting it in another language :)
Is it possible that myapp foo bar can invoke a callback instead of starting a new process?
This is more a question of style than of principle. You can name a control tool the same way as the long-running one; there are such examples in practice: SysV init (telinit is only a convenience name) and firefox (without -no-remote it tries to connect to an already running instance). But generally, separate names are preferred:
apachectl starts/stops/controls the daemon, but the daemon itself is named httpd;
rndc for named, gdc for gated, etc.
If the daemon binary is very large, the cost of starting it each time just to deliver a small control message is too high and should be avoided. But if you really want to do it this way, determine two things:
Do you want it to start automatically if no running instance is found?
How will you distinguish a daemon start request from a control request?
If you have strong answers to both questions, you needn't worry.
What's the standard way to communicate with a daemon?
No, there is no single standard way. There are a few standard ways; see above. One should choose based on the pros and cons.
Is my most reasonable choice to stick the PID in /var/run and create a named pipe or a socket?
Again, see above, but be aware that unless the daemon has root rights, /run (/var/run) isn't writable. A non-root daemon must use an entirely different directory (e.g. ~/run/ in its dedicated user's home directory) or be provided with a writable subdirectory.
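A sketch of choosing the pid file location under that constraint (the daemon name here is hypothetical):
    import os

    APP = "exampled"  # hypothetical daemon name
    if os.geteuid() == 0:
        pid_path = "/run/%s.pid" % APP  # root: the standard location
    else:
        pid_path = os.path.expanduser("~/run/%s.pid" % APP)  # non-root

    os.makedirs(os.path.dirname(pid_path), exist_ok=True)
    with open(pid_path, "w") as f:
        f.write(str(os.getpid()))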
I'm assuming it's impossible to ask Linux to call a callback when I type in myapp foo bar?
You asked again about a "callback", and I don't understand exactly what you mean by it. If your daemon is written in an imperative language outside an event-driven environment, any receiving of messages from other processes must be implemented explicitly. For Unix signals, you write signal handler functions and configure the process to run them on an incoming signal. For sockets, you must create the socket, give it a known address, put it into listening mode, and provide code to accept incoming connections and to gather and decode messages. Only in an already event-driven environment can you use standard methods.
In Erlang, such remote control is very often implemented as a connection to the daemon's node plus an RPC request using gen_server:call() or an analog. The called Erlang process handles this in generic code, with no need to treat a requester from another process on the same node differently from one on a separate node.
There are some frameworks which have already implemented part of this need in procedural environments; e.g. D-Bus client libraries can be written in such a way that they:
create a separate thread to listen on the D-Bus socket;
accept incoming messages and call the callbacks you specify;
but this has its own disadvantages:
a callback runs in its own thread, so you must provide proper synchronization to avoid data corruption;
with a huge message rate, flow control is at best too coarse, or even impossible, and the daemon could be overwhelmed before it realizes the bus connection should be dropped.
To summarize, your initial question was like "How can I reach my friend's home efficiently?" with a minimum of details. You can simply walk 20 meters if your friend lives next door, or you can ride a bike, or you may need to call a taxi, or you may need to rent a spaceship to reach the far side of the Moon :) You should repeatedly narrow your requirements until a single variant remains.

Best way to verify that untrusted node code doesn't send information over the network

I am hoping to use two untrusted node files. I want to be sure that they don't maliciously send any information over the network. I have access to the source code and have read it somewhat carefully, but it is quite lengthy and complex, some of it is minified, and someone writing very sneaky code could potentially do something tricky that I might have missed.
Is there an easy way to be sure (e.g. if it doesn't use the request or socket modules) that it cannot possibly be sending any data out over the network?
The code only requires the sys, fs, and tail modules, and I will be running it without the request or socket modules installed.
I am very new to node. Are there other easy precautions that I can take?
This process is good for anything suspect, not just Node.js.
General Precautions:
run inside a virtual machine
ensure the host is adequately firewalled
run Wireshark locally to observe network traffic requests
Sometimes you need to go a step further. Some programs (on Windows in particular) can actually tell whether you are attempting to monitor them (although I'd be a little surprised if a JavaScript program could do that). Doing this can be educational anyway:
set up a second machine that acts as a gateway to the network
ensure it is adequately firewalled
connect the test machine physically only to the gateway
run Wireshark on the gateway and observe traffic from the test machine
That won't catch everything; you should still monitor with Wireshark what happens in a real situation, in case any traffic is context dependent.

Running external code in a restricted environment (linux)

For reasons beyond the scope of this post, I want to run external (user-submitted) code, similar to the Computer Language Benchmarks Game. Obviously this needs to be done in a restricted environment. Here are my restriction requirements:
Can only read/write to current working directory (will be large tempdir)
No external access (internet, etc)
Anything else I probably don't care about (e.g., processor/memory usage, etc).
I myself have several restrictions. A solution which uses standard *nix functionality (specifically RHEL 5.x) would be preferred, as then I could use our cluster for the backend. It is also difficult to get software installed there, so something in the base distribution would be optimal.
Now, the questions:
Can this even be done with externally compiled binaries? It seems like it could be possible, but it could also just be hopeless.
What if we force the code itself to be submitted and compile it ourselves? Does that make the problem easier or harder?
Should I just give up on home directory protection and use a VM with rollback? What about blocking external communication (isn't the VM usually talked to over a bridged LAN connection)?
Something I missed?
Possibly useful ideas:
rssh. Doesn't help with compiled code though
Using a VM with rollback after the code finishes (can the network be configured so there is a local bridge but no WAN bridge?). Doesn't work on the cluster.
I would examine and evaluate both a VM and a special SELinux context.
I don't think you'll be able to do what you need with simple file system protection, because you won't be able to prevent access to the syscalls that allow network access etc. You can probably use AppArmor to do what you need, though: it uses kernel-level mandatory access control to confine the foreign binary.
