Adding an iptables rule in %post rpm install - firewall

I've been working on packaging some software that listens on TCP ports. I have an RPM spec that creates a package for CentOS 6.5. The only issue I am having is I must manually add iptables rules or disable iptables completely after the rpm install.
Is there any idempotent way to add iptables rules in an RPM %post section. Does it even make sense? Most solutions I have found involve some combination of grep and other utilities. That seems extremely error prone to me.

In CentOS the rules are saved in /etc/sysconfig/iptables. This file is parsed when the /etc/init.d/iptables script is used to start the firewall.
The problem is that this can be regenerated at any point by running service iptables save which means any work done by your RPM post script may / will get clobbered... same thing goes for system-config-firewall.
The simplest option would be to include a script with your RPM that gets run from post, that applies your rule and updates /etc/sysconfig/iptables, and prints a warning that the script would need to be re-run if the user used system-config-firewall at a later date.
Example SPEC fragment:
%post blah
/path/to/my/script/my-update-iptables && exit 0
Example script fragment:
#!/bin/bash
#
# do stuff here
#
iptables ... # do stuff to append / insert rules as required
#
service iptables save
#
echo "WARNING - you may need to rerun this `$0` if you change the firewall" >&2

Related

Looking for a way to correcly generate an iptables-restore file

I'm building a firewall rule generator and i need to apply all the iptables rule atomically. The only guaranteed way to do that is to use an iptables restore file, which has it's own syntax. The only guaranteed way to generate such a file is to run the iptables commands, dump them with iptables save and restore them, which seems completely unacceptable for a live system. Is there an easier way, such as a software which will parse raw iptables rules and generate an iptables restore? I've found fwmacro, but it's not maintained, and has it's own syntax, such as:
-A 10stateful -mstate --state INVALID -j DROP
instead of
iptables -A stateful -mstate --state INVALID -j DROP
I find the easiest way to generate an iptables-restore file is to run iptables-save > rules.ipt and then edit the file with any required statements.

Run a system command when an IPTables rule is matched

:)
I'm wanting to be able to run a system command when an IPTable rule is hit, passing the IP address of the remote device to it.
I've had a look around but found nothing. I thought of grepping logs, but I'm expecting a lot of traffic..
Any help would be fantastic!
Thanks
(If it helps, Ubuntu Linux is my platform of choice)
Here is how you do it:
iptables -I FORWARD -p tcp --dport 80 -d a.b.c.d -j LOG --log-prefix="TRIGGER ME NOW !!!"
tail -f some-logfile | awk '/some-pattern/ {system("run-some-command")}'
Should be straight forward enough and should be able to deal with lots of traffic, the tail command should be quick enough... Just make sure the file doesn't grow too much.
Do it with knockd instead. You configure a port knocking sequence of just one port, then tell knockd the command you want to run. Normally it's used to add/remove iptables rules -- to open a service (e.g. ssh access) after a certain knock sequence, but I don't see why you couldn't just use it to run a command after a very simple, one packet on one expected port rule.
'apt-get install knockd' on your Ubuntu system and the man page has examples you can easily adapt to this.
it is actually easy.
we have 2 way to do this. If you use tail log then iptables will not depend on log result.
you can use NFQUEUE. Please read my article if you have time.
https://medium.com/#farizmuradov/useful-notes-about-nfqueue-80a2c271db1a
Same article I have added my linkedin page.
you can write simple router in application level and send data from iptables to listen port. In programming level you can execute scripts and send data again some port. Then you can continue by iptables.

Run script with rc.local: script works, but not at boot

I have a node.js script which need to start at boot and run under the www-data user. During development I always started the script with:
su www-data -c 'node /var/www/php-jobs/manager.js
I saw exactly what happened, the manager.js works now great. Searching SO I found I had to place this in my /etc/rc.local. Also, I learned to point the output to a log file and to append the 2>&1 to "redirect stderr to stdout" and it should be a daemon so the last character is a &.
Finally, my /etc/rc.local looks like this:
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.
su www-data -c 'node /var/www/php-jobs/manager.js >> /var/log/php-jobs.log 2>&1 &'
exit 0
If I run this myself (sudo /etc/rc.local): yes, it works! However, if I perform a reboot no node process is running, the /var/log/php-jobs.log does not exist and thus, the manager.js does not work. What is happening?
In this example of a rc.local script I use io redirection at the very first line of execution to my own log file:
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.
exec 1>/tmp/rc.local.log 2>&1 # send stdout and stderr from rc.local to a log file
set -x # tell sh to display commands before execution
/opt/stuff/somefancy.error.script.sh
exit 0
On some linux's (Centos & RH, e.g.), /etc/rc.local is initially just a symbolic link to /etc/rc.d/rc.local. On those systems, if the symbolic link is broken, and /etc/rc.local is a separate file, then changes to /etc/rc.local won't get seen at bootup -- the boot process will run the version in /etc/rc.d. (They'll work if one runs /etc/rc.local manually, but won't be run at bootup.)
Sounds like on dimadima's system, they are separate files, but /etc/rc.d/rc.local calls /etc/rc.local
The symbolic link from /etc/rc.local to the 'real' one in /etc/rc.d can get lost if one moves rc.local to a backup directory and copies it back or creates it from scratch, not realizing the original one in /etc was just a symbolic link.
I ended up with upstart, which works fine.
In Ubuntu I noticed there are 2 files. The real one is /etc/init.d/rc.local; it seems the other /etc/rc.local is bogus?
Once I modified the correct one (/etc/init.d/rc.local) it did execute just as expected.
You might also have made it work by specifying the full path to node. Furthermore, when you want to run a shell command as a daemon you should close stdin by adding 1<&- before the &.
I had the same problem (on CentOS 7) and I fixed it by giving execute permissions to /etc/local:
chmod +x /etc/rc.local
if you are using linux on cloud, then usually you don't have chance to touch the real hardware using your hands. so you don't see the configuration interface when booting for the first time, and of course cannot configure it. As a result, the firstboot service will always be in the way to rc.local. The solution is to disable firstboot by doing:
sudo chkconfig firstboot off
if you are not sure why your rc.local does not run, you can always check from /etc/rc.d/rc file because this file will always run and call other subsystems (e.g. rc.local).
I got my script to work by editing /etc/rc.local then issuing the following 3 commands.
sudo mv /filename /etc/init.d/
sudo chmod +x /etc/init.d/filename
sudo update-rc.d filename defaults
Now the script works at boot.
I am using CentOS 7.
$ cd /etc/profile.d
$ vim yourstuffs.sh
Type the following into the yourstuffs.sh script.
type whatever you want here to execute
export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH
Save and reboot the OS.
I have used rc.local in the past. But I have learned from my experience that the most reliable way to run your script at the system boot time is is to use #reboot command in crontab. For example:
#reboot path_to_the_start_up_script.sh
This is most probably caused by a missing or incomplete PATH environment variable.
If you provide full absolute paths to your executables (su and node) it will work.
It is my understanding that if you place your script in a certain RUN Level, you should use ln -s to link the script to the level you want it to work in.
first make the script executable using
sudo chmod 755 /path/of/the/file.sh
now add the script in the rc.local
sh /path/of/the/file.sh
before exit 0
in the rc.local,
next make the rc.local to executable with
sudo chmod 755 /etc/rc.local
next to initialize the rc.local use
sudo /etc/init.d/rc.local start
this will initiate the rc.local
now reboot the system.
Done..
I found that because I was using a network-oriented command in my rc.local, sometimes it would fail. I fixed this by putting sleep 3 at the top of my script. I don't know why but it seems when the script is run the network interfaces aren't properly configured or something, and this just allows some time for the DHCP server or something. I don't fully understand but I suppose you could give it a try.
I had exactly same issue, the script was running fine locally but when I reboot/power-on it was not.
I resolved the issue by changing the file path. Basically need to give the complete path in the script. While running locally, file can be accessed but when running on reboot, local path will not be understood.
1 Do not recommend using root to run the apps such as node app.
Well you can do it but may catch more exceptions.
2 The rc.local normally runs as root user.
So if the your script should runs as another user such as www U should make sure the PATH and other environment is ok.
3 I find a easy way to run a service as a user:
sudo -u www -i /the/path/of/your/script
Please prefer the sudo manual~
-i [command]
The -i (simulate initial login) option runs the shell specified by the password database entry of the target user as a loginshell...
rc.local only runs on startup. If you reboot and want the script to execute, it needs to go into the rc.0 file starting with the K99 prefix.

What command(s) control the behavior of /etc/rc*.d on Redhat/CentOS?

/etc/init.d/*
/etc/rc{1-5}.d/*
/sbin/chkconfig — The /sbin/chkconfig utility is a simple command line tool for maintaining the /etc/rc.d/init.d/ directory hierarchy.
As mentioned by px, the proper way to manage the links to scripts from /etc/init.d to /etc/rc?.d is the /sbin/chkconfig command.
Scripts should have comments near the top that specify how chkconfig is to handle them. For example, /etc/init.d/httpd:
# chkconfig: - 85 15
# description: Apache is a World Wide Web server. It is used to serve \
# HTML files and CGI.
# processname: httpd
# config: /etc/httpd/conf/httpd.conf
# config: /etc/sysconfig/httpd
# pidfile: /var/run/httpd.pid
Also, use the /sbin/service command to start and stop services when run from the shell.
in one word: init.
This process always has pid of 1 and controls (spawns) all other processes in your unix according to the rules in /etc/init.d.
init is usually called with a number as an argument, e.g. init 3 This will make it run the contents of the rc3.d folder.
For more information: Wikipedia article for init.
Edit: Forgot to mention, what actually controls what rc level you start off in is your bootloader.

Is there a way for non-root processes to bind to "privileged" ports on Linux?

It's very annoying to have this limitation on my development box, when there won't ever be any users other than me.
I'm aware of the standard workarounds, but none of them do exactly what I want:
authbind (The version in Debian testing, 1.0, only supports IPv4)
Using the iptables REDIRECT target to redirect a low port to a high port (the "nat" table is not yet implemented for ip6tables, the IPv6 version of iptables)
sudo (Running as root is what I'm trying to avoid)
SELinux (or similar). (This is just my dev box, I don't want to introduce a lot of extra complexity.)
Is there some simple sysctl variable to allow non-root processes to bind to "privileged" ports (ports less than 1024) on Linux, or am I just out of luck?
EDIT: In some cases, you can use capabilities to do this.
Okay, thanks to the people who pointed out the capabilities system and CAP_NET_BIND_SERVICE capability. If you have a recent kernel, it is indeed possible to use this to start a service as non-root but bind low ports. The short answer is that you do:
setcap 'cap_net_bind_service=+ep' /path/to/program
And then anytime program is executed thereafter it will have the CAP_NET_BIND_SERVICE capability. setcap is in the debian package libcap2-bin.
Now for the caveats:
You will need at least a 2.6.24 kernel
This won't work if your file is a script. (i.e. uses a #! line to launch an interpreter). In this case, as far I as understand, you'd have to apply the capability to the interpreter executable itself, which of course is a security nightmare, since any program using that interpreter will have the capability. I wasn't able to find any clean, easy way to work around this problem.
Linux will disable LD_LIBRARY_PATH on any program that has elevated privileges like setcap or suid. So if your program uses its own .../lib/, you might have to look into another option like port forwarding.
Resources:
capabilities(7) man page. Read this long and hard if you're going to use capabilities in a production environment. There are some really tricky details of how capabilities are inherited across exec() calls that are detailed here.
setcap man page
"Bind ports below 1024 without root on GNU/Linux": The document that first pointed me towards setcap.
Note: RHEL first added this in v6.
Update 2017:
Use authbind
Disclaimer (update per 2021): Note that authbind works via LD_PRELOAD, which is only used if your program uses libc, which is (or might) not be the case if your program is compiled with GO, or any other compiler that avoids C. If you use go, set the kernel parameter for the protected port range, see bottom of post. </EndUpdate>
Authbind is much better than CAP_NET_BIND_SERVICE or a custom kernel.
CAP_NET_BIND_SERVICE grants trust to the binary but provides no
control over per-port access.
Authbind grants trust to the
user/group and provides control over per-port access, and
supports both IPv4 and IPv6 (IPv6 support has been added as of late).
Install: apt-get install authbind
Configure access to relevant ports, e.g. 80 and 443 for all users and groups:
sudo touch /etc/authbind/byport/80
sudo touch /etc/authbind/byport/443
sudo chmod 777 /etc/authbind/byport/80
sudo chmod 777 /etc/authbind/byport/443
Execute your command via authbind
(optionally specifying --deep or other arguments, see man authbind):
authbind --deep /path/to/binary command line args
e.g.
authbind --deep java -jar SomeServer.jar
As a follow-up to Joshua's fabulous (=not recommended unless you know what you do) recommendation to hack the kernel:
I've first posted it here.
Simple. With a normal or old kernel, you don't.
As pointed out by others, iptables can forward a port.
As also pointed out by others, CAP_NET_BIND_SERVICE can also do the job.
Of course CAP_NET_BIND_SERVICE will fail if you launch your program from a script, unless you set the cap on the shell interpreter, which is pointless, you could just as well run your service as root...
e.g. for Java, you have to apply it to the JAVA JVM
sudo /sbin/setcap 'cap_net_bind_service=ep' /usr/lib/jvm/java-8-openjdk/jre/bin/java
Obviously, that then means any Java program can bind system ports.
Ditto for mono/.NET.
I'm also pretty sure xinetd isn't the best of ideas.
But since both methods are hacks, why not just lift the limit by lifting the restriction ?
Nobody said you have to run a normal kernel, so you can just run your own.
You just download the source for the latest kernel (or the same you currently have).
Afterwards, you go to:
/usr/src/linux-<version_number>/include/net/sock.h:
There you look for this line
/* Sockets 0-1023 can't be bound to unless you are superuser */
#define PROT_SOCK 1024
and change it to
#define PROT_SOCK 0
if you don't want to have an insecure ssh situation, you alter it to this:
#define PROT_SOCK 24
Generally, I'd use the lowest setting that you need, e.g. 79 for http, or 24 when using SMTP on port 25.
That's already all.
Compile the kernel, and install it.
Reboot.
Finished - that stupid limit is GONE, and that also works for scripts.
Here's how you compile a kernel:
https://help.ubuntu.com/community/Kernel/Compile
# You can get the kernel-source via package `linux-source`, no manual download required
apt-get install linux-source fakeroot
mkdir ~/src
cd ~/src
tar xjvf /usr/src/linux-source-<version>.tar.bz2
cd linux-source-<version>
# Apply the changes to PROT_SOCK define in /include/net/sock.h
# Copy the kernel config file you are currently using
cp -vi /boot/config-`uname -r` .config
# Install ncurses libary, if you want to run menuconfig
apt-get install libncurses5 libncurses5-dev
# Run menuconfig (optional)
make menuconfig
# Define the number of threads you wanna use when compiling (should be <number CPU cores> - 1), e.g. for quad-core
export CONCURRENCY_LEVEL=3
# Now compile the custom kernel
fakeroot make-kpkg --initrd --append-to-version=custom kernel-image kernel-headers
# And wait a long long time
cd ..
In a nutshell,
use iptables if you want to stay secure,
compile the kernel if you want to be sure this restriction never bothers you again.
sysctl method
Note:
As of late, updating the kernel is no longer required.
You can now set
sysctl net.ipv4.ip_unprivileged_port_start=80
Or to persist
sysctl -w net.ipv4.ip_unprivileged_port_start=80.
And if that yields an error, simply edit /etc/sysctl.conf with nano and set the parameter there for persistence across reboots.
or via procfs
echo 80 | sudo tee /proc/sys/net/ipv4/ip_unprivileged_port_start
You can do a port redirect. This is what I do for a Silverlight policy server running on a Linux box
iptables -A PREROUTING -t nat -i eth0 -p tcp --dport 943 -j REDIRECT --to-port 1300
For some reason no one mention about lowering sysctl net.ipv4.ip_unprivileged_port_start to the value you need.
Example: We need to bind our app to 443 port.
sysctl net.ipv4.ip_unprivileged_port_start=443
Some may say, there is a potential security problem: unprivileged users now may bind to the other privileged ports (444-1024).
But you can solve this problem easily with iptables, by blocking other ports:
iptables -I INPUT -p tcp --dport 444:1024 -j DROP
iptables -I INPUT -p udp --dport 444:1024 -j DROP
Comparison with other methods. This method:
from some point is (IMO) even more secure than setting CAP_NET_BIND_SERVICE/setuid, since an application doesn't setuid at all, even partly (capabilities actually are).
For example, to catch a coredump of capability-enabled application you will need to change sysctl fs.suid_dumpable (which leads to another potential security problems)
Also, when CAP/suid is set, /proc/PID directory is owned by root, so your non-root user will not have full information/control of running process, for example, user will not be able (in common case) to determine which connections belong to application via /proc/PID/fd/ (netstat -aptn | grep PID).
has security disadvantage: while your app (or any app that uses ports 443-1024) is down for some reason, another app could take the port. But this problem could also be applied to CAP/suid (in case you set it on interpreter, e.g. java/nodejs) and iptables-redirect. Use systemd-socket method to exclude this problem. Use authbind method to only allow special user binding.
doesn't require setting CAP/suid every time you deploy new version of application.
doesn't require application support/modification, like systemd-socket method.
doesn't require kernel rebuild (if running version supports this sysctl setting)
doesn't do LD_PRELOAD like authbind/privbind method, this could potentially affect performance, security, behavior (does it? haven't tested). In the rest authbind is really flexible and secure method.
over-performs iptables REDIRECT/DNAT method, since it doesn't require address translation, connection state tracking, etc. This only noticeable on high-load systems.
Depending on the situation, I would choose between sysctl, CAP, authbind and iptables-redirect. And this is great that we have so many options.
Or patch your kernel and remove the check.
(Option of last resort, not recommended).
In net/ipv4/af_inet.c, remove the two lines that read
if (snum && snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
goto out;
and the kernel won't check privileged ports anymore.
The standard way is to make them "setuid" so that they start up as root, and then they throw away that root privilege as soon as they've bound to the port but before they start accepting connections to it. You can see good examples of that in the source code for Apache and INN. I'm told that Lighttpd is another good example.
Another example is Postfix, which uses multiple daemons that communicate through pipes, and only one or two of them (which do very little except accept or emit bytes) run as root and the rest run at a lower privilege.
Modern Linux supports /sbin/sysctl -w net.ipv4.ip_unprivileged_port_start=0.
You can setup a local SSH tunnel, eg if you want port 80 to hit your app bound to 3000:
sudo ssh $USERNAME#localhost -L 80:localhost:3000 -N
This has the advantage of working with script servers, and being very simple.
I know this is an old question, but now with recent (>= 4.3) kernels there is finally a good answer to this - ambient capabilities.
The quick answer is to grab a copy of the latest (as-yet-unreleased) version of libcap from git and compile it. Copy the resulting progs/capsh binary somewhere (/usr/local/bin is a good choice). Then, as root, start your program with
/usr/local/bin/capsh --keep=1 --user='your-service-user-name' \
--inh='cap_net_bind_service' --addamb='cap_net_bind_service' \
-- -c 'your-program'
In order, we are
Declaring that when we switch users, we want to keep our current capability sets
Switching user & group to 'your-service-user-name'
Adding the cap_net_bind_service capability to the inherited & ambient sets
Forking bash -c 'your-command' (since capsh automatically starts bash with the arguments after --)
There's a lot going on under the hood here.
Firstly, we are running as root, so by default, we get a full set of capabilities. Included in this is the ability to switch uid & gid with the setuid and setgid syscalls. However, ordinarily when a program does this, it loses its set of capabilities - this is so that the old way of dropping root with setuid still works. The --keep=1 flag tells capsh to issue the prctl(PR_SET_KEEPCAPS) syscall, which disables the dropping of capabilities when changing user. The actual changing of users by capsh happens with the --user flag, which runs setuid and setgid.
The next problem we need to solve is how to set capabilities in a way that carries on after we exec our children. The capabilities system has always had an 'inherited' set of capabilities, which is " a set of capabilities preserved across an execve(2)" [capabilities(7)]. Whilst this sounds like it solves our problem (just set the cap_net_bind_service capability to inherited, right?), this actually only applies for privileged processes - and our process is not privileged anymore, because we already changed user (with the --user flag).
The new ambient capability set works around this problem - it is "a set of capabilities that are preserved across an execve(2) of a program that is not privileged." By putting cap_net_bind_service in the ambient set, when capsh exec's our server program, our program will inherit this capability and be able to bind listeners to low ports.
If you're interested to learn more, the capabilities manual page explains this in great detail. Running capsh through strace is also very informative!
File capabilities are not ideal, because they can break after a package update.
The ideal solution, IMHO, should be an ability to create a shell with inheritable CAP_NET_BIND_SERVICE set.
Here's a somewhat convoluted way to do this:
sg $DAEMONUSER "capsh --keep=1 --uid=`id -u $DAEMONUSER` \
--caps='cap_net_bind_service+pei' -- \
YOUR_COMMAND_GOES_HERE"
capsh utility can be found in libcap2-bin package in Debian/Ubuntu distributions. Here's what goes on:
sg changes effective group ID to that of the daemon user. This is necessary because capsh leaves GID unchanged and we definitely do not want it.
Sets bit 'keep capabilities on UID change'.
Changes UID to $DAEMONUSER
Drops all caps (at this moment all caps are still present because of --keep=1), except inheritable cap_net_bind_service
Executes your command ('--' is a separator)
The result is a process with specified user and group, and cap_net_bind_service privileges.
As an example, a line from ejabberd startup script:
sg $EJABBERDUSER "capsh --keep=1 --uid=`id -u $EJABBERDUSER` --caps='cap_net_bind_service+pei' -- $EJABBERD --noshell -detached"
Two other simple possibilities: Daemon and Proxy
Daemon
There is an old (unfashionable) solution to the "a daemon that binds on a low port and hands control to your daemon". It's called inetd (or xinetd).
The cons are:
your daemon needs to talk on stdin/stdout (if you don't control the daemon -- if you don't have the source -- then this is perhaps a showstopper, although some services may have an inetd-compatibility flag)
a new daemon process is forked for every connection
it's one extra link in the chain
Pros:
available on any old UNIX
once your sysadmin has set up the config, you're good to go about your development (when you re-build your daemon, might you lose setcap capabilities? And then you'll have to go back to your admin "please sir...")
daemon doesn't have to worry about that networking stuff, just has to talk on stdin/stdout
can configure to execute your daemon as a non-root user, as requested
Proxy
Another alternative: a hacked-up proxy (netcat or even something more robust) from the privileged port to some arbitrary high-numbered port where you can run your target daemon. (Netcat is obviously not a production solution, but "just my dev box", right?). This way you could continue to use a network-capable version of your server, would only need root/sudo to start proxy (at boot), wouldn't be relying on complex/potentially fragile capabilities.
My "standard workaround" uses socat as the user-space redirector:
socat tcp6-listen:80,fork tcp6:8080
Beware that this won't scale, forking is expensive but it's the way socat works.
Linux supports capabilities to support more fine-grained permissions than just "this application is run as root". One of those capabilities is CAP_NET_BIND_SERVICE which is about binding to a privileged port (<1024).
Unfortunately I don't know how to exploit that to run an application as non-root while still giving it CAP_NET_BIND_SERVICE (probably using setcap, but there's bound to be an existing solution for this).
systemd is a sysvinit replacement which has an option to launch a daemon with specific capabilities. Options Capabilities=, CapabilityBoundingSet= in systemd.exec(5) manpage.
TLDR: For "the answer" (as I see it), jump down to the >>TLDR<< part in this answer.
OK, I've figured it out (for real this time), the answer to this question, and this answer of mine is also a way of apologizing for promoting another answer (both here and on twitter) that I thought was "the best", but after trying it, discovered that I was mistaken about that. Learn from my mistake kids: don't promote something until you've actually tried it yourself!
Again, I reviewed all the answers here. I've tried some of them (and chose not to try others because I simply didn't like the solutions). I thought that the solution was to use systemd with its Capabilities= and CapabilitiesBindingSet= settings. After wrestling with this for some time, I discovered that this is not the solution because:
Capabilities are intended to restrict root processes!
As the OP wisely stated, it is always best to avoid that (for all your daemons if possible!).
You cannot use the Capabilities related options with User= and Group= in systemd unit files, because capabilities are ALWAYS reset when execev (or whatever the function is) is called. In other words, when systemd forks and drops its perms, the capabilities are reset. There is no way around this, and all that binding logic in the kernel is basic around uid=0, not capabilities. This means that it is unlikely that Capabilities will ever be the right answer to this question (at least any time soon). Incidentally, setcap, as others have mentioned, is not a solution. It didn't work for me, it doesn't work nicely with scripts, and those are reset anyways whenever the file changes.
In my meager defense, I did state (in the comment I've now deleted), that James' iptables suggestion (which the OP also mentions), was the "2nd best solution". :-P
>>TLDR<<
The solution is to combine systemd with on-the-fly iptables commands, like this (taken from DNSChain):
[Unit]
Description=dnschain
After=network.target
Wants=namecoin.service
[Service]
ExecStart=/usr/local/bin/dnschain
Environment=DNSCHAIN_SYSD_VER=0.0.1
PermissionsStartOnly=true
ExecStartPre=/sbin/sysctl -w net.ipv4.ip_forward=1
ExecStartPre=-/sbin/iptables -D INPUT -p udp --dport 5333 -j ACCEPT
ExecStartPre=-/sbin/iptables -t nat -D PREROUTING -p udp --dport 53 -j REDIRECT --to-ports 5333
ExecStartPre=/sbin/iptables -A INPUT -p udp --dport 5333 -j ACCEPT
ExecStartPre=/sbin/iptables -t nat -A PREROUTING -p udp --dport 53 -j REDIRECT --to-ports 5333
ExecStopPost=/sbin/iptables -D INPUT -p udp --dport 5333 -j ACCEPT
ExecStopPost=/sbin/iptables -t nat -D PREROUTING -p udp --dport 53 -j REDIRECT --to-ports 5333
User=dns
Group=dns
Restart=always
RestartSec=5
WorkingDirectory=/home/dns
PrivateTmp=true
NoNewPrivileges=true
ReadOnlyDirectories=/etc
# Unfortunately, capabilities are basically worthless because they're designed to restrict root daemons. Instead, we use iptables to listen on privileged ports.
# Capabilities=cap_net_bind_service+pei
# SecureBits=keep-caps
[Install]
WantedBy=multi-user.target
Here we accomplish the following:
The daemon listens on 5333, but connections are successfully accepted on 53 thanks to iptables
We can include the commands in the unit file itself, and thus we save people headaches. systemd cleans up the firewall rules for us, making sure to remove them when the daemon isn't running.
We never run as root, and we make privilege escalation impossible (at least systemd claims to), supposedly even if the daemon is compromised and sets uid=0.
iptables is still, unfortunately, quite an ugly and difficult-to-use utility. If the daemon is listening on eth0:0 instead of eth0, for example, the commands are slightly different.
With systemd, you just need to slightly modify your service to accept preactivated sockets.
You can later use systemd socket activate.
No capabilities, iptables or other tricks are needed.
This is content of relevant systemd files from this example of simple python http server
File httpd-true.service
[Unit]
Description=Httpd true
[Service]
ExecStart=/usr/local/bin/httpd-true
User=subsonic
PrivateTmp=yes
File httpd-true.socket
[Unit]
Description=HTTPD true
[Socket]
ListenStream=80
[Install]
WantedBy=default.target
Port redirect made the most sense for us, but we ran into an issue where our application would resolve a url locally that also needed to be re-routed; (that means you shindig).
This will also allow you to be redirected when accessing the url on the local machine.
iptables -A PREROUTING -t nat -p tcp --dport 80 -j REDIRECT --to-port 8080
iptables -A OUTPUT -t nat -p tcp --dport 80 -j REDIRECT --to-port 8080
At startup:
iptables -A PREROUTING -t nat -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 8080
Then you can bind to the port you forward to.
There is also the 'djb way'. You can use this method to start your process as root running on any port under tcpserver, then it will hand control of the process to the user you specify immediately after the process starts.
#!/bin/sh
UID=$(id -u username)
GID=$(id -g username)
exec tcpserver -u "${UID}" -g "${GID}" -RHl0 0 port /path/to/binary &
For more info, see: http://thedjbway.b0llix.net/daemontools/uidgid.html
Use the privbind utility: it allows an unprivileged application to bind to reserved ports.
Bind port 8080 to 80 and open port 80:
sudo iptables -t nat -A OUTPUT -o lo -p tcp --dport 80 -j REDIRECT --to-port 8080
sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT
and then run program on port 8080 as a normal user.
you will then be able to access http://127.0.0.1 on port 80
I tried the iptables PREROUTING REDIRECT method. In older kernels it seems this type of rule wasn't supported for IPv6. But apparently it is now supported in ip6tables v1.4.18 and Linux kernel v3.8.
I also found that PREROUTING REDIRECT doesn't work for connections initiated within the machine. To work for conections from the local machine, add an OUTPUT rule also — see iptables port redirect not working for localhost. E.g. something like:
iptables -t nat -I OUTPUT -o lo -p tcp --dport 80 -j REDIRECT --to-port 8080
I also found that PREROUTING REDIRECT also affects forwarded packets. That is, if the machine is also forwarding packets between interfaces (e.g. if it's acting as a Wi-Fi access point connected to an Ethernet network), then the iptables rule will also catch connected clients' connections to Internet destinations, and redirect them to the machine. That's not what I wanted—I only wanted to redirect connections that were directed to the machine itself. I found I can make it only affect packets addressed to the box, by adding -m addrtype --dst-type LOCAL. E.g. something like:
iptables -A PREROUTING -t nat -p tcp --dport 80 -m addrtype --dst-type LOCAL -j REDIRECT --to-port 8080
One other possibility is to use TCP port forwarding. E.g. using socat:
socat TCP4-LISTEN:www,reuseaddr,fork TCP4:localhost:8080
However one disadvantage with that method is, the application that is listening on port 8080 then doesn't know the source address of incoming connections (e.g. for logging or other identification purposes).
Since the OP is just development/testing, less than sleek solutions may be helpful:
setcap can be used on a script's interpreter to grant capabilities to scripts. If setcaps on the global interpreter binary is not acceptable, make a local copy of the binary (any user can) and get root to setcap on this copy. Python2 (at least) works properly with a local copy of the interpreter in your script development tree. No suid is needed so the root user can control to what capabilities users have access.
If you need to track system-wide updates to the interpreter, use a shell script like the following to run your script:
#!/bin/sh
#
# Watch for updates to the Python2 interpreter
PRG=python_net_raw
PRG_ORIG=/usr/bin/python2.7
cmp $PRG_ORIG $PRG || {
echo ""
echo "***** $PRG_ORIG has been updated *****"
echo "Run the following commands to refresh $PRG:"
echo ""
echo " $ cp $PRG_ORIG $PRG"
echo " # setcap cap_net_raw+ep $PRG"
echo ""
exit
}
./$PRG $*
Answer at 2015/Sep:
ip6tables now supports IPV6 NAT: http://www.netfilter.org/projects/iptables/files/changes-iptables-1.4.17.txt
You will need kernel 3.7+
Proof:
[09:09:23] root#X:~ ip6tables -t nat -vnL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 REDIRECT tcp eth0 * ::/0 ::/0 tcp dpt:80 redir ports 8080
0 0 REDIRECT tcp eth0 * ::/0 ::/0 tcp dpt:443 redir ports 1443
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 6148 packets, 534K bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 6148 packets, 534K bytes)
pkts bytes target prot opt in out source destination
There is a worked example of doing this with a file capable shared library linked to an unprivileged application on the libcap website. It was recently mentioned in an answer to a question about adding capabilities to shared libraries.

Resources