Using bash to parse a log for unique MAC addresses

Using bash to parse a log for unique MAC addresses - linux

I've got Debian box set up as a syslog server for a couple cisco ASAs. They are running DHCP and I'm attemping to track the unique instances of a MAC addresses being assigned a lease. I've set the ASAs to only log the message that the cisco DHCPd uses, and it sends that to the Debian server as %HOSTIPADDRESS%.log, which then rotates out daily. So I've got a directory that is filled with this:
-rw-r----- 1 syslog adm 536351 Aug 23 06:24 10.10.10.4.log.10
-rw-r----- 1 syslog adm 459634 Aug 22 06:24 10.10.10.4.log.11
-rw-r----- 1 syslog adm 176957 Aug 21 06:24 10.10.10.4.log.12
-rw-r----- 1 syslog adm 246654 Aug 20 06:24 10.10.10.4.log.13
-rw-r----- 1 syslog adm 459978 Aug 19 06:24 10.10.10.4.log.14
-rw-r----- 1 syslog adm 606987 Aug 18 06:21 10.10.10.4.log.15
-rw-r----- 1 syslog adm 599140 Aug 17 06:24 10.10.10.4.log.16
-rw-r----- 1 syslog adm 605837 Aug 16 06:24 10.10.10.4.log.17
-rw-r----- 1 syslog adm 607630 Aug 15 06:24 10.10.10.4.log.18
-rw-r----- 1 syslog adm 189493 Aug 14 06:24 10.10.10.4.log.19
In each of those logs I've got something that looks like this:
Aug 23 06:20:19 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 011c.9148.dbb4.15 (172.16.1.196)
Aug 23 06:20:41 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 0138.0f4a.986a.16 (172.16.1.126)
Aug 23 06:20:51 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 0190.b686.63c6.a9 (172.16.1.193)
Aug 23 06:20:55 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 0154.4e90.8a7a.00 (172.16.1.211)
Aug 23 06:21:11 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 012c.0e3d.fcf6.34 (172.16.1.189)
Aug 23 06:21:35 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 0154.4e90.8a7a.00 (172.16.1.211)
Aug 23 06:21:51 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 0154.4e90.8a7a.00 (172.16.1.211)
Aug 23 06:22:29 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 5caf.0664.cd18 (172.16.1.212)
Aug 23 06:24:00 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 01fc.dbb3.49af.eb (172.16.1.207)
Aug 23 06:24:21 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 01a0.3be3.03b4.74 (172.16.1.195)
Aug 23 06:24:39 10.10.10.4 %ASA-6-604103: DHCP daemon interface inside: address granted 01b4.79a7.1895.33 (172.16.1.157)
The problem, is that dhcp leases renew,as you can see by the multiple instances of the same device at 172.16.1.211, for instance. I thought I could get around this by setting longer leases, as my understanding of how DHCP works is that leases would not start the renewal process until they reached their half-life but that is not working.
I'm also running into issues of address pool depletion because my leases are so long, and the ASA model I'm using has a hard limit to the size of it's scope.
Long story short, I need to parse those logs and retrieve the number of unique MAC addresses that occur in one of those logs. Any ideas on how this can be accomplished with bash? If I knew how to pull that info from one of the files, I could get through setting up process to do it for all of them using cron or something. I am not a programmer, however, I'm a network engineer. Any help would be appreciated.
Thanks,

Long story short, I need to parse those logs and retrieve the number of unique MAC addresses that occur in one of those logs.
Yes, given the regular nature of the data in your log files, this is very easy to do with several different tools.
The most basic would be to use cut
cut -d" " -f13 | sort | uniq -c
A more advanced tool is awk, and it provides many logic enhancments that allow you to add as many conditional statements as you want to filter the data as needed. For your case, though it is still very simple,
awk '{print $12}' | sort | uniq -c
In both cases, cut and awk, I only had to count over the number of fields in your data to the value of interest, and then specify that as the column (field number in awk-speak).
(when testing these answers, I found that using cut required using -d" " and -f13 (for some reason). I thought cut defaulted to -d" " but I had to specify it explicitly for the code to work).
Of course in both examples, I'm using the sort and uniq utilities, (man uniq for the how-to). uniq, has several options, and the -c option indicates count, so the data needs to be sorted for the counts to accumulate correctly (I missed that in my original comment).
Just for example, you could extend your counter to filter by the date value at the front of each record with
awk '/^Aug 23/{print $12}' | sort | uniq -c
But there are many more filtering and logic tools that you can use with awk.
If you're going to be working with logfile data regularly (or other non-XML-like data), I'd recommend working thru the Grymoire's Awk Tutorial .
IHTH

Related

Can you bind the default network interface of the host into the container to read network stats?

I have a project where I read system information from the host inside a container. Right now I got CPU, RAM and Storage to work, but Network turns out to be a little harder. I am using the Node.js library https://systeminformation.io/network.html, which reads the network stats from /sys/class/net/.
The only solution that I found right now, is to use --network host, but that does not seem like the best way, because it breaks a lot of other networking related stuff and I cannot make the assumption that everybody who uses my project is fine with that.
I have tried --add-host=host.docker.internal:host-gateway as well, but while it does show up in /etc/hosts, it does not add a network interface to /sys/class/net/.
My knowledge on Docker and Linux is very limited, so does someone know if there is any other way?
My workaround for now is, to use readlink -f /sys/class/net/$(ip addr show | awk '/inet.*brd/{print $NF; exit}') to get the final path to the network statistics of the default interface and mount it to a imaginary path in the container. Therefore I don't use the mentioned systeminformation library for that right now. I would still like to have something that is a bit more reliable and in the best case officially supported by docker. I am fine with something that is not compatible with systeminformation, though.

There is a way to enter the host network namespace after starting the container. This can be used to run one process in the container in the container network namespace and another process in the host network namespace. Communication between the processes can be done using a unix domain socket.
Alternatively you can just mount a new instance of the sysfs which points to the host network namespace. If I understood correctly this is what you really need.
For this to work you need access to the host net namespace (I mount /proc/1/ns/net to the container for this purpose). Additionally the capabilities CAP_SYS_PTRACE and CAP_SYS_ADMIN are needed.
# /proc/1 is the 'init' process of the host which is always running in host network namespace
$ docker run -it --rm --cap-add CAP_SYS_PTRACE --cap-add CAP_SYS_ADMIN -v /proc/1/ns/net:/host_ns_net:ro debian:bullseye-slim bash
root#8b40f2f48808:/ ls -l /sys/class/net
lrwxrwxrwx 1 root root 0 Jun 2 21:09 eth0 -> ../../devices/virtual/net/eth0
lrwxrwxrwx 1 root root 0 Jun 2 21:09 lo -> ../../devices/virtual/net/lo
# enter the host network namespace
root#8b40f2f48808:/ nsenter --net=/host_ns_net bash
# now we are in the host network namespace and can see the host network interfaces
root#8b40f2f48808:/ mkdir /sys2
root#8b40f2f48808:/ mount -t sysfs nodevice /sys2
root#8b40f2f48808:/ ls -l /sys2/class/net/
lrwxrwxrwx 1 root root 0 Oct 25 2021 enp2s0 -> ../../devices/pci0000:00/0000:00:1c.1/0000:02:00.0/net/enp2s0
lrwxrwxrwx 1 root root 0 Oct 25 2021 enp3s0 -> ../../devices/pci0000:00/0000:00:1c.2/0000:03:00.0/net/enp3s0
[...]
root#8b40f2f48808:/ ls -l /sys2/class/net/enp2s0/
-r--r--r-- 1 root root 4096 Oct 25 2021 addr_assign_type
-r--r--r-- 1 root root 4096 Oct 25 2021 addr_len
-r--r--r-- 1 root root 4096 Oct 25 2021 address
-r--r--r-- 1 root root 4096 Oct 25 2021 broadcast
[...]
# Now you can switch back to the original network namespace
# of the container; the dir "/sys2" is still accessible
root#8b40f2f48808:/ exit
Putting this together for non-interactive usage:
Use the docker run with the following parameters:
docker run -it --rm --cap-add CAP_SYS_PTRACE --cap-add CAP_SYS_ADMIN -v /proc/1/ns/net:/host_ns_net:ro debian:bullseye-slim bash
Execute these commands in the container before starting your node app:
mkdir /sys2
nsenter --net=/host_ns_net mount -t sysfs nodevice /sys2
After nsenter (and mount) exits, you are back in the network namespace of the container. In theory you could drop the extended capabilities now.
Now you can access the network devices under /sys2/class/net.

You could mount the host's /sys/class/net/ directory as a volume in your container and patch the systeminformation package to read the contents of your custom path instead of the default path. The changes would need to be made in lib/network.js. You can see in that file how the directory is hardcoded throughout, just do a find/replace in your local copy to change all instances of the default path.

An easy way is to mount the whole "/sys" filesystem of the host into the container. Either mount them to a new location (e.g. /sys_host) or over-mount the original "/sys" in the container:
# docker run -it --rm -v /sys:/sys:ro debian:bullseye-slim bash
root#b84df3184dce:/# ls -l /sys/class/net/
lrwxrwxrwx 1 root root 0 Oct 25 2021 enp2s0 -> ../../devices/pci0000:00/0000:00:1c.1/0000:02:00.0/net/enp2s0
lrwxrwxrwx 1 root root 0 Oct 25 2021 enp3s0 -> ../../devices/pci0000:00/0000:00:1c.2/0000:03:00.0/net/enp3s0
[...]
root#b84df3184dce:/# ls -l /sys/class/net/enp2s0/
-r--r--r-- 1 root root 4096 Oct 25 2021 addr_assign_type
-r--r--r-- 1 root root 4096 Oct 25 2021 addr_len
-r--r--r-- 1 root root 4096 Oct 25 2021 address
-r--r--r-- 1 root root 4096 Oct 25 2021 broadcast
[...]
Please be aware that this way the container has access to the whole "/sys" filesystem of the host. The relative links from the network interface to the pci device still work.
If you don't need to write you should mount it read-only by appending ":ro" to the mounted path.

last command in linux shows `:0` in 3rd column, what does it mean?

I want to write a shell script sort out the data that last command shows.
I got this in my server.
root pts/0 10.168.136.175 Wed Sep 14 14:24 - 14:54 (00:29)
root :0 Mon Sep 12 10:34 - 11:00 (00:25)
reboot system boot 2.6.18-308.el5PA Sun Sep 11 11:31 (86+03:05)
I did some search, there are some saying :0.0 in the 3rd column means login locally, second column means what kind of terminal been use, like pts and tty.
But what does the :0 in line 2 second column in my log?
I am using redhat 6.5.

It means local computer. Generally each session represented by pairs ip_address:display_number. When you logged in locally the IP address is omitted. That's why there is nothing before :. Display number is actually the session number from the specified IP address. So, 0 means the first session

How to read syslog messages as a normal user?

I'm using Ubuntu 12.04. By default /var/log/syslog is readable only by adm group members.
$ls -lh /var/log/syslog
-rw-r----- 1 syslog adm 23M Oct 29 10:20 /var/log/syslog
I tried using dmesg -f syslog, but it is also not working.
Thanks in advance for your help.

You can change the output directory in the syslog.conf, but I don't think you can change the group:owner. If you install syslog-ng, you can set the global options to output files with whatever rights you require.
create_dirs(yes);
dir_group("root");
dir_owner("adm");
dir_perm(0755);

Automatically launching Firefox from terminal using at command

I am a beginner at linux and really enthusiastic to learn the OS. I am trying to launch Firefox(or any other software like Evince) from the command line as follows:
[root#localhost ~]# at 1637
[root#localhost ~]# at> firefox
[root#localhost ~]# at> ^d
The job gets scheduled without any error. But at the specified time it does not run.
I also tried giving the following path:
[root#localhost ~]# at 1637
[root#localhost ~]# at> /usr/bin/firefox
[root#localhost ~]# at> ^d
Still no result. But When I try to use echo to display a text on the screen it appears at the specified time as desired. What might be the issue?

I think you have not set DISPLAY. at will run in separate shell where display is not set.
try the following code.
dinesh:~$ at 2120
warning: commands will be executed using /bin/sh
at> export DISPLAY=:0
at> /usr/bin/firefox > firefox.log 2>&1
at> <EOT>
job 7 at Tue Mar 11 21:20:00 2014
If it is still failing check firefox.log for more information.

1) Its not always recommended to run things as root
2) You can also try ./firefox if you are in the current directory of firefox. In linux you need to pay attention to your path variable. Unless . (the current directory) is in your path you will have to type ./program if the program is in the same directory as you.
Also you need to pay attention to file permissions: In linux you have read-write-eXecute access.
ls -l will do a list of directories and show the file permissions:
drwxr-xr-x 10 user staff 340 Oct 6 2012 GlassFish_Server/
drwx------# 15 jeffstein staff 510 Oct 6 15:01 Google Drive/
drwxr-xr-x 20 jeffstein staff 680 May 14 2013 Kindle/
drwx------+ 67 jeffstein staff 2278 Jan 26 14:22 Library/
drwx------+ 19 jeffstein staff 646 Oct 23 18:28 Movies/
drwx------+ 15 jeffstein staff 510 Jan 3 20:29 Music/
drwx------+ 90 jeffstein staff 3060 Mar 9 20:23 Pictures/
drwxr-xr-x+ 6 jeffstein staff 204 Nov 3 21:16 Public/
drwxr-xr-x 22 jeffstein staff 748 Jan 14 2012 androidTools/
-rwxrwxrwx 1 jeffstein staff 1419 Aug 28 2013 color.sh*
This is an example of ls -l here you can see color.sh has -rwxrwxrwx that means that anybody can read or write or run the file.
Without actually knowing where you installed firefox however I can't be of more help but these are some small pointers which might help.

try finding where firefox is actually installed using "whereis firefox" command.
Then try using that path in at command.

In order to get directions on how to use a command type:
man at
this will display the "manual"
DESCRIPTION
The at and batch utilities read commands from standard input or a speci-
fied file. The commands are executed at a later time, using sh(1).
at executes commands at a specified time;
atq lists the user's pending jobs, unless the user is the superuser;
in that case, everybody's jobs are listed;
atrm deletes jobs;
batch executes commands when system load levels permit; in other words,
when the load average drops below _LOADAVG_MX (1.5), or the value
specified in the invocation of at run.
So obviously you need to schedule a job with at and you can see if it worked with atq
Read the manual and it should help - if i have more time I'll write you a quick example.

Some high level questions about how PAM is designed

I'm creating a PAM module for a project. The PAM module will be using a library that will be re-used by some command line utilities (rather than re-writing everything each time). In this library, I want to have it interpret policy that discriminate against and/or logs according to subnet memberships of the remote host. Near as I can tell this value is probably coming from the authenticating application, but I don't know. Since the shared object won't have access to the pamh structure from libpam I can't just do a pam_get_item (like I would be able to from the PAM module itself) so I've had to resort to other means.
The best solution I've come up with is to have the shared object look for a connected TTY, if it's there go to utmp and find the login process associated with that TTY, extract the IP address from there. If there isn't a TTY, assume it's an initial login of a network user. The library then iterates over the sockets (which I've defined as basically any symlink with the word "socket" in the target's filename when you do a ls -l /proc/<pid>/fd) and uses the socket inode number to cross reference with /proc/net/tcp and extracts the remote IP address associated with that inode number. If it doesn't find an inode there then it assumes it's Unix domain or tcp6 (IPv6 support in this is forthcoming and not terribly important for the near future). If it still isn't able to find it, assume that some daemon has called an application linking against it and interpret it as such (might do something eventually, if it's worthwhile, but for now it's just a big NOOP if the first two don't return anything.
It seems to work but I have some high level questions about how PAM is supposed to work:
Is there some official standard that governs PAM operation? For example, is it covered by a POSIX standard somewhere? I know there are multiple PAM implementations (four or five that I've found thusfar) but I don't know if existing commonalities are de jure or de facto or just how I happen to have my system configured.
After I did a ls -l /proc/<pid>/fd > /lsOutput from the module itself (via system()):
[root#hypervisor pam]# cat /lsOutput total 0
lrwx------. 1 root root 64 Jun 15 15:09 0 -> /dev/null
lrwx------. 1 root root 64 Jun 15 15:09 1 -> /dev/null
lrwx------. 1 root root 64 Jun 15 15:09 2 -> /dev/null
lr-x------. 1 root root 64 Jun 15 15:09 3 -> socket:[426180]
[root#hypervisor pam]#
And issuing a manual ls after the user logins in:
[root#hypervisor pam]# ls -l /proc/18261/fd
total 0
lrwx------. 1 root root 64 Jun 15 15:15 0 -> /dev/null
lrwx------. 1 root root 64 Jun 15 15:15 1 -> /dev/null
lrwx------. 1 root root 64 Jun 15 15:15 11 -> /dev/ptmx
lrwx------. 1 root root 64 Jun 15 15:15 12 -> /dev/ptmx
lrwx------. 1 root root 64 Jun 15 15:15 13 -> socket:[426780]
lrwx------. 1 root root 64 Jun 15 15:15 14 -> socket:[426829]
lrwx------. 1 root root 64 Jun 15 15:15 2 -> /dev/null
lrwx------. 1 root root 64 Jun 15 15:15 3 -> socket:[426180]
lrwx------. 1 root root 64 Jun 15 15:15 4 -> socket:[426322]
lr-x------. 1 root root 64 Jun 15 15:15 5 -> pipe:[426336]
l-wx------. 1 root root 64 Jun 15 15:15 6 -> pipe:[426336]
lrwx------. 1 root root 64 Jun 15 15:15 7 -> socket:[426348]
lrwx------. 1 root root 64 Jun 15 15:15 8 -> socket:[426349]
lrwx------. 1 root root 64 Jun 15 15:15 9 -> /dev/ptmx
[root#hypervisor pam]#
So basically, it seems like both the TTY and any additional sockets get opened only AFTER the session modules finish (my temporary test module's session handling is the last in the stack for the sshd service). I've been unable to get it to be otherwise (or even think of a time when the connecting client won't be a TCP socket at descriptor 3).
Is this just due to my lack of imagination or is it necessarily so? I'm leaning towards the latter as it would seem that communicating with the client would be a pre-requisite to doing pretty much anything else that's useful. I don't know that for sure, so I feel I should ask somebody. Will descriptor 3 always be the authenticating client (my .so only makes the assumption that it's the lowest numbered TCP socket, and only if there's no TTY, but it seems like 3 should always be the descriptor for the connecting client). Would pulling the first TCP descriptor be a "deterministic" way of establishing the remote client's identity? Or is there no prescribed way this is supposed to play out and that's just how either my system is configured or how SSH has chosen to interface with PAM?
Is it sshd that's setting the rhost value or is that coming from some place else? I've tried grep-ing over the source code for both SSH and libpam, but no dice. I can see where libpam handles the setting of the host value when something call pam_set_item, but not were pam_set_item actually gets called to set it to be this or that particular host.
Any amount of help would be appreciated, I've googled but I'm starting to get splinters on my fingertips from scraping the bottom of the barrel.
Main reason I'm interested in knowing this is so that I'll end up not only with the "right" answer but mostly so that I won't have any surprises later on down the road. We have some Solaris platforms we may do this on, but my main motivation is to have assumptions that are grounded in things that are actually going to be constant.
I also realize that I could have the client programs/modules feed the host information to the library, but that would likely involve code re-write two or three times (as the CLI tools prepare session information from utmp and the PAM module from pam_get_item) and potentially make the project look more complex than it really needs to be.

Answering some of your questions:
"Is there some official standard that governs PAM operation?"
Apparently, yes. Wikipedia's entry on Pluggable_Authentication_Modulesays "PAM was standardized as part of the X/Open UNIX standardization process, resulting in the X/Open Single Sign-on (XSSO) standard." I have never found this particularly relevant to my dealings with it.
"Is this just due to my lack of imagination or is it necessarily so?"
<magicEightBall>"Concentrate and ask again"</magicEightBall> (It's ambiguous which "this" is being referred to - perhaps you can clarify?
"Will descriptor 3 always be the authenticating client?"
This is a behaviour of the application, rather than PAM.
Would pulling the first TCP descriptor be a "deterministic" way of establishing the remote client's identity?
Also a behaviour of the application.
"Is it sshd that's setting the rhost value or is that coming from some place else?"
It is sshd that sets the rhost value. In openssh's file auth-pam.c, function sshpam_init(), you'll find:
sshpam_err = pam_set_item(sshpam_handle, PAM_RHOST, pam_rhost);
Some general notes:
rather than keying off the TTY - which, as you've discovered get set later - you can walk the process lineage via getppid() and /proc.
don't trust system() in PAM modules - it uses a user's environment, and may not be the one you expect. Use fork()/execlp() instead.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string