Fluent Bit not saving any data on filesystem - linux

I am new to Fluent Bit and am currently doing a POC. I tried multiple things but couldn't make Fluent Bit save any data to the filesystem.
[SERVICE]
    flush 1
    daemon Off
    log_level trace
    parsers_file parsers.conf
    plugins_file plugins.conf
    http_server on
    http_listen 0.0.0.0
    http_port 2020
    storage.metrics on
    storage.path /var/log/fluent-bit/buffer
    storage.max_chunks_up 4
    storage.sync full
    storage.backlog.mem_limit 1M
[INPUT]
    name cpu
    tag cpu.local
    # Read interval (sec) Default: 1
    interval_sec 1
[INPUT]
    name exec
    tag d-disk
    command df -h --type=ext4 | grep -v Filesystem
    interval_sec 1
    interval_nsec 0
[INPUT]
    name mem
    tag memory
    interval_sec 1
[OUTPUT]
    name stdout
    match memory
When I go to /var/log/fluent-bit/buffer and run ls -a, I see nothing.
My aim is to make Fluent Bit save data on disk.

You have to explicitly specify the buffering mechanism each input should use; setting storage.path in [SERVICE] is not enough, because inputs buffer in memory by default.
Try adding storage.type filesystem to your INPUT sections.
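If you want to double-check a config for this, something like the following could help. It is only a rough sketch, not part of Fluent Bit: the function name inputs_missing_filesystem_storage and the sample text are made up for illustration, and it assumes the plain "key value" layout used in the question.

```python
def inputs_missing_filesystem_storage(config_text):
    """Return the names of [INPUT] sections without 'storage.type filesystem'."""
    sections = []  # list of (header, {key: value}) in file order
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].upper(), {}))
        elif sections:
            parts = line.split(None, 1)
            value = parts[1].strip() if len(parts) > 1 else ""
            sections[-1][1][parts[0].lower()] = value
    return [
        props.get("name", "(unnamed)")
        for header, props in sections
        if header == "INPUT"
        and props.get("storage.type", "").lower() != "filesystem"
    ]


# Hypothetical sample: only the mem input enables filesystem buffering.
SAMPLE = """\
[INPUT]
    name cpu
    tag cpu.local
[INPUT]
    name mem
    tag memory
    storage.type filesystem
[OUTPUT]
    name stdout
    match memory
"""

print(inputs_missing_filesystem_storage(SAMPLE))  # ['cpu']
```

Any input it lists is still using the default in-memory buffering.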


How to create swap partition/file on a Yocto distribution

I'm trying to create a swap partition/file on my board where a core-image-minimal has been installed.
The fdisk -l command doesn't show any partitions, so I'm not able to figure out which block device I need to use to create a new partition.
Secondly, running swapon on a swapfile correctly initialized with mkswap raises an invalid argument error saying that the file contains holes, even though I created it using dd.
At this point I'm not sure if I can do something like this since the free output looks like:
              total        used        free      shared  buff/cache   available
Mem:         503304       32108      101108         216      370088      465180
Swap:             0           0           0
To add any partition to your image, you need to modify the wks file that is used for your build.
To get the current wks file, run:
bitbake -e | grep ^WKS_FILE=
Then look for that file in your layer sources.
In that file you can add (example: 1 GB swap):
part swap --ondisk mmcblk0 --label swap --fstype=swap --size=1024M --overhead-factor 1
For a real example, see the commit that added swap support for the raspberry-pi machine.
You can also use a custom wks file and set it in your custom machine conf file:
WKS_FILE ?= "custom-image.wks"
For detailed info, check the Yocto reference about wks.

monitoring linux server sockets or files

I have the famous "SocketException: Too many open files" bug.
I am running an Apache HTTP server, a Tomcat server and a MySQL database on my server.
I checked the limit of open files with ulimit -n, which gave me 1024.
If I check how many files are open with lsof -u tomcat, it gives me 5,
and the same for mysql. I am not sure what the problem is, but I also get a readlink permission denied error.
I want to monitor the socket connections and open files on my server. I thought about running the Linux commands described above in a shell script and mailing the output to myself.
The other option I can think of is using netstat and counting the connections, but it loads very slowly and gives me a getnameinfo failure.
What would be the better command to monitor the bug I have?
EDIT:
SHOW GLOBAL STATUS LIKE '%open%';
Variable_name Value
Com_ha_open 0
Com_show_open_tables 0
Open_files 8
Open_streams 0
Open_table_definitions 87
Open_tables 64
Opened_files 673
Opened_table_definitions 87
Opened_tables 628
Slave_open_temp_tables 0
SHOW GLOBAL VARIABLES LIKE '%open%';
Variable_name Value
have_openssl DISABLED
innodb_open_files 300
open_files_limit 2000
table_open_cache 64
SHOW GLOBAL VARIABLES LIKE '%connect%';
character_set_connection latin1
collation_connection latin1_swedish_ci
connect_timeout 10
init_connect
max_connect_errors 10
max_connections 400
max_user_connections 0
SHOW GLOBAL STATUS LIKE '%connect%';
Variable_name Value
Aborted_connects 1
Connections 35954
Max_used_connections 102
Ssl_client_connects 0
Ssl_connect_renegotiates 0
Ssl_finished_connects 0
Threads_connected 11
You can check your ulimit values with 'ulimit -a' to determine the capacity for open files.
From the OS command prompt, run ulimit -n 8192 to allow more open files dynamically. This only applies to the current shell and the processes started from it.
To make this change persist across OS restarts, this URL can be your guide:
https://glassonionblog.wordpress.com/2013/01/27/increase-ulimit-and-file-descriptors-limit/
Their example uses a capacity of 500000; use 8192 for your system.
Suggestions to consider for your my.cnf [mysqld] section:
thread_cache_size=100 # to support your max_used_connections of 102
max_user_connections=400 # from 0 to match max_connections requested
table_open_cache=800 # from 64 to reduce Opened_tables count
innodb_open_files=800 # from 300 to match table_open_cache requested
Implementing these changes should avoid the 'too many open files' message.
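As for the monitoring part of the question: on Linux, one cheap option is to count the entries in /proc/<pid>/fd and compare that with the soft limit listed in /proc/<pid>/limits, instead of running lsof or netstat. The sketch below is only an illustration (the helper names are made up). Reading another user's process needs root, which may also be what the readlink permission denied message was about.

```python
import os


def open_fd_count(pid):
    """Number of file descriptors currently open in process `pid`."""
    # Each entry in /proc/<pid>/fd is one open descriptor (file, socket, pipe).
    # For the current process the count includes the descriptor used to read
    # the directory itself.
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Soft 'Max open files' limit of `pid` as a string (may be 'unlimited')."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: name (three words), soft limit, hard limit, unit.
                return line.split()[3]
    return None


pid = os.getpid()  # replace with the Tomcat or mysqld PID
print(f"pid {pid}: {open_fd_count(pid)} open fds, "
      f"soft limit {soft_nofile_limit(pid)}")
```

Run from cron every minute or so, this would show whether the count creeps toward the limit before the exception appears.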

How to release hugepages from the crashed application

I have an application that uses hugepages, and it suddenly crashed due to a bug.
Because the application did not release its hugepages properly when it crashed, the number of free hugepages reported in sysfs has not gone back up.
$ sudo cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages
0
$ sudo cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
1024
Is there a way to release the hugepages by force?
Sometimes you need to check every directory where hugetlbfs is mounted:
1) Find the mounted directories with mount | grep huge.
2) Check every directory, especially those other than /dev/hugepages.
3) Delete all 2M-sized files (2M is the hugepage size).
Use ipcs -m to list the shared memory segments.
Use ipcrm to remove the left over shared memory segments.
Edit on 06/24/2019:
Ok, so, the above answer, while correct as far as it goes, was a bit brief. In particular, if you have a host with multiple DB instances, and only one is crashed how can you determine which (if any) memory segments should be cleaned up?
Well, this too can be done. For each running instance, connect with / as sysdba, then do oradebug setmypid (any pid will do, as all Oracle PIDs connect to the SGA). Then do oradebug ipc. That will (hopefully) write IPC information to the trace file. So, go to the udump (or diag_dest) directory and look for your trace file. It will contain all the IPC information for the instance, including the ShmId. Look through the file for the ShmId(s) that this instance is using. Now look at the output of ipcs -m.
When you have done that for all the running instances, any memory segment output by ipcs -m that shows non-zero memory allocation, and that you cannot account for in the oradebug ipc information from any running instance, must be the left over memory segments from the crashed instance. Use ipcrm to remove it/them.
When doing this on a host with multiple running instances, this can be a bit fraught. Please proceed with caution. You don't want to remove the SGA of a running instance!
Hope that helps....
HugeTLB can either be used for shared memory (and Mark J. Bobak's answer would deal with that) or the app mmaps files created in a hugetlb filesystem. If the app crashes without removing those files they survive and keep corresponding memory 'allocated'.
Check hugeTLB filesystem and see if there are any leftover files from the app. Removing them would release the memory.
If you follow the instruction below, you can get rid of the allocated hugepages:
1) Let's check the hugepages which were free at restart
dpdk@dpdkvm:~$ ls /mnt/huge/
empty
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total: 256
HugePages_Free: 256
...
2) Starting a dpdk application with wrong parameters, producing an error
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ sudo ./build/kni -c 0x03 -n 2 -- -P -p 0x03 --config="(0,0,1),(1,0,1)"
...
EAL: Error - exiting with code: 1
Cause: No supported Ethernet device found
3) When I check the hugepages, none are free
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total: 256
HugePages_Free: 0
...
4) Now, when I check the mounted hugepage directory, I can see the files which are not given back to OS by dpdk application.
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ ls /mnt/huge/
...
rtemap_0 rtemap_137 rtemap_176 rtemap_214 rtemap_253 rtemap_62
...
5) Finally, if you remove the files starting with rtemap, you can give the hugepages back
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ sudo rm /mnt/huge/*
[sudo] password for dpdk:
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total: 256
HugePages_Free: 256
...
Your hugepages may be held by shared memory segments or by mmapped files.
Try removing the shared memory segments or unmounting the hugetlb filesystem.
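To see what is actually left behind before deleting anything, the checks from the answers above can be scripted. This is a rough sketch with made-up helper names; it only reads /proc/meminfo and /proc/mounts and lists files. Removing them is left to you.

```python
import os


def hugepage_counters(meminfo_text):
    """Extract the HugePages_* counters from /proc/meminfo text."""
    counters = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.startswith("HugePages_") and rest.split():
            counters[key] = int(rest.split()[0])
    return counters


def hugetlbfs_mounts(mounts_text):
    """Mount points whose filesystem type is hugetlbfs, from /proc/mounts text."""
    mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] == "hugetlbfs":
            mounts.append(fields[1])
    return mounts


def leftover_files():
    """Paths of all files found in the mounted hugetlbfs directories."""
    with open("/proc/mounts") as fh:
        mount_points = hugetlbfs_mounts(fh.read())
    found = []
    for mount_point in mount_points:
        try:
            names = os.listdir(mount_point)
        except OSError:
            continue  # not readable without root
        found.extend(os.path.join(mount_point, name) for name in names)
    return found


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        print(hugepage_counters(fh.read()))
    for path in leftover_files():
        print(path)
```

If HugePages_Free stays low and no files show up, the pages are more likely held by System V shared memory, so check ipcs -m next.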

Identify other end of a unix domain socket connection

I'm trying to figure out what process is holding the other end of a unix domain socket. In some strace output I've identified a given file descriptor which is involved in the problem I'm currently debugging, and I'd like to know which process is on the other end of that. As there are multiple connections to that socket, simply going by path name won't work.
lsof provides me with the following information:
dbus-daem 4175 mvg 10u unix 0xffff8803e256d9c0 0t0 12828 @/tmp/dbus-LyGToFzlcG
So I know some address (“kernel address”?), I know some socket number, and I know the path. I can find that same information in other places:
$ netstat -n | grep 12828
unix 3 [ ] STREAM CONNECTED 12828 @/tmp/dbus-LyGToFzlcG
$ grep -E '12828|ffff8803e256d9c0' /proc/net/unix
ffff8803e256d9c0: 00000003 00000000 00000000 0001 03 12828 @/tmp/dbus-LyGToFzlcG
$ ls -l /proc/*/fd/* 2>/dev/null | grep 12828
lrwx------ 1 mvg users 64 10. Aug 09:08 /proc/4175/fd/10 -> socket:[12828]
However, none of this tells me what the other end of my socket connection is. How can I tell which process is holding the other end?
Similar questions have been asked on Server Fault and Unix & Linux. The accepted answer is that this information is not reliably available to the user space on Linux.
A common suggestion is to look at adjacent socket numbers, but ls -l /proc/*/fd/* 2>/dev/null | grep 1282[79] gave no results here. Perhaps adjacent lines in the output from netstat can be used. It seems like there was a pattern of connections with and without an associated socket name. But I'd like some kind of certainty, not just guesswork.
One answer suggests a tool which appears to be able to address this by digging through kernel structures. Using that option requires debug information for the kernel, as generated by the CONFIG_DEBUG_INFO option and provided as a separate package by some distributions. Based on that answer, using the address provided by lsof, the following solution worked for me:
# gdb /usr/src/linux/vmlinux /proc/kcore
(gdb) p ((struct unix_sock*)0xffff8803e256d9c0)->peer
This will print the address of the other end of the connection. Grepping lsof -U for that number will provide details like the process id and the file descriptor number.
If debug information is not available, it might be possible to access the required information by knowing the offset of the peer member into the unix_sock structure. In my case, on Linux 3.5.0 for x86_64, the following code can be used to compute the same address without relying on debugging symbols:
(gdb) p ((void**)0xffff8803e256d9c0)[0x52]
I won't make any guarantees about how portable that solution is.
Update: It's been possible to do this using actual interfaces for a while now. Starting with Linux 3.3, the UNIX_DIAG feature provides a netlink-based API for this information, and lsof 4.89 and later support it. See https://unix.stackexchange.com/a/190606/1820 for more information.
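Whichever way the peer's inode number is obtained, mapping it back to a process works the same way as the ls -l /proc/*/fd/* check in the question. Here is a rough sketch of that step: pids_holding_socket is a made-up helper, and the socketpair at the end only stands in for a real connection so the example can run by itself. Without root, only your own processes are visible.

```python
import os
import socket


def pids_holding_socket(inode):
    """Return (pid, fd) pairs whose /proc/<pid>/fd/<fd> link is socket:[inode]."""
    target = f"socket:[{inode}]"
    holders = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    holders.append((int(entry), int(fd)))
            except OSError:
                continue  # descriptor was closed in the meantime
    return holders


# Stand-in for a real connection: pretend `peer` is the unknown other end.
ours, peer = socket.socketpair()
peer_inode = os.fstat(peer.fileno()).st_ino
holders = pids_holding_socket(peer_inode)
print(holders)  # includes this process, since it holds both ends
```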

How to tie a network connection to a PID without using lsof or netstat?

Is there a way to tie a network connection to a PID (process ID) without forking to lsof or netstat?
Currently lsof is being used to poll which connections belong to which process ID. However, lsof or netstat can be quite expensive on a busy host, and I would like to avoid having to fork to these tools.
Is there someplace similar to /proc/$pid where one can look to find this information? I know what the network connections are by examining /proc/net but can't figure out how to tie this back to a pid. Over in /proc/$pid, there doesn't seem to be any network information.
The target hosts are Linux 2.4 and Solaris 8 to 10. If possible, I'd like a solution in Perl, but I am willing to do C/C++.
Additional notes:
I would like to emphasize the goal here is to tie a network connection to a PID. Getting one or the other is trivial, but putting the two together in a low cost manner appears to be difficult. Thanks for the answers so far!
I don't know how often you need to poll, or what you mean by "expensive", but with the right options both netstat and lsof run a lot faster than in the default configuration.
Examples:
netstat -ltn
shows only listening tcp sockets, and omits the (slow) name resolution that is on by default.
lsof -b -n -i4tcp:80
omits all blocking operations, name resolution, and limits the selection to IPv4 tcp sockets on port 80.
On Solaris you can use pfiles(1) to do this:
# ps -fp 308
UID PID PPID C STIME TTY TIME CMD
root 308 255 0 22:44:07 ? 0:00 /usr/lib/ssh/sshd
# pfiles 308 | egrep 'S_IFSOCK|sockname: '
6: S_IFSOCK mode:0666 dev:326,0 ino:3255 uid:0 gid:0 size:0
sockname: AF_INET 192.168.1.30 port: 22
For Linux, this is more complex (gruesome):
# pgrep sshd
3155
# ls -l /proc/3155/fd | fgrep socket
lrwx------ 1 root root 64 May 22 23:04 3 -> socket:[7529]
# fgrep 7529 /proc/3155/net/tcp
6: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7529 1 f5baa8a0 300 0 0 2 -1
00000000:0016 is 0.0.0.0:22. Here's the equivalent output from netstat -a:
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
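The hex fields are easy enough to decode in a script, which avoids forking netstat. A minimal sketch follows; the function names are made up here, and the sample line is the one shown above. The resulting inode is what you match against the socket:[...] links in /proc/<pid>/fd.

```python
import socket
import struct


def decode_endpoint(hex_endpoint):
    """Turn '00000000:0016' into ('0.0.0.0', 22)."""
    addr_hex, port_hex = hex_endpoint.split(":")
    # The kernel prints the IPv4 address as a host-endian 32-bit integer, so
    # packing it in native byte order restores the network-order bytes.
    addr = socket.inet_ntoa(struct.pack("=I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)


def parse_proc_net_tcp_line(line):
    """Pick the endpoints, state and socket inode out of one /proc/net/tcp row."""
    fields = line.split()
    return {
        "local": decode_endpoint(fields[1]),
        "remote": decode_endpoint(fields[2]),
        "state": fields[3],  # hex TCP state, 0A is LISTEN
        "inode": int(fields[9]),
    }


SAMPLE_LINE = (
    "6: 00000000:0016 00000000:0000 0A 00000000:00000000 "
    "00:00000000 00000000 0 0 7529 1 f5baa8a0 300 0 0 2 -1"
)
entry = parse_proc_net_tcp_line(SAMPLE_LINE)
print(entry["local"], entry["inode"])  # ('0.0.0.0', 22) 7529
```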
Why don't you look at the source code of netstat and see how it gets the information? It's open source.
For Linux, have a look at the /proc/net directory
(for example, cat /proc/net/tcp lists your tcp connections). Not sure about Solaris.
Some more information here.
I guess netstat basically uses this exact same information, so I don't know if you will be able to speed it up a whole lot. Be sure to try the netstat '-an' flags so that it does NOT resolve IP addresses to hostnames in real time (as this can take a lot of time due to DNS queries).
The easiest thing to do is
strace -f netstat -na
on Linux (I don't know about Solaris). This will give you a log of all of the system calls made. It's a lot of output, some of which will be relevant. Take a look at the files in the /proc file system that it's opening. This should lead you to how netstat does it. Incidentally, ltrace will allow you to do the same thing through the C library. Not useful for you in this instance, but it can be useful in other circumstances.
If it's not clear from that, then take a look at the source.
Take a look at these answers which thoroughly explore the options available:
How I can get ports associated to the application that opened them?
How to do like "netstat -p", but faster?
