tcpdump capture limit size with latest capture - linux

tcpdump -W 5 -C 10 -w capfile
I know what this command does: it keeps a rotating buffer of 5 files (-W 5), and tcpdump switches to a new file once the current file reaches 10,000,000 bytes, about 10 MB (-C works in units of 1,000,000 bytes, so -C 10 = 10,000,000 bytes). The prefix of the files will be capfile (-w capfile), and a one-digit integer will be appended to each (see also: how to save a new file when tcpdump file size reaches 10 MB).
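For reference, with -W 5 the capture cycles through a fixed set of five files and overwrites the oldest once all five exist; the names should look something like this (exact numbering and zero-padding can vary between tcpdump versions):
capfile0  capfile1  capfile2  capfile3  capfile4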
My question is what happens if I set -W to 1:
tcpdump -W 1 -C 10 -w capfile
Will this keep only one file, with a maximum size of 10 MB, containing the latest capture?

Related

Use several threads when rendering PDF to image using mupdf

Is it possible to run mutool.exe draw using several threads to increase PDF to Image conversion speed?
The command help lists -B and -T parameters, but I do not understand what maximum band_height means. What values should I set for -B?
-B - maximum band_height (pXm, pcl, pclm, ocr.pdf, ps, psd and png output only)
-T - number of threads to use for rendering (banded mode only)
Executing mutool with -B 100 -T 6 increased conversion speed slightly, by about 10%, and CPU usage went from 6% to 11%, but why not 60%?
mutool.exe draw -r 300 -B 100 -T 6 -o "C:\test%d.png" "C:\test-large.pdf"
Every system and PDF is different, but let's use a single page without text for timings on my system.
I know this file is complex, but it is not too unusual: without text, other objects behave much as text would, minus the complexity of font look-up and so on, so rendering time is generally fairly similar for a given run.
Let's start at a low resolution, since I know the file well enough to have seen it fail with a malloc error on this machine at around 300 dpi.
mutool draw -Dst -r 50 -o complex.png complex.pdf
page complex.pdf 1 1691ms
total 1691ms (0ms layout) / 1 pages for an average of 1691ms
mutool draw -Dst -r 100 -o complex.png complex.pdf
page complex.pdf 1 3299ms
total 3299ms (0ms layout) / 1 pages for an average of 3299ms
mutool draw -Dst -r 200 -o complex.png complex.pdf
page complex.pdf 1 7959ms
total 7959ms (0ms layout) / 1 pages for an average of 7959ms
mutool draw -Dst -r 400 -o complex.png complex.pdf
page complex.pdf 1error: malloc of 2220451350 bytes failed
error: cannot draw 'complex.pdf'
So this is where "banding" is required to avoid memory issues, since my target is 400 dpi output.
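A rough sanity check on that failed allocation (assuming roughly 3 bytes per RGB pixel and a roughly square page, both assumptions since the page dimensions are not shown here): 2,220,451,350 bytes / 3 ≈ 740 million pixels, i.e. a page on the order of 27,000 x 27,000 pixels at 400 dpi. With banding, only page-width x band_height pixels per active thread have to be held in memory at a time, which is why -B sidesteps the malloc failure.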
You may notice I used -D above; that needs to be removed for threading, since multiple threads cannot be used without a display list. Let's start small, because bands that are too large, or too many threads, can also trigger malloc errors.
mutool draw -st -B 32 -T 2 -r 400 -o complex.png complex.pdf
page complex.pdf 1 14111ms
total 14111ms (0ms layout) / 1 pages for an average of 14111ms
14 seconds for this file is not a bad result based on the progressive timings above, but perhaps on this 8-thread device I could do better? Let's try bigger bands and more threads.
mutool draw -st -B 32 -T 3 -r 400 -o complex.png complex.pdf
page complex.pdf 1 12726ms
total 12726ms (0ms layout) / 1 pages for an average of 12726ms
mutool draw -st -B 256 -T 3 -r 400 -o complex.png complex.pdf
page complex.pdf 1 12234ms
total 12234ms (0ms layout) / 1 pages for an average of 12234ms
mutool draw -st -B 256 -T 6 -r 400 -o complex.png complex.pdf
page complex.pdf 1 12258ms
total 12258ms (0ms layout) / 1 pages for an average of 12258ms
So increasing the thread count up to 3 helps, and upping the band size helps, but 6 threads is no better. Is there another tweak we can consider? Playing around over many runs, the best I got on this kit/configuration was about 12 seconds.
mutool draw -Pst -B 128 -T 4 -r 400 -o complex.png complex.pdf
page complex.pdf 1 1111ms (interpretation) 10968ms (rendering) 12079ms (total)

Is it possible to rotate a tcpdump log?

I have the following command:
sudo tcpdump -ni enp0s3 -W 1 -C 1 -w file.cap
With this command I say: "listen on the network interface enp0s3 and capture all packets into a file whose maximum size must be 1 MB". It works; however, the problem is that when the file reaches 1 MB, it is reset and the capture starts all over again from 0 KB, deleting all the packets.
I want that, when the file is 1 MB, only the oldest packets are deleted and the new ones are added in their place. I don't want all packets to be deleted and the acquisition to restart at 0 KB. In other words, I want the file to always stay around 1 MB, with new incoming packets replacing the oldest ones.
You can use -U -W 2 with the -C size limit. It will then alternate between two files and you can concatenate them (or work on the older one).
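A minimal sketch of that approach (assuming mergecap from Wireshark is available for the concatenation step; the exact suffixes tcpdump appends to the file name can vary by version):
sudo tcpdump -ni enp0s3 -U -W 2 -C 1 -w file.cap
mergecap -w combined.cap file.cap0 file.cap1
mergecap stitches the two rotating files back into a single capture ordered by packet timestamp.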
Alternatives would be to write to a stream or pipe and not to files at all.

tcpdump: invalid file size

I am trying to run a tcpdump command with filesize 4096, but it returns an error:
tcpdump: invalid filesize
Command: tcpdump -i any -nn -tttt -s0 -w %d-%m-%Y_%H:%M:%S:%s_hostname_ipv6.pcap -G 60 -C 4096 port 53
After some trial and error I found that it fails for filesizes 4096 (i.e. 2^12), 8192 (i.e. 2^13), and so on.
So for any filesize above 2^11 it gives the invalid filesize error.
Can anybody tell me under which conditions tcpdump returns invalid filesize?
Also, when I was running with filesize 100000:
tcpdump -i any -nn -tttt -s0 -w %d-%m-%Y_%H:%M:%S:%s_hostname_ipv6.pcap -G 60 -C 100000 port 53
a .pcap file of max size 1.3 GB was getting created.
I also tried looking in the source code of tcpdump, but couldn't find much.
I am trying to run a tcpdump command with filesize 4096
To quote a recent version of the tcpdump man page:
-C file_size
Before writing a raw packet to a savefile, check whether the
file is currently larger than file_size and, if so, close the
current savefile and open a new one. Savefiles after the first
savefile will have the name specified with the -w flag, with a
number after it, starting at 1 and continuing upward. The units
of file_size are millions of bytes (1,000,000 bytes, not
1,048,576 bytes).
So -C 4096 means a file size of 4,096,000,000 bytes. That's a large file size and, in older versions of tcpdump, a file size that large (one bigger than 2,147,483,647) isn't supported for the -C flag.
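That also lines up with the 2^11 boundary observed above: -C 2048 means 2,048,000,000 bytes, which still fits under 2,147,483,647, while -C 4096 means 4,096,000,000 bytes, which does not, hence the "invalid filesize" error on those older versions.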
If you mean you want it to write out files that are 4K bytes in size, unfortunately tcpdump doesn't support that. This means it's past due time to fix tcpdump issue 884 by merging tcpdump pull request 916 - I'll do that now, but that won't help you right now.
Also, when I was running with filesize 100000
That's a file size of 100,000,000,000 bytes, which is 100 gigabytes. Unfortunately, if what you want is a file size of 100,000 bytes (100 kilobytes), that isn't supported either: the current minimum file size for -C is 1 megabyte.

Why is using a pipe for sort (linux command) slow?

I have a large text file of ~8GB which I need to do some simple filtering and then sort all the rows. I am on a 28-core machine with SSD and 128GB RAM. I have tried
Method 1
awk '...' myBigFile | sort --parallel=56 > myBigFile.sorted
Method 2
awk '...' myBigFile > myBigFile.tmp
sort --parallel 56 myBigFile.tmp > myBigFile.sorted
Surprisingly, method 1 takes 11.5 min while method 2 only takes (0.75 + 1 < 2) min. Why is sorting so slow when piped? Is it not parallelized?
EDIT
The awk and myBigFile are not important; this experiment is repeatable simply by using seq 1 10000000 | sort --parallel 56 (thanks to @Sergei Kurenkov), and I also observed a six-fold speed improvement with the un-piped version on my machine.
When reading from a pipe, sort assumes that the file is small, and for small files parallelism isn't helpful. To get sort to utilize parallelism you need to tell it to allocate a large main memory buffer using -S. In this case the data file is about 8GB, so you can use -S8G. However, at least on your system with 128GB of main memory, method 2 may still be faster.
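A minimal sketch of that, keeping the question's elided awk filter as-is:
awk '...' myBigFile | sort --parallel=56 -S 8G > myBigFile.sorted
(-S accepts size suffixes like G; pick a buffer that fits comfortably in RAM.)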
Method 2 may still be faster because sort there can know from the size of the file that it is huge, and it can seek in the file (neither of which is possible with a pipe). Further, since you have so much memory compared to these file sizes, the data for myBigFile.tmp need not be written to disc before awk exits, and sort will be able to read the file from cache rather than disc. So the principal difference between method 1 and method 2 (on a machine like yours with lots of memory) is that sort in method 2 knows the file is huge and can easily divide up the work (possibly using seek, but I haven't looked at the implementation), whereas in method 1 sort has to discover that the data is huge, and it cannot use any parallelism in reading the input since it can't seek in a pipe.
I think sort does not use threads when it reads from a pipe.
I used this command for your first case, and it shows that sort uses only 1 CPU even though it is told to use 4. atop also shows that there is only one thread in sort:
/usr/bin/time -v bash -c "seq 1 1000000 | sort --parallel 4 > bf.txt"
I used this command for your second case, and it shows that sort uses 2 CPUs. atop also shows that there are four threads in sort:
/usr/bin/time -v bash -c "seq 1 1000000 > tmp.bf.txt && sort --parallel 4 tmp.bf.txt > bf.txt"
In your first scenario sort is an I/O-bound task: it makes lots of read syscalls on stdin. In your second scenario sort uses mmap syscalls to read the file and avoids being an I/O-bound task.
Below are results for the first and second scenarios:
$ /usr/bin/time -v bash -c "seq 1 10000000 | sort --parallel 4 > bf.txt"
Command being timed: "bash -c seq 1 10000000 | sort --parallel 4 > bf.txt"
User time (seconds): 35.85
System time (seconds): 0.84
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:37.43
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 9320
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2899
Voluntary context switches: 1920
Involuntary context switches: 1323
Swaps: 0
File system inputs: 0
File system outputs: 459136
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
$ /usr/bin/time -v bash -c "seq 1 10000000 > tmp.bf.txt && sort --parallel 4 tmp.bf.txt > bf.txt"
Command being timed: "bash -c seq 1 10000000 > tmp.bf.txt && sort --parallel 4 tmp.bf.txt > bf.txt"
User time (seconds): 43.03
System time (seconds): 0.85
Percent of CPU this job got: 175%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:24.97
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1018004
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2445
Voluntary context switches: 299
Involuntary context switches: 4387
Swaps: 0
File system inputs: 0
File system outputs: 308160
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
You have more system calls if you use the pipe.
seq 1000000 | strace sort --parallel=56 2>&1 >/dev/null | grep read | wc -l
2059
Without the pipe the file is mapped into memory.
seq 1000000 > input
strace sort --parallel=56 input 2>&1 >/dev/null | grep read | wc -l
33
Kernel calls are in most cases the bottleneck. That is the reason why sendfile was invented.
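If you want the full per-syscall breakdown rather than grepping for reads, strace -c prints a summary table; a quick way to compare the two cases (not part of the measurements above, and the output file names are arbitrary) would be something like:
seq 1000000 | strace -c -o piped-syscalls.txt sort --parallel=56 > /dev/null
strace -c -o file-syscalls.txt sort --parallel=56 input > /dev/null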

how to determine ulimits - linux

How to determine ulimits (Linux)?
I'm using Ubuntu 16.04, kernel version 4.4.0-21-generic.
I set nofile to the maximum for root (in /etc/security/limits.conf);
the line is: * hard nofile NUMBER
According to the file /proc/sys/fs/file-max, the value is 32854728.
When I run the command ulimit -a,
I find that the limit is 1024.
I tested it, and I found that the highest value of max open files I can set is 1048575.
If I set it to a higher value, the limit is 1024.
How do I determine the ulimit for open files? Why can't I set it to a limit higher than 1048575?
To determine the maximum number of file handles for the entire system, run:
cat /proc/sys/fs/file-max
To determine the current usage of file handles, run:
$ cat /proc/sys/fs/file-nr
1154 133 8192
The three numbers are, from left to right:
1154 - total allocated file descriptors (the number of file descriptors allocated since boot)
133  - total free allocated file descriptors
8192 - maximum open file descriptors
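The per-process limits reported by ulimit are separate from the system-wide /proc/sys/fs/file-max value above. To check what actually applies to your current shell (soft and hard limit respectively), you can run:
ulimit -Sn
ulimit -Hn
For an already-running process, cat /proc/<PID>/limits and look at the "Max open files" row.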
