How fast is your GREP in Cygwin?

I'm using Cygwin on a very fast PC, but I find it ridiculously slow whenever I use grep. It is also slow when I process a large file (say 25 MB). Here is an example to demonstrate the problem:
> time for i in $(seq 1000); do grep "$i" .; done
real 75.865 user 5.442 sys 14.542 pcpu 26.34
I want to know:
1. Your numbers: have you had a similar problem with the slowness of Cygwin or GNU grep?
2. How can the performance be improved?
3. What are your tips for using Cygwin?
uname -rvs
CYGWIN_NT-6.1-WOW64 1.7.9(0.237/5/3) 2011-03-29 10:10
which grep
grep is /usr/bin/grep
grep is /bin/grep
grep is /usr/bin/grep
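Most of that time is likely process-creation overhead rather than grep itself: Cygwin has to emulate Unix fork()/exec() on Windows, which is expensive, and the loop above spawns 1000 separate grep processes. Where the task allows it, batching everything into a single invocation avoids the overhead. A minimal sketch, assuming you are searching one large file (bigfile.txt is a placeholder):
# all 1000 patterns in one grep process instead of 1000 processes
seq 1000 > patterns.txt
time grep -f patterns.txt bigfile.txt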

$ time for i in $(seq 1000); do grep "$i" .; done
real 0m13.741s
user 0m3.520s
sys 0m8.577s
$ uname -rvs
CYGWIN_NT-6.1-WOW64 1.7.15(0.260/5/3) 2012-05-09 10:25
$ which grep
/usr/bin/grep

Related

How to find all shared libraries actually used during execution in Linux?

I have an executable and I would like to find out which shared libraries were actually used during a specific run. I know ldd would list all the shared library dependencies of that executable but I would like to find out the subset of those that were actually used during a specific run*. Is this possible?
*What I mean by a specific run is running the executable with particular input parameters that cause only a small part of the code to be executed.
You can use ltrace(1) for this:
$ PROG='ls -l'
# Collect call info
$ ltrace -o calls.txt -l '*' $PROG &> /dev/null
# Analyze collected data
$ cat calls.txt | sed -ne '/->/{ s/^\(.*\)->.*/\1/; p }' | sort -u
libacl.so.1
libcap.so.2
libc.so.6
libselinux.so.1
ls
# Compare with ldd
$ ldd /bin/ls | wc -l
10
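On glibc systems the dynamic loader can report this itself, with no extra tooling: setting LD_DEBUG=libs makes it print its library search and load decisions to stderr. A sketch (this shows what gets loaded during the run, including dlopen'ed libraries, though not whether each one is actually called):
# program output discarded; the loader's report goes to stderr
LD_DEBUG=libs ls -l > /dev/null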
You could use strace and grep for opened .so files. Note that strace writes its trace to stderr:
strace $MYPROG 2>&1 >/dev/null | grep -E 'open(at)?\(.*\.so'
lsof also should work to grep for open libraries.
lsof -p $PID | awk '{print $9}' | grep '\.so'
This assumes the shared libraries have .so extension
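While the process is still running, you can also read its memory map directly, since every loaded shared object is mapped into the address space. A sketch, assuming Linux and a process id in $PID:
# field 6 of /proc/PID/maps is the pathname of the mapped file
awk '$6 ~ /\.so/ {print $6}' /proc/$PID/maps | sort -u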

inotify + stdout piping - output being lost in a pipe

I have some one-liner generating events for inotify.
while true; do for i in $(seq 1 100); do touch /tmp/ino/foo$i; sleep 1s; done; rm /tmp/ino/foo*; done
I then set up a small bash pipeline to watch that folder, ignoring events about ISDIR (maybe I could do that with inotifywait, but that's not relevant):
inotifywait -m -e close /tmp/ino 2>/dev/null | grep -v ISDIR
And that works fine, I see lines like /tmp/ino/ CLOSE_WRITE,CLOSE foo57.
But if I add an extra pipe at the end, I don't get any output. To keep it simple, let's use the fact that applying the same grep -v ISDIR filter twice is idempotent.
inotifywait -m -e close /tmp/ino 2>/dev/null | grep -v ISDIR | grep -v ISDIR
This produces no output. I know my generator is still running, and a pipe-less inotifywait -m -e close /tmp/ino in another terminal is still producing output.
After a bit of thinking, I assumed it was probably a buffering problem (issues like this often seem to be). I changed my pipeline to
inotifywait -m -e close /tmp/ino 2>/dev/null | grep -v ISDIR --line-buffered | grep -v ISDIR
And now I'm getting output again, so that fixes the problem.
However, I don't really understand why it failed to work without forcing line buffering. I've never experienced issues like this with grep, even with 'slow producing' outputs.
However, I have hit buffering issues with some other programs in pipelines, which forced me to run sed as sed -u and to add an fflush() at the end of each awk block.
So, what's forcing strange buffering here, and how can I fix it (without having to scrabble around in man pages looking for esoteric force line buffering commands)?
It is the first grep, not inotifywait, that is buffering: stdio line-buffers stdout when it is a terminal but fully buffers it when it is a pipe, which is why the single-grep pipeline worked and the two-grep pipeline stalled. --line-buffered is grep's own fix; for programs without such a flag, stdbuf forces line buffering generically:
inotifywait -m -e close /tmp/ino 2>/dev/null | stdbuf -oL grep -v ISDIR | grep -v ISDIR
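The question's sed -u and awk fflush() are the per-program equivalents of the same fix. For instance, the first filter could just as well be an unbuffered sed; a sketch (with -n, /ISDIR/!p is equivalent to grep -v ISDIR):
inotifywait -m -e close /tmp/ino 2>/dev/null | sed -un '/ISDIR/!p' | grep -v ISDIR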

How to use grep with large (millions) number of files to search for string and get result in few minutes

This question is related to
How to use grep efficiently?
I am trying to search for a "string" in a folder that contains 8-10 million small (~2-3 kB) plain text files. I need to know all the files that contain "string".
At first I used this
grep "string"
That was super slow.
Then I tried
grep * "string" {} \; -print
Based on linked question, I used this
find . | xargs -0 -n1 -P8 grep -H "string"
I get this error:
xargs: argument line too long
Does anyone know a way to accomplish this task more quickly?
I run this search on a server machine that has more than 50 GB of available RAM and 14 CPU cores. I wish I could somehow use all that processing power to run this search faster.
You should remove the -0 argument to xargs (it expects NUL-delimited input, which plain find output is not) and raise the -n parameter instead:
... | xargs -n16 ...
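Alternatively, keep -0 but make find emit NUL-delimited names to match; this variant is also safe for file names containing spaces or newlines. A sketch with a larger batch size:
find . -type f -print0 | xargs -0 -n16 -P8 grep -H "string"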
That's not such a big stack of files (though kudos for the 10⁷ files), but I created 100k files (400 MB overall) with
for i in {1..100000}; do head -c 10 /dev/urandom > dummy_$i; done
and ran some tests out of pure curiosity (the keyword "10" I searched for was chosen randomly):
> time find . | xargs -n1 -P8 grep -H "10"
real 0m22.626s
user 0m0.572s
sys 0m5.800s
> time find . | xargs -n8 -P8 grep -H "10"
real 0m3.195s
user 0m0.180s
sys 0m0.748s
> time grep "10" *
real 0m0.879s
user 0m0.512s
sys 0m0.328s
> time awk '/10/' *
real 0m1.123s
user 0m0.760s
sys 0m0.348s
> time sed -n '/10/p' *
real 0m1.531s
user 0m0.896s
sys 0m0.616s
> time perl -ne 'print if /10/' *
real 0m1.428s
user 0m1.004s
sys 0m0.408s
By the way, there isn't a big difference in running time if I suppress the output by piping stdout to /dev/null. I am using Ubuntu 12.04 on a not-so-powerful laptop ;)
My CPU is an Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz.
More curiosity:
> time find . | xargs -n1 -P8 grep -H "10" 1>/dev/null
real 0m22.590s
user 0m0.616s
sys 0m5.876s
> time find . | xargs -n4 -P8 grep -H "10" 1>/dev/null
real 0m5.604s
user 0m0.196s
sys 0m1.488s
> time find . | xargs -n8 -P8 grep -H "10" 1>/dev/null
real 0m2.939s
user 0m0.140s
sys 0m0.784s
> time find . | xargs -n16 -P8 grep -H "10" 1>/dev/null
real 0m1.574s
user 0m0.108s
sys 0m0.428s
> time find . | xargs -n32 -P8 grep -H "10" 1>/dev/null
real 0m0.907s
user 0m0.084s
sys 0m0.264s
> time find . | xargs -n1024 -P8 grep -H "10" 1>/dev/null
real 0m0.245s
user 0m0.136s
sys 0m0.404s
> time find . | xargs -n100000 -P8 grep -H "10" 1>/dev/null
real 0m0.224s
user 0m0.100s
sys 0m0.520s
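The trend is clear: the dominant cost is the number of grep processes, so the bigger the batch each one gets, the faster the run. At the extreme you can drop xargs entirely and let grep recurse itself; a sketch on the same test tree (-l prints only the names of matching files, which is what the question asks for):
time grep -rl "10" . > /dev/null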
8 million files is a lot in one directory! However, 8 million times 2 kB is 16 GB, and you have 50 GB of RAM. I am thinking of a RAMdisk...
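On Linux that RAMdisk is typically a tmpfs mount; a sketch, where the size and the paths are assumptions:
# mount a 20 GB RAM-backed filesystem and copy the corpus into it
sudo mount -t tmpfs -o size=20g tmpfs /mnt/ramdisk
cp -a /path/to/files /mnt/ramdisk/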
If you've got that much RAM, why not read it all into memory and use a regular expression library to search? It's a simple C program:
#include <fcntl.h>
#include <regex.h>
...

get thread count on HP-UX

How can I get the thread count on HP-UX?
I am using
ps -eLf | grep java | wc -l and
ps -L -p $PID | wc -l
on Linux and Solaris, but they don't seem to work on HP-UX.
I have tried ps uH p $PID on HP-UX, but that doesn't work either.
Does anyone have a solution for this?
please help ^_^
ps -ef | grep -i java | wc -l is the best workaround. (This will count your grep command as well, so if it returns a value x, the total number of threads in execution is actually x-1.)
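A common trick that avoids the x-1 correction is to bracket one character of the pattern, so that grep's own command line no longer matches it; a sketch with the pattern from this question:
ps -ef | grep '[j]ava' | wc -l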

Grep time command output

Using time ls -l, I get the following output:
$ time ls -l
total 2
-rwx------+ 1 FRIENDS None 97 Jun 23 08:59 location.txt
-rw-r--r--+ 1 FRIENDS None 10 Jun 23 09:06 welcome
real 0m0.040s
user 0m0.000s
sys 0m0.031s
Now, when I try to grep only the real value line, the actual result is:
$ time ls -l | grep real
real 0m0.040s
user 0m0.000s
sys 0m0.031s
My question is, how to get only the real value as output? In this case, 0m0.040s.
time writes its output to stderr, so you need to pipe stderr instead of stdout. But it's also important to remember that time is part of the syntax of bash, and it times an entire pipeline. Consequently, you need to wrap the pipeline in braces, or run it in a subshell:
$ { time ls -l >/dev/null; } 2>&1 | grep real
real 0m0.005s
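To strip the label as well and keep only the value, a small awk step on top of the same braced group works; a sketch:
{ time ls -l >/dev/null; } 2>&1 | awk '/^real/ {print $2}'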
With Bash v4.0 (probably universal on Linux distributions but still not standard on Mac OS X), you can use |& to pipe both stdout and stderr:
{ time ls -l >/dev/null; } |& grep real
Alternatively, you can use the time utility, which allows control of the output format. On my system, that utility is found in /usr/bin/time:
/usr/bin/time -f%e ls -l >/dev/null
man time for more details on the time utility.
(time ls -l) 2>&1 >/dev/null | grep real
This redirects stderr (which is where time sends its output) to where stdout currently points (the pipe), then redirects stdout to /dev/null so the output of ls is not captured; what grep receives on stdin is therefore exactly the output of time. Note that the order of the two redirections matters.
If you just want to specify the output format of time builtin, you can modify the value of TIMEFORMAT environment variable instead of filtering it with grep.
In your case,
TIMEFORMAT=%R
time ls -l
would give you the "real" time only.
Here's the link to relevant information in Bash manual (under "TIMEFORMAT").
This is a similar question on SO about parsing the output of time.
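TIMEFORMAT also combines nicely with the braced-group redirection from the first answer when you want the elapsed time in a variable; a sketch:
TIMEFORMAT=%R
elapsed=$( { time ls -l >/dev/null; } 2>&1 )
echo "$elapsed"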
Look out.. bash has a built-in "time" command. Here are some of the differences..
# GNU time command (can also use the $TIME variable instead of -f)
bash> /usr/bin/time -f%e ls >/dev/null
0.00
# bash built-in time keyword (controlled by the $TIMEFORMAT variable; it has no -f option)
bash> time -f%e ls >/dev/null
-f%e: command not found
real 0m0.005s
user 0m0.004s
sys 0m0.000s
I think it can be made a little easier:
time ls &> /dev/null | grep real
(Be careful, though: this only appears to work. time here times the whole pipeline and writes to the shell's stderr, so its output never passes through grep and all three lines are printed.)
