Difference between $(git ls-files -s | wc -l) and $(git ls-files -s >out && wc -l <out) - linux

Are the two commands $(git ls-files -s | wc -l) and $(git ls-files -s >out && wc -l <out) different or the same? When I write the first in the second form, I end up getting errors.

When you pipe the output of one program into the input of another, as in:
$(git ls-files -s | wc -l)
...the programs run concurrently. wc will start counting lines as soon as it receives them. The pipe also directs the output of git to the input of wc without any intermediate file.
Note that in this case, wc will run even if the git command fails for some reason, so you'll get the wc output (in most cases, 0).
In your second example:
$(git ls-files -s >out && wc -l <out)
...the git command runs first and stores its results in a file called out. Then, if that was successful, wc runs and counts the lines. Because of &&, if the git command fails, wc won't run at all. In either case, you'll have a file named out lying around with the results of the git command in it.
Piping is generally better; it'll run faster and if you don't need to keep the intermediate results, it won't have any side effects.
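A quick way to see the behavioral difference is to substitute a command that always fails; a minimal sketch using false in place of the git command:
false | wc -l              # wc still runs and prints 0
false >out && wc -l <out   # wc never runs; out is created by the redirection but left empty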

Related

Git: speed up this command for searching Git blame for todos

I'm using this command:
git ls-tree -r --name-only HEAD -- . | grep -E '\.(ts|tsx|js|jsx|css|scss|html)$' | xargs -n1 git blame -c -e | sed -E 's/^.+\.com>\s+//' | LC_ALL=C grep -F 'todo: ' | sort
This gets all the todos in my codebase, sorted by date. This is mostly from Use git to list TODOs in code sorted by date introduced; I'm not very good with the command line.
However, the grep 'todo: ' part takes a long time: about a minute for ~400 files, without any particularly large files. Is it possible to speed this up somehow?
Edit: I realized it's the git blame that's slow, not grep, so I did a search before running git blame:
git ls-tree -r --name-only HEAD -- . | grep -E '\.(ts|tsx|js|jsx|css|scss|html)$' | LC_ALL=C xargs grep -F -l 'todo: ' | xargs -n1 git blame -c -e | sed -E 's/^.+\.com>\s+//' | LC_ALL=C grep -F 'todo: ' | sort
Now it's 6s.
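If this gets run often, the pipeline can be wrapped in a small shell function so the file-type list and the marker live in one place; a sketch, assuming the function name todos is free:
todos() {
git ls-tree -r --name-only HEAD -- . \
  | grep -E '\.(ts|tsx|js|jsx|css|scss|html)$' \
  | LC_ALL=C xargs grep -F -l 'todo: ' \
  | xargs -n1 git blame -c -e \
  | sed -E 's/^.+\.com>\s+//' \
  | LC_ALL=C grep -F 'todo: ' \
  | sort
}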

ssh tail with nested ls and head cannot access

I am trying to execute the following command:
$ ssh root@10.10.10.50 "tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )"
ls: cannot access /var/log/alert_ARCDB.log: No such file or directory
tail: cannot follow `-' by name
Notice the error returned. When I log in over ssh separately and then execute
tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )
I see the following:
# ls -t /var/log/alert_ARCDB.log | head -n1
/var/log/alert_ARCDB.log
Why is that happening, and how can I fix it? I am trying to do this in one line as I don't want to create a script file.
Thanks a lot
Shell expansion (parameter expansion and command substitution) happens before command execution.
Here's a simple example. If I type...
ls "$HOME"
...the shell replaces $HOME with the path to my home directory first, then runs something like ls /home/larsks. The ls command has no idea that the command line originally had $HOME.
If we look at your command...
$ ssh root@10.10.10.50 "tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )"
...we see that you're in exactly the same situation. The $(ls -t ...) expression is expanded before ssh is executed. In other words, that command is running on your local system.
You can inhibit the shell expansion on your local system by using single quotes. For example, running:
echo '$HOME'
will produce:
$HOME
So you can run:
ssh root@10.10.10.50 'tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )'
But there's another problem here. If /var/log/alert_ARCDB.log is a file, your command makes no sense: calling ls -t on a single file gets you nothing.
If alert_ARCDB.log is a directory, you have a different problem. The result of ls /some/directory is a list of filenames without any directory prefix. If I run something like:
ls -t /tmp
I will get output like
file1
file2
If I do this:
tail $(ls -t /tmp | head -1)
I end up with a command that looks like:
tail file1
And that will fail, because there is no file1 in my current directory.
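As an aside, a glob would keep the directory prefix, since the shell hands ls full pathnames; a sketch on the same hypothetical /tmp contents (-d stops ls from descending into any directories it is given):
ls -t /tmp | head -1       # prints just "file1"
ls -dt /tmp/* | head -1    # prints "/tmp/file1", which tail can open from anywhere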
One approach would be to pipe the commands you want to perform to ssh. One simple way to achieve that is to first create a function that will echo the commands you want executed:
remote_commands()
{
echo 'cd /var/log/alert_ARCDB.log'
echo 'tail -F -n 1 "$(ls -t | head -n1 )"'
}
The cd will allow you to use the relative path listed by ls. The single quotes make sure that everything will be sent as-is to the remote shell, with no local expansion occurring.
Then you can do
ssh root@10.10.10.50 bash < <(remote_commands)
This assumes alert_ARCDB.log is a directory (or else I am not sure why you would want to add head -n1 after that).
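If the function feels like overkill, the same two fixes (single quotes so the expansion happens remotely, plus a cd so the relative names from ls resolve) also fit on one line; a sketch, again assuming alert_ARCDB.log is a directory:
ssh root@10.10.10.50 'cd /var/log/alert_ARCDB.log && tail -F -n 1 "$(ls -t | head -n1)"'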

Bash grep command finding the same file 5 times

I'm building a little bash script to run another bash script that's found in multiple directories. Here's the code:
cd /home/mainuser/CaseStudies/
grep -R -o --include="Auto.sh" [\w] | wc -l
When I execute just that part, it finds the same file 5 times in each folder. So instead of getting 49 results, I get 245. I've written a recursive bash script before and I used it as a template for this problem:
grep -R -o --include=*.class [\w] | wc -l
This code has always worked perfectly, without any duplication. I've tried running the first command with and without the quotes, and I've tried -r as well. I've read through the bash documentation and I can't find a way to prevent the duplication, or even figure out why it's happening. Any thoughts on how to get around this?
As a separate but related question: could I launch Auto.sh inside each directory so that its output is dumped into that directory, without having to place Auto.sh in each folder? That would probably be much more efficient than what I'm currently doing, and it would probably fix my current duplication problem as well.
This is the code for Auto.sh:
#!/bin/bash
index=1
cd /home/mainuser/CaseStudies/
grep -R -o --include=*.class [\w] | wc -l
grep -R -o --include=*.class [\w] |awk '{print $3}' > out.txt
while read LINE; do
echo 'Path '$LINE > 'Outputs/ClassOut'$index'.txt'
javap -c $LINE >> 'Outputs/ClassOut'$index'.txt'
index=$((index+1))
done <out.txt
Preferably I would like to make it dump only the javap outputs for the application it's currently looking at. Since those .class files could be in any number of sub-directories, I'm not sure how to make them all dump in the top folder without executing a modified Auto.sh in the top directory of each application.
OK, so to fix the multiple matches:
grep -R -o --include="Auto.sh" [\w] | wc -l
Should be:
grep -R -l --include=Auto.sh '\w' | wc -l
The reason this was happening is that grep was looking for instances of the letter w in Auto.sh, which occurred 5 times in the file, and -o prints each match on its own line.
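A quick way to see the difference between -o and -l, using a hypothetical one-line file:
printf 'www\n' > demo.txt
grep -o 'w' demo.txt | wc -l    # 3: one output line per match
grep -l 'w' demo.txt | wc -l    # 1: one output line per matching file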
However, the overall fix, which doesn't require placing Auto.sh in every directory, is something like this:
MAIN_DIR=/home/mainuser/CaseStudies/
cd "$MAIN_DIR"
ls -d */ > DirectoryList.txt
while read LINE; do
cd "$LINE"
mkdir ProjectOutputs
bash /home/mainuser/Auto.sh
cd "$MAIN_DIR"
done <DirectoryList.txt
That calls this Auto.sh code:
index=1
grep -R -o --include=*.class '\w' | wc -l
grep -R -o --include=*.class '\w' | awk '{print $3}' > ProjectOutputs.txt
while read LINE; do
echo 'Path '"$LINE" > 'ProjectOutputs/ClassOut'$index'.txt'
javap -c "$LINE" >> 'ProjectOutputs/ClassOut'$index'.txt'
index=$((index+1))
done <ProjectOutputs.txt
Thanks again for everyone's help!

Processing file with xargs for concurrency

There is an input like:
folder1
folder2
folder3
...
foldern
I would like to iterate over it, taking multiple lines at a time, and process each line: remove the leading / (and more later, but this is enough for now) and echo the result. Iterating over the file in bash with a single thread can be slow. The alternative would be to split the input file into N pieces, run the same script N times with different input and output files, and merge the results at the end.
I was wondering if this is possible with xargs.
Update 1:
Input:
/a/b/c
/d/f/e
/h/i/j
Output:
mkdir a/b/c
mkdir d/f/e
mkdir h/i/j
Script:
for i in $(<test); do
echo mkdir $(echo $i | sed 's/\///') ;
done
Doing it with xargs does not work as I would expect:
xargs -a test -I line --max-procs=2 echo mkdir $(echo $line | sed 's/\///')
Obviously I need a way to execute the sed on the input for each line, but using $() does not work: the command substitution is expanded by the shell once, before xargs even starts, so it never sees the individual lines.
You probably want:
--max-procs=max-procs, -P max-procs
Run up to max-procs processes at a time; the default is 1. If
max-procs is 0, xargs will run as many processes as possible at
a time. Use the -n option with -P; otherwise chances are that
only one exec will be done.
http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs
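Combining -P with a small per-line shell lets the edit happen for every input line while still running in parallel; a sketch, assuming the input file is named test as in the question (the leading / is stripped with the shell's ${1#/} expansion instead of sed):
xargs -a test -P 2 -I{} sh -c 'echo mkdir "${1#/}"' _ {}
Here -I{} makes xargs run one command per input line, so -P 2 can keep two of them going at once, and each line reaches the inner sh as $1.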
With GNU Parallel you can do:
cat file | perl -pe s:/:: | parallel mkdir -p
or:
cat file | parallel mkdir -p {= s:/:: =}

How to reference the output of the previous command twice in Linux command?

For instance, if I'd like to reference the output of the previous command once, I can use the command below:
ls *.txt | xargs -I % ls -l %
But how do I reference the output twice? For example, how can I implement something like:
ls *.txt | xargs -I % 'some command' % > %
PS: I know how to do it in a shell script, but I just want a simpler way.
You can pass the argument to bash -c and reference it as many times as you need:
ls *.txt | xargs -I % bash -c 'ls -l "$1" > "out.$1"' - %
The trailing - becomes $0 inside the bash -c script, and the % substituted by xargs becomes $1, which the script can then use more than once.
You can look up 'tpipe' on SO; it will also lead you to 'pee' (which is not a good search term elsewhere on the internet). Basically, they're variants of the tee command which write to multiple processes instead of writing to files like tee does.
However, with Bash, you can use Process Substitution:
ls *.txt | tee >(cmd1) >(cmd2)
This will write the input to tee to each of the commands cmd1 and cmd2.
You can arrange to lose standard output in at least two different ways:
ls *.txt | tee >(cmd1) >(cmd2) >/dev/null
ls *.txt | tee >(cmd1) | cmd2
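As a concrete (hypothetical) usage, the same listing can feed two different consumers in one pass; this assumes filenames without spaces or newlines:
ls *.txt | tee >(xargs -n1 wc -l > line-counts.txt) >(wc -l > file-count.txt) > /dev/null
This writes per-file line counts to line-counts.txt and the number of files to file-count.txt, while discarding tee's own standard output.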
