different shell behaviour: bash omits newline, zsh keeps it - linux

I have a script which searches source files for "TODO" notes inside the comments. I use a pipeline of grep, git blame, uniq and sort to get the list ordered by the person who wrote the TODO comment.
The following works fine in bash and zsh:
#!/bin/bash
for FILE in $(grep -r -i "todo" apps/business | awk '{print $1}' | sed 's/://' | sed 's/\#//')
do
git blame $FILE | grep -i "todo"
done | sort -k2 | uniq
Now I want to count all the entries. Instead of calling the (time-expensive)
grep/git blame again, I want to save everything into $MATCHES so I can count it
without evaluating it again.
MATCHES=$(for FILE in $(grep -r -i "todo" apps/business | awk '{print $1}' | sed 's/://' | sed 's/\#//')
do
git blame $FILE | grep -i "todo"
done | sort -k2 | uniq)
echo $MATCHES
That's where I experience different behaviour in bash/zsh:
zsh: Returns the same as the first script (as expected)
bash: Ignores the newlines of git blame, puts everything on one line. wc -l counts 1 line.
What am I missing here? Why is bash behaving differently here?
And how do I get bash to not-ignore the newline?

zsh doesn't perform word-splitting on the unquoted parameter expansion $MATCHES by default, so the newlines survive. Use echo "$MATCHES" | wc -l, and bash should work as well.
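A minimal sketch of the difference in bash ($lines is just an illustration variable):
lines=$'a\nb\nc'
echo $lines | wc -l    # unquoted: word-splitting turns the newlines into spaces, prints 1
echo "$lines" | wc -l  # quoted: the newlines survive, prints 3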
Note that this is the wrong way to iterate over the output of a command; use a while loop and the read command instead:
grep -ri "todo" apps/business | awk '{print $1}' | sed -e 's/://' -e 's/\#//' |
while IFS= read -r FILE; do
git blame "$FILE" | grep -i todo
done | sort -k2 | uniq

Related

sed works differently for a tail -f vs tail?

I have a log file whose entries are separated by \n, with multi-line entries (ones containing SQL) internally separated by \r\n. In order to pull the SQL statements out of the file I need to convert each \r\n to a space. Not knowing sed, I googled and found a solution that works very well, but it fails when I switch to tail -f.
eg. these work:
tail -1000 /mylogfile.log | sed -e ':a;N;$!ba;s/\r\n/ /g' | grep "Executing command"
cat /mylogfile.log | sed -e ':a;N;$!ba;s/\r\n/ /g' | grep "Executing command"
but this returns no data at all
tail -f /mylogfile.log | sed -e ':a;N;$!ba;s/\r\n/ /g' | grep "Executing command"
EDIT: For the person who added a "This question already has an answer here:" note: no, that does not answer the question at all. First, that other question wasn't even resolved for the person who asked it. Second, it talks only about grep; the problem is with sed. I can have 10 greps and it still works fine if I switch from sed to perl, e.g.
tail -f /mylog.log | perl -ne 's/\r\n/ /g; print;' | grep "Executing command" | grep -vi ANALYZE | grep -vi DESCRIBE | grep -vi "SHOW PARTITION"
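The likely cause (my reading, not from the original thread): the :a;N;$!ba loop keeps appending lines to sed's pattern space until the last line ($), and under tail -f end-of-input never arrives, so sed buffers forever and prints nothing; perl -ne works because it handles each line as it is read. A minimal streaming sketch in awk (assuming an awk with fflush(), which keeps the output unbuffered for the pipe), where a line still ending in \r marks a continuation:
tail -f /mylogfile.log | awk '{ if (sub(/\r$/, "")) printf "%s ", $0; else print; fflush() }' | grep "Executing command"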

Using STDIN from pipe in sed command to replace value in a file

I've got a series of commands that produces a variable output string such as 123456. I want to pipe that to a sed command to replace a known string in a csv file that looks like this:
Fred,Wilma,Betty,Barney
However, the command below does not work, and I haven't found any other references to using a piped value as the replacement in a sed substitution.
How does this code change if the values in the csv are in a random order and I always want to change the second value?
Example code:
find / -iname awk 2>/dev/null | sha256sum | cut -c1-10 > test.txt |
sed -i -e '/Wilma/ r test.txt' -e 's/Wilma//' input.csv
Contents of input.csv should become: Fred,0d522cd316,Betty,Barney
Okay, in
find / -iname awk 2>/dev/null | sha256sum | cut -c1-10 > test.txt | sed -i -e '/Wilma/ r test.txt' -e 's/Wilma//' input.csv
you have a bug. That "> test.txt" after cut sends the output to the file instead of down the pipe, so nothing reaches sed on stdin and things go weird with that pipeline afterwards. You want either the pipe or the redirection to a file, not both.
The way to take piped stdin and use it as a parameter in a command is through xargs.
find / -iname awk 2>/dev/null | sha256sum | cut -c1-10 | xargs --replace=INSERTED -- sed -i -e 's/Wilma/INSERTED/' input.csv
(...though that find|shasum is suspect too, in that the order of files is random(ish) and it matters for a reliable sum. You probably mean to "|sort" after find.)
(Some would sed -i -e "s/Wilma/$(find|sort|shasum|cut)/" file, but I ain't among them. Animals.)
For replacing a fixed string like "Wilma", try:
sed -i 's/Wilma/'"$(find / -iname awk 2>/dev/null |
sha256sum | cut -c1-10)"'/' input.csv
To replace the 2nd field no matter what's in it, try:
sed -i 's/[^,]*/'"$(find / -iname awk 2>/dev/null |
sha256sum | cut -c1-10)"'/2' input.csv
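As a quick check of that field replacement in isolation, using the hash value from the question in place of the command substitution:
$ cat input.csv
Fred,Wilma,Betty,Barney
$ sed 's/[^,]*/0d522cd316/2' input.csv
Fred,0d522cd316,Betty,Barney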

Output of wc -l without file-extension

I've got the following line:
wc -l ./*.txt | sort -rn
I want to cut off the file extension. With this code I get the output:
number filename.txt
for all my .txt files in the current directory. But I want the output without the file extension, like this:
number filename
I tried piping to cut with different parameters, but all I managed was to cut off the whole filename with this command:
wc -l ./*.txt | sort -rn | cut -f 1 -d '.'
Assuming you don't have newlines in your filenames, you can use sed to strip the trailing .txt:
wc -l ./*.txt | sort -rn | sed 's/\.txt$//'
Unfortunately, cut doesn't have a syntax for extracting fields counted from the end. One (somewhat clunky) trick is to use rev to reverse the line, apply cut to it, and then rev it back:
wc -l ./*.txt | sort -rn | rev | cut -d'.' -f2- | rev
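For instance, a single line of wc output round-trips like this (hypothetical filename):
$ echo '12 ./notes.txt' | rev | cut -d'.' -f2- | rev
12 ./notes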
Using sed in a more generic way to cut off whatever extension the files have:
$ wc -l *.txt | sort -rn | sed 's/\.[^\.]*$//'
14 total
8 woc
3 456_base
3 123_base
0 empty_base
A better approach uses the actual file type rather than the name (what is the extension of tar.gz or similar multi-part extensions?):
#!/bin/bash
# Classify each file given as an argument by its file(1) description
for file; do
    case $(file -b "$file") in
        *ASCII*) echo "this is ascii" ;;
        *PDF*) echo "this is pdf" ;;
        *) echo "other cases" ;;
    esac
done
This is a POC, not tested; feel free to adapt/improve/modify.
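Hypothetical usage, assuming the script is saved as bytype.sh and given an ASCII text file, a PDF and a tarball:
$ ./bytype.sh notes.txt report.pdf archive.tar.gz
this is ascii
this is pdf
other cases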

put output in the next pipe

I want to take the output of the command:
ls -1 /${TMP_DIR}/*0000000221*.dbf | xargs | sed 's/ /,/g'
and put it at the end of a command that comes after it, like this:
ls -1 /${TMP_DIR}/*0000000221*.dbf | xargs | sed 's/ /,/g' | impdp sim/sim files=$1
For example:
Executing ls -1 /${TMP_DIR}/*0000000221*.dbf | xargs | sed 's/ /,/g' will give me:
/tmp/a_0000000221.dbf,/tmp/a_00000002212.dbf,/tmp/b_0000000221.dbf
So I want the final command to look like:
impdp sim/sim files=/tmp/a_0000000221.dbf,/tmp/a_00000002212.dbf,/tmp/b_0000000221.dbf
EDIT:
Sorry, I didn't write this from the beginning: there's a variable in the command, ${TMP_DIR}.
You probably don't need that many pipes. You can use it like this:
printf "impdp sim/sim files=" && printf "%s," /tmp/*0000000221*.dbf
impdp sim/sim files=/tmp/a_0000000221.dbf,/tmp/a_00000002212.dbf,/tmp/b_0000000221.dbf,
ls is a bit redundant if you just want to get the file names.
You can get the shell to glob those and then use printf to put them one per line.
To separate those items with ',' rather than '\n', you can use paste.
Finally, putting all that within $() executes it in a subshell
and substitutes the result into the command line in the current shell.
impdp sim/sim files=$(printf '%s\n' /${TMP_DIR}/*0000000221*.dbf | paste -d, -s)
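As a quick illustration of the paste step on its own (hypothetical filenames):
$ printf '%s\n' /tmp/a.dbf /tmp/b.dbf | paste -d, -s
/tmp/a.dbf,/tmp/b.dbf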
You can try a different order of commands:
impdp sim/sim files=$(ls -1 /tmp/*0000000221*.dbf | xargs | sed 's/ /,/g')
You can use globbing, an array and IFS to construct the parameter string:
$ ls -1
1.txt
2.txt
3.txt
$ echo impdp sim/sim files="$(a=(*.txt);IFS=',';echo "${a[*]}")"
impdp sim/sim files=1.txt,2.txt,3.txt
Obviously this will break on filenames with spaces or newlines.
To run, just remove the echo.
(All solutions, including mine, assume your filenames do not contain spaces.)
sed is a little overkill; you can use tr and avoid xargs too:
impdp sim/sim files=$(ls /tmp/*0000000221*.dbf | tr "\n" ",")

What is this Bash (and/or other shell?) construct called?

What is the construct in bash called where you can wrap a command that outputs to stdout, such that the output itself is treated like a stream? In case I'm not describing that so well, maybe an example will do best, and this is what I typically use it for: applying diff to output that does not come from a file, but from other commands, where
cmd
is wrapped as
<(cmd)
By wrapping a command in such a manner, in the example below I determine that there is a difference of one between the two commands that I am running, and then I am able to pinpoint that one precise difference. What is the construct/technique of wrapping a command as <(cmd) called? Thanks
[builder@george v6.5 html]$ git status | egrep modified | awk '{print $3}' | wc -l
51
[builder@george v6.5 html]$ git status | egrep modified | awk '{print $3}' | xargs grep -l 'Ext\.define' | wc -l
50
[builder@george v6.5 html]$ diff <(git status | egrep modified | awk '{print $3}') <(git status | egrep modified | awk '{print $3}' | xargs grep -l 'Ext\.define')
39d38
< javascript/reports/report_initiator.js
ADDENDUM
The revised command, using the advice to use git's ls-files, should be as follows (untested):
diff <(git ls-files -m) <(git ls-files -m | xargs grep -l 'Ext\.define')
It is called process substitution.
This is called Process Substitution.
This is process substitution, as you have been told. I'd just like to point out that this also works in the other direction. Process substitution with >(cmd) allows you to take a command that writes to a file and instead have that output redirected to another command's stdin. It's very useful for inserting something into a pipeline that takes an output filename as an argument. You don't see it as much because pretty much every standard command will write to stdout already, but I have used it often with custom stuff. Here is a contrived example:
$ echo "hello world" | tee >(wc)
hello world
1 2 12
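A slightly more practical sketch along the same lines (hypothetical filenames): one pass over the data feeds two compressed copies through output process substitution:
printf '%s\n' one two three | tee >(gzip > copy1.gz) >(gzip > copy2.gz) > /dev/null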
