while loop in pipeline having side effects on Solaris /bin/sh but not Linux

I am running the same script under Linux and under Solaris.
Here is the script:
#!/bin/sh
index=0
ls /tmp | grep e |
while read fileWithE
do
echo $fileWithE
index=`expr $index + 1`
done
echo "index is $index"
Since the while loop runs in a subshell, I expected the output 'index is 0' on both Solaris and Linux.
But on Solaris, $index is the actual number of files under /tmp whose names contain 'e'.
So do while loops not run in a subshell on Solaris? I expected the same result on both systems.

POSIX doesn't require that every component of a pipeline run in a subshell; this is an implementation decision left to each shell's author. A shell may therefore run any component, or no component, of a pipeline in the parent shell (where its side effects persist beyond the life of the pipeline) and still be compliant with POSIX sh.
Shells which are known to use the parent shell to execute the last component of a pipeline include:
ksh88
ksh93
zsh
bash 4.2 or later with the lastpipe option enabled, when job control is disabled.
If you want to be certain that shell commands run in a pipeline can have no side effects across all POSIX-compliant shells, it's wise to put the entire pipeline in an explicit subshell.
One way you can experimentally validate that this difference in behavior is related to position within the pipeline would be to modify your test only slightly by adding an additional pipeline element.
#!/bin/sh
index=0
ls /tmp \
| grep e \
| while read fileWithE; do echo "$fileWithE"; index=`expr $index + 1`; done \
| cat >/dev/null
echo $index
...you'll see that the | cat changes the behavior, such that the changes to index made by the while loop are no longer visible in the calling shell, even on commonly available shells where they otherwise would be.
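The explicit-subshell advice above can be sketched as follows. This is only an illustration, assuming a POSIX shell that supports $(...) and $((...)); list_names is a made-up stand-in for ls /tmp so the result does not depend on the contents of /tmp. The second half shows one portable way to get the count out of the loop when you do want it.

```shell
#!/bin/sh
# Hypothetical stand-in for `ls /tmp`; the names are made up for this sketch.
list_names() { printf '%s\n' apple pear kiwi; }

index=0
# Explicit subshell: changes to index made inside can never leak out,
# no matter which pipeline component the shell runs in the parent.
(
  list_names | grep e | while read -r name; do
    index=$((index + 1))
  done
)
echo "index is $index"   # prints "index is 0"

# To get the count out portably, let the pipeline print it and capture it.
count=$(list_names | grep e | { n=0; while read -r name; do n=$((n + 1)); done; echo "$n"; })
echo "count is $count"   # prints "count is 2"
```

Capturing the printed result with command substitution avoids relying on which end of the pipeline, if any, runs in the calling shell.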

Related

When piping in BASH, is it possible to get the PID of the left command from within the right command?

The Problem
Given a BASH pipeline:
./a.sh | ./b.sh
Suppose the PID of ./a.sh is 10.
Is there a way to find the PID of ./a.sh from within ./b.sh?
I.e. if there is, and if ./b.sh looks something like the below:
#!/bin/bash
...
echo $LEFT_PID
cat
Then the output of ./a.sh | ./b.sh would be:
10
... Followed by whatever else ./a.sh printed to stdout.
Background
I'm working on this bash script, named cachepoint, that I can place in a pipeline to speed things up.
E.g. cat big_data | sed 's/a/b/g' | uniq -c | cachepoint | sort -n
This is a purposefully simple example.
The pipeline may run slowly at first, but on subsequent runs, it will be quicker, as cachepoint starts doing the work.
The way I picture cachepoint working is that it would use the first few hundred lines of input, along with a list of commands before it, in order to form a hash ID for the previously cached data, thus breaking the stdin pipeline early on subsequent runs, resorting instead to printing the cached data. Cached data would get deleted every hour or so.
I.e. everything left of | cachepoint would continue running, perhaps to 1,000,000 lines, in normal circumstances, but on subsequent executions of cachepoint pipelines, everything left of | cachepoint would exit after maybe 100 lines, and cachepoint would simply print the millions of lines it has cached. For the hash of the pipe sources and pipe content, I need a way for cachepoint to read the PIDs of what came before it in the pipeline.
I use pipelines a lot for exploring data sets, and I often find myself piping to temporary files in order to bypass repeating the same costly pipeline more than once. This is messy, so I want cachepoint.

This Shellcheck-clean code should work for your b.sh program on any Linux system:
#! /bin/bash
shopt -s extglob
shopt -s nullglob
left_pid=
# Get the identifier for the pipe connected to the standard input of this
# process (e.g. 'pipe:[10294010]')
input_pipe_id=$(readlink "/proc/self/fd/0")
if [[ $input_pipe_id != pipe:* ]]; then
    echo 'ERROR: standard input is not a pipe' >&2
    exit 1
fi
# Find the process that has standard output connected to the same pipe
for stdout_path in /proc/+([[:digit:]])/fd/1; do
    output_pipe_id=$(readlink -- "$stdout_path")
    if [[ $output_pipe_id == "$input_pipe_id" ]]; then
        procpid=${stdout_path%/fd/*}
        left_pid=${procpid#/proc/}
        break
    fi
done
if [[ -z $left_pid ]]; then
    echo "ERROR: Failed to set 'left_pid'" >&2
    exit 1
fi
echo "$left_pid"
cat
It depends on the fact that, on Linux, for a process with id PID the path /proc/PID/fd/0 looks like a symlink to the device connected to the standard input of the process and /proc/PID/fd/1 looks like a symlink to the device connected to the standard output of the process.
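To see the /proc symlinks in isolation, here is a small Linux-only sketch (it assumes /proc is mounted and readlink is available). It shows that the writer's standard output and the reader's standard input resolve to the same pipe:[...] identifier, which is what the loop above matches on. The variable names are made up for the illustration.

```shell
#!/bin/sh
# Linux-only sketch: both ends of one pipe resolve to the same pipe:[inode] name.
# The left-hand readlink prints the target of its own stdout (the pipe) into
# the pipe; the right-hand side reads that line, then asks what its own stdin is.
ends=$(
  readlink /proc/self/fd/1 | {
    read -r left_end
    right_end=$(readlink /proc/self/fd/0)
    echo "$left_end $right_end"
  }
)
left_id=${ends% *}    # identifier seen by the writer
right_id=${ends#* }   # identifier seen by the reader
echo "writer stdout: $left_id"
echo "reader stdin:  $right_id"
```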

Why does "pgrep -f bash" emit two numbers instead of one?

When I run this script in shell:
printf "Current bash PID is `pgrep -f bash`\n"
using this command:
$ bash script.sh
I get back this output:
Current bash PID is 5430
24390
Every time I run it, I get a different number:
Current bash PID is 5430
24415
Where is the second line coming from?

When you use backticks (or the more modern $(...) syntax for command substitution), you create a subshell. That's a fork()ed-off, independent copy of the shell process which has its own PID, so pgrep finds two separate copies of the shell. (Moreover, pgrep can be finding copies of bash running on the system completely unrelated to the script at hand).
If you want to find the PID of the current copy of bash, you can just look it up directly (printf is better practice than echo when contents can contain backslashes or if the behavior of echo -n or the nonstandard bash extension echo -e is needed, but neither of those things is the case here, so echo is fine):
echo "Current bash PID is $$"
Note that even when executed in a subshell, $$ expands to the PID of the parent shell. With bash 4.0 or newer, you can use $BASHPID to look up the current PID even in a subshell.
See the related question Bash - Two processes for one script
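A short sketch of the difference between $$ and $BASHPID, assuming bash 4.0 or newer. The variable names are made up for the illustration.

```shell
#!/bin/bash
main_pid=$$
# Command substitution runs its body in a subshell.
sub_dollar=$(echo "$$")         # still the PID of the main shell
sub_bashpid=$(echo "$BASHPID")  # PID of the subshell itself
echo "main shell:            $main_pid"
echo "\$\$ in subshell:        $sub_dollar"
echo "\$BASHPID in subshell:   $sub_bashpid"
```

The first two numbers match, while the third differs, because only $BASHPID reflects the forked copy.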

How to make echo compatible with read in bash?

I tried this:
qs@BF:~$ echo aaa | read c
qs@BF:~$ echo $c
It prints nothing, which means $c is empty.
But why does the following work?
qs@BF:~$ cat trim.hs | read cc
qs@BF:~$ echo $cc
import qualified Data.Text as T
It correctly prints the first line of trim.hs.
There seems to be an exception when echo is piped into read.
Am I right? Could you help me make echo compatible with read?

Neither of these "work"
echo aaa | read c
cat trim.hs | read cc
In bash, commands in a pipeline are all executed in subshells. So the read command sets the variable c in a subshell, but then that subshell exits and its environment disappears.
To demonstrate, let's query the value of $c in the subshell using a grouping construct:
unset c
echo 123 | { read c; echo in subshell: ">$c<"; }
echo in parent: ">$c<"
outputs
in subshell: >123<
in parent: ><
bash does have a setting to allow the last command in a pipeline to run in the current shell:
set +m # job control must be disabled
shopt -s lastpipe # enable the option
unset d
echo 456 | read d
echo ">$d<"
>456<
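In a script, job control is normally already off, so shopt -s lastpipe alone should be enough. The following sketch assumes bash 4.2 or newer is installed as bash; each bash -c starts a fresh non-interactive shell so the two cases can be compared side by side.

```shell
#!/bin/sh
# Default behavior: read runs in a subshell, so d is lost.
without=$(bash -c 'unset d; echo 456 | read d; echo ">$d<"')
# With lastpipe: read runs in the bash process itself, so d survives.
with=$(bash -c 'unset d; shopt -s lastpipe; echo 456 | read d; echo ">$d<"')
echo "without lastpipe: $without"   # prints "without lastpipe: ><"
echo "with lastpipe:    $with"      # prints "with lastpipe:    >456<"
```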

I think the underlying problem here is the subshell that read runs in. Values set there won't (always) propagate back to the invoking shell.
The POSIX specification of read states that a read performed in a subshell is not visible to the caller:
If it is called in a subshell or separate utility execution environment, such as one of the following:
(read foo)
nohup read ...
find . -exec read ... \;
it shall not affect the shell variables in the caller's environment.
And as noted in these shell tips:
POSIX allows any or all commands in a pipeline to be run in subshells, and which command (if any) runs in the main shell varies greatly between implementations — in particular Bash and ksh differ here. The standard idiom for overcoming this problem is to use a here document:
IFS= read var << EOF
$(foo)
EOF
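Applied to the echo aaa case from the question, the here-document idiom looks like this. It is a minimal sketch that should behave the same in any POSIX shell, since no pipeline is involved and read runs in the current shell.

```shell
#!/bin/sh
# The command substitution is expanded first; read then consumes the
# here-document in the current shell, so c is still set afterwards.
IFS= read -r c <<EOF
$(echo aaa)
EOF
echo ">$c<"   # prints ">aaa<"
```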

Why does bash behave differently, when it is called as sh?

I have an Ubuntu machine with the default shell set to bash, and both bash and sh are in $PATH:
$ which bash
/bin/bash
$ which sh
/bin/sh
$ ll /bin/sh
lrwxrwxrwx 1 root root 4 Mar 6 2013 /bin/sh -> bash*
But when I try to call a script that uses the inline file descriptor (which only bash can handle, not sh), the calls behave differently:
$ . ./inline-pipe
reached
$ bash ./inline-pipe
reached
$ sh ./inline-pipe
./inline-pipe: line 6: syntax error near unexpected token `<'
./inline-pipe: line 6: `done < <(echo "reached")'
The example script I am referring to looks like this:
#!/bin/sh
while read line; do
if [[ "$line" == "reached" ]]; then echo "reached"; fi
done < <(echo "reached")
The real one is a little longer:
#!/bin/sh
declare -A elements
while read line
do
for ele in $(echo $line | grep -o "[a-z]*:[^ ]*")
do
id=$(echo $ele | cut -d ":" -f 1)
elements["$id"]=$(echo $ele | cut -d ":" -f 2)
done
done < <(adb devices -l)
echo ${elements[*]}

When bash is invoked as sh, it (mostly) restricts itself to features found in the POSIX standard. Process substitution is not one of those features, hence the error.
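If the script has to keep its #!/bin/sh line, one option is to avoid the bash-only constructs. The following is a sketch of the small example script rewritten with POSIX features only: [ ... ] instead of [[ ... ]], and a here-document instead of process substitution. The result variable is added here purely for illustration.

```shell
#!/bin/sh
result=
# The loop reads from a here-document, so it runs in the current shell
# and needs neither a pipeline nor <( ... ).
while read -r line; do
  if [ "$line" = "reached" ]; then
    result=$line
    echo "reached"
  fi
done <<EOF
$(echo "reached")
EOF
```

The other option is simply to change the first line to #!/bin/bash and run the script with bash; the longer script, which uses declare -A, would need that anyway.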

In theory, this is a feature of bash: when invoked as "sh", it switches off most of its extensions by default. And the root shell is "/bin/sh" by default.
Its primary goal is security. Its secondary goal is to provide some level of compatibility between the shells on the system, because this lets system scripts run in an alternative (faster? more secure?) environment.
That is the theory.
In practice, every development team has people who want to reduce and eliminate everything, citing various arguments (security, simplicity, safety, stability), and somehow these arguments always lead toward removal and deletion.
This is why the bash in Debian doesn't have network sockets, why Debian wasn't able in 20 years to properly integrate the best compressors (bz2, xz), and why the root shell is by default as primitive as that of a PDP-11 in the eighties.

I believe sh on ubuntu is actually dash which is smaller than bash with fewer features.

How do I know if I'm running a nested shell?

When using a *nix shell (usually bash), I often spawn a sub-shell with which I can take care of a small task (usually in another directory), then exit out of to resume the session of the parent shell.
Once in a while, I'll lose track of whether I'm running a nested shell, or in my top-level shell, and I'll accidentally spawn an additional sub-shell or exit out of the top-level shell by mistake.
Is there a simple way to determine whether I'm running in a nested shell? Or am I going about my problem (by spawning sub-shells) in a completely wrong way?

The $SHLVL variable tracks your shell nesting level:
$ echo $SHLVL
1
$ bash
$ echo $SHLVL
2
$ exit
$ echo $SHLVL
1
As an alternative to spawning sub-shells you could push and pop directories from the stack and stay in the same shell:
[root@localhost /old/dir]# pushd /new/dir
/new/dir /old/dir
[root@localhost /new/dir]# popd
/old/dir
[root@localhost /old/dir]#

Here is a simplified version of part of my prompt:
PS1='$(((SHLVL>1))&&echo $SHLVL)\$ '
If I'm not in a nested shell, it doesn't add anything extra, but it shows the depth if I'm in any level of nesting.

Look at $0: if it starts with a minus -, you're in the login shell.
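As a sketch, that check can be wrapped in a small function. The name is_login_name is hypothetical; it takes the value to test as an argument so it can be tried with sample values as well as with "$0".

```shell
#!/bin/sh
# is_login_name is a hypothetical helper: it succeeds when the given
# name starts with a minus sign, as $0 does in a login shell.
is_login_name() {
  case $1 in
    -*) return 0 ;;
    *)  return 1 ;;
  esac
}

if is_login_name "$0"; then
  echo "login shell"
else
  echo "not a login shell"
fi
```

Run from a script, $0 is the script name, so this reports "not a login shell"; typed at the prompt of a login shell, where $0 is something like -bash, it reports "login shell".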

pstree -s $$ is quite useful to see your depth.

The environment variable $SHLVL contains the shell "depth".
echo $SHLVL
The shell depth can also be determined using pstree (version 23 and above):
pstree -s $$ | grep sh- -o | wc -l
I've found the second way to be more robust than the first whose value was reset when using sudo or became unreliable with env -i.
None of them can correctly deal with su.
The information can be made available in your prompt:
PS1='\u@\h/${SHLVL} \w \$ '
PS1='\u@\h/$(pstree -s $$ | grep sh- -o | tail -n +2 | wc -l) \w \$ '
The | tail -n +2 is there to remove one line from the grep output. Since we are using a pipeline inside a "$(...)" command substitution, the shell needs to invoke a sub-shell, so pstree reports it and grep detects one more sh- level.
In debian-based distributions, pstree is part of the package psmisc. It might not be installed by default on non-desktop distributions.

As @John Kugelman says, echo $SHLVL will tell you the bash shell depth.
And as @Dennis Williamson shows, you can edit your prompt via the PS1 variable to get it to print this value.
I prefer that it always prints the shell depth value, so here's what I've done: edit your "~/.bashrc" file:
gedit ~/.bashrc
and add the following line to the end:
export PS1='\$SHLVL'":$SHLVL\n$PS1"
Now you will always see a printout of your current bash level just above your prompt. Ex: here you can see I am at a bash level (depth) of 2, as indicated by the $SHLVL:2:
$SHLVL:2
7510-gabriels ~ $
Now, watch the prompt as I go down into some bash levels via the bash command, then come back up via exit. Here you see my commands and prompt (response), starting at level 2 and going down to 5, then coming back up to level 2:
$SHLVL:2
7510-gabriels ~ $ bash
$SHLVL:3
7510-gabriels ~ $ bash
$SHLVL:4
7510-gabriels ~ $ bash
$SHLVL:5
7510-gabriels ~ $ exit
exit
$SHLVL:4
7510-gabriels ~ $ exit
exit
$SHLVL:3
7510-gabriels ~ $ exit
exit
$SHLVL:2
7510-gabriels ~ $
Bonus: always show your current git branch in your terminal too!
Make your prompt also show the git branch you are working on by using the following in your "~/.bashrc" file instead:
git_show_branch() {
    __gsb_BRANCH=$(git symbolic-ref -q --short HEAD 2>/dev/null)
    if [ -n "$__gsb_BRANCH" ]; then
        echo "$__gsb_BRANCH"
    fi
}
export PS1="\e[7m\$(git_show_branch)\e[m\n\h \w $ "
export PS1='\$SHLVL'":$SHLVL $PS1"
Source: I have no idea where git_show_branch() originally comes from, but I got it from Jason McMullan on 5 Apr. 2018. I then added the $SHLVL part shown above just last week.
Sample output:
$SHLVL:2 master
7510-gabriels ~/GS/dev/temp $
And here's a screenshot showing it in all its glory. Notice the git branch name, master, highlighted in white!
Update to the Bonus section
I've improved it again and put my ~/.bashrc file on github here. Here's a sample output of the new terminal prompt. Notice how it shows the shell level as 1, and it shows the branch name of the currently-checked-out branch (master in this case) whenever I'm inside a local git repo!:
Cross-referenced:
Output of git branch in tree like fashion

ptree $$ will also show you how many levels deep you are

If you are running inside a sub-shell, the following code will yield 2:
ps | fgrep bash | wc -l
Otherwise, it will yield 1.
EDIT: OK, this is not a very robust approach, as was pointed out in the comments :)
Another thing to try is
ps -ef | awk '{print $2, " ", $8;}' | fgrep $PPID
which will yield 'bash' if you are in a sub-shell.
