I have the following script, created by a self-proclaimed bash expert:
SCRIPT_LOCATION="$(readlink -f $0)"
SCRIPT_DIRECTORY="$(dirname ${SCRIPT_LOCATION})"
export PYTHONPATH="${PYTHONPATH}:${SCRIPT_DIRECTORY}/util"
That runs nicely on my local Ubuntu 16.04. Now I wanted to use it on our RH 7.2 servers, and there I got an error message from readlink about being called with bad parameters.
Then I figured out that on Ubuntu, $0 gives "bash", whereas on RH it gives "-bash".
EDIT: script is invoked as . ourscript.sh
Questions:
Any idea why that is?
When I change my script to use a hardcoded readlink -f bash, the whole thing works. Are there "better" ways of fixing this?
Feel free to also explain what readlink -f bash is actually doing ;-)
As the script is sourced, readlink -f $0 is pointless: it will just show you the command used to run the shell you are currently using.
To explain the difference, let's look at the bash man page:
A login shell is one whose first character of argument zero is a -, or one started with the --login option.
When bash is invoked as an interactive login shell, or as a non-interactive shell with the --login option, it first reads and executes commands from the file /etc/profile, if that file exists. After reading that file, it looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order, and reads and executes commands from the first one that exists and is readable. The --noprofile option may be used when the shell is started to inhibit this behavior.
So, presumably, Ubuntu starts bash with the --noprofile option, i.e. not as a login shell.
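If you want to check which case you are in, bash can tell you directly; a minimal check (bash-specific, not part of the original script) would be:

echo "$0"                                    # "-bash" for a login shell, "bash" or a path otherwise
shopt -q login_shell && echo login || echo non-login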
As for readlink, we can again look at the man page
-f, --canonicalize
canonicalize by following every symlink in every component of the given name recursively; all but the last component must exist
Therefore it follows symlinks to the base.
Using readlink -f with any non-qualified path will just append the last argument to your current working directory, which will not actually show where the script is run from.
Try putting any random string after it instead of bash and you will see the script is unaffected.
e.g.
readlink -f dafsfdsf
Returns
/home/me/testscript/dafsfdsf
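As for the "better ways" part of the question: when a file is meant to be sourced, $0 is the wrong thing to inspect in the first place. A commonly suggested alternative (a sketch, and bash-specific, so it assumes the script is only ever sourced from bash) is to use BASH_SOURCE instead:

SCRIPT_LOCATION="$(readlink -f "${BASH_SOURCE[0]}")"
SCRIPT_DIRECTORY="$(dirname "${SCRIPT_LOCATION}")"
export PYTHONPATH="${PYTHONPATH}:${SCRIPT_DIRECTORY}/util"

${BASH_SOURCE[0]} refers to the file being sourced (or executed), not to the shell that sourced it, so readlink -f then resolves the real location of ourscript.sh.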
I created an alias in order not to write ls every time I move into a new directory:
alias cl='cd_(){ cd "$@" && ls; }; cd_'
Let us say I have a folder named "Downloads" (which of course I happen to have) so I just type the following in the terminal:
cl Downloads
Now I will find myself in the "Downloads" folder and receive a list of the stuff I have in the folder, like say: example.txt, hack.hs, picture.jpg,...
If I want to move to a directory and look if there is, say, hack.hs I could try something like this:
cl Downloads | grep hack
What I get is just the output:
hack.hs
But I will remain in the folder I was in (which means I am not in Downloads).
I understand this happens because every command is executed in a subshell, and thus cd Downloads && ls is executed in a subshell of its own and then the output (namely the list of stuff I have) gets redirected via the pipe to grep. This is why I then am not in the new folder.
My question is the following:
How do I do it in order to be able to write something like "cl Downloads | grep hack" and get the "hack"-greped list of stuff AND be in the Downloads folder?
Thank you very much,
Pol
For anyone ever googling this:
A quick fix was proposed by @gniourf_gniourf:
cl Downloads > >(grep hack)
Some marked this question as a possible duplicate of Make bash alias that takes parameters, but the fact that my bash alias already takes arguments shows that this is not the case. The problem at hand was about how to execute a command in the current shell while at the same time redirecting the output to another command.
As you're aware (and as is covered in BashFAQ #24), the reason
{ cd "$#" && ls; } | grep ...
...prevents the results of cd being visible in the outer shell is that no component of a pipeline is guaranteed by POSIX to be run in the outer shell. (Some shells, including ksh [out-of-the-box] and very modern bash with non-default options enabled, will occasionally or optionally run the last piece of a pipeline in the parent shell, but this can't portably be relied on).
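For the record, the bash case mentioned there is the lastpipe option (bash 4.2 and later, and it only takes effect when job control is off, as in a non-interactive script); a minimal sketch:

#!/bin/bash
shopt -s lastpipe            # run the last element of a pipeline in the current shell
echo hello | read var        # without lastpipe, read would run in a subshell
echo "$var"                  # prints "hello"

Note that this would not rescue the cd here anyway, since the cd sits in the first element of the pipeline, not the last.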
A way to avoid this, that's applicable to all POSIX shells, is to direct output to a named pipe, thus avoiding setting up a pipeline:
mkfifo mypipe
grep ... <mypipe &
{ cd "$#" && ls; } >mypipe
In modern ksh and bash, there's a shorter syntax that will do this for you -- using /dev/fd entries instead of setting up a named pipe if the operating system provides that facility:
{ cd "$#" && ls; } > >(grep ...)
In this case, >(grep ...) is replaced with a filename that points to either a FIFO or a /dev/fd entry that, when written to by the process in question, redirects output to grep -- but without a pipeline.
By the way -- I really do hope your use of ls in this manner is as an example. The output of ls is not well-specified for the range of all possible filenames, so grepping it is innately unreliable. Consider using printf '%s\0' * to emit a NUL-delimited list of non-hidden names in a directory, if you really do want to build a streamed result; or using glob expressions to check for files matching a specific pattern (BashFAQ #4 covers a similar scenario); extglobs are available if you need something closer to full regex matching support than POSIX patterns support.
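For instance, if the real goal is just "go there and see whether something matching hack* exists", a glob-based sketch (using bash's compgen; the hack* pattern is only an illustration) avoids parsing ls entirely:

cl() { cd "$@" && ls; }
cl Downloads
if compgen -G 'hack*' >/dev/null; then
    echo "found a file matching hack*"
fi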
I'm trying to write a shell script that, when run, will set some environment variables that will stay set in the caller's shell.
setenv FOO foo
in csh/tcsh, or
export FOO=foo
in sh/bash only set it during the script's execution.
I already know that
source myscript
will run the commands of the script rather than launching a new shell, and that can result in setting the "caller's" environment.
But here's the rub:
I want this script to be callable from either bash or csh. In other words, I want users of either shell to be able to run my script and have their shell's environment changed. So 'source' won't work for me, since a user running csh can't source a bash script, and a user running bash can't source a csh script.
Is there any reasonable solution that doesn't involve having to write and maintain TWO versions of the script?
Use the "dot space script" calling syntax. For example, here's how to do it using the full path to a script:
. /path/to/set_env_vars.sh
And here's how to do it if you're in the same directory as the script:
. set_env_vars.sh
These execute the script under the current shell instead of loading another one (which is what would happen if you did ./set_env_vars.sh). Because it runs in the same shell, the environmental variables you set will be available when it exits.
This is the same thing as calling source set_env_vars.sh, but it's shorter to type and might work in some places where source doesn't.
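For completeness, set_env_vars.sh is assumed here to be nothing more exotic than a plain list of exports, for example (the values are made up):

# set_env_vars.sh -- only meaningful when run with . or source
export FOO=foo
export BAR=bar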
Your shell process has a copy of the parent's environment and no access to the parent process's environment whatsoever. When your shell process terminates, any changes you've made to its environment are lost. Sourcing a script file is the most commonly used method for configuring a shell environment; you may just want to bite the bullet and maintain one for each of the two flavors of shell.
You're not going to be able to modify the caller's shell because it's in a different process context. When child processes inherit your shell's variables, they're inheriting copies themselves.
One thing you can do is to write a script that emits the correct commands for tcsh or sh based on how it's invoked. If your script is "setit", then do:
ln -s setit setit-sh
and
ln -s setit setit-csh
Now either directly or in an alias, you do this from sh
eval `setit-sh`
or this from csh
eval `setit-csh`
setit uses $0 to determine its output style.
This is reminiscent of how people used to get the TERM environment variable set.
The advantage here is that setit is just written in whichever shell you like as in:
#!/bin/bash
arg0=$0
arg0=${arg0##*/}
for nv in \
    NAME1=VALUE1 \
    NAME2=VALUE2
do
    if [ x$arg0 = xsetit-sh ]; then
        echo 'export '$nv' ;'
    elif [ x$arg0 = xsetit-csh ]; then
        echo 'setenv '${nv%%=*}' '${nv##*=}' ;'
    fi
done
With the symbolic links given above, and the eval of the backquoted expression, this has the desired result.
To simplify invocation for csh, tcsh, or similar shells:
alias dosetit 'eval `setit-csh`'
or for sh, bash, and the like:
alias dosetit='eval `setit-sh`'
One nice thing about this is that you only have to maintain the list in one place.
In theory you could even stick the list in a file and put cat nvpairfilename between "in" and "do".
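That file-driven variant might look like the following sketch, assuming nvpairfilename holds one NAME=VALUE pair per line with no whitespace in the values:

#!/bin/bash
arg0=${0##*/}
for nv in $(cat nvpairfilename)
do
    if [ x$arg0 = xsetit-sh ]; then
        echo 'export '$nv' ;'
    elif [ x$arg0 = xsetit-csh ]; then
        echo 'setenv '${nv%%=*}' '${nv##*=}' ;'
    fi
done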
This is pretty much how login shell terminal settings used to be done: a script would output statements to be executed in the login shell. An alias would generally be used to make invocation simple, as in "tset vt100". As mentioned in another answer, there is also similar functionality in the INN UseNet news server.
In my .bash_profile I have :
# No Proxy
function noproxy
{
    /usr/local/sbin/noproxy    # turn off proxy server
    unset http_proxy HTTP_PROXY https_proxy HTTPS_PROXY
}

# Proxy
function setproxy
{
    sh /usr/local/sbin/proxyon    # turn on proxy server
    http_proxy=http://127.0.0.1:8118/
    HTTP_PROXY=$http_proxy
    https_proxy=$http_proxy
    HTTPS_PROXY=$https_proxy
    export http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
}
So when I want to disable the proxy, the functions run in the login shell and set the variables as expected and wanted.
It's "kind of" possible through using gdb and setenv(3), although I have a hard time recommending actually doing this. (Additionally, i.e. the most recent ubuntu won't actually let you do this without telling the kernel to be more permissive about ptrace, and the same may go for other distros as well).
$ cat setfoo
#! /bin/bash
gdb /proc/${PPID}/exe ${PPID} <<END >/dev/null
call setenv("foo", "bar", 0)
END
$ echo $foo
$ ./setfoo
$ echo $foo
bar
This works — it isn't what I'd use, but it 'works'. Let's create a script teredo to set the environment variable TEREDO_WORMS:
#!/bin/ksh
export TEREDO_WORMS=ukelele
exec $SHELL -i
It is interpreted by the Korn shell, exports the environment variable, and then replaces itself with a new interactive shell.
Before running this script, we have SHELL set in the environment to the C shell, and the environment variable TEREDO_WORMS is not set:
% env | grep SHELL
SHELL=/bin/csh
% env | grep TEREDO
%
When the script is run, you are in a new shell, another interactive C shell, but the environment variable is set:
% teredo
% env | grep TEREDO
TEREDO_WORMS=ukelele
%
When you exit from this shell, the original shell takes over:
% exit
% env | grep TEREDO
%
The environment variable is not set in the original shell's environment. If you use exec teredo to run the command, then the original interactive shell is replaced by the Korn shell that sets the environment, and then that in turn is replaced by a new interactive C shell:
% exec teredo
% env | grep TEREDO
TEREDO_WORMS=ukelele
%
If you type exit (or Control-D), then your shell exits, probably logging you out of that window, or taking you back to the previous level of shell from where the experiments started.
The same mechanism works for Bash or Korn shell. You may find that the prompt after the exit commands appears in funny places.
Note the discussion in the comments. This is not a solution I would recommend, but it does achieve the stated purpose of a single script to set the environment that works with all shells (that accept the -i option to make an interactive shell). You could also add "$@" after the option to relay any other arguments, which might then make the shell usable as a general 'set environment and execute command' tool. You might want to omit the -i if there are other arguments, leading to:
#!/bin/ksh
export TEREDO_WORMS=ukelele
exec $SHELL "${@-'-i'}"
The "${@-'-i'}" bit means 'if the argument list contains at least one argument, use the original argument list; otherwise, substitute -i for the non-existent arguments'.
You should use modules; see http://modules.sourceforge.net/
EDIT: The modules package has not been updated since 2012 but still works OK for the basics. All the new features, bells and whistles happen in lmod these days (which I like more): https://www.tacc.utexas.edu/research-development/tacc-projects/lmod
Another workaround that I don't see mentioned is to write the variable value to a file.
I ran into a very similar issue where I wanted to be able to re-run the last test I had set (instead of all my tests). My first plan was to write one command for setting the env variable TESTCASE, and then have another command that would use this to run the test. Needless to say, I had the exact same issue as you did.
But then I came up with this simple hack:
First command (testset):
#!/bin/bash
if [ $# -eq 1 ]
then
    echo $1 > ~/.TESTCASE
    echo "TESTCASE has been set to: $1"
else
    echo "Come again?"
fi
Second command (testrun):
#!/bin/bash
TESTCASE=$(cat ~/.TESTCASE)
drush test-run $TESTCASE
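Usage then looks roughly like this (the test name is made up, and testrun's own output depends on drush):

$ testset MyModuleTestCase
TESTCASE has been set to: MyModuleTestCase
$ testrun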
You can instruct the child process to print its environment variables (by calling "env"), then loop over the printed environment variables in the parent process and call "export" on those variables.
The following code is based on Capturing output of find . -print0 into a bash array
If the parent shell is bash, you can use
while IFS= read -r -d $'\0' line; do
    export "$line"
done < <(bash -s <<< 'export VARNAME=something; env -0')
echo $VARNAME
If the parent shell is dash, then read does not provide the -d flag and the code gets more complicated:
TMPDIR=$(mktemp -d)
mkfifo $TMPDIR/fifo
(bash -s << "EOF"
export VARNAME=something
while IFS= read -r -d $'\0' line; do
echo $(printf '%q' "$line")
done < <(env -0)
EOF
) > $TMPDIR/fifo &
while read -r line; do export "$(eval echo $line)"; done < $TMPDIR/fifo
rm -r $TMPDIR
echo $VARNAME
Under OS X bash you can do the following:
Create the bash script file to unset the variable
#!/bin/bash
unset http_proxy
Make the file executable
sudo chmod 744 unsetvar
Create alias
alias unsetvar='source /your/path/to/the/script/unsetvar'
It should be ready to use as long as you have the folder containing your script file appended to the PATH.
It's not what I would call outstanding, but this also works if you need to call the script from the shell anyway. It's not a good solution, but for a single static environment variable, it works well enough.
1.) Create a script with a condition that exits either 0 (Successful) or 1 (Not successful)
if [[ $foo == "True" ]]; then
exit 0
else
exit 1
2.) Create an alias that is dependent on the exit code.
alias setmyvar='myscript.sh && export MyVariable'
You call the alias, which calls the script, which evaluates the condition, which is required to exit zero via the '&&' in order to set the environment variable in the parent shell.
This is flotsam, but it can be useful in a pinch.
You can invoke another Bash instance with a different bash_profile. Also, you can create a special bash_profile for use in a multi-bash-profile environment.
Remember that you can use functions inside bash_profile, and those functions will be available globally.
For example, function user { export USER_NAME=$1; } can set a variable at runtime, e.g.: user olegchir && env | grep olegchir
Another option is to use "Environment Modules" (http://modules.sourceforge.net/). This unfortunately introduces a third language into the mix. You define the environment with the language of Tcl, but there are a few handy commands for typical modifications (prepend vs. append vs set). You will also need to have environment modules installed. You can then use module load *XXX* to name the environment you want. The module command is basically a fancy alias for the eval mechanism described above by Thomas Kammeyer. The main advantage here is that you can maintain the environment in one language and rely on "Environment Modules" to translate it to sh, ksh, bash, csh, tcsh, zsh, python (?!?!!), etc.
I created a solution using pipes, eval and signal.
parent() {
    if [ -z "$G_EVAL_FD" ]; then
        die 1 "Run parent_setup in the parent process first"
    fi
    if [ $(ppid) = "$$" ]; then
        "$@"
    else
        kill -SIGUSR1 $$
        echo "$@" >&$G_EVAL_FD
    fi
}

parent_setup() {
    G_EVAL_FD=99
    tempfile=$(mktemp -u)
    mkfifo "$tempfile"
    eval "exec $G_EVAL_FD<>'$tempfile'"
    rm -f "$tempfile"
    trap "read CMD <&$G_EVAL_FD; eval \"\$CMD\"" USR1
}
parent_setup              # run this in the parent shell context

( A=1 ); echo $A          # prints nothing
( parent A=1 ); echo $A   # prints 1
It might work with any command.
I don't see any answer documenting how to work around this problem with cooperating processes. A common pattern with things like ssh-agent is to have the child process print an expression which the parent can eval.
bash$ eval $(ssh-agent)
For example, ssh-agent has options to select Csh or Bourne-compatible output syntax.
bash$ ssh-agent
SSH2_AUTH_SOCK=/tmp/ssh-era/ssh2-10690-agent; export SSH2_AUTH_SOCK;
SSH2_AGENT_PID=10691; export SSH2_AGENT_PID;
echo Agent pid 10691;
(This causes the agent to start running, but doesn't allow you to actually use it, unless you now copy-paste this output to your shell prompt.) Compare:
bash$ ssh-agent -c
setenv SSH2_AUTH_SOCK /tmp/ssh-era/ssh2-10751-agent;
setenv SSH2_AGENT_PID 10752;
echo Agent pid 10752;
(As you can see, csh and tcsh use setenv to set variables.)
Your own program can do this, too.
bash$ foo=$(makefoo)
Your makefoo script would simply calculate and print the value, and let the caller do whatever they want with it -- assigning it to a variable is a common use case, but probably not something you want to hard-code into the tool which produces the value.
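A makefoo along those lines is tiny; this is only a sketch, and the value it computes here is made up:

#!/bin/sh
# makefoo: compute the value and print it -- nothing else
printf '%s\n' "foo-$(hostname)"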
Technically, that is correct -- only 'eval' doesn't fork another shell. However, from the point of view of the application you're trying to run in the modified environment, the difference is nil: the child inherits the environment of its parent, so the (modified) environment is conveyed to all descending processes.
Ipso facto, the changed environment variable 'sticks' -- as long as you are running under the parent program/shell.
If it is absolutely necessary for the environment variable to remain after the parent (Perl or shell) has exited, it is necessary for the parent shell to do the heavy lifting. One method I've seen in the documentation is for the current script to spawn an executable file with the necessary 'export' language, and then trick the parent shell into executing it -- always being cognizant of the fact that you need to preface the command with 'source' if you're trying to leave a non-volatile version of the modified environment behind. A Kluge at best.
The second method is to modify the script that initiates the shell environment (.bashrc or whatever) to contain the modified parameter. This can be dangerous -- if you hose up the initialization script it may make your shell unavailable the next time it tries to launch. There are plenty of tools for modifying the current shell; by affixing the necessary tweaks to the 'launcher' you effectively push those changes forward as well.
Generally not a good idea; if you only need the environment changes for a particular application suite, you'll have to go back and return the shell launch script to its pristine state (using vi or whatever) afterwards.
In short, there are no good (and easy) methods. Presumably this was made difficult to ensure the security of the system was not irrevocably compromised.
The short answer is no, you cannot alter the environment of the parent process, but it seems like what you want is an environment with custom environment variables and the shell that the user has chosen.
So why not simply something like
#!/usr/bin/env bash
FOO=foo $SHELL
Then when you are done with the environment, just exit.
You could always use aliases
alias your_env='source ~/scripts/your_env.sh'
I did this many years ago. If I remember correctly, I included an alias in each of .bashrc and .cshrc, with parameters, aliasing the respective forms of setting the environment to a common form.
Then the script that you source in either of the two shells contains a command in that last, common form, which is suitably aliased in each shell.
If I find the concrete aliases, I will post them.
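From memory, the shape of it is something like this (the names are hypothetical; the .cshrc counterpart would be an alias along the lines of alias setvar 'setenv \!:1 \!:2'):

# in .bashrc
setvar() { export "$1=$2"; }

# in the script that gets sourced from either shell
setvar FOO foo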
Other than writing conditionals depending on what $SHELL/$TERM is set to, no. What's wrong with using Perl? It's pretty ubiquitous (I can't think of a single UNIX variant that doesn't have it), and it'll spare you the trouble.
From the Bash Reference Manual I get the following about the exec bash builtin command:
If command is supplied, it replaces the shell without creating a new process.
Now I have the following bash script:
#!/bin/bash
exec ls;
echo 123;
exit 0
Executing this, I got:
cleanup.sh ex1.bash file.bash file.bash~ output.log
(files from the current directory)
Now, if I have this script:
#!/bin/bash
exec ls | cat
echo 123
exit 0
I get the following output:
cleanup.sh
ex1.bash
file.bash
file.bash~
output.log
123
My question is:
If exec, when invoked, replaces the shell without creating a new process, why is echo 123 printed when I add | cat, but not without it? I would be happy if someone could explain the logic of this behavior.
Thanks.
EDIT:
After @torek's response, I get an even harder to explain behavior:
1. exec ls>out creates the out file and puts the ls command's output in it;
2. exec ls>out1 ls>out2 creates only the files, but does not put any output inside them. If the command works as suggested, I think command number 2 should have the same result as command number 1 (even more, I think it should not have created the out2 file).
In this particular case, you have the exec in a pipeline. In order to execute a series of pipeline commands, the shell must initially fork, making a sub-shell. (Specifically it has to create the pipe, then fork, so that everything run "on the left" of the pipe can have its output sent to whatever is "on the right" of the pipe.)
To see that this is in fact what is happening, compare:
{ ls; echo this too; } | cat
with:
{ exec ls; echo this too; } | cat
The former runs ls without leaving the sub-shell, so that this sub-shell is therefore still around to run the echo. The latter runs ls by leaving the sub-shell, which is therefore no longer there to do the echo, and this too is not printed.
(The use of curly-braces { cmd1; cmd2; } normally suppresses the sub-shell fork action that you get with parentheses (cmd1; cmd2), but in the case of a pipe, the fork is "forced", as it were.)
Redirection of the current shell happens only if there is "nothing to run", as it were, after the word exec. Thus, e.g., exec >stdout 4<input 5>>append modifies the current shell, but exec foo >stdout 4<input 5>>append tries to exec command foo. [Note: this is not strictly accurate; see addendum.]
Interestingly, in an interactive shell, after exec foo >output fails because there is no command foo, the shell sticks around, but stdout remains redirected to file output. (You can recover with exec >/dev/tty. In a script, the failure to exec foo terminates the script.)
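A transcript of that recovery, roughly (the file name is arbitrary):

$ exec foo >output
bash: exec: foo: not found
$ echo hello            # nothing appears on the terminal; it went into the file
$ exec >/dev/tty        # stdout is back on the terminal
$ cat output
hello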
With a tip of the hat to @Pumbaa80, here's something even more illustrative:
#! /bin/bash
shopt -s execfail
exec ls | cat -E
echo this goes to stdout
echo this goes to stderr 1>&2
(note: cat -E is simplified down from my usual cat -vET, which is my handy go-to for "let me see non-printing characters in a recognizable way"). When this script is run, the output from ls has cat -E applied (on Linux this makes end-of-line visible as a $ sign), but the output sent to stdout and stderr (on the remaining two lines) is not redirected. Change the | cat -E to > out and, after the script runs, observe the contents of file out: the final two echos are not in there.
Now change the ls to foo (or some other command that will not be found) and run the script again. This time the output is:
$ ./demo.sh
./demo.sh: line 3: exec: foo: not found
this goes to stderr
and the file out now has the contents produced by the first echo line.
This makes what exec "really does" as obvious as possible (but no more obvious, as Albert Einstein did not put it :-) ).
Normally, when the shell goes to execute a "simple command" (see the manual page for the precise definition, but this specifically excludes the commands in a "pipeline"), it prepares any I/O redirection operations specified with <, >, and so on by opening the files needed. Then the shell invokes fork (or some equivalent but more-efficient variant like vfork or clone depending on underlying OS, configuration, etc), and, in the child process, rearranges the open file descriptors (using dup2 calls or equivalent) to achieve the desired final arrangements: > out moves the open descriptor to fd 1—stdout—while 6> out moves the open descriptor to fd 6.
If you specify the exec keyword, though, the shell suppresses the fork step. It does all the file opening and file-descriptor-rearranging as usual, but this time, it affects any and all subsequent commands. Finally, having done all the redirections, the shell attempts to execve() (in the system-call sense) the command, if there is one. If there is no command, or if the execve() call fails and the shell is supposed to continue running (is interactive or you have set execfail), the shell soldiers on. If the execve() succeeds, the shell no longer exists, having been replaced by the new command. If execfail is unset and the shell is not interactive, the shell exits.
(There's also the added complication of the command_not_found_handle shell function: bash's exec seems to suppress running it, based on test results. The exec keyword in general makes the shell not look at its own functions, i.e., if you have a shell function f, running f as a simple command runs the shell function, as does (f) which runs it in a sub-shell, but running (exec f) skips over it.)
As for why ls>out1 ls>out2 creates two files (with or without an exec), this is simple enough: the shell opens each redirection, and then uses dup2 to move the file descriptors. If you have two ordinary > redirects, the shell opens both, moves the first one to fd 1 (stdout), then moves the second one to fd 1 (stdout again), closing the first in the process. Finally, it runs ls ls, because that's what's left after removing the >out1 >out2. As long as there is no file named ls, the ls command complains to stderr, and writes nothing to stdout.
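You can watch that happen directly (assuming there is no file actually named ls in the current directory; the error message shown is GNU ls's wording):

$ rm -f out1 out2
$ ls >out1 ls >out2
ls: cannot access 'ls': No such file or directory
$ wc -c out1 out2       # both files exist and both are empty
0 out1
0 out2
0 total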
I have just started using Linux and I am curious how shell built-in commands such as cd are defined.
Also, I'd appreciate if someone could explain how they are implemented and executed.
If you want to see how bash builtins are defined then you just need to look at Section 4 of The Bash Man Page.
If, however, you want to know how bash bultins are implemented, you'll need to look at the Bash source code because these commands are compiled into the bash executable.
One fast and easy way to see whether or not a command is a bash builtin is to use the help command. Example, help cd will show you how the bash builtin of 'cd' is defined. Similarly for help echo.
The actual set of built-ins varies from shell to shell. There are:
Special built-in utilities, which must be built-in, because they have some special properties
Regular built-in utilities, which are almost always built in, for performance or other considerations
Any standard utility can also be built in if a shell implementer wishes.
You can find out whether the utility is built in using the type command, which is supported by most shells (although its output is not standardized). An example from dash:
$ type ls
ls is /bin/ls
$ type cd
cd is a shell builtin
$ type exit
exit is a special shell builtin
Re the cd utility: theoretically there's nothing preventing a shell implementer from implementing it as an external command. An external cd cannot change the shell's current directory directly, but, for instance, it could communicate the new directory to the shell process via a socket. But nobody does so because there's no point, except in very old shells (where there was no notion of built-ins), where cd used some dirty system hack to do its job.
How is cd implemented inside the shell? The basic algorithm is described here. It can also do some work to support the shell's extra features.
Manjari,
Check the source code of the bash shell from ftp://ftp.gnu.org/gnu/bash/bash-2.05b.tar.gz
You will find that the definition of the shell built-in commands is not in a separate binary executable but within the shell binary itself (the name "shell built-in" clearly suggests this).
Every Unix shell has at least some builtin commands. These builtin commands are part of the shell, and are implemented as part of the shell's source code. The shell recognizes that the command that it was asked to execute was one of its builtins, and it performs that action on its own, without calling out to a separate executable. Different shells have different builtins, though there will be a whole lot of overlap in the basic set.
Sometimes, builtins are builtin for performance reasons. In this case, there's often also a version of that command in $PATH (possibly with a different feature set, different set of recognized command line arguments, etc), but the shell decided to implement the command as a builtin as well so that it could save the work of spawning off a short-lived process to do some work that it could do itself. That's the case for bash and printf, for example:
$ type printf
printf is a shell builtin
$ which printf
/usr/bin/printf
$ printf
printf: usage: printf [-v var] format [arguments]
$ /usr/bin/printf
/usr/bin/printf: missing operand
Try `/usr/bin/printf --help' for more information.
Note that in the above example, printf is both a shell builtin (implemented as part of bash itself), as well as an external command (located at /usr/bin/printf). Note that they behave differently as well - when called with no arguments, the builtin version and the command version print different error messages. Note also the -v var option (store the results of this printf into a shell variable named var) can only be done as part of the shell - subprocesses like /usr/bin/printf have no access to the variables of the shell that executed them.
And that brings us to the 2nd part of the story: some commands are builtin because they need to be. Some commands, like chmod, are thin wrappers around system calls. When you run /bin/chmod 777 foo, the shell forks, execs /bin/chmod (passing "777" and "foo") as arguments, and the new chmod process runs the C code chmod("foo", 777); and then returns control to the shell. This wouldn't work for the cd command, though. Even though cd looks like the same case as chmod, it has to behave differently: if the shell spawned another process to execute the chdir system call, it would change the directory only for that newly spawned process, not the shell. Then, when the process returned, the shell would be left sitting in the same directory as it had been in all along - therefore cd needs to be implemented as a shell builtin.
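To see concretely why an external cd would be useless, consider a hypothetical external command saved as /tmp/extcd (the paths and directories here are only illustrative):

#!/bin/sh
cd "$1"     # changes the working directory of this child process only
pwd

Running it changes nothing for the caller:

$ chmod +x /tmp/extcd
$ cd /home/me
$ /tmp/extcd /
/
$ pwd
/home/me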
A Shell builtin -- http://linux.about.com/library/cmd/blcmdl1_builtin.htm
For example:
which cd
/usr/bin/which: no cd in (/usr/bin:/usr/local/bin......
So cd is a shell builtin, not a separate binary. Compare:
which ls
/bin/ls
ls is not a shell builtin but a binary.
http://ss64.com/bash/ will help you, and here is a shell scripting guide:
http://www.freeos.com/guides/lsst/