I'm trying to solve a problem that I'll describe below.
Here is the situation:
In a directory, I have a few scripts with some content (what they do doesn't matter):
example1.sh
example2.sh
example3.sh
...etc
Altogether there are 50 scripts
Some of these scripts contain the same function, for example
function foo1
{
echo "Hello"
}
and in some scripts a function with the same name has different or modified content, for example
function foo1
{
echo "$PWD"
}
or
function foo1
{
echo "Hello"
ls -la
}
I have to find the same function with the same name and the same content in these scripts
For example,
foo1 the same or modified content in example1.sh and example2.sh -> what I want
foo1 other content in example1.sh and example3.sh -> not interested
My question is what is the best idea to solve this problem? What do you think?
My idea was to sort content from all scripts and grep names of repeated functions. I managed to do that but still, it's not what I want because I have to check every file with this function and check its content... and it's a pain in the neck because for some functions there are 10 scripts...
I was wondering about extracting content from repeated functions but I don't know how to do it, what do you think? Or maybe you have some other suggestions?
Thank you in advance for your answer!
what is the best idea to solve this problem?
Write a shell language tokenizer and implement syntax parsing enough to extract function definitions from a file. Sources of shell implementations will be an inspiration. Then build a database of file->function+body and list all files with same function+body.
For simple enough functions, an awk or perl or python script would be enough to cover most cases. But the best would be full shell language tokenizer.
Do not use function name {. Instead use name() {. See bash obsolete and deprecated syntax.
With the following files:
# file1.sh
function foo1
{
echo "Hello"
}
# file2.sh
function foo1
{
echo "Hello"
}
# file3.sh
function foo1
{
echo "$PWD"
}
# file4.sh
function foo1
{
echo "$PWD"
}
The following script:
printf "%s\n" *.sh |
while IFS= read -r file; do
sed -zE '
s/(function[[:space:]]+([[:print:]]+)[[:space:]]*\{|(function[[:space:]]+)?([[:print:]]+)[[:space:]]*\([[:space:]]*\)[[:space:]]*\{)([^}]*)}/\x01\2\4\n\5\x02/g;
/\x01/!d;
s/[^\x01\x02]*\x01([^\x01\x02]*)\x02[^\x01\x02]*/\1\n\x00/g
' "$file" |
sed -z 's~^~'"$file"'\x01~';
done |
awk -v RS='\0' -v FS='\1' '
{cnt[$2]++; a[$2]=a[$2]" "$1}
END{ for (i in cnt) if (cnt[i] > 1) print a[i], i }
'
outputs:
file1.sh file2.sh foo1
echo "Hello"
file3.sh file4.sh foo1
echo "$PWD"
Indicating there is the same function foo1 in file1.sh and file2.sh and the same function foo1 in file3.sh and file4.sh.
Also note that a script can do the following, and some do:
if condition; then
func() { echo something; }
else
func() { echo something else; }
fi
A real tokenizer will have to also take that into account.
Create a message digest of the content of each function and use it as a key in an associative array. Add files that contain the same function digest to find duplicates.
You may want to normalize space in the function content and tweak the regex address range.
#!/usr/bin/env bash
# the 1st argument is the function name
func_name="$1"
func_pattern="^function $func_name[[:blank:]]*$"
shift
declare -A dupe_groups
while read -r func_dgst file; do # collect results in an associative array
dupe_groups[$func_dgst]+="$file "
done < <( # the remaining arguments are scripts
for f in "$@"; do
if grep --quiet "$func_pattern" "$f"; then
dgst=$( # use an address range in sed to print function contents
sed -n "/$func_pattern/,/^}/p" "$f" | \
# pipe to openssl to create a message digest
openssl dgst -sha1 )
echo "$dgst $f"
fi
done )
# print the results
for key in "${!dupe_groups[@]}"; do
echo "$key ${dupe_groups[$key]}"
done
I tested with your example{1..3}.sh files and added the following example4.sh to provide a duplicate function.
example4.sh
function foo1
{
echo "Hello"
ls -la
}
function another
{
echo "there"
}
To run
./group-func.sh foo1 example1.sh example2.sh example3.sh example4.sh
Results
155853f813e944a7fcc5ae73ee2d959e300d217a example1.sh
7848af9b8b9d48c5cb643f34b3e5ca26cb5bfbdd example2.sh
4771de27523a765bb0dbf070691ea1cbae841375 example3.sh example4.sh
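The whitespace normalization mentioned above could look something like the following sketch. The helper name func_digest is hypothetical, and it swaps openssl for POSIX cksum purely to keep the example dependency-free; it assumes the same layout as the script above (a "function name" line and a closing } in the first column).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the script above):
# print a checksum of function $1 as defined in file $2, ignoring
# leading/trailing blanks and empty lines.
func_digest() {
    local name=$1 file=$2
    # Same address range as above: from "function NAME" to a "}" in column one.
    sed -n "/^function $name[[:blank:]]*\$/,/^}/p" "$file" |
        # Strip indentation and trailing blanks, drop empty lines.
        sed -e 's/^[[:blank:]]*//' -e 's/[[:blank:]]*$//' -e '/^$/d' |
        cksum
}
```

Called as func_digest foo1 example1.sh, it prints one checksum line; two files whose foo1 differs only in indentation or blank lines should print the same line.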
I am curious whether it is possible in bash to run a for loop over a bunch of variable names and access their values inside the loop. Example:
a="hello"
b="world"
c="this is bash"
for f in a b c; do {
echo $( $f )
OR
echo $ ( "$f" )
} done
I know this does not work, but can we get the values saved in the variables a, b and c inside the for loop by way of f? I tried multiple ways but was unable to resolve it.
You need the ! like this:
for f in a b c; do
echo "${!f}"
done
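To spell out what the ! does: ${!f} is bash's indirect expansion, which treats the value of f as a variable name and expands that variable. A minimal sketch (show_vars is a made-up helper name, not something from the question) that prints both the name and the value:

```bash
#!/usr/bin/env bash
a="hello"
b="world"
c="this is bash"

# show_vars NAME...: print each variable's name and current value.
show_vars() {
    local name
    for name in "$@"; do
        # ${!name} expands the variable whose name is stored in $name
        printf '%s=%s\n' "$name" "${!name}"
    done
}

show_vars a b c
```

This prints a=hello, b=world and c=this is bash on separate lines.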
You can also use a nameref:
#!/usr/bin/env bash
a="hello"
b="world"
c="this is bash"
declare -n f
for f in a b c; do
printf "%s\n" "$f"
done
From the documentation:
If the control variable in a for loop has the nameref attribute, the list of words can be a list of shell variables, and a name reference will be established for each word in the list, in turn, when the loop is executed.
Notes on the OP's code, (scroll to bottom for corrected version):
for f in a b c; do {
echo $( $f )
} done
Problems:
The purpose of { & } is usually to put the separate outputs of
separate unpiped commands into one stream. Example of separate
commands:
echo foo; echo bar | tac
Output:
foo
bar
The tac command puts lines of input in reverse order, but in the
code above it only gets one line, so there's nothing to reverse.
But with curly braces:
{ echo foo; echo bar; } | tac
Output:
bar
foo
A do ... done already acts just like curly braces.
So "do {" instead of a "do" is unnecessary and redundant; but it
won't harm anything, or have any effect.
If f=hello and we write:
echo $f
The output will be:
hello
But the code $( $f ) runs a subshell on $f which only works if $f is
a command. So:
echo $( $f )
...tries to run the command hello, but there probably is no such
command, so the subshell will output to standard error:
hello: command not found
...but no data is sent to standard output, so echo will
print nothing.
To fix:
a="hello"
b="world"
c="this is bash"
for f in "$a" "$b" "$c"; do
echo "$f"
done
This is my code:
#!/bin/sh
echo "ARGUMENTS COUNT : " $#
echo "ARGUMENTS LIST : " $*
dictionary=`awk '{ print $1 }'`
function()
{
for i in dictionary
do
for j in $*
do
if [ $j = $i ]
then
;
else
append
fi
done
done
}
append()
{
ls $j > dictionary1.txt
}
function
I need to build a "dictionary" using Unix shell functions. For example: I pass a word as an argument, say hello. Then my function checks whether that word already exists in the file dictionary1. If not, it appends the word to the file; if it already exists, it does nothing.
For some reason, my script does not work. When I start my script, it waits for something and that's it.
What am I doing wrong? How can I fix it?
An implementation that tries to care about both performance and correctness might look like:
#!/usr/bin/env bash
# ^^^^- NOT sh; sh does not support [[ ]] or <(...)
addWords() {
local tempFile dictFile
tempFile=$(mktemp dictFile.XXXXXX) || return
dictFile=$1; shift
[[ -e "$dictFile" ]] || touch "$dictFile" || return
sort -um "$dictFile" <(printf '%s\n' "$@" | sort -u) >"$tempFile"
mv -- "$tempFile" "$dictFile"
}
addWords myDict beta charlie delta alpha
addWords myDict charlie zulu
cat myDict
...has a final dictionary state of:
alpha
beta
charlie
delta
zulu
...and it rereads the input file only once for each addWords call (no matter how many words are being added!), not once per word to add.
Don't name a function "function".
Don't read in and walk through the whole file - all you need to know is whether the word is there or not. grep does that.
ls lists files. You want to send a word to the file, not a filename. Use echo or printf.
sh isn't bash. Use bash unless there's a clear reason not to, and the only good reason is that it isn't available.
Try this:
#!/usr/bin/env bash
checkWord() {
grep -qm 1 "$1" dictionary1.txt ||
echo "$1" >> dictionary1.txt
}
for wd
do checkWord "$wd"
done
If that works, you can add more structure and error checking.
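As one possible direction for that extra structure, here is a sketch under a few assumptions: the function name add_word and its argument order are invented for illustration, and it uses grep -x -F so that only an identical whole line counts as a match (the pattern above would also treat hell as present once hello is in the file).

```bash
#!/usr/bin/env bash
# add_word DICT WORD: append WORD to DICT unless an identical line exists.
# (Hypothetical helper; the name and argument order are assumptions.)
add_word() {
    local dict=$1 word=$2
    if [ -z "$word" ]; then
        echo "add_word: empty word" >&2
        return 2
    fi
    # Make sure the dictionary exists so grep has something to read.
    touch -- "$dict" || return
    # -x: the whole line must match; -F: treat the word as a literal string.
    grep -qxF -e "$word" -- "$dict" || printf '%s\n' "$word" >> "$dict"
}
```

It could then be driven by the same loop as above, for example: for wd; do add_word dictionary1.txt "$wd"; done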
You can remove your dictionary=awk... line (as mentioned, it's blocking while waiting for input) and simply grep your dictionary file for each argument, something like the below:
for i in "$@"
do
if ! grep -qow "$i" dictionary1.txt
then
echo "$i" >> dictionary1.txt
fi
done
With any awk in any shell on any UNIX box:
awk -v words="$*" '
BEGIN {
while ( (getline word < "dictionary1.txt") > 0 ) {
dict[word]++
}
close("dictionary1.txt")
split(words,tmp)
for (i in tmp) {
word = tmp[i]
if ( !dict[word]++ ) {
newWords = newWords word ORS
}
}
printf "%s", newWords >> "dictionary1.txt"
exit
}'
In bash one can escape arguments that contain whitespace.
foo "a string"
This also works for arguments to a command or function:
bar() {
foo "$@"
}
bar "a string"
So far so good, but what if I want to manipulate the arguments before calling foo?
This does not work:
bar() {
for arg in "$@"
do
args="$args \"prefix $arg\""
done
# Everything looks good ...
echo $args
# ... but it isn't.
foo $args
# foo "$args" would just be silly
}
bar a b c
So how do you build argument lists when the arguments contain whitespace?
There are (at least) two ways to do this:
Use an array and expand it using "${array[@]}":
bar() {
local i=0 args=()
for arg in "$@"
do
args[$i]="prefix $arg"
((++i))
done
foo "${args[@]}"
}
So, what have we learned? "${array[@]}" is to ${array[*]} what "$@" is to $*.
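The difference is easiest to see by counting how many arguments a command actually receives. A small sketch (count_args is just an illustrative helper, and the default IFS is assumed):

```bash
#!/usr/bin/env bash
# count_args: print how many arguments it was called with.
count_args() {
    echo "$#"
}

args=("a b" "c")

count_args "${args[@]}"   # 2: each element stays one argument
count_args ${args[*]}     # 3: unquoted, "a b" is split on whitespace
count_args "${args[*]}"   # 1: everything is joined into a single string
```

Only the quoted [@] form preserves the element boundaries, which is why it is the one to use when building argument lists.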
Or if you do not want to use arrays you need to use eval:
bar() {
local args=()
for arg in "$@"
do
args="$args \"prefix $arg\""
done
eval foo $args
}
Here is a shorter version which does not require the use of a numeric index:
(example: building arguments to a find command)
dir=$1
shift
for f in "$@" ; do
args+=(-iname "*$f*")
done
find "$dir" "${args[@]}"
Use arrays (one of the hidden features in Bash).
You can use the arrays just as you suggest, with a small detail changed. The line calling foo should read
foo "${args[@]}"
I had a problem with this as well. I was writing a bash script to back up the important files on my Windows computer (Cygwin). I tried the array approach too, and still had some issues. I'm not sure exactly how I fixed it, but here are the important parts of my code in case they help you.
WORK="d:\Work Documents\*"
# prompt and 7zip each file
for x in $SVN $WEB1 $WEB2 "$WORK" $GRAPHICS $W_SQL
do
echo "Add $x to archive? (y/n)"
read DO
if [ "$DO" == "y" ]; then
echo "compressing $x"
7zip a $W_OUTPUT "$x"
fi
echo ""
done
I have already searched about this particular problem, but couldn't find anything helpful.
Let's assume I have following functions defined in my ~/.bashrc (Note: this is pseudo-code!):
ANDROID_PLATFORM_ROOT="/home/simao/xos/src/"
function getPlatformPath() {
echo "$ANDROID_PLATFORM_ROOT"
}
function addCaf() {
# Me doing stuff
echo "blah/$(getPlatformPath)"
}
function addAosp() {
# Me doing stuff
echo "aosp/$(getPlatformPath)"
}
function addXos() {
# Me doing stuff
echo "xos/$(getPlatformPath)"
}
function addAllAll() {
cd $(gettop)
# repo forall -c "addCaf; addAosp; addXos" # Does not work!
repo forall -c # Here is where I need all those commands
}
My problem:
I need to get the functions addCaf, addAosp and addXos in one single line.
Like you can run following in bash (pseudo code):
dothis; dothat; doanotherthing; trythis && succeedsdothis || nosuccessdothis; blah
I would like to run all commands inside the three functions addCaf, addAosp and addXos in just one line.
Any help is appreciated.
What I already tried:
repo forall -c "bash -c \"source ~/.bashrc; addAllAll\""
But that didn't work either.
Edit:
To clarify what I mean.
I want something like that as a result:
repo forall -c 'function getPlatformPath() { echo "$ANDROID_PLATFORM_ROOT"; }; ANDROID_PLATFORM_ROOT="/home/simao/xos/src/"; echo "blah/$(getPlatformPath)"; echo "aosp/$(getPlatformPath)"; echo "xos/$(getPlatformPath)"'
But I don't want to write that manually. Instead, I want to get those lines from the functions that already exist.
You can use type and then parse its output to do whatever you want to do with the code lines.
$ foo() {
> echo foo
> }
$ type foo
foo is a function
foo ()
{
echo foo
}
Perhaps this example makes things more clear:
#!/bin/bash
foo() {
echo "foo"
}
bar() {
echo "bar"
}
export IFS=$'\n'
for f in foo bar; do
for i in $(type $f | head -n-1 | tail -n+4); do
eval $i
done
done
exit 0
This is how it looks:
$ ./funcs.sh
foo
bar
The script first loops over all the functions you have (in this case only foo and bar). For each function, it loops over the lines of that function's code (skipping the useless lines from type's output) and executes them. So in the end it's the same as having this code...
echo "foo"
echo "bar"
...which are exactly the code lines inside the functions, and you are executing them one after the other.
Note that you could also build a string variable containing all the code lines separated by ; if instead of running eval on every line you do something like this:
code_lines=
for f in foo bar; do
for i in $(type $f | head -n-1 | tail -n+4); do
if [ -z "$code_lines" ]; then
code_lines="$i"
else
code_lines="${code_lines}; $i"
fi
done
done
eval "$code_lines"
Assuming that repo forall -c interprets the next positional argument just as bash -c, try:
foo () {
echo "foo!"
}
boo () {
if true; then
echo "boo!"
fi
}
echo works | bash -c "source "<(typeset -f foo boo)"; foo; boo; cat"
Note:
The difference from the original version is that this no longer interferes with stdin.
The <(...) substitution is unescaped because it must be performed by the original shell, the one where foo and boo are first defined. Its output will be a string of the form /dev/fd/63, which is a file descriptor that is passed open to the second shell, and which contains the forwarded definitions.
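If the process substitution is awkward to pass through, a similar effect can probably be had by embedding the definitions directly in the command string. This is only a sketch under the same assumption as above (that the string ends up being run by bash); greet is a placeholder function, not one from the question:

```bash
#!/usr/bin/env bash
greet() {
    echo "hello from $1"
}

# declare -f prints the function's source; the outer shell expands it
# into the command string, so the child bash defines greet before calling it.
bash -c "$(declare -f greet); greet child"
```

Run as is, this prints hello from child. Several functions can be forwarded at once by listing them all after declare -f.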
Make a dummy function foo(), which just prints "bar":
foo() { echo bar ; }
Now a bash function to print what's in one (or more) functions. Since the contents of a function are indented with 4 spaces, sed removes any lines without 4 leading spaces, then removes the leading spaces as well, and adds a ';' at the end of each function:
# Usage: in_func <function_name1> [ ...<function_name2> ... ]
in_func()
{ while [ "$1" ] ; do \
type $1 | sed -n '/^    /{s/^    //p}' | sed '$s/.*/&;/' ; shift ; \
done ; }
Print what's in foo():
in_func foo
Output:
echo bar;
Assign what's in foo() to the string $baz, then print $baz:
baz="`in_func foo`" ; echo $baz
Output:
echo bar;
Run what's in foo():
eval "$baz"
Output:
bar
Assign what's in foo() to $baz three times, and run it:
baz="`in_func foo foo foo`" ; eval "$baz"
Output:
bar
bar
bar
Shell functions aren't visible to child processes unless they're exported. Perhaps that is the missing ingredient.
export -f addCaf addAosp addXos
repo forall -c "addCaf; addAosp; addXos"
This should work:
repo forall -c "$(addCaf) $(addAosp) $(addXos)"
Is it possible to define a macro-function in bash so when I write:
F(sth);
bash runs this:
echo "sth" > a.txt;
Arbitrary syntax can't be made to do anything. Parentheses are metacharacters which have special meaning to the parser, so there's no way you can use them as valid names. The best way to extend the shell is to define functions.
This would be a basic echo wrapper that always writes to the same file:
f() {
echo "$@"
} >a.txt
This does about the same but additionally handles stdin - sacrificing echo's -e and -n options:
f() {
[[ ${1+_} || ! -t 0 ]] && printf '%s\n' "${*-$(</dev/fd/0)}"
} >a.txt
Which can be called as
f arg1 arg2...
or
f <file
Functions are passed arguments in the same way as any other commands.
The second echo-like wrapper first tests for either a set first argument, or stdin coming from a non-tty, and conditionally calls printf using either the positional parameters if set, or stdin. The test expression avoids the case of both zero arguments and no redirection from a file, in which case Bash would try expanding the output of the terminal, hanging the shell.
F () {
echo "$1" > a.txt
}
You don't use parentheses when you call it. This is how you call it:
F "text to save"
Yes, but you should call it as F sth:
F()
{
echo "$1" > a.txt
}
Read more here.
This was answered long ago, but to provide an answer that satisfies the original request (even though that is likely not what is actually desired):
This is based on Magic Aliases: A Layering Loophole in the Bourne Shell by Simon Tatham.
F() { str="$(history 1)"; str=${str# *F(}; echo "${str%)*}"; } >a.txt
alias F='\F #'
$ F(sth)
$ cat a.txt
sth
See also ormaaj's better magic alias.