Behavior of Arrays in bash scripting and zsh shell (Start Index 0 or 1?) - linux

I need explenation about the following behavior of arrays in shell scripting:
Imagine the following is given:
arber#host ~> ls
fileA fileB script.sh
Now i can do the following commands:
arber#host ~> ARR=($(ls -d file*))
arber#host ~> echo ${ARR[0]} # start index 0
arber#host ~> echo ${ARR[1]} # start index 1
fileA
arber#host ~> echo ${ARR[2]} # start index 2
fileB
But when I do this via script.sh it behaves different (Start Index = 0):
arber#host ~> cat script.sh
#!/bin/bash
ARR=($(ls -d file*))
# get length of an array
aLen=${#ARR[#]}
# use for loop read all items (START INDEX 0)
for (( i=0; i<${aLen}; i++ ));
do
echo ${ARR[$i]}
done
Here the result:
arber#host ~> ./script.sh
fileA
fileB
I use Ubuntu 18.04 LTS and zsh. Can someone explain this?

TL;DR:
bash array indexing starts at 0 (always)
zsh array indexing starts at 1 (unless option KSH_ARRAYS is set)
To always get consistent behaviour, use:
${array[#]:offset:length}
Explanation
For code which works in both bash and zsh, you need to use the offset:length syntax rather than the [subscript] syntax.
Even for zsh-only code, you'll still need to do this (or use emulate -LR zsh) since zsh's array subscripting basis is determined by the KSH_ARRAYS option.
Eg, to reference the first element in an array:
${array[#]:0:1}
Here, array[#] is all the elements, 0 is the offset (which always is 0-based), and 1 is the number of elements desired.

Arrays in Bash are indexed from zero, and in zsh they're indexed from one.
But you don't need the indices for a simple use case such as this. Looping over ${array[#]} works in both:
files=(file*)
for f in "${files[#]}"; do
echo "$f"
done
In zsh you could also use $files instead of "${files[#]}", but that doesn't work in Bash. (And there's the slight difference that it drops empty array elements, but you won't get any from file names.)
Also, don't use $(ls file*), it will break if you have filenames with spaces (see WordSpliting on BashGuide), and is completely useless to begin with.
The shell is perfectly capable of generating filenames by itself. That's actually what will happen there, the shell finds all files with names matching file*, passes them to ls, and ls just prints them out again for the shell to read and process.

Related

Bash script that counts and prints out the files that start with a specific letter

How do i print out all the files of the current directory that start with the letter "k" ?Also needs to count this files.
I tried some methods but i only got errors or wrong outputs. Really stuck on this as a newbie in bash.
Try this Shellcheck-clean pure POSIX shell code:
count=0
for file in k*; do
if [ -f "$file" ]; then
printf '%s\n' "$file"
count=$((count+1))
fi
done
printf 'count=%d\n' "$count"
It works correctly (just prints count=0) when run in a directory that contains nothing starting with 'k'.
It doesn't count directories or other non-files (e.g. fifos).
It counts symlinks to files, but not broken symlinks or symlinks to non-files.
It works with 'bash' and 'dash', and should work with any POSIX-compliant shell.
Here is a pure Bash solution.
files=(k*)
printf "%s\n" "${files[#]}"
echo "${#files[#]} files total"
The shell expands the wildcard k* into the array, thus populating it with a list of matching files. We then print out the array's elements, and their count.
The use of an array avoids the various problems with metacharacters in file names (see e.g. https://mywiki.wooledge.org/BashFAQ/020), though the syntax is slightly hard on the eyes.
As remarked by pjh, this will include any matching directories in the count, and fail in odd ways if there are no matches (unless you set nullglob to true). If avoiding directories is important, you basically have to get the directories into a separate array and exclude those.
To repeat what Dominique also said, avoid parsing ls output.
Demo of this and various other candidate solutions:
https://ideone.com/XxwTxB
To start with: never parse the output of the ls command, but use find instead.
As find basically goes through all subdirectories, you might need to limit that, using the -maxdepth switch, use value 1.
In order to count a number of results, you just count the number of lines in your output (in case your output is shown as one piece of output per line, which is the case of the find command). Counting a number of lines is done using the wc -l command.
So, this comes down to the following command:
find ./ -maxdepth 1 -type f -name "k*" | wc -l
Have fun!
This should work as well:
VAR="k"
COUNT=$(ls -p ${VAR}* | grep -v ":" | wc -w)
echo -e "Total number of files: ${COUNT}\n" 1>&2
echo -e "Files,that begin with ${VAR} are:\n$(ls -p ${VAR}* | grep -v ":" )" 1>&2

Finding a line that shows in a file only once

Assuming that I have files with 100 lines. There are a lot of lines that repeat themselves in the file, and only one line that does not.
I want to find the line that shows only once. Is there a command for that or do I have to build some complicated loop as below?
My code so far:
#!/bin/bash
filename="repeat_lines.txt"
var="$(wc -l <$filename )"
echo "length:" $var
#cp ex4.txt ex4_copy.txt
for((index=0; index < var; index++));
do
one="$(head -n $index $filename | tail -1)"
counter=0
for((index2=0; index2 < var; index2++));
do
two="$(head -n $index2 $filename | tail -1)"
if [ "$one" == "$two" ]; then
counter=$((counter+1))
fi
done
echo $one"is "$counter" times in the text: "
done
If I understood your question correctly, then
sort repeat_lines.txt | uniq -u should do the trick.
e.g. for file containing:
a
b
a
c
b
it will output c.
For further reference, see sort manpage, uniq manpage.
You've got a reasonable answer that uses standard shell tools sort and uniq. That's probably the solution you want to use, if you want something that is portable and doesn't require bash.
But an alternative would be to use functionality built into your bash shell. One method might be to use an associative array, which is a feature of bash 4 and above.
$ cat file.txt
a
b
c
a
b
$ declare -A lines
$ while read -r x; do ((lines[$x]++)); done < file.txt
$ for x in "${!lines[#]}"; do [[ ${lines["$x"]} -gt 1 ]] && unset lines["$x"]; done
$ declare -p lines
declare -A lines='([c]="1" )'
What we're doing here is:
declare -A creates the associative array. This is the bash 4 feature I mentioned.
The while loop reads each line of the file, and increments a counter that uses the content of a line of the file as the key in the associative array.
The for loop steps through the array, deleting any element whose counter is greater than 1.
declare -p prints the details of an array in a predictable, re-usable format. You could alternately use another for loop to step through the remaining array elements (of which there might be only one) in order to do something with them.
Note that this solution, while fine for small files (say, up to a few thousand lines), may not scale well for very large files of, say, millions of lines. Bash isn't the fastest at reading input this way, and one must be cognizant of memory limits when using arrays.
The sort alternative has the benefit of memory optimization using files on disk for extremely large files, at the expense of speed.
If you're dealing with files of only a few hundred lines, then it's hard to predict which solution will be faster. In the end, the form of output may dictate your choice of solution. The sort | uniq pipe generates a list to standard output. The bash solution above generates the same list as keys in an array. Otherwise, they are functionally equivalent.

File redirection fails in Bash script, but not Bash terminal

I am having a problem where cmd1 works, but not cmd2 in my Bash script ending in .sh. I have made the Bash script executable.
Additionally, I can execute cmd2 just fine from my Bash terminal. I have tried to make a minimally reproducible example, but my larger goal is to run a complicated executable with command line arguments and pass output to a file that may or may not exist (rather than displaying the output in the terminal).
Replacing > with >> also gives the same error in the script, but not the terminal.
My Bash script:
#!/bin/bash
cmd1="cat test.txt"
cmd2="cat test.txt > a"
echo $cmd1
$cmd1
echo $cmd2
$cmd2
test.txt has the words "dog" and "cat" on two separate lines without quotes.
Short answer: see BashFAQ #50: I'm trying to put a command in a variable, but the complex cases always fail!.
Long answer: the shell expands variable references (like $cmd1) toward the end of the process of parsing a command line, after it's done parsing redirects (like > a is supposed to be) and quotes and escapes and... In fact, the only thing it does with the expanded value is word splitting (e.g. treating cat test.txt > a as "cat" followed by "test.txt", ">", and finally "a", rather than a single string) and wildcard expansion (e.g. if $cmd expanded to cat *.txt, it'd replace the *.txt part with a list of matching files). (And it skips word splitting and wildcard expansion if the variable is in double-quotes.)
Partly as a result of this, the best way to store commands in variables is: don't. That's not what they're for; variables are for data, not commands. What you should do instead, though, depends on why you were storing the command in a variable.
If there's no real reason to store the command in a variable, then just use the command directly. For conditional redirects, just use a standard if statement:
if [ -f a ]; then
cat test.txt > a
else
cat test.txt
fi
If you need to define the command at one point, and use it later; or want to use the same command over and over without having to write it out in full each time, use a function:
cmd2() {
cat test.txt > a
}
cmd2
It sounds like you may need to be able to define the command differently depending on some condition, you can actually do that with a function as well:
if [ -f a ]; then
cmd() {
cat test.txt > a
}
else
cmd() {
cat test.txt
}
fi
cmd
Alternately, you can wrap the command (without redirect) in a function, then use a conditional to control whether it redirects:
cmd() {
cat test.txt
}
if [ -f a ]; then
cmd > a
else
cmd
fi
It's also possible to wrap a conditional redirect into a function itself, then pipe output to it:
maybe_redirect_to() {
if [ -f "$1" ]; then
cat > "$1"
else
cat
fi
}
cat test.txt | maybe_redirect_to a
(This creates an extra cat process that isn't really doing anything useful, but if it makes the script cleaner, I'd consider that worth it. In this particular case, you could minimize the stray cats by using maybe_redirect_to a < test.txt.)
As a last resort, you can store the command string in a variable, and use eval to parse it. eval basically re-runs the shell parsing process from the beginning, meaning that it'll recognize things like redirects in the string. But eval has a well-deserved reputation as a bug magnet, because it's easy for it to treat parts of the string you thought were just data as command syntax, which can cause some really weird (& dangerous) bugs.
If you must use eval, at least double-quote the variable reference, so it runs through the parsing process just once, rather than sort-of-once-and-a-half as it would unquoted. Here's an example of what I mean:
cmd3="echo '5 * 3 = 15'"
eval "$cmd3"
# prints: 5 * 3 = 15
eval $cmd3
# prints: 5 [list of files in the current directory] 3 = 15
# ...unless there are any files with shell metacharacters in their names, in
# which case something more complicated might happen.
BashFAQ #50 discusses some other possible reasons and solutions. Note that the array approach will not work here, since arrays also get expanded after redirects are parsed.
If you pop an 'eval' in front of $cmd2 it should work as expected:
#!/bin/bash
cmd2="cat test.txt > a"
eval $cmd2
If you're not sure about the operation of a script you could always use the debug mode to see if you can determine the error.
bash -x scriptname
This will run the command and display the output of variable evaluations. Hopefully this will reveal any issues with syntax.

Looping through the elements of a path variable in Bash

I want to loop through a path list that I have gotten from an echo $VARIABLE command.
For example:
echo $MANPATH will return
/usr/lib:/usr/sfw/lib:/usr/info
So that is three different paths, each separated by a colon. I want to loop though each of those paths. Is there a way to do that? Thanks.
Thanks for all the replies so far, it looks like I actually don't need a loop after all. I just need a way to take out the colon so I can run one ls command on those three paths.
You can set the Internal Field Separator:
( IFS=:
for p in $MANPATH; do
echo "$p"
done
)
I used a subshell so the change in IFS is not reflected in my current shell.
The canonical way to do this, in Bash, is to use the read builtin appropriately:
IFS=: read -r -d '' -a path_array < <(printf '%s:\0' "$MANPATH")
This is the only robust solution: will do exactly what you want: split the string on the delimiter : and be safe with respect to spaces, newlines, and glob characters like *, [ ], etc. (unlike the other answers: they are all broken).
After this command, you'll have an array path_array, and you can loop on it:
for p in "${path_array[#]}"; do
printf '%s\n' "$p"
done
You can use Bash's pattern substitution parameter expansion to populate your loop variable. For example:
MANPATH=/usr/lib:/usr/sfw/lib:/usr/info
# Replace colons with spaces to create list.
for path in ${MANPATH//:/ }; do
echo "$path"
done
Note: Don't enclose the substitution expansion in quotes. You want the expanded values from MANPATH to be interpreted by the for-loop as separate words, rather than as a single string.
In this way you can safely go through the $PATH with a single loop, while $IFS will remain the same inside or outside the loop.
while IFS=: read -d: -r path; do # `$IFS` is only set for the `read` command
echo $path
done <<< "${PATH:+"${PATH}:"}" # append an extra ':' if `$PATH` is set
You can check the value of $IFS,
IFS='xxxxxxxx'
while IFS=: read -d: -r path; do
echo "${IFS}${path}"
done <<< "${PATH:+"${PATH}:"}"
and the output will be something like this.
xxxxxxxx/usr/local/bin
xxxxxxxx/usr/bin
xxxxxxxx/bin
Reference to another question on StackExchange.
for p in $(echo $MANPATH | tr ":" " ") ;do
echo $p
done
IFS=:
arr=(${MANPATH})
for path in "${arr[#]}" ; do # <- quotes required
echo $path
done
... it does take care of spaces :o) but also adds empty elements if you have something like:
:/usr/bin::/usr/lib:
... then index 0,2 will be empty (''), cannot say why index 4 isnt set at all
This can also be solved with Python, on the command line:
python -c "import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]" echo {}
Or as an alias:
alias foreachpath="python -c \"import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]\""
With example usage:
foreachpath echo {}
The advantage to this approach is that {} will be replaced by each path in succession. This can be used to construct all sorts of commands, for instance to list the size of all files and directories in the directories in $PATH. including directories with spaces in the name:
foreachpath 'for e in "{}"/*; do du -h "$e"; done'
Here is an example that shortens the length of the $PATH variable by creating symlinks to every file and directory in the $PATH in $HOME/.allbin. This is not useful for everyday usage, but may be useful if you get the too many arguments error message in a docker container, because bitbake uses the full $PATH as part of the command line...
mkdir -p "$HOME/.allbin"
python -c "import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]" 'for e in "{}"/*; do ln -sf "$e" "$HOME/.allbin/$(basename $e)"; done'
export PATH="$HOME/.allbin"
This should also, in theory, speed up regular shell usage and shell scripts, since there are fewer paths to search for every command that is executed. It is pretty hacky, though, so I don't recommend that anyone shorten their $PATH this way.
The foreachpath alias might come in handy, though.
Combining ideas from:
https://stackoverflow.com/a/29949759 - gniourf_gniourf
https://stackoverflow.com/a/31017384 - Yi H.
code:
PATHVAR='foo:bar baz:spam:eggs:' # demo path with space and empty
printf '%s:\0' "$PATHVAR" | while IFS=: read -d: -r p; do
echo $p
done | cat -n
output:
1 foo
2 bar baz
3 spam
4 eggs
5
You can use Bash's for X in ${} notation to accomplish this:
for p in ${PATH//:/$'\n'} ; do
echo $p;
done
OP's update wants to ls the resulting folders, and has pointed out that ls only requires a space-separated list.
ls $(echo $PATH | tr ':' ' ') is nice and simple and should fit the bill nicely.

How to loop an executable command in the terminal in Linux?

Let me first describe my situation, I am working on a Linux platform and have a collection of .bmp files that add one to the picture number from filename0022.bmp up to filename0680.bmp. So a total of 658 pictures. I want to be able to run each of these pictures through a .exe file that operates on the picture then kicks out the file to a file specified by the user, it also has some threshold arguments: lower, upper. So the typical call for the executable is:
./filter inputfile outputfile lower upper
Is there a way that I can loop this call over all the files just from the terminal or by creating some kind of bash script? My problem is similar to this: Execute a command over multiple files with a batch file but this time I am working in a Linux command line terminal.
You may be interested in looking into bash scripting.
You can execute commands in a for loop directly from the shell.
A simple loop to generate the numbers you specifically mentioned. For example, from the shell:
user#machine $ for i in {22..680} ; do
> echo "filename${i}.bmp"
> done
This will give you a list from filename22.bmp to filename680.bmp. That simply handles the iteration of the range you had mentioned. This doesn't cover zero padding numbers. To do this you can use printf. The printf syntax is printf format argument. We can use the $i variable from our previous loop as the argument and apply the %Wd format where W is the width. Prefixing the W placeholder will specify the character to use. Example:
user#machine $ for i in {22..680} ; do
> echo "filename$(printf '%04d' $i).bmp"
> done
In the above $() acts as a variable, executing commands to obtain the value opposed to a predefined value.
This should now give you the filenames you had specified. We can take that and apply it to the actual application:
user#machine $ for i in {22..680} ; do
> ./filter "filename$(printf '%04d' $i).bmp" lower upper
> done
This can be rewritten to form one line:
user#machine $ for i in {22..680} ; do ./filter "filename$(printf '%04d' $i).bmp" lower upper ; done
One thing to note from the question, .exe files are generally compiled in COFF format where linux expects an ELF format executable.
here is a simple example:
for i in {1..100}; do echo "Hello Linux Terminal"; done
to append to a file:(>> is used to append, you can also use > to overwrite)
for i in {1..100}; do echo "Hello Linux Terminal" >> file.txt; done
You can try something like this...
#! /bin/bash
for ((a=022; a <= 658 ; a++))
do
printf "./filter filename%04d.bmp outputfile lower upper" $a | "sh"
done
You can leverage xargs for iterating:
ls | xargs -i ./filter {} {}_out lower upper
Note:
{} corresponds to one line output from the pipe, here it's the inputfile name.
Output files wouldbe named with postfix '_out'.
You can test that AS-IS in your shell :
for i in *; do
echo "$i" | tr '[:lower:]' '[:upper:]'
done
If you have a special path, change * by your path + a glob : Ex :
for i in /home/me/*.exe; do ...
See http://mywiki.wooledge.org/glob
This while prepend the name of the output images like filtered_filename0055.bmp
for i in *; do
./filter $i filtered_$i lower upper
done

Resources