Finding index for new folder - linux

I am given a name and I am supposed to make a dir with this name. If this dir already exists, the name of the folder should have _$number as its suffix.
The number is calculated as the highest existing value + 1. Examples:
Name: awesome
Files: dummy awesome awesome_2 awesome_4 dummy_3
New folder: awesome_5
Name: awesome
Files: dummy dummy_1
New folder: awesome
My solution for finding the highest value works only for names without special characters. Should the name be, for example, "$#&*!(#)(%+#$ asdasd \ ^ sad", it fails.
function max_item() {
    local prefix="$1"
    local max="0"
    shopt -s nullglob
    for in_file in * ; do
        if [[ "$in_file" =~ ^"$prefix"_(-{0,1}[0-9][0-9]*)$ ]]; then
            num="${BASH_REMATCH[1]}"
            [[ "$max" -lt "$num" ]] && max="$num"
        fi
    done
    echo "$max"
    shopt -u nullglob
    return 0
}
I guess it has something to do with special characters in regex but I have exhausted all my ideas.

Since you are looking for a number at the end of the name, prefixed by an _, you could do this instead:
max=0
number='^[[:digit:]]+$'
for in_file in "${prefix}_"* ; do
    num="${in_file##*_}"
    [[ "$num" =~ $number ]] && [[ "$max" -lt "$num" ]] && max="$num"
done
num=$((max + 1))
I have incorporated @Jens' excellent suggestion to loop through just the matching files.

Looping in shell code is notoriously slow.
For small numbers, codeforester's solution is fine, but starting at around 30 items (the exact number depends on many factors), the external-utility-based solution below will be faster and scale much better.
(For fewer items, an external-utility solution is slower, but that will rarely matter).
The solution below has the added advantage of being more concise:
max_index() {
    printf '%d\n' "$(shopt -s nullglob;
        printf '%s\n' "$1_"* |
        awk -F_ '{print $NF}' |
        sort -rn | head -n 1)"
}
Note: The reasonable assumption is made that your filenames have no embedded newlines.
shopt -s nullglob ensures that if a globbing pattern ("$1_"* in this case) matches nothing, it expands to the null (empty) string.
printf '%s\n' "$1_"* prints all matching filesystem items line by line.
awk -F_ '{print $NF}' outputs the last _-based token on each line, i.e., the trailing number.
Note: cut -d_ -f2 would work too, but makes the assumption that only one _ is present in the filename.
sort -rn sorts the trailing numbers numerically (-n), in reverse (-r).
head -n 1 then extracts only the 1st output line, which is by definition the highest number (if any).
Note that printf '%d\n' '' outputs 0, which is effectively what happens if no existing _<number> suffixes are found.
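For completeness, here is a minimal sketch of how the function could drive the directory creation described in the question (assuming the target name contains no slash):
name='awesome'
if [[ -e "$name" ]]; then
    mkdir -- "${name}_$(( $(max_index "$name") + 1 ))"
else
    mkdir -- "$name"
fi
This reproduces the examples above: with awesome, awesome_2 and awesome_4 present, it creates awesome_5; with no awesome at all, it creates awesome.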


Bash - replace in file strings from first array with strings from second array

I have two arrays. The first one is filled with values grepped from the file, which I want to replace with the new, downloaded ones.
Please note that I don't know exactly what the first array will look like: some values contain _, others contain -, and some have neither, with a : (colon) placed immediately after the name.
Example arrays:
array1:
[account:123 shoppingcart-1:123 notification-core_1:123 notification-dispatcher_core_1:123 notification-dispatcher-smschannel_core_1:123]
array2:
[account_custom_2:124 shoppingcart_custom_2:124 notification_custom_2:124 notification-dispatcher_custom_2:124 notification-dispatcher-smschannel_custom_2:124]
Those arrays are only an example; there are more than 50 values to replace.
I am doing a comparison of every item in the second array with every item in the first array as shown below:
file_name="<path_to_file>/file.txt"
for i in "${!array1[@]}"
do
    for j in "${!array2[@]}"
    do
        array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
        if [[ "${array1[$i]}" == *"$array_2_base"* ]]
        then
            sed -i "s=${array1[$i]}=${array2[$j]}=g" "$file_name"
        fi
    done
done
Here I extract only the first part of every item in the second array so I can compare it with the items in the first array.
e.g. account_custom_2:124 -> account or notification-dispatcher_custom_2:124 -> notification-dispatcher.
This works nicely, but I run into a problem when notification is contained in notification-core_1:123 and notification-dispatcher_core_1:123 and notification-dispatcher-smschannel_core_1:123.
Can you please give advice on how to fix this or if you can suggest another approach to this?
The point is that the base of an array2 element may include another element's base as a substring, which will cause an improper replacement depending on the order of matching.
To avoid this, you can sort the array in descending order so that the longer patterns come first.
Assuming the strings in the arrays do not contain tab characters, would you please try:
file_name="<path_to_file>/file.txt"
array1=(account:123 shoppingcart-1:123 notification-core_1:123 notification-dispatcher_core_1:123 notification-dispatcher-smschannel_core_1:123)
array2=(account_custom_2:124 shoppingcart_custom_2:124 notification_custom_2:124 notification-dispatcher_custom_2:124 notification-dispatcher-smschannel_custom_2:124)
# insert the following block to sort array2 in descending order
array2=( $(for j in "${array2[@]}"; do
    array_2_base=${j%%_*}
    printf "%s\t%s\n" "$array_2_base" "$j"
done | sort -r | cut -f2-) )
# the following code will work "as is"
for i in "${!array1[@]}"
do
    for j in "${!array2[@]}"
    do
        array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
        if [[ "${array1[$i]}" == *"$array_2_base"* ]]
        then
            sed -i "s=${array1[$i]}=${array2[$j]}=g" "$file_name"
            delete="${array1[$i]}"
            array1=( "${array1[@]/$delete}" )
        fi
    done
done
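With the sample arrays, the descending sort puts the longer bases of the notification family first, so the sorted array2 comes out in this order:
shoppingcart_custom_2:124
notification-dispatcher-smschannel_custom_2:124
notification-dispatcher_custom_2:124
notification_custom_2:124
account_custom_2:124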
The script above will be inefficient in execution time due to the repetitive invocation of the sed -i command.
The script below will run faster by pre-generating the sed script and executing it just once.
file_name="<path_to_file>/file.txt"
array1=(
    account:123
    shoppingcart-1:123
    notification-core_1:123
    notification-dispatcher_core_1:123
    notification-dispatcher-smschannel_core_1:123
)
array2=(
    account_custom_2:124
    shoppingcart_custom_2:124
    notification_custom_2:124
    notification-dispatcher_custom_2:124
    notification-dispatcher-smschannel_custom_2:124
)
while IFS=$'\t' read -r base a2; do        # read the sorted list line by line
    for a1 in "${array1[@]}"; do
        if [[ $a1 == *$base* ]]; then
            scr+="s=$a1=$a2=g;"            # generate the sed script by appending an "s" command
            continue 2
        fi
    done
done < <(for j in "${array2[@]}"; do
    array_2_base=${j%%_*}                  # substring before the 1st "_"
    printf "%s\t%s\n" "$array_2_base" "$j" # print base and original element side by side
done | sort -r)
sed -i "$scr" "$file_name"                 # execute all replacements at once
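For the sample arrays, the generated scr would contain (broken across lines here for readability; it is built up as a single string):
s=shoppingcart-1:123=shoppingcart_custom_2:124=g;
s=notification-dispatcher-smschannel_core_1:123=notification-dispatcher-smschannel_custom_2:124=g;
s=notification-dispatcher_core_1:123=notification-dispatcher_custom_2:124=g;
s=notification-core_1:123=notification_custom_2:124=g;
s=account:123=account_custom_2:124=g;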
If the number of items in your arrays is equal, you can process them in one loop:
for i in "${!array1[@]}"; {
    value=${array1[$i]}
    new_value=${array2[$i]}
    sed -i "s/$value/$new_value/" file
}
I found a way to fix this.
I am deleting the string from the first array once it has been replaced.
file_name="<path_to_file>/file.txt"
for i in "${!array1[@]}"
do
    for j in "${!array2[@]}"
    do
        array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
        if [[ "${array1[$i]}" == *"$array_2_base"* ]]
        then
            sed -i "s=${array1[$i]}=${array2[$j]}=g" "$file_name"
            delete="${array1[$i]}"
            array1=( "${array1[@]/$delete}" )
        fi
    done
done
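Note that "${array1[@]/$delete}" does not actually remove the element; it replaces the matched text with an empty string, leaving an empty element behind (and the pattern would also be substituted inside any other element that contains it). A sketch of dropping the element by index instead, within the same if block:
        if [[ "${array1[$i]}" == *"$array_2_base"* ]]
        then
            sed -i "s=${array1[$i]}=${array2[$j]}=g" "$file_name"
            unset 'array1[i]'   # drop the element entirely instead of emptying it
        fi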

MD5 comparison between two text files

I just started learning Linux shell scripting. I have to compare these two files for version control. Example:
file1.txt
275caa62391ff4f3096b1e8a4975de40 apple
awd6s54g64h6se4h6se45wahae654j6 ball
e4rby1s6y4653a46h153a41bqwa54tvi cat
r53aghe4354hr35a4hr65a46eeh5j45ro castor
file2.txt
275caa62391ff4f3096b1e8a4975de40 apple
js65fg4a64zgr65f4w65ea465fa65gh7 ball
wroghah4a65ejdtse5z4g6sa7H658aw7 candle
wagjh54hr5ae454zrwrh354aha4564re castor
How do I sort these text files into newly added (present in file2 but not in file1), deleted (present in file1 but not in file2), and changed files (same name but different checksum)?
I tried using diff, bcompare and vimdiff, but I am not getting proper output as a text file.
Thanks in advance
I don't know if such a command exists, but I've taken the liberty of writing you a sorting mechanism in Bash. Although it's optimised, I suggest you recreate it in a language of your own choice.
#! /bin/bash

# Set the array delimiter to a newline
IFS=$'\n'

# If $1 is empty, default to 'file1.txt'. Same for $2.
FILE1=${1:-file1.txt}
FILE2=${2:-file2.txt}

DELETED=()
ADDED=()
CHANGED=()

# Loop over the array named by $1 and print its content
function array_print {
    # -n creates a "pointer" to an array. This
    # way you can pass large arrays to functions.
    local -n array=$1
    echo "$1: "
    for i in "${array[@]}"; do
        echo "$i"
    done
}

# This function loops over the entries in file_in and checks
# if they exist in file_tst. Unless doubles are found, a
# callback is executed.
function array_sort {
    local file_in="$1"
    local file_tst="$2"
    local callback=${3:-true}
    local -n arr0=$4
    local -n arr1=$5
    while read -r line; do
        tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
        tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
        hit=$(grep "$tst_name" "$file_tst")
        # If found, skip. Nothing is changed.
        [[ $hit != $line ]] || continue
        # Run the callback
        $callback "$hit" "$line" arr0 arr1
    done < "$file_in"
}

# If tst is empty, line will be added to not_found. For file 1 this
# means the file doesn't exist in file2, thus is deleted. Otherwise
# the file is changed.
function callback_file1 {
    local tst=$1
    local line=$2
    local -n not_found=$3
    local -n found=$4
    if [[ -z $tst ]]; then
        not_found+=("$line")
    else
        found+=("$line")
    fi
}

# If tst is empty, line will be added to not_found. For file 2 this
# means the file doesn't exist in file1, thus is added. Since the
# callback for file 1 already filled all the changed files, we do
# nothing with the fourth parameter.
function callback_file2 {
    local tst=$1
    local line=$2
    local -n not_found=$3
    if [[ -z $tst ]]; then
        not_found+=("$line")
    fi
}

array_sort "$FILE1" "$FILE2" callback_file1 DELETED CHANGED
array_sort "$FILE2" "$FILE1" callback_file2 ADDED CHANGED

array_print ADDED
array_print DELETED
array_print CHANGED

exit 0
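For the sample file1.txt and file2.txt above, the script should print something like:
ADDED:
wroghah4a65ejdtse5z4g6sa7H658aw7 candle
DELETED:
e4rby1s6y4653a46h153a41bqwa54tvi cat
CHANGED:
awd6s54g64h6se4h6se45wahae654j6 ball
r53aghe4354hr35a4hr65a46eeh5j45ro castor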
Since it might be hard to understand the code above, I've written it out. I hope it helps :-)
while read -r line; do
    tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
    tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
    hit=$(grep "$tst_name" "$FILE2")
    # If found, skip. Nothing is changed.
    [[ $hit != $line ]] || continue
    # If the name does not occur, it's deleted (exists in
    # file1, but not in file2)
    if [[ -z $hit ]]; then
        DELETED+=("$line")
    else
        # If the name occurs, it's changed. Otherwise it would
        # not come here due to the previous if-statement.
        CHANGED+=("$line")
    fi
done < "$FILE1"

while read -r line; do
    tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
    tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
    hit=$(grep "$tst_name" "$FILE1")
    # If found, skip. Nothing is changed.
    [[ $hit != $line ]] || continue
    # If the name does not occur, it's added (exists in
    # file2, but not in file1)
    if [[ -z $hit ]]; then
        ADDED+=("$line")
    fi
done < "$FILE2"
Files which are only in file1.txt:
awk 'NR==FNR{a[$2];next} !($2 in a)' file2.txt file1.txt > only_in_file1.txt
Files which are only in file2.txt:
awk 'NR==FNR{a[$2];next} !($2 in a)' file1.txt file2.txt > only_in_file2.txt
Then something like this answer:
awk compare columns from two files, impute values of another column
e.g.:
awk 'FNR==NR{a[$1]=$1;next}{print $0,a[$1]?a[$2]:"NA"}' file2.txt file1.txt | grep NA | awk '{print $1,$2}' > md5sdiffer.txt
You'll need to decide how you want to present these, though.
There might be a more elegant way to filter the final example (as opposed to finding the NA rows and then re-filtering), but it's still enough to go on.
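If you want the changed files directly, a one-liner in the same spirit (a sketch, assuming the two-column hash/name format shown above) builds a name-to-hash map from file2.txt and prints the file1.txt lines whose name exists with a different hash:
awk 'NR==FNR{h[$2]=$1;next} ($2 in h) && h[$2]!=$1' file2.txt file1.txt > changed.txt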

How to clean up multiple file names using bash?

I have a directory with ~250 .txt files in it. Each of these files has a title like this:
Abraham Lincoln [December 01, 1862].txt
George Washington [October 25, 1790].txt
etc...
However, these are terrible file names for reading into Python, and I want to iterate over all of them to change them to a more suitable format.
I've tried similar things for changing single variables that are shared across many files, but I can't wrap my head around how I should iterate over these files and change the formatting of their names while keeping the same information.
The ideal output would be something like
1862_12_01_abraham_lincoln.txt
1790_10_25_george_washington.txt
etc...
Please try the straightforward (tedious) bash script:
#!/bin/bash
declare -A map=(["January"]="01" ["February"]="02" ["March"]="03" ["April"]="04" ["May"]="05" ["June"]="06" ["July"]="07" ["August"]="08" ["September"]="09" ["October"]="10" ["November"]="11" ["December"]="12")
pat='^([^[]+) \[([A-Za-z]+) ([0-9]+), ([0-9]+)]\.txt$'
for i in *.txt; do
    if [[ $i =~ $pat ]]; then
        newname="$(printf "%s_%s_%s_%s.txt" "${BASH_REMATCH[4]}" "${map["${BASH_REMATCH[2]}"]}" "${BASH_REMATCH[3]}" "$(tr 'A-Z ' 'a-z_' <<< "${BASH_REMATCH[1]}")")"
        mv -- "$i" "$newname"
    fi
done
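To preview the renames before committing to them, you could temporarily put echo in front of the mv (a simple dry run that only prints what would happen):
        echo mv -- "$i" "$newname"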
An alternative that lets date do the month parsing:
for file in *.txt; do
    # extract the parts of the filename to be differently formatted with a regex match
    [[ $file =~ (.*)\[(.*)\] ]] || { echo "invalid file $file"; exit; }
    # format the extracted strings and generate the new filename
    formatted_date=$(date -d "${BASH_REMATCH[2]}" +"%Y_%m_%d")
    name="${BASH_REMATCH[1]// /_}"  # replace spaces in the name with underscores
    f="${formatted_date}_${name,,}" # convert the name to lower case and append it to the date string
    new_filename="${f::-1}.txt"     # remove the trailing underscore and add the .txt extension
    # do what you need here
    echo "$new_filename"
    # mv "$file" "$new_filename"
done
I like to pull the filename apart, then put it back together.
Also, GNU date can parse the date for you, which is simpler than using sed or a big case statement to convert "October" to "10".
#! /usr/bin/bash

if [ "$1" == "" ] || [ "$1" == "--help" ]; then
    echo "Give a filename like \"Abraham Lincoln [December 01, 1862].txt\" as an argument"
    exit 2
fi

filename="$1"
# remove the brackets
filename=`echo "$filename" | sed -e 's/[\[]//g;s/\]//g'`
# cut out the name
namepart=`echo "$filename" | awk '{ print $1" "$2 }'`
# cut out the date
datepart=`echo "$filename" | awk '{ print $3" "$4" "$5 }' | sed -e 's/\.txt//'`
# format up the date (relies on GNU date)
datepart=`date --date="$datepart" +"%Y_%m_%d"`
# put it back together with underscores, in lower case
final=`echo "$namepart $datepart.txt" | tr '[A-Z]' '[a-z]' | sed -e 's/ /_/g'`
echo mv \"$1\" \"$final\"
EDIT: converted to BASH, from Bourne shell.
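Since the script handles one filename per invocation, applying it to the whole directory could look like this (a sketch; rename_one.sh is a placeholder name for the script above):
for f in *.txt; do
    ./rename_one.sh "$f"
done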

Find file with largest number of lines in single directory

I'm trying to create a function that outputs only the file with the largest number of lines in a directory (and not in any sub-directories). I'm being asked to make use of wc but don't really understand how to read each file individually and then sort them just to find the largest. Here is what I have so far:
#!/bin/bash

function sort {
    [ $# -ne 1 ] && echo "Invalid number of arguments" >&2 && exit 1
    [ ! -d "$1" ] && echo "Invalid input: not a directory" >&2 && exit 1
    # Insert function here ;
}

# prompt if wanting current directory
# if yes
#   sort $PWD
# if no
#   sort $directory
This solution is almost pure Bash (wc is the only external command used):
shopt -s dotglob   # Include filenames with initial '.' in globs
shopt -s nullglob  # Make globs produce nothing when nothing matches

dir=$1
maxlines=-1
maxfile=

for file in "$dir"/* ; do
    [[ -f $file ]] || continue  # Skip non-files
    [[ -L $file ]] && continue  # Skip symlinks
    numlines=$(wc -l < "$file")
    if (( numlines > maxlines )) ; then
        maxfile=$file
        maxlines=$numlines
    fi
done

[[ -n "$maxfile" ]] && printf '%s\n' "$maxfile"
Remove the shopt -s dotglob if you don't want to process files whose names begin with a dot. Remove the [[ -L $file ]] && continue if you want to process symlinks to files.
This solution should handle all filenames (ones containing spaces, ones containing glob characters, ones beginning with '-', ones containing newlines, ...), but it runs wc for each file so it may be unacceptably slow compared to solutions that feed many files to wc at once if you need to handle directories that have large numbers of files.
How about this:
wc -l * | sort -nr | head -2 | tail -1
wc -l counts lines (you get an error for directories, though), then sort in reverse order treating the first column as a number, then keep the first two lines and take the second of them, since we need to skip over the total line.
wc -l * 2>/dev/null | sort -nr | head -2 | tail -1
The 2>/dev/null throws away all the errors, if you want a neater output.
Use a function like this:
my_custom_sort() {
    for i in "${1+$1/}"*; do
        [[ -f "$i" ]] && wc -l "$i"
    done | sort -n | tail -n1 | cut -d" " -f2
}
And use it with or without a directory argument (in the latter case, it uses the current directory):
my_custom_sort /tmp
helloworld.txt

Bash- scramble characters contained in a string

So I have this function with the following output:
AGsg4SKKs74s62#
I need to find a way to scramble the characters without deleting anything, i.e. all characters must be present after I scramble them.
I can only use bash utilities, including awk and sed.
One way is to split the string into one character per line, shuffle the lines, and join them back together (shuf is from GNU coreutils, and the \n in the sed replacement relies on GNU sed):
echo 'AGsg4SKKs74s62#' | sed 's/./&\n/g' | shuf | tr -d "\n"
Output (e.g.):
S7s64#2gKAGsKs4
Here's a pure Bash function that does the job:
scramble() {
    # $1: string to scramble
    # return in variable scramble_ret
    local a=$1 i
    scramble_ret=
    while ((${#a})); do
        ((i = RANDOM % ${#a}))
        scramble_ret+=${a:i:1}
        a=${a::i}${a:i+1}
    done
}
See if it works:
$ scramble 'AGsg4SKKs74s62#'
$ echo "$scramble_ret"
G4s6s#2As74SgKK
Looks all right.
I know that you haven't mentioned Perl but it could be done like this:
perl -MList::Util=shuffle -F'' -lane 'print shuffle @F' <<<"AGsg4SKKs74s62#"
-a enables auto-split mode and -F'' sets the field separator to an empty string, so each character goes into a separate array element. The array is shuffled using the function provided by the core module List::Util.
Here is my solution; usage: shuffleString "any-string". Performance was not a consideration, since this is bash.
function shuffleString() {
    local line="$1"
    for i in $(seq 1 ${#line}); do
        local p=$(expr $RANDOM % ${#line})
        if [[ $p -lt $i ]]; then
            line="${line:0:$p}${line:$i:1}${line:$p+1:$i-$p-1}${line:$p:1}${line:$i+1}"
        elif [[ $p -gt $i ]]; then
            line="${line:0:$i}${line:$p:1}${line:$i+1:$p-$i-1}${line:$i:1}${line:$p+1}"
        fi
    done
    echo "$line"
}
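Example run (the output will differ every time, since the shuffle is random):
$ shuffleString 'AGsg4SKKs74s62#'
s4GK2gS#6s7As4K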
