Bash: scramble characters contained in a string

So I have this function with the following output:
AGsg4SKKs74s62#
I need to find a way to scramble the characters without deleting anything, i.e. all characters must still be present after I scramble them.
I can only use Bash utilities, including awk and sed.

echo 'AGsg4SKKs74s62#' | sed 's/./&\n/g' | shuf | tr -d "\n"
Output (e.g.):
S7s64#2gKAGsKs4
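The \n in the sed replacement is a GNU sed feature. As a rough sketch of the same idea without it, fold can do the one-character-per-line split. The function names scramble_shuf and same_chars are made up for illustration, shuf is assumed to be available (GNU coreutils), and the input is assumed to be single-byte characters without embedded newlines:

```shell
#!/bin/bash
# Hypothetical helper: split into one character per line, permute, rejoin.
scramble_shuf() {
  fold -w1 <<< "$1" | shuf | tr -d '\n'
  printf '\n'
}

# Hypothetical check: succeed if both strings consist of the same characters.
same_chars() {
  [[ $(fold -w1 <<< "$1" | sort | tr -d '\n') == "$(fold -w1 <<< "$2" | sort | tr -d '\n')" ]]
}

in='AGsg4SKKs74s62#'
out=$(scramble_shuf "$in")
echo "$out"
same_chars "$in" "$out" && echo "no characters lost"
```

The same_chars check only compares the sorted characters, so it confirms nothing was dropped or added; it says nothing about how random the order is.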

Here's a pure Bash function that does the job:
scramble() {
# $1: string to scramble
# return in variable scramble_ret
local a=$1 i
scramble_ret=
while((${#a})); do
((i=RANDOM%${#a}))
scramble_ret+=${a:i:1}
a=${a::i}${a:i+1}
done
}
See if it works:
$ scramble 'AGsg4SKKs74s62#'
$ echo "$scramble_ret"
G4s6s#2As74SgKK
Looks all right.

I know that you haven't mentioned Perl but it could be done like this:
perl -MList::Util=shuffle -F'' -lane 'print shuffle @F' <<<"AGsg4SKKs74s62#"
-a enables auto-split mode and -F'' sets the field separator to an empty string, so each character goes into a separate array element. The array is shuffled using the function provided by the core module List::Util.
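If Perl is not an option, the same split-then-shuffle idea can be sketched with awk, which the question allows. This is only an illustration: scramble_awk is a made-up name, the shuffle is a Fisher-Yates pass over the characters, and the seed comes from Bash's $RANDOM, so it is not suitable for anything security-related.

```shell
#!/bin/bash
# Hypothetical helper: shuffle the characters of $1 with awk.
scramble_awk() {
  awk -v seed="$RANDOM" '
    BEGIN { srand(seed) }
    {
      n = length($0)
      # Put each character into its own array element.
      for (i = 1; i <= n; i++) c[i] = substr($0, i, 1)
      # Fisher-Yates: swap element i with a random element in 1..i.
      for (i = n; i > 1; i--) {
        j = int(rand() * i) + 1
        t = c[i]; c[i] = c[j]; c[j] = t
      }
      out = ""
      for (i = 1; i <= n; i++) out = out c[i]
      print out
    }' <<< "$1"
}

scramble_awk 'AGsg4SKKs74s62#'
```

substr is used instead of an empty field separator because splitting on FS="" is not guaranteed by every awk.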

Here is my solution. Usage: shuffleString "any-string". Performance was not a consideration, since this is Bash.
function shuffleString() {
# $1: string to shuffle; prints the shuffled string
local line=$1 i p
for ((i = 0; i < ${#line}; i++)); do
p=$((RANDOM % ${#line}))
# swap the characters at positions i and p
if ((p < i)); then
line="${line:0:p}${line:i:1}${line:p+1:i-p-1}${line:p:1}${line:i+1}"
elif ((p > i)); then
line="${line:0:i}${line:p:1}${line:i+1:p-i-1}${line:i:1}${line:p+1}"
fi
done
echo "$line"
}

Related

How to clean up multiple file names using bash?

I have a directory with ~250 .txt files in it. Each of these files has a title like this:
Abraham Lincoln [December 01, 1862].txt
George Washington [October 25, 1790].txt
etc...
However, these are terrible file names for reading into python and I want to iterate over all of them to change them to a more suitable format.
I've tried similar things for changing single variables that are shared across many files. But I can't wrap my head around how I should iterate over these files and change the formatting of their names while still keeping the same information.
The ideal output would be something like
1862_12_01_abraham_lincoln.txt
1790_10_25_george_washington.txt
etc...
Please try the straightforward (tedious) bash script:
#!/bin/bash
declare -A map=(["January"]="01" ["February"]="02" ["March"]="03" ["April"]="04" ["May"]="05" ["June"]="06" ["July"]="07" ["August"]="08" ["September"]="09" ["October"]="10" ["November"]="11" ["December"]="12")
pat='^([^[]+) \[([A-Za-z]+) ([0-9]+), ([0-9]+)]\.txt$'
for i in *.txt; do
if [[ $i =~ $pat ]]; then
newname="$(printf "%s_%s_%s_%s.txt" "${BASH_REMATCH[4]}" "${map["${BASH_REMATCH[2]}"]}" "${BASH_REMATCH[3]}" "$(tr 'A-Z ' 'a-z_' <<< "${BASH_REMATCH[1]}")")"
mv -- "$i" "$newname"
fi
done
for file in *.txt; do
# extract parts of the filename to be differently formatted with a regex match
[[ $file =~ (.*)\[(.*)\] ]] || { echo "invalid file $file" >&2; exit 1; }
# format extracted strings and generate the new filename
formatted_date=$(date -d "${BASH_REMATCH[2]}" +"%Y_%m_%d")
name="${BASH_REMATCH[1]// /_}" # replace spaces in the name with underscores
f="${formatted_date}_${name,,}" # convert name to lower-case and append it to date string
new_filename="${f::-1}.txt" # remove trailing underscore and add `.txt` extension
# do what you need here
echo "$new_filename"
# mv -- "$file" "$new_filename"
done
I like to pull the filename apart, then put it back together.
GNU date can also parse the date, which is simpler than using sed or a big case statement to convert "October" to "10".
#! /usr/bin/bash
if [ "$1" == "" ] || [ "$1" == "--help" ]; then
echo "Give a filename like \"Abraham Lincoln [December 01, 1862].txt\" as an argument"
exit 2
fi
filename="$1"
# remove the brackets
filename=`echo "$filename" | sed -e 's/[\[]//g;s/\]//g'`
# cut out the name
namepart=`echo "$filename" | awk '{ print $1" "$2 }'`
# cut out the date
datepart=`echo "$filename" | awk '{ print $3" "$4" "$5 }' | sed -e 's/\.txt//'`
# format up the date (relies on GNU date)
datepart=`date --date="$datepart" +"%Y_%m_%d"`
# put it back together with underscores, in lower case
final=`echo "$namepart $datepart.txt" | tr '[A-Z]' '[a-z]' | sed -e 's/ /_/g'`
echo mv \"$1\" \"$final\"
EDIT: converted to BASH, from Bourne shell.
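For names that are not exactly two words, the pull-apart step can also be done with parameter expansion instead of awk field numbers. The following is only a sketch: new_name is a hypothetical helper, it assumes GNU date (for -d) and Bash 4 or later (for ${name,,}), and it prints the new name without renaming anything.

```shell
#!/bin/bash
# Hypothetical helper: print the new name for one file.
new_name() {
  local file=$1 name stamp
  name=${file%% \[*}        # everything before " ["
  stamp=${file#*\[}         # everything after the first "["
  stamp=${stamp%%\]*}       # ...up to the first "]"
  # GNU date parses "December 01, 1862"; -u avoids time zone surprises.
  stamp=$(date -u -d "$stamp" +%Y_%m_%d) || return 1
  name=${name// /_}         # spaces to underscores
  printf '%s_%s.txt\n' "$stamp" "${name,,}"
}

new_name 'Abraham Lincoln [December 01, 1862].txt'
# To rename for real, something like this could be run in the directory:
#   for f in *.txt; do new=$(new_name "$f") && mv -- "$f" "$new"; done
```

Printing first and renaming later makes it easy to inspect the result before any file is touched.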

Retrieve string between characters and assign on new variable using awk in bash

I'm new to Bash scripting and still learning how commands work. I've stumbled on this problem.
I have a file /home/fedora/file.txt
The contents of the file look like this:
[apple] This is a fruit.
[ball] This is a sport's equipment.
[cat] This is an animal.
What I want is to retrieve the words between "[" and "]".
What I have tried so far is:
while IFS='' read -r line || [[ -n "$line" ]];
do
echo $line | awk -F"[" '{print$2}' | awk -F"]" '{print$1}'
done < /home/fedora/file.txt
With this I can print the words between "[" and "]".
Now I want to put the echoed word into a variable, but I don't know how.
Any help would be appreciated.
Try this:
variable="$(echo $line | awk -F"[" '{print$2}' | awk -F"]" '{print$1}')"
or
variable="$(awk -F'[\[\]]' '{print $2}' <<< "$line")"
or complete
while IFS='[]' read -r foo fruit rest; do echo $fruit; done < file
or with an array:
while IFS='[]' read -ra var; do echo "${var[1]}"; done < file
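Another option that avoids awk altogether is Bash's regex match, which puts the captured text straight into BASH_REMATCH. A minimal sketch, where extract_labels is a made-up name and the label is assumed to sit at the very start of the line:

```shell
#!/bin/bash
# Hypothetical helper: read lines on stdin, print the text inside a leading [...].
extract_labels() {
  local re='^\[([^]]*)\]' line
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
    fi
  done
}

# Collect all labels into an array.
mapfile -t labels < <(extract_labels <<'EOF'
[apple] This is a fruit.
[ball] This is a sport's equipment.
[cat] This is an animal.
EOF
)
printf 'label: %s\n' "${labels[@]}"
```

Keeping the pattern in a variable (re) sidesteps the quoting rules for brackets inside [[ ... =~ ... ]].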
In addition to using awk, you can use the native parameter expansion/substring extraction provided by bash. Below, # trims from the left, while % trims from the right. (Note: a single # or % removes the shortest match of the pattern, while ## or %% removes the longest match.)
#!/bin/bash
[ -r "$1" ] || { ## validate input is readable
printf "error: insufficient input. usage: %s filename\n" "${0##*/}"
exit 1
}
## read each line and separate label and value
while read -r line || [ -n "$line" ]; do
label=${line#[} # trim initial [ from left
label=${label%%]*} # trim through ] from right
value=${line##*] } # trim from left through the last '] '
printf " %-8s -> '%s'\n" "$label" "$value"
done <"$1"
exit 0
Input
$ cat dat/labels.txt
[apple] This is a fruit.
[ball] This is a sport's equipment.
[cat] This is an animal.
Output
$ bash readlabel.sh dat/labels.txt
 apple    -> 'This is a fruit.'
 ball     -> 'This is a sport's equipment.'
 cat      -> 'This is an animal.'
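The difference between the single and doubled operators only shows up when the pattern can match in more than one place. A small illustration with a made-up line that contains two bracketed words:

```shell
#!/bin/bash
line='[ball] see also [cat] for animals'

echo "${line#*] }"    # shortest prefix removed: see also [cat] for animals
echo "${line##*] }"   # longest prefix removed:  for animals
echo "${line%]*}"     # shortest suffix removed: [ball] see also [cat
echo "${line%%]*}"    # longest suffix removed:  [ball
```

On the sample file in the question every line has exactly one ']', so # and ## (or % and %%) give the same result there.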

Command to count the characters present in the variable

I am trying to count the number of characters in a variable. I used the shell script below, but I am getting a "command not found" error in line 4.
#!/bin/bash
for i in one; do
n = $i | wc -c
echo $n
done
Can someone help me with this?
In bash you can just write ${#string}, which will return the length of the variable string, i.e. the number of characters in it.
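Applied to the loop from the question, that might look like the sketch below (the second word, three, is added only to show a second value):

```shell
#!/bin/bash
for i in one three; do
  n=${#i}              # length of the value of i; no pipe or subshell needed
  echo "$i: $n"
done
# prints:
# one: 3
# three: 5
```

Unlike wc -c, this does not count a trailing newline, because no newline is ever added.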
Something like this:
#!/bin/bash
for i in one; do
n=$(echo $i | wc -c)
echo $n
done
Assignments in Bash cannot have spaces around the equals sign. In addition, you need command substitution to capture the output of the command and assign it to n; the original statement tries to run a command called n instead, which is where the "command not found" error comes from.
Use the following instead:
#!/bin/bash
for i in one; do
n=`echo "$i" | wc -c`
echo $n
done
It can be as simple as that:
str="abcdef"; wc -c <<< "$str"
7
But mind you that end of line counts as a character:
str="abcdef"; cat -A <<< "$str"
abcdef$
If you need to remove it:
str="abcdef"; tr -d '\n' <<< "$str" | wc -c
6
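A variant of the same idea, as a sketch: printf '%s' never writes the newline in the first place, so nothing has to be deleted afterwards. Note that wc -c counts bytes while wc -m counts characters; the two agree for plain ASCII like this.

```shell
#!/bin/bash
str="abcdef"
printf '%s' "$str" | wc -c   # 6 (bytes; printf '%s' adds no newline)
printf '%s' "$str" | wc -m   # 6 (characters; may differ from bytes for multibyte text)
```

Some wc implementations pad the number with leading spaces, so strip them if the value is compared as a string.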

Script on bash, can you?

There is a string $STRING in which syllables are written separated by spaces. If the variable $WORD contains at least one of the syllables in this string, report this in any way.
Your solution checks to see if $WORD exists in $STRING when it should be the other way around. Try this:
string="run walk stand"
word=walking
if echo "$string" | sed -e 's/ /\n/g' | grep -Fqif - <(echo "$word")
then
echo "Match!"
fi
As you can see, you can test the result of the grep without having to save the output in a variable.
By the way -n is the same as ! -z.
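For comparison, the same test can be sketched without sed or grep, using only Bash pattern matching. has_syllable is a made-up name, the match is case-insensitive like grep -i above, and ${var,,} needs Bash 4 or later:

```shell
#!/bin/bash
# Hypothetical helper: succeed if word ($1) contains any syllable from string ($2).
has_syllable() {
  local word=${1,,} syl syllables
  read -ra syllables <<< "${2,,}"       # split the string on whitespace
  for syl in "${syllables[@]}"; do
    [[ $word == *"$syl"* ]] && return 0 # quoted, so the syllable is matched literally
  done
  return 1
}

string="run walk stand"
word=walking
if has_syllable "$word" "$string"; then
  echo "Match!"
fi
```

The exit status is used directly in the if, just as with grep -q.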

How do I find common characters between two strings in bash?

For example:
s1="my_foo"
s2="not_my_bar"
the desired result would be my_o. How do I do this in bash?
My solution below uses fold to break the string into one character per line, sort to sort the lists, comm to compare the two strings and finally tr to delete the new line characters
comm -12 <(fold -w1 <<< $s1 | sort -u) <(fold -w1 <<< $s2 | sort -u) | tr -d '\n'
Alternatively, here is a pure Bash solution (which also maintains the order of the characters). It iterates over the first string and checks if each character is present in the second string.
s="temp_foo_bar"
t="temp_bar"
i=0
while [ $i -ne ${#s} ]
do
c=${s:$i:1}
if [[ $result != *$c* && $t == *$c* ]]
then
result=$result$c
fi
((i++))
done
echo $result
prints: temp_bar
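The loop above can be wrapped in a function so it does not depend on leftover values in result. In this sketch (common_chars is a made-up name) the character is also quoted inside the patterns, so input characters such as * or ? are compared literally rather than as globs:

```shell
#!/bin/bash
# Hypothetical helper: print the characters of $1 that also occur in $2,
# in the order they appear in $1, without duplicates.
common_chars() {
  local s=$1 t=$2 result= c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    if [[ $result != *"$c"* && $t == *"$c"* ]]; then
      result+=$c
    fi
  done
  printf '%s\n' "$result"
}

common_chars my_foo not_my_bar   # my_o
```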
Assuming the strings do not contain embedded newlines:
s1='my_foo' s2='my_bar'
intersect=$(
comm -12 <(
fold -w1 <<< "$s1" |
sort -u
) <(
fold -w1 <<< "$s2" |
sort -u
) |
tr -d \\n
)
printf '%s\n' "$intersect"
And another one:
tr -dc "$s2" <<< "$s1"
A late entry, as I've just found this page:
echo "$str2" |
awk 'BEGIN{FS=""}
{ n=0; while(n<=NF) {
if ($n == substr(test,n,1)) { if(!found[$n]) printf("%c",$n); found[$n]=1;} n++;
} print ""}' test="$str1"
And another one. This one builds a regexp for matching (note: it doesn't work with special characters, but that's not hard to fix with another sed):
echo "$str1" |
grep -E -o ^`echo -n "$str2" | sed 's/\(.\)/(|\1/g'; echo "$str2" | sed 's/./)/g'`
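This one prints the common prefix of the two strings (it relies on substring expansion, so it needs Bash or a similar shell rather than plain POSIX sh):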
s1="my_foo"
s2="my_bar"
while [ -n "$s1" -a -n "$s2" ]
do
if [ "${s1:0:1}" = "${s2:0:1}" ]
then
printf %s "${s1:0:1}"
else
break
fi
s1="${s1:1:${#s1}}"
s2="${s2:1:${#s2}}"
done
A solution using a single sed execution:
echo -e "$s1\n$s2" | sed -e 'N;s/^/\n/;:begin;s/\n\(.\)\(.*\)\n\(.*\)\1\(.*\)/\1\n\2\n\3\4/;t begin;s/\n.\(.*\)\n\(.*\)/\n\1\n\2/;t begin;s/\n\n.*//'
Like any cryptic sed script, it needs an explanation. Here it is in the form of a sed script file that can be run with echo -e "$s1\n$s2" | sed -f script:
# Read the next line so s1 and s2 are in the pattern space only separated by a \n.
N
# Put a \n at the beginning of the pattern space.
s/^/\n/
# During the script execution, the pattern space will contain <result so far>\n<what is left of s1>\n<what is left of s2>.
:begin
# If the 1st char of s1 is found in s2, remove it from s1 and s2, append it to the result and do this again until it fails.
s/\n\(.\)\(.*\)\n\(.*\)\1\(.*\)/\1\n\2\n\3\4/
t begin
# When the previous substitution fails, remove the 1st char of s1 and try again to find the 1st char of s1 in s2.
s/\n.\(.*\)\n\(.*\)/\n\1\n\2/
t begin
# When previous substitution fails, s1 is empty so remove the \n and what is left of s2.
s/\n\n.*//
If you want to remove duplicates, add the following at the end of the script:
:end;s/\(.\)\(.*\)\1/\1\2/;t end
Edit: I realize that dogbane's pure shell solution has the same algorithm, and is probably more efficient.
comm=""
for ((i=0;i<${#s1};i++))
do
if test "${s1:$i:1}" = "${s2:$i:1}"
then
comm=${comm}${s1:$i:1}
fi
done
Since everyone loves perl one-liners full of punctuation:
perl -e '$a{$_}++ for split "",shift; $b{$_}++ for split "",shift; for (sort keys %a){print if defined $b{$_}}' my_foo not_my_bar
Creates hashes %a and %b from the input strings.
Prints any characters common to both strings.
outputs:
_moy
"flower","flow","flight" --> output fl
s="flower"
t="flow"
i=0
while [ $i -ne ${#s} ]
do
c=${s:$i:1}
if [[ $result != *$c* && $t == *$c* ]]
then
result=$result$c
fi
((i++))
done
echo $result
p=$result
q="flight"
j=0
while [ $j -ne ${#p} ]
do
c1=${p:$j:1}
if [[ $result1 != *$c1* && $q == *$c1* ]]
then
result1=$result1$c1
fi
((j++))
done
echo $result1
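The two passes above can be folded into one function that accepts any number of strings. This is only a sketch of that generalisation, with common_all as a made-up name; like the code above it keeps the order of the first string and drops duplicates:

```shell
#!/bin/bash
# Hypothetical helper: print the characters of $1 that occur in every other argument.
common_all() {
  local first=$1 result= c i s ok
  shift
  for ((i = 0; i < ${#first}; i++)); do
    c=${first:i:1}
    [[ $result == *"$c"* ]] && continue   # already reported
    ok=1
    for s in "$@"; do
      if [[ $s != *"$c"* ]]; then
        ok=0
        break
      fi
    done
    if ((ok)); then
      result+=$c
    fi
  done
  printf '%s\n' "$result"
}

common_all flower flow flight   # fl
```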
