How do I find common characters between two strings in bash?

For example:
s1="my_foo"
s2="not_my_bar"
the desired result would be my_o. How do I do this in bash?

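My solution below uses fold to break each string into one character per line, sort -u to sort and deduplicate the characters, comm -12 to keep only the lines common to both lists, and finally tr to delete the newline characters. Note that the result comes out in sorted order, not in the order of the first string:
comm -12 <(fold -w1 <<< "$s1" | sort -u) <(fold -w1 <<< "$s2" | sort -u) | tr -d '\n'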
Alternatively, here is a pure Bash solution (which also maintains the order of the characters). It iterates over the first string and checks if each character is present in the second string.
s="temp_foo_bar"
t="temp_bar"
i=0
while [ $i -ne ${#s} ]
do
c=${s:$i:1}
if [[ $result != *"$c"* && $t == *"$c"* ]]
then
result=$result$c
fi
((i++))
done
echo $result
prints: temp_bar
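
The loop above can be wrapped in a reusable function. The following is a minimal sketch, assuming a hypothetical helper name common_chars (it is not part of the answer above); it also quotes the character inside the [[ ]] patterns so that characters such as * or ? are compared literally:

```shell
#!/usr/bin/env bash
# common_chars is a hypothetical helper name used for illustration.
# Prints the characters of $1 that also occur in $2, in the order of $1,
# each character only once.
common_chars() {
    local s=$1 t=$2 result= c i
    for ((i = 0; i < ${#s}; i++)); do
        c=${s:i:1}
        # "$c" is quoted so glob characters are matched literally.
        if [[ $result != *"$c"* && $t == *"$c"* ]]; then
            result+=$c
        fi
    done
    printf '%s\n' "$result"
}

common_chars "my_foo" "not_my_bar"      # my_o
common_chars "temp_foo_bar" "temp_bar"  # temp_bar
```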

Assuming the strings do not contain embedded newlines:
s1='my_foo' s2='my_bar'
intersect=$(
comm -12 <(
fold -w1 <<< "$s1" |
sort -u
) <(
fold -w1 <<< "$s2" |
sort -u
) |
tr -d \\n
)
printf '%s\n' "$intersect"
And another one:
tr -dc "$s2" <<< "$s1"
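
Note that the tr variant keeps repeated characters from the first string, and that characters such as - and \ inside "$s2" are interpreted as tr set syntax. A small sketch of the difference, using the strings from the question (the fold/awk deduplication step is an addition for illustration, not part of the answer above):

```shell
#!/usr/bin/env bash
s1='my_foo' s2='not_my_bar'

# -c complements the set and -d deletes, so every character of s1
# that does not occur in s2 (including the trailing newline) is removed.
kept=$(tr -dc "$s2" <<< "$s1")
printf '%s\n' "$kept"      # my_oo

# Optional: drop repeats while keeping first-seen order.
unique=$(fold -w1 <<< "$kept" | awk '!seen[$0]++' | tr -d '\n')
printf '%s\n' "$unique"    # my_o
```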

a late entry, I've just found this page:
echo "$str2" |
awk 'BEGIN{FS=""}
{ n=0; while(n<=NF) {
if ($n == substr(test,n,1)) { if(!found[$n]) printf("%c",$n); found[$n]=1;} n++;
} print ""}' test="$str1"
And another one; this one builds a regexp for matching (note: it doesn't work with special characters, but that's not hard to fix with another sed):
echo "$str1" |
grep -E -o ^`echo -n "$str2" | sed 's/\(.\)/(|\1/g'; echo "$str2" | sed 's/./)/g'`

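This should work in bash (note that it prints only the common prefix of the two strings, and that the ${s1:0:1} substring expansion is not POSIX sh):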
s1="my_foo"
s2="my_bar"
while [ -n "$s1" -a -n "$s2" ]
do
if [ "${s1:0:1}" = "${s2:0:1}" ]
then
printf %s "${s1:0:1}"
else
break
fi
s1="${s1:1:${#s1}}"
s2="${s2:1:${#s2}}"
done

A solution using a single sed execution:
echo -e "$s1\n$s2" | sed -e 'N;s/^/\n/;:begin;s/\n\(.\)\(.*\)\n\(.*\)\1\(.*\)/\1\n\2\n\3\4/;t begin;s/\n.\(.*\)\n\(.*\)/\n\1\n\2/;t begin;s/\n\n.*//'
Like every cryptic sed script, it needs an explanation, here in the form of a sed script file that can be run with echo -e "$s1\n$s2" | sed -f script:
# Read the next line so s1 and s2 are in the pattern space only separated by a \n.
N
# Put a \n at the beginning of the pattern space.
s/^/\n/
# During the script execution, the pattern space will contain <result so far>\n<what left of s1>\n<what left of s2>.
:begin
# If the 1st char of s1 is found in s2, remove it from s1 and s2, append it to the result and do this again until it fails.
s/\n\(.\)\(.*\)\n\(.*\)\1\(.*\)/\1\n\2\n\3\4/
t begin
# When the previous substitution fails, remove the 1st char of s1 and try again to find the 1st char of s1 in s2.
s/\n.\(.*\)\n\(.*\)/\n\1\n\2/
t begin
# When previous substitution fails, s1 is empty so remove the \n and what is left of s2.
s/\n\n.*//
If you want to remove duplicates, add the following at the end of the script:
:end;s/\(.\)\(.*\)\1/\1\2/;t end
Edit: I realize that dogbane's pure shell solution has the same algorithm, and is probably more efficient.

comm=""
for ((i=0;i<${#s1};i++))
do
if test "${s1:$i:1}" = "${s2:$i:1}"
then
comm=${comm}${s1:$i:1}
fi
done

Since everyone loves perl one-liners full of punctuation:
perl -e '$a{$_}++ for split "",shift; $b{$_}++ for split "",shift; for (sort keys %a){print if defined $b{$_}}' my_foo not_my_bar
Creates hashes %a and %b from the input strings.
Prints any characters common to both strings.
outputs:
_moy

"flower","flow","flight" --> output fl
s="flower"
t="flow"
i=0
while [ $i -ne ${#s} ]
do
c=${s:$i:1}
if [[ $result != *$c* && $t == *$c* ]]
then
result=$result$c
fi
((i++))
done
echo $result
p=$result
q="flight"
j=0
while [ $j -ne ${#p} ]
do
c1=${p:$j:1}
if [[ $result1 != *$c1* && $q == *$c1* ]]
then
result1=$result1$c1
fi
((j++))
done
echo $result1

Related

How to clean up multiple file names using bash?

I have a directory with ~250 .txt files in it. Each of these files has a title like this:
Abraham Lincoln [December 01, 1862].txt
George Washington [October 25, 1790].txt
etc...
However, these are terrible file names for reading into python and I want to iterate over all of them to change them to a more suitable format.
I've tried similar things for changing single variables that are shared across many files. But I can't wrap my head around how I should iterate over these files and change the formatting of their names while still keeping the same information.
The ideal output would be something like
1862_12_01_abraham_lincoln.txt
1790_10_25_george_washington.txt
etc...
Please try the straightforward (tedious) bash script:
#!/bin/bash
declare -A map=(["January"]="01" ["February"]="02" ["March"]="03" ["April"]="04" ["May"]="05" ["June"]="06" ["July"]="07" ["August"]="08" ["September"]="09" ["October"]="10" ["November"]="11" ["December"]="12")
pat='^([^[]+) \[([A-Za-z]+) ([0-9]+), ([0-9]+)]\.txt$'
for i in *.txt; do
if [[ $i =~ $pat ]]; then
newname="$(printf "%s_%s_%s_%s.txt" "${BASH_REMATCH[4]}" "${map["${BASH_REMATCH[2]}"]}" "${BASH_REMATCH[3]}" "$(tr 'A-Z ' 'a-z_' <<< "${BASH_REMATCH[1]}")")"
mv -- "$i" "$newname"
fi
done
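
Before renaming ~250 files it can help to preview the result. The sketch below moves the name computation into a hypothetical new_name function (not part of the script above) so the mapping can be checked without calling mv. It uses a plain indexed array for the month lookup, which is an assumption made here to avoid needing associative arrays:

```shell
#!/usr/bin/env bash
# new_name is a hypothetical helper: it only computes the target name
# and returns non-zero when the file name does not match the pattern.
new_name() {
    local pat='^(.+) \[([A-Za-z]+) ([0-9]+), ([0-9]+)\]\.txt$'
    local months=(January February March April May June July
                  August September October November December)
    local i month=
    [[ $1 =~ $pat ]] || return 1
    # Translate the month name into a two-digit number.
    for i in "${!months[@]}"; do
        if [[ ${months[i]} == "${BASH_REMATCH[2]}" ]]; then
            printf -v month '%02d' "$((i + 1))"
        fi
    done
    [[ -n $month ]] || return 1
    local name=${BASH_REMATCH[1]// /_}
    printf '%s_%s_%s_%s.txt\n' "${BASH_REMATCH[4]}" "$month" "${BASH_REMATCH[3]}" \
        "$(tr '[:upper:]' '[:lower:]' <<< "$name")"
}

# Dry run: print the planned renames instead of performing them.
for f in *.txt; do
    if target=$(new_name "$f"); then
        printf '%s -> %s\n' "$f" "$target"
    fi
done
```

Once the printed list looks right, the printf line can be swapped for mv -- "$f" "$target".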
for file in *.txt; do
# extract parts of the filename to be differently formatted with a regex match
[[ $file =~ (.*)\[(.*)\] ]] || { echo "invalid file $file"; exit; }
# format extracted strings and generate the new filename
formatted_date=$(date -d "${BASH_REMATCH[2]}" +"%Y_%m_%d")
name="${BASH_REMATCH[1]// /_}" # replace spaces in the name with underscores
f="${formatted_date}_${name,,}" # convert name to lower-case and append it to date string
new_filename="${f::-1}.txt" # remove trailing underscore and add `.txt` extension
# do what you need here
echo "$new_filename"
# mv -- "$file" "$new_filename"
done
I like to pull the filename apart, then put it back together.
Also, GNU date can parse the date string, which is simpler than using sed or a big case statement to convert "October" to "10".
#! /usr/bin/bash
if [ "$1" == "" ] || [ "$1" == "--help" ]; then
echo "Give a filename like \"Abraham Lincoln [December 01, 1862].txt\" as an argument"
exit 2
fi
filename="$1"
# remove the brackets
filename=`echo "$filename" | sed -e 's/[\[]//g;s/\]//g'`
# cut out the name
namepart=`echo "$filename" | awk '{ print $1" "$2 }'`
# cut out the date
datepart=`echo "$filename" | awk '{ print $3" "$4" "$5 }' | sed -e 's/\.txt//'`
# format up the date (relies on GNU date)
datepart=`date --date="$datepart" +"%Y_%m_%d"`
# put it back together with underscores, in lower case
final=`echo "$namepart $datepart.txt" | tr '[A-Z]' '[a-z]' | sed -e 's/ /_/g'`
echo mv \"$1\" \"$final\"
EDIT: converted to BASH, from Bourne shell.

What is the proper way to convert first character of a string from upper to lower case in shell scripting on Mac OS? [duplicate]

I want to uppercase just the first character in my string with bash.
foo="bar";
# uppercase first character
echo $foo;
should print "Bar";
One way with bash (version 4+):
foo=bar
echo "${foo^}"
prints:
Bar
foo="$(tr '[:lower:]' '[:upper:]' <<< ${foo:0:1})${foo:1}"
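
Since macOS ships bash 3.2, where ${foo^} is not available, the tr approach can be wrapped in a function. A minimal sketch, assuming a hypothetical function name ucfirst:

```shell
#!/usr/bin/env bash
# ucfirst is a hypothetical helper name. It upper-cases only the first
# character of its argument and avoids the bash 4+ ${var^} expansion.
ucfirst() {
    local first=${1:0:1} rest=${1:1}
    printf '%s%s\n' "$(tr '[:lower:]' '[:upper:]' <<< "$first")" "$rest"
}

ucfirst bar            # Bar
ucfirst "one two"      # One two
```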
One way with sed:
echo "$(echo "$foo" | sed 's/.*/\u&/')"
Prints:
Bar
$ foo="bar";
$ foo=`echo ${foo:0:1} | tr '[a-z]' '[A-Z]'`${foo:1}
$ echo $foo
Bar
To capitalize first word only:
foo='one two three'
foo="${foo^}"
echo $foo
One two three
To capitalize every word in the variable:
foo="one two three"
foo=( $foo ) # without quotes
foo="${foo[@]^}"
echo $foo
One Two Three
(works in bash 4+)
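
The unquoted foo=( $foo ) also performs pathname expansion, so a word such as * would be replaced by file names. A hedged sketch of the same idea using read -ra to split the words instead (capitalize_words is a hypothetical name; bash 4+ is still required for the ^ operator):

```shell
#!/usr/bin/env bash
# capitalize_words is a hypothetical helper name.
capitalize_words() {
    local -a words
    # read -ra splits on whitespace without expanding glob characters.
    read -ra words <<< "$1"
    # ^ is applied to each array element; [*] joins them with a space.
    printf '%s\n' "${words[*]^}"
}

capitalize_words "one two three"   # One Two Three
```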
Using awk only
foo="uNcapItalizedstrIng"
echo "$foo" | awk '{print toupper(substr($0,1,1)) tolower(substr($0,2))}'
Here is the "native" text tools way:
#!/bin/bash
string="abcd"
first=`echo $string|cut -c1|tr '[a-z]' '[A-Z]'`
second=`echo $string|cut -c2-`
echo $first$second
just for fun here you are :
foo="bar";
echo $foo | awk '{$1=toupper(substr($1,1,1))substr($1,2)}1'
# or
echo ${foo^}
# or
echo $foo | head -c 1 | tr '[a-z]' '[A-Z]'; echo $foo | tail -c +2
# or
echo ${foo:1} | sed -e 's/^./\B&/'
It can be done in pure bash with bash-3.2 as well:
# First, get the first character.
fl=${foo:0:1}
# Safety check: it must be a letter :).
if [[ ${fl} == [a-z] ]]; then
# Now, obtain its octal value using printf (builtin).
ord=$(printf '%o' "'${fl}")
# Fun fact: [a-z] maps onto 0141..0172. [A-Z] is 0101..0132.
# We can use decimal '- 40' to get the expected result!
ord=$(( ord - 40 ))
# Finally, map the new value back to a character.
fl=$(printf '%b' '\'${ord})
fi
echo "${fl}${foo:1}"
This works too...
FooBar=baz
echo ${FooBar^^${FooBar:0:1}}
=> Baz
FooBar=baz
echo ${FooBar^^${FooBar:1:1}}
=> bAz
FooBar=baz
echo ${FooBar^^${FooBar:2:2}}
=> baZ
And so on.
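
The pattern after ^^ selects which characters are upper-cased, and every matching character is converted, not only the one at the chosen position. A short sketch (bash 4+), with bob and banana as assumed example values:

```shell
#!/usr/bin/env bash
word=bob
by_pattern=${word^^${word:0:1}}   # upper-cases every "b"
first_only=${word^}               # upper-cases only the first character
echo "$by_pattern"                # BoB
echo "$first_only"                # Bob

fruit=banana
echo "${fruit^^a}"                # bAnAnA
echo "${fruit^^[an]}"             # bANANA
```

For strings with repeated letters, ${var^} is therefore the safer way to capitalize just the first character.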
Sources:
Bash Manual: Shell Parameter Expansion
Full Bash Guide: Parameters
Bash Hacker's Wiki Parameter Expansion
Introductions/Tutorials:
Cyberciti.biz: 8. Convert to upper to lower case or vice versa
Opensource.com: An introduction to parameter expansion in Bash
This one worked for me:
Search for all *php files in the current directory and replace the first character of each filename with a capital letter:
e.g: test.php => Test.php
for f in *php ; do mv "$f" "$(\sed 's/.*/\u&/' <<< "$f")" ; done
An alternative, clean solution for both Linux and OS X; it can also be used with bash variables:
python -c "print(\"abc\".capitalize())"
returns Abc
This is sh-compatible as far as I know, although cut -z is a GNU extension rather than POSIX.
upper_first.sh:
#!/bin/sh
printf '%s' "$1" | cut -c1 -z | tr -d '\0' | tr '[:lower:]' '[:upper:]'
printf '%s' "$1" | cut -c2-
cut -c1 -z ends the first string with \0 instead of \n. It gets removed with tr -d '\0'. It also works to omit the -z and use tr -d '\n' instead, but this breaks if the first character of the string is a newline.
Usage:
$ upper_first.sh foo
Foo
$
In a function:
#!/bin/sh
upper_first ()
{
printf '%s' "$1" | cut -c1 -z | tr -d '\0' | tr '[:lower:]' '[:upper:]'
printf '%s' "$1" | cut -c2-
}
old="foo"
new="$(upper_first "$old")"
echo "$new"
POSIX compliant and with fewer sub-processes:
v="foo[Bar]"
printf "%s" "${v%"${v#?}"}" | tr '[:lower:]' '[:upper:]' && printf "%s" "${v#?}"
==> Foo[Bar]
first-letter-to-lower () {
str=""
space=" "
for i in "$@"
do
if [ -z $(echo $i | grep "the\|of\|with" ) ]
then
str=$str"$(echo ${i:0:1} | tr '[A-Z]' '[a-z]')${i:1}$space"
else
str=$str${i}$space
fi
done
echo $str
}
first-letter-to-upper-xc () {
v-first-letter-to-upper | xclip -selection clipboard
}
first-letter-to-upper () {
str=""
space=" "
for i in "$@"
do
if [ -z $(echo $i | grep "the\|of\|with" ) ]
then
str=$str"$(echo ${i:0:1} | tr '[a-z]' '[A-Z]')${i:1}$space"
else
str=$str${i}$space
fi
done
echo $str
}
first-letter-to-lower-xc(){
v-first-letter-to-lower | xclip -selection clipboard
}
Not exactly what was asked, but quite helpful:
declare -u foo #When the variable is assigned a value, all lower-case characters are converted to upper-case.
foo=bar
echo $foo
BAR
And the opposite
declare -l foo #When the variable is assigned a value, all upper-case characters are converted to lower-case.
foo=BAR
echo $foo
bar
What if the first character is not a letter (but a tab, a space, or an escaped double quote)? We'd better keep testing until we find a letter! So:
S=' \"ó foo bar\"'
N=0
until [[ ${S:$N:1} =~ [[:alpha:]] ]]; do N=$[$N+1]; done
#F=`echo ${S:$N:1} | tr [:lower:] [:upper:]`
#F=`echo ${S:$N:1} | sed -E -e 's/./\u&/'` #other option
F=${S:$N:1}
F=${F^} #pure Bash solution to "upper"
echo "$F"${S:(($N+1))} #without garbage
echo '='${S:0:(($N))}"$F"${S:(($N+1))}'=' #garbage preserved
Foo bar
= \"Foo bar=

Retrieve string between characters and assign on new variable using awk in bash

I'm new to bash scripting and still learning how commands work, and I've stumbled on this problem.
I have a file /home/fedora/file.txt
Inside of the file is like this:
[apple] This is a fruit.
[ball] This is a sport's equipment.
[cat] This is an animal.
What I wanted is to retrieve words between "[" and "]".
What I tried so far is :
while IFS='' read -r line || [[ -n "$line" ]];
do
echo $line | awk -F"[" '{print$2}' | awk -F"]" '{print$1}'
done < /home/fedora/file.txt
I can print the words between "[" and "]".
Then I wanted to put the echoed word into a variable, but I don't know how to.
I would appreciate any help.
Try this:
variable="$(echo $line | awk -F"[" '{print$2}' | awk -F"]" '{print$1}')"
or
variable="$(awk -F'[\[\]]' '{print $2}' <<< "$line")"
or complete
while IFS='[]' read -r foo fruit rest; do echo $fruit; done < file
or with an array:
while IFS='[]' read -ra var; do echo "${var[1]}"; done < file
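
With IFS='[]', each line is split at the brackets: the first field is the empty text before [, the second is the word inside the brackets, and the last variable receives the remainder of the line. A minimal sketch that collects the words into an array, using a here-document in place of /home/fedora/file.txt so it is self-contained (the array name labels is an assumption):

```shell
#!/usr/bin/env bash
labels=()
# _ receives the empty field before "[" and, the second time, the rest of the line.
while IFS='[]' read -r _ label _; do
    labels+=("$label")
done <<'EOF'
[apple] This is a fruit.
[ball] This is a sport's equipment.
[cat] This is an animal.
EOF

printf '%s\n' "${labels[@]}"
```

Assigning inside the loop works here because the loop reads from a redirection rather than from a pipe, so it runs in the current shell.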
In addition to using awk, you can use the native parameter expansion/substring removal provided by bash. Below, # trims a matching pattern from the left, while % trims from the right. (Note: a single # or % removes the shortest match, while ## or %% removes the longest match.)
#!/bin/bash
[ -r "$1" ] || { ## validate input is readable
printf "error: insufficient input. usage: %s filename\n" "${0##*/}"
exit 1
}
## read each line and separate label and value
while read -r line || [ -n "$line" ]; do
label=${line#[} # trim initial [ from left
label=${label%%]*} # trim from the first ] through the end
value=${line##*] } # trim from left through '] '
printf " %-8s -> '%s'\n" "$label" "$value"
done <"$1"
exit 0
Input
$ cat dat/labels.txt
[apple] This is a fruit.
[ball] This is a sport's equipment.
[cat] This is an animal.
Output
$ bash readlabel.sh dat/labels.txt
apple -> 'This is a fruit.'
ball -> 'This is a sport's equipment.'
cat -> 'This is an animal.'
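
The difference between the single and doubled operators is easiest to see side by side. A small sketch, using one line from the input above plus an assumed file name (archive.tar.gz) for illustration:

```shell
#!/usr/bin/env bash
line='[apple] This is a fruit.'
label=${line#\[}       # remove the leading "["
label=${label%%\]*}    # remove the longest suffix that starts with "]"
value=${line#*\] }     # remove the shortest prefix that ends with "] "
printf '%s -> %s\n' "$label" "$value"   # apple -> This is a fruit.

file='archive.tar.gz'  # assumed example value
echo "${file#*.}"      # tar.gz       (shortest match from the left)
echo "${file##*.}"     # gz           (longest match from the left)
echo "${file%.*}"      # archive.tar  (shortest match from the right)
echo "${file%%.*}"     # archive      (longest match from the right)
```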

Sorting variables alphabetically in a sed

In short, I have a bash script that finds all 5-letter words in the dictionary with a single duplicated letter. I'm using sed to print out the letter that repeats and the letters that don't. I have to sort the letters that don't repeat in alphabetical order, and I'm not quite sure how.
Here's my sed:
sed 's/\(.*\)\(.\)\(.*\)\2\(.*\)/\1\2\3\2\4 \2 \1\3\4 /'
So I need to sort \1\3\4 by piping them into a read loop.
UPDATE
grep '^[a-z][a-z][a-z][a-z][a-z]$' /usr/share/dict/words |
grep '.*\(.\).*\1.*' |
grep -v '.*\(.\).*\1.*\1.*' |
grep -v '.*\(.\).*\(.\).*\1.*\2.*\1.*\2.*' |
grep -v '.*\(.\).*\(.\).*\1.*\2.*\2.*\1.*' |
grep -v '.*\(.\).*\(.\).*\1.*\1.*\2.*\2.*' |
sed 's/\(.*\)\(.\)\(.*\)\2\(.*\)/\1\2\3\2\4 \2 \1\3\4/' |
while read word dup nondup
do sort -$nondup
front=$[nondup:1]
middle=$[nondup:2]
back=$[nondup:3]
echo $word $dup $front$middle$back
done
For a sample dictionary of 5-letter words:
$ cat file
timey
terra
debby
ovolt
spell
Now, using your sed command, let's sort the output by the non-repeating letters:
$ sed 's/\(.*\)\(.\)\(.*\)\2\(.*\)/\1\2\3\2\4 \2 \1\3\4 /' file | sort -k3
timey
debby b dey
spell l spe
terra r tea
ovolt o vlt
sort -k3 sorts over the third column.
Above but also sort the non-recurring letters alphabetically
This solution adds a shell while loop in order to sort the non-recurring letters:
sed 's/\(.*\)\(.\)\(.*\)\2\(.*\)/\1\2\3\2\4 \2 \1\3\4 /' file | while read word rep non
do
non=$(echo "$non" | grep -o . | sort |tr -d "\n")
echo "$word $rep $non"
done | sort -k3
On the same input, this produces the output:
timey
terra r aet
debby b dey
spell l eps
ovolt o ltv
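
The letter-sorting pipeline inside the loop can be tried on its own. A minimal sketch, with sort_letters as a hypothetical helper name and LC_ALL=C added as an assumption to make the ordering independent of the locale:

```shell
#!/usr/bin/env bash
# sort_letters is a hypothetical helper name.
sort_letters() {
    # grep -o . prints one character per line, sort orders the lines,
    # and tr joins them back into a single word.
    grep -o . <<< "$1" | LC_ALL=C sort | tr -d '\n'
    echo
}

sort_letters tea   # aet
sort_letters spe   # eps
sort_letters vlt   # ltv
```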
Primitive Method Suitable Only For Instructors
If I understand correctly, your instructor wants something like this:
sed 's/\(.*\)\(.\)\(.*\)\2\(.*\)/\1\2\3\2\4 \2 \1\3\4 /' file |
while read word rep non
do
[ "$non" ] || continue # skip any word that lacks a repeating letter
front=${non:0:1}
middle=${non:1:1}
back=${non:2:1}
if [[ "$front" < "$middle" ]] && [[ "$front" < "$back" ]]
then
[[ "$middle" < "$back" ]] && non=$front$middle$back || non=$front$back$middle
elif [[ "$middle" < "$front" ]] && [[ "$middle" < "$back" ]]
then
[[ "$front" < "$back" ]] && non=$middle$front$back || non=$middle$back$front
elif [[ "$back" < "$front" ]] && [[ "$back" < "$middle" ]]
then
[[ "$front" < "$middle" ]] && non=$back$front$middle || non=$back$middle$front
else
echo "ERROR"
fi
echo "$word $rep $non"
done | sort -k3
This method requires bash.
You can simply modify your sed command and pipe to sort to sort the 3 characters most efficiently. In addition to John's answer, if your question wants only the remnants sorted:
sed -e 's/\(.*\)\(.\)\(.*\)\2\(.*\)/\1\3\4/' stack/dat/dicta.dat | sort
input:
$ cat stack/dat/dicta.dat
aback
abaft
abase
abash
abask
abate
output:
$ sed -e 's/\(.*\)\(.\)\(.*\)\2\(.*\)/\1\3\4/' stack/dat/dicta.dat | sort
bck
bft
bse
bsh
bsk
bte
If you want the full output sorted, then calling sort following your original sed with option -k3 is the correct way.

Bash- scramble characters contained in a string

So I have this function with the following output:
AGsg4SKKs74s62#
I need to find a way to scramble the characters without deleting anything, i.e. all characters must still be present after I scramble them.
I can only use bash utilities, including awk and sed.
echo 'AGsg4SKKs74s62#' | sed 's/./&\n/g' | shuf | tr -d "\n"
Output (e.g.):
S7s64#2gKAGsKs4
Here's a pure Bash function that does the job:
scramble() {
# $1: string to scramble
# return in variable scramble_ret
local a=$1 i
scramble_ret=
while((${#a})); do
((i=RANDOM%${#a}))
scramble_ret+=${a:i:1}
a=${a::i}${a:i+1}
done
}
See if it works:
$ scramble 'AGsg4SKKs74s62#'
$ echo "$scramble_ret"
G4s6s#2As74SgKK
Looks all right.
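
Rather than checking by eye, the "all characters must be present" requirement can be tested by comparing the sorted characters of the input and the output. A sketch under that assumption; scramble repeats the algorithm of the function above, and sorted_chars is a hypothetical helper added for the check:

```shell
#!/usr/bin/env bash
# Same algorithm as above: pick a random position, move that character
# to the result, and remove it from the remaining string.
scramble() {
    local a=$1 i
    scramble_ret=
    while ((${#a})); do
        i=$((RANDOM % ${#a}))
        scramble_ret+=${a:i:1}
        a=${a::i}${a:i+1}
    done
}

# sorted_chars is a hypothetical helper: characters of $1 in sorted order.
sorted_chars() {
    grep -o . <<< "$1" | LC_ALL=C sort | tr -d '\n'
}

input='AGsg4SKKs74s62#'
scramble "$input"
if [[ $(sorted_chars "$scramble_ret") == "$(sorted_chars "$input")" ]]; then
    echo "ok: $scramble_ret"
else
    echo "characters were lost or added" >&2
fi
```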
I know that you haven't mentioned Perl but it could be done like this:
perl -MList::Util=shuffle -F'' -lane 'print shuffle @F' <<<"AGsg4SKKs74s62#"
-a enables auto-split mode and -F'' sets the field separator to an empty string, so each character goes into a separate array element. The array is shuffled using the function provided by the core module List::Util.
Here is my solution; usage: shuffleString "any-string". I did not take performance into consideration, given that this is bash.
function shuffleString() {
local line="$1"
for i in $(seq 1 ${#line}); do
local p=$(expr $RANDOM % ${#line})
if [[ $p -lt $i ]]; then
local line="${line:0:$p}${line:$i:1}${line:$p+1:$i-$p-1}${line:$p:1}${line:$i+1}"
elif [[ $p -gt $i ]]; then
local line="${line:0:$i}${line:$p:1}${line:$i+1:$p-$i-1}${line:$i:1}${line:$p+1}"
fi
done
echo "$line"
}
