linux bash, camel case string to separate by dash - linux

Is there a way to convert something like this:
MyDirectoryFileLine
to
my-directory-file-line
I found some ways to convert all letters to uppercase or lowercase, but not in that way; any ideas?

You can use s/\([A-Z]\)/-\L\1/g to find an upper-case letter and replace it with a dash followed by its lower-case form (\L is a GNU sed extension). However, this leaves a dash at the beginning of the line, so you need a second sed expression to remove it.
This should work:
sed --expression 's/\([A-Z]\)/-\L\1/g' \
--expression 's/^-//' \
<<< "MyDirectoryFileLine"

I propose to use sed to do that:
NEW=$(echo MyDirectoryFileLine \
| sed 's/\(.\)\([A-Z]\)/\1-\2/g' \
| tr '[:upper:]' '[:lower:]')
UPD: I forgot to convert to lower case; the code above is updated.

echo MyDirectoryFileLine | perl -ne 'print lc(join("-", split(/(?=[A-Z])/)))'
prints my-directory-file-line

Slight variation on @bilalq's answer that covers some more possible edge cases:
echo "MyDirectoryMVPFileLine" \
| sed 's/\([^A-Z]\)\([A-Z0-9]\)/\1-\2/g' \
| sed 's/\([A-Z0-9]\)\([A-Z0-9]\)\([^A-Z]\)/\1-\2\3/g' \
| tr '[:upper:]' '[:lower:]'
output is still:
my-directory-mvp-file-line
but also:
WhatADeal -> what-a-deal
TheMVP -> the-mvp
DoSomeABTesting -> do-some-ab-testing
The3rdThing -> the-3rd-thing
The3Things -> the-3-things
ThingNumber3 -> thing-number-3

None of the solutions posted here worked for me. Most didn't support multiple platforms well. The one from @4ndrew was close, but it failed on edge cases that had multiple capitalized characters next to each other (example: FooMVPClient turns into foo-mv-pclient instead of foo-mvp-client).
This worked for me:
echo "MyDirectoryMVPFileLine" \
| sed 's/\([a-z]\)\([A-Z]\)/\1-\2/g' \
| sed 's/\([A-Z]\{2,\}\)\([A-Z]\)/\1-\2/g' \
| tr '[:upper:]' '[:lower:]'
output:
my-directory-mvp-file-line
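The acronym handling discussed above can be wrapped in a function. This is only a sketch: the name camel_to_kebab is hypothetical, and the second sed expression (splitting an acronym from a following capitalized word) is an assumption about the desired behaviour. It sticks to POSIX sed and tr, so it should behave the same with GNU and BSD sed. Digits are not separated from the letters before them.

```bash
# camel_to_kebab STRING  (hypothetical helper name)
# Prints STRING converted from CamelCase to kebab-case.
camel_to_kebab() {
  printf '%s\n' "$1" \
    | LC_ALL=C sed \
        -e 's/\([a-z0-9]\)\([A-Z]\)/\1-\2/g' \
        -e 's/\([A-Z]\)\([A-Z][a-z]\)/\1-\2/g' \
    | tr '[:upper:]' '[:lower:]'
  # 1st expression: dash between a lower-case letter or digit and an upper-case letter
  # 2nd expression: dash before the last capital of an acronym that starts a new word
}

camel_to_kebab "MyDirectoryFileLine"   # my-directory-file-line
camel_to_kebab "FooMVPClient"          # foo-mvp-client
```

LC_ALL=C is there so that [a-z] and [A-Z] mean plain ASCII ranges regardless of the current locale.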

My modest contribution that works with "/" (possible use for directory names or GitHub repo names). It's not as clean as it could be, but it does the job. I've used @Peter's contribution as a base, then tweaked it a bit.
function kebab_case() {
echo -n "$1" |\
sed 's/\([^A-Z+]\)\([A-Z0-9]\)/\1-\2/g' |\
sed 's/\([0-9]\)\([A-Z]\)/\1-\2/g' |\
sed 's/\([A-Z]\)\([0-9]\)/\1-\2/g' |\
sed 's/--/-/g' |\
sed 's/\([\/]\)-/\1/g' |\
tr '[:upper:]' '[:lower:]'
}
function assert_kebab_equal() {
local Actual
local Expected
Expected="$1"
Actual="$(kebab_case "$2")"
if [ "${Expected}" != "${Actual}" ]; then
echo Error:
echo " Actual: ${Actual}"
echo "Expected: ${Expected}"
else
echo "$2" "$1" | awk '{ printf "%-30s -> %-40s\n", $1, $2}'
fi
}
assert_kebab_equal "abc-def" "AbcDef"
assert_kebab_equal "/abc-def-ghi/def" "/AbcDef-Ghi/Def"
assert_kebab_equal "/ab-cd-ef" "/AbCdEf"
assert_kebab_equal "repo-owner/repo-name" "RepoOwner/RepoName"
assert_kebab_equal "repo-12-owner/repo-12-name" "Repo12Owner/Repo12Name"
assert_kebab_equal "repo-12-3-owner/repo-12-name" "Repo12-3Owner/Repo12Name"
assert_kebab_equal "repo-owner/repo-name" "REPO-OWNER/REPO-NAME"
assert_kebab_equal "repo-owner-2/repo-name" "REPO-OWNER2/REPO-NAME"
assert_kebab_equal "repo-1-owner" "REPO1-OWNER"
assert_kebab_equal "repo-1-owner-1/22-repo-2-name" "REPO1-OWNER1/22REPO-2NAME"
# Outputs:
AbcDef -> abc-def
/AbcDef-Ghi/Def -> /abc-def-ghi/def
/AbCdEf -> /ab-cd-ef
RepoOwner/RepoName -> repo-owner/repo-name
Repo12Owner/Repo12Name -> repo-12-owner/repo-12-name
Repo12-3Owner/Repo12Name -> repo-12-3-owner/repo-12-name
REPO-OWNER/REPO-NAME -> repo-owner/repo-name
REPO-OWNER2/REPO-NAME -> repo-owner-2/repo-name
REPO1-OWNER -> repo-1-owner
REPO1-OWNER1/22REPO-2NAME -> repo-1-owner-1/22-repo-2-name

This might work for you:
<<<"MyDirectoryFileLine" sed 's/[A-Z]/-\l&/g;s/.//'
my-directory-file-line

With GNU sed:
echo "MyDirectoryFileLine"|sed -e 's/\([A-Z]\)/-\L\1/g'
You just need to strip the first dash if that bothers you:
echo "MyDirectoryFileLine"|sed -e 's/\([A-Z]\)/-\L\1/g' -e 's/^-//'
With BSD sed it's a bit longer:
echo "MyDirectoryFileLine"|sed -e 's/\([A-Z]\)/-\1/g' -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' -e 's/^-//'
Update: the BSD version also works with GNU sed, so I recommend using the BSD version.

echo "SomeACRONYMInCamelCaseString" \
| sed -e 's/\([a-z]\)\([A-Z]\)/\1-\L\2/g' \
| sed -e 's/\(.*\)/\L\1/'
sed -e 's/\([a-z]\)\([A-Z]\)/\1-\L\2/g' replaces an uppercase letter with a hyphen and the lowercase letter, but only if it is preceded by a lowercase letter.
sed -e 's/\(.*\)/\L\1/' puts the whole string in lowercase.

Related

Add suffix to comma-separated strings in bash ecosystem

Is there a way of transforming a comma-delimited variable to add a suffix to each token using standard gnu tools? e.g.
VARIABLE="aaa,bbb,ccc"
suffix="-foo"
Expected output = aaa-foo,bbb-foo,ccc-foo
Additionally, if I have only one token, the transformation should behave in the same way
e.g. aaa -> aaa-foo
echo "aaa,bbb,ccc" | sed -E 's/([^,]+)/\1-foo/g'
It groups each run of characters that are not "," and then appends -foo to each group.
With variables:
suffix="-foo"; VARIABLE="aaa,bbb,ccc"; echo ${VARIABLE} | sed -E "s/([^,]+)/\1${suffix}/g"
echo $VARIABLE | tr "," "\n" | awk '{print $1"-foo"}' | paste -sd "," -
explanation:
put each token on single line
tr "," "\n"
append "-foo" to each token
awk '{print $1"-foo"}'
join back up with the original comma
paste -sd "," -
Try:
answer=$(echo "$VARIABLE" | sed "s/,/-foo,/g" | sed "s/$/-foo/")
If you need to have the suffix as a variable then try:
answer=$(echo "$VARIABLE" | sed "s/,/${suffix},/g" | sed "s/$/${suffix}/")
I don't have access to a Unix box at the moment to prove this works.
The following:
s="aaa,bbb,ccc"
IFS=,
a=( $s )
mapfile -t b < <(printf '%s-foo\n' "${a[@]}")
should give us:
$ declare -p b
declare -a b=([0]="aaa-foo" [1]="bbb-foo" [2]="ccc-foo")
From there, you can reconstruct the original format in a number of ways...
IFS=, eval 'JOINED="${b[*]}"'
Or if you don't like using eval, perhaps:
d=""; o=""
for x in "${b[@]}"; do
printf -v o '%s%s%s' "$o" "$d" "$x"
d=,
done
... which will put the complete modified string in $o.
With bash Parameter Expansion
var='aaa,bbb,ccc';[ -n "$var" ] && printf "%s\n" "${var//,/-foo,}-foo"
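The parameter-expansion approach can be turned into a small function that takes the suffix as an argument. A minimal sketch, assuming bash; the name add_suffix is made up for illustration, and it assumes the suffix contains no "&" (which newer bash versions may treat specially in the replacement part).

```bash
# add_suffix LIST SUFFIX  (hypothetical helper name)
# Prints the comma-separated LIST with SUFFIX appended to every token.
add_suffix() {
  local list=$1 suffix=$2
  # An empty list has no tokens, so print nothing.
  [ -n "$list" ] || return 0
  # Put the suffix in front of every comma, then once more at the end.
  printf '%s\n' "${list//,/$suffix,}$suffix"
}

add_suffix "aaa,bbb,ccc" "-foo"   # aaa-foo,bbb-foo,ccc-foo
add_suffix "aaa" "-foo"           # aaa-foo
```

A single token works the same way because there is no comma to replace and only the trailing suffix is added.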

Best way to swap first 4 chars with last 4 chars of string?

What's the way to swap first 4 chars with last 4 chars of string?
e.g. I have the string 20140613, I'd like to convert that to 06132014.
$ f=20140613
$ g=${f#????}${f%????}
$ echo $g
06132014
For dealing with longer strings something like the following is needed. (With inspiration from konsolebox's answer.)
echo ${f:(-4)}${f:4:${#f} - 8}${f:0:4}
Using pure BASH regex:
s='20140613'
[[ "$s" =~ ^(.*)([[:digit:]]{4})$ ]] && echo "${BASH_REMATCH[2]}${BASH_REMATCH[1]}"
06132014
Simply use substring expansion:
$ STRING=20140613
$ echo "${STRING:(-4)}${STRING:0:4}"
06132014
See Parameter Expansion.
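Substring expansion can also handle longer strings and leave too-short ones alone. The following is a sketch only, assuming bash; swap_ends and its optional second argument are hypothetical names introduced here.

```bash
# swap_ends STRING [N]  (hypothetical helper name)
# Prints STRING with its first N and last N characters swapped (N defaults to 4).
# Strings shorter than 2*N are printed unchanged.
swap_ends() {
  local s=$1 n=${2:-4}
  if (( ${#s} < 2 * n )); then
    printf '%s\n' "$s"
    return 0
  fi
  # last N characters + untouched middle + first N characters
  printf '%s\n' "${s: -n}${s:n:${#s}-2*n}${s:0:n}"
}

swap_ends 20140613        # 06132014
swap_ends 1111xxxx2222    # 2222xxxx1111
```

The space in "${s: -n}" matters: without it bash reads ":-" as the "use default value" expansion.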
Using date (the -d option is GNU date), which is designed for this kind of conversion:
$ str="20140613"
$ date +"%m%d%Y" -d "$str"
06132014
When you have to convert dates, no need to look so far ;)
Using sed:
STRING="20140613"
STRING=$(echo $STRING | sed 's/\(....\)\(.*\)/\2\1/')
Or using awk:
echo 20140613 | awk '{print substr($0,5,7) substr($0,1,4)}'
Test:
~$ echo 20140613 | awk '{print substr($0,5,7) substr($0,1,4)}'
>> 06132014
Through sed,
$ echo 20140613 | sed 's/^\(.\{4\}\)\(.\{4\}\)$/\2\1/g'
06132014
Through perl,
$ echo 20140613 | perl -pe 's/^(.{4})(.{4})$/\2\1/g'
06132014
With GNU Coreutils:
input=20140613
output=$(echo $input | fold -w4 | tac | tr -d \\n)
If you also need the last line feed, you can replace tr -d \\n with printf %s%s\\n or just append && echo to the command.
With perl
for str in 11112222 1111xxxx2222 111222
do
echo -n "$str -> "
echo "$str" | perl -ple 's/^(.{4})(.*)(.{4})$/\3\2\1/'
done
produces:
11112222 -> 22221111
1111xxxx2222 -> 2222xxxx1111
111222 -> 111222

How to extract numbers from a string?

I have a string that contains a path
string="toto.titi.12.tata.2.abc.def"
I want to extract only the numbers from this string.
To extract the first number:
tmp="${string#toto.titi.*.}"
num1="${tmp%.tata*}"
To extract the second number:
tmp="${string#toto.titi.*.tata.*.}"
num2="${tmp%.abc.def}"
So to extract a parameter I have to do it in 2 steps. How to extract a number with one step?
You can use tr to delete all of the non-digit characters, like so:
echo toto.titi.12.tata.2.abc.def | tr -d -c 0-9
To extract all the individual numbers and print one number per line, pipe the input through:
tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed 's/ /\n/g'
Breakdown:
Replaces all line breaks with spaces: tr '\n' ' '
Replaces all non numbers with spaces: sed -e 's/[^0-9]/ /g'
Remove leading white space: -e 's/^ *//g'
Remove trailing white space: -e 's/ *$//g'
Squeeze spaces in sequence to 1 space: tr -s ' '
Replace remaining space separators with line break: sed 's/ /\n/g'
Example:
echo -e " this 20 is 2sen\nten324ce 2 sort of" | tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed 's/ /\n/g'
Will print out
20
2
324
2
Here is a short one:
string="toto.titi.12.tata.2.abc.def"
id=$(echo "$string" | grep -o -E '[0-9]+')
echo $id # => output: 12 2
with space between the numbers.
Hope it helps...
Parameter expansion would seem to be the order of the day.
$ string="toto.titi.12.tata.2.abc.def"
$ read num1 num2 <<<${string//[^0-9]/ }
$ echo "$num1 / $num2"
12 / 2
This of course depends on the format of $string. But at least for the example you've provided, it seems to work.
This may be superior to anubhava's awk solution which requires a subshell. I also like chepner's solution, but regular expressions are "heavier" than parameter expansion (though obviously way more precise). (Note that in the expression above, [^0-9] may look like a regex atom, but it is not.)
You can read about this form of Parameter Expansion in the bash man page. Note that ${string//this/that} (as well as the <<<) is a bashism, and is not compatible with traditional Bourne or POSIX shells.
This would be easier to answer if you provided exactly the output you're looking to get. If you mean you want to get just the digits out of the string, and remove everything else, you can do this:
d@AirBox:~$ string="toto.titi.12.tata.2.abc.def"
d@AirBox:~$ echo "${string//[a-z,.]/}"
122
If you clarify a bit I may be able to help more.
You can also use sed:
echo "toto.titi.12.tata.2.abc.def" | sed 's/[0-9]*//g'
Here, sed replaces
any digits (class [0-9])
repeated any number of times (*)
with nothing (nothing between the second and third /),
and g stands for globally.
Output will be:
toto.titi..tata..abc.def
Convert your string to an array like this:
$ str="toto.titi.12.tata.2.abc.def"
$ arr=( ${str//[!0-9]/ } )
$ echo "${arr[@]}"
12 2
Use regular expression matching:
string="toto.titi.12.tata.2.abc.def"
[[ $string =~ toto\.titi\.([0-9]+)\.tata\.([0-9]+)\. ]]
# BASH_REMATCH[0] would be "toto.titi.12.tata.2.", the entire match
# Successive elements of the array correspond to the parenthesized
# subexpressions, in left-to-right order. (If there are nested parentheses,
# they are numbered in depth-first order.)
first_number=${BASH_REMATCH[1]}
second_number=${BASH_REMATCH[2]}
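If the surrounding words are not known in advance, the same regular-expression matching can be run in a loop to collect every number. This is a sketch under that assumption, not part of the answer above; the function name extract_numbers and the global array numbers are hypothetical.

```bash
# extract_numbers STRING  (hypothetical helper name)
# Fills the global array "numbers" with every run of digits found in STRING.
extract_numbers() {
  local rest=$1
  local re='([0-9]+)(.*)'   # first run of digits, then whatever follows it
  numbers=()
  while [[ $rest =~ $re ]]; do
    numbers+=("${BASH_REMATCH[1]}")
    rest=${BASH_REMATCH[2]}  # keep searching in the remainder
  done
}

extract_numbers "toto.titi.12.tata.2.abc.def"
echo "${numbers[0]} / ${numbers[1]}"   # 12 / 2
```

No subprocess is spawned, and a string without digits simply leaves the array empty.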
Using awk:
arr=( $(echo $string | awk -F "." '{print $3, $5}') )
num1=${arr[0]}
num2=${arr[1]}
Here is yet another way to do this, using cut:
echo $string | cut -d'.' -f3,5 | tr '.' ' '
This gives you the following output:
12 2
Fixing newline issue (for mac terminal):
cat temp.txt | tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed $'s/ /\\\n/g'
Assumptions:
there is no embedded white space
the string of text always has 7 period-delimited strings
the string always contains numbers in the 3rd and 5th period-delimited positions
One bash idea that does not require spawning any subprocesses:
$ string="toto.titi.12.tata.2.abc.def"
$ IFS=. read -r x1 x2 num1 x3 num2 rest <<< "${string}"
$ typeset -p num1 num2
declare -- num1="12"
declare -- num2="2"
In a comment OP has stated they wish to extract only one number at a time; the same approach can still be used, eg:
$ string="toto.titi.12.tata.2.abc.def"
$ IFS=. read -r x1 x2 num1 rest <<< "${string}"
$ typeset -p num1
declare -- num1="12"
$ IFS=. read -r x1 x2 x3 x4 num2 rest <<< "${string}"
$ typeset -p num2
declare -- num2="2"
A variation on anubhava's answer that uses parameter expansion instead of a subprocess call to awk, and still working with the same set of initial assumptions:
$ arr=( ${string//./ } )
$ num1=${arr[2]}
$ num2=${arr[4]}
$ typeset -p num1 num2
declare -- num1="12"
declare -- num2="2"

linux shell title case

I am writing a shell script and have a variable like this: something-that-is-hyphenated.
I need to use it in various points in the script as:
something-that-is-hyphenated, somethingthatishyphenated, SomethingThatIsHyphenated
I have managed to change it to somethingthatishyphenated by stripping out - using sed "s/-//g".
I am sure there is a simpler way, and also, need to know how to get the camel cased version.
Edit: Working function derived from @Michał's answer
function hyphenToCamel {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hyphenToCamel)
echo $CAMEL
Edit: Finally, a sed one-liner thanks to @glenn
echo a-hyphenated-string | sed -E "s/(^|-)([a-z])/\u\2/g"
a GNU sed one-liner
echo something-that-is-hyphenated |
sed -e 's/-\([a-z]\)/\u\1/g' -e 's/^[a-z]/\u&/'
\u in the replacement string is documented in the sed manual.
Pure bashism:
var0=something-that-is-hyphenated
var1=(${var0//-/ })
var2=${var1[*]^}
var3=${var2// /}
echo $var3
SomethingThatIsHyphenated
Line 1 is trivial.
Line 2 is the bashism for replaceAll or 's/-/ /g', wrapped in parens, to build an array.
Line 3 uses ${foo^}, which means uppercase (while ${foo,} would mean 'lowercase' [note, how ^ points up while , points down]) but to operate on every first letter of a word, we address the whole array with ${foo[*]} (or ${foo[@]}, if you would prefer that).
Line 4 is again a replace-all: blank with nothing.
Line 5 is trivial again.
You can define a function:
hyphenToCamel() {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hyphenToCamel)
echo $CAMEL
In the shell you are stuck with being messy:
aa="aaa-aaa-bbb-bbb"
echo " $aa" | sed -e 's/--*/ /g' -e 's/ a/A/g' -e 's/ b/B/g' ... -e 's/ *//g'
Note the carefully placed space in the echo and the double space in the last -e.
I leave it as an exercise to complete the code.
In perl it is a bit easier as a one-line shell command:
perl -e 'print map{ $a = ucfirst; $a =~ s/ +//g; $a} split( /-+/, $ARGV[0] ), "\n"' $aa
For the record, here's a pure Bash safe method (that is not subject to pathname expansion), using Bash≥4:
var0=something-that-is-hyphenated
IFS=- read -r -d '' -a var1 < <(printf '%s\0' "${var0,,}")
printf '%s' "${var1[@]^}"
This (safely) splits the lowercase expansion of var0 at the hyphens, with each split part in array var1. Then we use the ^ parameter expansion to uppercase the first character of the fields of this array, and concatenate them.
If your variable may also contain spaces and you want to act on them too, change IFS=- into IFS='- '.
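The same idea fits in a reusable function. A minimal sketch, assuming Bash 4 or later (for the ^ expansion); hyphen_to_camel is a hypothetical name, and unlike the snippet above it does not lowercase the input first.

```bash
# hyphen_to_camel STRING  (hypothetical helper name)
# Prints STRING with hyphens removed and each hyphen-separated part capitalized.
hyphen_to_camel() {
  local IFS=- parts
  # Split the argument at hyphens into the array "parts".
  read -r -a parts <<< "$1"
  # ^ uppercases the first character of every array element;
  # printf repeats its format for each one, which concatenates them.
  printf '%s' "${parts[@]^}"
  printf '\n'
}

hyphen_to_camel something-that-is-hyphenated   # SomethingThatIsHyphenated
```

Because IFS is declared local, the caller's word splitting is left untouched.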

How do I replace backspace characters (\b) using sed?

I want to delete a fixed number of backspace character occurrences ( \b ) from stdin. So far I have tried this:
echo -e "1234\b\b\b56" | sed 's/\b{3}//'
But it doesn't work. How can I achieve this using sed or some other unix shell tool?
You can use the hexadecimal value for backspace:
echo -e "1234\b\b\b56" | sed 's/\x08\{3\}//'
You also need to escape the braces.
You can use tr:
echo -e "1234\b\b\b56" | tr -d '\b'
123456
If you want to delete three consecutive backspaces, you can use Perl:
echo -e "1234\b\b\b56" | perl -pe 's/(\010){3}//'
sed interprets \b as a word boundary. I got this to work in perl like so:
echo -e "1234\b\b\b56" | perl -pe '$b="\b";s/$b//g'
With sed:
echo "123\b\b\b5" | sed 's/[\b]\{3\}//g'
You have to escape the { and } in the {3}, and also treat the \b special by using a character class.
[birryree@lilun ~]$ echo "123\b\b\b5" | sed 's/[\b]\{3\}//g'
1235
Note if you want to remove the characters being deleted also, have a look at ansi2html.sh which contains processing like:
printf "12..\b\b34\n" | sed ':s; s#[^\x08]\x08##g; t s'
No need for Perl here!
# version 1
echo -e "1234\b\b\b56" | sed $'s/\b\{3\}//' | od -c
# version 2
bvar="$(printf '%b' '\b')"
echo -e "1234\b\b\b56" | sed 's/'${bvar}'\{3\}//' | od -c
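A variant of version 2 that avoids both echo -e and the \x08 escape, for sed implementations that may not support it. This is a sketch; the function name remove_three_backspaces is hypothetical. It builds the backspace character with printf and an octal escape, then puts the literal character into a basic regular expression.

```bash
# remove_three_backspaces  (hypothetical helper name)
# Reads stdin and deletes the first run of exactly three consecutive
# backspace characters on each line.
remove_three_backspaces() {
  local bs
  bs=$(printf '\010')        # a literal backspace character (octal 010)
  sed "s/${bs}\{3\}//"       # \{3\} is the POSIX BRE repetition syntax
}

printf '1234\b\b\b56\n' | remove_three_backspaces   # 123456
```

Piping the result through od -c, as above, is a convenient way to confirm that no backspace bytes remain.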
