Extract part of a string using bash/cut/split - string

I have a string like this:
/var/cpanel/users/joebloggs:DNS9=domain.example
I need to extract the username (joebloggs) from this string and store it in a variable.
The format of the string will always be the same with exception of joebloggs and domain.example so I am thinking the string can be split twice using cut?
The first split would split by : and we would store the first part in a variable to pass to the second split function.
The second split would split by / and store the last word (joebloggs) into a variable
I know how to do this in PHP using arrays and splits but I am a bit lost in bash.

To extract joebloggs from this string in bash using parameter expansion without any extra processes...
MYVAR="/var/cpanel/users/joebloggs:DNS9=domain.example"
NAME=${MYVAR%:*} # retain the part before the colon
NAME=${NAME##*/} # retain the part after the last slash
echo $NAME
Doesn't depend on joebloggs being at a particular depth in the path.
Summary
An overview of a few parameter expansion modes, for reference...
${MYVAR#pattern} # delete shortest match of pattern from the beginning
${MYVAR##pattern} # delete longest match of pattern from the beginning
${MYVAR%pattern} # delete shortest match of pattern from the end
${MYVAR%%pattern} # delete longest match of pattern from the end
So # means match from the beginning (think of a comment line) and % means from the end. One instance means shortest and two instances mean longest.
You can get substrings based on position using numbers:
${MYVAR:3} # Remove the first three chars (leaving 4..end)
${MYVAR::3} # Return the first three characters
${MYVAR:3:5} # The next five characters after removing the first 3 (chars 4-8)
You can also replace particular strings or patterns using:
${MYVAR/search/replace}
The pattern is in the same format as file-name matching, so * (any characters) is common, often followed by a particular symbol like / or .
Examples:
Given a variable like
MYVAR="users/joebloggs/domain.example"
Remove the path leaving file name (all characters up to a slash):
echo ${MYVAR##*/}
domain.example
Remove the file name, leaving the path (delete shortest match after last /):
echo ${MYVAR%/*}
users/joebloggs
Get just the file extension (remove all before last period):
echo ${MYVAR##*.}
example
NOTE: To do two operations, you can't combine them, but have to assign to an intermediate variable. So to get the file name without path or extension:
NAME=${MYVAR##*/} # remove part before last slash
echo ${NAME%.*} # from the new var remove the part after the last period
domain
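The position-based and search/replace forms listed above are not shown with output, so here is a minimal sketch using the same MYVAR value as the examples. It assumes bash (these forms are not part of plain POSIX sh), and the variable names are only illustrative.

```shell
#!/usr/bin/env bash
MYVAR="users/joebloggs/domain.example"

first_five=${MYVAR::5}     # first five characters: users
from_sixth=${MYVAR:6}      # skip the first six characters: joebloggs/domain.example
middle=${MYVAR:6:9}        # nine characters starting at offset 6: joebloggs
one_swap=${MYVAR/\//:}     # replace the first slash: users:joebloggs/domain.example
all_swaps=${MYVAR//\//:}   # replace every slash: users:joebloggs:domain.example

printf '%s\n' "$first_five" "$from_sixth" "$middle" "$one_swap" "$all_swaps"
```

Offsets count from zero, which is why offset 6 lands on the j of joebloggs.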

Define a function like this:
getUserName() {
echo "$1" | cut -d : -f 1 | xargs basename
}
And pass the string as a parameter:
userName=$(getUserName "/var/cpanel/users/joebloggs:DNS9=domain.example")
echo $userName
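If spawning cut, xargs and basename is a concern, the same function can probably be written with parameter expansion alone. This is only a sketch; getUserNamePure is a made-up name, and it assumes the input always contains a colon after the username.

```shell
#!/usr/bin/env bash
# Hypothetical pure-bash variant: same input, no external commands.
getUserNamePure() {
  local entry=${1%%:*}           # drop everything from the first colon onward
  printf '%s\n' "${entry##*/}"   # drop everything up to the last slash
}

userName=$(getUserNamePure "/var/cpanel/users/joebloggs:DNS9=domain.example")
echo "$userName"
```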

What about sed? That will work in a single command:
sed 's#.*/\([^:]*\).*#\1#' <<<"$string"
The # are being used for regex dividers instead of / since the string has / in it.
.*/ grabs the string up to the last forward slash.
\( .. \) marks a capture group. This is \([^:]*\).
The [^:] says any character except a colon, and the * means zero or more.
.* means the rest of the line.
\1 means substitute what was found in the first (and only) capture group. This is the name.
Here's the breakdown matching the string with the regular expression:
/var/cpanel/users/ joebloggs :DNS9=domain.example joebloggs
sed 's#.*/ \([^:]*\) .* #\1 #'
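The same capture-group idea can be tried without sed by using bash's =~ operator. This is a sketch under the assumption that the username never contains a slash or a colon; the regex is a slightly stricter variant of the one above, not the identical expression.

```shell
#!/usr/bin/env bash
string="/var/cpanel/users/joebloggs:DNS9=domain.example"

# A slash, then a captured run of characters that are neither a slash
# nor a colon, then a colon. Only the last path component fits.
re='/([^/:]+):'
name=""
if [[ $string =~ $re ]]; then
  name=${BASH_REMATCH[1]}
fi
echo "$name"
```

Keeping the regex in a variable avoids quoting surprises on the right-hand side of =~.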

Using a single Awk:
... | awk -F '[/:]' '{print $5}'
That is, using as field separator either / or :, the username is always in field 5.
To store it in a variable:
username=$(... | awk -F '[/:]' '{print $5}')
A more flexible implementation with sed that doesn't require username to be field 5:
... | sed -e 's/:.*//' -e 's?.*/??'
That is, delete everything from : and beyond, and then delete everything up until the last /. The expressions are quoted so the shell does not try to expand * or ? as globs. sed is probably faster than awk too, so this alternative is definitely better.

Using a single sed
echo "/var/cpanel/users/joebloggs:DNS9=domain.example" | sed 's/.*\/\(.*\):.*/\1/'

I like to chain together awk using different delimiters set with the -F argument. First, split the string on /users/ and then on :
txt="/var/cpanel/users/joebloggs:DNS9=domain.com"
echo $txt | awk -F"/users/" '{print$2}' | awk -F: '{print $1}'
$2 gives the text after the delim, $1 the text before it.

I know I'm a little late to the party and there's already good answers, but here's my method of doing something like this.
DIR="/var/cpanel/users/joebloggs:DNS9=domain.example"
echo ${DIR} | rev | cut -d'/' -f 1 | rev | cut -d':' -f1

Related

Using echo in bash puts last variable in front of the output

I'm trying to write a script and one of the parts of the script requires me to concatenate some variables together to create a URL.
REPO_URL='https://github.com/Example/Repo.Game/'
FILENAME='Example.Game-linux.zip'
latest_version="$(curl -LIs "${REPO_URL}/releases/latest" | grep -i '^location:' | cut -d' ' -f2 | cut -d'/' -f8)"
echo "$latest_version"
echo "$FILENAME"
echo "$REPO_URL"
echo "${REPO_URL}releases/download/${latest_version}/${FILENAME}"
Output:
2.0.5164
Example.Game-linux.zip
https://github.com/Example/Repo.Game/
/Example.Game-linux.ziple/Repo.Game/releases/download/2.0.5164
My actual output:
2.0.5164
Oxide.Rust-linux.zip
https://github.com/OxideMod/Oxide.Rust/
/Oxide.Rust-linux.zipideMod/Oxide.Rust/releases/download/2.0.5164
It looks like some kind of overflow problem? I'm not exactly sure. I added abcabc to the filename and the output became
/Oxide.Rust-linux.zipabcabc/Oxide.Rust/releases/download/2.0.5164
Any help would be appreciated.
I resolved the problem by removing the carriage return value from the variable.
tr -d '\r' seems to have resolved it. I'm not sure where the carriage return came from, and if anyone has advice on how to clean up this mess I would love to hear it.
latest_version="$(curl -LIs "${REPO_URL}/releases/latest" | grep -i '^location:' | cut -d' ' -f2 | cut -d'/' -f8 | tr -d '\r')"
You can use ANSI quoting, and variable substitution to remove control characters from variables without having to invoke sub-shells.
ANSI-C quoting uses the special format $'...' with backslash escapes to represent special characters. For example use $'\t' for tab, $'\n' for new-line and $'\r' for carriage-return.
Variable substitution uses extra characters at the end of the variable name to perform actions on the variable. For example
${variable//[pattern]/[substitution]} will replace all instances of [pattern] in ${variable} with [substitution].
${variable%[pattern]} will remove [pattern] from ${variable} if it is at the end.
By combining these two, you can remove carriage-return characters from the end of your variable like this:
echo ${variable%$'\r'}
Note: Variable substitution doesn't actually change the contents of the variable. To do that, you have to re-assign the result back to the variable:
variable="${variable%$'\r'}"
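To see the effect without calling curl, the following sketch simulates a value that ends in a carriage return and then strips it. The version string and file name are taken from the question; the variable names broken and fixed are made up for illustration.

```shell
#!/usr/bin/env bash
# Simulated value, as if cut had returned a header field ending in CR.
latest_version=$'2.0.5164\r'
FILENAME='Example.Game-linux.zip'

broken="download/${latest_version}/${FILENAME}"   # CR still embedded
latest_version=${latest_version%$'\r'}            # strip one trailing CR
fixed="download/${latest_version}/${FILENAME}"

printf '%q\n' "$broken"   # %q makes the hidden \r visible
printf '%s\n' "$fixed"
```

Printing with %q is a handy way to check whether a variable carries invisible control characters.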
There is a cleaner way to get the version number, minus any trailing carriage-return, from github using sed.
latest_version=$(curl -LIs "${REPO_URL}/releases/latest" | sed -n 's/^Location:.*\/\([^\r]*\).*$/\1/p')
sed reads every line of input (STDIN by default) and performs operations on it defined by the action string parameter. The action string is a little tricky to explain in this case, but here goes:
The -n option suppresses the printing of each input line. Output will then only happen if it is explicitly stated in the action string.
The s/[pattern]/[substitution]/p construct says whenever you find [pattern], replace it with [substitution] and print it. Our [pattern] is ^Location:.*\/\([^\r]*\).*$, and our [substitution] is \1.
The expression ^ matches the beginning of the line.
The expression . means any single character, and the expression .* means any number of characters (including zero). This will match the largest possible string, so, for example .*/ will match abc/def/ in the string abc/def/ghi.
The expression \/ just escapes the forward slash (because we are using the forward slash as the delimiter, we have to escape it).
The expression \([pattern]\) says any time you find [pattern], remember it. In our case, it will remember whatever matches [^\r]*.
The expression [{chars}] matches any one of the characters in {chars}. [^{chars}] matches any character that is not in {chars}. So [^\r]* matches any number of characters that are not a carriage return.
The expression $ matches the end of a line.
The expression \1 is replaced by the first remembered pattern.
So altogether, our action string says:
If you find a line that starts with Location:, followed by any number of characters, followed by a /, followed by any number of characters that are not a carriage return (which will be remembered), followed by any number of characters, followed by an end of line, then print the remembered characters.
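The action string can be tried offline against simulated headers. This is a sketch only: the header text and the redirect URL are made up, and whether \r inside a bracket expression is read as a carriage return depends on the sed implementation. The sample version contains neither a backslash nor the letter r, so the result should be the same either way.

```shell
#!/usr/bin/env bash
# Simulated response headers with CRLF line endings (no network needed).
# The URL is a made-up stand-in for the real redirect target.
headers=$'HTTP/1.1 302 Found\r\nLocation: https://github.com/Example/Repo.Game/releases/tag/2.0.5164\r\nContent-Length: 0\r\n\r\n'

latest_version=$(printf '%s' "$headers" |
  sed -n 's/^Location:.*\/\([^\r]*\).*$/\1/p')
printf '%s\n' "$latest_version"
```

Only the Location line matches, so it is the only line that the p flag prints.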

Split a string on a word in Bash

I wish to be able to split a string on a word. Essentially a multi-character delimiter.
For example, I have a string:
test-server-domain-name.com
I wish to keep everything before 'domain' so the output would be:
test-server-
Note: I cannot cut on the '-'. I have to be able to cut before the term 'domain' as the string's format will vary but 'domain' will always be present and I will always want to capture the elements before 'domain'.
Is this possible in bash?
Use awk:
echo test-server-domain-name.com | awk -F 'domain' '{print $1}'
This will cut at the first domain it finds:
cutat=domain
fqdm=test-server-domain-name.com
res=${fqdm%%${cutat}*}
echo $res
Output:
test-server-
If you have multiple domains in the string and want to cut on the last, use res=${fqdm%${cutat}*} (one %) instead.
From Shell Parameter Expansion:
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the % case) or the longest matching pattern (the %% case) deleted. If parameter is # or *, the pattern removal operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with # or *, the pattern removal operation is applied to each member of the array in turn, and the expansion is the resultant list.
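As a small illustration of the shortest versus longest distinction, here is a sketch with a made-up value that contains the word twice. It also shows the # and ## forms, which keep the part after the word instead.

```shell
#!/usr/bin/env bash
cutat=domain
fqdn=test-domain-server-domain-name.com   # made-up value, two occurrences

before_first=${fqdn%%"$cutat"*}   # longest suffix removed:  test-
before_last=${fqdn%"$cutat"*}     # shortest suffix removed: test-domain-server-
after_first=${fqdn#*"$cutat"}     # shortest prefix removed: -server-domain-name.com
after_last=${fqdn##*"$cutat"}     # longest prefix removed:  -name.com

printf '%s\n' "$before_first" "$before_last" "$after_first" "$after_last"
```

Quoting "$cutat" inside the expansion makes the word match literally, even if it contains pattern characters such as * or ?.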
Incrivel is correct.
$ name=test-server-domain-name.com
$ echo $name
test-server-domain-name.com
$ echo $name |awk -F '-domain-name.com' '{print $1}'
test-server

How to use sed to replace a string that contains the slash?

I have a text file that contains a lot of messy text.
I used grep to get all the text that contains the string prod like this
cat textfile | grep "<host>prod*"
The result
<host>prod-reverse-proxy01</host>
<host>prod-reverse-proxy01</host>
<host>prod-reverse-proxy01</host>
Next, I used sed with the intention of removing all the "host" parts
cat textfile | grep "<host>prod*" | sed "s/<host>//g"; "s/</host>//g"
But only the first "host" was removed.
prod-reverse-proxy01</host>
prod-reverse-proxy01</host>
prod-reverse-proxy01</host>
How can I remove the other "/host" part?
sed -n -e "s/^<host>\(.*\)<\/host>/\1/p" textfile
sed can process your file directly. No need to grep or cat.
-n is there to suppress any lines that do not match. The last 'p' in the script will print all matching lines.
Script dissection:
s/.../.../...
is the search/replace form. The bit between the first and the second '/' is what you search for. The bit between the second and third is what you replace it with. The last part is any commands you want to apply to the replacement.
Search:
^<host>\(.*\)<\/host>
finds all lines beginning with <host> followed by any text (.*) followed by </host>. Any text between <host> and </host> is stored into internal variable '1' using '(' and ')'. Note that (, ) and / (in </host>) have to be escaped.
Replace:
\1
Replace found text with contents of variable 1 (1 has to be escaped, otherwise everything is replaced by the character '1').
Commands:
p
Print resulting line (after replacement).
Note: Your search involves removing two similar but not identical strings (<host> and </host>).
I think this sed is enough
sed 's/<[/]*host>//g' infile
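For comparison, the two-step removal attempted in the question can be made to work by passing both substitutions to a single sed call with -e. The sketch below shows that form twice: once with the slash escaped, and once with | as the delimiter so the slash needs no escaping. The variable names are only illustrative.

```shell
#!/usr/bin/env bash
line='<host>prod-reverse-proxy01</host>'

# Two expressions in one sed call; the slash in </host> is escaped.
with_escape=$(printf '%s\n' "$line" | sed -e 's/<host>//g' -e 's/<\/host>//g')

# Same substitutions with | as the delimiter, so / is an ordinary character.
with_pipe=$(printf '%s\n' "$line" | sed -e 's|<host>||g' -e 's|</host>||g')

printf '%s\n' "$with_escape" "$with_pipe"
```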

Bash: How to extract numbers preceded by _ and followed by

I have the following format for filenames: filename_1234.svg
How can I retrieve the numbers preceded by an underscore and followed by a dot? There can be between one and four digits before the .svg
I have tried:
width=${fileName//[^0-9]/}
but if the fileName contains a number as well, it will return all numbers in the filename, e.g.
file6name_1234.svg
I found solutions for two underscores (and splitting it into an array), but I am looking for a way to check for the underscore as well as the dot.
You can use simple parameter expansion with substring removal to simply trim from the right up to, and including, the '.', then trim from the left up to, and including, the '_', leaving the number you desire, e.g.
$ width=filename_1234.svg; val="${width%.*}"; val="${val##*_}"; echo $val
1234
note: # trims from left to first-occurrence while ## trims to last-occurrence. % and %% work the same way from the right.
Explained:
width=filename_1234.svg - width holds your filename
val="${width%.*}" - val holds filename_1234
val="${val##*_}" - finally val holds 1234
Of course, there is no need to use a temporary value like val if your intent is that width should hold the width. I just used a temp to protect against changing the original contents of width. If you want the resulting number in width, just replace val with width everywhere above and operate directly on width.
note 2: using shell capabilities like parameter expansion prevents creating a separate subshell and spawning a separate process that occurs when using a utility like sed, grep or awk (or anything that isn't part of the shell for that matter).
Try the following code :
filename="filename_6_1234.svg"
if [[ "$filename" =~ ^(.*)_([^.]*)\..*$ ]];
then
echo "${BASH_REMATCH[0]}" #will display 'filename_6_1234.svg'
echo "${BASH_REMATCH[1]}" #will display 'filename_6'
echo "${BASH_REMATCH[2]}" #will display '1234'
fi
Explanation :
=~ : bash operator for regex comparison
^(.*)_([^.]*)\..*$ : we look for any characters, followed by an underscore, followed by any characters other than a dot, followed by a dot and an extension. We create 2 capture groups, one for the text before the last underscore, one for the text after it
BASH_REMATCH : array containing the captured groups
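If the match should be limited to what the question describes (one to four digits between an underscore and .svg), a tighter regex is one option. This is a sketch; extract_width is a hypothetical helper name, and it assumes the extension is always .svg.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the 1-4 digits between an underscore and ".svg".
# Returns non-zero when the filename does not have that shape.
extract_width() {
  local re='_([0-9]{1,4})\.svg$'
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

width=$(extract_width "file6name_1234.svg")
echo "$width"
extract_width "file6name.svg" || echo "no match"
```

Digits elsewhere in the name, such as the 6 in file6name, are ignored because they are not directly between the underscore and the extension.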
Some more way
[akshay@localhost tmp]$ filename=file1b2aname_1234.svg
[akshay@localhost tmp]$ after=${filename##*_}
[akshay@localhost tmp]$ echo ${after//[^0-9]}
1234
Using awk
[akshay@localhost tmp]$ awk -F'[_.]' '{print $2}' <<< "$filename"
1234
I would use
sed 's!_! !g' | awk '{print "_" $NF}'
to get from filename_1234.svg to _1234.svg then
sed 's!svg!!g'
to get rid of the extension.
If you set IFS, you can use Bash's built-in read.
This splits the filename by underscores and dots and stores the result in the array a.
IFS='_.' read -a a <<<'file1b2aname_1234.svg'
And this takes the second last element from the array.
echo ${a[-2]}
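Put together as a runnable sketch, the read approach might look like this. It computes the index from the array length instead of using a negative subscript, since older bash releases (for example 3.2) reject negative subscripts; the variable name num is made up.

```shell
#!/usr/bin/env bash
# IFS is set only for the duration of the read command.
IFS='_.' read -r -a a <<< 'file1b2aname_1234.svg'

# Second-to-last element, computed from the element count.
num=${a[${#a[@]}-2]}
echo "$num"
```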
There's a solution using cut:
name="file6name_1234.svg"
num=$(echo "$name" | cut -d '_' -f 2 | cut -d '.' -f 1)
echo "$num"
-d is for specifying a delimiter.
-f refers to the desired field.
I don't know anything about performance but it's simple to understand and simple to maintain.

Removing a portion of a string that has forward slashes in it

I'm stumped with how to remove a portion of a string that has forward slashes and question marks in it.
Example: /diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN
and I need the output to be RXMWANT8WFYJNF7K6DXXXJLJVN
I've tried tr and sed but tr removes some of the characters I need in the output. sed is giving me trouble because of the forward slashes.
What's a quick method to remove the /diag/PeerManager/list?deviceid= portion of my string?
thanks!
echo "/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN" | sed -n 's:/[a-zA-Z]*/[a-zA-Z]*/[a-zA-Z]*?[a-zA-Z]*=::p'
This should do the trick. I chose the colon as the delimiter because it does not conflict with the forward slashes. This makes a lot of assumptions about the input, specifically that it contains three forward slashes with runs of lowercase and uppercase letters between them, then a series of letters ending in a question mark, then another series of letters ending in an equals sign. The command removes those items and prints the remaining characters (your device id).
This worked for me:
sed 's/.*deviceid=\([^&]*\).*/\1/'
Example:
$ echo '/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN' | sed 's/.*deviceid=\([^&]*\).*/\1/'
RXMWANT8WFYJNF7K6DXXXJLJVN
This is not the most robust solution, but if you have a fixed set of input that will never change, it's probably good enough.
One way using awk, if there is only a single occurrence of an = on each line:
awk -F= '{ print $2 }' file.txt
Results:
RXMWANT8WFYJNF7K6DXXXJLJVN
Use Equals Sign as Field Delimiter
If you know that your GET query string will always have only one parameter (in this case, deviceid) then you can just use the equals sign as a field delimiter with the standard cut utility. For example:
$ echo '/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN' |
cut -d= -f2-
RXMWANT8WFYJNF7K6DXXXJLJVN
How about:
$ echo '/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN' | sed 's/^.*=//'
RXMWANT8WFYJNF7K6DXXXJLJVN
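Parameter expansion is another option here and avoids the delimiter problem entirely, since slashes are not special in these patterns. The sketch below assumes the string is already in a variable; the trailing "&foo=bar" parameter is made up to show that extra parameters are dropped.

```shell
#!/usr/bin/env bash
# "&foo=bar" is a made-up extra parameter for illustration.
url='/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN&foo=bar'

id=${url#*deviceid=}   # drop everything through "deviceid="
id=${id%%&*}           # drop any following "&..." parameters
echo "$id"
```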

Resources