Split a string on a word in Bash

Split a string on a word in Bash - linux

I wish to be able to split a string on a word. Essentially a multi-character delimiter.
For example, I have a string:
test-server-domain-name.com
I wish to keep everything before 'domain' so the output would be:
test-server-
Note: I cannot cut on the '-'. I have to be able to cut before the term 'domain' as the string's format will vary but 'domain' will always be present and I will always want to capture the elements before 'domain'.
Is this possible in bash?

Use awk:
echo test-server-domain-name.com | awk -F 'domain' '{print $1}'

This will cut at the first domain it finds:
cutat=domain
fqdm=test-server-domain-name.com
res=${fqdm%%${cutat}*}
echo $res
Output:
test-server-
If you have multiple domains in the string and want to cut on the last, use res=${fqdm%${cutat}*} (one %) instead.
From Shell Parameter Expansion:
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the % case) or the longest matching pattern (the %% case) deleted. If parameter is # or *, the pattern removal operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with # or *, the pattern removal operation is applied to each member of the array in turn, and the expansion is the resultant list.

Incrivel is correct.
$ name=test-server-domain-name.com
$ echo $name
test-server-domain-name.com
$ echo $name |awk -F '-domain-name.com' '{print $1}'
test-server

Related

Using echo in bash puts last variable in front of the output

I'm trying to write a script and one of the parts of the script requires me to concatenate some variables together to create a URL.
REPO_URL='https://github.com/Example/Repo.Game/'
FILENAME='Example.Game-linux.zip'
latest_version="$(curl -LIs "${REPO_URL}/releases/latest" | grep -i '^location:' | cut -d' ' -f2 | cut -d'/' -f8)"
echo "$latest_version"
echo "$FILENAME"
echo "$REPO_URL"
echo "${REPO_URL}releases/download/${latest_version}/${FILENAME}"
Output:
2.0.5164
Example.Game-linux.zip
https://github.com/Example/Repo.Game/
/Example.Game-linux.ziple/Repo.Game/releases/download/2.0.5164
My actual output:
2.0.5164
Oxide.Rust-linux.zip
https://github.com/OxideMod/Oxide.Rust/
/Oxide.Rust-linux.zipideMod/Oxide.Rust/releases/download/2.0.5164
It looks like some kind of overflow problem? I'm not exactly sure. I added abcabc to the filename and the output became
/Oxide.Rust-linux.zipabcabc/Oxide.Rust/releases/download/2.0.5164
Any help would be appreciated.

I resolved the problem by removing the carriage return value from the variable.
tr -d '\r' seems to have resolved it. I'm not sure where the variable came from and if anyone has advice on how to clean up this mess I would love some advice.
latest_version="$(curl -LIs "${REPO_URL}/releases/latest" | grep -i '^location:' | cut -d' ' -f2 | cut -d'/' -f8 | tr -d '\r')

You can use ANSI quoting, and variable substitution to remove control characters from variables without having to invoke sub-shells.
ANSI quoting uses the special format $'\*' to represent special characters. For example use $'\t' for tab, $'\n' for new-line and $'\r' for carriage-return.
Variable substitution uses extra characters at the end of the variable name to perform actions on the variable. For example
${variable//[pattern]/[substitution]} will replace all instances of [pattern] in ${variable} with [substitution].
${variable%[pattern]} will remove [pattern] from ${variable} if it is at the end.
By combining these two, you can remove carriage-return characters from the end of your variable like this:
echo ${variable%$'\r'}
Note: Variable substitution doesn't actually change the contents of the variable. To do that, you have to re-assign the result back to the variable:
variable="${variable%$'\r'}"

There is a cleaner way to get the version number, minus any trailing carriage-return, from github using sed.
latest_version =$(curl -LIs "${REPO_URL}/releases/latest" | sed -n 's/^Location:.*\/\([^\r]*\).*$/\1/p')
sed reads every line of input (STDIN by default) and performs operations on it defined by the action string parameter. The action string is a little tricky to explain in this case, but here goes:
The -n option suppresses the printing of each input line. Output will then only happen if it is explicitly stated in the action string.
The s/[pattern]/[substitution]/p construct says whenever you find [pattern], replace it with [substitution] and print it. Our [pattern] is ^Location:.*\/\(.*\)$, and our [substitution] is \1.
The expression ^ matches the beginning of the line.
The expression . means any single character, and the expression .* means any number of characters (including zero). This will match the largest possible string, so, for example .*/ will match abc/def/ in the string abc/def/ghi.
The expression \/ just escapes the forward slash (because we are using backslash as a delimiter, we have to escape it).
The expression \([pattern]\) says any time you find [pattern], remember it. in our case, it will remember whatever matches [^\r].
The expression [{chars}] matches any one of the characters in {chars}. [^{chars}] matches any character that is not in {chars}. so [^\r]* matches any number of characters that is not a carriage return.
The expression $ matches the end of a line.
The expression \1 is replaced by the first remembered pattern.
So altogether, our action string says:
If you find a line that starts with Location:, followed by any number of characters, followed by a /, followed by any number of characters that are not a carriage return (which will be remembered), followed by any number of characters, followed by an end of line, then print the remembered characters.

Find the depth of the current path

How can I write a shell script to find the depth of the current path?
Assuming I am in:
/home/user/test/test1/test2/test3
It should return 6.

With shell parameter expansions, no external commands:
$ var=${PWD//[!\/]}
$ echo ${#var}
6
The first expansion removes all characters that are not /; the second one prints the length of var.
Explanations with details for support by POSIX shell or Bash (the links in parentheses go to the corresponding sections in the POSIX standard or the Bash manual):
$PWD contains the path to the current working directory. (sh/Bash)
The ${parameter/pattern/string} expansion replaces the first occurrence of pattern in the expansion of parameter with string. (Bash)
If the first slash is doubled (as in our case), all occurrences are replaced.
If string is empty, the slash after pattern is optional (as in our case).
The pattern [!\/] is a bracket expression and stands for "any character other than slash". (sh/Bash)
The slash has to be escaped, \/, or it is interpreted as ending the pattern.
! as the first character in a bracket expression negates the expression: any character other than the ones in the expression match the pattern. POSIX sh requires support for ! and says the behaviour for using ^ is undefined; Bash supports both ! and ^. Notice that this is not a bracket expression as seen in regular expressions, where only ^ is valid.
${#parameter} expands to the length of parameter. (sh/Bash)

A simple approach in fish:
count (string split / $PWD)

You could count the number of slashes in the current path:
pwd | awk -F"/" '{print NF-1}'

You can do this using a pipeline. pipe string into grep with the -o option. This prints out each "/" on a new line. pipe again into wc -l counts the number of lines printed.
echo "$path_str" | grep -o '/' - | wc -l

Assuming you don't have trailing "/", you can just count the "/".
So you would
Remove everything that is not a "/"
Count the length of the resulting string
In fish, this would be done with something like
string replace --regex --all '[^/]' '' -- $PWD | string length
The regular expression - [^/] here matches every single character that is not a "/". With "--all", this will be done as often as possible, and replace it with '', i.e. nothing.
The -- is the option separator, so that nothing in the argument is interpreted as an option (otherwise you'd have issues if an argument started with a "-a").
$PWD is the current directory.
string length simply outputs the length of its input.

Using perl :
echo '/home/user/test/test1/test2/test3' |
perl -lne '#_ = split /\//; print scalar #_ -1'
Output
6

You could use find just like that :
find / -printf "%d %p\n" 2>/dev/null | grep "$PWD$" | awk '{print $1}'
Maybe not the most efficient, but handles slashes well.

Use sed/grep to get string from tail to a char in middle

I have some strings (variables) i need to edit in the bash for further analysis
They consist of things like
str="~/folder/item"
How can I use sed or grep to grab just "item" (meaning from tail to char '/')?

Refer to Shell Parameter Expansion. You can say:
$ str="~/folder/item"
$ echo ${str##*/}
item
Quoting from the manual:
${parameter##word}
The word is expanded to produce a pattern just as in filename
expansion (see Filename Expansion). If the pattern matches the
beginning of the expanded value of parameter, then the result of the
expansion is the expanded value of parameter with the shortest
matching pattern (the ‘#’ case) or the longest matching pattern (the
‘##’ case) deleted. If parameter is ‘#’ or ‘*’, the pattern removal
operation is applied to each positional parameter in turn, and the
expansion is the resultant list. If parameter is an array variable
subscripted with ‘#’ or ‘*’, the pattern removal operation is applied
to each member of the array in turn, and the expansion is the
resultant list.
Using grep:
$ grep -Po '.*/\K.*' <<< $str
item

Assuming you are dealing with a text file containing lines like the one shown, then:
sed 's%.*/\([^/]*\)"%\1%' <<< 'str="~/folder/item"'
This yields:
item
If you are dealing with a variable str that contains a string ~/folder/item, then you can use:
basename "$str"
or:
echo "${str##*/}"

basename "${str}" gives the last part of the string after the final /, including any file extensions you may have. If you want to do the opposite and grab the directory, use dirname "${str}"

echo $str|awk -F"/" '{print $NF}'
or
echo "$str" | perl -pe 's/.*\///g'

basename also suitable for this
for example
str="~/folder/item"
name=$(basename $file)
echo $name
the output would be "item"

Extract part of a string using bash/cut/split

I have a string like this:
/var/cpanel/users/joebloggs:DNS9=domain.example
I need to extract the username (joebloggs) from this string and store it in a variable.
The format of the string will always be the same with exception of joebloggs and domain.example so I am thinking the string can be split twice using cut?
The first split would split by : and we would store the first part in a variable to pass to the second split function.
The second split would split by / and store the last word (joebloggs) into a variable
I know how to do this in PHP using arrays and splits but I am a bit lost in bash.

To extract joebloggs from this string in bash using parameter expansion without any extra processes...
MYVAR="/var/cpanel/users/joebloggs:DNS9=domain.example"
NAME=${MYVAR%:*} # retain the part before the colon
NAME=${NAME##*/} # retain the part after the last slash
echo $NAME
Doesn't depend on joebloggs being at a particular depth in the path.
Summary
An overview of a few parameter expansion modes, for reference...
${MYVAR#pattern} # delete shortest match of pattern from the beginning
${MYVAR##pattern} # delete longest match of pattern from the beginning
${MYVAR%pattern} # delete shortest match of pattern from the end
${MYVAR%%pattern} # delete longest match of pattern from the end
So # means match from the beginning (think of a comment line) and % means from the end. One instance means shortest and two instances means longest.
You can get substrings based on position using numbers:
${MYVAR:3} # Remove the first three chars (leaving 4..end)
${MYVAR::3} # Return the first three characters
${MYVAR:3:5} # The next five characters after removing the first 3 (chars 4-9)
You can also replace particular strings or patterns using:
${MYVAR/search/replace}
The pattern is in the same format as file-name matching, so * (any characters) is common, often followed by a particular symbol like / or .
Examples:
Given a variable like
MYVAR="users/joebloggs/domain.example"
Remove the path leaving file name (all characters up to a slash):
echo ${MYVAR##*/}
domain.example
Remove the file name, leaving the path (delete shortest match after last /):
echo ${MYVAR%/*}
users/joebloggs
Get just the file extension (remove all before last period):
echo ${MYVAR##*.}
example
NOTE: To do two operations, you can't combine them, but have to assign to an intermediate variable. So to get the file name without path or extension:
NAME=${MYVAR##*/} # remove part before last slash
echo ${NAME%.*} # from the new var remove the part after the last period
domain

Define a function like this:
getUserName() {
echo $1 | cut -d : -f 1 | xargs basename
}
And pass the string as a parameter:
userName=$(getUserName "/var/cpanel/users/joebloggs:DNS9=domain.example")
echo $userName

What about sed? That will work in a single command:
sed 's#.*/\([^:]*\).*#\1#' <<<$string
The # are being used for regex dividers instead of / since the string has / in it.
.*/ grabs the string up to the last backslash.
\( .. \) marks a capture group. This is \([^:]*\).
The [^:] says any character _except a colon, and the * means zero or more.
.* means the rest of the line.
\1 means substitute what was found in the first (and only) capture group. This is the name.
Here's the breakdown matching the string with the regular expression:
/var/cpanel/users/ joebloggs :DNS9=domain.example joebloggs
sed 's#.*/ \([^:]*\) .* #\1 #'

Using a single Awk:
... | awk -F '[/:]' '{print $5}'
That is, using as field separator either / or :, the username is always in field 5.
To store it in a variable:
username=$(... | awk -F '[/:]' '{print $5}')
A more flexible implementation with sed that doesn't require username to be field 5:
... | sed -e s/:.*// -e s?.*/??
That is, delete everything from : and beyond, and then delete everything up until the last /. sed is probably faster too than awk, so this alternative is definitely better.

Using a single sed
echo "/var/cpanel/users/joebloggs:DNS9=domain.example" | sed 's/.*\/\(.*\):.*/\1/'

I like to chain together awk using different delimitators set with the -F argument. First, split the string on /users/ and then on :
txt="/var/cpanel/users/joebloggs:DNS9=domain.com"
echo $txt | awk -F"/users/" '{print$2}' | awk -F: '{print $1}'
$2 gives the text after the delim, $1 the text before it.

I know I'm a little late to the party and there's already good answers, but here's my method of doing something like this.
DIR="/var/cpanel/users/joebloggs:DNS9=domain.example"
echo ${DIR} | rev | cut -d'/' -f 1 | rev | cut -d':' -f1

Extract file basename without path and extension in bash [duplicate]

This question already has answers here:
Extract filename and extension in Bash
(38 answers)
Closed 6 years ago.
Given file names like these:
/the/path/foo.txt
bar.txt
I hope to get:
foo
bar
Why this doesn't work?
#!/bin/bash
fullfile=$1
fname=$(basename $fullfile)
fbname=${fname%.*}
echo $fbname
What's the right way to do it?

You don't have to call the external basename command. Instead, you could use the following commands:
$ s=/the/path/foo.txt
$ echo "${s##*/}"
foo.txt
$ s=${s##*/}
$ echo "${s%.txt}"
foo
$ echo "${s%.*}"
foo
Note that this solution should work in all recent (post 2004) POSIX compliant shells, (e.g. bash, dash, ksh, etc.).
Source: Shell Command Language 2.6.2 Parameter Expansion
More on bash String Manipulations: http://tldp.org/LDP/LG/issue18/bash.html

The basename command has two different invocations; in one, you specify just the path, in which case it gives you the last component, while in the other you also give a suffix that it will remove. So, you can simplify your example code by using the second invocation of basename. Also, be careful to correctly quote things:
fbname=$(basename "$1" .txt)
echo "$fbname"

A combination of basename and cut works fine, even in case of double ending like .tar.gz:
fbname=$(basename "$fullfile" | cut -d. -f1)
Would be interesting if this solution needs less arithmetic power than Bash Parameter Expansion.

Here are oneliners:
$(basename "${s%.*}")
$(basename "${s}" ".${s##*.}")
I needed this, the same as asked by bongbang and w4etwetewtwet.

Pure bash, no basename, no variable juggling. Set a string and echo:
p=/the/path/foo.txt
echo "${p//+(*\/|.*)}"
Output:
foo
Note: the bash extglob option must be "on", (Ubuntu sets extglob "on" by default), if it's not, do:
shopt -s extglob
Walking through the ${p//+(*\/|.*)}:
${p -- start with $p.
// substitute every instance of the pattern that follows.
+( match one or more of the pattern list in parenthesis, (i.e. until item #7 below).
1st pattern: *\/ matches anything before a literal "/" char.
pattern separator | which in this instance acts like a logical OR.
2nd pattern: .* matches anything after a literal "." -- that is, in bash the "." is just a period char, and not a regex dot.
) end pattern list.
} end parameter expansion. With a string substitution, there's usually another / there, followed by a replacement string. But since there's no / there, the matched patterns are substituted with nothing; this deletes the matches.
Relevant man bash background:
pattern substitution:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pat
tern just as in pathname expansion. Parameter is expanded and
the longest match of pattern against its value is replaced with
string. If pattern begins with /, all matches of pattern are
replaced with string. Normally only the first match is
replaced. If pattern begins with #, it must match at the begin‐
ning of the expanded value of parameter. If pattern begins with
%, it must match at the end of the expanded value of parameter.
If string is null, matches of pattern are deleted and the / fol
lowing pattern may be omitted. If parameter is # or *, the sub
stitution operation is applied to each positional parameter in
turn, and the expansion is the resultant list. If parameter is
an array variable subscripted with # or *, the substitution
operation is applied to each member of the array in turn, and
the expansion is the resultant list.
extended pattern matching:
If the extglob shell option is enabled using the shopt builtin, several
extended pattern matching operators are recognized. In the following
description, a pattern-list is a list of one or more patterns separated
by a |. Composite patterns may be formed using one or more of the fol
lowing sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
#(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns

Here is another (more complex) way of getting either the filename or extension, first use the rev command to invert the file path, cut from the first . and then invert the file path again, like this:
filename=`rev <<< "$1" | cut -d"." -f2- | rev`
fileext=`rev <<< "$1" | cut -d"." -f1 | rev`

If you want to play nice with Windows file paths (under Cygwin) you can also try this:
fname=${fullfile##*[/|\\]}
This will account for backslash separators when using BaSH on Windows.

Just an alternative that I came up with to extract an extension, using the posts in this thread with my own small knowledge base that was more familiar to me.
ext="$(rev <<< "$(cut -f "1" -d "." <<< "$(rev <<< "file.docx")")")"
Note: Please advise on my use of quotes; it worked for me but I might be missing something on their proper use (I probably use too many).

Use the basename command. Its manpage is here: http://unixhelp.ed.ac.uk/CGI/man-cgi?basename

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Split a string on a word in Bash - linux

Use awk: echo test-server-domain-name.com | awk -F 'domain' '{print $1}'

Incrivel is correct. $ name=test-server-domain-name.com $ echo $name test-server-domain-name.com $ echo $name |awk -F '-domain-name.com' '{print $1}' test-server

Related

Using echo in bash puts last variable in front of the output

Find the depth of the current path

Use sed/grep to get string from tail to a char in middle

Extract part of a string using bash/cut/split

Extract file basename without path and extension in bash [duplicate]

Categories

Resources