Unix - how to use cut -d on one word - linux

I have a string with two words, but sometimes it may contain only one. I need to get both words, and if the second one is missing I want an empty string.
I am using the following:
STRING1=`echo $STRING|cut -d' ' -f1`
STRING2=`echo $STRING|cut -d' ' -f2`
When STRING is only one word, both strings are equal, but I need the second string to be empty.

Your problem is (from cut(1))
`-f FIELD-LIST'
`--fields=FIELD-LIST'
Select for printing only the fields listed in FIELD-LIST. Fields
are separated by a TAB character by default. Also print any line
that contains no delimiter character, unless the
`--only-delimited' (`-s') option is specified.
You could specify -s when extracting the second word, or use
echo " $STRING" | cut -d' ' -f3
to extract the second word (note the fake separator in front of $STRING).
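The -s variant mentioned above could look roughly like this. It is a minimal sketch that assumes the words are separated by exactly one space; the function name second_word is made up for illustration.

```shell
# second_word is a hypothetical helper name, used only for this sketch.
# Assumes words are separated by exactly one space.
second_word() {
    # -s suppresses lines that contain no delimiter, so a one-word
    # input produces no output instead of echoing the whole line.
    printf '%s\n' "$1" | cut -s -d' ' -f2
}

STRING2=$(second_word 'word1 word2')
ONLY_ONE=$(second_word 'word1')
printf '[%s] [%s]\n' "$STRING2" "$ONLY_ONE"   # prints: [word2] []
```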

The shell has built-in functionality for this.
echo "First word: ${STRING%% *}"
echo "Last word: ${STRING##* }"
The double ## or %% is not compatible with older shells; they only had a single-separator variant, which trims the shortest possible match instead of the longest. (You can simulate longest suffix by extracting the shortest prefix, then trim everything else, but this takes two trims.)
Mnemonic: # is to the left of $ on the keyboard, % is to the right.
For your actual problem, I would add a simple check to see if the first extraction extracted the whole string; if so, the second should be left empty.
STRING1="${STRING%% *}"
case $STRING1 in
"$STRING" ) STRING2="" ;;
* ) STRING2="${STRING#"$STRING1" }" ;;
esac
As an aside, there's also this:
set $STRING
STRING1=$1
STRING2=$2
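A plain set $STRING is fragile: a word such as * is expanded to filenames, and a string that starts with - is parsed as an option. The following is a sketch of a more defensive variant, not a drop-in replacement. The function name split_two is hypothetical, and it assumes globbing was enabled before the call.

```shell
# split_two is a hypothetical helper; it assumes globbing was enabled
# beforehand, because it unconditionally re-enables it with `set +f`.
split_two() {
    set -f          # keep a word such as '*' from expanding to filenames
    set -- $1       # `--` stops a leading '-' from being read as an option
    set +f
    STRING1=${1-}   # empty when the input has no words
    STRING2=${2-}   # empty when the input has only one word
}

split_two 'word1'
printf '[%s] [%s]\n' "$STRING1" "$STRING2"   # prints: [word1] []
```

Inside a function, set -- only replaces the function's own positional parameters, so the caller's arguments are left alone.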

Why not just use read:
STR='word1 word2'
read string1 string2 <<< "$STR"
echo "$string1"
word1
echo "$string2"
word2
Now the missing 2nd word:
STR='word1'
read string1 string2 <<< "$STR"
echo "$string1"
word1
echo "$string2" | cat -vte
$
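One thing to watch for with read: when the input has more words than variables, the last variable receives the whole remainder. A small sketch of how a throwaway extra variable avoids that (the variable names only1 and only2 are made up here):

```shell
#!/usr/bin/env bash
# With more words than variables, read puts the whole remainder into the
# last variable. A throwaway third variable (here `_`) absorbs the extras.
read -r string1 string2 _ <<< 'word1 word2 word3'
printf '[%s] [%s]\n' "$string1" "$string2"   # prints: [word1] [word2]

# With a single word, the second variable is simply left empty.
read -r only1 only2 _ <<< 'word1'
printf '[%s] [%s]\n' "$only1" "$only2"       # prints: [word1] []
```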

Related

How to extract strings between nth and mth occurence of a certain character in linux bash?

File1 contains:
a:b:c:d:any words here:e:f:G
where "any words here" can be a single word, two words, three words, and so on.
I want to get the string between the 4th ":" and the 5th ":", which here is "any words here".
My initial idea was to replace ":" with a space and then use awk to print the field, but since the string I want to extract can consist of multiple words, that will not work reliably.
The cut command lets you split a line on a delimiter and extract the required fields from it.
In your example,
> echo 'a:b:c:d:any words here:e:f:G' |cut -f 5 -d:
should give you
any words here
With awk
$ echo 'a:b:c:d:any words here:e:f:G' | awk -F: '{print $5}'
any words here
Or by creating an array with IFS changed to :
$ IFS=: words=( $(echo 'a:b:c:d:any words here:e:f:G') ); echo ${words[4]}
any words here
If it is just 1 line input you can use the bash regex. It is more of a pain if you want to return more than 1 field, but for 1 field it is easy enough:
f=3
[[ "1:2:3:4:5:6:7:8" =~ (^([^:]*:){$f,$f})([^:]*)(:|$) ]]
echo "${BASH_REMATCH[3]}"
4
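The IFS-based array idea can also be scoped to a single read, so that IFS is not changed for the rest of the script. A minimal sketch, assuming single-line input; nth_field is a hypothetical name:

```shell
#!/usr/bin/env bash
# nth_field is a hypothetical helper: print the n-th (1-based)
# colon-separated field of a single-line string.
nth_field() {
    local line=$1 n=$2
    local -a fields
    # The IFS assignment applies to this read command only.
    IFS=: read -r -a fields <<< "$line"
    printf '%s\n' "${fields[n-1]}"
}

nth_field 'a:b:c:d:any words here:e:f:G' 5   # prints: any words here
```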

How can I display unique words contained in a Bash string?

I have a string that has duplicate words. I would like to display only the unique words. The string is:
variable="alpha bravo charlie alpha delta echo charlie"
I know several tools that can do this together. This is what I figured out:
echo $variable | tr " " "\n" | sort -u | tr "\n" " "
What is a more effective way to do this?
Use a Bash Substitution Expansion
The following shell parameter expansion will substitute spaces with newlines, and then pass the results into the sort utility to return only the unique words.
$ echo -e "${variable// /\\n}" | sort -u
alpha
bravo
charlie
delta
echo
This has the side-effect of sorting your words, as the sort and uniq utilities both require input to be sorted in order to detect duplicates. If that's not what you want, I also posted a Ruby solution that preserves the original word order.
Rejoining Words
If, as one commenter pointed out, you're trying to reassemble your unique words back into a single line, you can use command substitution to do this. For example:
$ echo $(echo -e "${variable// /\\n}" | sort -u)
alpha bravo charlie delta echo
The lack of quotes around the command substitution is intentional. If you quote it, the newlines will be preserved because Bash won't do word-splitting. Unquoted, the shell will return the results as a single line, however unintuitive that may seem.
You may use xargs:
echo "$variable" | xargs -n 1 | sort -u | xargs
Note: This solution assumes that all unique words should be output in the order they're encountered in the input. By contrast, the OP's own solution attempt outputs a sorted list of unique words.
A simple Awk-only solution (POSIX-compliant) that is efficient by avoiding a pipeline (which invariably involves subshells).
awk -v RS=' ' '{ if (!seen[$1]++) { printf "%s%s",sep,$1; sep=" " } }' <<<"$variable"
# The above prints without a trailing \n, as in the OP's own solution.
# To add a trailing newline, append `END { print }` to the end
# of the Awk script.
Note how $variable is double-quoted to prevent it from accidental shell expansions, notably pathname expansion (globbing), and how it is provided to Awk via a here-string (<<<).
-v RS=' ' tells Awk to split the input into records by a single space.
Note that the last word will have the input line's trailing newline included, which is why we don't use $0 - the entire record - but $1, the record's first field, which has the newline stripped due to Awk's default field-splitting behavior.
seen[$1]++ is a common Awk idiom that either creates an entry for $1, the input word, in associative array seen, if it doesn't exist yet, or increments its occurrence count.
!seen[$1]++ therefore only returns true for the first occurrence of a given word (where seen[$1] is implicitly zero/the empty string; the ++ is a post-increment, and therefore doesn't take effect until after the condition is evaluated)
{printf "%s%s",sep,$1; sep=" "} prints the word at hand $1, preceded by separator sep, which is implicitly the empty string for the first word, but a single space for subsequent words, due to setting sep to " " immediately after.
Here's a more flexible variant that handles any run of whitespace between input words; it works with GNU Awk and Mawk[1]:
awk -v RS='[[:space:]]+' '{if (!seen[$0]++){printf "%s%s",sep,$0; sep=" "}}' <<<"$variable"
-v RS='[[:space:]]+' tells Awk to split the input into records by any mix of spaces, tabs, and newlines.
[1] Unfortunately, BSD/OSX Awk (in strict compliance with the POSIX spec) doesn't support using regular expressions or even multi-character literals as RS, the input record separator.
Preserve Input Order with a Ruby One-Liner
I posted a Bash-specific answer already, but if you want to return only unique words while preserving the word order of the original string, then you can use the following Ruby one-liner:
$ echo "$variable" | ruby -ne 'puts $_.split.uniq'
alpha
bravo
charlie
delta
echo
This will split the input string on whitespace, and then return unique elements from the resulting array.
Unlike the sort or uniq utilities, Ruby doesn't need the words to be sorted to detect duplicates. This may be a better solution if you don't want your results to be sorted, although given your input sample it makes no practical difference for the posted example.
Rejoining Words
If, as one commenter pointed out, you're then trying to reassemble the words back into a single line after deduplication, you can do that too. For that, we just append the Array#join method:
$ echo "$variable" | ruby -ne 'puts $_.split.uniq.join(" ")'
alpha bravo charlie delta echo
You can use awk:
$ echo "$variable" | awk '{for(i=1;i<=NF;i++){if (!seen[$i]++) printf "%s ", $i}}'
alpha bravo charlie delta echo
If you do not want the trailing space and do want a trailing newline, you can do:
$ echo "$variable" | awk '{for(i=1;i<=NF;i++) if (!seen[$i]++) j = (j=="" ? $i : j " " $i)} END{print j}'
alpha bravo charlie delta echo
Using associative arrays in BASH 4+ you can simplify this:
variable="alpha bravo charlie alpha delta echo charlie"
# declare an associative array
declare -A unq
# read sentence into an indexed array
read -ra arr <<< "$variable"
# iterate each word and populate associative array with word as key
for w in "${arr[@]}"; do
unq["$w"]=1
done
# print unique results
printf "%s\n" "${!unq[@]}"
delta
bravo
echo
alpha
charlie
## if you want results in same order as original string
for w in "${arr[@]}"; do
[[ ${unq["$w"]} ]] && echo "$w" && unset "unq[$w]"
done
alpha
bravo
charlie
delta
echo
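The two loops above can be folded into a single pass that keeps the original order. This is a sketch only, assuming Bash 4 or later for associative arrays; unique_words is a hypothetical function name.

```shell
#!/usr/bin/env bash
# unique_words is a hypothetical helper (needs Bash 4+ for `local -A`).
# It prints the words of its argument once each, in first-seen order.
unique_words() {
    local -A seen=()
    local -a words out=()
    local w
    read -r -a words <<< "$1"
    for w in "${words[@]}"; do
        # ${seen[$w]+x} expands to "x" only if the key already exists.
        if [[ -z ${seen[$w]+x} ]]; then
            seen[$w]=1
            out+=("$w")
        fi
    done
    printf '%s\n' "${out[*]}"
}

variable="alpha bravo charlie alpha delta echo charlie"
unique_words "$variable"   # prints: alpha bravo charlie delta echo
```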
pure, ugly bash:
for x in $variable; do
if [ "$(eval echo $(echo \$un__$x))" = "" ]; then
echo -n $x
eval un__$x=1
__usv="$__usv un__$x"
fi
done
unset $__usv

Split string at special character in bash

I'm reading filenames from a text file line by line in a bash script. However, the lines look like this:
/path/to/myfile1.txt 1
/path/to/myfile2.txt 2
/path/to/myfile3.txt 3
...
/path/to/myfile20.txt 20
So there is a second column containing an integer, separated by a space. I only need the part of the string before the space.
I found only solutions using a "for-loop". But I need a function that explicitly looks for the " "-character (space) in my string and splits it at that point.
In principle I need the equivalent of MATLAB's "strsplit(str,delimiter)".
If you are already reading the file with something like
while read -r line; do
(and you should be), then pass two arguments to read instead:
while read -r filename somenumber; do
read will split the line on whitespace and assign the first field to filename and any remaining field(s) to somenumber.
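Put together, the loop might look like the following sketch. The sample file is created on the fly purely for the demonstration, and the variable collected is a made-up name used to show what the loop saw.

```shell
# Sample data written to a temporary file purely for this demonstration.
tmp=$(mktemp)
printf '%s\n' '/path/to/myfile1.txt 1' '/path/to/myfile20.txt 20' > "$tmp"

collected=""
while read -r filename somenumber; do
    # filename holds the first field, somenumber everything after it.
    printf 'file=%s number=%s\n' "$filename" "$somenumber"
    collected="$collected$filename;"
done < "$tmp"
rm -f "$tmp"
```

Note that this splits at the first run of whitespace, so it assumes the paths themselves contain no spaces.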
Three (of many) solutions:
# Using awk
echo "$string" | awk '{ print $1 }'
# Using cut
echo "$string" | cut -d' ' -f1
# Using sed
echo "$string" | sed 's/\s.*$//g'
If you need to iterate through each line of the file anyway, you can cut off everything after the space with bash:
while read -r line ; do
# bash string manipulation removes the space at the end
# and everything which follows it
echo ${line// *}
done < file
This should work too:
line="${line% *}"
This cuts the string at the last space, so it works even if the path itself contains spaces (as long as the path is followed by a space and the number).
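A small sketch of that behaviour with a path that contains a space (the sample line and the variable names path and number are made up for illustration):

```shell
# Hypothetical sample line: a path containing a space, then the number.
line='/path/to/my file.txt 7'

path=${line% *}      # remove the shortest suffix that starts with a space
number=${line##* }   # remove the longest prefix that ends with a space

printf '%s|%s\n' "$path" "$number"   # prints: /path/to/my file.txt|7
```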
while read -r line
do
{ rev | cut -d' ' -f2- | rev >> result.txt; } <<< "$line"
done < input.txt
This solution will work even if you have spaces in your filenames.

lowercase + capitalize + concatenate words of a string in shell (e.g. bash)

How to capitalize+concatenate words of a string?
(first letter uppercase and all other letters lowercase)
example:
input = "jAMeS bOnD"
output = "JamesBond"
String manipulation available in bash version 4:
${variable,,} to lowercase all letters
${variable^} to uppercase the first letter of the value (applied to an array, the first letter of each element)
use ${words[*]^} instead of ${words[@]^} to save some script lines
And other improvements from mklement0 (see his comments):
Variable names in lower-case because upper-case ones may conflict with environment variables
Give meaningful names to variables (e.g. ARRAY -> words)
Use local to avoid impacting IFS outside the function (once is enough)
Use local for all other local variables (a variable can be declared first and assigned later)
ARRAY=( $LOWERCASE ) may expand globs (filename wildcards)
temporarily disable Pathname Expansion using set -f or shopt -so noglob
or use read -ra words <<< "$input" instead of words=( $input )
Ultimate function:
capitalize_remove_spaces()
{
local words IFS
read -ra words <<< "${@,,}"
IFS=''
echo "${words[*]^}"
}
If you want to keep alphanumeric characters only, extend the IFS built-in variable just before the read -ra words operation:
capitalize_remove_punctuation()
{
local words IFS=$' \t\n-\'.,;!:*?' #Handle hyphenated names and punctuation
read -ra words <<< "${@,,}"
IFS=''
echo "${words[*]^}"
}
Examples:
> capitalize_remove_spaces 'jAMeS bOnD'
JamesBond
> capitalize_remove_spaces 'jAMeS bOnD *'
JamesBond*
> capitalize_remove_spaces 'Jean-luc GRAND-PIERRE'
Jean-lucGrand-pierre
> capitalize_remove_punctuation 'Jean-luc GRAND-PIERRE'
JeanLucGrandPierre
> capitalize_remove_punctuation 'Jean-luc GRAND-PIERRE *'
JeanLucGrandPierre
Here's a bash 3+ solution that utilizes tr for case conversion (the case conversion operators (,, ^, ...) were introduced in bash 4):
input="jAMeS bOnD"
read -ra words <<<"$input" # split input into an array of words
output="" # initialize output variable
for word in "${words[@]}"; do # loop over all words
# add capitalized 1st letter
output+="$(tr '[:lower:]' '[:upper:]' <<<"${word:0:1}")"
# add lowercase version of rest of word
output+="$(tr '[:upper:]' '[:lower:]' <<<"${word:1}")"
done
Note:
Concatenation (removal of whitespace between words) happens implicitly by always directly appending to the output variable.
It's tempting to want to use words=( $input ) to split the input string into an array of words, but there's a gotcha: the string is subject to pathname expansion, so if a word happens to be a valid glob (e.g., *), it will be expanded (replaced with matching filenames), which is undesired; using read -ra to create the array avoids this problem (-a reads into an array, -r turns off interpretation of \ chars. in the input).
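For reuse, the same tr-based loop can be wrapped in a function. This is only a sketch along the lines of the code above; the name capitalize_concat is hypothetical.

```shell
#!/usr/bin/env bash
# capitalize_concat is a hypothetical wrapper around the tr-based loop.
capitalize_concat() {
    local input=$1 output="" word first rest
    local -a words
    read -r -a words <<< "$input"   # split on whitespace without globbing
    for word in "${words[@]}"; do
        first=$(printf '%s' "${word:0:1}" | tr '[:lower:]' '[:upper:]')
        rest=$(printf '%s' "${word:1}" | tr '[:upper:]' '[:lower:]')
        output+=$first$rest         # appending joins the words directly
    done
    printf '%s\n' "$output"
}

capitalize_concat 'jAMeS bOnD'   # prints: JamesBond
```

Because the array is filled by read -ra, a word such as * is kept literally instead of being expanded to filenames.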
From other posts, I came up with this working script:
str="jAMeS bOnD"
res=""
split=`echo $str | sed -e 's/ /\n/g'` # Split with space as delimiter
for word in $split; do
word=${word,,} # Lowercase
word=${word^} # Uppercase first letter
res=$res$word # Concatenate result
done
echo $res
References:
Converting string to lower case in Bash shell scripting
How do I split a string on a delimiter in Bash?
Troubleshooting bash script to capitalize first letter in every word
Using awk is a little verbose, but it does the job:
s="jAMeS bOnD"
awk '{for (i=1; i<=NF; i++)
printf "%s%s", toupper(substr($i, 1, 1)), tolower(substr($i, 2)); print ""}' <<< "$s"
JamesBond
echo -e '\n' "!!!!! PERMISSION to WRITE in /var/log/ DENIED !!!!!"
echo -e '\n'
echo "Do you want to continue?"
echo -e '\n' "Yes or No"
read -p "Please Respond_: " Response #get input from keyboard "yes/no"
#Capitalizing 'yes/no' with # echo $Response | awk '{print toupper($0)}' or echo $Response | tr [a-z] [A-Z]
answer=$(echo $Response | awk '{print toupper($0)}')
case $answer in
NO)
echo -e '\n' "Quitting..."
exit 1
;;
YES)
echo -e '\n' "Proceeding..."
;;
esac

How to extract last part of string in bash?

I have this variable:
A="Some variable has value abc.123"
I need to extract this value i.e abc.123. Is this possible in bash?
Simplest is
echo "$A" | awk '{print $NF}'
Edit: explanation of how this works...
awk breaks the input into different fields, using whitespace as the separator by default. Hardcoding 5 in place of NF prints out the 5th field in the input:
echo "$A" | awk '{print $5}'
NF is a built-in awk variable that gives the total number of fields in the current record. The following returns the number 5 because there are 5 fields in the string "Some variable has value abc.123":
echo "$A" | awk '{print NF}'
Combining $ with NF outputs the last field in the string, no matter how many fields your string contains.
Yes; this:
A="Some variable has value abc.123"
echo "${A##* }"
will print this:
abc.123
(The ${parameter##word} notation is explained in §3.5.3 "Shell Parameter Expansion" of the Bash Reference Manual.)
Some examples using parameter expansion
A="Some variable has value abc.123"
echo "${A##* }"
abc.123
Shortest match on " " space (from the end)
echo "${A% *}"
Some variable has value
Shortest match on . dot (from the end)
echo "${A%.*}"
Some variable has value abc
Longest match on " " space (from the end)
echo "${A%% *}"
Some
Read more Shell-Parameter-Expansion
The documentation is a bit painful to read, so I've summarised it in a simpler way.
Note that the '*' needs to swap places with the ' ' depending on whether you use # or %. (The * is just a wildcard, so you may need to take off your "regex hat" while reading.)
${A% *} - remove shortest trailing * (strip the last word)
${A%% *} - remove longest trailing * (strip the last words)
${A#* } - remove shortest leading * (strip the first word)
${A##* } - remove longest leading * (strip the first words)
Of course a "word" here may contain any character that isn't a literal space.
You might commonly use this syntax to trim filenames:
${A##*/} removes all containing folders, if any, from the start of the path, e.g.
/usr/bin/git -> git
/usr/bin/ -> (empty string)
${A%/*} removes the last file/folder/trailing slash, if any, from the end:
/usr/bin/git -> /usr/bin
/usr/bin/ -> /usr/bin
${A%.*} removes the last extension, if any (just be wary of things like my.path/noext):
archive.tar.gz -> archive.tar
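The filename cases above can be tried out with a short sketch like this one. The sample path and the variable names p, dir, base, stem, name, and ext are made up for illustration.

```shell
# Hypothetical sample path used to exercise the four operators.
p=/usr/local/archive.tar.gz

dir=${p%/*}        # shortest trailing /*  -> /usr/local
base=${p##*/}      # longest leading */    -> archive.tar.gz
stem=${base%.*}    # shortest trailing .*  -> archive.tar
name=${base%%.*}   # longest trailing .*   -> archive
ext=${base##*.}    # longest leading *.    -> gz

printf '%s\n' "$dir" "$base" "$stem" "$name" "$ext"
```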
How do you know where the value begins? If it's always the 5th and 6th words, you could use e.g.:
B=$(echo "$A" | cut -d ' ' -f 5-)
This uses the cut command to slice out part of the line, using a simple space as the word delimiter.
As pointed out by Zedfoxus here. A very clean method that works on all Unix-based systems. Besides, you don't need to know the exact position of the substring.
A="Some variable has value abc.123"
echo "$A" | rev | cut -d ' ' -f 1 | rev
# abc.123
More ways to do this:
(Run each of these commands in your terminal to test this live.)
For all answers below, start by typing this in your terminal:
A="Some variable has value abc.123"
The array example (#3 below) is a really useful pattern, and depending on what you are trying to do, sometimes the best.
1. with awk, as the main answer shows
echo "$A" | awk '{print $NF}'
2. with grep:
echo "$A" | grep -o '[^ ]*$'
the -o says to only retain the matching portion of the string
the [^ ] part says "don't match spaces"; ie: "not the space char"
the * means "match 0 or more instances of the preceding match pattern (which is [^ ])", and the $ means "match the end of the line." So, this matches the last word after the last space through to the end of the line; ie: abc.123 in this case.
3. via regular bash "indexed" arrays and array indexing
Convert A to an array, with elements separated by the characters in IFS (Internal Field Separator), which by default are space, tab, and newline:
Option 1 (will "break in mysterious ways", as @tripleee put it in a comment here, if the string stored in the A variable contains certain special shell characters, so Option 2 below is recommended instead!):
# Capture space-separated words as separate elements in array A_array
A_array=($A)
Option 2 [RECOMMENDED!]. Use the read command, as I explain in my answer here, and as is recommended by the bash shellcheck static code analyzer tool for shell scripts, in ShellCheck rule SC2206, here.
# Capture space-separated words as separate elements in array A_array, using
# a "herestring".
# See my answer here: https://stackoverflow.com/a/71575442/4561887
IFS=" " read -r -a A_array <<< "$A"
Then, print only the last element in the array:
# Print only the last element via bash array right-hand-side indexing syntax
echo "${A_array[-1]}" # last element only
Output:
abc.123
Going further:
What also makes this pattern so useful is that it lets you easily do the opposite: obtain all words except the last one, like this:
array_len="${#A_array[@]}"
array_len_minus_one=$((array_len - 1))
echo "${A_array[@]:0:$array_len_minus_one}"
Output:
Some variable has value
For more on the ${array[@]:start:length} array slicing syntax above, see my answer here: Unix & Linux: Bash: slice of positional parameters, and for more info. on the bash "Arithmetic Expansion" syntax, see here:
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arithmetic-Expansion
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Arithmetic
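Both steps can be written compactly by doing the arithmetic inside the subscript and the slice length. A minimal sketch, using made-up variable names last and rest:

```shell
#!/usr/bin/env bash
A="Some variable has value abc.123"
read -r -a A_array <<< "$A"

# Subscripts and slice lengths are evaluated arithmetically, so the
# "length minus one" calculation can be written in place.
last=${A_array[${#A_array[@]}-1]}
rest=("${A_array[@]:0:${#A_array[@]}-1}")

printf '%s\n' "$last"        # prints: abc.123
printf '%s\n' "${rest[*]}"   # prints: Some variable has value
```

Computing the last index from the array length, rather than using [-1], avoids relying on negative subscripts, which older Bash releases do not support.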
You can use a Bash regex:
A="Some variable has value abc.123"
[[ $A =~ [[:blank:]]([^[:blank:]]+)$ ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Prints:
abc.123
That works with any [:blank:] delimiter in the current locale (usually [ \t]). If you want to be more specific:
A="Some variable has value abc.123"
pat='[ ]([^ ]+)$'
[[ $A =~ $pat ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
echo "Some variable has value abc.123"| perl -nE'say $1 if /(\S+)$/'
