Linux command to make columns separated by multiple delimiters


I want to convert the following pattern
ab
cd
de
fg
to
'ab','cd','de','fg' using a Unix/Linux command.
Clarification: the pattern is as follows
QRTC1065173134
QRTC3988977812
QRTC0889556882
QUTR1641276912
ABCD1763495154
QRTC3991601819
and this is the required pattern 'QRTC1065173134','QRTC3988977812','QRTC0889556882','QUTR1641276912','ABCD1763495154','QRTC3991601819'

I agree with the comments that it's a bit unclear but, for fun sake:
A="ab cd ef gh "
echo $A | sed -e "s/^\s*/'/; s/ \{1,\}/','/g; s/\s*$/'/g"
Since the question wasn't precise, I only worked on spaces and string boundaries. So it will work with any number of characters, separated by one or more spaces. The result is also trimmed at both ends. HTH.

From the clarification, it seems that this is what you want.
$ echo "QRTC1065173134 QRTC3988977812 QRTC0889556882 QUTR1641276912 ABCD1763495154 QRTC3991601819" | sed -E "s/ /', '/g" | sed -E "s/$/'/" | sed -E "s/^/'/"
'QRTC1065173134', 'QRTC3988977812', 'QRTC0889556882', 'QUTR1641276912', 'ABCD1763495154', 'QRTC3991601819'
Here "E" is for extended regex so that we do not need to escape the regex metacharacters.
COMMENT 1: Removing an extra whitespace is left as an exercise for you.

If I understand correctly, that you have groups (e.g. ab bc de ...) separated by spaces, where you want to include a ' at the beginning/end of everything, and replace the spaces with ',' then sed can handle this with relative ease. Below ab cd ... can be any string of characters, such as QRTC1065173134. There are several ways to piece together a matching regular expression, but the following is fairly simple:
sed -e "s/\s/','/g" -e "s/^/'/" -e "s/$/'/"
example
$ echo "ab cd de fg hi" | sed -e "s/\s/','/g" -e "s/^/'/" -e "s/$/'/"
'ab','cd','de','fg','hi'
or
$ echo "QRTC1065173134 QRTC3988977812 QRTC0889556882" | sed -e "s/\s/','/g" -e "s/^/'/" -e "s/$/'/"
'QRTC1065173134','QRTC3988977812','QRTC0889556882'
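
The clarified input has one value per line rather than space-separated values, so here is a minimal sketch for that case. It assumes one token per line and no embedded single quotes; `quote_join` is a made-up helper name, not something from the answers above. It quotes each line with sed and then joins the lines with commas using paste:

```shell
# Wrap every input line in single quotes, then join all lines with commas.
quote_join() {
  sed "s/.*/'&'/" | paste -sd, -
}

printf '%s\n' QRTC1065173134 QRTC3988977812 ABCD1763495154 | quote_join
```

For a file, something like `quote_join < ids.txt` should work the same way (the file name is only an example).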

Related

Complicated string replacement in files

So, I have some strings to replace in some data files for a game I'm working on. I'm fine with using either sed or awk. I need help programming the bash script to replace this, so here's a list of things I want to replace. (Keep in mind that I will be using X in brackets in place of numbers, also, for strings 1 and 2 there is a space after the string I need to replace, which I need to replace the space as well.)
Replace every instance of \\P[x]: with \\n<\\P[x]>\\pf[x]
Replace every instance of \\N[x]: with \\n<\\N[x]>\\af[x]
Replace every instance of "volume":80 with "volume":90
Replace every instance of ... with \\..\\..\\..\\.
Replace every instance of "]},{"code":401,"indent":0,"parameters":[" with "]},{"code":401,"indent":0,"parameters":[" (There is a space at the beginning of the string I need to replace, I want to move that to the end of the string.)
Here's what I have tried:
sed -e 's/"volume":80/"volume":90/g' -e 's///g' -e 's/\ \"]},{\"code\":401,\"indent\":0,\"parameters\":[\"/\"]},{\"code\":401,\"indent\":0,\"parameters\":[\"\ /g' < ${file} > ${outdir}/${filename}.${extension}
I get this error: sed: -e expression #3, char 109: unterminated `s' command
And here's a script I created, can't figure out how to get it to work right:
#!/bin/bash
direct=./1
outdir=./2
mkdir -p $direct
mkdir -p $outdir
for file in ${direct}/*.*; do
filepath=$(basename "$file")
extension="${filepath##*.}"
filename="${filepath%.*}"
strb='"volume":80'
strr='"volume":90'
echo 1 && sed -e "s#\"${strb}\"#\"${strr}\"#g" -i ${file}
strb='...'
strr='\\..\\..\\..\\.'
echo 2 && sed -e "s#\"${strb}\"#\"${strr}\"#g" -i ${file}
num=0
while [ "$num" -le "99" ]; do
strb='\\P['"$num"']\: '
strr='\\n<\\P['"$num"']>\\pf['"$num"']'
sed -e "s#\"${strb}\"#\"${strr}\"#g" -i ${file}
strb='\\N['"$num"']\: '
strr='\\n<\\N['"$num"']>\\af['"$num"']'
sed -e "s#\"${strb}\"#\"${strr}\"#g" -i ${file}
num=$(echo $num+1 | bc -l)
done
strb=' "]},{"code":401,"indent":0,"parameters":["'
strr='"]},{"code":401,"indent":0,"parameters":[" '
echo 5 && sed -e "s#\"${strb}\"#\"${strr}\"#g" -i ${file}
strb='Move1'
strr='Move'
echo 6 && sed -e "s#\"${strb}\"#\"${strr}\"#g" -i ${file}
strb='Move'
strr='Move1'
echo 7 && sed -e "s#\"${strb}\"#\"${strr}\"#g" -i ${file}
echo ${filename}.${extension}
done
On the fifth instance of sed, I get unterminated s' command

I think you get unterminated s' command on the fifth instance of sed in your script because your s' command is terminated with double-quotes, but your $strb and $strr strings also have unescaped double quotes in them. This is a problem for your entire script, but luckily, the fifth instance happens to have an odd number of double quotes, causing a syntax error and revealing the problem. Since none of your strings have single-quotes, using only single-quotes to terminate sed and wrapping all of your variables in double-quotes to preserve their literals is a safe bet, e.g.,
sed -e 's#'"${strb}"'#'"${strr}"'#g' -i ${file}
You also have some [] and some . in your strings. These have a special meaning to sed, so you should escape those (for the pattern, not the replacement), e.g.,
change
strb='...'
to
strb='\.\.\.'
and change
strb=' "]},{"code":401,"indent":0,"parameters":["'
to
strb=' "\]},{"code":401,"indent":0,"parameters":\["'
See https://unix.stackexchange.com/a/33005/65267 for more on characters with special meaning to sed.
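
To avoid escaping each string by hand, here is a rough sketch of the same idea as helper functions. The names `escape_pattern`, `escape_replacement`, `replace_literal` and `rewrite_tags` are invented for this example. It assumes GNU sed, `#` as the `s` delimiter, and data files that contain two literal backslashes before `P[x]` / `N[x]`, as written in the question. `rewrite_tags` uses a capture group for the number instead of looping from 0 to 99:

```shell
# Escape a literal string for use as a sed basic-regex pattern ('#' delimiter).
escape_pattern() {
  printf '%s\n' "$1" | sed 's/[][\.*^$#]/\\&/g'
}

# Escape a literal string for use as a sed replacement ('#' delimiter).
escape_replacement() {
  printf '%s\n' "$1" | sed 's/[\&#]/\\&/g'
}

# Replace every literal occurrence of $1 with $2 on stdin.
replace_literal() {
  sed "s#$(escape_pattern "$1")#$(escape_replacement "$2")#g"
}

# Rewrite \\P[n]: and \\N[n]: (each followed by a space) using a backreference.
rewrite_tags() {
  sed -E \
    -e 's/\\\\P\[([0-9]+)\]: /\\\\n<\\\\P[\1]>\\\\pf[\1]/g' \
    -e 's/\\\\N\[([0-9]+)\]: /\\\\n<\\\\N[\1]>\\\\af[\1]/g'
}

strb=' "]},{"code":401,"indent":0,"parameters":["'
strr='"]},{"code":401,"indent":0,"parameters":[" '
printf '%s\n' "a${strb}b" | replace_literal "$strb" "$strr"
printf '%s\n' '\\P[3]: hello' | rewrite_tags
```

The functions read stdin and write stdout, so they can be chained in a pipeline and redirected to the output directory instead of editing files in place with -i.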

How to extract numbers from a string?

I have string contains a path
string="toto.titi.12.tata.2.abc.def"
I want to extract only the numbers from this string.
To extract the first number:
tmp="${string#toto.titi.*.}"
num1="${tmp%.tata*}"
To extract the second number:
tmp="${string#toto.titi.*.tata.*.}"
num2="${tmp%.abc.def}"
So to extract a parameter I have to do it in 2 steps. How to extract a number with one step?

You can use tr to delete all of the non-digit characters, like so:
echo toto.titi.12.tata.2.abc.def | tr -d -c 0-9

To extract all the individual numbers and print one number per line, pipe the input through:
tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed 's/ /\n/g'
Breakdown:
Replaces all line breaks with spaces: tr '\n' ' '
Replaces all non numbers with spaces: sed -e 's/[^0-9]/ /g'
Remove leading white space: -e 's/^ *//g'
Remove trailing white space: -e 's/ *$//g'
Squeeze spaces in sequence to 1 space: tr -s ' '
Replace remaining space separators with line break: sed 's/ /\n/g'
Example:
echo -e " this 20 is 2sen\nten324ce 2 sort of" | tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed 's/ /\n/g'
Will print out
20
2
324
2

Here is a short one:
string="toto.titi.12.tata.2.abc.def"
id=$(echo "$string" | grep -o -E '[0-9]+')
echo $id  # => output: 12 2
with space between the numbers.
Hope it helps...

Parameter expansion would seem to be the order of the day.
$ string="toto.titi.12.tata.2.abc.def"
$ read num1 num2 <<<${string//[^0-9]/ }
$ echo "$num1 / $num2"
12 / 2
This of course depends on the format of $string. But at least for the example you've provided, it seems to work.
This may be superior to anubhava's awk solution which requires a subshell. I also like chepner's solution, but regular expressions are "heavier" than parameter expansion (though obviously way more precise). (Note that in the expression above, [^0-9] may look like a regex atom, but it is not.)
You can read about this form of Parameter Expansion in the bash man page. Note that ${string//this/that} (as well as the <<<) is a bashism, and is not compatible with traditional Bourne or POSIX shells.
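
If only one number is wanted at a time, the same expansion can be wrapped in a small function. This is just a sketch: `nth_number` is a hypothetical name, and it assumes bash (for `read -a` and `<<<`) and that every run of digits counts as one number:

```shell
# Print the n-th run of digits (1-based) found in a string.
nth_number() {
  local s=$1 n=$2 nums
  # Turn every non-digit into a space, then split the rest into an array.
  read -r -a nums <<< "${s//[^0-9]/ }"
  printf '%s\n' "${nums[n-1]}"
}

string="toto.titi.12.tata.2.abc.def"
nth_number "$string" 1
nth_number "$string" 2
```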

This would be easier to answer if you provided exactly the output you're looking to get. If you mean you want to get just the digits out of the string, and remove everything else, you can do this:
d@AirBox:~$ string="toto.titi.12.tata.2.abc.def"
d@AirBox:~$ echo "${string//[a-z,.]/}"
122
If you clarify a bit I may be able to help more.

You can also use sed:
echo "toto.titi.12.tata.2.abc.def" | sed 's/[0-9]*//g'
Here, sed replaces
any digits (class [0-9])
repeated any number of times (*)
with nothing (nothing between the second and third /),
and g stands for globally.
Output will be:
toto.titi..tata..abc.def

Convert your string to an array like this:
$ str="toto.titi.12.tata.2.abc.def"
$ arr=( ${str//[!0-9]/ } )
$ echo "${arr[@]}"
12 2

Use regular expression matching:
string="toto.titi.12.tata.2.abc.def"
[[ $string =~ toto\.titi\.([0-9]+)\.tata\.([0-9]+)\. ]]
# BASH_REMATCH[0] would be "toto.titi.12.tata.2.", the entire match
# Successive elements of the array correspond to the parenthesized
# subexpressions, in left-to-right order. (If there are nested parentheses,
# they are numbered in depth-first order.)
first_number=${BASH_REMATCH[1]}
second_number=${BASH_REMATCH[2]}

Using awk:
arr=( $(echo $string | awk -F "." '{print $3, $5}') )
num1=${arr[0]}
num2=${arr[1]}

Hi adding yet another way to do this using 'cut',
echo $string | cut -d'.' -f3,5 | tr '.' ' '
This gives you the following output:
12 2

Fixing newline issue (for mac terminal):
cat temp.txt | tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed $'s/ /\\\n/g'

Assumptions:
there is no embedded white space
the string of text always has 7 period-delimited strings
the string always contains numbers in the 3rd and 5th period-delimited positions
One bash idea that does not require spawning any subprocesses:
$ string="toto.titi.12.tata.2.abc.def"
$ IFS=. read -r x1 x2 num1 x3 num2 rest <<< "${string}"
$ typeset -p num1 num2
declare -- num1="12"
declare -- num2="2"
In a comment OP has stated they wish to extract only one number at a time; the same approach can still be used, eg:
$ string="toto.titi.12.tata.2.abc.def"
$ IFS=. read -r x1 x2 num1 rest <<< "${string}"
$ typeset -p num1
declare -- num1="12"
$ IFS=. read -r x1 x2 x3 x4 num2 rest <<< "${string}"
$ typeset -p num2
declare -- num2="2"
A variation on anubhava's answer that uses parameter expansion instead of a subprocess call to awk, and still working with the same set of initial assumptions:
$ arr=( ${string//./ } )
$ num1=${arr[2]}
$ num2=${arr[4]}
$ typeset -p num1 num2
declare -- num1="12"
declare -- num2="2"

Bash Shell - Return substring after second occurrence of certain character

I need to return everything after a delimiter I decide but still don't fully know how to use sed.
What I need to do is:
$ echo "ABC DE,FG_HI J,123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,," \
| sed <some regexp>
For this example the return should be (substring)everything after the second comma:
123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,,
I can do this with cut like this:
echo "ABC DE,FG_HI J,123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,," | cut -d',' -f 3-
but I've been told cut is slower than sed...
Can some guru who has them (and wants to... :) ) give me a few minutes of his time and advise me please?
Thanks!
Leo

In my experience cut is always faster than sed.
To do what you want with sed you could use a repeated group:
echo 'ABC DE,FG_HI J,123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,,' |
sed -r 's/([^,]*,){2}//'
This removes the first two fields (if the fields do not contain commas themselves) by removing non-comma characters [^,] followed by a comma twice {2}.
Output:
123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,,
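
The same expression can be parameterised on the number of fields to drop. A small sketch, assuming a sed that accepts -E (GNU and BSD sed both do); the function name `drop_fields` is made up for the example:

```shell
# Remove the first n comma-terminated fields from each input line.
drop_fields() {
  local n=$1
  sed -E "s/^([^,]*,){$n}//"
}

echo 'ABC DE,FG_HI J,123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,,' | drop_fields 2
```

Lines with fewer than n commas do not match and pass through unchanged.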

You could also try doing the extraction in bash without spawning an external process at all:
$ [[ 'ABC DE,FG_HI J,123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,,' =~ [^,]*,[^,]*,(.*) ]]
$ echo "${BASH_REMATCH[1]}"
123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,,
or
$ shopt -s extglob   # needed for the +(...) pattern below
$ FOO='ABC DE,FG_HI J,123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,,'
$ echo ${FOO/+([^,]),+([^,]),}
or
$ IFS=, read -a FOO <<< 'ABC DE,FG_HI J,123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,,'
$ echo ${FOO[@]:2}
(Assuming this is for a one-off match, not iterating over the contents of a file.)

This method finds the index of the second occurrence of a character and uses bash substring expansion to get the required result:
input="ABC DE,FG_HI J,123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,,"
index=$(($(echo "$input" | grep -aob ',' | grep -oE '[0-9]+' | awk 'NR==2') + 1))
result=${input:$index}
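
A variation that needs no external commands at all is to strip the shortest prefix ending in the delimiter, n times. This is only a sketch; `after_nth` is a hypothetical helper name, and it assumes bash for `local` and the arithmetic `for` loop:

```shell
# Print everything after the n-th occurrence of a delimiter in a string.
after_nth() {
  local s=$1 delim=$2 n=$3 i
  for ((i = 0; i < n; i++)); do
    # Remove the shortest leading part that ends with the delimiter.
    s=${s#*"$delim"}
  done
  printf '%s\n' "$s"
}

after_nth 'ABC DE,FG_HI J,123.XYZ-A1,DD/MM/YYYY HH24:MI:SS,,,' ',' 2
```

If the string contains fewer than n delimiters, the remaining iterations leave it unchanged.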

linux shell title case

I am writing a shell script and have a variable like this: something-that-is-hyphenated.
I need to use it in various points in the script as:
something-that-is-hyphenated, somethingthatishyphenated, SomethingThatIsHyphenated
I have managed to change it to somethingthatishyphenated by stripping out - using sed "s/-//g".
I am sure there is a simpler way, and also, need to know how to get the camel cased version.
Edit: Working function derived from @Michał's answer
function hyphenToCamel {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hyphenToCamel)
echo $CAMEL
Edit: Finally, a sed one liner thanks to @glenn
echo a-hyphenated-string | sed -E "s/(^|-)([a-z])/\u\2/g"

a GNU sed one-liner
echo something-that-is-hyphenated |
sed -e 's/-\([a-z]\)/\u\1/g' -e 's/^[a-z]/\u&/'
\u in the replacement string is documented in the sed manual.

Pure bashism:
var0=something-that-is-hyphenated
var1=(${var0//-/ })
var2=${var1[*]^}
var3=${var2// /}
echo $var3
SomethingThatIsHyphenated
Line 1 is trivial.
Line 2 is the bashism for replaceAll or 's/-/ /g', wrapped in parens, to build an array.
Line 3 uses ${foo^}, which means uppercase (while ${foo,} would mean 'lowercase' [note, how ^ points up while , points down]) but to operate on every first letter of a word, we address the whole array with ${foo[*]} (or ${foo[@]}, if you would prefer that).
Line 4 is again a replace-all: blank with nothing.
Line 5 is trivial again.
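
For reference, here is a sketch that produces all three forms the question asks for in one place. It assumes bash 4 or later (for `${var^}`); `to_camel` is a made-up function name, and it splits with `read -a` rather than an unquoted expansion so that glob characters in the value are not expanded:

```shell
# Convert a hyphenated string to CamelCase (bash 4+).
to_camel() {
  local parts
  IFS=- read -r -a parts <<< "$1"   # split on hyphens into an array
  printf '%s' "${parts[@]^}"        # uppercase the first letter of each part
  printf '\n'
}

name=something-that-is-hyphenated
echo "$name"          # original form
echo "${name//-/}"    # hyphens removed
to_camel "$name"      # camel-cased form
```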

You can define a function:
hyphenToCamel() {
# awk strings are 1-indexed, so the first character is substr($0,1,1)
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hyphenToCamel)
echo $CAMEL

In the shell you are stuck with being messy:
aa="aaa-aaa-bbb-bbb"
echo " $aa" | sed -e 's/--*/ /g' -e 's/ a/A/g' -e 's/ b/B/g' ... -e 's/  *//g'
Note the carefully placed space in the echo and the double space in the last -e.
I leave it as an exercise to complete the code.
In perl it is a bit easier as a one-line shell command:
perl -e 'print map{ $a = ucfirst; $a =~ s/ +//g; $a} split( /-+/, $ARGV[0] ), "\n"' $aa

For the records, here's a pure Bash safe method (that is not subject to pathname expansion)—using Bash≥4:
var0=something-that-is-hyphenated
IFS=- read -r -d '' -a var1 < <(printf '%s\0' "${var0,,}")
printf '%s' "${var1[@]^}"
This (safely) splits the lowercase expansion of var0 at the hyphens, with each split part in array var1. Then we use the ^ parameter expansion to uppercase the first character of the fields of this array, and concatenate them.
If your variable may also contain spaces and you want to act on them too, change IFS=- into IFS='- '.

How do I replace backspace characters (\b) using sed?

I want to delete a fixed number of backspace character occurrences ( \b ) from stdin. So far I have tried this:
echo -e "1234\b\b\b56" | sed 's/\b{3}//'
But it doesn't work. How can I achieve this using sed or some other unix shell tool?

You can use the hexadecimal value for backspace:
echo -e "1234\b\b\b56" | sed 's/\x08\{3\}//'
You also need to escape the braces.

You can use tr:
echo -e "1234\b\b\b56" | tr -d '\b'
123456
If you want to delete three consecutive backspaces, you can use Perl:
echo -e "1234\b\b\b56" | perl -pe 's/(\010){3}//'

sed interprets \b as a word boundary. I got this to work in perl like so:
echo -e "1234\b\b\b56" | perl -pe '$b="\b";s/$b//g'

With sed:
echo "123\b\b\b5" | sed 's/[\b]\{3\}//g'
You have to escape the { and } in the {3}, and also treat the \b special by using a character class.
[birryree@lilun ~]$ echo "123\b\b\b5" | sed 's/[\b]\{3\}//g'
1235

Note if you want to remove the characters being deleted also, have a look at ansi2html.sh which contains processing like:
printf "12..\b\b34\n" | sed ':s; s#[^\x08]\x08##g; t s'

No need for Perl here!
# version 1
echo -e "1234\b\b\b56" | sed $'s/\b\{3\}//' | od -c
# version 2
bvar="$(printf '%b' '\b')"
echo -e "1234\b\b\b56" | sed 's/'${bvar}'\{3\}//' | od -c
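
Building on version 2, here is a sketch that takes the count as a parameter and uses printf instead of echo -e, since echo's handling of escapes differs between shells. The function name `strip_backspaces` is invented for this example:

```shell
# Delete the first run of exactly n consecutive backspace characters per line.
strip_backspaces() {
  local n=$1 bs
  bs=$(printf '\b')              # a literal backspace byte (0x08)
  sed "s/${bs}\{${n}\}//"
}

printf '1234\b\b\b56\n' | strip_backspaces 3 | od -c
```

Piping through od -c, as in the versions above, makes it easy to check that no \b bytes remain.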
