Bash: split a string into an array on the first n occurrences of a character

Input and desired output:
a-b-c-d-e -> [a,b,c-d-e]
a-b-c -> [a,b,c]
abd-der-asd-d -> [abd,der,asd-d]
a-b -> throw error
Currently I am doing this:
arr=($(echo "$stringVar" | tr '-' '\n'))
But this splits on every '-'; I want to split only on the first two '-' characters.
Any suggestions on how this can be done using only bash built-ins?

Use read to read the input into 3 variables. This will split out the first two items, and keep the rest as a single value.
input=a-b-c-d-e
IFS=- read -r item1 item2 rest <<<"$input"
if [ -z "$rest" ]
then
    echo "Error" >&2
    exit 1
fi
array=("$item1" "$item2" "$rest")
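For reuse, the same read-based split can be wrapped in a function. This is only a sketch: the function name split_first_two and the global array parts are made up for illustration and are not part of the answer above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "$1" on the first two '-' characters.
# On success the three pieces are stored in the global array "parts".
split_first_two() {
    local item1 item2 rest
    IFS=- read -r item1 item2 rest <<<"$1"
    if [ -z "$rest" ]; then
        echo "Error: '$1' has fewer than three fields" >&2
        return 1
    fi
    parts=("$item1" "$item2" "$rest")
}

# Prints [a,b,c-d-e], [a,b,c] and [abd,der,asd-d], one per line.
for s in a-b-c-d-e a-b-c abd-der-asd-d; do
    split_first_two "$s" && printf '[%s,%s,%s]\n' "${parts[@]}"
done
```

Returning a non-zero status instead of calling exit lets the caller decide how to handle an input such as a-b.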

Convert carriage return (\r) to actual overwrite

Questions
Is there a way to convert the carriage returns to actual overwrite in a string so that 000000000000\r1010 is transformed to 101000000000?
Context
1. Initial objective:
Having a number x (between 0 and 255) in base 10, I want to convert this number to base 2, add trailing zeros to get a 12-digit binary representation, generate 12 different numbers (each made of the first n digits in base 2, with n between 1 and 12), and print the base 10 representation of these 12 numbers.
2. Example:
With x = 10
Base 2 is 1010
With trailing zeros 101000000000
Extract the 12 "leading" numbers: 1, 10, 101, 1010, 10100, 101000, ...
Convert to base 10: 1, 2, 5, 10, 20, 40, ...
3. What I have done (it does not work):
x=10
x_base2="$(echo "obase=2;ibase=10;${x}" | bc)"
x_base2_padded="$(printf '%012d\r%s' 0 "${x_base2}")"
for i in {1..12}
do
t=$(echo ${x_base2_padded:0:${i}})
echo "obase=10;ibase=2;${t}" | bc
done
4. Why it does not work
Because the variable x_base2_padded contains the whole sequence 000000000000\r1010. This can be confirmed using hexdump for instance. In the for loop, when I extract the first 12 characters, I only get zeros.
5. Alternatives
I know I can work around this by literally appending zeros to the variable, as follows:
x_base2=1010
x_base2_padded="$(printf '%s%0.*d' "${x_base2}" $((12-${#x_base2})) 0)"
Or by padding with zeros using printf and rev
x_base2=1010
x_base2_padded="$(printf '%012s' "$(printf "${x_base2}" | rev)" | rev)"
Although these alternatives solve my problem for now and let me continue my work, they do not really answer my question.
Related issue
The same problem may be observed in different contexts. For instance if one tries to concatenate multiple strings containing carriage returns. The result may be hard to predict.
str=$'bar\rfoo'
echo "${str}"
echo "${str}${str}"
echo "${str}${str}${str}"
echo "${str}${str}${str}${str}"
echo "${str}${str}${str}${str}${str}"
The first echo will output foo. Although you might expect the other echo commands to output foofoofoo..., they all output foobar.
The following function overwrite transforms its argument such that after each carriage return \r the beginning of the string is actually overwritten:
overwrite() {
local segment result=
while IFS= read -rd $'\r' segment; do
result="$segment${result:${#segment}}"
done < <(printf '%s\r' "$@")
printf %s "$result"
}
Example
$ overwrite $'abcdef\r0123\rxy'
xy23ef
Note that the printed string is actually xy23ef, unlike echo $'abcdef\r0123\rxy' which only seems to print the same string, but still prints \r which is then interpreted by your terminal such that the result looks the same. You can confirm this with hexdump:
$ echo $'abcdef\r0123\rxy' | hexdump -c
0000000 a b c d e f \r 0 1 2 3 \r x y \n
000000f
$ overwrite $'abcdef\r0123\rxy' | hexdump -c
0000000 x y 2 3 e f
0000006
The function overwrite also supports overwriting by arguments instead of \r-delimited segments:
$ overwrite abcdef 0123 xy
xy23ef
To convert a variable in place, use a command substitution: myvar=$(overwrite "$myvar")
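Applied to the padding problem from the question, this might look like the following sketch. It repeats the overwrite function so that it runs on its own; the variable names are the ones used in the question.

```bash
#!/usr/bin/env bash
# Same function as above: each \r-delimited segment (or argument)
# overwrites the beginning of the result built so far.
overwrite() {
    local segment result=
    while IFS= read -rd $'\r' segment; do
        result="$segment${result:${#segment}}"
    done < <(printf '%s\r' "$@")
    printf %s "$result"
}

x_base2=1010
# Twelve zeros, a carriage return, then the binary digits written over them.
x_base2_padded=$(overwrite "$(printf '%012d\r%s' 0 "$x_base2")")
echo "$x_base2_padded"   # 101000000000
```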
With awk, you'd set the field delimiter to \r and iterate through fields printing only the visible portions of them.
awk -F'\r' '{
offset = 1
for (i=NF; i>0; i--) {
if (offset <= length($i)) {
printf "%s", substr($i, offset)
offset = length($i) + 1
}
}
print ""
}'
This is too long to put into a command substitution comfortably, so you'd better wrap it in a function and pipe the lines to be resolved into it.
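Such a wrapper could be sketched as follows. The function name resolve_cr is an assumption chosen for illustration; the awk program is the one shown above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: reads lines on stdin and prints each one the way
# a terminal would display it after processing the carriage returns.
resolve_cr() {
    awk -F'\r' '{
        offset = 1
        # Walk the segments from last to first; a later segment hides
        # the beginning of every earlier one.
        for (i = NF; i > 0; i--) {
            if (offset <= length($i)) {
                printf "%s", substr($i, offset)
                offset = length($i) + 1
            }
        }
        print ""
    }'
}

printf 'abcdef\r0123\rxy\n' | resolve_cr   # xy23ef
```

Unlike the bash function, this works line by line, so it suits input that arrives through a pipe.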
To answer the specific question, how to convert 000000000000\r1010 to 101000000000, refer to Socowi's answer.
However, I wouldn't introduce the carriage return in the first place and solve the problem like this:
#!/usr/bin/env bash
x=$1
# Start with 12 zeroes
var='000000000000'
# Convert input to binary
binary=$(bc <<< "obase = 2; $x")
# Rightpad with zeroes: ${#binary} is the number of characters in $binary,
# and ${var:x} removes the first x characters from $var
var=$binary${var:${#binary}}
# Print 12 substrings, convert to decimal: ${var:0:i} extracts the first
# i characters from $var, and $((x#$var)) interprets $var in base x
for ((i = 1; i <= ${#var}; ++i)); do
echo "$((2#${var:0:i}))"
done

Reading from A to B, but stop at the first occurrence of B

I'm trying to write a shell script that reads a file from a string A to a string B. I'm sure that string A is UNIQUE, but string B is repeated more than once.
I'm reading from a file that contains a lot of CREATE queries.
Each query ends with (my string B):
); ------------------------
String A is composed this way:
CREATE MULTISET TABLE DBNAME.TABLENAME
So I read with sed from A to B:
sed -n "/$FROMSTR/,/$TOSTR/p" "$2" >> querytest.txt
I want to stop at the first occurrence of $TOSTR (string B).
In place of:
sed -n "/$FROMSTR/,/$TOSTR/p"
use:
sed -n "/$FROMSTR/,\${p; /$TOSTR/q}"
This prints from the first occurrence of $FROMSTR to the last line of the file (the address $), except that it quits as soon as it sees the first occurrence of $TOSTR.
Aside: You should be sure that you trust the source of FROMSTR and TOSTR. If either variable contained sed-active characters, the result might not be what you want.
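If the two strings cannot be trusted to be free of regex characters, one option is to match them literally. The sketch below swaps sed for awk and uses awk's index() function, which does plain substring matching. The function name extract_block is made up for illustration. One caveat: awk still interprets backslash escapes in values passed with -v, so strings containing backslashes would need extra care.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print from the first line containing "$1" up to and
# including the first following line containing "$2". Both are matched as
# literal substrings. Remaining arguments are input files (default: stdin).
extract_block() {
    local from=$1 to=$2
    shift 2
    awk -v from="$from" -v to="$to" '
        !found && index($0, from) { found = 1 }   # start of the block
        found { print }
        found && index($0, to) { exit }           # stop at the first end marker
    ' "$@"
}

# Prints 2, 3 and 4, one per line.
printf '%s\n' 1 2 3 4 5 6 | extract_block 2 4
```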
Example 1
As a simple example:
$ FROMSTR=2; TOSTR=4; seq 10 | sed -n "/$FROMSTR/,\${p; /$TOSTR/q}"
2
3
4
Example 2
As an example closer to your actual input, consider this test file:
$ cat file
1
CREATE MULTISET TABLE DBNAME.TABLENAME
2
3
); ------------------------
4
And run this command:
$ FROMSTR="CREATE MULTISET TABLE DBNAME.TABLENAME"
$ TOSTR="); ------------------------"
$ sed -n "/$FROMSTR/,\${p; /$TOSTR/q}" file
CREATE MULTISET TABLE DBNAME.TABLENAME
2
3
); ------------------------
Example 3
Consider this test file:
$ cat file
1
CREATE MULTISET TABLE DBNAME.TABLENAME
2
); ------------------------
); ------------------------
3
CREATE MULTISET TABLE DBNAME.TABLENAME
4
); ------------------------
5
We define our variables:
$ FROMSTR="CREATE MULTISET TABLE DBNAME.TABLENAME"
$ TOSTR="); ------------------------"
And, run our code:
$ sed -n "/$FROMSTR/,\${p; /$TOSTR/q}" file
CREATE MULTISET TABLE DBNAME.TABLENAME
2
); ------------------------
I solved my problem. It seems that sed has some trouble handling escape characters (\r\n).
I changed my $TOSTR to ");"
and used a loop:
sed -n "/$FROMSTR/{p; :loop n; p; /$TOSTR/q; b loop}" $2 >> $3
Then I echo the characters that I need after ");":
echo -e "\r\n--------------------------------------------------------------------------------" >> $3
A useful Stack Overflow post explains the loop.

How to split a string and extract its specific elements?

I need to split a string and extract specific elements.
For instance, I have str:
str='C50F2N2Ne50A13.224343968R2'
And then I need to extract:
C = 50
F = 2
N = 2
Ne = 50
A = 13.224343968
R = 2
Another example:
str='C5F10N2Ne5A2.0330517838R2'
And then I need to extract:
C = 5
F = 10
N = 2
Ne = 5
A = 2.0330517838
R = 2
My first idea was to extract the uppercase characters:
classes=$(tr -dc '[:upper:]' <<< $name)
But it returns only CFNNAR.
My second idea was to split on a specific character (delimiter):
classes=(${name//F/ })
classes=(${classes//C/ })
But I cannot isolate the values.
I tried splitting by number of characters, but each part of the string can vary in size.
I would appreciate it if someone could help me with this problem. :)
Besides all the usual caveats about never using eval, try:
eval $( echo $str | sed 's/\([A-Za-z][A-Za-z]*\)/ \1=/g')
or, get all modern and use:
echo $str | sed -E 's/([[:alpha:]]+)/ \1=/g'
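If you would rather avoid eval altogether, a pure-bash loop over a regular expression is one alternative. This is a sketch that assumes every element is a run of letters followed by a run of digits and dots, and it uses an associative array, which needs bash 4 or newer. The names fields and rest are made up for illustration.

```bash
#!/usr/bin/env bash
str='C50F2N2Ne50A13.224343968R2'

declare -A fields   # associative array: needs bash 4 or newer
rest=$str
# Peel off one "letters + number" pair per iteration until nothing matches.
while [[ $rest =~ ^([[:alpha:]]+)([0-9.]+)(.*)$ ]]; do
    fields[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
    rest=${BASH_REMATCH[3]}
done

# Prints "C = 50", "F = 2", "N = 2", "Ne = 50", "A = 13.224343968", "R = 2".
for key in C F N Ne A R; do
    printf '%s = %s\n' "$key" "${fields[$key]}"
done
```

Because the values land in an array rather than in shell variables, a malformed input string cannot execute code or overwrite unrelated variables.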

remove end of line characters with a bash script?

I'm trying to write a script to remove the characters (\r\n) that Windows inserts, BUT ONLY if they are between double quotes ( " ). Why?
Because the dump file inserts these characters; I don't know why.
And why only between quotes? Because they only affect me when they chop one of my results in two.
For example: "this","is","a","result","from","database"
The problem:
"this","is","a","result","from","da
tabase"
[EDIT]
Thanks to the answer of @Cyrus I got something like this,
but it fails with "bad flag in substitute command: '}'". I'm on Mac OS X.
Can you help me?
Thanks
OS X uses a different sed than the one that's typically installed in Linux.
The big differences are that sequences like \r and \n don't get expanded or used as part of the expression as you might expect, and you tend to need to separate commands with semicolons a little more.
If you can get by with a sed one-liner that implements a rule like "Remove any \r\n on lines containing quotes", it will certainly simplify your task...
For my experiments, I used what I infer is your sample input data:
$ od -c input.txt
0000000 F o r E x a m p l e . " t h
0000020 i s " , " i s " , " a " , " r e
0000040 s u l t " , " f r o m " , " d a
0000060 t a \r \n b a s e " \n
0000072
First off, a shell-only solution might be to use smaller tools that are built in to the operating system. For example, here's a one-liner:
od -A n -t o1 -v input.txt | rs 0 1 | while read n; do [ $n -eq 015 ] && read n && continue; printf "\\$n"; done
Broken out for easier reading, here's what this looks like:
od -A n -t o1 -v input.txt | rs 0 1 - convert the file into a stream of octal numbers
| while read n; do - step through the numbers...
[ $n -eq 015 ] && - if the current number is 15 (i.e. octal for a Carriage Return)
read n - read a line (thus skipping it),
&& continue - and continue to the next octal number (thus skipping the newline after a CR)
printf "\\$n"; done - print the current octal number.
This kind of data conversion and stream logic works nicely in a pipeline, but is a bit harder to implement in sed, which only knows how to deal with the original input rather than its converted form.
Another bash option might be to use conditional expressions matching the original lines of input:
while IFS= read -r line; do
    if [[ $line =~ .*\".*$'\r'$ ]]; then
        # Line contains a quote and ends in CR: drop the CR, print no newline
        printf '%s' "${line%$'\r'}"
    else
        printf '%s\n' "$line"
    fi
done < input.txt
This walks through text, and if it sees a CR, it prints everything up to and not including it, with no trailing newline. For all other lines, it just prints them as usual. The result is that lines that had a carriage return are joined, other lines are not.
From sed's perspective, we're dealing with two input lines, the first of which ends in a carriage return. The strategy for this would be to search for carriage returns, remove them and join the lines. I struggled for a while trying to come up with something that would do this, then gave up. Not to say it's impossible, but I suspect a generally useful script will be lengthy (by sed standards).
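Since sed is awkward here, awk may be an easier fit for the "only between quotes" rule. The following is only a sketch under stated assumptions: a line break counts as "inside quotes" when an odd number of double quotes has been seen so far, and escaped or doubled quotes inside fields are not handled. The function name join_quoted_crlf is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join lines that were broken inside a quoted field.
# Reads the files given as arguments, or stdin if there are none.
join_quoted_crlf() {
    awk '{
        line = $0
        quotes = gsub(/"/, "", line)            # count double quotes on this line
        if (quotes % 2 == 1) inside = !inside   # odd count toggles the state
        if (inside) {
            sub(/\r$/, "")                      # drop the CR, if any
            printf "%s", $0                     # no newline: join with the next line
        } else {
            print
        }
    }' "$@"
}

printf '"from","data\r\nbase"\n' | join_quoted_crlf   # "from","database"
```

Lines that end in \r\n outside of quotes keep their carriage return, which matches the requirement in the question.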

edit ASCII value of a character in bash

I am trying to update each character of a string array in bash by adding 2 to its ASCII value.
Example:
declare -a x =("j" "a" "f" "a" "r")
I want to increment each ASCII value by 2, so that "j" becomes "l".
I can't find anything dealing with the ASCII value beyond
printf '%d' "'$char"
Can anyone help me please?
Also, when I try to copy an array into another, it doesn't work.
Note that I am using
declare -a temp=("${x[@]}")
What is wrong with it?
You can turn an integer into a char by first using printf to turn it into an octal escape sequence (like \123) and then using that as a printf format string to produce the character:
#!/bin/bash
char="j"
printf -v num %d "'$char"
(( num += 2 ))
printf -v newchar \\$(printf '%03o' "$num")
echo "$newchar"
This only works for ASCII.
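Applied to the whole array from the question, the same idea could be sketched like this. The helper name shift_char and its use of REPLY as the return variable are assumptions made for illustration, and the sketch inherits the ASCII-only limitation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: store in REPLY the character two code points after "$1".
shift_char() {
    local num
    printf -v num '%d' "'$1"                           # character -> ASCII code
    printf -v REPLY "\\$(printf '%03o' "$((num + 2))")" # code + 2 -> character
}

x=("j" "a" "f" "a" "r")
y=()
for c in "${x[@]}"; do
    shift_char "$c"
    y+=("$REPLY")
done
echo "${y[@]}"   # l c h c t
```

Writing to a variable with printf -v avoids a command substitution per character and keeps the loop in a single shell process.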
It seems tr can help you here:
y=($(echo "${x[@]}" | tr a-z c-zab))
tr maps characters from one set to another. In this example, from the set of a b c ... z, it maps to c d e ... z a b. So, you're effectively "rotating" the characters. This principle is used by the ROT13 cipher.
