Convert carriage return (\r) to actual overwrite - string

Questions
Is there a way to convert the carriage returns to actual overwrite in a string so that 000000000000\r1010 is transformed to 101000000000?
Context
1. Initial objective:
Given a number x (between 0 and 255) in base 10, I want to convert it to base 2, add trailing zeros to get a 12-digit binary representation, generate 12 different numbers (each made of the first n digits in base 2, with n between 1 and 12) and print the base 10 representation of these 12 numbers.
2. Example:
With x = 10
Base 2 is 1010
With trailing zeros 101000000000
Extract the 12 "leading" numbers: 1, 10, 101, 1010, 10100, 101000, ...
Convert to base 10: 1, 2, 5, 10, 20, 40, ...
3. What I have done (it does not work):
x=10
x_base2="$(echo "obase=2;ibase=10;${x}" | bc)"
x_base2_padded="$(printf '%012d\r%s' 0 "${x_base2}")"
for i in {1..12}
do
t=$(echo ${x_base2_padded:0:${i}})
echo "obase=10;ibase=2;${t}" | bc
done
4. Why it does not work
Because the variable x_base2_padded contains the whole sequence 000000000000\r1010. This can be confirmed using hexdump for instance. In the for loop, when I extract the first 12 characters, I only get zeros.
5. Alternatives
I know I can work around this by literally appending zeros to the variable, as follows:
x_base2=1010
x_base2_padded="$(printf '%s%0.*d' "${x_base2}" $((12-${#x_base2})) 0)"
Or by padding with zeros using printf and rev
x_base2=1010
x_base2_padded="$(printf '%012s' "$(printf "${x_base2}" | rev)" | rev)"
Although these alternatives solve my problem for now and let me continue my work, they do not really answer my question.
Related issue
The same problem may be observed in different contexts. For instance if one tries to concatenate multiple strings containing carriage returns. The result may be hard to predict.
str=$'bar\rfoo'
echo "${str}"
echo "${str}${str}"
echo "${str}${str}${str}"
echo "${str}${str}${str}${str}"
echo "${str}${str}${str}${str}${str}"
The first echo outputs foo. Although you might expect the other echo commands to output foofoofoo..., they all output foobar.

The following function overwrite transforms its argument such that after each carriage return \r the beginning of the string is actually overwritten:
overwrite() {
local segment result=
while IFS= read -rd $'\r' segment; do
result="$segment${result:${#segment}}"
done < <(printf '%s\r' "$@")
printf %s "$result"
}
Example
$ overwrite $'abcdef\r0123\rxy'
xy23ef
Note that the printed string is actually xy23ef, unlike echo $'abcdef\r0123\rxy' which only seems to print the same string, but still prints \r which is then interpreted by your terminal such that the result looks the same. You can confirm this with hexdump:
$ echo $'abcdef\r0123\rxy' | hexdump -c
0000000 a b c d e f \r 0 1 2 3 \r x y \n
000000f
$ overwrite $'abcdef\r0123\rxy' | hexdump -c
0000000 x y 2 3 e f
0000006
The function overwrite also supports overwriting by arguments instead of \r-delimited segments:
$ overwrite abcdef 0123 xy
xy23ef
To convert a variable in place, use command substitution: myvar=$(overwrite "$myvar")
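Applied to the padding problem from the question, the function can be used roughly as follows. This is a minimal sketch: the function body is repeated only so the snippet is self-contained, and x_base2 is hardcoded to 1010 (the binary form of 10) instead of calling bc.

```shell
#!/usr/bin/env bash
# Same function as above, repeated so this sketch runs on its own.
overwrite() {
  local segment result=
  while IFS= read -rd $'\r' segment; do
    # Keep the new segment, then whatever of the old result sticks out past it.
    result="$segment${result:${#segment}}"
  done < <(printf '%s\r' "$@")
  printf %s "$result"
}

x_base2=1010                                  # assumed input: 10 in base 2
raw=$(printf '%012d\r%s' 0 "$x_base2")        # 000000000000\r1010
x_base2_padded=$(overwrite "$raw")            # 101000000000
echo "$x_base2_padded"

# Print the first i digits, interpreted in base 2, for i = 1..12.
for i in {1..12}; do
  echo "$((2#${x_base2_padded:0:i}))"
done
```

The loop prints 1, 2, 5, 10, 20, 40 and so on, which matches the example in the question.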

With awk, you'd set the field delimiter to \r and iterate through fields printing only the visible portions of them.
awk -F'\r' '{
offset = 1
for (i=NF; i>0; i--) {
if (offset <= length($i)) {
printf "%s", substr($i, offset)
offset = length($i) + 1
}
}
print ""
}'
This is too long to put inline in a command substitution, so you'd better wrap it in a function and pipe the lines to be resolved into it.
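Wrapped in a function, that could look like the sketch below. The function name resolve_cr is made up for illustration; the awk program is the one shown above.

```shell
#!/usr/bin/env bash
# resolve_cr (hypothetical name): reads lines on stdin and prints each line
# the way a terminal would display it after processing carriage returns.
resolve_cr() {
  awk -F'\r' '{
    offset = 1
    # Walk the segments from last to first; each one contributes only the
    # part that is not covered by a later (already printed) segment.
    for (i = NF; i > 0; i--) {
      if (offset <= length($i)) {
        printf "%s", substr($i, offset)
        offset = length($i) + 1
      }
    }
    print ""
  }'
}

printf 'abcdef\r0123\rxy\n' | resolve_cr          # xy23ef
padded=$(printf '%012d\r%s\n' 0 1010 | resolve_cr)
echo "$padded"                                    # 101000000000
```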

To answer the specific question, how to convert 000000000000\r1010 to 101000000000, refer to Socowi's answer.
However, I wouldn't introduce the carriage return in the first place and solve the problem like this:
#!/usr/bin/env bash
x=$1
# Start with 12 zeroes
var='000000000000'
# Convert input to binary
binary=$(bc <<< "obase = 2; $x")
# Rightpad with zeroes: ${#binary} is the number of characters in $binary,
# and ${var:x} removes the first x characters from $var
var=$binary${var:${#binary}}
# Print 12 substrings, convert to decimal: ${var:0:i} extracts the first
# i characters from $var, and $((x#$var)) interprets $var in base x
for ((i = 1; i <= ${#var}; ++i)); do
echo "$((2#${var:0:i}))"
done

Related

How do I get AWK to rearrange and manipulate text in a file to two output files depending on conditions?

I tried to find an efficient way to split and then recombine text from one file into two separate files. There is a lot going on: removing the decimal point, reversing the sign in the amount field (+ becomes - and - becomes +), and padding. For example:
INPUT file input.txt:
(this first line is there just to give character position more easily instead of counting, it's not present in the input file, the "|" is just there to illustrate position only)
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345
| | | | | | | ("|" shows position)
123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1
987654321YYQQQQQQQQQQ NNNNNN#-97.73 #-97.73 1 1
777777777XXGGGGGGGGGG NNNNNN#115.92 #115.92 0 0
888888888YYHHHHHHHHHH NNNNNN#3.24 #3.24 1 0
Any line that contains a "1" as the 85th character above goes to one file say OutputA.txt rearranged like this:
PPPPPPPPPP~~NNNNNN123456789XX~~~-0000140458-0000000000
QQQQQQQQQQ~~NNNNNN987654321YY~~~+0000009773+0000009773
As well as any line that contains a "0" as the 85th character above goes to another file OutputB.txt rearranged like this:
GGGGGGGGGG~~NNNNNN777777777XX~~~-0000011592-0000011592
HHHHHHHHHH~~NNNNNN888888888YY~~~-0000000324-0000000324
It seems so complicated, but if I could just grab each portion of the input lines as different variables and then write them out in a different order with right alignment for the amount padded with 0s and splitting them into different files depending on the last column. Not sure how I can put all these things together in one go.
I tried printing each line into a different file depending on whether the 85th character is a 1 or a 0, and then creating variables, say varA from the first character to the 11th, varB from the next 10, and so on. But it gets complex quickly, because I need to change + to - and - to +, pad with zeros and change the spacing. It gets a bit mad. This should be possible with one script, but I just can't put all the pieces together.
I've looked for tutorials but nothing seems to cover grabbing based on condition whilst at the same time padding, rearranging, splitting etc.
Many thanks in advance
split
Use GNU AWK's ability to print to a file. Consider the following simple example:
seq 20 | awk '$1%2==1{print $0 > "fileodd.txt"}$1%2==0{print $0 > "fileeven.txt"}'
which reads the output of seq 20 (numbers from 1 to 20, inclusive, each on a separate line) and puts odd numbers into fileodd.txt and even numbers into fileeven.txt
recombine text
Use substr and string concatenation for that task. Consider the following simple example: say you have file.txt with MM-DD-YYYY dates like so
01-29-2022
01-30-2022
01-31-2022
but you want YYYY-MM-DD then you could do that by
awk '{print substr($0,7,4) "-" substr($0,1,2) "-" substr($0,4,2)}' file.txt
which gives output
2022-01-29
2022-01-30
2022-01-31
substr arguments are: string ($0 is whole line), start position and length, space is concatenation operator.
removing the decimal point
Use gsub with second argument set to empty string to delete unwanted characters, but keep in mind . has special meaning in regular expression, consider following simple example, let file.txt content be
100.15
200.30
300.45
then
awk '{gsub(/[.]/,"");print}' file.txt
gives output
10015
20030
30045
Observe that /[.]/ not /./ is used and gsub does change in-place.
reversing the sign(...)padding
Multiply by -1, then use sprintf with suitable modifier, consider following example, let file.txt content be
1
-10
100
then
awk '{print "Reversed value is " sprintf("%+05d",-1*$1)}' file.txt
gives output
Reversed value is -0001
Reversed value is +0010
Reversed value is -0100
Explanation: % - this is the place where the value will be inserted, + - prefix with - or +, 05 - pad with leading zeros to a width of 5 characters, d - assume the value is an integer. sprintf returns a formatted string, which can be concatenated with other strings as shown above.
(tested in GNU Awk 5.0.1)
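Putting these pieces together for the sample data might look like the sketch below. It rests on several assumptions that are not in the question: the records are split on whitespace into five fields (as in the sample shown here) rather than by fixed character positions, the last field stands in for the 85th character, amounts always have exactly two decimals, and the helper name split_records and its two file arguments are made up for illustration. The sign is flipped on the string rather than by multiplying by -1, so that 0.00 still becomes -0000000000 as in the expected output.

```shell
#!/usr/bin/env bash
# split_records (hypothetical helper): reads records on stdin and writes the
# rearranged lines to FILE1 (last field is 1) or FILE0 (last field is 0).
# Usage: split_records FILE1 FILE0 < input.txt
split_records() {
  awk -v file1="$1" -v file0="$2" '
    # Turn e.g. "NNNNNN#-97.73" or "#0.00" into "+0000009773" or "-0000000000".
    function amount(s,    sign) {
      sub(/^.*#/, "", s)                # keep only what follows the "#"
      sign = (s ~ /^-/) ? "+" : "-"     # reversed sign
      gsub(/[-+.]/, "", s)              # drop sign and decimal point
      return sign sprintf("%010d", s)   # zero-pad to 10 digits
    }
    {
      name = $2
      sub(/#.*/, "", name)              # NNNNNN
      line = substr($1, 12, 10) "~~" name substr($1, 1, 11) "~~~" amount($2) amount($3)
      if ($5 == 1)      print line > file1
      else if ($5 == 0) print line > file0
    }'
}
```

A call such as split_records OutputA.txt OutputB.txt < input.txt would then produce the two files described in the question, provided the assumptions above hold for the real data.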
You can use jq for this task:
#!/bin/bash
INPUT='
123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1
987654321YYQQQQQQQQQQ NNNNNN#-97.73 #-97.73 1 1
777777777XXGGGGGGGGGG NNNNNN#115.92 #115.92 0 0
888888888YYHHHHHHHHHH NNNNNN#3.24 #3.24 1 0
'
convert() {
jq -rR --arg lineSelector "$1" '
def transformNumber($len):
tonumber | # convert string to number
(if . < 0 then "+" else "-" end) as $sign | # store inverted sign
if . < 0 then 0 - . else . end | # abs(number)
. * 100 | # number * 100
tostring | # convert number back to string
$sign + "0" * ($len - length) + .; # indent with leading zeros
# Main program
split(" ") | # split each line by space
map(select(length > 0)) | # remove empty entries
select(.[4] == $lineSelector) | # keep only lines with the selected value in last column
# generate output # example for first line
.[0][11:21] + # PPPPPPPPPP
"~~" + # ~~
(.[1] | split("#")[0]) + # NNNNNN
.[0][0:11] + # 123456789XX
"~~~" + # ~~~
(.[1] | split("#")[1] | transformNumber(10)) + # -0000140458
(.[2] | split("#")[1] | transformNumber(10)) # -0000000000
' <<< "$2"
}
convert 0 "$INPUT" # or convert 1 "$INPUT"
Output for 0
GGGGGGGGGG~~NNNNNN777777777XX~~~-0000011592-0000011592
HHHHHHHHHH~~NNNNNN888888888YY~~~-0000000324-0000000324
Output for 1
PPPPPPPPPP~~NNNNNN123456789XX~~~-0000140458-0000000000
QQQQQQQQQQ~~NNNNNN987654321YY~~~+0000009773+0000009773

remove end of line characters with a bash script?

I'm trying to write a script that removes the \r\n characters that Windows inserts, but only when they appear between double quotes ("). Why?
Because the dump file contains these characters; I don't know why.
And why between quotes? Because they only affect me when they chop a result in two.
For Example. "this","is","a","result","from","database"
the problem :
"this","is","a","result","from","da
tabase"
[EDIT]
Thanks to the answer of @Cyrus I got something like this, but it fails with bad flag in substitute command '}'. I'm on Mac OS X.
Can you help me?
Thanks
OS X uses a different sed than the one that's typically installed in Linux.
The big differences are that sequences like \r and \n don't get expanded or used as part of the expression as you might expect, and you tend to need to separate commands with semicolons a little more.
If you can get by with a sed one-liner that implements a rule like "Remove any \r\n on lines containing quotes", it will certainly simplify your task...
For my experiments, I used what I infer is your sample input data:
$ od -c input.txt
0000000 F o r E x a m p l e . " t h
0000020 i s " , " i s " , " a " , " r e
0000040 s u l t " , " f r o m " , " d a
0000060 t a \r \n b a s e " \n
0000072
First off, a shell-only solution might be to use smaller tools that are built in to the operating system. For example, here's a one-liner:
od -A n -t o1 -v input.txt | rs 0 1 | while read n; do [ $n -eq 015 ] && read n && continue; printf "\\$n"; done
Broken out for easier reading, here's what this looks like:
od -A n -t o1 -v input.txt | rs 0 1 - convert the file into a stream of octal numbers
| while read n; do - step through the numbers...
[ $n -eq 015 ] && - if the current number is 15 (i.e. octal for a Carriage Return)
read n - read a line (thus skipping it),
&& continue - and continue to the next octal number (thus skipping the newline after a CR)
printf "\\$n"; done - print the current octal number.
This kind of data conversion and stream logic works nicely in a pipeline, but is a bit harder to implement in sed, which only knows how to deal with the original input rather than its converted form.
Another bash option might be to use conditional expressions matching the original lines of input:
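while IFS= read -r line; do
  # Line contains a double quote and ends in a carriage return
  if [[ $line =~ .*\".*$'\r'$ ]]; then
    # Print it without the CR and without a trailing newline
    printf '%s' "${line%$'\r'}"
  else
    printf '%s\n' "$line"
  fi
done < input.txt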
This walks through text, and if it sees a CR, it prints everything up to and not including it, with no trailing newline. For all other lines, it just prints them as usual. The result is that lines that had a carriage return are joined, other lines are not.
From sed's perspective, we're dealing with two input lines, the first of which ends in a carriage return. The strategy for this would be to search for carriage returns, remove them and join the lines. I struggled for a while trying to come up with something that would do this, then gave up. Not to say it's impossible, but I suspect a generally useful script will be lengthy (by sed standards).
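The same rule as in the bash loop (a line that contains a double quote and ends in a carriage return is joined with the next line) can also be sketched with awk, which is available on OS X as well. The function name join_quoted_cr is made up for illustration, and like the loop it only looks at whether the line contains a quote, not whether the CR really sits inside a quoted field.

```shell
#!/usr/bin/env bash
# join_quoted_cr (hypothetical name): reads stdin or the files given as
# arguments and joins lines that contain a quote and end in CR with the
# following line.
join_quoted_cr() {
  # sub() returns 1 when it removed a trailing CR, so the first rule fires
  # only for quoted lines that actually ended in CR.
  awk '/"/ && sub(/\r$/, "") { printf "%s", $0; next } { print }' "$@"
}

printf '"result","from","da\r\ntabase"\n' | join_quoted_cr
# "result","from","database"
```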

edit ASCII value of a character in bash

I am trying to update the ASCII value of each character of a string array in bash on which I want to add 2 to the existing character ASCII value.
Example:
declare -a x=("j" "a" "f" "a" "r")
I want to update the ASCII value incrementing the existing by 2 , such "j" will become "l"
I can't find anything dealing with the ASCII value beyond
printf '%d' "'$char"
Can anyone help me please?
And also when I try to copy an array into another it doesn't work
note that I am using
declare -a temp=("${x[@]}")
What is wrong with it?
You can turn an integer into a char by first using printf to turn it into an octal escape sequence (like \123) and then using that as a printf format string to produce the character:
#!/bin/bash
char="j"
printf -v num %d "'$char"
(( num += 2 ))
printf -v newchar \\$(printf '%03o' "$num")
echo "$newchar"
This only works for ASCII.
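Applied to the whole array from the question, this could be sketched as follows. The helper name shift_char is made up for illustration; the copy at the end uses the "${x[@]}" form, which is the usual way to copy an array in bash.

```shell
#!/usr/bin/env bash
# shift_char (hypothetical helper): shift_char CHAR OFFSET
# Prints the character whose ASCII code is OFFSET higher than that of CHAR.
shift_char() {
  local num newchar
  printf -v num %d "'$1"                          # character -> decimal code
  (( num += $2 ))
  printf -v newchar "\\$(printf '%03o' "$num")"   # decimal -> octal escape -> character
  printf '%s' "$newchar"
}

x=("j" "a" "f" "a" "r")
y=()
for c in "${x[@]}"; do
  y+=("$(shift_char "$c" 2)")
done
echo "${y[@]}"        # l c h c t

temp=("${x[@]}")      # copy of the original array
echo "${temp[@]}"     # j a f a r
```

Unlike the tr approach, this does not wrap around at z, so y and z map to { and |.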
It seems tr can help you here:
y=($(echo ${x[@]} | tr a-z c-zab))
tr maps characters from one set to another. In this example, from the set of a b c ... z, it maps to c d e ... z a b. So, you're effectively "rotating" the characters. This principle is used by the ROT13 cipher.

Shell program - determine average word length in a file

I am trying to write a shell program to determine the average word length in a file. I'm assuming I need to use wc and expr somehow. Guidance in the right direction would be great!
Assuming your file is ASCII and wc can indeed read it...
chars=$(cat inputfile | wc -c)
words=$(cat inputfile | wc -w)
Then a simple
avg_word_size=$(( ${chars} / ${words} ))
will calculate a (truncated) integer. But it will be "more wrong" than just the truncation error: you'll have included all whitespace characters in your average word size as well. And I assume you want to be more precise...
The following will give you some increased precision by calculating the integer from a number that is multiplied by 100:
_100x_avg_word_size=$(( $((${chars} * 100)) / ${words} ))
Now we can use that for telling the world:
echo "Average word size is: ${avg_word_size}.${_100x_avg_word_size: -2:2}"
To further refine, we could assume that only 1 whitespace character is separating words:
chars=$(cat inputfile | wc -c)
words=$(cat inputfile | wc -w)
# Subtract one separator per gap between words
letters=$(( chars - (words - 1) ))
avg_word_size=$(( letters / words ))
# Use the same corrected count for the decimals
_100x_avg_word_size=$(( letters * 100 / words ))
echo "Average word size is: ${avg_word_size}.${_100x_avg_word_size: -2:2}"
Now it's your job to try and include the concept of 'lines' into your computations... :-)
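One way to sidestep the whitespace and line accounting altogether is to let awk sum the lengths of the whitespace-separated fields directly. This is a minimal sketch, not a replacement for the arithmetic above; the function name avg_word_length is made up for illustration, and "word" here means the same thing as for wc -w.

```shell
#!/usr/bin/env bash
# avg_word_length (hypothetical name): reads stdin or the files given as
# arguments and prints the average length of whitespace-separated words.
avg_word_length() {
  awk '{ for (i = 1; i <= NF; i++) chars += length($i); words += NF }
       END { if (words > 0) printf "%.2f\n", chars / words }' "$@"
}

printf 'the quick brown fox\njumps\n' | avg_word_length   # 4.20
```

Because only field contents are counted, spaces, tabs and newlines never enter the total, and awk's floating point division avoids the multiply-by-100 trick.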
Update: to show clearly (hopefully) the difference between wc and this method; fixed a "too-many-newlines" bug; also added finer control of apostrophes in word endings.
If you want to consider a word as being a bash word, then using wc alone is fine.
However, if you want to consider a word as a word in a spoken/written language, then you can't use wc for the word parsing.
E.g. wc considers the following to contain 1 word (of size average = 112.00),
whereas the script below shows it to contain 19 words (of size average = 4.58)
"/home/axiom/zap_notes/apps/eng-hin-devnag-itrans/Platt's_Urdu_and_classical_Hindi_to_English_-_preface5.doc't"
Using Kurt's script, the following line is shown to contain 7 words (of size average = 8.14),
whereas the script presented below shows it to contain 7 words (of size average = 4.43) ...बे = 2 chars
"बे = {Platts} ... —be-ḵẖẉabī, s.f. Sleeplessness:"
So, if wc is your flavour, good, and if not, something like this may suit:
# Cater for special situation words: eg 's and 't
# Convert each group of anything which isn't a "character" (including '_') into a newline.
# Then, convert each CHARACTER which isn't a newline into a BYTE (not character!).
# This leaves one 'word' per line, each 'word' being made up of the same BYTE ('x').
#
# Without any options, wc prints newline, word, and byte counts (in that order),
# so we can capture all 3 values in a bash array
#
# Use `awk` as a floating point calculator (bash can only do integer arithmetic)
count=($(sed "s/\>'s\([[:punct:]]\|$\)/\1/g # ignore apostrophe-s ('s) word endings
s/'t\>/xt/g # consider words ending in apostrophe-t ('t) as base word + 2 characters
s/[_[:digit:][:blank:][:punct:][:cntrl:]]\+/\n/g
s/^\n*//; s/\n*$//; s/[^\n]/x/g" "$file" | wc))
echo "chars / word average:" \
$(awk -vnl=${count[0]} -vch=${count[2]} 'BEGIN{ printf( "%.2f\n", (ch-nl)/nl ) }')

Linux command or Bash syntax that calculate the next ASCII character

I have a Linux machine (Red Hat Linux 5.1), and I need to add the following task to my Bash script.
Which Linux command or Bash syntax will calculate the next ASCII character?
Remark – the command syntax can be also AWK/Perl, but this syntax must be in my Bash script.
Example:
input results
a --> the next is b
c --> the next is d
A --> the next is B
Use translate (tr):
echo "aA mM yY" | tr "a-yA-Y" "b-zB-Z"
It prints:
bB nN zZ
Perl's ++ operator also handles strings, to an extent:
perl -nle 'print ++$_'
The -l option with autochomp is necessary here, since a\n for example will otherwise return 1.
You could use chr() and ord() functions for Bash (see How do I convert an ASCII character to its decimal (or hexadecimal) value and back?):
# POSIX
# chr() - converts decimal value to its ASCII character representation
# ord() - converts ASCII character to its decimal value
perl -le "print chr(ord(<>) + 1)"
Interactive:
breqwas@buttonbox:~$ perl -le "print chr(ord(<>) + 1)"
M
N
Non-interactive:
breqwas@buttonbox:~$ echo a | perl -le "print chr(ord(<>) + 1)"
b
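The chr() and ord() helpers mentioned above are not shown in this answer. A minimal pure-bash sketch of what they might look like, assuming single-byte ASCII characters, is given below; next_char is a made-up wrapper for illustration.

```shell
#!/usr/bin/env bash
# chr: decimal value -> ASCII character (via an octal escape used as format)
chr() { printf "\\$(printf '%03o' "$1")"; }

# ord: ASCII character -> decimal value
ord() { printf '%d' "'$1"; }

# next_char (hypothetical wrapper): prints the character following its argument
next_char() { chr "$(( $(ord "$1") + 1 ))"; }

for c in a c A; do
  echo "$c --> the next is $(next_char "$c")"
done
```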
The character value:
c="a"
To convert the character to its ASCII value:
v=$(printf %d "'$c")
The value you want to add to this ASCII value:
add=1
To change its ASCII value by adding $add to it:
((v+=add))
To convert the result to char:
perl -X -e 'printf("The character is %c\n", shift)' "$v"
I used -X to disable all warnings
You can combine all of these in one line and put the result in the variable $r:
c="a"; add=1; r=$(perl -X -e "printf('%c', $(($add+$(printf %d "'$c"))));")
you can print the result:
echo "$r"
You can make a function to return the result:
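achar ()
{
# $1 = character, $2 = offset to add to its ASCII value
local c="$1" add=$2
# Print the result as data, not as a printf format string
printf '%s' "$(perl -X -e "printf('%c', $(($add+$(printf %d "'$c"))));")"
}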
you can use the function:
x=$(achar "a" 1) # x = the character that follows a by 1
or you can make a loop:
array=( a k m o )
for l in "${array[@]}"
do
echo "$l" is followed by $(achar "$l" 1)
done

Resources