bash bc command converts uppercase letters to 9 - gnu

I pipe different values to bc.
If the value is a number, it works fine. If it's a string of lowercase letters, it returns 0, which makes sense to me. But if it's a string of uppercase letters, bc turns every letter into 9, so the output has as many 9s as the input has characters:
echo 1 | bc
1
echo aaa | bc
0
echo AAA | bc
999
echo FO | bc
99
echo null | bc
0
echo NULL | bc
9999
Why does bc have this behavior? What's the best way to work with unexpected string values?

According to https://www.gnu.org/software/bc/manual/html_mono/bc.html
(emphasis by me):
A simple expression is just a constant. bc converts constants into internal decimal numbers using the current input base, specified by the variable ibase. (There is an exception in functions.) The legal values of ibase are 2 through 16. Assigning a value outside this range to ibase will result in a value of 2 or 16. Input numbers may contain the characters 0-9 and A-F. (Note: They must be capitals. Lower case letters are variable names.) Single digit numbers always have the value of the digit regardless of the value of ibase. (i.e. A = 10.) For multi-digit numbers, bc changes all input digits greater or equal to ibase to the value of ibase-1. This makes the number FFF always be the largest 3 digit number of the input base.
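So, assuming that your ibase is 10, your observation is explained: every input digit greater than or equal to 10 is replaced by ibase-1 = 9, so AAA is read as 999. Lowercase strings such as aaa and null are variable names, and a variable that was never assigned has the value 0.
It is unrelated to "unexpected string values" or "the length of the input characters". bc considers them (somewhat odd) attempts to provide numeric values, and converts and uses them according to the quoted rule.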
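For the second part of the question (how to deal with unexpected string values), one option is to validate the input before it reaches bc. The following is only a minimal sketch: is_decimal and safe_bc are hypothetical helper names, and it assumes that only non-negative decimal numbers are valid input.

```shell
# is_decimal: hypothetical helper; succeeds only for non-negative decimal
# numbers such as 1, 3.5 or .5 (assumption: nothing else is valid input).
is_decimal() {
  case $1 in
    '' | . | *[!0-9.]* | *.*.*) return 1 ;;  # empty, lone dot, other characters, two dots
    *) return 0 ;;
  esac
}

# safe_bc: hypothetical wrapper; hands validated input to bc, rejects the rest.
safe_bc() {
  if is_decimal "$1"; then
    echo "$1" | bc
  else
    echo "not a number: $1" >&2
    return 1
  fi
}

for v in 1 AAA null; do
  safe_bc "$v" || echo "skipped: $v"
done
```

With this check, AAA and null are rejected instead of being silently read as 999 and 0.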


Looking for a Regex which can find all the number combinations without having 3 zeros in between and mixed with delimiters

I would like to find all the number combinations without having 3 zeros in between.
There might be some delimiters (max 2 characters) in between the numbers.
I'm using python and I would like to perform this search with the regex.
Accepted numbers
This is number 1234 which should be accepted.
12-45
1 2 0 0 3 4 5
not accepted numbers:
1
12
123
1000
1000-2000
30000-31000
21 000-32 000-50 000
21 00 03 00 00
The regex I came up with is:
([\s\-]{0,2}\d(?!000)){4,}
My regex can find all the accepted numbers, but it doesn't filter out all the numbers that should not be accepted.
See the results in regex
Actually this regex is used in python to remove the matched numbers from the text:
See python code
P.S. Delimiters are not only spaces; they should include at least \s and the dash.
P.P.S. The numbers might be in the middle of the string, so I think I cannot use ^ and $ in my regex.
You could assert that there are not 3 zeros in a row while matching optional delimiters in between.
\b(?![\d\s-]*?0(?:[\s-]*0){2})\d(?:[\s-]*\d){3,}\b
Explanation
\b A word boundary
(?! Negative lookahead, assert what is at the right is not
[\d\s-]*? Match any of a digit, whitespace char or - as few times as possible
0(?:[\s-]*0){2} Match a zero followed by 2 more zeros, with optional delimiters in between
) Close the lookahead
\d Match a digit
(?:[\s-]*\d){3,} Repeat 3 or more times matching a digit with optional delimiters in between
\b A word boundary
Regex demo
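The pattern can also be checked from a shell against the sample lines in the question. This is a minimal sketch that assumes GNU grep built with PCRE support (the -P option); the variable names are made up for the example.

```shell
# Assumption: GNU grep with PCRE support (-P) is available.
pattern='\b(?![\d\s-]*?0(?:[\s-]*0){2})\d(?:[\s-]*\d){3,}\b'

# Sample lines taken from the question.
samples='This is number 1234 which should be accepted.
12-45
1 2 0 0 3 4 5
123
1000-2000
21 000-32 000-50 000
21 00 03 00 00'

# -o prints only the matched text, one match per line.
matches=$(printf '%s\n' "$samples" | grep -oP -- "$pattern")
printf '%s\n' "$matches"
```

Only the first three sample lines produce a match: 1234, 12-45 and 1 2 0 0 3 4 5.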

What does the '(( 10#$H > 5 ))' mean in bash script?

I am confused about the following code snippet:
#!/bin/bash
H=$(date +%H);
if (( 10#$H > 5 ))
then
# do something
else
# do something else
fi
What does the (( 10#$H > 5 )) mean in above code snippet?
The 10#$H means to expand the number using base 10.
This is probably done to cope with leading zeros in the output of date: without the prefix, bash interprets a number with a leading zero as base 8 (octal), so 08 and 09 are invalid.
Example:
$ echo "$(( 08 < 5 ))"
bash: 08: value too great for base (error token is "08")
ARITHMETIC EVALUATION: Constants with a leading 0 are interpreted as octal numbers. A leading 0x or 0X denotes hexadecimal. Otherwise, numbers take the form [base#]n, where base is a decimal number between 2 and 64 representing the arithmetic base and n is a number in that base. If base# is omitted, then base 10 is used. The digits greater than 9 are represented by the lowercase letters, the uppercase letters, #, and _, in that order. If base is less than or equal to 36, lowercase and uppercase letters may be used interchangeably to represent numbers between 10 and 35.
source: man bash
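A small sketch of the same idea, with a fixed value standing in for the output of date so the result is repeatable (the variable result is only there for illustration):

```shell
#!/bin/bash
H=08   # fixed stand-in for $(date +%H) at 8 a.m. (assumption for a repeatable demo)

# Without the prefix, $(( H > 5 )) would fail: 08 is not a valid octal number.
if (( 10#$H > 5 )); then
  result="after 5"
else
  result="5 or earlier"
fi
echo "$result"
echo "$(( 10#$H + 1 ))"
```

The 10# prefix forces base 10, so 08 is read as the decimal number 8.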

Bash, how can I apply one arithmetic expression to every line?

I have two scripts, A and B.
I want to execute them and read two values, V and VALS, respectively.
V is just a floating point number, let's say 0.5
VALS has the following format:
1 10
2 20
3 60
4 45
and so on.
What I'm trying to do is to get a new variable where the second column of VALS (10, 20, ...) is divided by V.
As I understand this can be implemented with a mix of xargs and cut but I'm not really familiar with these tools.
#!/bin/bash
V=`./A`
VALS=`./B`
RESULT=#magic happens
The final result with the previous data should be:
1 20
2 40
3 120
4 90
Bash's builtin arithmetic expansion only works for integers. You can use awk for data extraction and floating point numbers.
V=`./A`
# No VALS needed
RESULT=($(./B | awk "{print \$2 / $V}"))
Note the escaped dollar sign in \$2.
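A variant that passes V to awk with -v, so no dollar sign needs escaping, and that also prints the first column as in the expected output from the question. This is a minimal sketch: a fixed value and a small function named vals stand in for the real ./A and ./B.

```shell
V=0.5                                                   # stand-in for V=$(./A)
vals() { printf '%s\n' '1 10' '2 20' '3 60' '4 45'; }   # stand-in for ./B

# -v v="$V" makes the shell value available inside awk as the variable v.
RESULT=$(vals | awk -v v="$V" '{ print $1, $2 / v }')
printf '%s\n' "$RESULT"
```

Here RESULT is a plain multi-line string rather than an array, which keeps the two columns together on each line.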

Lua pattern to match similar consecutive digits

I have a string that contains only digits from 2 to 9, like '223488875662264442', it is guaranteed that it does not contain more than 3 adjacent similar digits, for example, it cannot contain '7777', but it can contain '27747772'.
I want to make a pattern that matches all the similar consecutive numbers, example:
> str = '44788895532244474568884511123331566';
> for n in string.gmatch(str,pat) do -- pat is the pattern
>> print(n);
> end
44
7
888
9
55
...
I tried patterns like '(%d)%1*' with no success.
I cannot use regex; I need to do it with Lua patterns.

Can brace expansions and grep be used together in linux

I am trying to count how many times each ASCII printable character is present in a file. I thought a good way to do this might be to list the printable characters in a { } enclosed list, and use grep on each item within the braces. Example code is below. I would like to expand the char list to include all 64 ASCII printable characters. I cannot figure out how to get the code to read and use each character between the braces separately. I would really like to output a file in the format "character\tcharacterCount". Any suggestions?
char={" ",!,\",#,"\$"}
cat PHRED_scores.txt | grep -e "$char" | wc -m
The command below displays the special characters present in the file and the count of each.
grep -oP '[ !\\$#]' file | sort | uniq -c
Explanation:
-o - print only the matched part.
-P - enable Perl-compatible regular expressions.
[ !\\$#] - the special characters are listed in a character class. You have to escape \ so that it means a literal \.
sort - sorts the output so that identical characters end up next to each other.
uniq -c - collapses each run of duplicates into one line, prefixed with its count.
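To process each character separately, as the question asks, a quoted word list in a loop is more reliable than assigning a brace list to a variable. The following is a minimal sketch: count_each is a hypothetical helper name, and a temporary file stands in for PHRED_scores.txt.

```shell
# count_each: hypothetical helper; prints "character<TAB>count" for every
# character listed after the file name.
count_each() {
  file=$1
  shift
  for c in "$@"; do
    # -o: one match per line; -F: treat the character literally, not as a regex
    n=$(grep -oF -- "$c" "$file" | wc -l)
    printf '%s\t%d\n' "$c" "$n"
  done
}

# Temporary sample file standing in for PHRED_scores.txt.
tmp=$(mktemp)
printf '%s\n' 'a!b#' '!! $' > "$tmp"

# Each character is quoted so it stays a single word.
count_each "$tmp" ' ' '!' '"' '#' '$'
rm -f "$tmp"
```

This runs grep once per character, so it is slower than the sort | uniq -c pipeline on large files, but it also reports characters whose count is zero.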
There is a way to avoid listing all 64 characters individually to match the ASCII character set. Regular expressions, as used by grep, provide character classes and allow ranges to represent numerous characters without listing each individual character. Some examples are:
[a-z] match all lowercase characters
[A-Z] match all uppercase characters
[0-9] match all digits
[[:print:]] all printable characters
So with very little effort, you can match all upper and lowercase characters and all digits with:
[a-zA-Z0-9]
You can then add the additional printable characters, but you must take care to escape or avoid those with special meaning to regular expressions themselves. Inside a bracket expression, put a literal - last so that it is not read as a range. An example (not intended to be all-inclusive) is:
[a-zA-Z0-9:;~!#$%&*()_+=-]
or you can use the predefined class:
[[:print:]]
You can add as required. To solve your problem, as Avinash showed, sort | uniq -c can provide the individual count. Adding an additional call to wc -m will provide the total. With that, it is not difficult to develop a script that will take the filename as an argument and give the total and individual character counts you require. Something similar to the following will work:
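#!/bin/bash
# Usage: pass the file to analyze as the first argument
cclass='[[:print:]]' # character class to count
echo -n "Total character count: "
grep -o "$cclass" "$1" | tr -d '\n' | wc -m # obtain the total character count
echo "Individual frequency:"
grep -o "$cclass" "$1" | sort | uniq -c # obtain the individual frequency
exit 0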
Sample output:
Total character count: 455
Individual frequency:
6 =
10 _
7 -
4 ,
12 ;
1 /
4 .
6 "
9 (
9 )
2 {
2 }
2 *
5 \
2 #
4 %
4 0
3 a
17 b
11 c
1 C
24 d
4 D
28 e
1 E
...
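To get the "character\tcharacterCount" format from the question, the uniq -c output can be rearranged with awk. This is a minimal sketch: count_chars is a hypothetical helper name, and LC_ALL=C is an assumption made so that only ASCII printable characters match and the sort order is fixed.

```shell
# count_chars: hypothetical helper; prints "character<TAB>count" for every
# printable ASCII character that occurs in the file given as $1.
count_chars() {
  LC_ALL=C grep -o '[[:print:]]' "$1" | LC_ALL=C sort | uniq -c |
    awk '{
      c = $1                    # the count that uniq -c put in front
      sub(/^ *[0-9]+ /, "")     # strip padding, count and one separator space
      printf "%s\t%d\n", $0, c  # what remains is the character (may be a space)
    }'
}

# Small sample file; replace with the real input file.
tmp=$(mktemp)
printf 'aab b\n' > "$tmp"
count_chars "$tmp"
rm -f "$tmp"
```

Redirecting the output of count_chars to a file gives the tab-separated table directly.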
