Verilog modulus operator for wrapping around a range - verilog

My background is in software and I'm new to (System)Verilog, so when tasked with implementing a Caesar shifter (shift each letter in a string by N letters, wrapping around if necessary, e.g. ABCXYZ shifted by 3 becomes DEFABC), I wrote the following, hoping to reduce code duplication like I would in software:
/* every variable except 'direction' has the type 'byte' */
always_comb
begin
shifted_char = fresh_char; /* don't touch bytes that aren't letters */
is_lower_case = "z" >= fresh_char && fresh_char >= "a";
is_upper_case = "Z" >= fresh_char && fresh_char >= "A";
if (is_lower_case || is_upper_case)
begin
unique if (is_lower_case)
alphabet_start = "a";
else if (is_upper_case)
alphabet_start = "A";
alphabet_position = fresh_char - alphabet_start;
if (direction == "f") /* direction is a module parameter: f for forwards results in a shifter, any other value results in an 'unshifter' */
new_alphabet_position = (26 + (alphabet_position + shift_by)) % 26;
else
new_alphabet_position = (26 + (alphabet_position - shift_by)) % 26;
shifted_char = new_alphabet_position + alphabet_start;
end
end
My question is (assuming it's a forward shifter): regarding the "% 26" part, can I expect the synthesizer to deduce that the range of possible values it's going to get at that point is [26, 26+25+25] ([26, 76]), so that there are only 2 cases the logic needs to distinguish between (>= 26 and >= 52)? Or will it build whatever is the smart choice when it has to handle all 256 possible inputs (would that be to consider the cases >= 26, >= 52, >= 78, etc.? Or is there a better way? I digress...)?
I could always do the following:
new_alphabet_position = alphabet_position + shift_by;
if (new_alphabet_position > 25)
new_alphabet_position -= 26;
/* Or, for the reverse shifter: */
new_alphabet_position = alphabet_position - shift_by;
if (new_alphabet_position < 0)
new_alphabet_position += 26;
...but was curious and wanted to ask that, as well as a related one (that I expect more people will be able to answer): Can it be used to make a normal non-power-of-2 counter (e.g.
count <= (count + 1) % 6;
)? Going by hgleamon1's response to the following thread, it seems as though (at least one) VHDL synth tool might interpret it as intended: https://forums.xilinx.com/t5/Synthesis/Modulus-synthesizable-or-non-synthesizable/td-p/747493
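As a sanity check on the arithmetic only (it says nothing about what a synthesizer will infer), the compare-and-adjust alternative can be compared exhaustively against the modulus. This is a Python model rather than HDL; the function names are hypothetical, and it assumes shift_by stays in the range 0 to 25.

```python
ALPHABET = 26

def wrap_forward(position, shift):
    """Add, then subtract 26 once if the sum left the range 0..25."""
    total = position + shift
    if total > 25:
        total -= ALPHABET
    return total

def wrap_backward(position, shift):
    """Subtract, then add 26 once if the difference went negative."""
    total = position - shift
    if total < 0:
        total += ALPHABET
    return total

def next_count(count, modulus=6):
    """Compare-and-reset counter step, equivalent to (count + 1) % modulus."""
    return 0 if count == modulus - 1 else count + 1

def check_all():
    """Compare every case against the plain modulus formulation."""
    for position in range(ALPHABET):
        for shift in range(ALPHABET):
            assert wrap_forward(position, shift) == (position + shift) % ALPHABET
            assert wrap_backward(position, shift) == (position - shift) % ALPHABET
    for count in range(6):
        assert next_count(count) == (count + 1) % 6
    return True
```

With both operands limited to 0..25, a single compare and one add or subtract covers every case, which is why the explicit version needs no general-purpose modulus.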

Unless there is a specialized macro cell, a non-power-of-2 modulus will take a large number of gates and have relatively long propagation delays, especially if done as pure combinational logic.
Be aware that, depending on your synthesizer, the variables 'alphabet_start', 'alphabet_position', and 'new_alphabet_position' may be inferred as latches. You use them as intermediate logic, so if you don't reference them outside this always block and your synthesizer has decent optimization, they will not become latches. To guarantee they will not be latches, they must be given default values outside the if statement.
You state that all variables except 'direction' are of type 'byte'. This means 'shift_by' may have a value greater than 25 or less than -25 ('byte' is signed by default). Because you use signed values and add three values (26 + (alphabet_position + shift_by)) before applying the modulus, there is a decent chance that the mod 26 will be evaluated on a 10-bit signed value. That will use more logic than a modulus on an 8-bit value. There is a chance your synthesizer will do some optimization, but it might not be great.
If you can guarantee 'shift_by' is less than 26 and greater than -26 (greater than or equal to 0 if unsigned), then you don't need 'alphabet_position' or 'new_alphabet_position'. Simply add or subtract 'shift_by' and check whether the result is out of range. For the range check, first check if 8'(shifted_char-26) >= alphabet_start. The reason for this is to make sure we are comparing positive numbers: "z"+25 is 147, which is negative as a signed 8-bit value. The 8'() will cast it as an 8-bit unsigned value to trim any non-zero intermediate 9th+ bit(s). If an adjustment is not needed, then check if shifted_char < alphabet_start, as the possibility of overflowing to a negative number has already been handled.
If you cannot guarantee 'shift_by' is within range, then you have no choice but to mod it. Luckily this is an 8-bit signed value, which is better than your original worst case of a 10-bit signed value. This is not ideal, but it is the best I can offer. It is more optimal to have the driver of 'shift_by' assign a legal value than to add more logic to mod it.
Since you are using SystemVerilog, you may want to consider using fresh_char inside { ["A":"Z"] }, which is functionally the same as "Z" >= fresh_char && fresh_char >= "A". The inside keyword is intended to be synthesizable, but I don't know how commonly it is supported.
Consider the following code. It may not be the most optimized, but it is more optimized than your original code:
always_comb
begin
shift_by_mod26 = shift_by % 26; // %26 is not needed if abs(shift_by) < 26 is guaranteed
alphabet_start = (fresh_char inside { ["A":"Z"] }) ? "A" : "a";
if ( fresh_char inside { ["A":"Z"], ["a":"z"] } )
begin
if (direction == "f")
shifted_char = fresh_char + shift_by_mod26;
else
shifted_char = fresh_char - shift_by_mod26;
// subtract 26 first in case shifted_char is >127
// bring back to a positive if signed (>127 unsigned is negative signed)
if (8'(shifted_char-26) >= alphabet_start)
shifted_char -= 26;
else if (shifted_char < alphabet_start)
shifted_char += 26;
end
else
begin
/* don't touch bytes that aren't letters */
shifted_char = fresh_char;
end
end
Note: if 'direction' is not of type 'byte', then it must be at least 7 bits wide (unsigned), or wider (sign agnostic), to ever match "f".
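The range check in the code above can be exercised with a small software model. The following is a Python sketch, not SystemVerilog: to_s8 and truncated_rem are hypothetical helpers that imitate signed 8-bit wrap-around and a remainder whose sign follows the dividend, and the comparison is modelled on signed 8-bit values. It only checks the arithmetic against a plain modulus-based reference.

```python
def to_s8(value):
    """Wrap an integer to a signed 8-bit value, like assigning to a 'byte'."""
    return ((value + 128) & 0xFF) - 128

def truncated_rem(value, modulus):
    """Remainder whose sign follows the dividend (Python's % follows the divisor)."""
    remainder = abs(value) % modulus
    return -remainder if value < 0 else remainder

def shift_char(fresh_char, shift_by, forward=True):
    """Model of the always_comb block: shift letters, leave other bytes alone."""
    if "A" <= fresh_char <= "Z":
        alphabet_start = ord("A")
    elif "a" <= fresh_char <= "z":
        alphabet_start = ord("a")
    else:
        return fresh_char  # don't touch bytes that aren't letters
    step = truncated_rem(shift_by, 26)
    code = ord(fresh_char)
    shifted = to_s8(code + step if forward else code - step)
    # Subtract 26 first so a value above 127 is brought back into range.
    if to_s8(shifted - 26) >= alphabet_start:
        shifted = to_s8(shifted - 26)
    elif shifted < alphabet_start:
        shifted = to_s8(shifted + 26)
    return chr(shifted)

def reference_shift(fresh_char, shift_by, forward=True):
    """Straightforward modulus-based Caesar shift used as the reference."""
    if not ("A" <= fresh_char <= "Z" or "a" <= fresh_char <= "z"):
        return fresh_char
    start = ord("A") if fresh_char <= "Z" else ord("a")
    delta = shift_by if forward else -shift_by
    return chr(start + (ord(fresh_char) - start + delta) % 26)
```

For example, "".join(shift_char(c, 3) for c in "ABCXYZ") gives "DEFABC", matching the example in the question.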
Cross post answer for a cross post question

Related

Counter for two binary strings C++

I am trying to count two binary numbers given as strings. The maximum number of digits has to be 253. Short numbers work, but when I enter longer numbers, the output is wrong. An example that gives a bad result is "10100101010000111111" with "000011010110000101100010010011101010001101011100000000111000000000001000100101101111101000111001000101011010010111000110".
#include <iostream>
#include <stdlib.h>
using namespace std;
bool isBinary(string b1,string b2);
int main()
{
string b1,b2;
long binary1,binary2;
int i = 0, remainder = 0, sum[254];
cout<<"Get two binary numbers:"<<endl;
cin>>b1>>b2;
binary1=atol(b1.c_str());
binary2=atol(b2.c_str());
if(isBinary(b1,b2)==true){
while (binary1 != 0 || binary2 != 0){
sum[i++] =(binary1 % 10 + binary2 % 10 + remainder) % 2;
remainder =(binary1 % 10 + binary2 % 10 + remainder) / 2;
binary1 = binary1 / 10;
binary2 = binary2 / 10;
}
if (remainder != 0){
sum[i++] = remainder;
}
--i;
cout<<"Result: ";
while (i >= 0){
cout<<sum[i--];
}
cout<<endl;
}else cout<<"Wrong input"<<endl;
return 0;
}
bool isBinary(string b1,string b2){
bool rozhodnuti1,rozhodnuti2;
for (int i = 0; i < b1.length();i++) {
if (b1[i]!='0' && b1[i]!='1') {
rozhodnuti1=false;
break;
}else rozhodnuti1=true;
}
for (int k = 0; k < b2.length();k++) {
if (b2[k]!='0' && b2[k]!='1') {
rozhodnuti2=false;
break;
}else rozhodnuti2=true;
}
if(rozhodnuti1==false || rozhodnuti2==false){ return false;}
else{ return true;}
}
One of the problems might be here: sum[i++]
This expression, as it is, first returns the value of i and then increases it by one.
Did you do it on purpose?
Change it to ++i.
It'd help if you could also post the "bad" output, so that we can try to move backward through the code starting from it.
EDIT 2015-11-7_17:10
Just to be sure everything was correct, I've added a cout to check what binary1 and binary2 contain after you assign them the result of the atol function: they contain the integer numbers 547284487 and 18333230, which obviously don't represent the correct binary-to-integer conversion of the two 01 strings you presented in your post.
Probably they somehow exceed the capacity of atol.
Also, your "math" operations lead to an even stranger result, 6011111101, which obviously doesn't make any sense.
What do you mean, exactly, when you say you want to count these two numbers? Maybe you want to make a sum? I guess that's it.
But then, again, what you got there is two signed integer numbers and not two binaries, which means those %10 and %2 operations are (probably) misused.
EDIT 2015-11-07_17:20
I've tried to use your program with small binary strings and it actually works; with small binary strings.
It's a fact(?), at this point, that atol can't handle numerical strings that long.
My suggestion: use char arrays instead of strings and replace the '0' and '1' characters with numerical values (if (bin1[i]=='1'){bin1[i]=1;}else{bin1[i]=0;}), with which you'll be able to perform all the math operations you want (you've already written a working sum function, after all).
Once done with the math, you can just convert the char array back to actual characters for 0 and 1 and cout it on the screen.
EDIT 2015-11-07_17:30
Tested atol on my own: it correctly converts only strings that are up to 10 characters long.
Anything beyond the 10th character makes the function go crazy.
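The suggestion above, working on the digits directly instead of converting the whole string to an integer, can be sketched as follows. This is a Python illustration rather than C++, and add_binary is a hypothetical name; it assumes the goal is the sum of the two numbers, as guessed earlier in this answer.

```python
def add_binary(a, b):
    """Add two binary numbers given as strings, one digit at a time."""
    if not a or not b or set(a + b) - {"0", "1"}:
        raise ValueError("inputs must be non-empty strings of 0s and 1s")
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    # Walk both strings from the least significant digit (the right end).
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += ord(a[i]) - ord("0")  # character -> numeric 0 or 1
            i -= 1
        if j >= 0:
            total += ord(b[j]) - ord("0")
            j -= 1
        result.append(str(total % 2))  # digit for this column
        carry = total // 2             # carry into the next column
    return "".join(reversed(result))
```

Because no digit string is ever converted to a fixed-width integer, the length of the inputs is limited only by memory, so 253 digits is not a problem. Leading zeros in the inputs are preserved in the result.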

Asymmetric Levenshtein distance

Given two bit strings, x and y, with x longer than y, I'd like to compute a kind of asymmetric variant of the Levenshtein distance between them. Starting with x, I'd like to know the minimum number of deletions and substitutions it takes to turn x into y.
Can I just use the usual Levenshtein distance for this, or do I need to modify the algorithm somehow? In other words, with the usual set of edits of deletion, substitution, and addition, is it ever beneficial to delete more than the difference in lengths between the two strings and then add some bits back? I suspect the answer is no, but I'm not sure. If I'm wrong, and I do need to modify the definition of Levenshtein distance to disallow additions, how do I do so?
Finally, I would expect intuitively that I'd get the same distance if I started with y (the shorter string) and only allowed additions and substitutions. Is this right? I've got a sense for what these answers are, I just can't prove them.
If I understand you correctly, I think the answer is yes: the Levenshtein edit distance can differ from the result of an algorithm that only allows deletions and substitutions on the longer string. Because of this, you would need to modify the algorithm, or create a different one, to get your limited version.
Consider the two strings "ABCD" and "ACDEF". The Levenshtein distance is 3 (ABCD->ACD->ACDE->ACDEF). If we start with the longer string and limit ourselves to deletions and substitutions, we must use 4 edits (1 deletion and 3 substitutions). The reason is that an efficient path that applies deletions to the shorter string on its way to the longer one cannot be followed in reverse when starting from the longer string, because that would require the complementary insertion operation (which you're disallowing).
Your last paragraph is true. If the path from shorter to longer uses only insertions and substitutions, then any allowed path can simply be reversed from the longer to the shorter. Substitutions are the same regardless of direction, but the inserts when going from small to large become deletions when reversed.
I haven't tested this thoroughly, but this modification shows the direction I would take, and it appears to work with the values I've tested it with. It's written in C# and follows the pseudocode in the Wikipedia entry for Levenshtein distance. There are obvious optimizations that can be made, but I refrained from making them so that it is more obvious what I've changed from the standard algorithm. An important observation is that (using your constraints) if the strings are the same length, then substitution is the only operation allowed.
static int LevenshteinDistance(string s, string t) {
int i, j;
int m = s.Length;
int n = t.Length;
// for all i and j, d[i,j] will hold the Levenshtein distance between
// the first i characters of s and the first j characters of t;
// note that d has (m+1)*(n+1) values
var d = new int[m + 1, n + 1];
// set each element to zero
// c# creates array already initialized to zero
// source prefixes can be transformed into empty string by
// dropping all characters
for (i = 0; i <= m; i++) d[i, 0] = i;
// target prefixes can be reached from empty source prefix
// by inserting every character
for (j = 0; j <= n; j++) d[0, j] = j;
for (j = 1; j <= n; j++) {
for (i = 1; i <= m; i++) {
if (s[i - 1] == t[j - 1])
d[i, j] = d[i - 1, j - 1]; // no operation required
else {
int del = d[i - 1, j] + 1; // a deletion
int ins = d[i, j - 1] + 1; // an insertion
int sub = d[i - 1, j - 1] + 1; // a substitution
// the next two lines are the modification I've made
//int insDel = (i < j) ? ins : del;
//d[i, j] = (i == j) ? sub : Math.Min(insDel, sub);
// the following 8 lines are a clearer version of the above 2 lines
if (i == j) {
d[i, j] = sub;
} else {
int insDel;
if (i < j) insDel = ins; else insDel = del;
// assign the smaller of insDel or sub
d[i, j] = Math.Min(insDel, sub);
}
}
}
}
return d[m, n];
}
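For comparison, here is a different and simpler way to express the restricted distance, written as a Python sketch rather than as a translation of the C# above. It drops the insertion case from the recurrence entirely, so a non-empty target prefix can never be reached from an empty source prefix. The function names are hypothetical.

```python
def restricted_distance(x, y):
    """Minimum deletions + substitutions turning x into y, or None if impossible."""
    inf = float("inf")
    m, n = len(x), len(y)
    # d[i][j]: cost of turning the first i characters of x into the first j of y.
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete every character of the source prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i - 1][j - 1] + cost)   # match or substitution
    return None if d[m][n] == inf else d[m][n]

def levenshtein(s, t):
    """Standard Levenshtein distance, for comparison."""
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]
        for j, b in enumerate(t, start=1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (a != b)))  # substitution
        previous = current
    return previous[-1]
```

On the example above, restricted_distance("ACDEF", "ABCD") is 4 while levenshtein("ACDEF", "ABCD") is 3, which reproduces the difference described in this answer. When x is shorter than y the restricted version returns None, since no sequence of deletions and substitutions can lengthen a string.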

Modifying strings in emu8086 assembly

I am currently working on an intro assignment for a computer architecture course, and I was asked to accomplish some string modifications. My question is not how to do it, but what I should be researching to be able to do it. Are there any functions that will make this easier, for example like .reverse() in Java?
What I need to accomplish is getting string input from the user, reversing the letters (while reversing numbers keep them where they are), adding spaces whenever there is a vowel, and alternating the caps.
Example:
Input: AbC_DeF12
Output: f E d _ c B a 2 1
This is code I ripped from the lecture: http://pastebin.com/2E1UtGdD I put it in pastebin to avoid clutter. Anything used in this is fair game. (This code does have limitations, though: it only supports ~9 characters, and the looping doesn't work at the end of strings.)
I would look at it like this.
Generate a function on paper of how you want to achieve this. This is notes and only a starting point.
Loop from 0 to string length.
if(byte >= 'A' && byte <= 'Z') then byte -= 'A' - 'a'; /* convert to lower case */
else if(byte >= 'a' && byte <= 'z') then byte += 'A' - 'a'; /* convert to upper case */
/* Switch the letters only. */
a = 0; b = string length
Loop i from a to b. if((input[i] >= 'A' && input[i] <='Z') || (input[i] >= 'a' && input[i] <='z')) p = i
Loop j from b to a. if((input[j] >= 'A' && input[j] <='Z') || (input[j] >= 'a' && input[j] <='z')) q = j
c = input[p]; input[p] = input[q]; input[q] = c;
/* Regenerate the string and add spaces. */
loop i, 0 to string length
if(input[i] is one of 'A' 'a' 'E' 'e' ...) { string2[j] = ' '; j++; } string2[j] = input[i]; j++;
i++
After that if you don't know 8086 I would look at examples online of how to do each individual part. The most important bit is generating the code in your head and on paper on how it is going to work.
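To try the notes above out before writing any assembly, the three steps can be prototyped in a high-level language. The Python sketch below follows these notes only (letters are swapped with each other, everything else stays in place, and a space goes before each vowel), so it is not guaranteed to reproduce the sample output in the question exactly. All function names are hypothetical.

```python
VOWELS = "AEIOUaeiou"

def is_letter(ch):
    """Range check on the character code, as one would do in assembly."""
    return "A" <= ch <= "Z" or "a" <= ch <= "z"

def swap_case(ch):
    """Upper and lower case ASCII letters differ by 'a' - 'A' (32)."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + (ord("a") - ord("A")))
    if "a" <= ch <= "z":
        return chr(ord(ch) - (ord("a") - ord("A")))
    return ch

def reverse_letters(text):
    """Two-index swap that skips over anything that is not a letter."""
    chars = list(text)
    i, j = 0, len(chars) - 1
    while i < j:
        if not is_letter(chars[i]):
            i += 1
        elif not is_letter(chars[j]):
            j -= 1
        else:
            chars[i], chars[j] = chars[j], chars[i]
            i += 1
            j -= 1
    return "".join(chars)

def space_before_vowels(text):
    """Rebuild the string, emitting a space before every vowel."""
    return "".join(" " + ch if ch in VOWELS else ch for ch in text)

def transform(text):
    swapped = "".join(swap_case(ch) for ch in reverse_letters(text))
    return space_before_vowels(swapped)
```

Each helper maps onto a loop with compares and conditional jumps, which is the part to look up for the 8086: loading and storing bytes through an index register, comparing against character constants, and branching on the result.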

Integer to String goes wrong in Synthesis (Width Mismatch)

I am trying to convert an integer to a string (using integer'image(val)) and either pad or limit it to a specific length. I have made this function, which does the job just fine when I use a report statement and simulate.
function integer2string_pad(val: integer; stringSize: integer) return string is
variable imageString: string(1 to integer'image(val)'length);
variable returnString: string(1 to stringSize);
begin
imageString := integer'image(val);
-- Are we smaller than the desired size?
if integer'image(val)'length < stringSize then
-- Pad the string if we are
returnString := integer'image(val) & (1 to stringSize-integer'image(val)'length => ' ');
-- Are we too big for the desired size?
elsif integer'image(val)'length > stringSize then
-- Only use the topmost string characters and append a "." to the end signifying that there is more
returnString := imageString(1 to stringSize-1) & ".";
-- Otherwise we are just the right size
else
returnString := integer'image(val);
end if;
return returnString;
end function;
Here is some sample input, output of that function (underscore = space because SO inline code truncates extra space):
integer2string_pad(12, 6) : 12____
integer2string_pad(123456, 6) : 123456
integer2string_pad(1234567890, 6) : 12345.
integer2string_pad(0, 6) : 0_____
integer2string_pad(-123, 6) : -123__
integer2string_pad(-1, 6) : -1____
integer2string_pad(-123456, 6) : -1234.
But when I synthesize, I get width mismatch errors on all 4 lines where I assign values to pongScoreLeft or pongScoreRight. It also says they have a constant value of 0 and get trimmed out.
Width mismatch. <pongScoreLeft> has a width of 48 bits but assigned expression is 6-bit wide.
Width mismatch. <pongScoreRight> has a width of 48 bits but assigned expression is 6-bit wide.
Width mismatch. <pongScoreLeft> has a width of 48 bits but assigned expression is 6-bit wide.
Width mismatch. <pongScoreRight> has a width of 48 bits but assigned expression is 6-bit wide.
VHDL that produces those width mismatch errors:
type type_score is
record
left : integer range 0 to 255;
right : integer range 0 to 255;
end record;
constant init_type_score: type_score := (left => 0, right => 0);
signal pongScore: type_score := init_type_score;
signal pongScoreLeft: string(1 to 6) := (others => NUL);
signal pongScoreRight: string(1 to 6) := (others => NUL);
...
scoreToString: process(clk)
begin
if rising_edge(clk) then
if reset = '1' then
pongScoreLeft <= (others => NUL);
pongScoreRight <= (others => NUL);
else
pongScoreLeft <= integer2string_pad(pongScore.left, 6);
pongScoreRight <= integer2string_pad(pongScore.right, 6);
--report "|" & integer2string_pad(pongScore.left, 6) & "|";
end if;
end if;
end process;
What is wrong with my integer2string_pad function? What goes wrong in synthesis?
I would not expect 'image or 'value to be supported for synthesis - other than in asserts that run at elaboration time. They would involve a lot of processing.
Whenever I have converted integers to ASCII I have processed a character at a time, using character'val and character'pos, which are synthesisable, because they involve no processing; they just convert a character to/from its underlying binary representation.
EDIT:
Think how you would implement 'image! It involves multiple divisions by 10 : that's a LOT of hardware if you unroll it into a single delta cycle (as required by the semantics of an unclocked function call)
Processing a digit per (several) clock cycle(s) you can reduce that to a single division, or successive subtraction, or excess-6 addition, or however you want according to your hardware resources and time budget.
It really doesn't make sense for the synthesis tool to make these decisions on your behalf. So - while I concede it's theoretically possible, I would be surprised to see a synth tool that did it correctly. (OTOH it's such an unlikely scenario I'd not be surprised to see bugs in synth tool's error reporting should you try it)
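To make the character-at-a-time idea concrete, here is a behavioural sketch in Python rather than VHDL. It is only a model of the arithmetic: digits are extracted by successive subtraction, and each digit becomes a character by adding it to the code of '0', which plays the role of character'val and character'pos. The function names are hypothetical, and the 0 to 255 range is taken from the score record in the question.

```python
def to_decimal_digits(value, powers=(100, 10, 1)):
    """Extract decimal digits using only compare and subtract (no division)."""
    digits = []
    for power in powers:
        digit = 0
        while value >= power:  # in hardware: one subtraction per clock cycle
            value -= power
            digit += 1
        digits.append(digit)
    return digits

def score_to_string(value, size=6):
    """Left-justified decimal text padded with spaces to a fixed width."""
    if not 0 <= value <= 255:
        raise ValueError("value outside the 0..255 score range")
    digits = to_decimal_digits(value)
    while len(digits) > 1 and digits[0] == 0:
        digits.pop(0)  # drop leading zeros but keep a lone 0
    # Digit -> character: offset from the code of '0'.
    text = "".join(chr(ord("0") + digit) for digit in digits)
    return text.ljust(size)
```

For the score values in the question this gives the same text as the padded examples listed there, for instance score_to_string(12) is "12" followed by four spaces.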

Generating a fake ISBN from book title? (Or: How to hash a string into a 6-digit numeric ID)

Short version: How can I turn an arbitrary string into a 6-digit number with minimal collisions?
Long version:
I'm working with a small library that has a bunch of books with no ISBNs. These are usually older, out-of-print titles from tiny publishers that never got an ISBN to begin with, and I'd like to generate fake ISBNs for them to help with barcode scanning and loans.
Technically, real ISBNs are controlled by commercial entities, but it is possible to use the format to assign numbers that belong to no real publisher (and so shouldn't cause any collisions).
The format is such that:
978-0-01-######-?
Gives you 6 digits to work with, from 000000 to 999999, with the ? at the end being a checksum.
Would it be possible to turn an arbitrary book title into a 6-digit number in this scheme with minimal chance of collisions?
After using code snippets for making a fixed-length hash and calculating the ISBN-13 checksum, I managed to create really ugly C# code that seems to work. It'll take an arbitrary string and convert it into a valid (but fake) ISBN-13:
public int GetStableHash(string s)
{
uint hash = 0;
// if you care this can be done much faster with unsafe
// using fixed char* reinterpreted as a byte*
foreach (byte b in System.Text.Encoding.Unicode.GetBytes(s))
{
hash += b;
hash += (hash << 10);
hash ^= (hash >> 6);
}
// final avalanche
hash += (hash << 3);
hash ^= (hash >> 11);
hash += (hash << 15);
// helpfully we only want positive integer < MUST_BE_LESS_THAN
// so simple truncate cast is ok if not perfect
return (int)(hash % MUST_BE_LESS_THAN);
}
public int CalculateChecksumDigit(ulong n)
{
string sTemp = n.ToString();
int iSum = 0;
int iDigit = 0;
// Calculate the checksum digit here.
for (int i = sTemp.Length; i >= 1; i--)
{
iDigit = Convert.ToInt32(sTemp.Substring(i - 1, 1));
// This appears to be backwards but the
// EAN-13 checksum must be calculated
// this way to be compatible with UPC-A.
if (i % 2 == 0)
{ // odd
iSum += iDigit * 3;
}
else
{ // even
iSum += iDigit * 1;
}
}
return (10 - (iSum % 10)) % 10;
}
private void generateISBN()
{
string titlehash = GetStableHash(BookTitle.Text).ToString("D6");
string fakeisbn = "978001" + titlehash;
string check = CalculateChecksumDigit(Convert.ToUInt64(fakeisbn)).ToString();
SixDigitID.Text = fakeisbn + check;
}
The 6 digits allow for 1,000,000 possible values (000000 to 999999), which should be enough for most internal uses.
I would have used a sequence instead in this case, because a 6-digit hash has a relatively high chance of collisions.
So you can insert all the strings into a hash table and use the index numbers as the ISBN, either after sorting or without it.
This should make collisions almost impossible, but it requires keeping a number of "allocated" ISBNs to avoid collisions in the future, and keeping the list of titles that are already in store, but it's information that you would most probably want to keep anyway.
Another option is to break the ISBN standard and use hexadecimal/uuencoded barcodes, that may increase the possible range to a point where it may work with a cryptographic hash truncated to fit.
Since you are handling old book titles, which may have several editions capitalized and punctuated differently, I would suggest stripping punctuation, collapsing duplicated whitespace, and converting everything to lowercase before the comparison. This minimizes the chance of a technical duplicate where the strings differ but the title is the same. (If you want different editions to have different ISBNs, you can ignore this paragraph.)
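The points above can be put together in a short sketch. It is written in Python rather than C#, and every name in it is hypothetical. It estimates how likely a collision is when hashing into 1,000,000 slots, normalizes titles as suggested, and hands out sequential numbers under the 978001 prefix from the question, with the standard ISBN-13 check digit (weights alternating 1 and 3 from the left).

```python
import re

def collision_probability(items, slots=1_000_000):
    """Chance that at least two of `items` random hashes share a slot."""
    p_unique = 1.0
    for k in range(items):
        p_unique *= 1.0 - k / slots
    return 1.0 - p_unique

def normalize_title(title):
    """Lowercase, strip punctuation, collapse runs of whitespace."""
    no_punctuation = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(no_punctuation.split())

def isbn13_check_digit(first12):
    """Check digit for the first 12 digits of an ISBN-13."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

class IsbnAllocator:
    """Assigns sequential 6-digit IDs; the table must be saved between runs."""

    def __init__(self, prefix="978001"):
        self.prefix = prefix
        self.assigned = {}  # normalized title -> sequence number

    def isbn_for(self, title):
        key = normalize_title(title)
        if key not in self.assigned:
            if len(self.assigned) >= 1_000_000:
                raise RuntimeError("all 6-digit IDs are in use")
            self.assigned[key] = len(self.assigned)
        body = self.prefix + format(self.assigned[key], "06d")
        return body + str(isbn13_check_digit(body))
```

With these assumptions, collision_probability(1000) comes out a little above 0.39, so a hash-based scheme would more likely than not need collision handling well before the library reaches two thousand titles, whereas the sequential allocator cannot collide as long as its table is persisted.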
