Let Z_10 = {0,1,2,3,4,5,6,7,8,9}
I have here a symmetric encryption scheme in which
a message M = M[1]M[2]M[3]M[4] in Z_10^4 is a four-digit string,
a key π <- Perm(Z_10) is a random permutation on Z_10,
and the ciphertext C = C[1]C[2]C[3]C[4] = E_π(M) in Z_10^4 is computed as follows:
Alg E_π(M)
    For i = 1, ..., 4 do
        P[i] <- (M[i] + i) mod 10
        C[i] <- π(P[i])
    Return C
Is this the correct decryption algorithm?
Alg D_π(C)
    For i = 1, ..., 4 do
        P[i] <- (C[i] - i) mod 10
        M[i] <- π^(-1)(P[i])
    Return M
I believe this is a substitution cipher, but I am not sure. Is it a substitution cipher? How do we know that?
There seems to be a mistake in the setup: if Z_10 were {0, 1, ..., 10} (eleven elements), you would need mod 11; more likely, Z_10 = {0, ..., 9} is intended, as written above. Otherwise, the operation
P[i] <- (M[i] + i) mod 10
would translate both 0 and 10 to the same value (for i = 1, both map to 1), making it irreversible.
Other than that, yes, it's a substitution cipher by definition, because every character of the input alphabet at a given position is always substituted by the same corresponding output character. You can even replace the encryption logic with a lookup table.
You also need to reverse the order of the operations in the decryption algorithm: first invert the permutation, then undo the modular addition (subtract i mod 10).
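To make this concrete, here is a minimal Python sketch of the scheme with the decryption order fixed (the permutation and message are arbitrary examples, not from the question):

import random

def keygen():
    pi = list(range(10))
    random.shuffle(pi)          # a random permutation of Z_10
    return pi

def encrypt(pi, M):             # M is a list of four digits
    # i runs 1..4 in the pseudocode, hence i + 1 here
    return [pi[(M[i] + (i + 1)) % 10] for i in range(4)]

def decrypt(pi, C):
    inv = [0] * 10
    for a, b in enumerate(pi):  # build the inverse permutation
        inv[b] = a
    # invert the permutation FIRST, then undo the additive shift
    return [(inv[C[i]] - (i + 1)) % 10 for i in range(4)]

pi = keygen()
M = [4, 2, 4, 2]
assert decrypt(pi, encrypt(pi, M)) == M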
I have tried to encrypt a string using the XOR operator and mapped the output to letters of the alphabet. Now, when I try to decrypt it, I don't get the original string back.
Encryption code:
string= "Onions"
keyword = "MELLON"
def xor(string, key):
st=[]
ke=[]
xored=[]
for i in string:
asc= (ord(i))
st.append(int(asc))
print(st)
for i in key:
asc= (ord(i))
ke.append(int(asc))
print(ke)
for i in range(len(string)):
s1=st[i]
k1=ke[i]
abc = s1^k1
le = ord('A')+abc
ch = chr(le)
if le> 90:
le= le-26
ch = chr(le)
print(s1,k1)
print('XOR =',abc)
print(ch)
xored.append(ch)
print(xored)
return("" . join(xored))
Need help!!
The algorithm does not perform a pure XOR; it conditionally maps values to other values, which makes the relation no longer bijective.
To illustrate this point, see what this script outputs:
keyword = "MELLON"
print(xor("Onions", keyword) == xor("OTGEHs", keyword))
It will output True!
So this means you have two words that encrypt to the same string, which also means that if you need to reverse the process, there is no way to know which of them was the real original word.
If you want decryption to be possible, make sure to use only operations that produce a bijective mapping. For instance, if you only use XOR, without conditionally adding or subtracting values, it will be OK.
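For instance, here is a minimal sketch (mine, not the poster's code) of a plain repeating-key XOR that round-trips, because x ^ k ^ k == x. Note that the ciphertext bytes are not restricted to letters, which is the constraint the alphabet-only approach below addresses:

def xor_bytes(data, key):
    # XOR each byte with the key byte at the same position (key repeats)
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

msg = b"Onions"
key = b"MELLON"
ct = xor_bytes(msg, key)
assert xor_bytes(ct, key) == msg  # XOR is its own inverse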
Here is an approach where only lower and uppercase letters of the Latin alphabet are allowed (for both arguments):
def togglecrypt(string, key):
    mapper = "gUMtuAqhaEDcsGjBbreSNJYdFTiOmHKwnXWxzClQLRVyvIkfPpoZ"
    res = []
    for i, ch in enumerate(string):
        shift = mapper.index(key[i % len(key)]) % 26
        i = mapper.index(ch)
        if i < 26:   # first half of mapper: shift into the second half
            j = 26 + (i + shift) % 26
        else:        # second half: shift back into the first half
            j = (i - shift) % 26
        res.append(mapper[j])
    return "".join(res)
keyword = "MELLON"
encoded = togglecrypt("Onions", keyword)
print(encoded) # TdsDAn
print(togglecrypt(encoded, keyword)) # Onions
I need to translate a hash function from JavaScript to Python.
The function is as follows:
function getIndex(string) {
    var length = 27;
    string = string.toLowerCase();
    var hash = 0;
    for (var i = 0; i < string.length; i++) {
        hash = string.charCodeAt(i) + (hash << 6) + (hash << 16) - hash;
    }
    var index = Math.abs(hash % length);
    return index;
}
console.log(getIndex(window.prompt("Enter a string to hash")));
This function is Objectively Correct™. It is perfection itself. I can't change it, I just have to recreate it. Whatever it outputs, my Python script must also output.
However - I'm having a couple of problems, and I think that it's all to do with the way that the two languages handle signed integers.
JS bitwise operators treat their operands as a sequence of 32 bits. Python, however, has no concept of bit limitation and just keeps going like an absolute madlad. I think that this is the one relevant difference between the two languages.
I can limit the length of hash in Python by masking it to 32 bits with hash & 0xFFFFFFFF.
I can also negate hash if it's above 0x7FFFFFFF with hash = hash ^ 0xFFFFFFFF (or hash = ~hash - they both seem to do the same thing). I believe that this simulates negative numbers.
I apply both of these restrictions to the hash with a function called t.
Here's my Python code so far:
def nickColor(string):
    length = 27

    def t(x):
        x = x & 0xFFFFFFFF
        if x > 0x7FFFFFFF:
            x = x ^ 0xFFFFFFFF
        return x

    string = string.lower()
    hash = t(0)
    for letter in string:
        hash = t(hash)
        hash = t(t(ord(letter)) + t(hash << 6) + t(hash << 16) - t(hash))
    index = hash % length
    return index
It seems to work up until the point that a hash needs to become negative, at which point the two scripts diverge. This normally happens about 4 letters into the string.
I'm assuming that my problem lies in recreating JS negative numbers in Python. How can I say bye to this problem?
Here is a working translation:
def nickColor(string):
    length = 27

    def t(x):
        x &= 0xFFFF_FFFF
        if x > 0x7FFF_FFFF:
            x -= 0x1_0000_0000
        return float(x)

    bytes = string.lower().encode('utf-16-le')
    hash = 0.0
    for i in range(0, len(bytes), 2):
        char_code = bytes[i] + 256 * bytes[i + 1]
        hash = char_code + t(int(hash) << 6) + t(int(hash) << 16) - hash
    return int(hash % length if hash >= 0 else abs(hash % length - length))
The point is that only the shifts (<<) are calculated as 32-bit integer operations; their result is converted back to double before entering the additions and subtractions. I'm not familiar with the rules for double-precision floating-point representation in the two languages, but it's safe to assume that on all personal computing devices and web servers it is the same for both languages, namely double-precision IEEE 754. For very long strings (thousands of characters) the hash could lose some bits of precision, which of course affects the final result, but in the same way in JS as in Python (not what the author of the Objectively Correct™ function intended, but that's the way it is…). The last line corrects for the different definitions of the % operator for negative operands in JavaScript and Python.
Furthermore (thanks to Mark Ransom for reminding me of this), to fully emulate JavaScript, it is also necessary to consider its encoding, which is UTF-16, with surrogate pairs handled as if they consisted of 2 characters. Encoding the string as utf-16-le ensures that the first byte in each 16-bit "word" is the least significant one, and that you don't get the BOM that you would get if you used utf-16 tout court (thank you Martijn Pieters).
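As a quick illustration of the encoding point (my own example, not part of the original answer): a character outside the BMP, such as U+1D11E MUSICAL SYMBOL G CLEF, shows up as the two surrogate code units that JavaScript's charCodeAt would report:

s = "\U0001D11E"                # one Python character, two UTF-16 code units
b = s.encode('utf-16-le')
codes = [b[i] + 256 * b[i + 1] for i in range(0, len(b), 2)]
print([hex(c) for c in codes])  # ['0xd834', '0xdd1e'] - the surrogate pair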
I just read about string hashing and the polynomial hash function used to compute it.
It looks to me as if the time complexity of computing hash(string) is O(N), where N is the string size:
long long compute_hash(string const& s) {
    const int p = 31;
    const int m = 1e9 + 9;
    long long hash_value = 0;
    long long p_pow = 1;
    for (char c : s) {
        hash_value = (hash_value + (c - 'a' + 1) * p_pow) % m;
        p_pow = (p_pow * p) % m;
    }
    return hash_value;
}
where the hash value of a string S is computed as S[0] + S[1]·P + S[2]·P^2 + ... + S[N-1]·P^(N-1) (mod M).
And if the computation is O(N), isn't hashing N strings O(N^2)?
Short answer: if you hash n strings each of length n, the reasoning is correct, but the "scenario" in which the length of the strings is determined by the number of strings to hash is a bit strange.
And if the computation is O(N), isn't hashing N strings O(N^2)?
Well, given that the length of the strings scales with the number of strings, then yes, for the given hashing algorithm, this will indeed result in O(n^2). But typically there is no correlation between the length of a string and the number of strings to hash.
If the strings have an average length of k, and there are n strings, then this is an O(n×k) algorithm. You are thus correct that the "size" of the objects can have an impact on performance, provided, of course, that the hashing algorithm scales with the size of the object.
Yes, hashing N strings of length N is an O(N²) process. (Though in practice, you very very rarely meet such a coincidental case.)
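To make the counting concrete, here is a rough Python port of compute_hash (assuming lowercase ASCII input, as in the C++ version), plus a tally showing that hashing n strings of average length k touches n·k characters in total:

def compute_hash(s, p=31, m=10**9 + 9):
    # polynomial rolling hash: s[0] + s[1]*p + s[2]*p^2 + ... (mod m)
    hash_value, p_pow = 0, 1
    for c in s:
        hash_value = (hash_value + (ord(c) - ord('a') + 1) * p_pow) % m
        p_pow = (p_pow * p) % m
    return hash_value

strings = ["abc", "de", "fghi"]             # n = 3 strings, 9 characters total
total_chars = sum(len(s) for s in strings)  # work done is O(n*k), not O(n^2)
print(total_chars, [compute_hash(s) for s in strings])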
I'm self-studying problem 32-1 in CLRS; part (c) presents the following algorithm for string matching:
REPETITION-MATCHER(P, T)
    m = P.length
    n = T.length
    k = 1 + ρ'(P)
    q = 0
    s = 0
    while s <= n - m
        if T[s+q+1] == P[q+1]
            q = q + 1
            if q == m
                print "Pattern occurs with shift" s
        if q == m or T[s+q+1] != P[q+1]
            s = s + max(1, ceil(q/k))
            q = 0
Here, ρ'(P), which is a function of P only, is defined as the largest integer r such that some prefix P[1..i] = y^r, i.e., a string y repeated r times.
This algorithm appears to be 95 percent similar to the naive brute-force string matcher. However, the one part which greatly confuses me, and which seems to be the centerpiece of the entire algorithm, is the second-to-last line. Here, q is the number of characters of P matched so far. What is the rationale behind ceil(q/k)? It is completely opaque to me. It would have made more sense if that line were something like s = s + max(1 + q, 1 + i), where i is the length of the prefix that gives rise to ρ'(P).
CLRS claims that this algorithm is due to Galil and Seiferas, but in the reference they provide, I cannot find anything that resembles the algorithm provided above. It appears that reference contains, if anything, a much more advanced version of what is here. Can someone explain this ceil(q/k) value, and/or point me toward a reference that describes this particular algorithm, instead of the more well-known main Galil Seiferas paper?
Example #1:
Match aaaa in aaaaab, here ρ' = 4. Consider state:
aaaa ab
     ^
We have a mismatch here, and we want to move forward by one symbol, no more, because we will match the full pattern again (the last line resets q to zero). q = 4 and k = 5, so ceil(q/k) = 1; that's all right.
Example #2: Match abcd.abcd.abcd.X in abcd.abcd.abcd.abcd.X. Consider state:
abcd.abcd.abcd. abcd.X
                ^
We have a mismatch here, and we would like to move forward by five symbols. q = 15 and k = 4, so ceil(q/k) = 4. That's OK: it is almost 5, and we can still match our pattern. Had we a bigger ρ', say 10 (and hence q = 50), we would have ceil(50/(10+1)) = 5.
Yes, the algorithm skips forward fewer symbols than KMP does; in the case ρ' = 10 its running time is O(10n + m), while KMP is O(n + m).
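For concreteness, here is a rough Python port of the pseudocode (mine; 0-indexed, with ρ'(P) computed naively in O(m^2) just for illustration). It reproduces Example #1:

import math

def rho_prime(P):
    # largest r such that some prefix of P equals y^r for some string y
    best = 1
    for i in range(1, len(P) + 1):            # prefix P[:i]
        for d in range(1, i + 1):             # candidate period length d
            if i % d == 0 and P[:d] * (i // d) == P[:i]:
                best = max(best, i // d)
    return best

def repetition_matcher(P, T):
    m, n = len(P), len(T)
    k = 1 + rho_prime(P)
    q = s = 0
    while s <= n - m:
        if T[s + q] == P[q]:
            q += 1
            if q == m:
                print("Pattern occurs with shift", s)
        if q == m or T[s + q] != P[q]:
            s += max(1, math.ceil(q / k))     # the step in question
            q = 0

repetition_matcher("aaaa", "aaaaab")          # shifts 0 and 1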
I figured out the proof of correctness.
let k = ρ'(P) + 1, and ρ'(P) is the largest possible repetition factor out of all the prefixes of P.
Suppose T[s+1..s+q] = P[1..q], and either q=m or T[s+q+1] != P[q+1]
Then, for 1 <= j <= floor(q/k) (except for the case q=m and m mod k = 0, in which the upper limit must be ceil(m/k)), we have
T[s+1..s+j] = P[1..j]
T[s+j+1..s+2j] = P[j+1..2j]
T[s+2j+1..s+3j] = P[2j+1..3j]
...
T[s+(k-1)j+1..s+kj] = P[(k-1)j+1..kj]
where not every quantity on every line can be equal, since k cannot be a repetition factor: the largest possible repetition factor of any prefix of P is k-1.
Suppose we now make a comparison at shift s' = s+j, so that we will make the following comparisons
T[s+j+1..s+2j] with P[1..j]
T[s+2j+1..s+3j] with P[j+1..2j]
T[s+3j+1..s+4j] with P[2j+1..3j]
...
T[s+kj+1..s+(k+1)j] with P[(k-1)j+1..kj]
We claim that not every comparison can match, i.e., at least one of the above "with"s must be replaced with !=. We prove this by contradiction. Suppose every "with" above is replaced by =. Then, comparing to the first set of comparisons we did, we would immediately have the following:
P[1..j] = P[j+1..2j]
P[j+1..2j] = P[2j+1..3j]
...
P[(k-2)j+1..(k-1)j] = P[(k-1)j+1..kj]
However, this cannot be true, because k is not a repetition factor, hence a contradiction.
Hence, for any 1 <= j <= floor(q/k), testing a new shift s'=s+j is guaranteed to mismatch.
Hence, the smallest shift that can possibly result in a match is s + floor(q/k) + 1, and floor(q/k) + 1 >= ceil(q/k).
Note the code uses ceil(q/k) for simplicity, solely to deal with the case that q = m and m mod k = 0, in which case k * (floor(q/k)+1) would be greater than m, so only ceil(q/k) would do. However, when q mod k = 0 and q < m, ceil(q/k) = floor(q/k), which is slightly suboptimal, since that shift is guaranteed to fail; floor(q/k) + 1 is the first shift that has any chance of matching.
I am given a bag B (multiset) of characters of size m and a text string S of size n. Is it possible to find all substrings of S that can be formed from B (for m = 4, up to 4! = 24 arrangements) in linear time O(n)?
Example:
S = abdcdbcdadcdcbbcadc (n=19)
B = {b, c, c, d} (m=4)
Result: {cdbc (Position 3), cdcb (Position 10)}
The fastest solution I found keeps a counter for each character and compares it with the bag at each step, so the runtime is O(n*m). I can show the algorithm if needed.
There is a way to do it in O(n), assuming we're only interested in substrings of length m (otherwise it's impossible: for a bag containing every character of the string, you'd have to return all substrings of S, an O(n^2)-sized result that can't be produced in O(n)).
The algorithm is as follows:
Convert the bag to a histogram:
hist = []
for c in B do:
    hist[c] = hist[c] + 1
Initialize a running histogram that we're going to modify (histrunsum is the total count of characters in histrun):
histrun = []
histrunsum = 0
We need two operations: add a character to the histogram and remove it. They operate as follows:
add(c):
    if hist[c] > 0 and histrun[c] < hist[c] then:
        histrun[c] = histrun[c] + 1
        histrunsum = histrunsum + 1

remove(c):
    if histrun[c] > 0 then:
        histrun[c] = histrun[c] - 1
        histrunsum = histrunsum - 1
Essentially, histrun captures the number of characters present in B in the current substring. If histrun is equal to hist, our substring has the same characters as B; histrun is equal to hist iff histrunsum is equal to the length of B.
Now add the first m characters to histrun; if histrunsum equals the length of B, emit the first substring. Then, until we reach the end of the string, remove the first character of the current substring and add the next character.
add and remove are O(1) since hist and histrun are arrays; checking whether hist equals histrun is done by comparing histrunsum to length(B), so it is also O(1). The loop iterates O(n) times, so the resulting running time is O(n).
Thanks for the answer. The add() and remove() methods have to be changed to make the algorithm work correctly.
add(c):
    if hist[c] > 0 and histrun[c] < hist[c] then
        histrunsum++
    else
        histrunsum--
    histrun[c] = histrun[c] + 1

remove(c):
    if histrun[c] > hist[c] then
        histrunsum++
    else
        histrunsum--
    histrun[c] = histrun[c] - 1
Explanation:
histrunsum can be seen as a score of how similar the two multisets are.
add(c): when there are fewer occurrences of a char in the histrun multiset than in the hist multiset, the additional occurrence of that char has to be "rewarded", since the histrun multiset is getting closer to the hist multiset. If histrun already has at least as many occurrences of that char as hist, an additional char is weighted negatively.
remove(c): like add(c), except that removing a char is weighted positively when its count in the histrun multiset exceeds its count in the hist multiset.
Sample Code (PHP):
function multisetSubstrings($sequence, $mset)
{
    $multiSet = array();
    $substringLength = 0;
    foreach ($mset as $char)
    {
        $multiSet[$char]++;
        $substringLength++;
    }
    $sum = 0;
    $currentSet = array();
    $result = array();
    for ($i=0; $i<strlen($sequence); $i++)
    {
        if ($i >= $substringLength)
        {
            $c = $sequence[$i-$substringLength];
            if ($currentSet[$c] > $multiSet[$c])
                $sum++;
            else
                $sum--;
            $currentSet[$c]--;
        }
        $c = $sequence[$i];
        if ($currentSet[$c] < $multiSet[$c])
            $sum++;
        else
            $sum--;
        $currentSet[$c]++;
        echo $sum."<br>";
        if ($sum==$substringLength)
            $result[] = $i+1-$substringLength;
    }
    return $result;
}
Use hashing. For each character in the multiset, assign a UNIQUE prime number. Compute the hash of any string by multiplying together the primes associated with its characters, one factor per occurrence of that character.
Example: CATTA. Let C = 2, A = 3, T = 5. Hash = 2*3*5*5*3 = 450
Hash the multiset (treat it as a string). Now go through the input string and compute the hash of each substring of length k (where k is the number of characters in the multiset). Check whether this hash matches the multiset hash. If yes, it is one such occurrence.
The hashes can be computed very easily in linear time as follows:
Let multiset = { A, A, B, C }, A=2, B=3, C=5.
Multiset hash = 2*2*3*5 = 60
Let text = CABBAACCA
(i) CABB = 5*2*3*3 = 90
(ii) Now, the next letter is A, and the letter discarded is the first one, C. So the new hash = (90/5)*2 = 36
(iii) Now, A is discarded, and A is also added, so the new hash = (36/2)*2 = 36
(iv) Now B is discarded, and C is added, so the hash = (36/3)*5 = 60 = the multiset hash. Thus we have found one required occurrence: BAAC
This procedure will obviously take O(n) time.
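Here is a small Python sketch of that rolling product (my own illustration; the division is always exact because we divide out a prime that is a factor of the current window's product). On the example from the first question it reports the two positions 3 and 10:

from functools import reduce

PRIMES = {'a': 2, 'b': 3, 'c': 5, 'd': 7}   # one unique prime per character

def bag_matches(S, bag):
    k = len(bag)
    target = reduce(lambda h, c: h * PRIMES[c], bag, 1)
    h = reduce(lambda h, c: h * PRIMES[c], S[:k], 1)
    hits = [0] if h == target else []
    for i in range(k, len(S)):
        # divide out the leaving character, multiply in the entering one
        h = h // PRIMES[S[i - k]] * PRIMES[S[i]]
        if h == target:
            hits.append(i - k + 1)
    return hits

print(bag_matches("abdcdbcdadcdcbbcadc", "bccd"))  # [3, 10]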