Confused about a string hash function - string

As I was looking through some string hash functions, I came across this one (code below). The function processes the string four bytes at a time and interprets each four-byte chunk as a single long integer value. The integer values of the chunks are added together. In the end, the resulting sum is converted to the range 0 to M-1 using the modulus operator.
The following is the function code :
// Use folding on a string, summed 4 bytes at a time
long sfold(String s, int M) {
    int intLength = s.length() / 4;
    long sum = 0;
    for (int j = 0; j < intLength; j++) {
        char c[] = s.substring(j * 4, (j * 4) + 4).toCharArray();
        long mult = 1;
        for (int k = 0; k < c.length; k++) {
            sum += c[k] * mult;
            mult *= 256;
        }
    }
    char c[] = s.substring(intLength * 4).toCharArray();
    long mult = 1;
    for (int k = 0; k < c.length; k++) {
        sum += c[k] * mult;
        mult *= 256;
    }
    return(Math.abs(sum) % M);
}
The confusion for me is this chunk of code, especially the first line.
char c[] = s.substring(intLength * 4).toCharArray();
long mult = 1;
for (int k = 0; k < c.length; k++) {
    sum += c[k] * mult;
    mult *= 256;
}
To my knowledge, the substring function used in this line takes one argument, the begin index (inclusive): the substring starts at the specified beginIndex and extends to the end of the string.
For the sake of example, let's assume we want to hash the following string: aaaabbbb. In this case intLength is going to be 2 (second line of the function code). Replacing the value of intLength in s.substring(intLength * 4).toCharArray() gives us s.substring(8).toCharArray(), which I would expect to mean the string index is out of bounds, given that the string to be hashed has 8 characters.
I don't quite understand what's going on!

This hash function is awful, but to answer your question:
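There is no IndexOutOfBoundsException, because "aaaabbbb".substring(8) is "". substring(beginIndex) only throws when beginIndex is negative or greater than the length of the string; a beginIndex equal to the length returns the empty string.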
The purpose of that last loop is to deal with leftovers when the string length isn't a multiple of 4. When s is "aaaabbbbcc", for example, then intLength == 2, and s.substring(8) is "cc".
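The tail handling can be seen in a small Python port of the function. This is only a sketch: the helper name fold_chunk is made up for illustration, Python integers do not overflow the way a Java long does, and Python slices are more lenient than Java's substring, so it mirrors the folding logic rather than Java's exception behaviour.

```python
def fold_chunk(chunk):
    """Interpret up to four characters as one integer, lowest byte first."""
    value, mult = 0, 1
    for ch in chunk:
        value += ord(ch) * mult
        mult *= 256
    return value


def sfold(s, m):
    """Python port of the Java sfold (hypothetical helper names)."""
    full_chunks = len(s) // 4
    total = 0
    for j in range(full_chunks):
        total += fold_chunk(s[j * 4:j * 4 + 4])
    # Leftover 0 to 3 characters; an empty tail contributes nothing.
    total += fold_chunk(s[full_chunks * 4:])
    return abs(total) % m


print(repr("aaaabbbb"[8:]))    # empty tail, not an error
print(repr("aaaabbbbcc"[8:]))  # the two leftover characters
```

For "aaaabbbb" the final loop simply runs zero times; for "aaaabbbbcc" it folds the leftover "cc" into the sum.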

Related

Number of palindromic subsequences of length 5

Given a String s, return the number of palindromic subsequences of length 5.
Test case 1:
input : "abcdba"
Output : 2
"abcba" and "abdba"
Test case 2:
input : "aabccba"
Output : 4
"abcba" , "abcba" , "abcba" , "abcba"
Max length of String: 700
My TLE Approach: O(2^n)
https://www.online-java.com/5YegWkAVad
Any inputs are highly appreciated...
Whenever 2 characters match, we only have to find how many palindromes of length 3 are possible in between these 2 characters.
For example:
a bcbc a
^ ^
|_ _ _ |
In the above example, you can find 2 palindromes of length 3, namely bcb and cbc. Hence, we can make palindromic subsequences of length 5 as abcba or acbca, and the answer is 2.
Computing how many palindromes of length 3 are possible for every substring can be time consuming if we don't cache the results the first time. So, cache those results and reuse them for queries generated by other 2-character matches (a.k.a. dynamic programming).
This way, the solution becomes quadratic, O(n^2) time, where n is the length of the string.
Snippet:
private static long solve(String s){
    long ans = 0;
    int len = s.length();
    long[][] dp = new long[len][len];
    /* compute how many palindromes of length 3 are possible for every 2 characters match */
    for(int i = len - 2;i >= 0; --i){
        for(int j = i + 2; j < len; ++j){
            dp[i][j] = dp[i][j-1] + (dp[i + 1][j] == dp[i + 1][j-1] ? 0 : dp[i + 1][j] - dp[i + 1][j - 1]);
            if(s.charAt(i) == s.charAt(j)){
                dp[i][j] += j - i - 1;
            }
        }
    }
    /* re-use the above data to calculate for palindromes of length 5*/
    for(int i = 0; i < len; ++i){
        for(int j = i + 4; j < len; ++j){
            if(s.charAt(i) == s.charAt(j)){
                ans += dp[i + 1][j - 1];
            }
        }
    }
    //for(int i=0;i<len;++i) System.out.println(Arrays.toString(dp[i]));
    return ans;
}
Update:
dp[i][j] = dp[i][j-1] + (dp[i + 1][j] == dp[i + 1][j-1] ? 0 : dp[i + 1][j] - dp[i + 1][j - 1]);
The above line basically means this:
For any substring, say bcbcb, with matching first and last b, the total number of length-3 palindromes is the sum of:
The total count possible for bcbc.
The count that cbcb adds on top of the shared middle part cbc.
The count of palindromes that use both the first and the last b (which is (j - i - 1) in the if condition).
Term by term:
dp[i][j]: the count for the current substring at hand.
dp[i][j-1]: adds the counts of length 3 for the previous substring. In this example, bcbc.
dp[i + 1][j]: the count for the substring ending at the current index, excluding the first character (here, cbcb).
dp[i + 1][j] == dp[i + 1][j-1] ? 0 : dp[i + 1][j] - dp[i + 1][j - 1]: adds only the palindromes of cbcb that are not already counted in the internal substring cbc, which avoids counting them twice.
Observation:
The preceding method is neat, but it is O(n^2) with a 2-D DP table. Can we reduce it to O(n) by using a 3-D table? Yes, because only the first dimension depends on the length of the string; the other two dimensions range over the alphabet, 0 to 25.
E.g. dp[i][j][k], where j and k are between 0 and 25 and i is between 0 and the length of the string.
The idea is hard to see directly from this observation, so let's build up the intuition.
Intuition:
Of length 3
The number of palindromic subsequences of length 3 centered at index i is, summed over each letter x, the number of occurrences of x to the left of i multiplied by the number of occurrences of x to the right of i.
E.g. _ s[i] _ : choosing the same letter on both sides of s[i] gives a palindrome of length 3.
Time complexity: O(26 * n), i.e. O(n)
Of length 5
Similarly, for length 5 => _ _ s[i] _ _ : for every ordered pair of letters (x, y), multiply the number of occurrences of the pair x y (as a subsequence) before the index by the number of occurrences of the mirrored pair y x after the index.
E.g. x y s[i] y x, where x and y range over a to z. Here we need to store the number of occurrences of xy before the index and of yx after it.
Time complexity: O(26 * 26 * n)
Of length 7
Similarly, for length 7 => _ _ _ s[i] _ _ _ : for every ordered triple of letters (x, y, z), multiply the number of occurrences of x y z before the index by the number of occurrences of z y x after the index.
E.g. x y z s[i] z y x, where x, y and z range over a to z. Here we need to store the number of occurrences of xyz before the index and of zyx after it.
Time complexity: O(26 * 26 * 26 * n)
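The length-5 case can be sketched in Python as follows. This is an illustrative sketch rather than the code of this answer: it assumes the input contains only lowercase letters a to z, the function and variable names are made up, and it keeps running pair counts on each side of the centre instead of storing a table per index.

```python
def count_palindromes_len5(s):
    """Count palindromic subsequences of length 5 (lowercase input assumed)."""
    A = 26
    codes = [ord(ch) - ord('a') for ch in s]

    # right_pairs[a][b]: pairs p < q to the right of the centre
    # with s[p] == a and s[q] == b. Start with the whole string.
    right_cnt = [0] * A
    right_pairs = [[0] * A for _ in range(A)]
    for c in reversed(codes):
        for b in range(A):
            right_pairs[c][b] += right_cnt[b]
        right_cnt[c] += 1

    left_cnt = [0] * A
    left_pairs = [[0] * A for _ in range(A)]
    total = 0
    for c in codes:
        # Take the centre character out of the right-hand side.
        right_cnt[c] -= 1
        for b in range(A):
            right_pairs[c][b] -= right_cnt[b]
        # Pattern x y (centre) y x.
        for x in range(A):
            for y in range(A):
                total += left_pairs[x][y] * right_pairs[y][x]
        # Move the centre character to the left-hand side.
        for a in range(A):
            left_pairs[a][c] += left_cnt[a]
        left_cnt[c] += 1
    return total


print(count_palindromes_len5("abcdba"))
print(count_palindromes_len5("aabccba"))
```

Each centre costs 26 * 26 multiplications, which matches the O(26 * 26 * n) bound above while using only O(26 * 26) extra memory.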
Code
#include <cstring>
#include <string>
using namespace std;

// pre[i][j][k]: number of index pairs p < q <= i with s[p] == j and s[q] == k
// suf[i][j][k]: number of index pairs i <= q < p with s[q] == k and s[p] == j
int pre[10000][26][26], suf[10000][26][26], cnts[26] = {};
int countPalindromes(string s) {
    int mod = 1e9 + 7, n = s.size(), ans = 0;
    for (int i = 0; i < n; i++) {
        int c = s[i] - 'a';  // letters a..z map to 0..25
        if (i)
            for (int j = 0; j < 26; j++)
                for (int k = 0; k < 26; k++) {
                    pre[i][j][k] = pre[i - 1][j][k];
                    if (k == c) pre[i][j][k] += cnts[j];
                }
        cnts[c]++;
    }
    memset(cnts, 0, sizeof(cnts));
    for (int i = n - 1; i >= 0; i--) {
        int c = s[i] - 'a';
        if (i < n - 1)
            for (int j = 0; j < 26; j++)
                for (int k = 0; k < 26; k++) {
                    suf[i][j][k] = suf[i + 1][j][k];
                    if (k == c) suf[i][j][k] += cnts[j];
                }
        cnts[c]++;
    }
    // every index i with at least two characters on each side is a centre
    for (int i = 2; i < n - 2; i++)
        for (int j = 0; j < 26; j++)
            for (int k = 0; k < 26; k++)
                ans = (ans + 1LL * pre[i - 1][j][k] * suf[i + 1][j][k]) % mod;
    return ans;
}
Reference
See the most-voted write-up for the related problem, where the alphabet is the digits 0 to 9 instead of letters.

Number of substrings with count of each character as k

Source: https://www.geeksforgeeks.org/number-substrings-count-character-k/
Given a string and an integer k, find number of substrings in which all the different characters occurs exactly k times.
Looking for a solution in O(n), using two pointers/sliding window approach. I'm able to find only longest substrings satisfying this criteria but not substrings within that long substring.
For ex: ababbaba, k = 2
My solution finds abab, ababba etc, but not bb within ababba.
Can someone help me with the logic?
If you could edit your question to include your solution code, I'd be happy to help you with that.
For now I'm sharing my solution code (in Java), which runs in O(n^2). I've added enough comments to make the code self-explanatory. Nonetheless, the logic of the solution is as follows:
As you correctly pointed out, the problem can be solved using a sliding window approach (with variable window size). The solution below considers all possible sub-strings, using nested for loops to set the start and end indices. For each sub-string, we check if every element in the sub-string occurs exactly k times.
To avoid recalculating the count for every sub-string, we maintain the count in a map, and keep putting new elements in the map as we increment the end index (slide the window). This ensures that our solution runs in O(n^2) and not O(n^3).
To further improve efficiency, we only check the count of individual elements if the sub-string's size matches our requirement. E.g. for n unique elements (keys in the map), the size of the required sub-string would be n*k. If the sub-string's size doesn't match this value, there's no need to check how many times the individual characters occur.
import java.util.*;
/**
* Java program to count the number of perfect substrings in a given string. A
* substring is considered perfect if all the elements within the substring
* occur exactly k number of times.
*
* @author Codextor
*/
public class PerfectSubstring {
public static void main(String[] args) {
String s = "aabbcc";
int k = 2;
System.out.println(perfectSubstring(s, k));
s = "aabccc";
k = 2;
System.out.println(perfectSubstring(s, k));
}
/**
* Returns the number of perfect substrings in the given string for the
* specified value of k
*
* @param s The string to check for perfect substrings
* @param k The number of times every element should occur within the substring
* @return int The number of perfect substrings
*/
public static int perfectSubstring(String s, int k) {
int finalCount = 0;
/*
* Set the initial starting index for the subarray as 0, and increment it with
* every iteration, till the last index of the string is reached.
*/
for (int start = 0; start < s.length(); start++) {
/*
* Use a HashMap to store the count of every character in the subarray. We'll
* start with an empty map everytime we update the starting index
*/
Map<Character, Integer> frequencyMap = new HashMap<>();
/*
* Set the initial ending index for the subarray equal to the starting index and
* increment it with every iteration, till the last index of the string is
* reached.
*/
for (int end = start; end < s.length(); end++) {
/*
* Get the count of the character at end index and increase it by 1. If the
* character is not present in the map, use 0 as the default count
*/
char c = s.charAt(end);
int count = frequencyMap.getOrDefault(c, 0);
frequencyMap.put(c, count + 1);
/*
* Check if the length of the subarray equals the desired length. The desired
* length is the number of unique characters we've seen so far (size of the map)
* multiplied by k (the number of times each character should occur). If the
* length is as per requirements, check if each element occurs exactly k times
*/
if (frequencyMap.size() * k == (end - start + 1)) {
if (check(frequencyMap, k)) {
finalCount++;
}
}
}
}
return finalCount;
}
/**
* Returns true if every value in the map is equal to k
*
* @param map The map whose values are to be checked
* @param k The required value for the values in the map
* @return true if every value in the map is equal to k
*/
public static boolean check(Map<Character, Integer> map, int k) {
/*
* Iterate through all the values (frequency of each character), comparing them
* with k
*/
for (Integer i : map.values()) {
if (i != k) {
return false;
}
}
return true;
}
}
For a given value k and a string s of length n with alphabet size D, only O(n*D) windows need to be examined.
We need to find sub-strings in which each character occurs exactly k times.
Minimum size of such a sub-string = k (when only one distinct character is present)
Maximum size of such a sub-string = k*D (when all D characters are present)
Every valid size is a multiple of k, so we slide a window of each size k, 2k, ..., k*D over the string. The check of each window below costs O(D), which gives O(n*D^2) overall for this code.
from collections import defaultdict

ALPHABET_SIZE = 26


def check(count, k):
    for v in count.values():
        if v != k and v != 0:
            return False
    return True


def countSubstrings(s, k):
    total = 0
    for d in range(1, ALPHABET_SIZE + 1):
        size = d * k
        count = defaultdict(int)
        l = r = 0
        while r < len(s):
            count[s[r]] += 1
            # if window size exceed `size`, then fix left pointer and count
            if r - l + 1 > size:
                count[s[l]] -= 1
                l += 1
            # if window size is adequate then check and update count
            if r - l + 1 == size:
                total += check(count, k)
            r += 1
    return total


def main():
    string1 = "aabbcc"
    k1 = 2
    print(countSubstrings(string1, k1))  # output: 6
    string2 = "bacabcc"
    k2 = 2
    print(countSubstrings(string2, k2))  # output: 2


main()
I can't give you an O(n) solution but I can give you an O(k*n) solution (better than the O(n^2) mentioned on the geeksforgeeks page).
The idea is that there are at most 26 distinct elements. So, we don't have to check all the substrings; we just have to check substrings with length <= 26*k (length 26*k is the case where all elements occur k times; if the length is more than that, at least one element has to occur at least k+1 times). Also, we need to check only those substrings whose lengths are a multiple of k.
So, check all 26*k*l possible substrings! (assuming k<<l). Thus, the solution is O(k*n) but with a somewhat high constant (26).
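One way to make each window check constant-time is to track how many characters currently occur exactly k times. The sketch below is an illustration under assumptions, not the method of any answer here: it assumes k >= 1, the names count_perfect_substrings and exact are made up, and alphabet_size is a parameter I introduced.

```python
from collections import defaultdict


def count_perfect_substrings(s, k, alphabet_size=26):
    """Count substrings in which every character present occurs exactly k times."""
    total = 0
    n = len(s)
    for d in range(1, alphabet_size + 1):
        size = d * k            # window length for d distinct characters
        if size > n:
            break
        count = defaultdict(int)
        exact = 0               # characters whose count in the window is exactly k
        for r, ch in enumerate(s):
            # Add s[r] to the window.
            if count[ch] == k:
                exact -= 1
            count[ch] += 1
            if count[ch] == k:
                exact += 1
            # Drop the character that slid out on the left.
            if r >= size:
                old = s[r - size]
                if count[old] == k:
                    exact -= 1
                count[old] -= 1
                if count[old] == k:
                    exact += 1
            # d characters with count k fill all d*k positions.
            if r >= size - 1 and exact == d:
                total += 1
    return total


print(count_perfect_substrings("aabbcc", 2))
print(count_perfect_substrings("bacabcc", 2))
```

A window of length d*k is valid exactly when d characters have count k, because those d characters then account for every position in the window. With this bookkeeping the work is O(26 * n) for a 26-letter alphabet.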
There are a few observations which will help optimize the solution:
Notice that you don't need to check substrings of every possible size; you just need to check substrings of size k, 2k, 3k and so on up to ALPHABET_SIZE * k (remember the pigeonhole principle).
You can pre-calculate the frequency of each letter up to every index from one end, and later use it to find the frequency of each letter between any two indexes in O(26).
C++ implementation of your problem in O(n * ALPHABET_SIZE^2)
I have added comments and diagrams to help you understand the code quickly.
[diagram 1] [diagram 2]
#include <bits/stdc++.h>
#define ll long long
#define ALPHABET_SIZE 26
using namespace std;
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
cout.tie(NULL);
int n, k;
string s;
cin >> n >> k;
cin >> s;
ll cnt = 0;
/**
* It will be storing frequency of each alphabets
**/
vector<int> f(ALPHABET_SIZE, 0);
/**
* It will store alphabets frequency till that index
**/
vector<vector<int>> v;
v.push_back(f);
/**
* Scan array from left to right and calculate the frequency of each alphabets till that index
* Now push that frequency array in v
* This loop will run for n times
**/
for (int i = 1; i <= n; i++)
{
f[s[i - 1] - 'a']++;
v.push_back(f);
}
/**
* This loop will run for k times
**/
for (int i = 0; i < k; i++)
{
/**
* start is the lower bound (left end from where window will start sliding)
**/
int start = i;
/**
* end is the upper bound (right end till where window will be sliding)
**/
int end = (n / k) * k + i;
if (end > n)
{
end -= k;
}
/**
* This loop will run for n/k times
**/
for (int j = start; j <= end; j += k)
{
/**
* This is a ALPHABET_SIZE * k size window
* It will be sliding between start and end (inclusive)
* This loop will run for at most ALPHABET_SIZE times
**/
for (int d = j + k; d <= min(ALPHABET_SIZE * k + j, end); d += k)
{
/**
* A flag to check whether substring is valid or not
**/
bool flag = true;
/**
* Check if frequencies at two different indexes differ only by zero or k (element wise)
* Note that frequencies at two different index can't be same
* This loop will run for ALPHABET_SIZE times
**/
for (int idx = 0; idx < ALPHABET_SIZE; idx++)
{
if (abs(v[j][idx] - v[d][idx]) != k && abs(v[j][idx] - v[d][idx]) != 0)
{
flag = false;
}
}
/**
* Increase the total count if flag is true
**/
if (flag)
{
cnt++;
}
}
}
}
/**
* Print the total count
**/
cout << cnt;
return 0;
}
If you want a simple solution and are not worried about time complexity, here it is.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

public class PerfecSubstring {
    public static void main(String[] args) {
        String st = "aabbcc";
        int k = 2;
        System.out.println(perfect(st, k));
    }

    public static int perfect(String st, int k) {
        int count = 0;
        for (int i = 0; i < st.length(); i++) {
            for (int j = st.length(); j > i; j--) {
                String sub = st.substring(i, j);
                // a substring of length exactly k (e.g. "aa" for k = 2) can be perfect too
                if (sub.length() >= k && check(sub, k)) {
                    System.out.println(sub);
                    count++;
                }
            }
        }
        return count;
    }

    // true if every character of st occurs exactly k times
    public static boolean check(String st, int k) {
        Map<Character, Integer> map = new HashMap<>();
        for (int i = 0; i < st.length(); i++) {
            Character c = st.charAt(i);
            map.put(c, map.getOrDefault(c, 0) + 1);
        }
        return map.values().iterator().next() == k && new HashSet<>(map.values()).size() == 1;
    }
}
Here is an answer I did in C#. It enumerates all O(n^2) substrings and recounts the characters of each one, so the overall complexity is O(n^3). I probably should have used a helper method to avoid having a large chunk of code, but it does the job. :)
namespace CodingChallenges
{
using System;
using System.Collections.Generic;
class Solution
{
// Returns the number of perfect substrings of repeating character value 'num'.
public static int PerfectSubstring(string str, int num)
{
int count = 0;
for (int startOfSliceIndex = 0; startOfSliceIndex < str.Length - 1; startOfSliceIndex++)
{
for (int endofSliceIndex = startOfSliceIndex + 1; endofSliceIndex < str.Length; endofSliceIndex++)
{
Dictionary<char, int> dict = new Dictionary<char, int>();
string slice = str.Substring(startOfSliceIndex, (endofSliceIndex - startOfSliceIndex) + 1);
for (int i = 0; i < slice.Length; i++)
{
if (dict.ContainsKey(slice[i]))
{
dict[slice[i]]++;
}
else
{
dict[slice[i]] = 1;
}
}
bool isPerfect = true;
foreach (var entry in dict)
{
if (entry.Value != num)
{
isPerfect = false;
}
}
if (isPerfect)
{
Console.WriteLine(slice);
count++;
}
}
}
if (count == 1)
{
Console.WriteLine(count + " perfect substring.");
}
else
{
Console.WriteLine(count + " perfect substrings.");
}
return count;
}
public static void Main(string[] args)
{
string test = "1102021222";
PerfectSubstring(test, 2);
}
}
}
This solution works in O(n*D)
I think it can be upgraded to be O(n) by replacing the hash_map(frozenset(head_sum_mod_k.items())) with a map implementation that updates its hash rather than recalculating it -
this can be done because only one entry of head_sum_mod_k is changed per iteration.
from copy import deepcopy


def countKPerfectSequences(string: str, k):
    print(f'Processing \'{string}\', k={k}')
    # init running sum
    head_sum = {char: 0 for char in string}
    tail_sum = deepcopy(head_sum)
    tail_position = 0
    # to match both 0 & k sequence lengths, test for mod k == 0
    head_sum_mod_k = deepcopy(head_sum)
    occurrence_positions = {frozenset(head_sum_mod_k.items()): [0]}
    # iterate over string
    perfect_counter = 0
    for i, val in enumerate(string):
        head_sum[val] += 1
        head_sum_mod_k[val] = head_sum[val] % k
        while head_sum[val] - tail_sum[val] > k:
            # update tail to avoid longer than k sequences
            tail_sum[string[tail_position]] += 1
            tail_position += 1
        # print(f'str[{tail_position}..{i}]=\'{string[tail_position:i+1]}\', head_sum_mod_k={head_sum_mod_k} occurrence_positions={occurrence_positions}')
        # get matching sequences between head and tail
        indices = list(filter(lambda i: i >= tail_position, occurrence_positions.get(frozenset(head_sum_mod_k.items()), [])))
        # for start in indices:
        #     print(f'{string[start:i+1]}')
        perfect_counter += len(indices)
        # add head
        indices.append(i+1)
        occurrence_positions[frozenset(head_sum_mod_k.items())] = indices
    return perfect_counter

Find the index of a specific combination without generating all ncr combinations

I am trying to find the index of a specific combination without generating the actual list of all possible combinations. For example, the 2-number combinations from 1 to 5 are 1,2; 1,3; 1,4; 1,5; 2,3; 2,4; 2,5; and so on. Each combination has its own index, starting with zero, if my guess is right. I want to find that index for a given combination without generating all possible combinations. I am writing in C#, but my code generates all possible combinations on the fly. This would be expensive if n and r are like 80 and 9, and I can't even enumerate the actual range. Is there any possible way to find the index without producing every combination up to that particular index?
public int GetIndex(T[] combination)
{
int index = (from i in Enumerable.Range(0, 9)
where AreEquivalentArray(GetCombination(i), combination)
select i).SingleOrDefault();
return index;
}
I found the answer to my own question. It is very simple but seems to be effective in my situation. The Choose method, which returns the number of combinations of n items chosen r at a time, is taken from another site:
public long GetIndex(T[] combinations)
{
long sum = Choose(items.Count(),atATime);
for (int i = 0; i < combinations.Count(); i++)
{
sum = sum - Choose(items.ToList().IndexOf(items.Max())+1 - (items.ToList().IndexOf(combinations[i])+1), atATime - i);
}
return sum-1;
}
private long Choose(int n, int k)
{
long result = 0;
int delta;
int max;
if (n < 0 || k < 0)
{
throw new ArgumentOutOfRangeException("Invalid negative parameter in Choose()");
}
if (n < k)
{
result = 0;
}
else if (n == k)
{
result = 1;
}
else
{
if (k < n - k)
{
delta = n - k;
max = k;
}
else
{
delta = k;
max = n - k;
}
result = delta + 1;
for (int i = 2; i <= max; i++)
{
checked
{
result = (result * (delta + i)) / i;
}
}
}
return result;
}
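The same counting idea can be sketched in Python for combinations of the numbers 1..n listed in lexicographic order. This is a minimal sketch under assumptions: the function name combination_index is made up, the combination is assumed to be sorted with values between 1 and n, and math.comb needs Python 3.8 or later.

```python
from itertools import combinations
from math import comb


def combination_index(combo, n):
    """Zero-based lexicographic rank of a sorted r-combination of 1..n."""
    r = len(combo)
    index = comb(n, r) - 1
    for i, c in enumerate(combo):
        # Combinations that agree on the first i elements but have a
        # larger element at position i all come later in the order.
        index -= comb(n - c, r - i)
    return index


# Cross-check against explicit enumeration for a small case.
for expected, combo in enumerate(combinations(range(1, 6), 2)):
    assert combination_index(combo, 5) == expected

# A large case that could not be enumerated.
print(combination_index((1, 2, 3, 4, 5, 6, 7, 8, 10), 80))
```

Starting from the index of the very last combination and subtracting everything that sorts after the given one needs only r binomial coefficients, so n = 80 and r = 9 is not a problem.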

Longest Common Prefix property

I was going through suffix array and its use to compute longest common prefix of two suffixes.
The source says:
"The lcp between two suffixes is the minimum of the lcp's of all pairs of adjacent suffixes between them on the array"
i.e. lcp(x,y)=min{ lcp(x,x+1),lcp(x+1,x+2),.....,lcp(y-1,y) }
where x and y are two index of the string from where the two suffix of the string starts.
I am not convinced with the statement as in example of string "abca".
lcp(1,4)=1 (considering 1 based indexing)
but if I apply the above equation then
lcp(1,4)=min{lcp(1,2),lcp(2,3),lcp(3,4)}
and I think lcp(1,2)=0.
so the answer must be 0 according to the equation.
Am I getting it wrong somewhere?
I think the index referred to by the source is not an index into the string itself, but the index (rank) among the sorted suffixes:
a
abca
bca
ca
Hence
lcp(1,2) = lcp(a, abca) = 1
lcp(1,4) = min(lcp(1,2), lcp(2,3), lcp(3,4)) = 0
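The property can be checked on small strings with a naive Python sketch. The function names below are made up for illustration, the suffix array is built by plain sorting (fine for short strings only), and ranks are 0-based here, so lcp(1,2) above corresponds to ranks 0 and 1.

```python
def suffix_array(s):
    """Start positions of the suffixes of s in sorted order (naive build)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def adjacent_lcp(s, sa):
    """height[r] = LCP of the suffixes at ranks r-1 and r; height[0] = 0."""
    return [0] + [common_prefix_len(s[sa[r - 1]:], s[sa[r]:])
                  for r in range(1, len(sa))]


def lcp_by_rank(height, i, j):
    """LCP of the suffixes at two different ranks i and j."""
    if i > j:
        i, j = j, i
    return min(height[i + 1:j + 1])


s = "abca"
sa = suffix_array(s)
height = adjacent_lcp(s, sa)
print(sa, height)
print(lcp_by_rank(height, 0, 1), lcp_by_rank(height, 0, 3))
```

The minimum is taken over ranks in the sorted order, which is why using string positions in the question gave a contradiction.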
You can't find the LCP of two suffixes by taking that minimum over string positions; i and j must be positions in the suffix array (ranks of the sorted suffixes). With that indexing, the minimum over the adjacent pairs is a range minimum query:
LCP(suffix i, suffix j) = LCP[RMQ(i + 1, j)]
Note that i < j is required, because the queried range i+1..j must not be empty; swap the arguments if i > j.
RMQ is Range Minimum Query; here it returns the index of the smallest adjacent-pair LCP in the range.
See page 3 of this paper.
Details:
Step 1:
First Calculate LCP of Adjacents /consecutive Suffix Pairs .
n= Length of string.
suffixArray[] is Suffix array.
void calculateadjacentsuffixes(int n)
{
for (int i=0; i<n; ++i) Rank[suffixArray[i]] = i;
Height[0] = 0;
for (int i=0, h=0; i<n; ++i)
{
if (Rank[i] > 0)
{
int j = suffixArray[Rank[i]-1];
while (i + h < n && j + h < n && str[i+h] == str[j+h])
{
h++;
}
Height[Rank[i]] = h;
if (h > 0) h--;
}
}
}
Note: Height[i] = LCP of (suffix i-1, suffix i), where i is a rank in the suffix array, i.e. the Height array contains the LCP of each pair of adjacent suffixes.
Step 2:
Calculate LCP of Any two suffixes i,j using RMQ concept.
RMQ pre-compute function:
void preprocesses(int N)
{
int i, j;
//initialize M for the intervals with length 1
for (i = 0; i < N; i++)
M[i][0] = i;
//compute values from smaller to bigger intervals
for (j = 1; 1 << j <= N; j++)
{
for (i = 0; i + (1 << j) - 1 < N; i++)
{
if (Height[M[i][j - 1]] < Height[M[i + (1 << (j - 1))][j - 1]])
{
M[i][j] = M[i][j - 1];
}
else
{
M[i][j] = M[i + (1 << (j - 1))][j - 1];
}
}
}
}
Step 3: Calculate LCP between any two Suffixes i,j
int LCP(int i,int j)
{
/* Make sure that i < j always holds.
If the caller passes LCP(5,4), it is converted to LCP(4,5).
*/
if(i>j)
swap(i,j);
/* ordering check done */
if(i==j)
{
return (Length_of_str-suffixArray[i]);
}
else
{
return Height[RMQ(i+1,j)];
//LCP(suffix i,suffix j)=LCPadj[RMQ(i + 1; j)]
//LCPadj=LCP of adjacent suffix =Height.
}
}
Where RMQ function is:
int RMQ(int i,int j)
{
int k=log((double)(j-i+1))/log((double)2);
int vv= j-(1<<k)+1 ;
if(Height[M[i][k]]<=Height[ M[vv][ k] ])
return M[i][k];
else
return M[ vv ][ k];
}
Refer Topcoder tutorials for RMQ.
You can check the complete implementation in C++ at my blog.

Generate all compositions of an integer into k parts

I can't figure out how to generate all compositions (http://en.wikipedia.org/wiki/Composition_%28number_theory%29) of an integer N into K parts, but only doing it one at a time. That is, I need a function that given the previous composition generated, returns the next one in the sequence. The reason is that memory is limited for my application. This would be much easier if I could use Python and its generator functionality, but I'm stuck with C++.
This is similar to Next Composition of n into k parts - does anyone have a working algorithm?
Any assistance would be greatly appreciated.
Preliminary remarks
First, start from the observation that [1,1,...,1,n-k+1] is the first composition (in lexicographic order) of n into k parts, and [n-k+1,1,1,...,1] is the last one.
Now consider an example: the composition [2,4,3,1,1], where n = 11 and k = 5. Which is the next one in lexicographic order? The rightmost part that can be incremented is 4, because [3,1,1] is the last composition of 5 into 3 parts.
4 is immediately to the left of 3, the rightmost part different from 1.
So turn 4 into 5, and replace [3,1,1] by [1,1,2], the first composition of the remainder (3+1+1)-1 = 4 into 3 parts, giving [2,5,1,1,2].
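The step described above can be sketched in a few lines of Python. This is only an illustration of the rule, assuming n >= k >= 1; the function names are made up, and all_compositions collects everything into a list purely for demonstration, which a memory-limited application would not do.

```python
def first_composition(n, k):
    """Lexicographically first composition of n into k positive parts."""
    return [1] * (k - 1) + [n - k + 1]


def next_composition(comp):
    """Advance comp in place to its successor; return False if it was the last."""
    k = len(comp)
    last = k - 1
    # Find the rightmost part different from 1.
    while last >= 0 and comp[last] == 1:
        last -= 1
    if last <= 0:
        return False  # only the first part may exceed 1: last composition
    z = comp[last]
    comp[last - 1] += 1   # increment the part to its left
    comp[last] = 1        # reset this part ...
    comp[k - 1] = z - 1   # ... and put the remainder at the end
    return True


def all_compositions(n, k):
    """Demonstration helper: drive next_composition until it is exhausted."""
    comp = first_composition(n, k)
    result = [list(comp)]
    while next_composition(comp):
        result.append(list(comp))
    return result


example = [2, 4, 3, 1, 1]
next_composition(example)
print(example)
print(all_compositions(5, 3))
```

Only the current composition is needed to compute the next one, so the caller can keep a single array of k integers between calls.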
Generation program (in C)
The following C program shows how to compute such compositions on demand in lexicographic order
#include <stdio.h>
#include <stdbool.h>
bool get_first_composition(int n, int k, int composition[k])
{
if (n < k) {
return false;
}
for (int i = 0; i < k - 1; i++) {
composition[i] = 1;
}
composition[k - 1] = n - k + 1;
return true;
}
bool get_next_composition(int n, int k, int composition[k])
{
if (composition[0] == n - k + 1) {
return false;
}
// there's an i with composition[i] > 1, and it is not 0.
// find the last one
int last = k - 1;
while (composition[last] == 1) {
last--;
}
// turn a b ... y z 1 1 ... 1
// ^ last
// into a b ... (y+1) 1 1 1 ... (z-1)
// be careful, there may be no 1's at the end
int z = composition[last];
composition[last - 1] += 1;
composition[last] = 1;
composition[k - 1] = z - 1;
return true;
}
void display_composition(int k, int composition[k])
{
char *separator = "[";
for (int i = 0; i < k; i++) {
printf("%s%d", separator, composition[i]);
separator = ",";
}
printf("]\n");
}
void display_all_compositions(int n, int k)
{
int composition[k]; // VLA. Please don't use silly values for k
for (bool exists = get_first_composition(n, k, composition);
exists;
exists = get_next_composition(n, k, composition)) {
display_composition(k, composition);
}
}
int main()
{
display_all_compositions(5, 3);
}
Results
[1,1,3]
[1,2,2]
[1,3,1]
[2,1,2]
[2,2,1]
[3,1,1]
Weak compositions
A similar algorithm works for weak compositions (where 0 is allowed).
bool get_first_weak_composition(int n, int k, int composition[k])
{
if (n < 0 || k < 1) { // parts may be 0, so n < k is allowed here
return false;
}
for (int i = 0; i < k - 1; i++) {
composition[i] = 0;
}
composition[k - 1] = n;
return true;
}
bool get_next_weak_composition(int n, int k, int composition[k])
{
if (composition[0] == n) {
return false;
}
// there's an i with composition[i] > 0, and it is not 0.
// find the last one
int last = k - 1;
while (composition[last] == 0) {
last--;
}
// turn a b ... y z 0 0 ... 0
// ^ last
// into a b ... (y+1) 0 0 0 ... (z-1)
// be careful, there may be no 0's at the end
int z = composition[last];
composition[last - 1] += 1;
composition[last] = 0;
composition[k - 1] = z - 1;
return true;
}
Results for n=5 k=3
[0,0,5]
[0,1,4]
[0,2,3]
[0,3,2]
[0,4,1]
[0,5,0]
[1,0,4]
[1,1,3]
[1,2,2]
[1,3,1]
[1,4,0]
[2,0,3]
[2,1,2]
[2,2,1]
[2,3,0]
[3,0,2]
[3,1,1]
[3,2,0]
[4,0,1]
[4,1,0]
[5,0,0]
Similar algorithms can be written for compositions of n into k parts greater than some fixed value.
You could try something like this:
Start with the array [1,1,...,1,N-K+1] of (K-1) ones and 1 entry with the remainder. The next composition can be created by incrementing the (K-1)th element and decreasing the last element. Do this trick as long as the last element is bigger than the second to last.
When the last element becomes smaller, increment the (K-2)th element, set the (K-1)th element to the same value and set the last element to the remainder again. Repeat the process and apply the same principle for the other elements when necessary.
You end up with a constantly sorted array that avoids duplicate compositions.
