Sublists of a list with positive sum - array-algorithms

I'm trying to find the sublist of a list (with at least one positive integer) that has the following 2 properties:
1. the sum of its elements is positive
2. it has the maximum length among all sublists with positive sum
I'm only interested in the length of that sublist. Kadane's algorithm finds the sublist with maximum sum in O(n) time. Is there an algorithm that can do the same here in O(n)? I've found a solution, but it computes all the sublists and is of course very slow.
thank you for your time

Calculate the sum of all the numbers; call it s.
If s > 0, return the full list as the answer.
Otherwise,
keep trimming the smaller of the two end elements and subtracting it from the sum until the sum turns positive.
Return what is left as the result.
Each step removes one element, so this takes O(n) time.
Hope it helps

A possible solution is here. You can use counting sort to sort the array.
After that, starting from the maximum element, keep a running sum and check whether adding the next element
keeps the sum positive; if it does, add the element, increment count, and move ahead. This might have bugs for some inputs, I mean it may not work for all test cases (note that sorting ignores the original order, so it counts elements of any subset rather than of a contiguous sublist),
but it is just an idea that may help you, as some improvement to it could give you your desired output.
At the end of one traversal, the count variable will give you the result.
example:
array=[12,10,8,5,4,-2,-3,-20,-30] //considered already sorted now
i=0 sum=12 count=1
i=1 sum=22 count=2
i=2 sum=30 count=3
i=3 sum=35 count=4
i=4 sum=39 count=5
i=5 sum=37 count=6
i=6 sum=34 count=7
i=7 sum=14 count=8
i=8 sum=14 count=8 // -30 can't be added now: the sum would drop to -16
So here count=8 says that a sub array of maximum length 8 can give you a positive sum.

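OK, you have almost answered it already. Modify Kadane's to track the length of the subsequence rather than its sum. Here is Kadane's, adapted from the Wikipedia version:
#include <cstddef>
#include <vector>

// Kadane-style scan that records the longest run with a positive running sum.
// Caveat: the scan restarts whenever the running sum drops below zero, so it
// only sees the runs Kadane keeps and can miss a longer sublist,
// e.g. {1, -2, 3} (the whole list sums to 2, but this returns 1).
std::size_t sequence(const std::vector<int>& numbers)
{
    // These variables track the position of the subarray
    std::size_t begin = 0;
    std::size_t begin_temp = 0;
    std::size_t end = 0;
    std::size_t max_so_far_len = 0;
    long long max_ending_here = 0;
    // Find sequence by looping through
    for (std::size_t i = 0; i < numbers.size(); i++)
    {
        // calculate max_ending_here
        if (max_ending_here < 0)
        {
            max_ending_here = numbers[i];
            begin_temp = i;
        }
        else
        {
            max_ending_here += numbers[i];
        }
        // length of the run that ends at i
        std::size_t len_temp = i - begin_temp + 1;
        // calculate max_so_far_len (only runs with a positive sum count)
        if (max_ending_here > 0 && len_temp >= max_so_far_len)
        {
            max_so_far_len = len_temp;
            begin = begin_temp;
            end = i;
        }
    }
    (void)begin; // begin and end are kept for callers who want the position
    (void)end;
    return max_so_far_len;
}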

Since you consider sub-lists to be consecutive elements, I guess a slight modification of Kadane's algorithm will work for you.
Just introduce a variable named max_length_till_now, and update it whenever you find a sub-list with a length greater than its present value.
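For comparison, here is a minimal sketch of a different O(n) approach based on prefix sums, which is not taken from any of the answers above. It relies on the observation that a sublist from position i to j-1 has positive sum exactly when prefix[j] > prefix[i], so the task becomes finding the widest such pair. The function name longest_positive_sum_length is made up for this illustration.

```python
def longest_positive_sum_length(values):
    """Length of the longest contiguous sublist with a positive sum (0 if none)."""
    # prefix[k] is the sum of the first k elements, so prefix[0] == 0.
    prefix = [0]
    for v in values:
        prefix.append(prefix[-1] + v)

    # Only indices where the prefix sum reaches a new strict minimum can be
    # the best left end, so collect them from left to right.
    candidates = []
    for i, p in enumerate(prefix):
        if not candidates or p < prefix[candidates[-1]]:
            candidates.append(i)

    # Scan right ends from the far right; every candidate is popped at most
    # once, which keeps the total work linear.
    best = 0
    for j in range(len(prefix) - 1, -1, -1):
        while candidates and prefix[j] > prefix[candidates[-1]]:
            best = max(best, j - candidates.pop())
    return best


print(longest_positive_sum_length([-1, -10, 5, -3]))  # 2, the sublist [5, -3]
print(longest_positive_sum_length([1, -2, 3]))        # 3, the whole list
```

The input [-1, -10, 5, -3] is a useful check for the greedy ideas: trimming the smaller end element first would drop -3 and end up with a length of 1.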

Related

Subset String Array based on length

I have a vector with > 30000 words. I want to create a subset of this vector which contains only those words whose length is greater than 5. What is the best way to achieve this?
Basically df contains multiple sentences.
So,
wordlist = df2;
wordlist = [strip(w) for w in wordlist];
Now, I need to subset wordlist so that it contains only those words whose length is greater than 5.
sub(A,find(x->length(x)>5,A)) # => creates a view (most efficient way to make a subset)
EDIT: getindex() returns a copy of desired elements
getindex(A,find(x->length(x)>5,A)) # => makes a copy
You can use filter
wordlist = filter(x->islenatleast(x,6),wordlist)
and combine it with a fast condition such as islenatleast defined as:
function islenatleast(s,l)
if sizeof(s)<l return false end
# assumes each char takes at least a byte
l==0 && return true
p=1
i=0
while i<l
if p>sizeof(s) return false end
p = nextind(s,p)
i += 1
end
return true
end
According to my timings islenatleast is faster than calculating the whole length (in some conditions). Additionally, this shows the strength of Julia, by defining a primitive competitive with the core function length.
But doing:
wordlist = filter(x->length(x)>5,wordlist)
will also do.

Max. Product Dynamic Programming

So I was trying to solve the Max. Product Question and came up with the following recursion :
maxProd(n) = max of [k*(n-k),k*maxProd(n-k),maxProd(k)*(n-k),maxProd(k)*maxProd(n-k)]
However in the second solution given on that link they have skipped the maxProd(k)*maxProd(n-k).
int maxProd(int n)
{
// Base cases
if (n == 0 || n == 1) return 0;
// Make a cut at different places and take the maximum of all
int max_val = 0;
for (int i = 1; i < n; i++)
max_val = max(max_val, i*(n-i), maxProd(n-i)*i);
// Return the maximum of all values
return max_val;
}
Is that still right? If so, how? Wouldn't it give wrong answers when the only way to get Max. Product is recursively split both k and n-k?
The formula you wrote here will also work. But they have a smaller one.
Note that you can get any solution from the original formula, since it checks all possible ways to choose the first cut. So if the first cut has to be i, then i will be checked, and recursively continue to the other parts.
If you use memoization, you'll get the same runtime for both formulas.
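To check the claim that both recurrences agree, here is a small memoized sketch of each. The names max_prod_short and max_prod_full are made up for this illustration; the first mirrors the shorter formula from the linked solution, the second the four-term formula from the question.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def max_prod_short(n):
    """Shorter recurrence: only the right-hand piece is split further."""
    if n < 2:
        return 0
    return max(max(i * (n - i), i * max_prod_short(n - i)) for i in range(1, n))


@lru_cache(maxsize=None)
def max_prod_full(n):
    """Four-term recurrence: either piece may be split further."""
    if n < 2:
        return 0
    best = 0
    for k in range(1, n):
        left, right = max_prod_full(k), max_prod_full(n - k)
        best = max(best, k * (n - k), k * right, left * (n - k), left * right)
    return best


# Both formulas give the same value for every n tried here.
print(all(max_prod_short(n) == max_prod_full(n) for n in range(2, 60)))  # True
print(max_prod_short(10))  # 36, from 3 * 3 * 4
```

The shorter version still covers every way of cutting, because any set of pieces has some first piece i, and the remaining pieces are handled by the recursive call on n - i.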

How do you generate all permutations of n variables, each one taking on 1 or 0?

For example, let's say I want to generate all permutations of two values, where each one can be 0 or 1. I would get:
[11,10,01,00]
Note here the first variable varies the slowest, so it stays fixed while the remaining one varies.
In the case of three variables, I would get
[111,110,101,100,011,010,001,000]
I see that there should be a recursive definition for it, but it's not clear enough in my head so that I could express it.
This is not about permutations, but about combinations and you can generate them easily in Haskell:
replicateM 3 "01"
= ["000","001","010","011","100","101","110","111"]
If you need actual integers:
replicateM 3 [0, 1]
= [[0,0,0],[0,0,1],[0,1,0],[0,1,1],
[1,0,0],[1,0,1],[1,1,0],[1,1,1]]
Finally if the values at the various positions are different:
sequence [".x", ".X", "-+"]
= ["..-","..+",".X-",".X+","x.-","x.+","xX-","xX+"]
This too works for integers, of course:
sequence [[0,1], [0,2], [0,4]]
= [[0,0,0],[0,0,4],[0,2,0],[0,2,4],
[1,0,0],[1,0,4],[1,2,0],[1,2,4]]
If you want permutations, as in a list of lists, here's a solution using the list monad.
\n -> mapM (const [0, 1]) [1..n]
ghci> :m +Data.List
ghci> permutations [0,1]
[[0,1],[1,0]]
(Edited based on feedback)
The smallest n-digit binary integer is 000..0 (n times), which is 0.
The largest n-digit binary integer is 111...1 (n times), which is 2^n - 1.
Generate the integers from 0 to (1<<n) - 1 and print out the binary digits of each value.
Haskell's Int should be safe for <= 28 binary variables.
Hope that helps.
I don't know Haskell, but here is a block of pseudocode showing how I do permutations.
var positions = [0,0,0];
var max = 1;
done:
while(true){
positions[0]++; //Increment by one
for (var i = 0; i < positions.length; i++) {
if(positions[i] > max){ //If the current position is at max carry over
positions[i] = 0;
if(i+1 >= positions.length){ //no more positions left
break done;
}
positions[i+1]++;
}
}
}
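Since the question asks for a recursive definition, here is a minimal Python sketch of one, written to reproduce the order shown in the question (first variable slowest, 1 before 0). The name bit_strings is made up for this illustration; itertools.product from the standard library gives the same result without explicit recursion.

```python
from itertools import product


def bit_strings(n):
    """All strings of n binary digits, first digit varying slowest, 1 before 0."""
    if n == 0:
        return [""]  # exactly one way to choose zero digits
    rest = bit_strings(n - 1)
    # Fix the first digit, then append every combination of the remaining ones.
    return [first + tail for first in "10" for tail in rest]


print(bit_strings(2))  # ['11', '10', '01', '00']
print(bit_strings(3) == ["".join(p) for p in product("10", repeat=3)])  # True
```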

Finding Maximum Value In A Set

If I have a list of numbers where the numbers increase to a point and then decrease after that point, is there a finite number of guesses independent of the size of the set that I would have to make in order to find that maximum value?
The distance between the values is arbitrary and the number of values on the increasing side can be different than the number of values on the decreasing side.
What would be the best method? Check element 1, then the last element, then the half between? And repeat? Or something more sophisticated?
What would the processing time be for such an algorithm?
You could use a binary search that compares two neighboring elements instead of comparing one element to a fixed value. Start at a := 0, b := n, i := (a+b)/2 and compare element(i) to element(i+1). If e(i+1) > e(i), you know the maximum is somewhere after i, so set a := i. If e(i+1) < e(i), the opposite is true and you set b := i. Repeat until the range has shrunk to a single element.
The complexity would then be O(log n), so the number of guesses does grow with the size of the list, but only logarithmically. It would be slightly slower than a regular binary search because you need more comparisons.
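A minimal Python sketch of that neighbor-comparing binary search follows. It assumes a non-empty list whose values strictly increase up to the maximum and strictly decrease after it; the name peak_index is made up for this illustration.

```python
def peak_index(values):
    """Index of the maximum in a list that strictly rises, then strictly falls."""
    lo, hi = 0, len(values) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if values[mid + 1] > values[mid]:
            lo = mid + 1  # still climbing: the maximum is to the right of mid
        else:
            hi = mid      # already descending: the maximum is at mid or to its left
    return lo


print(peak_index([1, 3, 7, 12, 9, 4]))  # 3, the position of 12
```

Each iteration halves the range, so the loop runs about log2(n) times. Repeated neighboring values would break the comparison and need separate handling.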
You could try a recursive algorithm similar to ordered search, based on the number of elements in your list.
Pseudo code:
List search(int index, List listpart){
    if(listpart.length()==1){ // already found
        return listpart
    }
    else if(listpart(index+1)>listpart(index) && listpart(index-1)<listpart(index)){ // still increasing: search right part
        List listpart_tmp = getlistpart(listpart, index, listpart.length())
        return search(index+(listpart.length()/4), listpart_tmp)
    }
    else if(listpart(index+1)<listpart(index) && listpart(index-1)>listpart(index)){ // already decreasing: search left part
        List listpart_tmp = getlistpart(listpart, 0, index)
        return search(index-(listpart.length()/4), listpart_tmp)
    }
    else if(listpart(index+1)<listpart(index) && listpart(index-1)<listpart(index)){ // both neighbors are smaller: found
        return getlistpart(listpart, index, index)
    }
}
The function getlistpart returns a list consisting of the elements in the original list between the given indices.

Best algorithm for delete duplicates in array of strings

Today at school the teacher asked us to implement a duplicate-deletion algorithm. It's not that difficult, and everyone came up with the following solution (pseudocode):
for i from 1 to n - 1
for j from i + 1 to n
if v[i] == v[j] then remove(v, v[j]) // remove(from, what)
next j
next i
The computational complexity for this algo is n(n-1)/2. (We're in high school, and we haven't talked about big-O, but it seems to be O(n^2)). This solution appears ugly and, of course, slow, so I tried to code something faster:
procedure binarySearch(vector, element, *position)
// this procedure searches for element in vector, returning
// true if found, false otherwise. *position will contain the
// element's place (where it is or where it should be)
end procedure
----
// same type as v
vS = new array[n]
for i from 1 to n - 1
if binarySearch(vS, v[i], &p) = true then
remove(v, v[i])
else
add(vS, v[i], p) // adds v[i] in position p of array vS
end if
next i
This way vS will contain all the elements we've already passed. If element v[i] is in this array, then it is a duplicate and is removed. The computational complexity for the binary search is log(n) and for the main loop (second snippet) is n. Therefore the whole CC is n*log(n) if I'm not mistaken.
Then I had another idea about using a binary tree, but I can't put it down.
Basically my questions are:
Is my CC calculation right? (and, if not, why?)
Is there a faster method for this?
Thanks
The easiest solution will be to simply sort the array. That takes O(n log n) with a standard library sort, if you may use one; otherwise consider writing a simple randomized quicksort (the code is even on Wikipedia).
Afterwards, scan the array one more time.
During that scan, simply eliminate consecutive identical elements.
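The sort-then-scan idea can be sketched in Python as follows. The function name dedup_sorted is made up for this illustration, and note that the result comes back in sorted order, not in the original order.

```python
def dedup_sorted(items):
    """Remove duplicates by sorting, then skipping repeated neighbors."""
    result = []
    for item in sorted(items):  # O(n log n) for the sort
        # After sorting, equal elements are adjacent, so comparing with the
        # last kept element is enough to detect a duplicate.
        if not result or result[-1] != item:
            result.append(item)
    return result


print(dedup_sorted(["b", "a", "b", "c", "a"]))  # ['a', 'b', 'c']
```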
If you want to do it in O(n), you can also use a HashSet with elements you have already seen.
Just iterate once over your array, for each element check if it is in your HashSet.
If it isn't in there, add it.
If it is in there, remove it from the array.
Note that this will take some additional memory, and the hashing adds a constant factor to your runtime. Although the time complexity is better, the practical runtime will only be faster once you exceed a certain array size.
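A minimal Python sketch of the hash-set approach follows, using the built-in set in place of a HashSet. The name dedup_seen is made up for this illustration. Instead of removing elements from the array in place, which costs O(n) per removal, it builds a new list and so keeps the expected total at O(n).

```python
def dedup_seen(items):
    """Remove duplicates in one pass, keeping the first occurrence of each item."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # expected O(1) membership test
            seen.add(item)
            result.append(item)
    return result


print(dedup_seen(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']
```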
You can often use a space-time tradeoff and invest more space to reduce time.
In this case you could use a hash table to determine the unique words.
add is O(n), so your CC calculation is wrong. Your algorithm is O(n^2).
Moreover, how would remove be implemented? It also looks like it would be O(n) - so the initial algorithm would be O(n^3).
Binary search will only work if the array you're searching is sorted. I guess that's not the case here, or you wouldn't be looping over your entire array in the inner loop of the original solution.
If the order of the final solution is irrelevant, you could break the array into smaller arrays based on length of the strings, and then remove duplicates from those arrays. Example:
// You have
{"a", "ab", "b", "ab", "a", "c", "cd", "cd"},
// you break it into
{"a", "b", "a", "c"} and {"ab", "ab", "cd", "cd"},
// remove duplicates from those arrays using the merge method that others have mentioned,
// and then combine the arrays back together into
{"a", "b", "c", "ab", "cd"}
This is the shortest algorithm that worked for me, where arrNames and arrScores are parallel arrays and the highest score is kept.
I := 0;
J := 0;
//iCount being the length of the array
for I := 1 to iCount do
for J := I + 1 to iCount do
if arrNames[I] = arrNames[J] then
begin
if arrScores[I] <= arrScores[J] then
arrScores[I] := arrScores[J];
arrScores[J] := arrScores[iCount];
arrNames[J] := arrNames[iCount];
arrScores[iCount] := 0;
arrNames[iCount] := '';
Dec(iCount);
end;
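A hand-rolled hash table with linear probing, in Python, which keeps the first occurrence of each element:
def dedup(l):
    # ht: open-addressing table of (hash, index into et); et: unique elements in first-seen order
    ht, et = [(None, None) for _ in range(len(l))], []
    for e in l:
        h = hash(e)
        n = h % len(ht)
        while True:
            if ht[n][0] is None:  # empty slot: first time we see e, so store it
                et.append(e)
                ht[n] = h, len(et) - 1
            if ht[n][0] == h and et[ht[n][1]] == e:  # e is stored in this slot
                break
            if (n := n + 1) == len(ht):  # linear probing with wrap-around
                n = 0
    return et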

Resources