SSE instruction is giving error - visual-c++

I am using the following code to divide all int array elements with constant factor using SSE.
void sse_div(int *arr,int num_shift,int N) // divide all array elements by 2
{
num_shift=1;
int nb_iters = N / 4;
__declspec(align(32))int *a1=arr;
__m128i* l = (__m128i*)a1;
for (int i = 0; i < nb_iters; ++i, ++l)
_mm_store_si128( l, _mm_srai_epi32(*l,num_shift)); //Error line
}
But I am getting the following error
I am unable to get rid of this problem.
Can anybody please help to solve this problem.
Any help will be appreciated.
Thanks in Advance

Since your input array is apparently misaligned you can use unaligned loads/stores, e.g.:
void sse_div(int *arr, int N) // divide all array elements by 2
{
    // 4 ints per iteration; a tail of N % 4 elements is left untouched
    for (int i = 0; i + 4 <= N; i += 4)
    {
        __m128i v = _mm_loadu_si128((const __m128i *)&arr[i]);
        v = _mm_srai_epi32(v, 1);
        _mm_storeu_si128((__m128i *)&arr[i], v);
    }
}
Note that there may be a significant performance hit from using unaligned loads/stores (depending on what CPU you are running on), so if possible you should make your arr array 16 byte aligned when you allocate the memory.
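As a minimal sketch of the aligned variant mentioned above: the function name sse_shift_aligned is made up for illustration, and the caller is assumed to provide 16-byte aligned storage (for example with alignas(16) or _mm_malloc). Note that __declspec(align(32)) int *a1 = arr; only aligns the pointer variable itself, not the memory it points to. Also keep in mind that _mm_srai_epi32 rounds toward negative infinity, so for negative odd values the result differs from integer division by 2.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Hypothetical helper: arithmetic right shift by one bit on every element.
// Assumes arr is 16-byte aligned; the N % 4 tail is handled in scalar code.
void sse_shift_aligned(int *arr, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        __m128i *p = reinterpret_cast<__m128i *>(arr + i);
        _mm_store_si128(p, _mm_srai_epi32(_mm_load_si128(p), 1));
    }
    // Scalar tail. >> on a negative int is an arithmetic shift on mainstream
    // compilers (guaranteed since C++20), which matches _mm_srai_epi32.
    for (; i < n; ++i)
        arr[i] >>= 1;
}
```

A caller could declare the buffer as alignas(16) int data[6] and pass it with its length; if the alignment cannot be guaranteed, the unaligned version above is the safer choice.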

Related

garbage in loop for no reason

I wrote a function that receives a string as a char array and converts it to an int:
int makeNumFromString(char Str[])
{
int num = 0, len = 0;
int p;
len = strlen(Str);
for (p = 0; p<len; p++)
{
num = num * 10 + (Str[p] - 48);
}
return num;
}
The problem is that no matter how long the input string is, when "p" gets to 10 the value of "num" turns to garbage.
I tried debugging and testing the function outside of the larger code, but with no success.
What could be the problem, and how can I fix it?
Thanks
Perhaps your int can only store 32 bits, so the number cannot be higher than 2,147,483,647.
Try using a wider type for num, such as long long (long is itself only 32 bits on some platforms, for example with Visual C++).
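A rough sketch of that suggestion, with an explicit overflow check added so that the result never silently turns to garbage. The name makeNumFromStringChecked and the bool-plus-output-parameter interface are assumptions made for this example, not part of the original code.

```cpp
#include <climits>

// Hypothetical variant of makeNumFromString.
// Returns false if a non-digit is found or the value does not fit in long long;
// otherwise stores the parsed value in out and returns true.
bool makeNumFromStringChecked(const char *str, long long &out)
{
    long long num = 0;
    for (const char *p = str; *p != '\0'; ++p)
    {
        if (*p < '0' || *p > '9')
            return false;
        int digit = *p - '0';
        // num * 10 + digit would exceed LLONG_MAX
        if (num > (LLONG_MAX - digit) / 10)
            return false;
        num = num * 10 + digit;
    }
    out = num;
    return true;
}
```

With this version an 11-digit input such as "12345678901" parses correctly, while a string too long for 64 bits is rejected instead of wrapping around.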

Maximum element in array which is equal to product of two elements in array

We need to find the maximum element in an array which is also equal to product of two elements in the same array. For example [2,3,6,8] , here 6=2*3 so answer is 6.
My approach was to sort the array and then use a two-pointer method that checks, for each element, whether the product exists. This is an O(n log n) + O(n^2) = O(n^2) approach. Is there a faster way to do this?
There is a slightly better solution, O(n * sqrt(M)), if you are allowed to use O(M) memory, where M is the maximum number in the array.
Use an array of size M to mark every number while you traverse them from smaller to bigger number.
For each number try all its factors and see if those were already present in the array map.
Here is a pseudo code for that:
#define M 1000000
int array_map[M+2];
int ans = -1;
sort(A,A+n);
for(i=0;i<n;i++) {
for(j=1;j<=sqrt(A[i]);j++) {
int num1 = j;
if(A[i]%num1==0) {
int num2 = A[i]/num1;
if(array_map[num1] && array_map[num2]) {
if(num1==num2) {
if(array_map[num1]>=2) ans = A[i];
} else {
ans = A[i];
}
}
}
}
array_map[A[i]]++;
}
There is an even better approach if you know how to find all the factors of a number in O(log M): the whole thing then becomes O(n * log M). You have to use a sieve and backtracking for that.
@JerryGoyal's solution is correct. However, I think it can be optimized even further if, instead of using the B pointer, we use binary search to find the other factor of the product whenever arr[c] is divisible by arr[a]. Here's the modification to his code:
for(c=n-1;(c>1)&& (max==-1);c--){ // loop through C
for(a=0;(a<c-1)&&(max==-1);a++){ // loop through A
if(arr[c]%arr[a]==0) // If arr[c] is divisible by arr[a]
{
if(std::binary_search(arr+a+1, arr+c, arr[c]/arr[a])) // #include <algorithm>; searches the sorted range arr[a+1..c-1]
{
max = arr[c]; // if the other factor x of arr[c] is also in the array such that arr[c] = arr[a] * x
break;
}
}
}
}
I would have commented this on his solution, unfortunately I lack the reputation to do so.
Try this.
Written in c++
#include <vector>
#include <algorithm>
using namespace std;
int MaxElement(vector< int > Input)
{
sort(Input.begin(), Input.end());
int LargestElementOfInput = 0;
int i = 0;
while (i < Input.size() - 1)
{
if (LargestElementOfInput == Input[Input.size() - (i + 1)])
{
i++;
continue;
}
else
{
if (Input[i] != 0)
{
LargestElementOfInput = Input[Input.size() - (i + 1)];
int AllowedValue = LargestElementOfInput / Input[i];
int j = 0;
while (j < Input.size())
{
if (Input[j] > AllowedValue)
break;
else if (j == i)
{
j++;
continue;
}
else
{
int Product = Input[i] * Input[j++];
if (Product == LargestElementOfInput)
return Product;
}
}
}
i++;
}
}
return -1;
}
Once you have sorted the array, then you can use it to your advantage as below.
One improvement I can see - since you want to find the max element that meets the criteria,
1. Start from the rightmost element of the array (8).
2. Divide it by the first element of the array (8/2 = 4).
3. Now continue with the two-pointer approach as long as the element at the second pointer is less than the value from step 2, or until a match is found (i.e., while the second pointer's value is < 4, or until a match is found).
4. If a match is found, then you have the max element.
5. Else, continue the loop with the next highest element of the array (6).
Efficient solution:
2 3 8 6
Sort the array
keep 3 pointers C, B and A.
Keeping C at the last and A at 0 index and B at 1st index.
traverse the array using pointers A and B till C and check if A*B=C exists or not.
If it exists then C is your answer.
Else, Move C a position back and traverse again keeping A at 0 and B at 1st index.
Keep repeating this till you get the product or C reaches the 1st index.
Here's the complete solution:
int arr[] = new int[]{2, 3, 8, 6};
Arrays.sort(arr);
int n=arr.length;
int a,b,c,prod,max=-1;
for(c=n-1;(c>1)&& (max==-1);c--){ // loop through C
for(a=0;(a<c-1)&&(max==-1);a++){ // loop through A
for(b=a+1;b<c;b++){ // loop through B
prod=arr[a]*arr[b];
if(prod==arr[c]){
System.out.println("A: "+arr[a]+" B: "+arr[b]);
max=arr[c];
break;
}
if(prod>arr[c]){ // no need to go further
break;
}
}
}
}
System.out.println(max);
I came up with below solution where i am using one array list, and following one formula:
divisor(a or b) X quotient(b or a) = dividend(c)
1. Sort the array.
2. Put array into Collection Col.(ex. which has faster lookup, and maintains insertion order)
3. Have 2 pointer a,c.
4. keep c at last, and a at 0.
5. try to follow (divisor(a or b) X quotient(b or a) = dividend(c)).
6. Check if a is divisor of c, if yes then check for b in col.(a
7. If a is divisor and list has b, then c is the answer.
8. else increase a by 1, follow step 5, 6 till c-1.
9. if max not found then decrease c index, and follow the steps 4 and 5.
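The steps above could be sketched in C++ roughly as follows. This is only an illustration under some assumptions: all values are positive, the three elements must sit at three different positions, and a hash map of counts stands in for the "collection with faster lookup". The function name maxProductElement is made up. The worst case is still O(n^2), but the scan stops at the first (largest) match.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical helper. Returns the largest c with c = a * b, where a, b and c
// are elements at three different positions, or -1 if there is none.
// Assumes every value is positive.
int maxProductElement(std::vector<int> v)
{
    std::sort(v.begin(), v.end(), std::greater<int>());  // largest candidate first
    std::unordered_map<int, int> count;                  // value -> occurrences
    for (int x : v)
        ++count[x];

    for (std::size_t ci = 0; ci < v.size(); ++ci)
    {
        int c = v[ci];
        for (std::size_t ai = 0; ai < v.size(); ++ai)
        {
            if (ai == ci || c % v[ai] != 0)
                continue;                                // a must divide c
            int a = v[ai];
            int b = c / a;                               // required quotient
            auto it = count.find(b);
            if (it == count.end())
                continue;
            // b must come from a position not already used by a or c
            int used = (b == a ? 1 : 0) + (b == c ? 1 : 0);
            if (it->second > used)
                return c;
        }
    }
    return -1;
}
```

For the example from the question, maxProductElement({2, 3, 6, 8}) returns 6.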
Check this C# solution:
-Loop through each element,
-loop and multiply each element with other elements,
-verify if the product exists in the array and is the max
private static int GetGreatest(int[] input)
{
int max = 0;
int p = 0; //product of pairs
//loop through the input array
for (int i = 0; i < input.Length; i++)
{
for (int j = i + 1; j < input.Length; j++)
{
p = input[i] * input[j];
if (p > max && Array.IndexOf(input, p) != -1)
{
max = p;
}
}
}
return max;
}
Time complexity: O(n^3), since Array.IndexOf is a linear scan inside the two nested loops.

Algorithm for doing many substring reversals?

Suppose I have a string S of length N, and I want to perform M of the following operations:
choose 1 <= L,R <= N and reverse the substring S[L..R]
I am interested in what the final string looks like after all M operations. The obvious approach is to do the actual swapping, which leads to O(MN) worst-case behavior. Is there a faster way? I'm trying to just keep track of where an index ends up, but I cannot find a way to reduce the running time (though I have a gut feeling O(M lg N + N) -- for the operations and the final reading -- is possible).
Yeah, it's possible. Make a binary tree structure like
struct node {
struct node *child[2];
struct node *parent;
char label;
bool subtree_flipped;
};
Then you can have a logical getter/setter for left/right child:
struct node *get_child(struct node *u, bool right) {
return u->child[u->subtree_flipped ^ right];
}
void set_child(struct node *u, bool right, struct node *c) {
u->child[u->subtree_flipped ^ right] = c;
if (c != NULL) { c->parent = u; }
}
Rotations have to preserve flipped bits:
struct node *detach(struct node *u, bool right) {
struct node *c = get_child(u, right);
if (c != NULL) { c->subtree_flipped ^= u->subtree_flipped; }
return c;
}
void attach(struct node *u, bool right, struct node *c) {
set_child(u, right, c);
if (c != NULL) { c->subtree_flipped ^= u->subtree_flipped; }
}
// rotates one of |p|'s child up.
// does not fix up the pointer to |p|.
void rotate(struct node *p, bool right) {
struct node *u = detach(p, right);
struct node *c = detach(u, !right);
attach(p, right, c);
attach(u, !right, p);
}
Implement splay with rotations. It should take a "guard" pointer that is treated as a NULL parent for the purpose of splaying, so that you can splay one node to the root and another to its right child. Do this and then you can splay both endpoints of the flipped region and then toggle the flip bits for the root and the two subtrees corresponding to segments left unaffected.
Traversal looks like this.
void traverse(struct node *u, bool flipped) {
if (u == NULL) { return; }
flipped ^= u->subtree_flipped;
traverse(u->child[flipped], flipped);
visit(u);
traverse(u->child[!flipped], flipped);
}
A splay tree may help you; it supports range reversal on a sequence, with total complexity O(M log N).
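If a full splay tree feels like too much, the same lazy "flipped" flag also works on an implicit treap, which is a different structure from the one described above but gives the same expected O(M log N + N) behaviour. The sketch below is untested beyond small cases; all names (Treap, split, merge, reverseRange, applyReversals) are made up for illustration, and positions are 1-based as in the question.

```cpp
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

struct Treap {
    char ch;
    int prio;
    int cnt = 1;          // size of the subtree
    bool rev = false;     // lazy "this subtree is reversed" flag
    Treap *l = nullptr, *r = nullptr;
    explicit Treap(char c) : ch(c), prio(std::rand()) {}
};

int sz(Treap *t) { return t ? t->cnt : 0; }
void pull(Treap *t) { t->cnt = 1 + sz(t->l) + sz(t->r); }

// Push the pending reversal one level down.
void push(Treap *t) {
    if (t && t->rev) {
        std::swap(t->l, t->r);
        if (t->l) t->l->rev ^= true;
        if (t->r) t->r->rev ^= true;
        t->rev = false;
    }
}

// Split t into a (first k elements) and b (the rest).
void split(Treap *t, int k, Treap *&a, Treap *&b) {
    if (!t) { a = b = nullptr; return; }
    push(t);
    if (sz(t->l) < k) {
        split(t->r, k - sz(t->l) - 1, t->r, b);
        a = t;
    } else {
        split(t->l, k, a, t->l);
        b = t;
    }
    pull(t);
}

// Concatenate a and b, keeping the heap order on priorities.
Treap *merge(Treap *a, Treap *b) {
    if (!a) return b;
    if (!b) return a;
    if (a->prio > b->prio) {
        push(a);
        a->r = merge(a->r, b);
        pull(a);
        return a;
    }
    push(b);
    b->l = merge(a, b->l);
    pull(b);
    return b;
}

// Reverse positions l..r (1-based, inclusive) by toggling one flag.
Treap *reverseRange(Treap *t, int l, int r) {
    Treap *a = nullptr, *b = nullptr, *c = nullptr;
    split(t, r, b, c);       // b = [1..r], c = the rest
    split(b, l - 1, a, b);   // a = [1..l-1], b = [l..r]
    if (b) b->rev ^= true;
    return merge(merge(a, b), c);
}

// In-order traversal yields the final string.
void collect(Treap *t, std::string &out) {
    if (!t) return;
    push(t);
    collect(t->l, out);
    out.push_back(t->ch);
    collect(t->r, out);
}

std::string applyReversals(const std::string &s,
                           const std::vector<std::pair<int, int>> &ops) {
    std::vector<Treap> pool;
    pool.reserve(s.size());  // no reallocation, so node pointers stay valid
    Treap *root = nullptr;
    for (char c : s) {
        pool.emplace_back(c);
        root = merge(root, &pool.back());
    }
    for (const auto &op : ops)
        root = reverseRange(root, op.first, op.second);
    std::string out;
    collect(root, out);
    return out;
}
```

For example, applying the single operation (2, 5) to "abcdef" gives "aedcbf".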
@F. Ju is right, splay trees are one of the best data structures to achieve your goal.
However, if you don't want to implement them, or a solution in O((N + M) * sqrt(M)) is good enough, you can do the following:
We will perform sqrt(M) consecutive queries and then rebuild the array from scratch in O(N) time.
In order to do that, for each query, we will store the information whether the queried segment [a, b] is reversed or not (if you reverse some range of elements twice, they become unreversed).
The key here is to maintain the information for disjoint segments. Notice that since we are performing at most sqrt(M) queries before rebuilding the array, we will have at most sqrt(M) disjoint segments, and we can perform a query operation on sqrt(M) segments in sqrt(M) time. Let me know if you need a detailed explanation of how to "reverse" these disjoint segments.
This trick is very useful for solving problems like this, and it is worth knowing.
UPDATE:
I solved the problem exactly corresponding to yours on HackerRank, during their contest, using the method I described.
Here is the problem
Here is my solution in C++.
Here is the discussion about the problem and a brief description of my method, please check my 3rd message there.
"I'm trying to just keep track of where an index ends up"
If you're just trying to follow one entry of the starting array, it's easy to do that in O(M) time.
I was going to just write pseudocode, but no hand-waving was needed so I ended up with what's probably valid C++.
// untested C++, but it does compile to code that looks right.
struct swap {
int l, r;
// or make these non-member functions for C
bool covers(int pos) { return l <= pos && pos <= r; }
int apply_if_covering(int pos) {
// startpos - l = r - endpos;
// endpos = l - startpos + r
if(covers(pos))
pos = l - pos + r;
return pos;
}
};
int follow_swaps (int pos, int len, struct swap swaps[], int num_swaps)
{
// pos = starting position of the element we want to track
// return value = where it will be after all the swaps
for (int i = 0 ; i < num_swaps ; i++) {
pos = swaps[i].apply_if_covering(pos);
}
return pos;
}
This compiles to very efficient-looking code.

Given length and number of digits,we have to find minimum and maximum number that can be made?

As the question states, we are given a positive integer M and a non-negative integer S. We have to find the smallest and the largest of the numbers that have length M and sum of digits S.
Constraints:
(S>=0 and S<=900)
(M>=1 and M<=100)
I thought about it and came to the conclusion that it must be dynamic programming. However, I failed to build the DP state.
This is what I thought:
dp[i][j] = first 'i' digits having sum 'j'
And I tried to write the program. This is how it looks:
/*
*** PATIENCE ABOVE PERFECTION ***
"When in doubt, use brute force. :D"
-Founder of alloj.wordpress.com
*/
#include<bits/stdc++.h>
using namespace std;
#define pb push_back
#define mp make_pair
#define nline cout<<"\n"
#define fast ios_base::sync_with_stdio(false),cin.tie(0)
#define ull unsigned long long int
#define ll long long int
#define pii pair<int,int>
#define MAXX 100009
#define fr(a,b,i) for(int i=a;i<b;i++)
vector<int>G[MAXX];
int main()
{
int m,s;
cin>>m>>s;
int dp[m+1][s+1];
fr(1,m+1,i)
fr(1,s+1,j)
fr(0,10,k)
dp[i][j]=min(dp[i-1][j-k]+k,dp[i][j]); //Tried for Minimum
cout<<dp[m][s]<<endl;
return 0;
}
Please guide me on this DP state and on what the time complexity of the program will be. This is my first attempt at DP.
A DP solution:
#include<iostream>
using namespace std;
int dp[102][902][2] ;
void print_ans(int m , int s , int flag){
if(m==0)
return ;
cout<<dp[m][s][flag];
if(dp[m][s][flag]!=-1)
print_ans(m-1 , s-dp[m][s][flag] , flag );
return ;
}
int main(){
//freopen("problem.in","r",stdin);
//freopen("out.txt","w",stdout);
//int t;
//cin>>t;
//while(t--){
int m , s ;
cin>>m>>s;
if(s==0){
cout<<(m==1?"0 0":"-1 -1");
return 0;
}
for(int i = 0 ; i <=m ; i++){
for(int j=0 ; j<=s ;j++){
dp[i][j][0]=-1;
dp[i][j][1]=-1;
}
}
for(int i = 0 ; i < 10 ; i++){
dp[1][i][0]=i;
dp[1][i][1]=i;
}
for(int i = 2 ; i<=m ; i++){
for(int j = 0 ; j<=s ; j++){
int flag = -1;
int f = -1;
for(int k = 0 ; k <= 9 ; k++){
if(i==m&&k==0)
continue;
if( j>=k && flag==-1 && dp[i-1][j-k][0]!=-1)
flag = k;
}
for(int k = 9 ; k >=0 ;k--){
if(i==m&&k==0)
continue;
if( j>=k && f==-1 && dp[i-1][j-k][1]!=-1)
f = k;
}
dp[i][j][0]=flag;
dp[i][j][1]=f;
}
}
if(m!=0){
print_ans(m , s , 0);
cout<<" ";
print_ans(m,s,1);
}
else
cout<<"-1 -1";
cout<<endl;
// }
}
The DP state is (i, j). It can be thought of as the parameters of a mathematical function defined by a recurrence over smaller problems (hence "subproblems").
More precisely,
the state is the set of parameters needed to identify a subproblem uniquely, so that we always know what we are computing.
Let us take your question as the example.
To define your problem, the state needs the number of digits used plus the sum formed with those digits. (Note: you are effectively accumulating the sum while traversing the digits.)
I think that is enough for the state part.
Now,
the running time of a dynamic programming solution is simple to work out.
First, let us see how many subproblems exist in a problem:
You need to fill in every state, i.e. you have to cover all the unique subproblems smaller than or equal to the whole problem.
Which problem is smaller than another is determined by the recurrence relation.
For example:
Fibonacci Sequence
F(n)=F(n-1)+F(n-2)
Note that the base case is always the smallest subproblem.
Here, for F(n) we have to calculate F(n-1) and F(n-2), and eventually we reach n=1, where the base case is returned.
Hence the total number of subproblems is the number of problems between the base case and the current problem.
Now,
in bottom-up DP we need to process every state between the base case and the full problem, in order of size.
This tells us that the running time should be
O(number of subproblems * time per subproblem).
So how many subproblems exist in your solution? DP[0][0] to DP[M][S],
and for every subproblem you run a loop of 10 iterations:
O(M * S (subproblems) * 10)
Drop the constant and you get O(M * S).
But the cost per subproblem is not always a constant!
Here is some code you might want to look at. Feel free to ask anything!
#include<bits/stdc++.h>
using namespace std;
bool DP[10][101]; // first index runs over 0..9 digits, so it needs 10 rows
int Number[10][101];
int main()
{
DP[0][0]=true; // It is possible to form 0 using NULL digits!!
int N=9,S=100,i,j,k;
for(i=1;i<=9;++i)
for(j=0;j<=100;++j)
{
if(DP[i-1][j])
{
for(k=0;k<=9;++k)
if(j+k<=100)
{
DP[i][j+k]=true;
Number[i][j+k]=Number[i-1][j]*10+k;
}
}
}
cout<<Number[9][81]<<"\n";
return 0;
}
You can use backtracking instead of storing the numbers directly, since your constraints are high (a number with up to 100 digits does not fit in an int).
DP[i][j] represents whether it is possible to form digit sum j using exactly i digits.
Number[i][j]
is my laziness to avoid typing out the backtracking (sleepy, it's already 3 A.M.).
I am trying to add all the possible digits to extend the state.
It is essentially a forward style of DP. You can read more about it at Topcoder.
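For what it's worth, this particular problem can also be handled without DP, using a greedy construction: for the largest number, put as much of the sum as possible into the leading digits; for the smallest, reserve a 1 for the leading digit and fill the rest from the right. The sketch below is a different technique from the DP answers above, and the function name minMaxWithDigitSum and its string-pair return type are assumptions made for illustration. Results are returned as strings because a 100-digit number does not fit in any built-in integer type.

```cpp
#include <algorithm>
#include <string>
#include <utility>

// Hypothetical helper: returns {smallest, largest} number with m digits and
// digit sum s, as decimal strings, or {"-1", "-1"} if no such number exists.
std::pair<std::string, std::string> minMaxWithDigitSum(int m, int s)
{
    // Impossible: sum too large, or sum 0 with more than one digit (leading zero).
    if (s > 9 * m || (s == 0 && m > 1))
        return std::make_pair(std::string("-1"), std::string("-1"));

    std::string lo(m, '0'), hi(m, '0');

    // Largest: greedily place the biggest possible digit from the left.
    int rest = s;
    for (int i = 0; i < m; ++i)
    {
        int d = std::min(9, rest);
        hi[i] = static_cast<char>('0' + d);
        rest -= d;
    }

    // Smallest: keep a leading 1 (for m > 1), then fill from the right.
    rest = s;
    if (m > 1)
    {
        lo[0] = '1';
        rest -= 1;
    }
    for (int i = m - 1; i >= 0 && rest > 0; --i)
    {
        int d = std::min(9 - (lo[i] - '0'), rest);
        lo[i] = static_cast<char>(lo[i] + d);
        rest -= d;
    }
    return std::make_pair(lo, hi);
}
```

For M = 2 and S = 15 this yields "69" and "96", and the whole thing runs in O(M) time.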

Generate 50 random numbers and store them into an array c++

This is what I have of the function so far. This is only the beginning of the problem: it asks to generate the random numbers and output them as a 10 by 5 group of numbers, and after that they are to be sorted by size, but I am just trying to get this first part down.
/* Populate the array with 50 randomly generated integer values
* in the range 1-50. */
void populateArray(int ar[], const int n) {
int n;
for (int i = 1; i <= length - 1; i++){
for (int i = 1; i <= ARRAY_SIZE; i++) {
i = rand() % 10 + 1;
ar[n]++;
}
}
}
First of all, we want to use std::array. It has some nice properties, one of which is that it doesn't decay to a pointer. Another is that it knows its size. In this case we are going to use templates to make populateArray a generic enough algorithm.
template<std::size_t N>
void populateArray(std::array<int, N>& array) { ... }
Then, we would like to remove all "raw" for loops. std::generate_n in combination with some random generator seems a good option.
For the number generator we can use <random>. Specifically std::uniform_int_distribution. For that we need to get some generator up and running:
std::random_device device;
std::mt19937 generator(device());
std::uniform_int_distribution<> dist(1, N);
and use it in our std::generate_n algorithm:
std::generate_n(array.begin(), N, [&dist, &generator](){
return dist(generator);
});
Live demo
