I want to improve the speed of my algorithm with multiple rows of input. Python. Find average of consecutive elements in list - python-3.x

I need to find the average of consecutive elements of a list.
First I am given the length of the list,
then the list of numbers,
then how many tests I need to perform (several rows of input),
then the inputs for those tests (and I need to print one row of results per test).
Each test row consists of the start and end index in the list.
My algorithm:
nu = int(input())  # first, the length of the list
numbers = input().split()  # then the list of numbers
num = input()  # number of rows with inputs
k = [float(i) for i in numbers]  # the numbers in the list are floats
i = 0
while i < int(num):
    a, b = input().split()  # start and end index in the list
    i += 1
    print(round(sum(k[int(a):(int(b)+1)])/(-int(a)+int(b)+1), 6))  # round to 6 decimals
But it's not fast enough. I was told it's better to get rid of the "while" loop, but I don't know how. I'd appreciate any help.
Example:
Input:
8 - len(list)
79.02 36.68 79.83 76.00 95.48 48.84 49.95 91.91 - list
10 - number of test
0 0 - a1,b1
0 1
0 2
0 3
0 4
0 5
0 6
0 7
1 7
2 7
Output:
79.020000
57.850000
65.176667
67.882500
73.402000
69.308333
66.542857
69.713750
68.384286
73.668333

i = 0
while i < int(num):
    a, b = input().split()  # start and end element in list
    i += 1
Replace your while loop with a for loop. You can also get rid of the repeated int calls in the print statement by converting a and b once:
for _ in range(int(num)):
    a, b = [int(j) for j in input().split()]

You didn't spell out the constraints, but I am guessing that the ranges to be averaged could be quite large, so computing sum(k[int(a):(int(b)+1)]) for every query may take a while.
However, if you precompute the partial (prefix) sums of the input list, each query can be answered in constant time: the sum of the numbers in a range is the difference of the two corresponding partial sums.
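The partial-sum idea can be sketched as follows. This is a minimal illustration rather than a tuned solution; the function name range_averages and the hard-coded sample data are assumptions made for the example, and a and b are treated as inclusive 0-based indices as in the question.

```python
from itertools import accumulate

def range_averages(values, queries):
    """Average of values[a..b] (inclusive) for each (a, b) query."""
    # prefix[i] holds the sum of the first i values, so prefix[0] == 0.
    prefix = [0.0] + list(accumulate(values))
    return [(prefix[b + 1] - prefix[a]) / (b - a + 1) for a, b in queries]

values = [79.02, 36.68, 79.83, 76.00, 95.48, 48.84, 49.95, 91.91]
for avg in range_averages(values, [(0, 0), (0, 1), (2, 7)]):
    print(f"{avg:.6f}")
```

Building the prefix list costs one pass over the input; after that each query is two lookups and a division, regardless of how wide the range is.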

Related

Confused in behaviour of 2-d array in python

n, m, k = map(int, input().split())
students = [int(x) for x in input().split()]
classroom = []
count = 0
rows = [0] * k
for i in range(m):
    classroom.append(rows)
for i in students:
    for j in range(k):
        if classroom[i-1][j] == 1:
            continue
        else:
            classroom[i-1][j] = 1
            count += 1
            break
print(classroom)
I want to count the number of students who get seated in their preferred row (the row must have a vacant seat for the student). In my case 0 marks a vacant seat, and there are n students with their preferred rows (an array of length n whose ith element is the preferred row).
My input is 5 2 2
1 1 2 1 1
Here n=5, k=2 (row length), m=2 (number of rows),
array=[1,1,2,1,1] (students in the code above).
In my code, classroom is the 2-D array of size 2x2.
Logically it should print the classroom [[1,1],[1,0]], but I am unable to understand why it prints [[1,1],[1,1]].
I have also tested with the input 5 2 2
1
so logically it should print classroom [[1,0],[0,0]], but it prints [[1,0],[1,0]]. I tested this on Python 3.
Please let me know what I did wrong, or which concept I didn't understand, or what the logic behind this is.
This line
classroom.append(rows)
appends a reference to the same list object every time, so classroom ends up holding m references to one list. When one of them is modified, all the others appear modified too. That's why the rows of your output are all the same.
Change this line to
classroom.append([0] * k)
This ensures that the rows are independent of each other.
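The difference can be seen in a small standalone sketch. The names shared and independent are made up for this illustration; k and m are fixed to 2 to match the example input.

```python
k, m = 2, 2

rows = [0] * k
shared = [rows for _ in range(m)]           # every entry is the same list object
independent = [[0] * k for _ in range(m)]   # a fresh list per row

shared[0][0] = 1
independent[0][0] = 1

print(shared)                # [[1, 0], [1, 0]]
print(independent)           # [[1, 0], [0, 0]]
print(shared[0] is shared[1])  # True: both rows are one object
```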

Alternate between printing two series of numbers

Input format: The first line of input consists of the number of test cases, T
Next T lines consist of the value of N.
Constraints: 1<= T <=100, 1<= N <= 250
Output format: For each test case, print the space-separated N terms of the series in a separate line.
Sample test case 1
Input:
1
7
Output:
1 1 2 2 4 2 6
The series is a combination of 2 series, the 1st series: 1,2,4,6,... and the 2nd series: 1,2,2,.... I have made the code for the first series but cannot find how to code the 2nd one.
Code for the first series appended into list depending on the no of elements
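import math

def firstS(n):
    l = [1]  # first term of series 1
    i = 1
    x = math.ceil(n / 2) - 1  # terms still to add after the initial 1
    while x != 0:
        l.append(i + i)
        i += 1
        x -= 1
    return l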
The problem is the number of elements: for 7 elements the 1st series has 4 and the 2nd series has 3; for 8 elements each has 4; and for 9 elements the 1st has 5 and the 2nd has 4. So series 1 has math.ceil(n/2) elements and series 2 has math.floor(n/2), where n is the total number of elements of the combined series.
For iteration, one way to do something every N iterations is to use the modulus operator (%). Modulus is basically a remainder operator, so its result repeats periodically as the numbers are iterated one by one.
Also, in Python, the standard method for doing a for-loop (iterating a certain number of times) is using range.
Here's an example demonstrating both, where every third number has the same number of exclamation marks:
# List the numbers 0-9 (repeat ten times)
for i in range(0, 10):
    if i % 3 == 0:
        print(i, "!")
    elif i % 3 == 1:
        print(i, "!!")
    else:
        print(i, "!!!")
Result:
0 !
1 !!
2 !!!
3 !
4 !!
5 !!!
6 !
7 !!
8 !!!
9 !
I'll leave it as an exercise for the asker to determine how to apply this to their use-case of switching between printing two different sequences.
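For reference, one possible way to apply the modulus idea to alternating between two sequences is sketched below. It assumes both series are already available as lists; the rule generating the second series is not stated in the question, so its terms are simply copied from the sample output. The function name interleave is made up for this example.

```python
def interleave(first, second):
    """Alternate terms: even positions come from first, odd positions from second.

    Assumes len(first) equals len(second) or len(second) + 1.
    """
    out = []
    for i in range(len(first) + len(second)):
        if i % 2 == 0:
            out.append(first[i // 2])
        else:
            out.append(second[i // 2])
    return out

# Terms taken from the sample test case (N = 7).
print(*interleave([1, 2, 4, 6], [1, 2, 2]))  # 1 1 2 2 4 2 6
```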

How to calculate the time complexity for nested for loops in the following example?

In the following code I am passing a (huge) number string to a function that has to find the maximum product of m consecutive digits.
First, I loop through the string of length n, and then the inner loop loops through m digits.
The inner loop is affected by the if-statement, which makes a jump of m indexes if the next number is 0.
EDIT : 1
Actual Problem Question:
The four adjacent digits in the 1000-digit number that have the greatest product are 9 × 9 × 8 × 9 = 5832.
731671765313306249192251....(1000digits)
Find the thirteen adjacent digits in the 1000-digit number that have the greatest product. What is the value of this product?
Example:
m = 12 number = "1234567891120123456704832...(1000 digits)"
In the 1st iteration the function calculates the product of the first 12 digits (i.e. from index 11 down to index 0 of "1234567891120123456704832...").
In the 2nd iteration, it checks the value at index 12, which is 0, so the index jumps to index 13. This way the loop skips 11 iterations.
In the 3rd iteration, the inner loop executes 4 iterations until it finds a 0 ("0123456704832...").
def LargestProductInSeries_1(number,m):
    max = -1
    product = 1
    index = 0
    x = 0
    while index < len(number)-(m-1):
        for j in range(index+(m-1), index-1, -1):
            num = int(number[j])
            if(not num):
                index = j
                break
            product = product * int(number[j])
        max = product if max < product else max
        product = 1
        index += 1
    return max
In my view, the worst-case time complexity would be O(n*m).
I think the best case would be O(n/m): either the inner loop runs to completion only once, or every mth digit is 0, so the outer loop still executes but the index jumps to every mth digit.
Is my analysis correct?
What will the average time be for this case?
Will it be O(n*(log m))? Can anyone explain how? Or how to find the complexity in such cases?

I want to remove rows where a specific value doesn't increase. Is there a faster/more elegant way?

I have a dataframe with 30 columns, 1.000.000 rows and about 150 MB size. One column is categorical with 7 different elements and another column (Depth) contains mostly increasing numbers. The graph for each of the elements looks more or less like this.
I tried to save the column Depth as series and iterate through it while dropping rows that won't match the criteria. This was reeeeeaaaally slow.
Afterwards I added a boolean column to the dataframe which indicates if it will be dropped or not, so I could drop the rows in the end in a single step. Still slow. My last try (the code to it is in this post) was to create a boolean list to save the fact if it passes the criteria there. Still really slow (about 5 hours).
dropList = [True]*len(df.index)
for element in elements:
    currentMax = 0
    minIdx = df.loc[df['Element']==element]['Depth'].index.min()
    maxIdx = df.loc[df['Element']==element]['Depth'].index.max()
    for x in range(minIdx,maxIdx):
        if df.loc[df['Element']==element]['Depth'][x] < currentMax:
            dropList[x]=False
        else:
            currentMax = df.loc[df['Element']==element]['Depth'][x]
df: The main dataframe
elements: a list with the 7 different elements (same as in the categorical column in df)
All rows in an element, where the value Depth isn't bigger than all previous ones should be dropped. With the next element it should start with 0 again.
Example:
Input:  'Depth' = [0 1 2 3 4 2 3 5 6]
        'AnyOtherColumn' = [a b c d e f g h i]
Output: 'Depth' = [0 1 2 3 4 5 6]
        'AnyOtherColumn' = [a b c d e h i]
This should apply to whole rows in the dataframe of course.
Is there a way to get this faster?
EDIT:
The whole rows of the input dataframe should stay as they are. Just the ones where the 'Depth' does not increase should be dropped.
EDIT2:
The remaining rows should stay in their initial order.
How about you take a 2-step approach. First you use a fast sorting algorithm (for example Quicksort) and next you get rid of all the duplicates?
Okay, I found a way that's faster. Here is the code:
from tqdm import tqdm  # progress bar

dropList = [True]*len(df.index)
for element in elements:
    currentMax = 0
    minIdx = df.loc[df['Element']==element]['Tiefe'].index.min()
    # maxIdx = df.loc[df['Element']==element]['Tiefe'].index.max()
    elementList = df.loc[df['Element']==element]['Tiefe'].to_list()
    for x in tqdm(range(len(elementList))):
        if elementList[x] < currentMax:
            dropList[x+minIdx]=False
        else:
            currentMax = elementList[x]
I took the column and saved it as a list. To preserve the index of the dataframe, I saved the lowest index of each element and add it back inside the loop (this relies on the rows of each element having consecutive index values).
Overall it seems the problem was the loc function. From an initial runtime of about 5 hours, it's now about 10 seconds.
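The same filter can also be written as a single pass that keeps one running maximum per element, which avoids the index arithmetic and does not need the rows of an element to be contiguous. The sketch below uses plain Python sequences; the function name increasing_mask is made up for this example, and it mirrors the loop above in starting each element at 0 and keeping values equal to the current maximum.

```python
def increasing_mask(elements, depths):
    """Return one True/False flag per row: True means the row is kept."""
    current_max = {}  # running maximum per element
    keep = []
    for element, depth in zip(elements, depths):
        best = current_max.get(element, 0)  # each element starts at 0
        if depth < best:
            keep.append(False)
        else:
            keep.append(True)
            current_max[element] = depth
    return keep

depths = [0, 1, 2, 3, 4, 2, 3, 5, 6]
other = list("abcdefghi")
mask = increasing_mask(["A"] * len(depths), depths)
print([d for d, m in zip(depths, mask) if m])  # [0, 1, 2, 3, 4, 5, 6]
print([o for o, m in zip(other, mask) if m])   # ['a', 'b', 'c', 'd', 'e', 'h', 'i']
```

With the dataframe from the question, the resulting list could presumably be used as a boolean mask, for example df[increasing_mask(df['Element'], df['Depth'])], which leaves the remaining rows in their original order.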

Intermediate steps in evaluation of Frequency formula

This refers to the SO question "Counting unique list of items from range based on criteria from other ranges".
The formula suggested by Scot Craner is:
=SUM(--(FREQUENCY(IF(B2:B7<=25,IF(C2:C7<=35,COUNTIF(A2:A7,"<"&A2:A7),""),""),COUNTIF(A2:A7,"<"&A2:A7))>0))
I have been able to understand clearly the logic and evaluation of the formula except for this step shown in the attached snapshots.
As per the MS Office documentation:
FREQUENCY(data_array, bins_array)
The FREQUENCY function syntax has the following arguments:
Data_array Required. An array of or reference to a set of values for which you want to count frequencies. If data_array contains no values, FREQUENCY returns an array of zeros.
Bins_array Required. An array of or reference to intervals into which you want to group the values in data_array. If bins_array contains no values, FREQUENCY returns the number of elements in data_array.
It is clear to me how {1;1;4;0;"";""} arises as data_array and how {1;1;4;0;5;3} arises as bins_array. But how this evaluates to {2;0;1;1;0;0;0} is not clear to me.
I would appreciate it if someone could explain this clearly.
So you want to know how
FREQUENCY({1;1;4;0;"";""},{1;1;4;0;5;3}) evaluates to {2;0;1;1;0;0;0}?
The point is that bins_array does not need to be sorted for FREQUENCY to work. Internally, of course, it must sort bins_array to get the intervals into which to group the values in data_array. It then groups and counts, and returns the counts in the same order in which the bins were given in bins_array.
Scores  Bins
1       1
1       1
4       4
0       0
""      5
""      3
Bins sorted
0       (<=0)
1       (>0, <=1)
1       (>1, <=1) == not possible
3       (>1, <=3)
4       (>3, <=4)
5       (>4, <=5)
        (>5)
Bin  Description                                  Result
1    Number of scores (>0, <=1)                   2
1    Number of scores (>1, <=1) == not possible   0
4    Number of scores (>3, <=4)                   1
0    Number of scores (<=0)                       1
5    Number of scores (>4, <=5)                   0
3    Number of scores (>1, <=3)                   0
     Number of scores (>5)                        0
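The sort, count, and report-in-original-order behaviour described above can be modelled in a few lines of Python. This is only a sketch of the explanation given here, not Excel's actual implementation; the function name frequency and the choice to ignore non-numeric entries such as "" are assumptions made for the illustration.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Model of FREQUENCY with unsorted bins, following the tables above."""
    values = [v for v in data if isinstance(v, (int, float))]  # "" is skipped
    # Sort bin positions by bin value; sorted() is stable, so equal bins keep their order.
    order = sorted(range(len(bins)), key=lambda i: bins[i])
    sorted_bins = [bins[i] for i in order]
    counts_sorted = [0] * (len(bins) + 1)  # last slot counts values above every bin
    for v in values:
        # Index of the first bin >= v, i.e. the interval (previous bin, this bin].
        counts_sorted[bisect_left(sorted_bins, v)] += 1
    # Put each count back at the position its bin had in the original bins list.
    result = [0] * (len(bins) + 1)
    for pos, i in enumerate(order):
        result[i] = counts_sorted[pos]
    result[-1] = counts_sorted[-1]
    return result

print(frequency([1, 1, 4, 0, "", ""], [1, 1, 4, 0, 5, 3]))  # [2, 0, 1, 1, 0, 0, 0]
```

The duplicated bin 1 shows why the second entry is 0: both scores of 1 land in the first of the two equal bins, leaving the impossible interval (>1, <=1) empty.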
