Confused by the behaviour of a 2-D array in Python - python-3.x

n, m, k = map(int, input().split())
students = [int(x) for x in input().split()]  # preferred row for each student
classroom = []
count = 0
rows = [0] * k
for i in range(m):
    classroom.append(rows)
for i in students:
    for j in range(k):
        if classroom[i-1][j] == 1:
            continue
        else:
            classroom[i-1][j] = 1
            count += 1
            break
print(classroom)
`"""
I want to calculate the number of students who are seated in their preferred row(it should vacant for student), in my case 0 is my vacancy and there are n students with their preferred rows( array of n with ith elemnt as preferred row).
Now my input is 5 2 2
1 1 2 1 1
here n=5,k=2(row length), m=2(no. of rows)
array=[1,1,2,1,1](students in the above code)
as per my code classroom will be my 2d array of size 2x2
Now, here logically it should print the classroom [[1,1],[1,0]] but im unable to understand why it is printing the classroom [[1,1],[1,1]]
I have testeed with input 5 2 2
1
so logically it should print classroom [[1,0][0,0]] but it is printing classroom [[1,0],[1,0]]. I have tested this on python 3 .
Please let me know what did i do wrong or what is the concept i didn't understand or what is logic behind this`

This line
classroom.append(rows)
appends a reference to the same list object again and again. Thus when one row is modified, all the others appear modified too, because they are all the same list. That's why the rows of your output are all the same.
Change this line to
classroom.append([0] * k)
This creates a fresh list on each iteration, so the rows are independent of each other.
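To see the aliasing in isolation, here is a minimal standalone demonstration (independent of the seating code above):

rows = [0] * 2
classroom = [rows, rows]                 # two references to ONE list object
classroom[0][0] = 1
print(classroom)                         # [[1, 0], [1, 0]]  (both rows changed)
print(classroom[0] is classroom[1])      # True: they are the same object

fixed = [[0] * 2 for _ in range(2)]      # builds a fresh list for each row
fixed[0][0] = 1
print(fixed)                             # [[1, 0], [0, 0]]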

Related

Hacker rank problem - code optimisation and debugging logical errors required to pass all the test cases for the below python program

This problem is about sets. There is an array arr of integers, and two disjoint sets, A and B, each containing integers. You like all the integers in set A and dislike all the integers in set B. Your initial happiness is 0. For each integer i in the array, if i belongs to A, you add 1 to your happiness; if i belongs to B, you add -1; otherwise, your happiness does not change. Output your final happiness at the end.
Note: A and B are sets, so they have no repeated elements. However, the array might contain duplicate elements.
In the code below, I have tried to read the input n, m and the array:
k = list(map(str, input().split(' ')))
n, m = k
arr = [int(i) for i in input().split()]
arr1 = list(dict.fromkeys(arr))  # removes duplicate elements from arr
A = set(int(i) for i in input().split())
B = set(int(i) for i in input().split())
a = len(set(arr1).intersection(A))
b = len(set(arr1).intersection(B))
print(a - b)
Input Format
The first line contains the integers n and m, separated by a space.
The second line contains n integers, the elements of the array.
The third and fourth lines each contain m integers: the elements of A and B, respectively.
Input (test case 1):
3 2
1 5 3
3 1
5 7
Output:
1
Input (test case 2):
13 4
1 7 8 5 3 7 9 4 9 8 2 1 4
1 5 3 9
7 4 2 8
Output:
0
The above piece of code works for small test cases but results in Wrong Answer for the rest.
Follow the link for the actual problem statement.
This is the code I used, but it was unable to clear most test cases. Need help.
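A likely culprit: dict.fromkeys(arr) removes duplicates, but the problem statement says every occurrence in the array counts, duplicates included. Below is a minimal sketch of the plain counting approach under the input format described above (not the asker's code):

n, m = map(int, input().split())
arr = [int(x) for x in input().split()]
A = set(int(x) for x in input().split())
B = set(int(x) for x in input().split())

happiness = 0
for x in arr:       # every occurrence counts, so do not deduplicate
    if x in A:      # set membership tests are O(1) on average
        happiness += 1
    elif x in B:
        happiness -= 1
print(happiness)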

I want to improve the speed of my algorithm with multiple rows of input. Python. Find average of consecutive elements in a list

I need to find the average of consecutive elements from a list.
First I am given the length of the list,
then the list of numbers,
then the number of tests I need to perform (several rows of input),
then the inputs for the tests themselves (and I need to print as many rows of results).
Every test row consists of a start and an end index into the list.
My algorithm:
nu = int(input())                 # first I am given the length of the list
numbers = input().split()         # then the list of numbers
num = input()                     # number of rows with inputs
k = [float(i) for i in numbers]   # the numbers in the list are floats
i = 0
while i < int(num):
    a, b = input().split()        # start and end index into the list
    i += 1
    print(round(sum(k[int(a):int(b) + 1]) / (int(b) - int(a) + 1), 6))  # round to 6 decimals
But it's not fast enough. I was told it's better to get rid of the "while", but I don't know how. I'd appreciate any help.
Example:
Input:
8 - len(list)
79.02 36.68 79.83 76.00 95.48 48.84 49.95 91.91 - list
10 - number of test
0 0 - a1,b1
0 1
0 2
0 3
0 4
0 5
0 6
0 7
1 7
2 7
Output:
79.020000
57.850000
65.176667
67.882500
73.402000
69.308333
66.542857
69.713750
68.384286
73.668333
i = 0
while i < int(num):
    a, b = input().split()  # start and end index into the list
    i += 1
Replace your while-loop with a for-loop. You can also get rid of the repeated int calls in the print statement:
for _ in range(int(num)):
    a, b = [int(j) for j in input().split()]
You didn't spell out the constraints, but I am guessing that the ranges to be averaged can be quite large, so computing sum(k[a:b+1]) for each query may take a while.
However, if you precompute partial sums of the input list, each query can be answered in constant time (the sum of the numbers in a range is the difference of two partial sums).
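A minimal sketch of the prefix-sum approach, assuming the 0-based inclusive ranges shown in the example above:

import itertools

n = int(input())                        # length of the list
k = [float(x) for x in input().split()]
# prefix[i] holds sum(k[0:i]), so sum(k[a:b+1]) == prefix[b+1] - prefix[a]
prefix = [0.0, *itertools.accumulate(k)]

results = []
for _ in range(int(input())):           # number of tests
    a, b = map(int, input().split())
    results.append('%.6f' % ((prefix[b + 1] - prefix[a]) / (b - a + 1)))
print('\n'.join(results))               # print all results at once

Collecting the answers and printing them in one go also avoids per-line output overhead.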

Alternate between printing two series of numbers

Input format: The first line of input consists of the number of test cases, T
Next T lines consist of the value of N.
Constraints: 1<= T <=100, 1<= N <= 250
Output format: For each test case, print the space-separated N terms of the series in a separate line.
Sample test case 1
Input:
1
7
Output:
1 1 2 2 4 2 6
The series is a combination of 2 series; the 1st series: 1, 2, 4, 6, ... and the 2nd series: 1, 2, 2, .... I have written the code for the first series but cannot figure out how to code the 2nd one.
Code for the first series, appended into a list depending on the number of elements:
import math

def firstS(n):
    l = [1]
    i = 1
    x = math.ceil(n / 2) - 1  # the list already holds the first term
    while x != 0:
        l.append(i + i)
        i += 1
        x -= 1
    return l
The problem is the number of elements: for 7 elements the 1st series has 4 and the 2nd has 3, for 8 elements the 1st has 4 and the 2nd has 4, and for 9 elements the 1st has 5 and the 2nd has 4. So the number of elements is math.ceil(n/2) for series 1 and math.floor(n/2) for series 2, where n is the total number of elements in the combined series.
For the iteration, one way to do something every N steps is the modulus operator (%). Modulus is basically a remainder operator, so its result repeats periodically as the numbers are iterated one by one.
Also, in Python, the standard way to run a for-loop a certain number of times is range.
Here's an example demonstrating both, where every third number has the same number of exclamation marks:
# Iterate over the numbers 0-9
for i in range(0, 10):
    if i % 3 == 0:
        print(i, "!")
    elif i % 3 == 1:
        print(i, "!!")
    else:
        print(i, "!!!")
Result:
0 !
1 !!
2 !!!
3 !
4 !!
5 !!!
6 !
7 !!
8 !!!
9 !
I'll leave it as an exercise for the asker to determine how to apply this to their use-case of switching between printing two different sequences.
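For reference, a minimal sketch of one way to apply this (the series formulas below are an assumed reading of the sample output, not the asker's code):

def series(n):
    # 1st series: 1, 2, 4, 6, ...  (math.ceil(n/2) terms)
    first = [1] + [2 * i for i in range(1, (n + 1) // 2)]
    # 2nd series: 1, 2, 2, 2, ...  (math.floor(n/2) terms)
    second = [1] + [2] * (n // 2 - 1) if n >= 2 else []
    out = []
    for i in range(n):
        # even positions come from the 1st series, odd positions from the 2nd
        out.append(first[i // 2] if i % 2 == 0 else second[i // 2])
    return out

print(*series(7))  # 1 1 2 2 4 2 6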

I want to remove rows where a specific value doesn't increase. Is there a faster/more elegant way?

I have a dataframe with 30 columns, 1,000,000 rows and a size of about 150 MB. One column is categorical with 7 different elements, and another column (Depth) contains mostly increasing numbers. The graph for each of the elements looks more or less like this.
I tried to save the column Depth as a series and iterate through it while dropping the rows that don't match the criteria. This was reeeeeaaaally slow.
Afterwards I added a boolean column to the dataframe indicating whether a row will be dropped, so I could drop all the rows at the end in a single step. Still slow. My last try (the code for it is in this post) was to build a boolean list recording whether each row passes the criteria. Still really slow (about 5 hours).
dropList = [True] * len(df.index)
for element in elements:
    currentMax = 0
    minIdx = df.loc[df['Element'] == element]['Depth'].index.min()
    maxIdx = df.loc[df['Element'] == element]['Depth'].index.max()
    for x in range(minIdx, maxIdx):
        if df.loc[df['Element'] == element]['Depth'][x] < currentMax:
            dropList[x] = False
        else:
            currentMax = df.loc[df['Element'] == element]['Depth'][x]
df: the main dataframe
elements: a list with the 7 different elements (the same ones as in the categorical column of df)
All rows within an element where the value of Depth isn't bigger than all previous ones should be dropped. With the next element, the comparison should start again at 0.
Example:
Input: 'Depth' = [0 1 2 3 4 2 3 5 6]
'AnyOtherColumn' = [a b c d e f g h i]
Output: 'Depth' = [0 1 2 3 4 5 6]
'AnyOtherColumn' = [a b c d e h i]
This should apply to whole rows in the dataframe of course.
Is there a way to get this faster?
EDIT:
The whole rows of the input dataframe should stay as they are. Just the ones where the 'Depth' does not increase should be dropped.
EDIT2:
The remaining rows should stay in their initial order.
How about a 2-step approach? First use a fast sorting algorithm (for example quicksort), and then get rid of all the duplicates.
Okay, I found a way that's faster. Here is the code:
from tqdm import tqdm  # progress bar used in the inner loop

dropList = [True] * len(df.index)
for element in elements:
    currentMax = 0
    minIdx = df.loc[df['Element'] == element]['Tiefe'].index.min()
    # maxIdx = df.loc[df['Element']==element]['Tiefe'].index.max()
    elementList = df.loc[df['Element'] == element]['Tiefe'].to_list()
    for x in tqdm(range(len(elementList))):
        if elementList[x] < currentMax:
            dropList[x + minIdx] = False
        else:
            currentMax = elementList[x]
I took the column and saved it as a list. To preserve the dataframe's index, I saved the lowest index and add it back within the loop.
Overall it seems the problem was the repeated loc lookup. From an initial runtime of 5 hours, it's now about 10 seconds.
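For reference, a fully vectorized sketch using groupby plus cummax (assuming the 'Element'/'Depth' column names from the question; like the loop above, it keeps a row whenever its depth equals the running maximum within its element):

import pandas as pd

df = pd.DataFrame({
    'Element': ['a'] * 9,
    'Depth': [0, 1, 2, 3, 4, 2, 3, 5, 6],
    'AnyOtherColumn': list('abcdefghi'),
})

# cummax includes the current row, so Depth >= cummax holds exactly
# when the row sets (or ties) a new maximum within its group.
running_max = df.groupby('Element')['Depth'].cummax()
result = df[df['Depth'] >= running_max]
print(result['Depth'].tolist())  # [0, 1, 2, 3, 4, 5, 6]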

pandas how to flatten a list in a column while keeping list ids for each element

I have the following df,
A id
[ObjectId('5abb6fab81c0')] 0
[ObjectId('5abb6fab81c3'),ObjectId('5abb6fab81c4')] 1
[ObjectId('5abb6fab81c2'),ObjectId('5abb6fab81c1')] 2
I'd like to flatten each list in A, and assign the corresponding id to each element of the list, like:
A id
ObjectId('5abb6fab81c0') 0
ObjectId('5abb6fab81c3') 1
ObjectId('5abb6fab81c4') 1
ObjectId('5abb6fab81c2') 2
ObjectId('5abb6fab81c1') 2
I think the comment is coming from this question? You can use my original post or this one:
df.set_index('id').A.apply(pd.Series).stack().reset_index().drop('level_1',1)
Out[497]:
id 0
0 0 1.0
1 1 2.0
2 1 3.0
3 1 4.0
4 2 5.0
5 2 6.0
Or
pd.DataFrame({'id':df.id.repeat(df.A.str.len()),'A':df.A.sum()})
Out[498]:
A id
0 1 0
1 2 1
1 3 1
1 4 1
2 5 2
2 6 2
This probably isn't the most elegant solution, but it works. The idea here is to loop through df (which is why this is likely an inefficient solution), and then loop through each list in column A, appending each item and its id to two new lists. Those two lists are then turned into a new DataFrame.
a_list = []
id_list = []
for index, a, i in df.itertuples():
    for item in a:
        a_list.append(item)
        id_list.append(i)
df1 = pd.DataFrame(list(zip(a_list, id_list)), columns=['A', 'id'])
As I said, inelegant, but it gets the job done. There's probably at least one better way to optimize this, but hopefully it gets you moving forward.
EDIT (April 2, 2018)
I had the thought to run a timing comparison between mine and Wen's code, simply out of curiosity. The two variables are the length of column A, and the length of the list entries in column A. I ran a bunch of test cases, iterating by orders of magnitude each time. For example, I started with A length = 10 and ran through to 1,000,000, at each step iterating through randomized A entry list lengths of 1-10, 1-100 ... 1-1,000,000. I found the following:
Overall, my code is noticeably faster (especially at increasing A lengths) as long as the list lengths are less than ~1,000. As soon as the randomized list length hits the ~1,000 barrier, Wen's code takes over in speed. This was a huge surprise to me! I fully expected my code to lose every time.
Length of column A generally doesn't matter - it simply increases the overall execution time linearly. The only case in which it changed the results was for A length = 10. In that case, no matter the list length, my code ran faster (also strange to me).
Conclusion: If the list entries in A are on the order of a few hundred elements (or fewer) long, my code is the way to go. But if you're working with huge data sets, use Wen's! Also worth noting that as you hit the 1,000,000 barrier, both methods slow down drastically. I'm using a fairly powerful computer, and each was taking minutes by the end (it actually crashed on the A length = 1,000,000 and list length = 1,000,000 case).
Flattening and unflattening can be done with the functions below:
def flatten(df, col):
    col_flat = pd.DataFrame(
        [[i, x] for i, y in df[col].apply(list).iteritems() for x in y],
        columns=['I', col])
    col_flat = col_flat.set_index('I')
    df = df.drop(col, 1)
    df = df.merge(col_flat, left_index=True, right_index=True)
    return df
Unflattening:
def unflatten(flat_df, col):
    return flat_df.groupby(level=0).agg(
        {**{c: 'first' for c in flat_df.columns}, col: list})
After unflattening we get the same dataframe back, except for the column order:
(df.sort_index(axis=1) == unflatten(flatten(df, 'A'), 'A').sort_index(axis=1)).all().all()
>> True
To create a unique index you can call reset_index after flattening.
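For reference, newer pandas versions (0.25 and later) have this transformation built in: DataFrame.explode flattens a list-like column while repeating the values of the other columns, which is exactly the operation asked for here.

flat = df.explode('A').reset_index(drop=True)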
