I have this piece of code:
import itertools

values = [1, 2, 3, 4]
per = itertools.permutations(values, 2)
hyp = 3
for val in per:
    print(*val)
Output:
1 2
1 3
1 4
2 1
2 3
2 4
3 1
3 2
3 4
4 1
4 2
4 3
I want to compare each tuple with the value of hyp (e.g. 3). If a tuple's first element is less than or equal to hyp, it should be kept; otherwise it should be discarded.
In this case the tuples (4,1), (4,2), (4,3) should be removed.
In other words, which pairs are kept depends on the value of hyp.
If hyp = 2, then from the values list the output should look like this:
1 2
1 3
1 4
2 1
2 3
2 4
I am not sure whether I explained my problem clearly. Let me know if it is unclear.
This will do it. You just need to extract the element at index zero of each tuple and compare it to hyp:
import itertools

values = [1, 2, 3, 4]
per = itertools.permutations(values, 2)
hyp = 3
for tup in per:
    if tup[0] <= hyp:
        print(*tup)
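If you need the filtered pairs as a list instead of just printing them, here is a minimal sketch using a list comprehension (the name filtered is illustrative):

import itertools

values = [1, 2, 3, 4]
hyp = 3
# Keep only the pairs whose first element is <= hyp
filtered = [tup for tup in itertools.permutations(values, 2) if tup[0] <= hyp]
print(filtered)  # [(1, 2), (1, 3), (1, 4), ..., (3, 4)]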
I have a DataFrame with columns A, B, and C. For each value of A, I would like to select the row with the minimum value in column B.
That is, from this:
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                   'B': [4, 5, 2, 7, 4, 6],
                   'C': [3, 4, 10, 2, 4, 6]})
A B C
0 1 4 3
1 1 5 4
2 1 2 10
3 2 7 2
4 2 4 4
5 2 6 6
I would like to get:
A B C
0 1 2 10
1 2 4 4
For the moment I am grouping by column A, then creating a value that identifies the rows I will keep:
a = df.groupby('A').min()
a['A'] = a.index
to_keep = [str(x[0]) + str(x[1]) for x in a[['A', 'B']].values]
df['id'] = df['A'].astype(str) + df['B'].astype(str)
df[df['id'].isin(to_keep)]
I am sure that there is a much more straightforward way to do this.
I have seen many answers here that use MultiIndex, which I would prefer to avoid.
Thank you for your help.
I feel like you're overthinking this. Just use groupby and idxmin:
df.loc[df.groupby('A').B.idxmin()]
A B C
2 1 2 10
4 2 4 4
df.loc[df.groupby('A').B.idxmin()].reset_index(drop=True)
A B C
0 1 2 10
1 2 4 4
Had a similar situation, but with a more complex column heading (e.g. "B val"), in which case this is needed:
df.loc[df.groupby('A')['B val'].idxmin()]
The accepted answer (suggesting idxmin) cannot be used with the pipe pattern. A pipe-friendly alternative is to first sort values and then use groupby with DataFrame.head:
df.sort_values('B').groupby('A').apply(pd.DataFrame.head, n=1)
This is possible because by default groupby preserves the order of rows within each group, which is stable and documented behaviour (see pandas.DataFrame.groupby).
This approach has additional benefits:
it can be easily expanded to select the n rows with the smallest values in a specific column (see the sketch below)
it can break ties by providing another column (as a list) to .sort_values(), e.g.:
data.sort_values(['final_score', 'midterm_score']).groupby('year').apply(pd.DataFrame.head, n=1)
As with other answers, to exactly match the result desired in the question, .reset_index(drop=True) is needed, making the final snippet:
df.sort_values('B').groupby('A').apply(pd.DataFrame.head, n=1).reset_index(drop=True)
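As a sketch of that expansion, using the same df as in the question and selecting the two smallest-B rows per group (n=2 is purely illustrative):

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                   'B': [4, 5, 2, 7, 4, 6],
                   'C': [3, 4, 10, 2, 4, 6]})
# Keep the two rows with the smallest B within each group of A
out = (df.sort_values('B')
         .groupby('A')
         .apply(pd.DataFrame.head, n=2)
         .reset_index(drop=True))
print(out)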
I found an answer that is a little more wordy, but a lot more efficient:
This is the example dataset:
data = pd.DataFrame({'A': [1,1,1,2,2,2], 'B':[4,5,2,7,4,6], 'C':[3,4,10,2,4,6]})
data
Out:
A B C
0 1 4 3
1 1 5 4
2 1 2 10
3 2 7 2
4 2 4 4
5 2 6 6
First, we get the minimum values as a Series from a groupby operation:
min_value = data.groupby('A').B.min()
min_value
Out:
A
1 2
2 4
Name: B, dtype: int64
Then, we merge this Series back onto the original data frame:
data = data.merge(min_value, on='A', suffixes=('', '_min'))
data
Out:
A B C B_min
0 1 4 3 2
1 1 5 4 2
2 1 2 10 2
3 2 7 2 4
4 2 4 4 4
5 2 6 6 4
Finally, we keep only the rows where B equals B_min, and drop B_min since we don't need it anymore.
data = data[data.B==data.B_min].drop('B_min', axis=1)
data
Out:
A B C
2 1 2 10
4 2 4 4
I have tested it on very large datasets and this was the only way I could make it work in a reasonable time.
You can use sort_values and drop_duplicates. Since drop_duplicates keeps the first occurrence by default, sorting by B first guarantees that the surviving row in each group of A is the one with the minimum B:
df.sort_values('B').drop_duplicates('A')
Output:
A B C
2 1 2 10
4 2 4 4
The solution, as written above, is:
df.loc[df.groupby('A')['B'].idxmin()]
However, you may then get an error:
"Passing list-likes to .loc or [] with any missing labels is no longer supported.
The following labels were missing: Float64Index([nan], dtype='float64').
See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"
In my case, there were NaN values in column B, so idxmin returned NaN labels for those groups. Adding dropna() fixed it:
df.loc[df.groupby('A')['B'].idxmin().dropna()]
You can also use boolean indexing to select the rows where column B equals its group minimum. Unlike idxmin, this keeps every row that ties for the minimum:
out = df[df['B'] == df.groupby('A')['B'].transform('min')]
print(out)
A B C
2 1 2 10
4 2 4 4
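A sketch of that difference, using a hypothetical tie in B (df_tie and its values are purely illustrative):

import pandas as pd

df_tie = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [2, 2, 4, 6], 'C': [3, 10, 4, 6]})
# Both rows of group 1 attain the minimum B, so both are kept
print(df_tie[df_tie['B'] == df_tie.groupby('A')['B'].transform('min')])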
1. The Problem
Given a positive integer n, print the pattern shown in the sample outputs.
Code has already been provided. You have to understand its logic on your own and change it so that it gives the correct output.
1.1 The Specifics
Input: A positive integer n, 1 <= n <= 9
Output: Pattern as shown in examples below
Sample input:
4
Sample output:
4444444
4333334
4322234
4321234
4322234
4333334
4444444
Sample input:
5
Sample output:
555555555
544444445
543333345
543222345
543212345
543222345
543333345
544444445
555555555
2. My Answer
2.1 My Code
n = int(input())
answer = [[1]]
for i in range(2, n+1):
    t = [i]*((2*i)-3)
    answer.insert(0, t)
    answer.append(t)
    for a in answer:
        a.insert(0, i)
        a.append(i)
print(answer)
outlst = [' '.join([str(c) for c in lst]) for lst in answer]
for a in outlst:
    print(a)
2.2 My Output
Input: 4
4 4 4 4 4 4 4 4 4
4 4 3 3 3 3 3 3 3 4 4
4 4 3 3 2 2 2 2 2 3 3 4 4
4 3 2 1 2 3 4
4 4 3 3 2 2 2 2 2 3 3 4 4
4 4 3 3 3 3 3 3 3 4 4
4 4 4 4 4 4 4 4 4
2.3 Desired Output
4444444
4333334
4322234
4321234
4322234
4333334
4444444
Your answer isn't as expected because you add the same object t to the answer list twice:
answer.insert(0, t)
answer.append(t)
More specifically, when you assign t = [i]*(2*i - 3), a new list object [i, ..., i] is created, and t is a reference to it. You then put that same reference into the answer list twice.
In the for a in answer loop, when you call a.insert(0, i) and a.append(i), you mutate the list that a refers to. Since two entries of answer refer to the same list, you effectively insert and append i to that list twice per iteration. That's why you end up with more digits than you need.
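A minimal sketch of that aliasing effect, outside the context of this problem:

t = [2]          # one list object
rows = [t, t]    # both entries refer to the same object
t.append(9)
print(rows)      # [[2, 9], [2, 9]] -- both "rows" changed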
Instead, you could run the for a in answer loop over only the top half of the rows in the answer list (plus the middle row, which was created without a pair), e.g. for a in answer[:(len(answer) + 1) // 2].
Other things you could do:
using a literal as the argument instead of reusing the reference, e.g. answer.append([i]*(2*i - 3)). The literal expression creates a new list object every time.
using a copy in one of the calls, e.g. answer.append(t.copy()). The copy method creates a new list object with a shallow copy of the contents.
Also, your output digits are space-separated because you used a non-empty separator in ' '.join(...). Use the empty string instead: ''.join(...).
n = 5
answer = [[1]]
for i in range(2, n+1):
    t = [i]*((2*i)-3)
    answer.insert(0, t)
    answer.append(t.copy())
    for a in answer:
        a.insert(0, i)
        a.append(i)
answerfinal = []
for a in answer:
    answerfinal.append(str(a).replace(' ', '').replace(',', '').replace(']', '').replace('[', ''))
for a in answerfinal:
    print(a)
n = int(input())
for i in range(1, n*2):
    for j in range(1, n*2):
        if i <= j <= n*2 - i:                # top wedge: value set by the row
            print(n - i + 1, end='')
        elif i > n and i >= j >= n*2 - i:    # bottom wedge: mirror of the top
            print(i - n + 1, end='')
        elif j <= n:                         # left wedge: value set by the column
            print(n - j + 1, end='')
        else:                                # right wedge: mirror of the left
            print(j - n + 1, end='')
    print()
n = int(input())
k = 2*n - 1                                 # side length of the square
for i in range(k):
    for j in range(k):
        a = i if i < j else j               # distance to the top or left edge
        a = a if a < k - i else k - i - 1   # ...or to the bottom edge
        a = a if a < k - j else k - j - 1   # ...or to the right edge
        print(n - a, end='')
    print()
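The same idea can be written more compactly (a sketch): the digit at position (i, j) is n minus the distance from (i, j) to the nearest edge of the (2n-1) x (2n-1) grid.

n = int(input())
k = 2*n - 1
for i in range(k):
    # min(...) is the distance from cell (i, j) to the nearest edge
    print(''.join(str(n - min(i, j, k - 1 - i, k - 1 - j)) for j in range(k)))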
This question already has answers here:
Split a Pandas column of lists into multiple columns (11 answers)
How to convert string representation of list to a list (19 answers)
I have a data frame, and one of its columns contains lists.
A B
0 5 [3, 4]
1 4 [1, 1]
2 1 [7, 7]
3 3 [0, 2]
4 5 [3, 3]
5 4 [2, 2]
The output should look like this:
A x y
0 5 3 4
1 4 1 1
2 1 7 7
3 3 0 2
4 5 3 3
5 4 2 2
I have tried some options that I found here, but they are not working.
import pandas as pd

df = pd.DataFrame(data={"A": [0, 1],
                        "B": [[3, 4], [1, 1]]})
df['x'] = df['B'].apply(lambda x: x[0])
df['y'] = df['B'].apply(lambda x: x[1])
df.drop(['B'], axis=1, inplace=True)
A x y
0 0 3 4
1 1 1 1
In case the list is stored as a string:
from ast import literal_eval

df = pd.DataFrame(data={"A": [0, 1],
                        "B": ['[3,4]', '[1,1]']})
df['x'] = df['B'].apply(lambda x: literal_eval(x)[0])
df['y'] = df['B'].apply(lambda x: literal_eval(x)[1])
df.drop(['B'], axis=1, inplace=True)
A third way (credit goes to @anky_91):
df = pd.DataFrame(data={"A": [0, 1],
                        "B": ['[3,4]', '[1,1]']})
df["B"] = df["B"].apply(literal_eval)
# Selecting only A before the join already leaves B out, so no separate drop is needed
df = df[['A']].join(pd.DataFrame(df["B"].values.tolist(), columns=['x', 'y'], index=df.index))
I am analyzing a dataset that has an Origin ID (Column A), a Destination ID (Column B), and how many trips have happened between them (Column Count). Now I want to sum the A-B trips with the B-A trips. This sum is the total number of trips between A and B.
Here is what my data looks like (it is not necessarily ordered this way):
In [1]: group_station = pd.DataFrame([[1, 2, 100], [2, 1, 200], [4, 6, 5] , [6, 4, 10], [1, 4, 70]], columns=['A', 'B', 'Count'])
Out[2]:
A B Count
0 1 2 100
1 2 1 200
2 4 6 5
3 6 4 10
4 1 4 70
And I want the following output:
A B Count
0 1 2 300
1 4 6 15
4 1 4 70
I have tried groupby and setting the index to both variables, with no success. Right now I am using a very inefficient double loop that is too slow for the size of my dataset.
If it helps, this is the code for the double loop (I removed some efficiency modifications to make it clearer):
# group_station is the dataframe
collapsed_group_station = np.zeros((len(group_station), 3))
for i, (idx, row) in enumerate(group_station.iterrows()):
    start_id = row['A']
    end_id = row['B']
    count = row['Count']
    for check_idx, check_row in group_station.iterrows():
        check_start_id = check_row['A']
        check_end_id = check_row['B']
        check_count = check_row['Count']
        if start_id == check_end_id and end_id == check_start_id:
            collapsed_group_station[i][0] = start_id
            collapsed_group_station[i][1] = end_id
            collapsed_group_station[i][2] = count + check_count
            break
I have ideas of how to make this code more efficient, but I wanted to know if there is a way of doing it without looping.
You can use np.sort together with groupby.sum(). Sorting each (A, B) pair along axis=1 turns (2, 1) into (1, 2), so both directions of a trip get the same key before grouping:
import numpy as np
import pandas as pd

group_station[['A', 'B']] = np.sort(group_station[['A', 'B']], axis=1)
group_station.groupby(['A', 'B'], as_index=False).Count.sum()
Out[175]:
A B Count
0 1 2 300
1 1 4 70
2 4 6 15
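To see why this works, a minimal sketch of what the np.sort step does to the pairs:

import numpy as np

# axis=1 sorts within each row, so (2, 1) and (1, 2) become the same key
print(np.sort([[2, 1], [1, 2], [6, 4]], axis=1))
# [[1 2]
#  [1 2]
#  [4 6]]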
How can I select all rows of a data frame where a condition on a column is met, when the condition involves the relationship between every two consecutive entries of that column? To give a specific example, let's say I have a DataFrame:
>>> df = pd.DataFrame({'A': [1, 2, 3, 4],
...                    'B': ['spam', 'ham', 'egg', 'foo'],
...                    'C': [4, 5, 3, 4]})
>>> df
A B C
0 1 spam 4
1 2 ham 5
2 3 egg 3
3 4 foo 4
>>> df2 = df[<every row of df where C[i] > C[i-1]>]  # pseudocode
>>> df2
A B C
1 2 ham 5
3 4 foo 4
There is plenty of great information about slicing and indexing in the pandas docs and here, but this is a bit more complicated, I think. I could also be going about it the wrong way. What I'm looking for are the rows at which the values in C are no longer monotonically decreasing.
Any help is appreciated!
Use boolean indexing, comparing the column with its shifted values:
print (df[df['C'] > df['C'].shift()])
A B C
1 2 ham 5
3 4 foo 4
Detail:
print (df['C'] > df['C'].shift())
0 False
1 True
2 False
3 True
Name: C, dtype: bool
Equivalently, to find the rows that break the decline (where C increases relative to the previous row), compare the diff of the column. Note that the first row is never selected, since its shifted and diff values are both NaN:
print (df[df['C'].diff() > 0])
A B C
1 2 ham 5
3 4 foo 4