np.where in pandas, checking for empty lists - python-3.x

I have a DataFrame like this:
df = pd.DataFrame({'var1':['a','b','c'],
'var2':[[],[1,2,3],[2,3,4]]})
I would like to create a third column which gives the value in var1 if the corresponding list in var2 is empty, and the first element of the list in var2 otherwise. So my intended result is:
target = pd.DataFrame({'var1':['a','b','c'],
'var2':[[],[1,2,3],[2,3,4]],
'var3':['a',1,2]})
I've tried using np.where like this:
df['var3'] = np.where(len(df['var2'])>0 , df['var2'][0], df['var1'])
But it seems to be checking the length of the whole column rather than the length of the list within each row of the column. How can I get it to apply the condition to each row?
I have the same problem when I use bool(df['var2']) as my condition.

Let's use .str accessors and len:
df['var'] = np.where(df.var2.str.len() > 0, df.var2.str[0], df.var1)
Output:
  var1       var2 var
0    a         []   a
1    b  [1, 2, 3]   1
2    c  [2, 3, 4]   2
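To illustrate why the attempt in the question looks at the whole column, here is a small sketch (assuming pandas is installed; the names rows_in_column and length_per_row are made up for this example). It only compares len() on the Series with the element-wise .str.len():

```python
import pandas as pd

df = pd.DataFrame({'var1': ['a', 'b', 'c'],
                   'var2': [[], [1, 2, 3], [2, 3, 4]]})

# len() on a Series counts its rows, so it yields one number for the column.
rows_in_column = len(df['var2'])

# .str.len() is applied element-wise, so it yields one length per row.
length_per_row = df['var2'].str.len()

print(rows_in_column)           # 3
print(length_per_row.tolist())  # [0, 3, 3]
```

Because np.where needs a condition with one entry per row, the element-wise version is the one that fits.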

You could use a list comprehension:
v3 = [row['var1'] if len(row['var2']) == 0 else row['var2'][0]
      for i, row in df.iterrows()]
df['var3'] = v3
Alternatively, you could use apply instead of where, to apply it to the whole dataframe:
First you need a function to use in apply:
def f(row):
    if len(row['var2']) == 0:
        return row['var1']
    else:
        return row['var2'][0]
Then apply it:
df['var3']= df.apply(f,axis=1)

This may be reviving an old post, but I would prefer np.where over a list comprehension (too slow) or apply, because it is vectorized. Many online tutorials explain the mechanism in depth.

Related

How to subtract adjacent items in list with unknown length (python)?

Provided with a list of lists, for example myList = [[70,83,90],[19,25,30]], return a list of lists containing the absolute differences between adjacent elements. For this example the result would be [[13,7],[6,5]], that is, the absolute values of (70-83), (83-90), (19-25), and (25-30). I'm not sure how to iterate through each list to subtract adjacent elements without already knowing its length. So far I have just separated the list of lists into two separate lists.
list_one = myList[0]
list_two = myList[1]
Please let me know what you would recommend, thank you!
A custom generator can return two adjacent items at a time from a sequence without knowing the length:
def two(sequence):
    i = iter(sequence)
    a = next(i)
    for b in i:
        yield a, b
        a = b
original = [[70,83,90],[19,25,30]]
result = [[abs(a-b) for a,b in two(sequence)]
for sequence in original]
print(result)
[[13, 7], [6, 5]]
Well, for each list, you can simply get its number of elements like this:
res = []
for my_list in list_of_lists:
    res.append([])
    for i in range(len(my_list) - 1):
        # Do some stuff
You can then add the results you want to res[-1].
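As a sketch of how the "Do some stuff" step could be filled in for this question (the function name adjacent_differences is made up for illustration), each inner loop appends the absolute difference of neighbouring items to res[-1]:

```python
def adjacent_differences(list_of_lists):
    """Return, for each inner list, the absolute differences of adjacent items."""
    res = []
    for my_list in list_of_lists:
        res.append([])
        # range(len(my_list) - 1) stops one short so that i + 1 stays in bounds.
        for i in range(len(my_list) - 1):
            res[-1].append(abs(my_list[i] - my_list[i + 1]))
    return res


print(adjacent_differences([[70, 83, 90], [19, 25, 30]]))  # [[13, 7], [6, 5]]
```

The length of each inner list is looked up inside the loop, so the lists do not need to have the same size.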

Is there a python function to get all indexes from unique values?

I know there are methods like set() or np.unique() to get unique values from lists. But I am looking for a way to get the indexes of the values that occur exactly once.
example = [0,1,1,2,3,3,4]
What I am looking for is:
desired_index_list = [0,3,6]
Any suggestions?
I don't know of any prebuilt solution, so you probably need to write your own. There are different approaches, but in plain Python you can build a Counter and keep the indexes of the values in the original list whose count is 1.
>>> from collections import Counter
>>> example = [0,1,1,2,3,3,4]
>>> counted = Counter(example)
>>> desired_index_list = [index for index, elem in enumerate(example) if counted[elem] == 1]
>>> desired_index_list
[0, 3, 6]
You can do this as a one-liner with a list comprehension:
from collections import Counter
[example.index(x) for x, y in Counter(example).items() if y == 1]
(Counter(example).items() yields a tuple for each item (x) and its number of occurrences (y); the comprehension returns the index of the item if its count is 1.)
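If you need this in more than one place, the enumerate-based approach can be wrapped in a small helper. This is only a sketch, and the name indexes_of_unique_values is made up for illustration. It uses enumerate rather than list.index, so each position comes straight from the loop instead of a second search through the list:

```python
from collections import Counter


def indexes_of_unique_values(values):
    """Return the positions of the values that occur exactly once."""
    counts = Counter(values)
    return [index for index, value in enumerate(values) if counts[value] == 1]


print(indexes_of_unique_values([0, 1, 1, 2, 3, 3, 4]))  # [0, 3, 6]
print(indexes_of_unique_values(['b', 'a', 'b']))        # [1]
```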

How can I append a different element for each list in a column in pandas?

I have a dataframe, df, with lists in a specific column, col_a. For example,
df = pd.DataFrame()
df['col_a'] = [[1,2,3], [3,4], [5,6,7]]
I want to use conditions on these lists and apply specific modifications, including appends. For example, imagine that if the length of the list is > 2, I want to append another element, which is the sum of the last two elements of the current list. So, considering the first list above, I have [1, 2, 3] and I want to have [1, 2, 3, 5].
What I tried to do was:
df.loc[:, col_a] = df[col_a].apply(
lambda value: value.append(value[-2]+value[-1])
if len(value) > 1 else value)
But the result in that column is None for all the elements of the column.
Can someone help me, please?
Thank you very much in advance.
The issue is that append modifies the list in place and returns None. You need to add two lists together instead. A working example with a dummy column would be:
df = pd.DataFrame({'cola':[[1,2],[2,3,4]], 'dum':[1,2]})
df['cola']=df.cola.apply(lambda x: (x+[sum(x[-2:])] if len(x)>2 else x))
If you prefer a named function, try this (it also uses concatenation rather than append):
def my_logic_for_list(values):
    if len(values) > 2:
        return values + [values[-2] + values[-1]]
    return values
df['new_a'] = df['col_a'].apply(my_logic_for_list)
Calling append inside the lambda does not work here, because append returns None.
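The difference can be seen without pandas. The sketch below (the helper name append_sum_of_last_two is made up for illustration) shows that list.append returns None, while concatenation with + returns a new list that apply can use as the result:

```python
def append_sum_of_last_two(values):
    """Return a new list with the sum of the last two items added when len > 2."""
    if len(values) > 2:
        return values + [values[-2] + values[-1]]
    return values


numbers = [1, 2, 3]
returned = numbers.append(5)  # mutates numbers in place and returns None

print(returned)                           # None
print(numbers)                            # [1, 2, 3, 5]
print(append_sum_of_last_two([1, 2, 3]))  # [1, 2, 3, 5]
print(append_sum_of_last_two([3, 4]))     # [3, 4]
```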

If condition working differently for same value in python

I am trying to write a function which will return True or False depending on whether the given number is greater than 2.
It seems simple, but the if condition gives different results for the same value, 2. The code I used is:
ele_list = [1,2,3,2]
for i in ele_list:
    if not i>2:
        print(i,False)
        ele_list.remove(i)
        print(ele_list)
The output I am receiving is:
1 False
[2, 3, 2]
2 False
[3, 2]
I am confused to see that the first 2 in the list passes through the if condition but the second 2 in the list does not. Please help me figure this out.
Removing elements from the list you're looping over is generally a bad idea.
What's happening here is that when you remove an element, you change the length of the list, and therefore change which elements are located at which indexes, as well as the "goal" of the for loop.
Let's have a look at the following example:
ele_list = [4,3,2,1]
for elem in ele_list:
    print(elem)
    ele_list.remove(elem)
In the first iteration of the loop, elem is the value 4, which is located at index 0. You then remove from the list the first value equal to elem, so the value 4 at index 0 is gone. This shifts which element is stored at which index: before the removal ele_list[0] was 4, but after the removal ele_list[0] is 3, since 3 is the value that was stored at index 1 before the removal.
When the loop continues to the second iteration, the index that the loop "looks at" is incremented by 1. So elem is now ele_list[1], which in the updated list (after the removal of 4 in the previous iteration) is 2; the value 3 was never visited. As before, you then remove that value from the list, so the list is now just 2 elements long: [3, 1].
When the loop is about to start the third iteration, it checks whether the new index (in this case 2) is smaller than the length of the list. It is not, since 2 is not smaller than 2, so the loop ends.
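The walkthrough above can be reproduced with an explicit index. This is only a sketch that mimics what the for loop does internally, and the function name trace_remove_while_iterating is made up for illustration:

```python
def trace_remove_while_iterating(values):
    """Mimic `for elem in values: values.remove(elem)` with an explicit index."""
    values = list(values)  # work on a copy so the caller's list is untouched
    visited = []
    index = 0
    while index < len(values):   # the length is re-checked on every iteration
        elem = values[index]
        visited.append(elem)
        values.remove(elem)      # shifts every later element one slot to the left
        index += 1               # ...so this step jumps over the shifted element
    return visited, values


visited, remaining = trace_remove_while_iterating([4, 3, 2, 1])
print(visited)    # [4, 2]
print(remaining)  # [3, 1]
```

Only 4 and 2 are ever visited; 3 and 1 are skipped because they slide into positions the index has already passed.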
The simplest solution is to create a copy of the list and loop over the copy instead. This can easily be done using the slice syntax: ele_list[:]
ele_list = [1,2,3,2]
for elem in ele_list[:]:
    if not elem > 2:
        print(elem, False)
        ele_list.remove(elem)
        print(ele_list)
The problem is that you're modifying your list as you're iterating over it, as mentioned in @Olian04's answer.
It sounds like what you really want to do, however, is only keep values that are > 2. This is really easy using a list comprehension:
filtered_vals = [v for v in ele_list if v > 2]
if you merely want a function that gives you True for numbers greater than 2 and False for others, you can do something like this:
def gt_2(lst):
    return [v > 2 for v in lst]
or, finally, if you want to find out if any of the values is > 2 just do:
def any_gt_2(lst):
    return any(v > 2 for v in lst)
I think the problem here is how the remove method interacts with the for statement.
See the documentation, and read the "note" part:
https://docs.python.org/3.7/reference/compound_stmts.html?highlight=while#grammar-token-for-stmt
This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence
A possible solution, as suggested in the documentation:
ele_list = [1,2,3,2]
for i in ele_list[:]:
    if not i>2:
        print(i,False)
        ele_list.remove(i)
        print(ele_list)
"""
1 False
[2, 3, 2]
2 False
[3, 2]
2 False
[3]
"""

How can I convert many variables to int in one line

I started to learn Python a few days ago.
I know that I can convert a variable to int, such as x = int(x),
but when I have 5 variables, for example, is there a better way to convert them in one line? In my code I have 2 variables, but what if I have 5 or more to convert? I think there must be a way.
Thank you for your help.
(Sorry for my English)
x,y=input().split()
y=int(y)
x=int(x)
print(x+y)
You could use something like this:
a,b,c,d=[ int(i) for i in input().split()]
Check this small example.
>>> values = [int(x) for x in input().split()]
1 2 3 4 5
>>> values
[1, 2, 3, 4, 5]
>>> values[0]
1
>>> values[1]
2
>>> values[2]
3
>>> values[3]
4
>>> values[4]
5
You have to enter the values separated by spaces. They are then converted to integers and saved in a list. As a beginner you may not be familiar with list comprehensions. This is what the documentation says about them:
List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
So the expanded version of [int(x) for x in input().split()] is equivalent to the loop below:
>>> values = []
>>> input_values = input().split()
1 2 3 4 5
>>> for val in input_values:
...     values.append(int(val))
...
>>> values
[1, 2, 3, 4, 5]
You don't need to create multiple variables to hold your values; in this example all the values are saved in the values list. You can access the first element with values[0] (indexing starts at 0). When the number of input values is large, say 100, you would otherwise have to create 100 variables, whereas with a list you can access the 100th value with values[99].
This will work with any number of values:
# Split the input and convert each value to int
valuesAsInt = [int(x) for x in input().split()]
# Print the sum of those values
print(sum(valuesAsInt))
The first line is a list comprehension, which is a handy way to map each value in a list to another value. Here you're mapping each string x to int(x), leaving you with a list of integers.
In the second line, sum() adds up all the values in the list, simple as that.
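To try the same idea without typing at a prompt, the conversion can be put in a small function that takes the line as a string. This is only a sketch, and the name parse_ints is made up for illustration; in a real script you would pass it input():

```python
def parse_ints(line):
    """Split a whitespace-separated string and convert each piece to int."""
    return [int(piece) for piece in line.split()]


values = parse_ints("1 2 3 4 5")
print(values)       # [1, 2, 3, 4, 5]
print(sum(values))  # 15

# Unpacking works when the number of names matches the number of values.
x, y = parse_ints("1 2")
print(x + y)        # 3
```

Note that int() raises ValueError if any piece is not a valid integer, so real input may need checking first.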
There is one easy way of converting multiple variables to integers in Python:
right, left, top, bottom = int(right), int(left), int(top), int(bottom)
You could use the map function.
x, y = map(int, input().split())
print(x + y)
if the input was:
1 2
the output would be:
3
You could also use tuple unpacking:
x, y = input().split()
x, y = int(x), int(y)
I hope this helped you, have a nice day!
