R: Deleting elements from a vector based on element length - string

How can I delete elements from a vector of strings depending on the number of characters or length of the strings?
df <- c("asdf","fweafewwf","af","","","aewfawefwef","awefWEfawefawef")
> df
[1] "asdf" "fweafewwf" "af" "" "" "aewfawefwef" "awefWEfawefawef"
For example, I may want to delete all elements of df with a length smaller than 5, so the output would be:
> df
[1] "fweafewwf" "aewfawefwef" "awefWEfawefawef"
Thanks!

Just use nchar:
> df[nchar(df) >= 5]
[1] "fweafewwf" "aewfawefwef" "awefWEfawefawef"

Since nchar behaves unexpectedly with NA values:
nchar(NA)
## [1] 2
I recommend using the stri_length function from the stringi package, which returns NA for a missing value:
library(stringi)
df[stri_length(df) >= 5]


How to modify list of lists using str.format() so floats are 3 decimals and other data types remain the same

I have a list of lists that is very big, and looks like this:
list_of_lists = [[0,'pan', 17.892, 4.6555], [4, 'dogs', 19.2324, 1.4564], ...]
I need to modify it using the str.format() so the floats go to 3 decimal places and the rest of the data stays in its correct format. I also need to add a tab between each list entry so it looks organized and somewhat like this:
0 'pan' 17.892 4.655
4 'dogs' 19.232 1.456
...
And so on.
My problem is that I keep getting an error in my for loop and I don't know how to fix it.
for x in list_of_lists:
    print("{:.2f}".format(x))
TypeError: unsupported format string passed to list.__format__
In your loop you are iterating through a nested list. This means that x is also a list itself, and not a valid argument to the format() function.
If the number of elements in each inner list is small and it makes sense in the context of the problem, you can simply pass all of them as arguments:
list_of_lists = [[0,'pan', 17.892, 4.6555], [4, 'dogs', 19.2324, 1.4564]]
for x in list_of_lists:
    print("{:d}\t{:s}\t{:.3f}\t{:.3f}".format(x[0], x[1], x[2], x[3]))
These are now tab delimited, and the floats have three decimal places.
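When the inner lists are long or vary in length, listing every index by hand gets tedious. A minimal sketch of a type-based alternative follows; the helper name format_row and its float_spec parameter are hypothetical, not part of the original answers. It formats floats to three decimals, shows strings with quotes as in the desired output, and leaves everything else as is.

```python
def format_row(row, float_spec=".3f"):
    """Return one tab-separated line for a single inner list (hypothetical helper)."""
    cells = []
    for value in row:
        if isinstance(value, float):
            # Only floats get the fixed number of decimals.
            cells.append(format(value, float_spec))
        elif isinstance(value, str):
            # repr() keeps the surrounding quotes, as in the desired output.
            cells.append(repr(value))
        else:
            # Integers and anything else are left in their default form.
            cells.append(str(value))
    return "\t".join(cells)


list_of_lists = [[0, 'pan', 17.892, 4.6555], [4, 'dogs', 19.2324, 1.4564]]
for row in list_of_lists:
    print(format_row(row))
```

Because the decision is made per element, the same function works for rows of any length.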
for x in list_of_lists:
    print("{:.2f}".format(x))
This only loops over the top-level list, not the elements inside each inner list, which is why you are getting the error.
Try addressing the elements individually:
# build manually, join with tab-char and print on loop
for i in list_of_lists:
    result = []
    result.append(f'{i[0]}')
    result.append(f'{i[1]}')
    result.append(f'{i[2]:.3f}')
    result.append(f'{i[3]:.3f}')
    print('\t'.join(result))
# build in one line and print
for i in list_of_lists:
    print(f'{i[0]}\t\'{i[1]}\'\t{i[2]:.3f}\t{i[3]:.3f}')
Or as a list comprehension:
# build whole line from list comprehension, join on new-line char
result = [f'{i[0]}\t\'{i[1]}\'\t{i[2]:.3f}\t{i[3]:.3f}' for i in list_of_lists]
result = '\n'.join(result)
print(result)
# all in one line
print('\n'.join([f'{i[0]}\t\'{i[1]}\'\t{i[2]:.3f}\t{i[3]:.3f}' for i in list_of_lists]))

Summation two by two of the elements of an array

I have an array of 40 000 elements and I would like to add the elements two by two so that I can reduce it to 20 000 elements. How can I do that? Thanks
The easiest way is probably to iterate over a range of "every other index". In this case, that would be for i in range(0, len(my_list), 2), which produces the indices 0, 2, 4, ..., 39996, 39998. From there, we just add the contents at each index to the contents of the index following it.
import random
my_list = random.choices(range(100), k=40000)
print(len(my_list))
# 40000
new_list = [my_list[i] + my_list[i + 1] for i in range(0, len(my_list), 2)]
print(len(new_list))
# 20000
Another way, which is a bit less efficient but cannot raise an IndexError when the list has an odd number of elements, is to zip() two slices of the collection, using the same step in each slice to select only the even-indexed [::2] or odd-indexed [1::2] elements. Note that zip() stops at the shorter slice, so a trailing unpaired element is dropped:
new_list = [even + odd for (even, odd) in zip(my_list[::2], my_list[1::2])]
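If the unpaired last element of an odd-length list should be kept rather than dropped, itertools.zip_longest can pad the shorter slice. The sketch below is one way to do it; the function name pairwise_sums and the keep_tail flag are hypothetical, and it assumes the elements are numbers so that padding with 0 is harmless.

```python
from itertools import zip_longest


def pairwise_sums(values, keep_tail=True):
    """Sum neighbouring elements two by two (hypothetical helper)."""
    evens = values[::2]   # elements at indices 0, 2, 4, ...
    odds = values[1::2]   # elements at indices 1, 3, 5, ...
    if keep_tail:
        # Pad the shorter slice with 0 so a trailing element survives unchanged.
        return [a + b for a, b in zip_longest(evens, odds, fillvalue=0)]
    # Plain zip() stops at the shorter slice and drops the trailing element.
    return [a + b for a, b in zip(evens, odds)]


print(pairwise_sums([1, 2, 3, 4, 5]))                   # [3, 7, 5]
print(pairwise_sums([1, 2, 3, 4, 5], keep_tail=False))  # [3, 7]
```

For an even-length list such as the 40 000-element one in the question, both variants give the same 20 000 sums.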

How can I append a different element for each list in a column in pandas?

I have a dataframe, df, with lists in a specific column, col_a. For example,
df = pd.DataFrame()
df['col_a'] = [[1,2,3], [3,4], [5,6,7]]
I want to use conditions on these lists and apply specific modifications, including appends. For example, imagine that if the length of the list is > 2, I want to append another element, which is the sum of the last two elements of the current list. So, considering the first list above, I have [1, 2, 3] and I want to have [1, 2, 3, 5].
What I tried to do was:
df.loc[:, 'col_a'] = df['col_a'].apply(
    lambda value: value.append(value[-2]+value[-1])
    if len(value) > 1 else value)
But the result in that column is None for all the elements of the column.
Can someone help me, please?
Thank you very much in advance.
The issue is that append modifies the list in place and returns None, so the lambda returns None as well. You need to add two lists together instead. A working example with a dummy column would be:
df = pd.DataFrame({'cola':[[1,2],[2,3,4]], 'dum':[1,2]})
df['cola']=df.cola.apply(lambda x: (x+[sum(x[-2:])] if len(x)>2 else x))
If you prefer a named function over a lambda, try this:
def my_logic_for_list(values):
    if len(values) > 2:
        return values + [values[-2]+values[-1]]
    return values
df['new_a'] = df['col_a'].apply(my_logic_for_list)
Calling append inside the lambda does not give the result you want, because append returns None and the lambda then returns that None.
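The difference between the two approaches can be seen without pandas at all. The sketch below applies each function to plain lists, which is roughly what apply does element by element; the function names with_append and with_concat are hypothetical.

```python
def with_append(value):
    # list.append mutates the list and returns None, so this returns None
    # whenever the condition is true.
    return value.append(value[-2] + value[-1]) if len(value) > 2 else value


def with_concat(value):
    # List concatenation builds and returns a new list.
    return value + [value[-2] + value[-1]] if len(value) > 2 else value


rows = [[1, 2, 3], [3, 4], [5, 6, 7]]

# Copies are passed in because with_append changes its argument in place.
print([with_append(list(r)) for r in rows])  # [None, [3, 4], None]
print([with_concat(r) for r in rows])        # [[1, 2, 3, 5], [3, 4], [5, 6, 7, 13]]
```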

How can I convert many variables to int in one line

I started to learn Python a few days ago.
I know that I can convert a variable to int with x = int(x),
but when I have 5 variables, for example, is there a better way to convert them all in one line? My code has 2 variables, but I assume there is a shorter way when there are 5 or more.
Thank you for your help.
(Sorry for my English)
x,y=input().split()
y=int(y)
x=int(x)
print(x+y)
You could use something like this:
a,b,c,d=[ int(i) for i in input().split()]
Check this small example.
>>> values = [int(x) for x in input().split()]
1 2 3 4 5
>>> values
[1, 2, 3, 4, 5]
>>> values[0]
1
>>> values[1]
2
>>> values[2]
3
>>> values[3]
4
>>> values[4]
5
You have to enter the values separated by spaces. Each one is then converted to an integer and stored in a list. If you are a beginner, list comprehensions may be unfamiliar. This is what the documentation says about them:
List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
So the expanded version of [int(x) for x in input().split()] is equivalent to the loop below:
>>> values = []
>>> input_values = input().split()
1 2 3 4 5
>>> for val in input_values:
... values.append(int(val))
...
>>> values
[1, 2, 3, 4, 5]
You don't need to create a separate variable for each value; in this example all the values are stored in the values list. You can access the first element with values[0] (indexing starts at 0). With a large number of inputs, say 100, separate variables would mean creating 100 of them, whereas with a list you can access the 100th value with values[99].
This will work with any number of values:
# Split the input and convert each value to int
valuesAsInt = [int(x) for x in input().split()]
# Print the sum of those values
print(sum(valuesAsInt))
The first line is a list comprehension, which is a handy way to map each value in a list to another value. Here you're mapping each string x to int(x), leaving you with a list of integers.
In the second line, sum() adds up the whole list, simple as that.
There is one easy way of converting multiple variables to integers in Python:
right, left, top, bottom = int(right), int(left), int(top), int(bottom)
You could use the map function.
x, y = map(int, input().split())
print(x + y)
if the input was:
1 2
the output would be:
3
You could also use tuple unpacking:
x, y = input().split()
x, y = int(x), int(y)
I hope this helped you, have a nice day!
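The approaches above can also be wrapped in a small function, which makes the conversion easy to reuse and to check. This is only a sketch: the name parse_ints and its expected parameter are hypothetical, and it takes a string argument instead of calling input() so that it can be run without typing anything.

```python
def parse_ints(text, expected=None):
    """Split text on whitespace and convert every piece to int (hypothetical helper)."""
    parts = text.split()
    if expected is not None and len(parts) != expected:
        # Fail early with a clear message instead of an unpacking error later.
        raise ValueError(f"expected {expected} values, got {len(parts)}")
    # int() raises ValueError itself if a piece is not a valid integer.
    return [int(part) for part in parts]


x, y = parse_ints("1 2", expected=2)
print(x + y)  # 3

values = parse_ints("1 2 3 4 5")
print(sum(values))  # 15
```

In an interactive script, parse_ints(input()) would read the line from the user.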

np.where in pandas, checking for empty lists

I have a DataFrame like this:
df = pd.DataFrame({'var1':['a','b','c'],
'var2':[[],[1,2,3],[2,3,4]]})
I would like to create a third column which gives the value in var1 if the corresponding list in var2 is empty, and the first element of the list in var2 otherwise. So my intended result is:
target = pd.DataFrame({'var1':['a','b','c'],
'var2':[[],[1,2,3],[2,3,4]],
'var3':['a',1,2]})
I've tried using np.where like this:
df['var3'] = np.where(len(df['var2'])>0 , df['var2'][0], df['var1'])
But it seems to be checking the length of the whole column rather than the length of the list within each row of the column. How can I get it to apply the condition to each row?
I have the same problem when I use bool(df['var2']) as my condition.
Let's use .str accessors and len:
df['var'] = np.where(df.var2.str.len() > 0, df.var2.str[0], df.var1)
Output:
  var1       var2 var
0    a         []   a
1    b  [1, 2, 3]   1
2    c  [2, 3, 4]   2
You could use a list comprehension:
v3 = [row['var1'] if len(row['var2'])==0 else row['var2'][0]
      for i, row in df.iterrows()]
df['var3']=v3
Alternatively, you could use apply instead of where, to apply it to the whole dataframe:
First you need a function to pass to apply:
def f(row):
    if len(row['var2'])==0:
        return row['var1']
    else:
        return row['var2'][0]
Then apply it:
df['var3']= df.apply(f,axis=1)
This may look like reviving an old post, but I would prefer np.where, because it is vectorized, over a list comprehension (too slow) or apply. Many online tutorials explain the mechanism in depth.
