String formatting - python-3.x

I have a list filter = ['a', 'b', 'c'] and need to build the string "item -a item -b item -c" from it. What is the most efficient way to do this? Usually the list filter contains 100 to 200 items, each 100-150 characters long. Wouldn't that lead to an overflow? And what is the maximum string length supported?

A cleaner way to do this:
filter = ['a', 'b', 'c']
" ".join(["item -%s" % val for val in filter])
This works fine even with large inputs, e.g. filter = ['a' * 1000] * 1000.
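To address the overflow worry directly: a CPython str is limited only by available memory, with a hard upper bound of sys.maxsize characters, so a few hundred items of length 150 is nowhere near any limit. A quick sketch with deliberately oversized input:

```python
import sys

# 1000 items of 1000 characters each -- far larger than the 100-200
# items of length 100-150 described in the question.
filter_items = ['a' * 1000] * 1000
result = " ".join("item -%s" % val for val in filter_items)

# 1000 * ("item -" is 6 chars + 1000 payload chars) + 999 separators
print(len(result))   # 1006999
print(sys.maxsize)   # theoretical per-string character limit (platform-dependent)
```

No overflow occurs; join allocates the final string once, which is also why it beats repeated `+=` concatenation.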

You can use join (join works the same way in Python 3):
>>> l = ['a', 'b', 'c']
>>> print(' item -'.join([''] + l))
 item -a item -b item -c
>>> print(' item -'.join([''] + l).lstrip(' '))  # eat the leading space
item -a item -b item -c

Again with join, but using an f-string:
filter = ['a', 'b', 'c']
print(" ".join([f"item -{val}" for val in filter]))
And as mentioned, avoid shadowing built-ins such as filter when naming your variables.

Related

Karate - To find the occurrence of element in a list and print the number of times its present in the list

In my case I have a list A = ['a', 'a', 'a', 'b', 'b', 'c']. I have to find the occurrences of the elements in the list and print their counts, for example a = 3, b = 2 and c = 1.
Just use JavaScript. The filter() operation is perfect for this:
* def data = ['a', 'c', 'b', 'c', 'c', 'd']
* def count = data.filter(x => x == 'c').length
* assert count == 3
Further reading: https://github.com/karatelabs/karate#json-transforms
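For comparison, if the same counting were needed in plain Python rather than inside a Karate feature, collections.Counter handles it in one line:

```python
from collections import Counter

A = ['a', 'a', 'a', 'b', 'b', 'c']
counts = Counter(A)            # Counter({'a': 3, 'b': 2, 'c': 1})
for elem, n in counts.items():
    print(f"{elem} = {n}")     # a = 3, b = 2, c = 1 (insertion order)
```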

Check if element is occurring very first time in python list

I have a list with values occurring multiple times. I want to loop over the list and check if value is occurring very first time.
For eg: Let's say I have a one list like ,
L = ['a','a','a','b','b','b','b','b','e','e','e'.......]
Now, at every first occurrence of element, I want to perform some set of tasks.
How to get the first occurrence of element?
Thanks in Advance!!
Use a set to check whether you have already processed an item:
visited = set()
L = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'e', 'e', 'e', ...]
for e in L:
    if e not in visited:
        visited.add(e)
        ...  # process first-time tasks
    else:
        ...  # process repeat-occurrence tasks
You can use unique_everseen from the itertools recipes.
This function returns a generator which yields only the first occurrence of each element.
Code
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Example
lst = ['a', 'a', 'b', 'c', 'b']
for x in unique_everseen(lst):
    print(x)  # Do something with the element
Output
a
b
c
The function unique_everseen also accepts a key for comparing elements. This is useful in many cases, for example if you also need to know the position of each first occurrence.
Example
lst = ['a', 'a', 'b', 'c', 'b']
for i, x in unique_everseen(enumerate(lst), key=lambda pair: pair[1]):
    print(i, x)
Output
0 a
2 b
3 c
Why not use this?
L = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'e', 'e', 'e', ...]
for idx, item in enumerate(L):
    if L.index(item) == idx:
        print("This is the first occurrence")
For very long lists it is less efficient than building a set before the loop (each index call rescans the list from the start, making the whole loop quadratic), but it is more direct to write.
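If all you need is the list of first occurrences, rather than branching inside the loop, dict.fromkeys produces them in one pass (dicts preserve insertion order since Python 3.7):

```python
L = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'e', 'e', 'e']
# fromkeys keeps one key per distinct element, in first-seen order
firsts = list(dict.fromkeys(L))
print(firsts)  # ['a', 'b', 'e']
```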

Convert a string within a list to an element in the list in python

I am using Python data to create a ReportLab report. I have a list that looks like this:
mylist = [['a b c d e f'], ['g h i j k l']]
and want to convert it to look like this:
mylist2 = [[a, b, c, d, e], [g, h, i, j, k, l]]
The first list gives me a "list index out of range" error when building the report. The second list works in ReportLab, but the columns and formatting in this list aren't what I want. What is the best method to convert mylist to mylist2 in Python?
A string can be turned into a list with the split() method.
Try mylist[0][0].split() and mylist[1][0].split().
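More generally, a list comprehension applies split() to every inner list, however many there are:

```python
mylist = [['a b c d e f'], ['g h i j k l']]
# each inner list holds a single space-separated string; split it in place
mylist2 = [inner[0].split() for inner in mylist]
print(mylist2)
# [['a', 'b', 'c', 'd', 'e', 'f'], ['g', 'h', 'i', 'j', 'k', 'l']]
```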
Borrowing the idea from Jibin Mathews, I tried the following:
new_list = [mylist[0][0].split(), mylist[1][0].split()]
and it prints
[['a', 'b', 'c', 'd', 'e', 'f'], ['g', 'h', 'i', 'j', 'k', 'l']]
I see that 'f' is missing in your final list. Is that a mistake?
mylist = [['a b c d e f'], ['g h i j k l']]
import re
space_re = re.compile(r'\s+')
output = []
for inner in mylist:
    element = inner[0]
    output.append(space_re.split(element))
This is not the best answer, but it works fine.

Getting all str type elements in a pd.DataFrame

Based on my limited knowledge of pandas, pandas.Series.str.contains can search for a specific string in a pd.Series. But what if the DataFrame is large and I just want to glance at all the kinds of string elements in it before doing anything?
Example like this:
pd.DataFrame({'x1':[1,2,3,'+'],'x2':[2,'a','c','this is']})
x1 x2
0 1 2
1 2 a
2 3 c
3 + this is
I need a function to return ['+','a','c','this is']
If you are looking strictly at which values are strings and performance is not a concern, then this is a very simple answer:
df.where(df.applymap(type).eq(str)).stack().tolist()
['a', 'c', '+', 'this is']
There are two possible ways, depending on whether numeric values saved as strings should count as strings or not.
Check the difference:
df = pd.DataFrame({'x1':[1,'2.78','3','+'],'x2':[2.8,'a','c','this is'], 'x3':[1,4,5,4]})
print (df)
x1 x2 x3
0 1 2.8 1
1 2.78 a 4 <-2.78 is float saved as string
2 3 c 5 <-3 is int saved as string
3 + this is 4
import numpy as np

# flatten all values
ar = df.values.ravel()
# errors='coerce' in pd.to_numeric returns NaN for non-numeric values
L = np.unique(ar[np.isnan(pd.to_numeric(ar, errors='coerce'))]).tolist()
print(L)
['+', 'a', 'c', 'this is']
Another solution is a custom function that checks whether a value can be converted to float:
def is_not_float_try(value):
    try:
        float(value)
        return False
    except ValueError:
        return True

s = df.stack()
L = s[s.apply(is_not_float_try)].unique().tolist()
print(L)
['a', 'c', '+', 'this is']
If you need all values saved as strings, use isinstance:
s = df.stack()
L = s[s.apply(lambda x: isinstance(x, str))].unique().tolist()
print (L)
['2.78', 'a', '3', 'c', '+', 'this is']
You can use str.isdigit with unstack:
df[df.apply(lambda x: x.str.isdigit()).eq(0)].unstack().dropna().tolist()
Out[242]: ['+', 'a', 'c', 'this is']
Using regular expressions and set union, you could try something like:
>>> set.union(*[set(df[c][~df[c].str.findall(r'[^\d]+').isnull()].unique()) for c in df.columns])
{'+', 'a', 'c', 'this is'}
If you use a regular expression for numbers in general, you could omit floating-point numbers as well.
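As a version-agnostic sketch of the isinstance approach above (Series.map exists in every pandas release, whereas DataFrame.applymap is deprecated on recent ones, and the explicit dropna() keeps the result stable across old and new stack() behaviour):

```python
import pandas as pd

df = pd.DataFrame({'x1': [1, 2, 3, '+'], 'x2': [2, 'a', 'c', 'this is']})

# Boolean mask: True where the cell holds a str object.
mask = df.apply(lambda col: col.map(lambda v: isinstance(v, str)))
# where() blanks non-string cells to NaN; stack + dropna discards them.
strings = df.where(mask).stack().dropna().tolist()
print(strings)  # ['a', 'c', '+', 'this is']
```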

find elements in lists using For Loop

keys = ['a', 'H', 'c', 'D', 'm', 'l']
values = ['a', 'c', 'H', 'D']
category = []
for index, i in enumerate(keys):
    for j in values:
        if j in i:
            category.append(j)
            break
        if index == len(category):
            category.append("other")
print(category)
My expected output is ['a', 'H', 'c', 'D', 'other', 'other']
But I am getting ['a', 'other', 'H', 'c', 'D', 'other']
EDIT: OP edited his question multiple times.
Python documentation on the break statement:
It terminates the nearest enclosing loop.
In your code the break terminates only the inner loop, and because the "other" check sits inside that inner loop, it runs before all the categories have been tried, which is what scrambles your output.
Now, to solve your problem of categorising strings:
xs = ['Am sleeping', 'He is walking', 'John is eating']
ys = ['walking', 'eating', 'sleeping']
categories = []
for x in xs:
    for y in ys:
        if y in x:
            categories.append(y)
            break
    else:
        categories.append("other")
print(categories)  # ['sleeping', 'walking', 'eating']
Iterate over both lists and check whether any category matches. If one does, append it to the categories list and continue with the next string to categorise; if no category matched, append "other".
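Applied back to the original keys/values lists, a for/else loop (the else branch runs only when the inner loop finishes without hitting break) produces exactly the expected output:

```python
keys = ['a', 'H', 'c', 'D', 'm', 'l']
values = ['a', 'c', 'H', 'D']

category = []
for k in keys:
    for v in values:
        if v in k:
            category.append(v)
            break
    else:  # no break: no category matched this key
        category.append("other")

print(category)  # ['a', 'H', 'c', 'D', 'other', 'other']
```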
