Convert a string within a list to an element in the list in python - python-3.x

I am using python data to create a ReportLab report. I have a list that looks like this:
mylist = [['a b c d e f'],['g h i j k l']]
and want to convert it to look like this:
mylist2 = [[a,b,c,d,e],[g,h,i,j,k,l]]
The first list gives me a "List out of index" error when building the report.
The second list works in ReportLab, but the columns and formatting in this list aren't what I want.
What is the best way to convert mylist to mylist2 in Python?

A string can be converted to a list with the split() method.
Try mylist[0][0].split() and mylist[1][0].split().

Borrowing the idea from Jibin Mathews, I tried the following:
new_list = [mylist[0][0].split(), mylist[1][0].split()]
and it prints
[['a', 'b', 'c', 'd', 'e', 'f'], ['g', 'h', 'i', 'j', 'k', 'l']]
I see that 'f' is missing from your final list. Is that a mistake?

import re

mylist = [['a b c d e f'],['g h i j k l']]
# Match one or more whitespace characters.
space_re = re.compile(r'\s+')
output = []
for l in mylist:
    # Each sublist holds a single string.
    element = l[0]
    le = re.split(space_re, element)
    output.append(le)
This is not the best answer, but it works.
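The answers above index each sublist by hand or loop explicitly. A minimal sketch of the same split() idea as a list comprehension, assuming every sublist holds exactly one string (split_rows is a hypothetical helper name, not part of the original answers):

```python
def split_rows(rows):
    """Split the single string in each one-element sublist on whitespace."""
    # Assumption: row[0] exists and is a string for every row.
    return [row[0].split() for row in rows]


mylist = [['a b c d e f'], ['g h i j k l']]
mylist2 = split_rows(mylist)
print(mylist2)
# [['a', 'b', 'c', 'd', 'e', 'f'], ['g', 'h', 'i', 'j', 'k', 'l']]
```

This works for any number of rows, so nothing needs to change if the report grows beyond two lines.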

Related

Is there a way to split strings inside a list?

I am trying to split strings inside a list but I could not find any solution on the internet. This is a sample, but it should help you guys understand my problem.
array=['a','b;','c','d)','void','plasma']
for i in array:
    print(i.split())
My desired output should look like this:
output: ['a','b',';','c','d',')','void','plasma']
One approach uses re.findall on each starting list term along with a list comprehension to flatten the resulting 2D list:
import re

inp = ['a', 'b;', 'c', 'd)', 'void', 'plasma']
# \w+ matches a run of word characters, \W+ a run of non-word characters.
output = [j for sub in [re.findall(r'\w+|\W+', x) for x in inp] for j in sub]
print(output) # ['a', 'b', ';', 'c', 'd', ')', 'void', 'plasma']
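The nested comprehension can be hard to read. A possible alternative is to flatten with itertools.chain.from_iterable; this is only a sketch of the same re.findall technique, and split_tokens is a hypothetical helper name:

```python
import re
from itertools import chain

# Same pattern as in the answer: runs of word or non-word characters.
TOKEN_RE = re.compile(r'\w+|\W+')


def split_tokens(items):
    """Split each string into word and non-word runs, then flatten the result."""
    return list(chain.from_iterable(TOKEN_RE.findall(item) for item in items))


inp = ['a', 'b;', 'c', 'd)', 'void', 'plasma']
print(split_tokens(inp))
# ['a', 'b', ';', 'c', 'd', ')', 'void', 'plasma']
```

Note that consecutive punctuation stays together with this pattern, so 'x;;' yields ['x', ';;'] rather than three tokens.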

Remove redundant sublists within list in python

Hello everyone, I have a list of lists such as:
list_of_values=[['A','B'],['A','B','C'],['D','E'],['A','C'],['I','J','K','L','M'],['J','M']]
and I would like to keep only the sublists that are not contained in a longer sublist.
For instance, sublist 1 is ['A','B'], and A and B are also present in sublist 2, ['A','B','C'], so I remove sublist 1.
The same goes for sublist 4.
Sublist 6 is also removed because J and M are present in the longer sublist 5.
At the end I should get:
list_of_no_redundant_values=[['A','B','C'],['D','E'],['I','J','K','L','M']]
Another example:
list_of_values=[['A','B'],['A','B','C'],['B','E'],['A','C'],['I','J','K','L','M'],['J','M']]
Expected output:
[['A','B','C'],['B','E'],['I','J','K','L','M']]
Does anyone have an idea?
mylist=[['A','B'],['A','C'],['A','B','C'],['D','E'],['I','J','K','L','M'],['J','M']]
def remove_subsets(lists):
    outlists = lists[:]
    for s1 in lists:
        for s2 in lists:
            if set(s1).issubset(set(s2)) and (s1 is not s2):
                outlists.remove(s1)
                break
    return outlists
print(remove_subsets(mylist))
This should result in [['A', 'B', 'C'], ['D', 'E'], ['I', 'J', 'K', 'L', 'M']]
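One edge case in the loop above: if two sublists hold exactly the same elements, each counts as a subset of the other and both are dropped. A hedged variant that builds each set once and keeps the first of any duplicates might look like the following (remove_redundant is a hypothetical name, and keeping the first duplicate is an assumption about the desired behavior):

```python
def remove_redundant(lists):
    """Keep sublists whose elements are not all contained in another sublist."""
    sets = [set(sub) for sub in lists]  # build each set only once
    kept = []
    for i, current in enumerate(sets):
        redundant = any(
            # A proper subset of another sublist, or an exact duplicate
            # of a sublist that appeared earlier.
            current < other or (current == other and j < i)
            for j, other in enumerate(sets)
            if j != i
        )
        if not redundant:
            kept.append(lists[i])
    return kept


values = [['A', 'B'], ['A', 'B', 'C'], ['B', 'E'], ['A', 'C'],
          ['I', 'J', 'K', 'L', 'M'], ['J', 'M']]
print(remove_redundant(values))
# [['A', 'B', 'C'], ['B', 'E'], ['I', 'J', 'K', 'L', 'M']]
```

Like the original, this compares every pair of sublists, so it is quadratic in the number of sublists.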

How to replace list element with string without double quotes?

I have a simple list like this:
list = ['A','B','C']
and I want to replace element #1 of the list with this string:
str = "'W','T'"
I'm doing it like this:
>>> list[1] = str
>>> list
['A', "'W','T'", 'C']
How can I replace list[1] with the contents of str, without the double quotes? Like this:
['A','W','T','C']
You can't insert it directly like that. You need to clean it first and convert it to a list and use slicing.
list1 = ['A','B','C']
str1 = "'W','T'"
new_list = [a.strip("'") for a in str1.split(",")]
list1 = list1[:1] + new_list + list1[2:]
print(list1) # ['A', 'W', 'T', 'C']
I also changed your variable names because list and str are the names of built-in types, and reusing them shadows the built-ins.
You have to put the elements of the string into a list and then add the list elements to the desired position in your destination list.
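list_ = ['A', 'B', 'C']
string = "'W','T'"
# Keep only the letters; this assumes every element is a single letter.
formatted = [letter for letter in string if letter.isalpha()]
i = 1 # index of the element to replace
# Replace the one-element slice at i with all of the new elements.
list_[i:i + 1] = formatted
print(list_) # ['A', 'W', 'T', 'C']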
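Both answers above parse the string by hand. If the string is always a comma-separated sequence of Python literals, as in the question, ast.literal_eval from the standard library can do the parsing. The following is a sketch under that assumption; replace_with_parsed is a hypothetical helper name:

```python
import ast


def replace_with_parsed(items, index, text):
    """Return a copy of items with items[index] replaced by the literals in text."""
    parsed = ast.literal_eval(text)  # "'W','T'" evaluates to the tuple ('W', 'T')
    if not isinstance(parsed, tuple):
        parsed = (parsed,)  # a single literal such as "'W'" is not a tuple
    result = list(items)
    # Assumption: index is non-negative, so index:index + 1 is a one-element slice.
    result[index:index + 1] = parsed
    return result


print(replace_with_parsed(['A', 'B', 'C'], 1, "'W','T'"))
# ['A', 'W', 'T', 'C']
```

Unlike the isalpha() approach, this also handles multi-character elements such as "'WX','T'", but it raises an exception if the string is not a valid Python literal.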
list[item] = str(list[item])
>>> mylist = [1,3,5]
>>> mylist[0] = str(mylist[0])
>>> mylist
['1', 3, 5]
>>>

Retrieving repeated items in a list comprehension

Given a sorted list, I would like to retrieve the first repeated item in the list using list comprehension.
So I ran the line below:
list=['1', '2', '3', 'a', 'a', 'b', 'c']
print(k for k in list if k==k+1)
I expected the output "a". But instead I got:
<generator object <genexpr> at 0x0021AB30>
I'm pretty new at this, would someone be willing to clarify why this doesn't work?
You seem to confuse the notion of list element and index.
For example, a generator expression that yields every item of a list xs that is equal to its predecessor would look like this:
g = (xs[k] for k in range(1, len(xs)) if xs[k] == xs[k - 1])
Since you are interested only in the first such item, you could write
next(xs[k] for k in range(1, len(xs)) if xs[k] == xs[k - 1])
However, you'll get a StopIteration exception if there are no such items.
As general advice, prefer simple, readable functions over clever long one-liners,
especially when you are new to the language. Your task could be accomplished as follows:
def first_duplicate(xs):
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            return xs[k]
chars = ['1', '2', '3', 'a', 'a', 'b', 'c']
print(first_duplicate(chars)) # 'a'
P.S. Beware of using list as a variable name: you're shadowing the built-in type.
If you want just the first repeated item in the list you can use the next function with a generator expression that iterates through the list zipped with itself but with an offset of 1 to compare adjacent items:
next(a for a, b in zip(lst, lst[1:]) if a == b)
so that given lst = ['1', '2', '3', 'a', 'a', 'b', 'c'], the above returns: 'a'.
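As noted earlier, a bare next() raises StopIteration when the list has no adjacent repeats. The built-in next accepts a second argument that is returned instead, so a small wrapper can avoid the exception (first_adjacent_duplicate is a hypothetical name, and returning None by default is an assumption about what the caller wants):

```python
def first_adjacent_duplicate(items, default=None):
    """Return the first item equal to its successor, or default if there is none."""
    # zip pairs each item with the one that follows it.
    return next((a for a, b in zip(items, items[1:]) if a == b), default)


print(first_adjacent_duplicate(['1', '2', '3', 'a', 'a', 'b', 'c']))  # a
print(first_adjacent_duplicate(['1', '2', '3']))  # None
```

Because it compares neighbors only, this finds repeats reliably only when equal items sit next to each other, which holds for the sorted list in the question.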

Getting all str type elements in a pd.DataFrame

Based on my limited knowledge of pandas, pandas.Series.str.contains can search for a specific string in a pd.Series. But what if the DataFrame is large and I just want to glance at all the kinds of str elements in it before I do anything?
An example:
pd.DataFrame({'x1':[1,2,3,'+'],'x2':[2,'a','c','this is']})
  x1       x2
0  1        2
1  2        a
2  3        c
3  +  this is
I need a function to return ['+','a','c','this is']
If you are looking strictly for values that are strings and performance is not a concern, then this is a very simple answer. (In pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map.)
df.where(df.applymap(type).eq(str)).stack().tolist()
['a', 'c', '+', 'this is']
There are two possible approaches, depending on whether numeric values saved as strings should count as strings.
Compare the difference:
import numpy as np
import pandas as pd

df = pd.DataFrame({'x1':[1,'2.78','3','+'],'x2':[2.8,'a','c','this is'], 'x3':[1,4,5,4]})
print (df)
     x1       x2  x3
0     1      2.8   1
1  2.78        a   4  <- 2.78 is a float saved as a string
2     3        c   5  <- 3 is an int saved as a string
3     +  this is   4
# flatten all values into a 1-D array
ar = df.values.ravel()
# with errors='coerce', pd.to_numeric returns NaN for non-numeric values
L = np.unique(ar[np.isnan(pd.to_numeric(ar, errors='coerce'))]).tolist()
print (L)
['+', 'a', 'c', 'this is']
Another solution is to use a custom function that checks whether a value can be converted to float:
def is_not_float_try(value):
    try:
        float(value)
        return False
    except ValueError:
        return True
s = df.stack()
L = s[s.apply(is_not_float_try)].unique().tolist()
print (L)
['a', 'c', '+', 'this is']
If you need all values saved as strings, use isinstance:
s = df.stack()
L = s[s.apply(lambda x: isinstance(x, str))].unique().tolist()
print (L)
['2.78', 'a', '3', 'c', '+', 'this is']
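The difference between the two checks can be seen without pandas. The sketch below applies both rules to the values from the example DataFrame's x1 and x2 columns; is_string and is_non_numeric are hypothetical helper names that mirror the isinstance and float-conversion tests above:

```python
def is_string(value):
    """True for anything stored as a str, including numeric-looking text."""
    return isinstance(value, str)


def is_non_numeric(value):
    """True only if the value cannot be converted to a float."""
    try:
        float(value)
    except (TypeError, ValueError):
        return True
    return False


values = [1, '2.78', '3', '+', 2.8, 'a', 'c', 'this is']
print([v for v in values if is_string(v)])
# ['2.78', '3', '+', 'a', 'c', 'this is']
print([v for v in values if is_non_numeric(v)])
# ['+', 'a', 'c', 'this is']
```

One caveat with the float-based rule: float() also accepts text such as 'nan' and 'inf', so those strings would be treated as numeric.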
You can use str.isdigit with unstack:
df[df.apply(lambda x : x.str.isdigit()).eq(0)].unstack().dropna().tolist()
Out[242]: ['+', 'a', 'c', 'this is']
Using regular expressions and set union, you could try something like
>>> set.union(*[set(df[c][~df[c].str.findall(r'[^\d]+').isnull()].unique()) for c in df.columns])
{'+', 'a', 'c', 'this is'}
If you use a regular expression that matches numbers in general, you could exclude floating-point numbers as well.
