Nested comprehension - inner loop inside an if loop - python-3.x

How do I exclude multiple parameters while inside a list comprehension?
I have a simple expression:
data = [x for x in blob if e not in x.thing]
However, instead of one string 'e' I would like to test multiple strings.
So a bit of pseudocode:
exclude = ['tom', 'dick', 'harry', ....]
data = [x for x in blob if <any of the values of exclude> not in x.thing]
I don't know the length or values of exclude so I can't do
[x for x in blob if e1 not in x.thing else x for x in blob if e2 not in x.thing else ... ]

Try this, just copy paste this line:
[x for x in blob if all(e not in x.thing for e in exclude)]
For example:
Input:
exclude = ['tom', 'dick', 'harry']
blob=['franceharry','germany','gerrytom']
[x for x in blob if all(e not in x for e in exclude)]
Output:
['germany']
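The question stores the text in a .thing attribute, and the same filter can also be phrased with any(). A minimal sketch, assuming hypothetical stand-in objects built with types.SimpleNamespace (the real contents of blob are not shown in the question):

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the question's objects; only .thing matters here.
blob = [SimpleNamespace(thing=s) for s in ("franceharry", "germany", "gerrytom")]
exclude = ["tom", "dick", "harry"]

# Keep x only when no excluded string occurs in x.thing.
kept_all = [x for x in blob if all(e not in x.thing for e in exclude)]

# Equivalent phrasing: reject x as soon as any excluded string is found.
kept_any = [x for x in blob if not any(e in x.thing for e in exclude)]

things = [x.thing for x in kept_all]
print(things)  # ['germany']
```

Note that with an empty exclude list, all() is True for every element, so nothing is filtered out.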

Related

Split a Python list of number strings into multiple lists

I have the following python list:
x = ["56843", "84631", "13831"]
And I want to get the following list:
x = [ ["5","6","8","4","3"] , ["8","4","6","3","1"] , ["1","3","8","3","1"] ]
How can I achieve this?
I have tried split, but it needs a separator and I don't know which one to use.
Just this:
x = [list(y) for y in x]
When y is a string, list(y) converts it into a list of its characters.
x = [[ch for ch in s] for s in x]
x = ["56843","84631","13831"]
c = []
for i in x:
    c.append(list(i))
print(c)
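The answers above all produce the same nested list. A quick check, plus an integer variant in case numbers rather than characters are wanted (that variant is an assumption about the goal, not something the question asks for):

```python
x = ["56843", "84631", "13831"]

# Three equivalent ways to turn each string into a list of its characters.
via_list = [list(s) for s in x]
via_comprehension = [[ch for ch in s] for s in x]
via_map = list(map(list, x))

print(via_list[0])  # ['5', '6', '8', '4', '3']

# If numbers rather than characters are wanted, convert each character.
as_ints = [[int(ch) for ch in s] for s in x]
print(as_ints[0])  # [5, 6, 8, 4, 3]
```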

How do I write multiple function outputs to a single CSV file?

I am scraping multiple websites and use one function per website, so each function returns 4 values. I want to collect them in a DataFrame and write them to a CSV file, but I am running into the problem shown below. This may be an odd or basic question, but please help.
The alternative is to write the whole script in one block, which would be very messy to maintain, so I am hoping to find a way around that. This is just a sample of the problem I am facing:
def a1(x):
    z = x+1
    r = x+2
    print(z, r)
def a2(x):
    y = x+4
    t = x+3
    print(y, t)
x = 2
a1(x)
a2(x)
3 4
6 5
data = pd.DataFrame({'first' : [z],
                     'second' : [r],
                     'third' : [y],
                     'fourth' : [t]
                     })
data
*error 'z' is not defined*
You may find it convenient to write functions that return a list of dicts.
For example:
rows = [dict(a=1, b=2, c=3),
dict(a=4, b=5, c=6)]
df = pd.DataFrame(rows)
The variables are defined only in the local scope of your functions. You would either need to declare them as global or, better, return them and assign the return values to new variables outside the functions:
import pandas as pd
def a1(x):
    z = x+1
    r = x+2
    return (z, r)
def a2(x):
    y = x+4
    t = x+3
    return (y, t)
x = 2
z, r = a1(x)
y, t = a2(x)
data = pd.DataFrame({'first' : [z],
                     'second' : [r],
                     'third' : [y],
                     'fourth' : [t]
                     })
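The question also asks about writing the result to a CSV file. Here is a minimal sketch of that last step, assuming each function returns a dict and using the standard-library csv module instead of pandas. The dict-returning versions of a1 and a2 and the file name "out.csv" are made up for illustration:

```python
import csv
import io

# Hypothetical dict-returning versions of the per-website functions.
def a1(x):
    return {"first": x + 1, "second": x + 2}

def a2(x):
    return {"third": x + 4, "fourth": x + 3}

x = 2
row = {}
for func in (a1, a2):
    row.update(func(x))  # merge each function's values into one row

# Write to an in-memory buffer here; open("out.csv", "w", newline="") would
# write a real file instead ("out.csv" is a made-up name).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```

If you already have a DataFrame as in the answer above, data.to_csv("out.csv", index=False) writes it directly.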

Why does a Python 3 print statement appear to alter a variable declared later in the code, when the code works fine without it?

I am running Python 3.6.2 on Windows 10 and was learning about the zip() function.
I wanted to print part of the object returned by the zip() function.
Here is my code, without the troublesome print statement:
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica", "Vicky")
x = zip(a, b)
tup = tuple(x)
print(tup)
print(type(tup))
print(len(tup))
print(tup[1])
Here is my code with the troublesome print statement:
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica", "Vicky")
x = zip(a, b)
print(tuple(x)[1])
tup = tuple(x)
print(tup)
print(type(tup))
print(len(tup))
print(tup[1])
The print(tuple(x)[1]) statement appears to change the tuple 'tup' into a zero-length one and causes the print(tup[1]) to fail later in the code!
In this line, you create an iterator:
x = zip(a, b)
Within the print statement, you convert the iterator to a tuple, which has 3 elements. Doing so exhausts the iterator, so any later attempt to iterate over it yields no further elements.
Therefore, when you create tup, the iterator has nothing left to return and you get a tuple of length 0. Accessing the element at index 1 then raises an IndexError.
For testing, consider this:
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica", "Vicky")
x = zip(a, b)
tup1 = tuple(x)
tup2 = tuple(x)
print(tup1)
print(tup2)
It will give you the following result:
(('John', 'Jenny'), ('Charles', 'Christy'), ('Mike', 'Monica'))
()
This is basically what you do when creating a tuple out of an iterator twice.
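One way around this is to consume the iterator exactly once and reuse the stored tuple. A small sketch using the same data as the question:

```python
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica", "Vicky")

x = zip(a, b)          # a one-shot iterator
tup = tuple(x)         # consume it exactly once and keep the result
print(tup[1])          # ('Charles', 'Christy')
print(len(tup))        # 3, zip stops at the shorter input

# The iterator is now exhausted: next() falls back to the default.
leftover = next(x, None)
print(leftover)        # None

# A fresh zip object is needed to iterate again.
again = tuple(zip(a, b))
print(again == tup)    # True
```

Any further indexing or printing should go through tup rather than calling tuple(x) again.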

How to create a dataset with only two columns from a dictionary of IDs and lists of values

I'm trying to figure out how to create a dataset where the first column consists of the 'ID' from the dictionary and the second column of the value from the list of the dictionary so that I can plot this with seaborn.
di = {'a' : [1,4,5], 'b' : [1,8],'c' : [56,100,5,568],'d' : [20,10,2],'e' : [1000,3,675]}
I would thus want something like this:
ID Value
a 1
a 4
a 5
b 1
b 8
c 56
c 100
and so on..
For now I only have this piece of code, which separates the ID from the values but still keeps each set of values as a list rather than one row per value as in the example above.
serie = pd.Series(di)
df = pd.DataFrame({'ID':serie.index, 'Value':serie.values})
Help would be greatly appreciated.
Thanks in advance!
You can either restructure the dictionary into a flat list of (ID, value) records to pass to the pd.DataFrame.from_records function:
lol = [list(zip([x]*len(y), y)) for x, y in di.items()]
df = pd.DataFrame.from_records([x for y in lol for x in y], columns=['ID', 'Value'])
sns.swarmplot(x="ID", y="Value", data=df)
Or, you can use pd.Series and unwrap the lists inside the rows with the following:
df = pd.Series(di).apply(pd.Series).stack()
df = df.reset_index(level=0).rename(columns=lambda x: 'ID' if x else 'Value')
sns.swarmplot(x="ID", y="Value", data=df)
Either should get you what you need.
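For comparison, the flattening step can also be done in plain Python with a single nested comprehension. This is only a sketch of the reshaping, not a replacement for either pandas approach:

```python
di = {'a': [1, 4, 5], 'b': [1, 8], 'c': [56, 100, 5, 568],
      'd': [20, 10, 2], 'e': [1000, 3, 675]}

# One (ID, Value) pair per list element, in dictionary order.
rows = [(key, value) for key, values in di.items() for value in values]

print(rows[:5])   # [('a', 1), ('a', 4), ('a', 5), ('b', 1), ('b', 8)]
print(len(rows))  # 15
```

The resulting rows can be passed to pd.DataFrame(rows, columns=['ID', 'Value']) to get the two-column frame for plotting.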

How to use re.compile within a for loop to extract substring indices

I have a list of data from which I need to extract the indices of some strings within that list:
str=['cat','monkey']
list=['a cat','a dog','a cow','a lot of monkeys']
I've been using re.compile to match (even partial match) individual elements of the str list to the list:
regex=re.compile(".*(monkey).*")
b=[m.group(0) for l in list for m in [regex.search(l)] if m]
>>> list.index(b[0])
3
However, when I try to iterate over the str list to find the indices of those elements, I obtain empty lists:
>>> for i in str:
...     regex=re.compile(".*(i).*")
...     b=[m.group(0) for l in list for m in [regex.search(l)] if m]
...     print(b)
...
[]
[]
I imagine that the problem is with regex=re.compile(".*(i).*"), but I don't know how to pass the ith element as a string.
Any suggestion is very welcome, thanks!!
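Inside the quoted pattern, i is just the literal letter "i", not the loop variable, so nothing matches. You need string formatting to insert the value of i into the pattern.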
for i in str:
    match_pattern = ".*({}).*".format(i)
    regex = re.compile(match_pattern)
    b = [m.group(0) for l in list for m in [regex.search(l)] if m]
    print(b)
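Since the goal was the indices rather than the matched strings, here is a sketch that collects them directly with enumerate. The names needles and haystack are made up to avoid shadowing the built-ins str and list, and re.escape is added on the assumption that the search terms should be matched literally:

```python
import re

# Renamed from str/list in the question to avoid shadowing the built-ins.
needles = ['cat', 'monkey']
haystack = ['a cat', 'a dog', 'a cow', 'a lot of monkeys']

indices = {}
for needle in needles:
    # re.escape keeps characters such as "." or "(" in needle literal.
    regex = re.compile(re.escape(needle))
    indices[needle] = [i for i, item in enumerate(haystack) if regex.search(item)]

print(indices)  # {'cat': [0], 'monkey': [3]}
```

Because regex.search already finds a match anywhere in the string, the surrounding .* in the original pattern is not needed for this purpose.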

Resources