python3 - check element is actually in list - python-3.x

For example, I have an Excel header list like this:
excel_headers = [
'Name',
'Age',
'Sex',
]
And I have a dict to check against it:
headers = {'Name' : 1, 'Age': 2, 'Sex': 3, 'Whatever': 4}
I don't care whether headers has extra elements such as 'Whatever'; I only care that every element of excel_headers is in headers.
What I've tried:
lst = all(headers[idx][0] == header for idx,
header in enumerate(excel_headers))
print(lst)
However, it always returns False.
Any help, please?

Another way to do it using sets would be to use set difference:
excel_headers = ['Name', 'Age', 'Sex']
headers = {'Name' : 1, 'Age': 2, 'Sex': 3, 'Whatever': 4}
diff = set(excel_headers) - set(headers)
hasAll = len(diff) == 0 # len 0 means every value in excel_headers is present in headers
print(diff)  # this gives you the unmatched elements
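Because iterating over a dict yields its keys, the same check can also be written as a subset test without computing the difference. A minimal sketch using the example data from the question:

```python
excel_headers = ['Name', 'Age', 'Sex']
headers = {'Name': 1, 'Age': 2, 'Sex': 3, 'Whatever': 4}

# issubset() accepts any iterable; iterating a dict yields its keys
has_all = set(excel_headers).issubset(headers)

# dict.keys() returns a set-like view, so set comparison operators work on it too
has_all_view = headers.keys() >= set(excel_headers)

print(has_all, has_all_view)  # True True
```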

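Just sort your lists; the results below show a before and after. Note that this checks whether both lists contain exactly the same elements, not whether one is contained in the other: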
excel_headers = [
'Name',
'Age',
'Sex',
]
headers = ['Age' , 'Name', 'Sex']
if excel_headers == headers: print("YES!")
else: print("NO!")
excel_headers.sort()
headers.sort()
if excel_headers == headers: print("YES!")
else: print("NO!")
Output:
NO!
YES!
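If mutating the original lists is a concern, sorted() returns new sorted lists instead of sorting in place. A small sketch using the same data; the extra 'Whatever' entry in the second comparison is added only to show that this approach tests for identical contents rather than containment:

```python
excel_headers = ['Name', 'Age', 'Sex']
headers = ['Age', 'Name', 'Sex']

# sorted() builds new lists, so excel_headers and headers keep their original order
same_elements = sorted(excel_headers) == sorted(headers)
print(same_elements)  # True

# One extra element makes the sorted lists differ: an equality test, not a subset test
print(sorted(excel_headers) == sorted(headers + ['Whatever']))  # False
```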

Tip: this is a good use case for a set, since you're looking up elements by value to see if they exist. However, for small lists (<100 elements) the difference in performance isn't really noticeable, and using a list is fine.
excel_headers = ['Name', 'Age', 'Sex']
headers = {'Name' : 1, 'Age': 2, 'Sex': 3, 'Whatever': 4}
result = all(element in headers for element in excel_headers)
print(result) # --> True
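If you also want to know which headers are missing, a list comprehension with the same in test keeps them in their original order. In this sketch, 'Email' is a hypothetical extra header added only so that something is missing:

```python
excel_headers = ['Name', 'Age', 'Sex', 'Email']  # 'Email' is a made-up header for illustration
headers = {'Name': 1, 'Age': 2, 'Sex': 3, 'Whatever': 4}

# Keep the order of excel_headers and collect the ones not found in headers
missing = [h for h in excel_headers if h not in headers]
print(missing)      # ['Email']
print(not missing)  # False: at least one header is missing
```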

Related

Python nested dictionary Issue when iterating

I have 5 lists of words, which act as values in a dictionary whose keys are the document IDs.
For each document, I would like to apply some calculations and display the values and results of the calculation in a nested dictionary.
So far so good: I managed to do everything, but I am failing at the easiest part.
When showing the resulting nested dictionary, it seems to iterate only over the last element of each of the 5 lists, and therefore doesn't show all the elements.
Could anybody explain where I am going wrong?
This is the original dictionary data_docs:
{'doc01': ['simpl', 'hello', 'world', 'test', 'python', 'code'],
'doc02': ['today', 'wonder', 'day'],
'doc03': ['studi', 'pac', 'today'],
'doc04': ['write', 'need', 'cup', 'coffe'],
'doc05': ['finish', 'pac', 'use', 'python']}
This is the result I am getting (missing 'simpl','hello', 'world', 'test', 'python' in doc01 as example):
{'doc01': {'code': 0.6989700043360189},
'doc02': {'day': 0.6989700043360189},
'doc03': {'today': 0.3979400086720376},
'doc04': {'coffe': 0.6989700043360189},
'doc05': {'python': 0.3979400086720376}}
And this is the code:
def tfidf(data, idf_score):  # function, 2 dictionaries as parameters
    tfidf = {}  # dict for output
    for word, val in data.items():  # for each word and value in data_docs (first dict)
        for v in val:  # for each value in each list
            a = val.count(v)  # count the number of times that appears in that list
            scores = {v: a * idf_score[v]}  # dictionary that will act as value in the nested
            tfidf[word] = scores  # final dictionary, the key is doc01, doc02... and the value the above dict
    return tfidf
tfidf(data_docs, idf_score)
Thanks,
Did you mean to do this?
def tfidf(data, idf_score):  # function, 2 dictionaries as parameters
    tfidf = {}  # dict for output
    for word, val in data.items():  # for each word and value in data_docs (first dict)
        scores = {}  # <---- a new dict for each outer iteration
        for v in val:  # for each value in each list
            a = val.count(v)  # count the number of times that appears in that list
            scores[v] = a * idf_score[v]  # <---- keep adding items to the dictionary
        tfidf[word] = scores  # final dictionary, the key is doc01, doc02... and the value the above dict
    return tfidf
See my changes marked with the <---- arrows. :)
Returns (with an idf_score of 1 for every word):
{'doc01': {'simpl': 1,
'hello': 1,
'world': 1,
'test': 1,
'python': 1,
'code': 1},
'doc02': {'today': 1, 'wonder': 1, 'day': 1},
'doc03': {'studi': 1, 'pac': 1, 'today': 1},
'doc04': {'write': 1, 'need': 1, 'cup': 1, 'coffe': 1},
'doc05': {'finish': 1, 'pac': 1, 'use': 1, 'python': 1}}
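The per-word val.count(v) call rescans the whole list for every word. A sketch of the corrected function built on collections.Counter instead, which counts each document's words in one pass; the shortened data_docs (with 'pac' repeated in doc03) and the idf_score values are made up for illustration:

```python
from collections import Counter

def tfidf(data, idf_score):
    """Return {doc_id: {word: count * idf}} for every distinct word in each document."""
    return {
        doc_id: {word: count * idf_score[word] for word, count in Counter(words).items()}
        for doc_id, words in data.items()
    }

# Illustrative inputs only; these are not the question's real IDF scores
data_docs = {'doc02': ['today', 'wonder', 'day'], 'doc03': ['studi', 'pac', 'today', 'pac']}
idf_score = {'today': 0.5, 'wonder': 1.0, 'day': 1.0, 'studi': 1.0, 'pac': 0.5}
print(tfidf(data_docs, idf_score))
```

Counter also removes the duplicate-key overwrites that the loop version performs when a word occurs more than once in a document.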

create new dictionary based on keys and split the dictionary values

I am relatively new to Python programming. I was trying some online challenges to sharpen my programming skills, and I got stuck with the code below. Could someone please help?
ress = {'product': ['Mountain Dew Spark', 'pepsi'], 'quantity': ['7', '5']}
prods_list = []
prods_dict = {}
for k, v in ress.items():
    if "product" in k:
        if len(ress['product']) > 1:
            entity_names = {}
            entity_list = []
            for i in range(len(ress['product'])):
                prod = "product_" + str(i)
                entity_names['product'] = ress['product'][i]
                entity_names['quantity'] = ress['quantity'][i]
                entity_list.append(entity_names)
                prods_dict[prod] = entity_list
            prods_list.append(prods_dict)
print(prods_list)
I am expecting the output below.
Expected output:
[{"product_0":
{"quantity" : "7",
"product" : "mountain dew spark"}
},
{"product_1" : {
"quantity" : "5",
"product" : "pepsi"
}}]
Actual output:
[{'product_0': [{'product': 'pepsi', 'quantity': '5'},
{'product': 'pepsi', 'quantity': '5'}],
'product_1': [{'product': 'pepsi', 'quantity': '5'},
{'product': 'pepsi', 'quantity': '5'}]}]
Please note that I want my code to work for single values as well, like ress = {'product': ['Mountain Dew Spark'], 'quantity': ['7']}
This is one way you can achieve it with regular loops:
ress = {'product': ['Mountain Dew Spark', 'pepsi'], 'quantity': ['7', '5']}
prods_list = []
for key, value in ress.items():
    for ind, el in enumerate(value):
        prod_num = 'product_' + str(ind)
        # If this element is already present
        if len(prods_list) >= ind + 1:
            # Add to existing dict
            prods_list[ind][prod_num][key] = el
        else:
            # Otherwise, create a new dict
            prods_list.append({prod_num: {key: el}})
print(prods_list)
The first loop goes through the input dictionary, the second one through each of its lists. The code then determines whether a dictionary for that product is already in the output list by checking the output list's length. If it is, the code simply adds the new key/value pair to that product's existing inner dict. If it is not, the code creates an outer dict for that product and an inner one for this particular value.
Using a list comprehension along with enumerate and zip might be easier:
>>> res = {'product': ['Mountain Dew Spark', 'pepsi'], 'quantity': ['7', '5']}
>>> prods_list = [
... {f'product_{i}': {'quantity': int(q), 'product': p.lower()}}
... for i, (q, p) in enumerate(zip(res['quantity'], res['product']))
... ]
>>> prods_list
[{'product_0': {'quantity': 7, 'product': 'mountain dew spark'}}, {'product_1': {'quantity': 5, 'product': 'pepsi'}}]
This assumes that there will be no duplicate product entries; if there are, you would need to use a traditional for loop.
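A more general sketch, assuming every list in ress has the same length: zip(*ress.values()) groups the i-th item of each list, so the same code handles one product or many, and any set of keys. Unlike the comprehension above, it keeps the values exactly as given (no int() or .lower()). The function name to_prods_list is hypothetical:

```python
def to_prods_list(ress):
    # zip(*ress.values()) yields one tuple per product, e.g. ('Mountain Dew Spark', '7')
    rows = zip(*ress.values())
    # dict(zip(ress, row)) re-attaches the keys ('product', 'quantity') to each tuple
    return [{f'product_{i}': dict(zip(ress, row))} for i, row in enumerate(rows)]

print(to_prods_list({'product': ['Mountain Dew Spark', 'pepsi'], 'quantity': ['7', '5']}))
print(to_prods_list({'product': ['Mountain Dew Spark'], 'quantity': ['7']}))
```

Note that zip stops at the shortest list, so a product without a matching quantity would be silently dropped.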

Python: Convert 2d list to dictionary with indexes as values

I have a 2d list with arbitrary strings like this:
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
I want to create a dictionary out of this:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
How do I do this? This answer covers a 1D list with non-repeated values, but I have a 2D list and values can repeat. Is there a generic way of doing this?
Maybe you could use two for-loops:
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
d = {}
overall_idx = 0
for sub_lst in lst:
    for word in sub_lst:
        if word not in d:
            d[word] = overall_idx
            # To increment only when the word has not been seen before,
            # uncomment the line below and delete the increment after this if block
            # overall_idx += 1
        overall_idx += 1
print(d)
Output:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
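If the indices should stay consecutive even when a duplicate appears in the middle of the data, one option is dict.setdefault with len(d) as the next free index, while itertools.chain.from_iterable walks the sublists in order. A minimal sketch:

```python
from itertools import chain

lst = [['a', 'xyz', 'tps'], ['rtr', 'xyz']]
d = {}
for word in chain.from_iterable(lst):
    # setdefault inserts only when the word is absent; len(d) is the next unused index
    d.setdefault(word, len(d))
print(d)  # {'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
```

Repeated words keep the index of their first occurrence, because setdefault leaves existing keys untouched.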
You could first flatten the list of lists into a single list using a 'double' list comprehension.
Next, remove the duplicates using a dictionary comprehension; we could use a set for that, but we would lose the order.
Finally, use another dictionary comprehension to get the desired result.
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
# flatten list of lists to a list
flat_list = [item for sublist in lst for item in sublist]
# remove duplicates
ordered_set = {x:0 for x in flat_list}.keys()
# create required output
the_dictionary = {v:i for i, v in enumerate(ordered_set)}
print(the_dictionary)
""" OUTPUT
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
"""
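The "remove duplicates" step can also use dict.fromkeys, which keeps the first occurrence of each key in insertion order (guaranteed for plain dicts since Python 3.7). A compact sketch of the same three steps:

```python
from itertools import chain

lst = [['a', 'xyz', 'tps'], ['rtr', 'xyz']]
# Flatten and de-duplicate in one pass, keeping first-seen order
unique_words = dict.fromkeys(chain.from_iterable(lst))
# Number the unique words in that order
the_dictionary = {word: i for i, word in enumerate(unique_words)}
print(the_dictionary)  # {'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
```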
Also, with collections and itertools:
import itertools
from collections import OrderedDict
lst = [['a', 'xyz', 'tps'], ['rtr', 'xyz']]
lstkeys = list(OrderedDict(zip(itertools.chain(*lst), itertools.repeat(None))))
lstdict = {lstkeys[i]: i for i in range(len(lstkeys))}
print(lstdict)
Output:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}

extract dictionary elements from nested list in python

I have a question.
I have a nested list that looks like this.
x= [[{'screen_name': 'BreitbartNews',
'name': 'Breitbart News',
'id': 457984599,
'id_str': '457984599',
'indices': [126, 140]}],
[],
[],
[{'screen_name': 'BreitbartNews',
'name': 'Breitbart News',
'id': 457984599,
'id_str': '457984599',
'indices': [98, 112]}],
[{'screen_name': 'BreitbartNews',
'name': 'Breitbart News',
'id': 457984599,
'id_str': '457984599',
'indices': [82, 96]}]]
There are some empty lists inside the main list.
What I am trying to do is to extract screen_name and append them as a new list including the empty ones (maybe noting them as 'null').
y = []
for i in x:
    for j in i:
        if len(j) == 0:
            n = 'null'
        else:
            n = j['screen_name']
        y.append(n)
I don't know why the code above outputs this list,
['BreitbartNews',
'BreitbartNews',
'BreitbartNews',
'BreitbartNews',
'BreitbartNews']
which doesn't reflect the empty sublists.
Can anyone help me fix my code?
You are checking the lengths of the wrong lists. Your empty lists are in the i variables.
The correct code would be
y = []
for i in x:
    if len(i) == 0:
        n = 'null'
    else:
        n = i[0]['screen_name']
    y.append(n)
It may help to print(i) in each iteration to better understand what is actually happening.
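The corrected loop can also be written as a list comprehension, since an empty list is falsy. The sample x below is a shortened version of the question's data:

```python
x = [
    [{'screen_name': 'BreitbartNews', 'name': 'Breitbart News'}],
    [],
    [],
    [{'screen_name': 'BreitbartNews', 'name': 'Breitbart News'}],
]
# An empty sublist maps to 'null'; otherwise take the screen_name of its first dict
y = [sub[0]['screen_name'] if sub else 'null' for sub in x]
print(y)  # ['BreitbartNews', 'null', 'null', 'BreitbartNews']
```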

How to substring the column name in python

I have a column named 'comment1abc'.
I am writing a piece of code where I want to check whether a column contains a certain string, 'abc':
df['col1'].str.contains('abc') == True
Now, instead of hard-coding 'abc', I want to use a substring-like operation on the column 'comment1abc' (to be precise, on the column name, not the column values) so that I can get the 'abc' part out of it. For example, the code below does a similar job:
x = 'comment1abc'
x[8:11]
But how do I implement that for a column name? I tried the code below, but it's not working.
for col in ['comment1abc']:
    df['col123'].str.contains('col.names[8:11]')
Any suggestion will be helpful.
Sample dataframe:
import pandas as pd
f = {'name': ['john', 'tom', None, 'rock', 'dick'], 'DoB': [None, '01/02/2012', '11/22/2014', '11/22/2014', '09/25/2016'], 'location': ['NY', 'NJ', 'PA', 'NY', None], 'code': ['abc1xtr', '778abc4', 'a2bcx98', None, 'ab786c3'], 'comment1abc': ['99', '99', '99', '99', '99'], 'comment2abc': ['99', '99', '99', '99', '99']}
df1 = pd.DataFrame(data=f)
And sample code:
for col in ['comment1abc', 'comment2abc']:
    df1[col][df1['code'].str.contains('col.names[8:11]') == True] = '1'
I think the answer could be as simple as this:
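for col in ['comment1abc', 'comment2abc']:
    x = col[8:11]  # 'comment1abc'[8:11] -> 'abc'
    # Pass the variable x, not the literal string 'x'; na=False treats missing codes as no match,
    # and .loc avoids chained assignment
    df1.loc[df1['code'].str.contains(x, na=False), col] = '1'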
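Putting the slicing expression inside quotes, as in .str.contains('col.names[8:11]'), makes pandas search for that literal text. Slice the column name into a string variable first and pass that variable instead.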
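A sketch of the whole task on a cut-down version of the sample dataframe, assuming the suffix always sits at positions 8 to 10 of the column name as in 'comment1abc'. df1.filter(like='comment') selects the comment columns instead of listing them by hand, and regex=False treats the slice as plain text:

```python
import pandas as pd

# Cut-down version of the question's sample data
f = {'code': ['abc1xtr', '778abc4', 'a2bcx98', None, 'ab786c3'],
     'comment1abc': ['99'] * 5, 'comment2abc': ['99'] * 5}
df1 = pd.DataFrame(data=f)

for col in df1.filter(like='comment').columns:
    pattern = col[8:11]  # 'comment1abc'[8:11] -> 'abc'
    # na=False makes the None code count as a non-match instead of NaN
    mask = df1['code'].str.contains(pattern, na=False, regex=False)
    df1.loc[mask, col] = '1'

print(df1)
```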
