Converting an Array into a Dictionary With Python - python-3.x

I am trying to scrape a table and convert each row into a dictionary, using the th text as the keys and the td text as the values.
Below is the code that grabs the td and th cells:
for row in rows:
    td = row.find_all('td')
    th = row.find_all('th')
    row2 = [i.text.replace("\n", "").strip() for i in td]
    print(row2)
['', '90315', 'Printmaking I', 'S1', '01(REG-HR)', 'Faletto, Liana', '445',
'LS']
print(headers)
#['Class', 'Description', 'Term', 'Schedule', 'Primary Staff > Name', 'Clssrm', 'Name']
How do I convert the output into a dictionary like the one below (dropping the first, blank list item)?
thisdict = {
    "class": "90315",
    "description": "Printmaking I",
    "term": "S1"
}

To delete the first element of row2 (>>> is the Python prompt):
>>> del row2[0]
>>> row2
['90315', 'Printmaking I', 'S1', '01(REG-HR)', 'Faletto, Liana', '445', 'LS']
Then assign the values by position (after the deletion, the class number sits at index 0):
thisdict = {'class':"", 'description':"", 'term':"", ...}
thisdict['class'] = row2[0]
thisdict['description'] = row2[1]
thisdict['term'] = row2[2]
...
Or, if headers contains the list of dictionary keys:
for i in range(len(headers)):
    thisdict[headers[i]] = row2[i]
>>> thisdict
{'Term': 'S1', 'Description': 'Printmaking I', 'term': 'S1', 'class': '90315', 'Class': '90315', 'description': 'Printmaking I'}
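The index loop above can also be written with zip, which pairs each header with the cell at the same position. This is only a sketch using the sample row and headers from the question; lowercasing the keys is an assumption based on the desired output shown there.

```python
headers = ['Class', 'Description', 'Term', 'Schedule',
           'Primary Staff > Name', 'Clssrm', 'Name']
row2 = ['', '90315', 'Printmaking I', 'S1', '01(REG-HR)',
        'Faletto, Liana', '445', 'LS']

# row2[1:] skips the leading blank cell without modifying row2 itself.
# zip stops at the shorter sequence, so extra cells are silently ignored.
thisdict = {key.lower(): value for key, value in zip(headers, row2[1:])}

print(thisdict['class'], thisdict['description'], thisdict['term'])
```

Slicing instead of del leaves the scraped row untouched, which helps if the same row is processed more than once.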

Related

create new dictionary based on keys and split the dictionary values

I am relatively new to Python programming. I was trying some online challenges to sharpen my programming skills and got stuck with the code below. Could someone please help?
ress = {'product': ['Mountain Dew Spark', 'pepsi'], 'quantity': ['7', '5']}
prods_list = []
prods_dict = {}
for k , v in ress.items():
    if "product" in k:
        if len(ress['product']) > 1:
            entity_names = {}
            entity_list = []
            for i in range(len(ress['product'])):
                prod = "product_" + str(i)
                entity_names['product'] = ress['product'][i]
                entity_names['quantity'] = ress['quantity'][i]
                entity_list.append(entity_names)
                prods_dict[prod] = entity_list
            prods_list.append(prods_dict)
print(prods_list)
Expected output:
[{"product_0": {"quantity": "7", "product": "mountain dew spark"}},
 {"product_1": {"quantity": "5", "product": "pepsi"}}]
Actual output:
[{'product_0': [{'product': 'pepsi', 'quantity': '5'},
{'product': 'pepsi', 'quantity': '5'}],
'product_1': [{'product': 'pepsi', 'quantity': '5'},
{'product': 'pepsi', 'quantity': '5'}]}]
Please note that I want my code to work for single values as well, such as ress = {'product': ['Mountain Dew Spark'], 'quantity': ['7']}
This is one way you can achieve it with regular loops:
ress = {'product': ['Mountain Dew Spark', 'pepsi'], 'quantity': ['7', '5']}
prods_list = []
for key, value in ress.items():
    for ind, el in enumerate(value):
        prod_num = 'product_' + str(ind)
        # If this element is already present
        if (len(prods_list) >= ind + 1):
            # Add to existing dict
            prods_list[ind][prod_num][key] = el
        else:
            # Otherwise - create a new dict
            prods_list.append({ prod_num : { key : el } })
print(prods_list)
The first loop goes through the input dictionary, and the second goes through each of its lists. The code then checks the length of the output list to determine whether a dictionary for that product already exists. If it does, the code adds the new key and value to that product's existing inner dict. If it does not, the code creates an outer dict for that product, along with an inner one holding this first key and value.
Maybe using a list comprehension along with enumerate and zip might be easier:
>>> res = {'product': ['Mountain Dew Spark', 'pepsi'], 'quantity': ['7', '5']}
>>> prods_list = [
... {f'product_{i}': {'quantity': int(q), 'product': p.lower()}}
... for i, (q, p) in enumerate(zip(res['quantity'], res['product']))
... ]
>>> prods_list
[{'product_0': {'quantity': 7, 'product': 'mountain dew spark'}}, {'product_1': {'quantity': 5, 'product': 'pepsi'}}]
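This assumes that there are no duplicate product entries. If duplicates are possible, you would need to use a traditional for loop. Note that this version also converts each quantity to an int and lowercases the product name.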
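For the single-value case mentioned in the question, here is a minimal sketch that wraps the zip idea in a function. The name to_product_dicts is hypothetical, and the sketch assumes every list in the input has the same length; values are kept as they are, with no int conversion or lowercasing.

```python
def to_product_dicts(ress):
    """Turn a dict of parallel lists into one single-key dict per position."""
    keys = list(ress)
    # zip(*lists) yields one tuple per position across all the lists.
    rows = zip(*(ress[k] for k in keys))
    return [
        {f'product_{i}': dict(zip(keys, row))}
        for i, row in enumerate(rows)
    ]


single = to_product_dicts({'product': ['Mountain Dew Spark'], 'quantity': ['7']})
double = to_product_dicts({'product': ['Mountain Dew Spark', 'pepsi'],
                           'quantity': ['7', '5']})
print(single)
print(double)
```

Because the function never checks the list length, one entry and many entries go through the same code path.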

Make key lowercase in List of Dictionaries (Python3)

I have been looking through Stack Overflow and the documentation for 2 days now. I am a beginner and I just can't make progress. I am using Python 3.8.
I have a list of dictionaries:
books = [{'Type': 'Book', 'Date': '2011', 'Publication Year': '2011', 'Place Published': 'New York', 'Publisher': 'Simon & Schuster', 'Author': 'Walter Isaacson', 'ISBN': '978-1-4516-4853-9', 'Title': 'Test Steve Jobs'}, {'Type': 'Book', 'Date': '2001', 'Publication Year': '2001', 'Place Published': 'Oxford', 'Publisher': 'Oxford University press', 'Author': 'Peter Hall', 'ISBN': '978-0-19-924775-2', 'Title': 'Test Varieties of capitalism: the institutional foundations of comparative advantage'}]
print(books)
I want to change the key "Type" into a lowercase "type".
But the following list comprehension somehow turns each key into a value and vice versa.
lower_list = [ { v:k.lower() for k,v in d.items() } for d in books ]
print(lower_list)
I end up with [{'Book': 'type',.... when it should be [{'type': 'Book',....
I am still struggling to understand the list comprehension syntax, so I would be grateful if somebody could 1. explain what my list comprehension does in plain English and 2. show how to change it to achieve what I am looking for. :)
Thank you!
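For your first problem:
lower_list = [ { k.lower():v for k,v in d.items() } for d in books ]
You were swapping keys and values: v:k.lower() uses the value as the new key, whereas k.lower():v keeps the key in place and only lowercases it.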
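To answer the "plain English" part of the question, both comprehensions can be unrolled into ordinary nested loops. This is only an illustrative sketch; the books list is shortened to two keys, and the names swapped and fixed are made up for the example.

```python
books = [{'Type': 'Book', 'Date': '2011'}]  # shortened sample data

# Same as: [{v: k.lower() for k, v in d.items()} for d in books]
swapped = []
for d in books:                  # outer "for d in books"
    new_d = {}
    for k, v in d.items():       # inner "for k, v in d.items()"
        new_d[v] = k.lower()     # "v: k.lower()" stores the VALUE as the key
    swapped.append(new_d)

# Same as: [{k.lower(): v for k, v in d.items()} for d in books]
fixed = []
for d in books:
    new_d = {}
    for k, v in d.items():
        new_d[k.lower()] = v     # "k.lower(): v" keeps key and value in place
    fixed.append(new_d)

print(swapped)
print(fixed)
```

Whatever stands to the left of the colon becomes the key, and whatever stands to the right becomes the value.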
For your last question, how to skip lowercasing the ISBN key (compare strings with ==, not with is, which tests object identity):
[ { k if k == "ISBN" else k.lower():v for k,v in d.items()} for d in books ]
But you should consider using a for loop: if you need more operations or conditions, the comprehension becomes difficult to modify further.
my_final_books = []
for d in books:
    new_d = {}  # one output dict per book
    for k, v in d.items():
        # keep "ISBN" as-is, lowercase every other key
        if k == "ISBN":
            key = k
        else:
            key = k.lower()
        # or, in ternary form: key = k if k == "ISBN" else k.lower()
        new_d[key] = v
        # do more logic here
    my_final_books.append(new_d)

Extracting Rows by specific keyword in Python (Without using Pandas)

My CSV file looks like this:
ID,Product,Price
1,Milk,20
2,Bottle,200
3,Mobile,258963
4,Milk,24
5,Mobile,10000
My code for extracting a row is as follows:
def search_data():
    fin = open('Products/data.csv')
    word = input() # "Milk"
    found = {}
    for line in fin:
        if word in line:
            found[word]=line
            return found
search_data()
When I run the code above, I get this output:
{'Milk': '1,Milk ,20\n'}
When I search for "Milk", I want to get all the rows that have "Milk" as the Product.
Note: please do this in plain Python, without Pandas.
The expected output should look like this:
[{"ID": "1", "Product": "Milk ", "Price": "20"},{"ID": "4", "Product": "Milk ", "Price": "24"}]
Can anyone tell me what I am doing wrong?
In your script, every assignment found[word] = line overwrites the value stored before it, because the key is always the same word. A better approach is to load all the data first and then filter it.
If file.csv contains:
ID Product Price
1 Milk 20
2 Bottle 200
3 Mobile 10,000
4 Milk 24
5 Mobile 15,000
Then this script:
# load data:
with open('file.csv', 'r') as f_in:
    lines = [line.split() for line in map(str.strip, f_in) if line]
    data = [dict(zip(lines[0], l)) for l in lines[1:]]
# print only items with 'Product': 'Milk'
print([i for i in data if i['Product'] == 'Milk'])
Prints only items with Product == Milk:
[{'ID': '1', 'Product': 'Milk', 'Price': '20'}, {'ID': '4', 'Product': 'Milk', 'Price': '24'}]
EDIT: If your data is separated by commas (,), you can use the csv module to read it.
If file.csv contains:
ID,Product,Price
1,Milk ,20
2,Bottle,200
3,Mobile,258963
4,Milk ,24
5,Mobile,10000
Then the script:
import csv
# load data (the csv docs recommend opening the file with newline=''):
with open('file.csv', 'r', newline='') as f_in:
    csvreader = csv.reader(f_in, delimiter=',', quotechar='"')
    lines = [line for line in csvreader if line]
    data = [dict(zip(lines[0], l)) for l in lines[1:]]
# print only items with 'Product': 'Milk' (strip() ignores the trailing space)
print([i for i in data if i['Product'].strip() == 'Milk'])
Prints:
[{'ID': '1', 'Product': 'Milk ', 'Price': '20'}, {'ID': '4', 'Product': 'Milk ', 'Price': '24'}]
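The same filtering can also be sketched with csv.DictReader, which uses the header row as the dictionary keys, so the manual zip step is not needed. In this sketch the sample data is read from an in-memory string so the example is self-contained, and the function takes an open file object and the search word as parameters; both choices are assumptions that differ from the original search_data().

```python
import csv
import io

SAMPLE = """ID,Product,Price
1,Milk ,20
2,Bottle,200
3,Mobile,258963
4,Milk ,24
5,Mobile,10000
"""


def search_data(f_in, word):
    """Return every row whose Product column equals word, ignoring padding."""
    reader = csv.DictReader(f_in)  # the first row supplies the keys
    return [dict(row) for row in reader if row['Product'].strip() == word]


rows = search_data(io.StringIO(SAMPLE), 'Milk')
print(rows)
```

With a real file, the call would be wrapped in a with open(path, newline='') block and the file object passed in place of io.StringIO(SAMPLE).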

xlsxwriter - Conditional formatting based on column name of the dataframe

I have a dataframe as shown below. I want to apply conditional formatting to the column "Data2" by referring to its column name. I know how to define a format for a specific cell range, but I am not sure how to define it based on the column name.
So basically I want to apply the same formatting by column name, because the order of the columns might change.
import pandas as pd

df1 = pd.DataFrame({'Data1': [10, 20, 30],
                    'Data2': ["a", "b", "c"]})
writer = pd.ExcelWriter('pandas_filter.xlsx', engine='xlsxwriter')
workbook = writer.book
df1.to_excel(writer, sheet_name='Sheet1', index=False)
worksheet = writer.sheets['Sheet1']
blue = workbook.add_format({'bg_color': '#000080', 'font_color': 'white'})
red = workbook.add_format({'bg_color': '#E52935', 'font_color': 'white'})
# Column B is hard-coded here; this is the part that should follow the column name
l = ['B2:B500']
for columns in l:
    worksheet.conditional_format(columns, {'type': 'text',
                                           'criteria': 'containing',
                                           'value': 'a',
                                           'format': blue})
    worksheet.conditional_format(columns, {'type': 'text',
                                           'criteria': 'containing',
                                           'value': 'b',
                                           'format': red})
writer.close()  # originally writer.save(), which was removed in pandas 2.0
With xlsxwriter, xl_col_to_name converts a zero-based column index into its column letter:
from xlsxwriter.utility import xl_col_to_name
target_col = xl_col_to_name(df1.columns.get_loc("Data2"))
l = [f'{target_col}2:{target_col}500']
for columns in l:
    ...
With openpyxl, get_column_letter does the same from a one-based column index:
from openpyxl.utils import get_column_letter
target_col = get_column_letter(df1.columns.get_loc("Data2") + 1)  # add 1 because get_column_letter indexes start at 1
l = [f'{target_col}2:{target_col}500']
for columns in l:
    ...
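To illustrate what both helpers compute, here is a pure-Python sketch of the index-to-letter conversion. The function col_letter is hypothetical and belongs to neither library, and the plain list columns stands in for df1.columns; in real code the library helpers above are the better choice.

```python
def col_letter(index):
    """Convert a zero-based column index to an Excel column letter (0 -> 'A')."""
    letters = ''
    index += 1  # switch to one-based for the base-26 arithmetic
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters


columns = ['Data1', 'Data2']  # stands in for df1.columns
target_col = col_letter(columns.index('Data2'))
cell_range = f'{target_col}2:{target_col}500'
print(cell_range)
```

Because the range is derived from the column's position at run time, it keeps pointing at "Data2" even if the column order changes.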

finding non matching records in pandas

I would like to identify records that are not matched by a distinct list of values. Take this example:
import pandas as pd

raw_data = {
    'subject_id': ['1', '2', '3', '4', '5'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches'],
    'sport' : ['soccer','soccer','soccer','soccer','soccer']}
df_a = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name','sport'])
raw_data = {
    'subject_id': ['9', '5', '6', '7', '8'],
    'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan'],
    'sport' : ['soccer','soccer','soccer','soccer','soccer']}
df_b = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name','sport'])
raw_data = {
    'subject_id': ['9', '5', '6', '7'],
    'first_name': ['Billy', 'Brian', 'Bran', 'Bryce'],
    'last_name': ['Bonder', 'Black', 'Balwner', 'Brice'],
    'sport' : ['football','football','football','football']}
df_c = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name','sport'])
raw_data = {
    'subject_id': ['1', '3', '5'],
    'first_name': ['Alex', 'Allen', 'Ayoung'],
    'last_name': ['Anderson', 'Ali', 'Atiches'],
    'sport' : ['football','football','football']}
df_d = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name','sport'])
# stack the four frames and sort by subject
frames = [df_a,df_b,df_c,df_d]
frame = pd.concat(frames)
frame = frame.sort_values(by='subject_id')
# the distinct list of sports to check against
raw_data = {
    'sport':['soccer','football','softball']
}
sportlist = pd.DataFrame(raw_data,columns=['sport'])
Desired output: I would like to get a list of first_name and last_name pairs that do not play football. I would also like to be able to return a list of all the records for softball, since softball is not represented in the original data.
I tried using merge with the how='outer' and indicator=True options, but since there is a record that plays soccer, there is a match. And 'right_only' yields no records, since softball was never populated in the original data.
Thanks,
aem
If you only want the records of people who are not playing football, all you need is:
frame[frame.sport != 'football']
which selects only the rows whose sport is not football.
If you need just the name pairs, select those two columns and call to_records(index=False):
frame[frame.sport != 'football'][['first_name', 'last_name']].to_records(index=False)
which returns a NumPy record array whose entries are tuples:
[('Alex', 'Anderson'), ('Amy', 'Ackerman'), ('Allen', 'Ali'),
('Alice', 'Aoni'), ('Brian', 'Black'), ('Ayoung', 'Atiches'),
('Bran', 'Balwner'), ('Bryce', 'Brice'), ('Betty', 'Btisan'),
('Billy', 'Bonder')]
You can also use the .loc indexer in pandas, which returns a list of lists here:
frame.loc[frame['sport'].ne('football'), ['first_name','last_name']].values.tolist()
[['Alex', 'Anderson'],
['Amy', 'Ackerman'],
['Allen', 'Ali'],
['Alice', 'Aoni'],
['Brian', 'Black'],
['Ayoung', 'Atiches'],
['Bran', 'Balwner'],
['Bryce', 'Brice'],
['Betty', 'Btisan'],
['Billy', 'Bonder']]
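Both answers filter rows, so a person with a soccer row and a football row still appears in the result. The sketch below is an assumption about what the question may have intended: it lists the sports in sportlist that never occur in the records, and the name pairs that have no football row at all. The data is a shortened, made-up subset that reuses names from the question.

```python
import pandas as pd

frame = pd.DataFrame({
    'first_name': ['Alex', 'Alex', 'Amy', 'Billy'],
    'last_name': ['Anderson', 'Anderson', 'Ackerman', 'Bonder'],
    'sport': ['soccer', 'football', 'soccer', 'football'],
})
sportlist = pd.DataFrame({'sport': ['soccer', 'football', 'softball']})

# Sports from the reference list that never occur in the records.
missing = sportlist.loc[~sportlist['sport'].isin(frame['sport']), 'sport'].tolist()

# Distinct name pairs, and the distinct pairs with at least one football row.
name_cols = ['first_name', 'last_name']
people = frame[name_cols].drop_duplicates()
football = frame.loc[frame['sport'].eq('football'), name_cols].drop_duplicates()

# A left merge with indicator=True marks pairs found only in `people`
# as 'left_only', meaning they have no football row at all.
merged = people.merge(football, how='left', on=name_cols, indicator=True)
no_football = merged.loc[merged['_merge'].eq('left_only'), name_cols].values.tolist()

print(missing)
print(no_football)
```

Here Alex Anderson is excluded because one of his rows is football, even though another one is soccer.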
