openpyxl : Update multiple columns & rows from dictionary - python-3.x

I have a nested Dictionary
aDictionary = {'Asset': {'Name': 'Max', 'Age': 28, 'Job': 'Nil'}, 'Parameter': {'Marks': 60, 'Height': 177, 'Weight': 76}}
I want to update the values in an excel as follows
|Asset |Name |Max|
|Asset |Age |28 |
|Asset |Job |Nil|
|Parameter|Marks |60 |
|Parameter|Height|177|
|Parameter|Weight|76 |
I tried something like this, but result is not what I was expecting. Am pretty new to openpyxl. I can't seem to wrap my head around it.
from openpyxl import *
workbook=load_workbook('Empty.xlsx')
worksheet= workbook['Sheet1']
for m in range(1,7):
    for i in aDictionary:
        worksheet["A"+str(m)].value=i
        for j, k in aDictionary[i].items():
            worksheet["B"+str(m)].value=j
            worksheet["C"+str(m)].value=k
workbook.save('Empty.xlsx')

One way to do this is to convert the dictionary to a DataFrame, stack it into the layout you showed, rearrange the columns, and then write it to Excel. I've used pandas to_excel since it is a single line of code, but you could write the rows with openpyxl after load_workbook() as well.
Stacking part was borrowed from here
Code
import pandas as pd

aDictionary = {'Asset': {'Name': 'Max', 'Age': 28, 'Job': 'Nil'}, 'Parameter': {'Marks': 60, 'Height': 177, 'Weight': 76}}
df = pd.DataFrame(aDictionary)  # Convert to DataFrame (inner keys become the index)
df = df.stack().reset_index()  # Stack: one row per (inner key, outer key, value)
# Swap the first two columns so the outer key comes first
cols = df.columns.tolist()
cols[0], cols[1] = cols[1], cols[0]
df = df[cols]
# Write to Excel without index or header (this overwrites Empty.xlsx)
df.to_excel('Empty.xlsx', sheet_name='Sheet1', index=False, header=False)
Output in Excel
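If you would rather stay in openpyxl, as the question does, here is a minimal sketch of the same idea. The helper name write_nested_dict is made up for illustration, and the example uses an in-memory Workbook() so that it runs without Empty.xlsx; with a real file you would presumably use load_workbook('Empty.xlsx') and call workbook.save(...) at the end. The key point is that the row counter advances once per inner item, instead of looping over a fixed range of rows.

```python
from openpyxl import Workbook

aDictionary = {'Asset': {'Name': 'Max', 'Age': 28, 'Job': 'Nil'},
               'Parameter': {'Marks': 60, 'Height': 177, 'Weight': 76}}


def write_nested_dict(worksheet, nested, start_row=1):
    """Write one row per inner item: outer key, inner key, value (hypothetical helper)."""
    row = start_row
    for outer_key, inner in nested.items():
        for inner_key, value in inner.items():
            worksheet.cell(row=row, column=1, value=outer_key)
            worksheet.cell(row=row, column=2, value=inner_key)
            worksheet.cell(row=row, column=3, value=value)
            row += 1  # move down one row for every inner item
    return row  # first unused row


workbook = Workbook()          # in-memory workbook, stands in for load_workbook('Empty.xlsx')
worksheet = workbook.active
next_free_row = write_nested_dict(worksheet, aDictionary)

# Read the cells back to check what was written
rows = list(worksheet.iter_rows(min_row=1, max_row=6, max_col=3, values_only=True))
for r in rows:
    print(r)
```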

Related

How to extract the specific part of text file in python?

I have big data as shown in the uploaded pic, it has 90 BAND-INDEX and each BAND-INDEX has 300 rows.
I want to search the text file for a specific value like -24.83271 and extract the BAND-INDEX containing that value in an array form. Can you please write the code to do so? Thank you in advance
I am unable to extract the specific BAND-INDEX in array form.
Try reading the file line by line and using a generator. Here is an example:
import csv
import pandas as pd

# generate and save demo csv
pd.DataFrame({
    'Band-Index': (0.01, 0.02, 0.03, 0.04, 0.05, 0.06),
    'value': (1, 2, 3, 4, 5, 6),
}).to_csv('example.csv', index=False)

def search_values_in_file(search_values: list):
    with open('example.csv') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip header
        for row in reader:
            band_index, value = row
            if value in search_values:  # csv fields are strings, so compare strings
                yield row

# get lines from csv where value in ['4', '6']
df = pd.DataFrame(list(search_values_in_file(['4', '6'])), columns=['Band-Index', 'value'])
print(df)
#   Band-Index value
# 0       0.04     4
# 1       0.06     6
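Since the question searches for a float such as -24.83271 and wants the whole BAND-INDEX block back, here is a rough sketch of that variant. The real file layout is only visible in the missing picture, so the two-column layout (band index, value), the sample text, and the function name bands_containing are all assumptions. It compares floats with a tolerance rather than comparing strings.

```python
import csv
import io
import math

# Hypothetical layout: band index in the first column, value in the second.
SAMPLE = """band_index,value
1,-24.90000
1,-24.83271
2,-10.00000
2,-9.50000
"""


def bands_containing(lines, target, tol=1e-6):
    """Return {band_index: [values]} for every band holding a value close to target."""
    bands = {}
    reader = csv.reader(lines)
    next(reader)  # skip header
    for band_index, value in reader:
        bands.setdefault(band_index, []).append(float(value))
    return {
        band: values
        for band, values in bands.items()
        if any(math.isclose(v, target, abs_tol=tol) for v in values)
    }


matches = bands_containing(io.StringIO(SAMPLE), -24.83271)
print(matches)
```

For a real file you would pass the open file object instead of io.StringIO(SAMPLE), and adjust the parsing to whatever delimiter the file actually uses.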

Question regarding converting one dictionary to csv file

I am new to Python and using pandas.
I am trying to convert the data in a dictionary to a csv file.
Here is the dictionary
data_new = {'bedrooms': 2.0, 'bathrooms': 3.0, 'sqft_living': 1200,
'sqft_lot': 5800, 'floors': 2.0,
'waterfront': 1, 'view': 1, 'condition': 2, 'sqft_above': 1200,
'sqft_basement': 20,
'yr_built': 1925, 'yr_renovated': 2003, 'city': "Shoreline"}
And I use the below method to save and read the dictionary as csv file
with open('test.csv', 'w') as f:
    for key in data_new:
        f.write("%s,%s\n" % (key, data_new[key]))
df1 = pd.read_csv("test.csv")
df1
And when I read df1 I get it in the below format
but I want all rows to be columns so I used transpose function as below
However from the above output you see bathrooms is index 0 but I want index to start from bedrooms because with the below output if I try tdf1.info() I do not see bedroom data at all.
Could you please guide me how I can fix this?
Regards
Aravind Viswanathan
I think it would be easier to just use pandas to both write and read your csv file. Does this satisfy what you're trying to do?
import pandas as pd
data_new = {'bedrooms': 2.0, 'bathrooms': 3.0, 'sqft_living': 1200,
'sqft_lot': 5800, 'floors': 2.0,
'waterfront': 1, 'view': 1, 'condition': 2, 'sqft_above': 1200,
'sqft_basement': 20,
'yr_built': 1925, 'yr_renovated': 2003, 'city': "Shoreline"}
df1 = pd.DataFrame([data_new])  # a list with one dict gives one row, keys become columns
df1.to_csv('test.csv', index=False) # index=False prevents index being added as column 1
df2 = pd.read_csv('test.csv')
print(df1)
print(df2)
Output:
bedrooms bathrooms sqft_living ... yr_built yr_renovated city
0 2.0 3.0 1200 ... 1925 2003 Shoreline
[1 rows x 13 columns]
bedrooms bathrooms sqft_living ... yr_built yr_renovated city
0 2.0 3.0 1200 ... 1925 2003 Shoreline
[1 rows x 13 columns]
Identical.
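As for why bedrooms went missing in the original attempt: read_csv treats the first line of a file as the header by default, so the "bedrooms,2.0" line was consumed as column names. If you need to keep the key,value file format, a sketch like the following should work. It uses a shortened dictionary and an in-memory io.StringIO buffer instead of test.csv, purely to keep the example self-contained.

```python
import io
import pandas as pd

# Shortened version of the dictionary from the question
data_new = {'bedrooms': 2.0, 'bathrooms': 3.0, 'city': "Shoreline"}

# Write one "key,value" line per entry, as in the question
buffer = io.StringIO()
for key, value in data_new.items():
    buffer.write(f"{key},{value}\n")
buffer.seek(0)

# header=None: the first line is data, not column names
# index_col=0: the keys become the index, so .T turns them into columns
tdf1 = pd.read_csv(buffer, header=None, index_col=0).T
print(tdf1.columns.tolist())
```

Note that every value comes back as a string this way, because numbers and text share one column in the file. That is one more reason to prefer writing the one-row DataFrame directly.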

Pandas Plot Bar Fixed Range Missing Values

I'm plotting a bar chart with data that I have in a pandas.DataFrame. My code is as follows
import pandas as pd
import matplotlib.pyplot as plot
from datetime import datetime
start_year = 2000
date_range = [ i + start_year for i in range(datetime.today().year - start_year)]
data = pd.DataFrame([
    [2015, 100], [2016, 110], [2017, 105], [2018, 109], [2019, 110], [2020, 116], [2021, 113]
], columns=["year", "value"])
chart = data.plot.bar(
    x="year",
    y="value",
    # xticks=date_range # ,
    xlim=[date_range[0], date_range[-1]]
)
plot.show()
The resulting plot is:
I have to plot several of these, for which data may start from 2000 and finish in 2010, then another dataframe that has data that starts in 2010 and ends in the current year.
In order to make these plots visually comparable, I would like for all to start at the same year, 2000 in this example, and finish the current year. If no value is present for a given year, then 0 can be used. In this case, as example, I've used the year 2000, but it could also start from the year 2005, 2006 or 2010.
How can I achieve what I'm looking for? I've tried setting xticks and xlim, but with xticks, the data gets skewed all towards one side, as if there were thousands of values in between. It is strange since I'm using int values.
Thanks
You can prepare your dataframe so that it contains every year you want: right merge() it onto a dataframe that lists all required years. Years with no data come through as NaN, which the bar chart leaves empty.
import pandas as pd
data = pd.DataFrame([
    [2015, 100], [2016, 110], [2017, 105], [2018, 109], [2019, 110], [2020, 116], [2021, 113]
], columns=["year", "value"])
# NB range() excludes its stop value, hence end year + 1
data.merge(pd.DataFrame({"year":range(2010,2021+1)}), on="year", how="right").plot(kind="bar", x="year", y="value")
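A variant closer to what the question asks for (start at 2000, end at the current year, 0 for missing years) could use reindex() instead of a merge. This is only a sketch; the names all_years and padded are made up here. It also explains the xlim/xticks trouble: a pandas bar plot is categorical and places the bars at positions 0, 1, 2, ..., so limits given in years do not line up with the bars.

```python
from datetime import datetime

import pandas as pd

data = pd.DataFrame([
    [2015, 100], [2016, 110], [2017, 105], [2018, 109],
    [2019, 110], [2020, 116], [2021, 113],
], columns=["year", "value"])

start_year = 2000
end_year = datetime.today().year

# Every year from start_year up to and including the current year
all_years = pd.Index(range(start_year, end_year + 1), name="year")

# Align the data to the full range; years without data get 0
padded = (
    data.set_index("year")["value"]
    .reindex(all_years, fill_value=0)
    .rename_axis("year")
    .reset_index()
)
print(padded.head())
```

Plotting is then the same call as before, for example padded.plot.bar(x="year", y="value") followed by plot.show(), and every chart built this way shares the same x axis.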

Ranking of single datapoint against reference dataset

I have the following hypothetical dataframe:
data = {'score1':[60, 30, 80, 120],
'score2':[20, 21, 19, 18],
'score3':[12, 43, 71, 90]}
# Create the pandas DataFrame
df = pd.DataFrame(data)
# calculating the ranks
df['score1_rank'] = df['score1'].rank(pct = True)
df['score2_rank'] = df['score2'].rank(pct = True)
df['score3_rank'] = df['score3'].rank(pct = True)
I then have individual datapoints I would like to rank against the references, for example:
data_to_test = {'score1':[12],
'score2':[4],
'score3':[6]}
How could I compare these new values against this reference?
Thank you for any help!
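One possible approach, sketched under an assumption about what "rank" should mean here: take the fraction of reference rows that are less than or equal to the new value, column by column. This is close in spirit to rank(pct=True), but not identical when there are ties. The function name percentile_against is hypothetical.

```python
import pandas as pd

reference = pd.DataFrame({'score1': [60, 30, 80, 120],
                          'score2': [20, 21, 19, 18],
                          'score3': [12, 43, 71, 90]})


def percentile_against(reference, point):
    """Fraction of reference rows that are <= the new value, per column."""
    # axis=1 lines the Series labels up with the DataFrame columns
    return reference.le(point, axis=1).mean()


# The datapoint from the question: below every reference value, so all 0.0
data_to_test = {'score1': [12], 'score2': [4], 'score3': [6]}
question_point = pd.DataFrame(data_to_test).iloc[0]
print(percentile_against(reference, question_point))

# A second, made-up datapoint that lands inside the reference ranges
other_point = pd.Series({'score1': 70, 'score2': 19, 'score3': 100})
result = percentile_against(reference, other_point)
print(result)
```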

How to get the first column (index) in the dictionary output with Pandas?

I have not used pandas before but it looks like it could be a really nice tool for data manipulation. I am using python 3.7 and pandas 1.2.3.
I am passing a list of dictionaries to the dataframe that has 2 pieces to it. A sample of the dictionary would look like this:
data = [
    {"Knowledge_Base_Link__c": None, "ClosedDate": "2021-01-06T19:02:14.000+0000"},
    {"Knowledge_Base_Link__c": "http://someurl.com", "ClosedDate": "2021-01-08T21:26:49.000+0000"},
    {"Knowledge_Base_Link__c": "http://someotherurl.com", "ClosedDate": "2021-01-09T20:35:58.000+0000"}
]
df = pd.DataFrame(data)
# Then I format the ClosedDate like so
df['ClosedDate'] = pd.to_datetime(df['ClosedDate'], format="%y-%m-%d", exact=False)
# Next i get a count of the data
articles = df.resample('M', on='ClosedDate').count()
# print the results to the console
print(articles)
These are the results, and they are exactly what I want.
However, if I convert that to a list, or push it to a dictionary to use the data as shown below, the first column (the index, I presume) is missing from the output.
articles_by_month = articles.to_dict('records')
This final output is almost what i want but it is missing the index column.
This is what i am getting:
[{'ClosedDate': 15, 'Knowledge_Base_Link__c': 5}, {'ClosedDate': 18, 'Knowledge_Base_Link__c': 11}, {'ClosedDate': 12, 'Knowledge_Base_Link__c': 6}]
This is what i want:
[{'Date': '2021-01-31', 'ClosedDate': 15, 'Knowledge_Base_Link__c': 5}, {'Date': '2021-02-28', 'ClosedDate': 18, 'Knowledge_Base_Link__c': 11}, {'Date': '2021-03-31', 'ClosedDate': 12, 'Knowledge_Base_Link__c': 6}]
Couple things i have tried:
df.reset_index(level=0, inplace=True)
# This just takes the sum and puts it in a column called index, not sure how to get date like it is displayed in the first column of the screenshot
# I also tried this
df['ClosedDate'] = df.index
# however this gives me a Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index' error.
I thought this would be simple and checked the pandas docs and many other stacked articles but i cannot find a way to do this. Any thoughts on this would be appreciated.
Thanks
You can get the index as an additional key in each dict by resetting it before converting:
articles.reset_index().to_dict('records')
But BEFORE that you have to rename the index, since ClosedDate (the index's name) is already a column and reset_index() cannot insert a second column with that name:
articles.index = articles.index.rename('Date')
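Putting it together, here is a small sketch that also turns the dates into plain strings, as in the desired output. The articles frame is rebuilt by hand from the numbers quoted in the question rather than produced by resample(), so treat that part as a stand-in. rename_axis('Date') is used as an equivalent way to rename the index.

```python
import pandas as pd

# Hand-built stand-in for the resampled frame, using the counts from the question
articles = pd.DataFrame(
    {'ClosedDate': [15, 18, 12], 'Knowledge_Base_Link__c': [5, 11, 6]},
    index=pd.DatetimeIndex(['2021-01-31', '2021-02-28', '2021-03-31'], name='ClosedDate'),
)

articles_by_month = (
    articles.rename_axis('Date')   # rename the index so it no longer clashes with the column
    .reset_index()                 # the index becomes a regular 'Date' column
    .assign(Date=lambda d: d['Date'].dt.strftime('%Y-%m-%d'))  # Timestamp -> 'YYYY-MM-DD' string
    .to_dict('records')
)
print(articles_by_month)
```

Without the strftime step the 'Date' values stay as pandas Timestamp objects, which may or may not be what you want downstream.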
