Massage csv dataframe into dictionary style - python-3.x

I have a pandas dataframe named tshirt_orders from an API call looking like this:
Alice, small, red
Alice, small, green
Bob, small, blue
Bob, small, orange
Cesar, medium, yellow
David, large, purple
How can I convert this into a nested, dictionary-style structure that is keyed first by size, with a sub-key for each name and a list of colors under each name, so that I can address the values when iterating over tshirt_orders?
Like this:
size:
  small:
    Name:
      Alice:
        Color:
          red
          green
      Bob:
        Color:
          blue
          orange
  medium:
    Name:
      Cesar:
        Color:
          yellow
  large:
    Name:
      David:
        Color:
          purple
What would be the best solution to change this? It is in a pandas dataframe but changing that isn't a problem should there be better solutions.

The closest fit is to write the DataFrame to YAML.
First, create the nested dictionaries with a dict comprehension:
print (df)
       A       B       C
0  Alice   small     red
1  Alice   small   green
2    Bob   small    blue
3    Bob   small  orange
4  Cesar  medium  yellow
5  David   large  purple
d = {k: v.groupby('A', sort=False)['C'].apply(list).to_dict()
     for k, v in df.groupby('B', sort=False)}
print (d)
{'small': {'Alice': ['red', 'green'],
           'Bob': ['blue', 'orange']},
 'medium': {'Cesar': ['yellow']},
 'large': {'David': ['purple']}}
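Since the question asks about addressing the values while iterating, here is a minimal sketch of walking a nested dictionary of this shape. The literal `d` is copied from the printed output above; the helper name `iter_orders` is only illustrative and not part of the original answer.

```python
# Nested dictionary copied from the printed output above.
d = {
    'small': {'Alice': ['red', 'green'], 'Bob': ['blue', 'orange']},
    'medium': {'Cesar': ['yellow']},
    'large': {'David': ['purple']},
}


def iter_orders(nested):
    """Yield one (size, name, color) tuple per color in the nested dict."""
    for size, names in nested.items():
        for name, colors in names.items():
            for color in colors:
                yield size, name, color


for size, name, color in iter_orders(d):
    print(size, name, color)

# Direct lookups by key work as well.
print(d['small']['Alice'])
```

Flattening through a generator keeps the loop body free of nested indentation, while plain key lookups remain available for single values.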
Then add size as the top-level key and write the result to a YAML file:
import yaml
with open('result.yml', 'w') as yaml_file:
    yaml.dump({'size': d}, yaml_file, default_flow_style=False, sort_keys=False)
size:
  small:
    Alice:
    - red
    - green
    Bob:
    - blue
    - orange
  medium:
    Cesar:
    - yellow
  large:
    David:
    - purple
Or create JSON:
import json
with open("result.json", "w") as json_file:
    json.dump({'size': d}, json_file, indent=4)
{
    "size": {
        "small": {
            "Alice": [
                "red",
                "green"
            ],
            "Bob": [
                "blue",
                "orange"
            ]
        },
        "medium": {
            "Cesar": [
                "yellow"
            ]
        },
        "large": {
            "David": [
                "purple"
            ]
        }
    }
}
EDIT: To also get the literal Name and Color levels from the question, add constant columns and build the dictionary recursively:
df = df.assign(A1='Name', B1='size', C1='Color')
df1 = df.groupby(['B1','B','A1','A','C1'], sort=False)['C'].apply(list).reset_index()
#https://stackoverflow.com/a/19900276
def recur_dictify(frame):
    if len(frame.columns) == 1:
        if frame.values.size == 1: return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0], sort=False)
    d = {k: recur_dictify(g.iloc[:,1:]) for k,g in grouped}
    return d
d = recur_dictify(df1)
print (d)
{'size': {'small': {'Name': {'Alice': {'Color': ['red', 'green']},
'Bob': {'Color': ['blue', 'orange']}}},
'medium': {'Name': {'Cesar': {'Color': ['yellow']}}},
'large': {'Name': {'David': {'Color': ['purple']}}}}}
import yaml
with open('result.yml', 'w') as yaml_file:
    yaml.dump(d, yaml_file, default_flow_style=False, sort_keys=False)
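The question mentions that the DataFrame itself is optional, so the same size/name/colors nesting can also be sketched in plain Python. This is only an illustration: the `rows` list is typed in from the sample data in the question, and `nest_orders` is a made-up helper name.

```python
# Sample rows from the question as (name, size, color) tuples.
rows = [
    ("Alice", "small", "red"),
    ("Alice", "small", "green"),
    ("Bob", "small", "blue"),
    ("Bob", "small", "orange"),
    ("Cesar", "medium", "yellow"),
    ("David", "large", "purple"),
]


def nest_orders(records):
    """Group colors by size, then by name, keeping first-seen order."""
    nested = {}
    for name, size, color in records:
        # setdefault creates the inner dict or list the first time a key appears.
        nested.setdefault(size, {}).setdefault(name, []).append(color)
    return {"size": nested}


orders = nest_orders(rows)
print(orders["size"]["small"]["Bob"])
```

Because dictionaries preserve insertion order, the sizes come out in the order they first appear in the rows, which matches the `sort=False` behaviour used in the pandas version.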

Related

How to print whether each string in a list is in a pandas dataframe?

Given a list of search terms and a pandas dataframe, what is the most pythonic way to print whether the search term is present in the target dataframe?
search_terms = ["red", "blue", "green", "orange"]
input_df looks like...
color count
0 red 15
1 blue 39
2 yellow 40
3 green 21
I want to see...
red = true
blue = true
green = true
orange = false
I know how to filter the input_df to include only search_terms. This doesn't alert me to the fact that "orange" was not located in the input_df. The search_terms could contain hundreds or thousands of strings.
import pandas as pd
color = ['red', 'blue', 'yellow', 'green']
count = [15,39,40,21]
input_dict = dict(color=color, count=count)
input_df = pd.DataFrame(data=input_dict)
found_df = input_df[input_df['color'].isin(search_terms)]
You can try:
out = dict(zip(search_terms, pd.Series(search_terms).isin(input_df['color'])))
Or:
out = dict(zip(search_terms, np.isin(search_terms, input_df['color'])))
Output:
{'red': True, 'blue': True, 'green': True, 'orange': False}
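Because the list of search terms may run to thousands of strings, one possible variant is to build a set of the column values once and test each term against it. The sketch below reuses the sample data from the question and prints the result in the requested `term = true/false` form; the names `known_colors` and `found` are illustrative.

```python
import pandas as pd

search_terms = ["red", "blue", "green", "orange"]
input_df = pd.DataFrame({"color": ["red", "blue", "yellow", "green"],
                         "count": [15, 39, 40, 21]})

# Build the set once so that each membership test is a constant-time lookup.
known_colors = set(input_df["color"])
found = {term: term in known_colors for term in search_terms}

# Print in the "term = true/false" form asked for in the question.
for term, present in found.items():
    print(f"{term} = {str(present).lower()}")
```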

Dash plotly overcoming duplicate callback

I have a dashboard for displaying historical data alongside forecasted values. I would like the user to be able to make edits to the forecasted values and update the graph. I am accomplishing this through an editable datatable. However I am unsure of how to update the scatter plot after getting user input on the editable datatable.
example data frame
item time_period number forecast
apple 1 5 0
apple 2 10 0
apple 3 8 0
apple 4 9 1
apple 5 12 1
orange 1 20 0
orange 2 46 0
orange 3 35 0
orange 4 32 1
orange 5 55 1
current code
import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly.express as px
import pandas as pd
import dash_table
from dash.dependencies import Input, Output
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
raw_data = {"item": ["apple", "apple", "apple", "apple", "apple", "orange", "orange", "orange", "orange", "orange"], "time_period":[1,2,3,4,5,1,2,3,4,5], "number":[5, 10, 8, 9, 12, 20, 46, 35, 32, 55],
            "forecast": [0,0,0,1,1,0,0,0,1,1]}
df = pd.DataFrame(raw_data)
items = df["item"].unique()
app.layout = html.Div([
    dcc.Graph(
        id="scatter-plot"
    ),
    dcc.Dropdown(
        id="dropdown",
        options=[{"label":i, "value":i} for i in items]
    ),
    dash_table.DataTable(
        id="data-table",
        columns=[{"id": "time_period", "name":"time_period"}, {"id":"number", "name":"number", "editable":True}],
        data=df.to_dict("records")
    )
])
@app.callback(
    Output(component_id="scatter-plot", component_property="figure"),
    Output(component_id="data-table", component_property="data"),
    Input(component_id="dropdown", component_property="value")
)
def select_item(fruit):
    # create copy of original dataframe
    dff = df.copy()
    # isolate out fruit from dropdown
    fruit_df = dff[dff["item"] == fruit]
    # create scatter plot for selected brand
    fig = px.scatter(data_frame=fruit_df, x="time_period", y="number", color="forecast")
    # isolate ordered cases and item
    forecasts = fruit_df[["time_period", "number"]]
    forecasts = forecasts.to_dict("records")
    return fig, forecasts
@app.callback(
    Output(component_id="scatter-plot", component_property="figure"),
    Input(component_id="data-table", component_property="data")
)
def update_scatter(data):
    fig = px.scatter(data_frame=data, x="time_period", y="number")
    return fig
app.run_server(debug=True)
Combine the two callbacks into one, and use the callback context (dash.callback_context) to determine which input caused the callback to fire.
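A minimal sketch of the combined-callback idea, with the branching pulled out into a plain function so it can be run without a Dash server. The function name `rows_for_update` and its parameters are hypothetical. The commented wiring assumes a Dash version that allows one property to be both an input and an output of the same callback, and reads the trigger from `dash.callback_context.triggered`.

```python
def rows_for_update(triggered_id, fruit, table_rows, all_rows):
    """Return the table rows that the single combined callback should show.

    triggered_id: id of the component that fired ("dropdown" or "data-table").
    fruit: current dropdown value.
    table_rows: current (possibly user-edited) DataTable rows.
    all_rows: the original records, each with an "item" key.
    """
    if triggered_id == "data-table":
        # The user edited a cell: keep the edited rows as they are.
        return table_rows
    # Otherwise the dropdown changed (or this is the initial call):
    # rebuild the rows from the original data for the selected item.
    return [
        {"time_period": r["time_period"], "number": r["number"]}
        for r in all_rows
        if r["item"] == fruit
    ]


# Assumed wiring inside the Dash app (shown as comments, not executed here):
#
# @app.callback(
#     Output("scatter-plot", "figure"),
#     Output("data-table", "data"),
#     Input("dropdown", "value"),
#     Input("data-table", "data"),
# )
# def update(fruit, table_rows):
#     triggered = dash.callback_context.triggered
#     triggered_id = triggered[0]["prop_id"].split(".")[0] if triggered else None
#     rows = rows_for_update(triggered_id, fruit, table_rows, df.to_dict("records"))
#     frame = pd.DataFrame(rows, columns=["time_period", "number"])
#     return px.scatter(frame, x="time_period", y="number"), rows

all_rows = [
    {"item": "apple", "time_period": 1, "number": 5},
    {"item": "orange", "time_period": 1, "number": 20},
]
edited = [{"time_period": 1, "number": 7}]
print(rows_for_update("dropdown", "apple", edited, all_rows))
print(rows_for_update("data-table", "apple", edited, all_rows))
```

With a single callback owning both outputs, the scatter plot is always drawn from whichever rows this function returns, so the duplicate output on the figure disappears.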

Converting an Array into a Dictionary With Python

I am trying to scrape a table and convert it into a dictionary using the TH as the key and the td as the value.
Below is the code to grab the TD and the TH
for row in rows:
    td = row.find_all('td')
    th = row.find_all('th')
    row2 = [i.text.replace("\n", "").strip() for i in td]
    print(row2)
['', '90315', 'Printmaking I', 'S1', '01(REG-HR)', 'Faletto, Liana', '445', 'LS']
print(headers)
#['Class', 'Description', 'Term', 'Schedule', 'Primary Staff > Name', 'Clssrm', 'Name']
How do I convert the output into the following, dropping the first blank list item?
thisdict = {
    "class": "90315",
    "description": "Printmaking I",
    "term": "S1"
}
To delete the first element of row2 (>>> is the Python prompt):
>>> del row2[0]
>>> row2
['90315', 'Printmaking I', 'S1', '01(REG-HR)', 'Faletto, Liana', '445', 'LS']
Then fill the dictionary by hand. After the deletion the class is at index 0:
thisdict = {}
thisdict['class'] = row2[0]
thisdict['description'] = row2[1]
thisdict['term'] = row2[2]
...
Or, if headers contains the list of dictionary keys:
for i in range(len(headers)):
    thisdict[headers[i]] = row2[i]
>>> thisdict
{'Term': 'S1', 'Description': 'Printmaking I', 'term': 'S1', 'class': '90315', 'Class': '90315', 'description': 'Printmaking I'}
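The index loop above can also be written with `zip`, which pairs each header with the cell in the same position. A short sketch using the values printed in the question; lowercasing the keys is an assumption based on the desired output shown there.

```python
headers = ['Class', 'Description', 'Term', 'Schedule',
           'Primary Staff > Name', 'Clssrm', 'Name']
row2 = ['', '90315', 'Printmaking I', 'S1', '01(REG-HR)',
        'Faletto, Liana', '445', 'LS']

# row2[1:] skips the blank first cell without mutating row2;
# zip pairs each (lowercased) header with the cell in the same position.
thisdict = dict(zip((h.lower() for h in headers), row2[1:]))
print(thisdict['class'], thisdict['description'], thisdict['term'])
```

Note that `zip` stops at the shorter of the two sequences, so a row with missing cells silently yields fewer keys rather than raising an error.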

filter dataframe columns as you iterate through rows and create dictionary

I have the following table of data in a spreadsheet:
Name Description Value
foo foobar 5
baz foobaz 4
bar foofoo 8
I'm reading the spreadsheet and passing the data as a dataframe.
I need to transform this table of data to json following a specific schema.
I have the following script:
for index, row in df.iterrows():
    if row['Description'] == 'foofoo':
        print(row.to_dict())
which returns:
{'Name': 'bar', 'Description': 'foofoo', 'Value': '8'}
I want to be able to filter out a specific column. For example, to return this:
{'Name': 'bar', 'Description': 'foofoo'}
I know that I can print only the columns I want with print(row['Name'], row['Description']); however, this returns only the values, and I also want the keys.
How can I do this?
I wrote this entire thing only to realize that @anky_91 had already suggested it. Oh well...
import pandas as pd
data = {
    "name": ["foo", "abc", "baz", "bar"],
    "description": ["foobar", "foofoo", "foobaz", "foofoo"],
    "value": [5, 3, 4, 8],
}
df = pd.DataFrame(data=data)
print(df, end='\n\n')
rec_dicts = df.loc[df["description"] == "foofoo", ["name", "description"]].to_dict(
    "records"
)
print(rec_dicts)
Output:
name description value
0 foo foobar 5
1 abc foofoo 3
2 baz foobaz 4
3 bar foofoo 8
[{'name': 'abc', 'description': 'foofoo'}, {'name': 'bar', 'description': 'foofoo'}]
After converting the row to a dictionary, you can delete the key you don't need:
d = row.to_dict()
del d['Value']
Now the dictionary has only Name and Description.
You can try this:
import io
import pandas as pd
s="""Name,Description,Value
foo,foobar,5
baz,foobaz,4
bar,foofoo,8
"""
df = pd.read_csv(io.StringIO(s))
for index, row in df.iterrows():
    if row['Description'] == 'foofoo':
        print(row[['Name', 'Description']].to_dict())
Result:
{'Name': 'bar', 'Description': 'foofoo'}
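Since the stated goal is JSON, the filtered columns can also be serialized directly instead of printing dictionaries row by row. A small sketch on the same sample data, using `DataFrame.to_json` with `orient="records"`; the variable names `mask`, `json_text` and `records` are illustrative, and the target schema from the question is not known, so this only covers the column filtering.

```python
import io
import json

import pandas as pd

s = """Name,Description,Value
foo,foobar,5
baz,foobaz,4
bar,foofoo,8
"""
df = pd.read_csv(io.StringIO(s))

# Select the matching rows and only the wanted columns in one step.
mask = df["Description"] == "foofoo"
json_text = df.loc[mask, ["Name", "Description"]].to_json(orient="records")
print(json_text)

# Parse it back to confirm that the keys survived alongside the values.
records = json.loads(json_text)
```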

How to choose randomly in Python

I have to write Python code that chooses a word from each of 7 lists (a total of 7 words) and then prints a requested number of lines to form a "poem". Each line of the "poem" is meant to be a different combination of words from the 7 lists. Any ideas on how to get the program to produce different combinations? Mine just prints the same line as many times as I asked for:
people=['Amir', 'Itai', 'Sheli','Gil','Jasmin','Tal','Nadav']
verbs = ['talks', 'smiles', 'sings', 'listens', 'eats', 'waves', 'plays', 'swims']
Adverbs =['slowly', 'quickly', 'solemnly', 'nicely', 'beautifully']
Prepositions=['to a', 'with a' ,'towards a', 'at a' ,'out of a']
Adjectives =['white', 'blue', 'green', 'small', 'large', 'yellow', 'pretty', 'sad']
Animated=['fish', 'parrot', 'flower', 'tree', 'snake']
Inanimated=['chair', 'lamp', 'car', 'ship', 'boat']
x=eval(input("How many lines are in the poem?"))
y=(random.choice(people), random.choice (verbs) ,random.choice(Adverbs) ,random.choice(Prepositions) ,random.choice(Adjectives) ,random.choice(Animated+Inanimated))
for i in range (x):
    if (x< 10):
        print (y)
You have the right idea, but you need to re-evaluate the random choices on every iteration (and import random):
import random
people=['Amir', 'Itai', 'Sheli','Gil','Jasmin','Tal','Nadav']
verbs = ['talks', 'smiles', 'sings', 'listens', 'eats', 'waves', 'plays', 'swims']
Adverbs =['slowly', 'quickly', 'solemnly', 'nicely', 'beautifully']
Prepositions=['to a', 'with a' ,'towards a', 'at a' ,'out of a']
Adjectives =['white', 'blue', 'green', 'small', 'large', 'yellow', 'pretty', 'sad']
Animated=['fish', 'parrot', 'flower', 'tree', 'snake']
Inanimated=['chair', 'lamp', 'car', 'ship', 'boat']
# int() is safer than eval() for reading a number from the user
x=int(input("How many lines are in the poem?"))
for i in range (x):
    # pick new random words on every pass through the loop
    y=(random.choice(people), random.choice (verbs) ,random.choice(Adverbs) ,random.choice(Prepositions) ,random.choice(Adjectives) ,random.choice(Animated+Inanimated))
    if (i < 10):
        print (y)
I think this can help you:
import random
people=['Amir', 'Itai', 'Sheli','Gil','Jasmin','Tal','Nadav']
verbs = ['talks', 'smiles', 'sings', 'listens', 'eats', 'waves', 'plays', 'swims']
Adverbs =['slowly', 'quickly', 'solemnly', 'nicely', 'beautifully']
Prepositions=['to a', 'with a' ,'towards a', 'at a' ,'out of a']
Adjectives =['white', 'blue', 'green', 'small', 'large', 'yellow', 'pretty', 'sad']
Animated=['fish', 'parrot', 'flower', 'tree', 'snake']
Inanimated=['chair', 'lamp', 'car', 'ship', 'boat']
while True:
    # int() is safer than eval() for reading a number from the user
    x=int(input("How many lines are in the poem?"))
    if x == 0:
        break
    for i in range (x):
        if (x< 10):
            y = (random.choice(people), random.choice(verbs), random.choice(Adverbs), random.choice(Prepositions),random.choice(Adjectives), random.choice(Animated + Inanimated))
            print (y)
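Independent random choices can still repeat a line by chance. If every line really has to be a different combination, one option is to remember the lines already produced, as in the sketch below. The function name `make_poem` and the optional `rng` parameter are illustrative additions, and joining the words into a sentence is an assumption about the desired output.

```python
import random

people = ['Amir', 'Itai', 'Sheli', 'Gil', 'Jasmin', 'Tal', 'Nadav']
verbs = ['talks', 'smiles', 'sings', 'listens', 'eats', 'waves', 'plays', 'swims']
Adverbs = ['slowly', 'quickly', 'solemnly', 'nicely', 'beautifully']
Prepositions = ['to a', 'with a', 'towards a', 'at a', 'out of a']
Adjectives = ['white', 'blue', 'green', 'small', 'large', 'yellow', 'pretty', 'sad']
Animated = ['fish', 'parrot', 'flower', 'tree', 'snake']
Inanimated = ['chair', 'lamp', 'car', 'ship', 'boat']


def make_poem(n_lines, rng=random):
    """Return n_lines distinct sentences, one random word per list."""
    parts = [people, verbs, Adverbs, Prepositions, Adjectives,
             Animated + Inanimated]
    # Count the possible combinations so the loop below cannot run forever.
    total = 1
    for words in parts:
        total *= len(words)
    if n_lines > total:
        raise ValueError("more lines requested than distinct combinations")
    seen = set()
    lines = []
    while len(lines) < n_lines:
        line = " ".join(rng.choice(words) for words in parts)
        if line not in seen:  # skip a combination that was already used
            seen.add(line)
            lines.append(line)
    return lines


for line in make_poem(3):
    print(line)
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which is convenient when checking the program.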
