Python - Visualizing data in a diagram - python-3.x

I have a data set with three columns: Product, Category and Value. See below for an example of the data:
import pandas as pd
df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'Category': ['Text', 'Text2', 'Text3', 'Text4', 'Text', 'Text2', 'Text3', 'Text4'],
'Value': [80, 10, 5, 5, 5, 3, 2, 0]})
I would like to visualize this data in a diagram:
Here the "Total" is the total value of the entire data frame, "A" & "B" boxes are the total value for each product, and then the values for each product & category are in the right-most box.
I'm not very familiar with the viz packages available in Python. Is there a package that does these types of visualizations?

You can use graphviz, but you need to build the blocks/nodes yourself.
Example:
from graphviz import Graph
g = Graph()
g.attr(rankdir='RL')
T = df['Value'].sum()
g.node('1', 'Total = ' + str(T), shape='square')
A = df.loc[df.Product == 'A', ['Category', 'Value']].to_string(index=False)
g.node('2', A, shape='square')
B = df.loc[df.Product == 'B', ['Category', 'Value']].to_string(index=False)
g.node('3', B, shape='square')
g.edges(['21', '31'])
g.render(view=True)
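The example above skips the per-product "A" and "B" boxes from the question. A minimal sketch of one way to get all three levels is shown below: the totals are computed with pandas and written out as DOT source text. The helper name build_dot and the node ids (total, p_A, d_A, ...) are made up for this illustration, and the layout is only an approximation of the requested diagram.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Category': ['Text', 'Text2', 'Text3', 'Text4'] * 2,
    'Value': [80, 10, 5, 5, 5, 3, 2, 0],
})

def build_dot(frame):
    """Return DOT source: Total -> one box per product -> category breakdown.

    Hypothetical helper, not part of graphviz or pandas.
    """
    lines = ['digraph G {', '  rankdir=LR;', '  node [shape=box];']
    total = int(frame['Value'].sum())
    lines.append(f'  total [label="Total = {total}"];')
    for product, group in frame.groupby('Product'):
        subtotal = int(group['Value'].sum())
        # '\\n' is a literal backslash-n, which DOT renders as a line break
        detail = '\\n'.join(
            f'{cat}: {val}' for cat, val in zip(group['Category'], group['Value'])
        )
        lines.append(f'  p_{product} [label="{product} = {subtotal}"];')
        lines.append(f'  d_{product} [label="{detail}"];')
        lines.append(f'  total -> p_{product} -> d_{product};')
    lines.append('}')
    return '\n'.join(lines)

dot_source = build_dot(df)
print(dot_source)
```

If the graphviz Python package and the Graphviz binaries are installed, the resulting text can be rendered with graphviz.Source(dot_source).render(view=True).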

Related

Coloring sns barplot based on condition from another dataframe

I am using following code to generate a barplot from a dataframe df1(x,y) as below. (For simplicity I have added sample values in the chart code itself).
sns.barplot(x=['A','B','C','D','E','F','G'],y=[10,40,20,5,60,30,80],palette='Blues_r')
This generates a beautiful chart with shades of blue color in descending order for bars A to G.
However, I need the colors to be in the order determined based on another dataframe df2 where there are values against A to G. I do not wish to change the order of A to G in this chart, so sorting the dataframe df1 based on values of df2 will not work for me.
So, say df2 is like this:
A 90
B 70
C 40
D 30
E 30
F 20
G 80
Notice that df2 can contain equal values (D and E), in which case I do not care whether D and E have the same color or adjacent colors from the palette. But there should not be any other bar with a color in between D and E. That is, I need the chart to have bars starting from A and ending at G (fixed order), while the colors follow the order of the df2 values.
How do we do this?
You can use hue= with the values of the second dataframe. You'll also need dodge=False to tell Seaborn that you want a full bar per x-position.
import seaborn as sns
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'x': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
'y': [10, 40, 20, 5, 60, 30, 80]})
df2 = pd.DataFrame({'x': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
'y': [90, 70, 40, 30, 30, 20, 80]})
sns.barplot(data=df1, x='x', y='y', palette='Blues_r', hue=df2['y'], dodge=False, legend=False)
Note that this uses the values in df2['y'] to make the relative coloring. If you just want to use the order, you can use the rank of each value instead, e.g. df2['y'].rank(method='dense'). Be aware that np.argsort(df2['y']) returns the indices that would sort the array, which is not the same as the rank of each element.
ax = sns.barplot(data=df1, x='x', y='y', palette='Blues_r', hue=df2['y'].rank(method='dense'), dodge=False)
ax.legend_.remove() # remove the legend, which only lists the rank values
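To make the difference between sort indices and ranks concrete, here is a small sketch using the df2 values from the question. It only prints numbers and does not plot anything; the variable names order and ranks are chosen for this illustration.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'x': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
                    'y': [90, 70, 40, 30, 30, 20, 80]})

# argsort: positions of the elements in ascending order of value
# (kind='stable' keeps tied elements in their original order)
order = np.argsort(df2['y'].to_numpy(), kind='stable').tolist()

# dense rank: 1 for the smallest value, ties share a rank, no gaps
ranks = df2['y'].rank(method='dense').astype(int).tolist()

print(order)  # [5, 3, 4, 2, 1, 6, 0]
print(ranks)  # [6, 4, 3, 2, 2, 1, 5]
```

Here A has the largest value, so its rank is 6, while the first entry of the argsort result (5) only says that the smallest value sits at position 5 (bar F). Passing the ranks as hue keeps D and E on the same color.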
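You can also try plain plt.bar:
import matplotlib.pyplot as plt
x = ['A','B','C','D','E','F','G']
y = [10,40,20,5,60,30,80]
# assuming the columns of df2 are 'x' and 'color'
colors = df2.set_index('x').loc[x, 'color']
cmap = plt.get_cmap('Blues_r')
# scale the values to 0..1 so they span the whole colormap
norm = plt.Normalize(colors.min(), colors.max())
plt.bar(x, y, color=[cmap(norm(c)) for c in colors])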

How to find frequency of values in a list of lists and combine with another existing list by common value?

I have a nested list of music artists built from user inputs, let's say:
artists_list = [['A', 'B', 'C'],
['A', 'C', 'B'],
['B', 'A', 'D']]
I've also managed to create a separate list, based on order of input (not alphabetically), that assigns a genre to each unique artist in the above list:
artist_genre_list = [['A', 'Rock'],
['B', 'Rap'],
['C', 'Rock'],
['D', 'Blues']]
How do I combine these two to make either a master list or dictionary including the frequency count similar to:
master_list = [['A', 'Rock', 3],
['B', 'Rap', 3],
['C', 'Rock', 2],
['D', 'Blues', 1]]
master_dict = {'A': {
'Genre': 'Rock',
'Frequency': 3},
'B': {
'Genre': 'Rap',
'Frequency': 3},
'C': {
'Genre': 'Rock',
'Frequency': 2},
'D': {
'Genre': 'Blues',
'Frequency': 1}
}
The order doesn't necessarily have to be alphabetical. Here is a sample of what I'm doing to create the first two lists:
# Counters
count = 1
new_artist_counter = 0
# Generate Lists
artists_input_list = []
aux_artists_list = []
aux_genre_list = []
aux_artists_genre_list = []
def merge(aux_artists_list, aux_genre_list):
    merged_list = [[aux_artists_list[i], aux_genre_list[i]] for i in range(0,
                   len(aux_artists_list))]
    return merged_list
while count < 4:
    # Inputs
    a1_in = str(input("Artist 1: "))
    a2_in = str(input("Artist 2: "))
    a3_in = str(input("Artist 3: "))
    artists_input_list.append([a1_in, a2_in, a3_in])
    # Determines if a new unique artist has been added and asks for its genre
    while new_artist_counter < len(artists_input_list):
        for entry in artists_input_list:
            for artist in entry:
                if artist not in aux_artists_list:
                    aux_artists_list.append(artist)
                    genre_input = input("What is "+artist+"'s genre? ")
                    aux_genre_list.append(genre_input)
                else: continue
        new_artist_counter += 1
    aux_artists_genre_list = merge(aux_artists_list, aux_genre_list)
    # Counter updates
    count += 1
print(artists_input_list)
print(aux_artists_genre_list)
This is what I came up with. It first flattens your artist list, gets the frequency of each item in the flattened list, and then combines the frequencies with your genre list.
from itertools import groupby, chain
import pprint
artists_list = [
['A', 'B', 'C'],
['A', 'C', 'B'],
['B', 'A', 'D']
]
artist_genre_list = [
['A', 'Rock'],
['B', 'Rap'],
['C', 'Rock'],
['D', 'Blues']
]
# count how often each artist occurs in the flattened, sorted list
frequencies = {
    key: len(list(value))
    for key, value in groupby(sorted(chain.from_iterable(artists_list)))
}
# look up each artist's count by key (0 if the artist never occurs)
frequency = [
    {letter: {'Genre': genre, 'Frequency': frequencies.get(letter, 0)}}
    for letter, genre in artist_genre_list
]
pprint.pprint(frequency)
I used pprint just to make the output tidier, which shows as
[{'A': {'Frequency': 3, 'Genre': 'Rock'}},
{'B': {'Frequency': 3, 'Genre': 'Rap'}},
{'C': {'Frequency': 2, 'Genre': 'Rock'}},
{'D': {'Frequency': 1, 'Genre': 'Blues'}}]
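The answer above produces a list of single-key dictionaries rather than the master_list or master_dict shapes shown in the question. As a possible alternative, here is a short sketch that uses collections.Counter for the counting and builds both requested shapes; it assumes every artist in artist_genre_list should appear in the result, with a count of 0 if the artist never occurs.

```python
from collections import Counter
from itertools import chain

artists_list = [['A', 'B', 'C'],
                ['A', 'C', 'B'],
                ['B', 'A', 'D']]
artist_genre_list = [['A', 'Rock'],
                     ['B', 'Rap'],
                     ['C', 'Rock'],
                     ['D', 'Blues']]

# Flatten the nested list and count each artist in a single pass
counts = Counter(chain.from_iterable(artists_list))

# A Counter returns 0 for missing keys, so no explicit default is needed
master_list = [[artist, genre, counts[artist]]
               for artist, genre in artist_genre_list]
master_dict = {artist: {'Genre': genre, 'Frequency': counts[artist]}
               for artist, genre in artist_genre_list}

print(master_list)
print(master_dict)
```

Both results keep the order of artist_genre_list, which matches the input order described in the question.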

Fetching value from dataframe with certain condition

I have a dataframe containing 3 columns (['A','B','C']) and 3 rows.
We are using a for loop to fetch values from this dataframe into a variable, based on a condition on column B.
We then append the value of that variable to a list.
The problem is that when we check the list, each entry contains the value together with its type.
I'm not sure why this happens, as the list should contain only the values.
Can anyone help us find a proper solution?
Thanks,
Bhuwan
Dataframe: columns A, B, C; row values a to i: df = ([a,b,c],[d,b,f],[g,b,i]).
list_1=[]
for i in range(0,9):
    variable_1=df['A'][df.B == 'b']
    list_1.append(variable_1)
print(list_1)
Ideal output: ['a','d','g']
whereas the output we are getting is
['a type: object','d type: object','g type: object'].
You can get your ideal output like this:
import pandas as pd
df = pd.DataFrame({'A': ['a', 'd', 'g'], 'B': ['b', 'b', 'b'], 'C': ['c', 'f', 'i']})
list_1 = list(df[df['B'] == 'b']['A'].values) # <- this line
print(list_1)
> ['a', 'd', 'g']
You just need to:
1) filter your dataframe by column "B": df[df['B'] == 'b']
2) and only then take the values of the resulting column "A" and turn them into a list
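For comparison, a minimal sketch of the same two steps written with .loc and Series.tolist(). It also shows why the loop in the question printed type information: the filtered selection is a pandas Series, and appending that Series to a list stores the whole object, not its individual values. The name selected is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'd', 'g'],
                   'B': ['b', 'b', 'b'],
                   'C': ['c', 'f', 'i']})

# Row filter and column selection in one step; the result is a Series
selected = df.loc[df['B'] == 'b', 'A']
print(type(selected))

# tolist() unpacks the Series into plain Python values
list_1 = selected.tolist()
print(list_1)  # ['a', 'd', 'g']
```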

How to aggregate string length sequence base on an indicator sequence

I have a dictionary with two keys and their values are lists of strings.
I want to calculate the string lengths of one list based on an indicator in the other list.
It's difficult to frame the question in words, so let's look at an example.
Here is an example dictionary:
thisdict ={
'brand': ['Ford','bmw','toyota','benz','audi','subaru','ferrari','volvo','saab'],
'type': ['O','B','O','B','I','I','O','B','B']
}
Now, I want to add an item to the dictionary that holds the cumulative string length of the "brand" sequence, based on the "type" sequence.
Here are the criteria:
If type = 'O', set string length = 0 for that index.
If type = 'B', set string length to the corresponding string length.
If type = 'I', things get complicated: look back through the sequence and sum up the string lengths until you reach the first 'B'.
Here is an example output:
thisdict ={
"brand": ['Ford','bmw','toyota','benz','audi','subaru','ferrari','volvo','saab'],
'type': ['O','B','O','B','I','I','O','B','B'],
'cumulative-length':[0,3,0,4,8,14,0,5,4]
}
where 8=len(benz)+len(audi) and 14=len(benz)+len(audi)+len(subaru)
Note that in the real data I'm working on, a sequence can be one "B" followed by an arbitrary number of "I"s, i.e. ['B','I','I','I','I','I','I',...,'O'], so I'm looking for a solution that is robust in such situations.
Thanks
You can use the zip function to tie the brand and type together. Then just keep a running total as you loop through the dictionary values. This solution supports a series of any length and strings of any length in the brand list. I am assuming that len(thisdict['brand']) == len(thisdict['type']).
thisdict = {
'brand': ['Ford','bmw','toyota','benz','audi','subaru','ferrari','volvo','saab'],
'type': ['O','B','O','B','I','I','O','B','B']
}
lengths = []
running_total = 0
for b, t in zip(thisdict['brand'], thisdict['type']):
    if t == 'O':
        lengths.append(0)
    elif t == 'B':
        running_total = len(b)
        lengths.append(running_total)
    elif t == 'I':
        running_total += len(b)
        lengths.append(running_total)
print(lengths)
# [0, 3, 0, 4, 8, 14, 0, 5, 4]
Generating random data
import random
import string
def get_random_brand_and_type():
    n = random.randint(1,8)
    b = ''.join(random.choice(string.ascii_uppercase) for _ in range(n))
    t = random.choice(['B', 'I', 'O'])
    return b, t
thisdict = {
    'brand': [],
    'type': []
}
for i in range(random.randint(1,20)):
    b, t = get_random_brand_and_type()
    thisdict['brand'].append(b)
    thisdict['type'].append(t)
yields the following result:
{'type': ['B', 'B', 'O', 'I', 'B', 'O', 'O', 'I', 'O'],
'brand': ['O', 'BSYMLFN', 'OF', 'SO', 'KPQGRW', 'DLCWW', 'VLU', 'ZQE', 'GEUHERHE']}
[1, 7, 0, 9, 6, 0, 0, 9, 0]
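The question asks for the result to be stored as a new item in the dictionary, which the loop above does not do. A minimal sketch of the running-total loop wrapped in a function and written back under the 'cumulative-length' key follows. The function name cumulative_lengths is made up for this example, and treating any tag other than 'B' or 'I' as 'O' is an assumption.

```python
def cumulative_lengths(brands, types):
    """Running string length within each B/I span; other entries get 0."""
    lengths = []
    running_total = 0
    for brand, tag in zip(brands, types):
        if tag == 'B':
            # a 'B' starts a new span, so the total restarts here
            running_total = len(brand)
            lengths.append(running_total)
        elif tag == 'I':
            # an 'I' continues the span opened by the last 'B'
            running_total += len(brand)
            lengths.append(running_total)
        else:
            # 'O' (and, by assumption, any unknown tag) contributes 0
            lengths.append(0)
    return lengths

thisdict = {
    'brand': ['Ford', 'bmw', 'toyota', 'benz', 'audi', 'subaru', 'ferrari', 'volvo', 'saab'],
    'type': ['O', 'B', 'O', 'B', 'I', 'I', 'O', 'B', 'B'],
}
thisdict['cumulative-length'] = cumulative_lengths(thisdict['brand'], thisdict['type'])
print(thisdict['cumulative-length'])  # [0, 3, 0, 4, 8, 14, 0, 5, 4]
```

As in the loop above, an 'I' that directly follows an 'O' keeps adding to the total of the last 'B'; whether that is the desired behavior depends on the data.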

How to combine rows in pandas

I have a dataset like this
df = pd.DataFrame({'a' : ['a', 'b' , 'b', 'a'], 'b': ['a', 'b' , 'b', 'a'] })
And I want to combine the first two rows and get a dataset like this
df = pd.DataFrame({'a' : ['a b' , 'b', 'a'], 'b': ['a b' , 'b', 'a'] })
There are no rules other than combining the first two rows. I did not know how to combine rows, so I 'created' a method that combines them via transpose(), as below
db = df.transpose()
db["new"] = db[0].map(str) +' '+ db[1]
db.drop([0, 1], axis=1, inplace=True) # remove these two columns
cols = db.columns.tolist() # re order
cols = cols[-1:] + cols[:-1]
db = db[cols]
df = db.transpose() # reverse operation
df.reset_index()
It works, but I think there is an easier way.
You can simply add the two rows (note that this concatenates the strings without a separator):
df.loc[0] = df.loc[0] + df.loc[1]
df.drop(1, inplace=True)
You get
    a   b
0  ab  ab
2   b   b
3   a   a
A bit more fancy looking :)
df.loc[0]= df[:2].apply(lambda x: ''.join(x))
df.drop(1, inplace = True)
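The question's target output has a space between the two values ('a b') and a fresh 0..2 index. A small sketch of the same row-addition idea with both adjustments, assuming the default integer index from the question:

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'b', 'a'],
                   'b': ['a', 'b', 'b', 'a']})

# Join row 0 and row 1 column by column, with a space as separator
df.loc[0] = df.loc[0] + ' ' + df.loc[1]

# Drop the now-redundant second row and renumber the index from 0
df = df.drop(index=1).reset_index(drop=True)

print(df)
```

reset_index(drop=True) discards the old labels 0, 2, 3 instead of keeping them as an extra column.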
