plots with twisted data - linux

The question has been deleted; sorry for any inconvenience caused to anyone referring to it.

Here's a possible solution using a couple of dummy aesthetics.
I called geom_boxplot on a subset of the data that excludes "Diffs" and "rare", and gave it a dummy fill aesthetic so that the boxplot gets a legend.
Then I called geom_point on the subset containing only "Diffs" and "rare". ggplot expects long-shaped data, so rather than filtering for each value and calling geom_point twice, call geom_point once and map the grouping to an aesthetic so the groups are drawn differently.
The next step was controlling the labels to match what you're after and hiding the dummies. Adding guide = guide_legend(order = 1) or order = 2 sets the order of the legends, so the Method A legend comes before the Method B one.
One drawback is that there are two Method B legends, one for color and one for shape, because the two aesthetics have different sets of levels. A workaround might be possible by mapping interaction(section, Categories) instead.
library(tidyverse)

# Example data: 5 categories x 4 ranges (same values as the original dput)
df <- data.frame(
  Categories = rep(c("Aaas", "Bbbs", "Cccs", "Diffs", "rare"), each = 4),
  section = rep(c("red", "blue", "green", "yellow"), times = 5),
  range = rep(c("top1K", "top2K", "top3K", "top4K"), times = 5),
  values = c(20L, 32L, 42L, 32L, 21L, 12L, 14L, 14L, 15L, 13L, 43L, 21L, 2L, 10L, 13L, 11L, 3L, 5L, 7L, 9L),
  stringsAsFactors = FALSE
)

# Rows drawn as individual points (Method B); everything else goes into the boxplots
is_method_b <- df$Categories %in% c("Diffs", "rare")

ggplot(df, aes(x = range, y = values)) +
  # constant (dummy) fill so the boxplots get their own legend entry
  geom_boxplot(aes(fill = "Type A"), data = df[!is_method_b, ]) +
  geom_jitter(data = df[!is_method_b, ]) +
  # a single geom_point call; color and shape are mapped from the data
  geom_point(aes(color = Categories, shape = section), data = df[is_method_b, ]) +
  scale_fill_manual(values = "white", guide = guide_legend(order = 1)) +
  scale_color_discrete(guide = guide_legend(order = 2), labels = c("Type B", "Type C")) +
  labs(fill = "Method A", color = "Method B", shape = "Method B")
Created on 2018-04-22 by the reprex package (v0.2.0).

Related

How to print whether each string in a list is in a pandas dataframe?

Given a list of search terms and a pandas dataframe, what is the most pythonic way to print whether the search term is present in the target dataframe?
search_terms = ["red", "blue", "green", "orange"]
input_df looks like...
    color  count
0     red     15
1    blue     39
2  yellow     40
3   green     21
I want to see...
red = true
blue = true
green = true
orange = false
I know how to filter the input_df to include only search_terms. This doesn't alert me to the fact that "orange" was not located in the input_df. The search_terms could contain hundreds or thousands of strings.
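import pandas as pd

color = ['red', 'blue', 'yellow', 'green']
count = [15, 39, 40, 21]
input_dict = dict(color=color, count=count)
input_df = pd.DataFrame(data=input_dict)
# keeps only the matching rows, so a missing term such as "orange" is never reported
found_df = input_df[input_df['color'].isin(search_terms)]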
You can try:
out = dict(zip(search_terms, pd.Series(search_terms).isin(input_df['color'])))
Or, with NumPy (import numpy as np):
out = dict(zip(search_terms, np.isin(search_terms, input_df['color'])))
Output:
{'red': True, 'blue': True, 'green': True, 'orange': False}
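When search_terms holds thousands of strings, the same mapping can be built with a plain Python set, which makes each membership test cheap. The sketch below is a minimal illustration; presence_map is a hypothetical helper name, and the colors list stands in for input_df['color'] (with pandas you would pass that column directly).

```python
def presence_map(search_terms, column_values):
    """Return {term: True/False} saying whether each term occurs in column_values."""
    present = set(column_values)  # built once; each lookup is then O(1) on average
    return {term: term in present for term in search_terms}


search_terms = ["red", "blue", "green", "orange"]
colors = ["red", "blue", "yellow", "green"]  # stands in for input_df['color']

result = presence_map(search_terms, colors)
for term, found in result.items():
    # prints in the "red = true" format from the question
    print(f"{term} = {str(found).lower()}")
```

The dictionary keeps the order of search_terms, so the printed lines come out in the same order as the search list.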

How to scale/drop NaN/0's from graph visually

I would like the graph to start at the first non-zero, non-NaN value and, if possible, to connect only the non-zero, non-NaN points.
def CreateAvgGraph(input_data):
    KK = test3.loc[[input_data], :]
    K = KK.T
    K = K.fillna(0)
    K = K.reset_index()
    list1a = K['index'].tolist()
    list2a = K[input_data].tolist()
    return dcc.Graph(
        id='example-graph2',
        figure={
            'data': [
                {'x': list1a, 'y': list2a, 'type': 'line', 'name': input_data},
            ],
            'layout': {
                'title': str(input_data) + ' Average Price'
            }
        }
    )
Removing the fillna call doesn't really help, as the visible scale is still too wide.
def CreateAvgGraph(input_data):
    KK = test3.loc[[input_data], :]
    K = KK.T
    K = K.reset_index()
    list1a = K['index'].tolist()
    list2a = K[input_data].tolist()
    return dcc.Graph(
        id='example-graph2',
        figure={
            'data': [
                {'x': list1a, 'y': list2a, 'type': 'line', 'name': input_data},
            ],
            'layout': {
                'title': str(input_data) + ' Average Price'
            }
        }
    )
I have managed an ugly fix, but surely there is a better way?
def CreateAvgGraph(input_data):
    KK = test3.loc[[input_data], :]
    K = KK.T
    K = K.fillna(0)
    K = K.reset_index()
    list1a = K['index'].tolist()
    list2a = K[input_data].tolist()
    # keep only the points with a positive value
    list2aa = []
    list1aa = []
    for i in range(0, len(list1a)):
        if list2a[i] > 0:
            list1aa.append(list1a[i])
            list2aa.append(list2a[i])
        else:
            continue
    return dcc.Graph(
        id='example-graph2',
        figure={
            'data': [
                {'x': list1aa, 'y': list2aa, 'type': 'line', 'name': input_data},
            ],
            'layout': {
                'title': str(input_data) + ' Average Price'
            }
        }
    )
If you simply want to plot all non-NaN values, drop the NaN values rather than filling them with zeros, i.e. replace K = K.fillna(0) with K = K.dropna().
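The asker's index loop can also be written as a single filtering pass over the paired x/y values. This is a minimal sketch in plain Python, assuming list1a and list2a were built without fillna (so missing values arrive as float('nan')); drop_missing is a hypothetical helper name. Unlike dropna(), it can optionally drop zeros as well, which the question also asks for.

```python
import math


def drop_missing(xs, ys, drop_zeros=True):
    """Return (x, y) lists keeping only points whose y is not None/NaN (and, optionally, not 0)."""
    kept = [
        (x, y)
        for x, y in zip(xs, ys)
        if y is not None and not math.isnan(y) and not (drop_zeros and y == 0)
    ]
    if not kept:
        return [], []
    xs_kept, ys_kept = zip(*kept)  # unzip the surviving pairs
    return list(xs_kept), list(ys_kept)


dates = ["2018-01", "2018-02", "2018-03", "2018-04"]
prices = [float("nan"), 0.0, 12.5, 13.1]
x, y = drop_missing(dates, prices)
print(x, y)  # ['2018-03', '2018-04'] [12.5, 13.1]
```

Inside CreateAvgGraph the result would be used as list1a, list2a = drop_missing(list1a, list2a) before building the figure; the remaining points are then joined directly, so the line starts at the first valid value.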

Networkx bug? color is misplaced

Hi everyone, I'm trying to plot a network graph using the networkx module, but I'm getting results I wasn't expecting, and I'm starting to wonder whether it's an issue with the module!
I have this code inside a class:
def plotGraph(self):
    conn = []
    nodeLabel = {}
    for node_idx in self.operatorNodes:
        print("i = ", node_idx)
        print(self.node[node_idx].childs)
        for child in self.node[node_idx].childs:
            conn.append((child.idx, node_idx))
    for i in range(self.nn):
        nodeLabel[i] = str(i) + ": " + self.node[i].opString
    node_color = ['blue'] * self.nn
    #for i in range(self.nOutputs):
    #    node_color[i] = 'red'
    node_color[0] = 'red'
    print('Graph Conn = ', conn)
    print('Graph Color = ', node_color)
    # you may name your edge labels
    labels = map(chr, range(65, 65 + len(conn)))
    print('nodeLabel = ', nodeLabel)
    draw_graph(conn, nodeLabel, node_color=node_color, labels=labels)
From the prints I can see that the following is passed to draw_graph (the draw_graph code is based on https://www.udacity.com/wiki/creating-network-graphs-with-python):
Graph Conn = [(2, 0), (3, 0), (4, 1), (5, 1), (6, 2), (7, 2), (8, 5), (9, 5)]
Graph Color = ['red', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue']
nodeLabel = {0: '0: mul', 1: '1: mul', 2: '2: mul', 3: '3: cte', 4: '4: cte', 5: '5: sum', 6: '6: cte', 7: '7: cte', 8: '8: cte', 9: '9: cte'}
Yet the plot is the following
The draw_graph code is:
import networkx as nx

def draw_graph(graph, nodeLabel, node_color, labels=None, graph_layout='shell',
               node_size=1600, node_alpha=0.3,
               node_text_size=12,
               edge_color='blue', edge_alpha=0.3, edge_tickness=1,
               edge_text_pos=0.3,
               text_font='sans-serif'):
    # create networkx graph
    G = nx.DiGraph()
    # add edges
    for edge in graph:
        G.add_edge(edge[0], edge[1])
    # these are different layouts for the network you may try
    # shell seems to work best
    if graph_layout == 'spring':
        graph_pos = nx.spring_layout(G)
    elif graph_layout == 'spectral':
        graph_pos = nx.spectral_layout(G)
    elif graph_layout == 'random':
        graph_pos = nx.random_layout(G)
    else:
        graph_pos = nx.shell_layout(G)
    # draw graph
    nx.draw_networkx_edges(G, graph_pos, width=edge_tickness, alpha=edge_alpha, edge_color=edge_color)
    nx.draw_networkx_labels(G, graph_pos, labels=nodeLabel, font_size=node_text_size, font_family=text_font)
    if labels is None:
        labels = range(len(graph))
    edge_labels = dict(zip(graph, labels))
    nx.draw_networkx_edge_labels(G, graph_pos, edge_labels=edge_labels, label_pos=edge_text_pos)
    nx.draw(G, graph_pos, node_size=node_size, alpha=node_alpha, node_color=node_color)
As can be seen, Graph Color has 'red' at position 0 and all the remaining entries should be blue, yet the plot puts the red on the third node! Apparently I also have no way to reach node 1; the nodes seem to be misplaced. The node colors are applied in the order [2, 0, 3, 4, 5, ...].
When you use nx.draw and pass it an (optional) list of colors, it will assign those colors to the nodes in the same order as the (optional) nodelist. But you didn't define nodelist. So it will default to whatever order comes out of G.nodes().
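networkx stores nodes in a dictionary, so G.nodes() returns them in insertion order (guaranteed on Python 3.7+; older versions gave no particular order). In draw_graph the nodes are added edge by edge, so the order is [2, 0, 3, 4, 1, 5, 6, 7, 8, 9] rather than 0 to 9, which is why the first color, 'red', ends up on node 2.
Pass nodelist to nx.draw in the same order as node_color, e.g. nx.draw(G, graph_pos, nodelist=list(range(len(node_color))), node_color=node_color, ...), since here the colors are indexed by node number.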
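The mismatch can be reproduced without drawing anything. The sketch below simulates, in plain Python, the order in which G.add_edge(u, v) first adds each node, under the assumption that networkx keeps nodes in insertion order as described above; insertion_order and colors_for are hypothetical helper names, not networkx functions.

```python
def insertion_order(edges):
    """Order in which G.add_edge(u, v) would first add each node (u before v)."""
    seen = {}  # dicts keep insertion order (Python 3.7+), like networkx's node dict
    for u, v in edges:
        seen.setdefault(u, None)
        seen.setdefault(v, None)
    return list(seen)


def colors_for(nodelist, color_by_node, default="blue"):
    """Build a node_color list aligned position by position with nodelist."""
    return [color_by_node.get(node, default) for node in nodelist]


edges = [(2, 0), (3, 0), (4, 1), (5, 1), (6, 2), (7, 2), (8, 5), (9, 5)]
node_color = ["red"] + ["blue"] * 9  # indexed by node number, as in plotGraph

default_order = insertion_order(edges)
# Without nodelist, node_color[i] is paired with default_order[i]
misassigned = dict(zip(default_order, node_color))
print(default_order)  # [2, 0, 3, 4, 1, 5, 6, 7, 8, 9]
print([n for n, c in misassigned.items() if c == "red"])  # [2]

# An explicit nodelist, with colors built to match it, fixes the pairing
nodelist = sorted(default_order)
aligned = dict(zip(nodelist, colors_for(nodelist, {0: "red"})))
print([n for n, c in aligned.items() if c == "red"])  # [0]
```

In draw_graph this corresponds to computing nodelist = sorted(G.nodes()) and calling nx.draw(G, graph_pos, nodelist=nodelist, node_color=colors_for(nodelist, {0: 'red'}), ...). Building the colors from a node-to-color mapping keeps them correct even if the node order changes.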

How do I add an axis for a second trace in a Plotly subplot?

I have three traces: one in one subplot and two in another. I would like a distinct y-axis for each of the two traces in the subplot that has two traces.
For example, I have
fig = plotly.tools.make_subplots(rows=2, cols=1, shared_xaxes=True)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 1)
fig.append_trace(trace3, 2, 1)
fig['layout'].update(height=200, width=400)
which produces
And when I have no subplots, I can get a second axis for the second trace with
layout = go.Layout(
    yaxis=dict(
        title='y for trace1'
    ),
    yaxis2=dict(
        title='y for trace2',
        titlefont=dict(
            color='rgb(148, 103, 189)'
        ),
        tickfont=dict(
            color='rgb(148, 103, 189)'
        ),
        overlaying='y',
        side='right'
    )
)
fig = go.Figure(data=data, layout=layout)
which produces
But I can't figure out how to get the first subplot in the first example to look like the plot in the second example: with a distinct axis for the second trace there.
This is a bit of a workaround, but it seems to work:
import plotly as py
import plotly.graph_objs as go
from plotly import tools
import numpy as np

left_trace = go.Scatter(x = np.random.randn(1000), y = np.random.randn(1000), yaxis = "y1", mode = "markers")
right_traces = []
right_traces.append(go.Scatter(x = np.random.randn(1000), y = np.random.randn(1000), yaxis = "y2", mode = "markers"))
right_traces.append(go.Scatter(x = np.random.randn(1000) * 10, y = np.random.randn(1000) * 10, yaxis = "y3", mode = "markers"))

fig = tools.make_subplots(rows = 1, cols = 2)
fig.append_trace(left_trace, 1, 1)
for trace in right_traces:
    yaxis = trace["yaxis"]  # Store the yaxis
    fig.append_trace(trace, 1, 2)
    fig["data"][-1].update(yaxis = yaxis)  # Update the appended trace with the yaxis

fig["layout"]["yaxis1"].update(range = [0, 3], anchor = "x1", side = "left")
fig["layout"]["yaxis2"].update(range = [0, 3], anchor = "x2", side = "left")
fig["layout"]["yaxis3"].update(range = [0, 30], anchor = "x2", side = "right", overlaying = "y2")

py.offline.plot(fig)
Produces this, where trace0 is in the first subplot plotted on yaxis1, and trace1 and trace2 are in the second subplot, plotted on yaxis2 (0-3) and yaxis3 (0-30) respectively:
When traces are appended to subplots, the xaxis and yaxis seem to be overwritten, or that's my understanding of this discussion anyway.

Groovy sorting string asc

How do I sort the string names in an array in ascending order?
I tried the sort method, but it fails to sort by name.
def words = ["orange", "blue", "apple", "violet", "green"]
I need to get this result:
["apple", "blue", "green", "orange", "violet" ]
Thanks in advance.
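sort() with no arguments sorts the strings in natural (alphabetical) order:
def words = ["orange", "blue", "apple", "violet", "green"]
println words.sort()   // [apple, blue, green, orange, violet]
Note that sort() sorts the list in place; use words.sort(false) if you need to keep the original list unchanged.
To compare only part of each string, pass a comparator closure, for example one that looks only at the first character:
words.sort { a, b -> a[0] <=> b[0] }
You can also change the indexes compared, depending on the requirement.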
