Convert pandas MultiIndex columns to uppercase - python-3.x

I would like to replace pandas MultiIndex column labels with uppercase names. With a normal (single-level) index, I would do something like
df.columns = [c.upper() for c in df.columns]
When this is done on a DataFrame with a pd.MultiIndex, I get the following error:
AttributeError: 'tuple' object has no attribute 'upper'
How would I apply the same logic to a pandas multi index? Example code is below.
import pandas as pd
import numpy as np
arrays = [
["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
["one", "two", "one", "two", "one", "two", "one", "two"],
]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
df = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index)
arrays_upper = [
["BAR", "BAR", "BAZ", "BAZ", "FOO", "FOO", "QUX", "QUX"],
["ONE", "TWO", "ONE", "TWO", "ONE", "TWO", "ONE", "TWO"],
]
tuples_upper = list(zip(*arrays_upper))
index_upper = pd.MultiIndex.from_tuples(tuples_upper, names=['first', 'second'])
df_upper = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index_upper)
print(f'Have: {df.columns}')
print(f'Want: {df_upper.columns}')

You can convert the MultiIndex to a DataFrame, uppercase its values, and then convert it back to a MultiIndex (on pandas 2.1 or later, DataFrame.map replaces the deprecated applymap):
df.columns = pd.MultiIndex.from_frame(df.columns.to_frame().applymap(str.upper))
print(df)
first        BAR                 BAZ                 FOO                 QUX
second       ONE       TWO       ONE       TWO       ONE       TWO       ONE       TWO
A      -0.374874  0.049597 -1.930723 -0.279234  0.235430  0.351351 -0.263074 -0.068096
B       0.040872  0.969948 -0.048848 -0.610735 -0.949685  0.336952 -0.012458 -0.258237
C       0.932494 -1.655863  0.900461  0.403524 -0.123720  0.207627 -0.372031 -0.049706
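Or follow your loop idea (passing names keeps the level names, which from_tuples would otherwise drop):
df.columns = pd.MultiIndex.from_tuples([tuple(map(str.upper, c)) for c in df.columns], names=df.columns.names)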

Use set_levels:
df.columns = df.columns.set_levels([level.str.upper() for level in df.columns.levels])
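As a quick check, the loop-based and set_levels approaches can be compared on a small frame. This is only a sketch: the 2x4 frame and the variable names (`upper_tuples`, `via_tuples`, `via_levels`, `unnamed`) are made up for illustration. It also shows that from_tuples keeps the level names only when they are passed explicitly.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with two column levels (hypothetical data).
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.zeros((2, 4)), index=["A", "B"], columns=columns)

# Iterating over a MultiIndex yields tuples, so uppercase each element.
upper_tuples = [tuple(map(str.upper, c)) for c in df.columns]

# Loop approach, with and without the level names passed through.
via_tuples = pd.MultiIndex.from_tuples(upper_tuples, names=df.columns.names)
unnamed = pd.MultiIndex.from_tuples(upper_tuples)

# set_levels approach: uppercase each level, keep codes and names as they are.
via_levels = df.columns.set_levels(
    [level.str.upper() for level in df.columns.levels]
)

print(via_tuples.equals(via_levels))
print(list(unnamed.names))
```

Either result can be assigned back with `df.columns = via_levels`.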

Related

Print table in plotly dash with multiple lines in one cell

Currently I have a pandas DataFrame:
df = pd.DataFrame({
    "date": ["20210613", "20210614", "20210615"],
    "user": ["A\nB", "C", "D"],
    "machine": [1, 0, 3]
})
I wonder if there is any way to print this table in my Dash app so that a value such as "A\nB" is shown on multiple lines within one cell.
Either printing it as plain text into a dcc.Textarea or using a dash_table.DataTable is fine.
So far I have not been able to figure out a good way to achieve this. Many thanks.
You can do it in a DataTable by using the style_cell property on the DataTable like this:
import dash
import dash_table
import pandas as pd
df = pd.DataFrame(
    {
        "date": ["20210613", "20210614", "20210615"],
        "user": ["A\nB", "C", "D"],
        "machine": [1, 0, 3],
    }
)
app = dash.Dash(__name__)
app.layout = dash_table.DataTable(
    id="table",
    columns=[{"name": i, "id": i} for i in df.columns],
    data=df.to_dict("records"),
    style_cell={"whiteSpace": "pre-line"},
)
if __name__ == "__main__":
    app.run_server(debug=True)
Setting the white-space CSS property to pre-line on the table cells makes the DataTable break a cell's text wherever it encounters the \n character.
I found this answer on this thread

Relationship Parsing

I have a DataFrame with two columns, col_1 and col_2. The entries in col_1 are related to the entries in col_2. The relationship is transitive: if A belongs to B, and B belongs to C and D, then A belongs to B, C and D.
import pandas as pd
col_1 = ["A", "A", "B", "B", "I", "J", "C", "A"]
col_2 = ["B", "H", "C", "D", "J", "L", "E", "Z"]
df = pd.DataFrame({"col_1":col_1, "col_2":col_2})
df.sort_values("col_1", inplace=True)
df
I want to extract the relationships, keeping the first-occurring key of each group in a "my_key" column and all other keys of that group in a "Connected" column.
How can I fetch all keys that are connected to each other, keeping the following conditions in mind?
The keys that are in col_1 should not be in the list of col_2
and
Only the related keys should be in front of my_key
Use networkx with connected_components to build a dictionary that maps each node to the id of its component:
import networkx as nx
# Create the graph from the dataframe
g = nx.Graph()
g.add_edges_from(df[['col_1','col_2']].itertuples(index=False))
connected_components = nx.connected_components(g)
# Find the component id of the nodes
node2id = {}
for cid, component in enumerate(connected_components):
    for node in component:
        node2id[node] = cid + 1
Then keep the first col_1 value of each group and map all other values of the group into lists:
g1 = df['col_1'].map(node2id)
df1 = df.loc[~g1.duplicated(), ['col_1']]
s = pd.Series(list(node2id.keys()), index=list(node2id.values()))
s = s[~s.isin(df1['col_1'])]
d = s.groupby(level=0).agg(list)
df1['Connected'] = g1.map(d)
print(df1)
  col_1           Connected
0     A  [C, B, E, H, D, Z]
4     I              [J, L]
For plotting use:
pos = nx.spring_layout(g, scale=20)
nx.draw(g, pos, node_color='lightblue', node_size=500, with_labels=True)
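To illustrate what the connected-components step is doing, here is a minimal standard-library sketch that groups the same keys with a breadth-first search. It is not how networkx implements it, and `connected_groups` and `adjacency` are hypothetical names introduced only for this example.

```python
from collections import defaultdict, deque

col_1 = ["A", "A", "B", "B", "I", "J", "C", "A"]
col_2 = ["B", "H", "C", "D", "J", "L", "E", "Z"]

# Undirected adjacency: every (col_1, col_2) pair is an edge in both directions.
adjacency = defaultdict(set)
for a, b in zip(col_1, col_2):
    adjacency[a].add(b)
    adjacency[b].add(a)


def connected_groups(adjacency, start_order):
    """Map the first-seen key of each component to its other members."""
    seen = set()
    groups = {}
    for start in start_order:
        if start in seen:
            continue  # already part of an earlier component
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            # sorted() only makes the visiting order deterministic
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    members.append(neighbour)
                    queue.append(neighbour)
        groups[start] = members
    return groups


groups = connected_groups(adjacency, col_1)
for key, connected in groups.items():
    print(key, connected)
```

The order of keys inside each list depends on the traversal, so it can differ from the networkx output even though the groups are the same.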

Printing list containing a string

I am trying to store a variable containing some names. I want to put that variable in a list and print it, but I am unable to print the values stored in the variable.
name='vsb','siva','anand','soubhik' # variable containing some names
lis=['name'] # storing the variable in a list
for x in lis:
    print(x) # printing the list using a loop
Maybe use a dictionary? Try this:
variable_1 = "aa"
variable_2 = "bb"
lis = {}
lis['name1'] = variable_1
lis['name2'] = variable_2
for i in lis:
    print(i)
    print(lis[i])
Your name variable is actually a tuple.
Example of tuple declaration:
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5 )
tup3 = "a", "b", "c", "d"
Example of list declaration:
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5 ]
list3 = ["a", "b", "c", "d"]
For a better understanding you should read The Python Standard Library or do a tutorial.
For your problem maybe the dictionary is the solution:
# A tuple is a sequence of immutable Python objects
name='vsb','siva','anand','soubhik'
print('Tuple: ' + str(name)) # ('vsb', 'siva', 'anand', 'soubhik')
# This is a list containing one element: 'name'
lis=['name']
print('List: ' + str(lis)) # ['name']
# Dictionary with key 'name' and value ('vsb','siva','anand','soubhik')
dictionary={'name':name}
print('Dictionary: ' + str(dictionary))
print('Dictionary elements:')
print(dictionary['name'])
print('Tuple elements:')
for x in name:
    print(x)
print('List elements:')
for x in lis:
    print(x)
Output
Tuple: ('vsb', 'siva', 'anand', 'soubhik')
List: ['name']
Dictionary: {'name': ('vsb', 'siva', 'anand', 'soubhik')}
Dictionary elements:
('vsb', 'siva', 'anand', 'soubhik')
Tuple elements:
vsb
siva
anand
soubhik
List elements:
name
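If a list is still what is wanted, the key point from the output above is that `['name']` holds the string `'name'`, not the variable. A minimal sketch, where `nested` is a name made up for this example:

```python
name = 'vsb', 'siva', 'anand', 'soubhik'  # a tuple of four strings

# Copy the tuple's items into a list (note: no quotes around name).
lis = list(name)

# By contrast, [name] is a list with a single element: the whole tuple.
nested = [name]

for x in lis:
    print(x)

print(len(nested))
```

The loop prints each name on its own line, while `nested` has length 1 because the tuple is stored as one item.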

How can I count different duplicates

I have shared my code below.
I want to delete duplicates and count them, and I also want a column holding the counts.
To be clear, the code should count the values in column A, delete the duplicates, and finally add the counts as a new column. Is this possible somehow?
# Input
df = pd.DataFrame({"A":["foo", "foo", "foo", "bar"]})
# Desired result
df = pd.DataFrame({"A":["foo","bar"], "B":[3,1]})
Without using pandas at all, you could achieve this with Counter from the standard collections module:
>>> from collections import Counter
>>> Counter(["foo", "foo", "foo", "bar"])
Counter({'foo': 3, 'bar': 1})
>>> counter = Counter(["foo", "foo", "foo", "bar"])
>>> counter.keys()
dict_keys(['foo', 'bar'])
>>> counter.values()
dict_values([3, 1])
So, for your case:
counter = Counter(["foo", "foo", "foo", "bar"])
df = pd.DataFrame({"A": list(counter.keys()), "B": list(counter.values())})
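If the data is already in a DataFrame, a similar result can be sketched with pandas alone. This is one possible approach rather than the only one; the name `counts` is made up for the example, and the column name "B" is taken from the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"]})

# Group by the values in A, count the rows in each group, and turn the
# counts into a column named B. sort=False keeps the order of first appearance.
counts = df.groupby("A", sort=False).size().reset_index(name="B")

print(counts)
```

Here `counts` has one row per distinct value of A, with the number of occurrences in B.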

Join array of strings in D

In Python I can do this:
In [1]: x = ["a", "b", "c"]
In [2]: "--".join(x)
Out[2]: 'a--b--c'
Is there an equivalent trick in D?
Yes, use std.array.join:
import std.array, std.stdio;

void main()
{
    auto x = ["a", "b", "c"];
    writeln(x.join("--"));
}
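Note that the order is reversed compared to Python: in D the array comes first and the separator is passed as the argument, whereas in Python join is called on the separator.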
