Join array of strings in D

In Python I can do this:
In [1]: x = ["a", "b", "c"]
In [2]: "--".join(x)
Out[2]: 'a--b--c'
Is there an equivalent trick in D?

Yes, use std.array.join:
import std.array, std.stdio;
void main()
{
    auto x = ["a", "b", "c"];
    writeln(x.join("--"));
}
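Note that the order is reversed compared to Python: in D, join is called on the array and takes the separator as its argument, whereas in Python it is called on the separator and takes the list.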

Convert pandas MultiIndex columns to uppercase

I would like to replace pandas multi index columns with uppercase names. With a normal (1D/level) index, I would do something like
df.columns = [c.upper() for c in df.columns]
When this is done on a DataFrame with a pd.MultiIndex, I get the following error:
AttributeError: 'tuple' object has no attribute 'upper'
How would I apply the same logic to a pandas multi index? Example code is below.
import pandas as pd
import numpy as np
arrays = [
["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
["one", "two", "one", "two", "one", "two", "one", "two"],
]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
df = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index)
arrays_upper = [
["BAR", "BAR", "BAZ", "BAZ", "FOO", "FOO", "QUX", "QUX"],
["ONE", "TWO", "ONE", "TWO", "ONE", "TWO", "ONE", "TWO"],
]
tuples_upper = list(zip(*arrays_upper))
index_upper = pd.MultiIndex.from_tuples(tuples_upper, names=['first', 'second'])
df_upper = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index_upper)
print(f'Have: {df.columns}')
print(f'Want: {df_upper.columns}')
You can convert the MultiIndex to a DataFrame, uppercase its values, and then convert it back to a MultiIndex (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):
df.columns = pd.MultiIndex.from_frame(df.columns.to_frame().applymap(str.upper))
print(df)
first        BAR                 BAZ                 FOO                 QUX
second       ONE       TWO       ONE       TWO       ONE       TWO       ONE       TWO
A      -0.374874  0.049597 -1.930723 -0.279234  0.235430  0.351351 -0.263074 -0.068096
B       0.040872  0.969948 -0.048848 -0.610735 -0.949685  0.336952 -0.012458 -0.258237
C       0.932494 -1.655863  0.900461  0.403524 -0.123720  0.207627 -0.372031 -0.049706
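Or follow your loop idea (passing names keeps the level names, which from_tuples would otherwise drop):
df.columns = pd.MultiIndex.from_tuples([tuple(map(str.upper, c)) for c in df.columns], names=df.columns.names)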
Use set_levels:
df.columns = df.columns.set_levels([level.str.upper() for level in df.columns.levels])
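As a further option, DataFrame.rename accepts a function and, on MultiIndex columns, applies it to the labels of every level, or to a single level when the level parameter is given. The following is a minimal sketch on a small illustrative frame (the frame and the names upper_all and upper_first are assumptions for the example, not taken from the question):

```python
import numpy as np
import pandas as pd

# Small illustrative frame with two-level columns (not the question's data).
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.zeros((2, 4)), index=["A", "B"], columns=columns)

# Uppercase the labels of every column level; level names are preserved.
upper_all = df.rename(columns=str.upper)

# Uppercase only the level named "first".
upper_first = df.rename(columns=str.upper, level="first")

print(list(upper_all.columns))
print(list(upper_first.columns))
```

rename returns a new DataFrame, so the original column labels stay untouched unless the result is assigned back.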

How to force three lists to have the same length in Python 3.7?

My function takes three lists as arguments. Then, it uses PrettyTable to make a table, with each list being a separate column.
My problem is that those lists are not always of equal length, and PrettyTable requires them to be equal.
I wonder how to append empty entries to the two shorter lists so that all three have the same length.
Alternatively, is there a table-making tool that doesn't require all columns to be of equal length?
You could use these functions:
def padlist(lst, size, defvalue=None):
    if len(lst) < size:
        lst.extend([defvalue] * (size - len(lst)))

def padlists(lsts, defvalue=None):
    size = max(len(lst) for lst in lsts)
    for lst in lsts:
        padlist(lst, size, defvalue)

# Demo
lsts = [
    ["a", "b", "c"],
    ["a", "b", "c", "d", "e"],
    ["a", "b"],
]
padlists(lsts, "")  # provide the value to pad with
print(lsts)  # shorter lists are padded with blanks
You could use zip_longest.
from itertools import zip_longest
def pad(*lists, padding=None):
    padded = [[] for _ in lists]
    for lst in zip_longest(*lists, fillvalue=padding):
        for i, elem in enumerate(lst):
            padded[i].append(elem)
    return padded

test1 = [1, 2]
test2 = [1, 2, 3, 4]
test3 = [1, 2, 5, 8]
test1, test2, test3 = pad(test1, test2, test3)
print(test1, test2, test3)
This works with any number of lists. zip_longest combines a variable number of iterables into a single iterable of tuples, filling in the padding value once the shorter iterables run out, so the result is as long as the longest input.
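The same idea can be written more compactly by transposing twice: zip_longest produces padded rows, and zip(*rows) turns them back into columns. This is only a sketch; the function name pad_columns and the sample lists are made up for illustration.

```python
from itertools import zip_longest

def pad_columns(*lists, padding=""):
    """Return new lists, each padded with `padding` to the longest length."""
    # zip_longest yields rows, filling missing positions with `padding`;
    # zip(*rows) transposes those rows back into columns.
    rows = zip_longest(*lists, fillvalue=padding)
    columns = [list(column) for column in zip(*rows)]
    # If every input is empty there are no rows, so return empty columns.
    return columns or [[] for _ in lists]

names = ["ann", "bob", "cy"]
ages = [31, 25]
cities = ["Oslo"]
padded = pad_columns(names, ages, cities)
print(padded)  # [['ann', 'bob', 'cy'], [31, 25, ''], ['Oslo', '', '']]
```

Unlike the in-place padlist approach, this version leaves the input lists unchanged and returns new ones.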

Relationship Parsing

I have a df with two columns, col_1 and col_2. The entries in col_1 are related to the entries in col_2. It is a relationship where A belongs to B, and B belongs to C and D, therefore A belongs to B, C and D.
import pandas as pd
col_1 = ["A", "A", "B", "B", "I", "J", "C", "A"]
col_2 = ["B", "H", "C", "D", "J", "L", "E", "Z"]
df = pd.DataFrame({"col_1":col_1, "col_2":col_2})
df.sort_values("col_1", inplace=True)
df
I want to extract the relationships by keeping the first occurring key as "my_key" and putting all other connected keys in a "Connected" column.
How can I fetch all keys that are connected to each other, keeping the following conditions in mind?
The keys that are in col_1 should not be in the list of col_2
and
Only the related keys should be in front of my_key
Use networkx with connected_components to build a dictionary that maps each node to its component id:
import networkx as nx
# Create the graph from the dataframe
g = nx.Graph()
g.add_edges_from(df[['col_1','col_2']].itertuples(index=False))
connected_components = nx.connected_components(g)
# Find the component id of the nodes
node2id = {}
for cid, component in enumerate(connected_components):
    for node in component:
        node2id[node] = cid + 1
Then take the first row of each component to get col_1, and map the remaining values of each component to lists:
g1 = df['col_1'].map(node2id)
df1 = df.loc[~g1.duplicated(), ['col_1']]
s = pd.Series(list(node2id.keys()), index=list(node2id.values()))
s = s[~s.isin(df1['col_1'])]
d = s.groupby(level=0).agg(list)
df1['Connected'] = g1.map(d)
print(df1)
  col_1           Connected
0     A  [C, B, E, H, D, Z]
4     I              [J, L]
For plotting use:
pos = nx.spring_layout(g, scale=20)
nx.draw(g, pos, node_color='lightblue', node_size=500, with_labels=True)
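For readers who want to see what connected_components is doing, the grouping can also be sketched with the standard library alone. This swaps in a plain breadth-first search for networkx and returns a dict instead of a DataFrame; the helper name connected_groups is hypothetical, and keys are taken in their original (unsorted) order.

```python
from collections import defaultdict, deque

col_1 = ["A", "A", "B", "B", "I", "J", "C", "A"]
col_2 = ["B", "H", "C", "D", "J", "L", "E", "Z"]

# Build an undirected adjacency map from the (col_1, col_2) pairs.
neighbors = defaultdict(set)
for a, b in zip(col_1, col_2):
    neighbors[a].add(b)
    neighbors[b].add(a)

def connected_groups(starts):
    """Map the first-seen key of each component to its other members."""
    seen = set()
    groups = {}
    for start in starts:
        if start in seen:
            continue  # already part of an earlier component
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            for nxt in sorted(neighbors[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    members.append(nxt)
                    queue.append(nxt)
        groups[start] = sorted(members)
    return groups

groups = connected_groups(col_1)
print(groups)
# {'A': ['B', 'C', 'D', 'E', 'H', 'Z'], 'I': ['J', 'L']}
```

The members are sorted here only to make the output deterministic; the networkx version returns them in set order.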

When utilizing a for loop what does each argument specify exactly?

I'm new to learning Python and have a clarifying question regarding for loops.
For instance:
dictionary_a = {"A": "Apple", "B": "Ball", "C": "Cat"}
dictionary_b = {"A": "Ant", "B": "Basket", "C": "Carrot"}
temp = ""
for k_a, v_a in dictionary_a.items():
    temp = dictionary_b[k_a]
    dictionary_b[k_a] = v_a
    dictionary_a[k_a] = temp
How exactly is k_a run through the interpreter? I understand v_a in dictionary_a.items() as simply iterating through the sequence in whatever collection.
But when for loops have the syntax for x, y in z I don't quite understand what values x takes with each iteration.
Hope I'm making some sense. Appreciate any help.
When you iterate over dict.items(), each iteration yields a 2-tuple of (key, value). When you provide two variables in the for loop, the elements of that tuple are assigned to them in order, so k_a receives the key and v_a receives the value.
Here is another example to help you understand the mechanics:
coordinates = [(1, 2, 3), (4, 5, 6)]
for x, y, z in coordinates:
    print(x)
Edit: you can do even more complicated unpacking. For example, suppose you only want to collect the first and last items of a long list. You can proceed as follows:
long_list = 'This is a very long list to process'.split()
first_item, *_, last_item = long_list
In Python you can unpack an iterable into multiple variables in a single assignment.
Let's use this example:
>>> a, b = [1, 2]
>>> a
1
>>> b
2
The above behavior is what is happening when you loop over a dictionary with the dict.items() method.
Here is an example of what is happening:
>>> a = {"abc":123, "def":456}
>>> a.items()
dict_items([('abc', 123), ('def', 456)])
>>> for i in a.items():
...     i
...
('abc', 123)
('def', 456)
>>>
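To tie this back to the question's loop, here is a small sketch comparing the two equivalent forms: unpacking in the for statement itself, and taking the whole tuple and unpacking it by hand. The list names unpacked and manual are only there for the comparison.

```python
dictionary_a = {"A": "Apple", "B": "Ball", "C": "Cat"}

# Form 1: unpack each (key, value) tuple directly in the for statement.
unpacked = []
for k_a, v_a in dictionary_a.items():
    unpacked.append((k_a, v_a))

# Form 2: take the whole tuple, then unpack it by hand.
manual = []
for pair in dictionary_a.items():
    k_a, v_a = pair
    manual.append((k_a, v_a))

print(unpacked == manual)  # True
print(unpacked[0])         # ('A', 'Apple')
```

In both forms k_a takes the keys "A", "B", "C" in turn, and v_a takes the matching values.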

Translating for loop into list comprehension

I can get this loop to work properly:
for x in range(0,len(l)):
    for k in d:
        if l[x] in d[k]:
            l[x] = k
This looks through a list, checks whether each value is in any of the dictionary's lists, and if so sets it equal to the dictionary key it was found under (the dictionary's values are lists).
However, I want to convert to a list comprehension or other single line statement for use in a pandas dataframe - to populate a field based on whether or not another field's value is in the labeled dictionary keys and assign it the dictionary key value.
Here is my best attempt, but it does not work:
l = [ k for x in range(0,len(l)) if l[x] in d[k] for k in d ]
Thanks
Assuming I understand what you're after (example data that can be copied and pasted is always appreciated), I'd do something like this:
>>> l = ["a", "b", "c", "d"]
>>> d = {1: ["a"], 3: ["d", "c"]}
>>> l2 = [next((k for k,v in d.items() if lx in v), lx) for lx in l]
>>> l2
[1, 'b', 3, 3]
Don't forget to think about what behaviour you want if an entry in l is found in multiple lists in d, of course, although that may not be an issue with your data.
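If the dictionary is large or the list is long, one alternative is to invert the dictionary once and then look each entry up directly. This is a sketch under the assumption that the items in the lists are hashable; the name lookup is introduced for illustration. When an item appears under several keys, the last key wins here, whereas the next(...) version above picks the first.

```python
l = ["a", "b", "c", "d"]
d = {1: ["a"], 3: ["d", "c"]}

# Invert d once: each list element points back to the key that holds it.
# If an element appears under several keys, the last key wins.
lookup = {item: key for key, items in d.items() for item in items}

# Replace each entry by its key, keeping entries that appear nowhere in d.
l2 = [lookup.get(x, x) for x in l]
print(l2)  # [1, 'b', 3, 3]
```

Because lookup is a plain dict, it could also be passed to pandas' Series.map, which, unlike .get(x, x), leaves NaN for values that are missing from the dict.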
You can't translate this loop directly into a list comprehension, because it contains an assignment:
l[x] = k
which is a statement, and a list comprehension can only contain expressions.
