How to properly recombine grouped observables? - python-3.x

I'm trying to create a tool for analysing stock prices.
I've got a stream of price data for different stocks, and I want to have an observable to emit events whenever it receives a new, distinct and complete set of prices.
My plan: grouping the stream into different sub-streams for different stocks, and recombining their latest values.
Let's say I've got a stream of events like this:
from rx import Observable
stock_events = [
{'stock': 'A', 'price': 15},
{'stock': 'A', 'price': 16},
{'stock': 'B', 'price': 24},
{'stock': 'C', 'price': 37},
{'stock': 'A', 'price': 18},
{'stock': 'D', 'price': 42},
{'stock': 'B', 'price': 27},
{'stock': 'B', 'price': 27},
{'stock': 'C', 'price': 31},
{'stock': 'D', 'price': 44}
]
price_source = Observable.from_list(stock_events)
Here is my first (naive) approach:
a_source = price_source.filter(lambda x: x['stock'] == 'A').distinct_until_changed()
b_source = price_source.filter(lambda x: x['stock'] == 'B').distinct_until_changed()
c_source = price_source.filter(lambda x: x['stock'] == 'C').distinct_until_changed()
d_source = price_source.filter(lambda x: x['stock'] == 'D').distinct_until_changed()
(Observable
.combine_latest(a_source, b_source, c_source, d_source, lambda *x: x)
.subscribe(print))
This correctly gives me:
({'stock': 'A', 'price': 18}, {'stock': 'B', 'price': 24}, {'stock': 'C', 'price': 37}, {'stock': 'D', 'price': 42})
({'stock': 'A', 'price': 18}, {'stock': 'B', 'price': 27}, {'stock': 'C', 'price': 37}, {'stock': 'D', 'price': 42})
({'stock': 'A', 'price': 18}, {'stock': 'B', 'price': 27}, {'stock': 'C', 'price': 31}, {'stock': 'D', 'price': 42})
({'stock': 'A', 'price': 18}, {'stock': 'B', 'price': 27}, {'stock': 'C', 'price': 31}, {'stock': 'D', 'price': 44})
Yet, I feel that this should be better handled by group_by, instead of several filterings, so here's a re-write:
(price_source
.group_by(lambda e: e['stock'])
.map(lambda obs: obs.distinct_until_changed())
.combine_latest(lambda *x: x)
.subscribe(print))
But this time, I get:
(<rx.core.anonymousobservable.AnonymousObservable object at 0x000000000105EA20>,)
(<rx.core.anonymousobservable.AnonymousObservable object at 0x000000000776AB00>,)
(<rx.core.anonymousobservable.AnonymousObservable object at 0x000000000776A438>,)
(<rx.core.anonymousobservable.AnonymousObservable object at 0x000000000775E7F0>,)
What have I missed here? How do I "unwrap" the nested observables?

If you do want to use GroupBy, it would look something like the following in C#. This doesn't meet your requirement of a "complete" set, though. As noted in the comments, I suspect CombineLatest is the better fit here.
price_source.GroupBy(x => x.Stock)
.Select(gp => gp.DistinctUntilChanged(x => x.Price))
.SelectMany(x => x)
.Subscribe(s => Console.WriteLine($"{s.Stock} : {s.Price}"));
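To make the "complete set" requirement concrete, here is a minimal, library-free sketch in plain Python. It is not Rx code; it only emulates the semantics the question asks for (skip repeated prices per stock, emit once every stock has a price). The function name complete_price_sets and the expected_stocks parameter are made up for illustration.

```python
def complete_price_sets(events, expected_stocks):
    """Yield a snapshot each time a price changes and every stock has a price."""
    latest = {}  # most recent price seen per stock
    for event in events:
        stock, price = event['stock'], event['price']
        if latest.get(stock) == price:
            continue  # same price as last time for this stock: not a distinct update
        latest[stock] = price
        # Only emit once every expected stock has reported at least one price.
        if all(s in latest for s in expected_stocks):
            yield tuple({'stock': s, 'price': latest[s]} for s in expected_stocks)


stock_events = [
    {'stock': 'A', 'price': 15},
    {'stock': 'A', 'price': 16},
    {'stock': 'B', 'price': 24},
    {'stock': 'C', 'price': 37},
    {'stock': 'A', 'price': 18},
    {'stock': 'D', 'price': 42},
    {'stock': 'B', 'price': 27},
    {'stock': 'B', 'price': 27},
    {'stock': 'C', 'price': 31},
    {'stock': 'D', 'price': 44},
]

snapshots = list(complete_price_sets(stock_events, ['A', 'B', 'C', 'D']))
for snapshot in snapshots:
    print(snapshot)
```

For the sample events this yields four snapshots, the same ones the filter-based combine_latest version in the question prints. The sketch has to be told which stocks make up a complete set, which a group_by-based pipeline cannot know in advance.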

Related

How do I replace all occurrences of '+' with '.5' in a dataframe?

I have a dataframe below:
data = {'Name': ['A', 'B', 'C', 'D'],
'Lower': ['+', '2', '2+', '3'],
'Upper': ['2','3+','4+','5']}
df= pd.DataFrame(data)
The expected output should be:
data = {'Name': ['A', 'B', 'C', 'D'],
'Lower': ['.5', '2', '2.5', '3'],
'Upper': ['2','3.5','4.5','5']}
I have tried using the code below but it only replaces + and not 2+, 3+, 4+
df.replace('+','.5', regex=False)
I also tried using str.replace but the rest of the values become NaN:
df['Lower'].str.replace('+', '.5')
You can rebuild the values by looping, but it's not the fastest solution:
import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D'],
'Lower': ['+', '2', '2+', '3'],
'Upper': ['2','3+','4+','5']}
lower = []
upper = []
newdata = {'Name': ['A', 'B', 'C', 'D'],
'Lower': lower,
'Upper': upper}
for i in data['Lower']:
    if "+" in i:
        lower.append(i.replace("+", ".5"))
    else:
        lower.append(i)
for j in data['Upper']:
    if "+" in j:
        upper.append(j.replace("+", ".5"))
    else:
        upper.append(j)
df= pd.DataFrame(newdata)
print(df)
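A vectorized alternative, as a sketch: DataFrame.replace without regex only matches whole cell values, which is why only the bare '+' was replaced in the question. Using Series.str.replace with regex=False treats '+' as a literal substring instead. The cols list is just a helper name introduced here.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D'],
        'Lower': ['+', '2', '2+', '3'],
        'Upper': ['2', '3+', '4+', '5']}
df = pd.DataFrame(data)

# Replace the literal '+' inside each string; regex=False avoids
# '+' being interpreted as a regular-expression quantifier.
cols = ['Lower', 'Upper']
df[cols] = df[cols].apply(lambda s: s.str.replace('+', '.5', regex=False))
print(df)
```

This assumes both columns contain only strings, as in the example data.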

Multi-List to an order dictionary by different column

I have an OrderedDict, shown below, in which column 1 and column 2 represent a relationship:
OrderedDict([('AD',
[['A', 'Q_30', 100],
['A', 'Q_24', 74],
['B', 'Q_28', 37],
['B', 'Q_30', 100],
['C', 'Q_25', 38],
['C', 'Q_30', 100],
['D', 'D_4', 44],
['E', 'D_4', 44],
['F', 'D_5', 44]])])
I would like to iterate over the elements and, for each one, look at the other rows and collect the related column 2 values.
For example:
element A contains Q_30 and Q_24, so the related members from the other rows are collected as well
element B contains Q_30, so Q_24, Q_28 and Q_30 are collected and ordered by column 3
OrderedDict([('AD',
[{'Q_30':100, 'Q_24':74, 'Q_25':38, 'Q_28': 37}, {'D_4':44}, {'D_5':44}])])
If I understand this correctly, your "OrderedDict" is currently a tuple containing a list of lists, and is meant to look like this:
OrderedList = ('AD',
[['A', 'Q_30', 100],
['A', 'Q_24', 74],
['B', 'Q_28', 37],
['B', 'Q_30', 100],
['C', 'Q_25', 38],
['C', 'Q_30', 100],
['D', 'D_4', 44],
['E', 'D_4', 44],
['F', 'D_5', 44]])
and you want to convert it into a tuple with a list inside which holds dicts:
OrderedDict = ('AD',
[{'Q_30': 100,
'Q_24': 74,
'Q_25': 38,
'Q_28': 37},
{'D_4': 44},
{'D_5': 44}])
In this case I am guessing you are looking for groupby():
from itertools import groupby
OrderedList = ('AD',
[['A', 'Q_30', 100],
['A', 'Q_24', 74],
['B', 'Q_28', 37],
['B', 'Q_30', 100],
['C', 'Q_25', 38],
['C', 'Q_30', 100],
['D', 'D_4', 44],
['E', 'D_4', 44],
['F', 'D_5', 44]])
for key, group in groupby(OrderedList[1], lambda x: x[0]):
    for thing in group:
        print("%s is a %s." % (thing[1], key))
Gives:
Q_30 is a A.
Q_24 is a A.
Q_28 is a B.
Q_30 is a B.
Q_25 is a C.
Q_30 is a C.
D_4 is a D.
D_4 is a E.
D_5 is a F.
This is not the full answer, just an example, as I feel a complete solution would be spoon-feeding.
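For readers who want to go one step further, here is one possible sketch of the merging step. It assumes the intent is: rows belong together when they share a column 1 element or a column 2 label, and each merged group is then ordered by column 3, descending. The function name merge_related is hypothetical, and this reading of the requirement is an assumption.

```python
rows = [['A', 'Q_30', 100],
        ['A', 'Q_24', 74],
        ['B', 'Q_28', 37],
        ['B', 'Q_30', 100],
        ['C', 'Q_25', 38],
        ['C', 'Q_30', 100],
        ['D', 'D_4', 44],
        ['E', 'D_4', 44],
        ['F', 'D_5', 44]]


def merge_related(rows):
    """Merge rows that are linked by a shared element or a shared label."""
    groups = []  # list of (set of column-1 elements, dict of label -> value)
    for element, label, value in rows:
        merged_elements, merged_labels = {element}, {label: value}
        kept = []
        for elements, labels in groups:
            if element in elements or label in labels:
                # This row connects to an existing group: absorb it.
                merged_elements |= elements
                merged_labels.update(labels)
            else:
                kept.append((elements, labels))
        kept.append((merged_elements, merged_labels))
        groups = kept
    # Order each group's labels by value, highest first.
    return [dict(sorted(labels.items(), key=lambda kv: kv[1], reverse=True))
            for _, labels in groups]


result = ('AD', merge_related(rows))
print(result)
```

For the sample data this produces the three dictionaries from the question. Groups are appended as they are merged, so for other inputs the order of the groups may differ from the order of first appearance.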

ValueError: Invalid broadcasting comparison with block values - how to resolve it in pythonic way

Hi, I have two dataframes and am trying to compare their values, but I get a ValueError about broadcasting:
import numpy as np
import pandas as pd
dict_1 = {'a': {0: [{'value': 'A123',
'label': 'Professional'},
{'value': 'B141', 'label': 'Passion'}]},
'b': {0: [{'value': 'B5529',
'label': 'Innovation'},
{'value': 'B3134', 'label': 'Businees Value'},
{'value': 'B3856',
'label': 'Electrofication'},
{'value': 'B3859', 'label': 'Insurance'},
{'value': 'B3856', 'label': 'Requirements'},
{'value': 'B3345', 'label': 'Stories'}]},
"c" : "hello"}
dict_2 = {'a': {0: np.nan},
'b': {0: [{'value': 'B4785',
'label': 'Innovation'},
{'value': 'B4635', 'label': 'Businees Value'},
{'value': 'B1234', 'label': 'Requirements'},
{'value': 'B9853', 'label': 'Stories'}]},
'c': "hello"
}
df1 = pd.DataFrame(dict_1)
df2 = pd.DataFrame(dict_2)
Here I want to compare individual rows rather than two complete dataframes (in my scenario df1 has shape (500, 2) and df2 has shape (1, 2)), so I used the code below to extract the values that differ between the rows.
df1[~(df1[['a', 'b', 'c']] == df2[['a', 'b', 'c']].iloc[0])]
The desired result: df2, which has one row, should be compared with every row of df1 (in my scenario df1 has more than one row). Where the values are identical the result should be NaN; otherwise I should get the corresponding values of df1.
If df1 and df2 each have a single row, you can use mask to replace the matching (True) values with np.nan:
condition = df1 == df2
df1.mask(condition, other=np.nan)
Now, if df1 has more than one row, you can pass mask a condition built by a callable that returns True or False values. In this case, apply compares each row of df1 to the first row of df2. Otherwise you get a shape mismatch error.
dict_1 = {'a':
{0: [{'value': 'A123',
'label': 'Professional'},
{'value': 'B141', 'label': 'Passion'}],
1: [{'value': 'B123',
'label': 'Professional'},
{'value': 'B141', 'label': 'Passion'}]
},
'b': {0: [{'value': 'B5529',
'label': 'Innovation'},
{'value': 'B3134', 'label': 'Businees Value'},
{'value': 'B3856',
'label': 'Electrofication'},
{'value': 'B3859', 'label': 'Insurance'},
{'value': 'B3856', 'label': 'Requirements'},
{'value': 'B3345', 'label': 'Stories'}],
1: [{'value': 'C5529',
'label': 'Innovation'},
{'value': 'B3134', 'label': 'Businees Value'},
{'value': 'B3856',
'label': 'Electrofication'},
{'value': 'B3859', 'label': 'Insurance'},
{'value': 'B3856', 'label': 'Requirements'},
{'value': 'B3345', 'label': 'Stories'}],
},
"c" : {0: "hello", 1: "hola"}}
# New df1 with two rows
df1 = pd.DataFrame(dict_1)
condition = df1.apply(lambda x: x==df2.iloc[0], axis=1)
df1.mask(condition, other=np.nan)

How can I enter a dictionary inside an another empty dictionary?

The example code -
innerdict = {}
outerdict = {}
for line in range(1, 10, 2):
    for j in range(1, 10, 2):
        line_tuple = ("Item" + str( line ), int( line ))
        key = line_tuple[1]
        if line ==j:
            outerdict[key] = dict( innerdict )
            outerdict[key] = {'Name': '{0}'.format( "item"+str(j) ), 'Price': '{0}'.format( j )}
print(outerdict)
The output comes out like this-
{1: {'Name': 'item1', 'Price': '1'}, 3: {'Name': 'item3', 'Price': '3'}, 5: {'Name': 'item5', 'Price': '5'}, 7: {'Name': 'item7', 'Price': '7'}, 9: {'Name': 'item9', 'Price': '9'}}
The above output is achievable since it is conventional. I found a lot of online suggestions regarding nested dictionary comprehension.
But I want the output to come out like below-
{{'Name': 'item1', 'Price': '1'}, {'Name': 'item3', 'Price': '3'}, {'Name': 'item5', 'Price': '5'}, {'Name': 'item7', 'Price': '7'}, {'Name': 'item9', 'Price': '9'}}
Thanks in advance!
This is not possible, as the dict objects are not hashable.
{{1:2}} would mean putting a dict {1:2} into a set, which is not possible because of the un-hashability of the objects mentioned above. Better put them in a list:
[{1:2}, {2:3}]
What you want is a list of dictionaries. The literal {{'Name': 'item1', 'Price': '1'}, {'Name': 'item3', 'Price': '3'}, {'Name': 'item5', 'Price': '5'}, {'Name': 'item7', 'Price': '7'}, {'Name': 'item9', 'Price': '9'}} is invalid: a dictionary is made of key-value pairs and there are no keys here, so Python reads the outer braces as a set, and a set cannot contain dictionaries.
It can be checked by assigning the above to a variable and then checking its type.
d = {{'Name': 'item1', 'Price': '1'}, {'Name': 'item3', 'Price': '3'}, {'Name': 'item5', 'Price': '5'}, {'Name': 'item7', 'Price': '7'}, {'Name': 'item9', 'Price': '9'}}
print(type(d))
It will raise a TypeError saying that dict is unhashable.
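As a small sketch of both points: the first part builds the list of dictionaries directly with a list comprehension, and the second part shows the error you get when a dict is placed inside set braces. The variable names items and error_message are chosen here for illustration.

```python
# A list of dictionaries: the closest valid structure to the desired output.
items = [{'Name': 'item{}'.format(j), 'Price': str(j)} for j in range(1, 10, 2)]
print(items)

# Putting a dict inside {...} without a key makes Python build a set,
# which fails because dicts are not hashable.
try:
    bad = {{'Name': 'item1', 'Price': '1'}}
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```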

Mapping between matrices

I have 2 matrices:
list_alpha = [['a'],
['b'],
['c'],
['d'],
['e']]
list_beta = [['1', 'a', 'e', 'b'],
['2', 'd', 'X', 'X'],
['3', 'a', 'X', 'X'],
['4', 'd', 'a', 'c']]
And my goal is if a letter from list_alpha is in a sublist of list_beta, then the first element of that line in list_beta (the #) is added to the correct line in list_alpha.
So my output would be:
final_list = [['a', '1', '3', '4'],
['b', '1'],
['c', '4'],
['d', '2', '4'],
['e', '1']]
But I'm pretty new to python and coding in general and I'm not sure how to do this. Is there a way to code this? Or do I have to change the way the data is stored in either list?
Edit:
Changing list_alpha to a dictionary helped!
Final code:
dict_alpha = {'a': [], 'b': [], 'c': [], 'd': [], 'e':[]}
list_beta = [['1', 'a', 'e', 'b'],
['2', 'd', 'X', 'X'],
['3', 'a', 'X', 'X'],
['4', 'd', 'a', 'c'],
['5', 'X', 'X', 'e'],
['6', 'c', 'X', 'X']]
for letter in dict_alpha:
    for item in list_beta:
        if letter in item:
            dict_alpha.get(letter).append(item[0])
print(dict_alpha)
You can also keep dict_alpha as a list of lists, just like list_alpha, and fix your for loop.
For example:
dict_alpha = [['a'],
['b'],
['c'],
['d'],
['e']]
list_beta = [['1', 'a', 'e', 'b'],
['2', 'd', 'X', 'X'],
['3', 'a', 'X', 'X'],
['4', 'd', 'a', 'c'],
['5', 'X', 'X', 'e'],
['6', 'c', 'X', 'X']]
for al in dict_alpha:
    for bt in list_beta:
        for i in range(1, len(bt)):
            if (bt[i] == al[0]):
                al.append(bt[0])
print(dict_alpha)
Output:
[['a', '1', '3', '4'],
['b', '1'],
['c', '4', '6'],
['d', '2', '4'],
['e', '1', '5']]
Hope this helps!
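A variation on the same idea, as a sketch: build a lookup dictionary from list_alpha once, walk list_beta a single time, and convert back to the list-of-lists layout at the end. The names index and final_list are chosen here for illustration.

```python
list_alpha = [['a'], ['b'], ['c'], ['d'], ['e']]
list_beta = [['1', 'a', 'e', 'b'],
             ['2', 'd', 'X', 'X'],
             ['3', 'a', 'X', 'X'],
             ['4', 'd', 'a', 'c'],
             ['5', 'X', 'X', 'e'],
             ['6', 'c', 'X', 'X']]

# One empty list of row numbers per letter, in list_alpha order.
index = {row[0]: [] for row in list_alpha}

# Split each list_beta row into its number and its letters.
for number, *letters in list_beta:
    for letter in letters:
        if letter in index:  # ignores placeholders such as 'X'
            index[letter].append(number)

# Back to the original list-of-lists shape.
final_list = [[letter] + numbers for letter, numbers in index.items()]
print(final_list)
```

This gives the same result as the loop above for the sample data, and avoids scanning list_beta once per letter.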
