How to check for identical strings in nested dictionaries - python-3.x

Let me explain, I'm working in a bank and I'm trying to make a short python script that calculates the percentage of different shareholders.
In my example EnterpriseA is owned by different Shareholders directly and indirectly I stored it as it follows :
EnterpriseA = {'Shareholder0': {'Shareholder1': 25, 'Shareholder2': 31, 'Shareholder3': 17, 'Shareholder4': 27},
'Shareholder3': {'Shareholder1': 34, 'Shareholder4': 66}}
I want to calculate how much each shareholders have of EntrepriseA, but I can't figure how to check if a shareholder appears multiple times in all my dictionaries.
What I'm thinking is checking if Shareholder1 appears multiple times if so calculate how many percentage he owns of EnterpriseA like this :
percentage = EnterpriseA['Shareholder0']['Shareholder1'] + (EnterpriseA['Shareholder0']['Shareholder3']*EnterperiseA['Shareholder3']['Shareholder1']/100)
I've made a quick drawing for better understanding

If the maximum depth is only ever singly nested then you can just write a little helper function.
Edit:
From what you've explained, 'Shareholder0' is basically a list of direct enterprise shares.
I've modified the helper function and included a constant reflecting that.
ENTERPRISE_SHARES = 'Shareholder0'
EnterpriseA = {
'Shareholder0': {
'Shareholder1': 25,
'Shareholder2': 31,
'Shareholder3': 17,
'Shareholder4': 27
},
'Shareholder3': {
'Shareholder1': 34,
'Shareholder4': 66
}
}
def calc_percent(enterprise, name):
parent_percents = enterprise[ENTERPRISE_SHARES]
total_percent = parent_percents.get(name, 0)
for shareholder, shares in enterprise.items():
if shareholder != ENTERPRISE_SHARES and shareholder != name:
total_percent += parent_percents[shareholder] / 100 * shares.get(name, 0)
return total_percent
print(calc_percent(EnterpriseA, 'Shareholder1'))
print(calc_percent(EnterpriseA, 'Shareholder2'))
print(calc_percent(EnterpriseA, 'Shareholder4'))

Related

Pagination URL Building with wildcards

I have a URL like this
website.com/news/*1/*2/
And have two objects with arrays such as:
{
*1: ['april', 'may', 'june'],
*2: [28, 29, 30],
}
So how should I build the list of possible URLs with given wildcard data?
The final result in this case should be:
website.com/news/april/28/
website.com/news/april/29/
website.com/news/april/30/
website.com/news/may/28/
website.com/news/may/29/
website.com/news/may/30/
website.com/news/june/28/
website.com/news/june/29/
website.com/news/june/30/
I know that the count of urls = *1.length * *2.length but cannot make the correct algorithm.
P.S. the number of wildcarded URL segments is not static, can be changed [1; n]
This is what I want. Could find after 2 days of researches.
https://github.com/luizomf/cartesianproduct

Python comparing values from two dictionaries where keys match and one set of values is greater

I have the following datasets:
kpi = {
"latency": 3,
"cpu_utilisation": 0.98,
"memory_utilisation": 0.95,
"MIR": 200,
}
ns_metrics = {
"timestamp": "2022-10-04T15:24:10.765000",
"ns_id": "cache",
"ns_data": {
"cpu_utilisation": 0.012666666666700622,
"memory_utilisation": 8.68265852766783,
},
}
What I'm looking for is an elegant way to compare the cpu_utilisation and memory_utilisation values from each dictionary and if the two utilisation figures from ns_metrics is greater than kpi, for now, print a message as to which utilisation value was greater,i.e. was it either cpu or memory or both. Naturally, I can do something simple like this:
if ns_metrics["ns_data"]["cpu_utilisation"] > kpi["cpu_utilisation"]:
print("true: over cpu threshold")
if ns_metrics["ns_data"]["memory_utilisation"] > kpi["memory_utilisation"]:
print("true: over memory threshold")
But this seems a bit longer winded to have many if conditions, and I was hoping there is a more elegant way of doing it. Any help would be greatly appreciated.
maybe you can use a loop to do this:
check_list = ["cpu_utilisation", "memory_utilisation"]
for i in check_list:
if ns_metrics["ns_data"][i] > kpi[i]:
print("true: over {} threshold".format(i.split('_')[0]))
if the key is different,you can use a mapping dict to do it,like this:
check_mapping = {"cpu_utilisation": "cpu_utilisation_1"}
for kpi_key, ns_key in check_mapping.items():
....

Failing to use sumproduct on date ranges with multiple conditions [Python]

From replacement data table (below on the image), I am trying to incorporate the solbox product replace in time series data format(above on the image). I need to extract out the number of consumers per day from the information.
What I need to find out:
On a specific date, which number of solbox product was active
On a specific date, which number of solbox product (which was a consumer) was active
I have used this line of code in excel but cannot implement this on python properly.
=SUMPRODUCT((Record_Solbox_Replacement!$O$2:$O$1367 = "consumer") * (A475>=Record_Solbox_Replacement!$L$2:$L$1367)*(A475<Record_Solbox_Replacement!$M$2:$M$1367))
I tried in python -
timebase_df['date'] = pd.date_range(start = replace_table_df['solbox_started'].min(), end = replace_table_df['solbox_started'].max(), freq = frequency)
timebase_df['date_unix'] = timebase_df['date'].astype(np.int64) // 10**9
timebase_df['no_of_solboxes'] = ((timebase_df['date_unix']>=replace_table_df['started'].to_numpy()) & (timebase_df['date_unix'] < replace_table_df['ended'].to_numpy() & replace_table_df['customer_type'] == 'customer']))
ERROR:
~\Anaconda3\Anaconda4\lib\site-packages\pandas\core\ops\array_ops.py in comparison_op(left, right, op)
232 # The ambiguous case is object-dtype. See GH#27803
233 if len(lvalues) != len(rvalues):
--> 234 raise ValueError("Lengths must match to compare")
235
236 if should_extension_dispatch(lvalues, rvalues):
ValueError: Lengths must match to compare
Can someone help me please? I can explain in comment section if I have missed something.

Python3: Adding two sets of dictionaries into new format

I have two dictionaries,
MaleDict = {'Jason':[(2014, 394),(2013, 350)...],
'Stephanie':[(2014, 3), (2013, 21),..]....}
FemaleDict = {'Jason':[(2014, 56),(2013, 23)...],
'Stephanie':[(2014, 335), (2013, 217),..]....}
I am attempting to add them so that
CompleteDict = {'Jason':[(2014, 394, 56),(2013, 350, 23)...],
'Stephanie':[(2014, 3, 335), (2013, 21, 217),..]....}
I have created a list comprehension that completes the task when the each dictionary has that year present. However, I need the output to present even if the year is not present in one of the MaleDict or FemaleDict. For example, if one year was not in the MaleDict the code would read ...'Stephanie':[....., (1999, 0, 389), ....]...
my list comprehensions are
for name_key in name_keys:
for year_key in year_keys:
[BaseDict[name_key].append((year_key, a[1], b[1])) for a in MaleDict[name_key] for b in FemaleDict[name_key] if (year_key == a[0] == b[0])]
#This is where I am stuck. My list comprehensions dont work when there is no value for a specific year
[BaseDict[name_key].append((year_key, a[1], 0)) for a in MaleDict[name_key] for b in FemaleDict[name_key] if (year_key == a[0] != b[0])]
[BaseDict[name_key].append((year_key, 0, b[1])) for a in MaleDict[name_key] for b in FemaleDict[name_key] if (year_key != a[0] == b[0])]
print(BaseDict)
If your data format is not set in stone, i would consider using a defaultdict from collections:
Instead of [(2014, 394),(2013, 350)...]
use collections.defaultdict(int, {2014: 394, 2013: 350}) etc.
Then you can use
for name_key in name_keys:
for year_key in year_keys:
CompleteDict[name_key].update([FemaleDict[year_key], MaleDict[year_key]])
CompleteDict['Stephanie'][1999] will then be [0, 389]

CouchDB historical view snapshots

I have a database with documents that are roughly of the form:
{"created_at": some_datetime, "deleted_at": another_datetime, "foo": "bar"}
It is trivial to get a count of non-deleted documents in the DB, assuming that we don't need to handle "deleted_at" in the future. It's also trivial to create a view that reduces to something like the following (using UTC):
[
{"key": ["created", 2012, 7, 30], "value": 39},
{"key": ["deleted", 2012, 7, 31], "value": 12}
{"key": ["created", 2012, 8, 2], "value": 6}
]
...which means that 39 documents were marked as created on 2012-07-30, 12 were marked as deleted on 2012-07-31, and so on. What I want is an efficient mechanism for getting the snapshot of how many documents "existed" on 2012-08-01 (0+39-12 == 27). Ideally, I'd like to be able to query a view or a DB (e.g. something that's been precomputed and saved to disk) with the date as the key or index, and get the count as the value or document. e.g.:
[
{"key": [2012, 7, 30], "value": 39},
{"key": [2012, 7, 31], "value": 27},
{"key": [2012, 8, 1], "value": 27},
{"key": [2012, 8, 2], "value": 33}
]
This can be computed easily enough by iterating through all of the rows in the view, keeping a running counter and summing up each day as I go, but that approach slows down as the data set grows larger, unless I'm smart about caching or storing the results. Is there a smarter way to tackle this?
Just for the sake of comparison (I'm hoping someone has a better solution), here's (more or less) how I'm currently solving it (in untested ruby pseudocode):
require 'date'
def date_snapshots(rows)
current_date = nil
current_count = 0
rows.inject({}) {|hash, reduced_row|
type, *ymd = reduced_row["key"]
this_date = Date.new(*ymd)
if current_date
# deal with the days where nothing changed
(current_date.succ ... this_date).each do |date|
key = date.strftime("%Y-%m-%d")
hash[key] = current_count
end
end
# update the counter and deal with the current day
current_date = this_date
current_count += reduced_row["value"] if type == "created_at"
current_count -= reduced_row["value"] if type == "deleted_at"
key = current_date.strftime("%Y-%m-%d")
hash[key] = current_count
hash
}
end
Which can then be used like so:
rows = couch_server.db(foo).design(bar).view(baz).reduce.group_level(3).rows
date_snapshots(rows)["2012-08-01"]
Obvious small improvement would be to add a caching layer, although it isn't quite as trivial to make that caching layer play nicely incremental updates (e.g. the changes feed).
I found an approach that seems much better than my original one, assuming that you only care about a single date:
def size_at(date=Time.now.to_date)
ymd = [date.year, date.month, date.day]
added = view.reduce.
startkey(["created_at"]).
endkey( ["created_at", *ymd, {}]).rows.first || {}
deleted = view.reduce.
startkey(["deleted_at"]).
endkey( ["deleted_at", *ymd, {}]).rows.first || {}
added.fetch("value", 0) - deleted.fetch("value", 0)
end
Basically, let CouchDB do the reduction for you. I didn't originally realize that you could mix and match reduce with startkey/endkey.
Unfortunately, this approach requires two hits to the DB (although those could be parallelized or pipelined). And it doesn't work as well when you want to get a lot of these sizes at once (e.g. view the whole history, rather than just look at one date).

Resources