Python3: Adding two sets of dictionaries into new format - python-3.x

I have two dictionaries,
MaleDict = {'Jason':[(2014, 394),(2013, 350)...],
'Stephanie':[(2014, 3), (2013, 21),..]....}
FemaleDict = {'Jason':[(2014, 56),(2013, 23)...],
'Stephanie':[(2014, 335), (2013, 217),..]....}
I am attempting to add them so that
CompleteDict = {'Jason':[(2014, 394, 56),(2013, 350, 23)...],
'Stephanie':[(2014, 3, 335), (2013, 21, 217),..]....}
I have created a list comprehension that completes the task when each dictionary has that year present. However, I need the output to include the year even if it is missing from either MaleDict or FemaleDict. For example, if a year was not in MaleDict, the result would read ...'Stephanie':[....., (1999, 0, 389), ....]...
my list comprehensions are
for name_key in name_keys:
    for year_key in year_keys:
        [BaseDict[name_key].append((year_key, a[1], b[1])) for a in MaleDict[name_key] for b in FemaleDict[name_key] if (year_key == a[0] == b[0])]
        #This is where I am stuck. My list comprehensions dont work when there is no value for a specific year
        [BaseDict[name_key].append((year_key, a[1], 0)) for a in MaleDict[name_key] for b in FemaleDict[name_key] if (year_key == a[0] != b[0])]
        [BaseDict[name_key].append((year_key, 0, b[1])) for a in MaleDict[name_key] for b in FemaleDict[name_key] if (year_key != a[0] == b[0])]
print(BaseDict)

If your data format is not set in stone, I would consider using a defaultdict from collections:
Instead of [(2014, 394),(2013, 350)...]
use collections.defaultdict(int, {2014: 394, 2013: 350}) etc.
Then, with CompleteDict[name_key] initialised to an empty dict for each name, you can use
for name_key in name_keys:
    for year_key in year_keys:
        CompleteDict[name_key][year_key] = [MaleDict[name_key][year_key], FemaleDict[name_key][year_key]]
CompleteDict['Stephanie'][1999] will then be [0, 389], because a defaultdict(int) returns 0 for a year it does not contain.
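If the list-of-tuples format has to stay, one possible approach is to turn each name's list into a plain dict keyed by year and look years up with .get(year, 0). The following is only a sketch: merge_counts is a hypothetical helper name, and the sample data is a trimmed version of the question's dictionaries with one 1999 entry added to show the missing-year case.

```python
def merge_counts(male, female):
    """Combine two {name: [(year, count), ...]} dicts into
    {name: [(year, male_count, female_count), ...]}, using 0 for missing years."""
    merged = {}
    for name in sorted(set(male) | set(female)):
        # dict() turns [(year, count), ...] into {year: count}
        male_by_year = dict(male.get(name, []))
        female_by_year = dict(female.get(name, []))
        # Every year seen in either dictionary, newest first
        years = sorted(set(male_by_year) | set(female_by_year), reverse=True)
        merged[name] = [
            (year, male_by_year.get(year, 0), female_by_year.get(year, 0))
            for year in years
        ]
    return merged


# Illustrative sample data (assumed, trimmed from the question)
MaleDict = {'Jason': [(2014, 394), (2013, 350)],
            'Stephanie': [(2014, 3), (2013, 21)]}
FemaleDict = {'Jason': [(2014, 56), (2013, 23)],
              'Stephanie': [(2014, 335), (2013, 217), (1999, 389)]}

CompleteDict = merge_counts(MaleDict, FemaleDict)
print(CompleteDict['Stephanie'])
# [(2014, 3, 335), (2013, 21, 217), (1999, 0, 389)]
```

Building the per-year dicts once per name also avoids comparing every male entry against every female entry, which is what the nested comprehensions in the question do.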

Related

How to dynamically call variables and dynamically slice them with ```exec()```?

I'm doing a mapping of data. I have a lot of tables, one of them is "abonnement" for example, here I have multiple columns (id, datemajaudit, etc.).
Example of col dictionary;
{'abonnement': ['id',
'datemajaudit',
'objversion',
'profilmajaudit',
'supprime',
'userconnected',
'categorieabonnement',
'code',
'isaffichable',
'isextranetunsubscribed',
'libelle',
'media'],
'abonnement_carte_paiement': ['id',
'datemajaudit',
'objversion',
For every table, I created a variable named d_{table} (d_abonnement, for example) that contains some information about every column of the table.
Example of d_constab_assoc_equipement_infos_etat
#out[9]:
#[{1000: 3, 1002: 3}, {1022: 1, 1044: 1, 1059: 1, 1049: 1, 1051: 1, 1061: 1}]
Now I want to write this information to a text file, in this format:
abonnement
id
1000: 3, |
1002: 3, |
datemajaudit
1022: 1,
1044: 1,
1059: 1,
1049: 1,
1051: 1,
1061: 1
I do it with this code :
ab = open("premier.txt","w")
t=[0]
exec(f'''for i, zones in list(col.items())[0:300]:
    print(f'\t%s'%i, file=ab)
    for t, zone in enumerate(zones):
        print(f'\t\t%s'%zone, file=ab)
        print("\\t\t\t\t")
        for q, v in d_{i}[t].items():
            print("èèèèè", q, "(", v, '),', end = " ", file=ab)
        print("|", file=ab)''')
ab.close()
The issue is that I have difficulty referring to the variables I created. I want to access them dynamically by name: d_abonnement[1], d_abonnement[2], d_abonnement[3], and likewise for all tables, so also d_constab_assoc_equipement_infos_etat[0], d_constab_assoc_equipement_infos_etat[1], d_constab_assoc_equipement_infos_etat[2], etc., and d_devis[0], d_devis[1], d_devis[2].
The problem is in this part of my code: for q, v in d_{i}[t].items(). The text file lists every table with its correct columns, but the values written for every table are those of the first table. In other words, d_constab_assoc_equipement_infos_etat[0], d_devis[0], etc. are never used; only one variable name is resolved, the first one the code runs with, and its data fills in all the other tables.
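A likely cause is that the f-string is formatted once, before exec runs, so {i} is replaced a single time by whatever value i holds at that moment, and the loop inside the string then always reads the same d_... variable. One way around this, sketched below, is to drop exec and keep the per-table data in a dictionary keyed by table name. The names tables and write_report and the sample values are assumptions for illustration, not part of the original code.

```python
import io

# Column names per table (shortened, assumed sample)
col = {
    "abonnement": ["id", "datemajaudit"],
    "devis": ["id"],
}

# Hypothetical replacement for the d_<table> variables:
# one list of {value: count} dicts per table, one dict per column
tables = {
    "abonnement": [{1000: 3, 1002: 3}, {1022: 1, 1044: 1}],
    "devis": [{7: 2}],
}


def write_report(col, tables, out):
    """Write table name, column names and value counts, indented by level."""
    for table, zones in col.items():
        print(table, file=out)
        for idx, zone in enumerate(zones):
            print(f"\t{zone}", file=out)
            # Look the data up by table name instead of building a variable name
            for key, count in tables[table][idx].items():
                print(f"\t\t{key}: {count},", file=out)


buffer = io.StringIO()  # stands in for open("premier.txt", "w")
write_report(col, tables, buffer)
report = buffer.getvalue()
print(report)
```

If the d_<table> variables already exist as module-level globals, globals()[f"d_{table}"] inside the loop would fetch each one by name without exec, although a dictionary is usually easier to maintain.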

How to check for identical strings in nested dictionaries

Let me explain, I'm working in a bank and I'm trying to make a short python script that calculates the percentage of different shareholders.
In my example, EnterpriseA is owned by different shareholders directly and indirectly. I stored it as follows:
EnterpriseA = {'Shareholder0': {'Shareholder1': 25, 'Shareholder2': 31, 'Shareholder3': 17, 'Shareholder4': 27},
'Shareholder3': {'Shareholder1': 34, 'Shareholder4': 66}}
I want to calculate how much of EnterpriseA each shareholder owns, but I can't figure out how to check whether a shareholder appears multiple times across my dictionaries.
What I'm thinking of is checking whether Shareholder1 appears multiple times and, if so, calculating what percentage of EnterpriseA they own, like this:
percentage = EnterpriseA['Shareholder0']['Shareholder1'] + (EnterpriseA['Shareholder0']['Shareholder3']*EnterpriseA['Shareholder3']['Shareholder1']/100)
I've made a quick drawing for better understanding
If the maximum depth is only ever singly nested then you can just write a little helper function.
Edit:
From what you've explained, 'Shareholder0' is basically a list of direct enterprise shares.
I've modified the helper function and included a constant reflecting that.
ENTERPRISE_SHARES = 'Shareholder0'
EnterpriseA = {
'Shareholder0': {
'Shareholder1': 25,
'Shareholder2': 31,
'Shareholder3': 17,
'Shareholder4': 27
},
'Shareholder3': {
'Shareholder1': 34,
'Shareholder4': 66
}
}
def calc_percent(enterprise, name):
    parent_percents = enterprise[ENTERPRISE_SHARES]
    total_percent = parent_percents.get(name, 0)
    for shareholder, shares in enterprise.items():
        if shareholder != ENTERPRISE_SHARES and shareholder != name:
            total_percent += parent_percents[shareholder] / 100 * shares.get(name, 0)
    return total_percent
print(calc_percent(EnterpriseA, 'Shareholder1'))
print(calc_percent(EnterpriseA, 'Shareholder2'))
print(calc_percent(EnterpriseA, 'Shareholder4'))
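If intermediaries can themselves be owned through other intermediaries, a recursive variant might cover the deeper case. This is a sketch only: effective_share is a hypothetical name, and it assumes the ownership structure contains no cycles (a shareholder never indirectly owns itself), otherwise the recursion would not terminate.

```python
def effective_share(holdings, root, name):
    """Percentage of `root` that `name` owns, directly or through intermediaries.

    `holdings` maps an entity to {owner: percent}. Assumes no circular ownership.
    """
    total = 0.0
    for holder, percent in holdings.get(root, {}).items():
        if holder == name:
            # Direct stake in `root`
            total += percent
        elif holder in holdings:
            # `holder` is itself owned by others: scale their stake in `holder`
            total += percent / 100 * effective_share(holdings, holder, name)
    return total


EnterpriseA = {
    'Shareholder0': {'Shareholder1': 25, 'Shareholder2': 31,
                     'Shareholder3': 17, 'Shareholder4': 27},
    'Shareholder3': {'Shareholder1': 34, 'Shareholder4': 66},
}

share_1 = effective_share(EnterpriseA, 'Shareholder0', 'Shareholder1')
share_4 = effective_share(EnterpriseA, 'Shareholder0', 'Shareholder4')
print(round(share_1, 2), round(share_4, 2))
```

For the data above, this follows the same arithmetic as the formula in the question: 25 + 17 * 34 / 100 for Shareholder1.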

Get the group ID sorted from "/etc/group"

I'd like to manipulate the "/etc/group"
In [39]: fp = open("/etc/group")
In [40]: content = [c.replace("\n", "") for c in fp.readlines()]
In [42]: content
Out[42]:
['root:x:0:',
'bin:x:1:',
'daemon:x:2:',
'sys:x:3:',
'adm:x:4:',
'tty:x:5:',
'disk:x:6:',
'lp:x:7:',
'mem:x:8:',
'kmem:x:9:',
'wheel:x:10:',
'cdrom:x:11:',
'mail:x:12:postfix',
'man:x:15:',
'dialout:x:18:',....]
The result is sorted by alphabet rather than the group ID
In [44]: sorted(content, key=lambda c:int(re.search(r"\d+",c).group()))
Out[44]:
['root:x:0:',
'bin:x:1:',
'daemon:x:2:',
'sys:x:3:',
'adm:x:4:',
'tty:x:5:',
'disk:x:6:',
'lp:x:7:',
'mem:x:8:',
'kmem:x:9:',
'wheel:x:10:',
'cdrom:x:11:',
'mail:x:12:postfix',
'man:x:15:',
'dialout:x:18:',
I got it done with re.search and a lambda, in an awkward way.
Could it be solved in a more elegant style?
Sort by the third colon-delimited field:
sorted(content, key=lambda x: int(x.split(':')[2]))
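As a small self-contained illustration of the split-based key, the sketch below sorts a deliberately shuffled sample of lines. The sample list and the helper name group_id are assumptions for the example. Unlike a search for the first run of digits, splitting on ':' still works if a group name happens to contain a digit.

```python
# Shuffled sample lines in /etc/group format: name:password:GID:members
content = ['wheel:x:10:', 'root:x:0:', 'mail:x:12:postfix', 'bin:x:1:', 'daemon:x:2:']


def group_id(line):
    """Return the numeric group ID, i.e. the third colon-delimited field."""
    return int(line.split(':')[2])


by_gid = sorted(content, key=group_id)
print(by_gid)
# ['root:x:0:', 'bin:x:1:', 'daemon:x:2:', 'wheel:x:10:', 'mail:x:12:postfix']
```

On Unix systems, the standard-library grp module (grp.getgrall()) returns the same records with a gr_gid attribute, which may avoid parsing the file by hand.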

Assigning values to imported variables from excel

I need to import an Excel document into Mathematica that contains 2000 compounds, each with 6 numerical constants assigned to it. The end goal is to type a compound name into Mathematica and have its 6 numerical constants output. So far my code is:
t = Import["Titles.txt.", {"Text", "Lines"}] (* imports compound names *)
n = Import["NA.txt.", "List"] (* imports the 6 values for each compound *)
n[[2]] (* outputs the second compound's 6 values *)
Instead of n[[#]], I would like to know how to type in a compound from the imported compound names and have its 6 values output.
I'm not sure if I understand your question - you have two text files, rather than an Excel file, for example, and it's not clear what the data looks like. But there are probably plenty of ways to do this. Here's a suggestion (it might not be the best way):
Let's assume that you've got all your data into a table (a list of lists):
pt = {
{"Hydrogen", "H", 1, 1.0079, -259, -253, 0.09, 0.14, 1776, 1, 13.5984},
{"Helium", "He", 2, 4.0026, -272, -269, 0, 0, 1895, 18, 24.5874},
{"Lithium" , "Li", 3, 6.941, 180, 1347, 0.53, 0, 1817, 1, 5.3917}
}
To find the information associated with a particular string:
Cases[pt, {"Helium", rest__} -> rest]
{"He", 2, 4.0026, -272, -269, 0, 0, 1895, 18, 24.5874}
where the pattern rest__ holds everything that was found after "Helium".
To match on the second item (the symbol) instead:
Cases[pt, {_, "Li", rest__} -> rest]
{3, 6.941, 180, 1347, 0.53, 0, 1817, 1, 5.3917}
If you add more information to the patterns, you have more flexibility in how you choose elements from the table:
Cases[pt, {name_, symbol_, aNumber_, aWeight_, mp_, bp_, density_,
crust_, discovered_, rest__}
/; discovered > 1850 -> {name, symbol, discovered}]
{{"Helium", "He", 1895}}
For something interactive, you could knock up a Manipulate:
elements = pt[[All, 1]];
headings = {"symbol", "aNumber", "aWeight", "mp", "bp", "density", "crust", "discovered", "group", "ion"};
Manipulate[
Column[{
elements[[x]],
TableForm[{
headings, Cases[pt, {elements[[x]], rest__} -> rest]}]}],
{x, 1, Length[elements], 1}]

How to select based on a partial string match in Mathematica

Say I have a matrix that looks something like this:
{{foobar, 77},{faabar, 81},{foobur, 22},{faabaa, 8},
{faabian, 88},{foobar, 27}, {fiijii, 52}}
and a list like this:
{foo, faa}
Now I would like to add up the numbers for each line in the matrix based on the partial match of the strings in the list so that I end up with this:
{{foo, 126},{faa, 177}}
I assume I need to map a Select command, but I am not quite sure how to do that and match only the partial string. Can anybody help me? Now my real matrix is around 1.5 million lines so something that isn't too slow would be of added value.
Here is a starting point:
data={{"foobar",77},{"faabar",81},{"foobur",22},{"faabaa",8},{"faabian",88},{"foobar",27},{"fiijii",52}};
{str,vals}=Transpose[data];
vals=Developer`ToPackedArray[vals];
findValPos[str_List,strPat_String]:=
Flatten[Developer`ToPackedArray[
Position[StringPosition[str,strPat],Except[{}],{1},Heads->False]]]
Total[vals[[findValPos[str,"faa"]]]]
Here is yet another approach. It is reasonably fast, and also concise.
data =
{{"foobar", 77},
{"faabar", 81},
{"foobur", 22},
{"faabaa", 8},
{"faabian", 88},
{"foobar", 27},
{"fiijii", 52}};
match = {"foo", "faa"};
f = {#2, Tr @ Pick[#[[All, 2]], StringMatchQ[#[[All, 1]], #2 <> "*"]]} &;
f[data, #]& /@ match
{{"foo", 126}, {"faa", 177}}
You can use ruebenko's pre-processing for greater speed.
This is about twice as fast as his method on my system:
{str, vals} = Transpose[data];
vals = Developer`ToPackedArray[vals];
f2 = {#, Tr @ Pick[vals, StringMatchQ[str, "*" <> # <> "*"]]} &;
f2 /@ match
Notice that in this version I test substrings that are not at the beginning, to match ruebenko's output. If you want to only match at the beginning of strings, which is what I assumed in the first function, it will be faster still.
make data
mat = {{"foobar", 77},
{"faabar", 81},
{"foobur", 22},
{"faabaa", 8},
{"faabian", 88},
{"foobar", 27},
{"fiijii", 52}};
lst = {"foo", "faa"};
now select
r1 = Select[mat, StringMatchQ[lst[[1]], StringTake[#[[1]], 3]] &];
r2 = Select[mat, StringMatchQ[lst[[2]], StringTake[#[[1]], 3]] &];
{{lst[[1]], Total@r1[[All, 2]]}, {lst[[2]], Total@r2[[All, 2]]}}
gives
{{"foo", 126}, {"faa", 177}}
I'll try to make it more functional/general if I can...
edit(1)
This below makes it more general. (using same data as above):
foo[mat_, lst_] := Select[mat, StringMatchQ[lst, StringTake[#[[1]], 3]] &]
r = Map[foo[mat, #] &, lst];
MapThread[ {#1, Total[#2[[All, 2]]]} &, {lst, r}]
gives
{{"foo", 126}, {"faa", 177}}
So now same code above will work if lst was changed to 3 items instead of 2:
lst = {"foo", "faa", "fii"};
How about:
list = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa",
8}, {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};
t = StringTake[#[[1]], 3] &;
{t[#[[1]]], Total[#[[All, 2]]]} & /@ SplitBy[SortBy[list, t], t]
{{"faa", 177}, {"fii", 52}, {"foo", 126}}
I am sure I have read a post, maybe here, in which someone described a function that effectively combined sorting and splitting but I cannot remember it. Maybe someone else can add a comment if they know of it.
Edit
OK, must be bedtime; how could I forget GatherBy:
{t[#[[1]]], Total[#[[All, 2]]]} & /@ GatherBy[list, t]
{{"foo", 126}, {"faa", 177}, {"fii", 52}}
Note that for a dummy list of 1.4 million pairs this took a couple of seconds so not exactly a super fast method.
