Brightway ExcelImporter fails for new biosphere exchanges

I would like to import a formatted Excel file into brightway2 that contains custom biosphere exchanges.
For example, let's say I create the following biosphere activity:
import brightway2 as bw
ef = bw.Database("biosphere3").new_activity(code="foo")
ef['name'] = "bar"
ef['unit'] = "baz"
ef['categories'] = ('undefined',)
ef['type'] = 'new type'
ef.save()
Then I have an Excel file with a biosphere exchange that specifies the name ('foo'), the database ('biosphere3'), a type ('biosphere'), categories ('undefined'), and a unit ('baz').
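For reference, a rough sketch of the exchange block in the spreadsheet (layout assumed from the bw2io example files, not copied from my actual file; the columns mirror the fields just listed):
Activity, some process
Exchanges
name, database, type, categories, unit, amount
foo, biosphere3, biosphere, undefined, baz, 1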
If I try importing the Excel file, my biosphere exchange remains unlinked:
from bw2io import ExcelImporter

imp = ExcelImporter(my_file)
imp.apply_strategies()
imp.match_database(fields=('name', 'unit', 'location'))
imp.statistics()
Gives: Type biosphere: 1 unique unlinked exchanges
However, if I do this:
import functools
from bw2io.strategies.generic import link_iterable_by_fields
imp.apply_strategy(functools.partial(
    link_iterable_by_fields,
    other=(obj for obj in bw.Database("biosphere3")),
    kind="biosphere",
    fields=["name", "categories", "unit"]
))
Then all is good.
Why will the standard strategies not work?
I thought it might have something to do with a difference between the imposed code foo and the code generated by set_code_by_activity_hash, but even when I have a code column (which should prevent a new code from being assigned to the exchange in the Excel file), the standard strategies don't get me 100% there.
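One way to probe this (a rough sketch; activity_hash is the helper bw2io's strategies use when linking, and the field values are the ones from the example above):
from bw2io.utils import activity_hash

excel_side = {"name": "foo", "categories": ("undefined",), "unit": "baz"}
stored = bw.Database("biosphere3").get("foo")
db_side = {k: stored[k] for k in ("name", "categories", "unit")}
print(activity_hash(excel_side, fields=["name", "categories", "unit"]))
print(activity_hash(db_side, fields=["name", "categories", "unit"]))
If the two hashes differ, the field values rather than the codes are what blocks the link.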
Is there something wrong with the way I'm creating the biosphere activity or the way I'm defining the Excel file fields?

Related

target_transform in torchvision.datasets.ImageFolder seems not to work

I am using PyTorch 1.13 with Python 3.10.
I have a problem where I import pictures from a folder structure using
data = ImageFolder(root='./faces/', loader=img_loader, transform=transform,
is_valid_file=is_valid_file)
In this command, labels are assigned automatically according to which subdirectory an image belongs to.
I wanted to assign different labels and use target_transform for this purpose (e.g. I wanted to use a word from the file name to assign an appropriate label).
I have used
def target_transform(id):
    print(2)
    return id * 2
data = ImageFolder(root='./faces/', loader=img_loader, transform=transform, target_transform=target_transform, is_valid_file=is_valid_file)
Next,
data = ImageFolder(root='./faces/', loader=img_loader, transform=transform, target_transform=lambda id:2*id, is_valid_file=is_valid_file)
or
data = ImageFolder(root='./faces/', loader=img_loader, transform=transform,
                   target_transform=torchvision.transforms.Lambda(lambda id: 2 * id),
                   is_valid_file=is_valid_file)
But none of these affect the labels. In addition, in the first example I included the print statement to see whether the function is called, but it is not. I have searched for uses of this function, but the examples I have found do not work and the documentation is scarce in this respect. Any idea what is wrong with the code?
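One detail worth checking (an assumption based on how torchvision's DatasetFolder is written): target_transform is applied lazily inside __getitem__, not when the dataset object is constructed, so the print would only fire once an item is actually accessed:
img, label = data[0]  # indexing the dataset should trigger target_transform
print(label)          # the doubled label should appear here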

Stuck using pandas to build RPG item generator

I am trying to build a simple random item generator for a game I am working on.
So far I am stuck trying to figure out how to store and access all of the data. I went with pandas using .csv files to store the data sets.
I want to add weighted probabilities to what items are generated so I tried to read the csv files and compile each list into a new set.
I got the program to pick a random set but got stuck when trying to pull a random row from that set.
I am getting an error when I use .sample() to pull the item row, which makes me think I don't understand how pandas works. I think I need to be creating new lists so I can later index and access the various statistics of the items once one is selected.
Once I pull the item, I intend to add effects that would change the displayed damage, armor, and so on. So I was thinking of having the new item be its own list, and then using damage = item[2] + 3 or whatever I need.
The error is: AttributeError: 'list' object has no attribute 'sample'
Can anyone help with this problem? Maybe there is a better way to set up the data?
Here is my code so far:
import pandas as pd
import random

df = [pd.read_csv('weapons.csv'), pd.read_csv('armor.csv'), pd.read_csv('aether_infused.csv')]

def get_item():
    # this part seemed to work: printing item_class showed one of the entire lists at the correct odds
    item_class = [random.choices(df, weights=(45, 40, 15), k=1)]
    item = item_class.sample()
    print(item)  # to see if the program is working

get_item()
I think you are getting slightly confused with lists vs. list elements. This should work; I stubbed your dfs with simple ones:
import pandas as pd
import random

# Actual data. Comment it out if you do not have the csv files.
df = [pd.read_csv('weapons.csv'), pd.read_csv('armor.csv'), pd.read_csv('aether_infused.csv')]

# My stubs -- uncomment and use this instead of the line above if you want to run this specific example
# df = [pd.DataFrame({'weapons': ['w1', 'w2']}), pd.DataFrame({'armor': ['a1', 'a2', 'a3']}), pd.DataFrame({'aether': ['e1', 'e2', 'e3', 'e4']})]

def get_item():
    # I removed [] from the line below -- choices() already returns a list of length 1
    item_class = random.choices(df, weights=(45, 40, 15), k=1)
    # I added [0] to choose the first element of item_class, which is a list of length 1 from the line above
    item = item_class[0].sample()
    print(item)  # to see if the program is working

get_item()
This prints random rows from the random dataframes that I set up, such as:
weapons
1 w2
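As a follow-up to the "access the statistics" part: .sample() returns a one-row DataFrame, so a sketch like the following would read stats off the sampled item (a column name like 'damage' is hypothetical here, not part of my stubs):
row = item.iloc[0]          # take the sampled row as a Series
damage = row['damage'] + 3  # hypothetical 'damage' column, plus an effect bonus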

Python: creating a pie chart using an existing object?

I'm working on a dataset called 'Crime Against Women in India'.
I got the dataset from the website and tidied up the data using Excel.
For data manipulation and visualization I'm using Python 3 in Jupyter Notebook (version 5.0.0). Here's the code I have so far.
# importing Libraries
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
# Reading CSV File and naming the object called crime
crime=pd.read_csv("C:\\Users\\aneeq\\Documents\\python assignment\\crime.csv",index_col = None, skipinitialspace = True)
print(crime)
Now I can see my data. What I want to do is find out which type of crime has the highest value against women in India in 2013. That's simple, and I did it using the following code:
Type = crime.loc[(crime.AreaName.isin(['All-India'])) & (crime.Year.isin([2013])) , ['Year', 'AreaName', 'Rape', 'Kidnapping', 'DowryDeaths', 'Assault', 'Insult', 'Cruelty']]
print(Type)
The result looks like this:
Year AreaName Rape Kidnapping DowryDeaths Assault Insult Cruelty
2013 All-India 33707 51881 8083 70739 12589 118866
Now, the next part is where I'm struggling at the moment. I want to make a pie chart of the crime types and their values. You can see that Cruelty ('Cruelty by Husband or his relatives') has a higher value than the others.
I want to display 'Rape', 'Kidnapping', 'DowryDeaths', 'Assault', 'Insult' and 'Cruelty' on the pie chart (using matplotlib), not 'Year' and 'AreaName'.
This is my code so far:
exp_val = Type.Rape, Type.Kidnapping, Type.DowryDeaths, Type.Assault, Type.Insult, Type.Cruelty
plt.pie(exp_val)
I'm not sure if my code is right, but anyway I got an error saying 'KeyError: 0'.
Can anyone help me with this? What is the right code for displaying a pie chart from an existing object?
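In case it helps, a minimal sketch of one way to do it, assuming the Type DataFrame from above with its single matched row: select only the crime-type columns, take the row's scalar values, and pass them to plt.pie with the column names as labels.
cols = ['Rape', 'Kidnapping', 'DowryDeaths', 'Assault', 'Insult', 'Cruelty']
row = Type.iloc[0]  # the single 2013 All-India row as a Series
plt.pie(row[cols], labels=cols, autopct='%1.1f%%')
plt.title('Crimes against women in India, 2013')
plt.show()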

Accessing HDF5 file structure while omitting certain groups and datasets

I would like to access an HDF5 file structure with h5py, where the groups and data sets are stored as follows:
/Group 1/Sub Group 1/*/Data set 1/
where the asterisk signifies a sub-sub group which has a unique address. However, its address is irrelevant, since I am simply interested in the data sets it contains. How can I access any random sub-sub group without having to specify its unique address?
Here is a script for a specific case:
import h5py as h5
deleteme = h5.File("deleteme.hdf5", "w")
nobody_in_particular = deleteme.create_group("/grp_1/subgr_1/nobody_in_particular/")
dt = h5.special_dtype(vlen=str)
dataset_1 = nobody_in_particular.create_dataset("dataset_1",(1,),dtype=dt)
dataset_1.attrs[str(1)] = "Some useful data 1"
dataset_1.attrs[str(2)] = "Some useful data 2"
deleteme.close()
# access data from nobody_in_particular subgroup and do something
deleteme = h5.File("deleteme.hdf5", "r")
deleteme["/grp_1/subgr_1/nobody_in_particular/dataset_1"]
This gives output:
<HDF5 dataset "dataset_1": shape (1,), type "|O">
Now I wish to accomplish the same result, but without knowing who (or which group) in particular. Any random subgroup in place of nobody_in_particular will do for me. How can I access this random subgroup?
In other words:
deleteme["/grp_1/subgr_1/<any random sub-group>/dataset_1"]
Assuming you only want to read and not create groups/datasets, using visit (http://docs.h5py.org/en/latest/high/group.html#Group.visit) with a suitable function will allow you to select the desired groups/datasets.
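For example, a short sketch against the file created above, using the sibling method visititems (which passes each member's name and object to a callable) to pick up every dataset named dataset_1 regardless of the intermediate subgroup:
import h5py as h5

found = []

def collect(name, obj):
    # called for every group and dataset below the starting group
    if isinstance(obj, h5.Dataset) and name.split("/")[-1] == "dataset_1":
        found.append(obj)

with h5.File("deleteme.hdf5", "r") as f:
    f["/grp_1/subgr_1"].visititems(collect)
    for ds in found:
        print(ds, dict(ds.attrs))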

Saving multiple activities to database in a loop in Brightway

I have a loop that generates data and writes it to a database:
myDatabase = Database('myDatabase')
for i in range(10):
    # some code here that generates dictionaries that can be saved as activities
    myDatabase.write({('myDatabase', 'valid code'): activityDict})
Single activities thus created can be saved to the database. However, when creating more than one, the length of the database is always 1 and only the last activity makes its way to the database.
Because I have lots of very big datasets, it is not convenient to store all of them in a single dictionary and write to the database all at once.
Is there a way to incrementally add activities to an existing database?
Normal activity writing
Database.write() will replace the entire database. The best approach is to create the database in Python, and then write the entire thing:
data = {}
for i in range(10):
    # some code here that generates data
    data['foo'] = 'bar'
Database('myDatabase').write(data)
Dynamically generating datasets
However, if you are dynamically creating aggregated datasets from an existing database, you can create the individual datasets in a custom generator. This generator will need to support the following:
__iter__: Returns the database keys. Used to check that each dataset belongs to the database being written. Therefore we only need to return the first element.
__len__: Number of datasets to write.
keys: Used to add keys to mapping.
values: Used to add activity locations to geomapping. As the locations will be the same in our source database and aggregated system database, we can just give the original datasets here.
items: The new keys and datasets.
Here is the code:
import copy
from brightway2 import Database, LCA

class IterativeSystemGenerator(object):
    def __init__(self, from_db_name, to_db_name):
        self.source = Database(from_db_name)
        self.new_name = to_db_name
        self.lca = LCA({self.source.random(): 1})
        self.lca.lci(factorize=True)

    def __len__(self):
        return len(self.source)

    def __iter__(self):
        yield (self.new_name,)

    def get_exchanges(self):
        vector = self.lca.inventory.sum(axis=1)
        assert vector.shape == (len(self.lca.biosphere_dict), 1)
        return [{
            'input': flow,
            'amount': float(vector[index]),
            'type': 'biosphere',
        } for flow, index in self.lca.biosphere_dict.items()
          if abs(float(vector[index])) > 1e-17]

    def keys(self):
        for act in self.source:
            yield (self.new_name, act['code'])

    def values(self):
        for act in self.source:
            yield act

    def items(self):
        for act in self.source:
            self.lca.redo_lci({act: 1})
            obj = copy.deepcopy(act._data)
            obj['database'] = self.new_name
            obj['exchanges'] = self.get_exchanges()
            yield ((self.new_name, obj['code']), obj)
And usage:
new_name = "ecoinvent 3.2 cutoff aggregated"
new_data = IterativeSystemGenerator("ecoinvent 3.2 cutoff", new_name)
Database(new_name).write(new_data)
Limitations of this approach
If you are writing so many datasets or exchanges within datasets that you are running into memory problems, then you are probably using the wrong tool. The current system of database tables and matrix builders uses sparse matrices; in this case, dense matrices would make much more sense. For example, the IO table backend skips the database entirely, and just writes processed arrays. It will take a long time to load and create the biosphere matrix if it has 13,000 × 1,500 ≈ 20,000,000 entries. In this specific case, my first instinct is to try one of the following:
Don't write the biosphere flows into the database, but save them separately per aggregated process, and then add them after the inventory calculation (see the sketch after these options).
Create a separate database for each aggregated system process.
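A rough sketch of the first option, under the same assumptions as the generator above (database name assumed; one dense NumPy array saved per process, to be re-attached after the inventory calculation):
import numpy as np
from brightway2 import Database, LCA

source = Database("ecoinvent 3.2 cutoff")
lca = LCA({source.random(): 1})
lca.lci(factorize=True)

for act in source:
    lca.redo_lci({act: 1})
    # one dense biosphere vector per aggregated process, stored outside the database
    vector = np.asarray(lca.inventory.sum(axis=1)).ravel()
    np.save("biosphere_{}.npy".format(act["code"]), vector)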
