Error when validating impact assessment method data and issues with regional characterization factors - brightway

This question is somewhat related to this one.
I am attempting to validate the data for a new impact assessment method, prior to writing the method. The method data contains characterization factors for both global and regional interventions. I created a small toy example here.
I am trying to validate the data as follows:
my_method = Method(('my method', 'a method', 'oh what a method'))
method_data = [
    (('biosphere', 'global intervention'), 1, u'GLO'),
    (('biosphere', 'regional intervention'), 1, u'REG')
]
my_method.validate(method_data)
The following error occurs:
MultipleInvalid: expected a list # data[0]
No errors occur when attempting to write the method without validation. The error can be avoided by storing the data in lists rather than tuples.
Is this a bug in the package or am I doing something wrong?
Furthermore, I am testing specifying regional identifiers for each characterization factor (as shown in the data above). This does not seem to be required, but when specifying an identifier other than u'GLO', the impacts are not accounted for in subsequent LCA calculations. I test this in my example notebook.
Should one avoid specifying regional identifiers for characterization factors?

Validating your new method
What happens is that the validator expects your CFs to be organized as a list of lists, instead of a list of tuples:
my_method = Method(('my method', 'a method', 'oh what a method'))
method_data = [
    [('biosphere', 'global intervention'), 1, u'GLO'],
    [('biosphere', 'regional intervention'), 1, u'REG']
]
my_method.validate(method_data)
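Once validation passes, registering and writing the method follows the usual bw2data pattern; a minimal sketch (the unit value is only an example of the metadata you might supply):

my_method.register(unit='kg CO2-eq')  # any metadata you need; the unit here is illustrative
my_method.write(method_data)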
Validating an existing method
Suppose you want to copy an existing method and update some CFs (or add a location or uncertainty data). You might be tempted to use the data returned by the load() method of the Method class, but that data is not in a format the validator accepts.
method_data = Method(('CML 2001 (obsolete)',
                      'acidification potential', 'generic')).load()
...
# modify the `method_data`
my_new_method = Method(('my method', 'a method', 'oh what a method'))
my_new_method.validate(method_data)
That would yield the following error:
MultipleInvalid Traceback (most recent call last)
<ipython-input-27-2fa012f6d12b> in <module>
2 my_method = Method(('my method', 'a method', 'oh what a method'))
3 my_method.validate([list(item) for item in method_data])
----> 4 my_method.validate(method_data)
/opt/conda/lib/python3.9/site-packages/bw2data/data_store.py in validate(self, data)
277 def validate(self, data):
278 """Validate data. Must be called manually."""
--> 279 self.validator(data)
280 return True
/opt/conda/lib/python3.9/site-packages/voluptuous/schema_builder.py in __call__(self, data)
270 """Validate data against this schema."""
271 try:
--> 272 return self._compiled([], data)
273 except er.MultipleInvalid:
274 raise
/opt/conda/lib/python3.9/site-packages/voluptuous/schema_builder.py in validate_sequence(path, data)
644 errors.append(invalid)
645 if errors:
--> 646 raise er.MultipleInvalid(errors)
647
648 if _isnamedtuple(data):
MultipleInvalid: expected a list # data[0]
You must transform it into a list of lists first:
my_new_method.validate([list(item) for item in method_data])
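Putting it together, a sketch of the copy-and-modify workflow (the CF tweak is purely illustrative and assumes plain numeric factors; register() and write() are the standard bw2data calls for storing the new method):

old_method = Method(('CML 2001 (obsolete)', 'acidification potential', 'generic'))
method_data = [list(cf) for cf in old_method.load()]  # convert tuples to lists

# modify the CFs as needed, e.g. double every factor (purely illustrative,
# assumes the second element is a plain number rather than an uncertainty dict)
for cf in method_data:
    cf[1] = cf[1] * 2

my_new_method = Method(('my method', 'a method', 'oh what a method'))
my_new_method.validate(method_data)
my_new_method.register()
my_new_method.write(method_data)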

Related

Accessing Annotation of an Entity of ontology using owlready

I want to access the annotation properties of an entity from an ontology using the owlready library. In the screenshot below, I want to access the definition of the selected entity, i.e. impulse control disorder.
from owlready2 import *

onto = get_ontology("./HumanDO.owl").load()
for subclass_of_mental_health in list(onto.search(label="disease of mental health")[0].subclasses()):
    print(subclass_of_mental_health.id, subclass_of_mental_health.label)  # This gives outputs (see below)
    print(subclass_of_mental_health.definition)  # This results in error
Above is the code to access the impulse control disorder entity. I was able to access id simply using dot notation (<entity>.id), but when I try <entity>.definition I get:
['DOID:10937'] ['impulse control disorder']
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_14348\3127299132.py in <module>
1 for subclass_of_mental_health in list(onto.search(label ="disease of mental health")[0].subclasses()):
2 print(subclass_of_mental_health.id, subclass_of_mental_health.label)
----> 3 print(subclass_of_mental_health.definition)
E:\Python\PythonInstall\envs\new\lib\site-packages\owlready2\entity.py in __getattr__(Class, attr)
592 else:
593 Prop = Class.namespace.world._props.get(attr)
--> 594 if not Prop: raise AttributeError("'%s' property is not defined." % attr)
595 if issubclass_python(Prop, AnnotationProperty):
596 attr = "__%s" % attr # Do NOT cache as such in __dict__, to avoid inheriting annotations
AttributeError: 'definition' property is not defined.
The ontology looks like this, and you can download the actual file from the GitHub link here.
My final goal is to read some annotations and store them, but I am stuck at just reading them. To be clear, I am not able to access any property other than id and label.

Callback in JAX fori_loop

Is it possible to have callbacks inside a function passed to JAX fori_loop?
In my case, the callback will save to disk some of the intermediate results produced in the function.
I tried something like this:
def callback(values):
    # do something

def diffusion_loop(i, args):
    # do something
    callback(results)
    return results

final_result, _ = jax.lax.fori_loop(0, num_steps, diffusion_loop, (arg1, arg2))
But then if I use final_result or whatever was saved from the callback I get an error like this
UnfilteredStackTrace: jax._src.errors.UnexpectedTracerError: Encountered an unexpected tracer. A function transformed by JAX had a side effect, allowing for a reference to an intermediate value with type float32[1,4,64,64] wrapped in a DynamicJaxprTracer to escape the scope of the transformation.
JAX transformations require that functions explicitly return their outputs, and disallow saving intermediate values to global state.
The function being traced when the value leaked was scanned_fun at /usr/local/lib/python3.8/dist-packages/jax/_src/lax/control_flow/loops.py:1606 traced for scan.
------------------------------
The leaked intermediate value was created on line /usr/local/lib/python3.8/dist-packages/diffusers/schedulers/scheduling_pndm_flax.py:508 (_get_prev_sample).
------------------------------
When the value was created, the final 5 stack frames (most recent last) excluding JAX-internal frames were:
------------------------------
<timed exec>:81 (<module>)
<timed exec>:67 (diffusion_loop)
/usr/local/lib/python3.8/dist-packages/diffusers/schedulers/scheduling_pndm_flax.py:264 (step)
/usr/local/lib/python3.8/dist-packages/diffusers/schedulers/scheduling_pndm_flax.py:472 (step_plms)
/usr/local/lib/python3.8/dist-packages/diffusers/schedulers/scheduling_pndm_flax.py:508 (_get_prev_sample)
------------------------------
To catch the leak earlier, try setting the environment variable JAX_CHECK_TRACER_LEAKS or using the `jax.checking_leaks` context manager.
See https://jax.readthedocs.io/en/latest/errors.html#jax.errors.UnexpectedTracerError
It sounds like you want to do a callback to the host that is impure (i.e. it has a side-effect of saving values to disk) and does not return any values to the runtime. For that, one option is jax.experimental.host_callback.id_tap, discussed in the docs here.
For example:
import jax
from jax.experimental import host_callback as hcb

def callback(value, transforms):
    # do something
    print(f"callback: {value}")

def diffusion_loop(i, args):
    hcb.id_tap(callback, i)
    return args

args = (1, 2)
result, _ = jax.lax.fori_loop(0, 5, diffusion_loop, args)
callback: 0
callback: 1
callback: 2
callback: 3
callback: 4
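If the goal is to save intermediate results rather than the loop index, the intermediate values themselves can be passed through id_tap. A sketch under those assumptions (the file name and np.save call are illustrative; the callback runs on the host, so it receives ordinary NumPy values):

import jax
import jax.numpy as jnp
import numpy as np
from jax.experimental import host_callback as hcb

def save_callback(args, transforms):
    step, intermediate = args
    # Runs on the host: 'intermediate' arrives as a NumPy array here.
    np.save(f"intermediate_{int(step)}.npy", np.asarray(intermediate))

def diffusion_loop(i, carry):
    (x,) = carry
    x = x * 0.9  # stand-in for the real update step
    hcb.id_tap(save_callback, (i, x))
    return (x,)

(final_x,) = jax.lax.fori_loop(0, 5, diffusion_loop, (jnp.ones((4,)),))

Note that host callbacks are asynchronous, so the files may not be written in step order; newer JAX versions also expose jax.debug.callback, which can be used in the same way.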

Multiprocessing.Pool: can not iterate over IMapIterator object in AWS Batch because of PicklingError

I need to request a huge bulk of data from an API endpoint, and I want to use multiprocessing (vs. multithreading, due to company framework limitations).
I have a multiprocessing.Pool with predefined concurrency CONCURRENCY in a class called Batcher. The class looks like this:
class Batcher:
    def __init__(self, concurrency: int = 8):
        self.concurrency = concurrency

    def _interprete_response_to_succ_or_err(self, resp: requests.Response) -> str:
        if isinstance(resp, str):
            if "Error:" in resp:
                return "dlq"
            else:
                return "err"
        if isinstance(resp, requests.Response):
            if resp.status_code == 200:
                return "succ"
            else:
                return "err"

    def _fetch_dat_data(self, id: str) -> requests.Response:
        try:
            resp = requests.get(API_ENDPOINT)
            return resp
        except Exception as e:
            return f"ID {id} -> Error: {str(e)}"

    def _dispatch_batch(self, batch: list) -> dict:
        pool = MPool(self.concurrency)
        results = pool.imap(self._fetch_dat_data, batch)
        pool.close()
        pool.join()
        return results

    def _run_batch(self, id):
        return self._dispatch_batch(id)

    def start(self, id_list: list):
        """ In real class, this function will create smaller
        batches from bigger chunks of data """
        results = self._run_batch(id_list)
        print(
            [
                res.text
                for res in results
                if self._interprete_response_to_succ_or_err(res) == "succ"
            ]
        )
This class is called in a file like this:
if __name__ == "__main__":
    """
    the source of ids is a csv file in s3 with a single column,
    one id per line
    """
    id_list = boto3_get_object_body(my_file_name).decode().split("\n")  # custom function, works
    batcher = Batcher()
    batcher.start(id_list)
This script is part of an AWS Batch job that is triggered via the CLI. The same function runs perfectly on my local machine with the same environment as in AWS Batch. It throws
_pickle.PicklingError: Can't pickle <class 'boto3.resources.factory.s3.ServiceResource'>: attribute lookup s3.ServiceResource on boto3.resources.factory failed
in the line where I try to iterate over the IMapIterator object results generated by pool.imap().
Relevant Traceback:
for res in results
File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 870, in next
raise value
File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 537, in _handle_tasks
put(task)
File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 211, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/local/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'boto3.resources.factory.s3.ServiceResource'>: attribute lookup s3.ServiceResource on boto3.resources.factory failed
I am wondering if I am missing something blatantly obvious, or if this issue is related to the EC2 instance spun up by the Batch job. I would appreciate any kind of lead for root cause analysis.
This error happens because multiprocessing could not import the relevant datatype for duplicating data or calling the target function in the new process it started. This usually happens when the object necessary for the target function to run is created someplace the child process does not know about (for example, a class created inside the if __name__ ==... block in the main module), or if the object's __qualname__ property has been fiddled with (you might see this when using something like functools.wraps or monkey-patching in general).
Therefore, to actually "fix" this, you need to dig into your code and see if the above is true. A good place to start is the class that is raising the issue (in this case boto3.resources.factory.s3.ServiceResource): can you import this in the main module before the if __name__... block runs?
However, most of the time you can get away with simply reducing the data required to start the target function (less data = fewer chances for faults to occur). In this case, the target function you are calling in the pool is an instance method. To start this function in a new process, multiprocessing would need to pickle all the instance attributes, which might have their own instance attributes, and so on. Not only does this add overhead, it is also possible that the problem lies in a particular instance attribute. Therefore, as good practice, if your target function can run independently but is currently an instance method, change it to a staticmethod instead.
In this case, this would mean changing _fetch_dat_data to a staticmethod, and submitting it to the pool using type(self)._fetch_dat_data instead.
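For example, the relevant parts of the class from the question could look like this (a sketch of the suggested change; only the affected methods are shown, and MPool, requests, and API_ENDPOINT are the names used in the question):

class Batcher:
    def __init__(self, concurrency: int = 8):
        self.concurrency = concurrency

    @staticmethod
    def _fetch_dat_data(id: str):
        # No 'self' here, so multiprocessing only pickles the id string,
        # not the whole instance (and whatever it happens to reference).
        try:
            return requests.get(API_ENDPOINT)
        except Exception as e:
            return f"ID {id} -> Error: {str(e)}"

    def _dispatch_batch(self, batch: list) -> list:
        with MPool(self.concurrency) as pool:
            # Consume the iterator while the pool is still alive.
            return list(pool.imap(type(self)._fetch_dat_data, batch))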

Best way to extend HTTPStatus with custom value

I am extending HTTPStatus with a custom value:
from http import HTTPStatus
HTTPStatus.MY_CUSTOM_SERVICE_TIMEOUT = 573
I am wondering why I do not see that value when inspecting HTTPStatus:
>>> dir(HTTPStatus)
['ACCEPTED', 'ALREADY_REPORTED', 'BAD_GATEWAY', 'BAD_REQUEST', 'CONFLICT', 'CONTINUE', 'CREATED', 'EXPECTATION_FAILED', 'FAILED_DEPENDENCY', 'FORBIDDEN', 'FOUND', 'GATEWAY_TIMEOUT', 'GONE', 'HTTP_VERSION_NOT_SUPPORTED', 'IM_USED', 'INSUFFICIENT_STORAGE', 'INTERNAL_SERVER_ERROR', 'LENGTH_REQUIRED', 'LOCKED', 'LOOP_DETECTED', 'METHOD_NOT_ALLOWED', 'MOVED_PERMANENTLY', 'MULTIPLE_CHOICES', 'MULTI_STATUS', 'NETWORK_AUTHENTICATION_REQUIRED', 'NON_AUTHORITATIVE_INFORMATION', 'NOT_ACCEPTABLE', 'NOT_EXTENDED', 'NOT_FOUND', 'NOT_IMPLEMENTED', 'NOT_MODIFIED', 'NO_CONTENT', 'OK', 'PARTIAL_CONTENT', 'PAYMENT_REQUIRED', 'PERMANENT_REDIRECT', 'PRECONDITION_FAILED', 'PRECONDITION_REQUIRED', 'PROCESSING', 'PROXY_AUTHENTICATION_REQUIRED', 'REQUESTED_RANGE_NOT_SATISFIABLE', 'REQUEST_ENTITY_TOO_LARGE', 'REQUEST_HEADER_FIELDS_TOO_LARGE', 'REQUEST_TIMEOUT', 'REQUEST_URI_TOO_LONG', 'RESET_CONTENT', 'SEE_OTHER', 'SERVICE_UNAVAILABLE', 'SWITCHING_PROTOCOLS', 'TEMPORARY_REDIRECT', 'TOO_MANY_REQUESTS', 'UNAUTHORIZED', 'UNPROCESSABLE_ENTITY', 'UNSUPPORTED_MEDIA_TYPE', 'UPGRADE_REQUIRED', 'USE_PROXY', 'VARIANT_ALSO_NEGOTIATES', '__class__', '__doc__', '__members__', '__module__']
The value itself is available:
>>> HTTPStatus.MY_CUSTOM_SERVICE_TIMEOUT
573
Is there something strange going on? Should I approach this differently?
That's because http.HTTPStatus is an Enum, and Python doesn't really, truly have Enum as a generic type (which is why you can do what you're doing; languages that actually recognize Enum as something special generally wouldn't let you mess with it like this). Of course, Python does its best to make Enums behave as they should (immutable, iterable, mappable...).
There is actually a collections.OrderedDict underneath (Enum._member_map_) that gets created when you create a new Enum type: it reads in the members, aliases the duplicates, and adds an additional value -> member map as Enum._value2member_map_ (all of that is done by the enum.EnumMeta metaclass). When you dir() an enum, you get that map (or more precisely, the names in the Enum._member_names_ list), and any changes you may have applied at runtime don't count (otherwise it wouldn't be immutable). In other words, when you do HTTPStatus.MY_CUSTOM_SERVICE_TIMEOUT = 573 you're not extending the Enum, you're just adding a dynamic property to the Enum object in question.
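You can see this from the enum's internal bookkeeping (a quick check; _member_names_ and _value2member_map_ are internal details of the enum module):

from http import HTTPStatus

HTTPStatus.MY_CUSTOM_SERVICE_TIMEOUT = 573

# The attribute exists, but it is a plain int, not an HTTPStatus member...
print(type(HTTPStatus.MY_CUSTOM_SERVICE_TIMEOUT))  # <class 'int'>

# ...and it never entered the member bookkeeping, which is what dir() reflects:
print('MY_CUSTOM_SERVICE_TIMEOUT' in HTTPStatus._member_names_)  # False
print(573 in HTTPStatus._value2member_map_)                      # False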
You should extend your Enum types the regular, OOP way if you want to add custom members... except Python won't let you do that either. So if you really insist on doing it at runtime you can, kind of, hack the internal structure to make Python believe your enum value was there all along:
# HERE BE DRAGONS!
# DO NOT do this unless you absolutely have to.
from http import HTTPStatus

def add_http_status(name, value, phrase, description=''):
    # call our new member factory, it's essentially the `HTTPStatus.__new__` method
    new_status = HTTPStatus.__new_member__(HTTPStatus, value, phrase, description)
    new_status._name_ = name  # store the enum member's internal name
    new_status.__objclass__ = HTTPStatus.__class__  # store the enum member's parent class
    setattr(HTTPStatus, name, new_status)  # add it to the global HTTPStatus namespace
    HTTPStatus._member_map_[name] = new_status  # add it to the name => member map
    HTTPStatus._member_names_.append(name)  # append the name so it appears in __members__
    HTTPStatus._value2member_map_[value] = new_status  # add it to the value => member map
And now you can 'really' extend the HTTPStatus at runtime:
try:
    print(HTTPStatus(573))
except ValueError as e:
    print(e)

print("MY_CUSTOM_SERVICE_TIMEOUT" in dir(HTTPStatus))
add_http_status("MY_CUSTOM_SERVICE_TIMEOUT", 573, "Custom service timeout")
print("MY_CUSTOM_SERVICE_TIMEOUT" in dir(HTTPStatus))
print(HTTPStatus(573))
print(HTTPStatus.MY_CUSTOM_SERVICE_TIMEOUT.value)
print(HTTPStatus(573).phrase)
Which will give you:
573 is not a valid HTTPStatus
False
True
HTTPStatus.MY_CUSTOM_SERVICE_TIMEOUT
573
Custom service timeout
Keep in mind that this code doesn't handle aliasing, de-duplication, and the other nice things that you should absolutely be doing if you want to arbitrarily extend an Enum, so don't use duplicate or invalid values or you'll break it (in the sense that it won't work as expected afterwards). Check the additional steps taken in enum.EnumMeta.__new__() to ensure its validity.
Use the extend_enum function from the aenum library1:
import aenum
import http
aenum.extend_enum(http.HTTPStatus, 'CustomTimeout', 537, 'more helpful phrase here')
Which results in:
>>> list(http.HTTPStatus)
[<HTTPStatus.CONTINUE: 100>,
...,
<HTTPStatus.CustomTimeout: 537>]
1 Disclosure: I am the author of the Python stdlib Enum, the enum34 backport, and the Advanced Enumeration (aenum) library.

Wrapping all possible method calls of a class in a try/except block

I'm trying to wrap all methods of an existing Class (not of my creation) into a try/except suite. It could be any Class, but I'll use the pandas.DataFrame class here as a practical example.
So if the invoked method succeeds, we simply move on. But if it should generate an exception, it is appended to a list for later inspection/discovery (although the below example just issues a print statement for simplicity).
(Note that the kinds of data-related exceptions that can occur when a method on the instance is invoked aren't yet known; that's the reason for this exercise: discovery.)
This post was quite helpful (particularly @martineau's Python 3 answer), but I'm having trouble adapting it. Below, I expected the second call to the (wrapped) info() method to emit print output but, sadly, it doesn't.
#!/usr/bin/env python3
import functools, sys, types, pandas

def method_wrapper(method):
    @functools.wraps(method)
    def wrapper(*args, **kwargs):  # Note: args[0] points to 'self'.
        try:
            print('Calling: {}.{}()... '.format(args[0].__class__.__name__,
                                                method.__name__))
            return method(*args, **kwargs)
        except Exception:
            print('Exception: %r' % sys.exc_info())  # Something trivial.
            # <Actual code would append that exception info to a list>.
    return wrapper

class MetaClass(type):
    def __new__(mcs, class_name, base_classes, classDict):
        newClassDict = {}
        for attributeName, attribute in classDict.items():
            if type(attribute) == types.FunctionType:  # Replace it with a
                attribute = method_wrapper(attribute)  # decorated version.
            newClassDict[attributeName] = attribute
        return type.__new__(mcs, class_name, base_classes, newClassDict)

class WrappedDataFrame2(MetaClass('WrappedDataFrame',
                                  (pandas.DataFrame, object,), {}),
                        metaclass=type):
    pass

print('Unwrapped pandas.DataFrame().info():')
pandas.DataFrame().info()

print('\n\nWrapped pandas.DataFrame().info():')
WrappedDataFrame2().info()
print()
This outputs:
Unwrapped pandas.DataFrame().info():
<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrame
Wrapped pandas.DataFrame().info(): <-- Missing print statement after this line.
<class '__main__.WrappedDataFrame2'>
Index: 0 entries
Empty WrappedDataFrame2
In summary,...
>>> unwrapped_object.someMethod(...)
# Should be mirrored by ...
>>> wrapping_object.someMethod(...)
# Including signature, docstring, etc. (i.e. all attributes); except that it
# executes inside a try/except suite (so I can catch exceptions generically).
Long time no see. ;-) In fact, it's been such a long time you may no longer care, but in case you (or others) do...
Here's something I think will do what you want. I've never answered your question before now because I don't have pandas installed on my system. However, today I decided to see if there was a workaround for not having it and created a trivial dummy module to mock it (only as far as I needed). Here's the only thing in it:
mockpandas.py:
""" Fake pandas module. """
class DataFrame:
def info(self):
print('pandas.DataFrame.info() called')
raise RuntimeError('Exception raised')
Below is code that seems to do what you need by implementing @Blckknght's suggestion of iterating through the MRO (but ignoring the limitations noted in his answer that could arise from doing it that way). It ain't pretty, but as I said, it seems to work with at least the mocked pandas library I created.
import functools
import mockpandas as pandas  # mock the library
import sys
import traceback
import types

def method_wrapper(method):
    @functools.wraps(method)
    def wrapper(*args, **kwargs):  # Note: args[0] points to 'self'.
        try:
            print('Calling: {}.{}()... '.format(args[0].__class__.__name__,
                                                method.__name__))
            return method(*args, **kwargs)
        except Exception:
            print('An exception occurred in the wrapped method {}.{}()'.format(
                args[0].__class__.__name__, method.__name__))
            traceback.print_exc(file=sys.stdout)
            # (Actual code would append that exception info to a list)
    return wrapper

class MetaClass(type):
    def __new__(meta, class_name, base_classes, classDict):
        """ See if any of the base classes were created by the with_metaclass() function. """
        marker = None
        for base in base_classes:
            if hasattr(base, '_marker'):
                marker = getattr(base, '_marker')  # remember class name of temp base class
                break  # quit looking
        if class_name == marker:  # temporary base class being created by with_metaclass()?
            return type.__new__(meta, class_name, base_classes, classDict)
        # Temporarily create an unmodified version of the class so its MRO can be used below.
        TempClass = type.__new__(meta, 'TempClass', base_classes, classDict)
        newClassDict = {}
        for cls in TempClass.mro():
            for attributeName, attribute in cls.__dict__.items():
                if isinstance(attribute, types.FunctionType):
                    # Convert it to a decorated version.
                    attribute = method_wrapper(attribute)
                newClassDict[attributeName] = attribute
        return type.__new__(meta, class_name, base_classes, newClassDict)

def with_metaclass(meta, classname, bases):
    """ Create a class with the supplied bases and metaclass, that has been tagged with a
        special '_marker' attribute.
    """
    return type.__new__(meta, classname, bases, {'_marker': classname})

class WrappedDataFrame2(
        with_metaclass(MetaClass, 'WrappedDataFrame', (pandas.DataFrame, object))):
    pass

print('Unwrapped pandas.DataFrame().info():')
try:
    pandas.DataFrame().info()
except RuntimeError:
    print(' RuntimeError exception was raised as expected')

print('\n\nWrapped pandas.DataFrame().info():')
WrappedDataFrame2().info()
Output:
Unwrapped pandas.DataFrame().info():
pandas.DataFrame.info() called
RuntimeError exception was raised as expected
Wrapped pandas.DataFrame().info():
Calling: WrappedDataFrame2.info()...
pandas.DataFrame.info() called
An exception occurred in the wrapped method WrappedDataFrame2.info()
Traceback (most recent call last):
File "test.py", line 16, in wrapper
return method(*args, **kwargs)
File "mockpandas.py", line 9, in info
raise RuntimeError('Exception raised')
RuntimeError: Exception raised
As the above illustrates, the method_wrapper() decorated version is being used by the methods of the wrapped class.
Your metaclass only applies your decorator to the methods defined in classes that are instances of it. It doesn't decorate inherited methods, since they're not in the classDict.
I'm not sure there's a good way to make it work. You could try iterating through the MRO and wrapping all the inherited methods as well as your own, but I suspect you'd get into trouble if there were multiple levels of inheritance after you start using MetaClass (as each level will decorate the already decorated methods of the previous class).
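To illustrate the classDict point with a toy example (the Base/Child names are hypothetical and unrelated to pandas):

class Base:
    def inherited(self):
        pass

class Child(Base):
    def own(self):
        pass

# Only methods defined directly on Child end up in its class dict, which is
# what the metaclass receives as classDict; inherited ones live on Base and
# are only found via the MRO.
print('own' in vars(Child))        # True
print('inherited' in vars(Child))  # False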
