Dear pyhf creators and contributors, first of all let me thank you for taking the time to undertake this project, it has already been very useful to me in checking simple analyses and conclusions.
From the github page, I read that "Non-asymptotic calculators" are in the To-do list. The first part of my question is what is the current status for pyhf-integrated computations using toys?
The second part is: what is the most straightforward way for someone to use the existing pyhf schema to produce computations with toys, in a pyhf model that has been provided with any nuisance parameters? Are the pyhf.infer.hypotest test statistics, shown in the "Empirical Test Statistics" hepdata_like example, capable of handling all pyhf schema modifiers and full-fledged models?
Thanks for your question. All APIs discussed in this answer correspond to the v0.6.0 API.
The first part of my question is what is the current status for pyhf-integrated computations using toys?
As of release v0.6.0, support for pseudoexperiments (toys) has been added to pyhf. You asked your question back in December 2020 and pyhf v0.6.0 was released in February 2021, so at the time of asking toys were only available in dev releases.
what is the most straight-forward way for someone to use the existing pyhf schema to produce computations with toys
In your situation the easiest way to use toys is through the pyhf.infer.hypotest API using the calctype='toybased' kwarg (which is passed through to pyhf.infer.utils.create_calculator). Both the high level interface through the hypotest API and the lower level interface through the Calculator API are explored in a bit more detail in the Using Calculators "learn" example in the docs.
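For example, with the simple hepdata_like model from the docs (the numbers below are just illustrative, not from your analysis), a toy-based CLs computation looks roughly like this:

import pyhf

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = [51.0, 48.0] + model.config.auxdata

# calctype='toybased' selects the ToyCalculator; ntoys controls the number of
# pseudoexperiments thrown
CLs_obs = pyhf.infer.hypotest(
    1.0, data, model, test_stat="qtilde", calctype="toybased", ntoys=2000
)

The same call works for a full-fledged model built from a JSON workspace via pyhf.Workspace(spec).model().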
Are the pyhf.infer.hypotest test-statistics, shown in the "Empirical Test Statistics" hepdata_like example, capable of handling all pyhf schema modifiers and full-fledged models?
Yes. The test statistics available are passed to hypotest through the test_stat kwarg (shown here for the AsymptoticCalculator, but they are the same as for the ToyCalculator).
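For instance (sketch only, reusing the model and data from above), a discovery-style test with the toy calculator would pass test_stat='q0' and a POI value of 0.0, while 'qtilde' is the usual choice for upper limits:

# test_stat can be "q0", "q", or "qtilde"; q0 requires the tested POI to be 0.0
p0 = pyhf.infer.hypotest(
    0.0, data, model, test_stat="q0", calctype="toybased", ntoys=2000
)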
Note that the default documentation website is now the one hosted and versioned with releases on ReadTheDocs: https://pyhf.readthedocs.io/. This should help avoid confusion in the future about what is in the current dev release and what is in the stable public release.
Related
I know it is the most basic question. In the video tutorial you can see a change in one Java file http://www.jhipster.tech/video-tutorial/
This change was "a user can only see his projects, except if he is the admin". This is only a minor change and works with the schema.
Now I want to extend the schema and I perform
a) only compatible changes (adding new tables and relations)
b) also some incompatible changes (e.g. fixing a typo in a table name)
My question: how does JHipster support such an evolution of the model with compatible and incompatible changes?
Regarding a) it should be possible to perform a kind of "merge" as you know the current model + changes and future model. Can this kind of evolution even be automated?
Regarding b) some things (like propagating table name changes) might even be possible to automate
I am asking because I do not know how evolution of a model in a model-driven engineering approach is supported by JHipster.
Thank you for your answer,
Florian
If you want to keep your manual changes, the jhipster upgrade command uses git to merge them with code evolutions in the generator. Otherwise, some coding conventions help a lot. You can see a presentation (in French) from Altissia about their coding conventions; the slides with code examples can be followed by non-French speakers.
Has somebody tried to implement the impact assessment methods IMPACT World+ (ref) or ReCiPe 2016 (ref) in Brightway?
The characterization factors of IMPACT World+ are available for download (beta version). The spreadsheet facilitates an implementation in SimaPro, but I guess that may cause some trouble if the biosphere flows are defined differently in Brightway2 and SimaPro (is this the case?). I have not been able to find the characterization factors for ReCiPe 2016.
I don't think anyone has done this yet. I have started with the regionalized characterization factors, but these are not complete. ReCiPe factors are here.
SimaPro makes changes to biosphere flow names and other metadata, and additionally includes a number of biosphere flows that ecoinvent doesn't, so there is a compatibility problem with the default Brightway flows.
Matching is possible - not even all that hard - but enough of a pain that it would be great if someone would do this. Ecoinvent knows it is something that people are asking for, but my understanding is that it is not currently a priority.
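Once you have mapped the spreadsheet flows onto Brightway biosphere keys, writing the method itself is straightforward. A rough sketch (the method name, unit, and CF values below are placeholders, not real IMPACT World+ numbers):

import brightway2 as bw

# each entry is (biosphere flow key, characterization factor)
cfs = [
    (("biosphere3", "some-flow-code"), 1.32),
    (("biosphere3", "another-flow-code"), 28.0),
]

method = bw.Method(("IMPACT World+", "midpoint", "some impact category"))
method.register(unit="kg CO2-eq")
method.write(cfs)

The hard part remains the flow matching mentioned above.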
How do we usually deal with versioning of an aggregate root?
I was thinking along this line (I'm in a survey-design domain).
One way to have versioning is to have an explicit method to create a new version, based on the existing one. For example, Study (an aggregate root).
So initially we have an aggregate root, whose root-entity is Study with (business) key "ABC", version "1".
By invoking the method "newVersion()" on the Study, a copy of that Study and all the other entities that belong to the same aggregate root will be created.
So basically, versioning is done through creating a separate instance (of the aggregate root). The ID is composite (business key + version).
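Roughly, something like this (an illustrative sketch, not my real code):

import copy

class Study:
    def __init__(self, business_key, version, sections):
        self.business_key = business_key   # e.g. "ABC"
        self.version = version             # e.g. 1
        self.sections = sections           # the rest of the aggregate

    @property
    def id(self):
        # composite ID: business key + version
        return (self.business_key, self.version)

    def new_version(self):
        # deep-copy the whole aggregate under the next version number
        return Study(self.business_key, self.version + 1, copy.deepcopy(self.sections))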
How do we know if it's a branch, or just one version up (1.1, or 2)? I guess this simple rule would work: if there's no further version associated, then it's "one version up" (2); if there's already another version, then it's a branch (1.1).
Another concern: noise.
But that means we cannot work on / modify an existing version. We'd have to create a new version every time we want to make modifications to our object. Every time??? Hmmm... doesn't sound right.
Or... we can make a rule like this, based on a flag (active / not-active, or published / un-published): if the flag is "not-active", we can modify the AR directly, without creating a new version. If the flag is "active", we have to either (a) set it to "not-active" first and modify, or (b) create a new version and work on that version (initially set to "not-active").
Any thoughts / experience you want to share on this matter?
I think you will find things a bit confusing in researching this question, because there are two very different concepts at play:
Versioning as a concurrency control mechanism to support optimistic concurrency
Versioning as an explicit domain concept
Versioning to support Optimistic Concurrency
Optimistic concurrency is when two simultaneous transactions are both allowed to start, but if they both try to modify the same data item, only the first write succeeds - the second detects the conflict and is rejected. See Concurrency Control for an overview of different locking strategies.
In summary, you leave versioning up to the persistence technology, because the purpose of the version is to detect simultaneous writes to the persistence layer.
When using this pattern, it's common to not even keep copies of old versions, however it's certainly possible to do so as an audit trail/change log.
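As a minimal sketch of the mechanics (the table and column names here are made up for illustration), the version check is typically a guarded UPDATE:

class ConcurrencyConflict(Exception):
    pass

def save_study(conn, study):
    # Only succeeds if nobody else has bumped the version since we loaded it
    cursor = conn.execute(
        "UPDATE study SET title = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (study.title, study.id, study.version),
    )
    if cursor.rowcount == 0:
        raise ConcurrencyConflict(f"Study {study.id} was modified concurrently")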
Versioning as an explicit domain concept
Based on your question, and the need to support potential branching strategies, it sounds like versioning is an explicit domain concept in your domain - i.e. the concept of a "Version" is something that your domain experts talk about, and working with versions is an important part of the ubiquitous language.
However, you raise a few different concepts which indicate that the domain needs further exploration:
Version branching
User-defined version naming/tagging (but still connected to a 'chain' of versions)
Explicit version changes (user requested) vs implicit version changes (automatic on every change)
If I understand your intent correctly, with explicit versioning the current 'active'/'live'/'tip' version is mutable and can be modified without tracking the change until the user 'commits' it - at that point it becomes immutable, and a new mutable 'live' version is created.
Some other concepts that may come up if you explore this further:
Branch merging (once you have split two branches, what happens if you want to bring them back together?)
Rolling back - if you have an old version, do you support 'undoing' one or more changes?
Given the above, you may also find some insights from the way that version control systems work both centralised (e.g. subversion) and distributed (e.g. git and mercurial), as they present an active working model of version tracking with a mixture of mutable and immutable elements.
The open questions here suggest to me that you need to explore this in more detail with your domain experts. With DDD sometimes it's easy to get lost in what you can do, but I strongly encourage you to try and understand what you need to do.
How do your users/domain experts think about the world? What kind of operations do they want to be able to do? What is the purpose of these operations towards their initial goal? Your aim is to distill the answers to these questions into a model that effectively encapsulates the processes they work with.
Edit to Consider Modelling
Based on your comment - my first response would be to challenge the interpretation of the word 'version' when thinking about the modified questionnaire. In fact, I'd be tempted to challenge the modelling of the template/survey relationship. Consider a possible set of entities:
Template
Defines the set of questions in the questionnaire
Supports operations:
StartSurvey
Various operations to modify the questions and options in the template etc.
Survey
Rather than referencing a 'live' template, the survey would own its own questionnaire
When you call Template.StartSurvey it returns a Survey that is prefilled with the list of questions from the template
A survey also supports modifying the questions - but this doesn't change the template it was created from
Unlike a template, a survey also maintains a list of recorded answers, and offers operations to set the answers
It probably also includes a lifecycle: in some states answering questions is permitted, but once 'submitted' you can't modify the answers (just guessing on this one).
In this world, the survey is 'stamped out' from the template, but then lives an independent life. You can modify the questionnaire in the survey all you like, and it won't affect the template.
The trade-off here is that if you do modify the template, none of the surveys that have already been created from it would get updated - but it sounds like that might be safer for you anyway?
You could also support operations to convert a survey back into a template so that if you like the look of a modified survey, you could 'templatize' it so it could be used for future surveys.
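A rough sketch of that shape (illustrative names only, ignoring persistence concerns):

import copy

class Template:
    def __init__(self, questions):
        self.questions = questions

    def start_survey(self):
        # the survey owns its own copy; later template edits don't propagate
        return Survey(copy.deepcopy(self.questions))

class Survey:
    def __init__(self, questions):
        self.questions = questions
        self.answers = {}
        self.submitted = False

    def answer(self, question_id, value):
        if self.submitted:
            raise ValueError("answers can't change after submission")
        self.answers[question_id] = value

    def submit(self):
        self.submitted = True

    def to_template(self):
        # 'templatize' a modified survey for future use
        return Template(copy.deepcopy(self.questions))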
My current understanding is that it's possible to extract entities from a text document using toolkits such as OpenNLP, Stanford NLP.
However, is there a way to find relationships between these entities?
For example consider the following text :
"As some of you may know, I spent last week at CERN, the European high-energy physics laboratory where the famous Higgs boson was discovered last July. Every time I go to CERN I feel a deep sense of reverence. Apart from quick visits over the years, I was there for three months in the late 1990s as a visiting scientist, doing work on early Universe physics, trying to figure out how to connect the Universe we see today with what may have happened in its infancy."
Entities: I (author), CERN, Higgs boson
Relationships :
- I "visited" CERN
- CERN "discovered" Higgs boson
Thanks.
Yes, absolutely. This is called Relation Extraction. Stanford has developed several useful tools for working on this problem.
Here is their website: http://deepdive.stanford.edu/relation_extraction
Here is the github repository: https://github.com/philipperemy/Stanford-OpenIE-Python
In general, here is how the process works.
results = extract_entity_relations("Barack Obama was born in Hawaii.")
print(results)
# [['Barack Obama', 'was born in', 'Hawaii']]
Note that only triples of the form (subject, predicate, object) are extracted.
You can extract verbs with their dependents using the Stanford Parser, for example. E.g., you might get "dependency chains" like
"I :: spent :: at :: CERN".
It is a much tougher task to recognise that "I spent at CERN" and "I visited CERN" and "CERN hosted my visit" (etc.) denote the same kind of event. Going into how this can be done is beyond the scope of an SO question, but you can read up on the literature on paraphrase recognition (here is one overview paper). There is also a related question on SO.
Once you can cluster similar chains, you'd need to find a way to label them. You could simply choose the verb of the most common chain in a cluster.
If, however, you have a pre-defined set of relation types you want to extract and lots of texts manually annotated for these relations, then the approach could be very different, e.g., using machine learning to learn how to recognize a relation type based on annotated data.
Don't know if you're still interested but CoreNLP added a new annotator called OpenIE (Open Information Extraction), which should accomplish what you're looking for. Check it out: OpenIE
Similar to the Stanford parser, you can also use the Google Language API, where you send a string and get a dependency tree response.
You can test this API first to see if it works well with your corpus: https://cloud.google.com/natural-language/
The outcome here is a subject-predicate-object (SPO) triplet, where the predicate describes the relationship. You'll need to traverse the dependency graph and write a script to parse out the triplet.
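A rough sketch with the google-cloud-language client (field names may differ slightly between client library versions, so treat this as an outline rather than a recipe):

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="CERN discovered the Higgs boson.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
response = client.analyze_syntax(request={"document": document})

# Each token carries a dependency edge pointing at its head token; the token
# labelled NSUBJ under the ROOT verb is your subject, DOBJ your object.
for token in response.tokens:
    edge = token.dependency_edge
    print(token.text.content, edge.label.name, "-> head", edge.head_token_index)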
There are many ways to do relation extraction. As colleagues mentioned, you have to know about NER and coreference resolution. Different techniques require different approaches. Nowadays, distant supervision is most common; for detecting the relation between entities, Freebase is often used as the knowledge base.
I have been thinking about setting up some sort of library for all our internally developed software at my organisation. I would like to collect any ideas the good SO folk may have on this topic.
I figure, what is the point of instilling in developers the benefits of writing reusable code, if on the next project the first thing they do is File -> New, due to a lack of knowledge of what code is already out there to be reused?
As an added benefit, I think that just having a library like this would encourage developers to think more in terms of reusability when writing code.
I would like to keep this library as simple as possible, perhaps my only two requirements being:
Search facility
Usable for many types of components: assemblies, web services, etc
I see the basic information required on each asset/component to be:
Name & version
Description / purpose
Dependencies
Would you record any more information?
What would be the best platform for this i.e., wiki, forum, etc?
What would make a software library like this successful vs unsuccessful?
All ideas are greatly appreciated.
Thanks
Edit:
Found these similar questions after posting:
How do you ensure code is reused correctly?
How do you foster the use of shared components in your organization?
Sounds like there is no central repository of code available at your organization. Depending on what you do, this could be because of compartmentalization of knowledge due to security restrictions, the fact that external vendor code is included in some/all of the solutions, or because your company has not yet seen the benefits of getting people to reuse, refactor, and evangelize the benefits of such a repository.
The solutions I have seen work at multiple corporations share some common attributes and take a multi-pronged approach.
Buy-in at some level from management. Usually it's a CTO/CIO that the idea resonates with; they declare it's a good thing but don't give any money to fund it. Still, they won't stand in your way, as long as they are aware that someone is going to champion the idea before that person starts soliciting code and consolidating it somewhere.
Some list of projects and the available collateral, in English. I've seen this on wikis, on SharePoint lists, and in text files within a source repository. All of them share the common attribute of some sort of front-end search server that allows full-text search over the description of a solution.
Some common share or repository for the binaries and/or code. Oftentimes a large org has different authentication/authorization methods for many different environments, and it might not be practical (or logistically possible) to share a single source repository - don't get hung up on that aspect - just try to get to the point that there is a well-known share/directory/repository that works for your org.
Always make sure there is someone listed as a contact - no one ever takes code and runs it in production without at least talking to the previous owner - and if you don't have a person they can start asking questions of right away, they might just go ahead and hit File -> New.
Unsuccessful attributes I've seen?
A quota of N submissions per engineer per time period = lots of crap starts making its way in.
No method of rating / feedback. If there is no means to favorite/rate/give some indicator that allows the cream to rise to the top, you don't go back to search often, because you can't benefit from everyone else having already slogged through the code that wasn't very good.
Lack of a feedback/email link that sends questions directly to the author's email.
Lack of ability to categorize organically. Every time there is some super-rigid hierarchy or predetermined category list, everything ends up in "other". If you use tags or similar you can avoid this.
Requiring a rigid-format design document to accompany each submission before the code is accepted - no one can ever agree on the "centralized" format of a design doc, and no one ever submits when this is required.
Just my thinking.