Johansen test for cointegration in R - var

I am building a VAR model. As far as I understand from reading different articles on the Internet, I first need to check for stationarity. Using adf.test I found that my data is not stationary, so the next step is to check for a cointegration relationship. I did this with the Johansen procedure unit root / cointegration test:
library(urca)   # for ca.jo
library(vars)   # for the Canada dataset
jo_eigen <- ca.jo(Canada, type = "trace", ecdet = "trend", spec = "transitory")
The test of the first hypothesis, r = 0 (no cointegration), is rejected, which indicates that a cointegration relationship is present. Would it be correct to use a VAR model for forecasting in this case?
Thank you!
BR
Anna

Since there is statistical evidence of a cointegration relationship, you should use a VECM (vector error correction model) instead, in order to take advantage of the relationship between the time series.

Related

NLG | T5 Paraphrase results are not creative enough

I've tried to use a pre-trained model to create paraphrases of sentences.
Even after tuning top_p and top_k, the resulting paraphrases were almost the same as the original sentence.
I would like to get results that look completely different (e.g. a third person talking about the topic, or pointing to an event that illustrates it).
Below is an example of three sentences that express the same idea in different words:
Looking within is the best way to start solving the challenges in our lives.
People who are looking for solutions to their problems in the world around them will always continue to look for solutions.
The first step in healing our pain lies in our ability to "look in the mirror".
Do you think it is possible to achieve those results with more effort on fine-tuning or parameter tuning?
Models I've tried:
"ramsrigouthamg/t5_paraphraser": https://huggingface.co/ramsrigouthamg/t5_paraphraser
"Vamsi/T5_Paraphrase_Paws": https://huggingface.co/Vamsi/T5_Paraphrase_Paws

Using Learning To Rank on textual documents?

I need some help implementing Learning To Rank (LTR). It is related to my semester project and I'm totally new to this. The details are as follows:
I gathered around 90 documents and populated 10 user queries. Now I have to rank these documents against each query using three algorithms: LambdaMART, AdaRank, and Coordinate Ascent. Previously I applied clustering techniques to a vector space model, but that was easy. In this case, however, I don't know how to transform the data into the input format these algorithms expect, as I have the textual data (documents and queries) in separate .txt files. I have searched online and couldn't find a proper solution, so can anyone here please point me in the right direction, i.e. the steps to follow? I would really appreciate it.
As you said, you have already applied clustering in a vector space model; the input to these algorithms is also vectors.
Why don't you have a look at the standard dataset introduced for the learning-to-rank task (the LETOR benchmark), in which documents are represented as vectors of features?
There is also a Java implementation of these algorithms (RankLib), which may give you an idea of how to solve the problem. I hope this helps!
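To make the format concrete, here is a hedged sketch (plain Python, one illustrative TF-IDF cosine feature; real LETOR datasets use dozens of features per pair) that converts queries and documents into the `<label> qid:<qid> <featureid>:<value> # <docid>` lines RankLib reads:

```python
# Sketch: query-document pairs -> RankLib/LETOR text format,
# with TF-IDF cosine similarity as the single feature.
import math
from collections import Counter

def tf_idf_vectors(texts):
    """Build simple TF-IDF vectors (dicts word -> weight) for a list of texts."""
    docs = [t.lower().split() for t in texts]
    df = Counter(w for d in docs for w in set(d))   # document frequency
    n = len(docs)
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vecs

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def to_letor_lines(queries, documents, relevance):
    """relevance maps (qid, docid) -> graded label; unknown pairs default to 0."""
    vecs = tf_idf_vectors(queries + documents)      # shared IDF statistics
    q_vecs, d_vecs = vecs[:len(queries)], vecs[len(queries):]
    lines = []
    for qid, qv in enumerate(q_vecs, start=1):
        for docid, dv in enumerate(d_vecs, start=1):
            label = relevance.get((qid, docid), 0)
            lines.append(f"{label} qid:{qid} 1:{cosine(qv, dv):.4f} # doc{docid}")
    return lines
```

You would write one such line per query-document pair to a single training file, supply graded relevance labels yourself (even 0/1 judgments work), and pass the file to RankLib's `-train` option.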

Monte Carlo LCA on activities that have parameters with uncertainty from a SimaPro project returns constant value (no uncertainty)

I've imported a project from SimaPro in which almost every activity uses parameters that have uncertainty. When I run a Monte Carlo LCA on any of these in Brightway, the results are constant, as though the quantities have no uncertainty (the snippet shows 10 steps, but it's the same for 2000 steps).
sp = bw.SimaProCSVImporter(fp, name="All Param")
sp.apply_strategies()
sp.statistics() # returns 0 unlinked
sp.write_database(activate_parameters=True)
spdb = bw.Database("All Param")
imported_material = [act for act in spdb if 'Imported' in act['name']][0]
mciter = 10
mc_imported = bw.MonteCarloLCA({imported_material: 1}, ('IPCC 2013', 'climate change', 'GWP 100a'))
scores = [next(mc_imported) for _ in range(mciter)]
scores
[0.015027544172490276,
0.015027544172490276,
...
0.015027544172490276,
0.015027544172490276]
I'm at a loss, as everything loads without errors, and looking at the activities and exchanges shows the expected formulas, parameters, and uncertainties on parameters.
I suspect the issue might be related to the distinction between active and passive parameters described in the documentation, but I do not see how to designate these parameters as (all) "active" beyond xxx.write_database(activate_parameters=True), as in the parameterized-dataset example notebook. I also don't see how to list which parameters are active or passive, so the issue might be something else entirely.
What do I need to do to get my parameterized activities to incorporate the uncertainty from the parameters in the MC LCA? Any help would be most appreciated!
For what it's worth, they do work in the SimaPro project whence they come - the uncertainty analysis uses the uncertainty on the parameters - so I don't think the issue is in the originating project.
Thank you for any guidance you can provide!
Parameterized inventories generally don't work in Monte Carlo, as the Monte Carlo class is focused on data-point uncertainties described by PDFs. There is a separate project called presamples which allows parameterized inventories to be used in Monte Carlo through some pre-calculations; however, it doesn't have great documentation yet. Look in the docs, and at ParameterizedBrightwayModel.
Note: check your parameter names and formulas from SimaPro; Brightway is stricter about what it allows (e.g. Python is case-sensitive and has more reserved words).
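The underlying problem can be illustrated with a plain-NumPy sketch (conceptual only, not the Brightway or presamples API): once a formula has been evaluated at the parameter's static value when the database is written, the Monte Carlo loop only ever sees that one number, which is why the scores are constant. Presamples-style pre-calculation instead samples the parameter first and evaluates the formula per sample.

```python
# Conceptual sketch: one-off formula evaluation vs per-iteration re-evaluation.
import numpy as np

rng = np.random.default_rng(42)
iters = 10

# A parameter with uncertainty (normal, mu=2.0, sigma=0.5) and an exchange
# amount defined by a formula of that parameter: amount = 3 * p.

# What the MC loop effectively sees after the database is written: the
# formula was evaluated once at the static value, so every draw is equal.
static_amount = 3 * 2.0
static_draws = np.full(iters, static_amount)

# Presamples-style pre-calculation: sample the parameter first, evaluate
# the formula for each sample, and feed those varying amounts to MC.
p_samples = rng.normal(2.0, 0.5, size=iters)
precomputed_draws = 3 * p_samples        # varies from iteration to iteration
```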

Can we test or evaluate entity extraction in Rasa NLU?

Is it possible to evaluate how well my model extracts entities (and maps synonym values) in Rasa NLU?
I have tried the rasa_nlu evaluate mode; however, it seems to only work for intent classification, even though my JSON data file contains entity information, and I'd really like to know whether my entity extraction is up to the mark in various scenarios. I've used Tracy to generate the test dataset.
Actually yes - you should get scores for your entities as well.
Are you sure you added some to your training data?
Do you have an NER component in your pipeline that extracts them? Something like this:
pipeline:
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
  batch_size: 64
  epochs: 1500
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "ner_crf"
ner_crf is a conditional random field component for named entity recognition (NER).
To make sure you follow the model building correctly have a look at this tutorial:
https://hackernoon.com/build-simple-chatbot-with-rasa-part-1-f4c6d5bb1aea
As the documentation says (https://rasa.com/docs/nlu/0.12.0/evaluation/), if you are using either ner_crf or ner_duckling, the evaluation method automatically takes entity extraction performance into account. If you only use ner_synonyms, the evaluate method won't compute an output table.
Other possible pitfalls could be:
If you parse a single sentence containing a desired entity, does your trained model extract it? If not, that could be a clue that your model was unable to learn a pattern for recognizing entities.
Another possibility is that, by randomly splitting the data into train and test sets, there is no entity left in your test set to extract. Your algorithm could have learned the pattern but is never forced to apply it. Did you check whether your test set contains entities?
If I understand correctly, perhaps you are interested in something like https://github.com/RasaHQ/rasa_nlu/issues/1472? That issue was opened because, for intents, you can get an overall score and see how each intent was classified, but for entities you can only get the overall score, not how each entity was classified.
So, in short, this is still an open issue and not possible in Rasa. However, it is an issue I was asked to look at just yesterday, so I will let you know if I make any progress on it.

Close-Enough TSP implementation

I'm looking for a solution to a Close-Enough Traveling Salesman Problem (CETSP), where I have a set of nodes and need to pass within a certain distance of every one of them along a route that is as close to optimal as possible. I've found a couple of sources describing approaches to this TSP variant, but was unable to find a solver or an algorithm that I could easily use.
Do you have any suggestions for how I can go about getting a solution to my CETSP problem, whether by implementing an algorithm myself or by using an existing solver?
You can try using UFFLP. They have an example that finds the exact coordinates the salesman should pass through given a predetermined visiting sequence, so you can generate thousands of sequences and choose the best one (a simple heuristic).
Have a look at http://www.gapso.com.br/en/ufflp-en/ - you will find useful information there.
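The "generate many sequences and keep the best" heuristic can be sketched in plain Python (my own simplification, not the UFFLP example): for a fixed visiting order, walk greedily to the nearest point of each node's radius-r disk, then repeat over many random orders and keep the shortest route.

```python
# Sketch: random-restart heuristic for the Close-Enough TSP.
import math
import random

def greedy_route_length(start, nodes, r):
    """Route length when, in the given order, we move from the current
    position to the nearest point within distance r of each node."""
    x, y = start
    total = 0.0
    for nx, ny in nodes:
        d = math.hypot(nx - x, ny - y)
        step = max(0.0, d - r)       # stop at the disk boundary (0 if already inside)
        if step > 0:
            x += (nx - x) * step / d
            y += (ny - y) * step / d
            total += step
    return total

def cetsp_random_restarts(start, nodes, r, iters=2000, seed=0):
    """Try many random visiting orders; return the best order and its length."""
    rng = random.Random(seed)
    best_order, best_len = None, float("inf")
    for _ in range(iters):
        order = list(nodes)
        rng.shuffle(order)
        length = greedy_route_length(start, order, r)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len
```

Note that for a fixed order the greedy touring points are not optimal (exact CETSP formulations solve a second-order cone program for them), but this gives a cheap baseline you can compare any solver against.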
