Migration from GaussianProcessor to GaussianProcessRegressor


I am attempting to migrate some old Python code that uses the scikit-learn library.
While doing so I encountered the GaussianProcess class, which has since been fully reimplemented as GaussianProcessRegressor.
I was able to get a running script by replacing
self.f = GaussianProcess(corr='linear',theta0=1e-2,thetaL=1e-4,thetaU=1e-1)
with
self.f = GaussianProcessRegressor()
except that I now get completely different results when calling predict()...
Any idea how to translate the autocorrelation model (corr) and the different theta values to the new API?
I found this topic, which discusses pretty much the same problem, but apparently the author was fine with not carrying over the old parameters, and this topic, which states the problem precisely as well but does not provide a clear answer.
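For what it is worth, here is a minimal sketch of how correlation parameters can be passed to the new API through a kernel object. It rests on several assumptions that are not in the question: as far as I can tell the new kernels module has no direct counterpart of corr='linear', so RBF is swapped in as a stand-in, and the theta conversion below only holds for the old squared-exponential correlation. The helper names theta_to_length_scale and build_regressor are hypothetical. The old class also standardised its inputs by default, if I recall correctly, so the numbers may not carry over exactly.

```python
import math


def theta_to_length_scale(theta):
    """Convert an old-style theta to an RBF length scale.

    Old squared-exponential correlation: exp(-theta * d**2)
    New RBF kernel:                      exp(-d**2 / (2 * l**2))
    Equating the exponents gives l = 1 / sqrt(2 * theta).
    """
    return 1.0 / math.sqrt(2.0 * theta)


def build_regressor(theta0=1e-2, thetaL=1e-4, thetaU=1e-1):
    # Imported here so the conversion helper above works without scikit-learn.
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # A large theta means a short length scale, so the bounds swap places.
    kernel = RBF(
        length_scale=theta_to_length_scale(theta0),
        length_scale_bounds=(
            theta_to_length_scale(thetaU),
            theta_to_length_scale(thetaL),
        ),
    )
    # normalize_y=True is an assumption meant to mimic the old default scaling.
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True)


# Show the length scale that corresponds to theta0 from the question.
print(theta_to_length_scale(1e-2))
```

With scikit-learn installed, self.f = build_regressor() would replace the bare GaussianProcessRegressor() call; whether the predictions then match the old ones still has to be checked against known data.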

Related

Python sentiment / text analysis advice

I don't know if this is the right place to ask this, but I am trying to build a bot in Python that will read incoming messages on a Slack channel where customers post their issues, such as 'unable to connect to VPN', 'can someone reply to my ticket', etc.
The bot will analyze the message, determine whether the customer is angry or not, and then propose a solution until an agent is free to actually check the issue.
I have been experimenting with TextBlob for the sentiment analysis part, but I don't know which technologies to use to determine the issue based on specific keywords and provide a solution to the user. Can someone suggest some Python libraries/technologies that I could use to achieve this?

To be honest, your question is too generic to answer in one go.
Nonetheless, you first have to clearly define the scope of your project. In doing so, you might want to start with a quick literature review (Google Scholar) to familiarize yourself with the state-of-the-art technologies and methods.
From my limited experience, a common (maybe simple) technique for determining the sentiment of a word is the lexicon-based approach: use a pre-compiled dictionary (you can create your own, though) that contains word-to-sentiment mappings. For example:
word:tired, sentiment:negative, score:5
So each time the bot finds the keyword "tired" in a sentence it will assign its corresponding negative value (polarity) to the sentence.
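A minimal sketch of that lexicon lookup, just to make the idea concrete. The lexicon entries and the function names are made up for illustration, and I use signed scores (negative number = negative sentiment) instead of a separate sentiment field:

```python
import re

# Hypothetical lexicon: word -> signed score (negative = unhappy customer).
LEXICON = {
    "tired": -5,
    "unable": -3,
    "angry": -4,
    "thanks": 2,
    "great": 3,
}


def sentence_polarity(text, lexicon=LEXICON):
    """Sum the scores of every lexicon word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def label(text):
    """Map the summed score to a coarse sentiment label."""
    score = sentence_polarity(text)
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


print(label("I am tired of being unable to connect to VPN"))
```

The same dictionary idea can be reused for issue detection, e.g. mapping keywords such as "vpn" or "ticket" to canned replies, but that mapping would have to come from your own support data.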
You might want to consider applying POS tags to the input text, as sometimes nouns or verbs carry significant meaning compared to adjectives, for example.
Keep in mind, though, that negative comments can be written in the form of sarcasm, and sarcasm detection is a more difficult task.
Alternatively, you could try using a pre-trained model such as bert-base-multilingual-uncased-sentiment that can be found here in Hugging Face.
For more information on the matter you can have a look at this post.
Again as I mentioned, you have to clearly define your goals. This will enable you to specify the libraries or methodology available to solve your problem. Hope my answer helps.

Alloy API: Decompile into .als

BLUF: Can I export a .als file corresponding to a model I have created with the Alloy API?
Example: I have a module that I read in using edu.mit.csail.sdg.alloy4compiler.parser.CompUtil. I then add signatures and facts to create a modified model in memory. Can I "de-parse" that and basically invert the lexer (edu.mit.csail.sdg.alloy4compiler.parser.CompLexer) to get a .als file somehow?
It seems like there ought to be a way to decompile the model in memory and save that as code to be later altered, but I'm having trouble identifying a path to that in the Alloy API Javadocs. I'm building a translator from select behavioral aspects of UML/SysML as part of some research, so I'm trying to figure out if there is something extant I can take advantage of or if I need to create it.

It seems a similar question has been asked before: Generating .als files corresponding to model instances with Alloy API
In the linked post, https://stackoverflow.com/users/2270610/lo%c3%afc-gammaitoni stated that he has written a solution for this in his Lightning application and that he may include the source code for this task. I'm unsure whether he has uploaded the solution yet.

Reading a grib2 message into an Iris cube

I am currently exploring the notion of using iris in a project to read forecast grib2 files using python.
My aim is to load/convert a grib message into an iris cube based on a grib message key having a specific value.
I have experimented with iris-grib, which uses gribapi. Using iris-grib I have not been able to find the key in the grib2 file, although the key is visible with 'grib_ls -w...' via the CLI.
gribapi does the job, but I am not sure how to interface it with iris (which is what, I assume, iris-grib is for).
I was wondering if anyone knew of a way to get a message into an iris cube based on a grib message key having a specific value. Thank you

You can get at anything that the gribapi understands through the low-level grib interface in iris-grib, which is the iris_grib.GribMessage class.
Typically you would use for msg in GribMessage.messages_from_filename(xxx): and then access it like e.g. msg.sections[4]['productDefinitionTemplateNumber']; msg.sections[4]['parameterNumber'] and so on.
You can use this to identify required messages, and then convert to cubes with iris_grib.load_pairs_from_fields().
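Roughly, the select-then-convert step could look like the sketch below. The helper names select_messages and load_matching_cubes are hypothetical, and the section number and key you pass in depend on your data; only GribMessage.messages_from_filename and iris_grib.load_pairs_from_fields come from iris-grib itself.

```python
def select_messages(messages, section, key, value):
    """Yield only the messages whose given section key equals value."""
    for msg in messages:
        if msg.sections[section][key] == value:
            yield msg


def load_matching_cubes(filename, section, key, value):
    """Load cubes only for messages matching one key/value pair."""
    # Imported lazily so select_messages can be tried without iris-grib.
    import iris_grib
    from iris_grib.message import GribMessage

    messages = GribMessage.messages_from_filename(filename)
    wanted = select_messages(messages, section, key, value)
    # load_pairs_from_fields yields (cube, message) pairs.
    return [cube for cube, _ in iris_grib.load_pairs_from_fields(wanted)]
```

For example, load_matching_cubes("forecast.grib2", 4, "parameterNumber", 2) would keep only messages whose section 4 parameterNumber is 2 (the filename and value here are placeholders).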
However, Iris-grib only knows how to translate specific encodings into cubes : it is quite strict about exactly what it recognises, and will fail on anything else. So if your data uses any unrecognised templates or data encodings it will definitely fail to load.
I'm just anticipating that you may have something unusual here, so that might be an issue?
You can possibly check your expected message contents against the translation code at iris_grib:_load_convert.py, starting at the convert() routine.
To get an Iris cube out of something not yet supported, you would either:
(a) extend the translation rules (i.e. a GitHub PR), or
(b) sometimes you can modify the message so that it looks like something that can be recognised.
Failing that, you can
(c) simply build an Iris cube yourself from the data found in your GribMessage: that can be a little simpler than using 'gribapi' directly (possibly not, depending on detail).
If you have a problem like that, you should definitely raise it as an issue on the GitHub project (iris-grib issues) and we will try to help.
P.S. as you have registered a Python3 interest, you may want to be aware that the newer "ecCodes" replacement for gribapi should shortly be available, making Python3 support for grib data possible at last.
However, the Python3 version is still in beta and we are presently experiencing some problems with it, now raised with ECMWF, so it is still almost-but-not-quite achievable.

Using keras model in pyspark lambda map function

I want to use the model to predict scores in map lambda function in PySpark.
def inference(user_embed, item_embed):
    feats = user_embed + item_embed
    dnn_model = load_model("best_model.h5")
    infer = dnn_model.predict(np.array([feats]), verbose=0, steps=1)
    return infer

iu_score = iu.map(lambda x: Row(userid=x.userid, entryid=x.entryid, score=inference(x.user_embed, x.item_embed)))
It runs extremely slowly and gets stuck at the final stage shortly after the code starts running.
[Stage 119:==================================================>(4048 + 2) / 4050]
In the htop monitor, only 2 of the 80 cores are under full load; the other cores seem to be idle.
So what should I do to make the model predict in parallel? iu has 300 million rows, so efficiency is important for me.
Thanks.
I have set verbose=1 and the predict log appears, but it seems that the predictions run one by one instead of in parallel.

While writing this response I did a little research and found the question interesting.
First, if efficiency is really important, invest a little time in recoding the whole thing without Keras. You can still use TensorFlow's high-level API (Models) and, with a little effort, extract the parameters and assign them to the new model. With so many wrapper implementations layered on top of the framework (is TensorFlow not a rich enough framework?), you will most likely run into backward-compatibility problems when upgrading. Really not recommended for production.
Having said that, can you inspect what exactly the problem is? For instance, are you using GPUs? Maybe they are overloaded? Can you wrap the whole thing so that it does not exceed some capacity and use a prioritizing system? You can use a simple queue if there are no priorities. You can also check whether you really terminate TensorFlow's sessions, or whether the same machine runs many models that interfere with each other. Many issues could be the reason for this phenomenon, so it would be great to have more details.
Regarding the parallel computation: you didn't implement anything that actually opens a thread or a process for this model, so I suspect that PySpark just can't handle the whole thing on its own. Maybe the implementation (honestly, I didn't read the whole PySpark documentation) assumes that the dispatched functions run fast enough, and so the work is not distributed as it should be. PySpark is simply a sophisticated implementation of map-reduce principles. The dispatched function plays the role of a mapping function in a single step, which can be problematic in your case. Although it is passed as a lambda expression, you should inspect more carefully which instances are slow and on which machines they are running.
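One thing that stands out in the question's code is that load_model is called once per row. A pattern that may help, sketched under assumptions, is to score a whole partition at a time: load the model once, then predict in batches. The names score_partition and _score_batch are hypothetical, I am assuming the embeddings are plain lists (so + concatenates them), and the Spark wiring is only shown as a comment because I cannot test it against your cluster.

```python
def score_partition(rows, load_model, batch_size=1024):
    """Score one partition: load the model once, then predict in batches.

    rows       -- iterable of (userid, entryid, user_embed, item_embed)
    load_model -- zero-argument callable returning an object with .predict()
    """
    model = load_model()  # once per partition, not once per row
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from _score_batch(model, batch)
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        yield from _score_batch(model, batch)


def _score_batch(model, batch):
    # Concatenate the two embeddings of every row, as in the question.
    # For a real Keras model, wrap feats in np.array(...) before predict.
    feats = [list(r[2]) + list(r[3]) for r in batch]
    scores = model.predict(feats)
    for row, score in zip(batch, scores):
        yield (row[0], row[1], score)


# With Spark this might be wired up roughly as follows (not run here):
# iu_score = iu.mapPartitions(
#     lambda rows: score_partition(
#         ((r.userid, r.entryid, r.user_embed, r.item_embed) for r in rows),
#         lambda: load_model("best_model.h5")))
```

If only 2 of 80 cores are busy, it may also be worth checking how many partitions iu has, since each partition is scored by a single task.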
I strongly recommend you do as follows:
Go to the official TensorFlow deployment docs and read how to really deploy a model. The deployed models can be reached over an RPC protocol (gRPC) and also over a RESTful API. Then, from PySpark, you can wrap the calls and connect to the served model. You can create a pool of as many models as you want, manage it in PySpark, distribute the computations over a network, and from there the sky and the CPUs/GPUs/TPUs are the limits (I'm still skeptical about the sky).
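To give an idea of what calling a served model over REST could look like, here is a small sketch using only the standard library. The host, the model name and the helper names are placeholders; the URL layout and the "instances"/"predictions" fields follow the TensorFlow Serving REST predict API, and 8501 is the port commonly used for it.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host, model_name, instances, port=8501, timeout=10):
    """Send one batch of feature rows and return the predictions."""
    request = build_predict_request(host, model_name, instances, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]
```

Each Spark task would then call predict(...) with a batch of feature rows instead of loading the .h5 file itself.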
It will be great to get an update from you about the results :) You made me curious.
I wish you the best with this issue, great question.

How to detect near duplicate rows in Azure Machine Learning?

I am new to Azure Machine Learning. We are trying to implement a question similarity algorithm using Azure Machine Learning. We have a large set of questions and answers. Our objective is to identify whether newly added questions are duplicates or not, just like Stack Overflow suggests existing questions when we ask a new one. Can we use Azure Machine Learning services to solve this? Can someone guide us in the right direction?

Yes, you can use Azure Machine Learning Studio, and you could use the method Jennifer proposed.
However, I would assume it is much better to run an R script in your experiment against a database containing all current questions and return a similarity metric for each comparison.
Have a look at the following paper for some examples (from simple/basic to more advanced) of how you could do this:
https://www.researchgate.net/publication/4314910_Question_Similarity_Calculation_for_FAQ_Answering
A simple way to start would be to implement a plain "bag of words" comparison. This will yield a distance matrix that you could use for clustering or to return similar questions. The following R code would do such a thing; in essence you build a large string with the new question as its first sentence, followed by all known questions. This method will, obviously, not really take the meaning of the questions into consideration and will just trigger on equal word usage.
library(tm)
library(Matrix)
x <- TermDocumentMatrix( Corpus( VectorSource( strings.with.all.questions ) ) )
y <- sparseMatrix( i=x$i, j=x$j, x=x$v, dimnames = dimnames(x) )
plot( hclust(dist(t(y))) )
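For readers who prefer Python, the same bag-of-words idea can be sketched as below. Note that this is not a translation of the R code: it swaps the Euclidean distance matrix and hierarchical clustering for a plain cosine similarity ranking, and the function names are my own.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count bags (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def most_similar(new_question, known_questions, top=3):
    """Return the top (score, question) pairs, most similar first."""
    new_bag = bag_of_words(new_question)
    scored = [
        (cosine_similarity(new_bag, bag_of_words(q)), q)
        for q in known_questions
    ]
    return sorted(scored, reverse=True)[:top]
```

Like the R version, this only reacts to shared words, so two questions that mean the same thing but use different vocabulary will score low.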

Yes, you can definitely do this with Azure Machine Learning. It sounds like you have a clustering problem (you are trying to group together similar questions).
There is a "Clustering: Find similar companies" sample that does a similar thing at https://gallery.cortanaanalytics.com/Experiment/60cf8e46935c4fafbf86f669121a24f0. You can read the description on that page and click the "Open in Studio" button in the right-hand sidebar to actually open the workspace in Azure Machine Learning Studio. In that sample, they are finding similar companies based on the text from the company's Wikipedia article (for example: Microsoft and Apple are similar companies because the word "computer" appears a lot in both articles). Your problem is very similar except you would use the text in your questions to find similar questions and cluster them into groups accordingly.
In k-means clustering, "k" is the number of clusters that you want to form, so this number will probably be pretty big for your specific problem. If you have 500 questions, perhaps start with 250 centroids? But mess around with this number and see what works. For performance reasons, you might want to start with a small dataset for testing and then run all of your data through the model after it seems to be grouping well.
Also, the documentation for K-means clustering is here.
