Phalcon paginate return model - pagination

I have a relational table, and I want to paginate one table and get the results back as Model instances, because each model belongs to another model and I need those relations.
But $paginator->paginate()->getItems() always returns plain arrays (the Phalcon\Paginator\Adapter\NativeArray type); I need Model instances.

Assuming you are talking about Phalcon 4: it will always return an array of arrays; there is no way around that.
You can cache your table results and then loop through them, fetching each model by an identifier taken from the array.

Related

What is the BruteForce layer in TensorFlow and why is it called BruteForce?

I understand the task of retrieval. I have gone through the code, and I have also looked into alternative approaches such as ScaNN, an ultra-fast approximate nearest-neighbor method.
However, I still have a hard time understanding the mechanism of the following code:
# Create a model that takes in raw query features and recommends
# movies out of the entire movies dataset.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
)
# Get recommendations.
_, titles = index(tf.constant(["42"]))
print(f"Recommendations for user 42: {titles[0, :3]}")
model.user_model is trained, and by now it should return embeddings for a user_id. The input to the BruteForce layer is model.user_model; and then it should be indexed?
I guess the output is: given user_id 42, return 3 titles out of that movies.batch(100). But I can't understand the function of BruteForce and of the indexing!
The BruteForce layer is called that because it performs an exhaustive search: it scores the query embedding against every single candidate embedding (the ones extracted from the last layer of the model), instead of using an approximate shortcut such as ScaNN. index_from_dataset is what stores those candidate embeddings, paired with their identifiers, inside the layer.
According to the TensorFlow documentation for the layer, calling it returns the top k candidates (by default 10) whose indexed embeddings are closest to the query embedding, i.e. the ones with the highest scores.
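To make the mechanics concrete, here is a minimal sketch of what that brute-force scoring amounts to; the shapes and variable names are illustrative stand-ins, not the layer's actual internals:
import tensorflow as tf

num_candidates, dim = 1000, 32
candidate_embeddings = tf.random.normal((num_candidates, dim))  # stand-in for what index_from_dataset stores (via model.movie_model)
query_embedding = tf.random.normal((1, dim))                    # stand-in for what model.user_model returns for a user_id

# "Brute force": score the query against every single candidate with one matmul...
scores = tf.matmul(query_embedding, candidate_embeddings, transpose_b=True)

# ...then keep the k highest-scoring candidates.
top_scores, top_indices = tf.math.top_k(scores, k=10)
print(top_indices)  # positions of the 10 closest candidates in the index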

SBERT's Interpretability: can we better understand the final results?

I'm familiar with SBERT and its pre-trained models, and they are amazing! But at the same time, I want to understand how the results are calculated, and I can't find anything more specific on their website.
For example, I have a document and I want to find other documents that are similar to it. I used 2 documents containing 200-250 words each (I changed model.max_seq_length to 350 so the model can handle longer texts), and in the end the cosine similarity is 0.79. Is that all we can see? Is there a way to extract the main phrases/keywords that made the model return this high similarity value?
Thanks in advance!
Have you tried a simple word-count comparison between the two documents and other random documents? Or TF-IDF, if the two documents are part of a bigger corpus?
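As a rough sketch of that baseline with scikit-learn (the corpus contents are illustrative):
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["text of document one ...", "text of document two ...", "some other random document"]

# TF-IDF document-term matrix: rows are documents, columns are terms.
tfidf = TfidfVectorizer().fit_transform(corpus)

# Surface-level similarity of the two documents of interest (rows 0 and 1),
# to compare against SBERT's 0.79.
print(cosine_similarity(tfidf[0], tfidf[1]))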
Another thing you can do is look inside the stored_embeddings matrix (see the code below), in which SBERT encodes your sentences (e.g. for 20,000 documents you'll get a 20,000 × 384 matrix), after having saved it into a pickle file like:
from sentence_transformers import SentenceTransformer
import pickle

model = SentenceTransformer('all-MiniLM-L6-v2')  # any pre-trained SBERT model with 384-dimensional output
sentences = ["text of the first document ...", "text of the second document ..."]

embeddings = model.encode(sentences)
with open('embeddings.pkl', 'wb') as fOut:
    pickle.dump({'sentences': sentences, 'embeddings': embeddings}, fOut, protocol=pickle.HIGHEST_PROTOCOL)
with open('embeddings.pkl', 'rb') as fIn:
    stored_data = pickle.load(fIn)
    stored_embeddings = stored_data['embeddings']
The stored_embeddings variable can be handled as a NumPy matrix and can therefore be indexed to access single elements, for example. You can then go through the 384 dimensions column by column (in the case of a big matrix, I suggest you don't iterate with enumerate(); it will take forever) and compare the values the two documents take in one precise dimension, looking for the dimensions with the highest values or variance, for example.
I'm not saying that what you'll find will be interpretable, but at least you can try and see what you find.
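One concrete way to run that comparison: since the cosine similarity is the sum of the element-wise products of the two normalized vectors, you can rank dimensions by how much each one contributes to the final score. A sketch, assuming stored_embeddings from the snippet above and that your two documents are rows 0 and 1:
import numpy as np

a = stored_embeddings[0] / np.linalg.norm(stored_embeddings[0])
b = stored_embeddings[1] / np.linalg.norm(stored_embeddings[1])

contributions = a * b                      # per-dimension terms of the cosine similarity
top_dims = np.argsort(contributions)[::-1][:10]
print(top_dims, contributions[top_dims])   # the 10 dimensions driving the score
print(contributions.sum())                 # sums back to the cosine similarity (e.g. the 0.79)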

How can I use Hyperopt with MLFlow within a pandas_udf?

I'm building multiple Prophet models where each model is passed to a pandas_udf function which trains the model and stores the results with MLflow.
@pandas_udf(result_schema, PandasUDFType.GROUPED_MAP)
def forecast(data):
    ...
    with mlflow.start_run() as run:
        ...
Then I call this UDF which trains a model for each KPI.
df.groupBy('KPI').apply(forecast)
The idea is that, for each KPI, a model is trained with multiple hyperparameter settings, and the best params for each model are stored in MLflow. I would like to use Hyperopt to make the search more efficient.
In this case, where should I place the objective function? Since the data is passed to the UDF for each model, I thought of creating an inner function within the UDF that uses the data for each run. Does this make sense?
If I remember correctly, you can't do it that way, because it would be something like nested Spark execution, and that won't work with Spark. You'll need to change your approach to something like:
for kpi in list_of_kpis:
    run_hyperopt_tuning(kpi)
if you need to tune parameters for every KPI model separately, because it will optimize the parameters for each model independently.
If the KPI is more like a hyperparameter of the model, then you can just include the list of KPIs in the search space, and load the necessary data inside the function that does the training & evaluation.
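A driver-side sketch of that per-KPI loop with plain Hyperopt and MLflow (the search space, dummy loss, and run names are illustrative; a real objective would fit a Prophet model on the group's data and return a validation error):
import mlflow
from hyperopt import fmin, tpe, hp, Trials

search_space = {'changepoint_prior_scale': hp.loguniform('changepoint_prior_scale', -5, 0)}

def make_objective(kpi_pdf):
    def objective(params):
        # Fit Prophet on kpi_pdf with `params` and return a validation error
        # here; this dummy loss only keeps the sketch self-contained.
        return (params['changepoint_prior_scale'] - 0.05) ** 2
    return objective

for kpi in df.select('KPI').distinct().toPandas()['KPI']:
    kpi_pdf = df.filter(df.KPI == kpi).toPandas()  # one KPI's data, collected to the driver
    with mlflow.start_run(run_name=f'prophet_{kpi}'):
        best = fmin(fn=make_objective(kpi_pdf), space=search_space,
                    algo=tpe.suggest, max_evals=20, trials=Trials())
        mlflow.log_params(best)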

fastai tabular model trained but cannot find categorical mapping

After training on my dataset, which has a number of categorical columns, using fastai's tabular model, I wish to read out the entity embeddings and use them to map back to my original data values.
I can see the embedding weights. The number of inputs doesn't seem to match anything obvious, but maybe it is based on the unique categorical values in train_ds.
To get that map, I would like to get the self.categories dictionary from the Categorify transform class. Is there any way to get that from the data variable obtained by calling TabularList.from_df?
Or maybe someone can tell me a better way to get this map. I know the df passed into TabularList.from_df() is not it, because the number of rows is wrong, most likely because df is split into train and valid subsets. But there is no easy way to get hold of just the train part of the TabularList to check.
It's strange that I can't find any code example showing this. Doesn't anyone else care to map the entity embedding values back to their original categorical values?
I found it.
It is in data.train_ds.inner_df.
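For anyone else looking, a sketch of how the mapping can be recovered from there, assuming fastai v1's tabular API (the column selection and the off-by-one for the missing-value row reflect my reading of Categorify, so verify against your own data):
# Categorify leaves categorical columns as pandas 'category' dtype
# inside data.train_ds.inner_df.
cat_col = data.train_ds.inner_df.select_dtypes('category').columns[0]
categories = data.train_ds.inner_df[cat_col].cat.categories

# TabularModel keeps one embedding per categorical column, in cat_names
# order; row 0 is assumed reserved for missing/unknown values.
i = data.train_ds.x.cat_names.index(cat_col)
emb_weights = learn.model.embeds[i].weight.detach().cpu().numpy()

# Map each original category value to its learned embedding row.
cat_to_embedding = {cat: emb_weights[code + 1] for code, cat in enumerate(categories)}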

Natural Language Processing in Python

How can I find similar issues for a new, unseen issue, based on past issues (each with a summary and description), using natural language processing in Python?
If I understand you correctly, you have a new issue (the query) and you want to look up other similar issues (the documents) in your database. If so, what you need is a way to measure the similarity between your query and the existing documents; once you have those similarities, you can rank the documents and select the most relevant ones. One method that allows you to do this is Latent Semantic Indexing (LSI).
To do this you'll have to construct a document-term matrix: using your existing documents, you record how often each term occurs in each document, either as raw counts in a bag-of-words representation or with a weighting scheme such as TF-IDF.
Once you have that, you'll have to process your query so that it is in the same form as your documents. With the query in that form, you can calculate the cosine similarity between each document and the query; the document with the highest cosine similarity is the closest match.
Note: the broader topic you may want to read about is Information Retrieval; LSI is just one such method, and you should look into others as well.
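A minimal sketch of that pipeline using scikit-learn, with TruncatedSVD standing in for the LSI step (the corpus, query, and component count are illustrative; gensim's LsiModel is another common choice):
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "app crashes on login",
    "crash when logging in on android",
    "dark mode colors are wrong",
]
query = "login crash"

# Document-term matrix (TF-IDF weighted), then a low-rank LSI projection.
vectorizer = TfidfVectorizer()
dtm = vectorizer.fit_transform(documents)
lsi = TruncatedSVD(n_components=2).fit(dtm)
doc_vecs = lsi.transform(dtm)

# Put the query in the same form and rank documents by cosine similarity.
query_vec = lsi.transform(vectorizer.transform([query]))
sims = cosine_similarity(query_vec, doc_vecs)[0]
print(sorted(zip(sims, documents), reverse=True))  # closest matches first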
