How to extract TRILLsson output with frame data - tensorflow-hub

I want to extract TRILLsson output, published on TensorFlow Hub (https://tfhub.dev/google/nonsemantic-speech-benchmark/trillsson5/1).
I want to extract time-series embeddings, but according to this link I can only get aggregated embeddings.
Please tell me how to extract time-series embeddings with TRILLsson.
I looked at this repo (https://github.com/google-research/google-research/tree/master/non_semantic_speech_benchmark/trillsson) to find a solution, but it was too complex for me.
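For what it's worth, one workaround is to run the published aggregated model over short overlapping windows of the waveform yourself, so you get one embedding per window instead of one per clip. This is only a sketch, not the repo's own frame-level pipeline: it assumes the module accepts a batch of 16 kHz float32 waveforms and returns a dict with an 'embedding' key (as in the TF Hub usage snippet), and the window/hop lengths are arbitrary choices.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the published TRILLsson5 module (returns one aggregated embedding per clip).
model = hub.load("https://tfhub.dev/google/nonsemantic-speech-benchmark/trillsson5/1")

def windowed_embeddings(waveform, sample_rate=16000, win_s=1.0, hop_s=0.5):
    """Approximate time-series embeddings by running the aggregated model
    on short overlapping windows (a workaround, not frame-level output)."""
    win = int(win_s * sample_rate)
    hop = int(hop_s * sample_rate)
    # Split the 1-D waveform into [num_windows, win] overlapping frames.
    frames = tf.signal.frame(waveform, frame_length=win, frame_step=hop)
    # Assumption: the module takes float32 audio of shape [batch, samples]
    # at 16 kHz and returns a dict containing an "embedding" tensor.
    outputs = model(frames)
    return outputs["embedding"]  # shape: [num_windows, embedding_dim]

# Example: 5 seconds of silence at 16 kHz.
audio = tf.constant(np.zeros(5 * 16000, dtype=np.float32))
print(windowed_embeddings(audio).shape)
```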

Related

How to build a CNN from nifti files?

I am using the COBRE brain MRI dataset, which contains NIfTI files. I can visualize them but cannot figure out how to feed them to a deep learning model in the correct format. I read the Nilearn documentation, but it only uses one example .nii file for a single subject. The question is how to give 100 .nii files to a CNN.
The second thing is how to determine which slice of the file should be used. Should it be the middle one? Each subject's NIfTI file consists of 150 slices of the brain.
The third thing is how to provide the model with labels. The dataset doesn't contain any masks. How do I give the model a specific label for a specific file? Should I create a CSV file with the paths of the .nii files and their associated labels?
Please explain or suggest some resources.
Hi, I recently got into processing .nii files for one of my projects. I have made some progress at the preprocessing level, but not yet at the model level.
For your second question: usually an expert visualizes the NIfTIs and provides the location(s) of the ROI (region of interest).
I am currently in the process of parsing the .nii files into CSV format with labels, so the answer to your third question is that we label the coordinates (x, y, z, c, t) according to the ROI locations. (I may need to correct this understanding as I advance, but for now this is the approach I am going to follow for feeding the dataset to the model.)
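A minimal sketch of the CSV-manifest idea raised in the question, assuming a hypothetical subjects.csv with filepath and label columns, and assuming slices lie along the third axis of each volume:

```python
import nibabel as nib
import numpy as np
import pandas as pd

# Hypothetical manifest: one row per subject, columns "filepath" and "label".
manifest = pd.read_csv("subjects.csv")

def load_middle_slice(path):
    """Load a NIfTI volume and return its middle slice as a 2D float array."""
    volume = nib.load(path).get_fdata()
    mid = volume.shape[2] // 2          # assumes slices along the third axis
    return volume[:, :, mid].astype(np.float32)

X = np.stack([load_middle_slice(p) for p in manifest["filepath"]])
y = manifest["label"].to_numpy()
X = X[..., None]                        # add a channel axis for a 2D CNN
```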

fastai tabular model trained but can not find categorical mapping

After training my dataset, which has a number of categorical columns, using fastai's tabular model, I wish to read out the entity embeddings and map them back to my original data values.
I can see the embedding weights. The number of inputs doesn't seem to match anything obvious, but maybe it is based on the unique categorical values in the train_ds.
To get that map, I would like to get the self.categories dictionary from the Categorify transform class. Is there any way to get that from the data variable obtained by calling TabularList.from_df?
Or maybe someone can tell me a better way to get this map. I know the input df passed to TabularList.from_df() is not it, because the number of rows is wrong, most likely because df is split into train and valid subsets. But there is no easy way to obtain just the train part of the TabularList to check.
It's strange that I can't find any code example showing this. Doesn't anyone else care about mapping the entity embedding values back to their original categorical values?
I found it.
It is in data.train_ds.inner_df.
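For completeness, a minimal sketch (fastai v1) of reading that mapping back out. Apart from inner_df, the attribute paths are assumptions about the v1 API and worth verifying; it relies on Categorify having left pandas categorical dtypes in the processed dataframe.

```python
# Processed training dataframe, as found above.
inner_df = data.train_ds.inner_df

# Assumed v1 attribute: the list of categorical column names.
for col in data.train_ds.x.cat_names:
    categories = inner_df[col].cat.categories
    # Integer code -> original categorical value for this column.
    print(col, dict(enumerate(categories)))
```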

Efficient way to get best matching pairs given a similarity-outputting neural network?

I am trying to come up with a neural network that scores pairs of short texts (for example, a Stack Exchange title and body). Following the Deep Learning Cookbook's example, the network basically looks like this:
We have our two inputs (title and body), embed them, then calculate the cosine similarity between the embeddings. The inputs of the model are [title, body]; the output is [sim].
Now I'd like to find the closest matching body for a given title. I am wondering if there's a more efficient way of doing this that doesn't involve iterating over every possible (title, body) pair and calculating the corresponding similarity, because for very large datasets this is just not feasible.
Any help is much appreciated!
It is indeed not very efficient to iterate over every possible data pair. Instead, you could use your model to extract the embeddings of all your titles and text bodies and save them in a database (or simply a .npy file). So you don't use your model to output a similarity score; instead, you use it to output an embedding (from your embedding layer).
At inference time you can then use a library for efficient similarity search, such as faiss. Given a title, you simply look up its embedding and search the whole space of body embeddings to see which ones get the highest score. I have used this approach myself and been able to search 1M vectors in about 100 ms.
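A rough sketch of that lookup with faiss, assuming the embeddings were pre-computed and saved to hypothetical .npy files as suggested above:

```python
import numpy as np
import faiss  # pip install faiss-cpu

# Hypothetical pre-computed embeddings saved earlier with np.save.
body_embs = np.load("body_embeddings.npy").astype("float32")    # [n_bodies, d]
title_embs = np.load("title_embeddings.npy").astype("float32")  # [n_titles, d]

# L2-normalising makes the inner product equal to cosine similarity.
faiss.normalize_L2(body_embs)
faiss.normalize_L2(title_embs)

index = faiss.IndexFlatIP(body_embs.shape[1])   # exact inner-product search
index.add(body_embs)

# For each title, retrieve the 5 most similar bodies.
scores, ids = index.search(title_embs, 5)
print(ids[0], scores[0])   # best-matching body indices for the first title
```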

Convert Text Data into SVMFile format for Spam Classification?

How can I convert text data into LibSVM file format for training a spam classification model?
Are SVM files already labeled?
LibSVM format is neither required nor particularly useful. It is used in the Apache Spark ML examples only because it maps directly onto the required format.
Are SVM files already labeled?
Not necessarily, but Spark can read only the labeled variant.
In practice you should use org.apache.spark.ml.feature tools to extract relevant features from your data.
You can follow the documentation as well as a number of questions on SO.
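As a rough illustration of that feature-extraction route in PySpark (the toy rows, column names, and choice of classifier here are made up for the example):

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.getOrCreate()

# Hypothetical labeled text data: 1.0 = spam, 0.0 = ham.
df = spark.createDataFrame(
    [("win a free prize now", 1.0), ("meeting at 10 am tomorrow", 0.0)],
    ["text", "label"],
)

tokenizer = Tokenizer(inputCol="text", outputCol="words")
tf = HashingTF(inputCol="words", outputCol="rawFeatures")
idf = IDF(inputCol="rawFeatures", outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

# Tokenize -> term frequencies -> TF-IDF -> classifier, with no LibSVM files involved.
model = Pipeline(stages=[tokenizer, tf, idf, lr]).fit(df)
```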

Timeline Detection

I am trying to do a timeline detection problem using text classification. As a newbie, I am confused about how to go about this. Is this a classification problem? I.e., can I use the years (timelines) as outcomes and solve this as a classification problem?
You should be able to solve this as a classification problem as you suggest. An option could be to find or build a corpus consisting of texts tagged with the period in which they're set, and train a classification algorithm on this data set.
Another option could be to train a word space model on such a data set, and generate vectors for different periods of time (e.g. the 50s, 60s etc.). You could then create a document vector for the text you wish to classify, and find which of these time vectors yields the best match.
Might not work, but it could be interesting to see what results you get.
Hope this helps!
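A minimal sketch of the first option, treating time periods as class labels; the texts and decade labels below are made-up placeholders for whatever tagged corpus you assemble:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: each text is tagged with the decade it is set in.
texts = ["rock and roll on the jukebox", "dial-up modems and chat rooms"]
decades = ["1950s", "1990s"]

# TF-IDF features fed into a simple linear classifier over decade labels.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, decades)
print(clf.predict(["grunge bands and pagers"]))
```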
