How many ways to normalize data? [closed] - data-processing

I am curious how many ways we can normalize data in the data-processing step before using it to train a machine learning model, deep learning model, and so on.
All I know are:
Z-score normalization = (data - mean) / standard deviation
Min-Max normalization = (data - min) / (max - min)
Do we have other ways besides these two?

There are many ways to normalize data before training a model; some depend on the task, the data type (tabular, image, signals), and the data distribution. You can find the most important ones in scikit-learn's preprocessing subpackage.
To highlight a few that I have been using consistently: the Box-Cox or Yeo-Johnson transformation, which is useful when your feature's distribution is skewed. It reduces the skewness by fitting the transform parameter through maximum likelihood.
Another normalization technique is the Robust Scaler, which can perform better than Z-score normalization when your dataset contains many outliers, since outliers can falsely influence the sample mean and variance.
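For illustration, here is a minimal sketch of these scalers on a made-up feature with one outlier (the toy data is an assumption, not from the question):

import numpy as np
from sklearn.preprocessing import (
    StandardScaler, MinMaxScaler, RobustScaler, PowerTransformer
)

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # note the outlier

# Z-score: (x - mean) / std
print(StandardScaler().fit_transform(X).ravel())

# Min-max: (x - min) / (max - min)
print(MinMaxScaler().fit_transform(X).ravel())

# Robust scaling: (x - median) / IQR, far less sensitive to the outlier
print(RobustScaler().fit_transform(X).ravel())

# Yeo-Johnson (Box-Cox would require strictly positive data); the
# transform parameter is fitted via maximum likelihood to reduce skewness
print(PowerTransformer(method="yeo-johnson").fit_transform(X).ravel())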

Related

Hello World (aka MNIST) with a feed-forward net gets lower accuracy with DistributedDataParallel (DDP) than the plain model, even with only one node [closed]

This is a cross-post of my question in the PyTorch forum.
When using DistributedDataParallel (DDP) from PyTorch on only one node I expect it to be the same as a script without DistributedDataParallel.
I created a simple MNIST training setup with a three-layer feed-forward neural network. It gives significantly lower accuracy (around 10%) if trained with the same hyperparameters, the same epochs, and generally the same code except for the use of the DDP library.
I created a GitHub repository demonstrating my problem.
I hope it is a usage error of the library, but I do not see where the problem would be; colleagues of mine have already audited the code. Also, I tried it on macOS with a CPU and on three different GPU/Ubuntu combinations (one with a 1080 Ti, one with a 2080 Ti, and a cluster with P100s), all giving the same results. Seeds are fixed for reproducibility.
You are using different batch sizes in your two experiments: batch_size=128 for mnist-distributed.py and batch_size=32 for mnist-plain.py. That alone means you should not expect the same performance from those two trainings.
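For an apples-to-apples comparison, one common convention is to split the plain run's per-update batch across the DDP workers. A minimal sketch, assuming a toy dataset and a hypothetical world_size of 4 (none of these names come from the repository):

import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy stand-in for MNIST: 1024 flattened images with random labels.
dataset = TensorDataset(torch.randn(1024, 28 * 28),
                        torch.randint(0, 10, (1024,)))

plain_batch = 32   # batch size of the plain script
world_size = 4     # number of DDP processes (assumed)

# Each DDP worker loads plain_batch // world_size samples, so one
# synchronized step still averages gradients over plain_batch samples.
per_rank_batch = plain_batch // world_size

rank = 0  # in a real run, each worker uses its own rank
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
loader = DataLoader(dataset, batch_size=per_rank_batch, sampler=sampler)
print(per_rank_batch, len(loader))  # 8 32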

Feeding an image to stacked resnet blocks to create an embedding [closed]

Do you have any code example or paper that refers to something like the following diagram?
I want to know why we stack multiple resnet blocks as opposed to multiple convolutional blocks as in more traditional architectures. Any code sample or a reference to one would be really helpful.
Also, how can I extend that to something like the following, which contains a self-attention module for each resnet block?
Applying self-attention to the outputs of ResNet blocks at the very high resolution of the input image may lead to memory issues: the memory requirements of self-attention blocks grow quadratically with the input size (= resolution). This is why, e.g., Non-Local Neural Networks (Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He; CVPR 2018) introduces self-attention only at a very deep layer of the architecture, once the feature map has been substantially sub-sampled.
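For reference, here is a minimal sketch of such a non-local (self-attention) block in PyTorch, applied only on a deeply sub-sampled feature map; the channel count and spatial size are illustrative assumptions, not the paper's exact configuration:

import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        inter = channels // 2  # reduced embedding dimension
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)
        self.g = nn.Conv2d(channels, inter, kernel_size=1)
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # n = h * w spatial positions; the attention matrix is (n x n),
        # which is why this is only affordable at low resolution.
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, n, inter)
        k = self.phi(x).flatten(2)                    # (b, inter, n)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, n, inter)
        attn = torch.softmax(q @ k, dim=-1)           # (b, n, n)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)  # residual connection

# E.g. after several ResNet stages the map is 14x14, so n is only 196.
block = NonLocalBlock(256)
feat = torch.randn(2, 256, 14, 14)
print(block(feat).shape)  # torch.Size([2, 256, 14, 14])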

Difference between Text Embedding and Word Embedding [closed]

I am working on a dataset of Amazon Alexa reviews and wish to cluster them into positive and negative clusters. I am using Word2Vec for vectorization, so I wanted to know the difference between Text Embedding and Word Embedding. Also, which one of them will be useful for my clustering of reviews? (Please consider that I want to predict the cluster of any review that I enter.)
Thanks in advance!
Text Embeddings are typically a way to aggregate a number of Word Embeddings for a sentence or a paragraph of text. There are various ways this can be done. The easiest way is to average the word embeddings, though this does not necessarily yield the best results (see the sketch after the links below).
Application-wise:
Doc2vec from gensim
par2vec vs. doc2vec
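As a concrete illustration of the averaging approach mentioned above, here is a minimal sketch using gensim's Word2Vec (the toy reviews and vector size are placeholders, not your Alexa data):

import numpy as np
from gensim.models import Word2Vec

# Toy corpus standing in for tokenized reviews.
sentences = [["love", "this", "speaker"], ["terrible", "sound", "quality"]]
model = Word2Vec(sentences, vector_size=50, min_count=1, seed=0)

def text_embedding(tokens, model):
    # Average the vectors of the tokens the model knows about.
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    if not vecs:  # no known words: fall back to a zero vector
        return np.zeros(model.vector_size)
    return np.mean(vecs, axis=0)

review = ["love", "this", "speaker"]
print(text_embedding(review, model).shape)  # (50,)

The resulting fixed-length vectors can then be fed to a clusterer such as k-means, and any new review can be embedded the same way to predict its cluster.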

Is conv2d or conv1d more accurate in image classification? [closed]

I have executed an image classification program and it was running well. I ran the code with both conv1D and conv2D and am getting an accuracy of 0.854 for both.
Can I know the exact differences between these two things in detail?
Conv1d is a convolution filter of one dimension (imagine it like a one-dimensional array). Conv2d is a filter with two dimensions (like a 2D array) and is more suitable for data like images, where it can retain more spatial information in a data point because it is applied to all the neighbors. You can look up what a convolution kernel is to understand why this is better for data like images. For non-image data, I guess it will not have a significant impact whether you use 1D or 2D convolutions.
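To make the difference concrete, here is a minimal PyTorch sketch (the channel counts and sizes are arbitrary examples):

import torch
import torch.nn as nn

# Conv1d slides a kernel along one axis, e.g. a signal of length 100.
conv1d = nn.Conv1d(in_channels=3, out_channels=8, kernel_size=5)
signal = torch.randn(1, 3, 100)   # (batch, channels, length)
print(conv1d(signal).shape)       # torch.Size([1, 8, 96])

# Conv2d slides a kernel over both height and width, so each output value
# sees a 2D neighborhood of pixels and retains spatial structure.
conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)
image = torch.randn(1, 3, 32, 32)  # (batch, channels, H, W)
print(conv2d(image).shape)         # torch.Size([1, 8, 28, 28])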
Note: also, this site is for programming problems; maybe you should ask your question on Data Science Stack Exchange.

What does "model weights" mean in machine learning? [closed]

I'm currently learning machine learning. I get confused by the term "model weights". Please explain to me what model weights really mean.
Weights are the numbers you use to turn your samples into a prediction. In many (most?) cases this is what you are learning with your system. For example, suppose you want to predict house price using only the house size (x). You might use a simple linear regression model that tries to fit a line to the data. The formula you will use is the formula for a line:
y = w * x + b
Here x is given (the house size) and you use w and b to predict y the price. In this case w and b are your weights. The goal is to determine which w and b give the best fit to the data.
In more complex models like neural networks (or even more complicated linear regression) you may have dramatically more weights in your model, but the basic idea of finding the weights that best fit the data is the same.
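To make "finding the weights" concrete, here is a minimal sketch that fits w and b for the house-price line above with gradient descent (the toy data and learning rate are made up for illustration):

import numpy as np

# Toy data: house sizes (x) and prices (y) roughly following y = 3x + 10.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x + 10.0 + np.random.default_rng(0).normal(0.0, 0.1, size=5)

w, b = 0.0, 0.0   # the weights, initialized arbitrarily
lr = 0.02         # learning rate

for _ in range(2000):               # gradient descent on mean squared error
    err = (w * x + b) - y
    w -= lr * 2 * np.mean(err * x)  # dMSE/dw
    b -= lr * 2 * np.mean(err)      # dMSE/db

print(w, b)  # ends up close to the true 3 and 10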
