How to calculate the impact of quantitative x variables on a y variable - Excel

I have been trying to share some contribution analysis with management using quantitative variables. However, I am struggling to reconcile my y% increase with my x's. I have tried linear regression but don't think that will help immensely here. Please help...
Here is the data and below that is the template I need to submit
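Since the data and template are not shown, here is only a minimal sketch of one common way to tie a change in y back to the x's: fit a linear regression and attribute the change to each driver as coefficient times the change in that x. All file and column names below are hypothetical.

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical file and column names; replace with the actual data layout.
df = pd.read_excel("contribution_data.xlsx")      # columns: y, x1, x2, x3 per period
X, y = df[["x1", "x2", "x3"]], df["y"]

model = LinearRegression().fit(X, y)

# Contribution of each driver to the change in y between the first and last period:
# coefficient * (change in that x). Whatever is left over is the unexplained part.
delta_x = X.iloc[-1] - X.iloc[0]
contrib = model.coef_ * delta_x
print(contrib)
print("unexplained:", (y.iloc[-1] - y.iloc[0]) - contrib.sum())

The per-driver contributions can then be pasted into the reporting template; they sum to the fitted change in y, with the residual shown separately.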

Related

Normality Assumption - how to check you have not violated it?

I am relatively new to statistics and am struggling with the normality assumption.
I understand that parametric tests are underpinned by the assumption that the data are normally distributed, but there seem to be lots of papers and articles providing conflicting information.
Some articles say that independent variables need to be normally distributed and this may require a transformation (log, SQRT etc.). Others say that in linear modelling there are no assumptions about the distribution of the independent variables.
I am trying to create a multiple regression model to predict highest pain scores on hospital admissions:
DV: numeric pain score (0 = no pain -> 5 = intense pain) (discrete dependent variable).
IVs: age (continuous), weight (continuous), sex (nominal), deprivation status (ordinal), race (nominal).
Can someone help clear up the following for me?
Before fitting a model, do I need to check whether my independent variables are normally distributed? If so, why? Does this only apply to continuous variables (e.g. age and weight in my model)?
If age is positively skewed, would a transformation (e.g. log, SQRT) be appropriate, and why? Is it best to do this before or after fitting a model? I assume I am trying to get close to a linear relationship between my DV and IVs.
As part of the SPSS output, it provides plots of the standardised residuals against predicted values and also normal P-P plots of standardised residuals. Are these checks all that is needed to verify the normality assumption after fitting a model?
Many Thanks in advance!
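For reference, the post-fit residual checks the SPSS output refers to (residuals against predicted values, and a normal probability plot of the residuals) can be sketched like this in Python. This is only a minimal sketch; the file name, column names and formula are assumptions based on the variable list in the question.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

# Hypothetical export of the admission data, columns named as in the question.
df = pd.read_csv("admissions.csv")
fit = smf.ols("pain_score ~ age + weight + C(sex) + deprivation + C(race)", data=df).fit()

# Residuals vs predicted values (analogue of the SPSS standardised-residual scatter plot).
plt.scatter(fit.fittedvalues, fit.resid)
plt.xlabel("predicted value")
plt.ylabel("residual")
plt.show()

# Q-Q plot of residuals, analogous to the normal P-P plot of standardised residuals.
sm.qqplot(fit.resid, line="45", fit=True)
plt.show()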

Can I predict a price based on a survey in Azure Machine Learning?

I want to predict an input price based on a list of questions/answers using Azure Machine Learning.
I built an experiment using the "Bayesian Linear Regression" module, but it seems to be predicting the price based on the prices I have in my dataset and not based on the Q/A.
Am I on the wrong path, or am I missing something?
Any suggestion would be helpful.
Check that the Q/A data you are using does not have missing values. If there are any missing values, use data preprocessing techniques to fill them.
What kind of answers do you have as inputs (yes/no, numeric values, different textual answers, etc.)? In my opinion, numerical values and yes/no inputs make your model more accurate.
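As a rough sketch of that preprocessing step outside the Azure ML designer (file and column layout are hypothetical), the gaps could be filled like this before training:

import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical export of the survey responses.
df = pd.read_csv("survey_answers.csv")
num_cols = df.select_dtypes("number").columns

# Fill missing numeric answers with the column median; fill categorical gaps with the mode.
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
for col in df.columns.difference(num_cols):
    df[col] = df[col].fillna(df[col].mode().iloc[0])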
Try different regression algorithms (https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-cheat-sheet/) and check their accuracy.
You need to set the features and the label properly. If you publish your experiment in the Gallery using unlisted mode and paste the link here, we can take a look.
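To illustrate what "features vs. label" means here, the equivalent split outside Azure ML looks roughly like the sketch below (scikit-learn used as a stand-in; the "price" column and the Q/A columns are hypothetical names, and BayesianRidge is only a rough analogue of the Bayesian Linear Regression module):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("survey_answers.csv")            # hypothetical dataset
X = pd.get_dummies(df.drop(columns=["price"]))    # features: the Q/A columns, one-hot encoded
y = df["price"]                                   # label: the price to predict

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = BayesianRidge().fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

If the price is being "predicted" from itself, it usually means the price column was left in the feature set instead of being marked as the label.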

Azure Machine Learning Experiment Creation

I am new to creating experiments in Azure ML. I want to do a small sample POC on Azure ML.
I have data for students consisting of StudentID, Student Name and marks for Monthly Tests 1, 2 and 3. I just want to predict the marks for the final monthly test (i.e., Monthly Test 4).
I don't know how to create the experiment or what kind of transformations to use to predict the data.
Anyone Please...
Thanks in Advance
Pradeep
You can simply start with basic tutorials.
https://azure.microsoft.com/en-in/documentation/articles/machine-learning-create-experiment/
It is really helpful; I referred to it as well.
You can draw a simple flow chart for your experiment and follow it when you drag in the dataset.
HTH
This is a supervised machine learning problem. Refer to the algorithms you can use for solving it (most probably linear regression will suit this case). Do the data pre-processing first, then follow the steps in the link mentioned above by #kunal to build the model.
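Outside Azure ML, the same idea is a few lines of Python. This sketch assumes some past students already have a Monthly Test 4 mark to learn from; the file and column names are guesses based on the question.

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical export: StudentID, Name, Test1, Test2, Test3, Test4 (Test4 missing for new students).
df = pd.read_csv("student_marks.csv")
train = df.dropna(subset=["Test4"])               # students whose Test 4 mark is known
new = df[df["Test4"].isna()]                      # students whose Test 4 mark we want to predict

model = LinearRegression().fit(train[["Test1", "Test2", "Test3"]], train["Test4"])
predicted = model.predict(new[["Test1", "Test2", "Test3"]])
print(predicted)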

Suitable data mining technique for this dataset

I'm working on a data mining project and would like to mine this dataset Higher Education Enrolments for interesting patterns or knowledge. My problem is figuring out which technique would work best for the dataset.
I'm currently working on the dataset using RapidMiner 5.0 and I removed two columns (E550 - Reference year, E931 - Total Student EFTSL) from the data as they would not be relevant to the analysis. The rest of the attributes are nominal except StudentID (integer) which I have used as my id. I'm currently using classification on it (Naive Bayes) but would like to get the opinion of others, hopefully those who have had more experience in this area. Thanks.
The best technique depends on many factors: type/distribution of training and target attribute, domain, value range of attributes, etc. The best technique to use is the result of data analysis and understanding.
In this particular case, you should clarify which is the attribute to predict.
Unless you already know what you are looking for, and know about the quality of the data source, you should always start with some exploratory analysis:
- look at some of the first- and second-order statistics of all the variables
- generate histograms of each variable, to get an idea of the empirical distribution of each
- take a look at pairwise scatter plots of variables that might have dependency
- try any other visualizations that you might think of
These would give you a rough idea of what kind of pattern might be present and might be discoverable given the noise level. Then, depending on what kind of pattern you are interested in, you could start trying various unsupervised pattern learning methods, such as PCA/ICA/factor analysis or clustering, or supervised methods, such as regression or classification.
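For reference, those exploratory steps can be done in a few lines of Python rather than RapidMiner; this is only a sketch, and it assumes the enrolment data has been exported to a CSV file (name hypothetical).

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export of the higher-education enrolment dataset.
df = pd.read_csv("enrolments.csv")

# First- and second-order statistics for every variable.
print(df.describe(include="all"))

# Histograms: empirical distribution of each numeric variable.
df.hist(figsize=(10, 8))
plt.show()

# Pairwise scatter plots of the numeric variables that might have dependency.
pd.plotting.scatter_matrix(df.select_dtypes("number"), figsize=(10, 8))
plt.show()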

SPSS logistic regression

I'm wondering if there is a way to run many single-covariate logistic regressions. I want to do it for all my variables because of the missing values. I wanted to fit a multiple logistic regression, but I have too many missing values. I don't want to set up a logistic regression by hand for each variable in my DB; is there any automatic way?
Thank you very much!
You can code it using SPSS syntax.
For example:
LOGISTIC REGRESSION VARIABLES F2B16C -- Dependent variable
/METHOD=BSTEP -- Backwards step - all variables in then see what could be backed out
XRACE BYSES2 BYTXMSTD F1RGPP2 F1STEXP XHiMath -- Independent variables
/contrast (xrace)=indicator(6) -- creates the dummy variables with #6 as the base case
/contrast (F1Rgpp2)=indicator(6)
/contrast (f1stexp)=indicator(6)
/contrast (XHiMath)=indicator(5)
/PRINT=GOODFIT CORR ITER(1)
/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).
If you do that you can also tell it to keep records with missing values where appropriate.
add /MISSING=INCLUDE
If anyone knows of a good explanation of the implications of including the missing values, I'd love to hear it.
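If stepping outside SPSS is an option, a loop makes the "one regression per covariate" part automatic. This is only a sketch with statsmodels: the database export and column names are hypothetical, the outcome is assumed binary, and each fit uses only the rows where that covariate and the outcome are present.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical export of the database; F2B16C is the dependent variable from the syntax above.
df = pd.read_csv("my_db.csv")
outcome = "F2B16C"

# One single-covariate logistic regression per variable, dropping missing rows case by case.
for var in df.columns.drop(outcome):
    data = df.dropna(subset=[outcome, var])
    fit = smf.logit(f"{outcome} ~ {var}", data=data).fit(disp=0)
    print(var, fit.params.drop("Intercept").to_dict())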