Regression analysis with All text data - python-3.x

I want to know the best approach for a regression analysis where every column is text. I have the following data set.
My feature columns are: Strength, area of development, leadership, satisfactory.
The values in these columns come from a predefined set of texts, e.g. "Continuous Improvement, Self-Development, Coaching and Mentoring, Creativity, Adaptability".
Based on the values in these columns, I want to predict the label (overall Performance): Outstanding, Exceeding Expectation, or Meeting Expectation.
What would be the best approach for this dataset?

Related

Is there any difference between two terminologies viz. descriptive statistics and descriptive analytics

I am trying to understand whether there is any difference between the two terms descriptive statistics and descriptive analytics. Googling didn't give a clear picture of what they have in common and where they differ.
It appears that both summarize and analyze data with the help of statistics.
So does that mean they are the same thing? A statistician may prefer to say descriptive statistics, while a data scientist may call it descriptive analytics.
Both are the same.
Descriptive statistics summarizes or describes characteristics of a data set.
Descriptive statistics consists of two basic categories of measures:
measures of central tendency
measures of variability or spread.
Measures of central tendency describe the center of a data set. Examples are the mean, median, and mode, which capture the typical or most common values in the analyzed data set.
Measures of variability or spread describe the dispersion of data within the data set, that is, its shape and spread. Examples are the range, quartiles, absolute deviation, and variance.
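As a small illustration of the two categories above, here is a minimal sketch using Python's standard `statistics` module. The sample values are made up for the example and do not come from any real data set.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability or spread
value_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
variance = statistics.variance(data)          # sample variance (n - 1 in the denominator)
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)

print("center:", mean, median, mode)
print("spread:", value_range, (q1, q2, q3), variance, mean_abs_dev)
```

Whether you call the result descriptive statistics or descriptive analytics, the computed quantities are the same.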

I have a .CSV file that contains dates and gms value for those dates. Is it possible to apply Linear Regression to this?

Suppose I have a .csv file with two attributes: a date and the gms/revenue value for that date. Is it possible to apply linear regression to predict the gms value for a particular date, or does this come under time series regression analysis? I'm new to machine learning, so any help would be appreciated. Thank you. The CSV file has around 1800 records, and the dates are continuous.
Time series models will be able to find patterns in the revenue over time.
Regression models will predict the revenue given an input set of variables.
Since time is your only predictor variable, this comes under time-series analysis. You can still try to solve it using regression, but your prediction will just be a line (or some other polynomial curve) increasing or decreasing with time. It will not be able to capture the seasonality and lag-dependent trends that are common in time-series data.
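To make that limitation concrete, here is a minimal sketch that fits a straight line to a synthetic daily series by ordinary least squares. The series (a linear trend plus a weekly cycle) is an assumption made up for the example; it is not the asker's data.

```python
import math

# Hypothetical daily series: linear trend plus a weekly cycle.
n_days = 70
t = list(range(n_days))
revenue = [100 + 0.5 * d + 10 * math.sin(2 * math.pi * d / 7) for d in t]

# Ordinary least squares for revenue ~ intercept + slope * t
t_mean = sum(t) / n_days
y_mean = sum(revenue) / n_days
slope = (
    sum((x - t_mean) * (y - y_mean) for x, y in zip(t, revenue))
    / sum((x - t_mean) ** 2 for x in t)
)
intercept = y_mean - slope * t_mean

# The fitted line follows the trend, but the weekly swing stays in the residuals.
fitted = [intercept + slope * x for x in t]
residuals = [y - f for y, f in zip(revenue, fitted)]
max_abs_residual = max(abs(r) for r in residuals)

print("slope:", round(slope, 3))
print("largest residual:", round(max_abs_residual, 2))
```

The slope comes out close to the true trend, while the residuals remain roughly as large as the seasonal amplitude. That leftover structure is what a time-series model is meant to pick up.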

Remove outliers in multiple columns from a spark dataframe

I have a dataset with around 10 integer features, and I wish to remove outliers from each feature.
What I have done in the past is compute the average and standard deviation for each feature, then do a pass over the dataset, discarding rows that qualify as outliers. Doing this for each column/feature gets rid of rows that have at least one outlier feature.
Since parsing the dataset multiple times is not optimal, I am looking for a way to do this in a computationally efficient manner. Can someone propose a better way, so that the dataset is parsed once and all outlier rows are removed?
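One way to picture the idea is the plain-Python sketch below. It is not Spark code: it only shows the shape of the computation, with one pass that accumulates the statistics for every column at once and one pass that filters on all columns together. The function name `remove_outlier_rows`, the threshold `k=3.0`, and the sample rows are all assumptions for illustration.

```python
import math

def remove_outlier_rows(rows, k=3.0):
    """Drop rows where any column lies more than k standard deviations from its mean."""
    n = len(rows)
    n_cols = len(rows[0])
    sums = [0.0] * n_cols
    sq_sums = [0.0] * n_cols

    # Pass 1: accumulate sum and sum of squares for all columns together.
    for row in rows:
        for j, value in enumerate(row):
            sums[j] += value
            sq_sums[j] += value * value

    means = [s / n for s in sums]
    # Population standard deviation; max() guards against tiny negative rounding errors.
    stds = [math.sqrt(max(sq / n - m * m, 0.0)) for sq, m in zip(sq_sums, means)]

    # Pass 2: keep a row only if every column is within k standard deviations.
    return [
        row for row in rows
        if all(abs(v - m) <= k * s for v, m, s in zip(row, means, stds))
    ]

# Hypothetical data: 30 ordinary rows plus two rows with one extreme value each.
rows = [(10 + i % 3, 50 + i % 5) for i in range(30)]
rows += [(500, 52), (11, -400)]
cleaned = remove_outlier_rows(rows)
print(len(rows), "->", len(cleaned))
```

This still reads the data twice in total (once to aggregate, once to filter), but no longer twice per column. In Spark the same shape would presumably be a single aggregation over all columns followed by a single filter.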

Azure Machine Learning - Empty score results

I've trained a model, and the results on the test set are okay.
I have now saved the model as 'Trained model' and built a new experiment on a new dataset, to make predictions where I don't have the actual values.
Normally, the trained model gives me a scored label result per instance.
But now the scored label results are empty. When I convert the score results to CSV, the scored labels column is also empty.
Even stranger, when I look at the Statistics in the score Visualize tab, I DO see statistics for the scored values. But no actual scored values...
Is this a bug? Or am I forgetting something important? What's going on ;) ?
If your test dataset is missing the dependent values, your predictive experiment may fail for some models. The solution is to pad your csv file with zero values instead of blank values.
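A minimal sketch of that padding step, using only Python's standard `csv` module. The column names and sample rows are hypothetical, and the sketch works on in-memory text so it can be run as-is; with a real file you would open it instead of using `io.StringIO`.

```python
import csv
import io

# Hypothetical input with blank cells, including a blank label column.
raw = "feature_a,feature_b,label\n1.5,,\n,3,\n"

reader = csv.reader(io.StringIO(raw))
header = next(reader)

# Replace every empty (or whitespace-only) cell with "0".
padded_rows = [[cell if cell.strip() else "0" for cell in row] for row in reader]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(padded_rows)
padded_csv = out.getvalue()

print(padded_csv)
```

Note that this pads every blank cell, not only the label column. If a zero would be misleading for some feature, restrict the replacement to the columns where it is safe.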
I had this same issue and it was frustrating but I think I finally understand why this is happening.
When I was training my experiment, part of the cleaning process was populating missing values or trimming existing data with R.
The problem arises if one of those features is optional. For instance, if you have a column that is not filled in, the scoring model will fail in the web service.
To see if this problem affects you, go to your Predictive Experiment and visualize the Score Model results. If you see empty Predicted Label & Predicted Score values, you can easily see which data points are missing.

Excel - Iteration based on changing cell value, pasting result

I have set up a linear model in Excel of passenger numbers and population. There are two decay parameters, one for each type of transport, which change the passenger number forecasts.
Manually, I can change the decay factors between 0.1 and 1.0 for every combination to see how the fit of the model changes. I would like to find the combination of parameters that gives the best model fit, to a precision of 0.01. Any ideas how?
Essentially, setting the parameters changes the passenger forecasts, which in turn changes the model fit. I need an easy way to see how the model fit changes as the parameters change. Thanks.
