RapidMiner / Excel Missing Value Replacement

I'm learning how to use RapidMiner for a project, and I'm stuck at one point. I have a dataset as follows: there are countries, and for each country I'm keeping track of some values (medals, let's say) for the years 1990-2012. For example:
Country Year Gold Silver Bronze
USA 1990 10 5 7
.....
USA 2012 12 3 8
Spain 1990 8 12 9
...
Spain 1992 7 ? 8
....
Spain 2012 4 11 12
...GOES ON...
What I want to do is replace the missing values. For example, Spain has a missing value for Silver medals in 1992. I want to compute the average of the available Silver values for Spain and replace the missing value with it. How can I do this? If the existing modules in RapidMiner can't do this, is there some kind of macro I could use? I can also use Excel to preprocess the data, but how?

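Use the Replace Missing Values operator. By default it replaces each missing value with the average of its column. Note that this average is taken over the whole column (all countries), not just Spain's rows; for a per-country average you would need to apply it to each country's rows separately, for example after filtering the data by Country.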
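To average within each country rather than over the whole column, the data has to be grouped by country before taking the mean. A minimal Python sketch of that grouped mean imputation, assuming each row is a dictionary with the question's column names and `None` marks a missing value (the `?` in the data); `fill_with_country_mean` is a hypothetical name, not a RapidMiner feature. The sample uses only the rows actually shown in the question, so its Spain average is not the one the full dataset would give.

```python
from collections import defaultdict
from statistics import mean


def fill_with_country_mean(rows, columns=("Gold", "Silver", "Bronze")):
    """Return a copy of rows where each None in `columns` is replaced by the
    mean of that column over the same country's non-missing values."""
    # Collect the known (non-missing) values per (country, column).
    known = defaultdict(list)
    for row in rows:
        for col in columns:
            if row[col] is not None:
                known[(row["Country"], col)].append(row[col])

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for col in columns:
            values = known[(row["Country"], col)]
            # Only fill when the country has at least one known value.
            if new_row[col] is None and values:
                new_row[col] = mean(values)
        filled.append(new_row)
    return filled


data = [
    {"Country": "USA", "Year": 1990, "Gold": 10, "Silver": 5, "Bronze": 7},
    {"Country": "Spain", "Year": 1990, "Gold": 8, "Silver": 12, "Bronze": 9},
    {"Country": "Spain", "Year": 1992, "Gold": 7, "Silver": None, "Bronze": 8},
    {"Country": "Spain", "Year": 2012, "Gold": 4, "Silver": 11, "Bronze": 12},
]
result = fill_with_country_mean(data)
print(result[2]["Silver"])  # 11.5, the mean of Spain's 12 and 11
```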

Related

EXCEL SumIf with multiple conditions from another cell/array

   A      B     C
1  User   Task  Hours
2  Jim    AA-1  10
3  Mike   AA-2  12
4  Jim    AA-3  13
5  Steve  CC-5  14
6  Jim    BB-1  15
7  Mike   BB-3  5
8  Steve  BB-4  10
9  Mike   CC-5  8
The real table is much bigger, and there are more than just AA, BB and CC types of tasks.
I want to get how many hours Jim spent on tasks that start with AA or BB.
This is simple with SUMIFS, but the problem is that I have 20 different types of tasks and I want results for a lot of people.
So I want one row showing how many hours Jim spent on AA, BB and CC tasks, and the next row showing how many he spent on DD, EE and FF.
Basically, I would like a SUMIFS like this (just look at the last part):
('SHEET1'!C:C,'SHEET1'!E:E,$B$3,'SHEET1'!G:G,"AA*,BB*,CC*")
Even better would be to have the AA*,BB*,CC* part in another cell so it can be changed easily.
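Try the following formula in G3, filled right and down:
=SUMIFS($C$2:$C$9,$A$2:$A$9,$F3,$B$2:$B$9,G$2)
You can also generate the row and column headers with these formulas (they need a dynamic-array version of Excel; TEXTSPLIT in particular requires Microsoft 365):
F3: =UNIQUE(A2:A9)
G2: =TRANSPOSE(SORT(UNIQUE(TEXTSPLIT(B2:B9,"-"))))&"*"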
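To see what the resulting grid computes, here is a minimal Python sketch using the sample table above: for every user and task prefix it sums Hours where User matches and Task matches the prefix pattern. Python's `fnmatch.fnmatchcase` on upper-cased strings stands in for Excel's case-insensitive `*` wildcard; this is an approximation, since the two wildcard syntaxes are not identical. `sumifs` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Sample data from the question: (User, Task, Hours).
ROWS = [
    ("Jim", "AA-1", 10), ("Mike", "AA-2", 12), ("Jim", "AA-3", 13),
    ("Steve", "CC-5", 14), ("Jim", "BB-1", 15), ("Mike", "BB-3", 5),
    ("Steve", "BB-4", 10), ("Mike", "CC-5", 8),
]


def sumifs(rows, user, pattern):
    """Sum Hours where User equals `user` and Task matches `pattern`
    (case-insensitive, `*` = any run of characters), like one SUMIFS cell."""
    return sum(
        hours
        for u, task, hours in rows
        if u.upper() == user.upper() and fnmatchcase(task.upper(), pattern.upper())
    )


# Row headers: unique users in first-seen order (like UNIQUE in F3).
users = list(dict.fromkeys(u for u, _, _ in ROWS))
# Column headers: sorted unique text before "-", plus "*" (like the G2 formula).
patterns = [p + "*" for p in sorted({task.split("-")[0] for _, task, _ in ROWS})]

for user in users:
    print(user, [sumifs(ROWS, user, pat) for pat in patterns])
```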
Thanks to Harun24hr's answer, I started thinking about doing it in separate steps.
As I stated above, I have the following table:
   A      B     C
1  User   Task  Hours
2  Jim    AA-1  10
3  Mike   AA-2  12
4  Jim    AA-3  13
5  Steve  CC-5  14
6  Jim    BB-1  15
7  Mike   BB-3  5
8  Steve  BB-4  10
9  Mike   CC-5  8
The issue is that I might have a lot of different types of tasks and I need to group some results, so I created another table to establish the groups:
   A      B
1  Group  Task
2  a      AA*
3  a      BB*
4  b      CC*
5  a      DD*
This new table is easier to maintain. I can then add a column to the main table that looks up each task's group in this grouping table, and do the calculation with a simple SUMIFS on that column.
Thanks a lot for all the help.
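The grouping-table approach can be sketched in Python as below, using the two sample tables above. It assumes a task belongs to the first group whose pattern it matches (the post does not say how overlapping patterns are resolved), and `group_for` and `hours_by_group` are hypothetical names: `group_for` plays the role of the new helper column and `hours_by_group` the role of the final SUMIFS.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Sample data from the question: (User, Task, Hours).
ROWS = [
    ("Jim", "AA-1", 10), ("Mike", "AA-2", 12), ("Jim", "AA-3", 13),
    ("Steve", "CC-5", 14), ("Jim", "BB-1", 15), ("Mike", "BB-3", 5),
    ("Steve", "BB-4", 10), ("Mike", "CC-5", 8),
]
# Grouping table: (Group, Task pattern).
GROUPS = [("a", "AA*"), ("a", "BB*"), ("b", "CC*"), ("a", "DD*")]


def group_for(task, groups=GROUPS):
    """Return the group of the first pattern the task matches, or None."""
    for group, pattern in groups:
        if fnmatchcase(task.upper(), pattern.upper()):
            return group
    return None


def hours_by_group(rows):
    """Total hours per (user, group), like a SUMIFS over the helper column."""
    totals = defaultdict(int)
    for user, task, hours in rows:
        totals[(user, group_for(task))] += hours
    return dict(totals)


print(hours_by_group(ROWS))
```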

Excel: Function to replace variable name

Before I begin, I would like to say that I have read a few articles and tried a couple of functions (=REPLACE, =SUBSTITUTE), but I'm not able to get the required results. I'm new to Excel.
The following is a homework question.
Question: Use appropriate text functions to shorten the variable names to something like Arizona Females Young, Arizona Females Old, and Arizona Females All. Also, is there a way to do this automatically for all variables with one function? The screenshot is attached.
If you only need to change the headers, for example:
Licensed Drivers Alabama Female 19 and under # -> Female 19 and under #
Licensed Drivers Alabama Female 85 and over # -> Female 85 and over #
then Excel's Find and Replace is the best solution.
You can also write the shortened value to another cell with a formula:
=SUBSTITUTE(A1,"Licensed Drivers Alabama ","")
=SUBSTITUTE(B1,"Licensed Drivers Alabama ","")
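Filling that SUBSTITUTE formula across the header row applies the same removal to every variable. The equivalent logic in Python is a one-line loop; a minimal sketch, assuming all headers share the prefix "Licensed Drivers Alabama " as in the examples above (`shorten` is a hypothetical name). Like SUBSTITUTE without an instance number, `str.replace` removes every occurrence of the text.

```python
def shorten(headers, prefix="Licensed Drivers Alabama "):
    """Remove `prefix` from every header, like =SUBSTITUTE(cell, prefix, "")
    filled across a row of headers."""
    return [h.replace(prefix, "") for h in headers]


headers = [
    "Licensed Drivers Alabama Female 19 and under #",
    "Licensed Drivers Alabama Female 85 and over #",
]
print(shorten(headers))  # ['Female 19 and under #', 'Female 85 and over #']
```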

Equal distribution formula based on preset values. Excel

For example, there are 300 apples and 100 people. Each person has a preset value (a number from 1 to 5): a value of 1 means 1 apple, a value of 5 means 5 apples, and so on. But with 300 apples available, each person is going to get more than their value says they "deserve". Or, on a day when there are only 200 apples, everyone gets less than their value says they "deserve". Is this possible in Excel?
NAME VALUE
john 5
james 5
sam 4
matt 5
mike 3
steve 2
etc...
This sounds like a perfect problem for Solver to handle. As you know, it is included in Excel's add-ins, and it can deal with all the variables you mentioned.
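If "more or less than they deserve" simply means scaling everyone in proportion to their value, the arithmetic is direct: share = apples × value / (sum of all values), e.g. =B2/SUM($B$2:$B$101)*300 in a sheet, assuming the values sit in B2:B101. A minimal Python sketch of that proportional allocation, using the six names listed above; rounding to whole apples with the largest-remainder method is an assumption, since the question doesn't say how fractions should be handled, and `allocate` is a hypothetical name.

```python
def allocate(values, total):
    """Split `total` whole items in proportion to `values` (name -> weight),
    using the largest-remainder method so the shares add up to `total`."""
    weight_sum = sum(values.values())
    # Exact integer arithmetic: floor share and leftover numerator per person.
    shares = {name: total * v // weight_sum for name, v in values.items()}
    remainders = {name: total * v % weight_sum for name, v in values.items()}
    leftover = total - sum(shares.values())
    # Give one extra item to each of the largest fractional parts
    # (sorted() is stable even with reverse=True, so ties keep input order).
    for name in sorted(values, key=lambda n: remainders[n], reverse=True)[:leftover]:
        shares[name] += 1
    return shares


people = {"john": 5, "james": 5, "sam": 4, "matt": 5, "mike": 3, "steve": 2}
print(allocate(people, 300))  # shares sum to 300
print(allocate(people, 200))  # shares sum to 200
```

With 300 apples and these six weights (sum 24), a value-5 person's exact share is 62.5, so the sketch hands out 62 or 63 and the totals still add up.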

Lookup with multiple criteria, one a MAX value

I am trying to lookup the LOCATION of an employee (NAME) and their MANAGER from the most recent month (largest month number) in a particular QUARTER in data like this:
NAME LOCATION MANAGER QUARTER MONTH
Ryan Smith Sioux Falls Rick James 3 7
Jane Doe Tampa Bobby Brown 3 7
John Rogers Tampa Tracy Lane 3 7
Ryan Smith Sioux Falls Rick James 3 8
Jane Doe Denver Thomas Craig 3 8
John Rogers Tampa Cody Davis 3 8
So if I know the name of the employee and the quarter I'm looking up, the results should display who their last manager was and the location they were in, as these may change month to month.
I have used an INDEX and MATCH array formula:
{=INDEX($B$2:$B$7,MATCH(A12,IF($D$2:$D$7=D12,$A$2:$A$7),0))}
but this just provides the first match and not necessarily the most recent month in that quarter. I attempted to include a MAX function which looked something like this:
{=INDEX($B$2:$B$7,MAX($E2:$E7,MATCH(A12,IF($D$2:$D$7=D12,$A$2:$A$7),0)))}
but that didn't quite get me there either.
What formula do I need to get this to work?
I think I'd choose a PivotTable for its versatility and speed.
I think a pivot table is probably the best option and can easily be modified with the filters when new entries are added to the underlying data. I was working on a solution with a formula, but it requires you to add a lookup column.
The formula for the lookup column is: =E6&" "&H6&" "&I6
I wasn't clear on how the OP was going to "enter" the employee name and quarter, so I assumed they would be typed together in a separate column (e.g. the name followed by the quarter number in A6).
And the formula in column B (which is cumbersome) is:
=VLOOKUP(A6&" "&MAX(IF(H1:H100=NUMBERVALUE(RIGHT(A6,1)),I1:I100)),$D$6:$G$11,3,FALSE)&", managed by "&VLOOKUP(A6&" "&MAX(IF(H1:H100=NUMBERVALUE(RIGHT(A6,1)),I1:I100)),$D$6:$G$11,4,FALSE)
But it works, and as long as the lookup range is adjusted, it is scalable.
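One thing to watch in that formula: MAX(IF(H1:H100=quarter, I1:I100)) finds the latest month in the quarter across all employees, not for the named employee, so the VLOOKUP returns #N/A if that employee has no row for that month. A minimal Python sketch of the lookup the question asks for, filtering on both name and quarter before taking the largest month, using the question's sample data (`latest_in_quarter` is a hypothetical name):

```python
# Sample data from the question: (NAME, LOCATION, MANAGER, QUARTER, MONTH).
DATA = [
    ("Ryan Smith", "Sioux Falls", "Rick James", 3, 7),
    ("Jane Doe", "Tampa", "Bobby Brown", 3, 7),
    ("John Rogers", "Tampa", "Tracy Lane", 3, 7),
    ("Ryan Smith", "Sioux Falls", "Rick James", 3, 8),
    ("Jane Doe", "Denver", "Thomas Craig", 3, 8),
    ("John Rogers", "Tampa", "Cody Davis", 3, 8),
]


def latest_in_quarter(data, name, quarter):
    """Return (location, manager) from the employee's own most recent month
    in the given quarter, or None if they have no rows in that quarter."""
    matches = [row for row in data if row[0] == name and row[3] == quarter]
    if not matches:
        return None
    latest = max(matches, key=lambda row: row[4])  # largest MONTH wins
    return latest[1], latest[2]


print(latest_in_quarter(DATA, "Jane Doe", 3))  # ('Denver', 'Thomas Craig')
```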

Pulling Data in to Excel from SQL Server 2012 and formatting it

I will try to explain as best I can what I am trying to do, but I am no Excel genius :-)
I have one Excel workbook, which we will call Template, that is going to be pre-populated with data and then used offline. I am going to import a data set from SQL Server into a tab called RawData; the data set will look something like this:
Description Rate Hours SellValue Item
APU 2.50 3 7.50 1
APU 2.50 4 10.00 2
APU 2.50 5 12.50 3
INS 2.50 3 7.50 1
INS 2.50 4 10.00 2
INS 2.50 5 12.50 3
There could be more or fewer records, but no more than 7 distinct descriptions.
There is also another tab called Report.
At position A1 will be the title APU, and underneath it I want the records with the description APU to appear; this block will shrink and expand depending on the number of records. Then, wherever the last record ends, the next heading INS will appear, followed by the records with that description. There is one final tab called Rate, with a rate value at position A1. When this value is changed, it should take the new rate and recalculate the SellValue in RawData, thus amending the figures in the Report. I am sure there are better ways to explain this, but I hope someone gets the gist before I lose what hair I have left.
Thanks in advance
The easiest and most powerful approach to problems like this is to use PivotTables in Excel, which let you customize how the data is viewed.
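Whatever tool builds it, the Report described in the question is a group-by on Description with SellValue recomputed as Rate × Hours, which matches the sample rows (2.50 × 3 = 7.50). A minimal Python sketch of that logic, assuming the single rate from the Rate tab applies to every row; `build_report` is a hypothetical name, not an Excel or SQL Server feature, and `Decimal` is used so money values don't pick up floating-point noise.

```python
from decimal import Decimal
from itertools import groupby

# Sample RawData rows from the question: (Description, Rate, Hours, SellValue, Item).
RAW = [
    ("APU", Decimal("2.50"), 3, Decimal("7.50"), 1),
    ("APU", Decimal("2.50"), 4, Decimal("10.00"), 2),
    ("APU", Decimal("2.50"), 5, Decimal("12.50"), 3),
    ("INS", Decimal("2.50"), 3, Decimal("7.50"), 1),
    ("INS", Decimal("2.50"), 4, Decimal("10.00"), 2),
    ("INS", Decimal("2.50"), 5, Decimal("12.50"), 3),
]


def build_report(raw, rate):
    """Return report lines: a heading per Description followed by its rows,
    with SellValue recalculated as rate * Hours. Each block grows or shrinks
    with the number of rows for that Description."""
    lines = []
    # groupby needs input sorted by the key; sorted() is stable, so rows keep
    # their original order within each Description.
    for desc, rows in groupby(sorted(raw, key=lambda r: r[0]), key=lambda r: r[0]):
        lines.append(desc)
        for _, _, hours, _, item in rows:
            lines.append(f"  Item {item}: {hours} h x {rate} = {rate * hours}")
    return lines


for line in build_report(RAW, Decimal("3.00")):
    print(line)
```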
