I have an Excel sheet with over 1000 columns and 11000 rows, all with numeric data. Within the data, missing values are represented with '*'.
I would like to replace each '*' value with the average of the column that it is in.
Doing this manually would take a long time, so is there a formula that would achieve this?
Thanks so much in advance for any help.
I can give you a three-sheet solution, Sam:
Sheet 2:
Cell A1=
=AVERAGE(Sheet1!A:A)
Paste that along the top row for each of 1000 columns in sheet 2.
Sheet 3:
Cell A1=
=IF(Sheet1!A1="*",Sheet2!A$1,Sheet1!A1)
Copy that and then paste it into the whole of Sheet 3 (i.e., click the button in the top left corner of the grid that selects every cell, then paste). It's going to take a while to update but will deliver what you want!
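If a script is an option, the same idea (the column average in place of each '*') can be sketched outside Excel. This is only a minimal Python sketch using the standard library, assuming the sheet has been exported as a CSV with no header row and that every row has the same number of cells; the function name `fill_with_column_mean` and the sample data are hypothetical.

```python
import csv
import io
from statistics import mean


def fill_with_column_mean(rows, missing="*"):
    """Return a copy of rows where each `missing` cell holds its column's mean.

    The mean is computed from the non-missing cells only, which mirrors
    how AVERAGE over a range skips text cells such as '*'.
    """
    columns = list(zip(*rows))  # transpose: one tuple per column
    means = []
    for col in columns:
        present = [float(v) for v in col if v != missing]
        # A column with no numeric values has no mean; keep the marker.
        means.append(mean(present) if present else missing)
    return [
        [means[i] if v == missing else float(v) for i, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical sample standing in for the exported sheet.
sample = "1,10\n*,20\n3,*\n"
rows = list(csv.reader(io.StringIO(sample)))
filled = fill_with_column_mean(rows)
print(filled)
```

For a real file, the rows would come from `csv.reader` over the opened file rather than from the in-memory sample.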
As you have mentioned machine learning I thought I would introduce you to how you could do this with Azure Machine Learning Studio (AML) using a free account.
By using AML you gain access to a number of methods for replacing missing values, all of which are extremely quick. AML has a Clean Missing Data module which exposes replacement methods such as Multivariate Imputation using Chained Equations (MICE), Mean, Median and several others. The great thing here is that you can visualize the dataset columns by right-clicking on the dataset and see which columns are skewed. You can then select, on a column-by-column basis, which replacement method to use. If you have heavily skewed columns you might use the median instead, for instance. This also offers great opportunities for data normalization (scale and reduce). You also gain access to using Python and R with your dataset.
I don't know whether there is a way to treat "*" directly as a missing value (I am trying to find that out), but if you do a little processing before loading then all is fine. The steps before loading are:
Export the sheet as a CSV and save it.
Use Ctrl+H to bring up the Find and Replace dialog, enter "~*" for Find (the tilde makes Excel treat the asterisk as a literal character rather than a wildcard) and leave Replace blank
Then log in to AML and click the + New at the bottom of the screen
Select New > DATASET > FROM LOCAL FILE and select your file
When selecting the type, make sure to select CSV with no header if your data has no header row, or CSV with header if it does:
Your dataset will start uploading as shown by progress bar at bottom of screen and then appear in the SAVED DATASETS collection.
Click the + New button again and select EXPERIMENT > BLANK EXPERIMENT
Drag and drop your saved dataset onto the canvas on the right:
In the Search experiment items box on the right, type: Clean Missing Data
then drag the module that appears onto the canvas
Join the 2 boxes by clicking the dot at the bottom of the top box and dragging to the other box
Select the bottom box and then input the following parameters on the right (here is where you can choose which method to apply for missing values, e.g. replace missing with mean, or perhaps median if your column data is skewed).
Right click the bottom module and select Run selected
Right click again and select Cleaned dataset > Save as Dataset
The progress bar at the bottom will inform you when complete
Type convert to csv in the Search experiment items box again, drag that module onto the canvas, and connect the bottom left-hand output of the second module to the top of the newly added third:
Select the bottom module and right click > Run selected
Wait for the progress bar to complete.
Right-click the bottom module and hit Download. Done.
I need to create an Excel template with, for example, the following columns and data:
Example of an Excel Table
and so on. My goal is to set up the filter so that when I choose A1 from column A, I get all the related B, C and D values, like B1,C1,D1; B1.1,C1.1,D1.1; B1.2,C1.2,D1.2
With the normal filter I can only see B1,C1,D1 when I choose A1.
The only solution I came up with is writing A1 in every row that has the relevant B1.x, C1.x and D1.x values. Then I can see all the relevant rows, but this solution is not the most effective one (especially when the inserted values are short texts, writing the same sentences in each row makes it look a bit messy).
The Pivot Table also does not recognize B1.1,C1.1,D1.1; B1.2,C1.2,D1.2 as related to A1. Even when I copied A1 into every row, it still couldn't sort it correctly.
Can you please help me with these questions? Many thanks in advance!
Kamola.
Update: I created the following example for clarification, hope it will help a bit! Unfortunately I cannot share the Excel sheet via Stack Overflow, so here is a screenshot of it: (screenshot: Example of the content)
MANUAL APPROACH
In your example, which is a Table but not a pivot table:
Highlight the Problems column in your table;
Press Ctrl+G on your keyboard to bring up the Go To window;
Click the Special... button in the lower-left corner of the window;
Select Blanks then click OK;
Without clicking anywhere else (so the blank cells stay selected), go to the formula bar and enter =A2, then hold the Ctrl key and press Enter.
If you have done the above steps correctly, you should have column A filled with Problem IDs.
POWER QUERY APPROACH
FYI, if you are using Excel 2010 Professional Plus or a later version of Excel, you can add your data table to the Power Query Editor, right-click the column header of the first column and select Fill -> Down to quickly fill the column with the Problem IDs.
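For readers who would rather script it, the fill-down logic used by both approaches above (copy the nearest value from above into each blank) can be sketched in a few lines of Python. This is an illustrative sketch only, not what Excel or Power Query run internally; the function name `fill_down` and the sample column are hypothetical.

```python
def fill_down(values, blank=""):
    """Replace each blank entry with the closest non-blank value above it."""
    filled = []
    last = blank  # leading blanks stay blank: there is nothing above to copy
    for value in values:
        if value != blank:
            last = value
        filled.append(last)
    return filled


# Hypothetical first column: an ID only on the first row of each group.
problem_ids = ["A1", "", "", "A2", ""]
print(fill_down(problem_ids))
```

Once every row carries its own ID, an ordinary filter on that column returns the whole group.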
PIVOT TABLE APPROACH
If you want to show row labels on each line in a pivot table (as mentioned in your post), click somewhere within the pivot table, go to the Design tab in the Excel ribbon, click Report Layout and select Repeat All Item Labels.
Let me know if you have any questions. Cheers :)
I am experiencing the following issue in Microsoft Excel for Mac 2016. When using the From Text data import function and clicking through the first two steps (selecting "Delimited" and defining delimiter), Step 3 allows for changing the column data format. By default, column 1 is selected and I can click to select any other column that fits into the window, as shown in the screenshot below. However, it seems that it is not possible to select columns that are to the right of those visible in the window. In my example below, there are about four more columns to the right. Scrolling and arrow-keys seem to not work in the Mac version.
Is there a keyboard shortcut, or some other workaround, that I am missing?
Perform a two-finger touch on the trackpad (not a complete tap, more like a light contact). A slider bar will appear at the bottom of the columns. Now is your chance to grab it with a one-finger click. If you begin with the pointer in the bottom part of the columns, it is much easier to grab the slider, as it should appear under the pointer. This probably works in other such windows too.
I have an SSRS report that, when I export it to Excel, creates unwanted columns when viewed in Excel. What would be the best way to ensure no additional columns are created? I have tried setting the location of the table rows to 0in, 0in but that did not resolve the problem. The attached screenshot is what the report looks like in both Visual Studio and Excel.
There are two ways to approach this:
Align everything:
You need to align your textboxes with the main tablix to remove the unwanted columns.
So for the first expression after the start of the main tablix, align its left edge with the Patient Name text box and its right edge with the right edge of the State text box.
For the second expression, align its left edge with the left edge of the Phone text box and its right edge with the right edge of the State text box.
You need to do the same thing with all the text boxes. If they don't align you will get the extra columns.
Align left by moving the left edge of the text box to match the table. You will see a blue line, which indicates that the report items are aligned.
Aligning right using mouse
Also, if you select multiple objects, you can align them using the Format >> Align menu option.
Create Tables to handle the alignment
Create tables without any groupings or detail. Delete the groupings as shown below.
Then add your report items to those tables: one table before the main tablix and one after it. Make sure these tables don't return any data, otherwise you might get duplicate info.
It is a lot easier to align a table than to align 20 text boxes.
I have used both methods. If there are only a few items I use the first approach. If there are a lot more, I use the second.
Here I have a chart
I did a right-click -> "Add labels", and it read them from my a(H/C) row. Basically, I want it to read the label values from the CO2/CH4 row instead, so they would be 0, 0.5, 1, 2, 5, 10. Of course, I want the chart itself to remain the same: the x values of the dots are in row "b(O/C)", their y values are in row "a(H/C)", and their respective labels are read from "CO2/CH4". Can it be done automatically, and how (preferably without scripting magic)? Rewriting them manually is a pain, really.
You will get the desired results by following the steps below:
Step 1: Click on the Chart
Step 2: Select the Design Tab in Ribbon Bar (Note: “Design Tab” appears only when the Chart is selected)
Step 3: Click on “Select Data” feature in the Design Tab as shown in Screen Shot 1
Step 4: Click on Edit Button as shown in Screen Shot 2
Step 5: Change the Series Name Range and the data range in “Series Y Values:” as highlighted in Screen Shot 3
What about adding the different points as different series and using the series names as labels (instead of the y-values)?
If you need the "line" between the points (or if you need to add a trendline...), keep the series you already have (with every point), without labels.
Excel 2013 added the capability to use text from worksheet cells as data point labels. If you don't have 2013 (your screenshot looks like 2010), or even if you do, you can use Rob Bovey's free Chart Labeler add-in.
I have been using MS Word 2010 to compose a list of questions. These questions are organized in a single MS Word document, using numbering: 1. first question, etc.
I was wondering whether the contents of each bullet could be transferred to an MS Excel cell. So if I have 20 questions, I would have a column with 20 rows, each containing one question.
I am asking this because I have 300 questions that I want to import into Excel.
It's possible to copy your numbered bullets from Word to Excel and then break them up using Excel worksheet functions. However, it's really easy to just do it with the built-in Excel commands.
In Word:
Increase the width on the hanging indent on your numbered list. It will make the conversion in Excel easier to deal with.
Select your bullets and copy them.
In Excel:
"Paste Special" the copied text into Excel using the Match Destination Formatting option.
Select the cells you pasted, working in groups by the number of digits in the bullet number (i.e., first do 1-9, then do 10-99, etc.), so that the fixed-width break lines fall in the same place for every selected cell.
With the cells selected, choose the Text to Columns command from the Data tab on the ribbon.
Make sure that the "Fixed Width" radio button is selected in the dialog box that comes up, then move to the next step.
Adjust the break lines so that there are three fields: one with the number + period, another the spaces between the numbers and text, the third the text.
Moving to the next step, select the second field (the spaces) and click the "Do not import column (skip)" radio button.
Click Finish and the bullets are imported.
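As an alternative to splitting by fixed width, the same separation of number and text can be sketched with a pattern match, which does not depend on how many digits the number has. This is only a rough Python sketch under the assumption that the list has been copied out as plain text with each line starting "N." followed by a tab or spaces; the name `split_numbered_lines` and the sample questions are hypothetical.

```python
import re

# One or more digits, a period, then whitespace (tab or spaces) before the text.
NUMBERED = re.compile(r"^\s*(\d+)\.\s+(.*)$")


def split_numbered_lines(text):
    """Return (number, question) pairs for lines that look like '12. text'."""
    pairs = []
    for line in text.splitlines():
        match = NUMBERED.match(line)
        if match:  # lines without a leading number are skipped
            pairs.append((int(match.group(1)), match.group(2).strip()))
    return pairs


# Hypothetical text as it might look after copying a numbered list.
pasted = "1.\tWhat is a cell?\n2.\tWhat is a range?\n10.\tWhat is a table?"
questions = split_numbered_lines(pasted)
print(questions)
```

Each pair could then be written out as one CSV row (for example with the standard `csv` module) and opened in Excel.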
The above answer is best if you have an already established list. The best workflow I've found for new lists is to create a table to work in, in Word. That table then copies perfectly into cells in Excel, allowing you to create a structure that will pass between the two docs seamlessly.