I am trying to create a data analysis spreadsheet for some test results in a school.
The picture above shows some dummy data I am using. I need to apply colour-scale conditional formatting to each question, but I want the spreadsheet to take the maximum and minimum values from the Marks row. The minimum mark for all questions is obviously 1; however, the maximum for question 1 would be 3, for question 2 it would be 2, etc.
The problem with the stock colour scale is that if no one achieves full marks in a question, the highest mark achieved is highlighted green even though it is not the actual maximum.
Any help would be greatly appreciated.
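To make the desired behaviour concrete, here is a minimal Python sketch of the scaling described above. It is not Excel syntax; the function name `scale_position` is hypothetical, and the maxima (3 for question 1, 2 for question 2) are the ones given in the question. A position of 0.0 would map to the low end of the colour scale and 1.0 to the high (green) end.

```python
def scale_position(mark, max_mark, min_mark=1):
    """Return where a mark sits between the fixed minimum and the
    question's own maximum, as a fraction from 0.0 to 1.0."""
    if max_mark <= min_mark:
        raise ValueError("max_mark must be greater than min_mark")
    # Clamp so that out-of-range marks do not fall off the scale.
    clamped = min(max(mark, min_mark), max_mark)
    return (clamped - min_mark) / (max_mark - min_mark)


# A mark of 2 is only halfway up the scale for question 1 (max 3),
# but is full marks for question 2 (max 2).
print(scale_position(2, 3))  # 0.5
print(scale_position(2, 2))  # 1.0
```

With this kind of scaling, the best mark in a column is only shown as fully green when it equals the maximum from the Marks row.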
I am working on a little data mining project (I am still a Data Science student, not a professional). Maybe you can help me to choose a proper model for my task.
So, let's say we have a table with three columns and around 4000 rows:
YEAR    COLOR    NAME
1900    Green    David
1901    Yellow   Sarah
1902    Green    ???
1902    Red      Sarah
…       …        …
2020    Purple   John
Any value for any field can be repeated in the dataset (also Year values).
There are no missing values in the first two columns, but only around 20% of the Name values in the third column are available. The Name value depends somewhat on the first two columns (not a causal relation).
My goal is to extrapolate the available Name values to the whole table and get a range of occurrences for each name value (for example in a boxplot)
I have imagined a process like the following, although I am not sure whether it makes sense statistically (any objections and suggestions are appreciated):
For every unknown NAME value, the algorithm randomly chooses one of the already known NAME values. The odds of a particular NAME value being chosen depend on the variables YEAR and COLOR. For instance, if 'David' values tend to be correlated with low Year values AND with 'Green' or 'Purple' values for Color, the algorithm gives 'David' a higher probability of being chosen if the input values for Year and Color are "1900, Purple".
When the above process ends, the number of occurrences for each name is counted.
The above process is applied 30 times and the results for each name are displayed in a boxplot.
However, I don't know which is the best model to implement an idea similar to this. I have drawn the process in a simple paint drawing:
Possible output for the task
What do you think would be a good approach to this task? I appreciate any help.
I think you have the process down; converting the data may be the first hurdle.
I would look at using OrdinalEncoder (from sklearn.preprocessing import OrdinalEncoder) to convert the data from categorical to numeric.
You could then use a random number generator to produce a number within the range defined by the encoding, which would randomly select a name.
Loop through this 30 times with a for loop to achieve the result.
It also looks like you will need to provide the ranking values for year and colour prior to building out your code. From here you would just provide bands, for example, if year > 1985, etc within your for loop to specify the names.
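The process described in the question could be sketched roughly as follows. This is only one possible reading, using the standard library rather than scikit-learn: the weighting rule (counts of each name within the same colour and the same year band, with add-one smoothing), the band width of 20 years, and the function name `simulate_name_counts` are all assumptions made for illustration, not something the question or answer specifies.

```python
import random
from collections import Counter


def simulate_name_counts(rows, runs=30, band=20, seed=0):
    """rows: list of (year, color, name) tuples, with name=None when unknown.
    Returns {name: [total occurrences in run 1, run 2, ...]}."""
    rng = random.Random(seed)
    known = [(y, c, n) for y, c, n in rows if n is not None]
    names = sorted({n for _, _, n in known})
    # How often each name co-occurs with a colour / a year band.
    by_color = Counter((c, n) for _, c, n in known)
    by_band = Counter((y // band, n) for y, _, n in known)

    results = {n: [] for n in names}
    for _ in range(runs):
        counts = Counter(n for _, _, n in known)
        for y, c, n in rows:
            if n is None:
                # Add-one smoothing keeps every known name possible.
                weights = [
                    (by_color[(c, m)] + 1) * (by_band[(y // band, m)] + 1)
                    for m in names
                ]
                counts[rng.choices(names, weights=weights)[0]] += 1
        for m in names:
            results[m].append(counts[m])
    return results
```

Each list in the returned dictionary holds 30 totals for one name, which is the shape a boxplot routine such as matplotlib's `plt.boxplot` expects. Whether this weighting is statistically defensible for the real data is a separate question.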
I'm trying to visualize the results of a network transport model. Specifically, I would like to show the cost to deliver to a location from a given plant. Next, for that location, I would like to split the cell into two colors, based on the variable and fixed components of the total shipping cost. I don't think this is possible, but I wanted to check whether anyone has done this. Screenshot shown:
I'm hoping to change the size of the green cell "Demand" according to the total cost (in this case it would be 25), and then split the cell itself into two colors, based on the ratio of the components (so 40% of the area would be variable cost in this case). Has anyone tried this?
I have a pretty dirty workaround for you, but it is doing the trick:
You use conditional formatting and a custom number format to make it work.
First, in cell D4, type =D5 to get the fixed value. Then change the custom number format to "Demand".
Now you just need to add conditional formatting with a gradient fill, where the minimum value is 0 and the maximum value is =D5+D6.
That works, but you must decide if it is practical for you.
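The arithmetic behind the workaround can be checked with a short Python sketch. It assumes D5 holds the fixed cost and D6 the variable cost, and uses the question's figures (total 25, 40% variable, so 15 fixed and 10 variable); the function name `filled_fraction` is hypothetical.

```python
def filled_fraction(fixed_cost, variable_cost):
    """Share of the cell covered by the fill when the cell value is the
    fixed cost and the scale runs from 0 to fixed + variable."""
    total = fixed_cost + variable_cost
    if total <= 0:
        raise ValueError("total cost must be positive")
    return fixed_cost / total


# Fixed 15 and variable 10 (total 25): 60% of the cell is filled,
# leaving the remaining 40% to stand for the variable cost.
print(filled_fraction(15, 10))
```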
So this is the simplified question I broke down from a former question I had here: Excel help on combination of Index - match and sumifs? .
For this one, I have Table1 (the black-gray one) with two or more columns for adjustments for various order numbers. See this image below:
What I want is for the Total Adjustment column in the blue table to show the total adjustments for all order numbers that contain the number in the cell beside it.
Example: Order number 17051 has two products: 17051A (Apple) and 17051B (Orange).
Now what I want to achieve in cell C10 is the sum of adjustment for both 17051A and 17051B, which will be: Apple Adjustment (5000) + Orange Adjustment (4500) = 9500.
The formula I used below (and in the image) kept giving me error messages, and this happens even before I add the adjustment for Orange.
=SUMIF(Text(LEFT(Table1[Order Number],5),"00000"),text(B10,"00000"),Table1[Apple Adjustment])
I have spent the whole day looking for a solution to this and didn't even come close to finding one. Any suggestion is appreciated.
Assuming your headers always have the text "adjustment" in them, you could use:
=SUMPRODUCT((LEFT($B$4:$B$7,5)=B10&"")*(RIGHT($C$3:$F$3,10)="adjustment")*$C$4:$F$7)
In C10 you could add two SUMPRODUCTs. This assumes that the order number at the start is always 5 digits long. If not, swap the 5 for the length of the part of the reference you are matching on.
=SUMPRODUCT(--(1*LEFT($B$4:$B$7,5)=$B10),$D$4:$D$7)+SUMPRODUCT(--(1*LEFT($B$4:$B$7,5)=$B10),$F$4:$F$7)
Which with table syntax is:
=SUMPRODUCT(--(1*LEFT(Table1[Order Number],5)=$B10),Table1[Apple Adjustment])+SUMPRODUCT(--(1*LEFT(Table1[Order Number],5)=$B10),Table1[Orange Adjustment])
Using LEN
=SUMPRODUCT(--(1*LEFT(Table1[Order Number],LEN($B10))=$B10),Table1[Apple Adjustment])+SUMPRODUCT(--(1*LEFT(Table1[Order Number],LEN($B10))=$B10),Table1[Orange Adjustment])
I am multiplying by 1 to ensure that the text returned by LEFT becomes numeric before it is compared with $B10.
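For readers who find the logic easier to follow outside Excel, here is a minimal Python sketch of what these formulas compute: keep the rows whose order number starts with the lookup value, then sum every adjustment column for those rows. The function name `total_adjustment` and the 17052A row are hypothetical; the 17051A and 17051B figures come from the question.

```python
def total_adjustment(rows, order_prefix):
    """Sum all adjustment columns for rows whose order number starts
    with order_prefix (compared as text, like LEFT(...) = B10 & "")."""
    prefix = str(order_prefix)
    return sum(
        sum(adjustments.values())
        for order_number, adjustments in rows
        if order_number[:len(prefix)] == prefix
    )


rows = [
    ("17051A", {"Apple Adjustment": 5000, "Orange Adjustment": 0}),
    ("17051B", {"Apple Adjustment": 0, "Orange Adjustment": 4500}),
    ("17052A", {"Apple Adjustment": 1200, "Orange Adjustment": 0}),  # hypothetical row
]

print(total_adjustment(rows, 17051))  # 9500
```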
I have a table visualization in Spotfire that I want the cell to color red if less than the previous value in the same column. For example:
500
400
800
100
The 400 would be red and the 100 would be red because they are less than the previous.
I searched through questions but didn't find many results (probably due to too many search terms).
Thank you for your help,
Chris Habrock
For this you're going to need two calculated columns because Spotfire, apparently, doesn't allow OVER expressions in color by rules.
The first calculated column is defined as:
RowID() as [rid]
The second one is defined as:
First([num]) OVER (Previous([rid])) as [check]
With this, your color rule can be "less than First([check])".
I hope this helps.
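The comparison that the two calculated columns set up can be illustrated with a short Python sketch (not Spotfire expression syntax). The function name `flag_drops` is hypothetical; the values are the ones from the question.

```python
def flag_drops(values):
    """Return one boolean per value: True when the value is less than
    the value on the previous row. The first row is never flagged."""
    return [i > 0 and values[i] < values[i - 1] for i in range(len(values))]


print(flag_drops([500, 400, 800, 100]))  # [False, True, False, True]
```

The rows flagged True (400 and 100) are the ones the color rule should turn red.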
I've created a spreadsheet for choosing resistor combinations for an RC Operational Amplifier. I've used a list of available capacitors and resistors for my limiting values to produce values of one of the resistors based on the resistance and capacitance values of the available (standard) components. The values in my tables look like 7.23436793078690. I wish to apply a filter that will find the values closest to a whole number (for example, 1592.00188622182000). Then I wish to apply another filter that will compare those values to a list of available resistors and highlight the resistors closest to the desired value. Many of the returned values of R2 are negative, so I also wish to filter out values of R2<0.
For this spreadsheet I've used the equation R2=(Req)(R1)/(R1-Req), which is an equation to determine Req, for parallel resistors, that is solved for R2. In Column 1, the Rows are populated with values for available (standard) resistors. All other columns are populated with the equation for R2. The value for Req is obtained from another table in the Workbook that uses available (standard) capacitor values. Therefore, Columns B and beyond are labeled R2(C=.47 uF), for example. Essentially, Columns B and beyond reference the available (standard) capacitor values.
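As a cross-check on the table, the R2 equation and the two filters from the first paragraph can be sketched in Python. The function names and the sample resistor values are hypothetical; only the formula R2=(Req)(R1)/(R1-Req) comes from the post.

```python
def r2_for(req, r1):
    """R2 from the parallel-resistor relation 1/Req = 1/R1 + 1/R2,
    i.e. R2 = Req*R1 / (R1 - Req). Returns None when R1 equals Req."""
    if r1 == req:
        return None  # no finite R2 exists in this case
    return req * r1 / (r1 - req)


def usable_r2_values(req, r1_values):
    """Keep only positive R2 results, sorted so that the values
    closest to a whole number come first."""
    results = [r2_for(req, r1) for r1 in r1_values]
    positive = [r2 for r2 in results if r2 is not None and r2 > 0]
    return sorted(positive, key=lambda r2: abs(r2 - round(r2)))


print(r2_for(500, 1000))  # 1000.0, since 1000 in parallel with 1000 gives 500
print(r2_for(1000, 500))  # -1000.0, a negative value that would be filtered out
```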
I wish to highlight the values I discussed in the first paragraph so I can quickly scan the workbook for the best possible value of R2. Then I can quickly determine the values of R1 and C to complete my task and minimize the tolerance for the given op-amp application.
I have some C++ programming knowledge and I have enough experience with Excel so I should be able to understand where and how to do what I wish to do but I wish to get some advice and direction from a more experienced Excel user.
***UPDATE***
Since my first post, I've done some research. It seems like the easiest approach would be to apply a "closest to" filter. I've attached a screenshot of a small portion of my workbook, which contains the equation for the "closest to" filter, a partial range of available resistor values, and the results for my filter. I have multiple tabs in my workbook.
I lied. I'm unable to post an image until I gain 10 reputation. I have 6 reputation. If you're reading this post and you're able to contribute to my reputation, please contribute.
This is my equation: =INDEX(A3:BZ26,MATCH(MIN(ABS(A3:BZ26-CB3)),ABS(A3:BZ26-CB3),0))
The equation format is: =INDEX(rng,MATCH(MIN(ABS(rng-value)),ABS(rng-value),0))
My formula seems to be correct but it returns "#VALUE!".
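The intent of the "closest to" formula can be written out in Python as a minimal sketch. The function name `closest_value` and the list of standard resistor values are hypothetical; the target is the example value from the post. As a possible lead on the error (an assumption, since the workbook is not shown): in older Excel versions a formula of this shape generally has to be entered as an array formula, and MATCH expects a single row or column rather than a two-dimensional range such as A3:BZ26.

```python
def closest_value(candidates, target):
    """Return the candidate with the smallest absolute difference
    from target, mirroring MIN(ABS(rng - value))."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return min(candidates, key=lambda c: abs(c - target))


# Hypothetical list of available resistor values, in ohms.
available = [1000, 1500, 1600, 1800, 2200]

print(closest_value(available, 1592.00188622182))  # 1600
```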