Selection based on several inputs without extreme duplication - excel

I have a library of data that I need to pull specific rows from. At the moment I have an ID made up of several dropdown menus (=$C$2&$F$2...) that I compare to an index made up of a combination of column content (=[#Column1]&[#Column2]...), and I then use VLOOKUP to pull the right data for that instance.
Now, however, I need a much more varied set with more selections, five columns' worth. That creates 16 sets for every index on the first column and will generate thousands of lines if I have to create one version of every permutation.
The best scenario would be a modular form of the selections above: if there is input on X, Y and Z it works as it does now, but if Y and Z are empty it only pulls X. That is easy in theory, but I don't know what format it would have to take, and it gets even more complicated if I want, for instance, X and Z, or Y and Z, while still producing a neat list of the selections.
An alternative might be to pull tables based on a selection and make one table for every "part" of my query, but I can't find a way to do that either.
What I need is any way to pull and combine several rows from a library (based on dropdown or similar input) and assemble them into a neat list that I can print.
First post, and thanks in advance =)
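The "modular" selection described above can be thought of as a filter in which every empty dropdown acts as a wildcard, so no row per permutation is needed. The following is only a minimal sketch of that idea in Python, not an Excel solution; the function name `select_rows`, the column names and the sample rows are all hypothetical.

```python
def select_rows(library, **choices):
    """Return the rows that match every non-empty choice."""
    # Keep only the dropdowns that actually hold a value; blanks act as wildcards.
    active = {column: value for column, value in choices.items()
              if value not in (None, "")}
    return [row for row in library
            if all(row.get(column) == value for column, value in active.items())]


# Hypothetical library: column names and contents are made up for illustration.
library = [
    {"X": "pump", "Y": "steel", "Z": "small", "Text": "Pump, steel, small"},
    {"X": "pump", "Y": "steel", "Z": "large", "Text": "Pump, steel, large"},
    {"X": "pump", "Y": "brass", "Z": "small", "Text": "Pump, brass, small"},
    {"X": "valve", "Y": "steel", "Z": "small", "Text": "Valve, steel, small"},
]

only_x = select_rows(library, X="pump", Y="", Z="")        # Y and Z left empty
x_and_z = select_rows(library, X="pump", Y="", Z="small")  # Y skipped

# Print the combined selection as a plain list.
for row in x_and_z:
    print(row["Text"])
```

Because blanks are simply dropped from the comparison, any combination (X only, X and Z, Y and Z) goes through the same code path.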

Related

How to split a Pandas dataframe into multiple csvs according to when the value of a column changes

So, I have a dataframe with 3D point cloud data (X,Y,Z,Color):
dataframe sample
Basically, I need to group the data according to the color column (which takes values of 0,0.5 and 1). However, I don't need an overall grouping (this is easy). I need it to create new dataframes every time the value changes. That is, I'd like a new dataframe for every set of rows that are followed by and preceded by 5 zeros (because single zeros are sometimes erroneously present in chunks of data that I'm interested in).
Basically, the zero values (black) are meaningless for me; I'm only interested in the 0.5 (red) and 1 values (green). What I want to accomplish is to segment the original point cloud into smaller clusters that I can then visualize. I hope this is clear. I can't seem to find answers to my question anywhere.
First of all, you should be comfortable with the for loop. Python lets you use code from any library inside functions and loops. Say you have a dataset and want to walk through it and check column a. Note that "for i in dataset:" on a pandas dataframe iterates over the column names, not the rows, so start the loop with "for _, row in dataset.iterrows():" instead. On the next line, state the criterion you want, for example "if row['a'] > 0.5:". When the value is greater than 0.5, you can then write the code that adds the whole current row to a new dataset. I have deliberately not written ready-made code, so that you can work it out as an exercise.
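A loop of that kind also has to decide when one cluster ends and the next begins. Here is a minimal sketch in plain Python (no pandas), assuming rows are (X, Y, Z, Color) tuples, that a run of at least five zeros separates clusters, and that shorter runs of zeros are strays kept inside the current cluster. The function name `split_on_zero_runs` and the sample data are hypothetical.

```python
def split_on_zero_runs(rows, color_index=3, min_zero_run=5):
    """Split rows into segments separated by runs of >= min_zero_run zero-colour rows."""
    segments = []
    current = []        # rows of the segment being built
    pending_zeros = []  # zero rows seen since the last non-zero row
    for row in rows:
        if row[color_index] == 0:
            pending_zeros.append(row)
            continue
        if len(pending_zeros) >= min_zero_run:
            # A long run of zeros closes the previous segment.
            if current:
                segments.append(current)
            current = []
        elif current:
            # A short run is treated as noise and kept inside the segment.
            current.extend(pending_zeros)
        pending_zeros = []
        current.append(row)
    if current:
        segments.append(current)
    return segments


# Made-up colour sequence: one stray zero, then a separating run of five zeros.
colors = [0.5, 0.5, 0, 0.5, 0, 0, 0, 0, 0, 1, 1]
rows = [(float(i), float(i), float(i), c) for i, c in enumerate(colors)]
segments = split_on_zero_runs(rows)
print([len(segment) for segment in segments])
```

Each segment could then be written to its own file, for example with the standard `csv` module or, when working in pandas, with `DataFrame.to_csv`.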

Taking means of irregular amounts data

I'm not able to take the means for a large dataset because the number of attributes is irregular.
I have posted a simplified case of the problem, which should explain it well.
One idea I came up with is to make a filter that conditions on a single attribute. However, I still don't see a way to do this efficiently (other than doing it all by hand).
see excel file:
All help is much appreciated.
I'm basically looking for a function/method to achieve taking means of all different attributes conditioned on each person for a large dataset without doing it by hand.
You can use AVERAGEIFS() inside an IF:
=IF(OR(A2<>A1,B2<>B1),AVERAGEIFS(C:C,A:A,A2,B:B,B2),"")
The first part of the IF tests whether the row starts a new group, that is, whether the person or the attribute has changed. If so, it uses AVERAGEIFS() to return the average of that group; otherwise it returns a blank.
What you want to do can be accomplished very simply with a pivot table.
Simply select one of the cells inside the range of data you want to process. (See this video for general use of a pivot table: https://www.youtube.com/watch?v=iCiayB6GrpQ )
Go to the Insert tab and insert a pivot table.
Once you have it, check people, attribute, and values. Then drag people and attribute into Rows, drag value into the Values window, open the drop-down list and change it from Sum of value to Average, and you should be done. https://i.stack.imgur.com/nYEzw.png
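Both answers above compute the mean of the value column for each person/attribute pair. For comparison, the same grouping can be sketched in Python. The (person, attribute, value) layout and the sample rows are assumptions, since the original spreadsheet is not shown.

```python
from collections import defaultdict
from statistics import mean


def group_means(rows):
    """Average the values for every (person, attribute) pair."""
    groups = defaultdict(list)
    for person, attribute, value in rows:
        groups[(person, attribute)].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Made-up data with an irregular number of entries per person and attribute.
rows = [
    ("Ann", "height", 1.0),
    ("Ann", "height", 3.0),
    ("Ann", "weight", 5.0),
    ("Bob", "height", 4.0),
    ("Bob", "height", 6.0),
    ("Bob", "height", 8.0),
]

for (person, attribute), average in group_means(rows).items():
    print(person, attribute, average)
```

Like the pivot table, this does not care how many rows each group has.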

Outputting text values

I am unsure whether Excel can do this automatically. I hope it can, but maybe not.
I am trying to work with another member of staff in a different building. I have created a table to identify where the flow of some of the work is coming from. I want to count the number of instances of each text value within a column. The problem is that the text can be pretty dynamic. As an example:
Consultant
a
a
b
a
b
a
b
z
c
c
c
Is there a way I can get Excel to count the instances of each text value within the column, and then create a labelled table with the totals?
I looked at pivot tables and that didn't seem to want to play ball.
The simplest way to do this is using COUNTIF:
=COUNTIF(A:A,"a")
This tells you how many times "a" appears in column A.
You could easily duplicate this for every letter of the alphabet. Then use a summary table to display the results.
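When the labels are not known in advance, duplicating one formula per label gets awkward. As a point of comparison, a short Python sketch that discovers the labels and counts them in one pass, using the example column from the question:

```python
from collections import Counter

# The "Consultant" column from the question, without its header.
consultants = ["a", "a", "b", "a", "b", "a", "b", "z", "c", "c", "c"]

# Counter finds every distinct label on its own, so new labels need no extra work.
counts = Counter(consultants)

# Print a small summary table, sorted by label.
print("Consultant  Count")
for label, total in sorted(counts.items()):
    print(f"{label:<10}  {total}")
```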

Dynamic Range with Categorical Variables

I'd like to sort a time series of exam performance by one of three categories:
Ideally, a function would sort the scores by "difficulty" while still preserving chronological order. I'd like to do this without filters etc. Something like this is very close, but not quite there. Do I need to use dynamic ranges? Or can I just define data ranges in the table dialog with VLOOKUP or INDEX/MATCH?
I'm thinking a bar graph would be the easiest way to illustrate the data, but I'm open to suggestions. New scores are added every day, with varying difficulties.
Here is the spreadsheet if anyone would like to look it over.
EDIT:
The output visualization could be, for example, a clustered bar graph, but with only one label per category. The idea is that I'd like to preserve chronological order without necessarily having to mark it on the graph.
Would there, for instance, be a quick, easy, formula-driven way to put these 14 and 17 values for "score" all together under one label? I feel like 17 bars clustered too closely would be hard to read.
I realize this is more of a formatting than a formula issue, but I appreciate input with regards to both.
I would recommend you add a Table over the data in the workbook. One for verbal and one for math. The upside is that it will automatically grow with your data as you add new rows. This is very helpful because charts and other things will automatically refer to the new data. Add one with CTRL+T or Insert->Table on the Ribbon.
Once you have the Table, you can easily do the sorting bit by adding a two column sort onto the Table. This menu is accessible by right clicking in the Table and doing Sort->Custom Sort. Again, the Table is nice here because it will only sort the data within it (not the whole sheet) and will remember your settings. This lets you add new data and simply do Data->Reapply to get it to sort again. Your sort on Difficulty is going to be alphabetic unless you add a number at the front. Here is the sorting step:
With this done, you can create a quick chart based on that data. For the "implicit chronology" you can simply plot score vs. difficulty for all of them since they are sorted.
To get closer to that matrix style display, you can easily create a PivotTable based on this Table and let it do the organizing by date/difficulty. Here is the result of that. I am using Average as the aggregation function since it appears that no dates have more than 1 score. If they did, it would be a better choice than Sum.
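The two-column sort described above (difficulty first, then date) can be sketched in Python as a sort on a composite key. The difficulty labels, dates and scores below are hypothetical, and an explicit rank table stands in for the "add a number at the front" trick that avoids alphabetic ordering.

```python
from datetime import date

# Hypothetical labels; the rank decides the order instead of the alphabet.
DIFFICULTY_ORDER = {"Easy": 0, "Medium": 1, "Hard": 2}

# Made-up records: (date, difficulty, score).
scores = [
    (date(2020, 1, 3), "Hard", 14),
    (date(2020, 1, 1), "Easy", 17),
    (date(2020, 1, 2), "Hard", 15),
    (date(2020, 1, 4), "Easy", 16),
]

# Sort by difficulty rank first, then by date, so each difficulty group
# stays in chronological order.
ordered = sorted(scores, key=lambda record: (DIFFICULTY_ORDER[record[1]], record[0]))

for day, difficulty, score in ordered:
    print(difficulty, day.isoformat(), score)
```

Plotting the scores in this order gives the "implicit chronology" within each difficulty group without putting dates on the axis.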

Standard Deviation Excel from two different lists

I am currently trying to find a z score for my values. To start off, I am drawing two separate lists from a different sheet to try to find the standard deviation. Currently I have
(G11-(AVERAGE(INDEX('Asia Last Yields'!DU:DU,
MATCH(B3,'Asia Last Yields'!DT:DT)):INDEX('Asia Last Yields'!DU:DU,
MATCH(B1,'Asia Last Yields'!DT:DT)))-AVERAGE(INDEX('Asia Last Yields'!DL:DL,
MATCH(B3,'Asia Last Yields'!DK:DK)):INDEX('Asia Last Yields'!DL:DL,
MATCH(B1,'Asia Last Yields'!DK:DK)))))/(need the standard dev)
This is basically (original value - mean) right now. As you can see, it is already fairly complicated, but my problem is that, since I am drawing from two lists that do not form a single array, I cannot simply use the standard deviation function. Is there any way to combine the two lists without creating a separate list within the worksheet?
Let me know if I wasn't clear, and thanks for the help!
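As far as I know, Excel's STDEV.S and STDEV.P accept several arguments, so the two INDEX:INDEX ranges could probably be passed as separate arguments without building a helper list. Which standard deviation is wanted depends on the intent, though. The sketch below, in Python with made-up numbers, contrasts the pooled standard deviation of both lists with the standard deviation of their element-wise difference. The latter matches a mean computed as AVERAGE(list A) - AVERAGE(list B), as in the formula above, but assumes both lists have the same length and are aligned row by row.

```python
from statistics import mean, stdev

# Made-up stand-ins for the two ranges pulled from the other sheet.
list_a = [2.0, 4.0, 6.0]
list_b = [1.0, 2.0, 3.0]
value = 3.0  # hypothetical stand-in for cell G11

# Option 1: treat both lists as one sample.
pooled_sd = stdev(list_a + list_b)

# Option 2: standard deviation of the row-by-row difference (the "spread").
differences = [a - b for a, b in zip(list_a, list_b)]
diff_sd = stdev(differences)

# z-score against the spread; mean(differences) equals mean(list_a) - mean(list_b).
z_score = (value - mean(differences)) / diff_sd

print(pooled_sd, diff_sd, z_score)
```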
