Subset data based on multiple columns to remove a specific group of data

I have a data set with the columns Time, fluorescence, compound concentration, and replicate. I plotted the data in ggplot with facet wrapping, giving me:
all data
I want to remove the blue line in the center plot, where concentration = 0.316 and replicate = 3. I tried:
data %>%
  subset(Concentration_uM != 0.316 & replicate != 3) %>%
  ggplot() + ...
which gave me:
subset not as desired.
Note that this removes all of the data where concentration = 0.316, and also all of the data where replicate = 3. What I want is to remove only the rows where concentration = 0.316 and replicate = 3 at the same time.
What I want is:
desired subset
I know that I can achieve this by pasting the concentration and replicate vectors together into a new vector and subsetting by that (essentially how I made the desired plot), but I am wondering if there is a way to avoid creating a new vector.
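The attempt above over-filters because of De Morgan's law: the opposite of "concentration is 0.316 and replicate is 3" is "concentration is not 0.316 or replicate is not 3". In R that would be a condition such as !(Concentration_uM == 0.316 & replicate == 3), with no pasted helper vector. Below is a minimal sketch of the same logic in pandas; the data frame is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in data: two concentrations, three replicates each.
data = pd.DataFrame({
    "Concentration_uM": [0.316, 0.316, 0.316, 1.0, 1.0, 1.0],
    "replicate": [1, 2, 3, 1, 2, 3],
    "fluorescence": [10.0, 11.0, 99.0, 20.0, 21.0, 22.0],
})

# Rows to drop: both conditions must hold at once.
drop = (data["Concentration_uM"] == 0.316) & (data["replicate"] == 3)

# Keep everything else; ~drop equals (conc != 0.316) | (replicate != 3).
kept = data[~drop]

# The original attempt, for comparison: it drops rows matching either condition.
too_strict = data[(data["Concentration_uM"] != 0.316) & (data["replicate"] != 3)]

print(len(kept), len(too_strict))
```

Only the single unwanted group disappears from kept, while too_strict also loses every other row with that concentration or that replicate number.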

Related

How to split a Pandas dataframe into multiple csvs according to when the value of a column changes

So, I have a dataframe with 3D point cloud data (X,Y,Z,Color):
dataframe sample
Basically, I need to group the data according to the color column (which takes the values 0, 0.5, and 1). However, I don't need an overall grouping (this is easy). I need it to create new dataframes every time the value changes. That is, I'd like a new dataframe for every set of rows that is preceded and followed by 5 zeros (because single zeros are sometimes erroneously present in chunks of data that I'm interested in).
Basically, the zero values (black) are meaningless for me; I'm only interested in the 0.5 (red) and 1 values (green). What I want to accomplish is to segment the original point cloud into smaller clusters that I can then visualize. I hope this is clear. I can't seem to find answers to my question anywhere.
First of all, you should understand the for loop well. Python makes it easy to use code from any library inside functions and loops. Let's say you have a dataset and you want to walk through it while checking column a. Start the loop with "for _, row in dataset.iterrows():" (a plain "for i in dataset:" over a pandas DataFrame iterates over the column names, not the rows). On the next line, state the criterion you want, for example "if row[a] > 0.5:". If the value is greater than 0.5, you can then write the code that adds all the data of the current row to a new dataset. I have deliberately not written ready-made code, so that you can work through it yourself.
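A vectorized alternative to the row loop can be sketched as follows. It assumes the column is named Color, that a run of at least 5 consecutive zeros separates two clusters, and that shorter zero runs belong to the surrounding cluster. The function names and the toy data are hypothetical.

```python
import pandas as pd


def split_on_zero_runs(df, column="Color", min_zeros=5):
    """Split df into segments separated by runs of at least min_zeros zeros."""
    is_zero = df[column].eq(0)
    # Label consecutive runs of identical is_zero values: 1, 2, 3, ...
    run_id = is_zero.astype(int).diff().ne(0).cumsum()
    # Length of the run that each row belongs to.
    run_len = run_id.map(run_id.value_counts())
    # Only long zero runs act as separators; short ones stay inside a segment.
    separator = is_zero & run_len.ge(min_zeros)
    # Label consecutive separator / non-separator stretches.
    block_id = separator.astype(int).diff().ne(0).cumsum()
    keep = ~separator
    return [seg for _, seg in df[keep].groupby(block_id[keep])]


def save_segments(segments, prefix="segment"):
    """Write each segment to its own CSV file: segment_0.csv, segment_1.csv, ..."""
    for i, seg in enumerate(segments):
        seg.to_csv(f"{prefix}_{i}.csv", index=False)


# Toy point cloud: two clusters separated by long zero runs,
# with one stray zero inside the first cluster.
color = [0] * 5 + [0.5, 0.5, 0, 0.5] + [0] * 6 + [1, 1, 1] + [0] * 5
cloud = pd.DataFrame({
    "X": range(len(color)),
    "Y": range(len(color)),
    "Z": range(len(color)),
    "Color": color,
})

segments = split_on_zero_runs(cloud)
print([len(seg) for seg in segments])
```

Calling save_segments(segments) would then write one CSV per cluster. The sketch relies on the default, unique row index; a frame with duplicate index labels would need reset_index(drop=True) first.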

Paraview : grid interpolation and merging data

I have a ParaView multiblock dataset whose blocks hold two different vtk UnstructuredGrids. I want to interpolate data from one grid onto the other and handle them simultaneously.
Here is what I do:
1. I use the Extract Block filter twice to separate the data from the two blocks (please note that the data are still of the "multiblock" type, as seen in the Information tab).
2. Using the Resample With Dataset filter, I am able to interpolate the data held on block 2 (coarse grid) onto the grid of block 1 (finer grid).
My issue comes at step 3:
3. I'd like to use the Append Attributes filter to handle the data of block 1 and the data interpolated from block 2 simultaneously, but this filter is not available.
If the two datasets come from two separate UnstructuredGrid structures (not multiblock), the Append Attributes filter is available and I can do what I want.
To circumvent this behavior, I have to apply the Merge Blocks filter after step 1. Note that the output of this last filter is no longer of "multiblock" type but of "UnstructuredGrid" type.
This is tricky and not intuitive. Could someone explain the rationale behind it?
You do not need Append Attributes to get both sets of data. Just check the "Pass Point Data" and "Pass Cell Data" checkboxes in the Resample With Dataset filter.
As for why the Append Attributes filter is not available in your case, there can be different reasons. If you are using ParaView 5.8.0, it can tell you why.
Just hover over the grayed-out filter in Filters -> Alphabetical, the reason will be written in the status bar.

Giving custom variable to `hue` in sns.pairplot (Seaborn)

I have the air quality dataset, which contains missing values. I've imputed them while creating a dummy dataframe (using df.isnull()) to keep track of the missing values.
My goal is to generate a pairplot using seaborn (or otherwise, if a simpler method exists) that gives a different color to the imputed values.
This is easily possible in matplotlib, where the parameter c of plt.scatter can be assigned a list of values and the points are colored accordingly (but the problem is that I can only plot one column against another, not a pairplot). A possible solution is to iteratively create subplots for pairs of columns (which can make the code quite complicated!).
However, in Seaborn (which already has a built-in pairplot function) you are supposed to provide hue='column-name', which is not possible in this case: the missingness is stored in the dummy dataframe, and the corresponding columns would have to be retrieved from it for color coding.
Please let me know how I can accomplish this in the simplest manner possible.
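One simple approach, sketched here under assumptions, is to collapse the dummy dataframe into a single label column and hand that column to hue. The names add_imputed_flag, plot_pairs, and status, the toy values, and the mean imputation are all illustrative. Note that this marks a whole row as imputed in every panel, even if only one of its columns was filled in.

```python
import numpy as np
import pandas as pd


def add_imputed_flag(imputed, missing_mask, flag_name="status"):
    """Return a copy of `imputed` with a label column marking rows that had any missing value."""
    flagged = imputed.copy()
    flagged[flag_name] = np.where(missing_mask.any(axis=1), "imputed", "observed")
    return flagged


def plot_pairs(flagged, flag_name="status"):
    # Imported here so the data preparation above can run without seaborn installed.
    import seaborn as sns
    # Histograms on the diagonal avoid density estimates on very small groups.
    return sns.pairplot(flagged, hue=flag_name, diag_kind="hist")


# Hypothetical toy data standing in for the air quality dataset.
raw = pd.DataFrame({
    "Ozone": [41.0, np.nan, 12.0, 18.0],
    "Wind": [7.4, 8.0, np.nan, 11.5],
    "Temp": [67.0, 72.0, 74.0, 62.0],
})
missing_mask = raw.isnull()        # the "dummy dataframe" from the question
imputed = raw.fillna(raw.mean())   # mean imputation, purely as an example
flagged = add_imputed_flag(imputed, missing_mask)
print(flagged["status"].tolist())
```

Calling plot_pairs(flagged) then draws the pairplot with imputed and observed rows in different colors. Coloring per panel, based only on the two columns shown in that panel, would need a custom plotting function mapped onto a PairGrid instead.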

Combining dimensions with non overlapping data (Tableau)

My data source is an 'outer join' of data from three distinct excel sheets with non-overlapping data. Each sheet has the same fields for filtering and the same two dimensions for a desired graph, ID and Reason. I want to create a basic bar chart that has the Reasons across all three sheets on a single horizontal axis and a CountD(ID) on the vertical axis.
How can I combine the three separate dimensions into one dimension? Should I use a calculated field?
Let me know if you need further information.
Alright, with a bit more digging I was able to figure this one out (with the help of the second half of this post).
I just needed to select all three of the dimensions I wanted to combine, right click, hover over "Transform", and then select "Merge Mismatched Fields".
This was the proper solution for me because there was no overlapping data in the three relevant columns. Wherever one had a value, the other two were always null.
Note: I am using Tableau 10

Dynamic Range with Categorical Variables

I'd like to sort a time series of exam performance by one of three categories:
Ideally, a function would sort the scores by "difficulty" while still preserving chronological order. I'd like to do this without filters etc. Something like this is very close, but not quite there. Do I need to use dynamic ranges? Or can I just define data ranges in the table dialog with VLOOKUP or INDEX/MATCH?
I'm thinking a bar graph would be the easiest way to illustrate the data, but I'm open to suggestions. New scores are added every day, with varying difficulties.
Here is the spreadsheet if anyone would like to look it over.
EDIT:
The output visualization could be, for example, a clustered bar graph, but with only one label per category. The idea is that I'd like to preserve chronological order without necessarily having to mark it on the graph.
Would there, for instance, be a quick, easy, and formula-driven way to put these 14 and 17 values for "score" all together under one label? I feel like 17 bar graphs clustered too closely would be hard to read.
I realize this is more of a formatting than a formula issue, but I appreciate input with regards to both.
I would recommend you add a Table over the data in the workbook. One for verbal and one for math. The upside is that it will automatically grow with your data as you add new rows. This is very helpful because charts and other things will automatically refer to the new data. Add one with CTRL+T or Insert->Table on the Ribbon.
Once you have the Table, you can easily do the sorting bit by adding a two column sort onto the Table. This menu is accessible by right clicking in the Table and doing Sort->Custom Sort. Again, the Table is nice here because it will only sort the data within it (not the whole sheet) and will remember your settings. This lets you add new data and simply do Data->Reapply to get it to sort again. Your sort on Difficulty is going to be alphabetic unless you add a number at the front. Here is the sorting step:
With this done, you can create a quick chart based on that data. For the "implicit chronology" you can simply plot score vs. difficulty for all of them since they are sorted.
To get closer to that matrix-style display, you can easily create a PivotTable based on this Table and let it do the organizing by date/difficulty. Here is the result of that. I am using Average as the aggregation function since it appears that no date has more than one score. If any did, Average would be a better choice than Sum.
