I'm thinking about the case where I have two or more columns with missing values. As I interpret the data, the 1st column's NA values could be filled with 0 (using fillna()) for a better result, the 2nd column's with the mean (or median), and perhaps the 3rd column's with some constant value (using SimpleImputer).
What would be the best approach to solve this?
Thanks.
If you're trying to create something general, you're better off creating a Pipeline or ColumnTransformer from sklearn.
This answer explains it in detail.
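A minimal sketch of the ColumnTransformer idea, assuming a DataFrame with three numeric columns whose names (col_zero, col_mean, col_const) and the constant -1 are hypothetical. Each column gets its own SimpleImputer strategy: 0, the mean, and an arbitrary constant.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical data: three columns, each with one missing value.
df = pd.DataFrame({
    "col_zero": [1.0, np.nan, 3.0],
    "col_mean": [2.0, 4.0, np.nan],
    "col_const": [np.nan, 5.0, 6.0],
})

imputer = ColumnTransformer(
    transformers=[
        # 1st column: missing values become 0
        ("zero", SimpleImputer(strategy="constant", fill_value=0), ["col_zero"]),
        # 2nd column: missing values become the column mean (use strategy="median" for the median)
        ("mean", SimpleImputer(strategy="mean"), ["col_mean"]),
        # 3rd column: missing values become an arbitrary constant (-1 is only an example)
        ("const", SimpleImputer(strategy="constant", fill_value=-1), ["col_const"]),
    ]
)

# fit_transform returns a NumPy array whose columns follow the transformer order above.
filled = pd.DataFrame(imputer.fit_transform(df), columns=["col_zero", "col_mean", "col_const"])
print(filled)
```

By default ColumnTransformer drops any column not listed (remainder="drop"), and the same object can be placed as the first step of a Pipeline in front of a model, so the column means are learned only from the training data.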
I have a table that gives me a chemical concentration value based on temperature, pH and ammonia. The way I measure these variables, the ammonia level is always one of the six values at the top of the table, so it works as a categorical variable.
I need a way to interpolate on this table, based on these 3 variables. I tried using a combination of INDEX and MATCH, but I was not able to achieve what I wanted. Then I thought of "dividing" the table in intervals to "reduce" one variable and use an IF function to select which interval to interpolate based on the third variable (I was thinking pH or Ammonia), but I can't figure out a way to change intervals dynamically like this.
Can anyone think of an alternative to accomplish what I'm trying to do? If possible I would like to avoid using VBA, but if there is no other way I have no problem using it.
Thank you for the help!
I'm attaching an example of the table below.
Assuming that pH is in column A:
=INDEX(A:H;MATCH(6,8;A:A;0)+MATCH(25;B:B;0)-2;MATCH(2;2:2;0))
where the -2 must be changed to the number of rows BEFORE the first 22 in the Temp column.
This also assumes that the 22;25;28 pattern in Temp is the same for every pH.
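The row arithmetic in that formula can be sketched outside Excel as follows. This assumes the layout the answer describes (each pH block lists the temperatures 22, 25, 28 in the same order, and ammonia levels are column headers); the pH values, the six ammonia levels and the placeholder cell contents are hypothetical, not real concentrations. Like the formula, it returns an exact table entry and does not interpolate.

```python
# Hypothetical layout following the answer's assumptions.
TEMPS = [22, 25, 28]              # same order in every pH block (from the answer)
AMMONIA = [0.5, 1, 2, 3, 4, 5]    # hypothetical six ammonia levels (column headers)
PHS = [6.8, 7.0, 7.2]             # hypothetical pH values

# Rows are [pH, temp, value per ammonia level]; values are placeholder labels.
table = [
    [ph, t] + [f"c(pH={ph}, T={t}, NH3={a})" for a in AMMONIA]
    for ph in PHS
    for t in TEMPS
]

def lookup(ph, temp, ammonia):
    """Mimic the INDEX/MATCH idea: first row of the pH block plus the temperature offset."""
    first_row = next(i for i, row in enumerate(table) if row[0] == ph)  # like MATCH(pH; A:A; 0)
    row = first_row + TEMPS.index(temp)  # offset of the temperature within the block
    col = 2 + AMMONIA.index(ammonia)     # like MATCH(ammonia; header row; 0), skipping pH/temp columns
    return table[row][col]

print(lookup(6.8, 25, 2))  # c(pH=6.8, T=25, NH3=2)
```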
So I have a range, B2:B20, containing strings such as 'I strongly agree' and 'I strongly disagree'. I have multiple columns, each representing a question and the answers given to it. I want to count each response in B2:B20, C2:C20, etc. and display the counts by dragging a formula down a column. Photos are provided below, since this is difficult to put into words.
Basically, I want to count, in Figure 2, how many of each response each question (the columns in Figure 1) received. I.e., in Figure 2, Q1-Strongly agree counts the strongly agrees in column 1 of Figure 1, Q2-Strongly agree counts the strongly agrees in column 2, and Q3-Strongly agree counts those in column 3. I hope this isn't too confusing. Thanks for your time.
Use:
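=COUNTIFS(INDEX($B$2:$T$20,,ROW($ZZ1)),"*"&U$1&"*")
where $B$2:$T$20 is the data range and U$1 is the cell containing Strongly Agree in the output. ROW($ZZ1) returns 1 in the first row and increases by one as the formula is dragged down, so INDEX picks column 1, 2, 3 and so on of the data in turn.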
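To make the matching rule concrete, here is a small sketch of what COUNTIFS with "*"&label&"*" does for each question column: a case-insensitive "contains" test. The survey answers and label list are hypothetical.

```python
# Hypothetical survey data: one list of answers per question column (like B2:B20, C2:C20, ...).
responses = {
    "Q1": ["I strongly agree", "I agree", "I strongly agree", "I strongly disagree"],
    "Q2": ["I disagree", "I strongly agree", "I agree", "I agree"],
}
LABELS = ["Strongly Agree", "Agree", "Strongly Disagree"]

def count_containing(column, label):
    """Like COUNTIF(column, "*"&label&"*"): case-insensitive substring match."""
    needle = label.lower()
    return sum(1 for cell in column if needle in cell.lower())

# One count per (question, response label), like the dragged-down output column.
summary = {
    (question, label): count_containing(column, label)
    for question, column in responses.items()
    for label in LABELS
}
print(summary)
```

One caveat of the wildcard approach: a short label such as "Agree" also matches "I strongly agree" and "I disagree", so it works best when each label is distinctive.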
In an answer to the question "Use .corr to get the correlation between two columns", the correlation is computed with
Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])
However, neither the parameters nor the example in the pandas documentation for .corr indicate that you should pass the column to be correlated with as a parameter to .corr().
How do you know when you can or should pass a DataFrame column reference to a method, as here with .corr?
This is a good point... It is quite frustrating that the pandas.DataFrame.corr documentation doesn't explain how this differs from correlating two specific columns (DataFrame.corr returns the pairwise correlation matrix of all columns), nor discuss what types those columns ought to be, given the correlation coefficient of choice. That's the kind of thing you could contribute to the pandas project, and I think it would be valuable.
On the other hand, the question you're asking does have an answer in the pandas.Series.corr documentation, where the first parameter, other, is the Series to compute the correlation with.
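A small sketch of the difference, using a hypothetical three-column DataFrame: Series.corr takes the other Series as its argument and returns one number, while DataFrame.corr takes no column argument and returns the matrix of pairwise correlations.

```python
import pandas as pd

# Hypothetical data; z is an exact decreasing linear function of x.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.1, 5.9, 8.2],
    "z": [4.0, 3.0, 2.0, 1.0],
})

# Series.corr(other, method="pearson"): one column correlated with another.
r_xy = df["x"].corr(df["y"])
r_xz = df["x"].corr(df["z"], method="spearman")

# DataFrame.corr(): pairwise correlations of all columns, as a square matrix.
matrix = df.corr()

print(r_xy, r_xz)
print(matrix)
```

The (x, y) entry of the matrix is the same number that df["x"].corr(df["y"]) returns.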
I have advanced Excel/Google Sheets skills; this is more of a conceptual question. I am happy with any solution (Excel or Sheets, either works for me).
I have a sheet that various coworkers can access and work in. It is used to define which steps each product needs to go through. When part of a job is done, the product's status is changed depending on criteria.
You can also think of it as projects and the status of a project.
The 3 examples show how the workers enter the data. Sometimes the "No" cells are empty, sometimes they contain "No", and sometimes, for the same product, one criterion is empty while another contains "No".
If I use a nested IF formula, I would have to cover 32 cases (I believe, since there are 5 criteria with 2 options each, and 2^5 = 32).
Obviously I can do that, but I was wondering whether anyone has a better, more practical solution.
Thanks in advance!
Based on the data you've provided, it looks like your statuses depend on the number of Yes's in the input columns. You also don't show a status for zero Yes's, so I'll add an extra one for that.
Given that assumption, you can combine the COUNTIF function (to count the Yes's) with the IFS function (to handle nested IFs more cleanly) to drastically reduce the size of your formula.
To make this cleaner, I suggest adding a hidden helper column containing: =COUNTIF([InputCriteria1to5Range],"Yes")
For the next formula assume the formula above is in B2. In your status column put the following:
=IFS(B2=5, Status1, B2=4, Status2, B2=3, Status3, B2=2, Status4, B2=1, Status5, B2=0, Status6)
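The same logic as the helper column plus IFS, sketched in Python: count the "Yes" cells (case-insensitively, as COUNTIF does) and map the count to a status. Empty cells and "No" are treated alike because only "Yes" is counted. The status names are the answer's placeholders.

```python
# Placeholder status names, keyed by the number of "Yes" answers (as in the IFS formula).
STATUSES = {5: "Status1", 4: "Status2", 3: "Status3", 2: "Status4", 1: "Status5", 0: "Status6"}

def status_for(criteria):
    """criteria: the five input cells for one product; None or "" means an empty cell."""
    yes_count = sum(1 for cell in criteria if (cell or "").lower() == "yes")  # like COUNTIF(range, "Yes")
    return STATUSES[yes_count]

print(status_for(["Yes", "No", "", "Yes", None]))  # Status4
```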
Solution: Thanks to all for your help. I ended up first creating ALL scenarios, which was actually the most complex part. See https://www.mrexcel.com/forum/excel-questions/654871-how-generate-all-possible-combinations-two-lists-without-macro.html (answer from "Tusharm"); I had to repeat this process 5 times to get all possible outcomes. In the end, there were 192 combinations.
Then, I assigned a status for each combination.
Finally, for each product/row, I created another column that concatenates the different criteria so that it looks exactly like my combinations above. Then I used INDEX/MATCH to look up the concatenated criteria in my list of combinations.
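A sketch of the same enumerate-then-look-up approach, assuming each of the 5 criteria can be "Yes", "No" or empty; this assumed set of values is not necessarily the author's, so the number of combinations here differs from the 192 above. The status rule is hypothetical, since the author assigned statuses by hand.

```python
from itertools import product

STATES = ["Yes", "No", ""]  # assumed possible cell values per criterion

def assign_status(combo):
    # Hypothetical rule; in the spreadsheet each combination was assigned a status manually.
    return f"{combo.count('Yes')} of 5 done"

# Every scenario, keyed by its concatenated criteria (the list of combinations).
lookup_table = {"|".join(combo): assign_status(combo) for combo in product(STATES, repeat=5)}

def status_for_row(row):
    key = "|".join(row)  # like the concatenated helper column
    return lookup_table[key]  # like INDEX/MATCH on the combinations list

print(status_for_row(["Yes", "", "No", "Yes", "Yes"]))  # 3 of 5 done
```

The "|" separator is a deliberate choice: with empty cells allowed, plain concatenation is ambiguous (for example "Yes" followed by "" and "" followed by "Yes" both give "Yes"), so a separator keeps every key unique.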
Let me share the problem: I am trying to determine the list of winners by comparing multiple parameters.
First, I need to compare the fault points: the fewer you have, the better your place. If the fault points are equal, I need to compare the time: the faster you performed, the better your place (the green column shows the correct result).
I have used this formula:
=IF(AA16="";"";COUNTIF($Z$16:$Z$24;"<"&Z16)+1+SUMPRODUCT(--($Z$16:$Z$24=Z16);--($AA$16:$AA$24>AA16)))
However, the time comparison gives the wrong result. My guess is that either I have a small mistake or the formula itself is completely wrong.
Thanks in advance.
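Use this formula instead (replace the commas with semicolons if your Excel uses them, as in your formula). The key change is that the tie-breaker counts entries with equal fault points and a smaller time (<), not a larger one (>):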
=RANK(Z16,Z$16:Z$24,1)+SUMPRODUCT((Z$16:Z$24=Z16)*(AA$16:AA$24<AA16))
See image for reference:
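The ranking rule behind that formula, sketched in Python with hypothetical (fault points, time) results: the place is the number of entrants with fewer faults, plus 1, plus the number of entrants with the same faults and a faster time.

```python
def places(results):
    """results: list of (fault_points, time); fewer faults first, ties broken by faster time."""
    out = []
    for faults, time in results:
        fewer_faults = sum(1 for f, _ in results if f < faults)                 # RANK(..., 1) - 1
        faster_tied = sum(1 for f, t in results if f == faults and t < time)    # SUMPRODUCT tie-breaker
        out.append(fewer_faults + 1 + faster_tied)
    return out

# Hypothetical results for four riders.
print(places([(0, 50.0), (4, 45.0), (0, 48.0), (4, 60.0)]))  # [2, 3, 1, 4]
```

Changing t < time to t > time reproduces the question's problem: among tied fault points, the slower entrant would get the better place.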
This looks like it might be helpful: it has an example of breaking ties that I think will work for your scenario.
Excel Functions: Rank