I'm analyzing gene expression data from an experiment where I have:
(i) two groups of normal and patient samples;
(ii) a varying number of genes per sample, ranging from 25,000 to 130,000, where the majority of values are zero;
(iii) the values for different cases are in several files and not just one.
I'm trying to find genes that are differentially expressed between the two groups, but I have no idea what script to use.
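Before any per-gene test can be run, the per-sample files have to be combined into one table. Here is a minimal Python sketch of that preprocessing step only. It assumes each file holds tab-separated `gene<TAB>value` lines and that a gene missing from a file counts as zero; the file layout, the gene names and the helpers `load_sample`, `build_matrix` and `drop_mostly_zero` are all made up for illustration.

```python
import csv
import io

def load_sample(handle):
    """Read one sample file of 'gene<TAB>value' lines into a dict (assumed layout)."""
    return {row[0]: float(row[1]) for row in csv.reader(handle, delimiter="\t") if row}

def build_matrix(samples):
    """Combine per-sample dicts into {gene: [value per sample]}.

    Genes absent from a sample are filled with 0.0 (an assumption).
    """
    genes = sorted(set().union(*samples))
    return {g: [s.get(g, 0.0) for s in samples] for g in genes}

def drop_mostly_zero(matrix, min_nonzero):
    """Keep only genes that are non-zero in at least min_nonzero samples."""
    return {g: v for g, v in matrix.items() if sum(x != 0 for x in v) >= min_nonzero}

# Two tiny in-memory "files" standing in for real ones (invented values).
normal_1 = "GENE_A\t0\nGENE_B\t5\n"
patient_1 = "GENE_A\t0\nGENE_B\t12\nGENE_C\t3\n"

samples = [load_sample(io.StringIO(text)) for text in (normal_1, patient_1)]
matrix = build_matrix(samples)
filtered = drop_mostly_zero(matrix, min_nonzero=1)
print(filtered)
```

With real data, `io.StringIO(text)` would be replaced by `open(path, newline="")`, and the filtered table could then be passed to whichever per-gene test is chosen.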
Related
I am using Excel to build a product recommendation tool, which will take user inputs and use simple calculations to recommend the right product. The results will include a.) a product and b.) a numeric spec of that product.
There are two potential equations I can use in the underlying calculations. In testing, they frequently produce the same result. However, there are cases where they differ. I would like to use simulation with random numbers to understand the situations that produce different results.
Here's a simple structure:
User Input I (random numbers)
User Input II (random numbers)
Equation A Results - Product recommendation and numeric spec
Equation B Results - Product recommendation and numeric spec
Simulation Needs
How often are the results different? Under what input conditions is this most likely to happen?
When the products are the same but the numeric spec is different, what are the stats (stdev, average) of the numeric spec?
Which equation is optimal to use in the recommender?
Monte Carlo simulation would seem suitable for this exercise. Formulas and macros are documented here and in videos. I'm struggling with the structure/syntax to compare the outputs of two different equations.
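To show the structure of the comparison, here is a minimal Monte Carlo sketch written in Python rather than Excel. `equation_a` and `equation_b` are hypothetical placeholders, and the input ranges, product names and formulas are invented for illustration; the real equations would replace them.

```python
import random
import statistics

def equation_a(x, y):
    """Hypothetical stand-in for Equation A: returns (product, numeric spec)."""
    spec = x * y
    product = "Small" if spec < 50 else "Large"
    return product, round(spec, 2)

def equation_b(x, y):
    """Hypothetical stand-in for Equation B: returns (product, numeric spec)."""
    spec = x * y + 0.5 * x
    product = "Small" if spec < 50 else "Large"
    return product, round(spec, 2)

def simulate(trials=10_000, seed=0):
    rng = random.Random(seed)
    product_mismatches = []   # inputs where the recommended products differ
    spec_gaps = []            # spec differences when the products agree
    for _ in range(trials):
        x = rng.uniform(1, 10)    # User Input I (assumed range)
        y = rng.uniform(1, 10)    # User Input II (assumed range)
        prod_a, spec_a = equation_a(x, y)
        prod_b, spec_b = equation_b(x, y)
        if prod_a != prod_b:
            product_mismatches.append((x, y))
        elif spec_a != spec_b:
            spec_gaps.append(spec_b - spec_a)
    return {
        "trials": trials,
        "product_mismatch_rate": len(product_mismatches) / trials,
        "mismatch_inputs": product_mismatches,
        "spec_gap_mean": statistics.mean(spec_gaps) if spec_gaps else None,
        "spec_gap_stdev": statistics.stdev(spec_gaps) if len(spec_gaps) > 1 else None,
    }

result = simulate()
print(f"Products differ in {result['product_mismatch_rate']:.1%} of trials")
print("Spec gap mean/stdev:", result["spec_gap_mean"], result["spec_gap_stdev"])
```

Keeping the mismatching inputs in `mismatch_inputs` makes it possible to inspect or plot them afterwards and see which input conditions produce different recommendations.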
This is a table containing a statistical summary of machined parts that were manufactured and measured.
The PIDs are different classes of parts, so a PID such as 123456 can have hundreds of parts under it. Each machined part has 4 attributes: pitch diameter, POW diameter, major diameter and minor diameter. Unfortunately, the report is generated such that the data is in rows and not in adjacent columns, which would have been easier to visualize later.
How can I parse/group these row values so that I can store each part's info in an object that has the PID, the measurements and the date manufactured? I want this sorted information so that I can use it in visualizations later. I would also like to differentiate between the parts by the time/date they were manufactured.
For PID 123456, I have 2 parts and each part has 4 properties. So for example, how could I draw a chart for the upper and lower values for the major diameter or minor diameter of different parts (under the same PID)? Thank you.
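Here is a minimal Python sketch of one way to do the grouping. The table itself is not shown, so the layout is an assumption: one row per part and attribute with the columns (PID, manufacture timestamp, attribute, lower, upper). It also assumes a part can be identified by PID plus timestamp. The values, the `Part` class and the helper names are all made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical rows: (pid, manufactured, attribute, lower, upper). Invented values.
ROWS = [
    ("123456", "2020-01-05 08:30", "Pitch Diameter", 9.95, 10.05),
    ("123456", "2020-01-05 08:30", "POW Diameter", 11.90, 12.10),
    ("123456", "2020-01-05 08:30", "Major Diameter", 13.95, 14.05),
    ("123456", "2020-01-05 08:30", "Minor Diameter", 7.95, 8.05),
    ("123456", "2020-01-04 16:10", "Pitch Diameter", 9.96, 10.04),
    ("123456", "2020-01-04 16:10", "POW Diameter", 11.92, 12.08),
    ("123456", "2020-01-04 16:10", "Major Diameter", 13.97, 14.03),
    ("123456", "2020-01-04 16:10", "Minor Diameter", 7.96, 8.04),
]

@dataclass
class Part:
    pid: str
    made: datetime
    measurements: dict = field(default_factory=dict)  # attribute -> (lower, upper)

def group_parts(rows):
    """Collect the per-attribute rows of each part into a single Part object."""
    parts = {}
    for pid, made_text, attribute, lower, upper in rows:
        made = datetime.strptime(made_text, "%Y-%m-%d %H:%M")
        key = (pid, made)  # assumption: PID + timestamp identifies one part
        part = parts.setdefault(key, Part(pid, made))
        part.measurements[attribute] = (lower, upper)
    # Sort by PID, then by manufacture time, so charts can plot parts in order.
    return sorted(parts.values(), key=lambda p: (p.pid, p.made))

def series(parts, pid, attribute):
    """Return (dates, lowers, uppers) for one attribute of one PID, ready to plot."""
    chosen = [p for p in parts if p.pid == pid and attribute in p.measurements]
    dates = [p.made for p in chosen]
    lowers = [p.measurements[attribute][0] for p in chosen]
    uppers = [p.measurements[attribute][1] for p in chosen]
    return dates, lowers, uppers

parts = group_parts(ROWS)
dates, lowers, uppers = series(parts, "123456", "Major Diameter")
print(dates, lowers, uppers)
```

The three lists returned by `series` can be handed to any charting library to draw the lower and upper values against the manufacture date.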
I have an organism x with n individuals tested. Each individual is tested only once. The individuals are trained and tested in a choice array with multiple possible choices. The test choices are of two types, p and q, with 10 of each, giving a total of 20 possible choices. The individual is allowed to choose as many times as it wants until it is done making choices. What do I do with this data set? How do I analyse the preference?
I want to extract numerical entities such as temperature and duration from unstructured text using sequence-labeling models such as a CRF in Python. I would like to know how to proceed with numerical extraction, as most of the examples available on the internet are for extracting specific words or strings.
Input: 'For 5 minutes there, I felt like baking in an oven at 350 degrees F'
Output: temperature: 350
duration: 5 minutes
So far my research shows that you can treat numbers as words.
This raises an issue: learning 5 will be OK, but 19684 will be too rare to be learned.
One proposal is to convert the number into words ("nineteen thousand six hundred eighty-four") and embed each word. The drawback is that a single number such as this one becomes a sequence of six tokens (one per word) instead of one.
Depending on your usage, you can also give each number from 0 to 3000 its own id, map everything from 3001 to 10000 to a single id (3001) in your dictionary, and then add one more id to your dictionary for each further factor of 10.
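The bucketing scheme can be sketched in Python as follows. This is a minimal illustration, assuming non-negative integers only; the function name `number_to_id` and the constants are made up for the example.

```python
EXACT_MAX = 3000             # 0..3000 each keep their own id
FIRST_BUCKET_UPPER = 10_000  # 3001..10000 all share id 3001

def number_to_id(n):
    """Map a non-negative integer to a vocabulary id using magnitude buckets."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    if n <= EXACT_MAX:
        return n
    bucket_id = EXACT_MAX + 1
    upper = FIRST_BUCKET_UPPER
    while n > upper:          # each further factor of 10 gets one more id
        upper *= 10
        bucket_id += 1
    return bucket_id

sentence = "For 5 minutes there, I felt like baking in an oven at 350 degrees F"
ids = [number_to_id(int(tok)) for tok in sentence.split() if tok.isdigit()]
print(ids)                    # [5, 350]
print(number_to_id(19684))    # 3002
```

Small numbers such as 5 and 350 keep their own ids, while a rare value such as 19684 shares an id with every other number between 10001 and 100000.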
I'd like to use Excel to generate a randomized lab partner list, without using VB (due to security settings on the PCs).
Parameters are as follows:
Number of students: 10-30, one worksheet per total number desired
Number of partners: Three students per station for the first two labs, and two per station for the other four to five.
Number of lab stations: 10
Repeats: Ideally none, but it is permissible for a student to have a repeat partner from one of the first two labs.
Excel version: 2007
To clarify, each student will have two labs where they share a lab station with up to two other students, giving a maximum lab size of 30 students. After that, they will be strictly limited to two students per station, giving a maximum of 20 students. Each student will have four of these limited labs, with there being a total of five such labs presented, to allow for either odd-numbered classes, or a class size between 21-30.
Each student is simply numbered from 1-30, so a cell could, for instance, state "5, 24" as the two students for that lab station.
True RNG is not important, and in fact, only needs to be performed once to make these matrices.
I think this is a bit tricky without using VBA, but here is one approach that is OK for small groups. I have tried it using a group of just nine so that the screenshot should be readable.
The method is basic Fisher-Yates
A Start with a group of students size n represented by a list of numbers 1 to n.
B Generate a random number r in range 1 to n
C Pick the rth element from the list
D Remove the rth element from the list
E Reduce n by 1
F Repeat from B until n=1.
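For readers who want to check the logic outside Excel, steps A to F can be sketched in Python as below. This is only an illustration; the function name `draw_order` is made up. The loop runs until the list is empty so that the last remaining student is also output.

```python
import random

def draw_order(n, rng=random):
    """Return the numbers 1..n in random order by repeated pick-and-remove."""
    remaining = list(range(1, n + 1))          # step A
    order = []
    while remaining:
        r = rng.randint(1, len(remaining))     # step B: 1-based position
        order.append(remaining.pop(r - 1))     # steps C and D; the list shrinks (step E)
    return order

print(draw_order(9))
```

Each call gives a permutation of 1 to n, so no student can appear twice.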
In Excel:-
Fill A2:A10 and D2:L2 with numbers 1-9
Put the following in B2 and pull down:-
=RANDBETWEEN(1,10-A2)
Put this in C2 and pull down:-
=OFFSET(D2,0,B2-1)
Put this in D3 and pull down and across:-
=IF(D2>=$C2,E2,D2)
The IDs will be in column C, so the first three would be in group 1, the next three in group 2, etc.
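The grouping step amounts to cutting the shuffled column into consecutive blocks. Here is a minimal Python sketch of it; the helper name `assign_stations` and the example order are made up.

```python
def assign_stations(order, group_size):
    """Split a shuffled list of student numbers into consecutive groups."""
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]

shuffled = [4, 9, 1, 7, 2, 8, 3, 6, 5]      # e.g. the contents of column C
stations = assign_stations(shuffled, 3)
for number, group in enumerate(stations, start=1):
    print(f"Station {number}: " + ", ".join(map(str, group)))
```

If the class size is not a multiple of the group size, the last station simply gets fewer students.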
By the way, your question is a special case of generating non-repeating random numbers; see "Generating unique random numbers without VBA".
The array formula described there does it in one step. Modified slightly for this problem, it would look like:
=SMALL(IF(COUNTIF(C$1:C1,ROW(INDIRECT("1:9")))=0,ROW(INDIRECT("1:9"))),RANDBETWEEN(1,(9-ROWS(C$2:C2)+1)))