Subset misses values

I'm pretty new to coding, but my subset seems to be missing values and I'm wondering what I am doing wrong. I have a data frame called «df_envel» with 4 columns: Elevation, distance, profil, date. I am trying to subset this data frame to keep only the rows where Elevation equals -0.1 m. I have tried multiple subset methods, but all of them miss some of the -0.1 values and return some NAs instead. Here are the subset lines I tried, which all return the same number of rows:
library(dplyr)  # provides %>% and filter() for the second attempt
f <- df_envel[which(df_envel$Elevation == '-0.1'), ]
f <- df_envel %>% filter(Elevation == '-0.1')
f <- subset(df_envel, Elevation %in% '-0.1')
Does anybody know what I might be doing wrong?

I finally resolved it by converting the data frame into a matrix, converting the first two columns to numeric, subsetting, and then turning the result back into a data frame. I don't really know why it worked, but it did!
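df_envel <- as.matrix(df_envel)  # with mixed column types this gives a character matrix
df_envel[,c(1,2)] <- as.numeric(df_envel[,c(1,2)])  # in a character matrix the numbers are stored back as text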
f <- df_envel[ which(df_envel[,'Elevation']=='0'),]
f <- as.data.frame(f)
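A likely explanation (not confirmed in the thread) is floating-point representation: an Elevation value produced by arithmetic can differ from the literal -0.1 in its last bits, so an exact equality test skips it. The Python sketch below uses hypothetical elevation values to contrast an exact comparison with a tolerance-based one. In R the analogous tolerant filter would be something like df_envel[which(abs(df_envel$Elevation + 0.1) < 1e-8), ], assuming Elevation is numeric; which() also drops comparisons that evaluate to NA, so no NA rows appear.

```python
import math

# Hypothetical elevations: 0.2 - 0.3 is computed, so it is only approximately -0.1
elevations = [-0.1, 0.2 - 0.3, 0.0, -0.1]
target = -0.1

# Exact equality, analogous to Elevation == -0.1
exact = [e for e in elevations if e == target]

# Tolerant comparison: values within 1e-9 of the target count as matches
close = [e for e in elevations if math.isclose(e, target, abs_tol=1e-9)]

print(len(exact), len(close))  # 2 3
```

The tolerance (1e-9 here) is an arbitrary choice; it only needs to be much smaller than the spacing between genuinely different elevation values.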

Related

Excel Formula LET function based on Criteria

I'm teaching myself Excel, I'm new to the LET function, and I'm looking for a fix to the formula below. B5144 is a user-entered date, and the LET function should build a table based on it (i.e. filter the records whose date matches B5144).
This could use an IF or FILTER function as validation before generating the result, so that only the records whose date matches B5144 are displayed.
=SORT((LET(x,UNIQUE(D2:D5140),y,SUMIFS(J2:J5140,D2:D5140,x),CHOOSE({1,2},x,y))))
If you are going to use a dynamic array (FILTER), you have to start with the filter and work your way out:
FILTER(A2:J5140,A2:A5140=B5144)
Once you have this sub-array, it is the only thing you use; you make no further references to A:xxx or J:xxx.
LET(results,FILTER(A2:J5140,A2:A5140=B5144),
A, index(results,,1),
x, index(results,,4),
y, index(results,,10) )
And then move on from there.
results is a variable height x 10 column width array.
A, x, and y are arrays of the same variable height x 1 column width.
I believe your UNIQUE function is going to create a hot mess unless you are really clear in your logic. Array x will often be shorter than A and y, which may or may not be what you want.
You may have to nest Filters: =Filter( Filter( xx by date ) by unique D )
In this way, you always end up with a matched A, x, and y array.
Hope this helps get you in the right direction. Let us know if you need more.
———-
Edit: I only recently came to Stack Overflow from Stack Exchange, so I don't have enough points to comment yet; I'll respond here.
My formula above just does the filter and sets the definitions. Are you then entering your desired formula at the end, just inside the final parenthesis?
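To make the order of operations concrete, here is a small Python sketch of the same logic: filter the rows by date first, then total column J per unique value of column D, then sort, mirroring FILTER, UNIQUE/SUMIFS and SORT. The rows and the column roles (date, key, amount standing in for A, D and J) are hypothetical, not the asker's data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (column A = date, column D = key, column J = amount)
rows = [
    (date(2021, 5, 1), "north", 10.0),
    (date(2021, 5, 1), "south", 4.0),
    (date(2021, 5, 1), "north", 6.0),
    (date(2021, 5, 2), "north", 99.0),
]

def summary_for_date(rows, target):
    """Filter by date first, then total the amounts per unique key, sorted by key."""
    filtered = [r for r in rows if r[0] == target]  # like FILTER(A2:J5140, A2:A5140=B5144)
    totals = defaultdict(float)
    for _, key, amount in filtered:                 # like UNIQUE + SUMIFS, on the filtered rows only
        totals[key] += amount
    return sorted(totals.items())                   # like SORT on the two-column result

print(summary_for_date(rows, date(2021, 5, 1)))
```

Because the aggregation only ever sees the filtered rows, each unique key stays aligned with its total, which is the "matched arrays" point made in the answer.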

Search data in variable table Excel

In my Excel file, I have data split up over different tables for different values of parameter X.
I have tables for parameter X values 0.1, 0.5, 1, 5 and 10. Each table has a parameter Y in its leftmost column that I want to be able to search for, with a few data cells to the right of it, like so:
X = 0.1
Y    Data_0     Data_1     Data_2
1    0.071251   0.681281   0.238509
2    0.283393   0.509497   0.397196
3    0.678296   0.789879   0.439004
4    0.788525   0.363215   0.248953
etc.
Now I want to find Data_0, Data_1 and Data_2 for a given X and Y value (in two separate cells).
My idea was to name the tables X0.1, X0.5, etc., and, when defining the range for the lookup function, use some syntax that switches the table it searches. With three of these functions in adjacent cells, I would get the three values I need.
Is that possible, or is there some other method that would give me the result I want?
Thanks in advance
As for what my desired result would be for this data:
I would like A1 to hold the value of X I'm searching for (0.1 in this case),
A2 to hold the value of Y (let's pick 3),
and then C1:E1 to return 0.678..., 0.789... and 0.439...
Based on usmanhaq's suggestion, I think it should be something like:
=vlookup(A2,concatenate("X",A1),2)
=vlookup(A2,concatenate("X",A1),3)
=vlookup(A2,concatenate("X",A1),4)
for the three cells.
This exact formulation doesn't work and I can't find the formulation that does work.
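One likely reason the formula fails is that CONCATENATE returns text, not a range. Excel's INDIRECT function is the usual way to turn a name string into a reference, so something along the lines of =VLOOKUP(A2,INDIRECT("X"&A1),2,FALSE) may work, assuming the tables really are defined as names X0.1, X0.5 and so on (untested here). The two-step lookup itself (choose the table by X, then the row by Y) is sketched below in Python with the X = 0.1 values from the question; the dictionary layout is an assumption for illustration.

```python
# Hypothetical layout: one table per X value, each mapping Y -> (Data_0, Data_1, Data_2)
tables = {
    0.1: {
        1: (0.071251, 0.681281, 0.238509),
        2: (0.283393, 0.509497, 0.397196),
        3: (0.678296, 0.789879, 0.439004),
        4: (0.788525, 0.363215, 0.248953),
    },
    # tables for 0.5, 1, 5 and 10 would follow in the same shape
}

def lookup(x, y):
    """Pick the table by X (the INDIRECT step), then the row by Y (the VLOOKUP step)."""
    return tables[x][y]

print(lookup(0.1, 3))
```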

How to calculate with the Poisson-Distribution in Matlab?

I’ve used Excel in the past, but the calculations involving the Poisson distribution took a while, which is why I switched to SQL. I soon realized that SQL might not be a proper tool for statistical work. Finally I decided to switch to Matlab, but I’m not used to it at all. My problem is the following:
I’ve imported a .csv table and have two columns of values, let’s say A and B (each a 110x1 double).
These values are both inputs to my Poisson calculations. Since I want to calculate at least the first 20 events, I’ve created a variable z=1:20.
When I then calculate, say,
New = poisspdf(z,A)
it says something like "non-scalar arguments must match in size".
z only has 20 elements, but A and B both have 110. So I expanded it to z = 1:110 and transposed it:
Znew = z';
When I now try to run the actual calculation:
Results = poisspdf(Znew,A).*poisspdf(Znew,B)
I only ever get a 110x1 vector, but what I want is a 20x20 matrix for each record of A and B (based on my actual choice of z=1:20; I only changed to z=1:110 because Matlab told me the sizes need to match).
So each cell of this 20x20 matrix should hold the result of a slightly different calculation (poisspdf(Znew,A).*poisspdf(Znew,B)).
For example, in the first cell (1,1) I want the result of
poisspdf(0,value of A).*poisspdf(0,value of B),
in cell (1,2): poisspdf(0,value of A).*poisspdf(1,value of B),
in cell (2,1): poisspdf(1,value of A).*poisspdf(0,value of B),
and so on, assuming the format is cell(row, column).
Finally I want to sum up certain parts of each 20x20 matrix and show the result of the summed up parts in new columns.
Is there anybody able to help? Many thanks!
EDIT:
Poisson Matrix in Excel
In Excel there is the POISSON function: POISSON(x, μ, FALSE) returns the probability mass function value f(x) at x for the Poisson distribution with mean μ.
For example, cell AD313 in the table above contains
=POISSON(0;first value of A;FALSE)*POISSON(0;first value of B;FALSE)
cell AD314 contains
=POISSON(1;first value of A;FALSE)*POISSON(0;first value of B;FALSE)
cell AE313 contains
=POISSON(0;first value of A;FALSE)*POISSON(1;first value of B;FALSE)
and so on.
I am not sure if I completely understand your question. I wrote this code that might help you:
clear; clc
% These are the lambda parameters for the two Poisson distributions
lambdaA = 100;
lambdaB = 200;
% Generating Poisson data here
A = poissrnd(lambdaA,110,1);
B = poissrnd(lambdaB,110,1);
% Get the first 20 samples
zA = A(1:20);
zB = B(1:20);
% Perform the calculation
results = repmat(poisspdf(zA,lambdaA),1,20) .* repmat(poisspdf(zB,lambdaB)',20,1);
% Sum
sumFinal = sum(results,2);
Let me know if this is what you were trying to do.
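For comparison, the layout described in the question (cell (j, k) holding P(A = j) * P(B = k), with j and k starting at 0) is an outer product of two probability vectors, built once per record. In Matlab that would be something like poisspdf((0:19)', a) * poisspdf(0:19, b) for scalar means a and b, looped over the 110 rows. Below is a Python sketch of the same construction; the means and the choice of "row index greater than column index" as the part to sum are hypothetical, since the question doesn't say which parts should be summed.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam (like poisspdf or POISSON(k, lam, FALSE))."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def joint_matrix(a, b, n=20):
    """n x n matrix with cell (j, k) = P(X_A = j) * P(X_B = k), for j, k = 0..n-1."""
    pa = [poisson_pmf(j, a) for j in range(n)]
    pb = [poisson_pmf(k, b) for k in range(n)]
    return [[pa[j] * pb[k] for k in range(n)] for j in range(n)]

# Hypothetical means standing in for the first rows of columns A and B
A = [1.4, 2.1]
B = [0.9, 1.7]

for a, b in zip(A, B):
    m = joint_matrix(a, b)
    total = sum(sum(row) for row in m)                         # whole matrix, close to 1
    lower = sum(m[j][k] for j in range(20) for k in range(j))  # cells where row > column
    print(round(total, 6), round(lower, 6))
```

Each record gets its own 20x20 matrix, and any partial sum is just a sum over a chosen set of (j, k) cells; the per-record sums could then be collected into new columns.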

pandas - convert Panel into DataFrame using lookup table for column headings

Is there a neat way to do this, or would I be best off writing a loop that creates a new DataFrame, looking into the Panel when constructing each column?
I have a 3D array of data that I have put into a Panel, and I want to reorganise it based on a 2D lookup table over two of the axes, so that it becomes a DataFrame with column labels taken from my lookup table using the nearest value: a kind of double VLOOKUP.
The main thing I am trying to achieve is to be able to quickly locate a time series of data based on the label. If there is a better way, please let me know!
My data is in a Panel that looks like this, with latitude on the items axis and longitude on the minor axis:
data
Out[920]:
<class 'pandas.core.panel.Panel'>
Dimensions: 53 (items) x 29224 (major_axis) x 119 (minor_axis)
Items axis: 42.0 to 68.0
Major_axis axis: 2000-01-01 00:00:00 to 2009-12-31 21:00:00
Minor_axis axis: -28.0 to 31.0
and my lookup table is like this:
label_coords
Out[921]:
lat lon
label
2449 63.250122 -5.250000
2368 62.750122 -5.750000
2369 62.750122 -5.250000
2370 62.750122 -4.750000
I'm kind of at a loss. I'm quite new to Python in general and only really started using pandas yesterday.
Many thanks in advance! Sorry if this is a duplicate, I couldn't find anything that was about the same type of question.
Andy
I figured out a loop-based solution and thought I might as well post it in case someone else has this type of problem.
I changed the way my label coordinates dataframe was being read so that the labels were a column, then used the pivot function:
label_coord = label_coord.pivot(index='lat', columns='lon', values='label')
This produces a DataFrame where the labels are the values and lat/lon are the index/columns.
Then I used this loop, where data is a Panel as in the question:
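import pandas as pd  # pandas.Panel was removed in pandas 0.25, so this needs an older version

data_labelled = pd.DataFrame()
for i in label_coord.columns:          # longitude
    for j in label_coord.index:        # latitude
        lbl = label_coord.loc[j, i]    # label at this (lat, lon)
        if pd.isna(lbl):               # pivot leaves NaN where no label exists
            continue
        data_labelled['%s' % lbl] = data[j][i]  # time series for that grid point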
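The "nearest value" matching the question asks about can be sketched without pandas. The Python below uses a hypothetical grid and made-up time series, with the label coordinates copied from the question's lookup table, and maps each label to the series stored at the closest grid point; the grid spacing and data are assumptions for illustration only.

```python
import bisect

def nearest(axis, value):
    """Return the element of the sorted list `axis` closest to `value`."""
    i = bisect.bisect_left(axis, value)
    candidates = axis[max(i - 1, 0):i + 1]  # neighbours on either side of the insertion point
    return min(candidates, key=lambda a: abs(a - value))

# Hypothetical grid axes (latitude and longitude, sorted ascending)
lats = [62.75, 63.25]
lons = [-5.75, -5.25, -4.75]

# Hypothetical time series for each grid point, keyed by (lat, lon)
series = {(la, lo): [la, lo, la + lo] for la in lats for lo in lons}

# Label -> (lat, lon), copied from the lookup table in the question
label_coords = {
    2449: (63.250122, -5.250000),
    2368: (62.750122, -5.750000),
    2369: (62.750122, -5.250000),
    2370: (62.750122, -4.750000),
}

# One time series per label, taken from the closest grid point
by_label = {
    label: series[(nearest(lats, lat), nearest(lons, lon))]
    for label, (lat, lon) in label_coords.items()
}
print(by_label[2449])
```

Using bisect keeps each lookup logarithmic in the axis length, which matters little for 53 x 119 points but avoids scanning the whole axis for every label.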

How to change stringified numbers in data frame into pure numeric values in R

I have the following data.frame:
employee <- c('John Doe','Peter Gynn','Jolie Hope')
# Note that the salary below is in stringified format.
# In reality there are more such stringified numerical columns.
salary <- as.character(c(21000, 23400, 26800))
df <- data.frame(employee,salary)
The output is:
> str(df)
'data.frame': 3 obs. of 2 variables:
$ employee: Factor w/ 3 levels "John Doe","Jolie Hope",..: 1 3 2
$ salary : Factor w/ 3 levels "21000","23400",..: 1 2 3
What I want to do is convert the values from strings into pure numbers directly in the df variable, while preserving the employee names as strings.
I tried this, but it doesn't work:
as.numeric(df)
At the end of the day I'd like to perform arithmetic on these numeric values from df, such as df2 <- log2(df), etc.
Ok, there's a couple of things going on here:
1. R has two different data types that look like strings: factor and character.
2. You can't modify most R objects in place; you have to change them by assignment.
The actual fix for your example is:
df$salary = as.numeric(as.character(df$salary))
If you try to call as.numeric on df$salary without converting it to character first, you'd get a somewhat strange result:
> as.numeric(df$salary)
[1] 1 2 3
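When R creates a factor, it turns the unique elements of the vector into levels and then represents those levels with integers, which is what you see when you try to convert to numeric. (Since R 4.0.0, data.frame() no longer converts strings to factors by default, so on recent versions salary would already be character and as.numeric(df$salary) alone would work.)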
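The factor mechanism can be mimicked in a few lines of Python to show why the codes, not the labels, come back. This is only a rough approximation of R's factor(): levels are the sorted unique strings (R sorts them according to the locale, which agrees with Python's ordering for these examples) and each value is stored as its 1-based level number. The salary order is shuffled here, a hypothetical change, so that the codes visibly differ from the values.

```python
def make_factor(values):
    """Rough imitation of R's factor(): sorted unique levels and 1-based integer codes."""
    levels = sorted(set(values))
    codes = [levels.index(v) + 1 for v in values]
    return levels, codes

# Salaries as strings, deliberately not in sorted order (hypothetical)
salary = ["26800", "21000", "23400"]
levels, codes = make_factor(salary)

as_codes = codes                                   # like as.numeric(df$salary)
as_values = [float(levels[c - 1]) for c in codes]  # like as.numeric(as.character(df$salary))

print(as_codes)   # [3, 1, 2]
print(as_values)  # [26800.0, 21000.0, 23400.0]
```

For several stringified columns at once, the same conversion is commonly written in R as df[cols] <- lapply(df[cols], function(x) as.numeric(as.character(x))), where cols is a hypothetical vector of column names.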
