Why is MATLAB not reading an empty cell while running an if statement?

I am using the loop below to isolate instances where data was recorded versus those with no data. The data set is very large (varying from 1000-6000 depending on the column) and of mixed data types, so the only practical solution I can think of is using loops.
I can't get the if or while statement to correctly detect a blank entry. It runs without any errors if I use a for loop, but it never enters the first branch of the if, meaning I end up copying my data rather than separating it. The varying sizes of data make a for loop undesirable.
while (isempty(andover_all{j,1})==1)
if andover_all{h,33}=='';
current_data{k,4}= formated_date{j};
k=k+1;
else
current_data{i,1}=formated_date{j};
current_data{i,2}=andover_data{33}(j);
i=i+1;
end
h=h+1;
end
andover_all is an array of strings; current_data and andover_data are cell arrays with mixed data types. I have tried using isempty, [], cellfun(@isempty,andover_data), and a function eq.m that allows me to compare cell elements; none of them work. I also don't want to remove empty cells from the data, just skip over them.
Please let me know if you have any ideas.

The empties are indeed something to get used to. It's like working with inf or NaN: what should things like NaN==NaN or 1/0==inf return? There are special rules for these guys. Simple ones, but you have to get to know them. To make these special rules less of a burden, more intuitive and more readable, MATLAB has special functions for them: isinf (to detect inf), isnan (to detect NaN) and isfinite (true only for values that are neither inf nor NaN).
The empties also have special behavior and special rules that require some getting used to. If you think about it, it all makes sense in the end: What should []==[] return? or 1==''?
Empty, of course. And even if []==false is empty, [] is false when evaluated by an if. Easy right? :)
Unfortunately, there is no equivalent of isinf or isnan to detect empties of a specific type (there is no isemptycell or isemptychar etc.) There is an equivalent of isfinite for empties (which is isempty), which catches either '', {}, or [].
But sometimes it is desirable to have checks for specific empties, as in your case. The empties preserve their class. This means, {} is really a cell, and [] really an array of doubles.
Therefore, to detect empty cells:
>> a = {};
>> iscell(a) && isempty(a)
ans =
1
to detect empty strings:
>> a = '';
>> ischar(a) && isempty(a)
ans =
1
and to detect empty arrays:
>> a = [];
>> isnumeric(a) && isempty(a)
ans =
1

Related

Wrangling lists with possible empty values in output

We refactored some code, and when it was written an array was guaranteed to have 4 elements. Now it can have 4 or 0. The output statement from the module was: (stripped down a bit for simplicity)
output "foo" {
  value = {
    "valuea" = thing.subthing[0].id,
    "valueb" = thing.subthing[1].id,
    "valuec" = thing.subthing[2].id,
    "valued" = thing.subthing[3].id
  }
}
Now when thing.subthing is an empty tuple, it understandably blows chunks. I'm drawing a blank on the most straightforward way to determine that it's empty, and move on.
These are actually a list of subnet IDs which never got made: a flag was added to skip their creation, so their count was set to 0. I've got about 12 more things in that file I'll fix exactly the same way...
Oh, tf 0.12.20something btw.
I kept getting stuck on the usual tricks (concatenating with an empty array, for example, still blew chunks on the missing tuple) until I was reminded that the splat notation works differently, so
length(thing.subthing[*].id) > 0 ? thing.subthing[0].id : 0
works fine.
Although now that I think of it, that's left over from our 0.11 code. I should probably refactor value into an array and convert this into for_each code. In my copious spare time.

How to find the maximum value between every i and i + k (some constant) of a given array?

I need to find the maximum value among the elements of every possible set {a[i], a[i+1], ..., a[i + k]} (where i is an index and k is some given constant). For this I am using:
loop(b, 1, k) {
    rloopl(i, b, n) {
        if(a[i] < a[i-1])
            a[i] = a[i-1];
    }
}
but it's too slow for large arrays. Is there a more efficient way to do this?
I'm very sorry to tell you that, with the requirement as-presented, the answer would be: "no." If "the largest value could be anywhere," you have no choice but to "look ... everywhere."
If you are "doing this 'once and only once,'" for any particular data-set, then you're basically just gonna have to take your lumps. You're stuck with "brute-force."
However, if you're doing this more than once, and/or if you have some influence on the process by which the array in question gets loaded, the situation might start looking a little better.
For instance, if another piece of code is adding elements to this array one-at-a-time, it's trivial for that piece of code to notice the max/min value that it encounters. Code that loads a two-dimensional array might gather statistics about each row (column). And, so on. Such strategies, which are "free, at the time," can be used to eliminate (or, severely curtail) the need to do specific brute-force searches later.
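For the windowed form in the question specifically, every element still has to be looked at, but overlapping windows do not have to be rescanned from scratch. One commonly used technique is to keep a deque of candidate indices whose values are in decreasing order. The following is only a sketch under stated assumptions: the function name sliding_window_max is made up for illustration, and it assumes the question's convention that each window runs from a[i] to a[i + k] inclusive, so it holds k + 1 elements.

```python
from collections import deque


def sliding_window_max(a, k):
    """Return max(a[i..i+k]) for every i where the whole window fits.

    Assumption: each window holds k + 1 elements, as in the question.
    """
    width = k + 1
    result = []
    candidates = deque()  # indices; their values are in decreasing order
    for i, value in enumerate(a):
        # Smaller-or-equal values to the left can never be a maximum again.
        while candidates and a[candidates[-1]] <= value:
            candidates.pop()
        candidates.append(i)
        # Drop the front index once it has slid out of the window.
        if candidates[0] <= i - width:
            candidates.popleft()
        # From the first complete window onwards, the front is the maximum.
        if i >= width - 1:
            result.append(a[candidates[0]])
    return result


print(sliding_window_max([1, 3, 2, 5, 4, 1], 2))  # [3, 5, 5, 5]
```

Each index is appended once and removed at most once, so the total work grows linearly with the array length rather than with the array length times k.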

Calculate (If A=B, then "" else A) in Excel without evaluating A twice

I want to write a formula of the type:
=IF(VOL("Site";"Date")=0;"";VOL("Site";"Date"))
where VOL is a function I'm using through an Add-In. One limitation of this Add-In is that it is prohibited to call two Add-In functions inside a single formula, i.e. the formula I've written above is invalid and will result in an error.
Is there a way of achieving the following:
=IF(LHS=RHS;"Value if True";LHS) (2)
where LHS is the left-hand side and RHS the right-hand side. The expression checks whether LHS is equal to RHS and, if so, prints a corresponding value, else LHS, without having Excel evaluate LHS twice?
I haven't found any solution to this except putting the formula in one cell and referring to that cell as the value to print when the logical expression in the IF statement is false, but this quickly becomes a lot of "double work". A solution like (2) would also be more readable, especially when LHS is of the type "'C:\pathtofile[filename]SheetName!'Cell".
I hope someone has a clever solution to this.
Here is one (rather ugly) way, just using formulas:
=IFERROR(1/IFERROR(1/vol("Site";"Date");0);"")
This makes use of the IFERROR function, which kind of does what you want but only tests for errors. Division by zero results in an error, so the inner IFERROR returns zero if VOL is zero, and 1/VOL otherwise. Now we need to take the reciprocal again to return the original value, so we repeat the trick, this time returning "" if there is an error.
If you want to test for another value (e.g. 3), just use something like:
=IFERROR(3+1/IFERROR(1/(vol("Site";"Date")-3);0);"")
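To make the arithmetic of the double-reciprocal trick easier to follow, here is a rough Python imitation of it. It is only a sketch: iferror and blank_if_equal are made-up helper names, and a ZeroDivisionError stands in for Excel's #DIV/0! error.

```python
def iferror(thunk, fallback):
    """Mimic Excel's IFERROR: evaluate thunk, return fallback on a division error."""
    try:
        return thunk()
    except ZeroDivisionError:
        return fallback


def blank_if_equal(value, target=0):
    """Return "" when value equals target, otherwise (approximately) value."""
    # Inner step: 1/(value - target), or 0 when value == target.
    inner = iferror(lambda: 1 / (value - target), 0)
    # Outer step: invert again and shift back by target. A zero inner
    # value triggers the second division error, which maps to "".
    return iferror(lambda: target + 1 / inner, "")


print(repr(blank_if_equal(0)))   # ''
print(blank_if_equal(4.0))       # 4.0
print(blank_if_equal(5, 3))      # 5.0
```

Note that the value is evaluated only once, which is the point of the trick. Because it passes through two floating-point divisions, the result may differ from the original value in the last digits for some inputs.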
A much neater way would be to create a function in VBA which wraps the VOL function and does what you want:
Public Function MyVol(varSite As Variant, varDate As Variant) As Variant
    MyVol = vol(varSite, varDate)
    If MyVol = 0 Then MyVol = ""
End Function
Assuming you can call VOL from VBA.

Dropping various string variables in a loop in Stata

I want to drop a great number of string variables that contain the word "Other" in their observations. As such, I tried the following loop to drop all the variables:
foreach var of varlist v1-v240 {
    drop `var' if `var'=="Other"
}
What I get in return is the answer "syntax error". I would like to know not only a way to perform the task of dropping all the variables that contain the word "Other", but also why the code that I've entered returns an error.
The short answer on why your syntax is illegal, which @Dimitriy Masterov doesn't quite spell out, is that drop supports just two syntaxes, which can't be mixed: dropping variables and dropping observations. This is documented: see e.g. http://www.stata.com/help.cgi?drop and the corresponding on-line help and manual entry within Stata.
In addition to other solutions, findname from the Stata Journal would allow this solution:
findname, any(@ == "Other")
drop `r(varlist)'
Your interpretation of contain is evidently 'is equal to' judging by your use of == as an operator, echoed above. If contain really means 'includes as substring', then you need a syntax such as
any(strpos(@, "Other"))
or
any(regexm(@, "Other"))
as @Dimitriy also explains.
If they are actual strings, this should work:
sysuse auto, clear
ds, has(type string) // get a list of string variables
// loop over each string variable, count observations that contain Buick anywhere, and drop the variable if N>0
foreach var of varlist `r(varlist)' {
    count if regexm(`var',"Buick")
    if r(N)>0 {
        drop `var'
    }
}
If "contains" means the whole value is exactly "Buick", then you need to use "^Buick$" instead, or
count if `var'=="Buick"
Beware of leading/trailing spaces.
The if qualifier restricts the scope of a command to those observations for which the value of the expression is true. Your code errors because you are asking Stata to drop a variable (a column) if some observations (rows) satisfy a condition. You could use the if qualifier to drop those observations or you can drop a variable, but not both simultaneously. My code uses the if command (a different beast) to verify the condition, and then drops the variable if that condition is satisfied.
You might be tempted to do something like
if `var'=="Other" {
    drop `var'
}
but that will usually not work as expected (it would drop the variable only if the first observation was "Other").

Python failure to find all duplicates

This is related to random sampling. I am using random.sample(numbers,5) to return a list of random numbers from within a range of numbers contained in numbers. I am using while i < 100 to return one hundred sets of five numbers. To check for duplicates, I am using:
if len(numbers) != len(set(numbers)):
to identify sets with duplicates, and I follow this with random.sample(numbers,5) to try another randomisation to replace the set with duplicates. About 8% of the sets seem to get re-randomised (I use a print statement to say which number was duplicated), but about 5% seem to be missed. What am I doing incorrectly? The actual code is as follows:
while i < 100:
set1 = random.sample(numbers1,5)
if len(set1) != len(set(set1))
print('duplicate(s) found, random selection repeated')
set1 = random.sample(numbers1,5)
In another routine I am trying to do the same as above, but searching for duplicates in two sets by adding the same, substituting set2 for set1. This gives the same sorts of failures. The set2 routine is indented and placed immediately below the above routine. While i < 100: is not repeated for set2.
I hope that I have explained my problem clearly!!
There is nothing in your code to stop the second sample from having duplicates. What if you did something like a second while loop?
while i < 100:
    i += 1
    set1 = random.sample(numbers1, 5)
    while len(set1) != len(set(set1)):
        print('duplicate(s) found, random selection repeated')
        set1 = random.sample(numbers1, 5)
Of course you're still missing the part of the code that does something... beyond the above it's difficult to tell what you might need to change without a full code sample.
EDIT: here is a working version of the code sample from the comments:
import random

def choose_random(list1, n):
    i = 0
    set_list = []
    # list1 may repeat values already in 1..49, which makes duplicates possible
    major_numbers = list(range(1, 50)) + list1
    print(major_numbers)
    while i < n:
        set1 = random.sample(major_numbers, 5)
        set2 = random.sample(major_numbers, 2)
        # resample until the five picks are all distinct
        while len(set(set1)) != len(set1):
            print("Duplicate found at %i" % i)
            print(set1)
            print("Changing to:")
            set1 = random.sample(major_numbers, 5)
            print(set1)
        set_list.append([set1, set2])
        i += 1
    return set_list
The code you give obviously has some gaps in it and cannot work as it is there, so I cannot pinpoint where exactly your error is, but running set1 = random.sample(numbers1,5) after the end of the while loop (which is infinite if written as in your question) undoes everything you did before, because it overwrites whatever you managed to set set1 to.
Anyway, random.sample gives you a sample without replacement. If there are any repetitions in random.sample(numbers1, 5), that means numbers1 itself already contains repetitions. If that is not supposed to be the case, check the content of numbers1 and perhaps force its values to be unique, for example by sampling from list(set(numbers1)) instead (recent Python versions no longer accept a bare set as the population).
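A quick way to convince yourself of this is to sample from a population with and without repeated values. This is only a small sketch; the list contents and the helper name has_duplicates are made up for illustration.

```python
import random

# Hypothetical populations, made up for illustration.
unique_numbers = list(range(1, 50))            # every value appears once
repeated_numbers = unique_numbers + [7, 7, 7]  # 7 now appears four times


def has_duplicates(values):
    """Return True if any value occurs more than once."""
    return len(values) != len(set(values))


# random.sample draws distinct *positions*, so a population of unique
# values can never yield a sample containing a repeated value.
assert not any(has_duplicates(random.sample(unique_numbers, 5))
               for _ in range(1000))

# De-duplicating first restores that guarantee for the repeated list.
# sorted() turns the set back into a list, which random.sample accepts.
deduplicated = sorted(set(repeated_numbers))
sample = random.sample(deduplicated, 5)
print(sample, has_duplicates(sample))
```

Sampling from repeated_numbers directly, by contrast, can return 7 more than once, because each copy of 7 occupies its own position in the list.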
If the reason is that you want some elements from numbers1 with higher probability, you might want to put this as
set1 = random.sample(numbers1, 5)
while len(set1) != len(set(set1)):
    set1 = random.sample(numbers1, 5)
This is a possibly infinite loop, but if numbers1 contains at least 5 different elements, it will exit the loop at some point. If you don't like the theoretical possibility of this loop never exiting, you should probably use a weighted sample instead of random.sample (there are a few examples of how to do that here on Stack Overflow) and remove the numbers you have already chosen from the weights table.
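One possible shape for that weighted approach is sketched below. It is not the only way to do it: the function name weighted_sample_without_replacement and the weights table are made up for illustration, and it assumes the weights are positive numbers keyed by the value they belong to.

```python
import random


def weighted_sample_without_replacement(weights, k):
    """Pick k distinct keys from `weights`, a dict of value -> positive weight."""
    remaining = dict(weights)  # copy, so the caller's table is left untouched
    if k > len(remaining):
        raise ValueError("k is larger than the number of distinct values")
    chosen = []
    for _ in range(k):
        values = list(remaining)
        value_weights = [remaining[v] for v in values]
        pick = random.choices(values, weights=value_weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # remove it so it cannot be drawn again
    return chosen


# Hypothetical weights table: 7 is three times as likely as any other number.
weights = {n: 1 for n in range(1, 50)}
weights[7] = 3
picks = weighted_sample_without_replacement(weights, 5)
print(picks)
```

Because each chosen value is deleted from the table before the next draw, the loop always finishes after exactly k draws and the result can never contain a duplicate.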
