Say I have a variable var=1 and a string str='var'.
How can I obtain the value of var from str? I tried using str2num(str), but it didn't work.
Also, if I had two strings str1='some letters' and str2='str1', can I obtain the phrase 'some letters' from str2?
I want to do this because I have many matrices (quite big) and I want to separate them in some groups, so I thought about making cells with the names of each of the matrices that belong to a group (a matrix can belong to more than one group, so making cells with the matrices is not very good).
You can use eval:
x = eval( str ) ;
But it's not recommended.
Though this can easily be achieved with eval, as @Shai mentioned, you probably don't really want to do this. Using eval hinders debugging, and depending on variable names seriously limits the flexibility of your code. If you want to name something, you may be better off using a struct with a data field and a name field instead.
Judging from your description, I wonder about the following:
1. Why do you have many matrices?
For each variable that you have, you depend on a name. Depending on a lot of names is typically undesirable. Hence my suggestion:
Use a (cell) array containing these matrices
2. How exactly do you want them to be grouped?
It is not clear to me how you want the grouping to work, but think of this:
If you want to use names, create a struct or array of structs with a nameField, but
otherwise just use a cell array and have each matrix get a number.
You can now handle the matrices more easily and things like 'selecting 10 random matrices' or 'selecting all matrices whose nameField contains 'abc'' can be done easily and efficiently.
You can now also have a field with your data specifying in which groups it is, or you can define groups as simple lists of numbers.
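As a rough illustration of that layout (sketched in Python rather than MATLAB, with made-up names and data), each matrix is stored once together with a name, and groups are just lists of indices, so a matrix can belong to several groups without being copied:

```python
import random

# Hypothetical data: each matrix is stored exactly once, with a name field.
matrices = [
    {"name": "abc_1", "data": [[1, 2], [3, 4]]},
    {"name": "xyz",   "data": [[5, 6], [7, 8]]},
    {"name": "abc_2", "data": [[9, 0], [1, 2]]},
]

# Groups are plain lists of indices into `matrices`; index 1 is in both groups.
groups = {"small": [0, 1], "recent": [1, 2]}


def by_group(group_name):
    """Return the matrix records that belong to the given group."""
    return [matrices[i] for i in groups[group_name]]


def by_name_fragment(fragment):
    """Return the matrix records whose name contains the fragment."""
    return [m for m in matrices if fragment in m["name"]]


def random_selection(k, seed=None):
    """Return k distinct matrix records chosen at random."""
    return random.Random(seed).sample(matrices, k)
```

In MATLAB the same idea would map onto a struct array (or a cell array plus index vectors); no eval is needed at any point.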
I have a table of data with many data repeating.
I have to sort the rows randomly, but without identical names ending up next to each other, as shown here:
How can I do that in Excel?
Perfect case for a recursive LAMBDA.
In Name Manager, define RandomSort as
=LAMBDA(ζ,
LET(
ξ, SORTBY(ζ, RANDARRAY(ROWS(ζ))),
λ, TAKE(ξ, , 1),
κ, SUMPRODUCT(N(DROP(λ, -1) = DROP(λ, 1))),
IF(κ = 0, ξ, RandomSort(ζ))
)
)
then enter
=RandomSort(A2:B8)
within the worksheet somewhere. Replace A2:B8 - which should be your data excluding the headers - as required.
If no solution is possible then you will receive a #NUM! error. I didn't get round to adding a clause to determine whether a certain combination of names has a solution or not.
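For readers who want to see the logic outside Excel, here is a minimal Python sketch of the same "shuffle, check, retry" idea. It is not a translation of the formula: it uses a loop with a cap on attempts (max_tries is my own assumption) instead of recursion, and it returns None where the formula would end in an error.

```python
import random


def has_adjacent_repeat(rows):
    """True if two neighbouring rows share the same name (first column)."""
    return any(a[0] == b[0] for a, b in zip(rows, rows[1:]))


def random_sort(rows, max_tries=10_000, rng=None):
    """Shuffle rows until no two neighbours share a name.

    Returns the shuffled list, or None if no valid order was found
    within max_tries attempts (for example when none exists).
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        candidate = list(rows)
        rng.shuffle(candidate)
        if not has_adjacent_repeat(candidate):
            return candidate
    return None
```

Because every attempt is an independent shuffle, the running time depends on how many valid orderings exist; data with heavy duplication may need many attempts.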
This is just an attempt, because the question might need clarification or more sample data to understand the actual scenario. The main idea is to generate a random list from the input and then distribute it evenly by name. This is meant to prevent consecutive names from repeating. It is not the only possible ordering (the problem may have multiple valid combinations), but it is a valid one. The solution is volatile (every time Excel recalculates, a new output is generated) because RANDARRAY is a volatile function.
In cell D2, you can use the following formula:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m),
idx, SORTBY(seq, RANDARRAY(m,,1,m, TRUE)), rRng, INDEX(rng, idx,{1,2}),
names, INDEX(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Here is the output:
Update
Looking at @JosWoolley's approach, the generation of the random sorting can be simplified, so the resulting formula becomes:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m), rRng,SORTBY(rng, RANDARRAY(m)),
names, TAKE(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Explanation
The LET function is used for easier reading and composition. The name idx represents a random sequence of the input index positions. The name rRng represents the input rng, but sorted at random. This sorting alone doesn't ensure that consecutive names are distinct.
In order to avoid repeating consecutive names, we enumerate (nCnts) the repeated names: the first occurrence of a name gets 1, the second gets 2, and so on. We use MAP for that. This is similar to the idea suggested by @cybernetic.nomad in the comment section, but adapted to an array version (we cannot use COUNTIF because it requires a range). Finally, we use SORTBY with the map result (nCnts) as the by_array argument, so that names are distributed evenly. Every time Excel recalculates, you will get an output with the names distributed evenly in a different way.
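The enumerate-then-sort step can be sketched in Python as follows. This is only an illustration of the technique; the function names are mine, and Python's sorted is stable, so rows with the same occurrence number keep their shuffled order.

```python
import random
from collections import defaultdict


def occurrence_numbers(names):
    """Number each name by how many times it has appeared so far (1, 2, ...)."""
    seen = defaultdict(int)
    numbers = []
    for name in names:
        seen[name] += 1
        numbers.append(seen[name])
    return numbers


def distribute(rows, rng=None):
    """Shuffle rows, then sort them by the occurrence number of their name.

    Returns the reordered rows and the matching occurrence numbers.
    """
    rng = rng or random.Random()
    shuffled = list(rows)
    rng.shuffle(shuffled)
    counts = occurrence_numbers([row[0] for row in shuffled])
    # Stable sort: all first occurrences come first, then all second ones, ...
    order = sorted(range(len(shuffled)), key=lambda i: counts[i])
    return [shuffled[i] for i in order], [counts[i] for i in order]
```

Within each block of equal occurrence numbers every name is distinct. In this sketch, though, nothing prevents the last name of one block from being the first name of the next: for the shuffled order C, A, B, B, A the occurrence numbers are 1, 1, 1, 2, 2, the order is unchanged, and the two B rows stay adjacent. It may be worth checking the result and recalculating when that happens.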
Not sure if it's worth posting this, but I might as well share the results of my research, such as it is. The problem is similar to that of re-arranging the characters in a string so that no two identical characters are adjacent. The method is simply to insert whichever of the remaining characters (names) has the highest remaining frequency at this point and is not the same as the previous character, then reduce its frequency once it has been used. It's fairly easy to implement this in Excel, even in Excel 2019. So if the initial frequencies are in D2:D8 for convenience, using COUNTIF:
=COUNTIF(A$2:A$8,A2)
You can use this formula in (say) F2 and pull it down:
=INDEX(A$2:A$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
and similarly in G2 to get the ages:
=INDEX(B$2:B$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
I'm fairly sure this will always produce a correct result if one is possible.
HOWEVER there is no randomness built in to this method. You can see if I extend it to more data that in the first several rows the most common name simply alternates with the other two names:
Having said that, this is a bit of a worst case scenario (a lot of duplication) and it may not look too bad with real data, so it may be worth considering this approach along with the other two methods.
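For comparison, the same greedy rule can be written as a short Python sketch. It works on the names only, breaks ties by first appearance (similar to what MATCH does), and returns None when it gets stuck; these details are my own choices for the illustration.

```python
from collections import Counter


def greedy_arrange(names):
    """Repeatedly pick the name with the highest remaining count
    that differs from the previously placed name.

    Returns the arranged list, or None if no name can be placed.
    """
    remaining = Counter(names)  # keeps first-seen order, used for ties
    result = []
    prev = None
    for _ in range(len(names)):
        candidates = [n for n in remaining if remaining[n] > 0 and n != prev]
        if not candidates:
            return None
        best = max(candidates, key=lambda n: remaining[n])  # first max wins ties
        result.append(best)
        remaining[best] -= 1
        prev = best
    return result
```

For example, greedy_arrange(["A", "A", "A", "B", "C"]) gives A, B, A, C, A, which also shows the lack of randomness: the most common name simply alternates with the others.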
Ever since I learnt that Excel is now Turing-complete, I understood that I can now "program" Excel using exclusively formulas, therefore excluding any use of VBA whatsoever.
I do not know if my conclusion is right or wrong. In reality, I do not mind.
However, to my satisfaction, I have been able to "program" the two most basic structures of program flow inside formulas: 1- branching the control flow (using an IF function has no secrets in excel) and 2- loops (FOR, WHILE, UNTIL loops).
Let me explain my findings in a little more detail. (Remark: because I am using a Spanish version of Excel 365, the field separator in formulas is the semicolon (";") instead of the comma (",").)
A- Accumulator in a FOR loop
B- Factorial (using product)
C- WHILE loop
D- UNTIL loop
E- The notion of INTERNAL/EXTERNAL SCOPE
And now, the time of my question has arrived:
I want to use a formula that is really an array of formulas
I want to use an accumulator for the first number in the "tuple" whereas I want a factorial for the second number in the tuple. And all this using a single excel formula. I think I am not very far away from succeeding.
The REDUCE function accepts a LET function that contains 2 LAMBDAs instead of a single LAMBDA function. Up to here, everything works. However, the LET function seems to return only a "single" function instead of a tuple of functions.
I can return (in the picture) function "x" or function "y" but not the tuple (x,y).
I have tried to use HSTACK(x,y), but it does not seem to work.
I am aware that this is a complex question, but I've done my best to make myself understood.
Can anybody give me any clues as to how I could solve my problem?
Very nice question.
I noticed that in your attempts you have given REDUCE() a single constant value as the 1st parameter. Funnily enough, the documentation nowhere states that you can't give it an array of values. Hence you could use the 1st parameter to pass all the constants as an array (in your case, a horizontal one), and while you loop through the array of the 2nd parameter you can apply a different type of logic to each element using CHOOSE():
=REDUCE({0,1},SEQUENCE(5),LAMBDA(a,b,CHOOSE({1,2},a+b,a*b)))
This way you have a single REDUCE() function that updates the array of constants given in the 1st parameter on every iteration. You can now start stacking multiple functions horizontally and input an array of constants, for example:
=REDUCE({0,1,100},SEQUENCE(5),LAMBDA(a,b,CHOOSE({1,2,3},a+b,a*b,a/b)))
I suppose you'd have to use {0\1} and {1\2} like I'd have to in my Dutch version of Excel.
Given your accumulator:
Formula in A1:
=REDUCE(F1:G1,SEQUENCE(F3),LAMBDA(a,b,CHOOSE({1,2},a+b,a*b)))
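The underlying idea, a single fold whose accumulator holds several values at once, is not specific to Excel. A minimal Python sketch of the same sum-and-factorial pair, using functools.reduce with a tuple as the accumulator (the function name is made up for the example):

```python
from functools import reduce


def sum_and_factorial(n):
    """Fold over 1..n with a two-element accumulator: (running sum, running product)."""
    return reduce(
        lambda acc, b: (acc[0] + b, acc[1] * b),  # update both slots in one step
        range(1, n + 1),
        (0, 1),  # initial values, like {0,1} in the formula
    )
```

Calling sum_and_factorial(5) returns (15, 120), matching what the {0,1} formula with SEQUENCE(5) computes.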
I have a string variable with lots of parentheses and other punctuation e.g. _LSC Debt licensed work. How can I easily convert it to a numeric variable when I already have a specified code list for it? i.e. I don't want it to automatically recode everything because it uses the wrong values against the labels.
Create a dataset with two variables: a string holding the current messy name and a numeric variable holding the new code. Then, with both the original dataset and the lookup one sorted by the string, do MATCH FILES specifying a table match (or use Data > Merge Files > Add Variables).
You can prepare a separate file which includes two variables:
- one contains each of the possible values in the original string variable to be recoded (make sure the name and width are the same as your original variable)
- the second contains the new values you want to recode to.
When you have set this up, match the files like this:
get file="filepath\Your_Value_Table.sav".
sort cases by YourOriginalVarName.
dataset name ValTab.
get file="filepath\Your_Original_File.sav".
sort cases by YourOriginalVarName.
match files /file=* /table=ValTab /by YourOriginalVarName.
exe.
At this point your original file will contain a new variable that has the codes you wanted.
In general I agree with the solution provided by others. However, I would like to suggest an extra step, which could make your look-up file (see the answers of eli-k and JKP) a bit better.
The point is that your string variable with lots of parentheses and other punctuation probably also has different ways to write the same thing.
For example:
_LSC Debt licensed work
LSC Debt licensed work
_LSC Debt Licensed Work
etc.
You could create a lookup-table with three variables: the unique values of the original string variable, a cleaned-up version of that variable, and finally the numeric value you want to attach.
The advantage of the cleaned-up version is that you can identify more easily the same value although it is written differently.
You could clean up using several functions:
string CleanedUpVersion (A40).
compute CleanedUpVersion = REPLACE(RTRIM(LTRIM(UPCASE(YourOriginalVarName))),'_','').
execute.
In this basic example we convert to capital letters, delete leading and trailing blanks, and remove the underscore by replacing it with nothing.
Overall, this helps to avoid assigning different numbers to values in your original variable that are written differently but mean the same thing and should therefore share one number.
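To make the clean-then-look-up logic concrete, here is a small Python sketch (not SPSS syntax). The code value 1 and the function names are invented for the example; the cleaning mirrors the steps above: upper case, trim blanks, drop underscores.

```python
def clean(value):
    """Upper-case, strip leading/trailing blanks and remove underscores."""
    return value.upper().strip().replace("_", "")


# Hypothetical code list: original string -> numeric code.
raw_lookup = {"_LSC Debt licensed work": 1}

# Key the lookup table on the cleaned-up version of each string.
lookup = {clean(key): code for key, code in raw_lookup.items()}


def recode(value):
    """Return the numeric code for a string, or None if it is not in the list."""
    return lookup.get(clean(value))
```

With this table, "_LSC Debt licensed work", "LSC Debt licensed work" and "_LSC Debt Licensed Work" all map to the same code, while an unknown string gives None instead of an automatically assigned number.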
I want to extract three numbers from different columns of my dataset, combine them with some letters to form the name of a variable in the workspace, and then assign a matrix to this variable. For instance:
data=dataset{:,:,5};
FID=data(1,14);
VID=data(1,1);
PID=data(1,15)
Here I extracted three numbers from different columns of a matrix in the dataset:
FID=4 , VID=8 , PID=12
Now, I want to create a variable in the workspace using these three numbers together with three letters, separated by underscores, such as: A4_B8_C12
and then assign a matrix to this variable:
A4_B8_C12=dataset{:,:,5};
Since my dataset is a cell array containing 2169 matrices, I'm writing code to extract the three numbers from each matrix and use them, together with the desired letters, to create several named matrices.
How can I do that?
When you have cell arrays, structs and arrays available, this is not good practice; it goes against the philosophy of using arrays. But if you want to continue programming this way anyway, you can use the following code:
for i=1:5
data=dataset{:,:,i};
FID=data(1,14);
VID=data(1,1);
PID=data(1,15);
eval(sprintf('A%d_B%d_C%d=data;',FID,VID,PID));
end
Using eval is a style of programming that can be used for self-modifying code.
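An alternative that avoids eval altogether is to keep the generated names as keys of a container rather than as workspace variables. Below is a Python sketch of that idea with a dictionary (in MATLAB, a struct with dynamic field names or a containers.Map would play the same role). The function name is invented, and the column positions 14, 1 and 15 from the question become the 0-based indices 13, 0 and 14.

```python
def build_named_matrices(dataset):
    """Map names like 'A4_B8_C12' to the matrices they describe.

    `dataset` is assumed to be a list of matrices (lists of rows) whose
    first row holds FID in column 14, VID in column 1 and PID in column 15
    (1-based, as in the question).
    """
    named = {}
    for data in dataset:
        fid = int(data[0][13])
        vid = int(data[0][0])
        pid = int(data[0][14])
        named[f"A{fid}_B{vid}_C{pid}"] = data
    return named
```

A matrix is then retrieved with named["A4_B8_C12"], and looping over all 2169 matrices needs no dynamically created variables.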
In Stata is there any way to tabulate over the entire data set as opposed to just over one variable/column? This would give you the tabulation over all the columns.
Related - is there a way to find particular values in Stata if one does not know which column they occur in? The output would be which column and row they are located in or at least which column.
Stata does not use row and column terminology except with reference to matrices and vectors. It uses the terminology of observations and variables.
You could stack or reshape the entire dataset into one variable if and only if all variables are numeric or all are string. If that assumption is incorrect, then you would need to convert numeric variables to string, at least temporarily, before you could do that. I guess wildly that you are only interested in blocks of variables that are all either numeric or string.
When you say "tabulate" you may mean the tabulate command. That has limits on the number of rows and/or columns it can show that might bite, but with a small amount of work list could be used for a simple table with many more values.
tabm from tab_chi on SSC may be what you seek.
For searching across several variables, you could automate a loop.
I'd say that if this is a felt need, it is quite probable that you have the wrong data structure for at least some of what you want to do and should reshape. But further details might explode that.
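The kind of loop meant for searching across several variables can be sketched as follows. This is Python rather than Stata, the data are made up, and observations are numbered from 1 to match Stata's convention.

```python
def find_value(rows, target):
    """Return (variable, observation number) pairs where target occurs.

    `rows` is a list of observations, each a dict mapping variable name to value.
    """
    hits = []
    for obs, row in enumerate(rows, start=1):
        for var, value in row.items():
            if value == target:
                hits.append((var, obs))
    return hits


# Hypothetical dataset with numeric and string variables.
data = [
    {"id": 1, "x": 5, "s": "a"},
    {"id": 2, "x": 1, "s": "b"},
]
```

Here find_value(data, 1) reports that the value occurs in variable id at observation 1 and in variable x at observation 2. In Stata itself the analogous approach would be a foreach loop over the variable list.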