Taking advantage of the name of a variable in Stata - string

I´m using Stata and I have a set of variables named cal1, cal2, cal3 and so on until cal21. For every line of my dataset, i could have more or less cal* variables as non-missing (I designed the dataset with a reshape wide). I want to generate a new variable that returns the maximum name of variable cal* available for each line that is non-missing. For example, if line 1 has until cal3 as non- missing , this variable returns cal3; for the line 2 if i have cal1, cal2 and cal6, I want cal6. Is there a way to do this?

This would be much easier to accomplish with data in long format layout, but it is doable with wide data too with a loop:
gen max_cal = "none"
forvalues v=1/21 {
replace max_cal = "cal`v'" if !missing(cal`v')
}
This will update the max_cal variable each time there's a higher one not missing.

Related

How to import variable labels from Excel into Stata

I have a Excel-sheet, which contains variable and variable-labels. I would like to import this file into Stata. How can I do it?
Let's assume that your variable names are on the first row, and labels on the second row.
I would do:
import excel using file.xlsx, firstrow clear
foreach var of varlist _all {
local x = `var'[1]
label var `var' "`x'"
}
drop if [_n]==1
foreach var of varlist _all {
cap destring `var', replace
}
The first bit replaces the label of the variables with the variable label, which should be in the first row of your imported dataset. The second bit drops this row, and the destrings all variables for which this is possible without an error. The reason for this is that all variables will be imported as strings when you have the second row as variable labels.
This is the most typical case I encountered, but of course there may be other scenarios where you have to adopt different approaches.

Is there a pandas function that can create a dataframe of the mean, median, and mode of selected columns?

My attempt:
# Compute the mean, median and variance for the variables sph, acous and dur. Compare their level of variability.
sad_mean = dat_songs[['spch', 'acous', 'dur']].mean()
sad_mode = dat_songs[['spch', 'acous', 'dur']].mode()
sad_median = dat_songs[['spch', 'acous', 'dur']].median()
sad_mmm = pd.DataFrame({'mean':sad_mean, 'median':sad_median, 'mode':sad_mode})
sad_mmm
Which outputs this
First of all, the median column is not right at all and want to know how to fix that too.
Secondly, I feel like I have seen some quicker or shorter way to do this with a simple function with pandas.
My data head for reference
Simply try, dat_songs.describe(). Descriptive Statistics will be present for all the numerical columns.
For selected columns.
dat_songs[['spch', 'acous', 'dur']].describe()

Calling a vector from a string

I am attempting to write an algorithm that selects a specific reference standard (vector) as a function of temperature. The temperature values are stored in a structure ( procspectra(i).temperature ). My reference standards are stored in another structure ( standards.interp.zeroed.ClOxxx ) where xxx are numbers such as 200, 210, 220, etc. I have built the rounding construct and paste it below.
for i = 1:length(procspectra);
if mod(-procspectra(i).temperature,10) > mod(procspectra(i).temperature,10);
%if mod(-) > mod(+) round down, else round up
tempvector(i) = procspectra(i).temperature - mod(procspectra(i).temperature,10);
else
tempvector(i) = procspectra(i).temperature + mod(-procspectra(i).temperature,10);
end
clostd = strcat('standards.interp.zeroed.ClO',num2str(tempvector(i)));
end
This construct works well. Now, I have built a string which is identical to the name of the vector I want to invoke, but I'm uncertain how to actually call the vector given that this is encoded as a string. Ideally I want to do something within the for-loop like:
parameters(i).standards.ClOstandard = clostd
where I actually am assigning that parameter structure to be the same as the vector I have saved in the standards structure I have previously generated (and not just a string)
Could anyone help out?
Don't construct clostd like that (containing the full variable name), make it contain only the last field name instead:
clostd = ['ClO' num2str(tempvector(i))];
parameters(i).standards.ClOstandard = standards.interp.zeroed.(clostd);
This is the syntax of accessing a structure's field dynamically, using a string. So the following three are equivalent:
struc.Cl0123
struc.('Cl0123')
fieldn='Cl0123'; struc.(fieldn)

Lua: Parsing and Manipulating Input with Loops - Looking for Guidance

I am currently attempting to parse data that is sent from an outside source serially. An example is as such:
DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_
This data can come in many different lengths, but the first few pieces are all the same. Each "piece" originally comes in with CRLF after, so I've replaced them with string.gsub(input,"\r\n","|") so that is why my input looks the way it does.
The part I would like to parse is:
4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_
The "4" tells me that there will be four lines total to create this file. I'm using this as a means to set the amount of passes in the loop.
The 7x5 is the font height.
The 1 is the xpos.
The 25 is the ypos.
The variable data (172-24 in this case) is the text at these parameters.
As you can see, it should continue to loop this pattern throughout the input string received. Now the "4" can actually be any variable > 0; with each number equaling a set of four variables to capture.
Here is what I have so far. Please excuse the loop variable, start variable, and print commands. I'm using Linux to run this function to try to troubleshoot.
function loop_input(input)
var = tonumber(string.match(val, "DATA|0|(%d*).*"))
loop = string.match(val, "DATA|0|")
start = string.match(val, loop.."(%d*)|.*")
for obj = 1, var do
for i = 1, 4 do
if i == 1 then
i = "font" -- want the first group to be set to font
elseif i == 2 then
i = "xpos" -- want the second group to be set to xpos
elseif i == 3 then
i = "ypos" -- want the third group to be set to ypos
else
i = "txt" -- want the fourth group to be set to text
end
obj = font..xpos..ypos..txt
--print (i)
end
objects = objects..obj -- concatenate newly created obj variables with each pass
end
end
val = "DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_"
print(loop_input(val))
Ideally, I want to create a loop that, depending on the var variable, will plug in the captured variables between the pipe deliminators and then I can use them freely as I wish. When trying to troubleshoot with parenthesis around my four variables (like I have above), I receive the full list of four variables four times in a row. Now I'm having difficulty actually cycling through the input string and actually grabbing them out as the loop moves down the data string. I was thinking that using the pipes as a means to delineate variables from one another would help. Am I wrong? If it doesn't matter and I can keep the [/r/n]+ instead of each "|" then I am definitely all for that.
I've searched around and found some threads that I thought would help but I'm not sure if tables or splitting the inputs would be advisable. Like these threads:
Setting a variable in a for loop (with temporary variable) Lua
How do I make a dynamic variable name in Lua?
Most efficient way to parse a file in Lua
I'm fairly new to programming and trying to teach myself. So please excuse my beginner thread. I have both the "Lua Reference Manual" and "Programming in Lua" books in paperback which is how I've tried to mock my function(s) off of. But I'm having a problem making the connection.
I thank you all for any input or guidance you can offer!
Cheers.
Try this:
val = "DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_"
val = val .. "|"
data = val:match("DATA|0|%d+|(.*)$")
for fh,xpos,ypos,text in data:gmatch("(.-)|(.-)|(.-)|(.-)|") do
print(fh,xpos,ypos,text)
end

Name variable based on string MATLAB

I have a variable that is created by a loop. The variable is large enough and in a complicated enough form that I want to save the variable each time it comes out of the loop with a different name.
PM25 is my variable. But I want to save it as PM25_year in which the year changes based on `str = fname(13:end)'
PM25 = permute(reshape(E',[c,r/nlay,nlay]),[2,1,3]); % Reshape and permute to achieve the right shape. Each face of the 3D should be one day
str = fname(13:end); % The year
% Third dimension is organized so that the data for each site is on a face
save('PM25_str', 'PM25_Daily_US.mat', '-append')
The str would be a year, like 2008. So the variable saved would be PM25_2008, then PM25_2009, etc. as it is created.
Defining new variables based on data isn't considered best practice, but you can store your data more efficiently using a cell array. You can store even a large, complicated variable like your PM25 variable within a single cell. Here's how you could go about doing it:
Place your PM25 data for each year into the cell array C using your loop:
for i = 1:numberOfYears
C{i} = PM25;
end
Resulting in something like this:
C = { PM25_2005, PM25_2006, PM25_2007 };
Now let's say you want to obtain your variable for the year 2006. This is easy (assuming you aren't skipping years). The first year of your data will correspond to position 1, the second year to position 2, etc. So to find the index of the year you want:
minYear = 2005;
yearDesired = 2006;
index = yearDesired - minYear + 1;
PM25_2006 = C{index};
You can do this using eval, but note that it's often not considered good practice. eval may be a security risk, as it allows user input to be executed as code. A better way to do this may be to use a cell array or an array of objects.
That said, I think this will do what you want:
for year = 2008:2014
eval(sprintf('PM25_%d = permute(reshape(E',[c,r/nlay,nlay]),[2,1,3]);',year));
save('PM25_Daily_US.mat',sprintf('PM25_%d',year),'-append');
end
I do not recommend to set variables like this since there is no way to track these variables and completely prevents all kind of error checking that MATLAB does beforehand. This kind of code is handled completely in runtime.
Anyway in case you have a really good reason for doing this I recommend that you use the function assignin for this.
assignin('caller', ['myvar',num2str(1)], 63);

Resources