Is there a way for Gnuplot to read and recognize structured strings? Specifically, I have a few hundred files, all containing measurement data, with measurement conditions defined in the filename.
My files look something like "100d5mK2d0T.txt", which would mean that this data was acquired at 100.5mK temperature and 2.0T magnetic field.
Any chance I could extract the temperature and field strength data from the name, and use them as labels in the plot?
Thanks in advance.
With gnuplot's internal string processing (using substr and strstrt) you could come up with a solution, but that's quite verbose.
It's easier to use an external tool for the string processing, like perl:
filename = '100d5mK2d0T.txt'
str = system('echo "'.filename.'" | perl -pe ''s/(\d+)d(\d+)mK(\d+)d(\d+)T\.txt/$1.$2 $3.$4/'' ')
temperature = word(str, 1)
magnetic_field = word(str, 2)
set label at graph 0.1,0.9 "Temperature: ".temperature." mK"
set label at graph 0.1,0.8 "Magnetic field: ".magnetic_field." T"
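For comparison, the same extraction can be sketched in Python; this is just to illustrate what the regex does (the 'd' in the filename stands in for the decimal point), not something gnuplot itself runs:

```python
import re

def parse_conditions(filename):
    # "100d5mK2d0T.txt" -> temperature "100.5" (mK), field "2.0" (T)
    m = re.match(r'(\d+)d(\d+)mK(\d+)d(\d+)T\.txt$', filename)
    temperature = m.group(1) + '.' + m.group(2)
    field = m.group(3) + '.' + m.group(4)
    return temperature, field

print(parse_conditions('100d5mK2d0T.txt'))  # ('100.5', '2.0')
```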
My data files look like this:
c0e0100.dat 0.234
c0e0200.dat 0.857
...
c0e1200.dat 0.003
I would like to extract the x-values from the 4th to 7th character of the filenames in the first column.
I tried the following user function:
str2num(s)=(s[4:7]+0.0)
and then:
plot 'file' using (str2num($1)):($2)
but this gives:
internal error: substring range operator applied to non-STRING type
I also tried:
plot 'file' using (str2num(stringcolumn($1))):($2)
but got the same result.
Is there a way of doing this in gnuplot without running the data through external tools first?
The expression $1 is a shortcut for column(1), so using $1 already gives you the numerical representation of the respective column. To get the string value, use stringcolumn(1) (without the $!), or strcol(1):
str2num(s)=(s[4:7]+0.0)
plot 'file' using (str2num(strcol(1))):2
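As a sanity check on the indexing: gnuplot's s[4:7] substring range is 1-based and inclusive, so on "c0e0100.dat" it picks out "0100". A small Python sketch of the same computation (Python slices are 0-based and exclusive, hence s[3:7]):

```python
def str2num(s):
    # gnuplot's s[4:7] (1-based, inclusive) corresponds to Python's s[3:7]
    return float(s[3:7])

print(str2num('c0e0100.dat'))  # 100.0
print(str2num('c0e1200.dat'))  # 1200.0
```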
I am currently attempting to parse data that is sent from an outside source serially. An example is as such:
DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_
This data can come in many different lengths, but the first few pieces are all the same. Each "piece" originally comes in with CRLF after it, so I've replaced them with string.gsub(input,"\r\n","|"), which is why my input looks the way it does.
The part I would like to parse is:
4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_
The "4" tells me that there will be four lines total to create this file. I'm using this as a means to set the amount of passes in the loop.
The 7x5 is the font height.
The 1 is the xpos.
The 25 is the ypos.
The variable data (174-24 in this case) is the text at these parameters.
As you can see, it should continue to loop this pattern throughout the input string received. Now the "4" can actually be any variable > 0; with each number equaling a set of four variables to capture.
Here is what I have so far. Please excuse the loop variable, start variable, and print commands; I'm running this function on Linux to try to troubleshoot.
function loop_input(input)
    var = tonumber(string.match(input, "DATA|0|(%d*).*"))
    loop = string.match(input, "DATA|0|")
    start = string.match(input, loop.."(%d*)|.*")
    for obj = 1, var do
        for i = 1, 4 do
            if i == 1 then
                i = "font" -- want the first group to be set to font
            elseif i == 2 then
                i = "xpos" -- want the second group to be set to xpos
            elseif i == 3 then
                i = "ypos" -- want the third group to be set to ypos
            else
                i = "txt" -- want the fourth group to be set to text
            end
            obj = font..xpos..ypos..txt
            --print (i)
        end
        objects = objects..obj -- concatenate newly created obj variables with each pass
    end
end
val = "DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_"
print(loop_input(val))
Ideally, I want to create a loop that, depending on the var variable, will plug in the captured variables between the pipe delimiters, and then I can use them freely as I wish. When trying to troubleshoot with parentheses around my four variables (like I have above), I receive the full list of four variables four times in a row. Now I'm having difficulty actually cycling through the input string and grabbing the values out as the loop moves down the data string. I was thinking that using the pipes as a means to delineate variables from one another would help. Am I wrong? If it doesn't matter and I can keep the "\r\n" instead of each "|" then I am definitely all for that.
I've searched around and found some threads that I thought would help but I'm not sure if tables or splitting the inputs would be advisable. Like these threads:
Setting a variable in a for loop (with temporary variable) Lua
How do I make a dynamic variable name in Lua?
Most efficient way to parse a file in Lua
I'm fairly new to programming and trying to teach myself, so please excuse the beginner question. I have both the "Lua Reference Manual" and "Programming in Lua" books in paperback, which is what I've tried to model my function(s) on. But I'm having a problem making the connection.
I thank you all for any input or guidance you can offer!
Cheers.
Try this:
val = "DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_"
val = val .. "|"
data = val:match("DATA|0|%d+|(.*)$")
for fh,xpos,ypos,text in data:gmatch("(.-)|(.-)|(.-)|(.-)|") do
    print(fh,xpos,ypos,text)
end
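If it helps to see the grouping logic in another language, here is a Python sketch of the same parse (the field names follow the question; note that with this input the fourth record's text is the trailing "_"):

```python
def parse_data(val):
    fields = val.split('|')
    # fields: ['DATA', '0', count, then repeating (font, xpos, ypos, text), ...]
    count = int(fields[2])
    records = []
    for i in range(count):
        font, xpos, ypos, text = fields[3 + 4*i : 7 + 4*i]
        records.append({'font': font, 'xpos': xpos, 'ypos': ypos, 'txt': text})
    return records

val = "DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_"
print(parse_data(val))
```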
I have a variable that is created by a loop. The variable is large enough and in a complicated enough form that I want to save the variable each time it comes out of the loop with a different name.
PM25 is my variable, but I want to save it as PM25_year, in which the year changes based on str = fname(13:end):
PM25 = permute(reshape(E',[c,r/nlay,nlay]),[2,1,3]); % Reshape and permute to achieve the right shape. Each face of the 3D should be one day
str = fname(13:end); % The year
% Third dimension is organized so that the data for each site is on a face
save('PM25_str', 'PM25_Daily_US.mat', '-append')
The str would be a year, like 2008. So the variable saved would be PM25_2008, then PM25_2009, etc. as it is created.
Defining new variables based on data isn't considered best practice, but you can store your data more efficiently using a cell array. You can store even a large, complicated variable like your PM25 variable within a single cell. Here's how you could go about doing it:
Place your PM25 data for each year into the cell array C using your loop:
for i = 1:numberOfYears
    C{i} = PM25;
end
Resulting in something like this:
C = { PM25_2005, PM25_2006, PM25_2007 };
Now let's say you want to obtain your variable for the year 2006. This is easy (assuming you aren't skipping years). The first year of your data will correspond to position 1, the second year to position 2, etc. So to find the index of the year you want:
minYear = 2005;
yearDesired = 2006;
index = yearDesired - minYear + 1;
PM25_2006 = C{index};
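If it helps to see the pattern in another language, the same "key the data by year" idea looks like this in Python (a sketch only; load_pm25 is a hypothetical stand-in for the reshape/permute step in the question):

```python
def load_pm25(year):
    # Hypothetical stand-in for the real per-year data array.
    return [year] * 3

# One container keyed by year, instead of inventing a variable name per year.
pm25_by_year = {year: load_pm25(year) for year in range(2008, 2015)}

print(pm25_by_year[2008])  # [2008, 2008, 2008]
```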
You can do this using eval, but note that it's often not considered good practice. eval may be a security risk, as it allows user input to be executed as code. A better way to do this may be to use a cell array or an array of objects.
That said, I think this will do what you want:
for year = 2008:2014
    eval(sprintf('PM25_%d = permute(reshape(E'',[c,r/nlay,nlay]),[2,1,3]);',year));
    save('PM25_Daily_US.mat',sprintf('PM25_%d',year),'-append');
end
I do not recommend setting variables like this, since there is no way to track them and it completely prevents all the error checking that MATLAB does beforehand; this kind of code is handled entirely at runtime.
Anyway, in case you have a really good reason for doing this, I recommend the function assignin:
assignin('caller', ['myvar',num2str(1)], 63);
I have CSV data (inherited - no choice here) which I need to use to create data type instances in Haskell. Parsing CSV is no problem - tutorials and APIs abound.
Here's what 'show' generates for my simplified trimmed-down test-case:
JField {fname = "cardNo", ftype = "str"} (string representation)
I am able to do a read to convert this string into a JField data record. My CSV data is just the values of the fields, so the CSV row corresponding to JField above is:
cardNo, str
and I am reading these in as a list of strings: ["cardNo", "str"]
So it's easy enough to brute-force the exact format of the string representation above (but writing Java- or Python-style string formatting in Haskell isn't my goal here).
I thought of doing something like this (the first list is static, and the second list would be read from the CSV file):
let stp1 = zip ["fname = ", "ftype ="] ["cardNo", "str"]
resulting in
[("fname = ","cardNo"),("ftype =","str")]
and then concatenating the tuples - either explicitly with ++ or in some more clever way yet to be determined.
This is my first simple piece of code outside of tutorials, so I'd like to know if this seems a reasonably Haskellian way of doing this, or what clearly better ways there are to build just this piece:
fname = "cardNo", ftype = "str"
Not expecting solutions (this is not homework, it's a learning exercise), but rather critique or guidelines for better ways to do this. Brute-forcing it would be easy but would defeat my objective, which is to learn.
I might be way off, but wouldn't a map be better here? I guess I'm assuming that you read the file in with each row as a [String] i.e.
field11, field12
field21, field22
etc.
You could write
map (\[x,y] -> JField {fname = x, ftype = y}) rows
where rows is your input (note that data is a reserved word in Haskell, so it can't be used as a variable name). I think that would do it.
If you already have the value of the fname field (say, in the variable fn) and the value of the ftype field (in ft), just do JField {fname=fn, ftype=ft}. For non-String fields, just insert a read where appropriate.
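As a cross-language sanity check, the mapping suggested above corresponds to this Python sketch using a namedtuple in place of the Haskell record (the second row, 'expiry'/'date', is made up for illustration):

```python
from collections import namedtuple

# Analogue of: data JField = JField { fname :: String, ftype :: String }
JField = namedtuple('JField', ['fname', 'ftype'])

# Parsed CSV rows; the second row is hypothetical example data.
rows = [['cardNo', 'str'], ['expiry', 'date']]

fields = [JField(fname=x, ftype=y) for x, y in rows]
print(fields[0])  # JField(fname='cardNo', ftype='str')
```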
I'm sure this is an issue anyone who uses Stata for publications or reports has run into:
How do you conveniently export your output to something that can be parsed by a scripting language or Excel?
There are a few ado files that do this for specific commands. For example:
findit tabout
findit outreg2
But what about exporting the output of the table command? Or the results of an anova?
I would love to hear about how Stata users address this problem for either specific commands or in general.
After experimenting with this for a while, I've found a solution that works for me.
There are a variety of ADOs that handle exporting specific functions. I've made use of outreg2 for regressions and tabout for summary statistics.
For simpler commands, it's easy to write your own programs to save results automatically to plaintext in a standard format. Here are a few I wrote. Note that these both display results (to be saved to a log file) and export them into text files; if you wanted to just save to text, you could get rid of the di's and qui the sum, tab, etc. commands:
cap program drop sumout
program define sumout
    di ""
    di ""
    di "Summary of `1'"
    di ""
    sum `1', d
    qui matrix X = (r(mean), r(sd), r(p50), r(min), r(max))
    qui matrix colnames X = mean sd median min max
    qui mat2txt, matrix(X) saving("`2'") replace
end

cap program drop tab2_chi_out
program define tab2_chi_out
    di ""
    di ""
    di "Tabulation of `1' and `2'"
    di ""
    tab `1' `2', chi2
    qui matrix X = (r(p), r(chi2))
    qui matrix colnames X = chi2p chi2
    qui mat2txt, matrix(X) saving("`3'") replace
end

cap program drop oneway_out
program define oneway_out
    di ""
    di ""
    di "Oneway anova with dv = `1' and iv = `2'"
    di ""
    oneway `1' `2'
    qui matrix X = (r(F), r(df_r), r(df_m), Ftail(r(df_m), r(df_r), r(F)))
    qui matrix colnames X = anova_between_groups_F within_groups_df between_groups_df P
    qui mat2txt, matrix(X) saving("`3'") replace
end

cap program drop anova_out
program define anova_out
    di ""
    di ""
    di "Anova command: anova `1'"
    di ""
    anova `1'
    qui matrix X = (e(F), e(df_r), e(df_m), Ftail(e(df_m), e(df_r), e(F)), e(r2_a))
    qui matrix colnames X = anova_between_groups_F within_groups_df between_groups_df P RsquaredAdj
    qui mat2txt, matrix(X) saving("`2'") replace
end
The question is then how to get the output into Excel and format it. I found that the best way to import the text output files from Stata into Excel is to concatenate them into one big text file and then import that single file using the Import Text File... feature in Excel.
I concatenate the files by placing this Ruby code in the output folder and then running it from my Do file with qui shell cd path/to/output/folder/ && ruby table.rb:
output = ""
Dir.new(".").entries.each do |file|
  next if file =~ /\A\./ || file == "table.rb" || file == "out.txt"
  if file =~ /.*xml/
    system "rm #{file}"
    next
  end
  contents = File.open(file, "rb").read
  output << "\n\n#{file}\n\n" << contents
end
File.open("out.txt", 'w') {|f| f.write(output)}
Once I import out.txt into its own sheet in Excel, I use a bunch of Excel's built-in functions to pull the data together into nice, pretty tables.
I use a combination of vlookup, offset, match, iferror, and hidden columns with cell numbers and filenames to do this. The source .txt file is included in out.txt just above the contents of that file, which lets you look up the contents of the file using these functions and then reference specific cells using vlookup and offset.
This Excel business is actually the most complicated part of this system and there's really no good way to explain it without showing you the file, though hopefully you can get enough of an idea to figure it out for yourself. If not, feel free to contact me through http://maxmasnick.com and I can get you more info.
I have found that the estout package is the most developed and has good documentation.
This is an old question and a lot has happened since it was posted.
Stata now has several built-in commands and functions that allow anyone to export customized output fairly easily:
putexcel
putexcel with advanced syntax
putdocx
putpdf
There are also equivalent Mata functions / classes, which offer greater flexibility:
_docx*()
Pdf*()
xl()
From my experience, there aren't 100% general solutions. Community-contributed commands such as estout are now mature enough to handle most basic operations. That said, if you have something that deviates even slightly from the template you will have to program this yourself.
Most tutorials throw in several packages, where it would indeed be nice to have only one that exports everything, which is what Max suggests above with his interesting method.
I personally use tabout for summary statistics and frequencies, estout for regression output, and am trying out mkcorr for correlation matrices.
It's been a while, but I believe you can issue a log command to capture the output.
log using c:\data\anova_analysis.log, text
[commands]
log close
I use estpost -- part of the estout package -- to tabulate results from non-estimation commands. You can then store them and export easily.
Here's an example:
estpost corr varA varB varC varD, matrix
est store corrs
esttab corrs using corrs.rtf, replace
You can then add options to change formatting, etc.
You can use asdoc, which is available on SSC. To download it:
ssc install asdoc
asdoc works well with almost all Stata commands. Specifically, it produces publication-quality tables for:
summarize command - to report summary statistics
cor or pwcorr command - to report correlations
tabstat - for flexible tables of descriptive statistics
tabulate - for one-way, two-way, three-way tabulations
regress - for detailed, nested, and wide regression tables
table - flexible tables
and many more. You can explore more about asdoc here:
https://fintechprofessor.com/2018/01/31/asdoc/