Racket: extracting field ids from structures - struct

I want to see if I can map Racket structure fields to columns in a DB.
I've figured out how to extract accessor functions from structures in PLT scheme using the fourth return value of:
However, the returned procedure indexes into the struct using an integer. Is there some way that I can find out what the field names were at the point of definition? Looking at the documentation, it seems like this information is "forgotten" after the structure is defined and exists only via the generated accessor functions: (<id>-<field-id> s).
So I can think of two possible solutions:
1. Search the namespace symbols for ones that start with my struct name (yuk);
2. Define a custom define-struct macro that captures the ordered sequence of field names inside some hash that is keyed by struct name (eek).

I think something along the lines of 2. is the right approach (define-struct has a LOT of knobs and many don't make sense for this) but instead of making a hash, just make your macro expand into functions that manipulate the database directly. And the syntax/struct library can help you do the parsing of the define-struct form.

The answer depends on what you want to do with this information. The thing is that it's not kept in the runtime; it's just like bindings in functions, which do not exist at runtime. But they do exist at the syntax level (= compile-time). For example, this silly snippet shows the value that is kept at the syntax level and that contains the structure shape:
> (define-struct foo (x y))
> (define-syntax x (begin (syntax-local-value #'foo) 1))
> (define-syntax x (begin (printf ">>> ~s\n" (syntax-local-value #'foo)) 1))
>>> #<checked-struct-info>
It's not showing much, of course, but this should be a good start (you can look for struct-info in the docs and in the code). But this might not be what you're looking for, since this information exists only at the syntax level. If you want something that is there at runtime, then perhaps you're better off using alists or hash tables?
UPDATE (I've skimmed too quickly over your question before):
To map a struct into a DB table row, you'll need more things defined: at least the DB and the fields the struct stands for, and possibly an open DB connection to store values into or read values from. So it looks to me like the best way to do that is via a macro anyway: this macro would expand to a use of define-struct together with everything else that you'd need to keep around.


How do you find out the fields and properties of a struct?

The question
Suppose you have a struct, like this:
(struct soldier (name rank serial-number) #:transparent)
(define s (soldier 'Smith 'private 100134))
How can you find out what fields soldier or s contains? Or what generic interfaces it supports, or what structure type properties it has?
Research efforts so far
(Skip this section if you already know the answer.)
I've been reading through the documentation on structs for the last few days, and I haven't been able to figure out how you're supposed to put the pieces together. I'm probably just missing some elementary tidbit of information that goes without saying to people who know Racket.
The chapter on Reflection and Security has a section "Structure Inspectors", which says:
An inspector provides access to structure fields and structure type information without the normal field accessors and mutators.
but I haven't understood how to get an inspector to provide that.
struct-info and struct-type-info provide some information, but not field names, interfaces, properties, etc.:
> (struct-type-info struct:soldier)
'(0 1 2)
struct->vector and struct->list provide access to an instance's contents and the above data, but that's all:
> (struct->vector s)
'#(struct:soldier Smith private 100134)
If you could show me an example of how to inspect a struct type to see what's in it, that would probably clarify whatever soon-to-be-obvious-in-hindsight thing I'm not seeing here.
The field names are not available at run time. However, at expansion time you can use syntax-local-value on the struct name to get some information.
A quick example:
#lang racket
(require (for-syntax racket/struct-info))
(struct foo (a b))
;; syntax-local-value only works at expansion time, so run it in phase 1
(begin-for-syntax
  (display (extract-struct-info (syntax-local-value #'foo))))
In this example:
#lang racket
(require (for-syntax racket/struct-info))
(struct foo (a [b #:mutable] c))
;; syntax-local-value only works at expansion time, so run it in phase 1
(begin-for-syntax
  (display (extract-struct-info (syntax-local-value #'foo))))
The list of identifiers for mutators is: (#f #<syntax:4:8 set-foo-b!> #f).
That is, only the second field is mutable.
The information is available at expansion time, so you can transfer it to runtime by calling a macro that expands into a definition like (define info '(#f set-foo-b! #f)) or similar.

make menhir find all alternatives?

I would like to change the behavior of menhir's output in the following way:
I want it to look up all grammatical alternatives if it finds any, put them in a list, and give me back this ambiguous interpretation. It should not resolve conflicts, just store them.
In the source code of menhir, it seems to me that I have to look in "Engine.ml". The resulting syntactically determined token comes in a variant type item "Accepted v" as a state of a checkpoint of the grammatical automaton. This value is produced earlier by a function "accept env prod", which is part of a bundle of recursive functions that change the states.
Do you have a tip on how I could change these functions to put all the possible results in a list and proceed as if nothing happened? Or do you think that this won't work anyway?
What you are looking for is a GLR parser generator (G is for generalized). Menhir is not such a tool, and I doubt you could modify it easily to do what you want.
However, there is another tool that does exactly what you want: dypgen.

Separating fields out of a string in Hive

I have the following problem...
I work with Hive and want to add a file with several (different) rows of Strings. Those contain fields with a fixed size, like this:
A20130420bcd 34 fgh
where the fields have the length 1,8,6,4,3.
Separated it would look like this:
Is there any way to read the string and split it into fields other than taking a substring for every field, like
substring(col_value,1,1) Field1
I would imagine that cutting off the already-read part of the string would improve performance, but I couldn't think of any way to do this with the given functions.
Secondly, as stated before, there are different types of strings, ordered and identified by the first character. Right now I just check those with the WHERE clause, but it's horrible, as it runs through the whole file just to find the first string. Is there any way to read specific lines by their number? If I know that the first string will be of a certain kind, can I read it directly?
Right now it looks like this:
insert overwrite table TEST
select
substring(col_value,1,1) field1,
substring(col_value,10,3) field5
from temp_data WHERE substring(col_value,1,1) = 'A';
any ideas on this?
I would love to hear some ideas =)
You need to write your own generic UDF parser that outputs a struct, a map, or whatever is appropriate. You can refer to UDFs that output multiple values.
Then you can write:
insert overwrite table output
select parsed.first, parsed.second
from (
select parse(target)
from input
) parsed
where first='X';
As for the second question, you may need to check Hive's "explain" command to see whether Hive does filter push-down for you. (Just see how many MapReduce stages it takes; theoretically it should be a single map stage, depending on 1. the Hive version and 2. the output table format.)
In a general sense, this is why databases are popular: they take optimization into consideration for you.
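To make the idea of "parse once, output all fields" concrete, here is a minimal sketch of the splitting logic such a parser would perform, written in plain Python outside of Hive. It is not a Hive UDF. The function name split_fixed and the padded sample record are made up for illustration; only the widths 1, 8, 6, 4, 3 come from the question.

```python
from itertools import accumulate

# Field widths taken from the question: 1, 8, 6, 4, 3 characters.
WIDTHS = (1, 8, 6, 4, 3)


def split_fixed(record, widths=WIDTHS):
    """Cut a fixed-width record into its fields in a single pass.

    split_fixed is a hypothetical helper, not part of Hive.
    """
    ends = list(accumulate(widths))   # running end offsets: 1, 9, 15, 19, 22
    starts = [0] + ends[:-1]          # matching start offsets: 0, 1, 9, 15, 19
    return [record[s:e] for s, e in zip(starts, ends)]


# Hypothetical 22-character record, padded so every field has its full width.
sample = "A" + "20130420" + "bcd 34" + " fgh" + "xyz"
fields = split_fixed(sample)
print(fields)  # ['A', '20130420', 'bcd 34', ' fgh', 'xyz']
```

The offsets are computed once from the widths, so each record is sliced without repeating hand-written substring positions for every field.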

How to return dynamically created vectors to the workspace?

Hello, I'm trying to write a function which reads a certain type of spreadsheet, creates vectors dynamically from its data, and then returns those vectors to the workspace.
My xls file is structured by rows: in the first cell of a row there is a string which should become the name of the vector, and the rest of the row contains the numbers which make up the vector.
Here is my code:
function [ B ] = read_excel(filename)
%read_excel a function to read time series data from spreadsheet
% I get the contents of the first cell to know what to name the vector
[nr, name]=xlsread(filename, 'sheet1','A2:A2');
% Transform it to a string
name_str = char(name);
% Create a filename from it
varname = genvarname(name_str);
% Get the numbers which will make up the vector
A = xlsread(filename,'B2:CT2');
% Create the vector with the correct name and data
eval([varname '= A;']);
As far as I can tell the vector is created correctly, but I have no idea how to return it to the workspace.
Preferably the solution should be able to return an indeterminate number of vectors, as this is just a prototype and I want the function to return a number of vectors of the user's choice at once.
To be more precise, the vector varname is created and I can use it inside the function; if I add:
it will plot the vector, but for my purposes I need the vector varname to be returned to the workspace to persist after the script is run.
I think you're looking for evalin:
evalin('base', [varname '= B;']);
(which will not work quite right as-is; but please read on)
However, I strongly advise against using it.
Having predictable outcomes from functions is a lot less error-prone, is usually considered good practice, and is in fact very common.
From all sorts of perspectives it is very undesirable to have a function that manipulates data beyond its own scope (i.e., in a workspace other than its own), let alone one that assigns unpredictable data to unpredictable variable names. This is unnecessarily hard to debug and maintain, and is not very portable. Also, when this function is used inside other functions, it does not do what someone who doesn't know your function would think it does.
Why not use something like a structure:
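function B = read_excel(filename)
% Read the name from the first cell, as in the question's code
[~, name] = xlsread(filename, 'sheet1', 'A2:A2');
name_str = char(name);
B.data = xlsread(filename,'B2:CT2');
B.name = genvarname(name_str);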
Then you always have the same name as output (B) which contains the same data (B.data) and whose name you can also use to reference other things dynamically (i.e., A.(B.name)).
Because this is a function, you need to pass the variables you create to an output variable. I suggest you do it through a struct as you don't know how many variables you want to output upfront. So change the eval line to this:
% Create the vector with the correct name and data
eval(['B.' varname '= A;']);
Now you should have a struct called B that persists in the workspace after running the function with field names equal to your dynamically created variable names. Say for example one varname is X, you can now access it in your workspace as B.X.
But you should think very carefully about this code design: dynamically creating variable names is very unlikely to be the best way to go.
An alternative to evalin is the function assignin. It is less powerful than evalin, but does exactly what you want: it assigns a variable in a workspace.
assignin('base', 'var', val)

automapper - simplest option to only write to destination property if the source property is different?

NOTE: The scenario is using 2 entity framework models to sync data between 2 databases, but I'd imagine this is applicable to other scenarios. One could try tackling this on the EF side as well (like in this SO question) but I wanted to see if AutoMapper could handle it out-of-the-box
I'm trying to figure out if AutoMapper can (easily :) compare the source and dest values (when using it to sync to an existing object) and do the copy only if the values are different (based on Equals by default, potentially passing in a Func, like if I decided to do String.Equals with StringComparison.OrdinalIgnoreCase for some particular pair of values). At least for my scenario, I'm fine if it's restricted to just the TSource == TDest case (I'll be syncing over ints, strings, etc., so I don't think I'll need any type converters involved).
Looking through the samples and tests, the closest thing seems to be conditional mapping (src\UnitTests\ConditionalMapping.cs), and I would use the Condition overload that takes the Func (since the other overload isn't sufficient, as we need the dest information too). That certainly looks on the surface like it would work fine (I haven't actually used it yet), but I would end up with specifying this for every member (although I'm guessing I could define a small number of actions/methods and at least reuse them instead of having N different lambdas).
Is this the simplest available route (outside of changing AutoMapper) for getting a 'only copy if source and dest values are different' or is there another way I'm not seeing? If it is the simplest route, has this already been done before elsewhere? It certainly feels like I'm likely reinventing a wheel here. :)
Chuck Norris (formerly known as Omu? :) already answered this, but via comments, so just answering and accepting to repeat what he said.
@James Manning you would have to inherit ConventionInjection, override
the Match method and write there return c.SourceProp.Name ==
c.TargetProp.Name && c.SourceProp.Value != c.TargetProp.Value and
after use it target.InjectFrom(source);
In my particular case, since I had a couple of other needs for it anyway, I just customized the EF4 code generation to include the check for whether the new value is the same as the current value (for scalars), which takes care of the issue with doing a 'conditional' copy. Now I can use AutoMapper or ValueInject or whatever as-is. :)
For anyone interested in the change, when you get the default *.tt file, the simplest way to make this change (at least that I could tell) was to find the 2 lines like:
if (ef.IsKey(primitiveProperty))
and change both to be something like:
if (ef.IsKey(primitiveProperty) || true) // we always want the setter to include checking for the target value already being set
