How to decide what to name a stream in Rails Event Store - rails-event-store

This is a bit of a newb question but I struggle with how to decide what a stream should be called and I can't seem to find documentation on how to go about it.
For instance, a lot of the examples in the documentation just have "Customer$1", "Customer$2", etc. I'm presuming Customer is the resource here and the integer is the customer ID?
For my use case, I have UnitAllocated and UnitDeallocated events happening for multiple different Operators. Ideally, I want to have a stream of unit allocations and unit deallocations for a given operator.
Should I embed the operator ID in the stream name as an easy way to tap into multiple stream for the same operator? Or is this the wrong way to go about this?

Related

How can I get all the exercises for a topic (e.g., math) and all its subtopics from the khanacademy api?

Khan Academy's API Explorer has an exercises section that mentions filtering by tags, but the url with math tag applied returns nothing.
The generic exercise objects don't contain the topic they're in. My guess is that there's an id to join on somewhere in the topictree/exercises json objects, but I don't know an efficient way to find it.
Here are the raw exercises json and raw topictree json (note, the second one is huge, and contains many topics other than math).
I don't think there is a nice way to return exercises from just a subtree of the topictree (e.g. just math). Tags are a different concept, and there isn't a tag common to everything in math. Probably your best bet is to load the full topictree with just Exercises (and Topics) and work from there:
http://www.khanacademy.org/api/v1/topictree?kind=Exercise
If you need to reference this structure repeatedly, it probably makes sense to download and filter it ahead of time, and maybe re-fetch it from time to time to account for changes to Khan Academy content. But it depends on your exact use case.
Generally, any content item can be referenced by content_id (sometimes just called id) or by slug, but unfortunately, the naming and usage aren't consistent everywhere.
You can use the following to get all the exercises:
http://www.khanacademy.org/api/v1/exercises
http://www.khanacademy.org/api/v1/topictree?kind=Exercise
I'm not sure what's the difference between these two - I don't use them.
I prefer to fetch the data for the individual topic nodes as follows:
http://www.khanacademy.org/api/v1/topic/%s
http://www.khanacademy.org/api/v1/topic/%s/exercises
http://www.khanacademy.org/api/v1/topic/%s/videos
where %s is the "node_slug" property for each topic. The root of the tree is just "root". The first one will give you the topic details and a list of sub-items in the "child_data" array. Use the "id" properties of each sub-topic in this array to look up its details in the "children" array having "internal_id" equal to "id". There you get the "node_slug" to for the next API call for that sub-topic. The "child_data" array has all the sub-items in the order that they appear on the website when you're working with the missions.
I cache these responses so that I don't have to download everything every time.

What area of machine learning should I look into to automatically extract certain info from messages

I have an app that extracts information from incoming messages. The messages all contain the same information, but they have different forms depending on the source that sent them.
Example:
Message from source A :
A: You spent $50.00 at Macy's on 2/20/12
Message from source B :
Purchase, $50.00, Macy's, 2Feb2012, Balance $5000.00
Every message from a single source has the same form though. So at the moment, I'm doing it by writing a set of regular expressions to first identify which message I'm trying to decode (i.e. what source it came from so I know what the form of the message is), and then extracting the necessary information from the message (in the above example, I want to know the transaction amount, the store where the transaction happened, and the date). If I discover a new source for a message, or a source changes the format of their message (doesn't happen very often, but could happen), I need to manually write the regular expressions for that message. I'm sure however that I could automate this using some kind of machine learning technique. I just don't know much about machine learning, and I don't know where to even start looking for a technique that would apply to my problem. I would like someone to just point me in the right direction on where to start reading.
In order to detect and label amounts, dates, person names and similar information you can use a technique called Named Entity Recognition. The Stanford Named Entity Recognizer comes with pretrained, ready to use models.
You also use whatever labeled data you have generated so far to learn a custom model for your application. The standard techniques used for this purpose are Conditional Random Fields or Sequence Perceptron. There are many toolkits implementing these models, including:
Wapiti - A simple and fast discriminative sequence labelling toolkit.
Sequor - sequence labeler based on Collins's (2002) perceptron.

Open source projects for email scrubbing generating structured data from unstructured source?

Don't know where to start on this one so hopefully you guys can clear up my question. I have project where email will be searched for specific words/patterns and stored in a structured manner. Something that is done with Trip it.
The article states that they developed a DataMapper
The DataMapper is responsible for taking inbound email messages
addressed to plans [at] tripit.com and transforming them from the
semi-structured format you see in your mail reader into a highly
structured XML document.
There is a comment that also states
If you're looking to build this yourself, reading a little bit about
Wrappers and Wrapper Induction might be helpful
I Googled and read about wrapper induction but it was just too broad of a definition and didn't help me understand how one would go about solving such problem.
Is there some open source project out there that does similar things?
There are a couple of different ways and things you can do to accomplish this.
The first part, which involves getting access to the email content I'll not answer here. Basically, I'll assume that you have access to the text of emails, and if you don't there are some libraries that allow you to connect java to an email box like camel (http://camel.apache.org/mail.html).
So now you've got the email so then what?
A handy thing that could help is that lingpipe (http://alias-i.com/lingpipe/) has an entity recognizer that you can populate with your own terms. Specifically, look at some of their extraction tutorials and their dictionary extractor (http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html) So inside of the lingpipe dictionary extractor (http://alias-i.com/lingpipe/docs/api/com/aliasi/dict/ExactDictionaryChunker.html) you'd simply import the terms you're interested in and use that to associate labels with an email.
You might also find the following question helpful: Dictionary-Based Named Entity Recognition with zero edit distance: LingPipe, Lucene or what?
Really a very broad question, but I can try to give you some general ideas, which might be enough to get started. Basically, it sounds like you're talking about an elaborate parsing problem - scanning through the text and looking to apply meaning to specific chunks. Depending on what exactly you're looking for, you might get some good mileage out of a few regular expressions to start - things like phone numbers, email addresses, and dates have fairly standard structures that should be matchable. Other data points might benefit from some indicator words - the phrase "departing from" might indicate that what follows is an address. The natural language processing community also has a large tool set available for text processing - check out things like parts of speech taggers and semantic analyzers if they're appropriate to what you're trying to do.
Armed with those techniques, you can follow a basic iterative development process: For each data point in your expected output structure, define some simple rules for how to capture it. Then, run the application over a batch of test data and see which samples didn't capture that datum. Look at the samples and revise your rules to catch those samples. Repeat until the extractor reaches an acceptable level of accuracy.
Depending on the specifics of your problem, there may be machine learning techniques that can automate much of that process for you.

Freebase: Format search result to list all properties of object of unknown type(s)

I'm trying to write a MQL query to format a search result in freebase (the "output" parameter in the search API). I essentially want to find the (simple) values of all the properties of a given search result (without knowing anything about the types of the result a priori). By "simple", I mean only the default properties if the values are complex objects.
E.g., if I search for "Yo La Tengo" and this takes me to the result for "/en/yo_la_tengo", I want to be able to get the group's members (I just need names, not instruments or dates started), albums (again, just names), films contributed to (again, just names), etc.
Is there a simple way to do this with a search output query, given that I know nothing about the types? I imagine there's some sort of reflection magic I can use, and I've tried mucking about with "/type/reflect", but I'm not getting anywhere. I'm brand-new to MQL (though I have extensive SQL experience), so this is a little daunting. Any ideas?
Edit: So to clarify, I think the problem I'm seeing is due to mediator types like "performance" (an actor in a film) or "marriage". E.g., with a query about Yo La Tengo, I can see most (all?) information that I'm interested in, but a similar query about [The Muppet Movie]( freebase.com/api/service/search?limit=1&mql_output=%5B%7B%22%2Ftype%2Freflect%2Fany_reverse%22%3A%5B%7B%7D%5D%2C%22%2Ftype%2Freflect%2Fany_master%22%3A%5B%7B%7D%5D%2C%22%2Ftype%2Freflect%2Fany_value%22%3A%5B%7B%7D%5D%7D%5D&query=The%20Muppet%20Movie -- sorry, SO thinks I'm a spammer so I can't make this a link), I don't see Frank Oz reference at all (probably because his performance is referenced instead). Is there a generic way for me to "follow" mediator types to get all their properties? E.g., is there a single output MQL that would allow me to get the actor in a performance (when linked form a film search result) and give the the spouse in a marriage (when linked from a person)?
Querying not only every property, but then following those properties another ply deep in the graph for all search results is going to be an incredibly expensive operation. What is the use case for this? Do you really have a UI where the user can see and effectively absorb all this information? To answer the question directly though, it's not possible to unpack mediator types automatically using mql_output on the search API.
I'd suggest combining a basic set of information on the search query with a deeper set of information on a topic that the user has expressed interest (e.g. by hovering over). This UI experience would be similar to that of Freebase Suggest.
In the years since the question was originally asked there have been some additional useful things added such as the "notable" pseudo-property which lets you see what the topic is notable for.
Of course everyone also needs to be moving to the new API, so the queries would be:
https://www.googleapis.com/freebase/v1/search?query=%22the%20muppet%20movie%22&limit=1&indent=true
https://www.googleapis.com/freebase/v1/topic/en/the_muppet_movie
AFAIK there is no way to do this in outright MQL, but you can:
Get all the properties of an object or type of object, then
Programmatically construct another MQL query to get those objects you want to know more about.
Look at this example:
[{
"type|=": [
"/film/actor",
"/tv/tv_actor",
"/celebrities/celebrity"
],
"*": [{}]
}]​
It grabs all the properties of all objects that have the type actor, tv_actor, or celebrity. When you run it, you'll see all the possible "follow" points you can explore.
This is not exactly what you want, but it should get you closer.

DDD - Identifying entities, roots and services

I want to apologize for my bad english first. I am trying to implement DDD in my project, but I have some problems I will try to describe. I am developing a chat application. Front-end is quite simple, no rooms, just one window showing last n messages.
Here comes the first problem. I don't see any entity here (ChatRoom would be ok, but I have only one room). Message seems like value object to me. So I don't know how to save state of chat (I think that having repositories for value objects is a bad practice).
This chat should have one specific feature, it should group and store related messages together (create something like ChatSegment). These segments will be divided by blocks of m off-topic messages (off-topic messages mean messages unrelated to messages in current segment).
I cannot imagine the way to make this work, without using stateful service. This behavior does not fit into any Entity (not even into the hypothetical ChatRoom entity). Segmenter entity does not seem right either. How would you solve this problem?
Maybe my thoughts are totally incorrect, but I would need to make them clear.
Thank you,
Brano.
In an IM application I'd (OTTOMH) treat Meassage as an Entity with a unique Id.
Combining DDD with DNC could be very powerful here if you treat your Message type as a «moment» archetype, and perhaps also use «role» archetypes (e.g. Chatter) in order for the users of the system to take on more than one «role» of needed.
Furthermore; the concept of "next/prior moment (i.e. message)" seems to be a perfect match.

Resources