Naming variables is quite important, and as a non-native English speaker I wonder what the difference is between using element, item, and entry to name things within data structures or as variables/parameters.
Let us start with the plain English meaning of each of these:
Element: a part or aspect of something abstract, especially one that is essential or characteristic.
Thus, elements can be thought of as logically connected atomic parts of a whole, e.g. elements (nodes) of a tree, or elements of HTML code (opening tag, inner HTML content, and closing tag).
Item: an individual article or unit, especially one that is part of a list, collection, or set.
I prefer this when the things are logically independent, like items in a shopping cart, items in a bag, etc.
Entry: an item written or printed in a diary, list, ledger, or reference book.
I usually use this for tables, like a hash table, accounts (a transaction entry), or records (recording entries in sales, etc.).
Now we can't refer to the items in a bag (considered as an object in the object-oriented paradigm) as entries or elements (probably not elements, because the items are not constituents of the bag itself).
However, in some cases, like an array, we can use element, item, or entry interchangeably too :)
Had to think on this for a few minutes, interesting :)
Note I'm not a native English speaker either so my opinions are just that, opinions.
I use 'element' for things that have some connection with each other, like nodes in a graph or tree. I use 'item' for individual elements in a list (i.e. that don't necessarily have a connection to each other). I don't use 'entry' because I don't like it in this context, but it's just a matter of preference.
Since I'm primarily a C# dev, this is apparent in .Net's naming too: a List<T> has Items, but WPF building blocks in XAML, or XML tags, are Elements (and many more similar examples); that's probably at least part of the reason why I formed this habit.
I don't think there would be anything very wrong with switching things around though; it would certainly be understandable enough from my point of view.
I'm taking an introductory graphics course, and while I intuitively understand that converting a click or touch into object coordinates will make the math much cleaner, reduce the chances for human error, and potentially make debugging easier, none of these are actually a very good explanation, conceptually, of why object coordinate spaces are used in selection tests, as opposed to simply using world coordinates for the test - rather, they're just observations of what tends to happen when object coordinates are used. So I ask: why?
A selection test involves comparing the click coordinates, which you get in window coordinates, against lots and lots of object features, which are represented in object coordinates.
You need to transform them into the same coordinate system in order to do the checks, so you can EITHER transform the one simple click point OR you can transform all the various object features.
Transforming one point or line is just a lot easier than transforming a whole bunch of object features of various types.
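To make that concrete, here is a minimal sketch (my own illustration, not from the answer above) of testing a click against one object by transforming the single point into that object's space. It assumes the click has already been unprojected from window to world coordinates and that each object stores a 4x4 model matrix plus a local-space bounding box:

import numpy as np

def point_in_local_aabb(click_world, model_matrix, aabb_min, aabb_max):
    # One inverse transform of a single point...
    inv = np.linalg.inv(model_matrix)
    p = inv @ np.append(click_world, 1.0)   # homogeneous coordinates
    p = p[:3] / p[3]
    # ...instead of transforming every feature of the object into world space.
    return bool(np.all(p >= aabb_min) and np.all(p <= aabb_max))

model = np.eye(4)
model[:3, 3] = [5.0, 0.0, 0.0]              # object translated to x = 5
print(point_in_local_aabb(np.array([5.2, 0.1, 0.0]), model,
                          np.array([-1.0, -1.0, -1.0]),
                          np.array([1.0, 1.0, 1.0])))   # True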
There are cases where the location of a specific object or point may not be known within a world coordinate system, but is known relative to some other coordinate system.
To summarize an example from my course text, consider the idea of two different towns, one using a grid system for its layout, and the other using what I can only describe as the New England we-made-cow-trails-into-roads method. A government employee is tasked with creating a layout of the area which includes them, and in doing so has to convert the two coordinate systems into a third, which encompasses the other two.
Sometimes, using a world atlas just isn't practical to get across the street, and so something much more local (and relevant) is used instead, as it provides much more detail over a much smaller area.
The text also explains that it may be more than simply impractical to use a given coordinate system - it may yield results that are improbable or just plain wrong. This is evidenced in the evolution of the geocentric and heliocentric models of the universe - the distance of the stars from us was calculated with very different results using the two models.
Thinking of my own example, the best that comes to mind would be something like your own internal organs - from the outside, you don't know for sure exactly the shape, size, and structure of each of them, but your own body does. In order to be able to access that information, you need to look inside the body (ideally in a way that doesn't kill you). It's not something that is plainly observable from outside.
Khan Academy's API Explorer has an exercises section that mentions filtering by tags, but the URL with the math tag applied returns nothing.
The generic exercise objects don't contain the topic they're in. My guess is that there's an id to join on somewhere in the topictree/exercises json objects, but I don't know an efficient way to find it.
Here are the raw exercises json and raw topictree json (note, the second one is huge, and contains many topics other than math).
I don't think there is a nice way to return exercises from just a subtree of the topictree (e.g. just math). Tags are a different concept, and there isn't a tag common to everything in math. Probably your best bet is to load the full topictree with just Exercises (and Topics) and work from there:
http://www.khanacademy.org/api/v1/topictree?kind=Exercise
If you need to reference this structure repeatedly, it probably makes sense to download and filter it ahead of time, and maybe re-fetch it from time to time to account for changes to Khan Academy content. But it depends on your exact use case.
Generally, any content item can be referenced by content_id (sometimes just called id) or by slug, but unfortunately, the naming and usage aren't consistent everywhere.
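As an illustration (a sketch only; the endpoint and the field names "slug", "children", and "kind" are based on the description in this thread and may have changed), you could pull the Exercise-only topictree once and collect everything under a given subtree such as math:

import requests

TOPICTREE_URL = "http://www.khanacademy.org/api/v1/topictree?kind=Exercise"

def exercises_under(node, target_slug, inside=False):
    # Mark when we enter the target topic, then collect exercises below it.
    inside = inside or node.get("slug") == target_slug
    if node.get("kind") == "Exercise" and inside:
        yield node
    for child in node.get("children", []):
        yield from exercises_under(child, target_slug, inside)

tree = requests.get(TOPICTREE_URL).json()
math_exercises = list(exercises_under(tree, "math"))
print(len(math_exercises))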
You can use the following to get all the exercises:
http://www.khanacademy.org/api/v1/exercises
http://www.khanacademy.org/api/v1/topictree?kind=Exercise
I'm not sure what the difference is between these two - I don't use them.
I prefer to fetch the data for the individual topic nodes as follows:
http://www.khanacademy.org/api/v1/topic/%s
http://www.khanacademy.org/api/v1/topic/%s/exercises
http://www.khanacademy.org/api/v1/topic/%s/videos
where %s is the "node_slug" property for each topic. The root of the tree is just "root". The first one will give you the topic details and a list of sub-items in the "child_data" array. Use the "id" property of each sub-topic in this array to look up its details in the "children" array, where "internal_id" equals that "id". There you get the "node_slug" to use for the next API call for that sub-topic. The "child_data" array has all the sub-items in the order that they appear on the website when you're working with the missions.
I cache these responses so that I don't have to download everything every time.
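For what it's worth, a rough sketch of that traversal with a simple on-disk cache might look like this (the field names "node_slug", "child_data", "children", and "internal_id" come from the description above; everything else is an assumption):

import json, os, requests

TOPIC_URL = "http://www.khanacademy.org/api/v1/topic/%s"
CACHE_DIR = "ka_cache"

def get_topic(slug):
    # Fetch a topic node, caching the JSON response on disk.
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, slug + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = requests.get(TOPIC_URL % slug).json()
    with open(path, "w") as f:
        json.dump(data, f)
    return data

def child_slugs(topic):
    # Join child_data ids against the children array to recover node_slugs, in order.
    by_id = {c["internal_id"]: c for c in topic.get("children", [])}
    return [by_id[cd["id"]]["node_slug"]
            for cd in topic.get("child_data", [])
            if cd["id"] in by_id]

root = get_topic("root")
for slug in child_slugs(root):
    print(slug)      # then call get_topic(slug) to descend further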
I'm new to natural language processing, so I apologize if my question is unclear. I have read a book or two on the subject and done general research of various libraries to figure out how I should be doing this, but I'm not confident yet that I know what to do.
I'm playing with an idea for an application, and part of it is trying to find product mentions in unstructured text (e.g. tweets, Facebook posts, emails, websites, etc.) in real time. I won't go into what the products are, but it can be assumed that they are known (stored in a file or database). Some examples:
"starting tomorrow, we have 5 boxes of #hersheys snickers available for $5 each - limit 1 pp" (snickers is the product from the hershey company [mentioned as "#hersheys"])
"Big news: 12-oz. bottles of Coke and Pepsi on sale starting Fri." (coca-cola is the product [aliased as "coke"] from coca-cola company and Pepsi is the product from the PepsiCo company)
"#OMG, i just bought my dream car. a mustang!!!!" (mustang is the product from Ford)
So basically, given a piece of text, query the text to see if it mentions a product and receive some indication (boolean or confidence number) that it does mention the product.
Some concerns I have are:
Missing products because of misspellings. I thought maybe I could use a string similarity check to catch these.
Product names that are also English words or names of other things would get caught, like Mustang the horse versus Mustang the car.
Needing to keep a list of alternative names for products (e.g. "coke" for "coca-cola", etc.)
I don't really know where to start with this, but any help would be appreciated. I've already looked at NLTK and SciKit and didn't really glean how to do this from them. If you know of examples or papers that explain this, links would be helpful. I'm not tied to any language at this point; Java preferably, but Python and Scala are acceptable.
The answer that you chose doesn't really answer your question.
The best approach you can take is using a Named Entity Recognizer (NER) and a POS tagger (grab NNP/NNPS; proper nouns). The database there might be missing some new brands like Lyft (Uber's rival), but without developing your own proprietary database, the Stanford tagger will solve half of your immediate needs.
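As a rough illustration of that approach (using NLTK's built-in tagger rather than the Stanford tools, and only as a sketch), pulling out proper nouns as candidate product mentions could look like this:

import nltk

# One-time downloads on first run:
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

def candidate_products(text):
    # Keep tokens tagged as proper nouns (NNP/NNPS).
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    return [word for word, tag in tagged if tag in ("NNP", "NNPS")]

print(candidate_products("Big news: 12-oz. bottles of Coke and Pepsi on sale starting Fri."))
# e.g. ['Coke', 'Pepsi', 'Fri'] -- still needs filtering against your product list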
If you have time, I would build a dictionary that has every brand name and simply extract matches from the tweet strings.
http://www.namedevelopment.com/brand-names.html
If you know how to crawl, it's not a hard problem to solve.
It looks like your goal is to classify linguistic forms in a given text as references to semantic entities (which can be referred to by many different linguistic forms). You describe a number of subtasks which should be done in order to get good results, but they nevertheless are still independent tasks.
Misspellings
In order to deal with potential misspellings of words, you need to associate these possible misspellings to their canonical (i.e. correct) form.
Phonetic similarity: A common reason for "misspellings" is opacity in the relationship between a word's phonetic form (i.e. how it sounds) and its orthographic form (i.e. how it's spelled). Therefore, a good way to address this is to index terms phonetically so that e.g. innovashun is associated with innovation.
Form similarity: Additionally, you could do a string similarity check, but you may introduce a lot of noise into your results which you would have to address because many distinct words are in fact very similar (e.g. chic vs. chick). You could make this a bit smarter by first morphologically analyzing the word and then using a tree kernel instead.
Hand-made mappings: You can also simply make a list of common misspelling → canonical_form mappings. This would work well for "exceptions" not handled by the above methods (a small sketch combining this with the string-similarity check follows below).
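To illustrate the form-similarity and hand-made-mapping ideas, here is a minimal sketch using only the Python standard library (the cutoff value and the example entries are purely illustrative):

import difflib

CANONICAL = ["coca-cola", "pepsi", "snickers", "mustang", "innovation"]
HAND_MADE = {"coke": "coca-cola", "innovashun": "innovation"}   # known exceptions

def normalize(word):
    # Map a possibly misspelled or aliased token to a canonical form, if any.
    w = word.lower()
    if w in HAND_MADE:                                  # hand-made mappings first
        return HAND_MADE[w]
    matches = difflib.get_close_matches(w, CANONICAL, n=1, cutoff=0.8)
    return matches[0] if matches else None              # fuzzy fallback (can be noisy)

print(normalize("Snikers"))   # 'snickers'
print(normalize("coke"))      # 'coca-cola'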
Word-sense disambiguation
Mustang the car and Mustang the horse are the same form but refer to entirely different entities (or rather classes of entities, if you want to be pedantic). In fact, we ourselves as humans can't tell which one is meant unless we also know the word's context. One widely-used way of modelling this context is distributional lexical semantics: Defining a word's semantic similarity to another as the similarity of their lexical contexts, i.e. the words preceding and succeeding them in text.
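A deliberately oversimplified sketch of that idea: compare a mention's surrounding words against small seed context sets for each sense. In a real system these contexts would be learned from a corpus (e.g. as word vectors); the seed words below are illustrative only:

CONTEXTS = {
    "mustang_car":   {"ford", "drive", "bought", "engine", "dealership", "car"},
    "mustang_horse": {"wild", "herd", "gallop", "ranch", "horse", "saddle"},
}

def disambiguate(text):
    # Pick the sense whose seed context overlaps most with the words in the text.
    words = {w.strip(".,!#") for w in text.lower().split()}
    scores = {sense: len(words & ctx) for sense, ctx in CONTEXTS.items()}
    return max(scores, key=scores.get), scores

print(disambiguate("#OMG, i just bought my dream car. a mustang!!!!"))
# ('mustang_car', {'mustang_car': 2, 'mustang_horse': 0})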
Linguistic aliases (synonyms)
As stated above, any given semantic entity can be referred to in a number of different ways: bathroom, washroom, restroom, toilet, water closet, WC, loo, little boys'/girls' room, throne room etc. For simple meanings referring to generic entities like this, they can often be considered to be variant spellings in the same way that "common misspellings" are and can be mapped to a "canonical" form with a list. For ambiguous references such as throne room, other metrics (such as lexical-distributional methods) can also be included in order to disambiguate the meaning, so that you don't relate e.g. I'm in the throne room just now! to The throne room of the Buckingham Palace is beautiful.
Conclusion
You have a lot of work to do in order to get where you want to go, but it's all interesting stuff and there are already good libraries available for doing most of these tasks.
I'm trying to write a MQL query to format a search result in freebase (the "output" parameter in the search API). I essentially want to find the (simple) values of all the properties of a given search result (without knowing anything about the types of the result a priori). By "simple", I mean only the default properties if the values are complex objects.
E.g., if I search for "Yo La Tengo" and this takes me to the result for "/en/yo_la_tengo", I want to be able to get the group's members (I just need names, not instruments or dates started), albums (again, just names), films contributed to (again, just names), etc.
Is there a simple way to do this with a search output query, given that I know nothing about the types? I imagine there's some sort of reflection magic I can use, and I've tried mucking about with "/type/reflect", but I'm not getting anywhere. I'm brand-new to MQL (though I have extensive SQL experience), so this is a little daunting. Any ideas?
Edit: So to clarify, I think the problem I'm seeing is due to mediator types like "performance" (an actor in a film) or "marriage". E.g., with a query about Yo La Tengo, I can see most (all?) of the information that I'm interested in, but with a similar query about [The Muppet Movie]( freebase.com/api/service/search?limit=1&mql_output=%5B%7B%22%2Ftype%2Freflect%2Fany_reverse%22%3A%5B%7B%7D%5D%2C%22%2Ftype%2Freflect%2Fany_master%22%3A%5B%7B%7D%5D%2C%22%2Ftype%2Freflect%2Fany_value%22%3A%5B%7B%7D%5D%7D%5D&query=The%20Muppet%20Movie -- sorry, SO thinks I'm a spammer so I can't make this a link), I don't see Frank Oz referenced at all (probably because his performance is referenced instead). Is there a generic way for me to "follow" mediator types to get all their properties? E.g., is there a single mql_output query that would allow me to get the actor in a performance (when linked from a film search result) and the spouse in a marriage (when linked from a person)?
Querying not only every property, but then following those properties another ply deep in the graph for all search results is going to be an incredibly expensive operation. What is the use case for this? Do you really have a UI where the user can see and effectively absorb all this information? To answer the question directly though, it's not possible to unpack mediator types automatically using mql_output on the search API.
I'd suggest combining a basic set of information on the search query with a deeper set of information on a topic that the user has expressed interest (e.g. by hovering over). This UI experience would be similar to that of Freebase Suggest.
In the years since the question was originally asked there have been some additional useful things added such as the "notable" pseudo-property which lets you see what the topic is notable for.
Of course everyone also needs to be moving to the new API, so the queries would be:
https://www.googleapis.com/freebase/v1/search?query=%22the%20muppet%20movie%22&limit=1&indent=true
https://www.googleapis.com/freebase/v1/topic/en/the_muppet_movie
AFAIK there is no way to do this in outright MQL, but you can:
Get all the properties of an object or type of object, then
Programmatically construct another MQL query to get those objects you want to know more about.
Look at this example:
[{
"type|=": [
"/film/actor",
"/tv/tv_actor",
"/celebrities/celebrity"
],
"*": [{}]
}]
It grabs all the properties of all objects that have the type actor, tv_actor, or celebrity. When you run it, you'll see all the possible "follow" points you can explore.
This is not exactly what you want, but it should get you closer.
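A rough sketch of that two-step idea in Python, against the (now retired) mqlread endpoint of the same API mentioned above; the endpoint behaviour and the property name used in step 2 are assumptions for illustration only:

import json, requests

MQLREAD = "https://www.googleapis.com/freebase/v1/mqlread"

def run_mql(query):
    # Send an MQL query and return the "result" part of the response.
    r = requests.get(MQLREAD, params={"query": json.dumps(query)})
    return r.json().get("result")

# Step 1: grab all properties of one topic to discover the "follow" points.
first = run_mql([{"id": "/en/yo_la_tengo", "*": [{}]}])

# Step 2: build a follow-up query for the linked objects you care about,
# using ids discovered in step 1 (the property name here is illustrative).
members = first[0].get("/music/musical_group/member", []) if first else []
follow_up = [{"id": m["id"], "name": None} for m in members if "id" in m]
print(run_mql(follow_up) if follow_up else "no members found")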
I'm working on a pretty basic web app (not much more than CRUD stuff). However, the requirements call for a bunch of data to be displayed with each item in the search results - IDs, dates, email addresses, long descriptions... too much to fit neatly into a simple grid, and too dissimilar to make them flow together (like the natural language example from this article.)
Is there a design pattern for attractively displaying many descriptive fields with each search result?
(Please don't tell me to just remove some fields from the results; that's not an option for this project.)
Obviously there are many ways you can handle this, and to a degree it's a factor of your information design abilities and preferences.
Natural Data Groupings
What I would do is try to organize your data into a small number of "buckets." You state that the data are too dissimilar to be arranged into a sentence, but it's likely you can create a few logical groups. Since we can't see all your data, I'll guess that you have information about a person (email, name, ID?), about some sort of event (dates? type?), or maybe about some kind of object related to the person (orders? classes?). Whatever they are, some of the data will be more closely related to each other than others.
Designing in Chunks
Take each loose "bucket" and design a kind of "plate" -- a grouping just for the information in that bucket. The design problem within this constrained chunk is easier to tackle: maybe it's a little table-like layout, maybe it's something non-tabular, like the stackoverflow user "nameplate". Maybe long textual data have their own plates, or maybe they're grouped into a single plate, but with a preview/detail click-for-more arrangement.
Using a Grid
Now that you have a small number of "plates," go back to a grid-like approach for your overall search result row design. Arrange the plates as units within the row, and be sure to keep them aligned. Following an overall grid (HTML table or otherwise) for the plates will avoid an "information soup" problem. You'll have clean columns that scan well, and a readable, natural information hierarchy. The natural language example you cite would indeed be difficult to parse if it were one of many rows displayed in a search results grid.
Consistency
Be sure to use a common "design vocabulary" when you're working on the chunks -- consistent styling of labels, consistent spacing... so when everything's displayed, despite the bulk of information, it all feels like it's part of the same family.
It's an interesting design exercise. Many comps, lots of iteration, and some brainstorming should get you where you need to be.
It probably depends on the content you're displaying. Look at the StackOverflow layout for this question. It has Votes, Title, Description, Tags, Author, etc. The content wouldn't work well in a grid for sure, nor does it flow nicely on its own.
I think it's time to get creative ;)
No one ever thinks about what this is going to look like on their screen, do they?
One thing you can do is truncate the displayed text, and then display the expanded version in a tooltip on hover, or after the user clicks on it.
For example, display only the two-letter state abbreviation but show the full state name on hover.
Or, to save even more space, only display the state abbreviation, and put the entire address in the tooltip.
For long descriptions, you can display only the first few characters, followed by an ellipsis or the word "More". Then, show the full text either on hover or on click.
One disadvantage of the hover approach is that you can't sort the column on that text. There's nothing for the user to click to request the sort.