Drupal 6 listing taxonomy terms in views - drupal-6

I have a vocabulary called jobs, with multiple terms: 'Doctor', 'Teacher', 'Bus Driver'.
Is there no way to output this list using Views? I have tried the "terms" and "all terms" fields, but I just get a blank output:
All terms:
All terms:
All terms:
All terms:
All terms:
I could write a module to do this but want a quick way to list them. Am I missing something? I am not as experienced with D6.

When you are first creating a View, you get to pick the type of view. Most often you'll pick Node, but in your case you should pick Term. That way you'll be creating a list of Terms.

Related

Is there a way to group items based on a broader category (e.g., skittles and snickers get labeled as "candy")?

I'm wondering if there is a way (specific package, process, etc.) of grouping items based on an overall category? For example, I'm looking at empty search results and want to see what category customers are most interested in.
Let's say I have a list of searched terms: skittles, laundry, snickers and detergent. I would want to group these items based on a broader category (i.e., skittles and snickers are "candy" and laundry and detergent would be "cleaners").
I've done some research on this and have seen similar (but not exact) ways of doing this (e.g., common keyword grouping using NLP) but not sure if something like this exists in the world when there isn't necessarily any commonality. Any help or direction would be greatly appreciated.
Update here: The best way to handle this scenario is to use pretrained word embeddings using something like Google's BERT algorithm as the first pass and then layer on another ML model that is specific to the use case.

Internal Search optimization for relevance

My team is using Solr and I have a question regarding it.
Some search terms don't return relevant results, or miss results that should have been displayed. For example:
Searching for Macy's without the apostrophe, as "Macys", doesn't return any results for Macy's.
Searching for JPMorgan vs JP Morgan gives different results.
Searching for IBM doesn't show results that contain its full name, i.e. International Business Machines.
How can we improve and optimize such cases so that the fix applies generally, even to cases we haven't caught beyond these three?
Any suggestions?
All these issues are related to how you process the incoming text for those fields. You'll have to create a filter chain for the field - and possibly use multiple fields for different use cases and prioritize those using qf - that processes the input values to do what you want.
Your first case can be solved by using a PatternReplaceFilter to remove any apostrophes - depending on your use case and tokenizer you might want to use the CharFilter version, as it processes the text before it's split into multiple tokens.
Your second case is a straightforward synonym filter or a WordDelimiterFilter: you either expand JPMorgan to "JP Morgan" with synonyms, or use the WordDelimiterFilter to split case changes into separate tokens. That'll also allow you to search for JP and get JPMorgan related entries. These might have different effects on score; use debugQuery=true to see exactly how each term in your query contributes to the score.
The third case is in general the same as the second case. You'll have to create a decent synonym word list for the terms used, and this is usually something you build as you get feedback from your users, from existing dictionaries and from domain knowledge. There's also the option of preprocessing text using NLP, or in this case, something as primitive as indexing the initials of any capitalized words after each other could help.
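As a rough illustration of what such a filter chain does to the three example cases, here is a minimal Python sketch. It is not Solr configuration and does not reproduce the exact behavior of PatternReplaceFilter or WordDelimiterFilter; the function names and the splitting rule are assumptions made for the example. In Solr the equivalent steps would live in the field type's analyzer chain.

```python
import re

def strip_apostrophes(text):
    """Remove straight and curly apostrophes so "Macy's" indexes as "Macys"."""
    return re.sub(r"['\u2019]", "", text)

def split_on_case_change(token):
    """Split a token into runs of capitals and capitalized words,
    e.g. "JPMorgan" -> ["JP", "Morgan"] (illustrative rule, not Solr's)."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", token)

def initials(text):
    """Join the first letters of capitalized words,
    e.g. "International Business Machines" -> "IBM"."""
    return "".join(word[0] for word in re.findall(r"\b[A-Z][a-z]+", text))

def index_terms(text):
    """Return the lowercase terms that would be indexed (or searched) for text."""
    cleaned = strip_apostrophes(text)
    terms = set()
    for token in cleaned.split():
        terms.add(token.lower())
        for part in split_on_case_change(token):
            terms.add(part.lower())
    acronym = initials(cleaned)
    if len(acronym) > 1:  # a single initial is too noisy to index
        terms.add(acronym.lower())
    return terms
```

Running the same function over both the indexed text and the query means "Macys" and "Macy's" share a term, "JPMorgan" and "JP Morgan" share "jp" and "morgan", and a document containing the full company name also gets the term "ibm".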

Core Data - relationships or attributes?

I have a very basic, functioning, checklist application that I'd like to improve.
Essentially, it's just a list of 37,000 (and growing) items.
Right now, I have two entities:
1) Checklist: This includes the following attributes: name, numberOwned, imageName, groupName, etc - 14 in all. All are Strings
2) Keywords: This includes a single attribute: words, with a one-to-many nameKeywords relationship. This stores the normalized name for searching
My question is: Is there any reason to be using multiple entities in this type of situation? Should I remove the Keywords relationship and just add that as an additional attribute? Or should I be going the other route, minimizing the attributes and adding more entities?
I'd like to keep it as simple as possible (I'm not an experienced programmer, and the app isn't a source of revenue - it's available free on the store) - but I would like to make the searches more efficient if possible to make my users happy. Right now when a user searches for an item, it searches the normalized name in the Keyword entity, but it can take a while if they are trying to search through all items.
As usual, I apologize if this question is too vague. I'm happy to provide clarifications and code snippets as needed!
Zack
To increase the speed of search, you can index the attributes you search on, but it would help if you could show your data model.

Text classification using Java

I need to categorize a text or word to a particular category. For example, the text 'Pink Floyd' should be categorized as 'music' or 'Wikimedia' as 'technology' or 'Einstein' as 'science'.
How can this be done? Is there a way I can use the DBpedia for the same? If not, the database has to be trained from time to time, right?
This is a text classification problem. Manning, Raghavan and Schütze's Information Retrieval book chapter is a nice introduction. I think you do not need DBPedia nor NER for this, just a small labeled training data set with enough labeled examples for all of your classes.
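To make the "small labeled training set" idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in pure Python. The training examples and labels are invented for illustration, and Naive Bayes is just one common choice; the answer above does not prescribe a particular algorithm.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(examples):
    """examples: list of (text, label) pairs. Returns the counts used by predict()."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training texts
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability for text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training data, for illustration only.
EXAMPLES = [
    ("pink floyd rock band album", "music"),
    ("beatles guitar song concert", "music"),
    ("einstein physics relativity theory", "science"),
    ("darwin evolution biology theory", "science"),
    ("wikimedia software server website", "technology"),
    ("linux kernel software computer", "technology"),
]
MODEL = train(EXAMPLES)
```

With this toy data, predict(MODEL, "Pink Floyd") returns "music". Real use would need many more examples per class, and the same code carries over to Java with maps in place of Counter.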
Yes, DBpedia may be a good choice for this kind of problem. You'll have to:
squash the DBpedia category structure so you get the right granularity (e.g., Pink Floyd is listed under Capitol Records artists and a host of other categories, but not directly under Music). Maybe pick a few large categories and try to find whether your concepts are listed indirectly in them;
normalize text: Einstein is listed as Albert Einstein, not einstein;
deal with ambiguity due to terms describing multiple concepts and concepts belonging to multiple top-level categories.
These problems may be solvable using machine learning, but I only see how it can be done if you extract these terms, along with relevant features, from running text. But in that case, you might just as well classify the entire text into one of the categories you choose in step 1.
This is the well-studied named entity recognition problem. Unless you have a particular need to roll your own technology (hint: it's a hard problem in general), using Gate, or perhaps one of the online services that builds on it (e.g. TSO's Data Enrichment Service), would be a good option. An alternative online service is OpenCalais.
Map your categories to DBpedia.
Index the selected DBpedia categories with Lucene and label the data with your category names.
Search for your data - tokenization and normalization will be done by Lucene.
This approach is somewhat related to KNN classification.
Yes, DBpedia is a good choice for text classification, as you can use its predicates/relations to query and extract the meaningful information for a particular category.
You can look into the endpoint for querying Dbpedia:
http://dbpedia.org/sparql
Further, learn the basic syntax of SPARQL to query on the endpoint from the following link:
http://www.w3.org/TR/rdf-sparql-query/
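As a hedged sketch of what querying that endpoint might look like, the Python below builds a request URL that asks for the categories of a single resource. It assumes DBpedia links resources to categories with dct:subject and that the endpoint accepts query and format URL parameters; both should be checked against the live endpoint. The helper names are hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

def build_category_query(resource_name):
    """Return a SPARQL query listing the categories of one DBpedia resource.

    Assumes categories are attached via dct:subject.
    """
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?category WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource_name}> dct:subject ?category .\n"
        "}"
    )

def build_request_url(resource_name):
    """Return the GET URL for the query; the 'format' parameter is an assumption."""
    params = {
        "query": build_category_query(resource_name),
        "format": "application/sparql-results+json",
    }
    return ENDPOINT + "?" + urlencode(params)
```

Fetching build_request_url("Pink_Floyd") with any HTTP client should return the category URIs, which then need the squashing step described above to arrive at a top-level label such as music.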

Synonym style text lookup and parsing

We have a client who is looking for a means to import and categorize a large amount of textual data. This data has to be categorized, and it's been suggested that the easiest way to do this would be to look at the description field and try to match the words held there to see if a category can be derived for that particular record.
It was thought the best way to do this would be matching the words to key words held against each category and if that was unsuccessful then to use some kind of synonym look up to see if this could be used instead. So for example, if a particular record had the word "automobile" in it then a synonym look up could match that word to the word "car" which would be held against the category "vehicle".
Does anyone know of a web service or other means of looking up a dictionary to find synonyms for a particular word? The project manager has suggested buying a Google Enterprise Search license for this but from what I can make out that doesn't offer what these guys are looking for.
Any other suggestions for getting the client what they are looking for would be gratefully accepted.
Thanks! I'll look into Wordnet.
Do you know of any other types of textual classification software products out there? I see there's some discussion of using Bayesian algorithms for this, but I can't find any real-world examples of it.
The first thing that comes to mind is Wordnet. Wordnet is a human-generated database of words and related words, including synonyms. The Wikipedia Wordnet entry lists several interfaces to Wordnet. I believe some of them are web services.
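The two-pass lookup described in the question (match category keywords first, then fall back to synonyms) can be sketched in a few lines of Python. The keyword and synonym tables here are hypothetical, hand-written stand-ins; in practice the synonym table would be filled from a source such as Wordnet.

```python
# Hypothetical keyword lists held against each category.
CATEGORY_KEYWORDS = {
    "vehicle": {"car", "truck"},
    "furniture": {"chair", "table"},
}

# Hypothetical synonym table mapping a word to a keyword used above.
SYNONYMS = {
    "automobile": "car",
    "lorry": "truck",
    "stool": "chair",
}

def category_for_keyword(word):
    """Return the category whose keyword list contains word, or None."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if word in keywords:
            return category
    return None

def categorize(description):
    """Try direct keyword matches first, then synonym matches."""
    words = description.lower().split()
    for word in words:
        category = category_for_keyword(word)
        if category is not None:
            return category
    for word in words:
        canonical = SYNONYMS.get(word)
        if canonical is not None:
            category = category_for_keyword(canonical)
            if category is not None:
                return category
    return None  # no category could be derived
```

For example, categorize("Red automobile for sale") finds no direct keyword, falls back to the synonym "car", and returns "vehicle".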
You can also roll your own. Manning and Schutze's chapter 5 (free PDF) shows ways to do this.
Having said that, are you solving the right problem? How do you build the category list?
Is it a hierarchy? a tag cloud? See Clay Shirky's Ontology is Overrated for a critique of hierarchical categories. I believe that synonyms are less important if you base your classification on sets of words (Naive Bayes, for example) rather than on single words.
You should look at using WordNet. You can visit their website http://wordnet.princeton.edu/ to get more information, but there are libraries available for integrating against them in lots of languages.
Go to their online tool to see the use of it in action here: http://wordnetweb.princeton.edu/perl/webwn. If you look up a word, then click on "S" next to each definition, you'll get a list of semantically related words to that definition.
I also think you should check out software that will allow you to perform "document clustering." Here is an example: http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview. That should help you bootstrap the category creation process.
I think this will help get you a long way toward what you want!
For text classification you can take a look at Apache Mahout.
