SentenceSplitter in GATE - nlp

I am trying to detect Sentences using GATE and more specifically using either ANNIE SentenceSplitter or RegexSentenceSplitter.
RegexSentenceSplitter seems to be working very well, however the only problem is that a new sentence annotation is being created at the beginning of each new page of the document. (The documents analysed are PDFs).
Is it possible to change this behavior of the RegexSentenceSplitter?

You could try a conditional corpus pipeline. It lets you run a processing resource (PR), here the RegExSentenceSplitter, or skip it, depending on the value of a feature on the document.
More details here: https://gate.ac.uk/sale/tao/splitch3.html#x6-480003.8.2


Get number of results from Search Action on Zapier

I built a Zap in Zapier that performs a "Find Person" action on Pipedrive CRM (the question applies to search in any integration).
The search returns a single object, not an array of objects as might be expected.
My goal is to know how many records the search returned (that is, the length of the list) and to use that number to perform various actions later in the Zap.
How can this be done?
Thanks
Zapier searches are designed to resolve a piece of info (like an email) into a single record from an external service (like a "Contact"), so it's working as expected here.
I'm not sure the official integration will return the data you want here, but you can write a custom integration that has an action like "Count People Matching Search", which could return that data for you.
There's info about building a custom integration here: https://platform.zapier.com/

I want to download or at least view all API.AI system entities

I want to download, or at least view, all API.AI system entities.
The purpose is to understand how entities like sys.number and sys.date were built.
The problem I'm facing is that I'm using the sys.date entity for my bot, which works fine for common cases: "today" and "tomorrow" are detected as dates.
It fails in special cases, though. I want "aaj", "foran" and "abhi" to be detected as the current date as well; these are slang words for "today" used in a specific region.
All API.AI system entities are listed here: https://api.ai/docs/reference/system-entities The values of those entities (which seem to be what you're asking for) are too large to be published (e.g. all cities).
If you wish to add entity values, I'd recommend creating additional entities (like today or tomorrow) with the values you believe should be included (like aaj, foran or abhi) and handling them either in your webhook or with custom responses in API.AI that use those entities.
If you haven't already, you may want to check whether API.AI supports the language you're trying to implement. You can check the language of your agent in your API.AI agent's settings. If it is not the right language, you can select the language you want when creating a new API.AI agent; a list of supported languages is here: https://api.ai/docs/reference/language
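To make the webhook suggestion above more concrete, here is a minimal sketch of the date-resolution part only. It assumes the custom entity is set up so that "aaj", "foran" and "abhi" are synonyms of a reference value "today", and that the webhook receives that reference value as a plain string. The names DAY_OFFSETS and resolve_day are made up for illustration and are not part of API.AI.

```python
from datetime import date, timedelta

# Hypothetical mapping from the custom entity's reference values
# to an offset in days from the current date.
DAY_OFFSETS = {"today": 0, "tomorrow": 1}


def resolve_day(reference_value, today):
    """Turn a reference value such as "today" into a concrete date.

    Returns None when the value is not one this sketch knows about,
    so the caller can fall back to sys.date or ask the user again.
    """
    offset = DAY_OFFSETS.get(reference_value.strip().lower())
    if offset is None:
        return None
    return today + timedelta(days=offset)


# "aaj" would arrive here as "today" if it is defined as a synonym.
print(resolve_day("today", date.today()))
```

Passing the current date in as an argument keeps the function easy to test; a real webhook would also need to parse the request and build the response, which is left out here.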

Google prediction API - Training data syntax for multi classification

I'm trying to use the Google Prediction API to classify my data. Each item in my DB can have multiple categories assigned to it.
For example: "My Nexus phone is rebooting constantly" could be assigned both #Android and #troubleshooting tags.
I would like to upload my training data to Google, but I'm not sure how to apply both tags to the same content. The examples I've found show a syntax that provides one category per item, like so:
"Android" ,"My Nexus phone is rebooting constantly"
What is the right syntax for multi-classification training data?
Unless I'm misunderstanding something from your question, I think the answer to it is in the docs here.
Namely, the section about text strings explains that when you submit a text string, the system cuts it into multiple strings, using whitespace as the delimiter. Their example is "Godzilla vs Mothra", which becomes "Godzilla", "vs", and "Mothra". So in your case, you could just use "Android troubleshooting". The system will split it into "Android" and "troubleshooting".
From the docs:
Each line can only have one label assigned, but you can apply multiple labels to one example by repeating an example and applying different labels to each one. For example:
"excited", "OMG! Just had a fabulous day!"
"annoying", "OMG! Just had a fabulous day!"
If you send a tweet to this model, you might get a classification something like this: "excited":0.6, "annoying":0.2.
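Following the approach quoted from the docs, one way to produce such a file is to expand each multi-label item into one row per label. This is only a sketch: the function names are hypothetical, and it assumes the items are available as (text, labels) pairs and that every field should be quoted as in the examples above.

```python
import csv
import io


def expand_multilabel(examples):
    """Yield one (label, text) row per label, repeating the text each time."""
    for text, labels in examples:
        for label in labels:
            yield label, text


def to_training_csv(examples):
    """Render the expanded rows as CSV text with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(expand_multilabel(examples))
    return buffer.getvalue()


examples = [
    ("My Nexus phone is rebooting constantly", ["Android", "troubleshooting"]),
]
training_csv = to_training_csv(examples)
print(training_csv)
```

For the Nexus example this writes the same sentence twice, once with the label Android and once with troubleshooting.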

Is there a design pattern for validation?

Is there a suitable design pattern for performing a number of validations?
For example, let's say that I have an application containing a toolbar with icons, each representing a picture on my file system. I am dragging an icon onto a document. Validations during the drag and drop operation could be:
check if the file exists in file system
check if the user has access rights to drag the icon
check that the document is open in order to drop the picture on it
and so on...
I thought of using the Chain of Responsibility or Decorator patterns.
Thanks!
Actually, what you're after, or rather what I'd suggest, is Continuation Passing Style. It's not so much a design pattern as a way of writing code where validation would be defined as a pipeline of methods that an object would go through. This pipeline would use an accumulator to collect all the validation problems encountered by the code.
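A minimal sketch of what such a pipeline could look like, using the drag-and-drop checks from the question. Each validator receives the request, the errors accumulated so far, and a continuation to call next. All names here (DropRequest, run_pipeline, the individual validators) are invented for illustration, and the access and document checks are reduced to plain flags.

```python
import os
from dataclasses import dataclass


@dataclass
class DropRequest:
    path: str            # picture the dragged icon represents
    user_can_drag: bool  # stand-in for a real access-rights check
    document_open: bool  # stand-in for a real document-state check


def file_exists(req, errors, next_step):
    if not os.path.exists(req.path):
        errors = errors + [f"file not found: {req.path}"]
    return next_step(req, errors)


def has_access(req, errors, next_step):
    if not req.user_can_drag:
        errors = errors + ["user may not drag this icon"]
    return next_step(req, errors)


def document_is_open(req, errors, next_step):
    if not req.document_open:
        errors = errors + ["document is not open"]
    return next_step(req, errors)


def run_pipeline(validators, req):
    """Chain the validators; the last continuation returns the accumulator."""
    def step(index):
        def continuation(r, errors):
            if index == len(validators):
                return errors
            return validators[index](r, errors, step(index + 1))
        return continuation
    return step(0)(req, [])


request = DropRequest("/no/such/picture.png", user_can_drag=False, document_open=True)
problems = run_pipeline([file_exists, has_access, document_is_open], request)
print(problems)
```

Because every validator always calls its continuation, all problems are collected instead of stopping at the first failure; a validator that should abort the rest of the checks could simply return the errors without calling next_step.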

Can I use parameters with Twist concepts?

The Twist documentation for extracting concepts shows how multiple steps can be grouped into one step that contains those steps. For instance, the following eight fixtures
1. Start at the Maintain product catalog page.
2. The page title should be “Joe’s musical —Maintain Product Catalog.”
3. Click the Add New Instrument button.
4. The page title should be “Joe’s musical—Add New Musical Instrument.”
5. Enter text “Guitar” into the Instrument field.
6. Select “Slide” from the Type selection list.
7. Select “Dobro” from the Brand selection list.
8. Click the Save button.
Can be condensed into one concept:
1. Add a New Musical Instrument “Guitar” of type “Slide” and brand “Dobro”
However, the tutorial doesn't say whether this concept can be used with other parameters (perhaps with "Drum" instead of "Guitar"). It does clearly say that parameters in the concept name should be surrounded by quotes, but also that they should match the parameter name, so it's not clear whether this is possible.
So can I use parameters with Twist concepts?
Yes! The documentation is really crummy about making this clear, but it is absolutely possible.
If you extracted a concept in the way that they described in the tutorial you referenced and others, then the fixture Add a New Musical Instrument “Guitar” of type “Slide” and brand “Dobro” actually contains three parameters named Guitar, Slide, and Dobro. What makes this so confusing is in each scenario you can change the value of each parameter to whatever you like (perhaps "Drum", "Snare", "Yamaha"), but under the hood, the variables are still called by their original names (thus Guitar=Drum, etc.) and these original names will appear as default values whenever you add the concept to a scenario.
To eliminate confusion, I recommend changing these default names. In this case, it might be Add a New Musical Instrument “Instrument” of type “Type” and brand “Brand”. Bizarrely, you can't rename the parameters via "Rephrase the Open Concept" because you run into a catch-22 situation. You can't change the name of the concept because it doesn't match the usage in the fixture. And you can't rename the fixtures because the parameters are bound to the concept name. So I recommend just opening it up in the text editor and making the change there.
So, bottom line: the examples make it seem like you can't use parameters because the parameters wind up being named after whatever value you entered. I recommend changing the default parameter names, but you have to do it in the text editor because Twist won't let you.
