Chatbase seems to censor some words - chatbase

I just integrated Chatbase with my Dialogflow chatbot via REST API.
So far everything is working; however, I noticed that, especially for "Not handled" inputs, some words look censored and are replaced by a <*> sequence.
Is that normal? I can't find any reference in the documentation or any way to change this.
If it helps, my chatbot is in Italian, and the replaced words are not offensive at all. In addition, this only happens with some inputs, not all.
Thanks,
Igor

I do support for Chatbase, thank you for posting this question! We automatically mask any sequence of characters that may be considered SPII (sensitive personally identifiable information). Typically, names and string sequences containing consecutive numbers get replaced with the <*> mask.
You can reveal the full message by using the Transcripts feature available on the Messages Report.
Regards,
Sean
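The masking behavior described above can be illustrated with a short sketch. This is only an illustration of the idea (replace tokens containing runs of consecutive digits with the <*> placeholder); it is not Chatbase's actual masking logic, and the pattern is an assumption for the example:

```python
import re

def mask_spii(text):
    """Illustrative SPII-style masking: replace any token containing a run
    of two or more consecutive digits with the <*> placeholder.
    NOT Chatbase's real algorithm, just a sketch of the described behavior."""
    return re.sub(r"\S*\d{2,}\S*", "<*>", text)

print(mask_spii("My order number is 48213 and my phone is 555-0147"))
# → "My order number is <*> and my phone is <*>"
```

Name-like tokens would need a separate heuristic (or an NER model), which is why real SPII masking is more involved than a single regex.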

Related

Python sentiment / text analysis advice

I don't know if this is the right place to ask this, but I am trying to build a bot in Python that will read incoming messages on a Slack channel where customers post their issues, such as 'unable to connect to VPN', 'can someone reply to my ticket', etc.
The bot will analyze the message, determine if the customer is angry or not, and then propose a solution until an agent is free to actually check the issue.
Now, I was experimenting with TextBlob for the sentiment analysis part, but I don't know which technologies to use to determine the issue based on specific keywords and provide a solution to the user. Can someone suggest some Python libraries/technologies I could use to achieve this?
To be honest, your question is too generic to answer in one go.
Nonetheless, you first have to clearly define the scope of your project. In doing so, you might want to start with a quick literature review (Google Scholar) to familiarize yourself with state-of-the-art technologies and methods.
In my limited experience, a common (and fairly simple) lexicon-based technique for determining the sentiment of a word is to use a pre-compiled dictionary (you can also create your own) that contains word-sentiment mappings. For example:
word:tired, sentiment:negative, score:5
So each time the bot finds the keyword "tired" in a sentence, it will assign the corresponding negative value (polarity) to the sentence.
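As a rough illustration of that lexicon-based approach, here is a minimal self-contained sketch. The lexicon entries and scores are made up for the example, not taken from a real sentiment dictionary:

```python
# Illustrative lexicon: word -> sentiment label and score.
LEXICON = {
    "tired":  {"sentiment": "negative", "score": 5},
    "unable": {"sentiment": "negative", "score": 4},
    "thanks": {"sentiment": "positive", "score": 3},
}

def sentence_polarity(sentence):
    """Sum signed keyword scores; the sign comes from the sentiment label."""
    total = 0
    for word in sentence.lower().split():
        entry = LEXICON.get(word.strip(".,!?"))
        if entry:
            sign = -1 if entry["sentiment"] == "negative" else 1
            total += sign * entry["score"]
    return total

print(sentence_polarity("I am tired and unable to connect to VPN"))  # → -9
```

A real system would use a full sentiment lexicon (or TextBlob's built-in scorer), but the mechanics are the same: look up each token and aggregate the scores.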
You might also want to consider applying POS tags to the input text, as nouns or verbs sometimes carry significant meaning compared to, say, adjectives.
Keep in mind, though, that negative comments can be written in the form of sarcasm, and sarcasm detection is a more difficult task.
Alternatively, you could try using a pre-trained model such as bert-base-multilingual-uncased-sentiment, which can be found on Hugging Face.
For more information on the matter, you can have a look at this post.
Again, as I mentioned, you first have to clearly define your goals. This will enable you to narrow down the libraries or methodology available to solve your problem. Hope my answer helps.

Cognitive Service show Fill words and hide personal data

We use the Azure Batch Transcription Service to get the Transcript of an Audio / Speech.
Here we noticed that filler words like "uhm", "hm", or similar are sometimes included, but very rarely. We have also been using this service for a few months already, and we have the feeling that it "got less" (i.e. fewer "uhm"s in the transcript).
Q1: Is there a way to get the filler words? We want to receive them within the transcript.
Also, as we sometimes record conversations it can happen that someone says a name or is talking about other personal information.
Q2: Is there a way to "filter" that personal information / those words within the transcript?
Sorry, I don't think there is a way to filter personal data/words during transcription. We can only do a profanity filter for batch transcription.
But I agree this feature would be very helpful. I will forward this feature request to the product group to see if we can have this in the future.
What I would suggest is to post-process the transcription afterwards to filter out the sensitive information.
Regards,
Yutong
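Along the lines of that suggestion, one workaround until such a feature exists is to redact the finished transcript yourself. A rough sketch follows; the name list and patterns are purely illustrative, and real PII detection would need something much more robust (e.g. a dedicated PII-detection service or NER model):

```python
import re

# Illustrative list of names to redact, supplied by the caller.
KNOWN_NAMES = {"alice", "bob"}

def redact(transcript):
    """Crude post-processing redaction: mask digit sequences (phone
    numbers, IDs, ...) and any word found in a known-names list."""
    transcript = re.sub(r"\b\d[\d -]{4,}\d\b", "[REDACTED]", transcript)
    return " ".join(
        "[REDACTED]" if w.strip(".,").lower() in KNOWN_NAMES else w
        for w in transcript.split()
    )

print(redact("Hi Alice, my number is 0151 234567"))
# → "Hi [REDACTED] my number is [REDACTED]"
```

Note that this crude token-level replacement drops punctuation attached to redacted words; it is meant only to show where the post-processing step would sit in the pipeline.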

Any way to get past the minimum of 20 tokens for text classification - Google NLP API

Is there any way to get past the minimum token requirement for Google's NLP API text classification method? I'm trying to input a short, simple sentence such as "I can't wait for the presidential debates", but this returns an error saying:
Invalid text content: too few tokens (words) to process.
Is there any way to get around this? I've tried inputting random words until the string reached 20 tokens, but that often messes up the labels and confidence. If there is any way around this, such as setting an option or adding something, that would be awesome! If there is no workaround, let me know if you know of another pre-trained text classification model that would work for me!
Also, I can't create the categories and labels I want; there would just be too many needed for what I'm doing, which is why the predefined categories in the NLP API are great. I just need to get rid of that 20-token requirement.
As clarified in the official Content Classification documentation:
Important: You must supply a text block (document) with at least twenty tokens (words) to the classifyText method.
Considering that, and checking for possible alternatives, it seems that, unfortunately, there isn't a way to work around this. You will indeed need to supply at least 20 words.
For this reason, searching around, I found this one and this other - the latter in Chinese, but it might still help you :) - with pre-trained models for Text Classification that could work for you.
Anyway, feel free to raise a Feature Request in Google's Issue Tracker, for them to check about the possibility of removing this limitation.
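In the meantime, it can help to guard your own code against the limit before calling classifyText, rather than relying on the API error. A small sketch; note that the whitespace-based count below is only an approximation of the API's own tokenizer:

```python
MIN_TOKENS = 20  # documented minimum for the classifyText method

def ready_for_classification(text):
    """Rough pre-check: approximate the token count by whitespace
    splitting and skip texts the API would reject anyway."""
    return len(text.split()) >= MIN_TOKENS

short = "I can't wait for the presidential debates"
print(ready_for_classification(short))  # → False: only 7 words
```

Texts that fail the check could be batched together, skipped, or routed to a different model, instead of being padded with filler words that distort the labels.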
Let me know if the information helped you!

Extracting relationship from NER parse

I'm working on a problem that at the very least seems to require named entity recognition, but I'm not sure how to go farther than the NER parse. What I'm trying to do is parse information (likely from tweets) regarding scheduling of events. So, for example, I'd like to be able to automatically resolve the yes/no answer to the question of "Are The Beatles playing tomorrow?" from short messages like:
"The Beatles cancelled their show tomorrow" or
"The Beatles' show is still on tomorrow"
I know NER will get me close as it will identify the band of interest and the time (if it's indicated), but there are many ways to express the concepts I'm interested in, for example:
"The Beatles are on for tomorrow" or
"The Beatles won't be playing tomorrow."
How can I go from an NER parsed representation to extracting the information of interest? Any suggestions would be much appreciated.
I suggest searching for event detection (optionally, in Twitter), and perhaps also for question-answering systems, if your example with yes/no questions wasn't just an illustration: if you know user needs in advance, this information can increase the quality of the system.
For a start, there are some papers about event detection in Twitter: here and here.
As a baseline, you can create a list of positive verbs for your domain (to be, to schedule) and negative verbs (to cancel, to delay) - just start from a manual list and expand it with synonyms from some dictionary, e.g. WordNet. Also check for negations - again, by the presence of pre-specified words ('not' in different forms) in a tweet. Then, if there is a negation, you simply invert the meaning.
Since you work with Twitter and most likely there would be just one event mentioned in a tweet, it can work pretty well.
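That baseline fits in a few lines of Python. The word lists below are illustrative seeds only; in practice you would expand them with WordNet synonyms as suggested above:

```python
# Illustrative seed lists for the verb-list baseline.
POSITIVE = {"on", "scheduled", "playing", "confirmed"}
NEGATIVE = {"cancelled", "canceled", "delayed", "postponed"}
NEGATIONS = {"not", "won't", "isn't", "no"}

def event_is_on(tweet):
    """Return True/False for 'is the event happening?', or None if no
    positive or negative keyword is found."""
    words = [w.strip(".,!?'\"").lower() for w in tweet.split()]
    polarity = None
    if any(w in POSITIVE for w in words):
        polarity = True
    if any(w in NEGATIVE for w in words):
        polarity = False
    # A negation word flips the meaning.
    if polarity is not None and any(w in NEGATIONS for w in words):
        polarity = not polarity
    return polarity

print(event_is_on("The Beatles cancelled their show tomorrow"))  # → False
print(event_is_on("The Beatles won't be playing tomorrow."))     # → False
```

It is deliberately naive (a single keyword decides the answer), but for short single-event tweets, that assumption often holds.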

Is OpenNLP able to extract keyword from content?

Is OpenNLP able to extract keyword from content?
If yes, how?
If no, which tool should I use?
I would like to tag content automatically.
For example.
Jessica Chastain has revealed that a meeting has taken place with Marvel over an undisclosed role, although the star has confirmed it is not Captain Marvel.
“We’ve talked about aligning our forces in the future,” Chastain told MTV of her relationship with the studio. “And here’s the thing with me… If you’re going to be in a superhero movie, you only get one chance.”
“You’re that character forever. So why do a superhero movie and play the boring civilian?” A possible reference to Maya Hansen there? Chastain had been attached to the Iron Man 3 character before eventually dropping out on account of scheduling difficulties…
“I don’t want to say too much,” continues the star, “but there was one thing, there was a possibility in the future of the character becoming… And I was like, ‘I understand that, but I want to do it now!’”
Just who that character might be is up for interpretation, although Chastain has moved to quash subsequent rumours that she is in line to play Captain Marvel.
It should be tagged as "superhero", "movie".
Is OpenNLP able to do this?
Thanks.
OpenNLP is able to extract Named entities for you. This means anything that is the name of a person, place, organization etc. would potentially be recognized by the system.
However, what you are looking for is keyword extraction, where you want to identify relevant keywords that explain a document in the general sense. I would recommend checking out Alchemyapi.com
They have models to extract keywords, taxonomy, named entities amongst other things. The only issue is that the free version just gives you 1000 transactions per day (which might be enough for your task)
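If you'd rather avoid an external service, a crude keyword-extraction baseline is to rank words by frequency after removing stop words. A minimal sketch; the stop-word list here is a tiny illustrative subset (libraries like NLTK ship fuller lists):

```python
from collections import Counter
import re

# Tiny illustrative stop-word list; use a full list in practice.
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "that", "has",
             "with", "in", "it", "you", "i", "be", "had", "every"}

def keywords(text, top_n=2):
    """Return the top_n most frequent non-stop-words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_n)]

print(keywords("A superhero movie is still a movie, and every superhero "
               "movie needs a superhero."))  # → ['superhero', 'movie']
```

Frequency alone ignores word importance across a corpus; TF-IDF or a dedicated keyword extractor would be the natural next step up from this baseline.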
