Skobbler and OpenSeaMap - skmaps

I'd like to know whether Skobbler might natively support OpenSeaMap. If not, do you have any advice on how to display all the data efficiently? Plan B is to get all the data from OpenSeaMap and display/draw it manually (parsing JSON data, etc.). Maybe you're planning to add such functionality in the future, or it is possible to prepare some customized sea data (map features from OpenSeaMap).
Thanks.
Bart S.

Currently there are no specific developments regarding OpenSeaMap - the current version focuses on land features & navigation (cars, bikes, pedestrians).
This is also true for the map format used by Skobbler/Telenav - its schema is optimized for land features, not water/sea features, so currently no OpenSeaMap features are imported.
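For the "plan B" route, note that OpenSeaMap features are regular OpenStreetMap objects carrying seamark:* tags, so they can be fetched through the Overpass API and parsed by hand. A minimal sketch in Python, assuming the requests library; the bounding box and the bare seamark:type filter are illustrative values, not a complete query:

```python
import requests

# Fetch OpenSeaMap seamark nodes in a bounding box via the Overpass API.
# The bbox (south, west, north, east) and tag filter are placeholder values.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

query = """
[out:json][timeout:25];
node["seamark:type"](54.0,10.0,54.5,10.5);
out body;
"""

response = requests.post(OVERPASS_URL, data={"data": query})
response.raise_for_status()

for node in response.json().get("elements", []):
    tags = node.get("tags", {})
    print(node["lat"], node["lon"], tags.get("seamark:type"))
```

Drawing the returned features on top of the map (as custom annotations/overlays) would then be up to your own code.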

Related

Recommended annotation tool to create a Named Entity Recognition dataset

I'm new to NLP. I am looking for recommendations for an annotation tool to create a labeled NER dataset from raw texts.
In detail:
I'm trying to create a labeled dataset for specific types of entities in order to develop my own NER project (rule-based at first).
I assumed there would be some friendly frameworks that allow creating tagging projects, tagging text data, building a labeled dataset, and even sharing projects so several people could work on the same one, but I'm struggling to find any (I admit "friendly" or "intuitive" are subjective, yet this is my experience).
So far I've tried several Frameworks:
I tried LightTag. It makes the tagging itself fast and easy (i.e. marking the words and giving them labels), but the overall process of creating a useful dataset is not as intuitive as I expected (i.e. uploading the text files, splitting into different tagging objects, saving the tags, etc.).
I've installed and tried Label Studio and found it less mature than LightTag (don't mean to judge here :)).
I've also read about spaCy's Prodigy, which is a paid annotation tool. I would consider purchasing it, but their website only offers a live demo of the tagging phase, and I can't assess whether their product is superior to the other two above.
Even on StackOverflow, the latest question I found on the matter is over five years old.
Do you have any recommendation for a tool to create a labeled NER dataset from raw text?
⚠️ Disclaimer
I am the author of Acharya. I would limit my answers to the points raised in the question.
Based on your question, Acharya would help you create the project, upload your raw text data, and annotate it to create a labeled dataset.
It would allow you to mark records individually for train or test in the dataset and would give data-centric reports to identify and fix annotation/labeling errors.
It allows you to add different algorithms (bring your own algorithm) to the project and train the model regularly. Once trained, it can give annotation suggestions from the trained models on untagged data to make the labeling process faster.
If you want to train in a different setup, it allows you to export the labeled dataset in multiple supported formats.
Currently, it does not support sharing of projects.
Acharya community edition is in alpha release.
GitHub page: https://github.com/astutic/Acharya
Website: https://acharya.astutic.com/
Doccano is another open-source annotation tool that you can check out https://github.com/doccano/doccano
I have used both Doccano (https://github.com/doccano/doccano) and BRAT (https://brat.nlplab.org/).
I find the latter very good, and it supports more functions. Both are free to use.
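Whichever tool you pick, the end product is usually a file of texts with labeled character spans. As a rough sketch of the next step, here is how a doccano-style JSONL export (records shaped like {"text": ..., "label": [[start, end, tag], ...]}) could be converted into spaCy's binary training format; the file names are placeholders:

```python
import json

import spacy
from spacy.tokens import DocBin

# Convert doccano-style JSONL annotations into spaCy's binary format.
# "annotations.jsonl" and "train.spacy" are placeholder file names.
nlp = spacy.blank("en")
db = DocBin()

with open("annotations.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        doc = nlp.make_doc(record["text"])
        spans = []
        for start, end, tag in record["label"]:
            span = doc.char_span(start, end, label=tag)
            if span is not None:  # skip spans not aligned to token boundaries
                spans.append(span)
        doc.ents = spans
        db.add(doc)

db.to_disk("train.spacy")
```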

TensorFlow for classification of strings vs Elasticsearch

So, a little bit on my problem.
TL;DR
Can I use machine learning instead of Elasticsearch to find results depending on the user's text input? Is it a good idea?
I am working on a car spare parts project, and we have split the car into 300 parts that we store in the database, with some data for each part (weight, availability, etc.).
When a customer inputs the text describing their part, we need to classify it and map it to one in our database.
The current way this is done is people on our team manually mapping the customer's text to the parts in our database; we want to automate that process.
We tried using MongoDB text search, but it was often inaccurate since parts have different names in different parts of the country.
So we wanted something that produces more accurate results and improves the more data we have. We immediately considered TensorFlow; after some research and taking part of Google's Machine Learning Crash Course, I got to the point where it specified:
"Models can't learn from string values, so you'll have to perform some feature engineering to convert those values to something numeric."
That would be useful if we had a limited number of string-valued features, but here we don't know what the user will input as text.
So, my questions are:
1- Can we use machine learning to map text input by the user to documents in our database?
2- If we can, is it a good idea to favor it over other search tools like Elasticsearch?
3- Can Elasticsearch improve its results the more data we have? How?
4- How would you go about this problem?
Note: I'd be doing this in Node.js, and since TensorFlow.js is new, I'm inclined to go with other solutions, but if push comes to shove and the results are much better, I would definitely go there.
TL;DR: Yes and yes.
TS;WM:
This is a perfectly suited problem for machine learning, especially if you have a database of past customer texts that have already been mapped to parts. Ideally, you have hundreds of texts mapped to each part. If that is available, you can design and train a network. And models can learn from string values with some feature engineering; it's not that bad.
I'm not sure Elasticsearch would improve much on the network. I don't know much about auto parts trading, but as a wild guess, "the large round thingy that helps change direction" would never be mapped to "steering wheel" by ES, but could easily be learned by a network - provided there are at least some examples of people using that text to mean a steering wheel.
You can, but don't necessarily have to, use TensorFlow.js for your network. The model could run on your server as a web service; you'd just send the customer's text over, and it would send back its recommendations of part SKUs and names.
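To make that concrete: since the poster is open to running the model server-side instead of in TensorFlow.js, here is a minimal sketch of the whole idea in Python with scikit-learn (not TensorFlow); the part names and customer texts are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: past customer texts already mapped to part IDs by the
# team. All texts and part IDs here are made up.
texts = [
    "front brake pads", "brake pad set front", "pads for front brakes",
    "steering wheel", "the wheel you steer with",
    "rear left door handle", "handle for back door",
]
parts = [
    "brake_pads_front", "brake_pads_front", "brake_pads_front",
    "steering_wheel", "steering_wheel",
    "door_handle_rear_left", "door_handle_rear_left",
]

# Character n-grams turn the raw strings into numeric features and are
# fairly robust to spelling and regional naming variants.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, parts)

print(model.predict(["steering weel"]))  # likely ['steering_wheel'] despite the typo
```

With hundreds of real mapped texts per part, a setup like this (or a neural model behind the same web-service interface) has enough signal to learn the regional variants that broke the plain text search.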

J2ME implementation of a Trie (Ternary Search Tree)

I am currently working on a predictive-text SMS system. I want to implement it using a TST data structure and bi-grams (predicting the next probable word based on the current key sequence on a 12-key keypad).
Currently I have a corpus and have used the available applications to produce a dictionary, bi-grams, and frequencies. I have the following questions in mind:
Can I find a J2ME TST implementation, or a trie suitable for this case? (A more detailed explanation of available TST tries would be great.)
General guidance on this project's approach would also help.
NB: I have looked at similar trie implementations but am still unable to figure out a way forward.
You may look at this: http://algs4.cs.princeton.edu/52trie/
Alternatives can be found in Algorithms, 4th Edition in general...
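The Princeton version is plain Java and should port to J2ME with little friction, since it uses only basic language features. To show the shape of the structure, here is a minimal TST sketch in Python following the same put/get logic; storing corpus word frequencies as node values is one way to plug in your dictionary:

```python
# Minimal ternary search tree (TST) sketch, following the structure of the
# Princeton Algorithms implementation linked above.
class Node:
    __slots__ = ("ch", "left", "mid", "right", "value")

    def __init__(self, ch):
        self.ch = ch
        self.left = self.mid = self.right = None
        self.value = None  # e.g. the word's frequency from your corpus


class TST:
    def __init__(self):
        self.root = None

    def put(self, key, value):
        self.root = self._put(self.root, key, value, 0)

    def _put(self, node, key, value, d):
        ch = key[d]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.left = self._put(node.left, key, value, d)
        elif ch > node.ch:
            node.right = self._put(node.right, key, value, d)
        elif d < len(key) - 1:
            node.mid = self._put(node.mid, key, value, d + 1)
        else:
            node.value = value
        return node

    def get(self, key):
        node, d = self.root, 0
        while node is not None:
            ch = key[d]
            if ch < node.ch:
                node = node.left
            elif ch > node.ch:
                node = node.right
            elif d < len(key) - 1:
                node, d = node.mid, d + 1
            else:
                return node.value
        return None


trie = TST()
trie.put("hello", 42)     # word -> frequency
print(trie.get("hello"))  # 42
print(trie.get("help"))   # None
```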

The best IR software for my use?

I want to take what people chat about in a chat room and do the following information retrieval:
Get the keywords
Ignore all noise words; keep mainly verbs and nouns
Perform stemming on the keywords so that I don't store the same keyword in many forms
If a synonym of a keyword is already stored, then the existing synonym should be used instead of the new keyword
Store the processed keyword in persistent storage with a reference to the chat message it was located in and the user who uttered it
With this processed information I want to slowly get an idea of what people are talking about in chat rooms, and then use this to automatically find related chat rooms etc. based on these keywords.
My question to you is as follows: what are the best C/C++ or .NET tools for doing the above?
I partially agree with #larsmans comment. Your question, in practice, may indeed be more complex than the question you posted.
However, simplifying the question/problem, I guess the answer to your question could be one of Lucene's implementation: Lucene (Java), Lucene.Net (C#) or CLucene (C++).
Following the points in your question:
Lucene would take care of point 1 by using String tokenizers (you can customize or use your own).
For point 2 you could use a TokenFilter like StopFilter so Lucene can read a list of stopwords ("the", "a", "an"...) that it should not use.
For point 3 you could use PorterStemFilter.
Point 4 is a little bit trickier, but could be done using a customized TokenFilter.
Points 1 to 4 are performed in the analysis/tokenization phase, for which an Analyzer is responsible.
Regarding point 5, in Lucene you can store Documents with fields. A document can have an arbitrary number and mix of fields. So you could create a single Document for each chat room with all its text concatenated, and have another field of the document reference the chat room it was extracted from. You will end up with a bunch of Lucene documents that you can compare, letting you check which other chat rooms are most similar to the one you are in.
If all you want is a set of the best keywords to describe a chat room, your needs are closer to an information extraction/automatic summarization/topic spotting task, as #larsmans said. But you can still use Lucene for the parsing/tokenization phase.
*I referenced the Java docs, but CLucene and Lucene.Net have very similar APIs so it won't be much trouble to figure out the differences.
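For a feel of what points 1-3 of that pipeline actually do, here is a small sketch in Python using NLTK rather than one of the Lucene ports (the steps are the same: tokenize, drop stopwords, Porter-stem):

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)  # one-time stopword list download

def extract_keywords(message):
    """Tokenize, drop stopwords, and Porter-stem what remains."""
    stemmer = PorterStemmer()
    stop = set(stopwords.words("english"))
    tokens = re.findall(r"[a-z']+", message.lower())  # crude tokenizer
    return [stemmer.stem(t) for t in tokens if t not in stop]

print(extract_keywords("The cats are running around the servers again"))
# -> ['cat', 'run', 'around', 'server']
```

In Lucene proper the same work is done inside the Analyzer, so you rarely write these steps by hand; this only illustrates what the StopFilter and PorterStemFilter contribute.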

Word characteristics tags

I want to build a riddle AI chatbot for my AI class.
So I figured the input to the chatbot would be something like:
"It is blue, and it is up, but it is not the ceiling"
Translation :
<Object X>
<blue>
<up>
<!ceiling>
</Object X>
(Answer : sky?)
So the input is a set of characteristics (existing / not existing in the object); the output is the matched, most likely object.
The domain will be limited to a number of objects. I could input all the attributes myself, but I was wondering:
How could I programmatically build a database of characteristics for a word?
Is there such a database available? How could I tag a word, and how could I programmatically find all its attributes? I was thinking of crawling Wikipedia or some forum, but I can't see that producing a reliable word-tag database.
Any ideas on how I could achieve such a thing? Any recommendations for literature on the subject?
Thank you
This sounds like a basic classification problem. You're essentially asking; given N features (color=blue, location=up, etc), which of M classifications is the most likely? There are many algorithms for accomplishing this (Naive Bayes, Maximum Entropy, Support Vector Machine), but you'll have to investigate which is the most accurate and easiest to implement. The biggest challenge is typically acquiring accurate training data, but if you're willing to restrict it to a list of manually entered examples, then that should simplify your implementation.
Your example suggests that whatever algorithm you choose will have to support sparse data. In other words, if you've trained the system on 300 features, it won't require you to enter all 300 features in order to get an answer. It'll also make your training and testing files smaller, because you'll omit features that are irrelevant for certain objects, e.g.
sky | color:blue,location:up
tree | has_bark:true,has_leaves:true,is_an_organism:true
cat | has_fur:true,eats_mice:true,is_an_animal:true,is_an_organism:true
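As a minimal sketch of how such sparse records can be scored, here is a hand-rolled Naive-Bayes-style classifier in Python; the training records mirror the examples above, and the smoothing constant is an arbitrary choice:

```python
import math

# Toy training data in the sparse format shown above: each object lists
# only the features that are relevant to it.
training = {
    "sky":  {"color:blue", "location:up"},
    "tree": {"has_bark:true", "has_leaves:true", "is_an_organism:true"},
    "cat":  {"has_fur:true", "eats_mice:true", "is_an_animal:true",
             "is_an_organism:true"},
}

def classify(features, alpha=0.5):
    """Naive-Bayes-style log-likelihood scoring with Laplace smoothing."""
    scores = {}
    for obj, obj_feats in training.items():
        score = 0.0
        for f in features:
            # Smoothing keeps one unseen feature from ruling an object out.
            p = (1.0 + alpha) if f in obj_feats else alpha
            score += math.log(p / (1.0 + 2 * alpha))
        scores[obj] = score
    return max(scores, key=scores.get)

print(classify({"color:blue", "location:up"}))  # -> sky
```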
It might not be terribly helpful, since it's proprietary, but a commercial application that's similar to what you're trying to accomplish is the website 20q.net, albeit the system asks the questions instead of the user. It's interesting in that it's trained "online" based on user input.
Wikipedia certainly has a lot of data, but you'll probably find extracting that data for your program very difficult. Cyc's data is more normalized, but its API has a huge learning curve. Another option is the semantic dictionary project WordNet. It has reasonably intuitive APIs for nearly every programming language, as well as an extensive hypernym/hyponym model for thousands of words (e.g. cat is a type of feline/mammal/animal/organism/thing).
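For instance, WordNet's hypernym chain for "cat" can be walked in a few lines through NLTK's WordNet interface (a quick sketch; it assumes the WordNet corpus has been downloaded):

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # one-time corpus download

# Walk the hypernym chain of the first noun sense of "cat":
# cat -> feline -> carnivore -> ... -> animal -> organism -> ... -> entity
synset = wn.synsets("cat", pos=wn.NOUN)[0]
while True:
    print(synset.name())
    hypernyms = synset.hypernyms()
    if not hypernyms:
        break
    synset = hypernyms[0]
```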
The Cyc project has very similar aims: I believe it contains both inference engines to perform the AI, and databases of facts about commonsense knowledge (like the colour of the sky).
