Where can I find SVG or GeoJSON of countries that uses 2- or 3-letter country codes?

I was going to use the Polymaps.org library (combined with Protovis) to create a nice vector-based world map. However, their example (http://polymaps.org/ex/world.html) uses a GeoJSON file from Thematic Mapping in which the countries are coded by name instead of by their 2-letter country codes.
When I pair up my data, I have problems with things like "Russia" vs "Republic of Russia". Does anybody know of a GeoJSON file for countries that uses the ISO 2- or 3-letter codes? It seems crazy to use the names.
Any other SVG-type file would be useful too. I could create one, but I feel like it must exist out there and I just don't know how to find it.

It's not exactly what you want, since it's not in GeoJSON format:
http://vis.stanford.edu/protovis/ex/countries.js

Maybe this is what you want? World Countries Information and IP geocoding RESTful Web services API
Happy coding :-)
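If the GeoJSON you end up with is still keyed by country name, a minimal Java sketch like the one below can at least re-key your own data by ISO 3166-1 codes, using the country list bundled with the JDK (`Locale.getISOCountries()`, `getDisplayCountry`, `getISO3Country`). This is a workaround, not something from the answers above, and it assumes your names match the JDK's English display names; names such as "Russian Federation" will not, so unmatched entries still need a small hand-made alias table.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CountryCodes {
    // Builds a lookup from lower-cased English country names to ISO 3166-1 alpha-2 codes,
    // using the country data that ships with the JDK.
    public static Map<String, String> nameToAlpha2() {
        Map<String, String> lookup = new HashMap<>();
        for (String code : Locale.getISOCountries()) {
            Locale country = new Locale.Builder().setRegion(code).build();
            String name = country.getDisplayCountry(Locale.ENGLISH);
            lookup.put(name.toLowerCase(Locale.ROOT), code);
        }
        return lookup;
    }

    // Converts an alpha-2 code to its alpha-3 counterpart, e.g. "RU" -> "RUS".
    public static String alpha3(String alpha2) {
        return new Locale.Builder().setRegion(alpha2).build().getISO3Country();
    }

    public static void main(String[] args) {
        Map<String, String> lookup = nameToAlpha2();
        String code = lookup.get("russia"); // null if the name is not in the JDK's list
        System.out.println(code + " " + alpha3(code));
    }
}
```

Once your rows carry the code instead of the name, joining them to any code-keyed map file no longer depends on spelling.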

Related

Remove part of a string in each row of a large column of data in KNIME

I am stumped.
I have a column with several thousand rows of unique addresses of universities, pharma companies, etc. in a KNIME workflow.
Example:
55 Shattuck Street Boston Massachusetts 02115 US [NAT: US RES: US] for all designated states
What I need is to clean the data so that each row looks nice and computable, like this:
55 Shattuck Street Boston Massachusetts 02115 US.
My problem is that I can't seem to get the system to remove everything after US. Does anyone know a suitable approach in KNIME?
You should be able to use either String Replacer or String Manipulation for this. The first one lets you use either a simple wildcard or a full regular expression pattern while the second one uses a Java-like syntax - the choice comes down to how many different variations on the input data you need to handle and which syntax you prefer.
If you just need to remove any text between square brackets, including the space before the opening bracket, you can use String Replacer in regular-expression mode with a pattern that matches that text and an empty replacement.
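KNIME is Java-based, so its regex options should accept Java regular-expression syntax, and the replacement can be tried out in plain Java first. A sketch, assuming (as in the example row) that everything from the first `[` onward, including the trailing "for all designated states", should be dropped; the pattern here is my own, not the one from the original String Replacer configuration.

```java
public class StripBracketSuffix {
    // Removes any whitespace before the first '[' plus everything after it,
    // e.g. " [NAT: US RES: US] for all designated states".
    public static String clean(String address) {
        return address.replaceAll("\\s*\\[.*$", "");
    }

    public static void main(String[] args) {
        String raw = "55 Shattuck Street Boston Massachusetts 02115 US [NAT: US RES: US] for all designated states";
        System.out.println(clean(raw));
    }
}
```

Rows without a `[` pass through unchanged, so the same pattern can be applied to the whole column.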
Besides the nodes already mentioned by nekomatic, which will work perfectly for the given scenario, there's also a user-friendly regular expression tool in the Palladian nodes extension called Regex Extractor, which lets you build your regexes with a live preview, as you might know from popular online regex testers.
For your scenario, you could e.g. set up a regex like this:
^(?<address>.*)(?:\s\[.*)
In prose: capture all characters up to a space followed by an opening square bracket, and output them into a column named address.
The Palladian extension is available as a free plugin for KNIME Desktop and provides a variety of tools for web, text, and geo data mining and classification.
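The same named-group pattern works unchanged with `java.util.regex`, which is a quick way to check what ends up in the address column. In this sketch, returning the input unchanged when there is no " [" suffix is my own choice; I don't know how Regex Extractor itself handles non-matching rows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressExtractor {
    // The pattern from the answer: the named group "address" captures everything
    // before a whitespace character followed by an opening square bracket.
    private static final Pattern ADDRESS = Pattern.compile("^(?<address>.*)(?:\\s\\[.*)");

    // Returns the captured address, or the input unchanged if the pattern does not match.
    public static String extract(String raw) {
        Matcher m = ADDRESS.matcher(raw);
        return m.find() ? m.group("address") : raw;
    }

    public static void main(String[] args) {
        System.out.println(extract(
                "55 Shattuck Street Boston Massachusetts 02115 US [NAT: US RES: US] for all designated states"));
    }
}
```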

Named entity recognition - tagging tools

Does anyone have a recommendation for a tagging tool for NER types in raw text?
The input for the tool should be a library of text files (simple .txt format), there should be a convenient UI for selecting words and setting the tag/annotation that fits the selection, and the output should be a structured representation of the tags (e.g. start index, end index, and tag, in JSON format).
Founder of LightTag here.
We provide a super convenient interface to do span annotations such as named entity recognition, classifications and relationships.
You can work as a single labeler or bring in a team, and LightTag will distribute work among everyone automatically (no more selecting files and remembering what you have already labeled).
You can upload your own suggestions and let labelers use those, or use LightTag's built-in model.
Of course you can annotate at the character level and highlight subwords or multi word phrases.
You can try MER (written in bash): https://github.com/lasigeBioTM/MER
See the demo at http://labs.fc.ul.pt/mer/
Online tools:
I guess Dataturks' POS tool should work fine for your use case; you can just upload your data and specify the labels. The UI seems convenient enough.
Here is the link:
https://dataturks.com
It's an online tool, so you can work with multiple people to get the tagging done.
The exact output format you are looking for is not supported, but it is easy to convert: the output looks like word___LABEL word2___LABEL, so a simple two-line script can turn it into start and end indices.
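A minimal Java sketch of that conversion, under several assumptions that are mine rather than Dataturks' documented format: tokens are separated by single spaces, the last "___" in a token separates word from label, untagged tokens carry the label "O" and are skipped, offsets refer to the space-joined text, and `end` is exclusive. Labels are assumed not to need JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenLabelsToSpans {
    // Converts "word___LABEL word2___LABEL ..." into JSON objects with
    // character offsets into the text reconstructed by joining words with spaces.
    public static List<String> toSpans(String tagged) {
        List<String> spans = new ArrayList<>();
        int offset = 0;
        for (String token : tagged.split(" ")) {
            int sep = token.lastIndexOf("___");
            String word = sep >= 0 ? token.substring(0, sep) : token;
            String label = sep >= 0 ? token.substring(sep + 3) : "O";
            if (!label.equals("O")) {
                spans.add("{\"start\": " + offset + ", \"end\": " + (offset + word.length())
                        + ", \"tag\": \"" + label + "\"}");
            }
            offset += word.length() + 1; // +1 for the separating space
        }
        return spans;
    }

    public static void main(String[] args) {
        // Hypothetical input in the word___LABEL format described above.
        System.out.println(toSpans("Victor___PERSON lives___O in___O California___LOCATION"));
    }
}
```

For the sample input, "Victor" spans 0 to 6 and "California" spans 16 to 26 in "Victor lives in California".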
Offline:
Another tool you can check out is Prodigy; it's downloadable software that does similar things, though you will have to pay for it upfront.
https://prodi.gy

How can you create a search that will search within a KML and display the results on a Google Map v3?

I've created a Google map that loads a KML file as an overlay. It is a map of trailheads, e.g. for hiking. What I'm trying to figure out now is how to create a search that lets visitors search within the KML's data and shows the relevant trailhead(s) as results on the Google Map. Is this possible? I have a Google search that lets them search for an address, but it does NOT search within the KML file's data for a trailhead.
Ideally the visitor could input an address, say 12345 Main St., Chicago, IL, and it would display results within a specified distance, say ten miles, of that address (i.e. of its latitude and longitude).
I'm a little lost as to even where to begin.
Thanks for your help!
Davis
I don't know how often your KML file updates, but I recommend also storing all the KML data in a database to make this easier. Every once in a while, re-download the KML file and update the database.
Then it's as simple as using the haversine formula to search the database for trails near the address's latitude and longitude.
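A minimal in-memory sketch of the haversine search in Java. The `Trailhead` class and all coordinates are hypothetical; in practice the rows come from the database, and the search point from geocoding the visitor's address. Filtering in a loop keeps the example self-contained; with many trailheads you would typically push a bounding-box pre-filter into the SQL query first.

```java
import java.util.ArrayList;
import java.util.List;

public class TrailheadSearch {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean Earth radius

    // Hypothetical trailhead row, e.g. one placemark copied from the KML into the database.
    static final class Trailhead {
        final String name;
        final double lat, lng;
        Trailhead(String name, double lat, double lng) {
            this.name = name; this.lat = lat; this.lng = lng;
        }
    }

    // Great-circle distance in miles between two points in decimal degrees (haversine formula).
    static double haversineMiles(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        // min() guards against rounding pushing the argument slightly above 1.
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Returns the names of all trailheads within maxMiles of (lat, lng).
    static List<String> within(List<Trailhead> trailheads, double lat, double lng, double maxMiles) {
        List<String> hits = new ArrayList<>();
        for (Trailhead t : trailheads) {
            if (haversineMiles(lat, lng, t.lat, t.lng) <= maxMiles) hits.add(t.name);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up coordinates for illustration only.
        List<Trailhead> trailheads = new ArrayList<>();
        trailheads.add(new Trailhead("Lakefront trailhead", 41.90, -87.62));
        trailheads.add(new Trailhead("Far-away trailhead", 43.00, -89.40));
        System.out.println(within(trailheads, 41.8781, -87.6298, 10.0));
    }
}
```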
What you're describing sounds like a good job for Fusion Tables. Fusion Tables give you a nice way to store and edit the data (even collaboratively). In addition, there are geospatial columns/data fields you can add (aka a "Location" column that can hold an address or lat/long coordinates). Put all the trailheads in your Fusion Table and you can map them. Let people enter an address or lat/long, and you can query the Fusion Table to show all trailheads within the user-specified distance of that point. See the tutorials to get started.
You can use a KML search tool to do this; it supports KML, KMZ, CSV, and GPX files.

How to compare String values in different languages in Java?

In my web application I am using two languages: English and Arabic.
I have a search box in my web application: if we search by name or part of a name, it retrieves the matching users from the DB, first comparing the "Hometown" of the user.
Explanation:
For example, if a user's hometown is "California" and he searches for a name, say "Victor", my query first selects the people who have the same hometown, "California", and then searches that list for the *name* "Victor", retrieving the users who have "California" as their hometown and "Victor" in their name or part of it.
The problem: if the hometown "California" is saved in English, the comparison works and the values are retrieved. But when "California" is saved in Arabic as "كاليفورنيا", the hometown comparison fails and the values can't be retrieved.
I would like my query to treat both as the same hometown and retrieve the values. Is that possible?
What alternative should I consider for this comparison logic? I am confused. Any suggestions?
EDIT:
*I have an idea: once the hometown is retrieved, is it possible to use Google Translate or a transliterator to convert the hometown into the other language (English to Arabic, or Arabic to English) and return the combined search results for both? Any suggestions?*
The problem you encounter is that you want / need information in 2 or more languages and you want the users of your application to be able to use both languages. One possible approach is to keep multiple records per item and include a language code as part of the primary key. For instance, if your record is
id hometown name
001 California Victor
you could introduce a language code and store
id lang hometown name
001 en California Victor
001 ar كاليفورنيا Victor
then your search would match either "California" or "كاليفورنيا", giving you the id 001, which you can then use to load all translations of your data (or just the data in the current output language). This scheme can be used with any number of languages and has the added advantage that you don't need to prefill the table: you can add new translations for records as they become known.
(Caveat: I just repeated your Arabic string, since I can't read it. The language codes here are ISO 639-1 codes, in which 'ar' is Arabic.)
Does the Arabic sound like "California"? If so, you will need to compare on a "sounds-like" basis, which will most likely require a phoneme conversion.
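A minimal in-memory sketch of that one-row-per-language scheme in Java, standing in for the database table. The `Row` class, the sample rows (including the made-up "002 Texas Victoria" record), and the case-insensitive substring match on the name are assumptions for illustration; with a real table the same logic becomes a `WHERE hometown = ? AND name LIKE ?` query followed by a lookup on `(id, lang)`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

public class MultilingualSearch {
    // The Arabic spelling of "California" from the question, written as Unicode
    // escapes so this file compiles regardless of the source encoding.
    static final String CALIFORNIA_AR = "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627";

    // One row per (id, lang) pair, mirroring the table above.
    static final class Row {
        final String id, lang, hometown, name;
        Row(String id, String lang, String hometown, String name) {
            this.id = id; this.lang = lang; this.hometown = hometown; this.name = name;
        }
    }

    // Returns the ids of records whose hometown matches in ANY language and whose
    // name contains nameQuery (case-insensitive).
    static Set<String> search(List<Row> rows, String hometown, String nameQuery) {
        String needle = nameQuery.toLowerCase(Locale.ROOT);
        Set<String> ids = new LinkedHashSet<>();
        for (Row r : rows) {
            if (r.hometown.equals(hometown) && r.name.toLowerCase(Locale.ROOT).contains(needle)) {
                ids.add(r.id);
            }
        }
        return ids;
    }

    // Loads the translation of a record in the requested output language, if one exists.
    static Optional<Row> load(List<Row> rows, String id, String lang) {
        for (Row r : rows) {
            if (r.id.equals(id) && r.lang.equals(lang)) return Optional.of(r);
        }
        return Optional.empty();
    }

    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("001", "en", "California", "Victor"));
        rows.add(new Row("001", "ar", CALIFORNIA_AR, "Victor"));
        rows.add(new Row("002", "en", "Texas", "Victoria"));
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        // Searching with the Arabic hometown finds id 001; the English row is then loaded for display.
        for (String id : search(rows, CALIFORNIA_AR, "vic")) {
            load(rows, id, "en").ifPresent(r -> System.out.println(r.id + " " + r.hometown + " " + r.name));
        }
    }
}
```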
Transliterate all names into the same language (e.g. English) for searching, and use Levenshtein edit distance to compute the similarity between the phonetic representations of the names. This will be slow if you simply compare your query with every name, but if you pre-index all of the place names in your database into a Burkhard-Keller tree (BK-tree), they can be searched efficiently by edit distance from the query term.
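A compact Java sketch of the Levenshtein distance plus a Burkhard-Keller tree. It assumes the names have already been transliterated into Latin letters; the romanized spelling "kaliforniya" and the other place names are hypothetical index entries, and no phonetic encoding is applied here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BkTree {
    // Classic dynamic-programming Levenshtein distance (insert/delete/substitute cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    private static final class Node {
        final String word;
        final Map<Integer, Node> children = new HashMap<>(); // edge label = distance to this word
        Node(String word) { this.word = word; }
    }

    private Node root;

    // Inserts a word by walking down the edge whose label equals its distance to each node.
    public void add(String word) {
        if (root == null) { root = new Node(word); return; }
        Node node = root;
        while (true) {
            int d = levenshtein(word, node.word);
            if (d == 0) return; // already present
            Node child = node.children.get(d);
            if (child == null) { node.children.put(d, new Node(word)); return; }
            node = child;
        }
    }

    // Returns every word within maxDistance of query. By the triangle inequality, only
    // children whose edge label lies in [d - maxDistance, d + maxDistance] can contain matches.
    public List<String> search(String query, int maxDistance) {
        List<String> results = new ArrayList<>();
        if (root == null) return results;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            int d = levenshtein(query, node.word);
            if (d <= maxDistance) results.add(node.word);
            for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
                if (Math.abs(e.getKey() - d) <= maxDistance) stack.push(e.getValue());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        BkTree tree = new BkTree();
        for (String place : new String[] {"kaliforniya", "texas", "nevada", "florida"}) tree.add(place);
        System.out.println(tree.search("california", 2));
    }
}
```

Raising `maxDistance` widens the net at the cost of visiting more nodes; sorting the results by distance gives the "how close" ranking the answer describes.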
This technique allows you to sort names by how close they actually match. You're probably more likely to find a match this way than using metaphone or double-metaphone, though this is more difficult to implement.
Your Google suggestion sounds like it might also be a good one, but you should play around with it, and be sure that you're happy with its accuracy. In testing how it worked going between Hebrew and English, I noticed that sometimes Google just leaves English place names in English letters when translating to Hebrew.
How about using some localization on the client side to display values? Or create a wrapper class for hometown that overrides equals(Object) (and hashCode(), so the two stay consistent) in such a way that the instance for California returns true for both "California" and "كاليفورنيا" (sorry if I made a mistake here; I just copy-pasted it from above).
This sounds like a classic encoding problem. Whenever you transfer non-ASCII characters, you need to make sure you're encoding them correctly. UTF-8 can encode both Arabic and English text, so it is a safe choice here.
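A minimal Java sketch of such a wrapper. The alias table, and the choice of "US-CA" as the canonical key, are hypothetical; in a real application the aliases would be loaded from the database. Overriding `hashCode()` alongside `equals()` is what lets the wrapper work as a `HashMap` key or in a `HashSet`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public final class Hometown {
    // Hypothetical alias table mapping every known spelling to one canonical key.
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("california", "US-CA");
        // The Arabic spelling from the question, as Unicode escapes.
        CANONICAL.put("\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627", "US-CA");
    }

    private final String raw;
    private final String key;

    public Hometown(String raw) {
        this.raw = raw;
        String normalized = raw.trim().toLowerCase(Locale.ROOT);
        this.key = CANONICAL.getOrDefault(normalized, normalized); // unknown names compare by themselves
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Hometown && key.equals(((Hometown) o).key);
    }

    @Override
    public int hashCode() {
        return key.hashCode(); // must agree with equals() for hash-based collections
    }

    @Override
    public String toString() {
        return raw;
    }

    public static void main(String[] args) {
        Hometown en = new Hometown("California");
        Hometown ar = new Hometown("\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627");
        System.out.println(en.equals(ar)); // true
    }
}
```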
In your setup you will probably have the following points:
    Browser <-> Servlet container <-> Database
                        |
                    System.out
At any interface where chars (16-bit) are converted to bytes (8-bit), you need to make sure the encoding is correct.
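A small self-contained Java demonstration of why each char/byte boundary matters: encoding and decoding with the same charset preserves the Arabic text, while decoding with a mismatched one (ISO-8859-1 is used here purely as an example) fails silently rather than throwing. The string is the hometown from the question, written as Unicode escapes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // The Arabic hometown from the question, as Unicode escapes (source-encoding independent).
    static final String HOMETOWN_AR = "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627";

    // Simulates one interface: chars -> bytes with one charset, bytes -> chars with another.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Each of these ten Arabic letters takes 2 bytes in UTF-8.
        System.out.println(HOMETOWN_AR.getBytes(StandardCharsets.UTF_8).length); // 20
        // Matching charsets on both sides: the string survives.
        System.out.println(roundTrip(HOMETOWN_AR, StandardCharsets.UTF_8, StandardCharsets.UTF_8)
                .equals(HOMETOWN_AR)); // true
        // Mismatched charsets: no exception, just garbled text (mojibake).
        System.out.println(roundTrip(HOMETOWN_AR, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)
                .equals(HOMETOWN_AR)); // false
    }
}
```

A garbled hometown like the last case would never match the stored "California" row, which is one way the symptom in the question can arise.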
Browser to Servlet container
When you send GET or POST requests from a web page, the browser encodes the parameters using the page's character set. It takes this from the HTTP headers sent by the server, especially Content-Type: text/html; charset=UTF-8, which, if present, overrides the HTML meta header <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">.
On the servlet container side, HttpServletRequest.getParameter() decodes with an encoding that you most likely need to set in the server settings. (In Tomcat, URIEncoding applies to GET query strings; for POST bodies, call request.setCharacterEncoding("UTF-8") before reading any parameter.)
Example: Tomcat's server.xml
<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"
maxThreads="2000"
connectionTimeout="20000"
redirectPort="8443" />
Servlet container to Database
The database needs to use the correct encodings, or sorting etc. will not work correctly.
Example: my.cnf for MySQL
[mysqld]
....
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
default-character-set=utf8
character-set-server = utf8
collation-server = utf8_general_ci
[mysql]
....
default-character-set=utf8
Then the JDBC driver needs to be configured for UTF-8.
Example JDBC connection string
jdbc:mysql://localhost:3306/rimario?useUnicode=true&characterEncoding=utf-8
System.out
System.out.println() cannot be relied upon to verify things. First, it depends on the JVM's default encoding, set with the system property -Dfile.encoding=UTF-8; second, the terminal that displays the System.out output must be set to, and support, UTF-8. Don't trust System.out!
Once a String in the VM holds the correct characters, it is no longer affected by encoding. In memory, Java strings are UTF-16: each char is 16 bits, and characters outside the Basic Multilingual Plane take two chars (a surrogate pair), so every character UTF-8 can encode can be represented. You can write the string to a file and inspect the file to be sure you have the correct chars in your VM.
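A Java sketch of two console-independent checks: printing each code point as `U+XXXX` (plain ASCII output, so the terminal's encoding doesn't matter) and writing the string to a temporary file with an explicitly chosen charset for inspection in a UTF-8-aware editor or hex viewer. The temp-file name prefix is arbitrary.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.stream.Collectors;

public class InspectString {
    // Renders each code point as U+XXXX, so the check does not depend on the terminal's encoding.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format(Locale.ROOT, "U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) throws IOException {
        // The Arabic hometown from the question, as Unicode escapes.
        String hometown = "\u0643\u0627\u0644\u064A\u0641\u0648\u0631\u0646\u064A\u0627";
        System.out.println(codePoints(hometown)); // ASCII-only output, safe on any console

        // Write with an explicit charset rather than the JVM default, then inspect the file.
        Path out = Files.createTempFile("hometown", ".txt");
        Files.write(out, hometown.getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out);
    }
}
```

If the printed code points fall in the Arabic block (U+0600 to U+06FF), the string in the VM is correct and any remaining problem lies at one of the byte boundaries.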

United States State shapes for Office

I want to create visuals along the lines of CNN's "red-state, blue-state" shadings of the states in the U.S. for my project. I'm planning to do something fancier than just shading each state's shape in a color. Are there open-source libraries of state shapes/polygons (or, if not open source, others) that I can import into Word, Excel, etc. and use to show complex graphs based on states?
I have MapPoint, but haven't been able to figure out how to shade the states in a complex way.
You could try Google Charts; it looks like http://www.woot.com is doing something similar to what you need.
Here is a good example using Google Maps. I've used code like that before, perhaps from this exact example.
http://econym.org.uk/gmap/example_states2.htm
EDIT: you might want to consider converting states.xml into JSON; it'll be smaller (it's 136 KB of XML right now!) and should load faster in most browsers.
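A one-off conversion could be sketched in Java with the JDK's built-in DOM parser. The XML layout assumed here (`<states><state name="..."><point lat="..." lng="..."/></state></states>`) is hypothetical; check the element and attribute names in the actual states.xml before adapting it. Coordinates are parsed as doubles so malformed values fail loudly instead of producing invalid JSON.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StatesXmlToJson {
    // Converts XML of the assumed form
    //   <states><state name="..."><point lat="..." lng="..."/>...</state></states>
    // into {"name":[[lat,lng],...],...}. Element and attribute names are hypothetical.
    public static String convert(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder json = new StringBuilder("{");
        NodeList states = doc.getElementsByTagName("state");
        for (int i = 0; i < states.getLength(); i++) {
            Element state = (Element) states.item(i);
            if (i > 0) json.append(',');
            json.append('"').append(escape(state.getAttribute("name"))).append("\":[");
            NodeList points = state.getElementsByTagName("point");
            for (int j = 0; j < points.getLength(); j++) {
                Element p = (Element) points.item(j);
                if (j > 0) json.append(',');
                double lat = Double.parseDouble(p.getAttribute("lat"));
                double lng = Double.parseDouble(p.getAttribute("lng"));
                json.append('[').append(lat).append(',').append(lng).append(']');
            }
            json.append(']');
        }
        return json.append('}').toString();
    }

    // Minimal JSON string escaping for backslashes and double quotes.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) throws Exception {
        String sample = "<states><state name=\"Test\"><point lat=\"1.5\" lng=\"-2.25\"/>"
                + "<point lat=\"3\" lng=\"4\"/></state></states>";
        System.out.println(convert(sample));
    }
}
```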
There might be a couple parts to the question you are asking, but to address the first part "Are there open source libraries of state shapes/polygons...", here's a resource to check out:
http://commons.wikimedia.org/wiki/Category:SVG_maps_of_the_United_States
It's a list of various SVG (Scalable Vector Graphics) files, which can be imported into a number of applications. SVG is basically a giant XML representation of lines and endpoints, and it can be converted directly to XAML if you're into a more programmatic way of charting (i.e. C# with Silverlight).
However, to address the second part regarding MS Office: Visio can also import SVG files for manipulation. I'm not sure what type of graphs you were looking for, but I hope this helps in some small way on your path to awesomeness ;)
