Is there an easy way to detect a user's region from their IP address? I'd assume that each region has specific IPv4 ranges assigned.
I don't want to rely on any 3rd-party service (except an initial data import).
Precision does not need to be 100%, but should be reasonably high (at least ~80%).
I only want to guess the user's region (Europe, Asia, Africa, ...). No need for city or country.
Not sure what language you're using, but you can do it with GeoIP or similar modules. If you don't want any 3rd-party libraries at all, you can build a function based on the IANA number resources at https://www.iana.org/number; look in particular under "IPv4 Address Space".
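As a rough sketch of that idea in Python (the table below is only a tiny, illustrative excerpt; the full first-octet-to-registry mapping would be imported once from IANA's "IPv4 Address Space" registry, and registry regions only roughly correspond to continents, so expect accuracy in the ~80% range at best):

# Illustrative excerpt only: map the first octet of an IPv4 address to the
# allocating registry's broad region, per IANA's IPv4 Address Space registry.
FIRST_OCTET_TO_REGION = {
    1: "Asia/Pacific",     # APNIC
    2: "Europe",           # RIPE NCC
    3: "North America",    # ARIN
    41: "Africa",          # AFRINIC
    177: "Latin America",  # LACNIC
}

def guess_region(ipv4):
    first_octet = int(ipv4.split(".")[0])
    return FIRST_OCTET_TO_REGION.get(first_octet, "Unknown")

print(guess_region("41.12.34.56"))  # -> "Africa"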
I am writing an application that queries DNS SRV records to find an internal service for the domain obtained from an email address. Is it correct to do the following?
Let's say the email domain is test.example.com.
Query SRV record _service._tcp.test.example.com
No SRV record is returned
Now query SRV record _service._tcp.example.com
A record is returned. Hence use this record to connect
Is the above approach right? Assuming it's not, are there any RFCs or standards that prevent an application from doing this?
Is the above approach right?
No, it is not. You should not "climb" to the root.
There is nothing in the RFCs explicitly telling you not to do that, and you will even find some specifications telling you to climb towards the root; see the CAA specifications (although they had to be changed over the years precisely because of ambiguity around that climbing part).
Most of the time, such climbing creates more problems than it solves, and it all comes down to "finding the administrative boundaries", which looks far simpler than it really is.
If we go back to your example: you say to use _service._tcp.test.example.com and then _service._tcp.example.com, and then I suppose you stop there, because you "obviously" know that you shouldn't query _service._tcp.com as the next step; you "know" that example.com and com are not under the same administrative boundary, so you shouldn't cross that limit.
OK, yes, in that specific example (and TLD) things seem simple. But imagine an arbitrary name, let us say www.admin.santé.gouv.fr: how do you know where to stop climbing?
It is a difficult problem in full generality. Attempts were made to solve it (see the IETF DBOUND working group) and failed. You basically have only two avenues if you need to pursue this: either find delegations (zone cuts) through live DNS queries (not every delegation is a new administrative boundary, but a change of administration should mean a delegation; and since there is not necessarily a delegation at each dot, you cannot find all of this just by looking at the string), or use the Mozilla Public Suffix List, which has a lot of drawbacks.
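As an illustration of the first avenue, here is a minimal sketch using dnspython (assumed installed; the name is illustrative and written without the accent to sidestep IDNA handling). It lets the resolver walk SOA records to find the enclosing zone, i.e. the nearest zone cut above the name:

import dns.resolver

# zone_for_name() issues live SOA queries to find the zone containing the name.
# Remember: a zone cut is not necessarily an administrative boundary.
zone = dns.resolver.zone_for_name("www.admin.sante.gouv.fr")
print(zone)  # e.g. "sante.gouv.fr." or "gouv.fr.", depending on the delegations

For the second avenue, libraries such as tldextract wrap the Public Suffix List, with the drawbacks mentioned above.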
This is all basically a rehash of what you can read in "§4. Zone Boundaries are Invisible to Applications" of RFC5507, quoting the core part here:
The false assumption has lead to an approach called "tree climbing",
where a query that does not receive a positive response (either the
requested RRSet was missing or the name did not exist) is retried by
repeatedly stripping off the leftmost label (climbing towards the
root) until the root domain is reached. Sometimes these proposals
try to avoid the query for the root or the TLD level, but still this
approach has severe drawbacks:
[..]
o For reasons similar to those outlined in RFC 1535 [RFC1535],
querying for information in a domain outside the control of the
intended entity may lead to incorrect results and may also put
security at risk. Finding the exact policy boundary is impossible
without an explicit marker, which does not exist at present. At
best, software can detect zone boundaries (e.g., by looking for
SOA Resource Records), but some TLD registries register names
starting at the second level (e.g., CO.UK), and there are various
other "registry" types at second, third, or other level domains
that cannot be identified as such without policy knowledge
external to the DNS.
Note also the example the RFC gives for MX records: with a naive view you might apply the same climbing algorithm there, but as the RFC says:
To restate, the zone boundary is purely a boundary that exists in the
DNS for administrative purposes, and applications should be careful
not to draw unwarranted conclusions from zone boundaries. A
different way of stating this is that the DNS does not support
inheritance, e.g., an MX RRSet for a TLD will not be valid for any
subdomain of that particular TLD.
There are various examples of people having tried to climb to the root... and creating a lot of problems:
in the past, Microsoft and wpad.dat: https://news.softpedia.com/news/wpad-protocol-bug-puts-windows-users-at-risk-504443.shtml
more recently, Microsoft again about email autodiscover: https://www.zdnet.com/article/design-flaw-in-microsoft-autodiscover-abused-to-leak-windows-domain-credentials/
So, in short, without a solid understanding of the DNS, please do not build anything that "climbs" towards the root. Note that RFC 2782 on SRV records gives "Usage Rules" that include no case of climbing towards the root.
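For completeness, here is a minimal sketch of the non-climbing lookup (dnspython 2.x assumed installed; the service name and domain are illustrative): query the SRV record for the exact domain only, and treat a missing record as "service not offered" rather than retrying on the parent domain:

import dns.resolver

def lookup_srv(service, proto, domain):
    qname = f"_{service}._{proto}.{domain}"
    try:
        answers = dns.resolver.resolve(qname, "SRV")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []  # no SRV record: do NOT retry at example.com, com, ...
    # return (priority, weight, target, port) tuples, lowest priority first
    return sorted((r.priority, r.weight, str(r.target), r.port) for r in answers)

print(lookup_srv("service", "tcp", "test.example.com"))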
You are not fully explaining why you are considering this. I suggest you have a look at the new HTTPS/SVCB DNS records (RFCs not published yet, but the RR type codepoints are already assigned by IANA and already in use by Apple, Cloudflare and Google), as they provide a feature set similar to SRV and may be more relevant for your use case.
I'm using the Google Places API for my project and I found that different places will have different localized names in their address components.
For instance:
some places in Lisbon (Portugal) will have a locality name of Lisbon while others will have Lisboa.
some places in Barcelona will have an administrative_area_level_1 name of Catalonia, while others will have Catalunya.
My questions are:
Is there a way to get consistent results using the same reference language?
Is there a way to help Google fix this inconsistent behavior?
PS: my purpose is to be able to perform text-based searches over Google Places API data, and these localization differences are not helping.
This may be working as intended, or an inconsistency in how some places have their address registered on Google Maps.
It is the intended behavior that the local language (Lisboa, Cataluña) is used for street-level addresses, while the user's preferred language is used for the other places (postal codes and political entities). Reverse geocoding while preferring a non-local language shows this:
https://maps.googleapis.com/maps/api/geocode/json?&latlng=38.716322,-9.149895&language=en
street_address: R. Cecílio de Sousa 84, 1200-009 Lisboa, Portugal
locality: Lisbon, Portugal
https://maps.googleapis.com/maps/api/geocode/json?&latlng=41.738528,1.851196&language=en
street_address: C-16C, 2, 08243 Manresa, Barcelona (Cataluña), Spain
postal_code: 08243 Manresa, Barcelona (Catalunya), Spain
(Cataluña/Catalunya/Catalonia is not usually in formatted_address)
However, there may be addresses that were registered without correctly linking to the appropriate political entities, e.g. using "Catalonia" as a hard-coded address component instead of as a reference to the administrative_area_level_1 itself. These would appear with inconsistent names, even for street-level addresses. Such cases should be rare, but please consider filing a bug when you find one.
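So one practical mitigation for question 1 is to always pass an explicit language parameter and read the address_components rather than relying on formatted_address. A minimal sketch (requests assumed installed; YOUR_API_KEY is a placeholder):

import requests

resp = requests.get(
    "https://maps.googleapis.com/maps/api/geocode/json",
    params={"latlng": "38.716322,-9.149895", "language": "en", "key": "YOUR_API_KEY"},
)
for result in resp.json().get("results", []):
    for component in result["address_components"]:
        if "locality" in component["types"]:
            print(component["long_name"])  # "Lisbon" when language=en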
Situation:
I have been tasked with geocoding and plotting addresses to a map of a city for a friend of the family.
I have a list of over 400 addresses. Included in those addresses are PO Boxes and addresses with Street Number, Direction, Street Name, Street Suffix (some do not have this), City, and Zip Code.
I tried to geocode all of the addresses with Geopy and Nominatim.
Issue:
I noticed that those addresses without street suffixes and PO Boxes could not be geocoded.
What I have done:
I have read most posts dealing with addresses, read the Geopy notes and google searched until the cows came home.
I ended up stumbling across a geocoding best-practices page which stated that PO Boxes cannot be mapped and that a street suffix is required for mapping:
http://www.gis.harvard.edu/services/blog/geocoding-best-practices
Question:
Is there a way to search for the street suffix of each street that is missing a street suffix?
Is there another free service or library, besides Nominatim and Geopy, that can work with the information I have and not require me to look up each individual street suffix in Google Maps?
Please advise!
I found out that using Geopy with Google's API can find the correct addresses that services like Nominatim, OpenCage and OpenMapQuest will not find.
There is one downside: the autocomplete can make it hard to determine whether the returned address is the correct one.
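A minimal sketch of that setup, assuming geopy is installed and you have a Google API key (YOUR_API_KEY and the address are placeholders):

from geopy.geocoders import GoogleV3

geolocator = GoogleV3(api_key="YOUR_API_KEY")

# Google's geocoder will often complete a missing street suffix on its own.
location = geolocator.geocode("123 Main, Springfield")
if location:
    print(location.address, location.latitude, location.longitude)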
First, speaking to the need to find an address that is missing a street suffix, you need to use address completion from an address validation service. Services that do address validation/verification use postal service data (and other data) and match address search inputs to real addresses. If the search input is not sufficiently specific, address validation services may return a handful of potential matches. Here is an example of a non-specific address (missing the State, zip code, and the street suffix) that returns two real addresses that match the search input. SmartyStreets can normally fill in the missing street suffix.
Second, speaking to the PO Box problem: some address services can give you geocode information, as well as other information that you may believe isn't available. For instance, this search shows the SmartyStreets service matching a PO Box number (that I just made up) to the local post office. The latitude and longitude in the response JSON corresponds to the post office when I search it on Google Maps.
Third, speaking to the problem of having a list of addresses: there are various address services that allow batch processes. For instance, it's a fairly common feature to allow a user to upload a spreadsheet of addresses. Here is the information page for SmartyStreets' tool.
There are multiple address services that can help you do all or some of these things. Depending on the service, they will provide some free functionality or have free tiers if you don't do very many searches. I am not aware of a service that does everything you need for free. You could probably use a few services together, like the Google Maps API via Geopy, etc., but it would take some effort to code up a script that puts them all together.
Full disclosure: I worked for SmartyStreets.
I am planning to build a subdomain-based service like WordPress or Tumblr.
I want to know what is the maximum number of subdomains a domain can have.
Well, the absolute theoretical maximum is 2^504 (a single maximal 63-octet label gives 256^63 = 2^504 possibilities), but that assumes no limitations on the octets making up the names. If you want the names limited to ASCII letters and digits, the answer is 111444219848545291112918149658401217019177846881717006276548100629318214534968256903948922840416256 (that is, 36^63).
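As a quick sanity check of those figures (counting only direct, single-label subdomains at the maximum 63-octet label length):

# arbitrary octets: 256 possible values per octet, 63 octets per label
print(256 ** 63 == 2 ** 504)   # True
# case-insensitive letters a-z plus digits 0-9: 36 values per position
print(36 ** 63)                # the ~1.1e98 figure quoted above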
In another sense the answer is "Far, far more than you will ever need".
This would be more dependent on the DNS server than on the standard. BIND allows a maximum of 16,777,216 objects per zone file, while Microsoft DNS is reported to be stable up to 20,000 objects per zone. This does not mean, however, that you will be able to max out those object limits, as your average website owner is going to want a meaningful name for their site's subdomain. Additionally, per the DNS specifications, a fully qualified domain name may be at most 255 characters, with no individual label (between dots) longer than 63 characters.
Effectively what this means is that while there are restrictions and limitations the practical answer is that you are unlikely to encounter limitations due to DNS specifications in any reasonable timeframe.
I have a column which is made up of addresses, as shown below.
Address
1 Reid Street, Manchester, M1 2DF
12 Borough Road, London, E12,2FH
15 Jones Street, Newcastle, Tyne & Wear, NE1 3DN
etc .. etc....
I want to split this into different columns to import into my SQL database. I have been trying to use FINDSTRING to separate on the commas, but I am having trouble when some addresses have more "sections" than others. Any ideas what's the best way to go about this?
Many thanks
This is a requirements specification problem, not an implementation problem. The more you can afford to assume about the format of the addresses, the more detailed parsing you will be able to do; the other side of the same coin is that the less you will assume about the structure of the address, the fewer incorrect parses you will be blamed for.
It is crucial to determine whether you will only need to process UK postal addresses, or whether worldwide addresses may occur.
Based on your examples, certain parts of the address seem to be always present, but please check this resource to determine whether they are really required in all UK postal addresses.
If you find a match between the depth of parsing that you need, and the assumptions that you can safely make, you should be able to keep parsing by comma indexes (FINDSTRING); determine some components starting from the left, and some starting from the right of the string; and keep all that remains as an unparsed body.
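As a rough illustration of that left/right approach (shown here in Python for brevity; in SSIS the same logic would live in a Script Transformation, as the other answer suggests), assuming the street line is always the leftmost comma-separated part and the postcode the rightmost:

def parse_address(raw):
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    return {
        "street": parts[0] if parts else None,              # from the left
        "postcode": parts[-1] if len(parts) > 1 else None,  # from the right
        "remainder": ", ".join(parts[1:-1]),                # town, county, ...
    }

print(parse_address("15 Jones Street, Newcastle, Tyne & Wear, NE1 3DN"))
# {'street': '15 Jones Street', 'postcode': 'NE1 3DN', 'remainder': 'Newcastle, Tyne & Wear'}

Note that the second example above ("E12,2FH") already breaks the postcode assumption, which is exactly the kind of input that forces you to decide how much you can safely assume.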
It may well turn out that your current task is mission impossible, especially where international postal addresses are involved. This is why most websites and other data collectors require the user to enter the postal address in an already parsed form.
Excellent points raised by Hanika. Some of your parsing will depend on what your target destination looks like. As an ignorant yank, based on Hanika's link, I'd think your output would look something like
Addressee
Organisation
BuildingName
BuildingAddress
Locality
PostTown
Postcode
BasicsMet (boolean indicating whether minimum criteria for a good address has been met.)
In the US, just because an address could not be properly CASS-certified doesn't mean it couldn't be delivered; case in point, my grandparents-in-law live in a small enough town that specifying their name and city is sufficient for delivery, as the local postal officials know who they are. For bulk mailings, though, their address would not qualify for the bulk mailing rate and would default to first-class mailing. I assume a similar scenario exists for UK mail.
The general idea is for each row flowing through, you'll want to do your best to parse the data out into those buckets. The optimal solution for getting it "right" is to change the data entry method to validate and capture data into those discrete buckets. Since optimal never happens, it becomes your task to sort through the dross to find your gold.
Whilst you can write some fantastic expressions with FINDSTRING, I'd advise against it in this case, as maintenance alone will drive you mad. Instead, add a Script Transformation and build your parsing logic in .NET (VB or C#). There will then be a cycle of running data through your transformation and having someone eyeball the results. If you find a new scenario, you go back and adjust your business rules. It's ugly, it's iterative, and it's prone to producing results that a human wouldn't have produced.
Alternatives to rolling your own address standardisation logic:
buy it. Eventually your business needs outpace your ability to cope with constantly changing business rules. There are plenty of vendors out there, but I'm only familiar with US-based ones.
upgrade to SQL Server 2012 to use DQS (Data Quality Services). You'll probably still need to buy a product to build out your knowledge base but you could offload the business rule making task to a domain expert ("Hey you, you make peanuts an hour. Make sure all the addresses coming out of this look like addresses" was how they covered this in the beginning of one of my jobs).