Code to do a direct DNS lookup

I'm thinking of running an experiment to track DNS values in different ways (like how often they change and so on). To do this I will need to be able to make a DNS request directly to a server so that 1) I know what server the response came from, 2) I can request responses from several servers, and 3) I can bypass the local OS's DNS cache.
Does anyone know of a library (C#, D, C, C++, in that order of preference) that will let me directly query a DNS server? Failing that, does anyone know of an easy-to-understand description of the DNS protocol that I could implement such a system from?

I have experience only with C, so here is my list:
libresolv is the old, traditional and standard way. It is available on every Unix (type man 3 resolver) and includes routines like res_query, which does more or less what you want. To query a specific name server, you typically update the global variable _res.nsaddr_list (do note that, apparently, this does not work with IPv6); see the sketch after this list.
ldns is the modern and shiny solution, with good documentation online.
A very common library, but apparently unmaintained, is adns.
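To make the libresolv option concrete, here is a minimal sketch (assuming a glibc system; link with -lresolv; the server 8.8.8.8 and the name example.com are placeholders) that points _res.nsaddr_list at one specific server, sends an A query with res_query, and walks the answer with the ns_* parsing helpers:

    /* Minimal sketch: query a specific DNS server with libresolv. */
    #include <stdio.h>
    #include <string.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <arpa/nameser.h>
    #include <resolv.h>

    int main(void)
    {
        unsigned char answer[NS_PACKETSZ];

        res_init();                                  /* load defaults from /etc/resolv.conf */
        _res.nscount = 1;                            /* then override: exactly one server   */
        _res.nsaddr_list[0].sin_family = AF_INET;
        _res.nsaddr_list[0].sin_port = htons(53);
        inet_pton(AF_INET, "8.8.8.8", &_res.nsaddr_list[0].sin_addr);

        int len = res_query("example.com", ns_c_in, ns_t_a, answer, sizeof answer);
        if (len < 0) {
            herror("res_query");
            return 1;
        }

        /* Walk the answer section with the ns_* parsing helpers. */
        ns_msg msg;
        ns_initparse(answer, len, &msg);
        for (int i = 0; i < ns_msg_count(msg, ns_s_an); i++) {
            ns_rr rr;
            if (ns_parserr(&msg, ns_s_an, i, &rr) == 0 && ns_rr_type(rr) == ns_t_a) {
                char ip[INET_ADDRSTRLEN];
                inet_ntop(AF_INET, ns_rr_rdata(rr), ip, sizeof ip);
                printf("%s has address %s\n", ns_rr_name(rr), ip);
            }
        }
        return 0;
    }

Note that _res is per-process global state, so this pattern is not thread-safe; for concurrent queries to different servers, use res_ninit/res_nquery with a private res_state instead.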

For C, I'd go with http://cr.yp.to/djbdns/blurb/library.html (the low-level parts if you need total control, i.e. dns_transmit* and friends) -- for C#, maybe http://www.c-sharpcorner.com/UploadFile/ivxivx/DNSClient12122005234612PM/DNSClient.aspx (can't test that one right now, whence the "maybe"!).

The DNS specification is spread over many RFCs (see a nice graph) and I would strongly advise against implementing a stub resolver from scratch: there are many opportunities to get it wrong, and DNS has evolved a lot in recent years. If you are brave and crazy, here are the most important RFCs:
RFC 1034, concepts
RFC 1035, format
RFC 2181, updates to the specification, fixing many errors and ambiguities
RFC 2671, EDNS (mandatory today)
RFC 3597, handling the unknown resource record types
and many others...

libdns (I think it's part of BIND). There's a Cygwin port which may be useful for Windows environments.
http://rpm2html.osmirror.nl/libdns.so.21.html

Machine learning and Security

I would like to ask you if it is possible to secure a server with AI/machine learning based on the following concepts:
1) the server is implemented in a way to recognize normal behavior (authorized access, modification, ...).
2) the server must recognize any abnormal behavior and adapt to it if encountered.
3) if an abnormal behavior is caught, it checks in some kind of pre-known threat list what type of threat it is and a possible solution for it; ELSE it adapts "by itself" and performs changes based on what the normal behavior must be.
PS: If there already is a system similar to this one please let me know.
Thank you for your help!
Current IDS/IPS systems for applications ("web application firewalls") are in part similar to this (the other part is usually plain pattern matching to find common or known attacks or attack classes). First you switch the WAF to "learning mode": it listens to traffic and stores patterns as normal behavior. Then you switch it to "prevention mode", and it stops any traffic that falls outside the ordinary flow.
The key question is which aspects of the data flows the system observes and learns in order to find anomalies. Basically, a WAF looks at HTTP queries to pages and learns parameter types and lengths, maybe clients as well; in prevention mode it does not allow a type or length mismatch (any request not matching the learned values is stopped at the WAF). The sketch below illustrates the idea.
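As a toy illustration of that learn-then-enforce split (not any particular WAF's implementation; the parameter names and the length-only rule are invented for brevity), a sketch in C:

    /* Toy learn/enforce sketch: record the longest value seen per
       parameter in learning mode, reject anything longer afterwards. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_PARAMS 64

    struct rule { char name[32]; size_t max_len; };
    static struct rule rules[MAX_PARAMS];
    static int nrules;

    static struct rule *find(const char *name)
    {
        for (int i = 0; i < nrules; i++)
            if (strcmp(rules[i].name, name) == 0) return &rules[i];
        return NULL;
    }

    /* Learning mode: widen the accepted length for this parameter. */
    static void learn(const char *name, const char *value)
    {
        struct rule *r = find(name);
        if (!r && nrules < MAX_PARAMS) {
            r = &rules[nrules++];
            snprintf(r->name, sizeof r->name, "%s", name);
        }
        if (r && strlen(value) > r->max_len) r->max_len = strlen(value);
    }

    /* Prevention mode: anything outside the learned profile is blocked. */
    static int allow(const char *name, const char *value)
    {
        struct rule *r = find(name);
        return r && strlen(value) <= r->max_len;
    }

    int main(void)
    {
        learn("user_id", "1234");                      /* normal traffic observed */
        printf("%d\n", allow("user_id", "42"));        /* 1: within profile       */
        printf("%d\n", allow("user_id", "1 OR 1=1--"));/* 0: longer than learned  */
        printf("%d\n", allow("debug", "1"));           /* 0: parameter never seen */
        return 0;
    }

A real WAF learns far more than length (types, character classes, client fingerprints), but the structure is the same.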
There are obvious drawbacks to this: the learning phase can never be long enough, learned rules will be either too generic or too specific, manual setup is tedious for a large application, etc.
Taking it to a more generic level would be very (very) difficult. Maybe with a deep neural network (so popular nowadays) you could better approximate a "real" AI that actually learns good and bad traffic patterns. Two obvious problems are getting patterns to teach it (how will you provide good and bad traffic examples in sufficient amounts for it to actually learn the difference?) and operational cost (running such a deep neural network would be very expensive, probably far more than a typical application breach would cost, and defenses should be proportionate to the risk).
Having said that, I think it's not impossible, but it will take a few years until we get there.
The general idea is interesting and there is a lot of research on this topic currently: https://github.com/Limmen/awesome-rl-for-cybersecurity
But it's still quite far from being mature enough to use in practical settings.

Securely running user's code

I am looking to create an AI environment where users can submit their own code for the AI and let them compete. The language could be anything, but something easy to learn like JavaScript or Python is preferred.
Basically I see three options with a couple of variants:
1. Make my own language, e.g. a JavaScript clone with only very basic features like variables, loops, conditionals, arrays, etc. This is a lot of work if I want to properly implement common language features.
1.1. Take an existing language and strip it to its core. Just remove lots of features from, say, Python until there is nothing left but the above (variables, conditionals, etc.). Still a lot of work, especially if I want to keep up to date with upstream (though I could also just ignore upstream).
2. Use a language's built-in features to lock it down. I know from PHP that you can disable functions, and from searching around, similar solutions seem to exist for Python (with lots and lots of caveats). For this I'd need a good understanding of all the language's features so as not to miss anything.
2.1. Make a preprocessor that rejects code with dangerous stuff (preferably whitelist based). Similar to option 1, except that I only have to implement the parser, not all the features: the preprocessor has to understand the language well enough that you can have a variable named "eval" but not call the function named "eval". Still a lot of work, but more manageable than option 1.
2.2. Run the code in a very locked-down environment. Chroot, no unnecessary permissions... perhaps in a virtual machine or container. I'd have to research how to achieve this and how to get the results back in a secure way, but that seems doable.
3. Manually read through all code. Doable on a small scale or with moderators, though still tedious and error-prone (I might miss stuff like if (user.id = 0)).
The way I imagine 2.2 working is like this: run both AIs in virtual machines (or something similar) and constrain them to communicate with the host machine only (no other Internet or LAN access). Each AI runs in its own machine, and they communicate with each other (well, with the playing field, and thereby see each other's positions) through an API running on the host.
Option 2.2 seems the most doable, but also relatively hacky... I let someone's code loose in a virtualized or locked-down environment, hoping that it will keep them in while giving them free rein to try to DoS or break out of the environment. Then again, most of the other options are not much better.
TL;DR: in essence my question is: how do I let people give me 'logic' for an AI (which I think is most easily done using code) and then run that without compromising the functionality of the system? There must be at least 2 AIs working on the same playing field.
This is really just a plugin system, so researching how others implement plugins is a good starting point. In particular, I'd look at web browsers like Chrome and Safari and their plugin systems.
A common theme in modern plugin systems is process isolation. Ideally you should run the plugin in its own process space, in a sandbox. On OS X, look at XPC, which is designed explicitly for this problem. On Linux (or more portably), I would probably look at NaCl (Native Client). The JVM is also designed to provide sandboxing and offers a rich selection of languages. (That said, I don't personally consider the JVM a very strong sandbox; it's had a history of security problems.)
In general, my preference on these kinds of projects is a language-agnostic API. I most often use REST APIs (or "REST-like"), which allow the plugin to be highly restricted without restricting the language choice. I like simple HTTP for communications whenever possible because it has rich support in numerous languages, so it puts little restriction on the plugin. In fact, given your description, you wouldn't even have to run the plugin on your hardware (and certainly not on the main server); making the plugins remote clients removes many potential concerns, as in the sketch below.
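A sketch of what such a plugin client could look like in C with libcurl (compile with -lcurl; the host, endpoints, and move format are entirely made up; the point is that any language with an HTTP library can play):

    /* Hypothetical plugin client: fetch the game state, then submit a
       move over plain HTTP. Endpoints and parameters are invented. */
    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        /* 1. Read the current playing field (default callback prints it). */
        curl_easy_setopt(curl, CURLOPT_URL, "http://host.example/game/42/state");
        curl_easy_perform(curl);

        /* 2. Decide on a move (real logic goes here), then POST it. */
        curl_easy_setopt(curl, CURLOPT_URL, "http://host.example/game/42/move");
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "player=1&move=north");
        CURLcode rc = curl_easy_perform(curl);

        curl_easy_cleanup(curl);
        return rc == CURLE_OK ? 0 : 1;
    }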
But ultimately, I think something like your "2.2" is the right direction.
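To make option 2.2 concrete, here is a minimal sketch of the classic POSIX ingredients: fork the untrusted program, cap its resources, jail it with chroot, and drop privileges before exec. The jail path, UID/GID, and program path are placeholders, and a real sandbox would add namespaces/containers, seccomp filters, and network restrictions on top:

    /* Minimal confinement sketch; chroot requires starting as root. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/resource.h>
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0) {                       /* child: lock down, then run */
            struct rlimit cpu = { 5, 5 };     /* at most 5 seconds of CPU   */
            struct rlimit mem = { 64 << 20, 64 << 20 }; /* 64 MB of memory  */
            setrlimit(RLIMIT_CPU, &cpu);
            setrlimit(RLIMIT_AS, &mem);

            /* Jail the process, then drop to an unprivileged user. */
            if (chroot("/var/sandbox") != 0 || chdir("/") != 0) exit(1);
            if (setgid(65534) != 0 || setuid(65534) != 0) exit(1);

            /* Resolved inside the jail: /var/sandbox/ai/player. */
            execl("/ai/player", "player", (char *)NULL);
            exit(1);                          /* exec failed */
        }
        int status;
        waitpid(pid, &status, 0);
        printf("player exited with status %d\n", WEXITSTATUS(status));
        return 0;
    }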

Using built-in functions

I am developing a Windows Forms application in C#. I have heard that one should not use built-in methods and functions in code, since hackers have a deep understanding of such built-in methods and know how to defeat them; instead, one should always use his/her own functions and methods, or at least call built-in functions intelligently from those newly made functions. How much of that is true?
A supporting example in favour of my argument: I have seen developers develop their own versions of encryption algorithms like AES, DES, RC4, and hash functions, since they believe that built-in encryption algorithms often have backdoors in them.
What?! No, no, no! Whoever told you this is just wrong.
There is a common fallacy that published source code is more vulnerable to "h4ckerz" because it is available for anyone to spot the flaws in. However, I'm glad you mentioned crypto, because this is an area where this line of reasoning really stands out as the fallacy it is.
One of the most popular questions of all time on https://security.stackexchange.com/ is about a developer (in the OP he was given the pseudonym "Dave") who shared this fear of published code. Dave, like the developer you saw, was trying to homebrew his own encryption algorithm. Here's one of the most popular comments in that thread:
Dave has a fundamentally false premise, that the security of an algorithm relies on (even partially) its obscurity - that's not the case. The security of a hashing algorithm relies on the limits of our understanding of mathematics, and, to a lesser extent, the hardware ability to brute-force it. Once Dave accepts this reality (and it really is reality, read the Wikipedia article on hashing), it's a question of who is smarter - Dave by himself, or a large group of specialists devoted to this very particular problem. (emphasis added)
As a matter of fact, as it stands now, the top two memes on Security.SE are "Don't roll your own" and "Don't be a Dave".
While this has all been about crypto, it applies in general to most open-source software. The chance that a backdoor will get found and fixed goes up with each new set of eyes laid on the code. This should be a simple and uncontroversial premise: the more people are looking for something, the higher the chance it will be found. Yes, this applies to malicious users looking for exploits. However, it also applies to power users, white hat hackers, security researchers, cryptographers, professional developers, and others working for "good", who generally (hopefully) outnumber those working for "evil".
The fear also implicitly relies on the false premise that hackers need to see the source code to do bad things. This should be obviously false based on the sheer number of proprietary systems whose source code has never been published (various Microsoft and Adobe programs come to mind) which have been inundated with vulnerabilities for years. Maybe having source code to read makes the hacker's job easier, but maybe not -- is it easier to pore over source code looking for an attack vector, or to just run scanning tools and scripts against a compiled binary?
tl;dr Don't be a Dave. Rolling your own means you have to be the best at everything to succeed, instead of taking a sampling of the best the community has to offer.
Heartbleed
In your comment, you rebut:
Then why was the Heartbleed bug in OpenSSL not found and corrected [earlier]?
Because no one was looking at it. That's the sad truth. Here's the difference -- what happened once someone did find it? Now tens of thousands of security researchers, crypto experts, and others are looking at it. Suppose the same kind of vulnerability existed in one of the proprietary products I mentioned earlier, which it very well could. Once it's caught (if it's caught), ask yourself:
Could the team of programmers at the company responsible benefit from the help of the entire worldwide community of security experts, cryptographers, and other analysts right now?
If a bug this critical were discovered (and that's a big if!) in your software, would you be prepared to deal with the fallout caused by your custom implementation?
Unless you know of specific failure modes or weaknesses of the built-in methods your application would use and know how to minimize or eliminate them, it is probably better to use the methods provided by the language or library designers, which will often be both more efficient and more secure than what an average programmer would come up with on the fly for a particular project.
Your example absolutely does not support your view: developing your own encryption algorithm without some serious background in the domain and review by cryptanalysts, and then employing it in security-critical code, is a recipe for disaster. Even developing your own custom implementation of an industry standard encryption algorithm can present problems, and almost certainly will if you are inexperienced at it.
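To make the "use the vetted built-in" advice concrete in code: instead of a homemade cipher, call something like OpenSSL's EVP interface. A sketch (compile with -lcrypto; the zeroed key and nonce are for demonstration only and must never be used in real code):

    /* Authenticated encryption via OpenSSL's vetted EVP interface,
       rather than a homebrew cipher. Never reuse a GCM nonce. */
    #include <stdio.h>
    #include <openssl/evp.h>

    int main(void)
    {
        unsigned char key[32] = {0};   /* demo only: use a random key      */
        unsigned char iv[12]  = {0};   /* demo only: unique nonce per msg  */
        unsigned char msg[] = "attack at dawn";
        unsigned char ct[sizeof msg], tag[16];
        int len, ctlen;

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, ct, &len, msg, (int)(sizeof msg - 1));
        ctlen = len;
        EVP_EncryptFinal_ex(ctx, ct + ctlen, &len);
        ctlen += len;
        EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, sizeof tag, tag);
        EVP_CIPHER_CTX_free(ctx);

        printf("ciphertext bytes: %d\n", ctlen);
        return 0;
    }

The point is that every line of hard crypto (key schedule, block mode, authentication) is handled by code that cryptanalysts have already reviewed.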

Where am I? (Geolocation, Emacs, Perl) [closed]

This is one of those "surely there is some generic utility that is better than the quick and dirty thing I have whipped up" questions. As in, I know how to do this in several ad-hoc ways, but would love to use a standard.

BRIEF: I am looking for reasonably standard and ubiquitous tools to determine my present geographical location, callable from the Linux/UNIX command line, Perl, Emacs, etc.
DETAIL:
A trivial situation inspires this question (but there are undoubtedly more important applications): I use emacs org-mode, often to record a log or diary. I don't actually use the official org-mode diary much - mainly, I drop timestamps in an ordinary org-mode log, hidden in metadata that looks like a link.
[[metadata: timestamp="<2014-01-04 15:02:35 EST, Saturday, January 4, WW01>" <location location="??" timestamp="??"/>][03:02 PM]]
As you can see, I long ago added the ability to RECORD my location, but hitherto I have had to set it manually. I am lazy, and often neglect to do so. (Minor note: I record the last time I manually set the location, which is helpful when I move and neglect to change it.)

I would much prefer to have code that automatically infers my location - particularly since I have been travelling quite a bit in the last month, but probably more useful for the half-dozen or so locations I move between on a daily basis: home, work, oceanside, the standard restaurants I eat working lunch or breakfast in.
I can figure my location out using any of several tools, such as:
- Where Am I - See your Current Location on Google Maps - ctrlq.org/maps/where/
- http://www.wolframalpha.com/input/?i=Where+am+I%3F
- Perl CPAN packages such as IP::Location, to map an IP address to a location (note: this doesn't necessarily work for a private IP address behind NAT, but can be combined with traceroute)
- heuristics such as looking at WiFi SSIDs, etc.
I have already coded something up, but there's more depth to this than I have coded. None of the techniques above is perfect - e.g. I may not have net connectivity - and some are OS specific. If there is already some open source facility, I should use that.

Therefore my question: is there any reasonably ubiquitous geolocation service?
My wishlist:
- Works cross-OS:
  - Cygwin
  - Linux
  - Android? OS X? (just use the OS standard; e.g. tries to exec a command like Windows netsh, and if that fails...)
- Command line utility
  - Perl, etc.
  - callable in emacs, because that is where I want to use it, but I am sure that I would want to be able to use it in other places too
- Can connect to widely available standard geolocation services
  - e.g. Perl CPAN IP::Location, IP -> country/city/...
  - e.g. Google etc., inferring geographical location from the browser
- Works even when it cannot connect to standard geolocation services, or the Internet
  - e.g. cache the last location
  - e.g. ability to associate a name with a private network environment (e.g. a lab that is isolated from the network, or at home, connected to WiFi, but with broadband down)
  - e.g. look at the WiFi SSID
- Customizable
  - can use information that is NOT part of any ubiquitous geolocation database, e.g. I may recognize certain SSIDs as being my home or office
- Learning
  - knows (or can learn) that some SSIDs are mobile rather than geographically fixed (e.g. the mobile hotspot on my phone), while others are mainly fixed (e.g. WiFi at home connected to a cable modem)
  - can override incorrect inferences (geo databases are sometimes wrong, especially with VPNs)
  - can extend or make locations more precise
  - I wouldn't mind being able to write rules, but even better if some inference engine maintains the rules itself, e.g. if I correct the location, it makes inferences about the SSID coordinates used for the faulty inference
- Heuristics
  - Windows 7 "netsh wlan show interfaces"
  - Windows / Cygwin ipconfig
  - *IX ifconfig
  - traceroute / tracert
  - reverse IP lookup
- Caching
  - to avoid expensive lookups
  - but the cache is NOT global - it can be done per app; some apps may want to bypass the cache, others can use old data
GeoClue seems to satisfy at least some of your requirements.
To convert coordinates to a human-readable address, one can use the OSM Nominatim API.
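For instance, a reverse-geocode lookup against Nominatim is a single HTTP GET. A minimal sketch with libcurl (compile with -lcurl; the coordinates are placeholders, and Nominatim's usage policy asks for an identifying User-Agent):

    /* Reverse-geocode a lat/lon pair via the OSM Nominatim API. */
    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
        CURL *curl = curl_easy_init();
        if (!curl) return 1;
        curl_easy_setopt(curl, CURLOPT_URL,
            "https://nominatim.openstreetmap.org/reverse"
            "?lat=48.8584&lon=2.2945&format=json");
        curl_easy_setopt(curl, CURLOPT_USERAGENT, "org-mode-location-logger/0.1");
        /* The default write callback prints the JSON response to stdout. */
        CURLcode rc = curl_easy_perform(curl);
        curl_easy_cleanup(curl);
        return rc == CURLE_OK ? 0 : 1;
    }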
Why not just consider using GPS? You could add coordinates to your metadata and bind them to an address (going from plain numbers to an actual place) upon reading. In this way almost anything can be tagged with coordinates.
On GNU/Linux and other Unices, gpsd should do. On Windows, I have no idea. On Android, the Scripting Layer for Android should provide access to the GPS device.
I am not sure this meets your requirements, but I'm just proposing.
You could use wget to pull data from one of those sites you mentioned, something like wget "http://www.wolframalpha.com/input/?i=Where+am+I%3F", and then extract the location from the file you just downloaded.
Let me put it this way: you intend to track your location without using a positioning device such as GPS. This is done by geolocating your nearest network access point, and network access points are usually geocoded. I assume you are tracking your location on your laptop, as it doesn't have a GPS.
There must be a few frameworks out there to do this. Since you want it to be cross-platform, I think a Python-based framework is your best option. You can also give Google geolocation a shot, and there are a few APIs built into HTML5 for geolocation. I think you could cook up your own application and share it as open source for everyone else to use.
For Windows there are many commercial PC tracking apps, and all of them do a fine job at it.

Implement IMAP search on server

I'm currently working on implementing the IMAP protocol on our mail server. This is my first time implementing such a big project, and I've so far coded a majority of the IMAP commands in the RFC, except the SEARCH command.
I've been searching the internet and studying postfix-notation algorithms for weeks to see how to implement the SEARCH command correctly.
A postfix approach seemed to work until I encountered something like OR OR A B C D ==> (OR (OR A B) C) D
Could anyone point me in the right direction on how to implement the SEARCH command when there are multiple ORs?
Thank you very much for any help you could provide.
This is not going to be an answer you are going to like, but I'll recommend this anyway -- don't do this. IMAP is an extremely complex protocol with a ton of non-obvious corner cases. The baseline version (RFC3501) also leaves many advanced features missing; in order to get reasonable performance, especially with mobile clients, you need to implement quite a few extensions.
If I were you, I would recommend integrating with an existing open-source IMAP server implementation. If you have a fancy storage backend, perhaps you can write a plugin for Dovecot or Cyrus.
If you decide to really reimplement this yourself and this is your first complex project, you will very likely end up with a product which is subtly broken in numerous ways. If your goal is to be able to add a "speaks IMAP" phrase to the sales brochure, well, it will work, but in practice you will be solving interoperability problems for the next five years at least.
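That said, the OR puzzle itself is smaller than it looks: SEARCH keys are written in prefix notation, OR always takes exactly two keys, and top-level keys are implicitly ANDed, so a tiny recursive-descent parser handles arbitrarily nested ORs. A minimal sketch (tokens assumed pre-split; real keys also carry arguments, e.g. FROM "x", which this omits):

    /* Recursive-descent parsing of IMAP SEARCH keys with OR. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct Node {
        const char *atom;          /* leaf: a simple key like "ANSWERED" */
        struct Node *left, *right; /* internal: an OR over two sub-keys  */
    } Node;

    static const char **tok;       /* cursor into the token array */

    /* key := "OR" key key | atom -- OR always takes exactly two keys,
       so "OR OR A B C" parses as (OR (OR A B) C). */
    static Node *parse_key(void)
    {
        Node *n = calloc(1, sizeof *n);
        if (*tok && strcmp(*tok, "OR") == 0) {
            tok++;                 /* consume "OR" */
            n->left = parse_key();
            n->right = parse_key();
        } else {
            n->atom = *tok++;      /* consume one atomic key */
        }
        return n;
    }

    static void print_key(const Node *n)
    {
        if (n->atom) { printf("%s", n->atom); return; }
        printf("(OR "); print_key(n->left);
        printf(" ");    print_key(n->right);
        printf(")");
    }

    int main(void)
    {
        /* "OR OR A B C D" => ((A OR B) OR C) AND D */
        const char *input[] = { "OR", "OR", "A", "B", "C", "D", NULL };
        tok = input;
        while (*tok) {             /* top-level keys are implicitly ANDed */
            print_key(parse_key());
            fputs(*tok ? " AND " : "\n", stdout);
        }
        return 0;
    }

Running this prints (OR (OR A B) C) AND D, which is exactly the grouping in your example.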
