implement imap search on server - search

I'm currently working on implementing IMAP protocol on our mail server. This is my first time implementing such a big project and I've so far coded a majority of IMAP commands in the RFC, except the Search command.
I've been searching the internet and have studied the postfix algorithm for weeks to work out how to write the search command correctly.
The postfix approach seemed to work until I encountered something like OR OR A B C D ==> (OR (OR A B) C) D
Could anyone point me in the right direction on how to implement the Search command when there are multiple ORs?
Thank you very much for any help you could provide.

This is not going to be an answer you are going to like, but I'll recommend this anyway -- don't do this. IMAP is an extremely complex protocol with a ton of non-obvious corner cases. The baseline version (RFC3501) also leaves many advanced features missing; in order to get reasonable performance, especially with mobile clients, you need to implement quite a few extensions.
If I were you, I would recommend integrating with an existing open-source IMAP server implementation. If you have a fancy storage backend, perhaps you can write a plugin for Dovecot or Cyrus.
If you decide to really reimplement this yourself and this is your first complex project, you will very likely end up with a product which is subtly broken in numerous ways. If your goal is to be able to add a "speaks IMAP" phrase to the sales brochure, well, it will work, but in practice you will be solving interoperability problems for at least the next five years.
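On the literal question of nested ORs: in RFC 3501, SEARCH keys are written in prefix form, OR is followed by exactly two search keys, and multiple top-level keys are implicitly ANDed. A small recursive-descent parser handles that without any special casing. What follows is only a minimal sketch under those assumptions: the names KEY_ARGS, parse_key and parse_search are made up for illustration, the key table is a tiny subset, and parenthesized lists, sequence sets, quoting and literals are not handled.

```python
# Number of arguments each search key consumes.
# Illustrative subset only; a real server needs the full RFC 3501 key list.
KEY_ARGS = {
    "ALL": 0, "SEEN": 0, "FLAGGED": 0, "DELETED": 0, "ANSWERED": 0,
    "FROM": 1, "TO": 1, "SUBJECT": 1,
}


def parse_key(tokens, pos):
    """Parse one search key starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of search criteria")
    word = tokens[pos].upper()
    if word == "OR":
        # OR always takes exactly two search keys, each of which may be
        # another OR/NOT, so simply recurse twice.
        left, pos = parse_key(tokens, pos + 1)
        right, pos = parse_key(tokens, pos)
        return ("OR", left, right), pos
    if word == "NOT":
        inner, pos = parse_key(tokens, pos + 1)
        return ("NOT", inner), pos
    if word in KEY_ARGS:
        count = KEY_ARGS[word]
        args = tokens[pos + 1:pos + 1 + count]
        if len(args) != count:
            raise ValueError(f"missing argument for {word}")
        return (word, *args), pos + 1 + count
    raise ValueError(f"unknown search key: {tokens[pos]}")


def parse_search(tokens):
    """Parse a whole token list; top-level keys are implicitly ANDed."""
    keys = []
    pos = 0
    while pos < len(tokens):
        key, pos = parse_key(tokens, pos)
        keys.append(key)
    return ("AND", *keys)


# "OR OR A B C D" with flag keys standing in for A..D:
tree = parse_search("OR OR SEEN FLAGGED DELETED ANSWERED".split())
# tree == ("AND", ("OR", ("OR", ("SEEN",), ("FLAGGED",)), ("DELETED",)), ("ANSWERED",))
```

The resulting tree can then be evaluated per message by walking it recursively, which avoids converting to postfix at all.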

Related

Xcode String Searching Algorithm

How does Xcode's autocompletion algorithm work?
I am always amazed by how fast Xcode picks up the code blocks I want to write. I have looked at some different string matching algorithms but none seems to be working as the one Apple uses in Xcode. I would find it quite interesting what type of algorithm they are using.
Thanks in advance.
(The original screenshot showed Xcode "predicting" that UTV should be UITableView.)
It's not just a simple string-searching algorithm. It draws on the code nearest to the current scope, the completion you last picked for the same shortcut, precompiled code, code in frameworks, your own definitions, and many other signals, ranked by some intelligent heuristic. Since it is private to Apple, we may never know exactly how they achieve this. But this year they open-sourced their LSP repository to bring this kind of support to other editors, even Vim in the terminal! You can investigate that if you are interested.
There are also projects out there like TabNine, which is an all-language autocompleter. It uses machine learning to provide responsive, reliable, and relevant suggestions, and was trained on over 2 million GitHub repositories. You can check that out too if you are interested.
Who knows what exactly the leading programming and tech companies are using while we go looking for algorithms? Maybe a lot of machine learning is involved and only the machines know the exact algorithms.
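To be clear, the following is not Apple's algorithm, which is not public. It is only a minimal sketch of one common building block, subsequence ("fuzzy") matching, which is enough to explain why UTV can match UITableView. The function names and the ranking rule (prefer hits on capital letters, then shorter names) are assumptions made for illustration.

```python
def match_positions(abbrev, candidate):
    """Return the indices where abbrev's letters occur in order in candidate,
    or None if abbrev is not a case-insensitive subsequence of candidate."""
    positions = []
    start = 0
    lowered = candidate.lower()
    for ch in abbrev.lower():
        idx = lowered.find(ch, start)
        if idx == -1:
            return None
        positions.append(idx)
        start = idx + 1
    return positions


def complete(abbrev, candidates):
    """Return the matching candidates, best first."""
    scored = []
    for cand in candidates:
        positions = match_positions(abbrev, cand)
        if positions is None:
            continue
        # Letters that land on a capital (a word start in CamelCase) count more.
        capital_hits = sum(1 for i in positions if cand[i].isupper())
        scored.append((-capital_hits, len(cand), cand))
    return [cand for _, _, cand in sorted(scored)]


print(complete("UTV", ["UIView", "utility_vector", "UITableView"]))
# ['UITableView', 'utility_vector']
```

A real completer would layer the other signals mentioned above (scope, recency, type information) on top of a textual score like this one.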

Machine learning and Security

I would like to ask you if it is possible to secure a server with AI/machine learning based on the following concepts:
1) the server is implemented in a way that lets it recognize normal behavior (authorized access, modification, ...).
2) the server must recognize any abnormal behavior and adapt to it if encountered.
3) if abnormal behavior is caught, it checks some kind of pre-known threat list for the type of threat and a possible solution for it; otherwise it adapts "by itself" and makes changes based on what the normal behavior should be.
PS: If there already is a system similar to this one please let me know.
Thank you for your help!
Current IDS/IPS systems for applications ("web application firewalls") are in part similar to this (the other part is usually plain pattern matching to find common or known attacks or attack classes). First you switch a WAF to "learning mode", in which it listens to traffic and stores the patterns it sees as normal behavior. Then you switch it to "prevention mode" and it stops any traffic that is out of the ordinary flow.
The key is what aspects of the dataflows they listen to and learn to try and find anomalies. Basically a WAF would look at http queries to pages, learn parameter types and length, maybe clients as well, and in prevention mode it would not allow a type or length mismatch (any request not matching the learned values would be stopped on the WAF).
There are obvious drawbacks to this: the learning phase can never be long enough, learned rules will be either too generic or too specific, manual setup is tedious for a large application, etc.
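The learning/prevention split described above can be sketched in a few lines. This is a toy illustration, not how any particular WAF product works: the class name RequestProfile, its methods, and the choice to track only "numeric or not" and maximum length per parameter are all assumptions.

```python
class RequestProfile:
    """Learns per-parameter type and maximum length, then checks requests."""

    def __init__(self):
        # (path, parameter name) -> {"numeric": bool, "max_len": int}
        self.rules = {}

    def learn(self, path, params):
        """Learning mode: widen the stored rule so it covers this request."""
        for name, value in params.items():
            rule = self.rules.setdefault(
                (path, name), {"numeric": True, "max_len": 0})
            rule["numeric"] = rule["numeric"] and value.isdigit()
            rule["max_len"] = max(rule["max_len"], len(value))

    def allows(self, path, params):
        """Prevention mode: reject unknown parameters and type/length mismatches."""
        for name, value in params.items():
            rule = self.rules.get((path, name))
            if rule is None:
                return False  # parameter never seen during learning
            if len(value) > rule["max_len"]:
                return False  # longer than anything learned
            if rule["numeric"] and not value.isdigit():
                return False  # learned as numeric, got something else
        return True


profile = RequestProfile()
profile.learn("/item", {"id": "42"})
profile.learn("/item", {"id": "1234"})

print(profile.allows("/item", {"id": "7"}))         # True
print(profile.allows("/item", {"id": "1 OR 1=1"}))  # False
print(profile.allows("/item", {"debug": "1"}))      # False
```

The drawbacks listed above are visible even here: a legitimate five-digit id would be blocked simply because it never appeared during learning.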
Taking it to a more generic level would be very (very) difficult. Maybe with a deep neural network (so popular nowadays) you could better approximate a "real" AI that actually learns good and bad traffic patterns. Two obvious problems are getting patterns to teach it (how will you provide good and bad traffic examples in large enough quantities that it can actually learn the difference?) and operational cost (running such a deep neural network would be very expensive, probably way more than a typical application breach would cost - defenses should be proportionate to the risk).
Having said that, I think it's not impossible, but it will take a few years until we get there.
The general idea is interesting and there is a lot of research on this topic currently: https://github.com/Limmen/awesome-rl-for-cybersecurity
But it's still quite far from being mature enough to use in practical settings.

Securely running user's code

I am looking to create an AI environment where users can submit their own code for the AI and let them compete. The language could be anything, but something easy to learn like JavaScript or Python is preferred.
Basically I see three options with a couple of variants:
1. Make my own language, e.g. a JavaScript clone with only very basic features like variables, loops, conditionals, arrays, etc. This is a lot of work if I want to properly implement common language features.
1.1. Take an existing language and strip it to its core. Just remove lots of features from, say, Python until there is nothing left but the above (variables, conditionals, etc.). Still a lot of work, especially if I want to keep up to date with upstream (though I could also just ignore upstream).
2. Use a language's built-in features to lock it down. I know from PHP that you can disable functions and, from searching around, similar solutions seem to exist for Python (with lots and lots of caveats). For this I'd need to have a good understanding of all the language's features and not miss anything.
2.1. Make a preprocessor that rejects code with dangerous stuff (preferably whitelist based). Similar to option 1, except that I only have to implement the parser and not implement all features: the preprocessor has to understand the language so that you can have variables named "eval" but not call the function named "eval". Still a lot of work, but more manageable than option 1.
2.2. Run the code in a very locked-down environment. Chroot, no unnecessary permissions... perhaps in a virtual machine or container. Something in that sense. I'd have to research how to achieve this and how to make it give me the results in a secure way, but that seems doable.
3. Manually read through all code. Doable on a small scale or with moderators, though still tedious and error-prone (I might miss stuff like if (user.id = 0)).
The way I imagine 2.2 to work is like this: run both AIs in a virtual machine (or something) and constrain it to communicate with the host machine only (no other Internet or LAN access). Both AIs run in a separate machine and communicate with each other (well, with the playing field, and thereby they see each other's positions) through an API running on the host.
Option 2.2 seems the most doable, but also relatively hacky... I let someone's code loose in a virtualized or locked-down environment, hoping that that'll keep them in while giving them free rein to DoS or break out of the environment. Then again, most other options are not much better.
TL;DR: in essence my question is: how do I let people give me 'logic' for an AI (which I think is most easily done using code) and then run that without compromising the functionality of the system? There must be at least 2 AIs working on the same playing field.
This is really just a plugin system, so researching how others implement plugins is a good starting point. In particular, I'd look at web browsers like Chrome and Safari and their plugin systems.
A common theme in modern plugin systems is process isolation. Ideally you should run the plugin in its own process space in a sandbox. In OS X look at XPC, which is designed explicitly for this problem. On Linux (or more portably), I would probably look at NaCl (Native Client). The JVM is also designed to provide sandboxing, and offers a rich selection of languages. (That said, I don't personally consider the JVM a very strong sandbox. It's had a history of security problems.)
In general, my preference on these kinds of projects is a language-agnostic API. I most often use REST APIs (or "REST-like"). This allows the plugin to be highly restricted, while not restricting the language choice. I like simple HTTP for communications whenever possible because it has rich support in numerous languages, so it puts little restriction on the plugin. In fact, given your description, you wouldn't even have to run the plugin on your hardware (and certainly not on the main server). Making the plugins remote clients removes many potential concerns.
But ultimately, I think something like your "2.2" is the right direction.
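As a rough illustration of the process-isolation idea, here is a minimal sketch that runs a submitted Python bot in its own process with CPU, memory and wall-clock limits, passing the game state in on stdin and reading the move from stdout. It is POSIX-only, and the names run_bot and _apply_limits as well as the specific limits are assumptions. Note that this is only one layer: it does not restrict filesystem or network access, so it would still need to sit inside a container, VM or similar locked-down environment as discussed above.

```python
import resource
import subprocess
import sys


def _cap(kind, value):
    """Lower a resource limit, never asking for more than the current hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def _apply_limits():
    # Runs in the child process just before the bot starts.
    _cap(resource.RLIMIT_CPU, 2)                   # seconds of CPU time
    _cap(resource.RLIMIT_AS, 1024 * 1024 * 1024)   # bytes of address space


def run_bot(source, game_state, timeout=5):
    """Run untrusted bot source in a separate process.

    Returns the bot's stdout (stripped), or None if it crashed,
    exited with an error, or exceeded the wall-clock timeout.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            input=game_state,
            capture_output=True,
            text=True,
            timeout=timeout,
            preexec_fn=_apply_limits,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


# A trivial "bot" that echoes the state in upper case:
print(run_bot("import sys; print(sys.stdin.read().upper())", "ping"))  # PING
```

Because the bot only sees text on stdin and answers on stdout, the same host-side code works for bots in any language, which matches the language-agnostic API preference above.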

language popularity figures (C++, C#, Java, PHP, flash script, etc.)

I need to find figures that show how many programmers worldwide have each of the following languages as their primary programming language.
C
C++
C#
Objective-C
Java
JavaScript
VB.NET
VB6 (or older)
VBA
PHP
flash scripts
Ruby
Does anyone know of such comparison figures?
If not, do you know of a good way to research this?
I could compare the number of tags here at stackoverflow and the number of articles for each language at sites like codeproject. This would give me a good idea.
But if you can suggest other ideas for how to find these numbers I will be grateful.
/Thomas
A very common site that does this is the TIOBE index. It basically searches for programming languages in major search engines and compares the results, and it shows you some history. The only problem is that C/C++/C# are not distinguished very well, therefore C is more dominant than you'd expect (not to mention that search results include many pages where many languages are listed, like programming FAQs). But in general, TIOBE gives a good idea, I think, and it should get better, since at least Google tends to know the difference between zero, two or four pluses.
Have you tried TIOBE index?
In general this is hard to measure because every approach has a lot of drawbacks.
TIOBE and others that are based on search results, for example, do not tell you anything about what is actually used, only what is ranked highly by Google. (You can even see that a small change to Google's results in 2004/2005 completely shuffled the TIOBE rankings.) Moreover, they have the problem that lots of search terms are ambiguous: Java is also an island, Ruby is also a gem, Python is a snake, and others have alternative meanings too. Another problem with search-based rankings is that most things put on the web stay up forever, so the results say little about what is CURRENTLY of interest. If a C resource was put up in 2002, it is likely still available today, which hugely overrates leading or older languages.
Here is one interesting approach based on the number of book sales. (This at least eliminates the ambiguity problem, but comes with others.)
Wikipedia also has a small article about the topic.
Try Google trends (see an example). In addition, check sites like freshmeat.net and note the number of projects in each language. That's only open source projects and many people will use a different language for their hobby projects than at work (i.e. one that sucks less).
Next, look for sites which offer job openings. I don't have a good link handy but this Google query should get you started.

Effective strategies for studying frameworks/ libraries partially [closed]

I remember the old, effective approach to studying a new framework: the best way was always to read a good book on the subject, say MFC. When I tried to skip a lot of material to speed up coding, it turned out later that it would have been quicker to read the whole book first. There were no good ways to study a framework in small parts. Or at least I did not see them then.
In the last few years a lot has changed: improved search results from Google, programming blogs, many more people involved in Internet discussions, a lot of open source frameworks.
Nowadays when we write software we much more often depend on third-party (usually open source) frameworks/libraries. And a lot of the time we need to know only a small part of their functionality to use them. It's just about finding the simplest way of using a small subset of the library without unnecessary pessimizations.
What do you do to study as little of the framework as possible and still use it effectively?
For example, suppose you need to index a set of documents with Lucene. And you need to highlight search snippets. You don't care about stemmers, storing the index in one file vs. multiple files, fuzzy queries and a lot of other stuff that is going to occupy your brain if you study Lucene in depth.
So what are your strategies, approaches, tricks to save your time?
I will enumerate what I would do, though I feel that my process can be improved.
Search "lucene tutorial", "lucene highlight example" and so on. Try to estimate the trust score of unofficial articles (blog posts) based on publishing date and the number and tone of the comments. If there is no definite answer, collect new search keywords and links on the target.
Search for really quick tutorials/newbie guides on the official site.
Estimate how valuable the javadocs are for a newbie. (Read the Lucene highlight package summary.)
Search for simple examples that come with the library, related to what you need. (Study "src/demo/org/apache/lucene/demo".)
Ask about "simple Lucene search highlighting example" on the Lucene mailing list. You may get no answer, or even get a bad reputation if you ask a silly question. And often you don't know whether your question is silly because you have not studied the framework in depth.
Ask on Stackoverflow or another Q&A service: "could you give me a working example of search keyword highlighting in Lucene". However, this question is very specific and may get no answers or a bad score.
Estimate how easy it is to get the answer from the framework code if it's open source.
What are your study/ search routes? Write them in priority order if possible.
I use a three phase technique for evaluating APIs.
1) Discovery - In this phase I search StackOverflow, CodeProject, Google and Newsgroups with as many different combination of search phrases as possible and add everything that might fit my needs into a huge list.
2) Filter/Sort - For each item I found in my gathering phase I try to find out if it suits my needs. To do this I jump right into the API documentation and make sure it has all of the features I need. The results of this go into a weighted list with the best solutions at the top and all of the cruft filtered out.
3) Prototype - I take the top few contenders and try to do a small implementation hitting all of the important features. Whatever fits the project best here wins. If for some reason an issue comes up with the best choice during implementation, it's possible to fall back on other implementations.
Of course, a huge number of factors go into choosing the best API for the project. Some important ones:
How much will this increase the size of my distribution?
How well does the API fit with the style of my existing code?
Does it have high quality/any documentation?
Is it used by a lot of people?
How active is the community?
How active is the development team?
How responsive is the development team to bug patch requests?
Will the development team accept my patches?
Can I extend it to fit my needs?
How expensive will it be to implement overall?
... And of course many more. It's all very project dependent.
As to saving time, I would say trying to save too much here will just come back to bite you later. The time put into selecting a good library is at least as important as the time spent implementing it. Also, think down the road, in six months would you rather be happily coding or would you rather be arguing with a xenophobic dev team :). Spending a couple of extra days now doing a thorough evaluation of your choices can save a lot of pain later.
The answer to your question depends on where you fall on the continuum of generality/specificity. Do you want to solve an immediate problem? Are you looking to develop a deep understanding of the library? Chances are you’re somewhere between those extremes. Jeff Atwood has a post about how programmers move between these levels, based on their need.
When first getting started, read something on the high-level design of the framework or library (or language, or whatever technology it is), preferably by one of the designers. Try to determine what problems they are trying to address, what the organizing principles behind the design are, and what the central features are. This will form the conceptual framework from which future understanding will hang.
Now jump in to it. Create something. Do not copy and paste somebody's code. Instead, when things don’t work, read the error messages in detail, and the help on those error messages, and figure out why that error occurred. It can be frustrating, when things don’t work, but it forces you to think, and that’s when you learn.
1) Search Google for my task
2) Look at examples with a few different libraries; no need to tie myself down to Lucene, for example, if I don't know what other options I have.
3) Look at the date of the last update on the main page; if it hasn't been updated in 6 months, leave (with some exceptions)
4) Search for a sample task with the library (don't read tutorials yet)
5) Can I understand what's going on without a tutorial? If yes, continue; if no, start back at 1
6) Try to implement the task
7) Watch myself fail
8) Read a tutorial
9) Try to implement the task
10) Watch myself fail and ask on StackOverflow, or mail the authors, post on user group (if friendly looking)
11) If I could get the task done, I'll consider the framework worthy of study and read up the main tutorial for 2 hours (if it doesn't fit in 2 hours I just ignore what's left until I need it)
I have no recipe, in the sense of a set of steps I always follow, largely because everything I learn is different. Some things are radically new to me (Dojo for example, I have no fluency in JavaScript so that's a big task), some are just enhancements of previous knowledge (I know EJB 2 well, so learning EJB 3, while on the surface new with all its annotations, is building on familiar concepts).
My general strategy, though, I'd describe as "Spiral and Park". I try to circle the landscape first and understand the general shape. I Park concepts that I don't get just yet and don't let them worry me. Then I go a little deeper into some areas, but again try not to get obsessed with any one, Spiralling down into the subject. Hopefully I start to unpark and understand, but I also need to park more things.
Initially I want answers to questions such as:
What's it for?
Why would I use this rather than that other thing I already know?
What's possible? Any interesting sweet spots? E.g. "ooh, look at that nice AJAX-driven update"
I do a great deal of skim reading.
Then I want to do more exploring on the hows. I start to look for gotchas and good advice. (Eg. in java: why is "wibble".equals(var) a useful construct?)
Specific techniques and information sources:
Most important: doing! As early as possible I want to work a tutorial or two. I probably have to get the first circuit of the spiral done, but then I want to touch and experiment.
Overview documents
Product documents
Forums and discussion groups, learning by answering questions is my favourite technique.
If at all possible I try to find gurus. I'm fortunate in having in my immediate colleagues a wealth of knowledge and experience.
Quick-start guides.
A quick look at the API documentation if available.
Reading sample codes.
Messing around. YOU HAVE TO MESS AROUND (sorry for the caps).
If it's a small library/API with a small or no community you can always contact the developer himself and ask for help 'cause he'll probably be more than happy to help you; he's happy that one more person is using his API.
Mailing lists are a great resource as long as you do your homework first before asking questions.
Mailing list archives are invaluable for most of the questions I've had on CoreAudio related stuff.
I would never read javadoc, as there often is none. And when there is, most likely it isn't up to date, so at best one gets confused.
Start with the simplest possible tutorial you find within some minutes.
Often the tutorial will lead you to further sources at the end, so then most of the time one is on a path that goes on and on, deeper and deeper.
It really depends on what the topic is and how much info is on it. Learning by example is a good way to start a topic brand new to you, especially if you're knowledgeable in other similar libraries or languages. You can take a topic you're familiar with, and say "I understand how to implement using X, lets see how it's done using Y".
So what are your strategies, approaches, tricks to save your time?
Well, I search. I generally never ask questions, preferring to research myself. If worst comes to worst I'll read the documentation. In some cases (say, when I was doing some work with SharpSVN) I had to look at the source, specifically the test cases, to get some information about how the API worked.
Generally, I have to be honest, most of my 'study' and 'learning' is by accident.
For example, just a few seconds ago, I discovered how to get the "Recent" folder in C#. I had no idea how to do that before seeing the question, considering it interesting, and then searching.
So for me the real 'trick' is that I hang around on forums, answer questions, and accidentally pick up knowledge. Then when it comes time for me to research something, chances are I know a bit about it, so searching is easier and I can focus on the implementation [typically implementing a test program first] and progress from there.

Resources