Xcode String Searching Algorithm - string

How does Xcode's autocompletion algorithm work?
I am always amazed by how fast Xcode picks up the code blocks I want to write. I have looked at some different string matching algorithms but none seems to be working as the one Apple uses in Xcode. I would find it quite interesting what type of algorithm they are using.
Thanks in advance.
The image above shows Xcode "predicting" that UTV should be UITableView

It's not just a simple string searching algorithm. It uses nearest code to the scope, last code you picked with the same shortcut, precompiled codes, codes in frameworks, your own defined codes, and many other stuff sorted by some intelligent definition. Since it's private by apple, we may not know how exactly they achieve this. But this year, they open sourced their LSP repository to bring support for these kind of stuff to other editors, even the VIM in terminal! You can investigate on that if you are interested.
Also there are some projects out there like TabNine witch is the all-language autocompleter. It uses machine learning to provide responsive, reliable, and relevant suggestions trained with over 2million github repositories. You can check that out too if you are interested.
Who knows what exactly programming and tech lead companies are currently using while we are looking for algorithms? Maybe a lot of machine learnings is included and only machines knows the exact algorithms.

Related

Semantics based code search

We have a large number of repositories. We want to implement a semantics(functionality) based code search on those repositories. Right now, we already have implemented keyword based code search in which we crawled through all the repository files and indexed them using elasticsearch. But that doesn't solve our problem as some of the repositories are poorly commented and documented, thus searching for specific codes/libraries become difficult.
So my question is: Is there any opensource libraries or any previous work done in this field which could help us index the semantics of the repository files, so that searching the code becomes easy and this would also help us in re-usability of the codes. I have found some research papers like Semantic code browsing, Semantics-based code search etc. but were of no use as there was no actual implementation given. So can you please suggest some good libraries or projects which could help me in achieving the same.
P.S:-Moreover, companies like Koders, Google, cocycles.com etc. started their code search based on functionality. But most of them have shut down their operations without giving any proper feedback, can anyone please tell me what kind of difficulties they are facing.
not sure if this is what you're looking for, but I wrote https://github.com/google/zoekt , which uses ctags-based understanding of code to improve ranking.
Take a look at insight.io
It provides semantic search and browsing

Simple toolkits for emotion (sentiment) analysis (not using machine learning)

I am looking for a tool that can analyze the emotion of short texts. I searched for a week and I couldn't find a good one that is publicly available. The ideal tool is one that takes a short text as input and guesses the emotion. It is preferably a standalone application or library.
I don't need tools that is trained by texts. And although similar questions are asked before no satisfactory answers are got.
I searched the Internet and read some papers but I can't find a good tool I want. Currently I found SentiStrength, but the accuracy is not good. I am using emotional dictionaries right now. I felt that some syntax parsing may be necessary but it's too complex for me to build one. Furthermore, it's researched by some people and I don't want to reinvent the wheels. Does anyone know such publicly/research available software? I need a tool that doesn't need training before using.
Thanks in advance.
I think that you will not find a more accurate program than SentiStrength (or SoCal) for this task - other than machine learning methods in a specific narrow domain. If you have a lot (>1000) of hand-coded data for a specific domain then you might like to try a generic machine learning approach based on your data. If not, then I would stop looking for anything better ;)
Identifying entities and extracting precise information from short texts, let alone sentiment, is a very challenging problem specially with short text because of lack of context. Hovewer, there are few unsupervised approaches to extracting sentiments from texts mainly proposed by Turney (2000). Look at that and may be you can adopt the method of extracting sentiments based on adjectives in the short text for your use-case. It is hovewer important to note that this might require you to efficiently POSTag your short text accordingly.
Maybe EmoLib could be of help.

Guidance on broadening horizons

Let me setup my question with some info. I'm not in college yet and strictly a hobby programmer. Probably a little more than 2 years ago I got started programming on mac. I started with very simplistic GUI examples with Cocoa and XCode. Long story short, I learned from the top down, first learning objective-c, then venturing into more "low-level" projects where I became better at basic C and even used a few C++ libraries in my existing projects.
What I'm saying is that I've never really done anything outside of an XCode project and occasional iPhone project. I've implemented lots of stuff, algorithms, math, etc. but all within that environment. I look at the world of programming and there is so much out there that's not necessarily a standalone application. It seems to me that the hardest thing is finding out where to start; how to setup the environment. I guess I'm wondering if anyone has any suggestions, projects, tutorials, maybe on setting up environments for different languages on different systems. Web programming, java applets? etc.
On the note of environments, I would be interested in knowing on a more basic level what makes a "development environment." To my basic knowledge, an "environment" combines the language, with the compiler that interprets that language, and contains libraries that provide an API for the language, where the compiled product runs on a certain system. This is my basic concept, but again, I'm here.
Sorry if this question... well... combines too many questions, but any input or guidance is welcome. Thanks in advance for any replies!
Not sure if I understood your question correctly or if this will help you, but here are my (relative newbie) thoughts and rambling:
I've done Java at uni in two different courses, one where we wrote the code in Notepad and then compiled it in command line, in some dubious DOS application, and then two years later when we worked in NetBeans and while NetBeans was a lot better and easier, I learned a lot and was a lot more careful when writing code after the Notepad experience (especially after waiting for several minutes for a compile only to see a message caused by a silly bug).
If you can choose between IDEs, I would read on different blogs, see what people prefer and why and make a choice. The problem is that most of the time, both at uni and at work, you can't choose and have to go with the teachers/managers choose, and make the best of it.
It seems to me that the hardest thing is finding out where to start; how to setup the environment.
I think it would be easiest if you found something that you want to do, and then take small steps and get bits done. I work as a desktop app developer and 3 years ago I set up a wordpress blog for a friend and imported posts and comments from a different blogging platform, with minimal knowledge about everything involved. I started with things that were already done by others and learned how to use them and then slowly tried to fill in the gaps - the comments part wasn't done then, so I had to learn about databases, how I could see them and then write the code that inserted in them, etc.
What I'm trying to say is that if you find something to do (and if you don't have ideas for projects, you can find several posts with ideas here, on SO) and then set goals towards doing that, even if you don't finish it, or your studying takes you in areas you hadn't expected, it will all be useful at some point.
I guess I'm wondering if anyone has any suggestions, projects, tutorials, maybe on setting up environments for different languages on different systems. Web programming, java applets? etc.
This is way too broad a question. If you're doing web programming, you need to set up a web programming environment. At a minimum, you would need an HTTP server. You'd probably also need a relational database. The rest of the web environment would be language dependent.
If you're doing GUI programmng, you would need access to the device or devices (iPhone, Android, etc.) that you want to write programs for.
To my basic knowledge, an "environment" combines the language, with the compiler that interprets that language, and contains libraries that provide an API for the language, where the compiled product runs on a certain system.
That gets you started, yes. You'd want an integrated development environment to write the code. Again, you'd probably need a relational or object oriented database. The rest of the development environment is language dependent.

language popularity figures (C++, C#, Java, PHP, flash script, etc.)

I need to find figures that show how many programmers world wide, has each of the following languages as their primary programming language.
C
C++
C#
Object-C
Java
JavaScript
VB.NET
VB6 (or older)
VBA
PHP
flash scripts
Ruby
Does anyone know of such comparison figures?
If not. Do you know of a good way to research this?
I could compare the number of tags here at stackoverflow and the number of articles for each language at sites like codeproject. This would give me a good idea.
But if you can suggest other ideas how to find these numbers I will be greatfull.
/Thomas
A very common site that does this is the TIOBE index. It basically searches for programming languages in major search engines and compares the results, and it shows you some history. The only problem is that C/C++/C# are not distinguished very well, therefore C is more dominant than you'd expect (not to mention that search results include many pages where many languages are listed, like programming FAQs). But in general, TIOBE gives a good idea, I think, and it should get better, since at least Google tends to know the difference between zero, two or four pluses.
Have you tried TIOBE index?
In general this is hard to measure because every approach has a lot of drawbacks.
TIOBE and others that are based on search results e.g. do not tell anything of what is actually used but just what is highly ranked by google (You can even see that just Google changing a bit of their results in 2004/2005 completely mixed TIOBE). And moreover they have the problem that lots of search-terms are ambiguous (Like Java which IS also an island, Ruby which also exists as gem, Python which is a snake and others which have alternative meaning). Another problem with search based is that most things put into the web stay up forever which means it is irrelevant if it is CURRENTLY interesting. If a C resource was put up in 2002 it likely still is available today (which hugely overrates leading or older languages.)
Here one is an interesting approach based on the number of book sales. (This at least eliminates the ambigous problem, but comes with others.)
Wikipedia also has a small article about the topic.
Try Google trends (see an example). In addition, check sites like freshmeat.net and note the number of projects in each language. That's only open source projects and many people will use a different language for their hobby projects than at work (i.e. one that sucks less).
Next, look for sites which offer job openings. I don't have a good link handy but this Google query should get your started.
not yet!!!!!!!
That's only open source projects and many people will use a different language for their hobby projects than at work (i.e. one that sucks less).
Next, look for sites which offer job openings. I don't have a good link handy but this Google query should get your started.

Resources for learning a new language quickly?

The title may seem slightly self-contradictory, and I accept that you can't really learn a language quickly. However, an experienced programmer that already has knowledge of a few languagues and different styles (functional, OO, imperative etc.) often wants to get started quickly. I've seen a few websites doing effective "translations" in the form of "just show me syntax equivalence". I can't remember the sites now, but for related languages (e.g. Perl/PHP) it's quite common.
Is there a better resource that covers more languages? Is there a resource that covers idioms as well as syntax? I think this would be incredibly useful for doing small amounts of work on existing code bases where you are not familiar with the language. Looking at the existing code, as we know, is not always a good indicator of quality. Likewise, for "learn by doing" weekend project I always have the urge to write reasonably idiomatic, clean code from the start. Such a resource could also link to known good example projects of varying sizes for those that prefer to learn by reading. Reading a well-written medium sized code base can also be much more practical when access to development environments might be limited.
I think it's possible to find tutorials and summaries for individual languages that provide some of this functionality in disparate web locations but I'm hoping there is a good, centralised, comparative place that the busy programmer can turn to.
You generally have two main things to overcome:
Syntax
Reference
Syntax you can pick up fairly quickly with a language tutorial and a stack of samplecode.
Reference (library/API calls) you need to find a proper guide to; perhaps the language reference, or perhaps google...
With those two in place, following a walkthrough (to get you used to using the development environment) will have you pretty much ready - you'll be able to look up what you want to say (reference), and know how to say it (syntax).
This, of course, applies principally to procedural/oop languages; languages that require a paradigm switch (ML/Haskell) you should go to lectures for ;)
(and for the weirder moments, there's SO!)
In the past my favour was "learning by doing". So e.g. I know a little bit of C++ and a lot of C#.Net but I must write a FTP Tool in Python.
So I sit for an hour and so the syntax differences by a tutorial, than I develop the form itself and look at the generated code. Then I search a open source Python FTP Client and get pieces of code (Not copy and paste, write it self to see, feel and remember the code!)
After a few hours I get it.
So: The mix is the best. A book, a piece of good code, the willing to learn and a free night with much coffee.
At the risk of sounding cheesy, I would start with the language's website tutorial and/or FAQ, followed by asking more specific questions here. SO is my centralized location for programming knowledge.
I remember when I learned Perl. I was asked to modify some Perl code at work and I'd never seen the language before. I had experience with several other languages, however, so it wasn't hard to figure out the syntax with the online Perl docs in one window and the code in another, side-by-side. I don't know that solely reading existing code is necessarily the best way to learn. In my case, I didn't know Perl but I could tell that the person who originally wrote the code didn't know Perl either. I'm not sure I could've distinguished between good Perl and really confusing Perl. It would've been nice to be able to ask questions here at the time.
Language isn't important. What is important is learning your ways around designing algorithms and the proper application of design patterns. Focus on the technique, not the language that implements a certain technique. Once you understand the proper development techniques, any programming language will just become real easy, no matter how obscure they are...
When you put a focus on a language, you're restricting your own knowledge.
http://devcheatsheet.com/ seems to be a step in the right direction: it aggregates cheat sheets/quick references and they are (somewhat) manually reviewed. It's also wide-ranging. It still comes up short a bit in terms of "idiomatic" quick reference: for example, the page on Ruby doesn't mention yield.
Rosetta Code appears to be an excellent resource that includes hints on coding idiomatically and moves from simple (like for-loops) to things like drawing. I haven't checked out how comprehensive it is, but there are a large number of languages and tasks listed. The drawbacks re: original question are:
Some of the linking is not accurate
(navigating Python->ForLoop will
take you to the top of the ForLoop
page, not the Python section). It's a
wiki, this can be improved.
Ideally you could "slice" the wiki
however you chose to see e.g. the top
20 tasks for two languages
side-by-side.
http://hyperpolyglot.org/ seems to be an almost perfect match for what I was looking for. The quality is not always there, or idiom can be lacking, but it has the same intention and is pretty comprehensive.

Resources