Inspired by
Haskell API Search Engine
I begun to wonder what is the right way of finding names of things in Scala library.
For example let's assume I need some string substation, like search and replace. StringOps has no such thing. Google doesn't help either, because these terms are very general, and manually traversing documentation isn't much fun.
My question is, where do experts look when they are seeking a particular function?
This should be practical enough to use it in everyday work.
I do agree that a Hoogle like search engine for Scala would be great. Until there is such a tool, from another question:
I suggest using the Reference
Index
You can see the whole thing at
nightly Scaladoc
Related
While writing a Haskell binding for some libs written in C, a thing has to do is writing docs in haddock format. But since normally the binding is just plain, the doc would be just reformat of original libs' doc.
So my question is, is there some tools to help with this? Thanks.
I don't know of any tool for that. Since C docs can take many forms, I don't think there is any tool.
If the binding is indeed plain, essentially everything in IO, same names as the C library, etc. there is a very lazy option: provide a link to the C docs and refer to that.
Better: if the C docs are online, and each function/variable/entity has its own link, provide a link for each entity. In such way, the Haskell programmer can find your docs in Hackage, as usual, and then it's just one more click away to the real docs.
Of course, ideally one should copy the C docs, so that it's immediately available. However, this can require a lot of work, and some care in handling copyright correctly.
We have a large number of repositories. We want to implement a semantics(functionality) based code search on those repositories. Right now, we already have implemented keyword based code search in which we crawled through all the repository files and indexed them using elasticsearch. But that doesn't solve our problem as some of the repositories are poorly commented and documented, thus searching for specific codes/libraries become difficult.
So my question is: Is there any opensource libraries or any previous work done in this field which could help us index the semantics of the repository files, so that searching the code becomes easy and this would also help us in re-usability of the codes. I have found some research papers like Semantic code browsing, Semantics-based code search etc. but were of no use as there was no actual implementation given. So can you please suggest some good libraries or projects which could help me in achieving the same.
P.S:-Moreover, companies like Koders, Google, cocycles.com etc. started their code search based on functionality. But most of them have shut down their operations without giving any proper feedback, can anyone please tell me what kind of difficulties they are facing.
not sure if this is what you're looking for, but I wrote https://github.com/google/zoekt , which uses ctags-based understanding of code to improve ranking.
Take a look at insight.io
It provides semantic search and browsing
When learning (or relearning) a language, a significant amount of time goes into learning the functions for doing basic operations. For example, suppose I want to reverse a String. In one language, it may be simple as myString.reverse(). In Python, it is myString[::-1]. In other languages, you may have to create an array, iterate through the string and add all the characters in reverse order and then convert it back to a string. What would be extremely useful would be a reference so that if you know the name of the function in one language, then I could find the equivalent in another. Googling or searching StackOverflow don't seem to solve this problem very well at the moment, as you have to usually try a large number of different queries. I guess I am thinking of some kind of Wiki system. Are there any websites that do this?
It sounds like you're looking for Rosetta Code. There is in fact a page on reversing a string.
The title may seem slightly self-contradictory, and I accept that you can't really learn a language quickly. However, an experienced programmer that already has knowledge of a few languagues and different styles (functional, OO, imperative etc.) often wants to get started quickly. I've seen a few websites doing effective "translations" in the form of "just show me syntax equivalence". I can't remember the sites now, but for related languages (e.g. Perl/PHP) it's quite common.
Is there a better resource that covers more languages? Is there a resource that covers idioms as well as syntax? I think this would be incredibly useful for doing small amounts of work on existing code bases where you are not familiar with the language. Looking at the existing code, as we know, is not always a good indicator of quality. Likewise, for "learn by doing" weekend project I always have the urge to write reasonably idiomatic, clean code from the start. Such a resource could also link to known good example projects of varying sizes for those that prefer to learn by reading. Reading a well-written medium sized code base can also be much more practical when access to development environments might be limited.
I think it's possible to find tutorials and summaries for individual languages that provide some of this functionality in disparate web locations but I'm hoping there is a good, centralised, comparative place that the busy programmer can turn to.
You generally have two main things to overcome:
Syntax
Reference
Syntax you can pick up fairly quickly with a language tutorial and a stack of samplecode.
Reference (library/API calls) you need to find a proper guide to; perhaps the language reference, or perhaps google...
With those two in place, following a walkthrough (to get you used to using the development environment) will have you pretty much ready - you'll be able to look up what you want to say (reference), and know how to say it (syntax).
This, of course, applies principally to procedural/oop languages; languages that require a paradigm switch (ML/Haskell) you should go to lectures for ;)
(and for the weirder moments, there's SO!)
In the past my favour was "learning by doing". So e.g. I know a little bit of C++ and a lot of C#.Net but I must write a FTP Tool in Python.
So I sit for an hour and so the syntax differences by a tutorial, than I develop the form itself and look at the generated code. Then I search a open source Python FTP Client and get pieces of code (Not copy and paste, write it self to see, feel and remember the code!)
After a few hours I get it.
So: The mix is the best. A book, a piece of good code, the willing to learn and a free night with much coffee.
At the risk of sounding cheesy, I would start with the language's website tutorial and/or FAQ, followed by asking more specific questions here. SO is my centralized location for programming knowledge.
I remember when I learned Perl. I was asked to modify some Perl code at work and I'd never seen the language before. I had experience with several other languages, however, so it wasn't hard to figure out the syntax with the online Perl docs in one window and the code in another, side-by-side. I don't know that solely reading existing code is necessarily the best way to learn. In my case, I didn't know Perl but I could tell that the person who originally wrote the code didn't know Perl either. I'm not sure I could've distinguished between good Perl and really confusing Perl. It would've been nice to be able to ask questions here at the time.
Language isn't important. What is important is learning your ways around designing algorithms and the proper application of design patterns. Focus on the technique, not the language that implements a certain technique. Once you understand the proper development techniques, any programming language will just become real easy, no matter how obscure they are...
When you put a focus on a language, you're restricting your own knowledge.
http://devcheatsheet.com/ seems to be a step in the right direction: it aggregates cheat sheets/quick references and they are (somewhat) manually reviewed. It's also wide-ranging. It still comes up short a bit in terms of "idiomatic" quick reference: for example, the page on Ruby doesn't mention yield.
Rosetta Code appears to be an excellent resource that includes hints on coding idiomatically and moves from simple (like for-loops) to things like drawing. I haven't checked out how comprehensive it is, but there are a large number of languages and tasks listed. The drawbacks re: original question are:
Some of the linking is not accurate
(navigating Python->ForLoop will
take you to the top of the ForLoop
page, not the Python section). It's a
wiki, this can be improved.
Ideally you could "slice" the wiki
however you chose to see e.g. the top
20 tasks for two languages
side-by-side.
http://hyperpolyglot.org/ seems to be an almost perfect match for what I was looking for. The quality is not always there, or idiom can be lacking, but it has the same intention and is pretty comprehensive.
I'm looking to extract names and places from very short bursts of text example
"cardinals vs jays in toronto"
" Daniel Nestor and Nenad Zimonjic play Jonas Bjorkman w/ Kevin Ullyett, paris time to be announced"
"jenson button - pole position, brawn-mercedes - monaco".
This data is currently in a MySQL database, and I (pretty much) have a separate record for each athlete, though names are sometimes spelled wrong, etc.
I would like to extract the athletes and locations.
I usually work in PHP, but haven't been able to find a library for entity extraction (and I may want to get deeper into some NLP and ML in the future).
From what I've found, LingPipe and NLTK seem to be the most recommended, but I can't figure out if either will really suit my purpose, or if something else would be better.
I haven't programmed in either Java or Python, so before I start learning new languages, I'm hoping to get some advice on what route I should follow, or other recommendations.
What you're describing is named entity recognition. So I'd recommend checking out the other questions regarding this topic if you haven't already seen them. This looks like the most useful answer to me.
I can't really comment about whether NLTK or LingPipe is best suited for this task although from looking at the answers it looks like there's quite a few other resources written in Java.
One advantage of going with NLTK is that Python is very accessible as a language. The other advantage is that the NLTK book (which is available for free) offers an introduction to both Python and NLTK at the same time, which would be useful for you.