Is there a search engine that will give a direct answer? [closed] - search

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I've been wondering about this for a while and I can't see why Google haven't tried it yet - or maybe they have and I just don't know about it.
Is there a search engine that you can type a question into which will give you a single answer rather than a list of results which you then have to trawl through yourself to find what you want to know?
For example, this is how I would design the system:
User’s input: “Where do you go to get your eyes tested?”
System output: “Opticians. Certainty: 95%”
This would be calculated as follows:
The input is parsed from natural language into a simple search string, probably something like “eye testing” in this case. The term “Where do you go” would also be interpreted by the system and used when comparing results.
The search string would be fed into a search engine.
The system would then compare the contents of the results to find matching words or phrases taking note of what the question is asking (i.e. what, where, who, how etc.)
Once a suitable answer is determined, the system displays it to the user along with a measure of how sure it is that the answer is correct.
Due to the dispersed nature of the Internet, a correct answer is likely to appear multiple times, especially for simple questions. For this particular example, it wouldn’t be too hard for the system to recognise that this word keeps cropping up in the results and that it is almost certainly the answer being searched for.
For more complicated questions, a lower certainty would be shown, and possibly multiple results with different levels of certainty. The user would also be offered the chance to see the sources which the system calculated the results from.
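The steps above can be sketched as a toy frequency-voting script. Everything here is hypothetical: the snippets are hard-coded stand-ins for real search-engine results, and the stopword list and certainty measure are deliberately crude.

```python
import re
from collections import Counter

QUERY = "Where do you go to get your eyes tested?"

# Hypothetical snippets standing in for real search-engine results.
SNIPPETS = [
    "Book an eye test at your local opticians today.",
    "Most people get their eyes tested at the opticians.",
    "Your opticians can check your vision and eye health.",
    "An optometrist at the opticians will carry out the eye test.",
]

STOPWORDS = {"the", "a", "an", "at", "to", "and", "of", "will", "can",
             "most", "every", "their", "out", "today"}

def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def best_answer(query, snippets):
    """Vote for the most frequent word that is neither a stopword nor
    part of the question itself; certainty is the share of snippets
    containing the winning word."""
    query_words = tokenize(query)
    counts = Counter()
    for snippet in snippets:
        for word in tokenize(snippet) - query_words - STOPWORDS:
            counts[word] += 1
    word, hits = counts.most_common(1)[0]
    return word, hits / len(snippets)

answer, certainty = best_answer(QUERY, SNIPPETS)
print(f"{answer.title()}. Certainty: {certainty:.0%}")  # Opticians. Certainty: 100%
```

A real system would of course need actual retrieval, stemming (so "eye" and "eyes" vote together), and source-quality weighting, but the "correct answer keeps cropping up" intuition is exactly this counting step.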
The point of this system is that it simplifies searching. Many times when we use a search engine, we’re just looking for something really simple or trivial. Returning a long list of results doesn’t seem like the most efficient way of answering the question, even though the answer is almost certainly hidden away in those results.
Just take a look at the Google results for the above question to see my point:
http://www.google.co.uk/webhp?sourceid=chrome-instant&ie=UTF-8&ion=1&nord=1#sclient=psy&hl=en&safe=off&nord=1&site=webhp&source=hp&q=Where%20do%20you%20go%20to%20get%20your%20eyes%20tested%3F&aq=&aqi=&aql=&oq=&pbx=1&fp=72566eb257565894&fp=72566eb257565894&ion=1
The results given don't immediately answer the question - they need to be searched through by the user before the answer they really want is found. Search engines are great directories. They're really good for giving you more information about a subject, or telling you where to find a service, but they're not so good at answering direct questions.
There are many aspects that would have to be considered when creating the system – for example a website’s accuracy would have to be taken into account when calculating results.
Although the system should work well for simple questions, it may be quite a task to make it work for more complicated ones. For example, common misconceptions would need to be handled as a special case. If the system finds evidence that the user’s question has a common misconception as an answer, it should either point this out when providing the answer, or even simply disregard the most common answer in favour of the one provided by the website that points out that it is a common misconception. This would all have to be weighed up by comparing the accuracy and quality of conflicting sources.
It's an interesting question and would involve a lot of research, but surely it would be worth the time and effort? It wouldn't always be right, but it would make simple queries a lot quicker for the user.

Such a system is called an automatic Question Answering (QA) system, or a Natural Language search engine. It is not to be confused with a social Question Answering service, where answers are produced by humans. QA is a well studied area, as evidenced by almost a decade of TREC QA track publications, but it is one of the more difficult tasks in the field of natural language processing (NLP) because it requires a wide range of intelligence (parsing, search, information extraction, coreference, inference). This may explain why there are relatively few freely available online systems today, most of which are more like demos. Several include:
AnswerBus
START - MIT
QuALiM - Microsoft
TextMap - ISI
askEd!
Wolfram Alpha
Major search engines have shown interest in question answering technology. In an interview on Jun 1, 2011, Eric Schmidt said Google's new strategy for search is to provide answers, not just links. "'We can literally compute the right answer,' said Schmidt, referencing advances in artificial intelligence technology" (source).
Matthew Goltzbach, head of products for Google Enterprise has stated that "Question answering is the future of enterprise search." Yahoo has also forecasted that the future of search involves users getting real-time answers instead of links. These big players are incrementally introducing QA technology as a supplement to other kinds of search results, as seen in Google's "short answers".
While IBM's Jeopardy-playing Watson has done much to popularize machines answering questions (or answers), many real-world challenges remain in the general form of question answering.
See also the related question on open source QA frameworks.
Update:
2013/03/14: Google and Bing search execs discuss how search is evolving to conversational question answering (AllThingsD)

Wolfram Alpha
http://www.wolframalpha.com/
Wolfram Alpha (styled Wolfram|Alpha) is an answer engine developed by Wolfram Research. It is an online service that answers factual queries directly by computing the answer from structured data, rather than providing a list of documents or web pages that might contain the answer as a search engine would. It was announced in March 2009 by Stephen Wolfram, and was released to the public on May 15, 2009. It was voted the greatest computer innovation of 2009 by Popular Science.
http://en.wikipedia.org/wiki/Wolfram_Alpha

Have you tried Wolfram Alpha?
Have a look at this: http://www.wolframalpha.com/input/?i=who+is+the+president+of+brasil%3F

Ask Jeeves, now Ask.com, used to do this. Here's why nobody does this anymore, except Wolfram:
Question Answering (QA) is far from a solved problem.
There exist strong question answering systems, but they require full parsing of both the question and the data, and therefore need tremendous amounts of computing power and storage - even by Google's scale - to get any coverage.
Most web data is too noisy to handle; you first have to detect if it's in a language you support (or translate it, as some researchers have done; search for "cross-lingual question answering"), then try to detect noise, then parse. You lose more coverage.
The internet changes at lightning pace. You lose even more coverage.
Users have gotten accustomed to keyword search, so that's much more economical.

Powerset, acquired by Microsoft, is also trying to do question answering. They call their product a "natural language search engine" where you can type in a question such as "Which US State has the highest income tax?" and search on the question instead of using keywords.


Agile, Scrum and documentation [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
One of the Four Core Agile values says "Working Software over comprehensive documentation" and this is explained as a good thing. Furthermore it is explained that rather than written communication (e-mails included), face-to-face meetings are preferred and "more productive".
I would like for someone to explain to me why or how is this a good thing?
In an organization where I used to work, there were heaps of working software that I had to maintain. The documentation was minimal and it was a nightmare. It didn't help that the programs were not modularized, were very hard to understand, full of esoteric twists, and very disorganized. The importance of comprehensive documentation was one thing I took from that experience. It doesn't matter if the software works now if it is not going to work in the near future, right?
And on face-to-face meetings, I have the same doubt. I very much prefer e-mails (written). You can say the most outrageous of things when talking, but once it is written, it is a deal. Plus, if you are in a multinational organization with several languages, it helps a lot.
I would like to hear from people with Agile experience. How is the above a good thing? Thanks.
Working software over comprehensive documentation
Comprehensive documentation is sometimes seen as a way to demonstrate progress. "If we have a detailed specification and a weighty design document then we are making good progress towards a product delivery"
What working software over comprehensive documentation means is that we view working software as a better demonstration of progress than documentation. This is because comprehensive documentation can give a false level of confidence.
So there is nothing that says avoid doing any documentation. It is just saying that we should only do the documentation that is needed and not just do documentation because it is part of a process.
In your example, where the software is difficult to work with, more documentation may well be needed. Just don't write documents that never get used and offer little value.
Individuals and interaction over process and tools
Face-to-face communication has many advantages over other forms of communication. For example:
People use body language to give context to conversations
People use audible and visual clues as to when to start and stop talking - this helps to make conversations flow
Regular face-to-face discussions often help teams to bond together
Notice though that the Agile manifesto does not mention face-to-face communication. All it says is individuals and interaction. If you and your team have ways of communicating that are as effective as face-to-face communication then that fits just as well within the Agile approach. The important part is that we value interaction and having members of the team work closely with each other.
When all agile recommendations are taken into account, the issues you mentioned in your question don't arise. Working software should also have good coding standards and design.
Regarding your particular issue with a lack of documentation, unit tests (TDD/BDD) could be very useful. Good test coverage can explain how code should work even better than detailed documentation. Agile methodology also welcomes simplicity, so your entire architecture might simply be over-complicated.
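To illustrate the "tests as documentation" point, here is a small sketch. `loyalty_discount` and its business rule are entirely made up for the example, but the test reads like a spec: anyone on the team learns the rule from the assertions alone, with no separate design document.

```python
# `loyalty_discount` is a made-up business rule, used only to show
# that a test can read like a specification.
def loyalty_discount(years, spend):
    """5% discount per full year of membership, capped at 25%,
    and only applied to orders of 20 or more."""
    if spend < 20:
        return 0
    pct = min(years * 5, 25)
    return spend * pct // 100

def test_discount_reads_like_a_spec():
    assert loyalty_discount(1, 100) == 5    # 5% after one year
    assert loyalty_discount(10, 100) == 25  # capped at 25%
    assert loyalty_discount(3, 10) == 0     # small orders get nothing

test_discount_reads_like_a_spec()
print("spec-style tests passed")
```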
Regarding face-to-face communication: just imagine a situation where you detect an issue in your product (say, broken website markup). Instead of writing a long email with steps to reproduce and attached screenshots, you go to the front-end developer sitting in your room, or make a Skype call, and start explaining the problem. The developer quickly realizes that he forgot to include some script. You get an answer in minutes, while your email might not be answered until the next day.
I think it is necessary to clarify your needs before you decide to apply agile.
Agile is the recommended working framework for a highly unpredictable domain (you may also check the Cynefin model for identifying your working context). In such a domain, you do require "working software" and "good communication" to review and revise your development in a short-term iterative process. As a result, you can change and improve your software based on the feedback it generates. This has proven to be the most effective and efficient way to build software in a highly competitive business world.
However, in your organization, you are maintaining legacy software with limited documentation. This context is totally different from what agile is designed for. You need optimization in your world, not testing or growth seeking. In short, process/tools and documentation are more important.
Regarding email communication: there is no doubt that email makes the deal, but you could never make a deal by just using email. It is the same with how you apply agile - you should use both face-to-face communication and email depending on the situation.
I would regard Agile as a framework more than a methodology. The idea is to let you build your own process based on your own working environment.
Documentation is an expression of a shared vocabulary, so it should be consistent from the epic all the way down to the comments in the code:
Documentation should be comprehensive and understandable. Using examples is recommended.
Language between feature stories, technical stories, pseudocode, and assertions should have naming conventions
A feature that people do not know about is a useless feature.
Lack of documentation can be a symptom of the lack of a marketing plan
A feature that isn't documented is a useless feature. A patch for a new feature must include the documentation.
Lack of documentation can be a symptom of the lack of usability, accessibility, and information architecture
Adjust the documentation. Doing this first gives you an impression of how your changes affect the user.
Lack of documentation can be a symptom of a lack of focus on the user and the maintainer:
Software is not useful by itself. The executable software is only part of the picture. It is of no use without user manuals, business processes, design documentation, well-commented source code, and test cases. These should be produced as an intrinsic part of the development, not added at the end. In particular, recognize that design documentation serves two distinct purposes:
To allow the developers to get from a set of requirements to an implementation. Much of this type of documentation outlives its usefulness after implementation.
To allow the maintainers to understand how the implementation satisfies the requirements. A document aimed at maintainers is much shorter, cheaper to produce and more useful than a traditional design document.
And understanding the purpose of any project requires building a relationship between the project timeline and the source code history:
Write the change log entries for your changes. This is both to save us the extra work of writing them, and to help explain your changes so we can understand them.
The purpose of the change log is to show people where to find what was changed. So you need to be specific about what functions you changed; in large functions, it’s often helpful to indicate where within the function the change was.
On the other hand, once you have shown people where to find the change, you need not explain its purpose in the change log. Thus, if you add a new function, all you need to say about it is that it is new. If you feel that the purpose needs explaining, it probably does — but put the explanation in comments in the code. It will be more useful there.
References
Vim documentation: develop
SCRUM-PSP: Embracing Process Agility and Discipline (pdf)
Secure Software Development Life Cycle Processes | US-CERT
An Uneasy Marriage? Merging Scrum and TSP (pdf)
TSP/PSP and Agile-SCRUM: Similarities & Differences
GNU Emacs Manual: Sending Patches

Where are programming tasks in scrum detailed at? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
When you have sprint task in Scrum, where do you put how you want to program something? For example, say I am making a tetris game and I want to build the part of the game that tracks the current score and a high score table. I have my feature, my user story and my task, but now I want to talk about how to design it.
Is that design recorded somewhere on the sprint, or is it just something the programmer figures out? Do you say "for task x, use database such-and-such, create these columns", etc.? If not, do you record that at all? Is that what Trac is for? I don't mean overly high-level design.
I touched on it here: Where in the scrum process is programming architecture discussed?
but my current question is about later in the project, after the infrastructure. I'm speaking more about the middle now - the actual typing in of the code. Some said they decide along the way; some said the team leads decide. Is this even documented anywhere except in the code itself, with docs and comments?
edit: does your boss just say, okay, you do this part, I don't care how?
Thank you.
There can be architectural requirements, in addition to user-specified requirements, that muddy this a bit. Thus, one could have a "You will use MVP on this" constraint that limits the design somewhat.
In my current project, aside from requirements from outside the team, the programmer just figures it out is our standard operating procedure. This can mean crazy things can be done and re-worked later on as not everyone will code something so that the rest of the team can easily use it and change it.
Code, comments and docs cover 99% of where coding details would be found. What's left, if one assumes that wikis are part of docs?
Scrum says absolutely nothing about programming tasks. Up to you to work that out...
Scrum doesn't necessarily have anything explicitly to do with programming - you can use it to organise magazine publication, church administration, museum exhibitions... it's a management technique not explicitly a way of managing software development.
If you do extreme programming inside scrum, you just break your user stories for the iteration down into task cards, pair up and do them.
When I submit tasks to my programming team, the description usually takes the shape of a demo, a description on how the feature is shown in order to be reviewed.
How the task will be implemented is decided when we evaluate the task. The team members split the task into smaller items. If a design is necessary, the team will have to discuss it before being able to split it. If the design is too complex to be done inside this meeting, we will simply create a design task; agile/scrum doesn't dictate how this should be done (in a wiki, in a doc, in your mind, on a napkin - your choice), aside from saying: as little documentation as possible. In most cases the design is decided on the spot, after a bit of debate, and the resulting smaller tasks are the description of how things will be done.
Also, sometimes the person doing it will make discoveries along the way that change the design and, with it, the way to work on it. We may then trash some cards and make new ones. The key is to be flexible.
You do what you need to do. Avoid designing everything up front, but if there are things you already know will not change, then just capture them. However, corollary to YAGNI is that you don't try to capture too much too soon as the understanding of what is needed will likely change before someone gets to do it.
I think your question sounds more like you should be asking who, not when or where. The reason Agile projects succeed is that they understand that people are part of the process. Agile projects that fail seem to tend to favor doing things according to someone's idea of "the book" and not understanding the people and project they have. If you have one senior team lead and a bunch of junior developers, then maybe the senior should spend more of their time on such details (emphasis on maybe). If you have a bunch of seniors, then leaving these to the individual may be a better idea. I assume you don't have any cross-team considerations. If you do, then hashing out some of the details like DB schema might need to come early if multiple teams depend on it.
If you (as a team member) feel the need to talk about design, to do some design brainstorming with other team members, then just do it. As for the how, many teams will just use a whiteboard and brain juice for this and keep things lightweight, which is a good practice IMHO.
Personally, I don't see much value in writing down every decision and detail in a formalized document, at least not in early project phases. Written documents are very hard to maintain and get deprecated pretty fast. So I tend to prefer face to face communication. Actually, written documents should only be created if they're really going to be used, and in a very short term. This can sound obvious but I've seen several projects very proud of their (obsolete) documentation but without any line of code. That's just ridiculous. In other words, write extensive documentation as late as possible, and only if someone value it (e.g. the product owner).

How do you measure the popularity of a programming language?

Following on from this question, I am interested in finding out how you could measure the popularity of any and all programming languages.
As professional developers, we need to be aware of trends in the software industry: which languages employers will be looking for in the coming years, and which we should be proficient in. It can also help us spot opportunities - perhaps new developers could branch out into mainframe programming as older members of the profession retire. For these reasons, it is important for us to track programming language popularity.
There are a number of questions already on Stack Overflow (here and here) about how SO could be used to measure a language's popularity (or the difficulty of using said language). Other methods include tracking job adverts (i.e. http://www.hotskills.net/) and search engine query statistics (i.e. http://langpop.com/).
Can the SO community think of any other methods of measuring this?
Summary
Use Stack Overflow tags to measure language popularity
Search Engine query statistics
Job adverts
Open Source code repositories
As noted by various contributors below, each of the above sources has problems as a reference to calculate language popularity/usage.
As the author of http://www.langpop.com, my approach is to find as many metrics as possible (certainly not limited to search engine results! We have books, job listings, IRC, Google Code, Freshmeat and others) and let people see the methodology, making the whole thing as transparent as possible. That's why I added the JavaScript feature that lets you recalculate the normalized results with different weights for each metric.
As someone else notes, there are many different ways of measuring popularity. Another important one that he doesn't mention might be the "acceleration" of a given language: for instance, Cobol has a big installed base, but I don't think a lot of new Cobol projects are being started. Something like Ruby is probably the opposite - it's not widely used, but a lot of people are picking it up for new projects.
I disagree with the conclusion that the numbers are "meaningless", though. By looking at the different measurements and thinking about them some, I think there are plenty of interesting conclusions to be drawn. Also, don't confuse "rough" numbers with "useless" numbers. I think we can definitely say that Java is more popular than Tcl, for instance.
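The weight-recalculation idea can be sketched in a few lines. The metrics and numbers below are invented for illustration - this is not langpop.com's actual data or formula - but they show how normalizing each metric and varying the weights changes the ranking.

```python
# Toy illustration of combining popularity metrics with adjustable
# weights. All numbers are invented for the example.
RAW_SCORES = {
    "Java":  {"search": 300_000_000, "jobs": 9_000, "books": 1_200},
    "C#":    {"search":  50_000_000, "jobs": 7_500, "books":   800},
    "Cobol": {"search":   7_000_000, "jobs":   400, "books":    90},
}

def rank(raw, weights):
    """Normalize each metric by its maximum across languages, then
    take a weighted sum, so changing the weights changes the ranking."""
    maxima = {m: max(lang[m] for lang in raw.values()) for m in weights}
    scores = {
        name: sum(weights[m] * lang[m] / maxima[m] for m in weights)
        for name, lang in raw.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Equal weights: the language leading every metric comes out on top.
print(rank(RAW_SCORES, {"search": 1, "jobs": 1, "books": 1}))
```

Shifting weight onto "jobs", say, narrows the gap between the top two - which is exactly the kind of what-if the adjustable weights enable.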
I'd say a language popularity and success is exponential to the number of people who hate it.
Not voting the question down, because a lot of people ask about this kind of thing. However...
The next words out of anyone's mouth after this is asked should be, "Popular with who?".
Popular is a useless word to apply to programming languages. There is no universally accepted meaning of it, so there's no objective way to measure it.
For example, the obvious thing to do would be to go out and count up worldwide deployed LOC in every software project in use. When you do that, you'd discover that hands-down the most popular language is Cobol.
Someone else might think the obvious way to measure would be by Google hits. Doing that, they'd find that Java gets 282 million results, while C# gets 48 million, and Cobol only gets 6.5 million. So clearly Java is more popular than C#, and way more popular than Cobol.
A third person might think the obvious way to check is to look at SO tags. They'd find the single most used tag here is C# (34K uses so far). Cobol only has been used 65 times here. So clearly C# is the most popular, and almost nobody uses Cobol.
So who is right? All three are. It depends on what you really meant when you asked the question.
For those who are surprised at my Cobol assertion, I suggest reading this (somewhat dated 2003) article on the subject. It will be a real eye-opener. It could be argued that we non-Cobol programmers are all working around the margins of a gigantic Cobol world.
Check the TIOBE index.
What does "popular" mean? Here are some potential ways of measuring it:
The number of developers writing with that language professionally at a given point in time.
The number of people frequently experimenting with or using the language at home at any given point in time.
The number of developers who wish they were using language X (or are happy that they are).
Problems with some measurements:
Using SO questions or Google hits could merely indicate which language (among those in the running for most popular) is the hardest to use.
Counting job adverts would be horribly inaccurate, since people tend to switch to things that don't fall into their original job description, and you would miss all the people currently using a language (not applying for a job).
Personally, I'd like to use number 3 as a measurement of popularity, but I have no idea how you would measure it. The internet would seem like a good place, but which site will be able to attract all the developers, and how would you know that enough of them responded to the poll?
Open source contributions perhaps.
The number of posts about that programming language on Stack Overflow.
You can use Google Trends to get an idea. Of course it's not very accurate, since you can write "C#" or "C Sharp", but it can give you a rough picture.
This blog article neatly summarizes the various ways of determining the popularity of a programming language:
Determining Programming Language Popularity
The article describes one way of measuring popularity that has so far not been mentioned:
Popularity by Book Sales
In terms of ways that have been mentioned, the article offers specific ways of gathering statistics:
Measured by Commits to Open Source projects - use of the Ohloh website.
Popularity by Lines of Code - use of figures compiled by BlackDuck

Effective strategies for studying frameworks/ libraries partially [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I remember the old, effective approach to studying a new framework: the best way was always to read a good book on the subject, say MFC. When I tried to skip a lot of material to speed up coding, it turned out later that it would have been quicker to read the whole book first. There were no good ways to study a framework in small parts. Or at least I did not see them then.
In the last few years a lot of new things have happened: improved search results from Google, programming blogs, many more people involved in Internet discussions, and a lot of open source frameworks.
Right now when we write software we very often depend on third-party (usually open source) frameworks/libraries. And a lot of the time we need to know only a small part of their functionality to use them. It's about finding the simplest way of using a small subset of the library without unnecessary detours.
What do you do to study as little of the framework as possible and still use it effectively?
For example, suppose you need to index a set of documents with Lucene. And you need to highlight search snippets. You don't care about stemmers, storing the index in one file vs. multiple files, fuzzy queries and a lot of other stuff that is going to occupy your brain if you study Lucene in depth.
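For the highlighting half of that task, the core idea is small enough to sketch without Lucene at all. This toy function just cuts a snippet around the first query-term match and wraps matches in `<b>` tags; it is a stand-in for what Lucene's Highlighter does with real scoring and fragmenting, not a reflection of Lucene's API.

```python
import re

def highlight(text, terms, width=40):
    """Cut a snippet of roughly `width` characters around the first
    query-term match and wrap every matched term in <b> tags."""
    pattern = re.compile("|".join(map(re.escape, terms)), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return text[:width]
    start = max(0, match.start() - width // 2)
    snippet = text[start:start + width]
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", snippet)

doc = "Apache Lucene is a high-performance, full-featured text search engine library."
print(highlight(doc, ["lucene", "search"]))
```

Knowing what the feature boils down to makes it easier to recognize the right entry point (here, Lucene's highlight package) when skimming the docs.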
So what are your strategies, approaches, tricks to save your time?
I will enumerate what I would do, though I feel that my process can be improved.
Search for "lucene tutorial", "lucene highlight example" and so on. Try to estimate the trust score of unofficial articles (blog posts) based on the publishing date and the number and tone of the comments. If there is no definite answer, collect new search keywords and links closer to the target.
Search for really quick tutorials/ newbie guides on official site
Estimate how valuable the javadocs are for a newbie. (Read the Lucene highlight package summary.)
Search for simple examples that come with a library, related to what you need. ( Study "src/demo/org/apache/lucene/demo")
Ask about a "simple Lucene search highlighting example" on the Lucene mailing list. You may get no answer, or even a bad reputation, if you ask a silly question - and often you don't know whether your question is silly because you have not studied the framework in depth.
Ask on Stack Overflow or another Q&A service: "Could you give me a working example of search keyword highlighting in Lucene?" However, this question is very specific and may get no answers or a bad score.
Estimate how easy it is to get the answer from the framework's code, if it's open source.
What are your study/ search routes? Write them in priority order if possible.
I use a three phase technique for evaluating APIs.
1) Discovery - In this phase I search StackOverflow, CodeProject, Google and Newsgroups with as many different combination of search phrases as possible and add everything that might fit my needs into a huge list.
2) Filter/Sort - For each item I found in my gathering phase I try to find out if it suits my needs. To do this I jump right into the API documentation and make sure it has all of the features I need. The results of this go into a weighted list with the best solutions at the top and all of the cruft filtered out.
3) Prototype - I take the top few contenders and try to do a small implementation hitting all of the important features. Whatever fits the project best here wins. If for some reason an issue comes up with the best choice during implementation, it's possible to fall back on other implementations.
Of course, a huge number of factors go into choosing the best API for the project. Some important ones:
How much will this increase the size of my distribution?
How well does the API fit with the style of my existing code?
Does it have high quality/any documentation?
Is it used by a lot of people?
How active is the community?
How active is the development team?
How responsive is the development team to bug patch requests?
Will the development team accept my patches?
Can I extend it to fit my needs?
How expensive will it be to implement overall?
... And of course many more. It's all very project dependent.
As to saving time, I would say trying to save too much here will just come back to bite you later. The time put into selecting a good library is at least as important as the time spent implementing it. Also, think down the road, in six months would you rather be happily coding or would you rather be arguing with a xenophobic dev team :). Spending a couple of extra days now doing a thorough evaluation of your choices can save a lot of pain later.
The answer to your question depends on where you fall on the continuum of generality/specificity. Do you want to solve an immediate problem? Are you looking to develop a deep understanding of the library? Chances are you’re somewhere between those extremes. Jeff Atwood has a post about how programmers move between these levels, based on their need.
When first getting started, read something on the high-level design of the framework or library (or language, or whatever technology it is), preferably by one of the designers. Try to determine what problems they are trying to address, what the organizing principles behind the design are, and what the central features are. This will form the conceptual framework from which future understanding will hang.
Now jump into it. Create something. Do not copy and paste somebody's code. Instead, when things don't work, read the error messages in detail, and the help on those error messages, and figure out why the error occurred. It can be frustrating when things don't work, but it forces you to think, and that's when you learn.
1) Search Google for my task
2) Look at examples with a few different libraries; no need to tie myself down to Lucene, for example, if I don't know what other options I have.
3) Look at the date of the last update on the main page; if it hasn't been updated in six months, leave (with some exceptions)
4) Search for sample task with library (don't read tutorials yet)
5) Can I understand what's going on without a tutorial? If yes, continue; if no, start back at 1
6) Try to implement the task
7) Watch myself fail
8) Read a tutorial
9) Try to implement the task
10) Watch myself fail, and ask on StackOverflow, mail the authors, or post on the user group (if it looks friendly)
11) If I could get the task done, I'll consider the framework worthy of study and read the main tutorial for 2 hours (if it doesn't fit in 2 hours I just ignore what's left until I need it)
I have no recipe, in the sense of a set of steps I always follow; that's largely because everything I learn is different. Some things are radically new to me (Dojo, for example: I have no fluency in JavaScript, so that's a big task), some are just enhancements of previous knowledge (I know EJB 2 well, so learning EJB 3, while on the surface new with all its annotations, is building on concepts).
My general strategy, though, I'd describe as "Spiral and Park". I try to circle the landscape first and understand the general shape; I park concepts that I don't get just yet and don't let them worry me. Then I go a little deeper into some areas, but again try not to get obsessed with one, spiralling down into the subject. Hopefully I start to unpark things and understand them, but I also need to park more things.
Initially I want answers to questions such as:
What's it for?
Why would I use this rather than that other thing I already know?
What's possible? Any interesting sweet spots? (E.g. "Ooh, look at that nice AJAX-driven update.")
I do a great deal of skim reading.
Then I want to do more exploring of the hows. I start to look for gotchas and good advice. (E.g. in Java: why is "wibble".equals(var) a useful construct?)
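That `"wibble".equals(var)` idiom is a good example of the kind of gotcha worth hunting for: putting the literal first makes the comparison null-safe, because a String literal can never be null. A minimal illustration (class name is just for the example):

```java
public class EqualsGotcha {
    public static void main(String[] args) {
        String var = null;

        // Literal-first is null-safe: the literal is never null,
        // so calling equals on it cannot throw.
        boolean safe = "wibble".equals(var); // false, no exception
        System.out.println(safe);

        // Variable-first would throw a NullPointerException here:
        // boolean unsafe = var.equals("wibble");
    }
}
```

Since Java 7, `java.util.Objects.equals(a, b)` offers the same null safety without caring about operand order.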
Specific techniques and information sources:
Most important: doing! As early as possible I want to work a tutorial or two. I probably have to get the first circuit of the spiral done, but then I want to touch and experiment.
Overview documents
Product documents
Forums and discussion groups, learning by answering questions is my favourite technique.
If at all possible I try to find gurus. I'm fortunate in having, in my immediate colleagues, a wealth of knowledge and experience.
Quick-start guides.
A quick look at the API documentation if available.
Reading sample codes.
Messing around. YOU HAVE TO MESS AROUND (sorry for the caps).
If it's a small library/API with a small or no community, you can always contact the developer himself and ask for help, because he'll probably be more than happy to oblige; he's glad that one more person is using his API.
Mailing lists are a great resource as long as you do your homework first before asking questions.
Mailing list archives are invaluable for most of the questions I've had on CoreAudio related stuff.
I would never read the Javadoc: often there is none, and when there is, it most likely isn't up to date, so at best one gets confused.
Start with the simplest possible tutorial you find within some minutes.
Often the tutorial will lead you to further sources at the end, so then most of the time one is on a path that goes on and on, deeper and deeper.
It really depends on what the topic is and how much info there is on it. Learning by example is a good way to start a topic brand new to you, especially if you're knowledgeable in similar libraries or languages. You can take a topic you're familiar with and say: "I understand how to implement this using X; let's see how it's done using Y."
So what are your strategies, approaches, tricks to save your time?
Well, I search. I generally never ask questions, preferring to research things myself. If worst comes to worst I'll read the documentation. In some cases (say, when I was doing some work with SharpSVN) I had to look at the source, specifically the test cases, to get some information about how the API worked.
Generally, I have to be honest, most of my 'study' and 'learning' is by accident.
For example, just a few seconds ago, I discovered how to get the "Recent" folder in C#. I had no idea how to do that before seeing the question, considering it interesting, and then searching.
So for me the real 'trick' is that I hang around on forums, answer questions, and accidentally pick up knowledge. Then, when it comes time for me to research something, chances are I know a bit about it already; searching is easier, and I can focus on the implementation (typically implementing a test program first) and progressing from there.

About "AUTOMATIC TEXT SUMMARIZER (linguistic based)" [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I have "AUTOMATIC TEXT SUMMARIZER (linguistic approach)" as my final-year project. I have collected enough research papers and gone through them. Still, I am not very clear about the 'how-to-go-for-it' thing. I also found "AUTOMATIC TEXT SUMMARIZER (statistical based)" and discovered that it is much easier compared to my project. My project guide told me not to opt for the statistical approach and to go for the linguistic one instead.
Anyone who has ever worked on, or even heard of, this sort of project will know that summarizing a document essentially means SCORING each sentence (by some approach involving specific algorithms) and then selecting the sentences whose score exceeds a threshold. The most difficult part of this project is choosing the appropriate scoring algorithm and then implementing it.
I have moderate programming skills and would like to code in Java (because there I'll have lots of APIs available, resulting in less overhead). Now I want to know: for my project, what should my approach be, and which algorithms should I use? Also, how do I implement them?
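The score-then-threshold pipeline described above can be sketched independently of the scoring algorithm. Below is a minimal Java skeleton, with a word-frequency score standing in for the real linguistic scorer (lexical chains, coherence relations, etc. would replace the `score` method); all names are illustrative, not from any particular library:

```java
import java.util.*;
import java.util.stream.*;

public class SummarizerSkeleton {
    // Stand-in scorer: average corpus frequency of the sentence's words.
    // A linguistic approach would replace this method entirely.
    static double score(String sentence, Map<String, Integer> freq) {
        String[] words = sentence.toLowerCase().split("\\W+");
        int sum = 0;
        for (String w : words) sum += freq.getOrDefault(w, 0);
        return sum / (double) Math.max(words.length, 1);
    }

    // Keep every sentence whose score clears the threshold, in document order.
    public static List<String> summarize(List<String> sentences, double threshold) {
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences)
            for (String w : s.toLowerCase().split("\\W+"))
                freq.merge(w, 1, Integer::sum);
        return sentences.stream()
                .filter(s -> score(s, freq) >= threshold)
                .collect(Collectors.toList());
    }
}
```

The point of the sketch is the shape: score each sentence, then filter by threshold. All the project's real work lives inside `score`.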
Using Lexical Chains for Text Summarization (Microsoft Research)
An analysis of different algorithms: DasMartins.2007
Most important part in the doc:
• Nenkova (2005) analyzes that no system could beat the baseline with statistical significance
• Striking result!
Note there are 2 different nuances to the linguistic approach:
Linguistic rating system (all clear here)
Linguistic generation (rewrites sentences to build the summary)
Automatic summarization is a pretty complex area. First get your Java skills in order, along with your understanding of statistical NLP (which uses machine learning); you can then work towards building something of substance. Evaluate your solution, and make sure you have concretely defined your measurement variables and how you went about your evaluation; otherwise your project is doomed to failure. This is generally considered a high-risk project for final-year undergraduate students: they often fail to get the principles right, then implement them in a way that is not right either, and then their evaluation measures are all ill-defined and don't reflect clearly on their own work. My advice would be to focus on one area rather than many; in summarization you can have single- and multi-document summaries, and the more varied you make your project, the less likely you are to receive a good mark. Keep it focused and in depth. Evaluate other people's work, then the process you decided to take and the outcomes of it.
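"Concretely defined measurement variables" in summarization usually means an overlap metric against a human reference summary. As an illustrative sketch only, here is a simplified ROUGE-1-style unigram recall in Java (the real ROUGE toolkit counts token multiplicity and handles stemming, stopwords, and multiple references; this version uses a plain set intersection):

```java
import java.util.*;

public class Rouge1 {
    // Simplified unigram recall: fraction of distinct reference words
    // that also appear in the candidate summary.
    public static double recall(String candidate, String reference) {
        Set<String> cand = new HashSet<>(
                Arrays.asList(candidate.toLowerCase().split("\\W+")));
        String[] ref = reference.toLowerCase().split("\\W+");
        long hit = Arrays.stream(ref).filter(cand::contains).count();
        return ref.length == 0 ? 0.0 : (double) hit / ref.length;
    }
}
```

Fixing a metric like this up front is what makes the evaluation chapter defensible.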
Readings:
- Jurafsky's book on NLP has a section near the back on summarization and QA.
- Advances in Text Summarization by Inderjeet Mani is really good.
Understand things like term weighting, centroid-based summarization, log-likelihood ratio, coherence relations, sentence simplification, maximum marginal relevance, redundancy, and what a focused summary actually is.
You can attempt it using a supervised or an unsupervised approach as well as a hybrid.
Linguistic is the safer option; that is why you have been advised to take that approach.
Try attempting it linguistically first, then build the statistical part on top to hybridize your solution.
Use it as an exercise to learn the theory and practical implications of the algorithms, and to build on your knowledge, as you will no doubt have to explain and defend your project to the judging panel.
If you really have read those research papers and books, you probably know what is already known. Now it is up to you to implement that knowledge in a Java application. Or you could expand human knowledge by doing some innovation or invention; if you do, you have become a true scientist.
Please make your question more specific, in these two main areas:
Project definition: What is the goal of your project?
Is the input unit a single document? A list of documents?
Do you intend your program to use machine learning?
What is the output?
How will you measure success?
Your background knowledge: You intend to use linguistic rather than statistical methods.
Do you have background in parsing natural language? In semantic representation?
I think some of these questions are tough. I am asking them because I spent too much time trying to answer similar questions in the course of my studies. Once you get these sorted out, I may be able to give you some pointers. Mani's "Automatic Summarization" looks like a good start, at least the introductory chapters.
The University of Sheffield did some work on automatic email summarising as part of the EU FASiL project a few years back.