Is there a term for citations on Scopus that point forward to where a work is cited in subsequent publications? - reference

In Scopus one can click into a publication and see which other, later publications reference it.
The meaning of "references" in a paper is clear: they are the works the authors point the reader backward to for further context and background, in relation or comparison to their own new work.
But Scopus also, usefully, lets the reader "carry forward" to subsequent publications that use that paper as a reference. It is this forward-looking relation that is unclear to me; it seems to have no common name.
Profer (proference), with an O rather than an E? Infer (inference)? Referent? Produced works/products? The first of these is supposedly obsolete, though I'm not sure, and I can't see why it would be.
This is similar to, but not the same as, a post I asked before, since there I hadn't given a specific context like the Scopus site in this post. Here is my related post: https://english.stackexchange.com/q/584642/440001
What's more, there is another post where the poster used the words "outward citations" and the respondents didn't know what the poster was talking about: Scopus outward citations?
There is also a post asking for the opposite of reference data: What is the opposite of "Reference Data"?
Is there any historical terminology that has ever been used for this concept? Thanks!

Related

Need ideas for rewarding the users of a wiki

I need ideas for how best to reward the users of a wiki so they stay motivated to keep contributing in a constructive manner. Articles can be upvoted, so the thought is to reward contributors based on how much they have contributed to a specific article as well as how many upvotes it has received. The idea so far is to award points to those who wrote the article, plus points for the number of bytes a user has generated by editing articles.
The immediate problem I see is how to correctly decide when to award points in situations such as a user editing parts of an article that have already been edited before, or a user editing only to correct a misspelling (for example, changing a single word). Should that earn points? I don't see how the backend could distinguish between a user correcting a spelling mistake and a user farming points by making small changes here and there.
There is also the question of how to manage the byte-contribution points when a user's contribution has been overridden by a later edit: should they keep the points for the bytes they contributed now that their original piece of text is gone?
The intention is to make users feel rewarded for their work without making the reward system so competitive that they focus more on generating points than on producing content of value.
Given that the major concern appears to be avoiding low-value edits, you could cap the points per day and the edits per article. For example, instead of a user being able to rack up points by applying multiple edits to a page one word at a time, reward them for editing a page only once per day. Additional edits would earn no extra points but would still be accepted. It doesn't have to be at the page level either; you could use paragraphs or whatever level of granularity works for the content. The most important thing is to track all of this over time and do spot checks on whether the top users really are the ones contributing value according to whatever metric you decide is important.
Users always try to game any points system, so whatever you choose, make sure to track enough information that you can change your algorithm in the future and understand how it will behave.
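The "one rewarded edit per page per day" cap could be sketched roughly as follows. This is an illustrative sketch only; the class name, point value, and day format are all made up for the example:

```python
from collections import defaultdict

# Points awarded for the first edit per (user, page, day); illustrative value.
POINTS_PER_REWARDED_EDIT = 10

class RewardTracker:
    def __init__(self):
        # (user, page, day) -> number of edits seen that day
        self._edits = defaultdict(int)
        # user -> accumulated points
        self.points = defaultdict(int)

    def record_edit(self, user, page, day):
        """Accept every edit, but only award points for the first
        edit a user makes to a given page on a given day."""
        self._edits[(user, page, day)] += 1
        if self._edits[(user, page, day)] == 1:
            self.points[user] += POINTS_PER_REWARDED_EDIT
            return True   # edit was rewarded
        return False      # accepted, but earns no extra points

tracker = RewardTracker()
tracker.record_edit("alice", "Zebras", "2024-01-01")  # rewarded
tracker.record_edit("alice", "Zebras", "2024-01-01")  # not rewarded
tracker.record_edit("alice", "Zebras", "2024-01-02")  # rewarded again
print(tracker.points["alice"])  # 20
```

Swapping the `page` key for a paragraph identifier changes the granularity without touching the rest of the logic, and the `_edits` log doubles as the audit trail for the spot checks mentioned above.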

Extracting relationship from NER parse

I'm working on a problem that at the very least seems to require named entity recognition, but I'm not sure how to go farther than the NER parse. What I'm trying to do is parse information (likely from tweets) regarding scheduling of events. So, for example, I'd like to be able to automatically resolve the yes/no answer to the question of "Are The Beatles playing tomorrow?" from short messages like:
"The Beatles cancelled their show tomorrow" or
"The Beatles' show is still on tomorrow"
I know NER will get me close as it will identify the band of interest and the time (if it's indicated), but there are many ways to express the concepts I'm interested in, for example:
"The Beatles are on for tomorrow" or
"The Beatles won't be playing tomorrow."
How can I go from an NER parsed representation to extracting the information of interest? Any suggestions would be much appreciated.
I suggest you search for work on event detection (optionally, in Twitter); perhaps also on question-answering systems, if your example with yes/no questions wasn't just an illustration: if you know user needs in advance, this information can increase the quality of the system.
For a start, there are some papers about event detection in Twitter: here and here.
As a baseline, you can create a list of positive verbs for your domain (to be, to schedule) and negative verbs (to cancel, to delay): start from a manual list and expand it with synonyms from a dictionary such as WordNet. Also check for negations, again by the presence of pre-specified words ('not' in its various forms) in the tweet. If a negation is present, you simply invert the meaning.
Since you are working with Twitter, where most likely only one event is mentioned per tweet, this can work quite well.
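The verb-list baseline above can be sketched in a few lines. The word lists here are tiny manual seeds chosen for the Beatles examples; in practice you would expand them with WordNet synonyms as suggested:

```python
import string

# Manual seed lists; expand with WordNet synonyms in practice.
POSITIVE = {"on", "scheduled", "playing", "confirmed"}
NEGATIVE = {"cancelled", "canceled", "delayed", "postponed"}
NEGATIONS = {"not", "won't", "isn't", "no", "never"}

def event_is_on(tweet):
    """Return a True/False guess for 'is the event happening?',
    or None if no signal words are found."""
    # Strip punctuation (but keep apostrophes, so "won't" survives).
    strip_chars = string.punctuation.replace("'", "")
    tokens = [t.strip(strip_chars) for t in tweet.lower().split()]
    negated = any(t in NEGATIONS for t in tokens)
    if any(t in NEGATIVE for t in tokens):
        verdict = False
    elif any(t in POSITIVE for t in tokens):
        verdict = True
    else:
        return None
    # A negation flips the verdict: "won't be playing" -> not happening.
    return (not verdict) if negated else verdict

print(event_is_on("The Beatles cancelled their show tomorrow"))  # False
print(event_is_on("The Beatles' show is still on tomorrow"))     # True
print(event_is_on("The Beatles won't be playing tomorrow."))     # False
```

Treating "on" as a positive signal is noisy ("on Friday" would also trigger it), which is exactly the kind of weakness a manually seeded baseline like this trades for simplicity.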

nlp: alternate spelling identification

Help by editing my question title and tags is greatly appreciated!
Sometimes one participant in my corpus of "conversations" will refer to another participant using a nickname, usually an abbreviation or misspelling, but hereafter I'll just say "nicknames". Let's say I'm willing to manually tell my software whether or not I think various possible nicknames really are nicknames, but I want the software to come up with a list of possible matches between the handles that identify people and the potential nicknames. How would I go about doing that?
Background on me and then my corpus: I have no experience doing natural language processing but I'm a competent data analyst with R. My data is produced by 70 teams, each forecasting the likelihood of 100 distinct events occurring some time in the future. The result is that I have 70 x 100 = 7000 text files containing the stream of forecasts participants make and the comments they include with their forecasts. I'll paste a very short snippet of one of these text files below; this one had to do with whether the Malian government would enter talks with the MNLA:
02/12/2013 20:10: past_returns answered Yes: (50%)
I hadn't done a lot of research when I put in my previous
placeholder... I'm bumping up a lot due to DougL's forecast
02/12/2013 19:31: DougL answered Yes: (60%)
Weak President Traore wants talks if MNLA drops territorial claims.
Mali's military may not want talks. France wants talks. MNLA sugggests
it just needs autonomy. But in 7 weeks?
02/12/2013 10:59: past_returns answered No: (75%)
placeholder forecast...
http://www.irinnews.org/Report/97456/What-s-the-way-forward-for-Mali
My initial thoughts: Obviously I can start by providing the names I'm looking to match things up with... in the above example they would be past_returns and DougL (though there is no use of nicknames in the above). I wouldn't think it'd be that hard to get a computer to guess at minor misspellings (though I wouldn't personally know where to start). I can imagine other tricks could be used, like assuming that a string is more likely to be a nickname if it is used much more by one team than by other teams. A nickname is more likely to refer to someone who spoke recently than to someone who spoke long ago, or not at all, on this question. And nicknames should appear in sentences in a manner similar to the way the full name/screen name is typically used in the corpus. But I'm interested to hear about simple approaches, as well as ones that use more sophisticated techniques.
This could get about as complicated as you want to make it. From the semi-linguistic side of things, research topics would include Levenshtein Distance (for detecting minor misspellings of known names/nicknames) and Named Entity Recognition (for the task of detecting names/nicknames in the first place). Actually, NER's worth reading about, but existing systems might not help you much in your domain of forum handles and nicknames.
The first rough idea that comes to mind is that you could run a tokenized version of your corpus against an English dictionary (perhaps a dataset compiled from Wiktionary or something like WordNet) to find words that are candidates for names, then filter those through some heuristics (do they start with the same letters as known full names? Do they have a low Levenshtein distance from known names? Are they used more than once?).
You could also try some clustering or supervised ML algorithms against the non-word tokens. That might reveal some non-"word" tokens that often occur in the same threads as a given username; again, heuristics could help rule out some false positives.
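The Levenshtein-distance and prefix heuristics mentioned above can be combined into a small candidate generator. This is a rough sketch under the assumption that the known handles are supplied up front; the `max_dist=2` threshold is an arbitrary illustrative choice:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def candidate_nicknames(token, handles, max_dist=2):
    """Return handles that the token might be a nickname for: either
    the token is a prefix of the handle (abbreviation) or it is within
    max_dist edits of it (misspelling)."""
    token = token.lower()
    matches = []
    for handle in handles:
        h = handle.lower()
        if h.startswith(token) or levenshtein(token, h) <= max_dist:
            matches.append(handle)
    return matches

handles = ["past_returns", "DougL"]
print(candidate_nicknames("Dougl", handles))  # ['DougL']  (misspelling)
print(candidate_nicknames("Doug", handles))   # ['DougL']  (abbreviation)
```

Short tokens will over-match at `max_dist=2`, so in practice you would feed these candidates into the other heuristics (frequency within a team, recency of the referenced speaker) rather than trusting them outright.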
Good luck; sounds like a fun problem - hope I mentioned at least one thing you hadn't already thought of.

List of all questions along with tags on StackOverflow (for NLP tasks)

Since StackOverflow comes with a wealth of questions and user-contributed tags, I am looking at it as an interesting, richly annotated, text corpus for NLP (natural language processing) tasks.
Basically, I want to automatically predict question tags based on the question's body. I am sure this can be done to a certain extent, and there are a number of nice use cases, such as tag suggestion (e.g. to make tag usage more consistent), to name just one.
For this I would need a lot of questions, or even better all of them, along with their body text and user tags, to train a tag predictor with machine-learning algorithms.
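To make the task concrete, here is a toy sketch of the simplest possible tag predictor: score each tag by how often the question's words co-occurred with that tag in training data. The three training examples are made up for illustration; a real system would use the full dump and a proper multi-label classifier:

```python
from collections import Counter, defaultdict

# Made-up training data: (question body, set of tags).
train = [
    ("how do I merge two dataframes in pandas", {"python", "pandas"}),
    ("segfault when freeing a pointer twice", {"c"}),
    ("pandas groupby aggregation question", {"python", "pandas"}),
]

# word -> Counter of tags that word co-occurred with
word_tag_counts = defaultdict(Counter)
for body, tags in train:
    for word in set(body.split()):
        for tag in tags:
            word_tag_counts[word][tag] += 1

def predict_tags(body, top_n=2):
    """Sum tag co-occurrence counts over the body's words."""
    scores = Counter()
    for word in set(body.lower().split()):
        scores.update(word_tag_counts.get(word, Counter()))
    return [tag for tag, _ in scores.most_common(top_n)]

print(sorted(predict_tags("pandas merge question")))  # ['pandas', 'python']
```

Even this crude counting baseline makes it clear why the full corpus matters: the quality of the co-occurrence statistics is entirely a function of how much tagged text you can get.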
I know there's the StackOverflow API, but the amount of data I can fetch through it seems to be very limited - for good reasons of course.
So the question is: Is there a way to fetch/download all questions along with their user-tags from StackOverflow?
You can get the data dump at http://www.clearbits.net/torrents/2076-aug-2012. It omits the meta sites, a minor oversight that has been fixed in an alternate release, but that doesn't affect your request.

Interview Question: What is a hashmap? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I was asked this in an interview: "Tell me everything you know about hashmaps."
I proceeded to do just that: it's a data structure with key-value pairs; a hash function is used to locate an element; hash collisions can be resolved in various ways; and so on.
After I was done, they asked: "OK, now explain everything you just said to a 5-year-old. You can't use technical terms, especially hashing and mapping."
I have to say it took me by surprise and I didn't give a good answer. How would you answer?
Rules. Kids know rules. Kids know that certain items belong in certain places. A HashMap is like a set of rules that say, given an item (your shoes, your favorite book, or your clothes) that there is a specific place that they should go (the shoe rack, the bookshelf, or the closet).
So if you want to know where to look for your shoes, you know to look in the shoe rack.
But wait: what happens if the shoe rack is already full? There are a few options.
1) For each item, there's a list of places you can try. Try putting them next to the door. But wait, there's something there already: where else can we put them? Try the closet. If we need to find our shoes, we follow the same list until we find them. (probing sequences)
2) Buy a bigger house, with a bigger shoe rack. (dynamic resizing)
3) Stack the shoes on top of the rack, ignoring the fact that it makes it a real pain to find the right pair, because we have to go through all of the shoes in the pile to find them. (chaining).
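For readers who want the code behind the cupcakes, here is a minimal sketch of a hashmap using chaining (option 3) plus dynamic resizing (option 2). The class name, initial capacity, and load-factor threshold are all illustrative choices, not anyone's production implementation:

```python
class HashMap:
    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]  # the "shoe racks"
        self.size = 0

    def _bucket(self, key):
        # The hash function tells us which rack this item belongs on.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # collision? just chain it
        self.size += 1
        if self.size > 2 * len(self.buckets):
            self._resize()               # too crowded: buy a bigger house

    def get(self, key, default=None):
        for k, v in self._bucket(key):   # search only one chain,
            if k == key:                 # not the whole house
                return v
        return default

    def _resize(self):
        old = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        self.size = 0
        for k, v in old:                 # rehash everything into the new racks
            self.put(k, v)

m = HashMap()
m.put("shoes", "shoe rack")
m.put("book", "bookshelf")
print(m.get("shoes"))  # shoe rack
```

The speed of `get` comes from searching only one short chain instead of every bucket, which is exactly the book-index point made in the answers below.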
Let's take the big word book, or dictionary, and try to find the word zebra. We can easily guess that zebras will be near the end of the book, just as the letter "Z" is at the end of the alphabet. Now let's say that we can always find where the zebra is inside the big word book. This is the way we can quickly find zebras, or elephants, or any other type of thing we can think of in the big word book. Sometimes two words will be on the same page, like apple and ant. We are sure which page we want to look at, but we aren't sure how close apple and ant are to each other until we get to the page. Sometimes apple and ant will be on the same page and sometimes they won't; some big word books have bigger words.
That's how I would have done it.
Speaking as a parent, if I had to explain a hashmap to a 5-year-old, I'd say exactly what you said while waving around a chocolate cupcake.
Seriously, questions like this ought to mean "can you explain the concept in plain English", which is a good heuristic for how well you've internalized your understanding of it. Since it sounds like you get that, the question seems a bit silly.
The pieces of data the map holds can be looked up by some related information, much like how pages can be looked up by the words on them in the index of a book.
The key advantage to using a HashMap is that like an index in a book, it's much quicker to look up the page a word is on in the index than it would be to start searching page by page for that word.
(I'm giving you a serious answer because the interviewer might have been trying to see how well you can explain technical concepts to non-techies like project managers and customers. Maybe a hashmap directly isn't so useful, but it's probably as fair an indication as any of translation skills.)
You have a book of blank but numbered pages and a special decoder ring that generates a page number when something is entered into it.
To assign a value:
You get a ID (key) and a message (value).
You put the ID into the special decoder ring and it spits out the page number.
Open your book to that page. If the ID is already on the page, cross out that ID and its message.
Now write the ID and message on the page. If one or more other IDs with messages are already there, just write the new one below them.
To retrieve a value:
You are given just an ID (key).
You put the ID into the special decoder ring and it spits out the page number.
Open your book to that page. If the ID is on the page, read the message (value) that follows it.
