Rank of languages used in my GitHub commits - programming-languages

Is it possible to display a ranking of the languages used in my commits? It is possible to display the overall ranking for the GitHub community:
https://github.com/languages
and the languages used in my projects, but that's still not what I'm looking for.

Related

How to find popular Google search terms for a particular demographic/location/interest group?

I'm starting an online business targeted at a particular demographic and set of interests, so I would like to produce content based on what this target market is actually searching for.
Google Ads allowed me to refine my target audience to the exact categories (demographics and interests) I needed, but it couldn't tell me what that category of people tends to search for, except for the tiny subset that happens to click on one of my ads, which is very rare given that I am just starting with a small budget. I would like to know the most popular search terms for everyone in the categories I specified, not just those who happened to click on my ads.
I tried Google Trends, which told me the popularity of a particular search term for a given country, but that's too broad: I need to narrow it down to a particular city, age group, parental status and interests. Google Trends also helped me find popular related search terms for a given search term, so I could use that to look for popular terms related to my guesses, but I could miss terms related to topics I never thought of.
I could try producing content across a range of topics which I think my target audience might be interested in and then analyse the results using Google Ads, but that could be a very expensive trial-and-error process and I might miss more popular topics which I never thought of.
Of course I could try to ask my target market in person (by interrupting people in the street!), but that would be very expensive because I would have to travel to and stay at the location where my online business is targeted, hoping to meet people with the exact demographic and interests that I am looking for.
I'm sure there must be a way to figure this out using the Google search analytics. Essentially, all I need is a list of the most popular recent Google search terms for a particular location, demographic and interest group in Google Analytics. Could anyone help me understand how to get this list?
Here are a few considerations, even if you found an answer.
Take a look at the AdRoll platform. Here's a potentially helpful article from them about target audience and demographics.
A recent article about AdWords demographic targeting. An older-looking article connecting demographics to search queries, though the page's source code suggests an update this year.
Last but not least, you're probably eligible to talk with a Google Small Business Advisor.

List of all questions along with tags on StackOverflow (for NLP tasks)

Since StackOverflow comes with a wealth of questions and user-contributed tags, I am looking at it as an interesting, richly annotated, text corpus for NLP (natural language processing) tasks.
Basically, I want to automatically predict question tags based on the question's body. I am sure this can be done to a certain extent, and there are a number of nice use cases, such as tag suggestions (e.g. to make tag usage more consistent), to name just one.
For this I would need a lot of questions, or even better all of them, along with their body text and user tags to train a tag predictor with machine learning algorithms.
I know there's the StackOverflow API, but the amount of data I can fetch through it seems to be very limited - for good reasons of course.
So the question is: Is there a way to fetch/download all questions along with their user-tags from StackOverflow?
You can get the data dump at http://www.clearbits.net/torrents/2076-aug-2012, sans the meta sites, a minor oversight which has been fixed with an alternate release, but is not applicable to your request.
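Once the dump is downloaded, the questions and their tags can be streamed out of Posts.xml without loading the whole file into memory. The following is only a minimal sketch: it assumes each post is a `row` element, that questions have `PostTypeId="1"`, and that the `Tags` attribute is stored as a string like `<python><nlp>` (newer dumps may use a different separator, so check your copy). The function name `iter_questions` and the sample rows are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_questions(source):
    """Yield (title, body, tags) for every question row in a Posts.xml stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Assumption: PostTypeId "1" marks a question, "2" an answer.
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            raw_tags = elem.get("Tags", "")
            # Assumption: tags look like "<python><nlp>"; split on the brackets.
            tags = [t for t in raw_tags.replace(">", "<").split("<") if t]
            yield elem.get("Title", ""), elem.get("Body", ""), tags
        # Drop the element's contents so memory use stays flat on a large dump.
        elem.clear()


# Tiny made-up sample standing in for the real Posts.xml file.
SAMPLE = """<posts>
  <row Id="1" PostTypeId="1" Title="How do I tokenize text?"
       Body="&lt;p&gt;Example body&lt;/p&gt;" Tags="&lt;python&gt;&lt;nlp&gt;" />
  <row Id="2" PostTypeId="2" Body="&lt;p&gt;An answer&lt;/p&gt;" />
</posts>"""

questions = list(iter_questions(io.BytesIO(SAMPLE.encode("utf-8"))))
```

For the real file, pass an open binary file handle instead of the in-memory sample.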

What algorithm does Freebase use to match by name?

I'm trying to build a local version of the Freebase search API using their quad dumps. I'm wondering what algorithm they use to match names. As an example, if you go to freebase.com and type in "Hiking" you get
"Apo Hiking Society"
"Hiking"
"Hiking Georgia"
"Hiking Virginia's national forests"
"Hiking trail"
Wow, a lot of guesses! I hope I don't muddy the waters too much by not guessing too.
The auto-complete box is basically powered by Freebase Suggest which is powered, in turn, by the Freebase Search service. Strings which are indexed by the search service for matching include: 1) the name, 2) all aliases in the given language, 3) link anchor text from the associated Wikipedia articles and 4) identifiers (called keys by Freebase), which includes things like Wikipedia article titles (and redirects).
How the various things are weighted/boosted hasn't been disclosed, but you can get a feel for things by playing with it for a while. As you can see from the API, there's also the ability to do filtering/weighting by types and other criteria, and this can come into play depending on the context. For example, if you're adding a record label to an album, topics which are typed as record labels will get a boost relative to things which aren't (but you can still get to things of other types, to allow for the use case where your target topic hasn't had the appropriate type applied yet).
So that gives you a little insight into how their service works, but why not build a search service that does what you need since you're starting from scratch anyway?
BTW, pre-Google the Metaweb search implementation was built on top of Lucene, so you could definitely do worse than using that as your starting point. You can read some of the details in the mailing list archive.
Probably they use an inverted index over selected fields, such as the English name, aliases and the Wikipedia snippet displayed. In your application you can achieve that using something like Lucene.
For the algorithm side, I find the following paper a good overview
Zobel and Moffat (2006): "Inverted Files for Text Search Engines".
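To make the inverted-index idea concrete, here is a toy sketch in plain Python. It is not how Freebase or Lucene is implemented; it only shows the core data structure, a map from each token to the set of topics whose names contain it. The topic ids (`t1` to `t5`) and the function names are hypothetical; the names are the ones from the question.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(topics):
    """Map each token to the set of topic ids whose names or aliases contain it."""
    index = defaultdict(set)
    for topic_id, names in topics.items():
        for name in names:
            for token in tokenize(name):
                index[token].add(topic_id)
    return index


def search(index, query):
    """Return the ids of topics that contain every token of the query."""
    token_sets = [index.get(tok, set()) for tok in tokenize(query)]
    if not token_sets:
        return set()
    return set.intersection(*token_sets)


# Hypothetical ids; each topic could also list aliases next to its name.
topics = {
    "t1": ["Apo Hiking Society"],
    "t2": ["Hiking"],
    "t3": ["Hiking Georgia"],
    "t4": ["Hiking Virginia's national forests"],
    "t5": ["Hiking trail"],
}
index = build_index(topics)
all_hiking = search(index, "Hiking")
trail_only = search(index, "hiking trail")
```

Because matching is per token, "Apo Hiking Society" is found for the query "Hiking" even though it does not start with that word. Ranking and boosting, which a real search service would add, are left out.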
Most likely it's a trie with lexicographical order.
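For comparison, a minimal trie that returns completions in lexicographical order might look like the sketch below. The class and method names are illustrative assumptions, not anything taken from Freebase.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # maps a character to the next node
        self.is_end = False  # True if a stored name ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def with_prefix(self, prefix):
        """Return all stored names starting with prefix, in sorted order."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        self._collect(node, prefix, results)
        return results

    def _collect(self, node, path, results):
        if node.is_end:
            results.append(path)
        # Visiting children in sorted order yields lexicographical output.
        for ch in sorted(node.children):
            self._collect(node.children[ch], path + ch, results)


trie = Trie()
for name in ["Apo Hiking Society", "Hiking", "Hiking Georgia",
             "Hiking Virginia's national forests", "Hiking trail"]:
    trie.insert(name.lower())

completions = trie.with_prefix("hiking")
```

Note that a pure prefix trie would not return "Apo Hiking Society" for "hiking", so a trie alone does not account for the results shown in the question.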
There are a number of algorithms available: Boyer-Moore, Smith-Waterman-Gotoh, Knuth-Morris-Pratt, etc. You might also want to look into edit distance algorithms such as Levenshtein. You will need to play around to see which best suits your purpose.
An implementation of such algorithms is the SimMetrics library by the University of Sheffield.
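As an illustration of the edit-distance approach, here is a short sketch of the standard dynamic-programming Levenshtein distance. It is a generic textbook version, not code from SimMetrics, and it keeps only two rows of the table to save memory.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


typo_distance = levenshtein("hiking", "hikng")
```

A name matcher could use this to rank candidates by distance to the query, so that a typo such as "hikng" still finds "hiking".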

Word Map for Emotions

I am looking for a resource similar to WordNet. However, I want to be able to look up the positive/negative connotation of a word. For example:
bribe - negative
offer - positive
I'm curious as to whether anyone has run across any tool like this in AI/NLP research, or even in linguistics.
UPDATE:
For the curious, the accepted answer below put me on the right track towards what I needed. Wikipedia listed several different resources. The two I would recommend (because of ease of use/free use for a small number of API calls) are AlchemyAPI and Lymbix. I decided to go with AlchemyAPI, since people affiliated with academic institutions (like myself) and non-profits can get even more API calls per day if they just email the company.
Start looking up topics on 'sentiment analysis': http://en.wikipedia.org/wiki/Sentiment_analysis
There are some vocabulary compilations regarding affect, aka dictionaries of affect, such as the Affective Norms for English Words (ANEW) or the Dictionary of Affect in Language (DAL). They provide a dimensional representation of affect (valence, activation and control) that may be of use in a sentiment analysis scenario (detection of positive/negative connotation). In this sense, EmoLib works with the former by default, but may be easily extended with a more specific lexicon to tackle particular needs (for example, EmoLib provides an additional neutral label that is more appropriate than the positive/negative tag set alone in a Text-To-Speech synthesis setting).
There is also SentiWordNet, which gives you positive, negative and objective scores for each WordNet synset.
However, you should be aware that the positive and negative connotation of a term often depends on the context in which it is used. A great introduction to this topic is the book Opinion mining and sentiment analysis by Bo Pang and Lillian Lee, which is available online for free.
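Whichever lexicon you pick, the lookup itself can be as simple as the sketch below. The scores here are placeholders invented for illustration and are not taken from SentiWordNet, ANEW or any other real resource; the function name and the threshold are assumptions too. As noted above, a context-free lookup like this ignores how the word is used in a sentence.

```python
# Placeholder valence scores in [-1, 1]; NOT taken from any real lexicon.
TOY_LEXICON = {"bribe": -0.7, "offer": 0.4, "table": 0.0}


def connotation(word, lexicon, threshold=0.1):
    """Label a word positive, negative, neutral, or unknown from its score."""
    score = lexicon.get(word.lower())
    if score is None:
        return "unknown"
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


labels = {w: connotation(w, TOY_LEXICON) for w in ["bribe", "offer", "table", "xyzzy"]}
```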

Building a code asset library [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have been thinking about setting up some sort of library for all our internally developed software at my organisation. I would like to collect any ideas the good SO folk may have on this topic.
I figure, what is the point in instilling in developers the benefits of writing reusable code if, on the next project, the first thing they do is file -> new because they don't know what code is already out there to be reused?
As an added benefit, I think that just having a library like this would encourage developers to think more in terms of reusability when writing code.
I would like to keep this library as simple as possible, perhaps my only two requirements being:
Search facility
Usable for many types of components: assemblies, web services, etc
I see the basic information required on each asset/component to be:
Name & version
Description / purpose
Dependencies
Would you record any more information?
What would be the best platform for this i.e., wiki, forum, etc?
What would make a software library like this successful vs unsuccessful?
All ideas are greatly appreciated.
Thanks
Edit:
Found these similar questions after posting:
How do you ensure code is reused correctly?
How do you foster the use of shared components in your organization?
Sounds like there is no central repository of code available at your organization. Depending on what you do, this could be because of compartmentalization of knowledge due to security restrictions, because external vendor code is included in some or all of the solutions, or because your company has not yet seen the benefits of getting people to reuse, refactor, and evangelize such a repository.
The solutions I have seen work at multiple corporations share a multi-pronged approach.
Buy-in at some level from management. Usually it's a CTO/CIO the idea resonates with. They say it's a good thing and don't give any money to fund it, but they won't stand in your way, provided they know someone is going to champion the idea before that person starts soliciting code and consolidating it somewhere.
Some list of projects and the available collateral, described in plain English. I've seen this on wikis, on SharePoint lists, and in text files within a source repository. All of them share the common attribute of some sort of front-end search server that allows full-text search over the description of a solution.
Some common share or repository for the binaries and/or code. Oftentimes a large org has different authentication/authorization methods for many different environments, and it might not be practical (or logistically possible) to share a single source repository. Don't get hung up on that aspect; just try to get to the point where there is a well-known share/directory/repository that works for your org.
Always make sure there is someone listed as a contact. No one ever takes code and runs it in production without at least talking to its previous owner, and if there isn't a person they can start asking questions of right away, they might just go ahead and hit file->new.
Unsuccessful attributes I've seen?
N submissions per engineer per time period = lots of crap starts making its way in.
No method of rating / feedback. If there is no means to favorite/rate/give some indicator that allows the cream to rise to the top, you don't go back to search it often, because you can't benefit from everyone else's slogging through the code that wasn't really very good.
Lack of a feedback/email link that sends questions directly to the author's email.
Lack of ability to categorize organically. Every time there is some super rigid hierarchy or predetermined category list, everything ends up in "other". If you use tags or similar you can avoid it.
Requiring a design document in a rigid format, without which the code isn't accepted. No one can ever agree on the "centralized" format of a design doc, and no one ever submits when this is required.
Just my thinking.
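To make those points concrete, a catalog entry and a naive full-text search could be sketched as follows. This is only an illustration under stated assumptions: the `Asset` fields mirror the items discussed above (name and version, description, dependencies, a contact, free-form tags), and every name, value and email address in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    version: str
    description: str
    contact: str  # someone who can answer questions right away
    dependencies: list = field(default_factory=list)
    tags: set = field(default_factory=set)  # free-form tags, no fixed hierarchy


def search_assets(assets, query):
    """Return assets whose name, description or tags contain every query term."""
    terms = query.lower().split()
    hits = []
    for asset in assets:
        haystack = " ".join([asset.name, asset.description, *asset.tags]).lower()
        if all(term in haystack for term in terms):
            hits.append(asset)
    return hits


# Hypothetical catalog entries.
catalog = [
    Asset("AuthClient", "1.2.0", "Wrapper around the internal login web service",
          "alice@example.com", ["HttpUtils"], {"web-service", "security"}),
    Asset("HttpUtils", "0.9.1", "Retry and timeout helpers for HTTP calls",
          "bob@example.com", [], {"assembly", "networking"}),
]

login_hits = [a.name for a in search_assets(catalog, "login service")]
```

A wiki or SharePoint list with built-in search gives you the same thing without writing code; the sketch just shows how little structure is needed to get started.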

Resources