Statistical data on the software business

Statistical data on the software business - statistics

Is there a good source information on the software business (in US)? In particular, I'm looking for things like:
Number of people employed in the software business (not just developers, but support staff as well).
Break down by age, sex, etc. of current software developer workforce.
Number of businesses whose primary product is software development.
Number of universities offering BS, MS, PhD in computer science.
You get the idea...I realize that most of this information is scattered on various websites, but I wanted to know if there was a single source that had a "all the stats on the computer software industry."

Some answers in this question are about US and could be of interest. No single resources, though.

I haven't tried the questions you pose, but you could try http://www.wolframalpha.com/. Notice that below the "knowledge search results" it links to references, that may be of value.

Related

Data on segmentation of US search traffic by topic

I'm working on a research project trying to understand the patterns and breakdowns of search usage and volumes in the United States.
Ideally, I would love a breakdown of search volume across topics like:
navigational (ie just want to get to a domain link)
news (if possible: split amongst events, celebs, politics, ...)
sports (if possible: dig into splits of live scores, news about an athlete or a team, ... )
finance (e.g. stock names )
anything local (e.g. food, restaurants, places)
people (e.g. bios)
anytime time related (what time in nyc, sf, ...)
anything numbers related (math/calculators)
Other topics: immigration, legal, health/medicine, science/technology, food/recipes, code/math, politics, weather, images/video, etc.
Not sure if there is a dataset or good report somewhere that would give me insight into all these?
There seem to be a lot of keyword planning tools, which is somewhat helpful and I guess I could collect data on groups of keywords realted to the topics above, but for things like celebrity bios it would be quite difficult to group together all the data because each possible well known person is there own keyword…
Any help, direction would be appreciated! Thank you so much

Looking for ICD10 API [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
Improve this question
Any body knows of a good ICD10 API to do diagnostic code lookups that can recommend. I am currently building a simple app to tag patients with medical condition and the idea is to have a lookup API where one can type asthma for example and get back all the different ICD10 codes for asthma

My R package, icd converts ICD-9 and ICD-10 codes to descriptions, in addition to its main function of finding comorbidities. Documentation at https://jackwasey.github.io/icd/ , and code at https://github.com/jackwasey/icd . It does this using the function explain_code. It currently uses ICD-10-CM, i.e. the USA billing adapted ICD-10 code set, which in general is more specific than the canonical WHO version, but does have some areas of less detail.
E.g. WHO ICD-10 has HIV disease resulting in Pneumocystis jirovecii pneumonia as a subdivision for HIV infection, whereas ICD-10-CM just has HIV. On the other hand, ICD-10-CM has Sucked into jet engine, subsequent encounter whereas the WHO is happy with the terribly vague: Person on ground injured in air transport accident.
The volume of data for all the descriptions is not very high, just handful of megabytes, so although an API may seem convenient, you might consider just having all the data and not having to ping some random server.

I'm going to assume you're ignoring all of the usual stuff around variations of spelling of medical terms, proper terms vs. colloquialisms, labels vs. descriptions, etc. that get to be a pain with term / code finders.
If you want to use a hosted option and are OK with the terms of use, you could use UMLS (https://uts.nlm.nih.gov/home.html#apidocumentation). It's a great resource, but the use case you're describing isn't necessarily what it's intended to address.
Personally - and I usually don't like to roll my own stuff - I'd consider doing your own thing. You could do something focused on your needs and tailor it to any specific behaviors you might want (like preferring specific codes based on an organization - EX: billing preference). You could also probably make it far, far more ... perky ... and address short forms of terms (EX: synonyms like "DVT") or misspellings ("asthma" vs. "athsma"). If you go that route, I'd suggest considering getting your hands on the ICD-10 code info and then mashing it into Elastic Search. You could extend the data by mixing it with other info and really make it hum. And Elastic is wicked fast.
That's just my $0.02, though.

There is a project called "Unified Medical Language System (UMLS)", funded by NIH and apparently they are working on a RESTful Web API for medical terms.
https://documentation.uts.nlm.nih.gov/rest/home.html
I didn't work with their API yest and the samples I am seeing on their website sounds like they are more SNOMED-CT oriented.
The option I would go for is to get the whole ICD-10-CM from CMS and build my own Web API.
https://www.cms.gov/Medicare/Coding/ICD10/2016-ICD-10-CM-and-GEMs.html

you can check the full documentation from WHO https://icd.who.int/icdapi

nlp: alternate spelling identification

Help by editing my question title and tags is greatly appreciated!
Sometimes one participant in my corpus of "conversations" will refer to another participant using a nickname, usually an abbreviation or misspelling, but hereafter I'll just say "nicknames". Let's say I'm willing to manually tell my software whether or not I think various possible nicknames are in fact nicknames, but I want software to come up with a list of possible matches between the handle's that identify people, and the potential nicknames. How would I go about doing that?
Background on me and then my corpus: I have no experience doing natural language processing but I'm a competent data analyst with R. My data is produced by 70 teams, each forecasting the likelihood of 100 distinct events occurring some time in the future. The result that I have 70 x 100 = 7000 text files, containing the stream of forecasts participants make and the comments they include with their forecasts. I'll paste a very short snip of one of these text files below, this one had to do with whether the Malian government would enter talks with the MNLA:
02/12/2013 20:10: past_returns answered Yes: (50%)
I hadn't done a lot of research when I put in my previous
placeholder... I'm bumping up a lot due to DougL's forecast
02/12/2013 19:31: DougL answered Yes: (60%)
Weak President Traore wants talks if MNLA drops territorial claims.
Mali's military may not want talks. France wants talks. MNLA sugggests
it just needs autonomy. But in 7 weeks?
02/12/2013 10:59: past_returns answered No: (75%)
placeholder forecast...
http://www.irinnews.org/Report/97456/What-s-the-way-forward-for-Mali
My initial thoughts: Obviously I can start by providing the names I'm looking to match things up with... in the above example they would be past_returns and DougL (though there is no use of nicknames in the above). I wouldn't think it'd be that hard to get a computer to guess at minor misspellings (though I wouldn't personally know where to start). I can imagine that other tricks could be used, like assuming that a string is more likely to be a nickname if it is used much much more by one team, than by other teams. A nickname is more likely to refer to someone who spoke recently than someone who spoke long ago, or not at all on regarding this question. And they should be used in sentences in a manner similar to the way the full name/screenname is typically used in the corpus. But I'm interested to hear about simple approaches, as well as ones that try to consider more sophisticated techniques.

This could get about as complicated as you want to make it. From the semi-linguistic side of things, research topics would include Levenshtein Distance (for detecting minor misspellings of known names/nicknames) and Named Entity Recognition (for the task of detecting names/nicknames in the first place). Actually, NER's worth reading about, but existing systems might not help you much in your domain of forum handles and nicknames.
The first rough idea that comes to mind is that you could run a tokenized version of your corpus against an English dictionary (perhaps a dataset compiled from Wiktionary or something like WordNet) to find words that are candidates for names, then filter those through some heuristics (do they start with the same letters as known full names? Do they have a low Levenshtein distance from known names? Are they used more than once?).
You could also try some clustering or supervised ML algorithms against the non-word tokens. That might reveal some non-"word" tokens that often occur in the same threads as a given username; again, heuristics could help rule out some false positives.
Good luck; sounds like a fun problem - hope I mentioned at least one thing you hadn't already thought of.

How do I manage specs in Scrum? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 5 years ago.
Improve this question
Referring to this buddy question, I want to know how one can manage specs in Scrum process ? I'm facing this problem while assigning tasks to my team for the sprint. Needless to say - I'm new to Agile/Scrum.
Currently, we are using our own specs sheet to map StoryId to SpecId and vice versa. I'm getting the felling that Scrum is more about project management [getting things done on time] and you need a seperate process to manage specs and requirements.
How do we manage specs in a Scrum process ?

The short answer is, you don't.
The important question to ask yourself when writing these specs, is why do we do them? What is the value in the spec?
The value in the spec usually comes in communicating the ideas of the business with the development team. Scrum is designed to bring the business (in the form of the Product Owner) to the development team. By interacting with the team frequently (remember, individuals and interactions over processes and tools), and by seeing working software frequently, the business can work hand in hand with developers to produce software that solves business problems better than by trying to spec out the whole thing before you get to try it out.
This is how Agile projects do a better job of delivering the product the business wants instead of the product they requested.
That said, there are certain base criteria that need to be met. We can test for this, and as with any good tests, we can automate it.
Have a look at BDD and Cucumber. In addition to your User Story, it's good to have a basic set of conditions of satisfaction, preferably in the "Give/When/Then" format. These conditions are the minimum set of criteria for the story to be accepted as complete.
For example, "Given I am logged in, when I log out, then I am taken back to the home page".
If you're going to have acceptance criteria, you're going to want to automate it. The worst part of most specifications is they often end up out of date and collecting dust when the project is complete.
Also, you shouldn't be assigning tasks to the team. Scrum teams are self organizing and anyone should be able to grab any task they feel they can work on while respecting the priority of the stories. Swarming is a big part of the performance benefits of Scrum.
You may want to consider bringing in an outside coach to assist with your transition.

I think that the easiest way is to make the specifications a part of the user stories within the tasks, themselves. Clearly list the acceptance criteria in each one (or if your issue tracking software allows you, create them as first class work item types). Let the issue in whatever you use for work item tracking become the living document.
There are drawbacks, such as finding related issues as specs change over time, but this can usually be managed in the work item tracking tooling, assuming your can relate issues to each other.
The way that we do it is that we (actually a BA, not the developers) creates a sign-off deck for the product owner to review and we collaboratively create tasks off of that. If we cannot create a task, or there are open questions, we will go back to the product owner with those questions and update the deck. All of our decks are organized (in SharePoint) so that we can easily find them in the future.

For me the specs is in the user stories. We define the specs and the tasks duing out initial scrum meeting along with the product owner. The specs and tasks are just for the life time of the scrum iteration as everything might change in the next iteration(in the worst case but there will definitively be changes).
We usually keep track of the specifications and task on a spreadsheet just so that everybody know what they are working on. I have also tried a few software to do this and one of the most interesting ones I have come across is from [VersionOne][1] and also from [Rally][2].
But I still find that using a simple spreadsheet is the fastest and simplest solution.

As I understand SCRUM, it does not take care about specs management. You have to broke/map your specs or specs changes to stories and tasks separately. But you can have a task for this :).

There is a real tension between Scrum and other agile dev methodologies and spec writing. I think there are two big points of tension:
Because agile says everything should
be on an index card, that means you
have to have stuff planned out
enough to fit on an index card.
(E.g. you have to know how it's all
going to work.)
Some things don't make sense in
isolation (what's the use of an
upload file page without a manage
uploaded files page, for instance.)
You don't have to design the whole app all at once, but you have to have a vision of the whole app. Then, especially if you have a separation of designers and programmers, you do functional design for a sprint-sized chunk at a time. Those designs then have to be broken down to story-sized chunks.
This is a lot of up front functional design, and I think that's overlooked in a lot of the talk about agile methodologies. Perhaps some shops have the devs do more of the design. Also, I think it's a lot easier to use scrum/agile for making changes/bug fixes to existing apps rather than building new ones.
The thing I've found most helpful is to fight back on story size. A lot of organizations have gone crazy, saying stories need to be only a few hours. The original scrum book says 16 hours, I think, which is often large enough to fit an entire screen of a web app. So "implement manage my account" could be a story (as opposed to the hundreds-of-tiny-stories approach of "implement username", "implement password" etc.) Then reference your design doc for "Manage My Account" and make sure to have word-perfect screenshots/prototype/mockup so the dev can look at them and copy/paste the text directly into the code they're writing, and they know for sure which fields need to be there (or which links, or which pictures, or whatever).

How do you estimate an agile project up front? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
When working on fixed price software development projects, I frequently find myself having to estimate the total number of hours a project will take after the price is set, but before the work is started (or VERY early on in the development). Unfortunately, these types of projects are best developed using an iterative/agile method, which means that we don’t (and really can’t) do a complete up-front design.
In a typical scenario, we would have a contract that has X features and Y dollars. After contracting, the engineering department would then need to estimate the number of hours required to complete the X features. There are several possible reasons to need this information up front, including:
• The Y dollars translates to Z hours available, so we have to make sure that time(X)<=Z, perhaps by reducing the scope of X.
• The delivery date is set, and so we have to assign the appropriate resources to meet that date.
Kelly Waters has an interesting take on estimating agile here: http://www.agile-software-development.com/2009/04/agile-estimating.html Unfortunately, these are estimations of difficulty, using a points system, and do not translate to hours.
It seems to me that we need to be able to do one of two things:
• Obtain contracts that have a huge amount of flexibility in them to accommodate an agile development process.
• Figure out how to provide reasonably accurate up-front estimates for features that have not yet been designed.
The first option is of course not an option in most cases.
Does anyone have any advice/guidance on how to generate up-front estimates in an agile development scenario?
Alternatively, does anyone see another option for solving our problem through some other process change?

I think every client wants to have at least an estimate of how much the implementation of a given number of feature will cost him. I don't agree with people that say that if your using agile than you can't do this. Agile can be adapted to the real world where clients want to know how much money they're gonna spend on a project, or at least have a rough idea.
So, there are at least two documented ways you can do this and both are described in the "Agile Estimating and Planning" book by Mike Cohn that i strongly advise everyone to read.
Before your project even starts do the exercise of breaking down your stories in tasks and estimate each home in hours. Do the budget math with those estimates. Keep in mind that these estimates will only be used to reach the estimate time/budget. When the project starts the team should be responsible for estimating and creating the tasks like normal.
Use historical data. If the same team has worked before on a project with similar technology then you can use the past team velocity to estimate the project cost.
Again, for more details on how to do this read the referenced book.

Big Upfront Estimation is just Big Upfront Design with $$$ attached
You don't, that would go completely against the entire paradigm of agile.
Agile can be about Fixed Costs
What you can do is set a date and then work torwards it in iterations/sprints and let the product owner(s) decided what is important to get done by that date. In this way a Sprint of 1 or 2 weeks is a Fixed Cost the same way it would be in an Big Up Front Design would be, it is just a smaller fixed cost.
The entire premise behind agile methodologies is to do away with arbitrary deadlines and the death march that they become because change is not accounted for in the contracts and deadlines. SCRUM works, is a great starting point to build a methodology on, read about it and do what it says at least as a starting point.
Short Sprints with customer feedback and approvals
Give you a starting point to refine your future estimations upon, and helps you gain trust from the product owner and customer very quickly, because they see their most important concerns delivered in a very short amount of time.

Fixed price in Agile gives you a number of iterations you can run for a given team size. The point of Agile is then to maximize the value you can get during these iterations. Agile is about scope management.
So, you actually shouldn't do upfront estimations. This would mean fixed scope which is the just the opposite of Agile or anti-Agile if you prefer.
Agile doesn't work like this, you need a new kind of contract, as pointed out in another answer. See for example 10 Agile Contracts and/or google.

Instead of aiming for a contract for the "entire" project, you should create a contract that just runs for a month or two.
Set some very low goals, perhaps only two or three features.
Explain to the client this is a discovery phase of the project. Without this phase you cannot give them an estimate for the entire scope. It would be in no ones interest to give them an inaccurate estimate.
The client benefits in the following ways:
Lower upfront cost (only a fraction of the total price). If things go tits up there is a limit on their financial exposure (sensible given that most software projects fail).
Have working software within a couple of months that may immediately solve some of their problems
Have working software to provoke thought/conversation about requirements
Client will discover more about what they actually want
Client gets to see how well you work. If they don't like it they can look elsewhere.

"you don't, that would go completely against the entire paradigm of agile": I think this is quite incorrect. Agile is common sense based and nobody would invest in project if they had no idea how much it would roughly cost: they understand they may have to cut scope to meet deadlines or increase budget and/or extend deadlines to deliver the whole scope. On my company projects, we estimate projects by comparing their size and are also estimating epics using poker planning. Using the cone of uncertainty, and without trading off quality, we are about maximum 50% off on our estimates prior to starting the first sprint compared to the reality. And we will improve as we build up our Agile expertise (on accuracy and time to get first estimates).
You can't expect marketing to sponsor projects for 3 sprints (so up to 3 months) to have an idea of how much it would cost.

If you estimate the backlog items in story points (which is discussed in the Kelly Waters article you link to) then the question becomes how many story points do you expect your team to deliver in a sprint. If your team has been working together before, then you should be able to take a guess at this (perhaps with upper and lower bounds to indicate a confidence range).
If the team haven't worked together before, then you can take a few well-understood stories and break them down into detailed tasks. This will give you a man-hours number and you can compare this against your story point estimates to try and predict your velocity. Again, a confidence range of upper and lower values is probably appropriate.
Let's suppose your user stories add up to 150 points, and you predict that your team can deliver 15-20 story points per month, then the work will take 7.5 - 10 months (through simple division).
The advantage of this approach is that you can measure the team's actual velocity, and compare it to the plan. Re-estimating you timelines should also be pretty easy, e.g. if you discover after a couple of sprints that the team's velocity is actually only 10 story points per month, then you can easily predict what your revised timelines will be.
Do bear in mind that team velocity does fluctuate for a number of reasons. It is typically lower at the start of a project when the team is still forming. Also the team is most likely concentrating on infrastructure concerns at this point rather than delivering functionality.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string