Obtaining a Referenced Assembly's Call Count

Obtaining a Referenced Assembly's Call Count - reference

I am in the process of writing a document analyzing a large codebase for quality and maintainability. As part of this report I wish to include a count of the number of references an assembly makes to another assembly within the solution. This will give an idea of how tightly coupled each assembly is to another.
Is there a tool in Visual Studio 2015 Enterprise (or 3rd Party Plug-In) that can give me this number?
So far I have tried Visual Studio's Code Map tool but this appears to just generate a visualization with arrows which I would then have to count manually and futhermore this only appears to be to class/struct-level, not the number of individual references within each class/struct.

NDepend (http://www.ndepend.com/) offers this functionality. It can be also be quite helpful in more general terms for the type of exploratory quality analysis that you describe.

You can use FxCop / Code Analysis to do this, this has a number of maintainability rules, of most interest to you would probably be:
CA1506: Avoid excessive class coupling
This rule measures class
coupling by counting the number of unique type references that a type
or method contains.
I believe the thresholds are 80 for a class and 30 for a method.
It's relatively easy to set up, basically you just need to configure it on a project:
Opening the ruleset lets you choose which ones to run (and whether they are warnings or errors), there are many, many rules.

To expand upon Nicole's answer, I have tested the trial of NDepend and I believe I have found the figures I was looking for in something they call the "Dependency Matrix". My understanding of it is as follows.
The numbers in green are a count of how many times the assembly in the current row references the assembly relating to the number in the current column. The numbers in blue are a count of how many times the assembly in the current row is referenced by the assembly relating to the number in the current column. Since an assembly cannot make an external reference to itself, no numbers can appear on the diagonal line.
What I do no understand however is why, for example, the number in cell 0, 4 is 93 but the number in cell 4, 0 is 52; shouldn't these numbers be equal? Assembly 0 is only used by assembly 4 the same number of times as assembly 4 uses assembly 0 - how can these numbers be different?
UPDATE: I have watched a PluralSight video on this tool and found out that the number in the green box represents how many methods in the referencing assembly make reference to the referenced assembly. The number in the corresponding blue box represents how many methods in the referenced assembly are being used by the referencing assembly. Neither of these numbers precisely represent the number of calls one assembly makes to another (since a method could contain multiple references) but I believe it does provide a sufficient level of granuarity anyway since methods should conform to SRP and thus all references within a method should relate to a single behaviour.

Related

Do I use foreach for 2 different inspection checks in activity diagram?

I am new in doing an activity and currently, I am trying to draw one based on given description.
I enter into doubt on a particular section as I am unsure if it should be 'split'.
Under the "Employee", the given description is as follows:
Employee enter in details about physical damage and cleanliness on the
machine. For the cleanliness, there must be a statement to indicate
that the problem is no longer an issue.
As such, I use a foreach as a means to describe that there should be 2 checks - physical and cleanliness (see diagram in the link), before it moves on to the next activity under the System - for the system to record the checks.
Thus, am I on the right track? Thank you in advance for any replies.

Your example is no valid UML. In order to make it proper you need to enclose the fork/join in a expansion region like so:
A fork/join does not accept any sematic labels. They just split the control flow into several parallel ones which join at the end.
However, this still seems odd since you would probably have some control for the different inspections being entered. So I'd guess there's a decision which loops through multiple inspection entries. Personally I use regions only for handling interrupts. ADs are nice to a certain level. But sometimes a tabular text (like suggested by Cockburn) is just easier to write and read. Graphical programming is not the ultimate answer (unlike 42).

First, the 'NO' branch of the decision node must lead somewhere (at the end?).
After, It differs if you want to show the process for ONE or MULTIPLE inspections. But the most logical way is to represent the diagram for an inspection, because you wrote inspection without S ! If you want represent more than one inspection, you can use decision and merge node to represent loop that stop when there is no more inspection.

Microsoft Excel 2016 T - tests

I am using Microsoft Excel 2016 for data analysis, I can carry out the test fine but I cannot produce the t value along with my P value?

The function T.INV can be used to get the critical values. The documentation describes that it "Returns the left-tailed inverse of the Student's t-distribution." Thus, e.g. T.INV(0.05,9) returns the value -1.83311. The corresponding right-tailed value ABS(T.INV(0.05,9)) = 1.83311 returns the right-tailed version of it which is more traditionally used in stats books.
There is also a function T.INV.2T which you might prefer if you are using a 2-tailed T-test.
The dots in function names like T.INV are unusual in Excel naming conventions. The reason behind it is that for about a decade Excel used statistical functions which were mathematically correct but had implementations that were not numerically stable. These tended to be fine in 99% of the cases but would sometimes behave poorly when applied to values which were several standard deviations away from the mean. Eventually Microsoft responded to persistent criticism and (with Excel 2010?) completely revamped its statistical functions. The old version were kept as is for reasons of backwards. The new versions have similar names in most cases but use a dot. For example T.INV is a more modern, more numerically stable, version of the old TINV. Whenever you have a choice between using two similarly named statistical functions, use the one with the dot.

Dynamics CRM 2011 Import Data Duplication Rules

I have a requirement in which I need to import data from excel (CSV) to Dynamics CRM regularly.
Instead of using some simple Data Duplication Rules, I need to implement a point system to determine whether a data is considered duplicate or not.
Let me give an example. For example these are the particular rules for Import:
First Name, exact match, 10 pts
Last Name, exact match, 15 pts
Email, exact match, 20 pts
Mobile Phone, exact match, 5 pts
And then the Threshold value => 19 pts
Now, if a record have First Name and Last Name matched with an old record in the entity, the points will be 25 pts, which is higher than the threshold (19 pts), therefore the data is considered as Duplicate
If, for example, the particular record only have same First Name and Mobile Phone, the points will be 15 pts, which is lower than the threshold and thus considered as Non-Duplicate
What is the best approach to achieve this requirement? Is it possible to utilize the default functionality of Import Data in the MS CRM? Is there any 3rd party Add-on that answer my requirement above?
Thank you for all the help.
Updated
Hi Konrad, thank you for your suggestions, let me elaborate here:
Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.
Nice one but I don't think it is really workable in my case, the data will be coming regularly from client in moderate numbers (hundreds to thousands). Typically client won't check about the duplication on the data.
Workflow. Run a process removing any instance calculated as a duplicate.
Workflow is a good idea, however since it is being processed asynchronously, my concern is the user in some cases may already do some update/changes to the data inserted, before the workflow finish working.. therefore creating some data inconsistency or at the very least confusing user experience
Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel it's creation (or mark for removal).
I like this approach. So I just import like usual (for example, to contact entity), but I already have a plugin in place that getting triggered every time a record is created, the plugin will check whether the record is duplicat-ish or not and took necessary action.

I haven't been fiddling a lot with duplicate detection but looking at your criteria you might be able to make rules that match those, pretty much three rules to cover your cases, full name match, last name and mobile phone match and email match.
If you want to do the points system I haven't seen any out of the box components that solve this, however CRM Extensions have a product called Import Manager that might have that kind of duplicate detection. They claim to have customized duplicate checking. Might be worth asking them about this.
Otherwise it's custom coding that will solve this problem.

I can think of the following approaches to the task (depending on the number of records, repetitiveness of the import, automatization requirement etc.) they may be all good somehow. Would you care to elaborate on the current conditions?
Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.
Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel it's creation (or mark for removal).
Workflow. Run a process removing any instance calculated as a duplicate.
You also need to consider the implication of such elimination of data. There's a mathematical issue. Suppose that the uniqueness' radius (i.e. the threshold in this 1D case) is 3. Consider the following set of numbers (it's listed twice, just in different order).
1 3 5 7 -> 1 _ 5 _
3 1 5 7 -> _ 3 _ 7
Are you sure that's the intended result? Under some circumstances, you can even end up with sets of records of different sizes (only depending on the order). I'm a bit curious on why and how the setup came up.
Personally, I'd go with plugin, if the above is OK by you. If you need to make sure that some of the unique-ish elements never get omitted, you'd probably best of applying a test algorithm to a backup of the data. However, that may defeat it's purpose.
In fact, it sounds so interesting that I might create the solution for you (just to show it can be done) and blog about it. What's the dead-line?

alternative spreadsheet for real time data

I am looking for an alternative spreadsheet to Excel, preferably but not necessarily open source, that allows a programmer to create a plugin that can update cells in the sheet from an external data source in real time. The spreadsheet would then internally compute all dependent calculation chains upon change of value.
This is similar functionality to what the RTD method does with Microsoft Excel. The rate of external data change could be moderate to high (whatever such relativistic terms mean).
Also the reverse process would be useful, i.e. detecting a change in cells and then sending that information to a plugin that can communicate with external processes.
Any recommendations or experience in trying this?

I am afraid you will not find any. The main consumers of the real-time spreadsheets (grids) are big banks and they usually invest in their own solutions. [Because they can afford and they used to see it as their advantage over the competition] Some of the solutions are very dated, but still going strong! Three years ago I worked on a system which was written in C++ (with TibcoRv as a backbone) and it was already five years old. It is still alive and kicking.
One of the strong points of the bespoke grid are "Excel-like formulae" where a user can use a field from the provided data dictionary. So rather than reference cells, you reference data from your systems. It makes formulae easier to implement and read. And of course you can export or share them; users really like that.

The following could be of some help:
http://www.dadisp.com
http://www.quantrix.com
http://www.resolversystems.com/products/
http://pyspread.sourceforge.net/
http://matrex.sourceforge.net/
This may not exactly satisfy your real time requirement but worth exploring.

Generic graphing and charting solutions

I'm looking for a generic charting solution, ideally not a hosted one that provides the following features:
Charting a tuple of values where the values are:
1) A service identifier (e.g. CPU usage)
2) A client identifier within that service (e.g. server IP)
3) A value
4) A timestamp with millisecond/second resolution.
Optional:
I'd like to also extend the concept of a client identifier further, taking the above example further, I'd like to store statistics for each core separately, so, another identifier would be Core 1/Core 2..
Now, to make sure I'm clearly stating my problem, I don't want a utility that collects these statistics. I'd like something that stores them, but, this is also not mandatory, I can always store them in MySQL, or such.
What I'm looking for is something that takes values such as these, and charts them nicely, in a multitude of ways (timelines, motion, and the usual ones [pie, bar..]). Essentially, a nice visualization package that allows me to make use of all this data. I'd be collecting data from multiple services, multiple applications, and the datapoints will be of varying resolution. Some of the data will include multiple layers of nesting, some none. (For example, CPU would go down to Server IP, CPU#, whereas memory would only be Server IP, but would include a different identifier, i.e free/used/cached as the "secondary' identifier. Something like average request latency might not have a secondary identifier at all, in the case of ping). What I'm trying to get across is that having multiple layers of identifiers would be great. To add one final example of where multiple identifiers would be great: adding an extra identifier on top of ip/cpu#, namely, process name. I think the advantages of that are obvious.
For some applications, we might collect data at a very narrow scope, focusing on every aspect, in other cases, it might be a more general statistic. When stuff goes wrong, both come in useful, the first to quickly say "something just went wrong", and the second to say "why?".
Further, it would be a nice thing if the charting application threw out "bad" values, that is, if for some reason our monitoring program started to throw values of 300% CPU used on a single core for 10 seconds, it'd be nice if the charts themselves didn't reflect it in the long run. Some sort of smoothing, maybe? This could obviously be done at the data-layer though, so its not a requirement at all.
Finally, comparing two points in time, or comparing two different client identifiers of the same service etc without too much effort would be great.
I'm not partial to any specific language, although I'd prefer something in (one of the following) PHP, Python, C/C++, C#, as these are languages I'm familiar with. It doesn't have to be open source, it doesn't have to be a library, I'm open to using whatever fits my purpose the best.
More of a P.S than a requirement: I'd like to have pretty charts that are easy for non-technical people to understand, and act upon too (and like looking at!).
I'm open to clarifying, and, in advance, thanks for your time!

I am pretty sure that protovis meets all your requirements. But it has a bit of a learning curve. You are meant to learn by examples, and there are plenty to work from. It makes some pretty nice graphs by default. Every value can be a function, so you can do things like get rid of your "Bad" values.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string