Getting data fields AND associated entities into a Word file

Getting data fields AND associated entities into a Word file - dynamics-crm-2011

NB. This question might be a bit similar to another one but still result in a different answer, hence the division in two.
I'd like to print out the contents of an entity, consisting of a subset of the fields and including some rudimentary information on its associated entities.
E.g. I'd like to print a lead by displaying its name and phone number but also a list of associated instances of type new_somesome (their new_name and new_size, for example) connected to it.
How can I do that?
I've tried cheating by using mail template but that doesn't include the related entities. I'm not sure how to edit the default print view to include the associatees neither.

Is it acceptable for you to use reports?
For this purpose you can try free MS ReportBuilder. MS allows to build reports without using such large tool as MS Buisness Intelligence. These custom reports appears to be more flexible than system ones. Another way is to try Mail merge functionality.

Related

Internal Search optimization for relevance

My team is using Solr and I have a question regarding it.
There are some search terms which doesn't gives relevant results or results which should have been displayed. For example:
Searching for Macy's without the apostrophe like "Macys" doesnt give back any result for Macy's.
Searching for JPMorgan vs JP Morgan gives different result
Searching for IBM doesn't show results which contains its full name i.e International business machine.
How can we improve and optimize such cases so that it gets applied to all, even to the one we didn't catch apart from these 3 above?
Any suggestions?

All these issues are related to how you process the incoming text for those fields. You'll have to create a filter chain for the field - and possibly use multiple fields for different use cases and prioritize those using qf - that processes the input values to do what you want.
Your first case can be solved by using a PatternReplaceFilter to remove any apostrophes - depending on your use case and tokenizer you might want to use the CharFilter version, as it processes the text before it's split into multiple tokens.
Your second case is a straight forward synonym filter or a WordDelimiterFilter, where you expand JPMorgan to "JP Morgan", or use the WordDelimiterFilter to expand case changes into separate tokens. That'll also allow you to search for JP and get JPMorgan related entries. These might have different effects on score, use debugQuery=true to see exactly how each term in your query contributes to the score.
The third case is in general the same as the second case. You'll have to create a decent synonym word list for the terms used, and this is usually something you build as you get feedback from your users, from existing dictionaries and from domain knowledge. There's also the option of preprocessing text using NLP, or in this case, something as primitive as indexing the initials of any capitalized words after each other could help.

TFS Query results with a list of linked work item IDs in Excel?

How can I get a list of linked work item IDs for a set of work items?
Excel-hosted queries preferred. API Sample is acceptable.
Direct DB table query is acceptable (read-only and unsupported of course!)
Many thanks in advance! -Zephan
MORE INFORMATION
UPDATE: No answers for my original Q so broadening scope of acceptable answers as follows:
Answer for TFS2015 (migrating very shortly) or TFS2013 (potentially useful for TFS2015) is preferred over TFS2010
Coding acceptable if there are any APIs or PowerShell cmdlets (MS or community).
Connecting directly (read-only!) to TFS DB tables is acceptable (source tables and related relationship link table names). Yes, directly referencing TFS DB tables is VERY unsupported, read-only, and "AT YOUR OWN RISK." Still beats having to manually copy/paste data or reconstruct list of links in Excel.
ORIGINAL QUESTION & DETAILS
My team uses TFS2010 (soon 2013 or hoping 2015) and VS2010-2015. I need to support traceability reports and analyze/quantify our coverage of ~300 Test Case work items linked to ~400 Requirement work items. Direct Link and Tree queries are close but don't give me related links on the same row as parent work item. Many thanks in advance for your suggestions and any related code fragments.
Example:
3 test cases (Test1, Test2, Test3)
4 Requirements (Req1, Req2, Req3, Req4)
For simplicity let's just use TFS work item IDs to represent each TestN and ReqN. In actuality, I have a keyword to identify my validation requirements (separate from the 1,000's of other requirements in this Team Project). The only Test Case WI I care about for this problem are those linked to one or more Validation Requirement trace-ability.
Scenarios:
1:1 (simple) Test1 is linked to Req1
1:2 (1:n) Test2 is linked to Req2 and Req3
2:1 (n:1) Test3 (and Test2) are both linked to Req3
0:1 (Requirement missing Test coverage) Req4 has no test case links
I have a good coverage gap query by creating a Direct Link query for all Requirements then set "linking filters" to Only return items that do not have the specified links.
Desired output (all tests with list of related work items):
|Test1 | Req1 |
|Test2 | Req2, Req3 |
|Test3 | Req3 |
For row #2 I am OK with other separators or even entire list using same separator (.CSV or TAB delimited).
Skip right to answer now if you have a tidy answer. If not then I added considerable RELATED RESEARCH info below to help kick-start an idea that fits the need! Especially since this hasn't been discoverably solved in the last 5 years :-).
RELATED RESEARCH (loooong but may be useful)
1. Visual Studio Queries
Flat Queries should support a list of linked items out-of-the-box... but it does not. RelatedLinkCount field is handy for knowing if there are any links to chase, but that's it for flat queries.
Direct Link queries give a list of all direct links, but the related IDs are on rows below the parent work item. I am seriously considering creating a formula to look on the next X rows to build a list of IDs, but this would be fragile especially when over 3 requirements are linked to same test. Still might solve 80% of my tracing needs.
Tree Queries also show links, but on different rows. Additionally they tend to follow just one link type. Ideally I will need list of User Requirements linked to Functional Requirements linked to Test Case(s).
2. Tools / Plug-ins
SmartExcel4TFS (eDEVTech, http://www.modernrequirements.com/smartexcel4tfs/) has 3 reports it supports, but none get me the core data I need in easily used format. At least it is FREE if you have an MSDN Premium subscription.
Requirements to Tests Trace Matrix is super-interesting. Alass, right now I need to go the other way (Requirements linked to a given test case). Also it merges cells and has sub-sections that are hard to manipulate I think. (I may revisit this option though.)
Intersection Traceability Matrix report is WAY too wide for a full 300 x 400 grid :-O.
Work Item Decomposition Matrix also didn't give me desired contents. (though frankly I've forgotten this report layout from when I checked ~1 month ago.)
3. TFS API calls
I have actually avoided this route in favor of native Excel solution... but if I can get an example of Excel VBA code (or other code with link to calling within Excel) I may go this route. At this point I don't have time to dig into rolling my own... but this would be cool assuming performance is acceptable.
Relevant API/code fragments:
Retrieving TFS Results from a Tree Query (Blogs.msdn.com 2012.02.22) - Looks like this would get me the data I need, but it is not in Excel so I'd need a bridge example of some sort calling this within Excel.
Retrieving work items and their linked work items in a single query using the TFS APIs (stackoverflow.com 2012.01.12) - Also looks very promising, but not connected to Excel. Gives hints for 2 level and 3 level nested links and performance consideration (don't make second call for each item returned!)
Retrieving work items using the Team Foundation Server API (pwee167.github.io 2012.09.18) - Excellently written introductory walkthrough blog posting to learn how to build an (ASP.Net MVC3) app that calls TFS APIs to run Flat or Tree queries. Start here if writing C# (which I could do but don't have time/justification unless easy example to integrate with Excel).
How can I query work items and their linked changesets in TFS? (stackoverflow.com 2011.05.10) - I don't need changesets but this has VB code to instantiate new TfsTeamProjectCollection which might work directly in Excel VBA (assuming proper reference is found and added)
var projectCollection = new TfsTeamProjectCollection(
new Uri("http://localhost:8080/tfs"),
new UICredentialsProvider());
OK, that's everything I have gathered on this problem. Please help contribute with the missing magic tool/snippet or follow the info above to build that last bit I have not had time to prototype & debug. Many thanks in advance!! -Zephan

What is the best method to extract relevant info from Email?

My friend has a small business where customers order services using email. He receives several emails a day and sorting thru it is becoming cumbersome.
There are about 10 different kind of tasks the customer can request, and for each there are one or two words that specify it. The other info present in the emails is the place where the service is to be delivered, the time, and the involved people's names. The email also contains an ID, a long number with a fairly standard format.
The emails are very unstructured, but all contain the key info above. My question is: what is the best method to sweep thru these emails and extract the key info (such as type of service, place, people's names, the ID etc)?
I thought about some kind of pre-processing, then pass it thru AlchemyAPI and then test the Alchemy output using Neural Networks for each feature (key info). This can be supervised learning as I can do a feedback loop all the time, as once the info is inputted, I can have someone to validate.
Any ideas? Thanks

I guess some parts (ID, task, time) can be captured by a regular expression and dictionary matching. Have a look at GATE's JAPE tool.
It should be fairly easy to assemble a dictionary and then use the lookups for the "task", also you can reuse the available jape rules for date/time and write a new one for the ID (also, a simple regex could be fine).
For matching the location and people's names you should be careful, openCalais and alchemyAPI can give you good results if names and places are used in well defined sentences and will probably make more mistakes with some tabular or weird format. Also you can never be sure you captured the place and person correctly so don't rely on that for processing orders directly.
If you have more information about mails' structure or expected names and places (i.e. you have a "clients" table with all possible names), you would probably want to do your own tagging, otherwise I'd stick to openCalais or alchemyAPI + some regular expressions.
P.S. I assume all mails are in English.

Dynamics CRM 2011 Import Data Duplication Rules

I have a requirement in which I need to import data from excel (CSV) to Dynamics CRM regularly.
Instead of using some simple Data Duplication Rules, I need to implement a point system to determine whether a data is considered duplicate or not.
Let me give an example. For example these are the particular rules for Import:
First Name, exact match, 10 pts
Last Name, exact match, 15 pts
Email, exact match, 20 pts
Mobile Phone, exact match, 5 pts
And then the Threshold value => 19 pts
Now, if a record have First Name and Last Name matched with an old record in the entity, the points will be 25 pts, which is higher than the threshold (19 pts), therefore the data is considered as Duplicate
If, for example, the particular record only have same First Name and Mobile Phone, the points will be 15 pts, which is lower than the threshold and thus considered as Non-Duplicate
What is the best approach to achieve this requirement? Is it possible to utilize the default functionality of Import Data in the MS CRM? Is there any 3rd party Add-on that answer my requirement above?
Thank you for all the help.
Updated
Hi Konrad, thank you for your suggestions, let me elaborate here:
Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.
Nice one but I don't think it is really workable in my case, the data will be coming regularly from client in moderate numbers (hundreds to thousands). Typically client won't check about the duplication on the data.
Workflow. Run a process removing any instance calculated as a duplicate.
Workflow is a good idea, however since it is being processed asynchronously, my concern is the user in some cases may already do some update/changes to the data inserted, before the workflow finish working.. therefore creating some data inconsistency or at the very least confusing user experience
Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel it's creation (or mark for removal).
I like this approach. So I just import like usual (for example, to contact entity), but I already have a plugin in place that getting triggered every time a record is created, the plugin will check whether the record is duplicat-ish or not and took necessary action.

I haven't been fiddling a lot with duplicate detection but looking at your criteria you might be able to make rules that match those, pretty much three rules to cover your cases, full name match, last name and mobile phone match and email match.
If you want to do the points system I haven't seen any out of the box components that solve this, however CRM Extensions have a product called Import Manager that might have that kind of duplicate detection. They claim to have customized duplicate checking. Might be worth asking them about this.
Otherwise it's custom coding that will solve this problem.

I can think of the following approaches to the task (depending on the number of records, repetitiveness of the import, automatization requirement etc.) they may be all good somehow. Would you care to elaborate on the current conditions?
Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.
Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel it's creation (or mark for removal).
Workflow. Run a process removing any instance calculated as a duplicate.
You also need to consider the implication of such elimination of data. There's a mathematical issue. Suppose that the uniqueness' radius (i.e. the threshold in this 1D case) is 3. Consider the following set of numbers (it's listed twice, just in different order).
1 3 5 7 -> 1 _ 5 _
3 1 5 7 -> _ 3 _ 7
Are you sure that's the intended result? Under some circumstances, you can even end up with sets of records of different sizes (only depending on the order). I'm a bit curious on why and how the setup came up.
Personally, I'd go with plugin, if the above is OK by you. If you need to make sure that some of the unique-ish elements never get omitted, you'd probably best of applying a test algorithm to a backup of the data. However, that may defeat it's purpose.
In fact, it sounds so interesting that I might create the solution for you (just to show it can be done) and blog about it. What's the dead-line?

Integrating with 500+ applications

Our customers use 500+ applications and we would like to integrate these applications with our. What is the best way to do that? These applications are time registration applications and common for most of them is that they can export to csv or similar, some of them are actually home-brewed excel sheets where time is registered.
The best idea so far is to create our own excel sheet, which can be used to integrate with all these applications. The integrations could be in the form of cells containing something like ='[c:\export.csv]rawdata'!$A$3 Where export.csv is the csv file exported from the time registration applications. Can you see a better way to integrate against all these applications? It should be mentioned that almost all our customers have Microsoft Office.
Edit: Answers to the excellent questions from Pontus Gagge:
How similar are the data in the different applications?
I assume that since they time registration applications, they will have some similarities, but I assume that some will register the how long time one has worked in total for a whole month, while others will spesify for each day. If Excel is chosen, I believe that many of the differences could be ironed out using basic formulas.
What quality is the data?
The quality of the data can vary so basic validation must be undertaken, a good way is also to make it transparent for the customers, how our application understands their input, so they are responsible.
How large amounts of data are you talking about?
There will be information about the time worked for up to 50 employees.
Is the integration one-way only?
Yes
With what frequency should information be transferred?
Once per month (when they need to pay salaries).
How often do the applications themselves change, and how often does your product change?
If their application is a home-brewed Excel sheet, then I assume it will change once a year (due for example a mistake someone). If it is a standard proper time registration application, then I do not believe they are updated more often than every fifth year or so, as it is a very stabile concept.
Should the integration be fully automatic or can your end users trigger a data transfer?
They can surely trigger data transfer. The users are often dedicated to the process so they can be trained at doing it, which means that they could make up to, say 30, mouse clicks in order to integrate each month.
Will the customers have somebody to monitor the integrations?
As we have many customers, many of them should be able to undertake the integration themselves. We will though be able to assist them over the telephone. We cannot, though undertake the integration ourselves because we would then be responsible for any errors due to user mistakes, etc.
Does the phrase 'integration spaghetti' mean anything to you...?
I am looking for ideas from the best chefs to cook a nice large portion of that.

You need to come up with a common data format, and a way to translate the individual data formats to the common format. There's really no way around this - any solution you come up with will have to do this in one way or the other. It's the essential complexity of what you're doing.
The bigger issue is actually variances within the source data, in terms of how things like dates are stored, missing columns, etc. Doing a generic conversion for CSV to move columns around is comparatively easy.

I would also look at CSV and then use an OLEDB connection against the CSV file for importing.

If you try to make something that can interface to any data structure in the universe (and 500 is plenty close enough), it is guaranteed to be a maintenance nightmare. Instead I would approach this from multiple angles:
Devise an interface into which a human can enter this data already in the proper format. With 500+ clients, I'd make this a small, raw but functional browser based site that users can use to enter this information manally. This is the fall-back. At the end of the day, a human can re-key the information into the site and solve the import issue. Ideally, everyone would use this instead of their own format. Data entry people are cheap.
Similar to above, but expanded, I would develop a standard application or standardize on an off-the-shelf application that can be used to replace their existing format. This might take more time than #1. The goal would be to only do one-time imports of these varying data schemas into the application and be done with them for good.
The nice thing about spreadsheets is that you can do anything anywhere. The bad thing about spreadsheets is that you can do anything anywhere. With CSV or a spreadsheet there is simply no way to enforce data integrity and thus consistency (which is the primary goal) on the data. If the source data is already in a database, then that is obviously simpler.
I would be inclined to use database format into which each of these files need to be converted rather than a spreadsheet (e.g. use something like Jet (MDB)). If you have non-Windows users then that will make it harder and you might have to use a spreadsheet. The problem is that it is too easy for the user to change their source structure, break their upload and come crying to you. If a given end user has a resident expert, they can find a way of importing the data into that database format . If you are that expert, then I would on a case-by-case basis, write something that would import into that database format. XML would be the other choice, but that will likely take more coding than an import/export into a database format.
Standardization of the apps (even having all the sources in a database format instead of a spreadsheet would help) and control over the data schema is the ultimate goal rather than permitting a gazillion formats. There really is no nice answer other than standardization. Otherwise, you are having to write a converter for every Tom-Dick-and-Harry format and again when someone changes the source format.

With a multitude of data sources mapping each one correctly to an intermediate format is not trivial. Regular expressions are good with a finite set of known data formats. Multipass can help when data is ambiguous without context (month,day fields and have several days of data), and also help defeat data entry errors. But it seems as this data is connected to salaries there needs a good reliable transfer.
An import configuring trick
Get the customer to make a set of training data in the application. It should have a "predefined unique date" and each subsequent data field have a number corresponding to the target data field in your application. On importing your application needs to recognise the predefined date, determine the unique translation required and effect the displaying/saving of this "mapping key", and stop the import. eg If you expect "Duration hours" in field two then get the user to enter 2 in the relevant field which might be "Attendance hours".
On subsequent runs, and with the mapping definition key, import becomes a fairly easy process of translation.
Note on terms
"predefined date" - must be historical, say founding date of your company?, might need to be in PC clock settable range.
"mapping key" - could be string of hex digits and nybble based so tractable to workout
The entered code can be extended to signify required conversions ie customer's application has durations in days and your application expects it in hours.
Interfacing with windows programs (in order if increasing fragility)
Ye Olde saving as CSV file
Print to operating system printer that is setup as a text file/pdf, then scavenge the data out of that
Extract data via the application interface control, typically ActiveX for several windows programs ie like Matlab's Spreadsheet Link
Read native file format xls format ie like Matlab's xlsread
Add an additional intermediate spreadsheet sheet that has extended cell references ie ='[filename]rawdata'!$A$3

Have a look at Teiid by JBoss: http://jboss.org/teiid
Also consider using SOA - e.g., if you're on Java, try JBoss SOA platform: http://www.jboss.com/resources/soa/?intcmp=1004

Use a simple XML format. A non-technical person can easily understand a simple XML format (and could even identify basic problems with XML documents that are not well-formed).
Maybe use a DTD (or even better an XML schema) to do very basic validation, and then supplement this with an XSL stylesheet to do more validation with better error reporting. (An XSL stylesheet simply converts from XML to something else and so can be generate readable error messages.)
The advantage of this approach is that web browsers such as Internet Explorer can apply the XSL stylesheets. A customer need only spend at most a day enhancing their applications or writing excel macros to generate the XML data in the format that you specify.
Recent versions of Excel have support for converting spreadsheet data to XML, and can even validate against schemas.
Once the data passes the XSL validation checks, you have validated XML data.

If you have heaps of data and heaps of money, you could look at existing data management and cleansing tools:
http://www-01.ibm.com/software/data/infosphere/datastage
http://www-01.ibm.com/software/data/infosphere/qualitystage
But even then, you'll likely need to follow kyoryu's suggestion assuming you have 500+ data formats. The problem isn't your side. You need them to standardize their output formats if you have no control over their apps. CSV is likely the easiest. You could even send them a excel template to help them along.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string