I am setting up a solution that is used by multiple developers/teams working on different features of the next release of an application.
In my IntegrationTest project I need to maintain the test settings for the different teams. Basically, each team uses a different test server/web services.
I can create a ".testsettings" file, but I cannot find the section where I could define simple key-value pairs. For the testing I only need a few key-value pairs (server_url, webservice_url, ...).
Furthermore: how do I access the ".testsettings" at runtime from my unit tests?
I have worked on Azure Search service previously where I created an indexer directly on a SQL DB in the Azure Portal.
Now I have a use case where I want to ingest from multiple data sources, each having a different data schema. Assume these data sources to be 3 search APIs of the X, Y, and Z teams. All of them take a search term and give back results in their own schema. I want my Azure Search service to be a proxy for these so that I have one search API that a user can use to get results from multiple sources, ordered correctly.
How should I go about doing it? I assume that I might have to create a common schema, and whenever a user searches something, I would call these 3 APIs, get the results, map them to the common schema and then index this data into an Azure Search index. Finally, I would call this Azure Search API to give back the results to the caller.
I would appreciate any help! If I can get hold of a better documentation for doing this work, that will be great as well.
Your assumption is correct. You can work with 3 different indexes and fire queries against them, or you can try to combine all of them in the same index. The benefit of the second approach is a better way to implement ordering / paging as all the information will be stored in the same index.
It really depends on what you mean by ordered correctly. Should team X be able to see results from teams Y and Z? The only way you can get ranked results like this is to maintain a single index with a common schema containing data from all teams.
One potential pitfall with this approach is conflicts in the schema. For example if one team requires a field to be of a specific datatype or use a specific analyzer, while another team has different requirements. We do this in our indexes, but with some carefully selected common fields and then dedicated fields prefixed according to our own naming convention to avoid conflicts.
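As a rough sketch of that mapping idea (the endpoint, index name, field names and the results_from_x/y/z variables below are all made up for illustration, and the index is assumed to already define the prefixed fields), using the azure-search-documents Python SDK it might look something like this:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient  # azure-search-documents package

search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",  # placeholder
    index_name="combined-results",                          # placeholder
    credential=AzureKeyCredential("<admin-key>"),           # placeholder
)

def to_common_schema(team, raw):
    # Map one raw result from team X/Y/Z into the shared index schema.
    doc = {
        "id": f"{team}-{raw['id']}",   # key must stay unique across teams
        "title": raw.get("title") or raw.get("name"),
        "description": raw.get("description", ""),
        "source": team,
    }
    # Team-specific fields keep a prefix to avoid schema conflicts.
    for field, value in raw.get("extras", {}).items():
        doc[f"{team}_{field}"] = value
    return doc

# results_from_x/y/z: hypothetical lists of raw results from each team's API
docs = [to_common_schema("x", r) for r in results_from_x]
docs += [to_common_schema("y", r) for r in results_from_y]
docs += [to_common_schema("z", r) for r in results_from_z]
search_client.upload_documents(documents=docs)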
One thing to consider is the need to reset the index. If you need to add, change or remove fields you will have to delete the index and create it again with a new schema. If you have a common index and team X needs to add a new property, you would need to reset (delete and create) the common index which affects all teams.
So, creating separate indexes per team has its benefits. Each team can have their own schema without risk of conflicts and they can reset their index without affecting the other teams.
What's the common pattern for not duplicating variable values across plans?
We have a standard set of tags we use in plans and modules which we wish to define once and use many times. For example, we set a CostType tag to values like compute, storage, etc. We can define it at plan level or at module level, but that means defining a variable in multiple places, which isn't very DRY (don't repeat yourself).
Options
a non-infrastructure-changing module which defines these "global" variables, with all modules/plans using it first so the rest of the actions can harvest the values from it
use a non-infrastructure-changing plan with remote state to store the variable values and access them from modules/plans
use a tfvars file and handle it via the scripts that wrap terraform actions
devops elves magically handle this problem
How do you solve this problem in your organization?
I have used, with success, symbolic links to link the same variable file into multiple locations.
Symbolic links are well supported by Git and can be used on Windows too (with some care; see Git Symlinks in Windows).
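If you want to script the link creation rather than do it by hand, a minimal sketch (the directory layout and the common_tags.tf file name are assumptions) could be:
import os

# Hypothetical layout: one shared variable file, linked into each plan directory.
shared = os.path.abspath("globals/common_tags.tf")
for plan_dir in ("plans/network", "plans/storage", "plans/compute"):
    link = os.path.join(plan_dir, "common_tags.tf")
    if not os.path.islink(link):
        os.symlink(shared, link)  # on Windows this needs developer mode or admin rights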
I am trying to use a Google Cloud Datastore kind query to get a list of kind names, as demoed in the Kind queries documentation:
from google.cloud import datastore

client = datastore.Client()
query = client.query(kind='__kind__')
query.keys_only()
kinds = [entity.key.id_or_name for entity in query.fetch()]
but the code also returns some built-in kind names, e.g.
['_AE_DatastoreAdmin_Operation', '_GAE_MR_TaskPayload',
'__Stat_Kind_IsRootEntity__', '__Stat_Kind_NotRootEntity__',
'__Stat_Kind__', '__Stat_PropertyName_Kind__',
'__Stat_PropertyType_Kind__', '__Stat_PropertyType_PropertyName_Kind__',
'__Stat_PropertyType__', '__Stat_Total__']
I am wondering how to remove these built-in kind names and only retain user created kind names.
Those appear to be kinds of real entities created on the local development server/emulator - they can actually be seen in the Datastore Viewer. For example the __Stat_* ones are created when the datastore Generate Stats action is performed on the local development server.
These entities do not exist in the project's live cloud datastore (or they are stored elsewhere).
With a simple naming rule for the application's entity kinds - to not start with the _ character - you could obtain the kinds list like this:
kinds = [entity.key.id_or_name for entity in query.fetch()
if not entity.key.id_or_name.startswith('_')]
Depending on how the kinds are used, another option - safer IMHO from a coding perspective - might be to always check the kind names against an explicit expected list (for example, when wiping out the entities of all known kinds):
kinds = [entity.key.id_or_name for entity in query.fetch()
if entity.key.id_or_name in known_kinds_list]
This is a simple requirement: I want to add a set of strings to Accounts in Dynamics 2011. The strings are external IDs for other systems. All the strings should be unique across all entities.
The only way I can see to do this is to define the strings as entities (say 'ExternalCode') and set up a 1:N relationship between Account and ExternalCode, but this seems incredibly overweight. Also, defining it as an entity insists that I give the 'ExternalCode' a name, which it obviously doesn't have.
What's the best way to implement this?
Thank you
Ryan
It may seem overweight, but think about entities as if they were tables. Would you create a second table inside MS SQL? If so, then you should create another entity here. CRM is very well optimized, so I wouldn't worry about this additional overhead.
Alternatively, you could always carry the GUID in the other system.
How are these unique references entering your CRM system? Are you importing the data from each of the external systems? If so, I assume the references are unique within the external system, and once imported you want to make sure that none of these references are duplicated?
Additionally, how many strings are we talking about here? If it is a small number then it would make sense to just define attributes to manage them and check for duplicates in one of the following ways:
1) Some JavaScript could be used to make an OData query to confirm the 'uniqueness' of your external reference number before the record is committed. (But this is not sufficient if records will also be created programmatically in the system.)
2) A plug-in which fires on pre-create to again query the system for other records which match the same unique reference number and handles the event of a match accordingly.
However, if there are many of them then it may make more sense to define a separate entity as you say. As above, you could then associate a new 'reference record' with the entity via a plug-in, but again check whether the record already exists, and then either handle an exception or merely associate with the existing record if that is appropriate.
I think the key is what you want to do if you do find a duplicate, and how these records are going to be created in the system (e.g. via the UI or programmatically, or potentially both).
Happy to provide some more assistance if you have some more details.
I am building a tool that searches people based on a number of attributes. The values for these attributes are scattered across several systems.
As an example, dateOfBirth is stored in a SQL Server database as part of system ABC. That person's sales region assignment is stored in some horrible legacy database. Other attributes are stored in a system only accessible over an XML web service.
To make matters worse, the legacy database and the web service can be really slow.
What strategies and tips should I consider for implementing a search across all these systems?
Note: Although I posted an answer, I'm not confident it's a great answer. I don't intend to accept my own answer unless no one else gives better insight.
You could consider using an indexing mechanism to retrieve and locally index the data across all the systems, and then perform your searches against the index. Searches would be an awful lot faster and more reliable.
Of course, this just shifts the problem from one part of your system to another - now your indexing mechanism has to handle failures and heterogeneous systems, but that may be an easier problem to solve.
Another factor is how often the data changes. If you have to query data in real-time that goes stale very quickly, then indexing may not be practical.
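As a rough sketch of the indexing idea (the fetch functions, field names, and the choice of SQLite FTS5 as the local index are all assumptions for illustration), you could periodically pull the searchable attributes into a local full-text index and run searches against that:
import sqlite3

conn = sqlite3.connect("people_index.db")
conn.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS people "
    "USING fts5(person_id, name, region, date_of_birth, source)"
)

def rebuild_index(sources):
    # sources: iterable of (source_name, fetch_fn), where each fetch_fn pulls
    # person dicts from one backend (SQL Server, legacy DB, web service, ...).
    conn.execute("DELETE FROM people")
    for source_name, fetch in sources:
        rows = [
            (p["id"], p["name"], p.get("region", ""), p.get("dateOfBirth", ""), source_name)
            for p in fetch()
        ]
        conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

def search(term):
    # Full-text search over the local copy; each hit records which system it came from.
    return conn.execute(
        "SELECT person_id, name, source FROM people WHERE people MATCH ?", (term,)
    ).fetchall()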
If you can get away with a restrictive search, start by returning a list based on the search criteria corresponding to the fastest data source. Then join up those records with the other systems and remove records which don't match the search criteria.
If you have to implement OR logic, this approach is not going to work.
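A minimal sketch of that filter-after-the-fast-source approach (the three query functions are hypothetical stand-ins for the real systems):
def search_people(criteria):
    # 1. Get a candidate list from the fastest source (say, the SQL Server system).
    candidates = query_sql_system(criteria)
    results = []
    for person in candidates:
        # 2. Enrich each candidate from the slower systems.
        record = dict(person)
        record.update(query_legacy_db(person["id"]))       # slow legacy database
        record.update(query_xml_webservice(person["id"]))  # slow web service
        # 3. Keep only records that still satisfy every criterion (AND logic only).
        if all(record.get(field) == value for field, value in criteria.items()):
            results.append(record)
    return results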
While not an actual answer, this might at least get you partway to a workable solution. We had a similar situation at a previous employer - lots of data sources, different ways of accessing those data sources, different access permissions, military/government/civilian sources, etc. We used Mule, which is built around the Enterprise Service Bus concept, to connect these data sources to our application. My details are a bit sketchy, as I wasn't the actual implementor, just an integrator, but what we did was define a channel in Mule. You then write a simple integration piece to go between the channel and the data source, and between the application and the channel. The integration piece does the work of making the actual query and formatting the results, so we had a generic SQL integration piece for accessing a database, and for things like web services we had some base classes that implemented common functionality, so the actual customization of the integration pieces was a lot less work than it sounds like. The application could then query the channel, which would handle accessing the various data sources, transforming the results into a normalized bit of XML, and returning them to the application.
This had a lot of advantages for our situation. We could include new data sources for existing queries by simply connecting them to the channel - the application didn't have to know or care what data sources were there, as it only looked at the data from the channel. Since data can be pushed or pulled from the channel, we could have a data source update the application when, for example, it was updated.
It took a while to get it configured and working, but once we got it going, we were pretty successful with it. In our demo setup, we ended up with 4 or 5 applications acting as both producers and consumers of data, and connecting to maybe 10 data sources.
Have you thought of moving the data into a separate structure?
For example, Lucene stores the data to be searched in a schema-less inverted index. You could have a separate program that retrieves data from all your different sources and puts it into a Lucene index. Your search could work against this index, and the search results could contain a unique identifier and the system it came from.
http://lucene.apache.org/java/docs/
(There are implementations in other languages as well)
Have you taken a look at YQL? It may not be the perfect solution, but it might give you a starting point to work from.
Well, for starters I'd parallelize the queries to the different systems. That way we can minimize the query time.
You might also want to think about caching and aggregating the search attributes for subsequent queries in order to speed things up.
You have the option of creating an aggregation service or middleware that aggregates all the different systems so that you can provide a single interface for querying. If you do that, this is where I'd apply the previously mentioned caching and parallelization optimizations.
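For the parallel part, a minimal Python sketch (the three query functions and merge_results are hypothetical stand-ins for the real systems and merge logic):
from concurrent.futures import ThreadPoolExecutor

def federated_search(criteria):
    # Fire the backend queries at the same time instead of one after another,
    # so overall latency is roughly the slowest single source, not the sum.
    with ThreadPoolExecutor(max_workers=3) as pool:
        sql_future = pool.submit(query_sql_system, criteria)
        legacy_future = pool.submit(query_legacy_db, criteria)
        ws_future = pool.submit(query_xml_webservice, criteria)
        return merge_results(
            sql_future.result(), legacy_future.result(), ws_future.result()
        )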
However, with all of that, you will need to weigh up the development time/deployment time/long-term benefits of the effort against migrating the old legacy database to a faster, more modern one. You haven't said how tied into other systems those databases are, so it may not be a very viable option in the short term.
EDIT: in response to data going out of date. You can consider caching your data if you don't need it to always match the database in real time. Also, if some data doesn't change very often (e.g. dates of birth) then you should cache it. If you employ caching, you could make your system configurable as to which tables/columns to include or exclude from the cache, and you could give each table/column its own cache timeout with an overall default.
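A very small sketch of that kind of configurable cache (the table/column names and timeout values are arbitrary examples):
import time

CACHE_TIMEOUTS = {                          # seconds; anything not listed uses the default
    ("person", "dateOfBirth"): 24 * 3600,   # changes rarely, cache for a day
    ("person", "salesRegion"): 3600,
}
DEFAULT_TIMEOUT = 300
_cache = {}

def get_cached(table, column, key, fetch):
    # Return a cached value, calling fetch() again once the entry is older
    # than the timeout configured for this table/column.
    timeout = CACHE_TIMEOUTS.get((table, column), DEFAULT_TIMEOUT)
    entry = _cache.get((table, column, key))
    if entry is None or time.time() - entry[0] > timeout:
        entry = (time.time(), fetch())
        _cache[(table, column, key)] = entry
    return entry[1]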
Use Pentaho/Kettle to copy all of the data fields that you can search on and display into a local MySQL database
http://www.pentaho.com/products/data_integration/
Create a batch script to run nightly and update your local copy. Maybe even every hour. Then, write your query against your local MySQL database and display the results.
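If you'd rather script the nightly copy yourself instead of (or alongside) a Kettle job, a rough Python stand-in might look like this (the source query, table layout, and connection details are all assumptions):
import pymysql

def nightly_sync(fetch_people):
    # fetch_people() is a hypothetical function that pulls the searchable
    # fields from the various source systems as a list of dicts.
    conn = pymysql.connect(host="localhost", user="search", password="...", database="search_copy")
    try:
        with conn.cursor() as cur:
            cur.executemany(
                "REPLACE INTO people (person_id, name, region, date_of_birth) "
                "VALUES (%s, %s, %s, %s)",
                [
                    (p["id"], p["name"], p.get("region"), p.get("dateOfBirth"))
                    for p in fetch_people()
                ],
            )
        conn.commit()
    finally:
        conn.close()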