Limit data to be generated in Cognos cube generation by year

I've been working with Cognos for a year now, but I've stumbled into another problem that needs an urgent workaround. I have a model that automatically generates a cube (using a batch file) every day. The generation is successful; the main issue is the time it takes to generate the cube, although that is somewhat acceptable considering the data involved (it includes historical transactions from about 7 years back). The idea we came up with is to include the data from previous years in the model but leave it untouched in the daily cube generation (since no changes are expected in that data); only data from the current year should be processed, and it would then be combined with the historical data. So far I have tried manipulating the IQDs, but the results do not include the historical data in the generated cube. I am using both version 7.4 and the later IBM Cognos 10, so I would appreciate ideas on how to approach this in both versions.

What you're looking for is time-based partitioning. Search Google for "Transformer time-based partition". For example, this link is for 8.4, but you should be able to find documentation for v10:
http://publib.boulder.ibm.com/infocenter/c8bi/v8r4m0/index.jsp?topic=/com.ibm.swg.im.cognos.ug_cogtr.8.4.0.doc/ug_cogtr_id7761DefiningTBPCube.html
This allows you to build your prior years once; they will exist as multiple individual MDC files. You then regularly build only your current-year cube, which refreshes just one MDC file. These are all transparently stitched together in a master dummy MDC and VCD file.
Apparently v10 has some build optimisations, so I would do this in v10 rather than v7 if possible.

Related

View Index is always being rebuilt

Lately I have encountered a problem with my view index being rebuilt all the time, and users are having massive issues with this particular view.
I figured it was due to @Date in my selection formula as well as in one of my column formulas; that way the selection formula would be different every second that passes.
So I figured that, since I don't need hours/minutes/seconds in my formulas, I would use @Today instead. This worked well for 2-3 days, and after that the same problem occurred again.
Since the problem is back, I'm not quite sure that was even the cause. When this particular view is open, I have issues in every tab that's open in Notes, not only in this specific database.
Is this a common/known issue? What can I do to avoid this problem?
Yes, it's a common issue that has been well known since the very early days of Notes more than 20 years ago.
@Date is not a problem on its own. @Now and @Today are both problems.
Using @TextToTime("Today") was a popular workaround that was discovered early on. It hid the problem from the indexer, so the server failed to realize that the view was out of date. It doesn't solve the underlying problem, though, which is that the view is trying to do something that views simply aren't designed to do. Views are intended to be static, requiring an update only when documents change. Introducing time into a selection or column formula makes them dynamic, which kills that presumption and is a major source of performance problems. Using this workaround requires that the view be fully rebuilt every night. You can do that by setting the view index options to "Manual", and setting up a program document to run an updall command with the -T option for the specific database and view once per night. Note that if your users are spread out across time zones, you'll have to pick one specific time as your standard, and if you have servers spread out across time zones you're going to have a lot of fun figuring out how to make them all show the same documents in the view at all times - but that's common to pretty much all approaches to the problem.
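For reference, the console command such a program document would run could look roughly like this (the database path and view name here are placeholders; check the exact updall switches against the documentation for your Domino release):

    load updall apps/timetrack.nsf -T "By Due Date" -R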
See this IBM Technote for a description of several other options that people have used over the years, with their pros and cons. Also see this article by Andre Guirard, which covers date/time issues in great detail.
I would add that the agent-and-folder solution that they describe in the Technote was generally my preferred approach, but it does have an additional disadvantage that they don't mention: it can eventually lead to an obscure situation where the server throws an error "Folder is larger than supported". This error actually has nothing to do with the size of the folder in documents; it refers to fragmentation of internal structures that occurs as large numbers of documents are moved in and out of the folder over time. It could only be fixed by deleting and re-creating the folder, which you can do in your agent code. I believe this problem may be fixed in more recent versions of Domino, but it caused me a lot of grief back in the Notes 6 and 7 timeframes.

Localized XPages application multiplying properties files

Yet another weird story from Domino Designer 9.0.1:
The application in question is set to support German and English; German is set to be both the source and the default language.
Over the course of the past few weeks we have observed that there are some Custom Controls and XPages whose properties files are multiplying; within something like 12 hours we often see hundreds of duplicated files (currently we have 120 duplicates; earlier this week we had a case with more than 1000 duplicated properties files!). In Package Explorer they turn up like this:
As you can see, there is something like a docUnid added to the properties file's name. Apart from a different time stamp they are all identical internally. In some cases both language versions are multiplied; in this particular case only the German (= source) version shows the phenomenon.
Another strange fact: this particular custom control hasn't changed for quite a while, and it only contains a single control with a static text attribute, alongside a
Does anyone have an idea what could be causing this, and what possible solutions I could try?
Tech facts and some more observations:
Domino Designer 9.0.1 FP6, ExtLib 17. We are working in a team where each of us codes in their own local replica, then replicates into the "hub" replica. I can't prove it, but I assume there is a connection between one of us replicating updates and the creation of new property duplicates.
EDIT: some more observations: I think I was able to pin it down to the replication between two specific machines. I just ran a sequence of 5 or 6 manually driven replications between both instances, each time without making any changes to the design code on either side. Nevertheless, every replication reported exactly 1 update and 1 addition, and each time a new properties file was added.
So in the meantime I deleted the custom control in question and rebuilt it from scratch under a slightly different name (just to be on the safe side). For now the application seems to be "behaving", but I'm fairly sure this will return sooner or later.
Repeat after me: source control and replication do not mix.
More details:
The properties files get stored as attachments on a design note; that's usually the note with the form. Unless you switch on multilingual support; then each property file gets its own note. When different people work on the database, these note elements get recreated on build, effectively getting a new UNID each time.
So the right flow for what you're trying to do: pick your best version of the NSF, nuke the other replicas, and bind it to version control. Let your peer developers create an NSF from that repository. Design sync should only happen via that repository.
While you're at it: add Bavarian as a language, so your Munich customers can use the app too.

Is It Possible To Reference TFS Work Item Fields More Than Once Within The Same Work Item?

We are currently in the process of upgrading from TFS 2008 to TFS 2012. When TFS 2008 was set up, the people involved didn't understand a lot of what the work item fields were for, and we ended up with very heavily customised templates; in fact we lost a lot of default fields. As part of the upgrade to 2012 we are trying to return to the out-of-the-box templates as much as possible, to ensure we get to use as many of the features as possible; however, there are a small number of custom fields that we need to include for reporting purposes.
Our product development process involves a roadmap for upcoming releases which includes new work as well as bug fixes. When a bug is assigned to be worked on by the developers, we would like to be able to choose which release we're targeting the fix for; as far as I can see, Iteration is best suited for this. At the point the bug is closed, though, we would also like to track which release it was actually fixed in, since things often get bumped from one release to the next if higher-priority bugs or change requests come in. This is where we come unstuck, since I can't seem to assign Iteration to both fields such that the two show different values.
If possible we would prefer not to have global lists that have to be constantly updated with release numbers across our product range (we have around 8 different products which are constantly in development, each with their own release numbers), and leaving one of the fields as free text opens up the possibility of inconsistencies in what people enter, e.g. 1.01 versus 1.1, which will show up in reporting as two different releases. As the fields are just looking up a set of values in the background, is there no way that the iteration list can be used twice? Or does someone have an alternative suggestion for how we get around this?
What I think I'd suggest in this case is using a COPY rule on a state change event, so that when you move your work item into the Closed state, it would populate your custom field with the value currently in your Iteration field.
This would give you a snapshot of the value at the right point in time which then wouldn't be altered if the iteration was later changed, along with a history entry if it was opened & closed multiple times over its lifetime.
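For illustration, in the exported work item type definition XML that rule could sit on the Active-to-Closed transition, roughly like this (the custom field reference name MyCompany.FixedInRelease and the state/reason names are placeholders for whatever your customised template actually uses):

    <TRANSITION from="Active" to="Closed">
      <REASONS>
        <DEFAULTREASON value="Fixed" />
      </REASONS>
      <FIELDS>
        <!-- Placeholder custom field recording the release the fix actually shipped in -->
        <FIELD refname="MyCompany.FixedInRelease">
          <!-- Copy the current iteration path into the field at the moment the bug is closed -->
          <COPY from="field" field="System.IterationPath" />
        </FIELD>
      </FIELDS>
    </TRANSITION>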
As an iteration is time-limited and a release is perpetual, there is an inherent mismatch of purpose in using iteration here. Iteration is for planning.
You would be better off creating a release list with the versions that you release.
If you are sprinting, for example, you may not know up front which release you will end up in before you start. If you are not sprinting, then you are just kidding yourself that you know.

How to implement different aggregations / rollups on time series data in Cassandra

I have a situation where I will be collecting many time series metrics (electricity used, hours used, hours idle) from operating equipment in a manufacturing plant. I need to create many different rollup numbers for individual and grouped assets. For example, I need to compute min, max, and average electricity used over 1, 5, 10, and 30 days for a given machine, and the same types of metrics for different groups of machines. Many of the calculated values are derived from the raw values retrieved from the assets.
What is the best approach for calculating these values within a Cassandra environment?
Do I need to create 'batch jobs' that execute the calculations?
It seems as if there are some built-in data types (counters) in Cassandra, but there seem to be some issues with them (judging from comments on Stack Overflow).
Has anyone integrated Cassandra with Twitter Storm or something similar to constantly update the counters?
Thanks
There's an open-source project called Blueflood that does exactly this. You could likely use it directly out of the box to cover your use case, or fork the repo and modify it as necessary.
Documentation and homepage: http://blueflood.io/
Source-code: https://github.com/rackerlabs/blueflood
IRC: #blueflood on Freenode
(Disclaimer: I am a contributor to the project)
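If you do end up writing your own scheduled batch job instead, a minimal sketch with the DataStax Python driver could look like the following (the keyspace, tables, columns, and machine IDs here are invented for illustration, not part of any existing schema):

    from datetime import datetime, timedelta

    from cassandra.cluster import Cluster

    # Hypothetical schema assumed by this sketch:
    #   raw_metrics(machine_id text, ts timestamp, kwh double,
    #               PRIMARY KEY (machine_id, ts))
    #   rollups(machine_id text, window_days int, min_kwh double, max_kwh double,
    #           avg_kwh double, computed_at timestamp,
    #           PRIMARY KEY (machine_id, window_days))

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("plant_metrics")

    select_raw = session.prepare(
        "SELECT kwh FROM raw_metrics WHERE machine_id = ? AND ts >= ?")
    insert_rollup = session.prepare(
        "INSERT INTO rollups (machine_id, window_days, min_kwh, max_kwh, avg_kwh, computed_at) "
        "VALUES (?, ?, ?, ?, ?, ?)")

    def rollup(machine_id, window_days):
        """Compute min/max/avg electricity use for one machine over a trailing window."""
        since = datetime.utcnow() - timedelta(days=window_days)
        values = [row.kwh for row in session.execute(select_raw, (machine_id, since))]
        if not values:
            return
        session.execute(insert_rollup, (machine_id, window_days,
                                        min(values), max(values),
                                        sum(values) / len(values),
                                        datetime.utcnow()))

    # Run as a scheduled batch job (cron, Quartz, etc.) per machine and window size.
    for machine in ("press-01", "lathe-07"):
        for days in (1, 5, 10, 30):
            rollup(machine, days)

Storm (or Spark) would let you keep the same rollups updated continuously instead of in batches, but a scheduled job like the above is usually the simpler place to start.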

Integrating with 500+ applications

Our customers use 500+ applications, and we would like to integrate these applications with ours. What is the best way to do that? These applications are time registration applications, and what most of them have in common is that they can export to CSV or similar; some of them are actually home-brewed Excel sheets where time is registered.
The best idea so far is to create our own Excel sheet, which can be used to integrate with all these applications. The integrations could be in the form of cells containing something like ='[c:\export.csv]rawdata'!$A$3, where export.csv is the CSV file exported from the time registration application. Can you see a better way to integrate with all these applications? It should be mentioned that almost all our customers have Microsoft Office.
Edit: Answers to the excellent questions from Pontus Gagge:
How similar are the data in the different applications?
I assume that since they are time registration applications they will have some similarities, but some will register the total time worked for a whole month, while others will specify it for each day. If Excel is chosen, I believe many of the differences could be ironed out using basic formulas.
What quality is the data?
The quality of the data can vary, so basic validation must be undertaken. A good approach is also to make it transparent to the customers how our application interprets their input, so that they remain responsible for it.
How large amounts of data are you talking about?
There will be information about the time worked for up to 50 employees.
Is the integration one-way only?
Yes
With what frequency should information be transferred?
Once per month (when they need to pay salaries).
How often do the applications themselves change, and how often does your product change?
If their application is a home-brewed Excel sheet, then I assume it will change about once a year (for example due to a mistake someone made). If it is a proper, standard time registration application, then I do not believe they are updated more often than every five years or so, as it is a very stable concept.
Should the integration be fully automatic or can your end users trigger a data transfer?
They can certainly trigger the data transfer. The users are often dedicated to the process, so they can be trained to do it, which means they could be expected to make up to, say, 30 mouse clicks to run the integration each month.
Will the customers have somebody to monitor the integrations?
As we have many customers, they should by and large be able to undertake the integration themselves. We will, however, be able to assist them over the telephone. We cannot undertake the integration ourselves, because we would then be responsible for any errors due to user mistakes, etc.
Does the phrase 'integration spaghetti' mean anything to you...?
I am looking for ideas from the best chefs to cook a nice large portion of that.
You need to come up with a common data format, and a way to translate the individual data formats to the common format. There's really no way around this - any solution you come up with will have to do this in one way or the other. It's the essential complexity of what you're doing.
The bigger issue is actually variances within the source data, in terms of how things like dates are stored, missing columns, etc. Doing a generic conversion for CSV to move columns around is comparatively easy.
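As a sketch of what that translation layer might look like (the source names, column names, and date formats below are invented for illustration):

    import csv
    from datetime import datetime

    # Common format every source gets translated into.
    COMMON_FIELDS = ["employee", "date", "hours"]

    # One small mapping per known source format; these two are made-up examples.
    SOURCE_MAPPINGS = {
        "acme_timetracker": {
            "employee": "EmpName",
            "date": ("WorkDate", "%d.%m.%Y"),
            "hours": "Hours",
        },
        "homebrew_sheet": {
            "employee": "Name",
            "date": ("Day", "%Y-%m-%d"),
            "hours": "Duration",
        },
    }

    def translate(source_name, in_path, out_path):
        """Read one application's CSV export and write it in the common format."""
        mapping = SOURCE_MAPPINGS[source_name]
        with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
            writer = csv.DictWriter(dst, fieldnames=COMMON_FIELDS)
            writer.writeheader()
            for row in csv.DictReader(src):
                date_col, date_fmt = mapping["date"]
                writer.writerow({
                    "employee": row[mapping["employee"]].strip(),
                    # Normalise dates so 01.02.2024 and 2024-02-01 end up identical.
                    "date": datetime.strptime(row[date_col], date_fmt).date().isoformat(),
                    "hours": float(row[mapping["hours"]].replace(",", ".")),
                })

    # translate("acme_timetracker", "export.csv", "common.csv")

The per-source mappings stay small and declarative, which keeps the maintenance cost of supporting yet another export format relatively low.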
I would also look at CSV and then use an OLEDB connection against the CSV file for importing.
If you try to make something that can interface to any data structure in the universe (and 500 is plenty close enough), it is guaranteed to be a maintenance nightmare. Instead I would approach this from multiple angles:
Devise an interface into which a human can enter this data already in the proper format. With 500+ clients, I'd make this a small, raw but functional browser-based site that users can use to enter this information manually. This is the fallback: at the end of the day, a human can re-key the information into the site and solve the import issue. Ideally, everyone would use this instead of their own format. Data entry people are cheap.
Similar to above, but expanded, I would develop a standard application or standardize on an off-the-shelf application that can be used to replace their existing format. This might take more time than #1. The goal would be to only do one-time imports of these varying data schemas into the application and be done with them for good.
The nice thing about spreadsheets is that you can do anything anywhere. The bad thing about spreadsheets is that you can do anything anywhere. With CSV or a spreadsheet there is simply no way to enforce data integrity and thus consistency (which is the primary goal) on the data. If the source data is already in a database, then that is obviously simpler.
I would be inclined to use a database format into which each of these files needs to be converted, rather than a spreadsheet (e.g. something like Jet (MDB)). If you have non-Windows users, that will make it harder and you might have to use a spreadsheet. The problem is that it is too easy for the user to change their source structure, break their upload and come crying to you. If a given end user has a resident expert, they can find a way of importing the data into that database format. If you are that expert, then I would, on a case-by-case basis, write something that imports into that database format. XML would be the other choice, but that will likely take more coding than an import/export into a database format.
Standardization of the apps (even having all the sources in a database format instead of a spreadsheet would help) and control over the data schema is the ultimate goal, rather than permitting a gazillion formats. There really is no nice answer other than standardization. Otherwise, you end up writing a converter for every Tom, Dick and Harry format, and again whenever someone changes the source format.
With a multitude of data sources, mapping each one correctly to an intermediate format is not trivial. Regular expressions work well with a finite set of known data formats. Multiple passes can help when data is ambiguous without context (e.g. month and day fields when you have several days of data), and can also help defeat data entry errors. But since this data is connected to salaries, the transfer needs to be reliable.
An import configuration trick
Get the customer to create a set of training data in their application. It should have a "predefined unique date", and each subsequent data field should contain a number corresponding to the target data field in your application. On import, your application needs to recognise the predefined date, determine the unique translation required, display/save this "mapping key", and stop the import. E.g. if you expect "Duration hours" in field two, get the user to enter 2 in the relevant field, which might be called "Attendance hours".
On subsequent runs, with the mapping definition key in place, import becomes a fairly easy process of translation (a sketch follows the notes below).
Note on terms
"predefined date" - must be historical, say the founding date of your company; it might need to be within the range settable on a PC clock.
"mapping key" - could be a string of hex digits, nybble-based so it is tractable to work out.
The entered code can be extended to signify required conversions, e.g. the customer's application records durations in days while your application expects hours.
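A rough sketch of the detection step (the sentinel date, target field names, and CSV layout are all invented for illustration):

    import csv
    from datetime import date

    # Hypothetical sentinel date agreed with the customer (e.g. company founding date).
    PREDEFINED_DATE = date(1987, 6, 1)

    # Target fields in our application, in the order their index numbers refer to.
    TARGET_FIELDS = ["employee", "date", "duration_hours", "project"]

    def detect_mapping(training_csv):
        """Find the training row carrying the predefined date and build a column mapping.

        In the training row, every other cell holds the 1-based number of the target
        field that source column should map to, e.g. '3' means 'duration_hours'.
        Returns {source_column_index: target_field_name} - the "mapping key".
        """
        with open(training_csv, newline="") as f:
            for row in csv.reader(f):
                if any(_looks_like(cell, PREDEFINED_DATE) for cell in row):
                    mapping = {}
                    for col, cell in enumerate(row):
                        if cell.strip().isdigit():
                            mapping[col] = TARGET_FIELDS[int(cell) - 1]
                    return mapping  # save this key and reuse it on subsequent imports
        raise ValueError("Training row with the predefined date not found")

    def _looks_like(cell, d):
        # Accept a few common spellings of the sentinel date.
        return cell.strip() in {d.isoformat(), d.strftime("%d.%m.%Y"), d.strftime("%m/%d/%Y")}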
Interfacing with Windows programs (in order of increasing fragility):
Ye olde saving as a CSV file
Print to an operating system printer that is set up as a text file/PDF, then scavenge the data out of that
Extract data via the application's interface control, typically ActiveX for several Windows programs, e.g. Matlab's Spreadsheet Link
Read the native file format (XLS), e.g. with Matlab's xlsread
Add an additional intermediate spreadsheet sheet that has extended cell references, i.e. ='[filename]rawdata'!$A$3
Have a look at Teiid by JBoss: http://jboss.org/teiid
Also consider using SOA - e.g., if you're on Java, try JBoss SOA platform: http://www.jboss.com/resources/soa/?intcmp=1004
Use a simple XML format. A non-technical person can easily understand a simple XML format (and could even identify basic problems with XML documents that are not well-formed).
Maybe use a DTD (or, even better, an XML schema) to do very basic validation, and then supplement this with an XSL stylesheet to do more validation with better error reporting. (An XSL stylesheet simply converts from XML to something else, and so can generate readable error messages.)
The advantage of this approach is that web browsers such as Internet Explorer can apply the XSL stylesheets. A customer need only spend at most a day enhancing their applications or writing Excel macros to generate the XML data in the format that you specify.
Recent versions of Excel have support for converting spreadsheet data to XML, and can even validate against schemas.
Once the data passes the XSL validation checks, you have validated XML data.
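As a sketch of what the receiving side's validation step could look like with lxml (the schema, stylesheet, and upload file names are placeholders, and the stylesheet is assumed to emit a readable report):

    from lxml import etree

    # Placeholder file names for this sketch.
    schema = etree.XMLSchema(etree.parse("timesheet.xsd"))
    checks = etree.XSLT(etree.parse("extra-checks.xsl"))  # business-rule checks as XSLT

    doc = etree.parse("customer-upload.xml")

    # Step 1: structural validation against the XML schema.
    if not schema.validate(doc):
        for error in schema.error_log:
            print(f"Schema error at line {error.line}: {error.message}")
    else:
        # Step 2: extra checks expressed in XSLT; the stylesheet's output is
        # assumed to be a human-readable report of any remaining problems.
        report = checks(doc)
        print(str(report))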
If you have heaps of data and heaps of money, you could look at existing data management and cleansing tools:
http://www-01.ibm.com/software/data/infosphere/datastage
http://www-01.ibm.com/software/data/infosphere/qualitystage
But even then, you'll likely need to follow kyoryu's suggestion, assuming you have 500+ data formats. The problem isn't on your side: you need them to standardize their output formats if you have no control over their apps. CSV is likely the easiest. You could even send them an Excel template to help them along.
