Best practices for removing old activities from a feed (getstream.io)

What would be the best way to handle an activity lifecycle? Is it common to have a maintenance job that purges feed activities that are no longer relevant or out of scope?

Yes, some people have done it that way. Alternatively, you can leave all the data in place (though you may have to pay for additional storage if it gets really large).
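If you go the maintenance-job route, a rough sketch of such a purge job is below. It assumes the stream-python client, that the feed read call accepts `id_lt` paging, and that activity `time` values parse as ISO 8601; the API keys, feed slug and 90-day retention window are placeholders, not anything from the question.

```python
# Minimal sketch of a scheduled purge job (e.g. run nightly from cron),
# assuming the stream-python client. The API key/secret, feed slug/id and
# retention window below are placeholders.
from datetime import datetime, timedelta, timezone

import stream  # pip install stream-python

RETENTION = timedelta(days=90)                 # keep only the last 90 days
client = stream.connect("YOUR_API_KEY", "YOUR_API_SECRET")
feed = client.feed("timeline", "global")       # hypothetical feed slug/id

cutoff = datetime.now(timezone.utc) - RETENTION
to_delete = []
last_id = None
while True:
    kwargs = {"limit": 100}
    if last_id:
        kwargs["id_lt"] = last_id              # page backwards through older activities
    batch = feed.get(**kwargs)["results"]
    if not batch:
        break
    for activity in batch:
        # Assumes the activity "time" field parses as ISO 8601 (UTC, no trailing "Z").
        ts = datetime.fromisoformat(activity["time"]).replace(tzinfo=timezone.utc)
        if ts < cutoff:
            to_delete.append(activity["id"])
    last_id = batch[-1]["id"]

for activity_id in to_delete:
    feed.remove_activity(activity_id)          # drop the stale activity from this feed
```

Running this nightly (for example from cron or a scheduled worker) keeps the feed bounded without touching anything inside the retention window.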

Related

How do new features/changes in Azure Data Factory become available?

Suppose I start using Azure Data Factory today; at some point the tool is likely to see improvements or other changes. Note that I am not talking about what I do inside the tool, but about Data Factory itself. How will these changes become available to me?
Will I be able to look at the changes before they happen (and how long)?
Will I be able to stay on an old version if I do not like the new one or have not finished testing (e.g. security testing)?
Is there any indication of how often changes are rolled out? (Every year, 10x per day)
Does any of the above depend on the type of change (big, small, feature/bug/vulnerability)?
I suspect that people have this question for many similar tools, so though I am specifically interested in the Azure Data Factory at this time, an indication of whether the answer applies to other types of solutions (within Azure or perhaps it is even similar for other vendors) would be useful.
Will I be able to look at the changes before they happen (and how long)?
You are talking about a managed solution, so I expect a continuous stream of (small) fixes and improvements. That said, changes to Azure products are generally announced in advance; see the ADF updates page.
Big changes might first be accessible as an opt-in preview feature before becoming generally available.
Is there any indication of how often changes are rolled out? (Every year, 10x per day)
Since it is a managed solution, why bother with such details? Rest assured that breaking changes are very limited and announced well in advance.
Will I be able to stay on an old version if I do not like the new one or have not finished testing (e.g. security testing)?
Again, this is a managed cloud service we are talking about. It is not an installable product where you can decide to stay on an older version forever. Changes will be pushed, and you have to trust they are for the better ;-)
I suspect that people have this question for many similar tools, so though I am specifically interested in the Azure Data Factory at this time, an indication of whether the answer applies to other types of solutions (within Azure or perhaps it is even similar for other vendors) would be useful.
It will vary per company and per (type of) product, but for most Azure services the answer will be the same.

Leverage Google's spidering infrastructure to build your own niche index?

Let's say I want to build a specialised catalog of information that organisations can provide about themselves. We agree on a metadata standard, and they include this information on their websites.
Is it possible to use Google's infrastructure somehow to solve the problem of discovering sites with that metadata, and regularly re-spidering to pick up any updates?
The way this kind of problem is often solved seems to involve "registering" the site with the central index, which then builds infrastructure to regularly visit each registered site. But I wonder if it can be done more smartly, without the need to formally "register".
For example, presumably you could make a unique string part of the metadata standard, which you could then literally search Google for, and then process the rest of the page. But is there a more streamlined, smarter, more formal way to do this?
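To make the "unique string plus search" idea concrete, here is a rough sketch that assumes the Google Custom Search JSON API (which needs an API key and a Programmable Search Engine id). The marker string and the `org-catalog` meta-tag naming are hypothetical parts of the agreed standard, not an existing convention.

```python
# Sketch of "discovery via search" rather than registration, assuming the
# Google Custom Search JSON API. API_KEY, CX, the marker string and the
# "org-catalog" meta-tag prefix are all placeholders for this illustration.
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

API_KEY = "YOUR_API_KEY"
CX = "YOUR_SEARCH_ENGINE_ID"
MARKER = '"org-catalog-v1-7f3a"'   # unique string mandated by the metadata standard

def discover_candidate_pages(query=MARKER, pages=3):
    """Yield URLs of pages the search index says contain the marker string."""
    for start in range(1, pages * 10, 10):       # API pages are 10 results wide
        resp = requests.get(
            "https://www.googleapis.com/customsearch/v1",
            params={"key": API_KEY, "cx": CX, "q": query, "start": start},
            timeout=30,
        )
        resp.raise_for_status()
        for item in resp.json().get("items", []):
            yield item["link"]

def extract_catalog_metadata(url):
    """Fetch a candidate page and pull out the agreed <meta> tags."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return {
        tag["name"]: tag.get("content", "")
        for tag in soup.find_all("meta", attrs={"name": True})
        if tag["name"].startswith("org-catalog")  # hypothetical naming convention
    }

if __name__ == "__main__":
    for url in discover_candidate_pages():
        print(url, extract_catalog_metadata(url))
```

Bear in mind the API has daily query quotas and caps the number of results per query, so this works better as a discovery aid than as a full crawl; you would still re-fetch the sites you already know about to pick up updates.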

Sharepoint 2010 - SPMonitoredScope ... performance implication?

Does SPMonitoredScope have any performance implications? I mean, can I leave SPMonitoredScope in a production environment, or is it better practice to remove it from the code?
Many Thanks,
Joseph,
Here's what MSDN says - http://msdn.microsoft.com/en-us/library/ff512758.aspx
Performance Considerations
Using SPMonitoredScope to wrap code has a very low performance hit. However, it should be noted that if a section of code wrapped by SPMonitoredScope were to contain a loop that performed a high number of iterations (for example, iterating through XML nodes that are returned by a SharePoint Foundation 2010 Web service), the call stack included on the Developer Dashboard could increase in size exponentially, making it difficult to decipher the information displayed.
Best Practices
A tip for the best and most effective use of SPMonitoredScope:
All calls to external components, such as custom databases, external Web services, and so on, should be wrapped with SPMonitoredScope. This will make it easier for administrators to identify them as points of failure, and to isolate the problem quickly.
Regards,
Nitin Rastogi
There is certainly a performance hit when using monitored scopes. That being said, it's relatively small for the type of work it does. Best practice is to switch it off in production environments unless you are investigating a specific issue.

SharePoint 2010: solution/feature upgrade recommended practices

This is kind of an open question: I'm trying to define, for a team, a set of recommended practices for delivering a SharePoint solution in stages. That means changes will happen at every level (solution, feature, web part, content types, etc.).
In your experience, which practices have really, really worked for you? For example, using upgrade custom actions and placing upgrade logic entirely in FeatureUpgrading event handlers? Or in FeatureActivated handlers, assuming features could already exist? Something else?
I'm asking because I know of projects that follow the official guidance from many MSDN articles and still find upgrades nightmarish to manage, and those processes are sometimes difficult for average devs to grasp.
Thanks!
As no one else has ventured an answer, my current approach is:
Use the declarative approach for those cases where it works 100% of the time (e.g. NOT content types)
Fall back to code for the rest
Always write the related code so that it can be run multiple times against either the pre- or post-upgrade state

Open-source production data for developers?

I'm building a website that will be an open-source, user-contributed content kind of thing, and I think if developers had access to nightly production SQL dumps, they'd be more likely to check out the code from github and play with it.
In line with that idea, I'm considering either:
Not collecting private user information at all, using OpenID for accounts and making heavy use of memcache for things like session authentication.
Anonymizing sensitive data before publishing
Sometimes I get carried away with "wouldn't it be cool if...?" ideas, so I'm hoping for a sanity check here. Any obvious flaws in either approach? Is this a sane idea?
Speaking generally, I think you should do both. Any private data you collect is simply a liability for you, and not just because you intend to publish your databases. The less you can collect, the better.
By the same token, however, you probably realize that it is not just IDs and passwords which are sensitive. Remember the AOL search data leak? Or the Netflix database publication? Even without having IDs, people managed to figure out the real identities of some of the accounts, simply by piecing together trails of user behavior, and corresponding that with data from other places. Some people are embarrassed by their search histories and their movie rentals. Go figure.
Therefore, I think the general rule should be to collect as little as possible, and anonymize what is left. Even if you don't store the identity of the person corresponding to a certain account, you may want to scramble what the various logins did.
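To make that last point concrete, here is a minimal sketch of an anonymization pass, assuming a hypothetical users table exported to CSV with `id`, `email`, `ip` and `bio` columns; a real schema, and which fields count as sensitive, will differ.

```python
# Minimal sketch of the "anonymize before publishing" step, assuming a users
# table exported to CSV with hypothetical columns (id, email, ip, bio).
# Identifiers are replaced with a keyed hash so rows stay linkable across
# tables within one dump but cannot be reversed without the secret.
import csv
import hmac
import hashlib

SECRET = b"rotate-this-per-dump"      # keep this out of the published dump

def pseudonymize(value: str) -> str:
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:16]

with open("users.csv", newline="") as src, \
     open("users_public.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["id", "bio"])
    writer.writeheader()
    for row in reader:
        writer.writerow({
            "id": pseudonymize(row["id"]),   # stable pseudonym, not the real id
            "bio": row["bio"],               # public profile text kept as-is
            # email and ip are simply dropped from the published dump
        })
```

A keyed hash (rather than a plain hash) matters if you ever pseudonymize guessable values such as email addresses: with a plain hash, anyone could confirm a guess by hashing the candidate themselves.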
On the other hand, there are some cases where you simply don't care about this kind of privacy. On Wikipedia, for example, pretty much everything you can do on the site is public anyway, at least everything that gets recorded in the database. If the information is already available through the API, there is no point in hiding it in a database download.
In addition to collecting less data and anonymizing the data you do collect, you could add a flag for users to select whether their data is included or not. You could make it a CC license flag to give users the warm'n'fuzzies while fulfilling your need.
Sounds like a pretty good idea. The one thing you have to be careful with, though, is security, since attackers will know the exact schema of your DB. This isn't impossible to deal with (just look at most open-source projects), but you will need to put a little extra emphasis on security, since, say, exploiting a potential SQL injection is now made much easier.
Another thing is to make doubly sure that the sensitive data is anonymized. Also, some people may (wrongly) try to claim that their copyright on user-submitted content is being violated, so you may want to specify a CC license or something just to make everything extra clear and prevent future headaches (even if you're right anyway).
