Managing version conflicts between Kafka-Spark-Scala-Cassandra? - apache-spark

I have recently been working on a project that involves integrating Kafka, Spark and Cassandra. One of the key things I noticed while getting everything set up is that there are a lot of version dependencies that need to be matched very carefully for these technologies to work together.
In addition, it was important to take note of the Scala version used with Spark when writing your own Spark jobs.
A slight change in the version of any one of these technologies breaks the whole flow and forces you to redo the matching from scratch.
The task was not very straightforward (at least for me, and I suspect for most people), and I am wondering how companies that have these technologies working in sync actually manage this.
As I see it, keeping these tools working together without breakage as new releases and bug fixes are rolled out is a significant problem.
Can someone with experience in this area enlighten me as to how companies actually manage and maintain these version constraints?
Or is it an overstatement to say it's an actual problem?
Thanks in advance

Related

How can I check if my code will run in a new (or old) version of Node?

I have code running on Node 9.8.
Node 9 will reach end-of-life soon.
If I switch to Node 10, how can I check whether my code will run on Node 10 without having to execute all paths of the code?
Or, if I go down to 8.11, how can I check whether my code will run on Node 8.11?
There are no test cases written for the code.
This is a good example of why solid unit/integration tests are critical to long-term maintainability. That said, there are a few steps you can take to reduce the risk of breaking things:
Take a look at the change logs pertaining to the versions you're moving to/from. The NodeJS team kindly includes a Notable Changes section in each change log, though I wouldn't rely on that alone as being 100% inclusive of the potentially breaking changes you may be up against.
Consider writing unit/integration tests, both as assurance that things won't break from this version change and as protection against later version changes (or everyday changes, for that matter); a minimal sketch of such a test is included after these suggestions.
As much as I hate to say it, Googling around for guides on upgrading (or downgrading?) NodeJS versions may help you identify potential danger zones.
Generally, I'd consider it safer and better practice to upgrade the version rather than downgrade. For one, you're moving forward to the newer and greater experience the NodeJS team wants you to work with, and secondly, future versions are more likely to be backwards compatible, whereas the old version may be missing features you're already using.
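To illustrate the testing suggestion above: a minimal smoke test in TypeScript, using only Node's built-in assert module so it runs unchanged on 8.x, 9.x and 10.x, might look like the sketch below. parsePort is a hypothetical stand-in for a function from your own codebase; swap in whatever your code actually exports.

    // smoke-test.ts -- compile and run under each Node version you care about
    // (for example via nvm: `nvm exec 8.11.0 node smoke-test.js`).
    import * as assert from "assert";

    // Hypothetical stand-in for a function from your own codebase.
    function parsePort(json: string): number {
      return JSON.parse(json).port;
    }

    // Exercise one representative path and assert on the result.
    assert.strictEqual(parsePort('{"port": 3000}'), 3000);

    console.log("smoke test passed on " + process.version);

Running the same script under each Node version you are considering surfaces runtime differences on the paths it exercises, which is far cheaper than manually clicking through the whole application.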

Semantic Versioning & Continuous Deployment

Murphy kicked my a$$ about an hour ago.
Context:
I recently joined a new employer and the product was quite outdated in terms of dependencies: Angular 1.2.x, Angular-UI 0.12.0, etc.
This is the first employer I've worked at that does daily builds to prod (previously I've only worked in what could be called large corporates, with much slower turnaround). Part of my initial task was to upgrade dependencies where I can. Thus, earlier this morning we had a watercooler talk with some of the devs about why all of our bower dependencies are hardcoded to specific versions.
The 2 schools of thought are:
Hardcoding versions obviously gives 100% security, as versions can't dynamically jump, but it has the drawback that if someone doesn't actively update we'll fall behind again.
I'm of the opinion that semantic versioning gives us some form of security (coupled with having multiple staging environments), and that it should be good enough to have Angular set to, say, ^1.5.9.
Quoted from the Semantic Version Docs:
Minor version Y (x.Y.z | x > 0) MUST be incremented if new, backwards compatible functionality is introduced to the public API. It MUST be incremented if any public API functionality is marked as deprecated. It MAY be incremented if substantial new functionality or improvements are introduced within the private code. It MAY include patch level changes. Patch version MUST be reset to 0 when minor version is incremented.
Problem:
Earlier this morning we deployed to staging and everything seemed good to go; then we deployed to production an hour or so ago and ... BOOM.
The issue was the AngularJS change from 1.5.9 to 1.6.0. I've seen in the migration docs (migrate 1.5 -> 1.6) that this has been noted:
You may also notice that this release comes with a longer-than-usual list of breaking changes. Don't let this dishearten you though, since most of them are pretty minor - often not expected to affect real applications. These breaking changes were necessary in order to:
Question:
Where is my disconnect? ...or are the semantic versioning docs just a false sense of security I've had all along?
How do people out there handle these situations? Do people make use of automatic dependency upgrading in any real-world solutions (excuse me if this is super obvious to some)? To me, the fact that the build passed staging and then broke in production is actually the more concerning part.
(The reason I'm asking is that the fear of small incremental updates is now back and stronger than ever, and I'm not sure I agree with the sentiment of it all...)
Seems pretty simple: if they make breaking changes, they should have bumped it up to 2.0.0. They are not doing semantic versioning. Not all projects using X.Y.Z-style versions are doing semantic versioning.
Try to catch how this went "boom" in an automated way in your testing and staging environments. You can't fear moving forward; it has to be done sometime, and I'd rather move step by step more frequently than suddenly jump many versions, as would happen with an entirely manual process.
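To make the range mechanics concrete, here is a small sketch using the npm semver package (bower follows the same range semantics); the versions are just the ones from this question, and the comments show what each call returns:

    // npm install semver   (plus @types/semver when using TypeScript)
    import * as semver from "semver";

    // A caret range allows any later version with the same major component,
    // so 1.6.0 is a perfectly legal resolution of ^1.5.9 ...
    console.log(semver.satisfies("1.6.0", "^1.5.9")); // true
    // ... while a properly-bumped breaking release would have been excluded:
    console.log(semver.satisfies("2.0.0", "^1.5.9")); // false

    // An exact (hardcoded) version only ever matches itself:
    console.log(semver.satisfies("1.6.0", "1.5.9"));  // false

    // Given a list of published versions, this is what the resolver picks:
    console.log(semver.maxSatisfying(["1.5.9", "1.5.11", "1.6.0"], "^1.5.9")); // "1.6.0"

In other words, the range did exactly what semver ranges promise; the breakage came from AngularJS shipping breaking changes in a minor release, which is why exact pins plus automated checks in an environment that mirrors production are the only hard guarantee.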

Memcached on NodeJS - node-memcached or node-memcache, which one is more stable?

I need to implement a memory cache with Node, and it looks like there are currently two packages available for doing this:
node-memcached (https://github.com/3rd-Eden/node-memcached)
node-memcache (https://github.com/vanillahsu/node-memcache)
Looking at both GitHub pages, it looks like both projects are under active development with similar features.
Can anyone recommend one over the other? Does anyone know which one is more stable?
At the moment of writing this, the 3rd-Eden/node-memcached project doesn't seem to be stable, according to its GitHub issue list (e.g. see issue #46). Moreover, I found its code quite hard to read (and thus hard to update), so I wouldn't suggest using it in your projects.
The second project, elbart/node-memcache, seems to work fine, and I feel good about the way its source code is written. So if I were to choose between only these two options, I would prefer elbart/node-memcache.
But as of now, both projects have problems storing BLOBs. There's an open issue for the 3rd-Eden/node-memcached project, and elbart/node-memcache simply doesn't support the option (to be fair, there's a fork of the project that is said to add BLOB storage, but I haven't tried it).
So if you need to store BLOBs (e.g. images) in memcached, I suggest using the overclocked/mc module. I'm using it now in my project and have had no problems with it. It has nice documentation and is highly customizable, but still easy to use. At the moment it seems to be the only module that works fine for storing and retrieving BLOBs.
Since this is an old question/answer (2 years ago), and I got here by googling and then researching, I feel that I should tell readers that I definitely think 3rd-eden's memcached package is the one to go with. It seems to work fine, and based on the usage by others and recent updates, it is the clear winner. Almost 20K downloads for the month, 1300 just today, last update was made 21 hours ago. No other memcache package even comes close. https://npmjs.org/package/memcached
The best way I know of to see which modules are the most robust is to look at how many projects depend on them. You can find this on npmjs.org's search page. For example:
memcache has 3 dependent projects
memcached has 31 dependent projects
... and in the latter, I see connect-memcached, which would seem to lend some credibility there. Thus, I'd go with the latter barring any other input or recommendations.
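If you do go with the memcached package, a minimal usage sketch looks roughly like this; the host, key, value and lifetime are placeholders, so adjust them for your setup:

    // npm install memcached   (plus @types/memcached when using TypeScript)
    import Memcached = require("memcached");

    const memcached = new Memcached("localhost:11211");

    // Store a value for 60 seconds, then read it back.
    memcached.set("greeting", "hello from node", 60, (err) => {
      if (err) throw err;
      memcached.get("greeting", (err, value) => {
        if (err) throw err;
        console.log(value); // "hello from node"
        memcached.end();    // close the connections when you're done
      });
    });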

If you had one wish for SubSonic what would it be?

I know this question seems subjective but it's really pretty simple. As a long-term user of, and part-time contributor to, SubSonic, I'm interested in what the community thinks would be the single best way to improve it.
So what's your opinion, how would you make SubSonic even better? What one thing would make you more likely to use/recommend/evangelise/stop complaining about it?
As I said I know this is a bit subjective and may get closed but as SO is the main support forum for SubSonic I think this could be a useful way to solicit opinion and/or contributions.
To keep this from turning into a general discussion, here are the rules:
No omnibus wishes
No duplicate wishes
Up-vote those you agree with rather than re-posting them
Ability to run in MediumTrust out of the box
In all honesty, the biggest thing that's lacking is solid documentation and how-tos.
It's got better, but I think it needs a lot more.
Ability to automatically map collections of other objects, like Fluent NHibernate does.
When SubSonic throws an exception that isn't clear, I'd like to be able to use Google or some other mechanism to discover more information about how to keep my development effort moving forward. Right now it's too easy to get into a situation where you have to go spelunking into the SubSonic source code since SubSonic doesn't seem to be very proactive when the user goes off the "happy path".
This critique is hardly specific to SubSonic. Many (most?) software products suffer from this same problem. I have not really had this problem with NHibernate though, which is SubSonic's most clear competitor.
Faster and higher quality releases
Binary types for SimpleRepository (Images)
Left Outer Joins
Support more database-independent code generation...
What I mean by this is that it is truly a real pain if your application wants to talk to different databases (e.g. SQL Server and Oracle) and you want to only have one set of generated DAL objects. I would love it if you had the option of specifying that any SQL code that gets sent to the DB would be as compatible with most engines as possible, since right now if you generated your objects targeting SQL Server then all queries will be of the form:
SELECT [schema].[table_name] FROM ....
Sadly, this does not work in Oracle, so basically you're out of luck there.
Perhaps this isn't a huge concern for most of you, but I'm currently writing a commercial app that touts one of its main features as being able to run on various database engines just by changing its configuration and I chose SubSonic because I thought it could handle the job pretty easily, but I'm honestly having second thoughts now because of all the hoops I may have to jump through just to get this to work correctly under different environments.
Support MS Access, Postgres and Firebird databases :)...

Should I keep solutions and features in a 1-1 ratio?

I have a complex SharePoint deployment with multiple EventReceivers and Workflows.
I also have schema changes to existing lists, adding new columns of metadata and changing existing columns.
Should I package each single feature, event receiver or workflow into its own solution, or should I put multiple features inside a single solution since they all work together?
One major reason I am asking is future code upgrades. If the features are separated, then an upgrade to one portion of the code would not require a re-deploy of all the features in the solution. Is this something I should worry about, or does "stsadm -o upgradesolution" take care of any issues with upgrading a solution with many features?
Let me know if this makes sense to any SharePoint gurus out there.
Thank you,
Keith
Update:
Looking at the website drax referenced, I found this reference site: http://msdn.microsoft.com/en-us/library/aa543659.aspx
This statement seems to put a large handicap on upgrading features in solutions:
Solution upgrade can only be used to replace files. You can add new files in a solution upgrade and remove old versions of the files, but you cannot install Features or use Feature event handlers to run code for Feature installation and activation. The following operations are not supported in solution upgrade:
Removing old Features in a new version of a solution.
Adding new Features in a solution upgrade.
Updating or changing the receiver assembly for existing Features in a new version of a solution.
Adding or changing Feature elements (Element.xml files) in a new version of a solution.
Adding or changing Feature properties in a new version of a solution.
Changing the ID or scope of old Features in a new version of a solution.
Removing Feature elements (Element.xml files) in a new version of a solution.
Removing Feature properties in a new version of a solution.
So... What can you do with a solution upgrade?
I would advise against splitting everything into multiple solutions. Maintaining that can quickly become a nightmare. Try to structure the project that is used to create the WSP in the same manner as SharePoint's 12 folder. Then you can use WSPBuilder; the last stable version brings a lot of useful stuff.
Also, I've not noticed any problems with redeploying solutions. According to this article and to my experience, deploying a WSP takes care of synchronization between versions. So if you add some new features they will appear, and if you remove or change features they will be modified accordingly.
EDITED:
So I did some quick research on the MOSS updating topic. According to MS, there are two ways of updating solutions:
In-place update
Incremental update
Basically, an in-place update is the standard way of updating, meaning you rely on the built-in functionality described in the document posted before. The problem with this approach is that it lacks quite a lot of functionality (versioning, changing feature IDs, ...).
An incremental update (this is probably what MS calls it) doesn't rely on the built-in mechanism. That means it is up to everyone to implement it themselves :(. What is even better, I was not really able to find any guidelines for this approach. I suppose the approach you would like to take (splitting the project into many independent solutions) is an example of an incremental update.
Also note that incremental update is not officially supported by MS.
So I don't really know what advice I should give you. A single WSP is more maintainable than a bunch of them, and if you are only making minor changes, updates work perfectly. But if you need to make bigger structural changes, problems start to show.
I'll probably wait and see if people with more MOSS expertise can say something about this topic.
Basically (for the reasons you've mentioned), you should think of solutions as you would .NET assemblies: atomic units of code that can be deployed separately from the others. Using upgradesolution will cause a redeploy of all the contained features; if nothing has changed, then nothing should change for the sites that use those features. But if that makes you nervous, consider splitting it up.
UpgradeSolution is really handy if you are just updating the assembly and leaving the provisioned files intact.
Unless you specify -local, upgradesolution will perform a full iisreset across your infrastructure. This is really worth noting when you are planning the right time to perform upgrades.
