I am currently using the HDInsight Hadoop Emulator, which comes with Pig version 0.12. Our problem involves parsing XML files, and I'd like to use the XPath UDF from PiggyBank, but it is only available with Pig 0.13.
a. Can I upgrade Pig in the emulator? How would I go about doing that?
b. Is the version of Pig really critical, or could I just get the latest version of the PiggyBank.jar file and use that?
Currently there is no way to update component versions for the HDInsight Emulator (or at least it's very hard to do).
I have never used PiggyBank, but from the introduction page (https://cwiki.apache.org/confluence/display/PIG/PiggyBank) it seems to be a collection of UDFs that should work with Pig 0.12. So I guess using the jar directly (and of course registering it in Pig) should work.
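For example, something along these lines should work once you have the newer jar (a minimal sketch using Pig's embedded PigServer API; the jar path and input file are placeholders, and the XPath/XMLLoader class names are the ones documented for the 0.13 PiggyBank, so double-check them against the jar you actually download). The registerJar/DEFINE calls correspond to the REGISTER and DEFINE statements you would put at the top of a plain Pig script.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class XPathWithPiggyBank {
    public static void main(String[] args) throws Exception {
        // Local mode for illustration; use ExecType.MAPREDUCE against the emulator/cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Make the newer PiggyBank jar available to Pig (path is a placeholder).
        pig.registerJar("/path/to/piggybank-0.13.jar");

        // XMLLoader splits the input on <book> elements; XPath then pulls values out of each one.
        pig.registerQuery("DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();");
        pig.registerQuery("raw = LOAD 'books.xml' USING org.apache.pig.piggybank.storage.XMLLoader('book') AS (doc:chararray);");
        pig.registerQuery("titles = FOREACH raw GENERATE XPath(doc, 'book/title');");

        pig.store("titles", "titles_out");
    }
}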
Also, we are looking at an updated story for the HDInsight emulator, so feel free to reach us at hdivstool at microsoft dot com if you have any thoughts, comments, or requirements.
Xiaoyong Zhu from HDInsight team
I'm working on converting a library from the full .NET Framework to .NET Core.
I'm looking for a replacement for Microsoft.Azure.Management.HDInsight.Job, which hasn't been updated in over a year and is not compatible with .NET Core. I was hoping the functionality would be rolled up into the much more recently updated and .NET Core-compatible Microsoft.Azure.Management.HDInsight, but that doesn't appear to be the case.
I'm down to use the REST API, but I haven't been able to find the same functionality there. Any guidance would be appreciated.
You could try using the Package Manager to install a prerelease version of Microsoft.Azure.Management.HDInsight.Job, so that its dependencies don't conflict with your ASP.NET Core project.
I tested it; even though it is a preview, it has the functionality you want.
In the Package Manager Console, run:
Install-Package Microsoft.Azure.Management.HDInsight.Job -Version 1.0.7-preview
Note that you can only install versions up to 1.0.7-preview; later versions may not install.
For more details, you can refer to this article.
I found the REST API I was looking for. It is the WebHCat API, not an Azure API.
MapReduce Job creation: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+MapReduceJar
Pig Job creation: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Pig
Hive Job creation: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive
Sqoop Job creation: https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-sqoop-curl and https://sqoop.apache.org/docs/1.99.3/RESTAPI.html
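For reference, submitting a Hive job through that API boils down to a single authenticated POST against the cluster gateway. A rough sketch in Java (HDInsight exposes WebHCat at /templeton/v1 over basic auth on port 443; the cluster name, credentials, query, and statusdir below are placeholders):

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class WebHCatHiveJob {
    public static void main(String[] args) throws Exception {
        // Placeholder cluster name and cluster-login (gateway) credentials.
        String cluster = "mycluster";
        String user = "admin";
        String password = "password";
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));

        // The Hive statement to run and the storage folder where WebHCat writes status/output.
        String body = "user.name=" + user
                + "&execute=" + URLEncoder.encode("SHOW TABLES;", StandardCharsets.UTF_8)
                + "&statusdir=" + URLEncoder.encode("/example/webhcat", StandardCharsets.UTF_8);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://" + cluster + ".azurehdinsight.net/templeton/v1/hive"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The response is a small JSON document containing the job id, which can then be
        // polled via /templeton/v1/jobs/{id}.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}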
Hopefully they will release 3.0.0 soon
https://github.com/Azure/azure-sdk-for-net/issues/9219
I am currently working on a project in which I need to connect Talend Open Studio for Big Data (v6.3.1) to an Azure HDInsight (3.5) Hadoop cluster. So far I am trying a simple example, which consists of creating a Hive table.
For that, I am using the following diagram:
The Hive connection was configured as follows:
… and please find below the specifications of the tHiveCreateTable_1 node:
By running this process:
· The specified container and deployment blob are created (see image below), which makes me believe that everything is OK with the Windows Azure Storage configuration.
· However, the tHiveCreateTable_1 node fails with an error (see image below).
· I strongly believe that it's something related to the hostname and port.
· I tried the hostname of the cluster and the hostname of the Hive server that we can find in Ambari (see image below), but neither of them worked as expected.
Has anyone tried something similar to this?
Note: it seems worth mentioning that the HDInsight version supported by Talend is 3.4 while I am using 3.5, so that might be the cause.
Many thanks for your help in advance.
According to the official document about the differences between Hadoop components and versions available with HDInsight, HDInsight 3.5 is based on Hortonworks Data Platform (HDP) 2.5, while HDI 3.4 is based on HDP 2.4. However, there is no major version difference in their Hive component or other components. So my suggestion is to try creating an HDI 3.4 cluster that uses the same Azure Storage account as your current HDI 3.5 cluster; that should work for your needs without further changes.
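Also, in case the hostname/port really is the culprit: from outside the cluster, Hive on HDInsight is normally reached through the HTTPS gateway (CLUSTERNAME.azurehdinsight.net, port 443, HTTP transport), not through the internal host names shown in Ambari. This is not Talend-specific, but as a point of reference, a plain Hive JDBC connection looks roughly like the sketch below (cluster name and cluster-login credentials are placeholders), and the same host/port/transport settings are what the Talend Hive connection needs:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HdiHiveJdbc {
    public static void main(String[] args) throws Exception {
        // HDInsight gateway endpoint: SSL on port 443, HiveServer2 in HTTP transport mode.
        String url = "jdbc:hive2://mycluster.azurehdinsight.net:443/default;"
                + "ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2";

        // Requires the Hive JDBC driver (org.apache.hive:hive-jdbc) on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement stmt = conn.createStatement()) {
            // Roughly the kind of DDL the tHiveCreateTable_1 component would issue.
            stmt.execute("CREATE TABLE IF NOT EXISTS test_table (id INT, name STRING)");
        }
    }
}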
We are trying to implement external authentication to Cassandra on DSE 4.7. We followed a few of the guides, which say to extend the IAuthenticator class, but beyond that there is little documentation on how to integrate it.
Is it more plug-and-play, where we extend the IAuthenticator class, build a jar, place it in lib (/usr/share/dse/resources/cassandra/lib), and change the yaml file accordingly? Or do we have to take the source code from GitHub, build the entire tree, and then use that?
If the latter, is DataStax's Cassandra available on GitHub?
What do we need to do to build external authentication other than LDAP and Kerberos in DSE 4.7?
extend IAuthenticator class, build a jar, place it in lib (/usr/share/dse/resources/cassandra/lib) and change the yaml file accordingly
^^ yes, this is the right approach.
Is DataStax's Cassandra available on GitHub?
Not exactly. You'll see the version of C* that ships with DSE in the release notes, and you can check the source under the matching tag in the apache/cassandra GitHub repo; it will match (up to and excluding the build number). The exact C* build under DSE has some critical patches from future versions, and that exact source code is not available. However, the dot release in apache/cassandra is good enough for all intents and purposes.
For example, look at https://github.com/apache/cassandra/tree/cassandra-2.1.8 for DSE 4.7.1.
As mentioned by @Mikea, we need to override ISaslAwareAuthenticator, and when using Cassandra in DSE we need to be very sure of the Cassandra version and then dig into the appropriate GitHub repo.
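To make the "extend IAuthenticator, build a jar, drop it in lib" part concrete, here is a rough skeleton against the Cassandra 2.1-era auth API (verify the class and method signatures against the exact apache/cassandra tag your DSE release ships; the external check itself is a placeholder, and as noted above, clients on the native protocol additionally need the ISaslAwareAuthenticator wrapper):

import java.util.Collections;
import java.util.Map;
import java.util.Set;

import org.apache.cassandra.auth.AuthenticatedUser;
import org.apache.cassandra.auth.IAuthenticator;
import org.apache.cassandra.auth.IResource;
import org.apache.cassandra.exceptions.AuthenticationException;
import org.apache.cassandra.exceptions.ConfigurationException;

public class ExternalAuthenticator implements IAuthenticator
{
    @Override
    public boolean requireAuthentication()
    {
        return true;
    }

    @Override
    public Set<? extends IResource> protectedResources()
    {
        // This authenticator keeps no state in Cassandra tables.
        return Collections.emptySet();
    }

    @Override
    public void validateConfiguration() throws ConfigurationException
    {
        // Validate any settings this authenticator reads (e.g. from a properties file).
    }

    @Override
    public void setup()
    {
        // One-time initialisation, e.g. opening a connection to the external auth system.
    }

    @Override
    public AuthenticatedUser authenticate(Map<String, String> credentials) throws AuthenticationException
    {
        // "username" and "password" are the keys Cassandra puts in the credentials map.
        String username = credentials.get("username");
        String password = credentials.get("password");
        if (!externalCheck(username, password))
            throw new AuthenticationException("Authentication failed for " + username);
        return new AuthenticatedUser(username);
    }

    @Override
    public void create(String username, Map<IAuthenticator.Option, Object> options)
    {
        throw new UnsupportedOperationException("Users are managed externally");
    }

    @Override
    public void alter(String username, Map<IAuthenticator.Option, Object> options)
    {
        throw new UnsupportedOperationException("Users are managed externally");
    }

    @Override
    public void drop(String username)
    {
        throw new UnsupportedOperationException("Users are managed externally");
    }

    // Placeholder for the call out to whatever external system holds the credentials.
    private boolean externalCheck(String username, String password)
    {
        return username != null && password != null;
    }
}

After building the jar and dropping it into /usr/share/dse/resources/cassandra/lib, point the authenticator setting in cassandra.yaml at the fully qualified class name and restart the nodes.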
I just wonder whether there is a way to add Microsoft's JavaScript for HDInsight to a Hadoop project.
Has Microsoft released their JavaScript solution as open source?
Your comments are welcome.
What JavaScript for HDInsight project are you referring to?
Our Node.js SDK and command-line tooling are available on GitHub (here's the CLI: https://github.com/Azure/azure-xplat-cli).
If that's not it, let me know!
--matt winkler, big data # microsoft
It's not possible to write a solution for Hadoop in pure JavaScript; however, JavaScript can be used for writing Pig UDFs: http://pig.apache.org/docs/r0.9.1/udf.html#js-udfs
We have a Liferay portal running at a hosting company, and we want to bring it onto our own infrastructure. So I've downloaded the excellent Bitnami stack and loaded it onto our VMware server.
I have no experience with Liferay whatsoever; all I know is that it uses MySQL as its database. Are there any docs on how to do this?
Thanks!
Use the Liferay wiki:
5.0 to 5.1: http://www.liferay.com/community/wiki/-/wiki/Main/Upgrade+Instructions+from+5.0+to+5.1
5.1. to 5.2: http://www.liferay.com/community/wiki/-/wiki/Main/Upgrade+Instructions+from+5.1+to+5.2
I recommend doing a two-step upgrade, since a direct upgrade from 5.0 to 5.2 is more troublesome.
There have been reports that it takes some work to upgrade older versions to the latest and greatest, so you should be prepared for some effort.
That said, the way to go is to back up the previous installation (e.g. all directories, database contents, etc.) and deploy that on your own server. This installation is then brought up to date by installing the latest version and pointing it at the data from the previous installation. During the first startup, Liferay will (given sufficient privileges on MySQL) update the database structure and everything else it needs. Keep your backup ready and test thoroughly that everything is upgraded the way you intended.
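For example, the "pointing it at the data" part usually comes down to restoring the MySQL dump (and the old document library data directory) and then telling the new Liferay where that database lives, e.g. in portal-ext.properties (standard Liferay JDBC properties; the values below are placeholders):

jdbc.default.driverClassName=com.mysql.jdbc.Driver
jdbc.default.url=jdbc:mysql://localhost/lportal?useUnicode=true&characterEncoding=UTF-8
jdbc.default.username=liferay
jdbc.default.password=secret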
Also, keep an eye on your customized components: if you have portlets or other components that use the Liferay API, you might need to upgrade those manually to take changed APIs into account.
Theoretically that should be it. I've heard of people who had some problems with this, but it all depends on your level of customization and how many Liferay features you use.
The Liferay folks intend to address this in the future with their EE offering, where you get better-defined upgrade paths and long-term support with minor upgrades to your environment, keeping APIs and database requirements stable. I'd hope that even upgrades between major versions will benefit from this, but I have not yet tried it.