How to upgrade Spark to newer version? - apache-spark

I have a virtual machine which has Spark 1.3 on it but I want to upgrade it to Spark 1.5 primarily due certain supported functionalities which were not in 1.3. Is it possible I can upgrade the Spark version from 1.3 to 1.5 and if yes then how can I do that?

Pre-built Spark distributions, like the one I believe you are using based on another question of yours, are rather straightforward to "upgrade", since Spark is not actually "installed". Actually, all you have to do is:
Download the appropriate Spark distro (pre-built for Hadoop 2.6 and later, in your case)
Unzip the tar file in the appropriate directory (i.e.where folder spark-1.3.1-bin-hadoop2.6 already is)
Update your SPARK_HOME (and possibly some other environment variables depending on your setup) accordingly
Here is what I just did myself, to go from 1.3.1 to 1.5.2, in a setting similar to yours (vagrant VM running Ubuntu):
1) Download the tar file in the appropriate directory
vagrant#sparkvm2:~$ cd $SPARK_HOME
vagrant#sparkvm2:/usr/local/bin/spark-1.3.1-bin-hadoop2.6$ cd ..
vagrant#sparkvm2:/usr/local/bin$ ls
ipcluster ipcontroller2 iptest ipython2 spark-1.3.1-bin-hadoop2.6
ipcluster2 ipengine iptest2 jsonschema
ipcontroller ipengine2 ipython pygmentize
vagrant#sparkvm2:/usr/local/bin$ sudo wget http://apache.tsl.gr/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz
[...]
vagrant#sparkvm2:/usr/local/bin$ ls
ipcluster ipcontroller2 iptest ipython2 spark-1.3.1-bin-hadoop2.6
ipcluster2 ipengine iptest2 jsonschema spark-1.5.2-bin-hadoop2.6.tgz
ipcontroller ipengine2 ipython pygmentize
Notice that the exact mirror you should use with wget will be probably different than mine, depending on your location; you will get this by clicking the "Download Spark" link in the download page, after you have selected the package type to download.
2) Unpack the tgz file with
vagrant#sparkvm2:/usr/local/bin$ sudo tar -xzf spark-1.*.tgz
vagrant#sparkvm2:/usr/local/bin$ ls
ipcluster ipcontroller2 iptest ipython2 spark-1.3.1-bin-hadoop2.6
ipcluster2 ipengine iptest2 jsonschema spark-1.5.2-bin-hadoop2.6
ipcontroller ipengine2 ipython pygmentize spark-1.5.2-bin-hadoop2.6.tgz
You can see that now you have a new folder, spark-1.5.2-bin-hadoop2.6.
3) Update accordingly SPARK_HOME (and possibly other environment variables you are using) to point to this new directory instead of the previous one.
And you should be done, after restarting your machine.
Notice that:
You don't need to remove the previous Spark distribution, as long as all the relevant environment variables point to the new one. That way, you may even quickly move "back-and-forth" between the old and new version, in case you want to test things (i.e. you just have to change the relevant environment variables).
sudo was necessary in my case; it may be unnecessary for you depending on your settings.
After ensuring that everything works fine, it's good idea to delete the downloaded tgz file.
You can use the exact same procedure to upgrade to future versions of Spark, as they come out (rather fast). If you do this, either make sure that previous tgz files have been deleted, or modify the tar command above to point to a specific file (i.e. no * wildcards as above).

Set your SPARK_HOME to /opt/spark
Download the latest pre-built binary i.e. spark-2.2.1-bin-hadoop2.7.tgz - can use wget
Create the symlink to the latest download - ln -s /opt/spark-2.2.1 /opt/spark
Edit files in $SPARK_HOME/conf accordingly
For every new version you download just create the symlink to it (step 3)
ln -s /opt/spark-x.x.x /opt/spark

Related

Cloudera Quick Start VM lacks Spark 2.0 or greater

In order to test and learn Spark functions, developers require Spark latest version. As the API's and methods earlier to version 2.0 are obsolete and no longer work in the newer version. This throws a bigger challenge and developers are forced to install Spark manually which wastes a considerable amount of development time.
How do I use a later version of Spark on the Quickstart VM?
Every one should not waste setup time which I have wasted, so here is the solution.
SPARK 2.2 Installation Setup on Cloudera VM
Step 1: Download a quickstart_vm from the link:
Prefer a vmware platform as it is easy to use, anyways all the options are viable.
Size is around 5.4gb of the entire tar file. We need to provide the business email id as it won’t accept personal email ids.
Step 2: The virtual environment requires around 8gb of RAM, please allocate sufficient memory to avoid performance glitches.
Step 3: Please open the terminal and switch to root user as:
su root
password: cloudera
Step 4: Cloudera provides java –version 1.7.0_67 which is old and does not match with our needs. To avoid java related exceptions, please install java with the following commands:
Downloading Java:
wget -c --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
Switch to /usr/java/ directory with “cd /usr/java/” command.
cp the java download tar file to the /usr/java/ directory.
Untar the directory with “tar –zxvf jdk-8u31-linux-x64.tar.gz”
Open the profile file with the command “vi ~/.bash_profile”
export JAVA_HOME to the new java directory.
export JAVA_HOME=/usr/java/jdk1.8.0_131
Save and Exit.
In order to reflect the above change, following command needs to be executed on the shell:
source ~/.bash_profile
The Cloudera VM provides spark 1.6 version by default. However, 1.6 API’s are old and do not match with production environments. In that case, we need to download and manually install Spark 2.2.
Switch to /opt/ directory with the command:
cd /opt/
Download spark with the command:
wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz
Untar the spark tar with the following command:
tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
We need to define some environment variables as default settings:
Please open a file with the following command:
vi /opt/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh
Paste the following configurations in the file:
SPARK_MASTER_IP=192.168.50.1
SPARK_EXECUTOR_MEMORY=512m
SPARK_DRIVER_MEMORY=512m
SPARK_WORKER_MEMORY=512m
SPARK_DAEMON_MEMORY=512m
Save and exit
We need to start spark with the following command:
/opt/spark-2.2.0-bin-hadoop2.7/sbin/start-all.sh
Export spark_home :
export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7/
Change the permissions of the directory:
chmod 777 -R /tmp/hive
Try “spark-shell”, it should work.

Is it necessary to install hadoop in /usr/local?

I am trying to build a hadoop cluster with four nodes.
The four machines are from my school's lab and I found their /usr/local are mount from a same public disk which means their /usr/local are identical.
The problem is, I can not start data node on slaves because the hadoop files are always the same(like tmp/dfs/data).
I am planning to configure and insatll hadoop in other dirs like /opt .
The problem is I found almost all the installation tutorial ask us to install it in /usr/local , so I was wondering will there be any bad consequence if I install hadoop in other place like /opt ?
Btw, I am using Ubuntu 16.04
As long as HADOOP_HOME points to where you extracted the hadoop binaries, then it shouldn't matter.
You'll also want to update PATH in ~/.bashrc, for example.
export HADOOP_HOME=/path/to/hadoop_x.yy
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
For reference, I have some configuration files inside of /etc/hadoop.
(Note: Apache Ambari makes installation easier)
It is not at all necessary to install hadoop under /usr/local. That location is generally used when you install single node hadoop cluster (although it is not mandatory). As long as you have following variables specified in .bashrc, any location should work.
export HADOOP_HOME=<path-to-hadoop-install-dir>
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Installing lapis on linux mint (alongside lua 5.3)

A few days ago I had a few problems while installing trying to install lapis on my new installation of linux mint. The main problem was that I wanted to have lua 5.3 as the main lua interpreter on my system, but lapis only works with lua 5.1.
This is how I ended up installing it
Note: Instructions for normal installation process, with aditional lua 5.3 being optional
Prerequisites
First of all install all the prerequisites with apt-get install libreadline-dev libncurses5-dev libpcre3-dev libssl-dev perl make build-essential*. This is all you should need to install lua, luarocks and openresty.
* copied from openresty website
Lua Interpreter(s)
Next, go to https://www.lua.org/versions.html and download the latest version of lua5.1 (wget https://www.lua.org/ftp/lua-5.1.5.tar.gz). Then extract the downloaded file tar -xf lua-5.1.5.tar.gz and optionally rename the directory mv lua-5.1.5 lua51.
Now you can simply build and install lua by moving to the directory cd lua51 and running make make linux and sudo make install
Aditionally, you might want to have lua5.3 installed on your system as the main lua interpreter. Luarocks doesn't seem to particularly like this kind of setup though, so I recommend the following:
First download and extract (and optionally rename) both lua5.1 and lua 5.3; go to the lua 5.1 directory and open Makefile in a text editor; Edit lines 12-15 to install lua in another directory. For me it worked to just add /lua51 to INSTALL_TOP (line 12). Next go to line 44 and change the names of the binaries (I chose lua51 and luac51), optionally do the same with the man pages (this requires also changing them in the doc subdirectory).
The next step is to go to the src/ directory and edit the makefile there as well: in lines 32 and 35 change the names as you did in the previous makefile (lua51 and luac51 in my case).
After this you can just make linux and sudo make install as described above.
Luarocks
Now you need to install luarocks on your system. Start by downloading the latest release of luarocks (http://keplerproject.github.io/luarocks/releases/) and extract it. Again, you can rename it to luarocks/ reduce typing. cd to the directory you just extracted and run ./condigure.
If you changed the lua installation path, you will have give some parameters to the configure script:
For lua 5.1 ./configure --lua-version=5.1 --with-lua=/usr/local/lua51 --lua-suffix=51 is how I had to do it (--lua-suffix is what I added to lua and luac and --with-lua tells it where the bin, lib, etc. subdirectories are; only relevant if you changed INSTALL_TOP in the makefile)
Optionally you can now proceed to (download, ) build and install lua 5.3 with its standard configuration. After that you can even go back to the luarocks directory and repeat ./configure, make build and make install and it should automatically install itself with lua 5.3 and leave the installation for lua5.1 intact**.
** the luarocks executable is actually just a symlink to luarocks-VERSION (where VERSION can be 5.1, 5.3, etc.) in the same directory. Each time you install luarocks this link is overwritten to point to the latest installation, but the other executables are still there.
OpenResty
The next step is to install OpenResty: open http://openresty.org/en/installation.html and check the prerequisite section. It should say the same as at the beginning of this answer. If not, install any missing package now. You can also just follow the installation instructions there, but I will be repeating it anyway; go to http://openresty.org/en/download.html and download the latest version. Extract the downloaded archives (and rename the new directory to simply openresty). cd to the new directory and run ./configure --with-pcre-jit --with-ipv6 (this might take a while), make (this might take an even longer while) and sudo make install.
At this point everything except lapis itself should be set up and working.
Lapis
To install lapis, type sudo lurocks install lapis (user luarocks-5.1** instead if you have installed more than one version of it).
Congratulations! If you got no errors, you should now have lapis installed and ready to use :)
** see section Luarocks.

How to get the COBRA toolbox working with proper SBML support under MATLAB in linux (such as Ubuntu 14.04)?

Consider these 4 pieces of software:
COBRA 2.05
LibSBML 5.10
MATLAB R2013a (Also known as 8.1, 64-bit; MATLAB no longer supports 32-bit Linux anyway)
A 64-bit Linux OS (such as Ubuntu 14.04 or the latest Mint but not restricted to them)
Intro
The COBRA toolbox is an optimization suite that runs on top of MATLAB aimed at the development of MATLAB code for metabolic network modelling. Such a "network" is a system of equations that can have a very large number of equations and variables (such as thousands). Therefore, routines to read and write those large models according to some format specification are a must-have, and COBRA uses the standard SBML for that.
Problem
Unlike the Windows versions, the Linux binary packages do not integrate well out-of-the-box: to begin with, the pre-compiled Linux binary of libSBML (open-source) available for download does not come with MATLAB support. If one tries to use the pre-compiled libSBML, COBRA won't find the "MATLAB bindings" and therefore won't be able to, for example, read and write SBML XML files from the disk in a m-script.
The question
What needs to be done to make COBRA 2.05 running on top of MATLAB R2013a under Linux (Ubuntu 14.04 or the latest Mint, but this is not likely distro-specific) able to read and write SBML XML files? In other words, what needs to be done system-wide to make COBRA pass its own testSBML test?
This is how I got it working and what I could learn from all the hassle regarding how my Linux box works. I hope I am not forgetting/missing/mistaking anything.
1. MATLAB
1.1. Install MATLAB
Install it in its default location (you will need root access for this), don't be stubborn like I tried to be. Why:
1.1.1. Integration
It is very likely you will want to install some other software that uses the MATLAB framework at some point, and in the real world software doesn't always find other software even if you know how to tell it where to look for.
1.1.2. Make your life easier
Even though it seems like a good idea to install a big software in a place where you have lots of available space and that you can use in multiple machines (specially in Linux, which doesn't have that abomination called Registry, and has symbolic links), that would only perhaps be a good idea - apart from item 1.1.1 - if that place is a partition with a Linux filesystem, since at some point, some executable/script will need execution permission, and mounting the entire partition with execution permission for all its files is rather unsafe. Therefore, do not put MATLAB in an NTFS partition of an external HD; perhaps creating a Linux partition in the external HD just for Linux-specific stuff could work for this matter, but how much hassle is that?
1.2. Setup MATLAB so people and other software can launch it
Even though I have seen somewhere that the MATLAB installer eventually shows an option to create symbolic links in the system path for convenience, it didn't in my case. But that is OK, since I would have to replace the symbolic link /usr/local/bin/matlab by the following shell-script (same path, same filename) anyway:
#!/bin/sh
export LD_LIBRARY_PATH=/usr/local/lib:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
exec /usr/local/MATLAB/R2013a/bin/matlab $*
OBS: That LD_LIBRARY_PATH is needed for MATLAB to find SBML bindings later and to be able to use them. Also, if you install some third-party solver such as TOMLAB, you will most likely need to add some more paths in this little launcher script.
OBS 2: In my case, the installation script didn't automatically create any launchers or shortcuts, but I have found an iconless and extension-less Matlab 8.01 file and a matlab icon as a png file, and that first file was a template .desktop file that I could edit to fit my needs and put in /usr/share/applications so the Unity Dash would find it. The contents of this Matlab.desktop file are:
[Desktop Entry]
Type=Application
Icon=/usr/local/MATLAB/R2013a/Matlab.png
Name=Matlab 8.01
Comment="Start Matlab 8.01"
Exec=/usr/local/MATLAB/R2013a/bin/matlab -desktop
Categories=Development;
Name=Matlab 8.01
GenericName=Matlab 8.01
Comment="Start Matlab 8.01"
2. libSBML
2.1. Install libSBML
libSBML is provided by a deb package specific for Ubuntu (and for CentOS), and also versions for several flavours of Windows and MacOSX (their home page: http://sbml.org/Software/libSBML). Guess which is the only platform whose binaries weren't compiled with MATLAB support? Linux, of course. That means you will need to compile from source (and that the deb package is therefore useless to you). To compile:
2.1.1. Install dependencies
The dependency libxml2-dev (from software manager or from a terminal):
sudo apt-get install libxml2-dev
2.1.2. Save yourself some time in the future
Usually, one would do configure, make and then make install. But this is not recommended for the same reason as installing anything that doesn't come in a pretty little package: you will loose control of which files went where, and will need to keep the source-code to be able to run make uninstall if you need to uninstall it later. So, install checkinstall and use it to replace the step make install, since checkinstall creates a package for your system that can be later uninstalled or reinstalled just as any regular packaged software (from software manager or from a terminal):
sudo apt-get install checkinstall
2.1.3. Configure the compiling-process
Get LibSBML source code and extract it to some folder. From a terminal, navigate to that folder and configure the compilation:
./configure --with-matlab
OBS: with the with-matlab flag, the configure script will fail it it cannot find an executable whose filename is matlab. If it fails, it outputs that the matlab file could not be found, but the test it performs is actually both for the existence of the file and whether it is executable. So, if the file is in an NTFS partition, configure will fail even if it finds the file, but will tell you the file couldn't be found. You can enforce it to look for the executable in /path/to/matlab/root by passing (it will look for a bin folder inside that path, and for the executable inside that bin folder):
./configure --with-matlab=/path/to/matlab/root
OBS: This will install libSBML in the default location: /usr/local/lib. Again, it is a good idea to just let it install in its default location, but if you need to change it, you can pass the path with the flag: --prefix=/your/installation/path
OBS 2: You might ask why libSBML needs to find and execute matlab to be compiled with support for it: it needs to fire up MATLAB later to build MEX-files (compiled MATLAB code), so I would speculate you wouldn't be able to install libSBML after all if your MATLAB has no compiler to generate MEX-files.
2.1.4. Build and install libSBML
make
checkinstall
VERY IMPORTANT OBS:
I) checkinstall asks for confirmation of the metadata of the package it is about to create. In my case, the string for the version field came by default as "Source" (without the quotes), which caused checkinstall to fail because dpkg (the system tool that actually builds the deb file) failed complaining the version number must start with, well, a number. So, save yourself some time and make sure the string in the version field starts with a number (i.e. "5.10", without the quotes obviously)
II) checkinstall asks if you want to exclude from the future package files that the make install command would put in your home folder and says it is a good idea to exclude them. LibSBML has a test.xml file that it needs to be in the $HOME folder later, or else it won't let you integrate with MATLAB. And even though it tells you a test.xml is missing, it doesn't tell you where that file should be or if that file was something that came with the library. I only noticed it because checkinstall had found a $HOME/test.xml reference earlier (that I excluded from the package, of course) and I had found that odd. So, save yourself some time and exclude $HOME/test.xml from the package generated by checkinstall, and then search for test.xml inside the source-code folder and copy it to $HOME as soon as libSBML finishes being installed by checkinstall.
2.2. Integrate libSBML to MATLAB
Fire up MATLAB, navigate to where the SBML MATLAB-bindings were installed in step 2.1.5 (in my case: /usr/local/lib) and run the file installSBML.m that should be there.
2.2.1. Shared libraries problems
In my case, I had an error due to an old unresolved issue: libstdc++.so.6 not having a reference to GLIBCXX_3.4.15. Turns out that MATLAB was trying to use a libstdc++.so.6.0.13 (libstdc++.so.6 was a symbolic link pointing to this file) that came with it in /usr/local/MATLAB/R2013a/sys/os/glnxa64, which indeed didn't have that reference (one could verify that by issuing:
strings /usr/local/MATLAB/R2013a/sys/os/glnxa64/libstdc++.so.6.0.13 | grep GLIBC
). My system has a libstdc++.so.6.0.19 located in /usr/lib/x86_64-linux-gnu that has that reference, so I enforced MATLAB to use 6.0.19 one by setting the LD_LIBRARY_PATH properly (refer to step 1.2) and also by renaming the libstdc++.so.6 that came with MATLAB to something else so it would not find it and would keep looking until it found my system's. A friend of mine running Linux Mint didn't need to rename anything: for him, setting the LD_LIBRARY_PATH was enough.
2.2.2. Other problems
installSBML.m will fail if it doesn't find that $HOME/test.xml file mentioned in step 2.1.5, regardless of whether the library actually works. It assumes that if it could not test itself using a file that it assumes to be in $HOME, the user shouldn't have the option to install it anyway.
3. COBRA / SBML toolbox
3.1. Setup COBRA
In MATLAB, navigate to <YOUR_COBRA_ROOT_FOLDER_HERE>/external/toolboxes/SBMLToolbox-4.1.0/toolbox and run the file install.m there. You should have all set so it finds the MATLAB-bindings you set up in step 2.2.
3.2. MATLAB setpaths problems
I had to manually edit the file /usr/local/MATLAB/R2013a/toolbox/local/pathdef.m as root to include the folder /usr/local/lib (where libSBML and its MATLAB-bindings are) to make that setting persistent. Every time I restarted MATLAB, its setpath had gone back to the default, no matter if I started MATLAB as root when setting its setpath via the MATLAB GUI.
3.3. Test
Now you have hopefully connected all the dots. Try it: in MATLAB, navigate to <YOUR_COBRA_ROOT_FOLDER_HERE> and issue:
initCobraToolbox
testAll
If you haven't got any third-party solvers installed and configured, it should pass 14 of the 19 tests, including the SBML test (testSBML). Now you can load SBML files into MATLAB and play with them.
I also needed to add a symbolic link from /usr/local/lib/libsbml.so.5 to the MATLAB sys folder by:
sudo ln -s /usr/local/lib/libsbml.so.5 /usr/local/MATLAB/R2014a/sys/os/glnxa64/
This finally made the installation possible.
I installed using Cmake. To do this it is necessary to find the FindMatlab.cmake in the source package and insert the MATLAB path manually!
.............
elseif(EXISTS "/Applications/MATLAB_R2008a.app/")
set(MATLAB_ROOT_PATH "/Applications/MATLAB_R2008a.app/")
endif()
else()
if (EXISTS "/usr/local/MATLAB/R2014a/")
set(MATLAB_ROOT_PATH "/usr/local/MATLAB/R2014a/")
endif()
endif()
..........
FYI, to resolve the shared library issue at point 2.2.1 I needed to install the package matlab-support (in Ubuntu repositories)

Run time installation directory of debian package contents

I have a debian package that I built that contains a tar ball of the files, a control file, and a postinst file. Its built using dpkg-deb and it installs properly using dpkg.
The modification I would like to make is to have the installation directory of the files be determined at runtime based on an environment variable that will be set when dpkg -i is run on the deb file. I echo out the environment variable in the postinst script and I can see that its set properly.
My questions:
1) Is it possible to dynamically determine the installation directory at runtime?
2) If its possible how would I go about this? I have read about the rules file and the mypackage.install files but I don't know if either of these would allow me to accomplish this.
I could hack it by copying the files to the target location in the posinst script but I would prefer to do it the right way if possible.
Thanks in advance!
So this is what I found out about this problem over the past couple of weeks.
With prepackaged binaries you can't build a debian package with a destination directory dynamicall determined at runtime. I believe that this might be possible if installing a package that is built from source where you can set the install directory using configure. But in this case since these are embedded Ubuntu machines they don't have make so I didn't pursue such an option. I did work out a non traditional method (hack) for installing that did work. Since debian packages simply contain a tar ball relative to / simply build your package relative to a directory under /tmp. In the postinst script you can then determine where to copy the files from the archive into a permanent location.
I expected that after rebooting and the automatic deletion of the subdirectory under /tmp that dpkg might not know that the file package existed. This wasn't a problem. When I ran 'dpkg -l myapp' it showed as still installed. Updating the package using dpkg/apt-get also worked without a hitch.
What I did find is that if you attempted to remove the package using 'dpkg -r myapp' that dpkg would try and remove /tmp which wasn't good. However /tmp isn't easily removed so it never succeeded. Plus in our situation we never remove packages but instead simply upgrade them.
I eventually had to abandon the universal package due to code differences in the sources resulting in having to recompile per platform but I would have left it this way and it did work.
I tried using --instdir to change the install directory of the package and it does relocate the files but dpkg fails since the dpkg file can't be found relative to the new instdir. Using --instdir is sort of like a chroot. I also tried --admindir and --root in various combinations to see if I could use the dpkg system relative to / but install relocate the files but they didn't work. I guess rpm has a relocate option that works but not Ubuntu.
You can also write a script that runs dpkg-deb with a different environment for 6 times, generating 6 different packages. When you make a modification, you simply have to run your script, and all 6 packages gets generated and you can install them on your machines avoiding postinst hacking!
Why not install to a standard location, and simply use a postinst script to create symbolic links to the desired location? This is much cleaner, and shouldn't break anything in dpk -I.

Resources