I'm using a MemSQL cluster as the primary storage. By default, the data files are installed in a location like the following on CentOS 6.x:
/var/lib/memsql-ops/data/installs/MI9dfcc72a5b044f2694b5f7028803a21e
Is there any way to relocate the data path to another folder on the same machine?
This is not the best way, but it works: I just reinstalled MemSQL into another directory:
sudo mkdir /data/memsql
sudo ./install.sh --root-dir /data/memsql
In this case MemSQL Ops will still be in /var/lib/memsql-ops, but all nodes will be installed into the /data/memsql directory (look at the /var/lib/memsql symlink), and all data will be inside this directory too.
P.S. You can list additional installation options with the memsql-ops agent-install --help command.
To test and learn Spark functions, developers need a recent Spark version, because many APIs and methods from before version 2.0 are obsolete and no longer work in the newer versions. This makes things harder: developers are forced to install Spark manually, which wastes a considerable amount of development time.
How do I use a later version of Spark on the Quickstart VM?
Nobody should have to waste the setup time I did, so here is the solution.
SPARK 2.2 Installation Setup on Cloudera VM
Step 1: Download a quickstart_vm from the link:
I prefer the VMware platform because it is easy to use, but all the options are viable.
The entire tar file is around 5.4 GB. You need to provide a business email address, as the download form won't accept personal ones.
Step 2: The virtual environment requires around 8 GB of RAM; allocate sufficient memory to avoid performance glitches.
Step 3: Open a terminal and switch to the root user:
su root
password: cloudera
Step 4: The Cloudera VM ships with Java 1.7.0_67 (as reported by “java -version”), which is too old for our needs: Spark 2.2 requires Java 8. To avoid Java-related exceptions, install Java 8 with the following commands:
Downloading Java:
wget -c --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
Switch to the /usr/java/ directory with the “cd /usr/java/” command.
Copy the downloaded Java tar file into the /usr/java/ directory.
Extract the archive with “tar -zxvf jdk-8u131-linux-x64.tar.gz”.
Open the profile file with the command “vi ~/.bash_profile”
Set JAVA_HOME to the new Java directory by adding this line:
export JAVA_HOME=/usr/java/jdk1.8.0_131
Save and Exit.
To apply the change, run the following command in the shell:
source ~/.bash_profile
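If you repeat these steps (for example when moving to a newer JDK), appending another export line each time leaves stale JAVA_HOME entries in ~/.bash_profile. Here is a minimal bash sketch that rewrites the entry instead. The function name set_profile_export is hypothetical, the sketch assumes GNU sed (as on CentOS), and the JDK path is the one used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set "export NAME=VALUE" in a profile file,
# replacing any earlier export of the same variable instead of piling up duplicates.
set_profile_export() {
  local file="$1" name="$2" value="$3"
  touch "$file"                                  # create the file if it does not exist yet
  sed -i "/^export ${name}=/d" "$file"           # GNU sed: delete old "export NAME=..." lines in place
  printf 'export %s=%s\n' "$name" "$value" >> "$file"
}

# Usage, assuming the JDK was extracted to /usr/java/jdk1.8.0_131:
#   set_profile_export ~/.bash_profile JAVA_HOME /usr/java/jdk1.8.0_131
#   source ~/.bash_profile
```

Other lines in the profile are left untouched; only lines starting with "export JAVA_HOME=" are replaced.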
The Cloudera VM provides Spark 1.6 by default. However, the 1.6 APIs are old and do not match production environments, so we need to download and install Spark 2.2 manually.
Switch to the /opt/ directory:
cd /opt/
Download spark with the command:
wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz
Extract the Spark archive with the following command:
tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
Next, define some environment variables as default settings.
Open the Spark environment file with the following command:
vi /opt/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh
Paste the following settings into the file:
SPARK_MASTER_IP=192.168.50.1
SPARK_EXECUTOR_MEMORY=512m
SPARK_DRIVER_MEMORY=512m
SPARK_WORKER_MEMORY=512m
SPARK_DAEMON_MEMORY=512m
Save and exit
Start Spark with the following command:
/opt/spark-2.2.0-bin-hadoop2.7/sbin/start-all.sh
Export SPARK_HOME:
export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7/
Change the permissions of the /tmp/hive directory:
chmod -R 777 /tmp/hive
Try “spark-shell”; it should now work.
I am trying to build a Hadoop cluster with four nodes.
The four machines are from my school's lab, and I found that their /usr/local directories are mounted from the same public disk, which means their /usr/local contents are identical.
The problem is that I cannot start the DataNode on the slaves because the Hadoop files are always the same (like tmp/dfs/data).
I am planning to configure and install Hadoop in another directory, like /opt.
The problem is that almost all the installation tutorials I found ask us to install it in /usr/local, so I was wondering: will there be any bad consequences if I install Hadoop somewhere else, like /opt?
By the way, I am using Ubuntu 16.04.
As long as HADOOP_HOME points to where you extracted the Hadoop binaries, it shouldn't matter.
You'll also want to update PATH, for example in ~/.bashrc:
export HADOOP_HOME=/path/to/hadoop_x.yy
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
For reference, I have some configuration files inside of /etc/hadoop.
(Note: Apache Ambari makes installation easier)
It is not at all necessary to install Hadoop under /usr/local. That location is generally used when you install a single-node Hadoop cluster (although even then it is not mandatory). As long as you have the following variables specified in .bashrc, any location should work:
export HADOOP_HOME=<path-to-hadoop-install-dir>
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
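Since the trouble in the question comes from /usr/local being mounted from one shared disk, it may be worth confirming that the directory you choose instead (such as /opt) is local to each node. A minimal sketch, assuming GNU coreutils stat; the function names fs_type and warn_if_shared and the list of network filesystem types are my own assumptions, not part of Hadoop.

```bash
#!/usr/bin/env bash
# Print the filesystem type of the mount holding a directory.
# GNU stat: -f reports on the filesystem instead of the file, %T is its type name.
fs_type() {
  stat -f -c %T "$1"
}

# Hypothetical check: flag directories on common network filesystems,
# where every node would see (and overwrite) the same files.
warn_if_shared() {
  local dir="$1" fstype
  fstype=$(fs_type "$dir") || return 1
  case "$fstype" in
    nfs*|cifs|smb*) echo "WARNING: $dir is on $fstype (probably shared between nodes)" ;;
    *)              echo "$dir is on $fstype" ;;
  esac
}

# Usage, run on each node:
#   warn_if_shared /usr/local
#   warn_if_shared /opt
```

A local filesystem type on every node suggests each DataNode will get its own copy of directories such as tmp/dfs/data.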
I have a virtual machine with Spark 1.3 on it, but I want to upgrade it to Spark 1.5, primarily because of certain features that were not supported in 1.3. Is it possible to upgrade Spark from 1.3 to 1.5, and if so, how can I do that?
Pre-built Spark distributions, like the one I believe you are using based on another question of yours, are rather straightforward to "upgrade", since Spark is not actually "installed". Actually, all you have to do is:
Download the appropriate Spark distro (pre-built for Hadoop 2.6 and later, in your case)
Extract the tar file in the appropriate directory (i.e. where the folder spark-1.3.1-bin-hadoop2.6 already is)
Update your SPARK_HOME (and possibly some other environment variables depending on your setup) accordingly
Here is what I just did myself, to go from 1.3.1 to 1.5.2, in a setting similar to yours (vagrant VM running Ubuntu):
1) Download the tar file in the appropriate directory
vagrant@sparkvm2:~$ cd $SPARK_HOME
vagrant@sparkvm2:/usr/local/bin/spark-1.3.1-bin-hadoop2.6$ cd ..
vagrant@sparkvm2:/usr/local/bin$ ls
ipcluster ipcontroller2 iptest ipython2 spark-1.3.1-bin-hadoop2.6
ipcluster2 ipengine iptest2 jsonschema
ipcontroller ipengine2 ipython pygmentize
vagrant@sparkvm2:/usr/local/bin$ sudo wget http://apache.tsl.gr/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz
[...]
vagrant@sparkvm2:/usr/local/bin$ ls
ipcluster ipcontroller2 iptest ipython2 spark-1.3.1-bin-hadoop2.6
ipcluster2 ipengine iptest2 jsonschema spark-1.5.2-bin-hadoop2.6.tgz
ipcontroller ipengine2 ipython pygmentize
Notice that the exact mirror you should use with wget will probably be different from mine, depending on your location; you can get it by clicking the "Download Spark" link on the download page, after you have selected the package type to download.
2) Unpack the tgz file with
vagrant@sparkvm2:/usr/local/bin$ sudo tar -xzf spark-1.*.tgz
vagrant@sparkvm2:/usr/local/bin$ ls
ipcluster ipcontroller2 iptest ipython2 spark-1.3.1-bin-hadoop2.6
ipcluster2 ipengine iptest2 jsonschema spark-1.5.2-bin-hadoop2.6
ipcontroller ipengine2 ipython pygmentize spark-1.5.2-bin-hadoop2.6.tgz
You can see that now you have a new folder, spark-1.5.2-bin-hadoop2.6.
3) Update SPARK_HOME (and possibly other environment variables you are using) to point to this new directory instead of the previous one.
And you should be done, after restarting your machine.
Notice that:
You don't need to remove the previous Spark distribution, as long as all the relevant environment variables point to the new one. That way, you may even quickly move "back-and-forth" between the old and new version, in case you want to test things (i.e. you just have to change the relevant environment variables).
sudo was necessary in my case; it may be unnecessary for you depending on your settings.
After ensuring that everything works fine, it's a good idea to delete the downloaded tgz file.
You can use the exact same procedure to upgrade to future versions of Spark, as they come out (rather fast). If you do this, either make sure that previous tgz files have been deleted, or modify the tar command above to point to a specific file (i.e. no * wildcards as above).
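To avoid the * wildcard entirely, the extraction step can name the archive explicitly. A small bash sketch; extract_spark is a hypothetical helper, and it assumes the archive follows the spark-VERSION-bin-hadoopVERSION.tgz naming used above and unpacks into a directory of the same name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract one specific Spark release next to the existing
# ones, naming the archive explicitly instead of using spark-1.*.tgz.
extract_spark() {
  local version="$1" hadoop="$2" dir="${3:-.}"
  local archive="spark-${version}-bin-hadoop${hadoop}.tgz"
  if [ ! -f "$dir/$archive" ]; then
    echo "missing $dir/$archive" >&2
    return 1
  fi
  tar -xzf "$dir/$archive" -C "$dir" || return 1
  # Print the directory that SPARK_HOME should now point to.
  echo "$dir/spark-${version}-bin-hadoop${hadoop}"
}

# Usage, matching the steps above (run as a user that can write to the directory):
#   export SPARK_HOME=$(extract_spark 1.5.2 2.6 /usr/local/bin)
```

Setting SPARK_HOME this way only affects the current shell; update ~/.bashrc (or wherever you set it) to make the change permanent.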
1) Set your SPARK_HOME to /opt/spark
2) Download the latest pre-built binary, e.g. spark-2.2.1-bin-hadoop2.7.tgz (you can use wget), and extract it under /opt
3) Create a symlink to the latest download: ln -s /opt/spark-2.2.1-bin-hadoop2.7 /opt/spark
4) Edit the files in $SPARK_HOME/conf accordingly
5) For every new version you download, just repoint the symlink (step 3); -f and -n make ln replace the existing link instead of creating a new link inside the old release directory:
ln -sfn /opt/spark-x.x.x /opt/spark
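The symlink steps above can be wrapped in a small helper that refuses to point at a missing directory and always replaces the existing link. This is a sketch: switch_spark is a hypothetical name, and it assumes GNU ln.

```bash
#!/usr/bin/env bash
# Hypothetical helper: point a stable path such as /opt/spark at one release.
#   -s  create a symbolic link
#   -f  remove an existing destination first
#   -n  treat an existing link to a directory as a plain file, so the link is
#       replaced instead of a new link being created inside the old release
switch_spark() {
  local release_dir="$1" link="${2:-/opt/spark}"
  if [ ! -d "$release_dir" ]; then
    echo "no such directory: $release_dir" >&2
    return 1
  fi
  ln -sfn "$release_dir" "$link" || return 1
  readlink "$link"                 # show where the link now points
}

# Usage (assumes the archive was extracted under /opt):
#   switch_spark /opt/spark-2.2.1-bin-hadoop2.7 /opt/spark
```

Because SPARK_HOME stays /opt/spark, rolling back is just another switch_spark call with the older directory.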
I am following the instructions to configure a hadoop-2.0.0 cluster for installing Impala. In hdfs-site.xml, I added two properties, "dfs.client.read.shortcircuit" and "dfs.domain.socket.path" (/var/lib/hadoop-hdfs/dn_socket).
But when I start the Hadoop cluster with start-dfs.sh, the DataNodes fail to start. The DataNode log says "failed to stat a path component: '/var/lib/hadoop-hdfs'". So I created /var/lib/hadoop-hdfs manually and started the Hadoop cluster again. It failed again, and the log said it was a permission problem with that directory. OK, fine. I changed the owner of hadoop-hdfs from root to ubuntu (ubuntu is the machine's username). Now it finally works normally.
I am just confused. Am I doing this the right way? Do we really need to create /var/lib/hadoop-hdfs ourselves and change the permissions or the owner of that directory? Or did I miss some configuration setting?
I was running into similar problems using Cloudera Manager. It was an issue of trying to run in 'single-user mode' instead of using root. I think you are doing something similar with user ubuntu. Is this a clean install or are you upgrading / did you have a failed install last time?
I'm guessing you sudo-ed somewhere you should have run something as 'ubuntu'.
If you can make it work by manually setting permissions, go for it. I have a feeling there are lots of other files lurking about in your system that are owned by root but should be owned by ubuntu.
Anecdotally, if there is no critical data in the server, I have found it is easier to very thoroughly remove any and all files from the old install and then reinstall fresh.
I was facing a similar issue with starting the datanodes. Then, I came across this link https://github.com/cloudera/Impala/wiki/Build-prerequisites, where it states that we need to create the /var/lib/hadoop-hdfs manually and set the appropriate permissions. This has also fixed my problem.
Make certain the directory /var/lib/hadoop-hdfs/ is present.
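For the setup in the question, the manual fix comes down to creating the socket's parent directory and giving it to the user that runs the DataNode. A minimal bash sketch: make_socket_dir is a hypothetical helper, the owner ubuntu comes from the question, and the 755 mode is my assumption, chosen because (as far as I know) HDFS rejects socket paths whose directories can be written by other users.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the parent directory of dfs.domain.socket.path,
# hand it to the DataNode user, and keep it non-writable for group and others.
make_socket_dir() {
  local dir="$1" owner="$2"
  mkdir -p "$dir" &&
    chown "$owner" "$dir" &&
    chmod 755 "$dir"
}

# Usage for the question's setup, from a root shell (DataNode runs as "ubuntu"):
#   make_socket_dir /var/lib/hadoop-hdfs ubuntu
```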
I am trying to install and run the DataStax Cassandra Community Edition on Red Hat Linux, but I don't have root privileges. I extracted the tar in my home directory, but I'm unable to run ./cassandra.
I am doing this on an HPC cluster and thought I'd install Cassandra in my home directory and save the data in a scratch space we've been provided (my home directory doesn't have enough space to hold all the data).
I would appreciate any help! Thanks!
From the installation docs for DataStax community edition, the only other step you need is to create the data and log directories:
$ sudo mkdir /var/lib/cassandra
$ sudo mkdir /var/log/cassandra
$ sudo chown -R $USER:$GROUP /var/lib/cassandra
$ sudo chown -R $USER:$GROUP /var/log/cassandra
If you are using a different location, that's fine. Just make sure to create the dirs and assign owners (like above), and also set the appropriate values in cassandra.yaml (data_file_directories, commitlog_directory, saved_caches_directory) and log4j-server.properties.
A more detailed log of the results you're seeing would confirm whether this is the problem.
Yes, you can run Cassandra without root or sudo privileges. Extract the Cassandra tar file into your local user directory and configure cassandra.yaml for a single node. Then run Cassandra from the bin directory, either in the foreground or in the background, and log in using the CQL shell:
bin/cassandra -f
OR
bin/cassandra
AND
bin/cqlsh
This is for Cassandra version 2.1.x.
You can run Cassandra without root or sudo privileges. Besides extracting the tar file, you need to modify conf/logback.xml to redirect the log to your home directory or somewhere else you can write to:
<file>/home/xxxx/system.log</file>
<fileNamePattern>/home/xxxx/system.log.%i.zip</fileNamePattern>
The only minor issue with not running as root is that the ulimit -l limit (RLIMIT_MEMLOCK, the maximum locked memory) should be increased, and I cannot increase it with my account. But this does not prevent Cassandra from running.
In my opinion, almost none of the Java-based Apache projects need root privileges, and Cassandra is no exception.
Firstly, download apache-cassandra-bin.tar.gz from http://cassandra.apache.org/download/. Do not use the .deb, .rpm, or other packages.
Secondly, run tar -xzf apache-cassandra-bin.tar.gz to extract it into any folder; suppose the folder is $cassandra_home.
Thirdly, just go to $cassandra_home/bin, run ./cassandra, done! The data is stored in $cassandra_home/data and the logs are in $cassandra_home/logs.
If you want to change the location of the data and logs:
1st, go to $cassandra_home/conf and modify the cassandra.yaml file.
Set these directories to folders you have read and write access to:
data_file_directories:
commitlog_directory:
cdc_raw_directory:
hints_directory:
saved_caches_directory:
(Different Cassandra versions may have different parameters; just search for "directory" in the yaml file.)
2nd, if you want to change where the logs go, modify $cassandra_home/conf/logback.xml (or the log4j or other logging configuration, depending on your version) and set the log folder to another location.
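Following the steps above, here is a bash sketch that creates one writable directory per setting under a base path (for example the scratch space from the question) and prints cassandra.yaml lines to paste in. cassandra_dirs is a hypothetical helper, the subdirectory names are my own choice, and the key list should be checked against your version's cassandra.yaml, as noted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create Cassandra's data directories under a base path
# you can write to, then print matching cassandra.yaml entries to paste in.
cassandra_dirs() {
  local base="$1"
  mkdir -p "$base/data" "$base/commitlog" "$base/cdc_raw" \
           "$base/hints" "$base/saved_caches" || return 1
  # data_file_directories is a YAML list; the others take a single path.
  cat <<EOF
data_file_directories:
    - $base/data
commitlog_directory: $base/commitlog
cdc_raw_directory: $base/cdc_raw
hints_directory: $base/hints
saved_caches_directory: $base/saved_caches
EOF
}

# Usage (the scratch path is an assumption about your cluster):
#   cassandra_dirs /scratch/$USER/cassandra
```

Replace the existing (often commented-out) entries in cassandra.yaml rather than appending duplicates of the same keys.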
Enjoy it.