Accumulo not getting initialised

I am trying to initialise Accumulo, which I am configuring on Hadoop 2.0.0-cdh4.4.0 using the tarballs on a MacBook.
When I run bin/accumulo init, it fails with a java.io.IOException: Mkdirs failed to create /accumulo/instance_id error.
The log says:
2014-05-24 01:24:33,935 [util.Initialize] FATAL: Failed to initialize filesystem
java.io.IOException: Mkdirs failed to create /accumulo/instance_id
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:447)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:867)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:829)
at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1129)
at org.apache.accumulo.server.util.Initialize.initFileSystem(Initialize.java:269)
at org.apache.accumulo.server.util.Initialize.initialize(Initialize.java:213)
at org.apache.accumulo.server.util.Initialize.doInit(Initialize.java:199)
at org.apache.accumulo.server.util.Initialize.main(Initialize.java:545)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.accumulo.start.Main$1.run(Main.java:103)
at java.lang.Thread.run(Thread.java:744)
2014-05-24 01:24:33,937 [conf.Configuration] WARN : fs.default.name is deprecated. Instead, use fs.defaultFS
2014-05-24 01:24:33,937 [util.Initialize] FATAL: Default filesystem value ('fs.defaultFS' or 'fs.default.name') was found in the Hadoop configuration
2014-05-24 01:24:33,938 [util.Initialize] FATAL: Please ensure that the Hadoop core-site.xml is on the classpath using 'general.classpaths' in accumulo-site.xml
Please suggest a fix. I already tried creating /accumulo and /user/accumulo on HDFS and giving them 777 permissions, but that did not help.

The root cause is that the Hadoop jars and configuration are not being placed on Accumulo's classpath. I'm not familiar with how Cloudera packages their Hadoop artifacts.
If you look at your stack trace, it lists the ChecksumFileSystem class instead of DistributedFileSystem. This means that Accumulo doesn't know about the HDFS instance you're trying to write to and is falling back to the local file system (that's what the ChecksumFileSystem is doing).
To fix this, check a couple of things in your Accumulo configuration files. First, make sure that you have correctly defined HADOOP_PREFIX and HADOOP_CONF_DIR in accumulo-env.sh. Second, make sure that the paths you have configured for general.classpaths in accumulo-site.xml all exist, specifically the ones that reference HADOOP_PREFIX and HADOOP_CONF_DIR.
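As a concrete starting point, here is a minimal sketch of those checks as shell commands. The CDH install paths below are assumptions, not your actual layout; substitute wherever your tarballs actually live:

# hypothetical entries in conf/accumulo-env.sh -- adjust to your install
export HADOOP_PREFIX=/opt/hadoop-2.0.0-cdh4.4.0
export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"

# core-site.xml must be readable, and fs.defaultFS should point at HDFS,
# not the local file system
grep -A1 'fs.defaultFS\|fs.default.name' "$HADOOP_CONF_DIR/core-site.xml"

# print the classpath Accumulo will actually use; the Hadoop conf directory
# and the HDFS client jars should both show up in the output
bin/accumulo classpath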

Related

Spark Submit error when running a JAR from Azure Databricks

I'm trying to issue spark-submit from the Azure Databricks jobs scheduler and am currently stuck on the error below, which says: File file:/tmp/spark-events does not exist. I need some pointers to understand whether this directory needs to be created in the Azure Blob location (which is my storage layer) or in the Azure DBFS location.
As per the link below, it's not clear where to create the directory when running spark-submit from the Azure Databricks jobs scheduler.
SparkContext Error - File not found /tmp/spark-events does not exist
Error:
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Warning: Ignoring non-Spark config property: eventLog.rolloverIntervalSeconds
Exception in thread "main" java.lang.ExceptionInInitializerError
at com.dta.dl.ct.qm.hbase.reverse.pipeline.HBaseVehicleMasterLoad.main(HBaseVehicleMasterLoad.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: File file:/tmp/spark-events does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:97)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:580)
at com.dta.dl.ct.qm.hbase.reverse.pipeline.HBaseVehicleMasterLoad$.<init>(HBaseVehicleMasterLoad.scala:32)
at com.dta.dl.ct.qm.hbase.reverse.pipeline.HBaseVehicleMasterLoad$.<clinit>(HBaseVehicleMasterLoad.scala)
... 13 more
You need to create this folder on the driver node before collecting event logs (that's by design).
To do so, one way could be setting the property spark.history.fs.logDirectory (present in the spark-defaults.conf file) in a global init script, as described here.
Please make sure that the folder defined in that property exists and can be accessed from the driver node.
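For instance, a minimal global init script might be nothing more than the sketch below. The path matches the default file:/tmp/spark-events from the error above; adjust it if you point spark.eventLog.dir or spark.history.fs.logDirectory somewhere else:

#!/bin/bash
# pre-create the event-log directory on the node so the driver can write
# event logs into it when the SparkContext starts
mkdir -p /tmp/spark-events
chmod 777 /tmp/spark-events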

Why does a Spark app with deploy-mode local use "ls -F" on Windows? [duplicate]

I got the following error when starting the spark-shell. I'm going to use Spark to process data in SQL Server. Can I ignore the errors?
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState'
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
tl;dr You'd rather not.
Well, it may be possible, but given you've just started your journey to Spark's land the efforts would not pay off.
Windows has never been a developer-friendly OS to me, and whenever I teach people Spark and they use Windows, I take it for granted that we'll have to go through the winutils.exe setup, and often also how to work on the command line.
Please install winutils.exe as follows:
Run cmd as administrator
Download winutils.exe binary from https://github.com/steveloughran/winutils repository (use hadoop-2.7.1 for Spark 2)
Save winutils.exe binary to a directory of your choice, e.g. c:\hadoop\bin
Set HADOOP_HOME to reflect the directory with winutils.exe (without bin), e.g. set HADOOP_HOME=c:\hadoop
Set PATH environment variable to include %HADOOP_HOME%\bin
Create c:\tmp\hive directory
Execute winutils.exe chmod -R 777 \tmp\hive
Open spark-shell and run spark.range(1).show to see a one-row dataset.
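Put together, the whole sequence in an administrator cmd session looks roughly like this (c:\hadoop is just the example directory from the steps above):

rem save the downloaded winutils.exe into c:\hadoop\bin before continuing
mkdir c:\hadoop\bin
set HADOOP_HOME=c:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin
mkdir c:\tmp\hive
%HADOOP_HOME%\bin\winutils.exe chmod -R 777 \tmp\hive
spark-shell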

Spark on Yarn Container Failure

For reference: I solved this issue by adding the Netty 4.1.17 jar to hadoop/share/hadoop/common
No matter what jar I try and run (including the example from https://spark.apache.org/docs/latest/running-on-yarn.html), I keep getting an error regarding container failure when running Spark on Yarn. I get this error in the command prompt:
Diagnostics: Exception from container-launch.
Container id: container_1530118456145_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
at org.apache.hadoop.util.Shell.run(Shell.java:482)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
When I look at the logs, I then find this error:
Exception in thread "main" java.lang.NoSuchMethodError:io.netty.buffer.PooledByteBufAllocator.metric()Lio/netty/buffer/PooledByteBufAllocatorMetric;
at org.apache.spark.network.util.NettyMemoryMetrics.registerMetrics(NettyMemoryMetrics.java:80)
at org.apache.spark.network.util.NettyMemoryMetrics.<init>(NettyMemoryMetrics.java:76)
at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:109)
at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:71)
at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:461)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:530)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:347)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1758)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:869)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Any idea why this is happening? This is running on a pseudo-distributed cluster set up according to this tutorial: https://wiki.apache.org/hadoop/Hadoop2OnWindows. Spark runs fine locally, and seeing as this jar was provided with Spark, I doubt it's a problem within the jar. (Regardless, I added a Netty dependency inside another jar and I'm still getting the same error).
The only thing set in my spark-defaults.conf is spark.yarn.jars, which points to an HDFS directory where I uploaded all of Spark's jars. io.netty.buffer.PooledByteBufAllocator is contained within these jars.
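Spelled out, that setup amounts to something like the following; the /spark-jars path is an illustrative example, not the actual directory used here:

# upload Spark's jars to HDFS once...
hdfs dfs -mkdir -p /spark-jars
hdfs dfs -put "$SPARK_HOME"/jars/*.jar /spark-jars/
# ...and reference them from spark-defaults.conf:
#   spark.yarn.jars  hdfs:///spark-jars/*.jar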
Spark 2.3.1, Hadoop 2.7.6
I had exactly the same issue. Previously I used Hadoop 2.6.5 with a compatible Spark version and things worked out fine. When I switched to Hadoop 2.7.6, the problem occurred. I'm not sure of the cause, but I copied the netty-4.1.17.Final jar file to the Hadoop library folder and the problem went away.
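In shell terms, the workaround amounts to something like this; the jar name matches the 4.1.17.Final version mentioned above, and HADOOP_HOME is assumed to point at your Hadoop install:

# list the Netty jars Hadoop already ships, to see what conflicts
find "$HADOOP_HOME/share/hadoop" -name 'netty*.jar'
# drop the newer Netty jar (obtained separately) next to Hadoop's common jars
cp netty-all-4.1.17.Final.jar "$HADOOP_HOME/share/hadoop/common/"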
Seems like you have multiple Netty versions on your classpath. Run
mvn clean compile
then remove the duplicates and add only the latest version.
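To see which Netty artifacts your build actually pulls in, Maven's dependency tree is handy, for example:

# show every dependency under the io.netty group
mvn dependency:tree -Dincludes=io.netty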
This may be a version mismatch between your YARN and Spark installations; check that the versions you have installed are compatible with each other.
I strongly suggest reading more about NoSuchMethodError and similar exceptions such as NoClassDefFoundError and ClassNotFoundException. The reason for this suggestion is that when you start using Spark in different situations, these are among the most confusing errors and exceptions for people who are not so experienced.
Of course, caring about such details is a best-practice strategy for any programmer, especially for those working on distributed systems like Spark. Well done. ;)
