Yugabyte backup task failed with an error but the backup is marked as completed - yugabytedb

Hi, I am getting the error below when I trigger Yugabyte backups in parallel. I also see the backup marked as completed, with the location where it is stored. Does this mean the backup task failed in the UI but the backup itself succeeded?
Failed to execute task java.util.concurrent.FutureTask#2e4aaa7d, hit
error java.lang.RuntimeException:
UniverseUpdateSucceeded(d0b90430-25e2-4d92-954a-99eeddc875ea) failed
with exception UserUniverse d0b90430-25e2-4d92-954a-99eeddc875ea is
not being edited.

Related

GitLab CI error: Job failed (system failure)

Does anyone have an idea how I can correct this error? It seems the templates on GitLab don't work.
ERROR: Job failed (system failure): Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/docker/volumes/runner-mh5zgyxe-project-119-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8/_data\\\" to rootfs \\\"/var/lib/docker/overlay2/19e6d863d0d048b030aa4702ee47667085a8e571c266b9623294c4ac68e7c8da/merged\\\" at \\\"/var/lib/docker/overlay2/19e6d863d0d048b030aa4702ee47667085a8e571c266b9623294c4ac68e7c8da/merged/builds\\\" caused \\\"mkdir /var/lib/docker/overlay2/19e6d863d0d048b030aa4702ee47667085a8e571c266b9623294c4ac68e7c8da/merged/builds: no space left on device\\\"\"": unknown (docker.go:817:0s)

Dkron job failed to execute at the scheduled time

I'm using Dkron 0.10.4. I created scheduled jobs for Dkron and they were working fine previously, but suddenly the jobs are not executed at the scheduled time. It shows the output as
rpc error: code = Unknown desc = exit status 1
and the status "False".
You could open dkron.yml in /etc/dkron and add a debug line like:
log-level: debug
Then run the Dkron agent:
# dkron agent --config /etc/dkron/dkron.yml
You should see debug lines with more information about your rpc error code.

Container exited with a non-zero exit code 50 when saving a Spark DataFrame to HDFS

I am running a small PySpark script in which I extract some data from HBase tables and create a PySpark DataFrame. I am trying to save the DataFrame back to the local HDFS and am running into an exit code 50 error.
I am able to do the same operation successfully for comparatively smaller DataFrames, but am unable to do it for large files.
I can gladly share any code snippets and would appreciate any help. The entire environment from the Spark UI can also be shared as a screenshot.
This is the configuration for my Spark (2.0.0) properties (shown here as a dictionary), deployed in yarn-client mode.
configuration = {'spark.executor.memory': '4g',
                 'spark.executor.instances': '32',
                 'spark.driver.memory': '12g',
                 'spark.yarn.queue': 'default'}
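For reference, properties like these only take effect if they are applied when the Spark context/session is created; a rough sketch of how the dictionary above can be applied (the app name is a placeholder, and the exact builder calls are an assumption, not the code I actually run):
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Apply the property dictionary shown above when building the session.
# Note: spark.driver.memory normally has to be set before the driver JVM
# starts (e.g. via spark-submit --driver-memory) to have any effect.
conf = SparkConf()
for key, value in configuration.items():
    conf.set(key, value)

# master 'yarn' with the default client deploy mode corresponds to the
# yarn-client deployment mentioned above.
spark = (SparkSession.builder
         .master('yarn')
         .appName('hbase_to_hdfs_example')  # placeholder name
         .config(conf=conf)
         .getOrCreate())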
After I obtain the dataframe, I am trying to save it as:
df.write.save('user//hdfs//test_df', format='com.databricks.spark.csv', mode='append')
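(As an aside, the path above has doubled slashes and no leading slash; a conventional absolute HDFS path would look more like the sketch below, where /user/hdfs/test_df is only an assumed location, not the real target.)
# Illustration only: the same write with an assumed absolute HDFS path.
df.write.save('/user/hdfs/test_df', format='com.databricks.spark.csv', mode='append')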
The following error block keeps repeating until the job fails. I believe it might be an OOM error, but I have tried giving as many as 128 executors, each with 16 GB of memory, to no avail.
Any workaround would be greatly appreciated.
Container exited with a non-zero exit code 50
17/09/25 15:19:35 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 64, fslhdppdata2611.imfs.micron.com): ExecutorLostFailure (executor 42 exited caused by one of the running tasks) Reason: Container marked as failed: container_e37_1502313369058_6420779_01_000043 on host: fslhdppdata2611.imfs.micron.com. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e37_1502313369058_6420779_01_000043
Exit code: 50
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : run as user is hdfsprod
main : requested yarn user is hdfsprod
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /opt/hadoop/data/03/hadoop/yarn/local/nmPrivate/application_1502313369058_6420779/container_e37_1502313369058_6420779_01_000043/container_e37_1502313369058_6420779_01_000043.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...

Why is the Spark application's final status FAILED when it finishes successfully?

My Spark 2.0.0 application runs on YARN 2.7.2. It finishes successfully, but YARN marks it as failed with the error:
Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
I see no errors on the executors or the driver, and the application writes the data it is supposed to.
This turned out to be caused by calling System.exit(0) explicitly in my code. After removing it, the problem was gone.
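For illustration, the shape of the fix in a PySpark-style sketch (my actual code calls System.exit(0) on the JVM side, so this is only an analogy; the app name is a placeholder):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('example_job').getOrCreate()  # placeholder app name

# ... do the actual work and write the output ...

# Stop Spark explicitly and let the process exit on its own. Forcing an
# early exit (System.exit(0) / sys.exit(0)) before the final status is
# reported is what makes YARN record the application as FAILED.
spark.stop()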

NEO4J local server does not start

I am running Linux in VirtualBox and am having an issue that I did not encounter on my machine with Linux as the primary OS.
When launching the Neo4j service via sudo ./neo4j start in /opt/neo4j-community-2.3.1/bin, I get a timeout with the message: Failed to start within 120 seconds. Neo4j Server may have failed to start, please check the logs
My log from /opt/neo4j-community-2.3.1/data/graph.db/messages.log says:
http://pastebin.com/wUA715QQ
and data/log/console.log says:
2016-01-06 02:07:03.404+0100 INFO Successfully started database
2016-01-06 02:07:03.603+0100 INFO Successfully stopped database
2016-01-06 02:07:03.604+0100 INFO Successfully shutdown Neo4j Server
2016-01-06 02:07:03.608+0100 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.security.auth.FileUserRepository#9ab182' was successfully initialized, but failed to start. Please see attached cause exception. Starting Neo4j failed: Component 'org.neo4j.server.security.auth.FileUserRepository#9ab182' was successfully initialized, but failed to start. Please see attached cause exception.
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.security.auth.FileUserRepository#9ab182' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:67)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:234)
at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:97)
at org.neo4j.server.CommunityBootstrapper.start(CommunityBootstrapper.java:48)
at org.neo4j.server.CommunityBootstrapper.main(CommunityBootstrapper.java:35)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.security.auth.FileUserRepository#9ab182' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:462)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:194)
... 3 more
Caused by: java.nio.file.AccessDeniedException: /opt/neo4j-community-2.3.1/data/dbms/auth
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at java.nio.file.Files.readAllBytes(Files.java:3152)
at org.neo4j.server.security.auth.FileUserRepository.loadUsersFromFile(FileUserRepository.java:208)
at org.neo4j.server.security.auth.FileUserRepository.start(FileUserRepository.java:73)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Any idea why the server won't start?
Check the permissions on /opt/neo4j-community-2.3.1/data/dbms/auth.
See the line in the stack trace that says:
Caused by: java.nio.file.AccessDeniedException: /opt/neo4j-community-2.3.1/data/dbms/auth
