Unable to open ZFS pool because of incorrect pool configuration on Linux

After rebooting, ZFS was unable to open my main pool. The exact error I'm getting is:
"The pool metadata is corrupted and the pool cannot be opened"
When I checked the pool configuration using zpool status (from the recovery console), the configuration it displayed was all wrong: it listed several old drives whose data I had already moved to other drives.
Currently the output of zpool status looks like this:
  pool: pool
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-72
  scan: resilvered 511G in 12h39m with 0 errors on Sat Mar 14 06:14:34 2015
config:

        NAME                              STATE    READ WRITE CKSUM
        pool                              FAULTED     0     0     1  corrupted data
          raidz1-0                        ONLINE      0     0     8
            wwn-0x50014ee05943ce36-part4  ONLINE      0     0     0
            wwn-0x50014ee05943ce36-part5  ONLINE      0     0     1
            wwn-0x50014ee05943ce36-part6  ONLINE      0     0     0
            wwn-0x50014ee05943ce36-part7  ONLINE      0     0     0
            wwn-0x50014ee05943ce36-part8  ONLINE      0     0     1
My question is: why did the configuration suddenly revert to an old state after rebooting? (I checked with zpool status before rebooting and everything was fine, with no errors reported.) And how can I tell ZFS to correct the configuration so that I can open the pool and get my data back?
I'm running Fedora 20, kernel 3.18.7-100.
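No answer is included in this excerpt, but a common first step in this situation (a hedged sketch, not from the thread) is to sideline the possibly stale zpool.cache, re-scan devices by their stable IDs, and attempt a read-only rewind import before doing anything destructive:

zpool export pool                                        # ignore errors if the pool was never imported
mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak         # sideline a possibly stale cache file
zpool import -d /dev/disk/by-id                          # re-scan and list importable pools
zpool import -d /dev/disk/by-id -F -o readonly=on pool   # -F rewinds to the last consistent txg

If the read-only import succeeds, copy the data off before attempting a writable import.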

Related

YugaByte DB services go down when any of the yb-tservers stops working

I am new to Yugabyte DB. I have 6 pods, with names and states given below. One pod, yb-tserver-3, is in CrashLoopBackOff state. Now I am not able to connect to my DB through DBeaver, as I am getting:
error 1: FATAL: Remote error: Service unavailable (yb/rpc/service_pool.cc:223): OpenTable request on yb.tserver.PgClientService from [xxx.xxx.xxx.xxx.xxx]:xxxx dropped due to backpressure
error 2: connection timeout
NAME           READY   STATUS             RESTARTS   AGE
yb-master-0    2/2     Running            0          24h
yb-master-1    2/2     Running            0          24h
yb-master-2    2/2     Running            0          23h
yb-tserver-0   2/2     Running            230        14d
yb-tserver-1   2/2     Running            0          25h
yb-tserver-2   2/2     Running            0          23h
yb-tserver-3   1/2     CrashLoopBackOff   4          6m33s
Now my question is: if one of the yb-tservers is down, my YugaByte DB services should still be up and running, but here my DB is down and rejecting both application and DBeaver connections. How can I resolve this issue? I have seen many times that if one tablet server stops working, all connectivity is lost.
Please help me out.
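No answer is included in this excerpt. As a hedged first diagnostic (assuming standard kubectl access; the container name yb-tserver is the usual one in the Yugabyte Helm charts but may differ in yours), inspect why yb-tserver-3 keeps crashing before touching cluster-level settings:

kubectl describe pod yb-tserver-3                    # events: OOMKilled, failed mounts, probe failures
kubectl logs yb-tserver-3 -c yb-tserver --previous   # log from the last crashed container
kubectl get pvc | grep yb-tserver-3                  # its persistent volume claims should be Bound

With replication factor 3, losing a single tserver should not reject all connections; the "dropped due to backpressure" error suggests the surviving tservers are overloaded, so the crash logs are the place to start.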

ORA-01034: ORACLE not available ORA-27101: shared memory realm does not exist Linux-x86_64 Error: 2: No such file or directory

I am running Oracle 11g on a Linux server, and one of the database issues below occurs suddenly (every 2 or 3 weeks):
Sometimes:
ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 1
Additional information: 163844
Additional information: 8
And last time:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
When I tried to start up the database after setting the SID, I got the error below:
SQL> startup
ORA-00845: MEMORY_TARGET not supported on this system
I rebooted the server and then everything was OK.
My page size: 4096
kernel.shmall = 4294967296
How can I prevent these issues from happening again? Should I update anything in the Oracle memory settings?
Make sure your /dev/shm allocation is greater than what you have set for MEMORY_MAX_TARGET.
Example fix for a memory allocation of 4 GB:
mount -o remount,size=4096m /dev/shm
Entry for the /etc/fstab file to make the change permanent:
tmpfs /dev/shm tmpfs size=4096m 0 0
Also see Oracle Support Doc ID 1399209.1: ORA-00845 - Which value for /dev/shm is needed to startup database without ORA-00845
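To confirm the two values actually line up (a hedged check, not part of the original answer), compare the mounted size of /dev/shm against the instance's memory parameters:

df -h /dev/shm          # the tmpfs size shown here must exceed MEMORY_MAX_TARGET
sqlplus / as sysdba
SQL> show parameter memory_max_target
SQL> show parameter memory_target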
This is what worked for me; my ORACLE_SID, ORACLE_HOME, etc. were fine.
Restart the listener, then start the instance:
lsnrctl start
sqlplus /nolog
connect / as sysdba
startup

Creating zfs zpool on initiated iSCSI disk on FreeBSD

I have properly connected an iSCSI target to my FreeBSD host using iscsictl. The new device shows up as da7. geom disk list shows:
Geom name: da7
Providers:
1. Name: da7
Mediasize: 4294967296000 (3.9T)
Sectorsize: 512
Stripesize: 8192
Stripeoffset: 0
Mode: r0w0e0
descr: SYNOLOGY iSCSI Storage
lunname: SYNOLOGYiSCSI Storage:44281bed-ce3d-4a9f-b95e-c89b6c74c345
lunid: 600140544281beddce3dd4a9fdb95edc
ident: 44281bed-ce3d-4a9f-b95e-c89b6c74c345
rotationrate: unknown
fwsectors: 63
fwheads: 255
I wanted to create a new ZFS zpool on this single disk with the command:
zpool create backuppool /dev/da7
The zpool command then uses a lot of CPU but never finishes (I let it run for 2 hours). If I create a UFS filesystem on the properly partitioned disk, the process is extremely fast. Likewise, if I create a pool on a different raw disk, zpool finishes within seconds. After some research I could not find any information on whether creating a zpool on an iSCSI target is supported at all. Has anyone gotten this working?
Tested on: FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 06:12:40 UTC 2017
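No answer is included in this excerpt. As a hedged way to narrow the problem down (assuming the device really is da7; note the dd test is destructive), watch the device while zpool create hangs and rule out raw-write stalls:

gstat -f '^da7$'                              # run in a second terminal; zero activity during zpool create means I/O is stalled
dd if=/dev/zero of=/dev/da7 bs=1m count=256   # WARNING: destroys data on da7; tests raw sequential write speed

If raw writes are fast, one speculative thing to try, given the 8 KiB stripesize the LUN reports, is raising the minimum ashift before re-running zpool create:

sysctl vfs.zfs.min_auto_ashift=13             # 2^13 = 8192, matching the reported stripesize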

Node state=down with TORQUE v6.1.0 on a Workstation

I was installing Torque 6.1.0 on an Ubuntu 16.04 workstation, but the installation doesn't seem to recognize how many cores and threads the machine has. The only node I set up shows "state = down", and any job triggers an error saying "not enough of the right type of nodes". In fact, the workstation has 56 threads (28 physical cores on 2 processors), and I only want to use 54 threads (27 physical cores) for the shared computing jobs. I realized this might be related to the cgroup or NUMA configuration introduced in Torque v6.0, which I am not sure I handled correctly while installing. I did have cgroups enabled, but I'm not sure whether I also need to enable the NUMA-aware functionality. Below is some output from my current configuration. What should I do? Thanks.
$ pbsnodes
node1
     state = down
     power_state = Running
     np = 54
     ntype = cluster
     mom_service_port = 15002
     mom_manager_port = 15003
     total_sockets = 0
     total_numa_nodes = 0
     total_cores = 0
     total_threads = 0
     dedicated_sockets = 0
     dedicated_numa_nodes = 0
     dedicated_cores = 0
     dedicated_threads = 0
$ lssubsys -am
cpuset /sys/fs/cgroup/cpuset
cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
blkio /sys/fs/cgroup/blkio
memory /sys/fs/cgroup/memory
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio
perf_event /sys/fs/cgroup/perf_event
hugetlb /sys/fs/cgroup/hugetlb
pids /sys/fs/cgroup/pids
There is also a fishy part: the server does not seem to recognize the node I already defined in its nodes file. This can be seen in the /var/spool/torque/server_logs log file:
12/27/2016 15:48:33.147;01;PBS_Server.2692;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node node1 is reporting on node NapaValley, which pbs_server doesn't know about
12/27/2016 15:49:18.232;01;PBS_Server.2692;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node node1 is reporting on node NapaValley, which pbs_server doesn't know about
12/27/2016 15:49:25.491;08;PBS_Server.2696;Job;0.NapaValley;Job deleted at request of cquic@localhost
12/27/2016 15:49:27.023;08;PBS_Server.2657;Job;0.NapaValley;on_job_exit valid pjob: 0.NapaValley (substate=59)
12/27/2016 15:49:32.996;256;PBS_Server.2657;Job;0.NapaValley;dequeuing from batch, state COMPLETE
12/27/2016 15:49:59.722;256;PBS_Server.2696;Job;1.NapaValley;enqueuing into batch, state 1 hop 1
12/27/2016 15:49:59.722;08;PBS_Server.2696;Job;perform_commit_work;job_id: 1.NapaValley
12/27/2016 15:49:59.722;02;PBS_Server.2696;node;close_conn;Closing connection 9 and calling its accompanying function on close
12/27/2016 15:49:59.795;64;PBS_Server.2692;Req;node_spec;job allocation request exceeds currently available cluster nodes, 1 requested, 0 available
12/27/2016 15:49:59.796;08;PBS_Server.2692;Job;1.NapaValley;Job Modified at request of root@localhost
12/27/2016 15:50:03.312;01;PBS_Server.2696;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node node1 is reporting on node NapaValley, which pbs_server doesn't know about
On my /etc/hosts, I have
127.0.0.1 localhost node1
127.0.0.1 NapaValley
PS: I have tried mounting cpu and the other subsystems under the /var/spool/torque/cgroup directories, but lssubsys -am still shows the same information as above. I assume they should have been mounted?
A node reports to the server with the name returned by the gethostbyname call. Based on the log lines you posted, the server and the node don't agree on that name. You can have pbs_mom report a different name by starting it with the -H option:
http://docs.adaptivecomputing.com/torque/6-0-2/adminGuide/help.htm#topics/torque/commands/pbs_mom.htm#-h
"-H hostname Sets the MOM's hostname. This can be useful on multi-homed networks."
This is equivalent to setting $mom_host node1 in /var/spool/torque/mom_priv/config.
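A hedged sketch of the fix (paths are the TORQUE defaults; adjust to your install). The goal is that the name the MOM reports, the name in the server's nodes file, and hostname resolution all agree:

# /var/spool/torque/mom_priv/config -- force the name the MOM reports
$mom_host node1

# /var/spool/torque/server_priv/nodes -- the server must know that same name
node1 np=54

Then restart pbs_mom and pbs_server so both sides re-register. It may also help to make node1 resolve to the machine's real IP in /etc/hosts rather than sharing 127.0.0.1 with NapaValley, since gethostbyname on 127.0.0.1 can return either name.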

curl error 23 using vagrant on OSX 10.10.5

I'm taking a Udacity class on Linux shell commands, on OS X 10.10.5. I installed Ubuntu in VirtualBox (VirtualBox 5.0.20 for OS X hosts, amd64, from https://www.virtualbox.org/wiki/Downloads, as instructed). The class uses this Ubuntu VM, plus Vagrant (from https://releases.hashicorp.com/vagrant/1.8.1/vagrant_1.8.1.dmg) to connect the terminal to the VM. Using the VM keeps files consistent, since commands build on each other throughout the class.
One task (which is minor and not graded) is to run the following command:
curl http://udacity.github.io/ud595-shell/stuff.zip -o things.zip
This command should hit the net and download a zip file named things.zip. It fails for me with the output below:
vagrant@vagrant-ubuntu-trusty-64:/$ curl http://udacity.github.io/ud595-shell/stuff.zip -o things.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0Warning: Failed to create the file things.zip: Permission denied
  0  144k    0   796    0     0   3241      0  0:00:45 --:--:--  0:00:45  3235
curl: (23) Failed writing body (0 != 796)
vagrant@vagrant-ubuntu-trusty-64:/$
So I get error 23 and am not sure why (Googling hasn't turned up an answer). I'm guessing there is a permission error, but I'm not sure where to start.
You're missing write permission on the directory you're in when downloading the file. You can check this by changing to a directory like /tmp and trying again there.
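In the transcript above the prompt ends in ":/$", so the working directory is /, the filesystem root, which the vagrant user cannot write to. A short sketch of the check and fix (the URL is the one from the question):

ls -ld .                 # shows the owner and permissions of the current directory
cd ~                     # or: cd /tmp -- move somewhere writable
curl http://udacity.github.io/ud595-shell/stuff.zip -o things.zip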
