I am configuring my ZooKeeper nodes, and when I restart my machine all the nodes are gone. I am using the command "create /[node_name] [data]".
If you're using the default zoo.cfg, your ZooKeeper data directory (dataDir) might be under /tmp, which gets cleared on boot. Try setting it to something else, like /var/lib/zookeeper.
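Also make sure you are creating your nodes with CreateMode.PERSISTENT (in zkCli, a plain create without -e already creates a persistent node).
EPHEMERAL nodes disappear when the session that created them ends.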
Did you figure this out? If not, one more wild guess:
If your zoo.cfg uses a relative path for dataDir, the data directory depends on where you start ZK from: if you start it from within the bin directory, the data ends up in one place, but if you start it from the directory above (via ./bin/...), it ends up one directory higher. I would always use an absolute path there.
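Both answers come down to where dataDir points. A minimal Python sketch of that check, assuming zoo.cfg consists of plain key=value lines and # comments; the function names are hypothetical, and it only reports problems, it does not change the file:

```python
from pathlib import PurePosixPath


def parse_cfg(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def data_dir_warnings(cfg_text: str) -> list[str]:
    """Return warnings about a dataDir that may not survive a reboot or a restart elsewhere."""
    data_dir = parse_cfg(cfg_text).get("dataDir")
    if data_dir is None:
        return ["dataDir is not set"]
    path = PurePosixPath(data_dir)
    if not path.is_absolute():
        # A relative path is resolved against the JVM's working directory,
        # i.e. wherever ZooKeeper was started from.
        return [f"dataDir {data_dir!r} is relative to the directory ZooKeeper is started from"]
    tmp = PurePosixPath("/tmp")
    if path == tmp or tmp in path.parents:
        return [f"dataDir {data_dir!r} is under /tmp, which may be cleared on boot"]
    return []


if __name__ == "__main__":
    print(data_dir_warnings("tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n"))
```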
I need to export a memory dump from an AKS cluster and save it somewhere.
How can I do it? Is it easy to export it to a storage account? Is there another solution? Can someone give me a step-by-step guide?
EDIT: the previous answer was wrong; I didn't notice that you needed a dump. You'll actually need to get it from Boot Diagnostics or from the command line:
https://learn.microsoft.com/en-us/azure/virtual-machines/troubleshooting/boot-diagnostics#enable-boot-diagnostics-on-existing-virtual-machine
This question is quite old, but let me nevertheless share how I solved it:
Linux has a resource limit called RLIMIT_CORE, which limits the size of the core dump written when your application crashes; this is the part you find quite quickly.
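In Python, for example, the standard resource module exposes this limit. A small sketch, assuming the process only raises its soft limit up to the existing hard limit (raising the hard limit itself needs privileges). The limit applies to the calling process and the children it starts, so it has to be set in, or before starting, the application you want dumps for:

```python
import resource


def enable_core_dumps() -> tuple[int, int]:
    """Raise the soft RLIMIT_CORE to the hard limit and return the new (soft, hard) pair."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to, but not beyond, the hard limit.
    # A soft limit of 0 disables core dumps entirely.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("unlimited" if soft == resource.RLIM_INFINITY else soft)
```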
Next, you have to define where core files are saved, which is done in the file /proc/sys/kernel/core_pattern. The value can be a relative file name (the core file is created in the current working directory of the crashing process), an absolute path (resolved within the crashing process's mount namespace) or, and this is where it gets interesting, a pipe symbol followed by the absolute path of an executable (application or script). According to the docs (see the headline "Piping core dumps to a program"), this program is started as user and group root; furthermore, according to this post on the Linux mailing list, it is also executed in the global namespace, in other words outside of the container.
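A small Python sketch that reads the current value and sorts it into the three cases described above. The function names are hypothetical; it only inspects the value and never writes to /proc:

```python
from pathlib import Path

CORE_PATTERN_FILE = Path("/proc/sys/kernel/core_pattern")


def classify_core_pattern(pattern: str) -> str:
    """Return 'pipe', 'absolute' or 'relative' for a core_pattern value."""
    value = pattern.strip()
    if value.startswith("|"):
        return "pipe"  # the core is fed to the program's stdin; the program runs as root
    if value.startswith("/"):
        return "absolute"
    return "relative"  # created in the crashing process's working directory


def current_core_pattern(path: Path = CORE_PATTERN_FILE) -> str:
    """Read the active pattern (Linux only)."""
    return path.read_text().strip()


if __name__ == "__main__":
    value = current_core_pattern()
    print(value, "->", classify_core_pattern(value))
```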
If, like me, you do not have access to the image used for new nodes in your AKS cluster, you will want to set these values using a DaemonSet, which runs one pod on every node.
Armed with all this knowledge, you can do the following:
Create a DaemonSet: a pod running on every machine that performs the initial setup.
This DaemonSet runs as a privileged container so that it can switch to the root namespace.
After switching namespaces successfully, it can change the value of /proc/sys/kernel/core_pattern.
The value should be something like |/bin/dd of=/core/%h.%e.%p.%t (dd takes the core file from stdin and saves it to the location given by its of parameter). Core files will now be saved in /core/. The parts of the file name are specifiers from the core file docs: %h is the hostname, %e the executable name, %p the PID and %t the time of the dump in seconds since the Epoch.
Now that the files will be saved to /core/ in the root namespace, we can mount our storage there; in my case, Azure File Storage. Here's a tutorial on how to mount Azure File Storage.
Pods in a DaemonSet have their RestartPolicy set to Always. Since your pod's job is done and you don't want it to be restarted over and over, keep it running with sleep infinity.
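To see which file names the pattern from the steps above produces, here is a Python sketch that expands only the four specifiers used there, plus %% for a literal percent sign. It is an illustration, not a reimplementation of the kernel: every other specifier is treated as unknown and dropped (the docs say the kernel drops unrecognized ones, but it recognizes many more, such as %u or %s), and the hostname, executable name, PID and timestamp below are made-up values:

```python
def expand_core_pattern(pattern: str, *, hostname: str, exe: str, pid: int, timestamp: int) -> str:
    """Expand %h, %e, %p, %t and %% in a core_pattern string (illustrative subset only)."""
    values = {"h": hostname, "e": exe, "p": str(pid), "t": str(timestamp), "%": "%"}
    out: list[str] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(pattern):
            break  # a lone trailing % is dropped
        # Specifiers outside this subset are dropped here.
        out.append(values.get(pattern[i + 1], ""))
        i += 2
    return "".join(out)


if __name__ == "__main__":
    print(expand_core_pattern("|/bin/dd of=/core/%h.%e.%p.%t",
                              hostname="aks-nodepool1-0", exe="myapp",
                              pid=4242, timestamp=1700000000))
```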
This write-up is almost a copy of what I found out while contacting Microsoft support. Here's the thread in their forum, which contains an almost finished configuration for a DaemonSet.
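The configuration from that thread is not reproduced here. As a rough, hypothetical sketch of the same idea, the following Python builds a DaemonSet manifest and prints it as JSON (kubectl apply -f accepts JSON). Assumptions, none from the original: the name core-pattern-setup, the kube-system namespace, a busybox image that provides nsenter, and a node OS whose sh and sleep support sleep infinity. It uses hostPID plus nsenter into PID 1's mount namespace to switch to the host, and it does not set up the Azure Files mount at /core:

```python
import json

# Pattern from the steps above; the /core mount itself is NOT configured here.
CORE_PATTERN = "|/bin/dd of=/core/%h.%e.%p.%t"


def core_pattern_daemonset(name: str = "core-pattern-setup",
                           namespace: str = "kube-system",
                           image: str = "busybox") -> dict:
    """Build a DaemonSet manifest (as a dict) that sets core_pattern on every node."""
    # Enter the host's mount namespace (host PID 1 is visible because of hostPID),
    # write the pattern, then idle so the Always restart policy does not rerun it.
    script = f"echo '{CORE_PATTERN}' > /proc/sys/kernel/core_pattern && sleep infinity"
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "hostPID": True,
                    "containers": [{
                        "name": name,
                        "image": image,
                        "securityContext": {"privileged": True},
                        "command": ["nsenter", "--target", "1", "--mount", "--",
                                    "sh", "-c", script],
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(core_pattern_daemonset(), indent=2))
```

Saved as, say, make_ds.py, it could be applied with python3 make_ds.py > ds.json followed by kubectl apply -f ds.json.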
I'll leave some links here which I used during my research:
how to generate core file in docker container?
How to access docker host filesystem from privileged container
https://medium.com/@patnaikshekhar/initialize-your-aks-nodes-with-daemonsets-679fa81fd20e
Sidenote:
I could also have mounted the Azure file share into every container and set /proc/sys/kernel/core_pattern to just /core/%h.%e.%p.%t, but that would require declaring the mount on every container. This way, I keep this administrative task out of the pod configurations and put it where it (in my opinion) belongs: the initial machine setup.
As far as I am aware, /var/mqsi/components/broker_name is the default directory in which the broker called broker_name stores its configuration information. Below this directory are the pid directories containing execution group information, stdout/stderr, and other things.
I don't know whether it's relevant to the issue we're seeing, but for historical reasons this directory is in a different location for our brokers (/var/broker_name/data/mqsi/components/broker_name).
We are using IIB 9.0.0.7 with multi-instance brokers (so all the directories are on the same HNAS, with different mounts). We have eight brokers, so /var/broker_name_1, /var/broker_name_2, etc. are each mount points, with the directories data/mqsi/components/broker_name_n below them.
Yesterday afternoon, the directory /var/broker_name_n/data/mqsi/components/broker_name_n disappeared from the file system for two of the brokers. The directory /var/broker_name_n/data/mqsi/components still exists, but the broker_name_n directory below components is gone.
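Since the brokers kept running for a while after the directory vanished, a periodic check might catch this before a restart does. A minimal Python sketch, assuming the layout described above; the broker names are the question's placeholders, not real names, and the check only reports missing directories, it does not recreate anything:

```python
from pathlib import Path


def component_dir(root: Path, broker: str) -> Path:
    """Layout from the question: <root>/<broker>/data/mqsi/components/<broker>."""
    return root / broker / "data" / "mqsi" / "components" / broker


def missing_component_dirs(root: Path, brokers: list[str]) -> list[Path]:
    """Return the component directories that no longer exist."""
    return [d for d in (component_dir(root, b) for b in brokers) if not d.is_dir()]


if __name__ == "__main__":
    brokers = [f"broker_name_{n}" for n in range(1, 9)]  # placeholder names for the eight brokers
    for path in missing_component_dirs(Path("/var"), brokers):
        print(f"MISSING: {path}")
```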
Because the processes were running in memory, the brokers and the applications on the execution groups carried on working until we restarted one of the brokers, at which point all of the execution groups disappeared. The other broker was still running, so I looked and saw that the directory was missing. Restarting the broker recreated the directory, but not the execution groups.
Has anyone seen this behaviour before? Could it be caused by something in IIB, or is it more likely something on the system? If it's a file-system issue, it's odd that it hit that specific directory on both servers.
In /var/log/messages we don't see any issues, except for errors from processes that can't find the files they need, e.g. BIP3108S: Unable to initialize the listener environment. Exception text getListenerParametersFromFile: java.io.FileNotFoundException: /var/broker_name_1/data/mqsi/components/broker_name_1/config/wsplugin6.conf (No such file or directory)
Is there any way to recover from this in IIB without reinstalling everything?
I recently discovered that on shutdown/reboot, any executable in /usr/lib/systemd/system-shutdown is run immediately before the actual halt/poweroff/reboot.
Paraphrased from https://www.freedesktop.org/software/systemd/man/systemd-halt.service.html
With the /usr filesystem being read-only on CoreOS, I cannot put any of my shutdown scripts in /usr/lib/systemd/system-shutdown. I'm hoping someone more knowledgeable about CoreOS and systemd knows an alternative directory on CoreOS nodes that would give the same result, or a setting I can adjust to point the directory to /etc/systemd/system-shutdown or somewhere else.
Alternatively, any pointers on creating a custom service that does the same thing as systemd-shutdown would be welcome.
My use case is that I have a few scripts I want to execute when a node shuts down: for example, removing the node from the monitoring system, and marking the node unschedulable in Kubernetes and draining any running pods while allowing in-flight transactions to finish.
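One common way to get the custom-service behaviour asked about above is a oneshot unit that does nothing on start, stays "active" via RemainAfterExit, and runs your script from ExecStop. Unit files can live in the writable /etc/systemd/system. The systemd docs describe network.target as a shutdown-ordering point: units ordered after it are stopped before the network goes down, which matters for draining via the Kubernetes API, whereas the system-shutdown executables only run after all services have been stopped. A Python sketch that renders and installs such a unit; the unit name, the script path /opt/bin/node-shutdown.sh and the 300-second timeout are hypothetical:

```python
from pathlib import Path

UNIT_TEMPLATE = """\
[Unit]
Description={description}
# Ordered after the network targets, so at shutdown this unit is stopped
# (and ExecStop runs) before the network is torn down.
Wants=network-online.target
After=network.target network-online.target

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop={script}
TimeoutStopSec={timeout}

[Install]
WantedBy=multi-user.target
"""


def render_shutdown_unit(script: str, description: str = "Node cleanup on shutdown",
                         timeout: int = 300) -> str:
    """Render a unit whose only job is to run `script` when the unit is stopped."""
    if not script.startswith("/"):
        raise ValueError("use an absolute path for ExecStop (older systemd requires it)")
    return UNIT_TEMPLATE.format(description=description, script=script, timeout=timeout)


def install_unit(text: str, name: str, unit_dir: Path = Path("/etc/systemd/system")) -> Path:
    """Write the unit file; then run: systemctl daemon-reload && systemctl enable --now <name>."""
    dest = unit_dir / name
    dest.write_text(text)
    return dest


if __name__ == "__main__":
    unit = render_shutdown_unit("/opt/bin/node-shutdown.sh")  # hypothetical script path
    print(install_unit(unit, "node-shutdown.service"))
```

The unit has to be started (enable --now), because systemd only runs ExecStop for units that are active when shutdown begins.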
This question might be a silly one, but since I am new to Hadoop and there is very little material online that can serve as a reference, I thought this might be the best place to ask it.
I have successfully configured a few computers in a multi-node configuration. During the setup process I had to change many Hadoop files. Now I am wondering: can I use each computer as a single-node configuration without changing any settings or Hadoop files?
You can make each node a separate instance, but you will definitely have to modify the configuration files and restart all the instances.
You can do that.
Follow the steps below:
Remove the IPs or hostnames from the masters file
Remove the IPs or hostnames from the slaves file
Change the IP address in the fs.defaultFS property in core-site.xml
Change the ResourceManager IP as well (yarn.resourcemanager.hostname in yarn-site.xml)
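The last two steps edit Hadoop's XML property files, which can be scripted. A sketch using Python's standard ElementTree, assuming a Hadoop 2+ setup with YARN; the values localhost and port 9000 are placeholders, and the masters/slaves files are plain text, so they are easier to edit by hand:

```python
import xml.etree.ElementTree as ET


def set_property(xml_text: str, name: str, value: str) -> str:
    """Set (or add) <property><name>name</name><value>value</value></property>."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Note: the default parser drops XML comments and the <?xml ...?> declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    core_site = ("<configuration><property><name>fs.defaultFS</name>"
                 "<value>hdfs://192.168.1.10:9000</value></property></configuration>")
    print(set_property(core_site, "fs.defaultFS", "hdfs://localhost:9000"))
    print(set_property("<configuration/>", "yarn.resourcemanager.hostname", "localhost"))
```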
I am a newbie. We have set up a Solr environment and are facing an issue with Nutch: disk space is 100% utilized. When we debugged it, we saw that the jobcache in the location below uses most of the space (approx. 70%).
"/tmp/hadoop-root/mapred/local/taskTracker/root/jobcache/".
I have searched many forums to understand what exactly this jobcache folder contains.
Can anyone help me understand what this jobcache folder contains and how I can stop this tmp folder from using up the disk space?
What effect will it have if I remove the jobcache folder and recreate it with the mkdir command?
Thanks in advance.
The directory name you mentioned is /tmp/hadoop-root/mapred/local/taskTracker/root/jobcache/.
This directory is used by the TaskTracker (slave) daemons to localize job files when tasks run on the slaves. When a job completes, the directories under jobcache should be cleaned up automatically.
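To check whether the space is held by a few leftover job directories that were not cleaned up, a read-only Python sketch can list the largest subdirectories under jobcache. It assumes each subdirectory corresponds to one job, and it deletes nothing:

```python
import os


def dir_size(path: str) -> int:
    """Total size in bytes of regular files below `path` (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file removed while walking, e.g. by task cleanup
    return total


def largest_children(path: str, n: int = 10) -> list[tuple[str, int]]:
    """The n largest immediate subdirectories of `path`, biggest first."""
    sizes = [(entry.name, dir_size(entry.path))
             for entry in os.scandir(path) if entry.is_dir(follow_symlinks=False)]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    jobcache = "/tmp/hadoop-root/mapred/local/taskTracker/root/jobcache"
    for name, size in largest_children(jobcache):
        print(f"{size / 2**20:10.1f} MiB  {name}")
```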
This email chain http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3C26850_1357828735_0MGE0023YZCTOO30_99DD75DC8938B743BBBC2CA54F7224A706D2E1AF#NYSGMBXB06.a.wcmc-ad.net%3E discussed a similar problem.