We are trying to use Slurm in our university lab, but we can't quite understand the SlurmUser behavior.
For instance:
If I run srun while I'm logged in as the user 'acnazarejr' (srun -n1 id -a), then I would expect something like this:
uid=80000001637(acnazarejr) gid=80000000253(domain user) groups=80000000253(domain user),1001(slurm)
But this is what I get:
uid=1001(slurm) gid=1001(slurm) groups=1001(slurm),27(sudo),999(docker)
Even if I run (srun --uid=80000001637 -n1 id -a), I get the same result. We are using LDAP across all nodes, and the 'slurm' user can't access the user's home folder, which is important to us.
Is this the expected behavior? I'm almost sure that in earlier tests I was getting my user as output instead of slurm, but I can't replicate it anymore.
Your slurm.conf probably contains
SlurmdUser=slurm
while it should be
SlurmdUser=root
The SlurmdUser is the user running the slurmd daemon, which must be root or another account able to demote itself to the submitting user's account.
It is not to be confused with SlurmUser, the user running the slurmctld daemon, which should be a regular user, often named slurm.
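For reference, a minimal slurm.conf excerpt showing the two settings side by side (the controller hostname here is only a placeholder):

SlurmctldHost=headnode   # placeholder controller hostname
SlurmUser=slurm          # account that runs slurmctld on the controller
SlurmdUser=root          # account that runs slurmd on the compute nodes; must be able to switch to the submitting user

After changing SlurmdUser, restart slurmd on the compute nodes (e.g. systemctl restart slurmd) for the change to take effect.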
I have created a Cassandra database in DataStax Astra and am trying to load a CSV file using DSBulk on Windows. However, when I run the dsbulk load command, the operation neither completes nor fails. I receive no error message at all, and I have to terminate the operation manually after several minutes. I have tried to wait it out and have let the operation run for 30 minutes or more with no success.
I know that a free tier of Astra might run slower, but wouldn't I see at least some indication that it is attempting to load data, even if slowly?
When I run the command, this is the output that is displayed and nothing further:
C:\Users\JT\Desktop\dsbulk-1.8.0\bin>dsbulk load -url test1.csv -k my_keyspace -t test_table -b "secure-connect-path.zip" -u my_user -p my_password -header true
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
Operation directory: C:\Users\JT\Desktop\dsbulk-1.8.0\bin\logs\LOAD_20210407-143635-875000
I know that DataStax recently changed Astra so that you need credentials from a generated Token to connect DSBulk, but I have a classic DB instance that won't accept those token credentials when entered in the dsbulk load command. So, I use my regular user/password.
When I check the DSBulk logs, the only text is the same output displayed in the console, which I have shown in the code block above.
If it means anything, I have the exact same issue when trying to run a dsbulk count operation.
I have the most recent JDK and have set both the JAVA_HOME and PATH variables.
I have also tried adding dsbulk/bin directory to my PATH variable and had no success with that either.
Do I need to adjust any settings in my Astra instance?
Lastly, is it possible that my basic laptop is simply not powerful enough for this operation, or is running the operation extremely slowly?
Any ideas or help is much appreciated!
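(For reference, two quick checks from the same prompt would at least confirm the toolchain, and DSBulk's log verbosity can be raised to get more diagnostic output; --log.verbosity is a documented DSBulk setting, and the last command below just adds it to the one already shown above:)

C:\Users\JT\Desktop\dsbulk-1.8.0\bin>dsbulk --version
C:\Users\JT\Desktop\dsbulk-1.8.0\bin>java -version
C:\Users\JT\Desktop\dsbulk-1.8.0\bin>dsbulk load -url test1.csv -k my_keyspace -t test_table -b "secure-connect-path.zip" -u my_user -p my_password -header true --log.verbosity 2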
I have a main server, 'A', hosting the Slurm cluster. The setup is working as expected.
I wanted to know if there is a way to submit jobs to that main server remotely from another server, 'B', and get the responses.
This situation arises because I don't want to give the users on 'B' access to the terminal of the main server 'A'.
I have gone through the documentation and FAQs, but unfortunately couldn't find the details.
If you install the Slurm client on server B, copy your slurm.conf to it, and then ensure it has the correct authentication (i.e., the correct Munge key), it should work.
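A minimal sketch of that setup on a Debian-based server B (package names and the /etc/slurm-llnl paths are Debian defaults and may differ on your system):

# on server B
apt-get install slurm-client munge
# copy the cluster config and the munge key from server A
scp serverA:/etc/slurm-llnl/slurm.conf /etc/slurm-llnl/slurm.conf
scp serverA:/etc/munge/munge.key /etc/munge/munge.key
chown munge:munge /etc/munge/munge.key && chmod 400 /etc/munge/munge.key
systemctl restart munge
# if everything is in place, this lists the partitions of the cluster on A
sinfo

From there, sbatch and srun on B submit to the controller on A just as they would when run on A itself.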
I'd like to do two things in sequence:
Submit a job with sbatch
Once the job has been allocated, retrieve the hostname of the allocated node and, using that name, execute a second command on the host (login) node.
Step 2 is the hard part. I suppose I could write a Python script that polls squeue. Is there a better way? Can I set up a callback that Slurm will execute once a job starts running?
(In case you're curious, my motivation is to launch a Jupyter notebook on a compute node and automatically set up ssh port forwarding as in this blog post.)
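For illustration, a minimal bash sketch of the squeue-polling approach (notebook.sbatch and port 8888 are placeholders; sbatch --parsable prints just the job ID):

# submit the notebook job and capture its ID
jobid=$(sbatch --parsable notebook.sbatch)
# poll until the job is actually running
# (a real script would also handle jobs that fail or finish immediately)
while [ "$(squeue -h -j "$jobid" -o %T)" != "RUNNING" ]; do
    sleep 5
done
# get the allocated node and forward the Jupyter port from the login node
node=$(squeue -h -j "$jobid" -o %N)
ssh -N -L 8888:localhost:8888 "$node"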
We have a SignalR push server using Mono/Owin on a Linux Debian server.
We performed a load test and we get different behaviour depending on how the push server is started by systemd.
Working:
ExecStart=/bin/su root -c '/usr/bin/mono --server mydaemon.exe -l:/var/run/mydaemon.pid'
Hanging after around 1k connections:
ExecStart=/usr/bin/mono --server mydaemon.exe -l:/var/run/mydaemon.pid
We can reproduce the different behaviour at any time: in the second case, the test client stays in the SignalR negotiate call without ever receiving an answer.
We also activated the export of the Mono "max threads" environment variables in both cases.
So the question is: what could be the difference in system resource usage/availability between these two cases?
In the systemd service definition, you can specify the limit for the number of open files, so if you add a line:
LimitNOFILE=65536
in the [Service] section of the service definition file, it should set the limit to that value, rather than the default which comes through systemd as 1024.
The systemd-system.conf file defines the system-wide defaults for these limits (e.g. DefaultLimitNOFILE), and the systemd.exec manual page documents the parameters that can be used to override the various limits per service.
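A minimal sketch of the relevant unit-file fragment, using the ExecStart line from the question (the limit value is just an example):

[Service]
ExecStart=/usr/bin/mono --server mydaemon.exe -l:/var/run/mydaemon.pid
LimitNOFILE=65536

After editing, run systemctl daemon-reload and restart the service. A plausible reason the su variant behaves differently is that su goes through PAM and can pick up a higher nofile limit from /etc/security/limits.conf, while a plain ExecStart inherits the systemd default of 1024.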
Every time we create a new server, I have a bash script that asks the end user a set of questions to help Chef configure the custom server. Their answers to those questions need to be injected into Chef so that I can use the responses within my Chef recipes (to set the server hostname to "server1.stack.com", for instance). I've read about a JSON attributes option when running chef-client that may be helpful, but I'm not sure how that would work in our environment.
Note: We run chef-client on all of our systems every 15 minutes via cronjob to get updates.
Pseudocode:
echo -n "What is the server name?"
read hostname
chef-client -j {'hostname' => ENV['$hostname']}
Two issues: first, -j takes a filename, not raw JSON; second, using -j will entirely override the node data coming from the server, which also includes the run list and environment. If this is being done at system provisioning time, you can definitely do stuff like this; see my AMI bootstrap script for an example. If this is done after initial provisioning, you are probably best off writing those responses to a file and then reading that in from your Chef recipe code.
Passing raw json into chef-client is possible, but requires a little creativity. You simply do something like this:
echo '{"hostname": "$hostname"}' | chef-client -j /dev/stdin
The values in your JSON will be deep merged with the "normal" attributes stored on the Chef server. You can also include a run_list in your JSON, which will replace (not be merged with) the run_list on the Chef server.
You can see the run_list replacing the server run list here:
https://github.com/opscode/chef/blob/cbb9ae97e2d3d90b28764fbb23cb8eab4dda4ec8/lib/chef/node.rb#L327-L338
And you can see the deep merge of attributes here:
https://github.com/opscode/chef/blob/cbb9ae97e2d3d90b28764fbb23cb8eab4dda4ec8/lib/chef/node.rb#L305-L311
Also, any attributes you declare in your json will override the attributes already stored on the chef-server.
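A minimal sketch of the file-based variant mentioned above, assuming the questions are asked once at provisioning time (the /etc/chef/first-boot.json path and the single hostname attribute are arbitrary choices):

#!/bin/bash
echo -n "What is the server name? "
read hostname

# write the answers to a JSON file; -j expects a filename, not raw JSON
cat > /etc/chef/first-boot.json <<EOF
{"hostname": "$hostname"}
EOF

chef-client -j /etc/chef/first-boot.json

Since chef-client runs every 15 minutes from cron afterwards, you would typically only pass -j on this first run; the normal attributes it sets persist on the node object for later runs.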