RHEL: cgroup change of group failed

When I run the following command, I get cgroup change of group failed:
cgexec --sticky -g *:/throttle some_task
Cgroup throttle is defined in cgconfig.conf, which looks like this:
# Configuration file generated by cgsnapshot
mount {
cpuset = /cgroup/cpuset;
cpu = /cgroup/cpu;
cpuacct = /cgroup/cpuacct;
memory = /cgroup/memory;
devices = /cgroup/devices;
freezer = /cgroup/freezer;
net_cls = /cgroup/net_cls;
blkio = /cgroup/blkio;
}
group throttle {
cpu {
cpu.rt_period_us="1000000";
cpu.rt_runtime_us="0";
cpu.cfs_period_us="1000000";
cpu.cfs_quota_us="500000";
cpu.shares="1024";
}
}
group throttle {
memory {
memory.memsw.failcnt="0";
memory.limit_in_bytes="1073741824";
memory.memsw.max_usage_in_bytes="0";
memory.move_charge_at_immigrate="0";
memory.swappiness="60";
memory.use_hierarchy="0";
memory.failcnt="0";
memory.soft_limit_in_bytes="134217728";
memory.memsw.limit_in_bytes="1073741824";
memory.max_usage_in_bytes="0";
}
}
group throttle {
blkio {
blkio.throttle.write_iops_device="8:0 10";
blkio.throttle.read_iops_device="8:0 10";
blkio.throttle.write_bps_device="";
blkio.throttle.read_bps_device="";
blkio.weight="500";
blkio.weight_device="";
}
}
I have searched far and wide and haven't a clue how to start troubleshooting this. This error seems to be commonly associated with incorrect permissions. However, I don't define permissions (the documentation for cgroups says that this is optional), and I'm running the process as root.

Figured it out. For some reason, cgexec on my system is not liking the wildcard (*) for the controller. When I listed controllers by name, it worked:
cgexec --sticky -g "cpu,memory,blkio":/throttle some_task
The manpage for cgexec on my system lists *:<group_name> as valid syntax, however, so I'm not sure what exactly is going on. Either way, it's working correctly when the controllers are specified.
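One way to check whether the task really landed in the group for each controller is to look at /proc/<pid>/cgroup, where every line has the form hierarchy-ID:controller-list:cgroup-path. Below is a minimal sketch of such a check in C. The function name cgroup_line_matches is made up for illustration and is not part of libcgroup.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not a libcgroup API): returns true if one line of
 * /proc/<pid>/cgroup, formatted as "hierarchy-ID:controller-list:cgroup-path",
 * lists `controller` and places the task in the cgroup `path`.
 */
bool cgroup_line_matches(const char *line, const char *controller, const char *path)
{
    const char *first = strchr(line, ':');
    if (!first)
        return false;
    const char *list = first + 1;            /* start of the controller list */
    const char *second = strchr(list, ':');
    if (!second)
        return false;

    /* Compare the cgroup path, ignoring a trailing newline. */
    const char *cg_path = second + 1;
    size_t path_len = strcspn(cg_path, "\n");
    if (path_len != strlen(path) || strncmp(cg_path, path, path_len) != 0)
        return false;

    /* Walk the comma-separated controller list looking for an exact match. */
    size_t want = strlen(controller);
    const char *tok = list;
    while (tok < second) {
        const char *end = memchr(tok, ',', (size_t)(second - tok));
        if (!end)
            end = second;
        if ((size_t)(end - tok) == want && strncmp(tok, controller, want) == 0)
            return true;
        tok = end + 1;
    }
    return false;
}
```

You would call it on each line read with fgets() from /proc/<pid>/cgroup, once per controller you expected cgexec to apply (cpu, memory, blkio in the command above).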

Related

How to script stap (systemtap) to see if some process has called specific kernel function?

Using stap, I can write *.stp file to
Either track a process's action like:
probe process("mytest").begin
{
printf("Caught mytest process\n")
}
Or to track if a kernel function is called by any process:
probe kernel.function("do_exit").call #all processes
{
printf("called kernel/exit.c: do_exit\n")
}
But my requirement is: to track the kernel function call from specific process names, like tracking "sys_open" called by "mytest" processes.
How to write this .stp statement/function?
Thanks!
I found a way to do that: use a variable indicating the program name
global prog_name = "mytest";
probe kernel.function("do_exit").call
{
if (execname() == prog_name) {
printf("called kernel/exit.c: do_exit\n");
}
}

Kernel API to get Physical RAM Offset

I'm writing a device driver (for Linux kernel 2.6.x) that interacts directly with physical RAM using physical addresses. For my device's memory layout (according to the output of cat /proc/iomem), System RAM begins at physical address 0x80000000; however, this code may run on other devices with different memory layouts so I don't want to hard-code that offset.
Is there a function, macro, or constant which I can use from within my device driver that gives me the physical address of the first byte of System RAM?
It doesn't matter, because you're asking an XY question.
You should not be looking for or trying to use the "first byte of System RAM" in a device driver.
The driver only needs knowledge of the address (and length) of its register block (that is what this "memory" is for, isn't it?).
In 2.6 kernels (i.e. before Device Tree), this information was typically passed to drivers through struct resource and struct platform_device definitions in a board_devices.c file.
The IORESOURCE_MEM property in the struct resource is the mechanism to pass the device's memory block start and end addresses to the device driver.
The start address is typically hardcoded, and taken straight from the SoC datasheet or the board's memory map.
If you change the SoC, then you need new board file(s).
As an example, here's code from arch/arm/mach-at91/at91rm9200_devices.c to configure and set up the MMC devices for an eval board (AT91RM9200_BASE_MCI is the physical memory address of this device's register block):
#if defined(CONFIG_MMC_AT91) || defined(CONFIG_MMC_AT91_MODULE)
static u64 mmc_dmamask = DMA_BIT_MASK(32);
static struct at91_mmc_data mmc_data;
static struct resource mmc_resources[] = {
[0] = {
.start = AT91RM9200_BASE_MCI,
.end = AT91RM9200_BASE_MCI + SZ_16K - 1,
.flags = IORESOURCE_MEM,
},
[1] = {
.start = AT91RM9200_ID_MCI,
.end = AT91RM9200_ID_MCI,
.flags = IORESOURCE_IRQ,
},
};
static struct platform_device at91rm9200_mmc_device = {
.name = "at91_mci",
.id = -1,
.dev = {
.dma_mask = &mmc_dmamask,
.coherent_dma_mask = DMA_BIT_MASK(32),
.platform_data = &mmc_data,
},
.resource = mmc_resources,
.num_resources = ARRAY_SIZE(mmc_resources),
};
void __init at91_add_device_mmc(short mmc_id, struct at91_mmc_data *data)
{
if (!data)
return;
/* input/irq */
if (data->det_pin) {
at91_set_gpio_input(data->det_pin, 1);
at91_set_deglitch(data->det_pin, 1);
}
if (data->wp_pin)
at91_set_gpio_input(data->wp_pin, 1);
if (data->vcc_pin)
at91_set_gpio_output(data->vcc_pin, 0);
/* CLK */
at91_set_A_periph(AT91_PIN_PA27, 0);
if (data->slot_b) {
/* CMD */
at91_set_B_periph(AT91_PIN_PA8, 1);
/* DAT0, maybe DAT1..DAT3 */
at91_set_B_periph(AT91_PIN_PA9, 1);
if (data->wire4) {
at91_set_B_periph(AT91_PIN_PA10, 1);
at91_set_B_periph(AT91_PIN_PA11, 1);
at91_set_B_periph(AT91_PIN_PA12, 1);
}
} else {
/* CMD */
at91_set_A_periph(AT91_PIN_PA28, 1);
/* DAT0, maybe DAT1..DAT3 */
at91_set_A_periph(AT91_PIN_PA29, 1);
if (data->wire4) {
at91_set_B_periph(AT91_PIN_PB3, 1);
at91_set_B_periph(AT91_PIN_PB4, 1);
at91_set_B_periph(AT91_PIN_PB5, 1);
}
}
mmc_data = *data;
platform_device_register(&at91rm9200_mmc_device);
}
#else
void __init at91_add_device_mmc(short mmc_id, struct at91_mmc_data *data) {}
#endif
ADDENDUM
i'm still not seeing how this is an xy question.
I consider it an XY question because:
You conflate "System RAM" with physical memory address space.
"RAM" would be actual (readable/writable) memory that exists in the address space.
"System memory" is the RAM that the Linux kernel manages (refer to your previous question).
Peripherals can have registers and/or device memory in (physical) memory address space, but this should not be called "System RAM".
You have not provided any background on how or why your driver "interacts directly with physical RAM using physical addresses." in a manner that is different from other Linux drivers.
You presume that a certain function is the solution for your driver, but you don't know the name of that function. That's a prototype for an XY question.
can't i call a function like get_platform_device (which i just made up) to get the struct platform_device and then find the struct resource that represents System RAM?
The device driver would call platform_get_resource() (in its probe function) to retrieve its struct resource that was defined in the board file.
To continue the example started above, the driver's probe routine has:
static int __init at91_mci_probe(struct platform_device *pdev)
{
struct mmc_host *mmc;
struct at91mci_host *host;
struct resource *res;
int ret;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res)
return -ENXIO;
if (!request_mem_region(res->start, resource_size(res), DRIVER_NAME))
return -EBUSY;
...
/*
* Map I/O region
*/
host->baseaddr = ioremap(res->start, resource_size(res));
if (!host->baseaddr) {
ret = -ENOMEM;
goto fail1;
}
that would allow me to write code that can always access the nth byte of RAM, without assumptions of how RAM is arranged in relation to other parts of memory.
That reads like a security hole or a potential bug.
I challenge you to find a driver in the mainline Linux kernel that uses the "physical address of the first byte of System RAM".
Your title is "Kernel API to get Physical RAM Offset".
The API you are looking for would seem to be the struct resource.
What you want to do seems to fly in the face of Linux kernel conventions. For the integrity and security of the system, drivers do not try to access any/every part of memory.
The driver will request and can be given exclusive access to the address space of its registers and/or device memory.
All RAM under kernel management is only accessed through well-defined conventions, such as buffer addresses for copy_to_user() or the DMA API.
A device driver simply does not have free rein to access any part of memory it chooses.
Once a driver is started by the kernel, there is absolutely no way it can disregard "assumptions of how RAM is arranged".
RAM memory mapping is specific to each SoC or processor. Vendors usually provide memory-map documentation to their users.
You probably need to refer to your processor's datasheet or related documentation.
As you said, the memory is hard-coded. In most cases the memory mapping is hard-coded in the device tree or in the SDRAM driver itself.
Maybe you could look into memblock struct and memblock_start_of_DRAM().
/sys/kernel/debug/memblock/memory represents the memory banks:
0: 0x0000000940000000..0x0000000957bb5fff
RAM bank 0: 0x0000000940000000 0x0000000017bb6000
1: 0x0000000980000000..0x00000009ffffffff
RAM bank 1: 0x0000000980000000 0x0000000080000000
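The two notations in that output describe the same banks: the START..END lines give an inclusive end address, so the size is END - START + 1. Here is a small userspace sketch in C that parses one such line and derives the size. The function name parse_memblock_line is invented for this example, and the line format is assumed to be exactly what is shown above.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse one line of the form "N: 0xSTART..0xEND"
 * (END inclusive), as shown in the debugfs output above.
 * Returns 0 on success and fills in the bank start and size, -1 otherwise.
 */
int parse_memblock_line(const char *line,
                        unsigned long long *start,
                        unsigned long long *size)
{
    unsigned long long first, last;

    /* %llx accepts an optional "0x" prefix; %*d skips the bank index. */
    if (sscanf(line, " %*d: %llx..%llx", &first, &last) != 2)
        return -1;
    if (last < first)
        return -1;

    *start = first;
    *size = last - first + 1;   /* the end address is inclusive */
    return 0;
}
```

For the first line above this yields start 0x940000000 and size 0x17bb6000, which matches the "RAM bank 0" line.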

Spark/Gradle -- Getting IP Address in build.gradle to use for starting master and workers

I understand at a basic level the various moving parts of build.gradle build scripts but am having trouble tying it all together.
In Apache Spark standalone mode, I'm just trying to start a master and worker on the same box from build.gradle. (Later I will extend this with a call to $SPARK_HOME/sbin/start-slaves with the proper argument for masterIP.)
Question: How can I assign my IP address to a variable in Groovy/build.gradle so I can pass it to a command in an Exec task? We want this to run on a couple different development machines.
We have a (I think fairly standard) /etc/hosts config with the FQDN and hostname assigned to 127.0.1.1. The driver gets around this OK, but starting the master and slaves with hostnames is not an option; I need the IP address.
I am trying:
task getMasterIP (type: Exec){
// declare script scope variable using no def or
executable "hostname"
args += "-I"
// need results of hostname call assigned to script scope variable
sparkMasterIP = <resultsOfHostnameCall>
}
// added this because startSlave stops if Master is already running
task startSlaveOnly(dependsOn:'getMasterIP', type: Exec){
executable "/usr/local/spark/sbin/start-slave.sh"
args += "spark://$sparkMasterIP:7077"
doLast {
println "enslaved"
}
}
// now make startSlave call startSlaveOnly after the initial startMaster
task startSlave(dependsOn:'startMaster', type: Exec) {
finalizedBy 'startSlaveOnly'
}
When I try something like suggested in the docs for Exec for Groovy calls:
task getMasterIP (type: Exec){
// declare script scope variable using no def or
sparkMasterIP = executable "hostname"
args += "-I"
}
I get a warning that executable is not recognized.
The "for a little more background on what I am thinking" section, not the main question:
Googling "build.gradle script scope variables" and looking at the first two results, in the basic docs I only see one type of variable and ext properties to be used.
16.4. Declaring variables -- There are two kinds of variables that can be declared in a build script: local variables and extra properties.
But in this other Gradle doc, Appendix B. Potential Traps, I am seeing two kinds of variable scope aside from the ext properties:
For Gradle users it is important to understand how Groovy deals with
script variables. Groovy has two types of script variables. One with a
local scope and one with a script-wide scope.
With this example usage:
String localScope1 = 'localScope1'
def localScope2 = 'localScope2'
scriptScope = 'scriptScope'
I am assuming I should be using script-scope variables with no "def" or type declaration.
To fetch local IPs:
// Return all IPv4 addresses
def getLocalIPv4() {
def ip4s = []
NetworkInterface.getNetworkInterfaces()
.findAll { it.isUp() && !it.isLoopback() && !it.isVirtual() }
.each {
it.getInetAddresses()
.findAll { !it.isLoopbackAddress() && it instanceof Inet4Address }
.each { ip4s << it.getHostAddress() }
}
return ip4s
}
// Optionally, return all IPv6 addresses
def getLocalIPv6() {
def ip6s = []
NetworkInterface.getNetworkInterfaces()
.findAll { it.isUp() && !it.isLoopback() && !it.isVirtual() }
.each {
it.getInetAddresses()
.findAll { !it.isLoopbackAddress() && it instanceof Inet6Address }
.each { ip6s << it.getHostAddress() }
}
return ip6s
}
task printIP {
doLast {
println getLocalIPv4()
println getLocalIPv6()
}
}
The two functions above return a list of IPv4 or IPv6 addresses respectively. You might notice that I'm skipping all localhosts, interfaces that are not up, all loopbacks and virtual interfaces. If you want to use the first IPv4 address, you can use it elsewhere as:
getLocalIPv4()[0]
or in your case:
args += "spark://"+ getLocalIPv4()[0] + ":7077"
I found this post that appears to be a more straightforward way of doing this, but it is limited to Linux platforms: hostname -I doesn't work on Windows, and maybe not on all Linux distros?
getting hostname, assigning it to variable, using in a build.gradle task
Here's the task I built as a result. The accepted answer is much better and more universal; this is just another way of looking at it:
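task getMasterIP {
    doLast {
        new ByteArrayOutputStream().withStream { os ->
            exec {
                executable = 'hostname'
                args += '-I'
                // Capture stdout in the stream; without this, os stays empty
                standardOutput = os
            }
            // hostname -I may print several addresses separated by spaces
            ext.ipAddress = os.toString().trim()
        }
    }
}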
RaGe's answer does a better job of looking at all interfaces on all platforms

How to overwrite default cgroup cgconfig.conf using cgconfig.d?

The default cgroup config file cgconfig.conf provided with libcgroup contains:
mount {
cpuset = /cgroup/cpuset;
cpu = /cgroup/cpu;
cpuacct = /cgroup/cpuacct;
memory = /cgroup/memory;
devices = /cgroup/devices;
freezer = /cgroup/freezer;
net_cls = /cgroup/net_cls;
blkio = /cgroup/blkio;
}
I want to use something like this:
mount {
cpuset = /cgroup/cpu_and_mem;
cpu = /cgroup/cpu_and_mem;
cpuacct = /cgroup/cpu_and_mem;
memory = /cgroup/cpu_and_mem;
}
group cpu_memory_high {
cpu {
cpu.shares = 800;
}
cpuset {
cpuset.cpus="0-6";
}
memory {
memory.limit_in_bytes = 5G;
}
}
group cpu_memory_low {
cpu {
cpu.share = 200;
}
cpuset {
cpuset.cpus="8"
}
memory {
memory.limit_in_bytes = 500M;
}
}
I don't want to overwrite cgconfig.conf, so I tried to use cgconfig.d, placing the above settings in a new file abc.conf in that directory.
But these new settings didn't work for me.
Does anyone have an idea what's wrong with the above config?
For the /etc/cgconfig.d/ directory to be read, you have to add:
$CGCONFIGPARSER_BIN -L /etc/cgconfig.d/
after:
$CGCONFIGPARSER_BIN -l $CONFIG_FILE
line in the /etc/init.d/cgconfig file.
It works on Amazon Linux at least.
On CentOS 7, it seems /etc/cgconfig.d was enabled by default, but that no longer seems to be the case on CentOS 8. Here's how I fixed it:
Copy the default cgconfig unit file to cgconfigd.service:
sudo cp /usr/lib/systemd/system/cgconfig.service \
/etc/systemd/system/cgconfigd.service
Edit the file:
Change both occurrences of -l /etc/cgconfig.conf to -L /etc/cgconfig.d.
Under [Unit], add After=cgconfig.service
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable cgconfigd
sudo systemctl start cgconfigd
You have misunderstood how cgroups work. The mounts are arbitrary. Your group declarations create new directories in the mounts. There is never any need to combine subsystem directories and doing so will make it impossible to control tasks separately as they all use similar directory structures. It will more than likely cause cgroups to fail altogether.
The mounts you already have are fine, and the groups you set up will work as they are. You only need to reference the group name and the subsystem. Combining mounts will not make any difference.
See https://askubuntu.com/a/94743/170177

What is the valid address space for a user process? (OS X and Linux)

The mmap system call documentation says that the function will fail if:
MAP_FIXED was specified and the addr
argument was not page aligned, or part
of the desired address space resides
out of the valid address space for a
user process.
I can't find documentation anywhere saying what would be a valid address to map. (I'm interested in doing this on OS X and linux, ideally the same address would be valid for both...).
The Linux kernel reserves part of the virtual address space for itself; userspace has (almost) no access to it and can't map anything there. You're looking for what's called the "userspace/kernelspace split".
On the i386 architecture the default is the 3G/1G split: userspace gets the lower 3 GB of virtual address space and the kernel gets the upper 1 GB. There are also 2G/2G and 1G/3G splits:
config PAGE_OFFSET
hex
default 0xB0000000 if VMSPLIT_3G_OPT
default 0x80000000 if VMSPLIT_2G
default 0x78000000 if VMSPLIT_2G_OPT
default 0x40000000 if VMSPLIT_1G
default 0xC0000000
depends on X86_32
On x86_64, userspace lives in the lower half of the (currently) 48-bit virtual address space:
/*
* User space process size. 47bits minus one guard page.
*/
#define TASK_SIZE_MAX ((1UL << 47) - PAGE_SIZE)
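To make that limit concrete, here is a rough C sketch that reproduces the arithmetic and checks whether a candidate range stays below it. The names USER_ADDR_BITS, user_task_size_max and range_fits_user_space are made up for illustration, and the 47-bit figure is taken from the x86_64 excerpt above, so this says nothing about other architectures or about OS X. It only tests the upper bound; a range that passes can still be refused by mmap for other reasons.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 47 usable user-space address bits, as in the x86_64 excerpt. */
#define USER_ADDR_BITS 47

/* Same arithmetic as TASK_SIZE_MAX: 2^47 minus one guard page. */
uint64_t user_task_size_max(uint64_t page_size)
{
    return (UINT64_C(1) << USER_ADDR_BITS) - page_size;
}

/* True if [addr, addr + len) lies entirely below the user-space limit. */
bool range_fits_user_space(uint64_t addr, uint64_t len, uint64_t page_size)
{
    uint64_t limit = user_task_size_max(page_size);

    /* Written this way so that addr + len cannot overflow. */
    return len <= limit && addr <= limit - len;
}
```

With 4 KiB pages, user_task_size_max(4096) evaluates to 0x00007ffffffff000.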
This varies based on a number of factors, many of which aren't under your control. As adobriyan mentioned, depending on the OS, you have various fixed upper limits, beyond which kernel code and data lies. Usually this upper limit is at least 2GB on 32-bit OSes; some OSes provide additional address space. 64-bit OSes generally provide an upper limit controlled by the number of virtual address bits supported by your CPU (usually at least 40 bits worth of address space). However there are yet other factors beyond your control:
On recent versions of Linux, mmap mappings below the address configured in /proc/sys/vm/mmap_min_addr will be denied.
You cannot make mappings which overlap with any existing mappings. Since the dynamic linker is free to map anywhere that does not overlap with your executable's fixed sections, this means potentially any address may be denied.
The kernel may inject other additional mappings, such as the system call gate.
malloc may perform mmaps on its own, which are placed in a somewhat arbitrary location.
As such, there is no way to absolutely guarantee that MAP_FIXED will succeed, and so it should normally be avoided.
The only place I've seen where MAP_FIXED is necessary is in the Wine startup code, which reserves (using MAP_FIXED) all addresses above 2G, in order to avoid confusing Windows code which assumes no mappings will ever show up with a negative address. This is, of course, a highly specialized use of the flag.
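If you only have a preferred address, one alternative is to pass it as a hint without MAP_FIXED and check what the kernel actually returned. The following is a minimal sketch of that idea. The wrapper name map_with_hint is hypothetical, and it assumes an anonymous private mapping is what you want.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Some systems only spell the flag MAP_ANON. */
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/*
 * Hypothetical wrapper: request an anonymous private mapping, passing
 * `hint` as a preferred address but without MAP_FIXED, so the kernel is
 * free to choose another address if the hint is unusable.
 * Returns NULL on failure. If hint_honored is non-NULL, it is set to 1
 * when the mapping landed exactly at `hint`, and to 0 otherwise.
 */
void *map_with_hint(void *hint, size_t len, int *hint_honored)
{
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (hint_honored)
        *hint_honored = (p == hint);
    return p;
}
```

The caller then decides what to do when the hint was not honored (for example, fall back to offsets), instead of having the call fail outright.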
If you're trying to do this in order to avoid having to deal with offsets in shared memory, one option would be to wrap pointers in a class to automatically handle offsets:
#include <cstddef>

template<typename T>
class offset_pointer {
private:
    // Byte distance from this object to the target. 1 marks null, so that
    // 0 (pointing at this object itself) stays representable.
    std::ptrdiff_t offset;
public:
    typedef T value_type, *ptr_type;
    typedef const T const_type, *const_ptr_type;
    offset_pointer(T *p) { set(p); }
    offset_pointer() { set(NULL); }
    // The offset is relative to each object's own address, so a copy must
    // recompute it from the target rather than copy the raw offset.
    offset_pointer(const offset_pointer &other) { set(const_cast<T *>(other.get())); }
    void set(T *p) {
        if (p == NULL)
            offset = 1;
        else
            offset = (char *)p - (char *)this;
    }
    T *get() {
        if (offset == 1) return NULL;
        return (T*)( (char *)this + offset );
    }
    const T *get() const { return const_cast<offset_pointer *>(this)->get(); }
    T &operator*() { return *get(); }
    const T &operator*() const { return *get(); }
    T *operator->() { return get(); }
    const T *operator->() const { return get(); }
    operator T*() { return get(); }
    operator const T*() const { return get(); }
    offset_pointer &operator=(T *p) { set(p); return *this; }
    offset_pointer &operator=(const offset_pointer &other) {
        set(const_cast<T *>(other.get()));
        return *this;
    }
};
Note: This is untested code, but should give you the basic idea.
