Explanation of fuzz and flat in input_absinfo struct in input.h - linux

I'm trying to tweak the sensitivity of a joystick which does not work correctly with SDL, using the EVIOCSABS call from input.h. I think that the fuzz and flat members of the input_absinfo struct affect the sensitivity of the axes, but after more than a few shots in the dark I'm still stumped as to how exactly they work. I am hoping someone can point me in the right direction.
Thank you for considering my question! Here's the code I have written in my Joystick class:
int Joystick::configure_absinfo(int axis, int fuzz, int flat)
{
    struct input_absinfo jabsx;
    int result_code = ioctl(joystick_fd, EVIOCGABS(axis), &jabsx);
    if (result_code < 0)
    {
        perror("ioctl GABS failed");
    }
    else
    {
        jabsx.fuzz = fuzz;
        jabsx.flat = flat;
        result_code = ioctl(joystick_fd, EVIOCSABS(axis), &jabsx);
        if (result_code < 0)
        {
            perror("ioctl SABS failed");
        }
    }
    return result_code;
}

Regarding the fuzz value: it seems to be a value used for absolute input devices. Looking at the documentation of input_absinfo in input.h
Link to input.h at lxr.linux.no
You can find that
fuzz: specifies fuzz value that is used to filter noise from the event stream.
This means that the input system in Linux will filter out events generated by the device driver when the difference from the last value is smaller than the fuzz. This is done in the input layer, not in the driver.
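Roughly speaking, the kernel absorbs small changes and smooths medium-sized ones rather than simply dropping them. Here is a minimal sketch paraphrasing input_defuzz_abs_event() from drivers/input/input.c (recent kernels; the exact thresholds may differ between versions):

/* Sketch of the input layer's fuzz handling: values inside the fuzz
 * band around the previous value are absorbed or smoothed instead of
 * being reported as new events. */
static int defuzz(int value, int old_val, int fuzz)
{
    if (fuzz) {
        /* change smaller than fuzz/2: treated as noise, keep old value */
        if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2)
            return old_val;
        /* change smaller than fuzz: move only a quarter of the way */
        if (value > old_val - fuzz && value < old_val + fuzz)
            return (old_val * 3 + value) / 4;
        /* change smaller than 2*fuzz: move halfway */
        if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2)
            return (old_val + value) / 2;
    }
    return value;
}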

The flat value determines the size of the dead zone ([source](https://wiki.archlinux.org/index.php/Gamepad#evdev_API_deadzones)). fuzz is far harder to find, but the best I can find are these docs:
Filtering (fuzz value): Tiny changes are not reported to reduce noise
So it would seem that any change smaller than fuzz should be filtered out / ignored.
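The flat value, by contrast, is not zeroed out by evdev itself; it is advertised through EVIOCGABS, and consumers of the axis (the joydev layer's default calibration, SDL, etc.) typically treat values within flat of the axis centre as the centre. A hypothetical helper showing that convention (apply_flat() is my name, not a library function):

#include <stdlib.h>

/* Clamp axis values inside the dead zone to the centre position. */
static int apply_flat(int value, int center, int flat)
{
    return (abs(value - center) <= flat) ? center : value;
}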

How does BPF calculate the number of CPUs for a PERCPU_ARRAY?

I have encountered an interesting issue where a PERCPU_ARRAY created on one system with 2 processors creates an array with 2 per-CPU elements and on another system with 2 processors, an array with 128 per-CPU elements. The latter was rather unexpected to me!
The way I discovered this behavior is that a program that allocated an array for the number of CPUs (using get_nprocs_conf(3)) and then read the PERCPU_ARRAY into it (using bpf_map_lookup_elem()) ended up writing past the end of the array and crashing.
I would like to find out the proper way for a program that reads BPF maps to determine the number of per-CPU elements in a PERCPU_ARRAY on a given system.
Failing that, I think the second-best approach is to pick a buffer for reading that is "large enough." Here the problem is similar: what is that number, and is there a way to learn it at runtime?
The answer comes from reading the source of bpftool, which figures this out:
unsigned int get_possible_cpus(void)
{
    int cpus = libbpf_num_possible_cpus();

    if (cpus < 0) {
        p_err("Can't get # of possible cpus: %s", strerror(-cpus));
        exit(-1);
    }
    return cpus;
}

int libbpf_num_possible_cpus(void)
{
    static const char *fcpu = "/sys/devices/system/cpu/possible";
    static int cpus;
    int err, n, i, tmp_cpus;
    bool *mask;
    /* ---8<--- snip */
}
So that's how they do it!
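For a program that does not link against libbpf, a rough standalone sketch of the same idea — counting the CPUs listed in /sys/devices/system/cpu/possible, which is a range list such as 0-127 or 0,3-5 — might look like this (error handling kept minimal):

#include <stdio.h>

/* Count CPUs in a sysfs range list such as "0-127" or "0,3-5". */
static int count_possible_cpus(void)
{
    FILE *f = fopen("/sys/devices/system/cpu/possible", "r");
    int count = 0, start, end;

    if (!f)
        return -1;
    while (fscanf(f, "%d", &start) == 1) {
        end = start;
        if (fgetc(f) == '-') {          /* range entry like "0-127" */
            if (fscanf(f, "%d", &end) != 1)
                break;
            fgetc(f);                   /* consume ',' or newline */
        }
        count += end - start + 1;
    }
    fclose(f);
    return count;
}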

Is CGAL 2D Regularized Boolean Set-Operations lib thread safe?

I am currently using the library mentioned in the title, see
CGAL 2D-reg-bool-set-op-pol
The library provides types for polygons and polygon sets which are internally represented as so called arrangements.
My question is: how far is this library thread-safe, that is, fit for parallel computation on its objects?
There are several levels at which thread safety could be guaranteed:
1) If I take an object from the library, like an arrangement
Polygon_set_2 S;
I might be able to execute
Polygon_2 P;
S.join(P);
and
Polygon_2 Q;
S.join(Q);
in two different concurrent execution units/threads in parallel without harm and get the right result, as if I had done everything sequentially. That would be the highest degree of thread safety/possible parallelism.
2) In fact, for me a much lesser degree would be enough. In that case S and P would be members of a class C, so that two class instances have different S and P instances. Then I would like to compute (say) S.join(P) in parallel for a list of instances of the class C, for example by calling a suitable member function of C with std::async.
Just to be complete, I insert here a bit of actual code from my project which gives more flesh to these terse descriptions.
// the following typedefs are more or less standard from the
// CGAL library examples.
typedef CGAL::Exact_predicates_exact_constructions_kernel Kernel;
typedef Kernel::Point_2 Point_2;
typedef Kernel::Circle_2 Circle_2;
typedef Kernel::Line_2 Line_2;
typedef CGAL::Gps_circle_segment_traits_2<Kernel> Traits_2;
typedef CGAL::General_polygon_set_2<Traits_2> Polygon_set_2;
typedef Traits_2::General_polygon_2 Polygon_2;
typedef Traits_2::General_polygon_with_holes_2 Polygon_with_holes_2;
typedef Traits_2::Curve_2 Curve_2;
typedef Traits_2::X_monotone_curve_2 X_monotone_curve_2;
typedef Traits_2::Point_2 Point_2t;
typedef Traits_2::CoordNT coordnt;
typedef CGAL::Arrangement_2<Traits_2> Arrangement_2;
typedef Arrangement_2::Face_handle Face_handle;
// the following type is not copied from the CGAL library example code but
// introduced by me
typedef std::vector<Polygon_with_holes_2> pwh_vec_t;
// the following is an excerpt of my full GerberLayer class,
// which retains only the data members used in the join()
// member function. This data is therefore local to the class instance.
class GerberLayer
{
public:
    GerberLayer();
    ~GerberLayer();
    void join();

    pwh_vec_t raw_poly_lis;
    pwh_vec_t joined_poly_lis;
    Polygon_set_2 Saux;
    annotate_vec_t annotate_lis;
    polar_vec_t polar_lis;
};
//
// it is not necessary to understand the workings of the function;
// I deleted all debug and timing output etc. It is just to "showcase" some
// typical operations from the CGAL regularized boolean set-operations
// library for polygons by Efi Fogel et al.
//
void GerberLayer::join()
{
    Saux.clear();
    auto it_annbase = annotate_lis.begin();
    annotate_vec_t::iterator itann = annotate_lis.begin();
    bool first_block = true;
    int cnt = 0;
    while (itann != annotate_lis.end()) {
        gpolarity akt_polar = itann->polar;
        auto itnext = std::find_if(itann, annotate_lis.end(),
                                   [=](auto a) { return a.polar != akt_polar; });
        Polygon_set_2 Sblock;
        if (first_block) {
            if (akt_polar == Dark) {
                Saux.join(raw_poly_lis.begin() + (itann - it_annbase),
                          raw_poly_lis.begin() + (itnext - it_annbase));
            }
            first_block = false;
        } else {
            if (akt_polar == Dark) {
                Saux.join(raw_poly_lis.begin() + (itann - it_annbase),
                          raw_poly_lis.begin() + (itnext - it_annbase));
            } else {
                Polygon_set_2 Saux1;
                Saux1.join(raw_poly_lis.begin() + (itann - it_annbase),
                           raw_poly_lis.begin() + (itnext - it_annbase));
                Saux.complement();
                pwh_vec_t auxlis;
                Saux1.polygons_with_holes(std::back_inserter(auxlis));
                Saux.join(auxlis.begin(), auxlis.end());
                Saux.complement();
            }
        }
        itann = itnext;
    }
ende:
    joined_poly_lis.clear();
    annotate_lis.clear();
    Saux.polygons_with_holes(std::back_inserter(joined_poly_lis));
}

int join_wrapper(GerberLayer* p_layer)
{
    p_layer->join();
    return 0;
}

// here the parallelism (of the "embarrassingly parallel" kind) occurs:
// for every GerberLayer a dedicated task is started, which calls
// the above GerberLayer::join() function
void Window::do_unify()
{
    std::vector<std::future<int>> fivec;
    for (int i = 0; i < gerber_layer_manager.num_layers(); ++i) {
        GerberLayer* p_layer = gerber_layer_manager.at(i);
        fivec.push_back(std::async(join_wrapper, p_layer));
    }
    int sz = wait_for_all(fivec); // written by me, not shown
}
One might think that 2) must be trivially possible, as only "different" instances of polygons and arrangements are in play. But: since the library works with arbitrary-precision points (Point_2t in my code above), it is imaginable that, for some implementation reason or other, all points are inserted into a list that is static to the class Point_2t, so that identical points are represented only once in this list. There would then be nothing like "independent instances of Point_2t", and consequently none for Polygon_2 or Polygon_set_2 either, and one could say farewell to thread safety.
I tried to resolve this question by googling (not by analyzing the library code, I have to admit) and would hope for an authoritative answer (hopefully positive as this primitive parallelism would greatly speed up my code).
Addendum:
1) I implemented this already and made a test run, with nothing exceptional occurring and visually plausible results, but of course this proves nothing.
2) The same question for the CGAL 2D-Arrangement-package from the same authors.
Thanks in advance!
P.S.: I am using CGAL 4.7 from the packages supplied with Ubuntu 16.04 (Xenial). A newer version on Ubuntu 18.04 gave me errors so I decided to stay with 4.7. Should a version newer than 4.7 be thread-safe, but not 4.7, of course I will try to use that newer version.
Incidentally, I could not find out whether the libcgal***.so libraries as supplied by Ubuntu 16.04 are thread-safe as described in the documentation. In particular, I found no reference to the macro CGAL_HAS_THREADS, which is mentioned in the "thread safety" part of the docs, when I looked through the build logs of the Xenial cgal package on Launchpad.
Indeed, there are several levels of thread safety.
The 2D Regularized Boolean Operations package depends on the 2D Arrangements package, and both packages depend on a kernel. For most operations the EPEC kernel is required.
Both packages are thread-safe, except for the rational-arc traits (Arr_rational_function_traits_2).
However, the EPEC kernel is not yet thread-safe when number-type objects are shared among threads. So if you, for example, construct different arrangements in different threads, each from its own input set of curves, you are safe.

Is there any kill_proc() replacement for proprietary Linux kernel drivers?

I'm in the process of porting 4 proprietary (read: non-GPL) Linux kernel drivers (that I didn't write) from RHEL 5.x to RHEL 6.x (2.6.32 kernel). The drivers all use kill_proc() for signalling the user-space "session", but this function has been removed from the more recent kernels (somewhere between 2.6.18 and 2.6.32). I've seen this question asked many times here and elsewhere and I've searched fairly extensively, but of the many suggested solutions, none work due to either the functions no longer being exported, or requiring a GPL-only function (see below). Does anyone know of a solution that could work for a proprietary driver?
given: kill_proc(pid, sig, 1);
The simplest solution I found was to use: kill_proc_info(sig, SEND_SIG_PRIV, pid); however kill_proc_info is no longer exported so it can't be used.
kill_pid_info() has been suggested (this is called by kill_proc_info() after setting an rcu_read_lock()). kill_pid_info() requires a struct pid*, so I could use: kill_pid_info(sig, SEND_SIG_PRIV, find_vpid(pid)); however, find_vpid() is exported for GPL use only and this is a proprietary driver. Is there another way to get the struct pid*?
kill_pid_info() also sets up an rcu_read_lock() and then calls group_send_sig_info(). Unfortunately, group_send_sig_info() is not exported, and it also requires a struct task_struct*, but the required find_task_by_vpid() function is not exported either.
Another suggestion was kill_pid(), but this also requires a struct pid*, and again, the function find_vpid() is only exported for GPL.
There were also suggestions for send_sig() and send_sig_info(), but these also require a struct task_struct*, and again, find_task_by_pid() is not exported, and pid_task() needs the struct pid* that only the GPL'd find_vpid() would provide. Also, these functions don't set up an rcu_read_lock(), and they pass a FALSE value for the group flag (whereas kill_proc() ended up using a TRUE value), so there could be some subtle differences.
That's all that I could find. Does anyone have a suggestion that will work for my case? Thanks in advance.
Since there have been no responses to my question, I've been reading much of the kernel code and I think I've found a solution.
It seems that the only exported function that provides the same semantics as kill_proc() is kill_pid(). We can't use the GPL find_vpid() function to get the needed struct pid*, but if we can get the struct task_struct*, then we can get the struct pid* from there as:
task->pids[PIDTYPE_PID].pid
Since find_task_by_vpid() is no longer exported, it seems the only way to find the task is to go through the entire task list looking for it. So the proposed solution is:
int my_kill_proc(pid_t pid, int sig)
{
    int error = -ESRCH;              /* default return value */
    struct task_struct* p;
    struct task_struct* t = NULL;
    struct pid* pspid;

    rcu_read_lock();
    p = &init_task;                  /* start at init */
    do {
        if (p->pid == pid) {         /* does the pid (not tgid) match? */
            t = p;
            break;
        }
        p = next_task(p);            /* "this isn't the task you're looking for" */
    } while (p != &init_task);       /* stop when we get back to init */
    if (t != NULL) {
        pspid = t->pids[PIDTYPE_PID].pid;
        if (pspid != NULL)
            error = kill_pid(pspid, sig, 1);
    }
    rcu_read_unlock();
    return error;
}
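For illustration, a hypothetical call site in one of the drivers (session_pid and the choice of SIGUSR1 are placeholders for whatever the original kill_proc(pid, sig, 1) call used):

/* Hypothetical replacement for the old kill_proc(session_pid, sig, 1). */
int err = my_kill_proc(session_pid, SIGUSR1);
if (err == -ESRCH)
    printk(KERN_WARNING "session process %d not found\n", session_pid);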
I know it will take a lot more time to search the whole task list rather than using the hash tables, but it's all I've got. Some concerns/questions that I have:
1) Is the rcu_read_lock() sufficient for this? Would it be better to use something like preempt_disable() instead?
2) Can the struct task_struct ever NOT have a PIDTYPE_PID entry in the pids array? And if so, is checking for NULL sufficient?
3) I'm new to working with the kernel; are there any other suggestions to make this better?

What is the OpenCL equivalent of this CUDA cudaMallocPitch code?

My PC has an AMD processor with an ATI 3200 GPU, which doesn't support OpenCL; everything runs by falling back to the CPU.
I am converting some code from CUDA to OpenCL but am stuck on one particular part for which there is no exact equivalent in OpenCL. Since I have little experience with OpenCL I can't work this out; please suggest a solution if you think one will work.
The CUDA code is,
size_t pitch = 0;
cudaError error = cudaMallocPitch((void**)&gpu_data, (size_t*)&pitch,
                                  instances->cols * sizeof(float), instances->rows);
for (int i = 0; i < instances->rows; i++) {
    error = cudaMemcpy((void*)(gpu_data + (pitch / sizeof(float)) * i),
                       (void*)(instances->data + (instances->cols * i)),
                       instances->cols * sizeof(float), cudaMemcpyHostToDevice);
}
If I remove the pitch value from the above, I end up with a problem where nothing is written to the device memory gpu_data.
Somebody please convert this code to OpenCL. I have converted it myself, but it's not working and the data is not written to gpu_data. My converted OpenCL code is
gpu_data = clCreateBuffer(context, CL_MEM_READ_WRITE,
                          (instances->cols) * (instances->rows) * sizeof(float),
                          NULL, &ret);
for (int i = 0; i < instances->rows; i++) {
    ret = clEnqueueWriteBuffer(command_queue, gpu_data, CL_TRUE, 0,
                               (instances->cols) * (instances->rows) * sizeof(float),
                               (void*)(instances->data + (instances->cols * i)),
                               0, NULL, NULL);
}
Sometimes it runs well with this code and then gets stuck in the reading part, i.e.
ret = clEnqueueReadBuffer(command_queue, gpu_data, CL_TRUE, 0,
                          sizeof(float) * instances->cols * 1,
                          instances->data, 0, NULL, NULL);
over here. And it gives an error like
Unhandled exception at 0x10001098 in CL_kmeans.exe: 0xC000001D: Illegal Instruction.
When Break is pressed, it gives:
No symbols are loaded for any call stack frame. The source code cannot be displayed.
while debugging. In the call stack it is displaying:
OCL8CA9.tmp.dll!10001098()
[Frames below may be incorrect and/or missing, no symbols loaded for OCL8CA9.tmp.dll]
amdocl.dll!5c39de16()
I really don't know what it means. Someone please help me get rid of this problem.
First of all, in the CUDA code you're doing a horribly inefficient thing to copy the data. The CUDA runtime has the function cudaMemcpy2D that does exactly what you are trying to do by looping over different rows.
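For example, the whole copy loop in the question could be a single call; here is a sketch reusing gpu_data, pitch, and instances from the question's code:

/* One call replaces the per-row loop: copy instances->rows rows of
 * instances->cols floats from the tightly packed host array into the
 * pitched device allocation. */
cudaError_t err = cudaMemcpy2D(gpu_data, pitch,                   /* dst, dst pitch */
                               instances->data,
                               instances->cols * sizeof(float),   /* src, src pitch */
                               instances->cols * sizeof(float),   /* width in bytes */
                               instances->rows,                   /* height (rows)  */
                               cudaMemcpyHostToDevice);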
What cudaMallocPitch does is compute an optimal pitch (= distance in bytes between rows in a 2D array) such that each new row begins at an address that is optimal for coalescing, and then allocate a memory area as large as pitch times the number of rows you specify. You can emulate the same thing in OpenCL by first computing the optimal pitch and then doing the allocation of the correct size.
The optimal pitch is computed by (1) getting the base address alignment preference for your card (the CL_DEVICE_MEM_BASE_ADDR_ALIGN property with clGetDeviceInfo; note that the returned value is in bits, so you have to divide by 8 to get it in bytes), let's call this base; (2) finding the smallest multiple of base that is no less than your natural data pitch (sizeof(type) times the number of columns); this will be your pitch.
You then allocate pitch times number of rows bytes, and pass the pitch information to kernels.
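A sketch of those two steps (a fragment assuming the device, context, ret, and instances variables from the question; error checking omitted):

/* Round the natural row size up to the device's base address
 * alignment, mimicking what cudaMallocPitch does. */
cl_uint align_bits = 0;
clGetDeviceInfo(device, CL_DEVICE_MEM_BASE_ADDR_ALIGN,
                sizeof(align_bits), &align_bits, NULL);
size_t align = align_bits / 8;                       /* bits -> bytes */
size_t natural = instances->cols * sizeof(float);    /* unpadded row size */
size_t pitch = ((natural + align - 1) / align) * align;

/* Allocate pitch * rows bytes instead of cols * rows * sizeof(float). */
cl_mem gpu_data = clCreateBuffer(context, CL_MEM_READ_WRITE,
                                 pitch * instances->rows, NULL, &ret);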
Also, when copying data from the host to the device and conversely, you want to use clEnqueue{Read,Write}BufferRect, which are specifically designed to copy 2D data (they are the counterparts of cudaMemcpy2D).
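Continuing the sketch above, a single rect copy then replaces the question's per-row loop (clEnqueueWriteBufferRect requires OpenCL 1.1 or later):

/* Copy all rows at once; OpenCL repacks the tightly packed host rows
 * (host_row_pitch) into the pitched device layout (buffer_row_pitch). */
size_t buffer_origin[3] = { 0, 0, 0 };
size_t host_origin[3]   = { 0, 0, 0 };
size_t region[3]        = { instances->cols * sizeof(float),  /* width in bytes */
                            instances->rows,                  /* rows   */
                            1 };                              /* slices */
ret = clEnqueueWriteBufferRect(command_queue, gpu_data, CL_TRUE,
                               buffer_origin, host_origin, region,
                               pitch, 0,                            /* device pitches */
                               instances->cols * sizeof(float), 0,  /* host pitches   */
                               instances->data, 0, NULL, NULL);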

Choppy SDL+OpenGL animation when vsync is on

Uint32 prev = SDL_GetTicks();
while ( true )
{
    Draw();
    Uint32 now = SDL_GetTicks();
    Uint32 delta = now - prev;
    printf( "%u\n", delta );
    Update( delta / 1000.0f );
    prev = now;
    ProcessEvents();
}
The application is a simple moving square. My loop looks like that, and with vsync off the whole thing runs quite smoothly; turning vsync on instead causes the animation to jump. I've inserted some prints and here's what I've found:
[...]
16
15
16
66 #
2 #
0 #
0 #
16
16
21
[...]
I know there are several issues with this kind of loop but none of them seem to apply to this simple example (am I wrong?). What causes this behavior and how can I overcome it?
I'm using an ATI card on a Linux system, but I'm expecting a portable explanation/solution.
It seems that it was a lack of glFinish(). I've read in several places that calls to that function are in most cases useless. Well, maybe I'm misunderstanding some fundamental concepts, but it worked for me, and now the Draw() function ends with:
    [...]
    glFinish();
    SDL_GL_SwapBuffers();
}
