Communication frequency vs Simulation Time for FMU - OpenModelica

Let's say we have an FMU that receives inputs from Python and simulates at an interval of 0.001 s. Does the FMI/FMU standard allow us to run the FMU multiple times for the same input (so Python provides the input at a 0.01 s interval and the FMU takes 10 internal steps per input)? Would that be faster, since we have reduced the communication across the interface to 1/10th?

(For CS FMUs:) Updating the inputs only every 10th step can be seen as a special co-simulation algorithm and is fine. Input variables keep their values until they are newly set.
This will only lead to a benefit in simulation speed if the internal calculation time (of a doStep) is small compared to the communication overhead.
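For illustration, a minimal sketch of that pattern using the FMPy library (FMPy is my choice here, not something from the original answer; the FMU path, value reference, and input values are placeholders):

# A sketch of the "10 internal steps per input" pattern with FMPy
# (FMU path, value reference, and inputs below are placeholders).
from fmpy import read_model_description, extract
from fmpy.fmi2 import FMU2Slave

fmu_path = 'model.fmu'                                   # placeholder
md = read_model_description(fmu_path)
fmu = FMU2Slave(guid=md.guid,
                unzipDirectory=extract(fmu_path),
                modelIdentifier=md.coSimulation.modelIdentifier,
                instanceName='cs_instance')
fmu.instantiate()
fmu.setupExperiment(startTime=0.0)
fmu.enterInitializationMode()
fmu.exitInitializationMode()

input_vr = [md.modelVariables[0].valueReference]         # placeholder value reference
input_samples = [0.0, 0.3, 0.7]                          # one external input per 0.01 s
t, h, substeps = 0.0, 0.001, 10

for u in input_samples:
    fmu.setReal(input_vr, [u])        # input keeps this value for all 10 sub-steps
    for _ in range(substeps):
        fmu.doStep(currentCommunicationPoint=t, communicationStepSize=h)
        t += h

fmu.terminate()
fmu.freeInstance()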

Related

Synchronization problem while executing Simulink FMU in ROS 2-Gazebo (TF_OLD_DATA warning)

I'm working in a co-simulation project between Simulink and Gazebo. The aim is to move a robot model in Gazebo with the trajectory coordinates computed from Simulink. I'm using MATLAB R2022a, ROS 2 Dashing and Gazebo 9.9.0 in a computer running Ubuntu 18.04.
The problem is that when launching the FMU with the fmi_adapter, I'm obtaining the following. It is tagged as [INFO], but it is actually messing up my whole project.
[fmi_adapter_node-1] [INFO] [fmi_adapter_node]: Simulation time 1652274762.959713 is greater than timer's time 1652274762.901340. Is your step size to large?
Note that the simulation time is ahead of the timer's time. Even if I try to change the step size with the optional argument of the fmi_adapter_node, the same log appears with small differences in the times. I'm using the following commands:
ros2 launch fmi_adapter fmi_adapter_node.launch.py fmu_path:=FMI/Trajectory/RobotMARA_SimulinkFMU_v2.fmu # default step size: 0.2
ros2 launch fmi_adapter fmi_adapter_node.launch.py fmu_path:=FMI/Trajectory/RobotMARA_SimulinkFMU_v2.fmu _step_size:=0.001
As you would expect, the outputs of the FMU are the xyz coordinates of the robot trajectory in each time step. Since the fmi_adapter_node creates topics for both inputs and outputs, I'm reading the output xyz values by means of 3 subscribers with the following code. Those coordinates are then used to program the robot trajectories with the MoveIt-Python API.
When I run the previous Python code, I'm getting the following warning again and again, and the robot manipulator doesn't actually move.
[ WARN] [1652274804.119514250]: TF_OLD_DATA ignoring data from the past for frame motor6_link at time 870.266 according to authority unknown_publisher
Possible reasons are listed at http://wiki.ros.org/tf/Errors%20explained
The previous warning is explained here, but I'm not able to fix it. I've tried clicking Reset in RViz, but nothing changes. I've also tried the following without success:
ros2 param set /fmi_adapter_node use_sim_time true # it just sets the timer's time to 0
It seems that the clock is taking negative values, so there is a synchronization problem.
Any help is welcome.
The warning message by the FMIAdapterNode is emitted if the timer's period is only slightly greater than the simulation step-size and if the timer is preempted by other processes or threads.
I created an issue at https://github.com/boschresearch/fmi_adapter/issues/9 which explains this in more detail and lists two possible fixes. It would be great if you could contribute to this discussion.
I assume that the TF_OLD_DATA error is not related to the fmi_adapter. Looking at the code snippet at ROS Answers, I wondered whether x,y,z values are re-published at all given that the lines
pose.position.x = listener_x.value
pose.position.y = listener_y.value
pose.position.z = listener_z.value
are not inside a callback and are executed even before rospy.spin(), but maybe the snippet is just truncated.
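If it helps, a minimal rospy structure along those lines, with the pose updated and re-published inside the callbacks (topic names, message types and the extra publisher are my guesses, not taken from the original snippet):

# Minimal rospy sketch: update and re-publish the pose inside the callbacks
# (topic names, message types and the publisher are placeholders).
import rospy
from std_msgs.msg import Float64
from geometry_msgs.msg import Pose

rospy.init_node('fmu_pose_forwarder')
pose_pub = rospy.Publisher('/target_pose', Pose, queue_size=10)
pose = Pose()

def on_x(msg):
    pose.position.x = msg.data

def on_y(msg):
    pose.position.y = msg.data

def on_z(msg):
    pose.position.z = msg.data
    pose_pub.publish(pose)           # re-publish every time a new sample arrives

rospy.Subscriber('/fmu_output_x', Float64, on_x)
rospy.Subscriber('/fmu_output_y', Float64, on_y)
rospy.Subscriber('/fmu_output_z', Float64, on_z)
rospy.spin()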

Different number of cycles when running a benchmark more than once on C++ emulator

When running a benchmark, e.g. Dhrystone, on the C++ emulator with the command:
make output/dhrystone.riscv.out
as described at http://riscv.org/download.html#tab_rocket, I get the following output.
When running it for the first time:
Microseconds for one run through Dhrystone: 1064
Dhrystones per Second: 939
cycle = 533718
instret = 148672
and the second time:
Microseconds for one run through Dhrystone: 1064
Dhrystones per Second: 939
cycle = 533715
instret = 148672
Why do the cycles differ? Shouldn't they be exactly the same? I have tried this with other benchmarks too and saw even higher deviations. If this is normal, where do the deviations come from?
There are small amounts of nondeterminism from randomly initialized registers (e.g., the clock that is recovered by the HTIF is initialized to a random phase). It doesn't seem like these minor deviations would impact any performance benchmarking.
If you need identical results each time (e.g., for verification?), you could modify the emulator code to initialize registers to some known value each time.

Tuning PID in systems with delay

I need to tune PI(D) gains in a system which has a quite large delay. It's a common temperature controller, but the temperature probe is far away from the heater. Some further info:
the response of the probe is delayed about 10 seconds from any change on the heater
the temperature is sampled at 1 Hz, with a resolution of 0.01 °C
the heater is driven by a 10-bit PWM with a period of 1 s
the goal is to maintain the oscillation below ±0.05 °C
Currently I'm using the controller as PI. I can't avoid oscillations. The higher the gain, the smaller and faster the oscillations, but they are still too large (about ±0.15 °C).
Reducing the P and I gains leads to very long and deep oscillations.
I think this is due to the delay.
The settling time is not a problem, it may take all the time it needs.
I'm puzzling over how to get the system to work. Suppose I use only the integral term: when the probe reaches the target value and the I output starts to decrease, the temperature will keep rising for some further time. I cannot use the derivative term because the variations are too slow and the dError is very close to zero (if I set the dGain to a huge value there is too much noise).
Any idea?
Try P-only. How fast are the proportional-only oscillations? If you can't tune Kp small enough to get no oscillations, then your heater is overpowered for your system.
If the dead time of the system is on the order of 10 s, the time constant (T_i) for the integral term should be 3.3 times the dead time, using a Ziegler-Nichols open-loop PI rule (https://controls.engin.umich.edu/wiki/index.php/PIDTuningClassical#Ziegler-Nichols_Open-Loop_Tuning_Method_or_Process_Reaction_Method), and the integral gain should then be Ki = Kp/T_i. So with a dead time of 10 s, Ki should be Kp/33 or slower.
If you are getting integral-only oscillations, then the integral is winding up and down quicker than the process responds, and it should be even smaller.
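Putting numbers on that rule (the Kp value below is only a placeholder; it still has to come from your own process-reaction measurement):

# Ziegler-Nichols open-loop PI numbers for a 10 s dead time (Kp is a placeholder).
dead_time = 10.0            # s, observed delay between heater and probe
Kp = 5.0                    # placeholder proportional gain
Ti = 3.3 * dead_time        # integral time constant: 33 s
Ki = Kp / Ti                # integral gain: Kp/33, i.e. "Kp/33 or slower"
print(f"Ti = {Ti:.0f} s, Ki = {Ki:.3f} per second")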
Also -- think of the units of the different terms. It might not be the delay causing your problems so much as the resolution of the measurement and control systems. If you're driving a (for example) 100 W heater with a 1/1024-resolution PWM, you have about 0.1 W of resolution per PWM count that you are trying to adjust based on 0.01 °C temperature differences. At less than Kp = 100 PWM counts/°C (or 10 W/°C) you don't have enough resolution in the PWM to make changes in response to a 0.01 °C error. At Kp = 10 PWM counts/°C you might need a 0.10 °C change to produce an actual change in the PWM power. Can you use a higher-resolution PWM?
Thinking of it the other way, if you want to operate a system over a range of 30 °C with 0.01 °C precision, I'd think you would want at least a 15-bit PWM to have 10 times the resolution in the controlled system. With only 10 bits of PWM you get only about 1 °C of total range with control at 10x the resolution of the measurements.
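The same resolution argument as quick arithmetic (the 100 W heater is just the example figure from above):

# Quick check of the PWM resolution argument (100 W heater is the example above).
heater_power = 100.0                          # W, full-scale heater power
pwm_counts = 2 ** 10                          # 10-bit PWM -> 1024 levels
watts_per_count = heater_power / pwm_counts   # ~0.098 W per PWM count
Kp = 10.0                                     # PWM counts per °C (example gain)
error_per_count = 1.0 / Kp                    # °C of error needed to move 1 count
print(f"{watts_per_count:.3f} W/count, {error_per_count:.2f} °C per count change")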
Normally for large delays you have two options: lower the gains of the system or, if you have a model of the plant you are controlling, use a Smith Predictor.
I would start by modelling your system (using open-loop steps in the input) to quantify the delay and the time constant of your plant, then check if the sampling of the temperature and the PWM rate are OK.
Notice that if your PWM frequency is too low compared to the plant dynamics, you will get sustained oscillations caused by the slow PWM. You can check this by applying just a constant input to your PWM (with no controller, open loop).
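For reference, a minimal discrete-time Smith predictor sketch around a PI controller, assuming a first-order plant model with a pure dead time; all gains and model parameters below are placeholders to be identified from an open-loop step test:

# Minimal Smith predictor sketch around a PI controller, assuming a first-order
# plant model y' = (-y + K*u)/tau plus a pure dead time (all values placeholders).
from collections import deque

dt, K, tau, dead_time = 1.0, 2.0, 60.0, 10.0      # s, °C per unit duty, s, s
Kp, Ki = 0.5, 0.5 / (3.3 * dead_time)             # PI gains (ZN-style guess)

y_model = 0.0                                      # model output without delay
delay_buf = deque([0.0] * int(dead_time / dt))     # model output delayed by dead time
integral = 0.0

def control_step(setpoint, y_measured):
    global y_model, integral
    y_model_delayed = delay_buf[0]
    # feedback = measurement + (undelayed - delayed) model output
    feedback = y_measured + (y_model - y_model_delayed)
    error = setpoint - feedback
    integral += Ki * error * dt
    u = max(0.0, min(1.0, Kp * error + integral))  # duty cycle 0..1, no anti-windup
    # advance the internal (delay-free) plant model and the delay line
    y_model += dt * (-y_model + K * u) / tau
    delay_buf.append(y_model)
    delay_buf.popleft()
    return u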
EDIT: Didn't see that the problem was already solved, but I'll leave this here for reference.

Measuring time: differences among gettimeofday, TSC and clock ticks

I am doing some performance profiling for part of my program, and I am trying to measure the execution time with the following four methods. Interestingly, they show different results and I don't fully understand the differences. My CPU is an Intel(R) Core(TM) i7-4770 and the system is Ubuntu 14.04. Thanks in advance for any explanation.
Method 1:
Use the gettimeofday() function, result is in seconds
Method 2:
Use the rdtsc instruction similar to https://stackoverflow.com/a/14019158/3721062
Methods 3 and 4 use Intel's Performance Counter Monitor (PCM) API
Method 3:
Use PCM's
uint64 getCycles(const CounterStateType & before, const CounterStateType &after)
Its description (which I don't quite understand):
Computes the number of core clock cycles when the clock signal on a specific core is running (not halted)
Returns the number of used cycles (halted cycles are not counted). The counter does not advance in the following conditions:
an ACPI C-state is other than C0 for normal operation
HLT
STPCLK+ pin is asserted
being throttled by TM1
during the frequency switching phase of a performance state transition
The performance counter for this event counts across performance state transitions using different core clock frequencies
Method 4:
Use PCM's
uint64 getInvariantTSC (const CounterStateType & before, const CounterStateType & after)
Its description:
Computes number of invariant time stamp counter ticks.
This counter counts irrespectively of C-, P- or T-states
Two sample runs generate the following results:
(Method 1 is in seconds. Methods 2-4 are divided by the same number to show a per-item cost.)
0.016489 0.533603 0.588103 4.15136
0.020374 0.659265 0.730308 5.15672
Some observations:
The ratio of Method 1 to Method 2 is very consistent, while the others are not, i.e., 0.016489/0.533603 ≈ 0.020374/0.659265. Assuming gettimeofday() is sufficiently accurate, the rdtsc method exhibits the "invariant" property. (I have read that current generations of Intel CPUs have this feature for rdtsc.)
Method 3 reports higher numbers than Method 2. I guess it is somehow different from the TSC, but what is it?
Method 4 is the most confusing one. It reports numbers an order of magnitude larger than Methods 2 and 3. Shouldn't it also be some kind of cycle count, especially since it carries the "invariant" name?
gettimeofday() is not designed for measuring time intervals. Don't use it for that purpose.
If you need wall time intervals, use the POSIX monotonic clock. If you need CPU time spent by a particular process or thread, use the POSIX process time or thread time clocks. See man clock_gettime.
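As an illustration, the same clocks are exposed through Python's time module wrappers (the native calls are clock_gettime(CLOCK_MONOTONIC, ...) and friends; see man clock_gettime):

# The POSIX clocks the answer refers to, via Python's time module wrappers.
import time

t_wall = time.monotonic()        # CLOCK_MONOTONIC: wall-clock intervals
t_proc = time.process_time()     # CLOCK_PROCESS_CPUTIME_ID: CPU time of the process
t_thr = time.thread_time()       # CLOCK_THREAD_CPUTIME_ID: CPU time of this thread

sum(i * i for i in range(1_000_000))   # some work to measure

print("wall   :", time.monotonic() - t_wall)
print("process:", time.process_time() - t_proc)
print("thread :", time.thread_time() - t_thr)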
The PCM API is great for fine-tuned performance measurement when you know exactly what you are doing, which generally means obtaining a variety of separate memory, core, cache, low-power, ... performance figures. Don't start messing with it if you are not sure what exact services you need from it that you can't get from clock_gettime.

Multithreading or shared memory - architecture

There are 3 parts to my application:
A numerical simulator solving a 21-variable differential equation with a Runge-Kutta method (taken directly from Numerical Recipes in C); the step size is 0.0001 s
A C program pinging a PIC-based microprocessor every 1 s and receiving data at about 3600 samples per second over the USB-COM port; it sends the relevant data to the front end over TCP/IP
A Java front end reading the data from the numerical simulator via SWIG (for the C code) and JNI, modifying the parameters with input from the microprocessor, and finally plotting it in the GUI
I want to recode the Java front end in C++ now, with the option of using HTML/JavaScript for plotting.
Would rewriting the front end in C++ so that the numerical simulator runs on a separate thread be a good approach?
I don't really understand threading, though I have used it for the listening and plotting functions in the Java code. It seems like having it all run on multiple threads instead of separate processes would slow down my simulations.
Can I combine 1, 2 and 3 into a single program, or should they remain separate to retain the 0.0001 s simulation step and the ability to handle the large amount of microprocessor data?
Please help me pick a path forward!
Thanks in Advance!
On a multicore platform, multithreading will generally improve performance. However, general-purpose operating systems such as Linux and Windows are not deterministic, so there are no guarantees.
That said, the computational performance of a modern PC is such that it will hardly be stretched by this task and data rate, so perhaps it hardly matters?
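As a sketch of the kind of decoupling implied here (illustrative Python; the same producer/consumer structure carries over to C++ with std::thread and a thread-safe queue): keep the fixed-step solver on its own thread and hand samples to the front end through a queue.

# Sketch: solver thread pushes samples into a queue; the front end drains it at
# its own pace (illustrative only; the RK step is a placeholder comment).
import threading, queue

samples = queue.Queue(maxsize=10000)

def simulator():
    t, dt, state = 0.0, 1e-4, [0.0] * 21
    while t < 1.0:
        # state = runge_kutta_step(state, t, dt)   # your RK solver goes here
        t += dt
        samples.put((t, state[0]))

def frontend():
    while True:
        try:
            t, y = samples.get(timeout=1.0)
        except queue.Empty:
            break                      # simulator finished and queue drained
        print(t, y)                    # plot / forward over TCP here instead

sim = threading.Thread(target=simulator)
gui = threading.Thread(target=frontend)
sim.start(); gui.start()
sim.join(); gui.join()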
