Vivado HLS loop unroll is sequential
I have a fully connected layer function that I want to parallelize in Vivado HLS.
As seen in the code below, the loop I am concerned with is input_loop:, which I have directed to unroll by a factor of 16. Vivado HLS is indeed unrolling it and I can see 16 multipliers being created, but the unrolled iterations are scheduled sequentially.
What am I missing? I want this loop body to be duplicated 16x and run in parallel.
Code is as follows:
#include <algorithm>
#include "fc_layer.h"
#include <stdio.h>
#include <string.h>

typedef ap_fixed<64,32,AP_RND> db_word_fixedpt;
typedef ap_fixed<32,16,AP_RND> word_fixedpt;
typedef ap_fixed<16,8,AP_RND>  small_fixedpt;

void fc_layer(small_fixedpt weights[MAX_INPUT_SIZE*MAX_OUTPUT_SIZE],
              small_fixedpt biases[MAX_OUTPUT_SIZE],
              small_fixedpt input[MAX_INPUT_SIZE*MAX_BATCH],
              small_fixedpt output[MAX_OUTPUT_SIZE*MAX_BATCH],
              const int batch_size,
              const int num_inputs,
              const int num_outputs)
{
    // Batch iterator
    for (int b = 0; b < batch_size; b++) {
#pragma HLS loop_tripcount min=1 max=10
        // Output node iterator
        array_cpy: for (int o = 0; o < num_outputs; o++) {
#pragma HLS loop_tripcount min=1 max=1024
            // Set bias
            small_fixedpt output_fixp = 0;
            //output_fixp = biases[o];
            //float input_sub_array[1024] = input[o*num_inputs:o*num_inputs+1024];
            small_fixedpt input_sub_array[1024]  = {0};
            small_fixedpt weight_sub_array[1024] = {0};
            small_fixedpt output_sub_array[1024] = {0};
            small_fixedpt output_sub_array_stg2[64] = {0};

            // Copy this node's inputs and weights into local arrays
            subcopy: for (int i = 0; i < 1024; i++) {
                input_sub_array[i]  = input[b*num_inputs+i];
                weight_sub_array[i] = weights[o*num_inputs+i];
            }

            // Accumulate weighted sum
            input_loop: for (int i = 0; i < std::min(num_inputs, MAX_INPUT_SIZE); i++) {
#pragma HLS loop_tripcount min=1 max=1024
                output_sub_array[i] = input_sub_array[i]*weights[i];
            }

            output[b*num_outputs+o] = biases[o];

            // First reduction stage: sum the products in groups of 16
            for (int i = 0; i < 64; i++) {
                output_sub_array_stg2[i] = output_sub_array[16*i]    + output_sub_array[16*i+1]
                                         + output_sub_array[16*i+2]  + output_sub_array[16*i+3]
                                         + output_sub_array[16*i+4]  + output_sub_array[16*i+5]
                                         + output_sub_array[16*i+6]  + output_sub_array[16*i+7]
                                         + output_sub_array[16*i+8]  + output_sub_array[16*i+9]
                                         + output_sub_array[16*i+10] + output_sub_array[16*i+11]
                                         + output_sub_array[16*i+12] + output_sub_array[16*i+13]
                                         + output_sub_array[16*i+14] + output_sub_array[16*i+15];
            }

            // Second reduction stage: accumulate the 64 partial sums
            for (int i = 0; i < 64; i++) {
                output[b*num_outputs+o] += output_sub_array_stg2[i];
            }
        }
    }
}
Directives file:
############################################################
## This file is generated automatically by Vivado HLS.
## Please DO NOT edit it.
## Copyright (C) 1986-2017 Xilinx, Inc. All Rights Reserved.
############################################################
set_directive_unroll -factor 16 "fc_layer/input_loop"
set_directive_array_partition -type cyclic -factor 16 -dim 1 "fc_layer" input
set_directive_array_partition -type cyclic -factor 16 -dim 1 "fc_layer" weights
set_directive_array_partition -type cyclic -factor 16 -dim 1 "fc_layer/array_cpy" input_sub_array
set_directive_array_partition -type cyclic -factor 16 -dim 1 "fc_layer/array_cpy" weight_sub_array
set_directive_unroll -factor 16 "fc_layer/subcopy"
set_directive_array_partition -type cyclic -factor 16 -dim 1 "fc_layer/array_cpy" output_sub_array
set_directive_resource -core RAM_S2P_LUTRAM "fc_layer/array_cpy" output_sub_array
set_directive_resource -core RAM_S2P_LUTRAM "fc_layer/array_cpy" weight_sub_array
set_directive_resource -core RAM_S2P_LUTRAM "fc_layer/array_cpy" input_sub_array
set_directive_resource -core RAM_T2P_BRAM "fc_layer" weights
set_directive_resource -core RAM_T2P_BRAM "fc_layer" biases
set_directive_resource -core RAM_T2P_BRAM "fc_layer" input
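For reference, the two directives that matter most here can also be expressed as in-source pragmas instead of a Tcl directives file; a minimal sketch with the same factors as above (exact placement inside the function/loop is assumed, the directives file remains authoritative):

    // Right after the local array declarations in fc_layer():
    #pragma HLS array_partition variable=input_sub_array cyclic factor=16 dim=1
    #pragma HLS array_partition variable=weight_sub_array cyclic factor=16 dim=1

    // As the first line inside input_loop:
    #pragma HLS unroll factor=16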
Here is the log output:
Starting C synthesis ...
/opt/Xilinx/Vivado_HLS/2017.2/bin/vivado_hls /workspace/REDACTED
INFO: [HLS 200-10] Running '/opt/Xilinx/Vivado_HLS/2017.2/bin/unwrapped/lnx64.o/vivado_hls'
INFO: [HLS 200-10] For user 'root' on host '3310c2d0e0d4' (Linux_x86_64 version 4.9.125-linuxkit) on Fri Feb 08 20:06:44 UTC 2019
INFO: [HLS 200-10] In directory REDACTED
INFO: [HLS 200-10] Opening project REDACTED
INFO: [HLS 200-10] Adding design file '../fc_test/fc_layer.cpp' to the project
INFO: [HLS 200-10] Adding test bench file '../fc_test/fc_layer_test.cpp' to the project
INFO: [HLS 200-10] Adding test bench file '../util/shared.cpp' to the project
INFO: [HLS 200-10] Adding test bench file '../nn_params' to the project
INFO: [HLS 200-10] Opening solution REDACTED
INFO: [SYN 201-201] Setting up clock 'default' with a period of 10ns.
INFO: [HLS 200-10] Setting target device to 'xcvu095-ffvc1517-2-e'
INFO: [HLS 200-10] Analyzing design file '../fc_test/fc_layer.cpp' ...
INFO: [HLS 200-10] Validating synthesis directives ...
INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:01:48 ; elapsed = 00:00:57 . Memory (MB): peak = 345.266 ; gain = 12.586 ; free physical = 3149 ; free virtual = 4919
INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:01:50 ; elapsed = 00:00:59 . Memory (MB): peak = 345.266 ; gain = 12.586 ; free physical = 3141 ; free virtual = 4918
INFO: [HLS 200-10] Starting code transformations ...
INFO: [XFORM 203-603] Inlining function 'std::min<int>' into 'fc_layer' (../fc_test/fc_layer.cpp:35).
INFO: [XFORM 203-603] Inlining function 'ap_fixed_base<16, 8, true, (ap_q_mode)0, (ap_o_mode)3, 0>::quantization_adjust' into 'ap_fixed_base<16, 8, true, (ap_q_mode)0, (ap_o_mode)3, 0>::ap_fixed_base<32, 16, true, (ap_q_mode)5, (ap_o_mode)3, 0>' ().
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:01:52 ; elapsed = 00:01:01 . Memory (MB): peak = 346.039 ; gain = 13.359 ; free physical = 3118 ; free virtual = 4900
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 00:01:52 ; elapsed = 00:01:02 . Memory (MB): peak = 473.895 ; gain = 141.215 ; free physical = 3107 ; free virtual = 4891
INFO: [XFORM 203-501] Unrolling loop 'subcopy' (../fc_test/fc_layer.cpp:30) in function 'fc_layer' partially with a factor of 16.
INFO: [XFORM 203-501] Unrolling loop 'input_loop' (../fc_test/fc_layer.cpp:35) in function 'fc_layer' partially with a factor of 16.
INFO: [XFORM 203-101] Partitioning array 'weights.V' (../fc_test/fc_layer.cpp:8) in dimension 1 with a cyclic factor 16.
INFO: [XFORM 203-101] Partitioning array 'input.V' (../fc_test/fc_layer.cpp:10) in dimension 1 with a cyclic factor 16.
INFO: [XFORM 203-101] Partitioning array 'input_sub_array.V' (../fc_test/fc_layer.cpp:26) in dimension 1 with a cyclic factor 16.
INFO: [XFORM 203-101] Partitioning array 'weight_sub_array.V' (../fc_test/fc_layer.cpp:27) in dimension 1 with a cyclic factor 16.
INFO: [XFORM 203-101] Partitioning array 'output_sub_array.V' (../fc_test/fc_layer.cpp:28) in dimension 1 with a cyclic factor 16.
INFO: [XFORM 203-11] Balancing expressions in function 'fc_layer' (../fc_test/fc_layer.cpp:8)...15 expression(s) balanced.
INFO: [HLS 200-111] Finished Pre-synthesis Time (s): cpu = 00:01:55 ; elapsed = 00:01:05 . Memory (MB): peak = 473.895 ; gain = 141.215 ; free physical = 3069 ; free virtual = 4859
INFO: [HLS 200-111] Finished Architecture Synthesis Time (s): cpu = 00:01:59 ; elapsed = 00:01:09 . Memory (MB): peak = 473.895 ; gain = 141.215 ; free physical = 3063 ; free virtual = 4855
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'fc_layer' ...
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Implementing module 'fc_layer'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111] Elapsed time: 73.78 seconds; current allocated memory: 108.265 MB.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111] Elapsed time: 3.42 seconds; current allocated memory: 112.243 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'fc_layer'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_0_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_1_V' to 'ap_memory'.
[... analogous 'Setting interface mode' lines for weights_2_V through weights_14_V ...]
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_15_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/biases_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_0_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_1_V' to 'ap_memory'.
[... analogous 'Setting interface mode' lines for input_2_V through input_14_V ...]
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_15_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/output_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/batch_size' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/num_inputs' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/num_outputs' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on function 'fc_layer' to 'ap_ctrl_hs'.
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_0_V' to 'fc_layer_input_subkb' due to the length limit 20
[... analogous rename messages for the remaining input_sub_array and output_sub_array banks ...]
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_stg' to 'fc_layer_output_sHfu' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_mux_164_16_1' to 'fc_layer_mux_164_IfE' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_mul_mul_16s_16s_32_1' to 'fc_layer_mul_mul_JfO' due to the length limit 20
INFO: [RTGEN 206-100] Generating core module 'fc_layer_mul_mul_JfO': 16 instance(s).
INFO: [RTGEN 206-100] Generating core module 'fc_layer_mux_164_IfE': 16 instance(s).
INFO: [RTGEN 206-100] Finished creating RTL model for 'fc_layer'.
INFO: [HLS 200-111] Elapsed time: 4.51 seconds; current allocated memory: 117.723 MB.
INFO: [RTMG 210-278] Implementing memory 'fc_layer_input_subkb_ram' using block RAMs.
INFO: [HLS 200-111] Finished generating all RTL models Time (s): cpu = 00:02:21 ; elapsed = 00:01:36 . Memory (MB): peak = 537.262 ; gain = 204.582 ; free physical = 3021 ; free virtual = 4826
INFO: [SYSC 207-301] Generating SystemC RTL for fc_layer.
INFO: [VHDL 208-304] Generating VHDL RTL for fc_layer.
INFO: [VLOG 209-307] Generating Verilog RTL for fc_layer.
INFO: [HLS 200-112] Total elapsed time: 96.25 seconds; peak allocated memory: 117.723 MB.
Finished C synthesis.
Does anyone know what I'm missing here?
I am afraid the number of iterations has to be known at compile time. Your input_loop bound is std::min(num_inputs, MAX_INPUT_SIZE), which depends on the runtime argument num_inputs, so HLS can only unroll it "partially" (as the log says) and has to check the exit condition between the 16 copies, which serializes them. The arrays may also require an explicit array-partitioning pragma so that 16 elements can be read in the same cycle.
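A minimal sketch of that idea (untested, and assuming MAX_INPUT_SIZE matches the 1024 used for the local arrays): give input_loop a compile-time bound and guard the tail with an if. Note it also reads from weight_sub_array, the local copy the subcopy loop already fills, rather than the weights[i] in the original, which looks unintended:

    // Fixed trip count, so the 16 unrolled copies have no
    // data-dependent exit check between them.
    input_loop: for (int i = 0; i < MAX_INPUT_SIZE; i++) {
    #pragma HLS unroll factor=16
        if (i < num_inputs)   // tail guard replaces the variable loop bound
            output_sub_array[i] = input_sub_array[i] * weight_sub_array[i];
    }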
Related
SMP threads not showing in GDB
Trying to debug a multi-CPU SoC (Amlogic A113X) and faced a problem. So I have this debug configuration:

A113X (JTAG) -> Segger J-Link V11 -> OpenOCD -> gdb-multiarch

Everything is connected and seems okay, but GDB shows just 1 thread (there should be 4 - one for each CPU):

(gdb) info threads
  Id   Target Id         Frame
* 1    Remote target     0xffffff8009853364 in arch_spin_lock (lock=<optimized out>) at ./arch/arm64/include/asm/spinlock.h:89

Meanwhile there are 4 debug targets according to the telnet 'targets' command:

> targets
    TargetName         Type       Endian TapName            State
--  ------------------ ---------- ------ ------------------ ------------
 0* A113X.a53.0        aarch64    little A113X.cpu          halted
 1  A113X.a53.1        aarch64    little A113X.cpu          halted
 2  A113X.a53.2        aarch64    little A113X.cpu          unknown
 3  A113X.a53.3        aarch64    little A113X.cpu          halted

Core 2 is shut down in this particular case. When I halt the CPUs I get this output in GDB:

(gdb) continue
Continuing.
^CA113X.a53.1 halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x800001c5 pc: 0xffffff8009853364
MMU: enabled, D-Cache: enabled, I-Cache: enabled
A113X.a53.3 halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x800000c5 pc: 0xffffff80098532b4
MMU: enabled, D-Cache: enabled, I-Cache: enabled

Program received signal SIGINT, Interrupt.
0xffffff8009853364 in arch_spin_lock (lock=<optimized out>) at ./arch/arm64/include/asm/spinlock.h:89
89      asm volatile(
(gdb) where
#0  0xffffff8009853364 in arch_spin_lock (lock=<optimized out>) at ./arch/arm64/include/asm/spinlock.h:89
#1  do_raw_spin_lock (lock=<optimized out>) at ./include/linux/spinlock.h:148
#2  __raw_spin_lock (lock=<optimized out>) at ./include/linux/spinlock_api_smp.h:145
#3  _raw_spin_lock (lock=0xffffffc01fb86a00) at kernel/locking/spinlock.c:151
#4  0xffffff80090c2114 in try_to_wake_up (p=0xffffffc01963a880, state=<optimized out>, wake_flags=0) at kernel/sched/core.c:2110
#5  0xffffff80090c239c in wake_up_process (p=<optimized out>) at kernel/sched/core.c:2203
#6  0xffffff80090b1b7c in wake_up_worker (pool=<optimized out>) at kernel/workqueue.c:837
#7  insert_work (pwq=<optimized out>, work=<optimized out>, head=<optimized out>, extra_flags=<optimized out>) at kernel/workqueue.c:1310
#8  0xffffff80090b1d10 in __queue_work (cpu=0, wq=0xdf2, work=0x8df2) at kernel/workqueue.c:1460
#9  0xffffff80090b1fc8 in queue_work_on (cpu=8, wq=0xffffffc01dbb5c00, work=0xffffffc01ccb82a0) at kernel/workqueue.c:1485
#10 0xffffff800191d068 in ?? ()
#11 0xffffffc0138664e8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Is something wrong with my OpenOCD configuration? The SMP config looks fine because it halts all 4 cores. What can be wrong here? Thanks beforehand.
Here is the openocd config:

telnet_port 4444
gdb_port 3333

source [find interface/jlink.cfg]
transport select jtag
adapter speed 1000
scan_chain

set _CHIPNAME A113X
set _DAPNAME $_CHIPNAME.dap

jtag newtap $_CHIPNAME cpu -irlen 4 -expected-id 0x5ba00477
dap create $_DAPNAME -chain-position $_CHIPNAME.cpu
echo "$_CHIPNAME.cpu"

set CA53_DBGBASE {0x80410000 0x80510000 0x80610000 0x80710000}
set CA53_CTIBASE {0x80420000 0x80520000 0x80620000 0x80720000}
set _num_ca53 4
set _ap_num 0
set smp_targets ""

proc setup_a5x {core_name dbgbase ctibase num boot} {
    for { set _core 0 } { $_core < $num } { incr _core } {
        set _TARGETNAME $::_CHIPNAME.$core_name.$_core
        set _CTINAME $_TARGETNAME.cti
        cti create $_CTINAME -dap $::_DAPNAME -ap-num $::_ap_num \
            -baseaddr [lindex $ctibase $_core]
        target create $_TARGETNAME aarch64 -dap $::_DAPNAME -cti $_CTINAME -coreid $_core
        set ::smp_targets "$::smp_targets $_TARGETNAME"
    }
}

setup_a5x a53 $CA53_DBGBASE $CA53_CTIBASE $_num_ca53 1

echo "SMP targets:$smp_targets"
eval "target smp $smp_targets"
targets $_CHIPNAME.a53.0

And the output of OpenOCD:

Open On-Chip Debugger 0.11.0+dev-00640-ge83eeb44a (2022-04-21-10:10)
Licensed under GNU GPL v2
For bug reports, read http://openocd.org/doc/doxygen/bugs.html
A113X.cpu
SMP targets: A113X.a53.0 A113X.a53.1 A113X.a53.2 A113X.a53.3
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : J-Link V11 compiled Mar  3 2022 10:16:14
Info : Hardware version: 11.00
Info : VTarget = 3.309 V
Info : clock speed 1000 kHz
Info : JTAG tap: A113X.cpu tap/device found: 0x5ba00477 (mfg: 0x23b (ARM Ltd), part: 0xba00, ver: 0x5)
Info : A113X.a53.0: hardware has 6 breakpoints, 4 watchpoints
Info : A113X.a53.1: hardware has 6 breakpoints, 4 watchpoints
Error: JTAG-DP STICKY ERROR
Warn : target A113X.a53.2 examination failed
Info : A113X.a53.3: hardware has 6 breakpoints, 4 watchpoints
Info : A113X.a53.0 cluster 0 core 0 multi core
Info : A113X.a53.1 cluster 0 core 1 multi core
Info : A113X.a53.3 cluster 0 core 3 multi core
Info : starting gdb server for A113X.a53.0 on 3333
Info : Listening on port 3333 for gdb connections
Info : accepting 'gdb' connection on tcp/3333
Info : New GDB Connection: 1, Target A113X.a53.0, state: halted
Warn : Prefer GDB command "target extended-remote :3333" instead of "target remote :3333"
A113X.a53.1 halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x800001c5 pc: 0xffffff8009853364
MMU: enabled, D-Cache: enabled, I-Cache: enabled
A113X.a53.3 halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x800000c5 pc: 0xffffff80098532b4
MMU: enabled, D-Cache: enabled, I-Cache: enabled
I tried combining hwthread and target smp to achieve multi-core debugging: add "-rtos hwthread" for each core, like

target create ...... -rtos hwthread

In my opinion, a core is regarded as a hwthread by OpenOCD, so set hwthread for each core. The following is the response of info threads (from a RISC-V setup):

(gdb) i threads
  Id   Target Id                                            Frame
* 1    Thread 1 (Name: riscv.cpu0.0, state: debug-request)  0x8000000a in _mstart ()
  2    Thread 2 (Name: riscv.cpu0.1, state: debug-request)  0x82000000 in ?? ()
  3    Thread 3 (Name: riscv.cpu0.2, state: debug-request)  0x82000000 in ?? ()
  4    Thread 4 (Name: riscv.cpu0.3, state: debug-request)  0x82000000 in ?? ()
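Applied to the config posted above, that would be a one-line change inside setup_a5x (a sketch; untested on this particular board):

# add -rtos hwthread so each A53 core shows up as a GDB thread
target create $_TARGETNAME aarch64 -dap $::_DAPNAME -cti $_CTINAME \
    -coreid $_core -rtos hwthread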
DPDK for general purpose workload
I have deployed OpenStack and configured OVS-DPDK on compute nodes for high-performance networking. My workload is general purpose: haproxy, mysql, apache, XMPP, etc. When I did load-testing, I found performance is average, and above a 200kpps packet rate I noticed packet drops. I have heard and read that DPDK can handle millions of packets, but in my case that's not true. In the guest I am using virtio-net, which processes packets in the kernel, so I believe my bottleneck is my guest VM. I don't have any guest-based DPDK application like testpmd etc. Does that mean OVS+DPDK isn't useful for my cloud? How do I take advantage of OVS+DPDK with a general-purpose workload?

Updates

We have our own load-testing tool which generates audio RTP traffic (pure UDP-based 150-byte packets) and noticed that above 200kpps the audio quality goes down and gets choppy. In short, the DPDK host hits high PMD CPU usage and the load test shows bad audio quality. When I do the same test with an SRIOV-based VM, performance is really, really good.

$ ovs-vswitchd -V
ovs-vswitchd (Open vSwitch) 2.13.3
DPDK 19.11.7

Intel NIC X550T

# ethtool -i ext0
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x80000d63, 18.8.9
expansion-rom-version:
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

In the following output, what do these queue-id 0 to 8 entries mean, and why is only the first queue in use while the others are always at zero?

ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 2:
  isolated : false
  port: vhu1c3bf17a-01  queue-id: 0 (enabled)   pmd usage:  0 %
  port: vhu1c3bf17a-01  queue-id: 1 (enabled)   pmd usage:  0 %
  port: vhu6b7daba9-1a  queue-id: 2 (disabled)  pmd usage:  0 %
  port: vhu6b7daba9-1a  queue-id: 3 (disabled)  pmd usage:  0 %
pmd thread numa_id 1 core_id 3:
  isolated : false
pmd thread numa_id 0 core_id 22:
  isolated : false
  port: vhu1c3bf17a-01  queue-id: 3 (enabled)   pmd usage:  0 %
  port: vhu1c3bf17a-01  queue-id: 6 (enabled)   pmd usage:  0 %
  port: vhu6b7daba9-1a  queue-id: 0 (enabled)   pmd usage: 54 %
  port: vhu6b7daba9-1a  queue-id: 5 (disabled)  pmd usage:  0 %
pmd thread numa_id 1 core_id 23:
  isolated : false
  port: dpdk1           queue-id: 0 (enabled)   pmd usage:  3 %
pmd thread numa_id 0 core_id 26:
  isolated : false
  port: vhu1c3bf17a-01  queue-id: 2 (enabled)   pmd usage:  0 %
  port: vhu1c3bf17a-01  queue-id: 7 (enabled)   pmd usage:  0 %
  port: vhu6b7daba9-1a  queue-id: 1 (disabled)  pmd usage:  0 %
  port: vhu6b7daba9-1a  queue-id: 4 (disabled)  pmd usage:  0 %
pmd thread numa_id 1 core_id 27:
  isolated : false
pmd thread numa_id 0 core_id 46:
  isolated : false
  port: dpdk0           queue-id: 0 (enabled)   pmd usage: 27 %
  port: vhu1c3bf17a-01  queue-id: 4 (enabled)   pmd usage:  0 %
  port: vhu1c3bf17a-01  queue-id: 5 (enabled)   pmd usage:  0 %
  port: vhu6b7daba9-1a  queue-id: 6 (disabled)  pmd usage:  0 %
  port: vhu6b7daba9-1a  queue-id: 7 (disabled)  pmd usage:  0 %
pmd thread numa_id 1 core_id 47:
  isolated : false

$ ovs-appctl dpif-netdev/pmd-stats-clear && sleep 10 && ovs-appctl dpif-netdev/pmd-stats-show | grep "processing cycles:"
  processing cycles: 1697952 (0.01%)
  processing cycles: 12726856558 (74.96%)
  processing cycles: 4259431602 (19.40%)
  processing cycles: 512666 (0.00%)
  processing cycles: 6324848608 (37.81%)

Do these processing cycles mean my PMD is under stress, even though I am only hitting a 200kpps rate?
These are my dpdk0 and dpdk1 port statistics:

sudo ovs-vsctl get Interface dpdk0 statistics
{flow_director_filter_add_errors=153605, flow_director_filter_remove_errors=30829, mac_local_errors=0, mac_remote_errors=0, ovs_rx_qos_drops=0, ovs_tx_failure_drops=0, ovs_tx_invalid_hwol_drops=0, ovs_tx_mtu_exceeded_drops=0, ovs_tx_qos_drops=0, rx_128_to_255_packets=64338613, rx_1_to_64_packets=367, rx_256_to_511_packets=116298, rx_512_to_1023_packets=31264, rx_65_to_127_packets=6990079, rx_broadcast_packets=0, rx_bytes=12124930385, rx_crc_errors=0, rx_dropped=0, rx_errors=12, rx_fcoe_crc_errors=0, rx_fcoe_dropped=12, rx_fcoe_mbuf_allocation_errors=0, rx_fragment_errors=367, rx_illegal_byte_errors=0, rx_jabber_errors=0, rx_length_errors=0, rx_mac_short_packet_dropped=128, rx_management_dropped=35741, rx_management_packets=31264, rx_mbuf_allocation_errors=0, rx_missed_errors=0, rx_oversize_errors=0, rx_packets=71512362, rx_priority0_dropped=0, rx_priority0_mbuf_allocation_errors=1096, rx_priority1_dropped=0, rx_priority1_mbuf_allocation_errors=0, rx_priority2_dropped=0, rx_priority2_mbuf_allocation_errors=0, rx_priority3_dropped=0, rx_priority3_mbuf_allocation_errors=0, rx_priority4_dropped=0, rx_priority4_mbuf_allocation_errors=0, rx_priority5_dropped=0, rx_priority5_mbuf_allocation_errors=0, rx_priority6_dropped=0, rx_priority6_mbuf_allocation_errors=0, rx_priority7_dropped=0, rx_priority7_mbuf_allocation_errors=0, rx_undersize_errors=6990079, tx_128_to_255_packets=64273778, tx_1_to_64_packets=128, tx_256_to_511_packets=43670294, tx_512_to_1023_packets=153605, tx_65_to_127_packets=881272, tx_broadcast_packets=10, tx_bytes=25935295292, tx_dropped=0, tx_errors=0, tx_management_packets=0, tx_multicast_packets=153, tx_packets=109009906}

sudo ovs-vsctl get Interface dpdk1 statistics
{flow_director_filter_add_errors=126793, flow_director_filter_remove_errors=37969, mac_local_errors=0, mac_remote_errors=0, ovs_rx_qos_drops=0, ovs_tx_failure_drops=0, ovs_tx_invalid_hwol_drops=0, ovs_tx_mtu_exceeded_drops=0, ovs_tx_qos_drops=0, rx_128_to_255_packets=64435459, rx_1_to_64_packets=107843, rx_256_to_511_packets=230, rx_512_to_1023_packets=13, rx_65_to_127_packets=7049788, rx_broadcast_packets=199058, rx_bytes=12024342488, rx_crc_errors=0, rx_dropped=0, rx_errors=11, rx_fcoe_crc_errors=0, rx_fcoe_dropped=11, rx_fcoe_mbuf_allocation_errors=0, rx_fragment_errors=107843, rx_illegal_byte_errors=0, rx_jabber_errors=0, rx_length_errors=0, rx_mac_short_packet_dropped=1906, rx_management_dropped=0, rx_management_packets=13, rx_mbuf_allocation_errors=0, rx_missed_errors=0, rx_oversize_errors=0, rx_packets=71593333, rx_priority0_dropped=0, rx_priority0_mbuf_allocation_errors=1131, rx_priority1_dropped=0, rx_priority1_mbuf_allocation_errors=0, rx_priority2_dropped=0, rx_priority2_mbuf_allocation_errors=0, rx_priority3_dropped=0, rx_priority3_mbuf_allocation_errors=0, rx_priority4_dropped=0, rx_priority4_mbuf_allocation_errors=0, rx_priority5_dropped=0, rx_priority5_mbuf_allocation_errors=0, rx_priority6_dropped=0, rx_priority6_mbuf_allocation_errors=0, rx_priority7_dropped=0, rx_priority7_mbuf_allocation_errors=0, rx_undersize_errors=7049788, tx_128_to_255_packets=102664472, tx_1_to_64_packets=1906, tx_256_to_511_packets=68008814, tx_512_to_1023_packets=126793, tx_65_to_127_packets=1412435, tx_broadcast_packets=1464, tx_bytes=40693963125, tx_dropped=0, tx_errors=0, tx_management_packets=199058, tx_multicast_packets=146, tx_packets=172252389}

Update - 2

DPDK interfaces:

# dpdk-devbind.py -s

Network devices using DPDK-compatible driver
============================================
0000:3b:00.1 'Ethernet Controller 10G X550T 1563' drv=vfio-pci unused=ixgbe
0000:af:00.1 'Ethernet Controller 10G X550T 1563' drv=vfio-pci unused=ixgbe

Network devices using kernel driver
===================================
0000:04:00.0 'NetXtreme BCM5720 2-port Gigabit Ethernet PCIe 165f' if=eno1 drv=tg3 unused=vfio-pci
0000:04:00.1 'NetXtreme BCM5720 2-port Gigabit Ethernet PCIe 165f' if=eno2 drv=tg3 unused=vfio-pci
0000:3b:00.0 'Ethernet Controller 10G X550T 1563' if=int0 drv=ixgbe unused=vfio-pci
0000:af:00.0 'Ethernet Controller 10G X550T 1563' if=int1 drv=ixgbe unused=vfio-pci

OVS:

# ovs-vsctl show
595103ef-55a1-4f71-b299-a14942965e75
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: netdev
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port vxlan-0a48042b
            Interface vxlan-0a48042b
                type: vxlan
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="10.72.4.44", out_key=flow, remote_ip="10.72.4.43"}
        Port vxlan-0a480429
            Interface vxlan-0a480429
                type: vxlan
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="10.72.4.44", out_key=flow, remote_ip="10.72.4.41"}
        Port vxlan-0a48041f
            Interface vxlan-0a48041f
                type: vxlan
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="10.72.4.44", out_key=flow, remote_ip="10.72.4.31"}
        Port vxlan-0a48042a
            Interface vxlan-0a48042a
                type: vxlan
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="10.72.4.44", out_key=flow, remote_ip="10.72.4.42"}
    Bridge br-vlan
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: netdev
        Port br-vlan
            Interface br-vlan
                type: internal
        Port dpdkbond
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:af:00.1", n_txq_desc="2048"}
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:3b:00.1", n_txq_desc="2048"}
        Port phy-br-vlan
            Interface phy-br-vlan
                type: patch
                options: {peer=int-br-vlan}
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: netdev
        Port vhu87cf49d2-5b
            tag: 7
            Interface vhu87cf49d2-5b
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_socket/vhu87cf49d2-5b"}
        Port vhub607c1fa-ec
            tag: 7
            Interface vhub607c1fa-ec
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_socket/vhub607c1fa-ec"}
        Port vhu9a035444-83
            tag: 8
            Interface vhu9a035444-83
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_socket/vhu9a035444-83"}
        Port br-int
            Interface br-int
                type: internal
        Port int-br-vlan
            Interface int-br-vlan
                type: patch
                options: {peer=phy-br-vlan}
        Port vhue00471df-d8
            tag: 8
            Interface vhue00471df-d8
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_socket/vhue00471df-d8"}
        Port vhu683fdd35-91
            tag: 7
            Interface vhu683fdd35-91
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_socket/vhu683fdd35-91"}
        Port vhuf04fb2ec-ec
            tag: 8
            Interface vhuf04fb2ec-ec
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_socket/vhuf04fb2ec-ec"}
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
    ovs_version: "2.13.3"

I have created the guest VMs using OpenStack and they are connected using vhost sockets (e.g. /var/lib/vhost_socket/vhuf04fb2ec-ec).
> When I did load-testing, I found performance is average and after 200kpps packet rate I noticed packet drops. In short DPDK host hit high PMD cpu usage and loadtest showing bad audio quality. When I do the same test with an SRIOV-based VM, performance is really good.

[Answer] This observation does not hold up, based on the live debugging done so far. The reasons:

- The qemu instances launched were not pinned to specific cores.
- Comparing PCIe pass-through (VF) against vhost-client is not an apples-to-apples comparison.
- With the OpenStack approach, packets flow through at least 3 bridges before reaching the VM.
- OVS PMD threads were not pinned, which led to all of them running on the same core (causing latency and drops) at each bridge stage.

To have a fair comparison against the SRIOV approach, the following changes were made (with respect to the similar question):

External Port <==> DPDK Port0 (L2fwd) DPDK net_vhost <--> QEMU (virtio-pci)

The number achieved with iperf3 (bidirectional) is around 10Gbps. Note: trex and pktgen were requested to try out Mpps; the expectation is to reach at least 8 Mpps with the current setup. Hence this is not a DPDK, virtio-client, qemu-kvm, or SRIOV issue, but a configuration or platform setup issue.
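For reference, a sketch of the kind of pinning the answer describes (the CPU mask and flavor name below are illustrative examples, not taken from this deployment):

# Pin OVS PMD threads to dedicated host cores (example mask selects
# cores 2 and 22; pick cores on the same NUMA node as the NIC)
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x400004

# Pin guest vCPUs so the qemu threads stop floating between cores
# (example Nova flavor; requires dedicated-CPU support on the host)
openstack flavor set my-flavor --property hw:cpu_policy=dedicated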
RegexpParser crashes Python 3.8.6 kernel in JupyterLab
I am parsing chunks of pos-tagged text in JupyterLab. NLTK can run on Python <3.5 and >3.8 according to its FAQ. I can return pos-tagged text just fine. But when I want to return parsed chunks, it crashes Python. I am running macOS 11.1, Python 3.8.6, JupyterLab 3.0.0, and nltk 3.5.

from nltk import *

text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital computer or the gears of a cycle transmission as he does at the top of a mountain or in the petals of a flower. To think otherwise is to demean the Buddha...which is to demean oneself."""

sentence_re = r'''(?x)
      ([A-Z])(\.[A-Z])+\.?
    | \w+(-\w+)*
    | \$?\d+(\.\d+)?%?
    | \.\.\.
    | [][.,;"'?():-_`]
'''

grammar = r"""
    NBAR: {<NN.*|JJ>*<NN.*>}  # Nouns and Adjectives, terminated with Nouns
    NP:   {<NBAR>}
          {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc...
"""

chunker = RegexpParser(grammar)
toks = word_tokenize(text)
postoks = pos_tag(toks)

All is fine until I want to parse the chunks with RegexpParser, at which point it crashes the kernel:

chunker.parse(postoks)

Beginning of the crash report:

Process:               Python [1549]
Path:                  /usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/Resources/Python.app/Contents/MacOS/Python
Identifier:            org.python.python
Version:               3.8.6 (3.8.6)
Code Type:             X86-64 (Native)
Parent Process:        Python [1522]
Responsible:           Terminal [416]
User ID:               501

Date/Time:             2020-12-29 15:34:13.546 -0500
OS Version:            macOS 11.1 (20C69)
Report Version:        12
Anonymous UUID:        8EEE2257-0986-3569-AA83-52641AF02282

Time Awake Since Boot: 1200 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
abort() called

End of the crash report:

External Modification Summary:
  Calls made by other processes targeting this process:
    task_for_pid: 4
    thread_create: 0
    thread_set_state: 0
  Calls made by this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by all processes on this machine:
    task_for_pid: 4785
    thread_create: 0
    thread_set_state: 0

VM Region Summary:
ReadOnly portion of Libraries: Total=835.0M resident=0K(0%) swapped_out_or_unallocated=835.0M(100%)
Writable regions: Total=1.5G written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=1.5G(100%)

                                VIRTUAL   REGION
REGION TYPE                        SIZE    COUNT (non-coalesced)
===========                     =======  =======
Activity Tracing                   256K        1
Dispatch continuations            64.0M        1
Kernel Alloc Once                    8K        1
MALLOC                           150.1M       37
MALLOC guard page                   24K        5
MALLOC_MEDIUM (reserved)         960.0M        8         reserved VM address space (unallocated)
STACK GUARD                         72K       18
Stack                             86.6M       18
VM_ALLOCATE                      182.5M      359
VM_ALLOCATE (reserved)           128.0M        2         reserved VM address space (unallocated)
__DATA                            16.1M      460
__DATA_CONST                      11.8M      200
__DATA_DIRTY                       509K       87
__FONT_DATA                          4K        1
__LINKEDIT                       506.1M      241
__OBJC_RO                         60.5M        1
__OBJC_RW                         2452K        2
__TEXT                           329.5M      432
__UNICODE                          588K        1
mapped file                       51.5M        9
shared memory                       40K        4
===========                     =======  =======
TOTAL                              2.5G     1888
TOTAL, minus reserved VM space     1.4G     1888

Model: iMac18,2, BootROM 429.60.3.0.0, 4 processors, Quad-Core Intel Core i7, 3.6 GHz, 32 GB, SMC 2.40f1
Graphics: kHW_AMDRadeonPro560Item, Radeon Pro 560, spdisplays_pcie_device, 4 GB
Memory Module: BANK 0/DIMM0, 16 GB, DDR4 SO-DIMM, 2400 MHz, 0x802C, 0x313641544632473634485A2D3247334232202020
Memory Module: BANK 1/DIMM0, 16 GB, DDR4 SO-DIMM, 2400 MHz, 0x802C, 0x313641544632473634485A2D3247334232202020
AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x16E), Broadcom BCM43xx 1.0 (7.77.111.1 AirPortDriverBrcmNIC-1675.1)
Bluetooth: Version 8.0.2f9, 3 services, 27 devices, 1 incoming serial ports
Network Service: Wi-Fi, AirPort, en1
USB Device: USB 3.0 Bus
USB Device: AS2105
USB Device: USB 2.0 BILLBOARD
USB Device: Bluetooth USB Host Controller
USB Device: FaceTime HD Camera (Built-in)
USB Device: Scarlett 2i4 USB
USB Device: My Passport 0827
Thunderbolt Bus: iMac, Apple Inc., 41.4
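One thing worth ruling out (an assumption on my part, since the crash report only shows abort() on the main thread): chunker.parse itself is pure Python, but when its return value is the last expression in a Jupyter cell, the resulting Tree is rendered inline through NLTK's tkinter/ghostscript drawing code, which has known problems on macOS Big Sur. A sketch to test that theory:

tree = chunker.parse(postoks)
print(tree)  # plain-text rendering; skips the Tree._repr_png_ drawing path

# If print() works, the inline rich display was the culprit; it can be
# disabled (workaround, not a fix):
# import nltk
# nltk.tree.Tree._repr_png_ = None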
Running background tasks through dramatiq does not work
I'm trying to run background task processing; redis and rabbitMQ work in separate docker containers.

import dramatiq
import requests
from dramatiq.brokers.rabbitmq import RabbitmqBroker
from dramatiq.results import Results
from dramatiq.results.backends import RedisBackend

@dramatiq.actor(store_results=True)
def count_words(url):
    try:
        response = requests.get(url)
        count = len(response.text.split(" "))
        print(f"There are {count} words at {url!r}.")
    except requests.exceptions.MissingSchema:
        print(f"Message dropped due to invalid url: {url!r}")

result_backend = RedisBackend(host="172.17.0.2", port=6379)
result_broker = RabbitmqBroker(host="172.17.0.5", port=5672)
result_broker.add_middleware(Results(backend=result_backend))
dramatiq.set_broker(result_broker)

message = count_words.send('https://github.com/Bogdanp/dramatiq')
print(message.get_result(block=True))

RabbitMQ:

{"queue_name":"default","actor_name":"count_words","args":["https://github.com/Bogdanp/dramatiq"],"kwargs":{},"options":{},"message_id":"8e10b6ef-dfef-47dc-9f28-c6e07493efe4","message_timestamp":1608877514655}

Redis:

1:C 22 Dec 2020 13:38:15.415 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 22 Dec 2020 13:38:15.417 * Running mode=standalone, port=6379.
1:M 22 Dec 2020 13:38:15.417 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 22 Dec 2020 13:38:15.417 # Server initialized
1:M 22 Dec 2020 13:38:15.417 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 22 Dec 2020 13:38:15.417 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
1:M 25 Dec 2020 10:08:12.274 * Background saving terminated with success
1:M 26 Dec 2020 19:23:59.445 * 1 changes in 3600 seconds. Saving...
1:M 26 Dec 2020 19:23:59.660 * Background saving started by pid 24
24:C 26 Dec 2020 19:23:59.890 * DB saved on disk
24:C 26 Dec 2020 19:23:59.905 * RDB: 4 MB of memory used by copy-on-write
1:M 26 Dec 2020 19:23:59.961 * Background saving terminated with success

Error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/dramatiq/message.py", line 147, in get_result
    return backend.get_result(self, block=block, timeout=timeout)
  File "/usr/local/lib/python3.6/dist-packages/dramatiq/results/backends/redis.py", line 81, in get_result
    raise ResultTimeout(message)
dramatiq.results.errors.ResultTimeout: count_words('https://github.com/Bogdanp/dramatiq')
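Two things stand out here (assumptions, since the question does not show how a worker is launched): count_words is declared before dramatiq.set_broker is called, so the actor binds to the default broker rather than the RabbitMQ one, and get_result can only succeed if a separate dramatiq worker process is consuming the queue. A sketch of the reordered setup (the module name tasks.py is hypothetical):

# tasks.py - configure the broker BEFORE declaring any actor
import dramatiq
import requests
from dramatiq.brokers.rabbitmq import RabbitmqBroker
from dramatiq.results import Results
from dramatiq.results.backends import RedisBackend

broker = RabbitmqBroker(host="172.17.0.5", port=5672)
broker.add_middleware(Results(backend=RedisBackend(host="172.17.0.2", port=6379)))
dramatiq.set_broker(broker)

@dramatiq.actor(store_results=True)
def count_words(url):
    response = requests.get(url)
    print(f"There are {len(response.text.split())} words at {url!r}.")

Then start a worker in another shell so the queued messages actually get executed:

    dramatiq tasks

and only then send and wait for the result:

    message = count_words.send('https://github.com/Bogdanp/dramatiq')
    print(message.get_result(block=True))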
Ceph-rgw Service stop automatically after installation
In my local cluster (4 Raspberry Pis) I am trying to configure an RGW gateway. Unfortunately the service disappears automatically after 2 minutes.

[ceph_deploy.rgw][INFO  ] The Ceph Object Gateway (RGW) is now running on host OSD1 and default port 7480

cephuser@admin:~/mycluster $ ceph -s
  cluster:
    id:     745d44c2-86dd-4b2f-9c9c-ab50160ea353
    health: HEALTH_WARN
            too few PGs per OSD (24 < min 30)

  services:
    mon: 1 daemons, quorum admin
    mgr: admin(active)
    osd: 4 osds: 4 up, 4 in
    rgw: 1 daemon active

  data:
    pools:   4 pools, 32 pgs
    objects: 80 objects, 1.09KiB
    usage:   4.01GiB used, 93.6GiB / 97.6GiB avail
    pgs:     32 active+clean

  io:
    client:   5.83KiB/s rd, 0B/s wr, 7op/s rd, 1op/s wr

After one minute the service (rgw: 1 daemon active) is no longer visible:

cephuser@admin:~/mycluster $ ceph -s
  cluster:
    id:     745d44c2-86dd-4b2f-9c9c-ab50160ea353
    health: HEALTH_WARN
            too few PGs per OSD (24 < min 30)

  services:
    mon: 1 daemons, quorum admin
    mgr: admin(active)
    osd: 4 osds: 4 up, 4 in

  data:
    pools:   4 pools, 32 pgs
    objects: 80 objects, 1.09KiB
    usage:   4.01GiB used, 93.6GiB / 97.6GiB avail
    pgs:     32 active+clean

Many thanks for the help.
Solution: On the gateway node, open the Ceph configuration file in the /etc/ceph/ directory and find an RGW client section similar to this example:

[client.rgw.gateway-node1]
host = gateway-node1
keyring = /var/lib/ceph/radosgw/ceph-rgw.gateway-node1/keyring
log file = /var/log/ceph/ceph-rgw-gateway-node1.log
rgw frontends = civetweb port=192.168.178.50:8080 num_threads=100

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index
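After editing the file, the daemon has to be restarted for the frontend setting to take effect; a sketch, assuming a systemd deployment following the usual ceph-radosgw@rgw.<name> unit template and the node name from the example section above:

# restart the gateway, and check why it exited if it dies again
systemctl restart ceph-radosgw@rgw.gateway-node1
journalctl -u ceph-radosgw@rgw.gateway-node1 --since "10 minutes ago"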