giving a query as part of a uri in autobench

I am using autobench for benchmarking. An example autobench command is shown below:
autobench --single_host --host1 testhost.foo.com --uri1 /index.html --quiet
--timeout 5 --low_rate 20 --high_rate 200 --rate_step 20 --num_call 10
--num_conn 5000 --file bench.tsv
The URI I have to specify has a query attached to it. When I run the command with the query, I get the following result:
dem_req_rate req_rate_localhost con_rate_localhost min_rep_rate_localhost avg_rep_rate_localhost max_rep_rate_localhost stddev_rep_rate_localhost resp_time_localhost net_io_localhost errors_localhost
200 0 20 0 0 0 0 0 0 101
400 0 40 0 0 0 0 0 0 101
600 0 60 0 0 0 0 0 0 101
800 0 80 0 0 0 0 0 0 101
1000 0 100 0 0 0 0 0 0 101
1200 0 120 0 0 0 0 0 0 101
1400 0 140 0 0 0 0 0 0 101
1600 0 160 0 0 0 0 0 0 101
1800 0 180 0 0 0 0 0 0 101
2000 0 200 0 0 0 0 0 0 101
The request and response rates are all zeroes. Can anybody please tell me how to give a query as part of the URI?
Thank you in advance

It worked for me when I surrounded the URI containing the query string in single quotes. Something like:
--uri1 '/my/uri/query?string'
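For example, adapting the command from the question (the query string here is just a hypothetical placeholder):
autobench --single_host --host1 testhost.foo.com --uri1 '/index.html?user=test&id=42' --quiet --timeout 5 --low_rate 20 --high_rate 200 --rate_step 20 --num_call 10 --num_conn 5000 --file bench.tsv
The single quotes stop the shell from interpreting the ? and & characters before autobench sees them.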

Related

Parse .log file to collect data after keyword then create a nested dictionary using predefined column names

I am trying to parse a .log file using Python for the status information about processes of a Linux system. The .log file has a lot of different sections of information; the sections of interest start with "##START-ALLPROCESSES-xxxxxxxx", where x is the epoch date, and end with "##END-ALLPROCESSES-xxxxxxxx". After this line each process is listed with 52 columns each; the number of processes may change depending on the info recorded at the time, and there may be multiple sections with this information at different times.
The idea is to open the .log file, find the sections, and then use the XXXXXXX as the key for a nested dictionary where the keys are the predefined column names filled in with the values from the section, and do this for all the different sections found in the .log file. The nested dictionary would look something like below:
{date1-XXXXXX: [
    {columnName1: process1,
     ...
     columnName52: info1},
    ...
    {columnName1: process52,
     ...
     columnName52: info52}
 ],
 date2-XXXXXX: [
    {columnName1: process1,
     ...
     columnName52: info1},
    ...
    {columnName1: process52,
     ...
     columnName52: info52}
 ]}
The data in the .log file looks as follows, and there would be multiple sections like this, each with a different date. Each line starts with the process id and (process name):
##START-ALLPROCESSES-1676652419
1 (systemd) S 0 1 1 0 -1 4210944 2070278 9743969773 2070 2703 8811 11984 7638026 9190549 20 0 1 0 0 160043008 745 18446744073709551615 187650352414720 187650353516788 281474853505456 0 0 0 671173123 4096 1260 1 0 0 17 0 0 0 2706 0 0 187650353585800 187650353845340 187651263758336 281474853506734 281474853506745 281474853506745 281474853507053 0
10 (rcu_bh) I 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 2 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10251 (kworker/1:2) I 2 0 0 0 -1 69238880 0 0 0 0 0 914 0 0 20 0 1 0 617684776 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
10299 (loop2) S 2 0 0 0 -1 3178560 0 0 0 0 0 24 0 0 0 -20 1 0 10871 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 169 0 0 0 0 0 0 0 0 0 0
10648 (kworker/2:0) I 2 0 0 0 -1 69238880 0 0 0 0 0 567 0 0 20 0 1 0 663634994 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
1082 (nvme-wq) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 0 -20 1 0 109 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 3 0 0 0 0 0 0 0 0 0 0 0 0 0
1095 (scsi_eh_0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 3 0 0 0 0 0 0 0 0 0 0 0 0 0
1096 (scsi_tmf_0) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 0 -20 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1099 (scsi_eh_1) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 (migration/0) S 2 0 0 0 -1 69238848 0 0 0 0 0 4961 0 0 -100 0 1 0 2 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 99 1 0 0 0 0 0 0 0 0 0 0 0
1100 (scsi_tmf_1) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 0 -20 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##END-ALLPROCESSES-1676652419
I have tried it multiple ways but I cannot seem to get it to work correctly; my last attempt:
import os
import re

columns = ['pid', 'comm', 'state', 'ppid', 'pgrp', 'session', 'tty_nr', 'tpgid', 'flags', 'minflt', 'cminflt', 'majflt', 'cmajflt', 'utime', 'stime',
           'cutime', 'cstime', 'priority', 'nice', 'num_threads', 'itrealvalue', 'starttime', 'vsize', 'rss', 'rsslim', 'startcode', 'endcode', 'startstack', 'kstkesp',
           'kstkeip', 'signal', 'blocked', 'sigignore', 'sigcatch', 'wchan', 'nswap', 'cnswap', 'exit_signal', 'processor', 'rt_priority', 'policy', 'delayacct_blkio_ticks',
           'guest_time', 'cguest_time', 'start_data', 'end_data', 'start_brk', 'arg_start', 'arg_end', 'env_start', 'env_end', 'exit_code']

for file in os.listdir(dir):
    if file.endswith('.log'):
        with open(file, 'r') as f:
            data = f.read()
        # keep only the text after each START marker
        data = data.split('##START-ALLPROCESSES-')
        data = data[1:]
        for i in range(len(data)):
            # drop everything from the END marker onwards
            data[i] = data[i].split('##END-ALLPROCESSES-')
            data[i] = data[i][0]
            data[i] = re.split(r'\r', data[i])
            data[i] = data[i][0]
            data[i] = re.split(r'\n', data[i])
            for j in range(len(data[i])):
                data[i][j] = re.split(r'\s+', data[i][j])
            #print(data[i])
            data[i][0] = str(data[i][0])
        data_dict = {}
        for i in range(len(data)):
            data_dict[data[i][0]] = {}
            for j in range(len(columns)):
                data_dict[data[i][0]][columns[j]] = data[i][j+1]
        print(data_dict)
I converted the epoch date into a str because I was getting unhashable-list errors. That made the epoch date show up as a key, but each column now holds the entire 52-column list of information as a single value, so I am definitely missing something.
To solve this problem, you could follow these steps:
1. Open the .log file and read the contents.
2. Search for all the sections of interest by finding lines that start with "##START-ALLPROCESSES-" and end with "##END-ALLPROCESSES-".
3. For each section found, extract the epoch date and create a dictionary with an empty list for each of the 52 columns.
4. Iterate over the lines within the section and split each line into the 52 columns using whitespace as a separator. Add the values to the corresponding list in the dictionary created in step 3.
5. Repeat steps 3 and 4 for all the sections found in the .log file.
6. Return the final nested dictionary.
Here is some sample code that implements these steps:
import re

def parse_log_file(log_file_path):
    with open(log_file_path, 'r') as log_file:
        log_contents = log_file.read()
    # each match is the text between a START marker and the next END marker
    sections = re.findall(r'##START-ALLPROCESSES-(.*?)##END-ALLPROCESSES-', log_contents, re.DOTALL)
    nested_dict = {}
    for section in sections:
        lines = section.strip().split('\n')
        # the first captured line is the epoch date from the START marker
        epoch_date = lines[0].split('-')[-1]
        column_names = ['column{}'.format(i) for i in range(1, 53)]
        section_dict = {column_name: [] for column_name in column_names}
        for line in lines[1:]:
            values = line.strip().split()
            for i, value in enumerate(values):
                section_dict[column_names[i]].append(value)
        nested_dict['date{}-{}'.format(epoch_date, len(section_dict['column1']))] = section_dict
    return nested_dict
You can call this function by passing the path to the .log file as an argument. The function returns the nested dictionary described in the problem statement.
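For example, a minimal usage sketch (the file name is a placeholder, and you could substitute the question's real column names for the generic column1..column52):
result = parse_log_file('system.log')
print(list(result.keys()))       # one key per START/END section
section = next(iter(result.values()))
print(section['column1'][:3])    # pids of the first three processes in that section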

re-indexing a dataset that has empty rows that are being transformed as columns using pivot

I need to separate a row into multiple columns; in a previous post I was able to separate them, but some of the rows are empty, and because of that I get this error:
ValueError: Index contains duplicate entries, cannot reshape
here is a sample dataset to mock up this issue:
myData = [['Abc: 9.22 Mno: 6.90 IExplorer 0.00 OCa: 0.00 Foo: 0.00'],
['Abc: 0.61 Mno: 0.14'],
[''],
['MCheese: (37.20) dimes: (186.02) Feria: (1,586.02)'],
['Abc: 16.76 Mno: 4.25 OMG: 63.19'],
['yonka: 19.27'],
['Dome: (552.23)'],
['Fray: 2,584.96'],
['CC: (83.31)'],
[''],
['Abc: 307.34 Mno: 18.40 Feria: 509.67'],
['IExplorer: 26.28 OCa: 26.28 Foo: 730.68'],
['Abc: 122.66 Mno: 11.85 Feria: 213.24'],
[''],
['Wonka: (13.67) Fray: (1,922.48)'],
['Mno: 18.19 IExplorer: 0.00 OCa: 0.00 Foo: 0.00'],
['Abc: 74.06 Mno: 12.34 Feria: 124.42 MCheese: (4.07)'],
[''],
['Abc: 45.96 Mno: 18.98 IExplorer: 0.00 OCa: 0.00 Foo: 0.00'],
['IExplorer: 0.00 OCa: 0.00 Dome: (166.35) Foo: 0.00'],
['']]
df7 = pd.DataFrame(myData)
df7.columns = ['Original']
df7['Original'] = df7['Original'].str.replace(" ","")
df7['Original']
After separating the columns with a regex from a previous post, I get these results:
df8 = df7['Original'].str.extractall(r'^(.*?):([\(\)(\,)0-9.]+)').reset_index().fillna(0)
df8 = df8.pivot(index='level_0', columns=0, values=1).rename_axis(index=None, columns=None).fillna(0)
df8
this gives me this result:
Abc CC Dome Fray IExplorer MCheese Mno Wonka yonka
0 9.22 0 0 0 0 0 0 0 0
1 0.61 0 0 0 0 0 0 0 0
3 0 0 0 0 0 (37.20) 0 0 0
4 16.76 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 19.27
6 0 0 (552.23) 0 0 0 0 0 0
7 0 0 0 2,584.96 0 0 0 0 0
8 0 (83.31) 0 0 0 0 0 0 0
10 307.34 0 0 0 0 0 0 0 0
11 0 0 0 0 26.28 0 0 0 0
12 122.66 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 (13.67) 0
15 0 0 0 0 0 0 18.19 0 0
16 74.06 0 0 0 0 0 0 0 0
18 45.96 0 0 0 0 0 0 0 0
19 0 0 0 0 0.00 0 0 0 0
If I change the regex, the number of columns increases, but I do not get the entirety of the dataset. For this particular sample, this second snippet gives me more columns:
df8 = df7['Original'].str.extractall(r'(.*?):([\(\)(\,)0-9.]+)').reset_index().fillna(0)
df8 = df8.pivot(index='level_0', columns=0, values=1).rename_axis(index=None, columns=None).fillna(0)
df8
In my particular case, though, the first snippet gives me more columns than the second one; however, neither of them counts the empty rows.
Is there any way I can count those empty rows within the dataset whenever it finds an empty row? In total there are 21 rows, but I can only get 19 of them shown and counted.
We can use str.findall to find all matching occurrences of the regex pattern in each row, then map the occurrences to dicts and create a new dataframe. This approach avoids re-indexing the dataframe. You also have to fix your regex pattern to properly capture matching pairs.
s = df7['Original'].str.findall(r'([^:0-9]+):\(?([0-9.,]+)\)?')
df_out = pd.DataFrame(map(dict, s), index=s.index).fillna(0)
>>> df_out
Abc Mno OCa Foo MCheese dimes Feria OMG yonka Dome Fray CC IExplorer Wonka
0 9.22 6.90 0.00 0.00 0 0 0 0 0 0 0 0 0 0
1 0.61 0.14 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 37.20 186.02 1,586.02 0 0 0 0 0 0 0
4 16.76 4.25 0 0 0 0 0 63.19 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 19.27 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 552.23 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 2,584.96 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 83.31 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 307.34 18.40 0 0 0 0 509.67 0 0 0 0 0 0 0
11 0 0 26.28 730.68 0 0 0 0 0 0 0 0 26.28 0
12 122.66 11.85 0 0 0 0 213.24 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 1,922.48 0 0 13.67
15 0 18.19 0.00 0.00 0 0 0 0 0 0 0 0 0.00 0
16 74.06 12.34 0 0 4.07 0 124.42 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 45.96 18.98 0.00 0.00 0 0 0 0 0 0 0 0 0.00 0
19 0 0 0.00 0.00 0 0 0 0 0 166.35 0 0 0.00 0
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0

how to find combinations present in different columns

I have a dataset with sample and shop names. I am trying to figure out a way to calculate the proportion of shops that sell a combination of samples. For example, samples 12, 13 and 22 are available in shop2, shop3 and shop4. Likewise, samples 6, 7, 8, 9, 10, 16 and 17 are available in shop1.
The dataset I have is very large, with 9000 columns and 26 rows. What I show here is just a small subset. What I want to do is figure out a way to screen the table for all possible combinations of samples present in shops (if >0) and print them out in a dictionary, for example sample12_sample13_sample22: [shop2, shop3, shop4], and list out all possible combinations that are available.
Sorry that I could not figure out how to do this, so I do not have any code right now.
What approach should I use here?
Any help is appreciated.
Thanks!
Name Shop1 Shop2 Shop3 Shop4
Sample1 0 0 0 0
Sample2 0 0 0 0
Sample3 0 0 0 0
Sample4 0 0 0 0
Sample5 0 0 0 0
Sample6 1 0 0 0
Sample7 4 0 0 0
Sample8 12 0 0 0
Sample9 1 0 0 0
Sample10 1 0 0 0
Sample11 0 0 0 0
Sample12 0 5 21 233
Sample13 0 8 36 397
Sample14 0 4 0 0
Sample15 0 0 0 0
Sample16 2 0 0 0
Sample17 17 0 0 0
Sample18 0 0 0 0
Sample19 0 0 0 0
Sample20 0 0 0 0
Sample21 0 0 0 0
Sample22 0 1 20 127
What we can do is melt, then groupby twice:
s = df.melt('Name')
s = s[s.value!=0]
s = s.groupby('Name')['variable'].agg([','.join,'count'])
out = s[s['count']>1].reset_index().groupby('join')['Name'].agg(','.join)
out
Out[104]:
join
Shop2,Shop3,Shop4 Sample12,Sample13,Sample22
Name: Name, dtype: object
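If you then want the dictionary shape the question asks for, here is a small follow-up sketch on top of out:
# e.g. {'Sample12_Sample13_Sample22': ['Shop2', 'Shop3', 'Shop4']}
result = {samples.replace(',', '_'): shops.split(',') for shops, samples in out.items()}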

How to interpret such value of the time column in /proc/self/mountstats - does it indicate a performance issue?

I have a bladefs volume and I just checked /proc/self/mountstats, where I see per-operation statistics:
...
opts: rw,vers=3,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.2.100,mountvers=3,mountport=903,mountproto=tcp,local_lock=all
age: 18129
caps: caps=0x3fc7,wtmult=512,dtsize=32768,bsize=0,namlen=255
sec: flavor=1,pseudoflavor=1
events: 18840 116049 23 5808 22138 21048 146984 13896 287 2181 0 7560 31380 0 9565 5106 0 6471 0 0 13896 0 0 0 0 0 0
bytes: 339548407 48622919 0 0 311167118 48622919 76846 13896
RPC iostats version: 1.0 p/v: 100003/3 (nfs)
xprt: tcp 875 1 7 0 0 85765 85764 1 206637 0 37 1776 35298
per-op statistics
NULL: 0 0 0 0 0 0 0 0
GETATTR: 18840 18840 0 2336164 2110080 92 8027 8817
SETATTR: 0 0 0 0 0 0 0 0
LOOKUP: 21391 21392 0 3877744 4562876 118 103403 105518
ACCESS: 20183 20188 0 2584304 2421960 72 10122 10850
READLINK: 0 0 0 0 0 0 0 0
READ: 3425 3425 0 465848 311606600 340 97323 97924
WRITE: 2422 2422 0 48975488 387520 763 200645 201522
CREATE: 2616 2616 0 447392 701088 21 870 1088
MKDIR: 858 858 0 188760 229944 8 573 705
SYMLINK: 0 0 0 0 0 0 0 0
MKNOD: 0 0 0 0 0 0 0 0
REMOVE: 47 47 0 6440 6768 0 8 76
RMDIR: 23 23 0 4876 3312 0 3 5
RENAME: 23 23 0 7176 5980 0 5 6
LINK: 0 0 0 0 0 0 0 0
READDIR: 160 160 0 23040 4987464 0 16139 16142
READDIRPLUS: 15703 15703 0 2324044 8493604 43 1041634 1041907
FSSTAT: 1 1 0 124 168 0 0 0
FSINFO: 2 2 0 248 328 0 0 0
PATHCONF: 1 1 0 124 140 0 0 0
COMMIT: 68 68 0 9248 10336 2 272 275...
for my bladefs. I am interested in the READ operation statistics. As far as I know, the last column (97924) means:
execute: How long ops of this type take to execute (from
rpc_init_task to rpc_exit_task) (microsecond)
How should I interpret this? Is it the average time of each read operation, regardless of block size? I have a very strong suspicion that I have problems with NFS: am I right? A value of 0.1 sec looks bad to me, but I am not sure exactly how to interpret this time: an average, some sum...?
After reading the kernel source: the statistics are printed from rpc_clnt_show_stats() in net/sunrpc/stats.c, and the 8th column of the per-op statistics seems to be printed from _print_rpc_iostats, which prints the struct rpc_iostats member om_execute. (The newest kernels have 9 columns, with errors in the last column.)
That member looks to be only referenced/actually changed in rpc_count_iostats_metrics with:
execute = ktime_sub(now, task->tk_start);
op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
Assuming ktime_add does what it says, the value of om_execute only increases. So the 8th column of mountstats is the cumulative sum of the execution times of operations of this type, not an average.
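To get a per-operation average, divide that cumulative value by the operation count from the same line. A minimal Python sketch, assuming the per-op layout shown above (ops in the 1st numeric column, execute in the 8th):
with open('/proc/self/mountstats') as f:
    for line in f:
        fields = line.split()
        if fields and fields[0] == 'READ:':
            ops, execute = int(fields[1]), int(fields[8])
            if ops:
                # cumulative execute time divided by the number of ops
                print('average execute per READ:', execute / ops)
For the numbers in the question this gives 97924 / 3425 ≈ 28.6 per READ (in whatever unit the column is reported in), rather than 97924 per read.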

How can I determine the current CPU utilization from the shell? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
How can I determine the current CPU utilization from the shell in Linux?
For example, I get the load average like so:
cat /proc/loadavg
Outputs:
0.18 0.48 0.46 4/234 30719
Linux does not have any system variables that give the current CPU utilization. Instead, you have to read /proc/stat several times: each column in the cpu(n) lines gives the total CPU time, and you have to take subsequent readings of it to get percentages. See this document to find out what the various columns mean.
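A minimal Python sketch of that idea, assuming the usual /proc/stat layout (idle time is the 4th numeric column of the aggregate cpu line):
import time

def cpu_times():
    with open('/proc/stat') as f:
        # first line: 'cpu user nice system idle iowait irq softirq ...'
        return [int(v) for v in f.readline().split()[1:]]

t1 = cpu_times()
time.sleep(1)
t2 = cpu_times()
deltas = [b - a for a, b in zip(t1, t2)]
idle = deltas[3]
print('CPU utilization: {:.1f}%'.format(100 * (sum(deltas) - idle) / sum(deltas)))
This treats everything except idle time as busy; you could also subtract iowait, depending on what you want to measure.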
You can use the top or ps commands to check CPU usage.
Using top: this will show you the CPU stats.
top -b -n 1 | grep ^Cpu
Using ps: this will show you the % CPU usage for each process.
ps -eo pcpu,pid,user,args | sort -r -k1 | less
Also, you can write a small script in bash or perl to read /proc/stat and calculate the CPU usage.
The command uptime gives you load averages for the past 1, 5, and 15 minutes.
Try this command:
$ top
http://www.cyberciti.biz/tips/how-do-i-find-out-linux-cpu-utilization.html
Try this command:
cat /proc/stat
The output will look something like this:
cpu 55366 271 17283 75381807 22953 13468 94542 0
cpu0 3374 0 2187 9462432 1393 2 665 0
cpu1 2074 12 1314 9459589 841 2 43 0
cpu2 1664 0 1109 9447191 666 1 571 0
cpu3 864 0 716 9429250 387 2 118 0
cpu4 27667 110 5553 9358851 13900 2598 21784 0
cpu5 16625 146 2861 9388654 4556 4026 24979 0
cpu6 1790 0 1836 9436782 480 3307 19623 0
cpu7 1306 0 1702 9399053 726 3529 26756 0
intr 4421041070 559 10 0 4 5 0 0 0 26 0 0 0 111 0 129692 0 0 0 0 0 95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 369 91027 1580921706 1277926101 570026630 991666971 0 277768 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 8097121
btime 1251365089
processes 63692
procs_running 2
procs_blocked 0
More details:
http://www.mail-archive.com/linuxkernelnewbies#googlegroups.com/msg01690.html
http://www.linuxhowtos.org/System/procstat.htm
Maybe something like this:
ps -eo pid,pcpu,comm
And if you'd like to parse it and maybe only look at certain processes:
#!/bin/sh
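# append processes using more than 4% CPU to ~/ps_eo_test.txt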
ps -eo pid,pcpu,comm | awk '{if ($2 > 4) print }' >> ~/ps_eo_test.txt
You need to sample the load average for several seconds and calculate the CPU utilization from that. If unsure what to do, get the sources of "top" and read them.
