Fio results are steadily increasing IOPS, not what I expected - linux

I'm trying to test my RBD storage with random read, random write, and mixed randrw workloads, but the output doesn't look right: it's just a sequentially growing number.
What is wrong with my steps?
This is the fio file that I ran:
; fio-rand-write.job for fiotest
[global]
name=fio-rand-write
filename=fio-rand-write
rw=randwrite
bs=4K
direct=1
write_iops_log=rand-read
[file1]
size=1G
ioengine=libaio
iodepth=16
And the result is this:
head rand-read_iops.1.log
2, 1, 1, 4096, 0
2, 1, 1, 4096, 0
2, 1, 1, 4096, 0
2, 1, 1, 4096, 0
2, 1, 1, 4096, 0
3, 1, 1, 4096, 0
4, 1, 1, 4096, 0
5, 1, 1, 4096, 0
5, 1, 1, 4096, 0
5, 1, 1, 4096, 0
tail rand-read_iops.1.log
30700, 1, 1, 4096, 0
30700, 1, 1, 4096, 0
30700, 1, 1, 4096, 0
30700, 1, 1, 4096, 0
30700, 1, 1, 4096, 0
30700, 1, 1, 4096, 0
30700, 1, 1, 4096, 0
30700, 1, 1, 4096, 0
30700, 1, 1, 4096, 0
30700, 1, 1, 4096, 0
I'm using fio 3.18.
Why don't I get the real IOPS values?

(Note this isn't really a programming question so Stackoverflow is the wrong place to ask this... Maybe Super User or Serverfault would be a better choice and get faster answers?)
but the output is not correct, it is a sequential growing number
Which column are you referring to? If you mean the leftmost column, then isn't that time, per the fio Log File Formats documentation?
Fio supports a variety of log file formats, for logging latencies, bandwidth, and IOPS. The logs share a common format, which looks like this:
time (msec), value, data direction, block size (bytes), offset (bytes)
Doesn't time generally monotonically increase relative to prior readings (accounting for precision)?
Also see the documentation for write_iops_log that says:
Because fio defaults to individual I/O logging, the value entry in the IOPS log will be 1 unless windowed logging (see log_avg_msec) has been enabled
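Given that format, per-second IOPS can be recovered from the per-I/O log by counting entries per one-second window. A minimal sketch (the five-column layout and log file name are taken from the question; this is not an official fio tool):

```python
from collections import Counter

def iops_per_second(log_lines):
    """Bucket per-I/O log entries into 1-second windows.

    With fio's default individual I/O logging, each line records one
    completed I/O, so the number of lines per window is that window's IOPS.
    """
    buckets = Counter()
    for line in log_lines:
        t_msec = int(line.split(",")[0])  # first column is time in msec
        buckets[t_msec // 1000] += 1
    return dict(sorted(buckets.items()))

# e.g. with open("rand-read_iops.1.log") as f:
#          print(iops_per_second(f))
```

Alternatively, setting log_avg_msec in the job file makes fio itself emit windowed averages instead of one line per I/O.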

Related

How to make a checkerboard in Pytorch?

I see that a simple checkerboard pattern can be created fairly concisely with numpy. Does anyone know if a checkerboard where each square may contain multiple values could be created? E.g.:
1 1 0 0 1 1
1 1 0 0 1 1
0 0 1 1 0 0
0 0 1 1 0 0
Although there is no equivalent of np.indices in PyTorch, you can still find a workaround using a combination of torch.arange, torch.meshgrid, and torch.stack:
def indices(h, w):
    return torch.stack(torch.meshgrid(torch.arange(h), torch.arange(w)))
This allows you to define a base tensor with a checkerboard pattern following your linked post:
>>> base = indices(2, 3).sum(axis=0) % 2
>>> base
tensor([[0, 1, 0],
        [1, 0, 1]])
Then you can repeat the rows and columns with torch.repeat_interleave:
>>> base.repeat_interleave(2, dim=0).repeat_interleave(2, dim=1)
tensor([[0, 0, 1, 1, 0, 0],
[0, 0, 1, 1, 0, 0],
[1, 1, 0, 0, 1, 1],
[1, 1, 0, 0, 1, 1]])
And you can take the opposite of a given checkerboard x by computing 1-x.
So you could define a function like this:
def checkerboard(shape, k):
    """
    shape: dimensions of output tensor
    k: edge size of square
    """
    h, w = shape
    base = indices(h//k, w//k).sum(dim=0) % 2
    x = base.repeat_interleave(k, 0).repeat_interleave(k, 1)
    return 1 - x
And try with:
>>> checkerboard((4,6), 2)
tensor([[1, 1, 0, 0, 1, 1],
[1, 1, 0, 0, 1, 1],
[0, 0, 1, 1, 0, 0],
[0, 0, 1, 1, 0, 0]])
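For comparison, the same construction can be sketched in NumPy using np.indices, as in the question's linked post (checkerboard_np is a made-up name for this illustration):

```python
import numpy as np

def checkerboard_np(shape, k):
    """shape: (h, w) of the output; k: edge size of each square."""
    h, w = shape
    # base pattern: one cell per square, alternating 0/1
    base = np.indices((h // k, w // k)).sum(axis=0) % 2
    # expand each cell into a k-by-k block
    x = np.repeat(np.repeat(base, k, axis=0), k, axis=1)
    return 1 - x
```

checkerboard_np((4, 6), 2) produces the same 4x6 pattern shown above.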

Confusion Matrix: ValueError: Classification metrics can't handle a mix of unknown and multiclass targets

I have a long script, but the key point is here:
result = confusion_matrix(y_test, ypred)
where y_test is
>>> y_test
ZFFYZTN 3
ZDDKDTY 0
ZTYKTYKD 0
ZYNDQNDK 1
ZYZQNKQN 3
..
ZYMDDTM 3
ZYLNYFLM 0
ZTNTKDY 0
ZYYLZNKM 3
ZYZMQTZT 0
Name: BT, Length: 91, dtype: object
and the values are
>>> y_test.values
array([3, 0, 0, 1, 3, 0, 0, 1, 0, 3, 1, 0, 3, 1, 0, 0, 3, 0, 3, 0, 0, 0,
1, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 2, 3, 3, 0, 0, 3, 3, 1, 1, 0, 2,
0, 0, 0, 3, 3, 3, 1, 0, 3, 3, 3, 2, 3, 3, 0, 1, 0, 3, 3, 0, 0, 0,
0, 0, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 3, 2, 0, 0, 0, 3, 3, 3, 0,
0, 3, 0], dtype=object)
and ypred is
>>> ypred
array([3, 0, 0, 1, 3, 0, 0, 1, 0, 3, 1, 0, 3, 1, 0, 0, 3, 0, 3, 0, 0, 0,
1, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 2, 3, 3, 0, 0, 3, 3, 1, 1, 0, 2,
0, 0, 0, 3, 3, 3, 1, 0, 3, 3, 3, 2, 3, 3, 0, 1, 0, 3, 3, 0, 0, 0,
0, 0, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 3, 2, 0, 0, 0, 3, 3, 3, 0,
0, 3, 0])
gives
raise ValueError("Classification metrics can't handle a mix of {0} "
ValueError: Classification metrics can't handle a mix of unknown and multiclass targets
The confusing part is that I don't see any unknown targets.
So I checked out ValueError: Classification metrics can't handle a mix of unknown and binary targets, but the solution there doesn't apply in my case, because all my values are integers.
I've also checked Skitlearn MLPClassifier ValueError: Can't handle mix of multiclass and multilabel-indicator but there aren't any encodings in my data.
What can I do to get the confusion matrix and avoid these errors?
This error is due to the confusing dtype: y_test.values is an array with dtype=object, so scikit-learn cannot infer the target type even though every element is an integer.
The solution is to pass the y_test values as a list to confusion_matrix:
result = confusion_matrix(list(y_test.values), ypred)
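Equivalently, you can cast the object array to an integer dtype before calling confusion_matrix. A small standalone sketch (with a short made-up array standing in for the question's data):

```python
import numpy as np

y_obj = np.array([3, 0, 0, 1, 3], dtype=object)  # dtype=object, as in the question
y_int = y_obj.astype(int)                        # plain integer array
# result = confusion_matrix(y_int, ypred)        # scikit-learn can now infer the target type
```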

Finding the position of the median of an array containing mostly zeros

I have a very large 1d array in which most elements are zero, while the nonzero elements are clustered in a few islands separated by many zeros (here is a smaller version for the purpose of an MWE):
In [1]: import numpy as np
In [2]: A=np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,6,20,14,10,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,4,5,5,18,18,16,14,10,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,3,3,6,16,4,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
I want to find the median and its position (even approximately) in terms of the index corresponding to the median value of each island. Not surprisingly, I am getting zero which is not what I desire:
In [3]: np.median(A)
Out[3]: 0.0
In [4]: np.argsort(A)[len(A)//2]
Out[4]: 12
In the case of a single island of nonzero elements, to work around this caveat and meet my requirement that only nonzero elements are physically meaningful, I remove all zeros first and then take the median of the remaining elements:
In [5]: masks = np.where(A>0)
In [6]: A[masks]
Out[6]: array([ 1, 3, 6, 20, 14, 10, 5, 1])
This time I get the median of the new array correctly; however, the position (index) is not correct, as is evident and as pointed out in the comments (it is mathematically ill-defined).
In [7]: np.median(A[masks])
Out[7]: 5.5
In [8]: np.argsort(A[masks])[len(A[masks])//2]
Out[8]: 2
According to this approximation, I know that real median is located in the third index of the modified array but I would like to translate it back into the format of the original array where the position (index) of the median should be somewhere in the middle of the first island of the nonzero elements corresponding to a larger index (where indices of zeros are all counted correctly). Also answered in the comments are two suggestions made to come up with the position of the median given one island of nonzero elements in the middle of a sea of zeros. But what if there is more than one such island? How could possibly one calculate the index corresponding to median of each island in the context of the original histogram array where zeros are all counted?
I am wondering if there is any easy way to calculate the position of the median in such arrays of many zeros. If not, what else should I add to my lines of code to make that possible after knowing the position in the modified array? Your help is greatly appreciated.
Based on the comment "A is actually a discrete histogram with many bins", I think what you want is the median of the values being counted. If A is an integer array of counts, then an exact (but probably very inefficient, if you have values as high as 1e7) formula for the median is
np.median(np.repeat(np.arange(len(A)), A)) # Do not use if A contains very large values!
Alternatively, you can use
np.searchsorted(A.cumsum(), 0.5*A.sum())
which will be the integer part of the median.
For example:
In [157]: A
Out[157]:
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3,
6, 20, 14, 10, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0])
In [158]: np.median(np.repeat(np.arange(len(A)), A))
Out[158]: 35.5
In [159]: np.searchsorted(A.cumsum(), 0.5*A.sum())
Out[159]: 35
Another example:
In [167]: B
Out[167]:
array([ 0, 0, 0, 1, 100, 21, 8, 3, 2, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
In [168]: np.median(np.repeat(np.arange(len(B)), B))
Out[168]: 4.0
In [169]: np.searchsorted(B.cumsum(), 0.5*B.sum())
Out[169]: 4
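For the multi-island part of the question, the same cumsum/searchsorted formula can be applied per island. A sketch (island_medians is a made-up helper; islands are taken to be maximal runs of nonzero entries):

```python
import numpy as np

def island_medians(A):
    """For each contiguous run of nonzero entries in A, return the index
    (in the original array) of the integer part of that island's median."""
    nz = (A > 0).astype(np.int8)
    # transitions 0->1 and 1->0 mark island starts and ends
    edges = np.flatnonzero(np.diff(np.concatenate(([0], nz, [0]))))
    starts, ends = edges[::2], edges[1::2]
    return [int(s + np.searchsorted(A[s:e].cumsum(), 0.5 * A[s:e].sum()))
            for s, e in zip(starts, ends)]
```

Each returned index is the island's own searchsorted result shifted by the island's start, so it lands in the coordinates of the original array, zeros included.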

Creating a dictionary {names : list of values} from this .txt file?

I have a .txt file that has names followed by numbers. For Example
namesToRatings = {}
with open("example.txt") as document:
    for line in document:
        print(line)
Would output:
Simon
5 0 0 0 0 0 0 1 0 1 -3 5 0 0 0 5 5 0 0 0 0 5 0 0 0 0 0 0 0 0 1 3 0 1 0 -5 0
0 5 5 0 5 5 5 0 5 5 0 0 0 5 5 5 5 -5
John
5 5 0 0 0 0 3 0 0 1 0 5 3 0 5 0 3 3 5 0 0 0 0 0 5 0 0 0 0 0 3 5 0 0 0 0 0 5
-3 0 0 0 5 0 0 0 0 0 0 5 5 0 3 0 0
and so on...
How do I create a dictionary with the key being the name of the person and the value being a list of the numbers following that name?
E.g. {Simon : [5, 0, 0, 0, 0,......... 5, 5, -5]}
with open("example.txt") as document:
    lines = [line.strip() for line in document if line.strip()]
namesToRatings = {lines[i]: [int(x) for x in lines[i+1].split()] for i in range(0, len(lines), 2)}
print(namesToRatings)  # print it, return it from a function, or set it as a global if you really must.
You can use a regex:
import re

di = {}
with open('file.txt') as f:
    for m in re.finditer(r'^([a-zA-Z]+)\s+([-\d\s]+)', f.read(), re.M):
        di[m.group(1)] = m.group(2).split()
Try using get_close_matches from the difflib library.
Save the dictionary words and meanings in a JSON file; it will then be loaded as a Python dictionary.
import json
from difflib import get_close_matches

data = json.load(open('filePath'))

def check_word(word):
    word = word.lower()
    if word in data:
        return data[word]
    elif len(get_close_matches(word, data.keys(), cutoff=0.7)) > 0:
        return get_close_matches(word, data.keys(), cutoff=0.7)
    # You can add more exceptions here...
with open("example.txt") as document:
    lines = (line.strip() for line in document)
    lines = (line for line in lines if line)
    pairs = zip(*[lines]*2)
    namesToRatings = {name: [int(x) for x in values.split()] for name, values in pairs}
This version is similar to the list based approach outlined in John's answer, but doesn't require reading the entire file into a list. The zip(*[lines]*2) will split the input (the lines) into pairs. Output:
{
'Simon': [5, 0, 0, 0, 0, 0, 0, 1, 0, 1, -3, 5, 0, 0, 0, 5, 5, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 1, 0, -5, 0, 0, 5, 5, 0, 5, 5, 5, 0, 5, 5, 0, 0, 0, 5, 5, 5, 5, -5],
'John': [5, 5, 0, 0, 0, 0, 3, 0, 0, 1, 0, 5, 3, 0, 5, 0, 3, 3, 5, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 3, 5, 0, 0, 0, 0, 0, 5, -3, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 5, 5, 0, 3, 0, 0]
}
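The zip(*[lines]*2) pairing trick can be seen in isolation, with an inline list standing in for the file (shortened, made-up ratings):

```python
lines = iter(["Simon", "5 0 -3", "John", "5 5 0"])
# [lines]*2 is two references to the *same* iterator, so zip pulls
# alternating items from it and yields (name, ratings) pairs.
pairs = list(zip(*[lines] * 2))
# pairs == [("Simon", "5 0 -3"), ("John", "5 5 0")]
```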

How to define a row and a column on N-queen program?

I have to write code for the N-queens chess board problem. I understand the theory behind it but don't understand how I should code it. (In this exercise, 0s represent spaces and 1s represent queens.)
so far I have only written:
import numpy as np
board=np.zeros((8,8))
board[0:,0]=1
Following this, I want to define what the rows and columns of this board are, so that I can detect collisions between the queens on the board.
Thank you.
I don't know how much I should be helping you (this sounds like homework), but my curiosity was piqued. So here's a preliminary exploration:
Representing a board as a 8x8 array of 0/1 is easy:
In [1783]: B=np.zeros((8,8),int)
But since a solution requires 1 queen per row, and only 1 per column, I can represent it as just a permutation of the column numbers. Looking online I found a solution, which I can enter as:
In [1784]: sol1=[2,5,1,6,0,3,7,4]
I can map that onto the board with:
In [1785]: B[np.arange(8),sol1]=1
In [1786]: B # easy display
Out[1786]:
array([[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0]])
How about testing this? Row and column sums are easy with numpy. For a valid solution these must all be 1:
In [1787]: B.sum(axis=0)
Out[1787]: array([1, 1, 1, 1, 1, 1, 1, 1])
In [1788]: B.sum(axis=1)
Out[1788]: array([1, 1, 1, 1, 1, 1, 1, 1])
Diagonals differ in length, but can also be summed
In [1789]: np.diag(B,0)
Out[1789]: array([0, 0, 0, 0, 0, 0, 0, 0])
and to look at the other diagonals, 'flip' columns:
In [1790]: np.diag(B[:,::-1],1)
Out[1790]: array([0, 1, 0, 0, 0, 0, 0])
I can generate all diagonals with a list comprehension (not necessarily the fastest way, but easy to test):
In [1791]: [np.diag(B,i) for i in range(-7,8)]
Out[1791]:
[array([0]),
array([0, 0]),
array([0, 0, 0]),
array([1, 0, 0, 0]),
array([0, 0, 0, 0, 1]),
array([0, 0, 0, 1, 0, 0]),
array([0, 1, 0, 0, 0, 0, 0]),
array([0, 0, 0, 0, 0, 0, 0, 0]),
array([0, 0, 0, 0, 0, 0, 1]),
array([1, 0, 0, 0, 0, 0]),
array([0, 0, 0, 1, 0]),
array([0, 1, 0, 0]),
array([0, 0, 0]),
array([0, 0]),
array([0])]
and for the other direction, with sum:
In [1792]: [np.diag(B[:,::-1],i).sum() for i in range(-7,8)]
Out[1792]: [0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0]
No diagonal can have a sum >1, but some may be 0.
If the proposed solution is indeed a permutation of np.arange(8) then it is guaranteed to satisfy the row and column sum test. That just leaves the diagonal tests. The board mapping may be nice for display purposes, but it isn't required to represent the solution. And it might not be the best way to test the diagonals.
A brute force solution is to generate all permutations, and test each.
In [1796]: len(list(itertools.permutations(range(8))))
Out[1796]: 40320
There are, of course, smarter ways of generating and test solutions.
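That brute-force approach can be sketched as follows, using the diagonal test described above (is_valid is a made-up name for this illustration):

```python
from itertools import permutations

def is_valid(perm):
    # perm[r] is the queen's column in row r; rows and columns are distinct
    # by construction, so only the two diagonal directions need checking.
    # Queens share a diagonal iff they share r + c or r - c.
    n = len(perm)
    return (len({r + c for r, c in enumerate(perm)}) == n and
            len({r - c for r, c in enumerate(perm)}) == n)

solutions = [p for p in permutations(range(8)) if is_valid(p)]
# len(solutions) == 92, the well-known count of distinct 8-queens solutions
```

The solution entered above, (2, 5, 1, 6, 0, 3, 7, 4), passes this test.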
A few months ago I worked on a Sudoku puzzle question
Why is translated Sudoku solver slower than original?
the initial question was whether lists or arrays were a better representation. But I found, on an AI site, that an efficient, smart solver can be written with a dictionary.
There are quite a number of SO questions tagged Python and involving 8-queens. Fewer tagged with numpy as well.
==========
Your initial setup:
board[0:,0]=1
would pass the row sum test, fail the column sum test, and pass the diagonals tests.
