More about binary search

I just learned binary search, and I searched for its uses.
Among other things, I found these:
Word suggestions in a browser
Possibly the Akinator application, to find the person you are thinking of (not sure: http://en.akinator.com/)
Testing processors (http://cglab.ca/~morin/misc/arraylayout/)
Debugging
I did not really understand the last three. Can somebody explain them to me, please?
Besides these, what other uses does it have?

Binary search has O(log n) complexity, which means that if there is a sorted list of n elements, it takes about log2(n) comparisons to search for a number.
For example, suppose your list contains 20 integers: {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}.
You want to search for any number in this list (think of a number in this list). Now have a look at this set of questions:
Q1. Is 10 the number you chose? (y/n)
-------- If yes, then done. (I say "No")
If no, then:
Q2. Is the number less than 10? (y/n)
-------- I say "No", which means the number is greater than 10.
Q3. Is 15 the number you chose? (y/n)
-------- If yes, then done. (I say "No")
Q4. Is the number less than 15? (y/n)
-------- I say "Yes". So now it's certain that the number is between 10 and 15 (i.e., in [11,14]).
Q5. Is 12 the number you chose? (y/n)
-------- If yes, then done. (I say "No")
Q6. Is the number greater than 12? (y/n)
-------- I say "Yes". So now it's certain that the number is either 13 or 14.
Q7. Is the number 13? (y/n)
-------- I say "Yes"; otherwise it would be 14.
(I had actually chosen 13 as my number.)
There are 20 elements in total, and log2(20) ≈ 4.32.
So here are my comparisons (not counting the "Is it this number?" checks):
1st comparison: Is the number less than 10? (y/n)
2nd comparison: Is the number less than 15? (y/n)
3rd comparison: Is the number greater than 12? (y/n)
4th comparison: Is the number 13? (y/n)
And I find the number 13 (or 14) in just 4 comparisons, which is consistent with log2(20) ≈ 4.32.
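A minimal Python sketch of the same guessing game, assuming a single two-way question per step ("Is your number <= mid?") instead of the separate "is it equal?" and "is it less?" questions used above. It counts the questions and shows that for 20 numbers at most ⌈log2 20⌉ = 5 questions are needed, while 13 happens to take 4. The function name `guess` is hypothetical.

```python
def guess(secret, lo=1, hi=20):
    """Find `secret` in the sorted range lo..hi by repeatedly halving it.

    Each loop iteration is one yes/no question: "Is your number <= mid?"
    Returns (number_found, questions_asked).
    """
    questions = 0
    while lo < hi:
        mid = (lo + hi) // 2
        questions += 1
        if secret <= mid:
            hi = mid          # answer "yes": keep the lower half
        else:
            lo = mid + 1      # answer "no": keep the upper half
    return lo, questions

print(guess(13))                               # (13, 4)
print(max(guess(n)[1] for n in range(1, 21)))  # 5
```

Each question halves the remaining range, which is why the count grows like log2(n) rather than n.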
The Akinator application works in a similar way: it works with a large amount of data and predicts the answer based on the user's replies to its questions.
Reading about decision trees and binary search trees might help you :)

How to get delta percentage from /proc/schedstat

I am trying to get the node's CFS scheduler throttling as a percentage. For that, I read 2 values twice (ignoring timeslices) from /proc/schedstat, which has the following format:
$ cat /proc/schedstat
version 15
timestamp 4297299139
cpu0 0 0 0 0 0 0 1145287047860 105917480368 8608857
CpuTime RunqTime
So I read the file, sleep for some time, read it again, compute the elapsed time and the delta between the values, and then calculate the percentage with the following code:
// ns / 1e9 (to seconds) * 100 (to percent) = ns / 1e7, then divide by the elapsed seconds
cpuTime := float64(delta.CpuTime) / delta.TimeDelta / 10000000
runqTime := float64(delta.RunqTime) / delta.TimeDelta / 10000000
percent := runqTime
The problem is that the percentage can come out as high as 2000%.
I assumed that RunqTime is cumulative and expressed in nanoseconds, so I divided it by 10^7 (to get it into the 0-100% range); TimeDelta is the difference between the two measurements in seconds. What is wrong with this, and how do I do it properly?
I, for one, do not know how to interpret the output of /proc/schedstat.
You do quote an answer to a unix.stackexchange question, with a link to a mail in LKML that mentions a possible patch to the documentation.
However, "schedstat" is a term that is suspiciously missing from my local man proc page and from the copies of man proc I could find on the internet. Actually, when I search for schedstat on Google, the results either do not mention the word "schedstat" (for example, links to copies of the man page, which mention "sched" and "stat") or are non-authoritative comments (fun fact: some of them cite that answer on stackexchange as a reference ...).
So, for now: if I really had to understand what's in the output, I would try to read the code for my version of the kernel.
As for "how do you compute the delta?": I understand what you intend to do, but I meant something more like "what code have you written to do it?".
By running cat /proc/schedstat; sleep 1 in a loop on my machine, I see that the "timestamp" entry is incremented by ~250 units on each iteration (so I honestly can't say what the underlying unit for that field is ...).
To compute delta.TimeDelta: do you use that field, or do you take two instances of time.Now()?
The other deltas are less ambiguous; I imagine you took the difference between the counters you see :)
Do note that, on my mostly idle machine, I sometimes see increments higher than 10^9 over one second on these counters. So again: I do not know how to interpret these numbers.
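The procedure described in the question could be sketched in Python as follows. This assumes, as the question does and as the answer above could not confirm, that the 7th and 8th numbers on each cpuN line are cumulative nanosecond counters of time spent running and time spent waiting on that CPU's run queue. `parse_schedstat`, `percentages` and `sample` are hypothetical helper names. Elapsed time comes from a monotonic clock rather than the `timestamp` field, whose unit is unclear. Note that a wait percentage above 100% is arithmetically possible here: if several tasks wait on the same run queue at once, their waiting times presumably add up, so the ratio is not bounded by 100%.

```python
import time

def parse_schedstat(text):
    """Map each "cpuN" line to (run_time, wait_time) from its 7th and 8th numbers.

    Assumption: both counters are cumulative and in nanoseconds.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            # fields[0] is the name, so fields[7] and fields[8] are the 7th and 8th numbers
            stats[fields[0]] = (int(fields[7]), int(fields[8]))
    return stats

def percentages(before, after, elapsed_ns):
    """Per-CPU counter deltas as a percentage of the elapsed wall-clock time."""
    result = {}
    for cpu, (run0, wait0) in before.items():
        run1, wait1 = after[cpu]
        result[cpu] = (100.0 * (run1 - run0) / elapsed_ns,
                       100.0 * (wait1 - wait0) / elapsed_ns)
    return result

def sample(interval_s=1.0, path="/proc/schedstat"):
    """Read the file twice, measuring elapsed time with a monotonic clock (Linux only)."""
    with open(path) as f:
        before = parse_schedstat(f.read())
    t0 = time.monotonic_ns()
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_schedstat(f.read())
    elapsed_ns = time.monotonic_ns() - t0
    return percentages(before, after, elapsed_ns)
```

Computing the value per cpuN line, instead of summing all CPUs, also avoids a total that scales with the number of CPUs.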

Replace the same character in a Python string with different replacement values [duplicate]

This question already has answers here:
Python Number Limit [duplicate]
(4 answers)
Closed 2 years ago.
Let's say I have this Python string:
>>> s = 'dog /superdog/ | cat /thundercat/'
Is there a way to replace the first / with [ and the second / with ]?
I was thinking of something like this.
Output:
'dog [superdog] | cat [thundercat]'
I tried this, but it did not quite work:
>>> s = 'dog /superdog/ | cat /thundercat/'
>>> s.replace('/','[')
'dog [superdog[ | cat [thundercat['
I would like to know the best and most Pythonic way to do this. Thank you!
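One possible approach, sketched in Python under the assumption that the slashes always come in matched pairs: a regular expression captures the text between each pair and rewrites it inside brackets. A non-regex alternative alternates the replacement character with `itertools.cycle`.

```python
import re
from itertools import cycle

s = 'dog /superdog/ | cat /thundercat/'

# Match a '/', then any run of non-'/' characters (captured), then the closing '/'.
# Each matched pair is rewritten as '[' + captured text + ']'.
result = re.sub(r'/([^/]*)/', r'[\1]', s)
print(result)  # dog [superdog] | cat [thundercat]

# Alternative: replace the slashes in order with '[', ']', '[', ']', ...
brackets = cycle('[]')
alt = ''.join(next(brackets) if ch == '/' else ch for ch in s)
print(alt)     # dog [superdog] | cat [thundercat]
```

The regex version leaves an unpaired trailing slash untouched, while the `cycle` version blindly alternates, so the choice depends on how well-formed the input is.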
Python can handle arbitrarily large numbers because it has built-in arbitrary-precision integers. The limit is the amount of memory (RAM) Python can access. CPython implements these integers as objects holding a variable-length array of fixed-size "digits" (typically 30 bits each), allocating more memory on demand as the value grows.
Integers are commonly stored in a machine word; with a 4-byte (32-bit) word, unsigned integers from 0 up to 4,294,967,295 (2^32 − 1) can be stored.
But if your system makes 1 GiB available to a Python process, that is 8589934592 bits to represent a number, so you could use numbers up to 2^8589934592 − 1.
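A short Python illustration of the points above: integers grow past any fixed word size without overflowing, and the memory an integer object uses grows with its magnitude. The printed digit size depends on the CPython build, so no output is claimed for it.

```python
import sys

big = 2**64 + 1                     # already past the 64-bit unsigned range
print(big)                          # 18446744073709551617
print(big.bit_length())             # 65
# Bits per internal CPython "digit" (commonly 30 on 64-bit builds).
print(sys.int_info.bits_per_digit)
# Memory use grows with the magnitude of the number.
print(sys.getsizeof(1) < sys.getsizeof(2**1000))   # True
```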
Computers can natively handle numbers only up to a certain size, but this is to be taken with some caveats.
-2147483648 through 2147483647 are the limits of signed 32-bit numbers.
Most of today's computers can handle 64-bit numbers, i.e. numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807, or from −(2^63) to 2^63 − 1.
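The fixed-width limits quoted above can be checked in Python. Python's own integers never overflow, so this sketch uses the `ctypes` fixed-width types, which do no overflow checking: a value one past the maximum wraps around to the minimum, mimicking two's-complement hardware. The constant names are hypothetical.

```python
import ctypes

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
print(INT32_MIN, INT32_MAX)   # -2147483648 2147483647
print(INT64_MIN, INT64_MAX)   # -9223372036854775808 9223372036854775807

# ctypes truncates out-of-range values to the type's width, so one past
# the maximum wraps to the minimum.
print(ctypes.c_int32(INT32_MAX + 1).value)   # -2147483648
print(ctypes.c_int64(INT64_MAX + 1).value)   # -9223372036854775808
```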
It is possible to write software that handles arbitrarily large numbers, as long as RAM or storage suffices. Such solutions are rather slow, but e.g. SSL encryption relies on numbers that are thousands of bits (hundreds of decimal digits) long.
As a side note, you are doubling your initial million in every iteration, not adding a million.

FIFO almost full and empty conditions Verilog

Suppose I have a FIFO with a depth of 32 and a width of 8 bits. There is a valid bit A in each of the 32 locations. If this bit is 1 in all locations, we have the full condition; if it is 0 in all locations, the empty condition. My requirement: if bit A is 0 at one location and 1 at all other locations, it should generate an Almost_full condition when it reaches the 30th location.
Help me out please.
Thanks in Advance.
So you have a 32-bit vector and you want to check whether exactly one of the bits is 0. If speed is not much of a concern, I would use a for loop to do this.
If speed is a concern, I would get this done in 5 steps with a divide-and-check method: check the two 16-bit halves in parallel, then divide the half that contains the zero into two 8-bit parts and check them in parallel, then divide that 8-bit part into two 4-bit parts, and so on.
If at any point you find zeros in both parts, you can stop checking and conclude that almost_full = 0.
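The divide-and-check method above can be modeled in Python to check its logic before writing the hardware. This is a behavioral sketch, not Verilog: it assumes the 32 valid bits are packed into one integer `valid` (bit i = valid bit A of location i), follows this answer's interpretation (exactly one zero means almost full), and checks the two halves sequentially where hardware would check them in parallel. The name `almost_full` is hypothetical, and the halving assumes a power-of-two width.

```python
def almost_full(valid, width=32):
    """True iff exactly one of the `width` valid bits is 0.

    Divide-and-check: look for a zero in both halves; if both contain one,
    stop early; otherwise keep only the half holding the zero.
    32 bits take 5 halving steps (32 -> 16 -> 8 -> 4 -> 2 -> 1).
    """
    zeros = ~valid & ((1 << width) - 1)   # a 1 here marks an empty location
    if zeros == 0:
        return False                      # all valid: the FIFO is full
    while width > 1:
        half = width // 2
        low = zeros & ((1 << half) - 1)
        high = zeros >> half
        if low and high:
            return False                  # zeros in both halves: more than one empty slot
        zeros = low or high               # continue in the half holding the zero
        width = half
    return True

print(almost_full(0xFFFFFFFE))   # True: only location 0 is empty
print(almost_full(0xFFFFFFFC))   # False: locations 0 and 1 are empty
```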

Minimal time to compute the minimal value

I was asked the following question: what is the minimal time needed to compute the minimum of an unsorted array of 32 integers, given that you have 8 cores and each comparison takes 1 minute? My solution was 6 minutes, assuming that each core operates independently. Divide the array into 8 portions of 4 integers each; the 8 cores concurrently compute the local minimum of each portion, which takes 3 minutes (3 comparisons per portion). Then 4 cores reduce those 8 local minima to 4 (1 minute), then 2 cores reduce the 4 to 2 (1 minute), and finally 1 core computes the global minimum of the remaining 2 (1 minute). The total is therefore 6 minutes. However, it didn't seem to be the answer the interviewer was looking for. What do you think? Thank you
If you assume that the program is CPU-bound, which is fairly ridiculous but seems to be where you were going with your analysis, then you need to decide how to divide the work so that multithreading actually gains you something.
8 pieces of 4 integers each seems arbitrary. Interviewers usually like to see a thought process. Being mathematically general, let us compute total orderings over subsets of the problem. How hard is it to compute a total ordering, and what is the payoff?
Totally ordering N items, picking arbitrarily when two items are equal, requires N*(N-1)/2 comparisons (one per pair, all of which can run in the same minute on separate cores) and eliminates N-1 items. Let's make a table.
N = 2: 1 comparison, 1 elimination.
N = 3: 3 comparisons, 2 eliminations.
N = 4: 6 comparisons, 3 eliminations.
Clearly it's most efficient to work with pairs (N = 2), but the other operations are useful if resources would otherwise be idle.
Minute 1-3: Eliminate 24 candidates using operations with N = 2, 8 at a time.
Minute 4: Now there are 8 candidates. Keeping N = 2 would leave 4 cores idle. Setting N = 3 uses 2 more cores per operation and yields 1 more elimination. So do two operations with N = 3 and one with N = 2, eliminating 2+2+1 = 5 candidates. Or, use 6 cores for one operation with N = 4 and the remaining 2 cores for two operations with N = 2, eliminating 3+1+1 = 5. The result is the same.
Minute 5: Only 3 candidates remain, so set N = 3 for the last round.
If you keep the CPUs busy, it takes 5 minutes using a mix of two higher-level abstractions. More energy is spent because this isn't the most efficient way to solve the problem, but it is faster.
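The 5-minute schedule from this answer can be checked with a small Python simulation. It is a sketch under the answer's own model: each comparison takes one "minute", a group of N candidates is reduced by running all N*(N-1)/2 comparisons at once on separate cores, and candidates not assigned to a group in a given minute simply wait. `group_min`, `SCHEDULE` and `scheduled_min` are hypothetical names; the simulation raises an error if any minute would need more than 8 cores.

```python
from itertools import combinations

def group_min(group):
    """Minimum of `group` using all N*(N-1)/2 pairwise comparisons,
    which the schedule assumes run in the same minute on separate cores.
    Returns (winner, comparisons_used)."""
    wins = [0] * len(group)
    pairs = list(combinations(range(len(group)), 2))
    for i, j in pairs:
        # Ties go to the earlier element: an arbitrary but consistent choice.
        if group[i] <= group[j]:
            wins[i] += 1
        else:
            wins[j] += 1
    # In a total ordering, the minimum is the element that beat all N-1 others.
    return group[wins.index(len(group) - 1)], len(pairs)

# Group sizes per minute: 8 pairs in minutes 1-3, then N=3, N=3, N=2, then N=3.
SCHEDULE = [[2] * 8, [2] * 8, [2] * 8, [3, 3, 2], [3]]

def scheduled_min(values, cores=8):
    if len(values) != 32:
        raise ValueError("this schedule is written for exactly 32 values")
    candidates = list(values)
    for minute, sizes in enumerate(SCHEDULE, start=1):
        survivors, used, start = [], 0, 0
        for n in sizes:
            winner, cost = group_min(candidates[start:start + n])
            survivors.append(winner)
            used += cost
            start += n
        if used > cores:
            raise RuntimeError(f"minute {minute} needs {used} cores")
        # Candidates not placed in any group this minute carry over unchanged.
        candidates = survivors + candidates[start:]
    return candidates[0], len(SCHEDULE)

print(scheduled_min(list(range(32, 0, -1))))   # (1, 5)
```

Minute 4 uses only 7 of the 8 cores with the N = 3, 3, 2 split; the N = 4, 2, 2 alternative from the answer would use all 8.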
I'm going to assume that comparing two "integers" is a black box that takes 1 minute to complete, but we can cache those comparisons and only do any particular comparison once.
There's not much you can do until you're down to 8 candidates (3 minutes). But you don't want to leave cores sitting idle if you can help it. Let's say that the candidates are numbered 1 through 8. Then in minute 4 you can compare:
1v2 3v4 5v6 7v8 AND 1v5 2v6 3v7 4v8
If we're lucky, this eliminates 6 candidates, and we can use minute 5 to pick the winner.
If we're not lucky, this leaves 4 candidates (for example, 1, 3, 6, and 8), and that step didn't gain us anything over the original approach. In minute 5, we need to throw everything at it (to beat the original approach). But there are 8 cores and only C(4,2) = 6 possible pairings, so we can make every possible comparison (leaving 2 cores idle) and get our winner in 5 minutes.
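The lucky and unlucky cases of this answer can be replayed in Python. This sketch takes the minute-4 pairings exactly as listed above, eliminates the larger side of each comparison (ties eliminate the second-named candidate, an arbitrary choice), and then compares every remaining pair in minute 5. `MINUTE4`, `losers` and `minutes_4_and_5` are hypothetical names, and the candidate values below are made up to reproduce the two scenarios.

```python
from itertools import combinations

# Minute-4 comparisons from the answer; candidates are numbered 1..8.
MINUTE4 = [(1, 2), (3, 4), (5, 6), (7, 8), (1, 5), (2, 6), (3, 7), (4, 8)]

def losers(value, pairs):
    """The larger side of each comparison is eliminated (ties: the second one)."""
    return {a if value[a] > value[b] else b for a, b in pairs}

def minutes_4_and_5(value, cores=8):
    """`value` maps candidate number (1..8) -> integer. Returns (survivors, winner)."""
    survivors = sorted(set(value) - losers(value, MINUTE4))
    pairs = list(combinations(survivors, 2))   # at most C(4,2) = 6 comparisons
    assert len(pairs) <= cores
    (winner,) = set(survivors) - losers(value, pairs)
    return survivors, winner

lucky = {i: i for i in range(1, 9)}
unlucky = {1: 1, 2: 5, 3: 2, 4: 6, 5: 7, 6: 3, 7: 8, 8: 4}
print(minutes_4_and_5(lucky))     # ([1, 3], 1)
print(minutes_4_and_5(unlucky))   # ([1, 3, 6, 8], 1)
```

At most 4 candidates can survive minute 4, because the four row pairs (1v2, 3v4, 5v6, 7v8) are disjoint and each eliminates one candidate.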
Those are really big integers, too big to fit into the CPU cache, so multithreading doesn't really help you: this problem is I/O bound. (I suppose it depends on the specifics of the I/O bottleneck, but let's not pick nits.)
Since you need exactly N-1 comparisons, the answer is 31.

What do the numbers in /proc/loadavg mean on Linux?

When issuing this command on Linux:
# cat /proc/loadavg
0.75 0.35 0.25 1/25 1747
The first three numbers are load averages. What are the last 2 numbers?
The last one keeps increasing by 2 every second. Should I be worried?
/proc/loadavg
The first three fields in this file are load average figures giving
the number of jobs in the run queue (state R) or waiting for disk
I/O (state D) averaged over 1, 5, and 15 minutes. They are the
same as the load average numbers given by uptime(1) and other
programs.
The fourth field consists of two numbers separated by a
slash (/). The first of these is the number of currently executing
kernel scheduling entities (processes, threads); this will be less
than or equal to the number of CPUs. The value after the slash is the
number of kernel scheduling entities that currently exist on the
system.
The fifth field is the PID of the process that was most
recently created on the system.
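The five fields described above can be split apart with a few lines of Python. This is a minimal sketch; `LoadAvg` and `parse_loadavg` are hypothetical names, and the field name `runnable` follows the later answers and the newer man page rather than the "currently executing" wording quoted above.

```python
from typing import NamedTuple

class LoadAvg(NamedTuple):
    avg1: float
    avg5: float
    avg15: float
    runnable: int   # first number of the fourth field
    total: int      # second number of the fourth field (entities that exist)
    last_pid: int   # fifth field: PID of the most recently created process

def parse_loadavg(text):
    """Split one line of /proc/loadavg into its five fields."""
    a1, a5, a15, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(a1), float(a5), float(a15),
                   int(runnable), int(total), int(last_pid))

print(parse_loadavg("0.75 0.35 0.25 1/25 1747"))

# On a Linux machine:
# with open("/proc/loadavg") as f:
#     print(parse_loadavg(f.read()))
```

Because the last field is the most recently created PID, a value that keeps growing just means new processes are being started.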
I would like to comment on the accepted answer.
The fourth field consists of two numbers separated by a slash (/). The
first of these is the number of currently executing kernel scheduling
entities (processes, threads); this will be less than or equal to the
number of CPUs.
I wrote a test program that reads an integer N from input, then creates N threads and runs them forever. My RHEL 6.5 computer has 8 processors, each with hyper-threading. If I run my test so that it creates 128 threads, I see values greater than 128 in the fourth field, for example 135. That is clearly greater than the number of CPUs. This post supports my observation: http://juliano.info/en/Blog:Memory_Leak/Understanding_the_Linux_load_average
It is worth noting that the current explanation in proc(5) manual page
(as of man-pages version 3.21, March 2009) is wrong. It reports the
first number of the fourth field as the number of currently executing
scheduling entities, and so predicts it can't be greater than the
number of CPUs. That doesn't match the real implementation, where this
value reports the current number of runnable threads.
The first three columns measure CPU and I/O utilization of the last one, five, and 15 minute periods. The fourth column shows the number of currently running processes and the total number of processes. The last column displays the last process ID used.
https://docs.fedoraproject.org/en-US/Fedora/17/html/System_Administrators_Guide/s2-proc-loadavg.html
The following page explains these in detail:
http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
Some interpretations:
If the averages are 0.0, then your system is idle.
If the 1 minute average is higher than the 5 or 15 minute averages, then load is increasing.
If the 1 minute average is lower than the 5 or 15 minute averages, then load is decreasing.
If they are higher than your CPU count, then you might have a performance problem (it depends).
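The rules of thumb above can be written down as a small Python helper. This is only a sketch: `interpret` is a hypothetical name, "increasing" and "decreasing" are read as the 1-minute value being above or below both longer averages, and the CPU count defaults to `os.cpu_count()`.

```python
import os

def interpret(avg1, avg5, avg15, cpus=None):
    """Apply the rules of thumb listed above to the three load averages."""
    cpus = cpus or os.cpu_count() or 1
    notes = []
    if avg1 == avg5 == avg15 == 0.0:
        notes.append("idle")
    elif avg1 > max(avg5, avg15):
        notes.append("load increasing")
    elif avg1 < min(avg5, avg15):
        notes.append("load decreasing")
    if max(avg1, avg5, avg15) > cpus:
        # "It depends": high load is only a hint, not proof of a problem.
        notes.append(f"above CPU count ({cpus}): possible performance problem")
    return notes

print(interpret(0.75, 0.35, 0.25, cpus=8))   # ['load increasing']
```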
You can consult the proc manual page for /proc/loadavg :
$ man proc | sed -n '/loadavg/,/^$/ p'
/proc/loadavg
The first three fields in this file are load average figures giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 1, 5, and 15 minutes. They are the same as the load average numbers given by uptime(1) and other programs. The fourth field consists of two numbers separated by a slash (/). The first of these is the number of currently runnable kernel scheduling entities (processes, threads). The value after the slash is the number of kernel scheduling entities that currently exist on the system. The fifth field is the PID of the process that was most recently created on the system.
For that, you need to install the man-pages package on CentOS7/RedHat7 or the manpages package on Ubuntu 20.04/22.04 LTS.
