How do I determine the slowest component of my shell pipeline? - linux

I have an extremely long and complicated shell pipeline set up to grab 2.2 GB of data and process it. It currently takes 45 minutes to run. The pipeline is a number of cut, grep, sort, uniq, grep and awk commands tied together. I have my suspicion that it's the grep portion that is causing it to take so much time, but I have no way of confirming it.
Is there any way to "profile" the entire pipeline from end to end to determine which component is the slowest and whether it is CPU or IO bound, so it can be optimised?
Unfortunately I cannot post the entire command here, as it would require posting proprietary information, but from checking it out with htop I suspect it is the following bit:
grep -v ^[0-9]

One way to do this is to gradually build up the pipeline, timing each addition, and taking as much out of the equation as possible (such as outputting to a terminal or file). A very simple example is shown below:
pax:~$ time ( cat bigfile >/dev/null )
real 0m4.364s
user 0m0.004s
sys 0m0.300s
pax:~$ time ( cat bigfile | tr 'a' 'b' >/dev/null )
real 0m0.446s
user 0m0.312s
sys 0m0.428s
pax:~$ time ( cat bigfile | tr 'a' 'b' | tail -1000l >/dev/null )
real 0m0.796s
user 0m0.516s
sys 0m0.688s
pax:~$ time ( cat bigfile | tr 'a' 'b' | tail -1000l | sort -u >/dev/null )
real 0m0.892s
user 0m0.556s
sys 0m0.756s
If you add up the user and system times above, you'll see that the incremental increases are:
0.304 (0.004 + 0.300) seconds for the cat;
0.436 (0.312 + 0.428 - 0.304) seconds for the tr;
0.464 (0.516 + 0.688 - 0.436 - 0.304) seconds for the tail; and
0.108 (0.556 + 0.756 - 0.464 - 0.436 - 0.304) seconds for the sort.
This tells me that the main things to look into are the tail and the tr.
Now obviously, that's for CPU only, and I probably should have done multiple runs at each stage for averaging purposes, but that's the basic first approach I would take.
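For averaging, a rough sketch of timing one pipeline stage over several runs (my addition, not from the original answer; it assumes bash's time keyword, whose report goes to stderr):
runs=5
for i in $(seq $runs); do
    { time ( cat bigfile | tr 'a' 'b' >/dev/null ) ; } 2>&1
done | awk -v runs=$runs '
    /^user|^sys/ { split($2, t, /[ms]/); total += t[1]*60 + t[2] }   # "0m0.312s" -> seconds
    END { printf "average user+sys per run: %.3fs\n", total/runs }'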
If it turns out it really is your grep, there are a few other options available to you. There are numerous other commands that can strip lines not starting with a digit, but you may find that a custom-built command for doing this is faster still, in pseudo-code like this (untested, but you should get the idea):
state = echo
lastchar = newline
while not end of file:
    read big chunk from file
    for every char in chunk:
        if lastchar is newline:
            if state is echo and char is non-digit:
                state = skip
            else if state is skip and char is digit:
                state = echo
        if state is echo:
            output char
        lastchar = char
Custom, targeted code like this can sometimes be made more efficient than a general-purpose regex processing engine, simply because it can be optimised to the specific case. Whether that's true in this case, or any case for that matter, is something you should test. My number one optimisation mantra is measure, don't guess!

I found the problem myself after some further experimentation. It appears to be due to the encoding support in grep. Using the following hung the pipeline:
grep -v ^[0-9]
I replaced it with sed as follows and it finished in under 45 seconds!
sed '/^[0-9]/d'
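Worth noting (my addition, untested against the original data): slowdowns like this are often tied to multibyte locale handling in grep, so forcing the C locale for just that one command may also restore its speed:
LC_ALL=C grep -v '^[0-9]'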

This is straightforward with zsh:
zsh-4.3.12[sysadmin]% time sleep 3 | sleep 5 | sleep 2
sleep 3 0.01s user 0.03s system 1% cpu 3.182 total
sleep 5 0.01s user 0.01s system 0% cpu 5.105 total
sleep 2 0.00s user 0.05s system 2% cpu 2.121 total

Related

Calculate CPU usage from top command in linux

I have to display the CPU usage in my application and update it in real time. I am using the top command to get the CPU usage, i.e.
I add two of the values from top's output to get the CPU usage. The command I am using to add those values and get the final CPU usage is:
top -b -n 2 | grep Cpu | awk '{printf "CPU Load:%.2f\n", $(NF-13) + $(NF-15)}' | sed -n '2 p'
The issue is that this command stops working after some time: for 3-4 minutes I do get the CPU usage, but after that the command does not produce output and I do not get updated values. I am running this command in a loop.
Any help would be much appreciated.
I have been using a similar script without issues for some time now:
top -bn2 | grep Cpu | tail -n1 | sed -e 's/.*, *\([0-9.]*\)%* id.*/\1/' | awk '{print 100-$1}'
The script takes the 'idle' time from the top output and deducts it from 100% to get the CPU usage.
The periodicity of the loop in which you are calling the script should not be faster than the time needed for the script to finish; otherwise, you may get multiple 'top's running in parallel. This primarily depends on top's default delay on your system (on mine, it is about 5s), but you can set the delay with the -d switch.
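If spawning top repeatedly ever proves fragile, a minimal sketch of an alternative (my addition, assuming the standard Linux /proc/stat format) is to compute usage from two /proc/stat samples directly:
#!/bin/bash
# Fields of the aggregate "cpu" line: user nice system idle iowait irq softirq ...
read -r _ user nice system idle iowait irq softirq _ < /proc/stat
prev_idle=$((idle + iowait))
prev_total=$((user + nice + system + idle + iowait + irq + softirq))
sleep 1
read -r _ user nice system idle iowait irq softirq _ < /proc/stat
d_idle=$(( (idle + iowait) - prev_idle ))
d_total=$(( (user + nice + system + idle + iowait + irq + softirq) - prev_total ))
awk -v di=$d_idle -v dt=$d_total 'BEGIN { printf "CPU Load:%.2f\n", 100 * (1 - di/dt) }'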

Bash script: CPU stress test while watching clock speed

I am totally new to this forum and also new to bash, so please bear with me :).
I would like to write a bash script to conduct a CPU stress test while observing the clock speed. Therefore, I have done the following:
1.) For the CPU stress test, I have created a script named "bernoulli" with the following code:
#!/bin/bash
# argument 1: n
function bernoulli()
{
    if (( $1 < 3 ))
    then
        echo 1
    else
        echo $(( $(bernoulli $(( $1 - 1 ))) + $(bernoulli $(( $1 - 2 ))) ))
    fi
}
bernoulli $1
2.) I have figured out that by using the "timeout" command I can kill a task after a specified time. For example,
timeout 30s ./bernoulli 35
starts a task calculating the 35th bernoulli number and the task is killed after 30 seconds.
3.) I also found out that by typing
timeout 30s watch grep \"cpu MHz\" /proc/cpuinfo
I can watch the clock speed of my cores (updated every 2 seconds) for 30 seconds (at which point "timeout 30s" kills this task).
What I want: I would like to do the above stress test and simultaneously observe the clock speed. In other words, I would somehow run the two commands
timeout 30s ./bernoulli 35
timeout 30s watch grep \"cpu MHz\" /proc/cpuinfo
"at the same time". I hope I could make it clear what I would like to achieve. Can anyone help with my issue? Thanks a lot for every comment!
How about
timeout 30s ./bernoulli 35 &
timeout 30s watch grep \"cpu MHz\" /proc/cpuinfo
The & at the end will make the command run in the background, so that the second timeout will be executed almost instantly after the first one.
PS: this is a rather poor way to test a modern CPU. You will be exercising only a single core and most likely only a limited part of your CPU (no SSE, etc.). It is not trivial to write a CPU benchmark, so you might want to use one of the already available ones. For example, you can take a look at sysbench with something like sysbench --test=cpu --cpu-max-prime=20000 run.
You can run them in a dedicated shell:
timeout 30s bash -c './bernoulli 35 & watch grep \"cpu MHz\" /proc/cpuinfo'
Note that the single & is not a typo. It is not a logical AND; it runs the bernoulli script in the background.

How to create a CPU spike with a bash command

I want to create a near 100% load on a Linux machine. It's quad core system and I want all cores going full speed. Ideally, the CPU load would last a designated amount of time and then stop. I'm hoping there's some trick in bash. I'm thinking some sort of infinite loop.
I use stress for this kind of thing; you can tell it how many cores to max out, and it allows for stressing memory and disk as well.
Example to stress 2 cores for 60 seconds
stress --cpu 2 --timeout 60
You can also do
dd if=/dev/zero of=/dev/null
To run more of those to put load on more cores, try to fork it:
fulload() { dd if=/dev/zero of=/dev/null | dd if=/dev/zero of=/dev/null | dd if=/dev/zero of=/dev/null | dd if=/dev/zero of=/dev/null & }; fulload; read; killall dd
Repeat the command in the curly brackets as many times as the number of threads you want to produce (here 4 threads).
Simply hitting Enter will stop it (just make sure no other dd is running as this user, or you will kill it too).
I think this one is simpler. Open Terminal and type the following and press Enter.
yes > /dev/null &
To fully utilize modern CPUs, one line is not enough, you may need to repeat the command to exhaust all the CPU power.
To end all of this, simply put
killall yes
The idea was originally found here; although it was intended for Mac users, it should work for *nix as well.
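If you'd rather not count cores yourself, a small sketch (my addition, assuming GNU coreutils' nproc) that starts one yes per core:
for i in $(seq $(nproc)); do yes > /dev/null & done
killall yes still stops all of them.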
Although I'm late to the party, this post is among the top results in the Google search "generate load in linux".
The result marked as the solution could be used to generate a system load; I prefer sha1sum /dev/zero to impose a load on a CPU core.
The idea is to calculate a hash sum from an infinite data stream (e.g. /dev/zero, /dev/urandom, ...); this process will try to max out a CPU core until it is aborted.
To generate a load for more cores, multiple commands can be piped together.
eg. generate a 2 core load:
sha1sum /dev/zero | sha1sum /dev/zero
To load 3 cores for 5 seconds:
seq 3 | xargs -P0 -n1 timeout 5 yes > /dev/null
This results in high kernel (sys) load from the many write() system calls.
If you prefer mostly userland cpu load:
seq 3 | xargs -P0 -n1 timeout 5 md5sum /dev/zero
If you just want the load to continue until you press Ctrl-C:
seq 3 | xargs -P0 -n1 md5sum /dev/zero
One core (doesn't invoke external process):
while true; do true; done
Two cores:
while true; do /bin/true; done
The latter only makes both of mine go to ~50% though...
This one will make both go to 100%:
while true; do echo; done
Here is a program, stress, that you can download and install easily on your Linux system:
./configure
make
make install
and launch it in a simple command line
stress -c 40
to stress all your CPUs (however many you have) with 40 workers, each running a complex sqrt computation on randomly generated numbers.
You can even define the timeout of the program
stress -c 40 --timeout 10s
Unlike the proposed solution with the dd command, which deals essentially with IO and therefore doesn't really overload your system, the stress program really overloads the system because it deals with computation.
An infinite loop is the idea I also had. A freaky-looking one is:
while :; do :; done
(: is the same as true, does nothing and exits with zero)
You can call that in a subshell and run it in the background. Doing that $num_cores times should be enough. After sleeping for the desired time you can kill them all; you get the PIDs with jobs -p (hint: xargs), as in the sketch below.
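Putting that together, a minimal sketch (my addition; nproc for the core count is an assumption, GNU coreutils provides it):
for i in $(seq $(nproc)); do
    ( while :; do :; done ) &   # one busy loop per core, in a subshell
done
sleep 30                        # the desired load duration
jobs -p | xargs kill            # kill every background loop by PID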
:(){ :|:& };:
This fork bomb will cause havoc to the CPU and will likely crash your computer.
I would split the thing into 2 scripts:
infinite_loop.bash :
#!/bin/bash
while [ 1 ] ; do
    # Force some computation even if it is useless to actually work the CPU
    echo $((13**99)) 1>/dev/null 2>&1
done
cpu_spike.bash :
#!/bin/bash
# Either use environment variables for NUM_CPU and DURATION, or define them here
for i in `seq ${NUM_CPU}`; do
    # Put an infinite loop on each CPU
    infinite_loop.bash &
done
# Wait DURATION seconds then stop the loops and quit
sleep ${DURATION}
killall infinite_loop.bash
To increase load or consume CPU at 100% (or X%):
sha1sum /dev/zero &
On some systems this will increase the load in slots of X%; in that case, you have to run the same command multiple times.
You can then see the CPU usage by typing the command:
top
to release the load
killall sha1sum
cat /dev/urandom > /dev/null
#!/bin/bash
duration=120    # seconds
instances=4     # cpus
endtime=$(($(date +%s) + $duration))
for ((i=0; i<instances; i++)); do
    while (($(date +%s) < $endtime)); do :; done &
done
I've used bc (an arbitrary-precision calculator), asking it for pi with a large number of decimals.
$ for ((i=0;i<$NUMCPU;i++)); do
      echo 'scale=100000;pi=4*a(1);0' | bc -l &
  done; \
  sleep 4; \
  killall bc
with NUMCPU (under Linux):
$ NUMCPU=$(grep $'^processor\t*:' /proc/cpuinfo |wc -l)
This method is strong but seems system-friendly, as I've never crashed a system using it.
#!/bin/bash
while [ 1 ]
do
    # Your code goes here
done
I went through the Internet to find something like it and found this very handy cpu hammer script.
#!/bin/sh
# unixfoo.blogspot.com
if [ $1 ]; then
    NUM_PROC=$1
else
    NUM_PROC=10
fi
for i in `seq 0 $((NUM_PROC-1))`; do
    awk 'BEGIN {for(i=0;i<10000;i++)for(j=0;j<10000;j++);}' &
done
Using examples mentioned here, but also help from IRC, I developed my own CPU stress testing script. It uses a subshell per thread and the endless loop technique. You can also specify the number of threads and the amount of time interactively.
#!/bin/bash
# Simple CPU stress test script
# Read the user's input
echo -n "Number of CPU threads to test: "
read cpu_threads
echo -n "Duration of the test (in seconds): "
read cpu_time
# Run an endless loop on each thread to generate 100% CPU
echo -e "\E[32mStressing ${cpu_threads} threads for ${cpu_time} seconds...\E[37m"
for i in $(seq ${cpu_threads}); do
    let thread=${i}-1
    (taskset -cp ${thread} $BASHPID; while true; do true; done) &
done
# Once the time runs out, kill all of the loops
sleep ${cpu_time}
echo -e "\E[32mStressing complete.\E[37m"
kill 0
Utilizing ideas mentioned here, I created code which exits automatically after a set duration, so you don't have to kill processes:
#!/bin/bash
echo "Usage : ./killproc_ds.sh 6 60 (6 threads for 60 secs)"
# Define variables
NUM_PROCS=${1:-6}   # How much scaling you want to do
duration=${2:-20}   # seconds
function infinite_loop {
    endtime=$(($(date +%s) + $duration))
    while (($(date +%s) < $endtime)); do
        #echo $(date +%s)
        echo $((13**99)) 1>/dev/null 2>&1
        $(dd if=/dev/urandom count=10000 status=none | bzip2 -9 >> /dev/null) 2>&1 >&/dev/null
    done
    echo "Done Stressing the system - for thread $1"
}
echo Running for duration $duration secs, spawning $NUM_PROCS threads in background
for i in `seq ${NUM_PROCS}`; do
    # Put an infinite loop
    infinite_loop $i &
done
You can try to test the performance of cryptographic algorithms.
openssl speed -multi 4
If you do not want to install additional software, you may use a compression utility which utilizes all CPU cores automatically. For example, xz:
cat /dev/zero | xz -T0 > /dev/null
This takes an infinite stream of dummy data from /dev/zero and compresses it using all the cores available in the system.
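To cap the duration, one hedged variant (my addition, assuming GNU coreutils' timeout):
timeout 60s xz -T0 < /dev/zero > /dev/null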
This does a trick for me:
bash -c 'for (( I=100000000000000000000 ; I>=0 ; I++ )) ; do echo $(( I+I*I )) & echo $(( I*I-I )) & echo $(( I-I*I*I )) & echo $(( I+I*I*I )) ; done' &>/dev/null
and it uses nothing except bash.
To enhance dimba's answer and provide something more pluggable (because I needed something similar), I have written the following using the dd load-up concept :D
It will check current cores, and create that many dd threads.
Start and End core load with Enter
#!/bin/bash
load_dd() {
    dd if=/dev/zero of=/dev/null
}
fulload() {
    unset LOAD_ME_UP_SCOTTY
    export cores="$(grep proc /proc/cpuinfo -c)"
    for i in $( seq 1 $( expr $cores - 1 ) )
    do
        export LOAD_ME_UP_SCOTTY="${LOAD_ME_UP_SCOTTY}$(echo 'load_dd | ')"
    done
    export LOAD_ME_UP_SCOTTY="${LOAD_ME_UP_SCOTTY}$(echo 'load_dd &')"
    eval ${LOAD_ME_UP_SCOTTY}
}
echo press return to begin and stop fullload of cores
read
fulload
read
killall -9 dd
Dimba's dd if=/dev/zero of=/dev/null is definitely correct, but it is also worth mentioning how to verify that the CPU is maxed out at 100% usage. You can do this with
ps -axro pcpu | awk '{sum+=$1} END {print sum}'
This asks for ps output of a 1-minute average of the cpu usage by each process, then sums them with awk. While it's a 1 minute average, ps is smart enough to know if a process has only been around a few seconds and adjusts the time-window accordingly. Thus you can use this command to immediately see the result.
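To watch that total continuously while a stress test runs, a small sketch (my addition, assuming procps' watch):
watch -n 1 'ps -axro pcpu | awk "{sum+=\$1} END {print sum}"'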
awk is a good way to write a long-running loop that's CPU bound without generating a lot of memory traffic or system calls, or using any significant amount of memory or polluting caches, so it slows down other cores a minimal amount. (stress or stress-ng can also do that, if you have either installed and use a simple CPU-stress method.)
awk 'BEGIN{for(i=0;i<100000000;i++){}}' # about 3 seconds on 4GHz Skylake
It's a counted loop so you can make it exit on its own after a finite amount of time. (Awk uses FP numbers, so a limit like 2^54 might not be reachable with i++ due to rounding, but that's way larger than needed for a few seconds to minutes.)
To run it in parallel, use a shell loop to start it in the background n times
for i in {1..6};do awk 'BEGIN{for(i=0;i<100000000;i++){}}' & done
###### 6 threads each running about 3 seconds
$ for i in {1..6};do awk 'BEGIN{for(i=0;i<100000000;i++){}}' & done
[1] 3047561
[2] 3047562
[3] 3047563
[4] 3047564
[5] 3047565
[6] 3047566
$ # this shell is usable.
(wait a while before pressing return)
[1] Done awk 'BEGIN{for(i=0;i<100000000;i++){}}'
[2] Done awk 'BEGIN{for(i=0;i<100000000;i++){}}'
[3] Done awk 'BEGIN{for(i=0;i<100000000;i++){}}'
[4] Done awk 'BEGIN{for(i=0;i<100000000;i++){}}'
[5]- Done awk 'BEGIN{for(i=0;i<100000000;i++){}}'
[6]+ Done awk 'BEGIN{for(i=0;i<100000000;i++){}}'
$
I used perf to see what kind of load it put on the CPU: it runs 2.6 instructions per clock cycle, so it's not the most friendly to a hyperthread sharing the same physical core. But it has a very small cache footprint, getting negligible cache misses even in L1d cache. And strace will show it makes no system calls until exit.
$ perf stat -r5 -d awk 'BEGIN{for(i=0;i<100000000;i++){}}'
Performance counter stats for 'awk BEGIN{for(i=0;i<100000000;i++){}}' (5 runs):
3,277.56 msec task-clock # 0.997 CPUs utilized ( +- 0.24% )
7 context-switches # 2.130 /sec ( +- 12.29% )
1 cpu-migrations # 0.304 /sec ( +- 40.00% )
180 page-faults # 54.765 /sec ( +- 0.18% )
13,708,412,234 cycles # 4.171 GHz ( +- 0.18% ) (62.29%)
35,786,486,833 instructions # 2.61 insn per cycle ( +- 0.03% ) (74.92%)
9,696,339,695 branches # 2.950 G/sec ( +- 0.02% ) (74.99%)
340,155 branch-misses # 0.00% of all branches ( +-122.42% ) (75.08%)
12,108,293,527 L1-dcache-loads # 3.684 G/sec ( +- 0.04% ) (75.10%)
217,064 L1-dcache-load-misses # 0.00% of all L1-dcache accesses ( +- 17.23% ) (75.10%)
48,695 LLC-loads # 14.816 K/sec ( +- 31.69% ) (49.90%)
5,966 LLC-load-misses # 13.45% of all LL-cache accesses ( +- 31.45% ) (49.81%)
3.28711 +- 0.00772 seconds time elapsed ( +- 0.23% )
The most "friendly" to the other hyperthread on an x86 CPU would be a C program like this, which just runs a pause instruction in a loop. (Or portably, a Rust program that runs std::hint::spin_loop.) As far as the OS's process scheduler, it stays in user-space (nothing like a yield() system call), but in hardware it doesn't take up many resources, letting the other logical core have the front-end for multiple cycles.
#include <immintrin.h>
int main(){   // use atoi(argv[1])*10000ULL as a loop count if you want.
    while(1) _mm_pause();
}
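Presumably you would build and run it with something like the following (my assumption; the file name is hypothetical and any C compiler targeting x86 will do):
gcc -O2 pause_loop.c -o pause_loop && ./pause_loop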
I combined some of the answers and added a way to scale the stress to all available cpus:
#!/bin/bash
function infinite_loop {
    while [ 1 ] ; do
        # Force some computation even if it is useless to actually work the CPU
        echo $((13**99)) 1>/dev/null 2>&1
    done
}
# Either use an environment variable for DURATION, or define it here
NUM_CPU=$(grep -c ^processor /proc/cpuinfo 2>/dev/null || sysctl -n hw.ncpu)
PIDS=()
for i in `seq ${NUM_CPU}`; do
    # Put an infinite loop on each CPU
    infinite_loop &
    PIDS+=("$!")
done
# Wait DURATION seconds then stop the loops and quit
sleep ${DURATION}
# Parent kills its children
for pid in "${PIDS[@]}"; do
    kill $pid
done
Just paste this bad boy into the SSH session or console of any server running Linux. You can kill the processes manually, but I just shut down the server when I'm done; it's quicker.
Edit: I have updated this script to now have a timer feature, so that there is no need to kill the processes.
read -p "Please enter the number of minutes for test >" MINTEST && [[ $MINTEST == ?(-)+([0-9]) ]]; NCPU="$(grep -c ^processor /proc/cpuinfo)"; ((endtime=$(date +%s) + ($MINTEST*60))); NCPU=$((NCPU-1)); for ((i=1; i<=$NCPU; i++)); do while (($(date +%s) < $endtime)); do : ; done & done
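For readability, here is the same logic unrolled (a sketch of the one-liner above; the input-validation test is omitted):
read -p "Please enter the number of minutes for test >" MINTEST
NCPU="$(grep -c ^processor /proc/cpuinfo)"
((endtime = $(date +%s) + (MINTEST * 60)))
for ((i = 1; i < NCPU; i++)); do
    while (($(date +%s) < endtime)); do :; done &
done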

How to log the memory consumption on Linux?

Is there any ready-to-use solution to log the memory consumption from the start of the system? I'd like to log the data to a simple text file or some database so I can analyze it later.
I'm working on a Linux 2.4-based embedded system. I need to debug a problem related to memory consumption. My application automatically starts on every system start. I need a way to get the data with timestamps at regular intervals (as often as possible), so I can track down the problem.
The symptoms of my problem: when the system starts, it launches my main application and a GUI to visualize the main parameters of the system. The GUI is based on GTK+ (X server). If I disable the GUI and X server then my application works OK. If I enable the GUI and X server it does not work when I have 256 MiB or 512 MiB of physical memory installed on the motherboard. If I have 1 GiB of memory installed then everything is OK.
The following script prints time stamps and a header.
#!/bin/bash -e
echo " date time $(free -m | grep total | sed -E 's/^ (.*)/\1/g')"
while true; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(free -m | grep Mem: | sed 's/Mem://g')"
    sleep 1
done
The output looks like this (tested on Ubuntu 15.04, 64-bit).
date time total used free shared buffers cached
2015-08-01 13:57:27 24002 13283 10718 522 693 2308
2015-08-01 13:57:28 24002 13321 10680 522 693 2308
2015-08-01 13:57:29 24002 13355 10646 522 693 2308
2015-08-01 13:57:30 24002 13353 10648 522 693 2308
A small script like:
rm memory.log
while true; do free >> memory.log; sleep 1; done
I am a big fan of logging everything and I find it useful to know which processes are using the memory and how much each process is using (as well as summary statistics). The following command records a top printout ordered by memory consumption every 0.5 seconds:
top -bd0.5 -o +%MEM > memory.log
Just note that the log file will grow a lot faster than if you only store the total memory utilization statistics so be sure you don't run out of disk space.
There's a program called
sar
on *nix systems. You could try to use that to monitor memory usage. It takes measurements at regular intervals. Do a
man sar
for more details. I think the option is -r for taking memory measurements, -i to specify the interval you'd like.
I think adding a crontab entry will be enough
*/5 * * * * free -m >> some_output_file
There are other tools like SeaLion, New Relic, Server Density etc. which do almost the same but are much easier to install and configure. My favorite is SeaLion, as it is free and also gives an awesome timeline view of raw outputs of common Linux commands.
You could put something like
vmstat X >> mylogfile
(where X is the number of seconds between log messages) into a startup script. Since your application is already in startup, you could just add this line to the end of the initialization script your application is already using.
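If your vmstat supports it (procps-ng does, though a Linux 2.4-era embedded build may not), the -t flag adds a timestamp to each line, which helps with later analysis:
vmstat -t X >> mylogfile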
To periodically log the memory usage efficiently, I combined another answer here with a method to only retain the top-K memory-using processes.
top -bd 1.5 -o +%MEM | grep "load average" -A 9 > memory_usage.log
This command will record, every 1.5s, the top header information and the 3 highest memory-consuming processes (there's a 6-line offset for top's header information). This saves lots of disk space over recording top's information for every process.
So I know that I am late to this game, but I just came up with this answer, as I needed to do this and really didn't want the extra fields that vmstat, free, etc. all seem to output without extra filtering. So here is the answer that I came up with:
top -bd 0.1 | grep 'KiB Mem' | cut -d' ' -f10 > memory.txt
OR:
top -bd 0.1 | grep 'KiB Mem' | cut -d' ' -f10 | tee memory.txt
The standard output from top, when grepping for 'KiB Mem', is:
KiB Mem : 16047368 total, 8708172 free, 6015720 used, 1323476 buff/cache
By running this through cut, we filter down to literally just the number prior to 'used'.
The user can indeed modify the 0.1 to another number in order to run different capture sample rates. In my case I wanted to use top also because you can run memory stats faster than 1 second per capture; as you can see here, I wanted to capture a stat every 1/10th of a second.
NOTES:
It does turn out that piping through cut causes a MASSIVE delay in getting anything out to the file. As we later found out, it is much faster to leave out the cut command during data acquisition, then perform the cut command on the output file later.
Also, we had no need for timestamps in our tests.
This thus looks as follows:
Begin Logging:
top -bd 0.1 | grep 'KiB Mem' | tee memory_raw.txt
Exit Logging:
ctrl-z (to exit logging)
Filter:
2 levels of cut (filtering), first by comma, then by space. This is due to the alignment of top and provides much cleaner output:
cut memory_raw -d',' -f3 | tee memory_used_withlabel.txt
cut memory_used_withlabel.txt -d' ' -f3 | tee memory_used.txt

Get program execution time in the shell

I want to execute something in a linux shell under a few different conditions, and be able to output the execution time of each execution.
I know I could write a perl or python script that would do this, but is there a way I can do it in the shell? (which happens to be bash)
Use the built-in time keyword:
$ help time
time: time [-p] PIPELINE
Execute PIPELINE and print a summary of the real time, user CPU time,
and system CPU time spent executing PIPELINE when it terminates.
The return status is the return status of PIPELINE. The `-p' option
prints the timing summary in a slightly different format. This uses
the value of the TIMEFORMAT variable as the output format.
Example:
$ time sleep 2
real 0m2.009s
user 0m0.000s
sys 0m0.004s
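As the help text mentions, the TIMEFORMAT variable controls the output; for example, to print only elapsed wall-clock seconds (format specifiers per the bash manual):
TIMEFORMAT='%R seconds elapsed'
time sleep 2
# prints something like: 2.009 seconds elapsed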
You can get much more detailed information than the bash built-in time provides by using the external time(1) utility, which Robert Gamble mentions. Normally this is /usr/bin/time.
Editor's note:
To ensure that you're invoking the external utility time rather than your shell's time keyword, invoke it as /usr/bin/time.
time is a POSIX-mandated utility, but the only option it is required to support is -p.
Specific platforms implement specific, nonstandard extensions: -v works with GNU's time utility, as demonstrated below (the question is tagged linux); the BSD/macOS implementation uses -l to produce similar output - see man 1 time.
Example of verbose output:
$ /usr/bin/time -v sleep 1
Command being timed: "sleep 1"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.05
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 210
Voluntary context switches: 2
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
#!/bin/bash
START=$(date +%s)
# do something
# start your script work here
ls -R /etc > /tmp/x
rm -f /tmp/x
# your logic ends here
END=$(date +%s)
DIFF=$(( $END - $START ))
echo "It took $DIFF seconds"
For a line-by-line delta measurement, try gnomon.
$ npm install -g gnomon
$ <your command> | gnomon --medium=1.0 --high=4.0 --ignore-blank --real-time=100
A command line utility, a bit like moreutils's ts, to prepend timestamp information to the standard output of another command. Useful for long-running processes where you'd like a historical record of what's taking so long.
You can also use the --high and/or --medium options to specify a length threshold in seconds, over which gnomon will highlight the timestamp in red or yellow. And you can do a few other things, too.
Should you want more precision, use %N with date (and use bc for the diff, because $(()) only handles integers).
Here's how to do it:
start=$(date +%s.%N)
# do some stuff here
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time: %.6f seconds" $dur
Example:
start=$(date +%s.%N); \
sleep 0.1s; \
dur=$(echo "$(date +%s.%N) - $start" | bc); \
printf "Execution time: %.6f seconds\n" $dur
Result:
Execution time: 0.104623 seconds
If you intend to use the times later to compute with, learn how to use the -f option of /usr/bin/time to output code that saves times. Here's some code I used recently to get and sort the execution times of a whole classful of students' programs:
fmt="run { date = '$(date)', user = '$who', test = '$test', host = '$(hostname)', times = { user = %U, system = %S, elapsed = %e } }"
/usr/bin/time -f "$fmt" -o $timefile command args...
I later concatenated all the $timefile files and piped the output into a Lua interpreter. You can do the same with Python or bash or whatever your favorite syntax is. I love this technique.
If you only need precision to the second, you can use the builtin $SECONDS variable, which counts the number of seconds that the shell has been running.
while true; do
    start=$SECONDS
    some_long_running_command
    duration=$(( SECONDS - start ))
    echo "This run took $duration seconds"
    if some_condition; then break; fi
done
You can use time and subshell ():
time (
    for (( i=1; i<10000; i++ )); do
        echo 1 >/dev/null
    done
)
Or in the same shell {}:
time {
    for (( i=1; i<10000; i++ )); do
        echo 1 >/dev/null
    done
}
One way is:
$ g++ -lpthread perform.c -o per
$ time ./per
The output is:
real 0m0.014s
user 0m0.010s
sys 0m0.002s
One possibly simple method (that may not meet different users' needs) is the use of the shell PROMPT. It is a simple solution that can be useful in some cases. You can use the bash prompting feature as in the example below:
export PS1='[\t \u#\h]\$'
The above command will result in changing the shell prompt to:
[HH:MM:SS username#hostname]$
Each time you run a command (or hit Enter) and return to the shell prompt, the prompt will display the current time.
notes:
1) Beware that if you wait for some time before typing your next command, then this time needs to be considered, i.e. the time displayed in the shell prompt is the timestamp of when the prompt was displayed, not of when you entered the command. Some users choose to hit the Enter key to get a new prompt with a new timestamp before they are ready for the next command.
2) There are other available options and modifiers that can be used to change the bash prompt, refer to ( man bash ) for more details.
perf stat Linux CLI utility
This tool is overkill for just getting time. But it can do so much more for you to help profile and fix slowness that it is worth knowing about. Ubuntu 22.04 setup:
sudo apt install linux-tools-common linux-tools-generic
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
Usage:
perf stat <mycmd>
Sample run with stress-ng:
perf stat stress-ng --cpu 1 --cpu-method matrixprod -t 5
Sample output:
Performance counter stats for 'stress-ng --cpu 1 --cpu-method matrixprod -t 5':
5,005.46 msec task-clock # 0.999 CPUs utilized
88 context-switches # 17.581 /sec
1 cpu-migrations # 0.200 /sec
1,188 page-faults # 237.341 /sec
18,847,667,167 cycles # 3.765 GHz
26,544,261,897 instructions # 1.41 insn per cycle
3,239,655,001 branches # 647.225 M/sec
25,393,369 branch-misses # 0.78% of all branches
5.012218939 seconds time elapsed
4.998051000 seconds user
0.009122000 seconds sys
perf can also do a bunch more advanced things, e.g. here I show how to use it to profile code: How do I profile C++ code running on Linux?
