Why does glob lstat matching entries?

Why does glob lstat matching entries? - linux

Looking into behavior in this question, I was surprised to see that perl lstat()s every path matching a glob pattern:
$ mkdir dir
$ touch dir/{foo,bar,baz}.txt
$ strace -e trace=lstat perl -E 'say $^V; <dir/b*>'
v5.10.1
lstat("dir/baz.txt", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
lstat("dir/bar.txt", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
I see the same behavior on my Linux system with glob(pattern) and <pattern>, and with later versions of perl.
My expectation was that the globbing would simply opendir/readdir under the hood, and that it would not need to inspect the actual pathnames it was searching.
What is the purpose of this lstat? Does it affect the glob()s return?

This strange behavior has been noticed before on PerlMonks. It turns out that glob calls lstat to support its GLOB_MARK flag, which has the effect that:
Each pathname that is a directory that matches the pattern has a slash appended.
To find out whether a directory entry refers to a subdir, you need to stat it. This is apparently done even when the flag is not given.

I was wondering the same thing - "What is the purpose of this lstat? Does it affect the glob()s return?"
Within bsd_glob.c glob2() I noticed a g_stat call within an if branch that required the GLOB_MARK flag to be set, I also noticed a call to g_lstat just before that was not guarded by a flag check. Both are within an if branch for when the end of pattern is reached.
If I remove these 2 lines in the glob2 function in perl-5.12.4/ext/File-Glob/bsd_glob.c
- if (g_lstat(pathbuf, &sb, pglob))
- return(0);
the only perl test (make test) that fails is test 5 in ext/File-Glob/t/basic.t with:
not ok 5
# Failed test at ../ext/File-Glob/t/basic.t line 92.
# Structures begin differing at:
# $got->[0] = 'asdfasdf'
# $expected->[0] = Does not exist
Test 5 in t/basic.t is
# check nonexistent checks
# should return an empty list
# XXX since errfunc is NULL on win32, this test is not valid there
#a = bsd_glob("asdfasdf", 0);
SKIP: {
skip $^O, 1 if $^O eq 'MSWin32' || $^O eq 'NetWare';
is_deeply(\#a, []);
}
If I replace the 2 lines removed with:
+ if (!((pglob->gl_flags & GLOB_NOCHECK) ||
+ ((pglob->gl_flags & GLOB_NOMAGIC) &&
+ !(pglob->gl_flags & GLOB_MAGCHAR)))){
+ if (g_lstat(pathbuf, &sb, pglob))
+ return(0);
+ }
I don't see any failures from "make test" for perl-5.12.4 on linux x86_64 (RHEL6.3 2.6.32-358.11.1.el6.x86_64) and when using:
strace -fe trace=lstat perl -e 'use File::Glob q{:glob};
print scalar bsd_glob(q{/var/log/*},GLOB_NOCHECK)'
I no longer see the lstat calls for each file in the dir.
I don't mean to suggest that the perl tests for glob (File-Glob) are comprehensive (they are not), or that a change such as this will not break existing
behaviour (this seems likely). As far as I can tell the code with this (g_l)stat call existed in original-bsd/lib/libc/gen/glob.c 24 years ago in 1990.
Also see:
Chapter 6. Benchmarking Perl of "Mastering Perl" By brian d foy, Randal L. Schwartz
contains a section on comparing code where code using glob() and opendir() is compared.
"future globs (was "UNIX mindset...")" in comp.unix.wizards from Dick Dunn in 1991.
Usenet newsgroup mod.sources "'Globbing' library routine (glob)" from Guido van Rossum in July 1986 - I don't see a reference to "stat" in this code.

Related

How to compare two parts of a line multiple times in multiple files with specific extension?

NOTE BEFORE READING: The following question is described very precisely and that is the reason for the length of a question. If you want to understand the problem, it's better to read the entire thing. Many thanks for all the answers!
I am working on a bash script (.sh file) which will check certain values in every file of a directory. Bash script will be executed in a pre-commit (pre-commit is not a part of the question).
There is a directory that contains multiple .c files in multiple subdirectories. I want to check a part of two lines which are NOT in every .c file but only in some of them. The structure of a file that contains the useful information is as following:
/*
## SYMBOL = some_symbol1
## A2L_TYPE = PARAMETER
.
.
.
#! DEFAULT = some_value1
## END
*/
some_symbol1 = some_value1
/*
## SYMBOL = some_symbol2
## A2L_TYPE = PARAMETER
.
.
.
#! DEFAULT = some_value2
## END
*/
some_symbol2 = some_value2
This kind of structure is automatically generated by another script.
I want to check if some_value1 (in comment) is equal to some_value1 (in variable).
There are hundreds of these variable in each .c file (not necessarily in each .c file).
The main functionality of a script should be:
Check some_value1 in comment and variable and throw an error if they are not the same. Script has to go through EVERY .c file in a directory (bash is in root) and ALL subdirectories to find previously mentioned structure.
Value of variable can be something as 0.06F, where in comment, there is 0.06 (compare only the numbers)
Value of variable can also be an array: { 0.0F, 0.45F, 0.3F } where in the comment, there is [ 0.0, 0.45, 0.3 ] (without F and difference in braces)
To summarize:
I want to build a check script that compares some_value1 (in comment) and some_value1 (in variable) and throw an error if they don't match
Useful information is not in EVERY .c file but only in some of them (don't know which)
Values after #! DEFAULT is a comment where the value of variable is a number (maybe this is not that important?)
between A2L_TYPE and DEFAULT, there can be desired number of unimportant stuff. (still a comment)
What I tried so far is for loop through every .c file and a nested for loop to read every line in each .c file. What I wanted to implement was a grep command inside for loop to check each line if there is a #! DEFAULT pattern and save it to the variable.
Latest code that I tried:
!/bin/bash
shopt -s globstar
for d in */**/*.c
do
while IFS="" read -r p || [ -n "$p" ]
do
grep -P "#! DEFAULT" $d
done < $d
done
This is currently not working because it gives an error that certain grep targets are directories
If any has any questions, I will try to explain it better.

# search for files with extension ".c"
# execute awk on any matches, using '= ' as field separator
find . -type f -name '*.c' -exec awk -F'=[[:space:]]*' '
# check if first three lines match template
( NR==1 && /^\/\*/ ) ||
( NR==2 && /^## SYMBOL = / ) ||
( NR==3 && /^## A2L_TYPE = PARAMETER/ ) { ok++ }
# template mismatch - skip this file
( NR==4 && ok!=3 ) {
printf "%s : ignored\n", FILENAME
nextfile
}
# store first occurrence of some_value1
# note line number where second occurrence expected
/^#! DEFAULT =/ { v[1]=v1=$2; n=NR+3 }
# test second occurrence
NR==n {
v[2]=v2=$2;
# prune everything except numbers and array delimiters
for (s in v) gsub(/[^0-9.,]/,"",v[s]);
# output result
# match exactly or only number list
printf "%s #(%d,%d) : ", FILENAME,n-3,n
if (v1==v2 || v[1]==v[2])
printf "match (%s)==(%s)\n", v1,v2
else
printf "mismatch (%s)!=(%s)\n", v1,v2
# no need to check rest of this file
# elide to check multiple values per file
nextfile
}
' {} +

Setting value of command prompt ( PS1) based on present directory string length

I know I can do this to reflect just last 2 directories in the PS1 value.
PS1=${PWD#"${PWD%/*/*}/"}#
but lets say we have a directory name that's really messy and will reduce my working space , like
T-Mob/2021-07-23--07-48-49_xperia-build-20191119010027#
OR
2021-07-23--07-48-49_nokia-build-20191119010027/T-Mob#
those are the last 2 directories before the prompt
I want to set a condition if directory length of either of the last 2 directories is more than a threshold e.g. 10 chars , shorten the name with 1st 3 and last 3 chars of the directory (s) whose length exceeds 10
e.g.
2021-07-23--07-48-49_xperia-build-20191119010027 &
2021-07-23--07-48-49_nokia-build-20191119010027
both gt 10 will be shortened to 202*027 & PS1 will be respectively
T-Mob/202*027/# for T-Mob/2021-07-23--07-48-49_xperia-build-20191119010027# and
202*027/T-Mob# for 2021-07-23--07-48-49_nokia-build-20191119010027/T-Mob#
A quick 1 Liner to get this done ?
I cant post this in comments so Updating here. Ref to Joaquins Answer ( thx J)
PS1=''`echo ${PWD#"${PWD%/*/*}/"} | awk -v RS='/' 'length() <=10{printf $0"/"}; length()>10{printf "%s*%s/", substr($0,1,3), substr($0,length()-2,3)};'| tr -d "\n"; echo "#"`''
see below o/p's
/root/my-applications/bin # it shortened as expected
my-*ons/bin/#cd - # going back to prev.
/root
my-*ons/bin/# #value of prompt is the same but I am in /root

A one-liner is basically always the wrong choice. Write code to be robust, readable and maintainable (and, for something that's called frequently or in a tight loop, to be efficient) -- not to be terse.
Assuming availability of bash 4.3 or newer:
# Given a string, a separator, and a max length, shorten any segments that are
# longer than the max length.
shortenLongSegments() {
local -n destVar=$1; shift # arg1: where do we write our result?
local maxLength=$1; shift # arg2: what's the maximum length?
local IFS=$1; shift # arg3: what character do we split into segments on?
read -r -a allSegments <<<"$1"; shift # arg4: break into an array
for segmentIdx in "${!allSegments[#]}"; do # iterate over array indices
segment=${allSegments[$segmentIdx]} # look up value for index
if (( ${#segment} > maxLength )); then # value over maxLength chars?
segment="${segment:0:3}*${segment:${#segment}-3:3}" # build a short version
allSegments[$segmentIdx]=$segment # store shortened version in array
fi
done
printf -v destVar '%s\n' "${allSegments[*]}" # build result string from array
}
# function to call from PROMPT_COMMAND to actually build a new PS1
buildNewPs1() {
# declare our locals to avoid polluting global namespace
local shorterPath
# but to cache where we last ran, we need a global; be explicit.
declare -g buildNewPs1_lastDir
# do nothing if the directory hasn't changed
[[ $PWD = "$buildNewPs1_lastDir" ]] && return 0
shortenLongSegments shorterPath 10 / "$PWD"
PS1="${shorterPath}\$"
# update the global tracking where we last ran this code
buildNewPs1_lastDir=$PWD
}
PROMPT_COMMAND=buildNewPs1 # call buildNewPs1 before rendering the prompt
Note that printf -v destVar %s "valueToStore" is used to write to variables in-place, to avoid the performance overhead of var=$(someFunction). Similarly, we're using the bash 4.3 feature namevars -- accessible with local -n or declare -n -- to allow destination variable names to be parameterized without the security risk of eval.
If you really want to make this logic only apply to the last two directory names (though I don't see why that would be better than applying it to all of them), you can do that easily enough:
buildNewPs1() {
local pathPrefix pathFinalSegments
pathPrefix=${PWD%/*/*} # everything but the last 2 segments
pathSuffix=${PWD#"$pathPrefix"} # only the last 2 segments
# shorten the last 2 segments, store in a separate variable
shortenLongSegments pathSuffixShortened 10 / "$pathSuffix"
# combine the unshortened prefix with the shortened suffix
PS1="${pathPrefix}${pathSuffixShortened}\$"
}
...adding the performance optimization that only rebuilds PS1 when the directory changed to this version is left as an exercise to the reader.

Probably not the best solution, but a quick solution using awk:
PS1=`echo ${PWD#"${PWD%/*/*}/"} | awk -v RS='/' 'length()<=10{printf $0"/"}; length()>10{printf "%s*%s/", substr($0,1,3), substr($0,length()-2,3)};'| tr -d "\n"; echo "#"`
I got this results with your examples:
T-Mob/202*027/#
202*027/T-Mob/#

How can I detect a sequence of "hollows" (holes, lines not matching a pattern) bigger than n in a text file?

Case scenario:
$ cat Status.txt
1,connected
2,connected
3,connected
4,connected
5,connected
6,connected
7,disconnected
8,disconnected
9,disconnected
10,disconnected
11,disconnected
12,disconnected
13,disconnected
14,connected
15,connected
16,connected
17,disconnected
18,connected
19,connected
20,connected
21,disconnected
22,disconnected
23,disconnected
24,disconnected
25,disconnected
26,disconnected
27,disconnected
28,disconnected
29,disconnected
30,connected
As can be seen, there are "hollows", understanding them as lines with the "disconnected" value inside the sequence file.
I want, in fact, to detect these "holes", but it would be useful if I could set a minimum n of missing numbers in the sequence.
I.e: for ' n=5' a detectable hole would be the 7... 13 part, as there are at least 5 "disconnected" in a row on the sequence. However, the missing 17 should not be considered as detectable in this case. Again, at line 21 whe get a valid disconnection.
Something like:
$ detector Status.txt -n 5 --pattern connected
7
21
... that could be interpreted like:
- Missing more than 5 "connected" starting at 7.
- Missing more than 5 "connected" starting at 21.
I need to script this on Linux shell, so I was thinking about programing some loop, parsing strings and so on, but I feel like if this could be done by using linux shell tools and maybe some simpler programming. Is there a way?
Even when small programs like csvtool are a valid solution, some more common Linux commands (like grep, cut, awk, sed, wc... etc) could be worth for me when working with embedded devices.

#!/usr/bin/env bash
last_connected=0
min_hole_size=${1:-5} # default to 5, or take an argument from the command line
while IFS=, read -r num state; do
if [[ $state = connected ]]; then
if (( (num-last_connected) > (min_hole_size+1) )); then
echo "Found a hole running from $((last_connected + 1)) to $((num - 1))"
fi
last_connected=$num
fi
done
# Special case: Need to also handle a hole that's still open at EOF.
if [[ $state != connected ]] && (( num - last_connected > min_hole_size )); then
echo "Found a hole running from $((last_connected + 1)) to $num"
fi
...emits, given your file on stdin (./detect-holes <in.txt):
Found a hole running from 7 to 13
Found a hole running from 21 to 29
See:
BashFAQ #1 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
The conditional expression -- the [[ ]] syntax used to make it safe to do string comparisons without quoting expansions.
Arithmetic comparison syntax -- valid in $(( )) in all POSIX-compliant shells; also available without the expansion side effects as (( )) as a bash extension.

This is the perfect use case for awk, since the machinery of line reading, column splitting, and matching is all built in. The only tricky bit is getting the command line argument to your script, but it's not too bad:
#!/usr/bin/env bash
awk -v window="$1" -F, '
BEGIN { if (window=="") {window = 1} }
$2=="disconnected"{if (consecutive==0){start=NR}; consecutive++}
$2!="disconnected"{if (consecutive>window){print start}; consecutive=0}
END {if (consecutive>window){print start}}'
The window value is supplied as the first command line argument; left out, it defaults to 1, which means "display the start of gaps with at least two consecutive disconnections". Probably could have a better name. You can give it 0 to include single disconnections. Sample output below. (Note that I added series of 2 disconnections at the end to test the failure that Charles metions).
njv#organon:~/tmp$ ./tst.sh 0 < status.txt # any number of disconnections
7
17
21
31
njv#organon:~/tmp$ ./tst.sh < status.txt # at least 2 disconnections
7
21
31
njv#organon:~/tmp$ ./tst.sh 8 < status.txt # at least 9 disconnections
21

Awk solution:
detector.awk script:
#!/bin/awk -f
BEGIN { FS="," }
$2 == "disconnected"{
if (f && NR-c==nr) c++;
else { f=1; c++; nr=NR }
}
$2 == "connected"{
if (f) {
if (c > n) {
printf "- Missing more than 5 \042connected\042 starting at %d.\n", nr
}
f=c=0
}
}
Usage:
awk -f detector.awk -v n=5 status.txt
The output:
- Missing more than 5 "connected" starting at 7.
- Missing more than 5 "connected" starting at 21.

Using $ notation in middle of C-Shell statement

I have a bunch of directories to process, so I start a for loop like this:
foreach n (1 2 3 4 5 6 7 8)
Then I have a bunch of commands where I am copying over a few files from different places
cp file1 dir$n
cp file2 dir$n
but I have a couple commands where the $n is in the middle of the command like this:
cp -r dir$nstep1 dir$n
When I run this command, the shell complains that it cannot find the variable $nstep1. What i want to do is evaluate the $n first and then concatenate the text around it. I tried using `` and (), but neither of those work. How to do this in csh?

In this respect behavior is similar to POSIX shells:
cp -r "dir${n}step1" "dir${n}"
The quotes prevent string-splitting and glob expansion. To observe what this means, compare the following:
# prints "hello * cruel * world" on one line
set n=" * cruel * "
printf '%s\n' "hello${n}world"
...to this:
# prints "hello" on one line
# ...then a list of files in the current directory each on their own lines
# ...then "cruel" on another line
# ...then a list of files again
# ... and then "world"
set n=" * cruel * "
printf '%s\n' hello${n}world
In real-world cases, correct quoting can thus be the difference between deleting the oddly-named file you're trying to operate on, and deleting everything else in the directory as well.

How to determine the date-and-time that a Linux process was started?

If I look at /proc/6945/stat then I get a series of numbers, one of which is the number of CPU-centiseconds for which the process has been running.
But I'm running these processes on heavily-loaded boxes, and what I'm interested in is the clock-time when the job will finish, for which I want to know the clock-time that it started.
The timestamps on files in /proc/6945 look to be in the right sort of range but I can't find a particular file which consistently has the right clock-time on it.
As always I can't modify the process.

Timestamps of the directories in /proc are useless.
I was advised to look at 'man proc'; this says that /proc/$PID/stat field 21 records the start-time of the process in kernel jiffies since boot ... so:
open A,"< /proc/stat"; while (<A>) { if (/^btime ([0-9]*)/) { $btime = $1 } }
to obtain the boot time, then
my #statl = split " ",`cat /proc/$i/stat`;
$starttime_jiffies = $statl[21];
$starttime_ut = $btime + $starttime_jiffies / $jiffies_per_second;
$cputime = time-$starttime_ut
but I set $jiffies_per_second to 100 because I don't know how to ask the kernel for its value from perl.

I have a project on github that does this in perl. You can find it here:
https://github.com/cormander/psj
The code you're wanting is in lib/Proc/PID.pm, and here is the snippit (with comments removed):
use POSIX qw(ceil sysconf _SC_CLK_TCK);
sub _start_time {
my $pid = shift->pid;
my $tickspersec = sysconf(_SC_CLK_TCK);
my ($secs_since_boot) = split /\./, file_read("/proc/uptime");
$secs_since_boot *= $tickspersec;
my $start_time = (split / /, file_read("/proc/$pid/stat"))[21];
return ceil(time() - (($secs_since_boot - $start_time) / $tickspersec));
}
Beware the non-standard code function file_read in here, but that should be pretty straight forward.

Use the creation timestamp of the /proc/6945 directory (or whatever PID), rather than looking at the files it contains. For example:
ls -ld /proc/6945

Bash command to get the start date of some process:
date -d #$(cat /proc/PID/stat | awk "{printf \"%.0f\", $(grep btime /proc/stat | cut -d ' ' -f 2)+\$22/$(getconf CLK_TCK);}")

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Why does glob lstat matching entries? - linux

Related

How to compare two parts of a line multiple times in multiple files with specific extension?

Setting value of command prompt ( PS1) based on present directory string length

How can I detect a sequence of "hollows" (holes, lines not matching a pattern) bigger than n in a text file?

Using $ notation in middle of C-Shell statement

How to determine the date-and-time that a Linux process was started?

Categories

Resources