Order of file reading from a directory in Linux

Suppose a directory contains 100 files with names like file.pcap1, file.pcap2, file.pcap3, ..., file.pcap100. In a shell script, to read these files one by one, I have written a loop like:
for $file in /root/*pcap*
do
    Something
done
In what order are the files read? Are they read in increasing order of the numbers at the end of the file names? Is this the same on all types of Linux machines?

The files are sorted by file name, just like the default ls (with no flags).
Also, you need to remove the $ in your for loop:
for file in /root/*pcap*

A POSIX shell returns the pathnames sorted according to the current locale:
"If the pattern matches any existing filenames or pathnames, the pattern shall be replaced with those filenames and pathnames, sorted according to the collating sequence in effect in the current locale."
It means file.pcap10 comes before file.pcap2. You probably want a natural sorting order instead, e.g., an analog of Python's natsort function (sort a list using a "natural order" algorithm).
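For a shell loop, one workaround (a minimal sketch, assuming GNU coreutils, whose sort supports -V for natural/"version" ordering) is to feed the names through sort -V:
ls /root/*pcap* | sort -V | while IFS= read -r file; do
    echo "Processing $file"    # replace with the real per-file work
done
This breaks on file names containing newlines, but is fine for names like file.pcap1 ... file.pcap100.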

Related

copy and append specific lines to a file with specific name format?

I am copying some specific lines from one file to another.
grep '^stringmatch' /path/sfile-*.cfg >> /path/nfile-*.cfg
Here is what's happening: it's creating a new file literally called nfile-*.cfg and copying those lines into it. The file names sfile-* and nfile-* are randomly generated and are generally followed by a number. Both sfile-* and nfile-* are existing files, and there is only one such file of each in the same directory; only the number that follows is randomly generated, and the numbers in sfile and nfile need not be the same. The files are not created simultaneously but are generated when a specific command is given, and some lines from one file need to be appended to the other.
I'm guessing you actually want something like
for f in /path/sfile-*.cfg; do
    grep '^stringmatch' "$f" >"/path/nfile-${f#/path/sfile-}"
done
This will loop over all sfile matches and create an nfile target file with the same number after the dash as the corresponding source sfile. (The parameter substitution ${variable#prefix} returns the value of variable with any leading match on the pattern prefix removed.)
If there is only one matching file, the loop will run only once. If there are no matches on the wildcard, the loop will still run once unless you enable nullglob, which changes the shell's globbing behavior so that wildcards with no matches expand to nothing instead of to the wildcard expression itself. If you don't want to enable nullglob, a common workaround is to add this inside the loop, before the grep:
test -e "$f" || break
If you want the loop to process only the first match when there are several, add break on a line by itself just before the done.
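For reference, a minimal sketch of the nullglob variant (Bash-specific; shopt -s nullglob is the relevant option):
shopt -s nullglob                      # unmatched globs now expand to nothing
for f in /path/sfile-*.cfg; do
    grep '^stringmatch' "$f" >"/path/nfile-${f#/path/sfile-}"
done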
If I interpret your question correctly, you want to output to an existing nfile, which has a random number in it, but instead the shell is creating a file with an asterisk in it, so literally nfile-*.cfg.
This is happening because the nfile doesn't exist when you first run the command. If the file doesn't exist, bash will fail to expand nfile-*.cfg and will instead use the * as a literal character. This is correct behaviour in bash.
So, it looks like the problem is that the nfile doesn't exist when you start your grep. You'll need to create one.
I'll leave code to others, but I hope the explanation is useful.
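For concreteness, a rough sketch of one way to locate (or create) the nfile before appending; the fallback name nfile-1.cfg is an assumption, not something given in the question:
# Use the existing nfile if there is one; otherwise fall back to a fixed name.
nfile=$(find /path -maxdepth 1 -name 'nfile-*.cfg' | head -n 1)
[ -n "$nfile" ] || nfile=/path/nfile-1.cfg     # assumed fallback name
grep '^stringmatch' /path/sfile-*.cfg >> "$nfile"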

in linux, how to compute the md5 of several files at once, and put the output in a text file?

I have many files in mypath that look like file_01.gz, file_02.gz, etc.
I would like to compute the md5 checksum for each of them and store the output in a text file, with something like:
filename md5
file01 fjiroeghreio
Is that possible on Linux?
Many thanks!
md5sum file*.gz > output.txt
The output file is space-separated, without a header. Note that md5sum prints the checksum first and the filename second, which is the reverse of the format asked for.
You can use the shell's filename expansion:
md5sum *.gz > file
Linux already has a tool called md5sum, so all you need to do is call it for every file you want. With the approach below you get the default format of the md5sum tool, "SUM NAME", one line per file found. By using the append redirection (>>), each call appends to the bottom of the output file, sums.txt:
#!/bin/bash
for f in *.gz; do
    md5sum "$f" >> sums.txt
done
The above is illustrative only; you should probably check whether the output file already exists, deal with errors, etc.
There are lots of ways of doing this, so it all depends on further requirements: must the output be exactly in the format you state, must it recurse into directories, etc.?
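If the output really must match the "filename md5" layout from the question (name first, with a header line), a minimal sketch along those lines, assuming the file_*.gz names and a sums.txt output file:
printf 'filename md5\n' > sums.txt                 # header line
for f in file_*.gz; do
    sum=$(md5sum "$f" | awk '{print $1}')          # keep only the checksum
    printf '%s %s\n' "$f" "$sum" >> sums.txt
done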

Generate file names for proper sequential sorting under shell globbing

I am generating a sequence of PNG images in my program, which I plan to pass through a tool that converts them to a video file. I generate the files one by one, in the sequence I want. I want to name them in such a way that the subsequent video conversion tool takes them in the proper sequence under the file-name globbing used by the shell (I am using bash on Linux). I tried a numeric suffix like scene1.png, scene10.png, scene12.png, but the shell doesn't sort globs numerically. I could pass a sorted list like this:
convert -antialias -delay 1x10 $(ls povs/*.png | sort -V) mymovie.mp4
But some programs do their own globbing and don't use the shell's globbing (FFmpeg, for example), so this approach does not always work. So I am looking for a file-naming scheme that is guaranteed to be in sequence under shell globbing rules.
You can prefix your files with a zero-padded integer.
This script emulates what ls * should output after the renaming:
for i in {0..11}; do
    printf '%05d_%s\n' "$i" "file${i}"
done
00000_file0
00001_file1
00002_file2
00003_file3
00004_file4
00005_file5
00006_file6
00007_file7
00008_file8
00009_file9
00010_file10
00011_file11
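If the images have already been written with unpadded numbers, a rough rename sketch (assuming names of the form scene<N>.png, as in the question, and run once on the unpadded names) would be:
for f in scene*.png; do
    n=${f#scene}; n=${n%.png}                       # extract the numeric part
    mv -- "$f" "$(printf 'scene%05d.png' "$n")"     # e.g. scene1.png -> scene00001.png
done
After that, a plain glob such as scene*.png returns the frames in numeric order, and tools that do their own globbing pick them up in sequence too.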

How to use a do loop to read several files with similar names in a shell script

I have several files named scale1.dat, scale2.dat, scale3.dat, ... up to scale9.dat.
I want to read these files in a do loop one by one, and with each file I want to do some manipulation (I want to write the 1st column of each scale*.dat file to scale*.txt).
So my question is, is there a way to read files with similar names? Thanks.
The regular syntax for this is
for file in scale*.dat; do
    awk '{print $1}' "$file" >"${file%.dat}.txt"
done
The asterisk * matches any text or no text; if you want to constrain to just single non-zero digits, you could say for file in scale[1-9].dat instead.
In Bash, there is additional non-standard brace-expansion syntax, scale{1..9}.dat, but this is Bash-only (and it is not really globbing: it expands whether or not the files exist), so it will not work in #!/bin/sh scripts. (Your question has both sh and bash so it's not clear which you require. Your comment that the Bash syntax is not working for you suggests that you may need a POSIX-portable solution.) Furthermore, Bash has something called extended globbing, which allows for quite elaborate pattern matching; a brief sketch follows. See also http://mywiki.wooledge.org/glob
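A sketch of extended globbing for this case (Bash-only; the shopt option and the +([0-9]) pattern are standard Bash features):
shopt -s extglob
for file in scale+([0-9]).dat; do      # +([0-9]) matches one or more digits
    awk '{print $1}' "$file" >"${file%.dat}.txt"
done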
For a simple task like this, you don't really need the shell at all, though.
awk 'FNR==1 { if (f) close (f); f=FILENAME; sub(/\.dat/, ".txt", f); }
{ print $1 >f }' scale[1-9]*.dat
(Okay, maybe that's slightly intimidating for a first-timer. But the basic point is that you will often find that the commands you want to use will happily work on multiple files, and so you don't need shell loops at all in those cases.)
I don't think so. Similar names or not, you will have to iterate through all your files (perhaps with a for loop) and use a nested loop to iterate through lines or words or whatever you plan to read from those files.
Alternatively, you can copy your files into one (say, scale-all.dat) and read that single file.

Find required files by pattern and then change the pattern on Linux

I need to find all *.xml files that match a pattern on Linux. I need to write the file name to the screen and then change the pattern in the file that was just found.
For instance, I can start the script with arguments for the keyword and for the value, i.e.:
script.sh keyword "another word"
Script should find all files with keyword and do the following changes in the files containing keyword.
<keyword></keyword> should stay the same: <keyword></keyword>
<keyword>some word</keyword> should become <keyword>some word, another word</keyword>
In other words, if the value in the keyword node was initially empty, then I don't need to change it; if it contains some value, then I need to extend it with the value I specify.
What is the best way to do this on Linux? Using find, grep, sed?
Performance is also important, since there are thousands of files.
Thank you.
It seems a combination of find, grep and sed would do this, and they are pretty fast. Since you'll be doing plain text processing, there might not be a need for XML-aware processing, but if you could give an example or rephrase your question, I might be able to provide more help.
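As a rough illustration of that combination (a sketch only; the argument handling and the GNU sed -i in-place flag are assumptions):
#!/bin/sh
# Usage (assumed): script.sh keyword "another word"
keyword=$1
value=$2
find . -name '*.xml' -exec grep -l "<${keyword}>" {} + |
while IFS= read -r file; do
    echo "$file"                                    # print the matching file name
    # Append ", $value" only where the element already has non-empty content;
    # empty <keyword></keyword> elements are left untouched.
    sed -i "s|<${keyword}>\([^<][^<]*\)</${keyword}>|<${keyword}>\1, ${value}</${keyword}>|g" "$file"
done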
