List filenames up to a limit per each text file in Bash - linux

How can I create a number of text files, which include the filenames of the files in a specific directory, up to a maximum of 999 rows per text file?
I started from this:
find ./J0902-405/*.evt -maxdepth 1 -type f -fprintf files_xselect.list %f\\n
And it writes the filenames properly in the text file.
But afterwards, I need to apply the 999-row limit, and once that limit is reached, create another text file with the next 999 names, and so on, until all the *.evt files are listed.

find ./J0902-405/*.evt -maxdepth 1 -type f | split -l999 -
From the manual page:
NAME
       split - split a file into pieces
SYNOPSIS
       split [OPTION]... [INPUT [PREFIX]]
DESCRIPTION
       Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
       size is 1000 lines, and default PREFIX is `x'. With no INPUT, or when
       INPUT is -, read standard input.

       -l, --lines=NUMBER
              put NUMBER lines per output file
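To get a quick feel for those defaults, here is a hypothetical run on 2500 input lines:
seq 2500 | split -    # with the defaults: xaa and xab get 1000 lines each, xac gets 500
wc -l x*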

@DopeGhoti's answer is the right approach, but let me flesh it out a bit, for those new to split (like me):
find ./J0902-405 -maxdepth 1 -name '*.evt' -type f -printf '%f\n' | \
split -l 999 -d - files_xselect.list.
-name ... with a quoted filename pattern lets find do the pathname expansion (as opposed to the shell - no point in letting both the shell and find do the work).
-printf '%f\n' ensures that only filenames (no path components) are output, as in the OP.
-l 999 specifies the split size in lines; default is 1000.
-d causes numeric suffixes to be used for the output files (00, 01, ...) rather than the default letters (aa, ab, ...) [note: won't work on OS X]; the default suffix length is 2; to control the number of digits/chars in the suffix, use -a {length}.
- causes split to read from stdin - in this case, the output from find.
files_xselect.list. is the output-file prefix; thus, we get files files_xselect.list.00, files_xselect.list.01, ...
If you want more control over the output filename - e.g., to move the suffix data to a different part of the name - you can use the --filter option (note: won't work on OS X). It accepts a shell command to which the output data for each piece is piped, with the variable $FILE holding the name of the respective output file; this gives you the chance to modify the output filename based on it:
For instance, to create output files named files_xselect.00.list, ... - i.e., to place the suffix data before the filename extension, you'd use:
... | split -l 999 -d --filter='> ${FILE}.list' - 'files_xselect.'
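As a sanity check (a sketch, assuming GNU split and a directory with a few thousand .evt files), you could verify the result with something like:
ls files_xselect.*.list        # files_xselect.00.list, files_xselect.01.list, ...
wc -l files_xselect.*.list     # no piece should exceed 999 lines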

Something like
#!/bin/bash
i=0 j=0
for file in ./J0902-405/*.evt; do
    (( i >= 999 )) && i=0 && j=$((j+1))    # arithmetic test; a string test like [[ $i > 999 ]] compares lexically and never fires
    [[ -f $file ]] && i=$((i+1)) && echo "${file##*/}" >> "fileofnames$j.txt"
done
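To sanity-check the result (a sketch; fileofnames is just the prefix used above):
wc -l fileofnames*.txt          # each list should hold at most 999 names
cat fileofnames*.txt | wc -l    # the total should match the number of .evt files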

Related

Bash script that counts and prints out the files that start with a specific letter

How do I print out all the files of the current directory that start with the letter "k"? I also need to count these files.
I tried some methods but I only got errors or wrong outputs. I'm really stuck on this as a newbie in Bash.
Try this Shellcheck-clean pure POSIX shell code:
count=0
for file in k*; do
    if [ -f "$file" ]; then
        printf '%s\n' "$file"
        count=$((count+1))
    fi
done
printf 'count=%d\n' "$count"
It works correctly (just prints count=0) when run in a directory that contains nothing starting with 'k'.
It doesn't count directories or other non-files (e.g. fifos).
It counts symlinks to files, but not broken symlinks or symlinks to non-files.
It works with 'bash' and 'dash', and should work with any POSIX-compliant shell.
Here is a pure Bash solution.
files=(k*)
printf "%s\n" "${files[@]}"
echo "${#files[@]} files total"
The shell expands the wildcard k* into the array, thus populating it with a list of matching files. We then print out the array's elements, and their count.
The use of an array avoids the various problems with metacharacters in file names (see e.g. https://mywiki.wooledge.org/BashFAQ/020), though the syntax is slightly hard on the eyes.
As remarked by pjh, this will include any matching directories in the count, and fail in odd ways if there are no matches (unless you set nullglob to true). If avoiding directories is important, you basically have to get the directories into a separate array and exclude those.
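A minimal sketch of that directory-safe variant (assuming bash; the guard before printf avoids printing a blank line when the array is empty):
shopt -s nullglob    # k* expands to nothing instead of a literal "k*" when there are no matches
files=()
for f in k*; do
    [[ -f $f ]] && files+=("$f")    # keep regular files only; skip directories etc.
done
(( ${#files[@]} )) && printf '%s\n' "${files[@]}"
echo "${#files[@]} files total"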
To repeat what Dominique also said, avoid parsing ls output.
Demo of this and various other candidate solutions:
https://ideone.com/XxwTxB
To start with: never parse the output of the ls command, but use find instead.
As find by default descends into all subdirectories, you need to limit that with the -maxdepth switch, using the value 1.
In order to count the results, you just count the number of lines in the output (valid as long as there is one result per line, which is the case for the find command). Counting the number of lines is done with the wc -l command.
So, this comes down to the following command:
find ./ -maxdepth 1 -type f -name "k*" | wc -l
Have fun!
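One caveat: wc -l counts newlines, so a filename that itself contains a newline would be counted twice. If that edge case matters, a common GNU-find workaround is to print one character per file and count characters instead:
find . -maxdepth 1 -type f -name 'k*' -printf '.' | wc -c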
This should work as well:
VAR="k"
COUNT=$(ls -p ${VAR}* | grep -v ":" | wc -w)
echo -e "Total number of files: ${COUNT}\n" 1>&2
echo -e "Files,that begin with ${VAR} are:\n$(ls -p ${VAR}* | grep -v ":" )" 1>&2

Shell script: how to list file names in ascending order

I am new to linux, writing a bash script below.
The files in the current folder are stored as 1.jpg, 2.jpg, and so on. I have to process the files sequentially according to their names, but in the loop below I get the file names in a different order.
for i in ./*.jpg
do
    filename=$(basename "$i")
    echo "filename is ./$filename"
done
output I get is like this
filename is ./10.jpg
filename is ./11.jpg
filename is ./12.jpg
filename is ./13.jpg
filename is ./14.jpg
filename is ./15.jpg
filename is ./16.jpg
filename is ./17.jpg
filename is ./18.jpg
filename is ./19.jpg
filename is ./1.jpg
filename is ./20.jpg
filename is ./21.jpg
filename is ./22.jpg
filename is ./27.jpg
filename is ./28.jpg
filename is ./29.jpg
filename is ./2.jpg
filename is ./3.jpg
filename is ./4.jpg
filename is ./6.jpg
filename is ./7.jpg
filename is ./8.jpg
filename is ./9.jpg
Any assistance as to how I can process them in the sequence of names 1.jpg, 2.jpg, etc.?
Pathname expansion (glob expansion) returns a list of filenames sorted alphabetically according to your current locale. In the C locale the order is plain byte-wise ASCII; in most UTF-8 locales, punctuation such as the <dot> character is given a low collation weight. This is visible in the OP's result: 19.jpg sorts before 1.jpg because, in the OP's locale, the <dot> in 1.jpg is effectively ordered after the digit 9.
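You can observe the collation difference directly (illustrative only; the exact locale order may vary):
printf '%s\n' 1.jpg 19.jpg 2.jpg | sort             # locale collation; may list 19.jpg first
printf '%s\n' 1.jpg 19.jpg 2.jpg | LC_ALL=C sort    # byte-wise order: 1.jpg, 19.jpg, 2.jpg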
If you want to traverse your files in a different sorting order, then a different approach needs to be taken.
Under the bold assumption that the OP requests to traverse the files in a numeric sorted way, i.e. order the names according to a number at the beginning of the file-name, you can do the following:
while IFS= read -r -d '' file; do
    echo "filename: $file"
done < <(find . -maxdepth 1 -type f -name '*.jpg' -print0 | sort -z -t/ -k2 -n)
Here we use find to list all files in the current directory (depth == 1), printing them with a \0 as separator, and use sort to do the requested numeric (-n) sorting on the filename part (the second /-separated field, since find prefixes each name with ./), indicating with -z that the entries are \0-separated. Instead of a for-loop, we use a while-loop to read the information.
See Bash Pitfall #1 for some details.
note: sort -z is a GNU extension
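If your sort supports it, GNU version sort (-V, also a GNU extension) understands embedded numbers and avoids the field handling altogether - a sketch:
while IFS= read -r -d '' file; do
    echo "filename: $file"
done < <(find . -maxdepth 1 -type f -name '*.jpg' -print0 | sort -zV)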
Not quite sure if this is what you're asking, but you have the echo inside your loop, which causes each name to be printed on a separate row.
You can do:
list=""
for i in ./*.jpg
do
    filename=$(basename "$i")
    list="$list $filename"
done
echo "files: $list"
which would output
files: 1.jpg 2.jpg
Nevertheless, you should clarify your question.
From your requirement to "process them in the sequence of names 1.jpg, 2.jpg etc", this will accomplish that. The sort specifies a numeric key obtained by defining the first field as the string before a "." delimiter.
#!/usr/bin/env bash
shopt -s nullglob
allfiles=(*.jpg)
for f in "${allfiles[@]}"
do
    echo "$f"
done | sort -t"." -k1n
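Note that any real per-file work placed inside that loop would still run in glob order, because the sorting happens after the loop finishes. To actually process files in numeric order, read from the sorted list instead (a sketch, assuming the names contain no newlines):
while IFS= read -r f; do
    echo "processing $f"    # replace with the real per-file work
done < <(printf '%s\n' *.jpg | sort -t"." -k1,1n)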

How to list all the folders in a folder and exclude a specific one

Let's say I have a folder like this:
my_folder
====my_sub_folder_1
====my_sub_folder_2
====my_sub_folder_3
====exclude
I would like a command that returns a string like this:
["my_sub_folder_1", "my_sub_folder_2", "my_sub_folder_3"]
(Notice the exclusion of the exclude folder.)
The best I could do is:
ls -dxm */
That returns the following:
my_sub_folder_1/, my_sub_folder_2/, my_sub_folder_3/
So I'm still trying to remove the / at the end of each folder, and add the [] and the "".
If possible I would like to do that in one line so I could directly put it in a shell variable; otherwise I will put it in a .sh file that returns the string I'm trying to build.
(I don't know if the last part is really possible.)
Assuming you are executing the script in the directory where my_folder
belongs, how about:
while IFS= read -r -d "" f; do
    ary+=("$f")
done < <(find "my_folder" -maxdepth 1 -mindepth 1 -type d -not -name "exclude" -printf "\"%f\"\0")
(IFS=","; echo "[${ary[*]}]")
[Explanations]
The -printf option to the find command specifies the output format. The format "\"%f\"\0"
prints the filename (excluding the leading directory name) wrapped in
double quotes and followed by a NUL character \0.
The NUL character is used as a filename delimiter, and the filenames
are split again in the read builtin by setting its delimiter
to the NUL character with -d "".
Then the filenames (with double quotes) are stored in the array ary
one by one.
Finally, the echo "[${ary[*]}]" command prints out the elements of ary
separated by IFS. The whole output is surrounded by the square brackets [].
The last line is wrapped in parens () so that it executes in a subshell.
The purpose is just not to overwrite the current IFS.
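To make the subshell point concrete, a toy run with hypothetical array contents:
ary=('"a"' '"b"' '"c"')
(IFS=","; echo "[${ary[*]}]")   # prints ["a","b","c"]
# IFS in the calling shell is untouched afterwards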
If you save the script in my answer as my_script.sh, then you can assign
a variable MY_VAR to the output by saying:
MY_VAR=$(./my_script.sh)
echo "$MY_VAR"
# or another_command "$MY_VAR" or whatever
Alternatively you can assign the variable within the script by modifying
the last line as:
MY_VAR=$(IFS=","; echo "[${ary[*]}]")
echo "$MY_VAR"
Hope this helps.
In bash this can be done as follows; it's close, but it doesn't work in one line.
Change the Internal Field Separator to a newline rather than a space. This allows spaces in directory names to be handled safely.
Then perform the following:
List the directories, one per line
Use grep to remove the directory to be excluded
Iterate over the results:
Output the directory name with the last character removed
Pipe everything to xargs to recombine into a single line and store in $var
Trim the last , from ${var} and wrap in '[]'
IFS=$'\n'
var=$(for d in $(ls -d1 */ | grep -v exclude_dir); do echo '\"'"${d::-1}"'\",'; done | xargs)
echo '['"${var::-1}"']'
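With the example tree above (and exclude_dir standing in for the real name of the folder to skip), this prints:
["my_sub_folder_1", "my_sub_folder_2", "my_sub_folder_3"]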

How to output the difference of files from two folders and save the output with the same name in a different folder

I have two folders which have same file names, but different contents. So, I am trying to generate a script to get the difference and to see what is being changed. I wrote a script below :
folder1="/opt/dir1"
folder2=`ls/opt/dir2`
find "$folder1/" /opt/dir2/ -printf '%P\n' | sort | uniq -d
for item in `ls $folder1`
do
if [[ $item == $folder2 ]]; then
diff -r $item $folder2 >> output.txt
fi
done
I believe this script should work, but it is not producing any output in the output file.
So the desired output should be in one file . Ex:
cat output.txt
diff -r /opt/folder1/file1 /opt/folder2/file1
1387c1387
< ALL X'25' BY SPACE
---
> ALL X'0A' BY SPACE
diff -r /opt/folder1/file2 /opt/folder2/file2
2591c2591
< ALL X'25' BY SPACE
---
> ALL X'0A' BY SPACE
Any help is appreciated!
OK. So, twofold:
First, get the files in one folder. Never use ls - forget it exists; ls is for nice printing in the console. In scripts, use find.
Then run some command for each file - a simple while read loop.
So:
{
    # make find print paths relative to the /opt/dir1 directory
    cd /opt/dir1 &&
    # use %P so that paths print without a leading ./
    find . -mindepth 1 -type f -printf "%P\n"
} |
while IFS= read -r file; do
    diff /opt/dir1/"$file" /opt/dir2/"$file" >> output/"$file"
done
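One practical caveat: the >> output/"$file" redirection fails if the output directory doesn't exist yet (and, for files in subdirectories, if the matching subpath under output/ is missing), so create it first - a minimal guard:
mkdir -p output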
Notes:
always quote your variables
Why you shouldn't parse the output of ls(1)

Rename the most recent file in each group

I am trying to create a script that detects the latest file of each group and adds a prefix to its original name.
ll $DIR
asset_10.0.0.1_2017.11.19 #latest
asset_10.0.0.1_2017.10.28
asset_10.0.0.2_2017.10.02 #latest
asset_10.0.0.2_2017.08.15
asset_10.1.0.1_2017.11.10 #latest
...
Two questions:
1) How do I find the latest file of each group?
2) How do I rename it, adding only a prefix?
I tried the following procedure, but it looks for the latest file in the entire directory, and doesn't keep the original name so that a prefix can be added to it:
find $DIR -type f ! -name 'asset*' -print | sort -n | tail -n 1 | xargs -I '{}' cp -p '{}' $DIR...
What would be the best approach to achieve this? (keeping xargs if possible)
Selecting the latest entry in each group
You can use sort to select only the latest entry in each group:
find . -print0 | sort -r -z | sort -t_ -k2,2 -u -z | xargs ...
First, sort all files in reversed lexicographical order (so that the latest entry appears first for each group). Then, by sorting on group name only (that's the second field, -k2,2, when split on underscores via -t_) and printing unique groups, we get only the first entry per each group, which is also the latest.
Note that this works because sort uses a stable sorting algorithm - meaning the order of already sorted items will not be altered by sorting them again. Also note we can't use uniq here because we can't specify a custom field delimiter for uniq (it's always whitespace).
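A quick demonstration with the sample names (newline-separated here for readability; the real pipeline uses NUL separators):
printf '%s\n' asset_10.0.0.1_2017.11.19 asset_10.0.0.1_2017.10.28 \
    asset_10.0.0.2_2017.10.02 asset_10.0.0.2_2017.08.15 |
    sort -r | sort -t_ -k2,2 -u
# prints the latest entry of each group:
# asset_10.0.0.1_2017.11.19
# asset_10.0.0.2_2017.10.02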
Copying with prefix
To add prefix to each filename found, we need to split each path find produces to a directory and a filename (basename), because we need to add prefix to filename only. The xargs part above could look like:
... | xargs -0 -I '{}' sh -c 'd="${1%/*}"; f="${1##*/}"; cp -p "$d/$f" "$d/prefix_$f"' _ '{}'
Path splitting is done with shell parameter expansion, namely prefix (${1##*/}) and suffix (${1%/*}) substring removal.
Note the use of NUL-terminated output (paths) in find (-print0 instead of -print), and the accompanying use of -z in sort and -0 in xargs. That way the complete pipeline will properly handle filenames (paths) with "special" characters like newlines and similar.
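Putting both halves together (a sketch with a hypothetical latest_ prefix; note the -k2,2 key assumes $DIR itself contains no underscores):
find "$DIR" -type f -name 'asset_*' -print0 |
    sort -r -z | sort -t_ -k2,2 -u -z |
    xargs -0 -I '{}' sh -c 'd="${1%/*}"; f="${1##*/}"; cp -p "$d/$f" "$d/latest_$f"' _ '{}'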
If you want to do this in bash alone, rather than using external tools like find and sort, you'll need to parse the "fields" in each filename.
Something like this might work:
declare -A o=()                            # declare an associative array (req. bash 4)
for f in asset_*; do                       # step through the list of files,
    IFS=_ read -r -a a <<<"$f"             # assign filename elements to an array
    b="${a[0]}_${a[1]}"                    # define a "base" of the first two elements
    if [[ "${a[2]}" > "${o[$b]}" ]]; then  # compare the date with the last value
        o[$b]="${a[2]}"                    # for this base and reassign if needed
    fi
done
for i in "${!o[@]}"; do                    # now that we're done, step through results
    printf "%s_%s\n" "$i" "${o[$i]}"       # and print them.
done
This doesn't exactly sort, it just goes through the list of files and grabs the highest sorting value for each filename base.
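To then rename each winner by adding a prefix (a sketch, using a hypothetical latest_ prefix):
for i in "${!o[@]}"; do
    src="${i}_${o[$i]}"           # e.g. asset_10.0.0.1_2017.11.19
    cp -p "$src" "latest_$src"    # or mv "$src" "latest_$src" to rename in place
done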
