How to search for text files containing a text string on a Linux system

I want to collect and move all text files whose names contain the string "MEDIUM", in any of my subdirectories on a Linux system, to a new folder called MEDIUM_files. I am able to list all files containing MEDIUM by using
ls *MEDIUM*
but I only want the text files.
All the file names contain MEDIUM, but they differ in the numbers at the end, for example "MEDIUM_30_1.txt" through "MEDIUM_1850_20.txt".
How can I specify a file type as well as containing a string?

find . -type f | grep MEDIUM | grep '\.txt$' | xargs -I{} mv {} MEDIUM_files
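A single find invocation can also do the whole job and copes better with names containing spaces. This is a minimal sketch, assuming GNU find and coreutils mv (for -t) and that the MEDIUM_files directory already exists; the -not -path test keeps find from descending into the destination folder:
# move every .txt file whose name contains MEDIUM into MEDIUM_files/
find . -type f -name '*MEDIUM*.txt' -not -path './MEDIUM_files/*' -exec mv -t MEDIUM_files/ {} +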

Related

How do I get all files with .md extension, in all subdirectories, that contain a phrase? [duplicate]

I have a parent folder named 'dev', and inside it are all my project folders. The ReadMe files of these projects contain the app type "type: game", for example. What I would like to do is to:
search through all subdirectories of the dev folder to find all files with the .md extension
then return the names of those directories which contain a .md file containing the phrase "game"
I've tried piping find into grep like so:
find -type f -name "*.md" | grep -ril "type: game"
But it just returns the names of files from all subdirectories which contain the phrase "game" in any file.
find . -type f -name "*.md" -print0 | xargs -0 grep -il "type: game" | sed -e 's/[^\/]*$//'
This finds any files in the current directory and subdirectories with names ending in .md, then greps for the ones containing the string; sed then trims the file name from each match, leaving only the directories that contain a .md file with "type: game" inside. (The original attempt fails because grep -r with no file arguments searches the working directory recursively on its own instead of reading the file names piped in from find.)
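If you would rather not reach for sed, dirname plus sort -u gives the same directory list and also removes repeated directory names; a sketch, assuming GNU grep (for -Z) and findutils:
find . -type f -name "*.md" -print0 | xargs -0 grep -ilZ "type: game" | xargs -0 -n1 dirname | sort -u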

Create a directory list printer

On Windows there are many programs that can recursively print a detailed list of directory contents. I haven't found any for Linux and so I'm trying to create a script that does so.
This is what I'm going for:
For each file, print full path name(Tab)size in MiB(Tab)file extension
If there are several directories, skip a line for each different directory traversed.
For the directory name, print the directory name and leave blank-spaced tabs for extension and size.
Sample output is as follows:
Path and Name Size MiB Extension
C:\Users\xxx\Desktop\beers\
C:\Users\xxx\Desktop\beer1\- random name.pdf 5.11 pdf
C:\Users\xxx\Desktop\beer1\- random name2.djvu 5.11 djvu
C:\Users\xxx\Desktop\beer2\
C:\Users\xxx\Desktop\beer2\- random name.mp4 253.91 mp4
Based on a user comment and some research, I have:
ls -R -lh /mnt/folder300/ | cut -d' ' -f 5- > folder300.txt
With this, I intend to take the output of ls -R -lh and omit the first 4 fields.
But I notice this clips text, for example on nested directories. What am I doing wrong?
This bash command gives the size and full path:
find ~+ -maxdepth 100 -type f -exec du -bh {} \;
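To get closer to the requested path(Tab)size in MiB(Tab)extension layout, something along these lines could work; a rough sketch, assuming GNU find, where the awk part strips everything up to the last dot to derive the extension:
find /mnt/folder300 -type f -printf '%p\t%s\t%f\n' | awk -F'\t' '{ ext = $3; sub(/.*\./, "", ext); printf "%s\t%.2f\t%s\n", $1, $2 / 1048576, ext }'
Files without a dot in their name will show the whole file name in the extension column, and the per-directory header lines from the sample output would have to be added separately.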

How to grep/find for a list of file names?

So for example, I have a text document with a list of file names that I may have in a directory. I want to use grep or find to check whether those files exist in a specific directory and the subdirectories within it. Currently I can do it manually via find . | grep filename, but that is one file at a time, and when I have over 100 file names to check, that gets really tedious and time-consuming.
What's the best way to go about this?
xargs is what you want here. The case is as follows:
Assume you have a file named filenames.txt that contains a list of files
a.file
b.file
c.file
d.file
e.file
and only e.file doesn't exist.
The command in the terminal is:
cat filenames.txt | xargs -I {} find . -type f -name {}
the output of this command is:
a.file
b.file
c.file
d.file
Maybe this is helpful.
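If you also want to see which names are missing, comparing the wanted list against the names that actually exist is one way; a sketch, assuming GNU find, where found.txt and wanted.txt are just illustrative scratch-file names:
find . -type f -printf '%f\n' | sort -u > found.txt
sort -u filenames.txt > wanted.txt
comm -23 wanted.txt found.txt    # names from the list that were not found anywhere below .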
If the files haven't moved since the last time updatedb ran (often less than 24 hours ago), your fastest search is locate.
Read the file list into an array and search with locate. In case the file names are common (or occur as parts of other file names), grep the results for the base directory where they should be found:
mapfile -t filearr < file.lst
locate "${filearr[@]}" | grep /path/where/to/find
If the file names may contain whitespace or characters which might get interpreted by the shell, the usual quoting mechanisms have to be applied.
A friend helped me figure it out via find . | grep -i -Ff filenames.txt (-F treats the patterns as fixed strings, -f reads them from the file, and -i makes the match case-insensitive).

Batch copy files from a text list of file names in Linux

I have a list of images in a text file in the following format:
abc.jpg
xyz.jpg
The list contains about a hundred images in various directories within a specific directory. I would like to create a shell script that finds and copies these files into a specified directory.
Can this script be adapted to what I need?
Copy list of file names from multiple directories
You do not need a script for this, a simple one-liner will do:
(assuming that the full file path, or the path relative to where you execute the command, is written in your_file.txt for every image)
cat your_file.txt | xargs -I{} find path_to_root_dir -name {} | xargs -I{} cp {} specific_directory/
xargs takes multiple lines and runs the command you give it once per line; with -I you can specify a placeholder for where the content of the line is placed in the command (the default is at the end).
So this takes every line from your file, searches for that file in all subdirectories of path_to_root_dir, and then copies it.
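If some of the image names contain spaces, a plain while-read loop is a safer variant of the same idea; a sketch, assuming one file name per line in your_file.txt:
while IFS= read -r name; do
    find path_to_root_dir -type f -name "$name" -exec cp {} specific_directory/ \;
done < your_file.txt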

Finding and Listing Duplicate Words in a Plain Text file

I have a rather large file that I am trying to make sense of.
I generated a list of my entire directory structure that contains a lot of files using the du -ah command.
The result basically lists all the folders under a specific folder and the consequent files inside the folder in plain text format.
eg:
4.0G ./REEL_02/SCANS/200113/001/Promise Pegasus/BMB 10/RED EPIC DATA/R3D/18-09-12/CAM B/B119_0918NO/B119_0918NO.RDM/B119_C004_0918XJ.RDC/B119_C004_0918XJ_003.R3D
3.1G ./REEL_02/SCANS/200113/001/Promise Pegasus/BMB 10/RED EPIC DATA/R3D/18-09-12/CAM B/B119_0918NO/B119_0918NO.RDM/B119_C004_0918XJ.RDC/B119_C004_0918XJ_004.R3D
15G ./REEL_02/SCANS/200113/001/Promise Pegasus/BMB 10/RED EPIC DATA/R3D/18-09-12/CAM B/B119_0918NO/B119_0918NO.RDM/B119_C004_0918XJ.RDC
Is there any command I can run, or utility I can use, that will help me identify whether there is more than one record of the same file name (usually the last 16 characters in each line plus the extension) and, if such duplicate entries exist, write out the entire path (full line) to a different text file, so I can find and move the duplicate files off my NAS using a script or something.
Please let me know, as this is incredibly stressful to do by hand when the plain-text file itself is 5.2 MB :)
Split each line on /, take the last item (cut cannot do that directly, so reverse each line and take the first field), then sort and run uniq with -d, which shows only duplicates.
rev FILE | cut -f1 -d/ | rev | sort | uniq -d
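To then write out the entire paths of those duplicates (as the question asks), the duplicate names can be fed back into grep as fixed strings; a sketch where dupes.txt and duplicate_paths.txt are just illustrative scratch-file names:
rev FILE | cut -f1 -d/ | rev | sort | uniq -d > dupes.txt
grep -Ff dupes.txt FILE > duplicate_paths.txt
The names are matched as substrings of each line, which is normally fine for long file names like the R3D examples above.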
I'm not entirely sure what you want to achieve here, but I have the feeling that you are doing it in a difficult way anyway :) Your text file seems to contain spaces in file names, which makes it hard to parse.
I take it that you want to find all files whose names are duplicated. I would start with something like:
find DIR -type f -printf '%f\n' | sort | uniq -d
That means:
DIR - look for files in this directory
-type f - print only regular files (not directories or other special files)
-printf '%f\n' - do not use the default find output format, print only the file name of each file
sort - sort the names so that duplicates end up on adjacent lines (uniq only spots adjacent duplicates)
uniq -d - print only lines which occur multiple times
You may want to list only some files, not all of them. You can limit which files are taken into account by adding more rules to find. If you care only about *.R3D and *.RDC files you can use
find . \( -name '*.RDC' -o -name '*.R3D' \) -type f -printf '%f\n' | ...
If I wrongly guessed what you need, sorry :)
I think you are looking for fslint: http://www.pixelbeat.org/fslint/
It can find duplicate files, broken links, and stuff like that.
The following will scan the current directory and its subdirectories (using find) and print the full path of duplicate files. You can adapt it to take a different action, e.g. delete/move the duplicate files.
while IFS="|" read FNAME LINE; do
# FNAME contains the filename (without dir), LINE contains the full path
if [ "$PREV" != "$FNAME" ]; then
PREV="$FNAME" # new filename found. store
else
echo "Duplicate : $LINE" # duplicate filename. Do something with it
fi
done < <(find . -type f -printf "%f|%p\n" | sort -s)
To try it out, simply copy paste that into a bash shell or save it as a script.
Note that:
due to the sort, the whole list of files has to be loaded into memory before the loop begins, so performance will be affected by the number of files returned
the order in which the files appear after the sort determines which file is treated as the original, since the first occurrence is assumed to be the original. The -s option ensures a stable sort, which means that order will be dictated by find.
A more straightforward but less robust approach would be something along the lines of:
find . -type f -printf "%20f %p\n" | sort | uniq -D -w20 | cut -c 22-
That will print all files that have duplicate entries, assuming that the longest file name is no more than 20 characters long. The output differs from the solution above in that all entries with the same name are listed (not N-1 entries as above).
You'll need to change the numbers in the find, uniq and cut commands to match the actual case. A number too small may result in false positives.
find . -type f -printf "%20f %p\n" - find all files in the current dir and subdirs, printing the file name (padded to 20 characters) followed by the full path
sort - sort the output so duplicate names end up adjacent
uniq -D -w20 - print all entries that have duplicates, but only look at the first 20 characters
cut -c 22- - discard the first 21 characters of each line, leaving just the full path

Resources