Merge Files and Prepend Filename and Directory - linux

I need to merge the files in a directory and include the directory, filename, and line number in each line of the output. I've found many helpful posts about including the filename and line number, but not the directory name. grep -n gets line numbers, and I've seen some find commands that get some of the other pieces, but I can't seem to pull them all together. (I'm using Ubuntu for all of the data processing.)
Imagine two files in a directory named "8". (Each directory in the data I have is a number. The data were provided that way.)
file1.txt
John
Paul
George
Ringo
file2.txt
Mick
Keef
Bill
Brian
Charlie
The output should look like this:
8:file1.txt:1:John
8:file1.txt:2:Paul
8:file1.txt:3:George
8:file1.txt:4:Ringo
8:file2.txt:1:Mick
8:file2.txt:2:Keef
8:file2.txt:3:Bill
8:file2.txt:4:Brian
8:file2.txt:5:Charlie
The separators don't have to be colons. Tabs would work just fine.
Thanks much!

If it's just one directory level deep you could try something like this. We go into each directory, print each line with its number, and then prepend the directory name with sed:
$ for x in `ls`; do
(cd $x ; grep -n . *) | sed -e 's/^/'$x:'/g'
done
1:c.txt:2:B
1:c.txt:3:C
2:a.txt:1:A
2:a.txt:2:B
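If you'd rather do it in a single pass without the subshell loop, awk can build the same prefix itself, since FILENAME carries the directory component (a sketch, assuming the data files sit exactly one level down with no nested subdirectories):
$ awk 'FNR==1 { split(FILENAME, parts, "/") } { print parts[1] ":" parts[2] ":" FNR ":" $0 }' */*
With the example above this prints 8:file1.txt:1:John and so on.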

Related

Linux help: how do you move files to folders based on a .txt file?

I have a .txt file containing a column of IDs and their ages (as integers). I've already created separate folders in my directory for each age category (ranging from 20-86). For every ID in my .txt file I would like to move their image (which is currently stored in the folder "data") to the appropriate folder, based on their age category listed in column two of my .txt file.
Any help on how to do this in Linux would be really appreciated!
Updated example with files ending in different suffixes.
Current working directory:
data/ 20/ 21/ 22/ 23/ 24/ ...
text file:
ID001 21
ID002 23
ID003 20
ID004 22
ID005 21
ls data/
ID001-XXX-2125.jpg
ID002-YYY-2370.jpg
ID003-XXX-2125.jpg
ID004-YYY-2370.jpg
ID005-XXX-2125.jpg
Desired output:
20/
ID003-XXX-2125.jpg
21/
ID001-XXX-2125.jpg
ID005-XXX-2125.jpg
22/
ID004-YYY-2370.jpg
23/
ID002-YYY-2370.jpg
As you suggest, awk can do this kind of task (though see Ed Morton's remark below). You may try the following, which is tested with GNU awk. From your working directory you can do:
awk '{system("mv data/"$1"*.jpg " $2)}' inputfile
Explanation: this uses the system() function, which allows you to execute a command supplied as a string expression. In this case:
We use the mv (move) command and use the first field $1 in the input file to address the JPG file in the data directory.
Then we use the second field $2 of the input file for the destination directory.
The system() function is modeled after the standard C library function of the same name. Further reading: The GNU Awk User's Guide.
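A common variation on the same idea is to have awk print the mv commands and pipe them to sh; dropping the | sh lets you preview the commands before anything actually moves (a sketch under the same assumptions about the layout):
awk '{print "mv data/" $1 "*.jpg " $2}' inputfile | sh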
while read fil id
do
mv -f "data/"*"$fil"*".jpg" "$id/"
done < file
Read the two fields from the file (called file in this case) in a loop and use the variables to construct and execute the mv command.
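A slightly more defensive version of the same loop (a sketch; read -r keeps backslashes literal and -- protects mv against odd names):
while read -r fil id
do
    mv -f -- data/*"$fil"*.jpg "$id"/
done < file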
Assuming your .txt file is in your current working directory, can you try this script? It has been written and tested.
#!/bin/sh
DIR_CWD="/path/to/current_working_directory"
cd "$DIR_CWD/data"
for x in *; do
    ID_number=`echo "$x" | awk -F"-" '{print $1}'`
    DIR_age=`grep "$ID_number" "$DIR_CWD/file.txt" | awk '{print $2}'`
    mv -- "$DIR_CWD/data/$x" "$DIR_CWD/$DIR_age"
done
Note that DIR_CWD must be set to the path of your current working directory.

Search multiple strings from file in multiple files in specific column and output the count in unix shell scripting

I have searched extensively on the internet about this but haven't found many details.
Problem Description:
I am using an AIX server.
I have a pattern.txt file that contains customer_id for 100 customers in the following sample format:
160471231
765082023
75635713
797649756
8011688321
803056646
I have a directory (/home/aswin/temp) with several files (1.txt, 2.txt, 3.txt and so on) which are pipe(|) delimited. Sample format:
797649756|1001|123270361|797649756|O|2017-09-04 23:59:59|10|123769473
803056646|1001|123345418|1237330|O|1999-02-13 00:00:00|4|1235092
64600123|1001|123885297|1239127|O|2001-08-19 00:00:00|10|1233872
75635713|1001|123644701|75635713|C|2006-11-30 00:00:00|11|12355753
424346821|1001|123471924|12329388|O|1988-05-04 00:00:00|15|123351096
427253285|1001|123179704|12358099|C|2012-05-10 18:00:00|7|12352893
What I need to do is search for all the strings from the pattern.txt file in the first column of every file in the directory, and list each filename with its number of matches. If the same row has more than one match, it should be counted as 1.
So the output should be something like this (only matches in the first column should count):
1.txt:4
2.txt:3
3.txt:2
4.txt:5
What I have done till now:
cd /home/aswin/temp
grep -srcFf ./pattern.txt * /dev/null >> logfile.txt
This gives output in the desired format, but it searches for the strings in all columns, not just the first column, so the counts are much higher than expected.
Please help.
If you want to do that with grep, you must change the pattern.
With your command, you also search /dev/null, and the output includes /dev/null:0.
I think you meant 2>/dev/null, but that is not needed because you already pass -s to grep.
Your pattern file is in the same directory, so grep searches it too and outputs pattern.txt:6.
All your files are in the same directory, so -r is not needed.
You put the logfile in the same directory, so the second time you run the command grep searches it as well and outputs logfile.txt:0.
If you can modify the pattern file, write each line like ^765082023| so the ID is anchored to the start of the line and followed by the delimiter,
and rename the file so it no longer ends in .txt (otherwise the *.txt glob would pick it up).
Then this command gives you what you are looking for:
grep -scf pattern *.txt >>logfile
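If you'd rather not edit the pattern file by hand, the anchored version can be generated from the original (a sketch; writing to a name without .txt keeps it out of the *.txt glob):
sed -e 's/^/^/' -e 's/$/|/' pattern.txt > pattern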
If you can't modify the pattern file, you can use awk.
awk -F'|' '
NR==FNR{a[$0];next}                # first file: remember every pattern line
FILENAME=="pattern.txt"{next}      # skip pattern.txt if the *.txt glob matches it again
$1 in a {b[FILENAME]++}            # count rows whose first column is one of the patterns
END{for(i in b){print i":"b[i]}}   # print filename:count
' pattern.txt *.txt >>logfile.txt

Creating a file by merging two files

I would like to merge two files and create a new file using a Linux command.
I have the two files named as a1b.txt and a1c.txt
Content of a1b.txt
Hi,Hi,Hi
How,are,you
Content of a1c.txt
Hadoop|are|world
Data|Big|God
And I need a new file called merged.txt with the below content (expected output):
Hi,Hi,Hi
How,are,you
Hadoop|are|world
Data|Big|God
To achieve that, I am running the below command in the terminal, but it gives me output like this:
Hi,Hi,Hi
How,are,youHadoop|are|world
Data|Big|God
cat /home/cloudera/inputfiles/a1* > merged.txt
Could somebody help me get the expected output?
Probably your files do not end with a newline character. Here is how to add one to them:
$ sed -i -e '$a\' /home/cloudera/inputfiles/a1*
$ cat /home/cloudera/inputfiles/a1* > merged.txt
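If you'd rather leave the input files untouched, awk can supply the missing final newline on the fly (a minimal sketch; the program 1 simply prints every input record followed by a newline):
$ awk 1 /home/cloudera/inputfiles/a1* > merged.txt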
If you are allowed to be destructive (i.e., you don't have to keep the original two files unmodified), then:
robert#debian:/tmp$ cat fileB.txt >> fileA.txt
robert#debian:/tmp$ cat fileA.txt
this is file A
This is file B.

Launching program several times

I am using macOS. This is the command-line code to launch my program (two parts):
nucmer --mum file1.txt file2.txt
show-snps -Clr -x 2 out.delta > out_file1.snps
The first part of the program creates the file out.delta. My file2.txt is always the same, but I want to launch both parts 35000 times with a different file1.txt each time. All the file1s are located in the same directory.
Is it possible to do this using Bash?
Keep all the input files in a directory. Create a wrapper script that invokes nucmer and then show-snps. Your wrapper script will accept the path to the file directory as input. Iterate over all the files in the directory and call your two programs.
You could do something along these lines:
find . -maxdepth 1 -type f -print | grep -v './out_' | while read f
do
    b=$(basename "${f}")
    nucmer --mum "${f}" file2.txt
    show-snps -Clr -x 2 out.delta > "out_${b}.snps"
done
The find bit finds all files in the current directory. The grep filters out any previous output files, in case you've run this before. The basename line strips off the leading ./, and then your two programs get run with that input filename and an output filename based on it.
If you don't get an argument list too long error, you could just use for:
for f in file*.txt; do nucmer --mum $f second.txt; show-snps -Clr -x 2 out.delta > out_${f%.txt}.snps; done
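For example, with f set to file1.txt the loop body expands to the following two commands (note this snippet names the fixed second input second.txt rather than file2.txt):
nucmer --mum file1.txt second.txt
show-snps -Clr -x 2 out.delta > out_file1.snps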

Comparing part of a filename from a text file to filenames from a directory (grep + awk)

This is not exactly the easiest one to explain in a title.
I have a file inputfile.txt that contains parts of filenames:
file1.abc
filed.def
fileq.lmn
This file is an input file that I need to use to find the full filenames in an actual directory. The endings of the filenames differ from case to case, but part of each name is always the same.
I figured that I could grep the text from the input file against the output of ls in that directory (or redirect the ls output to a simple text file), and then use awk to produce my desired result, but I'm having some trouble doing that.
file1.abc is read from the input file inputfile.txt
It's checked against the directory contents.
If the file exists, specific directories based on the filename are created.
(I'm also in a BusyBox environment, so I don't have a lot at my disposal.)
Something like this...
cat lscommandoutput.txt \
| awk -F: '{print("mkdir" system("grep $0"); inputfile.txt}' \
| /bin/sh
Thank you.
Edit: My apologies for not being clear on this.
The output should be the full filename for each line found in lscommandoutput.txt, using inputfile.txt to grep for those specific lines.
If inputfile.txt contains:
file1.abc
filed.def
fileq.lmn
and lscommandoutput.txt contains:
file0.oba.ca-1.fil
file1.abc.de-1.fil
filed.def.com-2.fil
fileh.jkl.open-1.fil
fileq.lmn.he-2.fil
The extra lines that don't match anything in inputfile.txt are ignored. The ones that do match have a directory created for them, named after the full filename grepped from lscommandoutput.txt.
/dir/dir2/file1.abc.de-1.fil/ <-- directory in which files can be placed
/dir/dir2/filed.def.com-2.fil/
/dir/dir2/fileq.lmn.he-2.fil/
Hopefully that is a little bit clearer.
First, you win a "useless use of cat" award.
Secondly, you've explained this really badly. If you can't describe the problem clearly in plain English it's not surprising you are having trouble turning it into a script or set of commands.
grep -f is a good way to get the directory names, but I don't understand what you want to do with them afterwards.
My problem now is using the outputted file with the one file I want to put the folders
Wut? What does "the one file I want to put the folders" mean? Where does the file come from? Is it the file named in inputlist.txt? Does it go in the directory that it matched?
If you just want to create the directories you can do:
fgrep -f ./inputfile.txt ./lscommandoutput.txt | xargs mkdir
N.B. you probably want fgrep so that the input strings aren't treated as regular expressions and regex metacharacters such as . are matched literally.
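If the directories should end up under /dir/dir2 as in your example rather than in the current directory, a small variation of the same idea (a sketch; mkdir -p also tolerates directories that already exist):
fgrep -f ./inputfile.txt ./lscommandoutput.txt | while read -r name
do
    mkdir -p "/dir/dir2/$name"
done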
