Search for lines in a file that contain the lines of a second file - Linux

So I have a first file with an ID on each line, for example:
458-12-345
466-44-3-223
578-4-58-1
599-478
854-52658
955-12-32
Then I have a second file. It has an ID on each line followed by information, for example:
111-2457-1 0.2545 0.5484 0.6914 0.4222
112-4844-487 0.7475 0.4749 0.1114 0.8413
115-44-48-5 0.4464 0.8894 0.1140 0.1044
....
The first file only has 1000 lines, with the IDs of the info I need, while the second file has more than 200,000 lines.
I used the following bash command on Fedora with good results:
cat file1.txt | while read line; do cat file2.txt | egrep "^$line\ "; done > file3.txt
However, I'm now trying to replicate the results on Ubuntu, and the output is a blank file. Is there a reason this would not work on Ubuntu?
Thanks!

You can grep for several strings at once:
grep -f id_file data_file
Assuming that id_file contains all the IDs and data_file contains the IDs and data.
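One caveat: grep -f treats every line of id_file as a regular expression that may match anywhere in a line, so an ID such as 458-12 would also select the line for 458-12-345. A minimal sketch, assuming each ID in the data file is followed by a space (as in the question's sample) and that the IDs contain no regex metacharacters, anchors every pattern before handing it to grep. The sample data and the patterns.txt file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the sample files below are hypothetical stand-ins for file1.txt / file2.txt.
set -eu
cd "$(mktemp -d)"
printf '%s\n' 458-12 466-44-3-223 > file1.txt
printf '%s\n' \
  '458-12-345 0.1 0.2' \
  '458-12 0.3 0.4' \
  '466-44-3-223 0.5 0.6' \
  '999-1 0.7 0.8' > file2.txt

# Turn each ID into "^ID " so it can only match the whole first field.
# Assumption: IDs contain no regex metacharacters ('-' is literal here).
sed 's/^/^/; s/$/ /' file1.txt > patterns.txt

# One pass over the big file instead of one pass per ID.
grep -f patterns.txt file2.txt > file3.txt
cat file3.txt
```

This selects "458-12 0.3 0.4" and "466-44-3-223 0.5 0.6" but not "458-12-345 0.1 0.2".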

Typical job for awk:
awk 'FNR==NR{i[$1]=1;next} i[$1]{print}' file1 file2
This prints the lines of the second file whose first field appears in the first file. For even more speed, use mawk.
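Here is a runnable version of the awk one-liner on small made-up sample files, with comments on how the FNR == NR test works. As a small variation (not the answer's exact code), it uses ($1 in ids), which tests for a key without creating one. One caveat: if file1 is empty, FNR == NR stays true while reading file2 as well, so nothing is printed.

```shell
#!/usr/bin/env bash
# Sketch of the awk lookup-table join; the sample files are made up for illustration.
set -eu
cd "$(mktemp -d)"
printf '%s\n' 458-12-345 955-12-32 > file1
printf '%s\n' \
  '111-2457-1 0.2545 0.5484' \
  '458-12-345 0.7475 0.4749' \
  '955-12-32 0.4464 0.8894' > file2

# While reading file1, FNR (line number within the current file) equals
# NR (line number across all files), so each ID is stored as an array key.
# For file2, FNR != NR, and a line is printed when its first field is a stored key.
awk 'FNR == NR { ids[$1]; next } ($1 in ids)' file1 file2 > file3
cat file3
```

Both files are read exactly once, which is why this scales to the 200,000-line file.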

This line works fine for me on Ubuntu:
cat 1.txt | while read line; do cat 2.txt | grep "$line"; done
However, this may be slow, as the second file (200,000 lines) will be grepped 1000 times (once for each line of the first file).

Related

Automating and looping through a batch script

I'm new to batch. I want to iterate through a list and use the output to replace a string in another file.
ls -l somefile | grep .txt | awk 'print $4}' | while read file
do
toreplace="/Team/$file"
sed 's/dataFile/"$toreplace"/$file/ file2 > /tmp/test.txt
done
When I run the code, I get the error:
sed: 1: "s/dataFile/"$torepla ...": bad flag in substitute command: '$'
Example of somefile, which has a list of file paths:
foo/name/xxx/2020-01-01.txt
foo/name/xxx/2020-01-02.txt
foo/name/xxx/2020-01-03.txt
However, what I want is to use the list of file paths in somefile to replace a string in the contents of another file, file2. Something like this:
This is the directory of locations where data from /Team/foo/name/xxx/2020-01-01.txt ............
I'm not sure if I understand your desired outcome, but hopefully this will help you to figure out your problem:
You have three files in a directory:
TEAM/foo/name/xxx/2020-01-02.txt
TEAM/foo/name/xxx/2020-01-03.txt
TEAM/foo/name/xxx/2020-01-01.txt
And you have another file called to_be_changed.txt which contains the text This is the directory of locations where data from TO_BE_REPLACED ............ If you want to grab the filenames of your three files and insert them into to_be_changed.txt, you can do it with:
while IFS= read -r file
do
# ${file##*/} drops everything up to the last '/', leaving only the file name
sed "s/TO_BE_REPLACED/${file##*/}/g" to_be_changed.txt >> changed.txt
done < <(find ./TEAM/ -name "*.txt")
And you will then have made a file called changed.txt which contains:
This is the directory of locations where data from 2020-01-02.txt ............
This is the directory of locations where data from 2020-01-03.txt ............
This is the directory of locations where data from 2020-01-01.txt ............
Is this what you're trying to achieve? If you need further clarification I'm happy to edit this answer to provide more details/explanation.
ls -l somefile | grep .txt | awk 'print $4}' | while read file
No. No, no, nono.
ls -l somefile is only going to show somefile unless it's a directory.
(Don't name a directory "somefile".)
If you mean somefile.txt, please clarify in your post.
grep .txt is going to look through the lines presented for the three characters txt preceded by any character (the dot is a regex wildcard). Since you asked for a long listing of somefile it shouldn't find any, so nothing should be passed along.
awk 'print $4}' is a typo (the opening brace is missing) which won't compile; awk will exit with a syntax error.
Keep it simple. What I suspect you meant was
for file in *.txt
Then in
toreplace="/Team/$file"
sed 's/dataFile/"$toreplace"/$file/ file2 > /tmp/test.txt
it's unclear what you expect $file to be; awk's $4 from an ls -l seems unlikely.
Assuming it's the filenames from the for above, then try
sed "s,dataFile,/Team/$file," file2 > /tmp/test.txt
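Putting the two suggestions together, a minimal sketch of the whole loop might look like this. It assumes (hypothetically) that the .txt files sit in the current directory and that file2 contains the placeholder dataFile; test.out stands in for /tmp/test.txt. Note the switch from > to >>: with > inside the loop, each iteration overwrites the output, so only the last file would survive.

```shell
#!/usr/bin/env bash
# Sketch only: file names and the "dataFile" placeholder are hypothetical.
set -eu
cd "$(mktemp -d)"
touch 2020-01-01.txt 2020-01-02.txt
printf 'This is the directory of locations where data from dataFile\n' > file2

: > test.out                      # start from an empty output file
for file in *.txt; do
  # ',' is the s/// delimiter, so the '/' in /Team/ needs no escaping.
  # '>>' appends; '>' inside the loop would keep only the last file's line.
  sed "s,dataFile,/Team/$file," file2 >> test.out
done
cat test.out
```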
Does that help? Correct me as needed. Sorry if I seem harsh.
Welcome to SO. ;)

Want to append records from two files using a shell script

My first input file, named abc.txt, contains these records:
abc#gmail.com
bscd#yahoo.co.in
abcd.21#gmail.com
1234#hotmail.com
My second file, named details.txt, contains this record:
123456^atulsample^1203320
I want the final output file, Final.txt, to look like this:
abc#gmail.com^123456^atulsample^1203320
bscd#yahoo.co.in^123456^atulsample^1203320
abcd.21#gmail.com^123456^atulsample^1203320
I have used the sed command, but I am not getting the output I need.
Kindly help, as I don't have much knowledge of shell scripting.
Try something like this:
#!/bin/bash
while read -r line
do
detail="$line"
sed '/^[ \t]*$/d' abc.txt | sed "s/$/^${detail}/" >> Final.txt
done < "details.txt"
This deletes blank lines:
sed '/^[ \t]*$/d' abc.txt
This appends ^ followed by the record from details.txt to the end of each line:
sed "s/$/^${detail}/"
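If you prefer a single command, an awk alternative (a different technique from the loop above, shown as a sketch) reads details.txt first, remembers its record, then prints every non-blank line of abc.txt with ^ and that record appended. It assumes details.txt holds a single record; with several, only the last one is kept, whereas the loop above produces one block per record.

```shell
#!/usr/bin/env bash
# Alternative sketch with awk; sample data copied from the question
# (plus one blank line to show that it is skipped).
set -eu
cd "$(mktemp -d)"
printf '%s\n' 'abc#gmail.com' 'bscd#yahoo.co.in' '' 'abcd.21#gmail.com' '1234#hotmail.com' > abc.txt
printf '%s\n' '123456^atulsample^1203320' > details.txt

# First file (details.txt): NR == FNR, so store the record in d.
# Second file (abc.txt): NF is 0 on blank lines, so they are skipped;
# every other line is printed with "^" and the stored record appended.
awk 'NR == FNR { d = $0; next } NF { print $0 "^" d }' details.txt abc.txt > Final.txt
cat Final.txt
```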

How to copy data from file to another file starting from specific line

I have two files, data.txt and results.txt. Assuming there are 5 lines in data.txt, I want to copy all of these lines and paste them into results.txt starting from line number 4.
Here is a sample below:
Data.txt file:
stack
ping
dns
ip
remote
Results.txt file:
# here are some text
# please do not edit these lines
# blah blah..
this is the 4th line that data should go on.
I've tried sed with various combinations but couldn't make it work, and I'm not sure it is even suitable for this purpose.
sed -n '4p' /path/to/file/data.txt > /path/to/file/results.txt
The above code copies line 4 only, which isn't what I'm trying to achieve. As I said above, I need to copy all lines from data.txt and paste them into results.txt, but they have to start at line 4 without modifying or overwriting the first 3 lines.
Any help is greatly appreciated.
EDIT:
I want the copied data to overwrite results.txt starting from line number 4. So, I want to leave the first 3 lines unmodified and overwrite the rest of the file with the data copied from data.txt.
Here's a way that works well from cron, with less chance of losing data or corrupting the file:
# preserve first lines of results
head -3 results.txt > results.TMP
# append new data
cat data.txt >> results.TMP
# rename output file atomically in case of system crash
mv results.TMP results.txt
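You can use process substitution to give cat a FIFO that it can read from. Write the result to a temporary file, though: redirecting straight back to results.txt truncates the file, possibly before head has read the first 3 lines.
cat <(head -3 results.txt) data.txt > results.tmp && mv results.tmp results.txt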
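# writing head's output straight back to results.txt would truncate the file before head reads it, so go through a temporary file
head -n 3 /path/to/file/results.txt > /path/to/file/results.tmp
cat /path/to/file/data.txt >> /path/to/file/results.tmp
mv /path/to/file/results.tmp /path/to/file/results.txt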
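If you can use awk (this prints lines 1-3 of results.txt, then every line of data.txt):
awk 'NR!=FNR || NR<4' results.txt data.txt > results.tmp && mv results.tmp results.txt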
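All of these answers follow the same pattern: keep the head of results.txt, append data.txt, and replace the file in one step. A sketch that wraps that pattern in a function, so the starting line is a parameter, might look like this; the function name replace_from_line and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep lines 1..N-1 of a target file and replace the rest with the
# contents of another file. replace_from_line is a hypothetical helper name.
set -eu

replace_from_line() {
  local n=$1 src=$2 dest=$3
  local tmp
  tmp=$(mktemp "${dest}.XXXXXX")        # temp file next to dest, so mv is a rename
  head -n "$((n - 1))" "$dest" > "$tmp"  # lines 1 .. n-1 of the original
  cat "$src" >> "$tmp"                   # new data from line n onward
  mv "$tmp" "$dest"                      # replace dest in a single step
}

cd "$(mktemp -d)"
printf '%s\n' stack ping dns ip remote > data.txt
printf '%s\n' '# here are some text' '# please do not edit these lines' \
  '# blah blah..' 'this is the 4th line that data should go on.' > results.txt

replace_from_line 4 data.txt results.txt
cat results.txt
```

mktemp creates the temporary file with mode 600, so after the mv, results.txt has those permissions; add a chmod if other users need to read it.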

Obtain the line count of all files in a directory

I have 3 files in the directory "work", into which new files are pumped on a daily basis.
The files are as shown below:
ZNAMI DOWN COND RESULT_17-08-2015.csv
ZNAMI UP CND RESULT_18-08-2015.csv
ZNAMI DOWN COND RESULT_17-08-2015.csv
Along with actual data, these files have many rows containing just ",,,,,,,,,".
What I need to do is:
Open each file (this should be dynamic, as the date part changes every day).
Remove the lines with ",,,,,,,,,".
Get the line count.
I tried wc -l *.csv but it does not give the total count of all lines.
I also tried sed -i ",,,,,,,,,"d *.csv to remove the lines, but it is not working.
Have a try with this:
grep -v ",,,,,,,,," *.csv | wc -l
This prints every line from the *.csv files that does not contain ,,,,,,,,, to standard output. Piping that into wc yields the total count of such lines.
Using awk:
awk '!/,,,,,,,,,/{n++;} END{print n;}' *.csv
This counts every line that does not contain ,,,,,,,,,.
!/,,,,,,,,,/{n++;}
In awk, ! is negation. So, this increments n for every line that does not match ,,,,,,,,,.
END{print n;}
After the last line of the last file has been read, print the value of n.
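The question also asks for per-file counts and for actually deleting the ",,,,,,,,," lines. The sed attempt fails because sed needs an address such as a /regex/ before the d command. A sketch on made-up files (named like the question's) might be the following; note that -i without a suffix is GNU sed syntax, while BSD/macOS sed expects -i ''.

```shell
#!/usr/bin/env bash
# Sketch on made-up sample files named like the question's.
set -eu
cd "$(mktemp -d)"
printf '%s\n' 'a,b,c' ',,,,,,,,,' 'd,e,f' > 'ZNAMI DOWN COND RESULT_17-08-2015.csv'
printf '%s\n' ',,,,,,,,,' 'g,h,i' > 'ZNAMI UP CND RESULT_18-08-2015.csv'

# Per-file counts of lines that do not contain ",,,,,,,,,":
# -v inverts the match, -c prints "filename:count" for each file.
grep -vc ',,,,,,,,,' *.csv

# Delete those lines in place: /regex/ is the address, d deletes matching lines.
# (-i with no suffix is GNU sed; BSD/macOS sed needs -i '')
sed -i '/,,,,,,,,,/d' *.csv

# Total line count across all files after the cleanup.
cat *.csv | wc -l
```

The glob picks up whatever dated files are present, so nothing needs to change when the date part changes.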

How to find the particular line in the file "data.txt" that occurs only once

The line I seek is stored in the file data.txt and is the only line of text that occurs only once.
How do I go about finding that particular line using Linux?
This is a little bit old, but I think you are looking for this...
cat data.txt | sort | uniq -u
This will show the unique values that only occur once in the file. I assume you are working through OverTheWire, given the question? If so, this is what you are looking for.
To provide some context (I need more rep to comment), this is a question that features in an online "wargame" called Bandit, which involves using the command line to discover passwords on an online Linux server to advance up the levels.
For those who would like to see data.txt in full, I've Pastebin'd it here; in short, it looks like this:
NN4e37KW2tkIb3dC9ZHyOPdq1FqZwq9h
jpEYciZvDIs6MLPhYoOGWQHNIoQZzE5q
3rpovhi1CyT7RUTunW30goGek5Q5Fu66
JOaWd4uAPii4Jc19AP2McmBNRzBYDAkO
JOaWd4uAPii4Jc19AP2McmBNRzBYDAkO
9WV67QT4uZZK7JHwmOH0jnhurJMwoGZU
a2GjmWtTe3tTM0ARl7TQwraPGXgfkH4f
7yJ8imXc7NNiovDuAl1ZC6xb0O0mMBx1
UsvVyFSfZZWbi6wgC7dAFyFuR6jQQUhR
FcOJhZkHlnwqcD8QbvjRyn886rCrnWZ7
E3ugYDa6Wh2y8C8xQev7vOS8O3OgG1Hw
E3ugYDa6Wh2y8C8xQev7vOS8O3OgG1Hw
ME7nnzbId4W3dajsl6Xtviyl5uhmMenv
J5lN3Qe4s7ktiwvcCj9ZHWrAJcUWEhUq
aouHvjzagN8QT2BCMB6e9rlN4ffqZ0Qq
ZRF5dlSuwuVV9TLhHKvPvRDrQ2L5ODfD
9ZjR3NTHue4YR6n4DgG5e0qMQcJjTaiM
QT8Bw9ofH4x3MeRvYAVbYvV1e1zq3Xim
i6A6TL6nqvjCAPvOdXZWjlYgyvqxmB7k
tx7tQ6kgeJnC446CHbiJY7fyRwrwuhrs
One way to do it is to use:
sort data.txt | uniq -u
The sort command is like cat in that it displays the contents of the file; however, it sorts the file lexicographically by line (it reorders the lines alphabetically so that matching ones end up next to each other).
The | is a pipe that redirects the output from one command into another.
The uniq command reports or omits repeated lines and by passing it the -u argument we tell it to report only unique lines.
Used together like this, the command will sort data.txt lexicographically by each line, find the unique line and print it back in the terminal for you.
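The sort step matters because uniq only compares neighbouring lines. A small demonstration on a made-up sample (not the Bandit file) shows the difference:

```shell
#!/usr/bin/env bash
# Made-up sample: every line appears twice except "unique123",
# but the duplicates are not next to each other.
set -eu
cd "$(mktemp -d)"
printf '%s\n' aaa bbb unique123 aaa bbb > data.txt

# Without sort, uniq -u sees no adjacent duplicates,
# so all five lines are reported as unique.
uniq -u data.txt

# Sorting first brings duplicates together; uniq -u then keeps
# only the line that has no identical neighbour.
sort data.txt | uniq -u
```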
sort -u data.txt | while IFS= read -r line; do if [ "$(grep -cxF -- "$line" data.txt)" -eq 1 ]; then echo "$line"; fi; done
was my solution, until I saw the easier one here:
sort data.txt | uniq -u
Add more information to your post.
What does data.txt look like?
Like this:
11111111
11111111
pass1111
11111111
Or like this:
afawfdgd
password
somethin
gelse...
Also, do you know the password that is in the file, or are you searching for a string that does not repeat?
If you know the password, use something like this:
cat data.txt | grep 'password'
If you don't know the password and it is the only unique line in the file, you must write a script.
For example, in Python:
with open("data.txt") as f:
    for line in f:
        if 'pass' in line:
            print(line, end='')
Of course, replace 'pass' with something else, for example a slice of the line.
And here is one that uses only one tool, awk:
awk '{a[$1]++}END{for(i in a){if(a[i] == 1){print i} }}' data.txt
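The one-liner counts each first field in an associative array and, at the end, prints the keys seen once. Two details may matter on other inputs: $1 ignores anything after the first blank on a line, and the order of for (i in a) is unspecified. A variant (a sketch, not the answer's exact code) that counts whole lines and prints in input order:

```shell
#!/usr/bin/env bash
# Made-up sample; "unique123" is the only line that appears once.
set -eu
cd "$(mktemp -d)"
printf '%s\n' aaa bbb unique123 aaa bbb > data.txt

# count[] counts each whole line ($0); order[] remembers input order so the
# output does not depend on the unspecified order of a for (k in count) loop.
# NR still holds the total number of lines inside END.
awk '{ count[$0]++; order[NR] = $0 }
     END { for (i = 1; i <= NR; i++) if (count[order[i]] == 1) print order[i] }' data.txt > unique.txt
cat unique.txt
```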
sort data.txt | uniq -c | grep '^ *1 '
This prints the only line that occurs exactly once, prefixed by its count of 1.
The pattern is anchored and includes the space after the 1, so counts such as 11 or 21 do not match.
sort data.txt | uniq -c | grep -w 1
and you will find the only line that occurs one time (-w stops the 1 from matching inside longer counts or inside the data itself)
