How to find the line of text in the file "data.txt" that occurs only once - linux

The line I seek is stored in the file data.txt and is the only line of text that occurs only once.
How do I go about finding that particular line using linux?

This is a little bit old, but I think you are looking for this...
cat data.txt | sort | uniq -u
This prints only the lines that occur exactly once in the file. I assume you are asking because of OverTheWire? If so, this is what you are looking for.

To provide some context (I need more rep to comment): this is a question from an online "wargame" called Bandit, which involves using the command line to discover passwords on an online Linux server in order to advance through the levels.
For those who would like to see data.txt in full, I've put it on Pastebin; it looks like this:
NN4e37KW2tkIb3dC9ZHyOPdq1FqZwq9h
jpEYciZvDIs6MLPhYoOGWQHNIoQZzE5q
3rpovhi1CyT7RUTunW30goGek5Q5Fu66
JOaWd4uAPii4Jc19AP2McmBNRzBYDAkO
JOaWd4uAPii4Jc19AP2McmBNRzBYDAkO
9WV67QT4uZZK7JHwmOH0jnhurJMwoGZU
a2GjmWtTe3tTM0ARl7TQwraPGXgfkH4f
7yJ8imXc7NNiovDuAl1ZC6xb0O0mMBx1
UsvVyFSfZZWbi6wgC7dAFyFuR6jQQUhR
FcOJhZkHlnwqcD8QbvjRyn886rCrnWZ7
E3ugYDa6Wh2y8C8xQev7vOS8O3OgG1Hw
E3ugYDa6Wh2y8C8xQev7vOS8O3OgG1Hw
ME7nnzbId4W3dajsl6Xtviyl5uhmMenv
J5lN3Qe4s7ktiwvcCj9ZHWrAJcUWEhUq
aouHvjzagN8QT2BCMB6e9rlN4ffqZ0Qq
ZRF5dlSuwuVV9TLhHKvPvRDrQ2L5ODfD
9ZjR3NTHue4YR6n4DgG5e0qMQcJjTaiM
QT8Bw9ofH4x3MeRvYAVbYvV1e1zq3Xim
i6A6TL6nqvjCAPvOdXZWjlYgyvqxmB7k
tx7tQ6kgeJnC446CHbiJY7fyRwrwuhrs
One way to do it is to use:
sort data.txt | uniq -u
The sort command is like cat in that it displays the contents of the file; however, it sorts the lines lexicographically (it reorders them alphabetically so that matching lines end up next to each other).
The | is a pipe that feeds the output of one command into another.
The uniq command reports or omits repeated lines; passing it the -u argument tells it to report only the lines that are not repeated. It compares adjacent lines only, which is why the input has to be sorted first.
Used together like this, the command sorts data.txt lexicographically line by line, finds the one line that is not repeated, and prints it in the terminal for you.
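To see why the sort step matters, here is a small sketch. The five sample lines are made up for illustration and are not taken from the real data.txt.

```shell
# Hypothetical sample: "aaa" and "bbb" occur twice, "ccc" occurs once.
sample=$(printf '%s\n' aaa bbb aaa ccc bbb)

# Without sorting, no two equal lines are adjacent, so uniq -u lets every line through.
unsorted=$(printf '%s\n' "$sample" | uniq -u)

# After sorting, equal lines sit next to each other and only the singleton survives.
sorted=$(printf '%s\n' "$sample" | sort | uniq -u)

printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"
```

On this sample the unsorted run prints all five lines back, while the sorted run prints only ccc.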

sort -u data.txt | while read -r line; do if [ "$(grep -cxF -- "$line" data.txt)" -eq 1 ]; then echo "$line"; fi; done
was my solution, until I saw the easier one here:
sort data.txt | uniq -u

Please add more information to your post.
What does data.txt look like?
Like this:
11111111
11111111
pass1111
11111111
Or like this:
afawfdgd
password
somethin
gelse...
Also, do you already know the password, or are you searching for the one line that is not repeated?
If you know the password, use something like this:
grep 'password' data.txt
If you don't know the password and it is the only unique line in the file, you can write a script.
For example, in Python:
with open("data.txt", "r") as file:
    for line in file:  # iterate over lines, not single characters
        if 'pass' in line:
            print(line.rstrip("\n"))
Of course, replace 'pass' with something else,
for example some slice of the line.

And one with only one tool in use, awk:
awk '{a[$1]++}END{for(i in a){if(a[i] == 1){print i} }}' data.txt
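The one-liner above keys on $1, the first whitespace-separated field, which is the whole line for data.txt because its lines contain no spaces. For files whose lines may contain spaces, a variant keyed on $0 might look like the sketch below. The sample lines and variable names are invented for the example.

```shell
# Hypothetical sample: one line contains a space and is repeated.
sample=$(printf '%s\n' 'two words' solo 'two words' dup dup)

# Count every whole line ($0), then print the lines whose count is exactly 1.
once=$(printf '%s\n' "$sample" | awk '
    { count[$0]++ }
    END { for (line in count) if (count[line] == 1) print line }
')

printf '%s\n' "$once"
```

Note that awk does not guarantee any order in the for-in loop, so with several unique lines the output order may differ from the file order.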

sort data.txt | uniq -c | grep '^ *1 '
and it will print the only line that occurs exactly one time.
Do not forget the space after the 1 in the pattern; without it, counts such as 10 or 11 would match as well.

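sort data.txt | uniq -c | grep -w 1
and you will find the only line that occurs one time (-w makes grep match 1 as a whole word, so counts like 10 and passwords that merely contain a 1 are not matched).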

Related

Can grep show output only if the line contain another search string? [duplicate]

I am trying to extract text from a file between a < and a >, but only on a line starting with another specific pattern.
So in a file that looks like:
XXX Something here
XXX Something more here
XXX <\Lines like this are a problem>
ZZZ something <\This is the text I need>
XXX Don't need any of this
I would like to print only the <\This is the text I need>.
If I do
sed -n '/^ZZZ/p' FILENAME
it pulls the correct lines I need to look at, but obviously prints the whole line.
sed -n '/<\/,/>/p' FILENAME prints way too much.
I have looked into grouping and tried
sed -n '/^ZZZ/{/<\/,/>/} FILENAME
but this doesn't seem to work at all.
Any suggestions? They will be much appreciated.
(Apologies for formatting, never posted on here before)
sed -n '/^ZZZ/ { s/^.*\(<.*>\).*$/\1/p }' FILENAME
If it does not have to be sed and you have a fairly recent grep, you may use grep's -o option, as in
grep '^ZZZ' FILENAME | grep -o '<[^>]*>'
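As a quick check, both variants can be run against the sample lines from the question. This is only a sketch: the input is held in a shell variable instead of a file, and the variable names are made up.

```shell
# Sample input from the question (%s keeps the backslashes literal).
input=$(printf '%s\n' \
    'XXX Something here' \
    'XXX Something more here' \
    'XXX <\Lines like this are a problem>' \
    'ZZZ something <\This is the text I need>' \
    "XXX Don't need any of this")

# sed: on lines starting with ZZZ, keep only the <...> part and print it.
with_sed=$(printf '%s\n' "$input" | sed -n '/^ZZZ/ s/^.*\(<.*>\).*$/\1/p')

# grep: select the ZZZ lines first, then print only the bracketed match.
with_grep=$(printf '%s\n' "$input" | grep '^ZZZ' | grep -o '<[^>]*>')

printf '%s\n%s\n' "$with_sed" "$with_grep"
```

One difference worth knowing: the sed pattern is greedy, so on a line with several <...> groups it keeps only the last one, whereas grep -o prints each group on its own line.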
An awk version
awk -F"<|>" '/^ZZZ/ {print "<"$2">"}' file
<\This is the text I need>

renaming files using loop in unix

I have a situation here.
I have a lot of files like the ones below on Linux
SIPTV_FIPTV_ID00$line_T20141003195717_C0000001000_FWD148_IPV_001.DATaac
SIPTV_FIPTV_ID00$line_T20141003195717_C0000001000_FWD148_IPV_001.DATaag
I want to remove the $line and put a counter from 0001 to 6000 in its place for my 6000 such files.
I also want to remove the trailing 3 characters of each file name after this is done.
After the fix the files should look like
SIPTV_FIPTV_ID0000001_T20141003195717_C0000001000_FWD148_IPV_001.DAT
SIPTV_FIPTV_ID0000002_T20141003195717_C0000001000_FWD148_IPV_001.DAT
Please help.
With some assumptions, I think this should do it:
1. list of the files is in a file named input.txt, one file per line
2. the code is running in the directory the files are in
3. bash is available
awk '{i++;printf "mv \x27"$0"\x27 ";printf "\x27"substr($0,1,16);printf "%05d", i;print substr($0,22,47)"\x27"}' input.txt | bash
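From the command prompt, give the following command
% echo *.DAT??? | tr ' ' '\n' | awk '{
old=$0;
sub("\\$line",sprintf("%4.4d",++n));
sub("...$","");
print "mv \047" old "\047", $1}'
%
and check the output. The tr puts each file name on its own line so that awk handles one name per record, and the single quotes (\047) around the old name keep sh from expanding $line. If the output looks OK, pipe it to the shell
% echo *.DAT??? | tr ' ' '\n' | awk '{
old=$0;
sub("\\$line",sprintf("%4.4d",++n));
sub("...$","");
print "mv \047" old "\047", $1}' | sh
%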
A commentary: echo *.DAT??? is meant to feed awk a list of all the filenames that you want to modify; you may want something more elaborate if the example names you gave aren't representative of the whole spectrum. Regarding the awk script itself, I used sprintf to generate a string with the correct number of zeroes to replace $line. The idiom "\\$..." with two backslashes to quote the dollar sign is required by gawk and does no harm in mawk. As a last remark, in similar cases I prefer to make at least a dry run before passing the commands to the shell.
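Since the question asks for a loop, here is a rough sketch of the same rename as a plain shell loop with a built-in dry run. The function name and its dry/run argument are made up for this example, and the five-digit counter is an assumption taken from the desired names in the question (ID00 followed by 00001); adjust the width if you need something else.

```shell
# Sketch only: preview or perform the renames for files in the current directory.
# rename_dat_files and its "dry"/"run" argument are hypothetical names.
rename_dat_files() {
    mode=${1:-dry}
    n=0
    for old in *'$line'*.DAT???; do
        [ -e "$old" ] || continue               # the glob matched nothing
        n=$((n + 1))
        counter=$(printf '%05d' "$n")           # 00001, 00002, ...
        prefix=${old%%\$line*}                  # text before the literal $line
        suffix=${old#*\$line}                   # text after the literal $line
        new=$prefix$counter$suffix
        new=${new%???}                          # drop the three trailing characters
        if [ "$mode" = run ]; then
            mv -- "$old" "$new"
        else
            printf '%s -> %s\n' "$old" "$new"
        fi
    done
}

rename_dat_files dry    # switch to "run" once the preview looks right
```

The counter follows the glob's sort order, so the numbering is stable between the preview and the real run as long as no files are added in between.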

What is the usage of sorted command?

I have read most of the examples that come with the sort command. However, I am not sure what the sort command does when used in this style:
sort <word> sorted
That would just be two file names, as in
sort file1 file2 file3...
If you pass multiple file names, sort concatenates them and sorts all of them together.
If you're asking how to sort a string with the sort command:
echo "tatoine" | grep -o . | sort | tr -d "\n"
aeinott
This works because sort operates on lines, so you have to split the string into multiple lines with one letter on each (grep -o .) and, after sorting, delete the newlines with the tr command.
Are those < and > symbols explicit, or do they indicate a parameter that is to be replaced? If the latter, then you're reading from a file called "word", and writing the sorted data to a file called "sorted".
Are you trying to save the content in a sorted order?
Let's say you have a file name.txt with the following content.
Zoe
John
Amy
Mary
Mark
Peter
You can use the sort command "sort name.txt" and the output goes to the console.
You can save the output using "sort name.txt -o sortedname.txt"
e.g.
Amy
John
Mark
Mary
Peter
Zoe
You can find more options with the commands "man sort" and "info sort".
rojomoke was right about the > and < symbols. Those are redirection operators.
We usually read data from standard input (stdin), and output goes to standard output (stdout), aka the screen.
< means get the data from somewhere else. e.g. a file.
> means redirect the output to somewhere else e.g. a file.
So the command above, "sort name.txt -o sortedname.txt", could also have been written as follows.
sort < name.txt > sortedname.txt
You can read more about the redirection in this wiki entry.
https://en.wikipedia.org/wiki/Redirection_(computing)
Operators like | and >> will come in handy down the road.
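A small sketch that puts these operators side by side, using the names from the example above. The scratch directory, the output file names and the extra name Adam are made up for illustration.

```shell
# Hypothetical scratch directory so the sketch does not touch real files.
dir=$(mktemp -d)
printf '%s\n' Zoe John Amy Mary Mark Peter > "$dir/name.txt"

# Same result two ways: the -o option, and stdin/stdout redirection.
sort -o "$dir/by_option.txt" "$dir/name.txt"
sort < "$dir/name.txt" > "$dir/by_redirect.txt"

# >> appends instead of overwriting; | feeds one command's output to the next.
printf '%s\n' Adam >> "$dir/name.txt"
first=$(sort "$dir/name.txt" | head -n 1)

by_option=$(cat "$dir/by_option.txt")
by_redirect=$(cat "$dir/by_redirect.txt")
printf '%s\n' "$by_redirect"
echo "first after append: $first"
rm -rf "$dir"
```

One practical difference: sort -o can safely write back to the input file, whereas "sort < name.txt > name.txt" truncates the file before sort gets to read it.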

Linux sort -Help Wanted

I've been stuck on a problem for a few days. Here it is; maybe you've got bigger brains than me!
I've got a bunch of CSV files and I want them concatenated into a single .csv file, sorted numerically. The first problem I ran into is with the ID (I want to sort only by ID).
e.g.
sort -f *.csv > output.csv
This would work if I had standard IDs like id001, id002, id010, id100, but my IDs are like id1, id2, id10, id100, and this makes my sort inaccurate.
sort -t, -V *.csv > output.csv
This works perfectly on my test machine (sort --version reports GNU coreutils 8.5.0), but the live machine at work has sort version 5.3.0 (which doesn't implement -V) and I cannot update it!
I feel like such a noob, and unlucky.
If you have a better idea, please bring it on.
My CSV file looks like
cn41 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn42 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn43 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn44 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn45 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
This is actually copied and pasted from a CSV. So let's say this is my first CSV, and the other one looks like
cn201 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn202 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn203 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn204 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn205 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
Looking forward to hearing from you!
Regards
You can use -kX.Y to sort on column X starting at character Y, together with -n for a numeric sort:
sort -t, -k2.3 -n *csv
Given your sample file, it produces:
$ sort -t, -k2.3 -n file
,id1,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id2,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id10,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id40,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id101,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id201,aaaaaaaaa,bbbbbbbbbb,ccccccccccc,ddddddd
Update
For your given input, I would do:
$ cat *csv | sort -k1.3 -n
cn41 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn42 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn43 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn44 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn45 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
cn201 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn202 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn203 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn204 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn205 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
If your CSV format is fixed, you can use the shell equivalent of the decorate-sort-undecorate pattern:
cat *.csv | sed 's/^,id//' | sort -n | sed 's/^/,id/' >output.csv
The -n option is present even in ancient versions of sort.
UPDATE: the updated input contains a number with a different prefix, and at a different position in the line. Here is a version that handles both kinds of input, as well as other inputs that have a number somewhere in the line, sorting by the first number:
cat *.csv | sed 's/^\([^0-9]*\)\([0-9][0-9]*\)/\2 \1\2/' \
| sort -n \
| sed 's/^[^ ]* //' > output.csv
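To make the three stages visible, here is the same pipeline run on a few shortened rows. The rows are invented stand-ins for the real CSV lines, and the variable names are only for illustration.

```shell
# Hypothetical, shortened rows in the same "cnNNN ..." shape as the question.
input=$(printf '%s\n' 'cn201 third' 'cn41 first' 'cn45 second')

# Decorate: copy the first number in each line to the front as a sort key.
decorated=$(printf '%s\n' "$input" | sed 's/^\([^0-9]*\)\([0-9][0-9]*\)/\2 \1\2/')

# Sort numerically on that key, then undecorate by stripping it again.
result=$(printf '%s\n' "$decorated" | sort -n | sed 's/^[^ ]* //')

printf 'decorated:\n%s\n' "$decorated"
printf 'result:\n%s\n' "$result"
```

The decorated lines look like "201 cn201 third", and the final result lists cn41, cn45 and cn201 in that order.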
You could try the -g option:
sort -t, -k 2.3 -g fileName
-t field separator
-k key/column
-g general numeric sort

Count the number of occurrences in a string. Linux

Okay so what I am trying to figure out is how do I count the number of periods in a string and then cut everything up to that point but minus 2. Meaning like this:
string="aaa.bbb.ccc.ddd.google.com"
number_of_periods="5"
number_of_periods=`expr $number_of_periods-2`
string=`echo $string | cut -d"." -f$number_of_periods`
echo $string
result: "aaa.bbb.ccc.ddd"
The way that I was thinking of doing it was sending the string to a text file and then just greping for the number of times like this:
grep -c "." infile
The reason I don't want to do that is because I want to avoid creating another text file for I do not have permission to do so. It would also be simpler for the code I am trying to build right now.
EDIT
I don't think I made it clear but I want to make finding the number of periods more dynamic because the address I will be looking at will change as the script moves forward.
If you don't need to count the dots, but just want to remove the penultimate dot and everything after it, you can use Bash's built-in string manipulation.
${string%substring}
Deletes shortest match of $substring from back of $string.
Example:
$ string="aaa.bbb.ccc.ddd.google.com"
$ echo ${string%.*.*}
aaa.bbb.ccc.ddd
Nice and simple and no need for sed, awk or cut!
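For comparison, here is a short sketch of the related expansions on the same string. The variable names are made up; the operators themselves are standard POSIX shell.

```shell
string="aaa.bbb.ccc.ddd.google.com"

last_dropped=${string%.*}        # shortest match from the back: aaa.bbb.ccc.ddd.google
last_two_dropped=${string%.*.*}  # the case from the answer:      aaa.bbb.ccc.ddd
first_only=${string%%.*}         # longest match from the back:   aaa
first_dropped=${string#*.}       # shortest match from the front: bbb.ccc.ddd.google.com
last_only=${string##*.}          # longest match from the front:  com

printf '%s\n' "$last_dropped" "$last_two_dropped" "$first_only" "$first_dropped" "$last_only"
```

A single % or # removes the shortest match, while the doubled %% or ## removes the longest one.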
What about this:
echo "aaa.bbb.ccc.ddd.google.com"|awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
(further shortened by a helpful comment from @steve)
gives:
aaa.bbb.ccc.ddd
The awk command:
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
works by separating the input line into fields (FS) by ., then joining them as output (OFS) with ., but the number of fields (NF) has been reduced by 2. The final 1 in the command is responsible for the print.
This will reduce a given input line by eliminating the last two period separated items.
This approach is "shell-agnostic" :)
Perhaps this will help:
#!/bin/sh
input="aaa.bbb.ccc.ddd.google.com"
number_of_fields=$(echo $input | tr "." "\n" | wc -l)
interesting_fields=$(($number_of_fields-2))
echo $input | cut -d. -f-${interesting_fields}
grep -o "\." <<<"aaa.bbb.ccc.ddd.google.com" | wc -l
5
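Putting the count and the cut together without any temporary file could look like the sketch below. It assumes the string always contains at least two dots; the variable names follow the question, and keep is made up.

```shell
string="aaa.bbb.ccc.ddd.google.com"

# Delete every character except the dots, then count what is left.
number_of_periods=$(printf '%s' "$string" | tr -cd '.' | wc -c)
number_of_periods=$((number_of_periods))    # normalise any padding added by wc

# N dots separate N+1 fields; dropping the last two fields keeps N-1 of them.
keep=$((number_of_periods - 1))
result=$(printf '%s\n' "$string" | cut -d. -f"1-$keep")

echo "$number_of_periods"
echo "$result"
```

Because the count is computed from the string each time, this keeps working as the address changes while the script runs.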
