How to copy lines that contain a match using findstr

I have a file (source.txt) which contains lines like these, for example:
123 sdf asdfa 342 ololo
asdf ololo sdf sdfa s3
asdf asf ad 34234 1klj
asdf 2342 fgasd34 dlll
ololo sdfsfd asdf342 323
And I want to copy all lines that contain "ololo" to another file (result.txt), so that the result.txt file will contain the following lines (the 1st, the 2nd, and the 5th):
123 sdf asdfa 342 ololo
asdf ololo sdf sdfa s3
ololo sdfsfd asdf342 323
How can I do this? I wrote the following command:
findstr "ololo" D:\source.txt >D:\result.txt
but in the result.txt file I get the following output:
ololo
ololo2
ololo3
How can I copy all lines with a match to the result file?

strfind will just find where your required string is present; the problem with that is that when the string is a substring of another word, it matches the complete word.
Use the command find(strcmp(ololo, {source.txt})), or strcmpi for a case-insensitive comparison; you will get the exact positions where ololo is present in your cell array.
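As an illustration of the filtering task itself, here is a minimal Python sketch that copies matching whole lines (the paths are taken from the question; the script name copy_matches.py is hypothetical):
# copy_matches.py -- copy every line containing "ololo" from source.txt
# to result.txt, mirroring what findstr "ololo" should produce
with open(r"D:\source.txt") as src, open(r"D:\result.txt", "w") as dst:
    for line in src:
        if "ololo" in line:  # plain substring test on the whole line
            dst.write(line)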

Related

Reprocess a text file to consolidate lines containing Carriage Return (CR) characters into on-screen results

Let's say I have a program (e.g. in Perl) that writes to STDOUT something like this:
print "123\t- 456";
print "\r+\n";
On my screen I see the following result:
123 + 456
However, when I redirect the output to a file (>output.txt), the file contains the following text:
123 - 456
+
How can I "reprocess" such a text file to get the same result as shown on the screen?
The col command will do this with the -b option, which prints only the last character written to each column (it understands carriage returns as well as backspaces):
col -b < output.txt
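If col is unavailable, the rough Python sketch below (a hypothetical reprocess.py) applies the same idea for carriage returns only: "\r" rewinds to the start of the line and later characters overwrite earlier ones. It does not handle backspaces:
import sys

# reprocess.py -- keep only the last character written to each column,
# treating "\r" as a jump back to column 0 of the current line
lines = sys.stdin.read().split("\n")
if lines and lines[-1] == "":
    lines.pop()  # drop the empty piece after a trailing newline
for raw in lines:
    screen = []  # characters currently "on screen" for this line
    col = 0      # cursor column
    for ch in raw:
        if ch == "\r":
            col = 0  # carriage return: rewind to the start of the line
        else:
            if col < len(screen):
                screen[col] = ch  # overwrite an earlier character
            else:
                screen.append(ch)
            col += 1
    print("".join(screen))
Run it as: python reprocess.py < output.txt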

Add a New Line After a Match is Found Using Matched Pattern in Linux

Suppose I have a file with contents like this:
[abc]
123
456
[def]
789
012
I want to insert a new line each time we find a string within square brackets (here [abc] and [def]); the added line looks like this:
foo=found_content,
where found_content is the content within the square brackets. After running the command, the file should look like this:
[abc]
foo=abc
123
456
[def]
foo=def
789
012
How can we achieve that? Thanks!
This one-liner should give you a hand:
awk -F'[][]' '1;NF==3{print "foo="$2}' file
The idea: set [ or ] as the field separator (FS); a [...] line then splits into exactly three fields (an empty field, the bracketed content, and another empty field), so for lines with NF==3 we print foo= followed by the second field below them. The leading 1 is an always-true pattern that makes awk print every input line.
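The same insertion can be sketched in Python (a hypothetical insert_foo.py, assuming each tag of the form [name] occupies a whole line):
import re, sys

# insert_foo.py -- after each "[name]" line, emit a "foo=name" line
for line in sys.stdin:
    line = line.rstrip("\n")
    print(line)
    m = re.fullmatch(r"\[(.+)\]", line.strip())
    if m:
        print("foo=" + m.group(1))
Run it as: python insert_foo.py < file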

Linux command to split each line in a file based on a character and write only the specified columns to another file

Suppose the input file file.txt is
abc/def/ghi/jkl/mno
pqr/st/u/vwxy/z
bla/123/45678/9
How do I split the lines on the character '/' and write the specified columns (here the second and fourth) to another file, so that the file looks like this:
def jkl
st vwxy
123 9
You can use perl, for example:
cat file.txt | perl -ne 'chomp(@cols = split("/", $_)); print "@cols[1, 3]\n";' > output
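A Python sketch of the same column extraction (the names file.txt and output are from the question; skipping lines with fewer than four fields is an added assumption):
# extract_cols.py -- write the 2nd and 4th "/"-separated fields of each line
with open("file.txt") as src, open("output", "w") as dst:
    for line in src:
        cols = line.rstrip("\n").split("/")
        if len(cols) >= 4:  # assumption: skip lines with fewer than four fields
            print(cols[1], cols[3], file=dst)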

How to remove all Strings containing this?

I am using this code to combine all text files; however, it puts the names of the files (with their extensions) into the final file. I would like to know how to remove these:
Find /v "" *.txt> "Combined.txt"
The resulting file looks like this:
---------- 001.TXT
abc
---------- 002.TXT
blue
123
---------- 003.TXT
abc
---------- 004.TXT
yellow
123
---------- 005.TXT
abc
---------- COMBINED.TXT
Try this:
Find/V "" *.txt|Find/V "---------- ">"Combined.txt"
…it should work fine unless your files contain lines with ten contiguous dashes followed by a space (which is much less likely than lines containing the string TXT).
When I provided you with the Find option yesterday, I provided it as an additional option which would also prepend each file's content with its name. Remember also that I explained there were options if you didn't want to include Combined.txt in the results.
You can of course just change the extension then rename it thus:
Find/V "" *.txt|Find/V "---------- ">"Combined.log"&&Ren "Combined.log" "Combined.txt"
I solved my problem with this code:
Find /v "" *.txt> "Combined.txt"
Type Combined.txt | findstr /v TXT > Output.txt
del Combined.txt
ren Output.txt Combined.txt
The first line combines all the text files.
The second line removes lines containing the string "TXT" and creates a new Output.txt file.
The third line deletes the Combined.txt file.
The fourth line renames Output.txt to Combined.txt.
The result looks like this:
abc
blue
123
abc
yellow
123
abc
That's all :)
del combined.txt
copy *.txt combined.tmp
ren combined.tmp combined.txt
Remove the first line if you want the current "combined.txt" to be included in the final version.
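For completeness, here is a minimal Python sketch of the combining step that never writes the "---------- NAME.TXT" headers in the first place (a hypothetical combine.py; it assumes all input files sit in the current directory):
import glob

# combine.py -- concatenate every .txt file into Combined.txt,
# skipping the output file itself so it is not copied into itself
# (assumption: lowercase .txt extensions; adjust the pattern for .TXT)
with open("Combined.txt", "w") as out:
    for name in sorted(glob.glob("*.txt")):
        if name.lower() == "combined.txt":
            continue  # do not include the output in itself
        with open(name) as f:
            out.write(f.read())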

Remove lines from a file which do not appear in another file, error

I have two files, similar to the ones below:
File 1 - with phenotype information; the first column is the individual ID; the original file has 400 rows:
215 2 25 13.8354303 15.2841303
222 2 25.2 15.8507278 17.2994278
216 2 28.2 13.0482192 14.4969192
223 11 15.4 9.2714745 11.6494745
File 2 - with SNP information; the original file has 400 lines and 42,000 characters per line:
215 20211111201200125201212202220111202005111102
222 20111011212200025002211001111120211015112111
216 20210005201100025210212102210212201005101001
223 20222120201200125202202102210121201005010101
217 20211010202200025201202102210121201005010101
218 02022000252012021022101212010050101012021101
I need to remove from file 2 the individuals that do not appear in file 1, leaving, for example:
215 20211111201200125201212202220111202005111102
222 20111011212200025002211001111120211015112111
216 20210005201100025210212102210212201005101001
223 20222120201200125202202102210121201005010101
I was able to do this with this code:
awk 'NR==FNR{a[$1]; next}$1 in a{print $0}' file1 file2> file3
However, when I run my main analysis with the generated file, the following errors appear:
*** Error in `./airemlf90': free(): invalid size: 0x00007f5041cc2010 ***
*** Error in `./postGSf90': free(): invalid size: 0x00007fec4a04f010 ***
airemlf90 and postGSf90 are software packages. When I use the original file this problem does not occur. Is the command I used to remove individuals adequate? Another detail I did not mention is that some individuals have IDs of 4 characters; could that be the cause of the error?
Thanks
I wrote a small Python script in a few minutes. It works well; I have tested it with 42,000-character lines and it works fine.
import sys, re

# rudimentary argument parsing
file1 = sys.argv[1]
file2 = sys.argv[2]
file3 = sys.argv[3]

present = set()
# first read file 1 and keep only the first field of each line (the key)
with open(file1, "r") as f1:
    for l in f1:
        toks = re.split(r"\s+", l)  # same as awk fields
        if toks:  # robustness against empty lines
            present.add(toks[0])

# now read the second file and write a line to the third only if its id is in the set
with open(file2, "r") as f2:
    with open(file3, "w") as f3:
        for l in f2:
            toks = re.split(r"\s+", l)
            if toks and toks[0] in present:
                f3.write(l)
(First install Python if it is not already present.)
Call my sample script mytool.py and run it like this:
python mytool.py file1.txt file2.txt file3.txt
Processing several files at once from a bash script (to replace the original solution) is easy, although not optimal, since it could all be done in a single Python run:
<whatever the for loop you need>; do
    python mytool.py $1 $2 $3
done
exactly like you would call awk with 3 files.
