I am trying to use grep in Python to search for words in a text file. I tried something like this -
subprocess.call(['/bin/grep', str(word), "textFile.txt"])
This line prints all the output to the console. It also reports a match even when the word does not match exactly. For example, it matches this line -
xxxwordsxxx
def find_words(in_file, out_file):
    for word in in_file:
        word = word.rstrip()
        subprocess.call(["grep", "-w", word, "textFile.txt"])
Edit: My in_file and textFile.txt are the same.
How do I implement a search for the exact word? If this is not a correct way, is there any other way I could do this search? (It is a huge text file and I have to find duplicates of all the words in the file)
Try using parameter -w:
import subprocess
word = input("select word to filter: ")
subprocess.call(['/bin/grep', "-w", word, "textFile.txt"]) #str is not needed
You can use the .split() method to iterate over individual words from the line. For example:
string = "My Name Is Josh"
substring = "Name"
for word in string.split():
    if substring == word:
        print("Match Found")
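Since the stated goal is to find duplicates of all the words in one large file, a single pure-Python pass may be simpler than calling grep once per word. This is only a minimal sketch, not the original poster's method: the helper name `count_exact_words` and the sample lines are hypothetical, and it assumes a "word" is a run of letters, digits, and underscores.

```python
import re
from collections import Counter

def count_exact_words(lines):
    """Count whole-word occurrences across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # \w+ grabs whole words only, so "words" inside "xxxwordsxxx"
        # is never counted as a separate word.
        counts.update(re.findall(r"\w+", line))
    return counts

# Hypothetical sample standing in for the lines of textFile.txt.
sample = ["words and more words", "xxxwordsxxx", "and"]
counts = count_exact_words(sample)

# Keep only the words that occur more than once.
duplicates = {word: n for word, n in counts.items() if n > 1}
print(duplicates)
```

For a real file, the open file object can be passed in place of the list (for example `with open("textFile.txt") as f: counts = count_exact_words(f)`), which reads one line at a time rather than loading the whole file.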
I have a file that contains strings like "1405079/1". The only thing they have in common is the "/1" at the end. I want to find those strings and remove them. Below is my sample code, but it's not doing anything.
with open("jobstat.txt", "r") as jobstat:
    with open("runjob_output.txt", "w") as runjob_output:
        for line in jobstat:
            string_to_replace = ' */1'
            line = line.replace(string_to_replace, " ")

with open("jobstat.txt", "r") as jobstat:
    with open("runjob_output.txt", "w") as runjob_output:
        for line in jobstat:
            string_to_replace = '/1'
            line = line.rstrip(string_to_replace)
            print(line)
Anytime you have a "pattern" you want to match against, use a regular expression. The pattern here, given the information you've provided, is a string with an arbitrary number of digits followed by /1.
You can use re.sub to match against that pattern, and replace instances of it with another string.
import re
original_string= "some random text with 123456/1, and midd42142/1le of words"
pattern = r"\d*\/1"
replacement = ""
re.sub(pattern, replacement, original_string)
Output:
'some random text with , and middle of words'
Replacing instances of the pattern with something else:
>>> re.sub(pattern, "foo", original_string)
'some random text with foo, and middfoole of words'
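To connect this back to the file in the question, the same substitution can be applied line by line. The sketch below is an assumption-laden illustration rather than a definitive fix: the names `JOB_ID` and `strip_job_ids` are hypothetical, the sample lines stand in for jobstat.txt, and the pattern uses `\d+` (at least one digit) instead of `\d*` so that a bare "/1" is left alone.

```python
import re

# One or more digits followed by "/1" (assumed shape of the job IDs).
JOB_ID = re.compile(r"\d+/1")

def strip_job_ids(lines):
    """Yield each line with every digits-then-"/1" token removed."""
    for line in lines:
        yield JOB_ID.sub("", line)

# Hypothetical sample standing in for the lines of jobstat.txt.
sample = ["job 1405079/1 running\n", "no id here\n"]
cleaned = list(strip_job_ids(sample))
print(cleaned)
```

With real files, the generator can feed the output file directly, for example `runjob_output.writelines(strip_job_ids(jobstat))` inside the two `with` blocks from the question. The question's code never wrote anything to `runjob_output`, which is one reason the output file stayed empty.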
I am trying to filter sentences from my pandas data-frame, which has 50 million records, using a keyword search. I want to keep a sentence if any word in it starts with any of these keywords.
WordsToCheck=['hi','she', 'can']
text_string1="my name is handhit and cannary"
text_string2="she can play!"
If I do something like this:
if any(key in text_string1 for key in WordsToCheck):
    print(text_string1)
I get a false positive, because "handhit" contains "hi" in the last part of the word.
How can I smartly avoid all such False positives from my result set?
Secondly, is there any faster way to do it in python? I am using apply function currently.
I am following this link so that my question is not a duplicate: How to check if a string contains an element from a list in Python
If the case is important you can do something like this:
def any_word_starts_with_one_of(sentence, keywords):
    for kw in keywords:
        match_words = [word for word in sentence.split(" ") if word.startswith(kw)]
        if match_words:
            return kw
    return None

keywords = ["hi", "she", "can"]
sentences = ["Hi, this is the first sentence", "This is the second"]
for sentence in sentences:
    if any_word_starts_with_one_of(sentence, keywords):
        print(sentence)
If case is not important replace line 3 with something like this:
match_words = [word for word in sentence.split(" ") if word.lower().startswith(kw.lower())]
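Another option, sketched here under assumptions, is a single compiled regular expression that only matches a keyword at the start of a word. The helper name `build_prefix_pattern` and the sample sentences are hypothetical. The `\b` word boundary is what prevents "hi" from matching inside "handhit".

```python
import re

def build_prefix_pattern(keywords):
    """Compile a regex matching any keyword at the start of a word."""
    # re.escape guards against keywords containing regex metacharacters;
    # \b anchors the match to the beginning of a word.
    alternatives = "|".join(re.escape(kw) for kw in keywords)
    return re.compile(r"\b(?:" + alternatives + r")", re.IGNORECASE)

pattern = build_prefix_pattern(["hi", "she", "can"])

# Hypothetical sample sentences.
sentences = ["my name is handhit", "cannary sings", "she can play!"]
matches = [s for s in sentences if pattern.search(s)]
print(matches)
```

For a pandas column, the pattern string could probably be handed to `Series.str.contains`, for example `df["text"].str.contains(pattern.pattern, case=False)` with a hypothetical column named `text`. That is usually quicker than `apply` with a Python function, but it is worth timing on a sample of the 50 million rows before relying on it.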
I want to print out the entire string if it contains a particular word. For example:
a = ['www.facbook.com/xyz','www.google.com/xyz','www.amazon.com/xyz','www.instagram.com/xyz']
If I am looking for the word amazon, then the code should print www.amazon.com/xyz
I have found many examples in which you can find out if a string contains a word but I need to print out the entire string which contains the word.
Try this -
your_list = ['www.facebook.com/xyz', 'www.google.com/xyz', 'www.amazon.com/xyz', 'www.instagram.com/xyz']
word = 'amazon'
res = [x for x in your_list if word in x]
print (*res)
Output:
www.amazon.com/xyz
This works fine if only one or two strings contain the word, but if several strings in the list contain it, they are printed on one horizontal line.
They need to be printed on separate lines, but I do not know how to incorporate this in the code. It would be interesting to see how that looks.
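One way to get each match on its own line is the `sep` argument of `print`. This is a small sketch; the extra list entry is made up so that more than one string matches.

```python
# Hypothetical list with two strings that contain the word.
your_list = ['www.amazon.com/xyz', 'www.google.com/xyz', 'www.amazon.co.uk/abc']
word = 'amazon'
res = [x for x in your_list if word in x]

# sep="\n" puts a newline between the unpacked items instead of a space.
print(*res, sep="\n")

# Equivalent alternative: build the text first, then print it.
joined = "\n".join(res)
print(joined)
```

Both calls print the two matching strings on separate lines. A plain `for x in res: print(x)` loop would do the same.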
I am making a small project in Python that lets you make notes and then read them by using specific arguments. I attempted to write an if statement that checks whether the string has a comma in it. If it does, my Python file should find the comma, then find the character right below that comma and turn it into an integer, so it can read out the notes the user created in a specific user-defined range.
If that didn't make sense then basically all I am saying is that I want to find out what line/bit of code is causing this to not work and return nothing even though notes.txt has content.
Here is what I have in my python file:
if "," not in no_cs: # no_cs is the string I am searching through
    user_out = int(no_cs[6:len(no_cs) - 1])
    notes = open("notes.txt", "r") # notes.txt is the file that stores all the notes the user makes
    notes_lines = notes.read().split("\n") # this is suppose to split all the notes into a list
    try:
        print(notes_lines[user_out])
    except IndexError:
        print("That line does not exist.")
    notes.close()
elif "," in no_cs:
    user_out_1 = int(no_cs.find(',') - 1)
    user_out_2 = int(no_cs.find(',') + 1)
    notes = open("notes.txt", "r")
    notes_lines = notes.read().split("\n")
    print(notes_lines[user_out_1:user_out_2]) # this is SUPPOSE to list all notes in a specific range but doesn't
    notes.close()
Now here is the notes.txt file:
note
note1
note2
note3
and lastly here is what I am getting in console when I attempt to run the program and type notes(0,2)
>>> notes(0,2)
jeffv : notes(0,2)
[]
A great way to do this is to use Python's .partition() method. It splits a string at the first occurrence of a separator and returns a tuple of three parts: 0: the text before the separator, 1: the separator itself, 2: the text after the separator.
# The whole string we wish to search.. Let's use a
# Monty Python quote since we are using Python :)
whole_string = "We interrupt this program to annoy you and make things \
generally more irritating."
# Here is the first word we wish to split from the entire string
first_split = 'program'
# now we use partition to pick what comes after the first split word
substring_split = whole_string.partition(first_split)[2]
# now we use python to give us the first character after that first split word
first_character = str(substring_split)[0]
# since the above is a space, let's also show the second character so
# that it is less confusing :)
second_character = str(substring_split)[1]
# Output
print("Here is the whole string we wish to split: " + whole_string)
print("Here is the first split word we want to find: " + first_split)
print("Now here is the first word that occurred after our split word: " + substring_split)
print("The first character after the substring split is: " + first_character)
print("The second character after the substring split is: " + second_character)
output
Here is the whole string we wish to split: We interrupt this program to annoy you and make things generally more irritating.
Here is the first split word we want to find: program
Now here is the first word that occurred after our split word: to annoy you and make things generally more irritating.
The first character after the substring split is:
The second character after the substring split is: t
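Applied to the notes question, `.partition()` can pull the two numbers out of the command text. Assuming `no_cs` holds exactly `notes(0,2)`, `no_cs.find(',')` in the question returns 7, the position of the comma in the string, so the slice becomes `notes_lines[6:8]`, which is empty for such a short file. The sketch below parses the numbers themselves. The function name `parse_note_range` is hypothetical, and the list stands in for the contents of notes.txt.

```python
def parse_note_range(command):
    """Return (start, stop) parsed from a string such as "notes(0,2)"."""
    # Keep only the text between the parentheses, e.g. "0,2".
    inside = command.partition("(")[2].partition(")")[0]
    # Split that text around the comma into the two numbers.
    start_text, _, stop_text = inside.partition(",")
    return int(start_text), int(stop_text)

# Hypothetical stand-in for the lines read from notes.txt.
notes_lines = ["note", "note1", "note2", "note3"]
start, stop = parse_note_range("notes(0,2)")
print(notes_lines[start:stop])
```

Python slices exclude the end index, so this prints the notes at positions 0 and 1. If the range is meant to include the last number, slice with `stop + 1` instead.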
I am trying to compare the strings in the file "formatted_words.txt" with another customised file, "dictionary.txt". In the output I am trying to print those words from "formatted_words.txt" which are present in "dictionary.txt".
from itertools import izip
with open("formatted_words.txt") as words_file:
    with open("dictionary.txt") as dict_file:
        all_strings = list(map(str.strip,dict_file))
        for word in words_file:
            for a_string in all_strings:
                if word in a_string:
                    print a_string
Nevertheless, in the output, all the words of the file "formatted_words.txt" are getting printed, though many words from this file are not in "dictionary.txt". I cannot use any builtin Python dictionary. Any help would be appreciated.
Using sets:
with open('formatted_words.txt') as words_file:
    with open('dictionary.txt') as dict_file:
        all_strings = set(map(str.strip, dict_file))
        words = set(map(str.strip, words_file))
        for word in all_strings.intersection(words):
            print(word)
Prints nothing because the intersection is empty
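To see the set approach on data that does overlap, here is a small sketch with made-up in-memory lines in place of the two files. It relies on two facts: iterating over a file yields lines that still carry their trailing newline, and `in` between two strings is a substring test, whereas set intersection compares whole stripped strings.

```python
# Hypothetical contents standing in for the two files; the trailing
# newlines mimic what iterating over a file object yields.
words_lines = ["apple\n", "pear\n", "plum\n"]
dict_lines = ["pineapple\n", "plum\n", "pear\n"]

all_strings = set(map(str.strip, dict_lines))
words = set(map(str.strip, words_lines))

# Set intersection keeps only exact matches, so "apple" is not
# reported just because it is a substring of "pineapple".
common = sorted(all_strings & words)
print(common)
```

Sorting is only there to make the output order predictable, since sets are unordered.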