How do I use text replace with plural/singular strings

I'm trying to replace words in strings using str.replace().
It works well until the words being replaced include plural forms, as in the following:
import numpy as np
from collections import OrderedDict

def replacing():
    texter = []
    del texter[:]
    repl = ['diabetes', 'mellitus', 'dm', ]
    it = ''
    try:
        it = iter(np.array(repl))
    except:
        pass
    txt = "tell me if its can also cause coronavirus"
    for i in range(len(np.array(repl))):
        try:
            p = it.__next__()
            x = txt.replace("its", p)
            texter.append(x)
            x = txt.replace("it", p)
            texter.append(x)
            xxx = txt.replace("them", p)
            texter.append(xxx)
            xxxx = txt.replace("the same", p)
            texter.append(xxx)
            xxxxx = txt.replace("this", p)
            texter.append(xxx)
        except StopIteration:
            break
    mm = list(OrderedDict.fromkeys(texter))
    print (mm)
replacing()
This is the result of this code:
['tell me if diabetes can also cause coronavirus', 'tell me if diabetess can also cause coronavirus', 'tell me if mellitus can also cause coronavirus', 'tell me if mellituss can also cause coronavirus', 'tell me if dm can also cause coronavirus', 'tell me if dms can also cause coronavirus']
Notice the misspelled replacements: 'diabetess' instead of 'diabetes', 'mellituss' instead of 'mellitus' and 'dms' instead of 'dm'.
I noticed that the keywords 'it' and 'its' cause the errors because they are so similar.
How can I avoid this?

The issue is that you are replacing "it" and "its" separately. txt.replace("it", p) creates a copy of txt with "it" replaced by p, so "its" becomes "diabetess". Use the re module to specify that you want to replace "it" or "its". Your for loop would look like this:
for i in range(len(np.array(repl))):
    try:
        p = it.__next__()
        # "its" is listed first so it is tried before the shorter "it"
        x = re.sub("its|it", p, txt)
        texter.append(x)
        xxx = txt.replace("them", p)
        texter.append(xxx)
        xxxx = txt.replace("the same", p)
        texter.append(xxxx)
        xxxxx = txt.replace("this", p)
        texter.append(xxxxx)
    except StopIteration:
        break
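Another way to sidestep the ordering problem is to match whole words only. The following is a minimal sketch, not the original poster's code: the helper name replace_pronouns and the PRONOUNS tuple are hypothetical, and it assumes the regex word boundary \b is acceptable for deciding where a word starts and ends, so that "it" can never match inside "its".

```python
import re

# Hypothetical tuple of placeholder words, taken from the question's replace() calls.
PRONOUNS = ("its", "it", "them", "the same", "this")

# Longer alternatives go first so they win when two of them could match.
_alternatives = "|".join(
    re.escape(w) for w in sorted(PRONOUNS, key=len, reverse=True)
)
# \b anchors each match at a word boundary, so "it" cannot match inside "its".
PATTERN = re.compile(r"\b(?:" + _alternatives + r")\b")


def replace_pronouns(text, replacements):
    """Return one variant of text per replacement word, without duplicates."""
    variants = []
    for word in replacements:
        candidate = PATTERN.sub(word, text)
        if candidate not in variants:
            variants.append(candidate)
    return variants


result = replace_pronouns(
    "tell me if its can also cause coronavirus",
    ["diabetes", "mellitus", "dm"],
)
print(result)
```

Because every placeholder is handled by one pattern, each replacement word produces exactly one sentence, and no de-duplication through OrderedDict is needed.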

Related

Python - horizontal output, with brackets, commas and quotation marks

s = "That that occurs sometimes. It sometimes means that which, and sometimes just that"
target = "that"
words = s.split()
b = []
for i,w in enumerate(words):
    if w == target:
        if i > 0:
            b = words[i-1]
            print([b], sep="", end=",")
I tried end="," and sep="", but nothing worked. I need the output on one line, with square brackets, commas and quotation marks. Instead, the brackets appear in the middle and there is a comma at the end.
Current output:
['That'],['means'],['just'],
I need this output:
['That','means','just']
Append each word to the list inside the loop, and print the list once after the loop:
s = "That that occurs sometimes. It sometimes means that which, and sometimes just that"
target = "that"
words = s.split()
b = []
for i,w in enumerate(words):
    if w == target:
        if i > 0:
            b.append(words[i-1])
print(b)
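The same collection step can also be written as a list comprehension. This is only a sketch of an alternative; the name before_target is made up for illustration. It pairs every word with its successor, which also covers the i > 0 check, and the last line assumes the exact comma layout without spaces matters.

```python
s = "That that occurs sometimes. It sometimes means that which, and sometimes just that"
target = "that"
words = s.split()

# Pair each word with its successor; keep the word when the successor is the target.
before_target = [prev for prev, cur in zip(words, words[1:]) if cur == target]

# Printing the list directly puts a space after each comma.
print(before_target)

# To get the layout without spaces after the commas, build the string by hand.
formatted = "[" + ",".join(repr(w) for w in before_target) + "]"
print(formatted)
```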

Convert ALL UPPERCASE to Title

#!/usr/bin/python3.4
import urllib.request
import os
import re
os.chdir('/home/whatever/')
a = open('Shopstxt.csv','r')
b = a.readlines()
a.close()
c = len(b)
d = list(zip(*(e.split(';') for e in b)))
shopname = []
shopaddress = []
shopcity = []
shopphone = []
shopwebsite = []
f = d[0]
g = d[1]
h = d[2]
i = d[3]
j = d[4]
e = -1
for n in range(0, 5):
    e = e + 1
    sn = f[n]
    sn.title()
    print(sn)
    shopname.append(sn)
    sa = g[n]
    sa.title()
    shopaddress.append(sa)
    sc = h[n]
    sc.title()
    shopcity.append(sc)
Shopstxt.csv is all uppercase and I want to convert the text to title case. I thought this would do it, but it doesn't: everything is still uppercase. What am I doing wrong?
I also want to save the file back, and since I'm pressed for time I'd like to check one more thing quickly.
When I combine the file back together before writing it to disk, do I have to add a '\n' at the end of each line, or is the '\n' included automatically when I write each line to the file?
Strings are immutable, so you need to assign the result of title():
sn = sn.title()
sa = sa.title()
sc = sc.title()
Also, if you do this:
with open("bla.txt", "wt") as outfile:
    outfile.write("stuff")
    outfile.write("more stuff")
then this will not automatically add line endings.
A quick way to add line endings would be this:
textblobb = "\n".join(list_of_text_lines)
with open("bla.txt", "wt") as outfile:
    outfile.write(textblobb)
As long as textblobb isn't inefficiently large and fits into memory, that should do the trick nicely.
Use the .title() method when defining your variables like I did in the code below. As others have mentioned, strings are immutable so save yourself a step and create the string you need in one line.
for n in range(0, 5):
    e = e + 1
    sn = f[n].title() ### Grab and modify the list index before assigning to your variable
    print(sn)
    shopname.append(sn)
    sa = g[n].title() ###
    shopaddress.append(sa)
    sc = h[n].title() ###
    shopcity.append(sc)
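For the whole round trip (read, title-case, write back), the standard csv module can handle both the ';' splitting and the line endings. The sketch below is an illustration under assumptions: the sample rows are invented, and io.StringIO stands in for the real Shopstxt.csv so the example is self-contained.

```python
import csv
import io

# Stand-in for the contents of Shopstxt.csv; these sample rows are invented.
raw = (
    "ACME TOOLS;12 MAIN STREET;SPRINGFIELD\n"
    "BLUE BAKERY;4 HIGH ROAD;SHELBYVILLE\n"
)

reader = csv.reader(io.StringIO(raw), delimiter=";")
# str.title() returns a new string, so the result has to be kept.
rows = [[field.title() for field in row] for row in reader]

out = io.StringIO()
# lineterminator="\n" makes the writer end every row with a newline.
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerows(rows)

print(out.getvalue())
```

With real files, the two StringIO objects would be replaced by open(...) calls, using newline="" as the csv documentation recommends. Title-casing every field is a simplification; columns such as websites or phone numbers would probably be left untouched.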

Get filename from user and convert the number into list

So far, I have this:
def main():
    bad_filename = True
    l =[]
    while bad_filename == True:
        try:
            filename = input("Enter the filename: ")
            fp = open(filename, "r")
            for f_line in fp:
                a=(f_line)
                b=(f_line.strip('\n'))
                l.append(b)
                print (l)
            bad_filename = False
        except IOError:
            print("Error: The file was not found: ", filename)
main()
This is my program. When I run it, this is what I get:
['1,2,3,4,5']
['1,2,3,4,5', '6,7,8,9,0']
['1,2,3,4,5', '6,7,8,9,0', '1.10,2.20,3.30,0.10,0.30']
but instead I need to get:
[1,2,3,4,5]
[6,7,8,9,0.00]
[1.10,2.20,3.30,0.10,0.30]
Each line of the file is a series of numbers separated by commas, but to Python they are just characters. You need one more conversion step to turn each string into a list of numbers. First split on commas to create a list of strings, each of which represents a number. Then use a list comprehension (or a for loop) to convert each string into a number:
b = f_line.strip('\n').split(',')
c = [float(v) for v in b]
l.append(c)
If you really want to reset the list each time through the loop (your desired output shows only the last line) then instead of appending, just assign the numerical list to l:
b = f_line.strip('\n').split(',')
l = [float(v) for v in b]
List comprehension is a shorthand way of saying:
l = []
for v in b:
    l.append(float(v))
You don't need a or the extra parentheses around the assignment of a and b.
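Putting the pieces together, a minimal sketch might look like the following. The function names parse_numbers and read_rows are hypothetical, and the demonstration runs on in-memory lines rather than a real file so it can be tried as-is.

```python
def parse_numbers(line):
    """Split a comma-separated line and convert each field to float."""
    return [float(v) for v in line.strip().split(",")]


def read_rows(path):
    """Return one list of floats per non-empty line of the file at path."""
    with open(path, "r") as fp:
        return [parse_numbers(line) for line in fp if line.strip()]


# Demonstration on in-memory lines instead of a real file.
sample_lines = ["1,2,3,4,5\n", "6,7,8,9,0\n", "1.10,2.20,3.30,0.10,0.30\n"]
rows = [parse_numbers(line) for line in sample_lines]
for row in rows:
    print(row)
```

Note that float() prints 1.10 as 1.1; if the trailing zeros matter, they have to be restored when formatting the output, for example with format(v, ".2f").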

Find string between two substrings [duplicate]

How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?
My current method is like this:
>>> start = 'asdf=5;'
>>> end = '123jasd'
>>> s = 'asdf=5;iwantthis123jasd'
>>> print((s.split(start))[1].split(end)[0])
iwantthis
However, this seems very inefficient and un-pythonic. What is a better way to do something like this?
Forgot to mention:
The string might not start and end with start and end. They may have more characters before and after.
import re
s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print(result.group(1))
s = "123123STRINGabcabc"
def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""
def find_between_r( s, first, last ):
    try:
        start = s.rindex( first ) + len( first )
        end = s.rindex( last, start )
        return s[start:end]
    except ValueError:
        return ""
print(find_between( s, "123", "abc" ))
print(find_between_r( s, "123", "abc" ))
gives:
123STRING
STRINGabc
It is worth noting that, depending on the behavior you need, you can mix index and rindex calls or go with one of the versions above (the choice is analogous to choosing between the regex groups (.*) and (.*?)).
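To make the analogy concrete, here is a small sketch comparing the index-based slices with their regex counterparts on the same sample string. The variable names are made up for illustration. Pairing index with index behaves like the lazy group (.*?), while pairing index with rindex behaves like the greedy group (.*).

```python
import re

s = "123123STRINGabcabc"
first, last = "123", "abc"

# index + index: earliest start, earliest end after it (like a lazy (.*?) group).
i = s.index(first) + len(first)
lazy_by_index = s[i:s.index(last, i)]
lazy_by_regex = re.search(
    re.escape(first) + "(.*?)" + re.escape(last), s
).group(1)

# index + rindex: earliest start, latest end (like a greedy (.*) group).
greedy_by_index = s[i:s.rindex(last)]
greedy_by_regex = re.search(
    re.escape(first) + "(.*)" + re.escape(last), s
).group(1)

print(lazy_by_index, lazy_by_regex)
print(greedy_by_index, greedy_by_regex)
```

One difference to keep in mind: without the re.DOTALL flag, "." does not match a newline, so the regex versions stop at line breaks while the slices do not.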
start = 'asdf=5;'
end = '123jasd'
s = 'asdf=5;iwantthis123jasd'
print(s[s.find(start)+len(start):s.rfind(end)])
gives
iwantthis
s[len(start):-len(end)]
String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be amended as desired.
import re
s = 'asdf=5;iwantthis123jasd'
start = 'asdf=5;'
end = '123jasd'
result = re.search('%s(.*)%s' % (start, end), s).group(1)
print(result)
Just converting the OP's own solution into an answer:
def find_between(s, start, end):
    return (s.split(start))[1].split(end)[0]
If you don't want to import anything, try the string method .index():
text = 'I want to find a string between two substrings'
left = 'find a '
right = 'between two'
# Output: 'string'
print(text[text.index(left)+len(left):text.index(right)])
source='your token _here0#df and maybe _here1#df or maybe _here2#df'
start_sep='_'
end_sep='#df'
result=[]
tmp=source.split(start_sep)
for par in tmp:
    if end_sep in par:
        result.append(par.split(end_sep)[0])
print(result)
This prints:
['here0', 'here1', 'here2']
A regex is better, but it requires importing the re module, and you may prefer to stick with plain string methods.
Here is one way to do it:
_,_,rest = s.partition(start)
result,_,_ = rest.partition(end)
print(result)
Another way, using a regexp:
import re
print(re.findall(re.escape(start)+"(.*)"+re.escape(end),s)[0])
or
print(re.search(re.escape(start)+"(.*)"+re.escape(end),s).group(1))
Here is a function I wrote that returns a list of the strings found between string1 and string2.
def GetListOfSubstrings(stringSubject,string1,string2):
    MyList = []
    intstart=0
    strlength=len(stringSubject)
    continueloop = 1
    while(intstart < strlength and continueloop == 1):
        intindex1=stringSubject.find(string1,intstart)
        if(intindex1 != -1): #The substring was found, lets proceed
            intindex1 = intindex1+len(string1)
            intindex2 = stringSubject.find(string2,intindex1)
            if(intindex2 != -1):
                subsequence=stringSubject[intindex1:intindex2]
                MyList.append(subsequence)
                intstart=intindex2+len(string2)
            else:
                continueloop=0
        else:
            continueloop=0
    return MyList
#Usage Example
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","y68")
for x in range(0, len(List)):
    print(List[x])
output: (nothing is printed, because "y68" does not occur in mystring)
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","3")
for x in range(0, len(List)):
    print(List[x])
output:
2
2
2
2
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","y")
for x in range(0, len(List)):
    print(List[x])
output:
23
23o123pp123
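For comparison, the same list can be produced with re.findall and a lazy group. This is a sketch of an alternative rather than a drop-in replacement: the function name substrings_between is hypothetical, and it assumes non-overlapping matches scanned from left to right are what is wanted.

```python
import re


def substrings_between(subject, left, right):
    """Return every non-overlapping piece of subject found between left and right."""
    pattern = re.escape(left) + "(.*?)" + re.escape(right)
    # re.DOTALL lets "." match newlines too, as the str.find version would.
    return re.findall(pattern, subject, flags=re.DOTALL)


mystring = "s123y123o123pp123y6"
print(substrings_between(mystring, "1", "3"))
print(substrings_between(mystring, "1", "y"))
print(substrings_between(mystring, "1", "y68"))
```

re.escape keeps characters such as "(" or "*" in the delimiters from being interpreted as regex syntax.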
To extract STRING, try:
myString = '123STRINGabc'
startString = '123'
endString = 'abc'
mySubString=myString[myString.find(startString)+len(startString):myString.find(endString)]
You can simply use this code or copy the function below. All neatly in one line.
def substring(whole, sub1, sub2):
    return whole[whole.index(sub1) : whole.index(sub2)]
If you run the function as follows:
print(substring("5+(5*2)+2", "(", ")"))
you will probably be left with the output:
(5*2
rather than
5*2
If you want the delimiters at both ends of the output, the code must look like this:
    return whole[whole.index(sub1) : whole.index(sub2) + 1]
But if you don't want the delimiters at either end, the +1 must be on the first value:
    return whole[whole.index(sub1) + 1 : whole.index(sub2)]
Note that the +1 is only correct for single-character delimiters.
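A slightly more general variant uses the delimiter lengths instead of a hard-coded +1. This is only a sketch: the include_delimiters parameter is an invented name, and searching for sub2 only after sub1 is an assumption about the desired behavior (it also makes identical delimiters work).

```python
def substring(whole, sub1, sub2, include_delimiters=False):
    """Return the text between the first sub1 and the next sub2 after it."""
    start = whole.index(sub1)
    # Search for sub2 only after sub1, so identical delimiters also work.
    end = whole.index(sub2, start + len(sub1))
    if include_delimiters:
        return whole[start:end + len(sub2)]
    return whole[start + len(sub1):end]


print(substring("5+(5*2)+2", "(", ")"))
print(substring("5+(5*2)+2", "(", ")", include_delimiters=True))
print(substring("a $x$ b", "$", "$"))
```

Like the original, this raises ValueError when either delimiter is missing.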
These solutions assume the start string and final string are different. Here is a solution I use for an entire file when the initial and final indicators are the same, assuming the entire file is read using readlines():
def extractstring(line,flag='$'):
    if flag in line: # $ is the flag
        dex1=line.index(flag)
        subline=line[dex1+1:-1] #leave out flag (+1) to end of line
        dex2=subline.index(flag)
        string=subline[0:dex2].strip() #does not include last flag, strip whitespace
    return(string)
Example:
lines=['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
       'afafoaltat $I GOT BETTER!$ derpity derp derp']
for line in lines:
    string=extractstring(line,flag='$')
    print(string)
Gives:
A NEWT?
I GOT BETTER!
This is essentially cji's answer - Jul 30 '10 at 5:58.
I changed the try except structure for a little more clarity on what was causing the exception.
import sys
def find_between( inputStr, firstSubstr, lastSubstr ):
    '''
    find between firstSubstr and lastSubstr in inputStr STARTING FROM THE LEFT
    http://stackoverflow.com/questions/3368969/find-string-between-two-substrings
    above also has a func that does this FROM THE RIGHT
    '''
    start, end = (-1,-1)
    try:
        start = inputStr.index( firstSubstr ) + len( firstSubstr )
    except ValueError:
        # report which substring was missing, then carry on with start = -1
        print(' ValueError: ', end=' ')
        print("firstSubstr=%s - "%( firstSubstr ), end=' ')
        print(sys.exc_info()[1])
    try:
        end = inputStr.index( lastSubstr, start )
    except ValueError:
        print(' ValueError: ', end=' ')
        print("lastSubstr=%s - "%( lastSubstr ), end=' ')
        print(sys.exc_info()[1])
    return inputStr[start:end]
from timeit import timeit
from re import search, DOTALL
def partition_find(string, start, end):
    return string.partition(start)[2].rpartition(end)[0]
def re_find(string, start, end):
    # applying re.escape to start and end would be safer
    return search(start + '(.*)' + end, string, DOTALL).group(1)
def index_find(string, start, end):
    return string[string.find(start) + len(start):string.rfind(end)]
# The wikitext of the "Alan Turing law" article from English Wikipedia
# https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
string = """..."""
start = '==Proposals=='
end = '==Rival bills=='
assert index_find(string, start, end) \
    == partition_find(string, start, end) \
    == re_find(string, start, end)
print('index_find', timeit(
    'index_find(string, start, end)',
    globals=globals(),
    number=100_000,
))
print('partition_find', timeit(
    'partition_find(string, start, end)',
    globals=globals(),
    number=100_000,
))
print('re_find', timeit(
    're_find(string, start, end)',
    globals=globals(),
    number=100_000,
))
Result:
index_find 0.35047444528454114
partition_find 0.5327825636197754
re_find 7.552149639286381
re_find was more than 20 times slower than index_find in this example.
My approach would be something like this:
find the index of the start string in s => i
find the index of the end string in s => j
substring = s[i+len(start):j] (that is, the characters from i+len(start) to j-1)
I posted this before as a code snippet on Daniweb:
# picking up piece of string between separators
# function using partition, like partition, but drops the separators
def between(left,right,s):
    before,_,a = s.partition(left)
    a,_,after = a.partition(right)
    return before,a,after
s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
print(between('<a>','</a>',s))
print(between('(',')',s))
print(between("'","'",s))
""" Output:
('bla bla blaa ', 'data', " lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdjöf ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdjöf (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
"""
Parsing text with delimiters from different email platforms posed a larger-sized version of this problem. They generally have a START and a STOP. Delimiter characters for wildcards kept choking regex. The problem with split is mentioned here & elsewhere - oops, delimiter character gone. It occurred to me to use replace() to give split() something else to consume. Chunk of code:
nuke = '~~~'
start = '|*'
stop = '*|'
julien = (textIn.replace(start,nuke + start).replace(stop,stop + nuke).split(nuke))
keep = [chunk for chunk in julien if start in chunk and stop in chunk]
logging.info('keep: %s',keep)
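A self-contained version of that idea could look like the sketch below. The function name tagged_chunks and the sample text are invented for illustration, and the approach assumes the sentinel "~~~" never occurs in the input.

```python
NUKE = "~~~"   # sentinel for split() to consume; assumed absent from the input
START = "|*"
STOP = "*|"


def tagged_chunks(text_in):
    """Return the START...STOP chunks of text_in with their delimiters intact."""
    # Put the sentinel before every START and after every STOP, then split on it.
    marked = text_in.replace(START, NUKE + START).replace(STOP, STOP + NUKE)
    pieces = marked.split(NUKE)
    return [chunk for chunk in pieces if START in chunk and STOP in chunk]


sample = "Hello |*FNAME*|, your code is |*CODE*|."
print(tagged_chunks(sample))
```

Because split() consumes only the sentinel, the delimiters themselves survive in the returned chunks.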
Following on from Nikolaus Gradwohl's answer, I needed to get the version number (i.e., 0.0.2) between 'ui:' and '-' from the file content below (filename: docker-compose.yml):
version: '3.1'
services:
  ui:
    image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
    #network_mode: host
    ports:
      - 443:9999
    ulimits:
      nofile:test
and this is how it worked for me (Python script):
import re
with open('docker-compose.yml', 'r') as f:
    lines = f.read()
result = re.search('ui:(.*)-', lines)
print(result.group(1))
Result:
0.0.2
This seems much more straight forward to me:
import re
s = 'asdf=5;iwantthis123jasd'
x= re.search('iwantthis',s)
print(s[x.start():x.end()])

Import multiline SQL query to single string

In R, how can I import the contents of a multiline text file (containing SQL) to a single string?
The sql.txt file looks like this:
SELECT TOP 100
setpoint,
tph
FROM rates
I need to import that text file into an R string such that it looks like this:
> sqlString
[1] "SELECT TOP 100 setpoint, tph FROM rates"
That's so that I can feed it to the RODBC like this
> library(RODBC)
> myconn<-odbcConnect("RPM")
> results<-sqlQuery(myconn,sqlString)
I've tried the readLines command as follows but it doesn't give the string format that RODBC needs.
> filecon<-file("sql.txt","r")
> sqlString<-readLines(filecon, warn=FALSE)
> sqlString
[1] "SELECT TOP 100 " "\t[Reclaim Setpoint Mean (tph)] as setpoint, "
[3] "\t[Reclaim Rate Mean (tph)] as tphmean " "FROM [Dampier_RC1P].[dbo].[Rates]"
>
The versatile paste() command can do that with argument collapse="":
lines <- readLines("/tmp/sql.txt")
lines
[1] "SELECT TOP 100 " " setpoint, " " tph " "FROM rates"
sqlcmd <- paste(lines, collapse="")
sqlcmd
[1] "SELECT TOP 100 setpoint, tph FROM rates"
Below is an R function that reads in a multiline SQL query (from a text file) and converts it into a single-line string. The function removes formatting and whole-line comments.
To use it, run the code to define the functions, and your single-line string will be the result of running
ONELINEQ("querytextfile.sql","~/path/to/thefile").
How it works: Inline comments detail this; it reads each line of the query and deletes (replaces with nothing) whatever isn't needed to write out a single-line version of the query (as asked for in the question). The result is a list of lines, some of which are blank and get filtered out; the last step is to paste this (unlisted) list together and return the single line.
#
# This set of functions allows us to read in formatted, commented SQL queries
# Comments must be entire-line comments, not on same line as SQL code, and begun with "--"
# The parsing function, to be applied to each line:
LINECLEAN <- function(x) {
  x = gsub("\t+", "", x, perl=TRUE); # remove all tabs
  x = gsub("^\\s+", "", x, perl=TRUE); # remove leading whitespace
  x = gsub("\\s+$", "", x, perl=TRUE); # remove trailing whitespace
  x = gsub("[ ]+", " ", x, perl=TRUE); # collapse multiple spaces to a single space
  x = gsub("^[--]+.*$", "", x, perl=TRUE); # destroy any comments
  return(x)
}
# PRETTYQUERY is the filename of your formatted query in quotes, eg "myquery.sql"
# DIRPATH is the path to that file, eg "~/Documents/queries"
ONELINEQ <- function(PRETTYQUERY,DIRPATH) {
  A <- readLines(paste0(DIRPATH,"/",PRETTYQUERY)) # read in the query to a list of lines
  B <- lapply(A,LINECLEAN) # process each line
  C <- Filter(function(x) x != "",B) # remove blank and/or comment lines
  D <- paste(unlist(C),collapse=" ") # paste lines together into one-line string, spaces between.
  return(D)
}
# TODO: add eof newline automatically to remove warning
#############################################################################################
Here's the final version of what I'm using. Thanks Dirk.
fileconn<-file("sql.txt","r")
sqlString<-readLines(fileconn)
sqlString<-paste(sqlString,collapse="")
sqlString<-gsub("\t","", sqlString)
library(RODBC)
sqlconn<-odbcConnect("RPM")
results<-sqlQuery(sqlconn,sqlString)
library(qcc)
tph <- qcc(results$tphmean[1:50], type="xbar.one", ylim=c(4000,12000), std.dev=600)
close(fileconn)
close(sqlconn)
This is what I use:
# Set Filename
fileName <- 'Input File.txt'
doSub <- function(src, dest_var_name, src_pattern, dest_pattern) {
    assign(
        x = dest_var_name
        , value = gsub(
            pattern = src_pattern
            , replacement = dest_pattern
            , x = src
        )
        , envir = .GlobalEnv
    )
}
# Read File Contents
original_text <- readChar(fileName, file.info(fileName)$size)
# Convert to UNIX line ending for ease of use
doSub(src = original_text, dest_var_name = 'unix_text', src_pattern = '\r\n', dest_pattern = '\n')
# Remove Block Comments
doSub(src = unix_text, dest_var_name = 'wo_bc_text', src_pattern = '/\\*.*?\\*/', dest_pattern = '')
# Remove Line Comments
doSub(src = wo_bc_text, dest_var_name = 'wo_bc_lc_text', src_pattern = '--.*?\n', dest_pattern = '')
# Remove Line Endings to get Flat Text
doSub(src = wo_bc_lc_text, dest_var_name = 'flat_text', src_pattern = '\n', dest_pattern = ' ')
# Remove Contiguous Spaces
doSub(src = flat_text, dest_var_name = 'clean_flat_text', src_pattern = ' +', dest_pattern = ' ')
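Try paste(sqlString, collapse=" "). Collapsing with a space keeps the last word of one line from running into the first word of the next.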
It's possible to use readChar() instead of readLines(). I had an ongoing issue with mixed commenting (-- or /* */) and this has always worked well for me.
sql <- readChar(path.to.file, file.size(path.to.file))
query <- sqlQuery(con, sql, stringsAsFactors = TRUE)
I use sql <- gsub("\n","",sql) and sql <- gsub("\t","",sql) together.
