s = "That that occurs sometimes. It sometimes means that which, and sometimes just that"
target = "that"
words = s.split()
b = []
for i, w in enumerate(words):
    if w == target:
        if i > 0:
            b = words[i-1]
            print([b], sep="", end=",")
I tried end="," and sep="", but nothing worked. I need the output on one line, with square brackets, commas and quotation marks. Instead, the brackets appear in the middle and there is a comma at the end.
Current output:
['That'],['means'],['just'],
I need this output:
['That','means','just']
Try this code. It appends each match to the list and prints the list once, after the loop has finished:
s = "That that occurs sometimes. It sometimes means that which, and sometimes just that"
target = "that"
words = s.split()
b = []
for i, w in enumerate(words):
    if w == target:
        if i > 0:
            b.append(words[i-1])
print(b)
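The same idea can also be sketched as a list comprehension, assuming the goal is simply "every word that comes directly before target". The name preceding is made up for this example. Note that printing a list puts a space after each comma; the last line shows one way to get the exact spacing from the question.

```python
s = "That that occurs sometimes. It sometimes means that which, and sometimes just that"
target = "that"
words = s.split()

# Pair each word with its successor; keep the word when the successor is the target.
preceding = [prev for prev, cur in zip(words, words[1:]) if cur == target]

print(preceding)  # default list formatting, with a space after each comma

# Build the string by hand to get commas without spaces.
print("[" + ",".join(repr(w) for w in preceding) + "]")
```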
#!/usr/bin/python3.4
import urllib.request
import os
import re
os.chdir('/home/whatever/')
a = open('Shopstxt.csv','r')
b = a.readlines()
a.close()
c = len(b)
d = list(zip(*(e.split(';') for e in b)))
shopname = []
shopaddress = []
shopcity = []
shopphone = []
shopwebsite = []
f = d[0]
g = d[1]
h = d[2]
i = d[3]
j = d[4]
e = -1
for n in range(0, 5):
    e = e + 1
    sn = f[n]
    sn.title()
    print(sn)
    shopname.append(sn)
    sa = g[n]
    sa.title()
    shopaddress.append(sa)
    sc = h[n]
    sc.title()
    shopcity.append(sc)
Shopstxt.csv is all upper-case letters and I want to convert the fields to title case. I thought this would do it, but it doesn't: the text is still all upper case. What am I doing wrong?
I also want to save the file back, and since I'm pressed for time I'd like to check one more thing.
When I combine the file back together, before writing it back to the drive do I have to add an '\n' at the end of each line or does it automatically include the '\n' when I write each line to the file?
Strings are immutable, so you need to assign the result of title():
sn = sn.title()
sa = sa.title()
sc = sc.title()
Also, if you do this:
with open("bla.txt", "wt") as outfile:
    outfile.write("stuff")
    outfile.write("more stuff")
then this will not automatically add line endings.
A quick way to add line endings would be this:
textblobb = "\n".join(list_of_text_lines)
with open("bla.txt", "wt") as outfile:
    outfile.write(textblobb)
As long as textblobb isn't inefficiently large and fits into memory, that should do the trick nicely.
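To make the line-ending point concrete, here is a small sketch that writes a list of lines and reads the file back. The sample lines and the temporary path are invented for the example. One related fact worth knowing: lines obtained from readlines() still carry their trailing "\n", so they can be written back unchanged, whereas lines that have been stripped or rebuilt need the "\n" added again.

```python
import os
import tempfile

# Hypothetical lines without line endings.
list_of_text_lines = [
    "Shop One;1 Main St;Springfield",
    "Shop Two;2 High St;Shelbyville",
]

path = os.path.join(tempfile.mkdtemp(), "bla.txt")

with open(path, "wt") as outfile:
    # writelines() does not add line endings either, so append "\n" explicitly.
    outfile.writelines(line + "\n" for line in list_of_text_lines)

with open(path, "rt") as infile:
    written = infile.read()

print(repr(written))
```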
Use the .title() method when defining your variables like I did in the code below. As others have mentioned, strings are immutable so save yourself a step and create the string you need in one line.
for n in range(0, 5):
    e = e + 1
    sn = f[n].title() ### Grab and modify the list index before assigning to your variable
    print(sn)
    shopname.append(sn)
    sa = g[n].title() ###
    shopaddress.append(sa)
    sc = h[n].title() ###
    shopcity.append(sc)
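If the whole file needs converting and saving, the csv module may be a simpler route than transposing columns by hand. The sketch below is only an illustration: the two rows stand in for Shopstxt.csv, and io.StringIO stands in for the real files. It title-cases the first three fields (name, address, city) and leaves the phone and website columns alone.

```python
import csv
import io

# Hypothetical stand-in for the contents of Shopstxt.csv.
raw = (
    "ACME HARDWARE;12 MAIN STREET;SPRINGFIELD;555-0100;WWW.ACME.EXAMPLE\n"
    "CITY BAKERY;7 HIGH STREET;SHELBYVILLE;555-0101;WWW.BAKERY.EXAMPLE\n"
)

rows = list(csv.reader(io.StringIO(raw), delimiter=";"))

# Title-case name, address and city; keep the remaining fields as they are.
titled = [[field.title() for field in row[:3]] + row[3:] for row in rows]

out = io.StringIO()
# The writer adds the line terminator after every row, so no manual "\n" is needed.
csv.writer(out, delimiter=";", lineterminator="\n").writerows(titled)
result = out.getvalue()
print(result)
```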
So far, I have this:
def main():
    bad_filename = True
    l = []
    while bad_filename == True:
        try:
            filename = input("Enter the filename: ")
            fp = open(filename, "r")
            for f_line in fp:
                a=(f_line)
                b=(f_line.strip('\n'))
                l.append(b)
                print (l)
            bad_filename = False
        except IOError:
            print("Error: The file was not found: ", filename)
main()
This is my program, and when I run it this is what I get:
['1,2,3,4,5']
['1,2,3,4,5', '6,7,8,9,0']
['1,2,3,4,5', '6,7,8,9,0', '1.10,2.20,3.30,0.10,0.30']
but instead I need to get:
[1,2,3,4,5]
[6,7,8,9,0.00]
[1.10,2.20,3.30,0.10,0.30]
Each line of the file is a series of numbers separated by commas, but to Python they are just characters. You need one more conversion step to turn each string into a list. First split on commas to create a list of strings, each of which is a number. Then use a list comprehension (or a for loop) to convert each string into a number:
b = f_line.strip('\n').split(',')
c = [float(v) for v in b]
l.append(c)
If you really want to reset the list each time through the loop (your desired output shows only the last line) then instead of appending, just assign the numerical list to l:
b = f_line.strip('\n').split(',')
l = [float(v) for v in b]
A list comprehension is shorthand for:
l = []
for v in b:
    l.append(float(v))
You don't need a or the extra parentheses around the assignment of a and b.
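Putting the pieces together, a minimal sketch might look like the following. The function name read_number_rows is invented here, and io.StringIO stands in for the opened file so the example runs without a file on disk. Note that float() turns "1" into 1.0, so the printed rows show decimals.

```python
import io

def read_number_rows(lines):
    """Convert an iterable of comma-separated text lines into lists of floats."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on float("")
        rows.append([float(v) for v in line.split(",")])
    return rows

# Stand-in for the opened file from the question.
sample = io.StringIO("1,2,3,4,5\n6,7,8,9,0\n1.10,2.20,3.30,0.10,0.30\n")

rows = read_number_rows(sample)
for row in rows:
    print(row)
```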
How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?
My current method is like this:
>>> start = 'asdf=5;'
>>> end = '123jasd'
>>> s = 'asdf=5;iwantthis123jasd'
>>> print((s.split(start))[1].split(end)[0])
iwantthis
However, this seems very inefficient and un-pythonic. What is a better way to do something like this?
Forgot to mention:
The string might not start and end with start and end. They may have more characters before and after.
import re
s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print(result.group(1))
s = "123123STRINGabcabc"
def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""
def find_between_r( s, first, last ):
    try:
        start = s.rindex( first ) + len( first )
        end = s.rindex( last, start )
        return s[start:end]
    except ValueError:
        return ""
print(find_between( s, "123", "abc" ))
print(find_between_r( s, "123", "abc" ))
gives:
123STRING
STRINGabc
I thought it should be noted: depending on the behavior you need, you can mix index and rindex calls or go with one of the versions above (they are the string-method equivalents of the regex (.*) and (.*?) groups).
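As a rough illustration of that correspondence, the sketch below compares the two regex groups with string-method versions on the same input. The helper find_between_greedy is a made-up name that mixes index (for the first marker) with rindex (for the last marker); it is not part of the answer above.

```python
import re

s = "123123STRINGabcabc"
first, last = "123", "abc"

def find_between_greedy(s, first, last):
    """First occurrence of `first`, last occurrence of `last` (like a greedy group)."""
    try:
        start = s.index(first) + len(first)
        end = s.rindex(last, start)
        return s[start:end]
    except ValueError:
        return ""

# Non-greedy: stop at the earliest possible end marker.
lazy = re.search(re.escape(first) + "(.*?)" + re.escape(last), s, re.DOTALL).group(1)
# Greedy: run on to the latest possible end marker.
greedy = re.search(re.escape(first) + "(.*)" + re.escape(last), s, re.DOTALL).group(1)

print(lazy)
print(greedy)
print(find_between_greedy(s, first, last))
```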
start = 'asdf=5;'
end = '123jasd'
s = 'asdf=5;iwantthis123jasd'
print(s[s.find(start)+len(start):s.rfind(end)])
gives
iwantthis
s[len(start):-len(end)]
String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be amended as desired.
import re
s = 'asdf=5;iwantthis123jasd'
start = 'asdf=5;'
end = '123jasd'
result = re.search('%s(.*)%s' % (start, end), s).group(1)
print(result)
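One caveat with building the pattern from arbitrary strings: if start or end contains regex metacharacters such as [ or (, they are interpreted as pattern syntax. A hedged sketch of a safer variant follows. The name extract_between is invented for this example; it escapes both markers with re.escape, uses a non-greedy group, and returns None instead of raising when nothing matches.

```python
import re

def extract_between(s, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    match = re.search(pattern, s, re.DOTALL)
    return match.group(1) if match else None

print(extract_between("total [42.50] USD", "[", "]"))
print(extract_between("asdf=5;iwantthis123jasd", "asdf=5;", "123jasd"))
print(extract_between("no markers here", "[", "]"))
```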
Just converting the OP's own solution into an answer:
def find_between(s, start, end):
    return (s.split(start))[1].split(end)[0]
If you don't want to import anything, try the string method .index():
text = 'I want to find a string between two substrings'
left = 'find a '
right = 'between two'
# Output: 'string'
print(text[text.index(left)+len(left):text.index(right)])
source='your token _here0#df and maybe _here1#df or maybe _here2#df'
start_sep='_'
end_sep='#df'
result=[]
tmp=source.split(start_sep)
for par in tmp:
    if end_sep in par:
        result.append(par.split(end_sep)[0])
print(result)
This prints:
['here0', 'here1', 'here2']
A regex would be tidier, but it means importing re, and you may prefer to stick to plain string methods.
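For comparison, a regex version of the same multi-match extraction might look like the sketch below. re is part of the standard library, so nothing needs installing. The non-greedy group (.*?) stops at the first end_sep after each start_sep.

```python
import re

source = 'your token _here0#df and maybe _here1#df or maybe _here2#df'
start_sep = '_'
end_sep = '#df'

# Escape the separators in case they contain regex metacharacters.
pattern = re.escape(start_sep) + '(.*?)' + re.escape(end_sep)
result = re.findall(pattern, source)
print(result)
```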
Here is one way to do it:
_,_,rest = s.partition(start)
result,_,_ = rest.partition(end)
print(result)
Another way, using a regexp:
import re
print(re.findall(re.escape(start)+"(.*)"+re.escape(end),s)[0])
or
print(re.search(re.escape(start)+"(.*)"+re.escape(end),s).group(1))
Here is a function I wrote that returns a list of the string(s) found between string1 and string2.
def GetListOfSubstrings(stringSubject,string1,string2):
    MyList = []
    intstart=0
    strlength=len(stringSubject)
    continueloop = 1
    while(intstart < strlength and continueloop == 1):
        intindex1=stringSubject.find(string1,intstart)
        if(intindex1 != -1): #The substring was found, lets proceed
            intindex1 = intindex1+len(string1)
            intindex2 = stringSubject.find(string2,intindex1)
            if(intindex2 != -1):
                subsequence=stringSubject[intindex1:intindex2]
                MyList.append(subsequence)
                intstart=intindex2+len(string2)
            else:
                continueloop=0
        else:
            continueloop=0
    return MyList
#Usage Example
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","y68")
for x in range(0, len(List)):
    print(List[x])
output: (nothing is printed, because "y68" never occurs in mystring)
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","3")
for x in range(0, len(List)):
    print(List[x])
output:
2
2
2
2
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","y")
for x in range(0, len(List)):
    print(List[x])
output:
23
23o123pp123
To extract STRING, try:
myString = '123STRINGabc'
startString = '123'
endString = 'abc'
mySubString=myString[myString.find(startString)+len(startString):myString.find(endString)]
You can use the function below. It fits neatly on one line.
def substring(whole, sub1, sub2):
    return whole[whole.index(sub1) : whole.index(sub2)]
If you run the function as follows:
print(substring("5+(5*2)+2", "(", ")"))
you will probably be left with the output:
(5*2
rather than
5*2
If you want both delimiters included in the output, the return line must look like this (the +1 is correct for single-character delimiters):
    return whole[whole.index(sub1) : whole.index(sub2) + 1]
But if you want neither delimiter in the output, the +1 must go on the first index instead:
    return whole[whole.index(sub1) + 1 : whole.index(sub2)]
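For delimiters longer than one character, the fixed +1 no longer lines up. A possible generalisation, offered only as a sketch, skips len(sub1) characters and starts the search for sub2 after the first delimiter. Like the original, it raises ValueError if either delimiter is missing.

```python
def substring(whole, sub1, sub2):
    """Return the text strictly between the first sub1 and the next sub2."""
    start = whole.index(sub1) + len(sub1)   # skip the whole opening delimiter
    end = whole.index(sub2, start)          # look for the closer only after it
    return whole[start:end]

print(substring("5+(5*2)+2", "(", ")"))
print(substring("x = <<value>> ;", "<<", ">>"))
```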
These solutions assume the start string and final string are different. Here is a solution I use for an entire file when the initial and final indicators are the same, assuming the entire file is read using readlines():
def extractstring(line,flag='$'):
    if flag in line: # $ is the flag
        dex1=line.index(flag)
        subline=line[dex1+1:-1] #leave out flag (+1) to end of line
        dex2=subline.index(flag)
        string=subline[0:dex2].strip() #does not include last flag, strip whitespace
        return(string)
Example:
lines=['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
       'afafoaltat $I GOT BETTER!$ derpity derp derp']
for line in lines:
    string=extractstring(line,flag='$')
    print(string)
Gives:
A NEWT?
I GOT BETTER!
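When the opening and closing flag are the same, str.split offers another route. The sketch below is an alternative under the assumption that only the first flagged segment per line is wanted; extract_flagged is a name invented for the example. It returns None when a line does not contain both an opening and a closing flag.

```python
def extract_flagged(line, flag="$"):
    """Return the text between the first two occurrences of flag, or None."""
    parts = line.split(flag)
    # Two flags split the line into at least three parts: before, inside, after.
    if len(parts) < 3:
        return None
    return parts[1].strip()

lines = ['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
         'afafoaltat $I GOT BETTER!$ derpity derp derp',
         'no flags on this line']
for line in lines:
    print(extract_flagged(line))
```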
This is essentially cji's answer - Jul 30 '10 at 5:58.
I changed the try except structure for a little more clarity on what was causing the exception.
import sys

def find_between( inputStr, firstSubstr, lastSubstr ):
    '''
    find between firstSubstr and lastSubstr in inputStr STARTING FROM THE LEFT
    http://stackoverflow.com/questions/3368969/find-string-between-two-substrings
    above also has a func that does this FROM THE RIGHT
    '''
    start, end = (-1,-1)
    try:
        start = inputStr.index( firstSubstr ) + len( firstSubstr )
    except ValueError:
        # report which marker was missing, then the exception message
        print(' ValueError: ', end=' ')
        print("firstSubstr=%s - "%( firstSubstr ), end=' ')
        print(sys.exc_info()[1])
    try:
        end = inputStr.index( lastSubstr, start )
    except ValueError:
        print(' ValueError: ', end=' ')
        print("lastSubstr=%s - "%( lastSubstr ), end=' ')
        print(sys.exc_info()[1])
    return inputStr[start:end]
from timeit import timeit
from re import search, DOTALL
def partition_find(string, start, end):
    return string.partition(start)[2].rpartition(end)[0]
def re_find(string, start, end):
    # applying re.escape to start and end would be safer
    return search(start + '(.*)' + end, string, DOTALL).group(1)
def index_find(string, start, end):
    return string[string.find(start) + len(start):string.rfind(end)]
# The wikitext of the "Alan Turing law" article from English Wikipedia
# https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
string = """..."""
start = '==Proposals=='
end = '==Rival bills=='
assert index_find(string, start, end) \
    == partition_find(string, start, end) \
    == re_find(string, start, end)
print('index_find', timeit(
    'index_find(string, start, end)',
    globals=globals(),
    number=100_000,
))
print('partition_find', timeit(
    'partition_find(string, start, end)',
    globals=globals(),
    number=100_000,
))
print('re_find', timeit(
    're_find(string, start, end)',
    globals=globals(),
    number=100_000,
))
Result:
index_find 0.35047444528454114
partition_find 0.5327825636197754
re_find 7.552149639286381
re_find was more than 20 times slower than index_find in this example.
My method would be something like this:
find index of start string in s => i
find index of end string in s => j
substring = substring(i+len(start) to j-1)
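In Python, that outline could be sketched as follows. The function name between_indices is invented for the example. Because a Python slice excludes its end index, s[i:j] already stops at position j-1, so no explicit -1 is needed. The search for end begins after start so that an earlier occurrence of end is not picked up by mistake.

```python
def between_indices(s, start, end):
    """Return the text between start and the next end, or None if either is missing."""
    i = s.find(start)
    if i == -1:
        return None
    i += len(start)          # first character after the start marker
    j = s.find(end, i)       # search for the end marker only after that point
    if j == -1:
        return None
    return s[i:j]            # slice end is exclusive, so this stops at j-1

print(between_indices('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))
print(between_indices('xx asdf=5;iwantthis123jasd yy', 'asdf=5;', '123jasd'))
```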
I posted this earlier as a code snippet on Daniweb:
# picking up piece of string between separators
# function using partition, like partition, but drops the separators
def between(left,right,s):
    before,_,a = s.partition(left)
    a,_,after = a.partition(right)
    return before,a,after
s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
print(between('<a>','</a>',s))
print(between('(',')',s))
print(between("'","'",s))
""" Output:
('bla bla blaa ', 'data', " lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdjöf ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdjöf (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
"""
Parsing text with delimiters from different email platforms posed a larger-sized version of this problem. They generally have a START and a STOP. Delimiter characters for wildcards kept choking regex. The problem with split is mentioned here & elsewhere - oops, delimiter character gone. It occurred to me to use replace() to give split() something else to consume. Chunk of code:
nuke = '~~~'
start = '|*'
stop = '*|'
julien = (textIn.replace(start,nuke + start).replace(stop,stop + nuke).split(nuke))
keep = [chunk for chunk in julien if start in chunk and stop in chunk]
logging.info('keep: %s',keep)
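To see what the replace-then-split trick produces, here is a runnable sketch with an invented input string. textIn and its contents are assumptions for the example; the only requirement is that the nuke marker never occurs in the real text.

```python
nuke = '~~~'     # throwaway marker; must not appear in the input
start = '|*'
stop = '*|'

# Hypothetical input with two delimited fields.
textIn = 'Hi |*FNAME*|, your code is |*CODE*|.'

# Plant the marker before every start and after every stop, then split on it.
# The delimiters themselves survive because split() only consumes the marker.
chunks = textIn.replace(start, nuke + start).replace(stop, stop + nuke).split(nuke)
keep = [chunk for chunk in chunks if start in chunk and stop in chunk]

print(chunks)
print(keep)
```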
Following on from Nikolaus Gradwohl's answer, I needed to get the version number (i.e., 0.0.2) between 'ui:' and '-' from the file content below (filename: docker-compose.yml):
version: '3.1'
services:
  ui:
    image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
    #network_mode: host
    ports:
      - 443:9999
    ulimits:
      nofile:test
and this is how it worked for me (Python script):
import re
with open('docker-compose.yml', 'r') as f:
    lines = f.read()
result = re.search('ui:(.*)-', lines)
print(result.group(1))
Result:
0.0.2
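The pattern 'ui:(.*)-' works here because . does not match a newline, so the bare "ui:" service key is skipped and the match lands on the image tag. A slightly stricter sketch is shown below, assuming the version is always dot-separated digits. The inline compose_text is a trimmed copy of the file above so the example runs without the file on disk.

```python
import re

# Trimmed stand-in for docker-compose.yml.
compose_text = """version: '3.1'
services:
  ui:
    image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
    ports:
      - 443:9999
"""

# Capture only dot-separated digits between "ui:" and the following "-".
match = re.search(r"ui:(\d+(?:\.\d+)*)-", compose_text)
version = match.group(1) if match else None
print(version)
```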
This seems much more straightforward to me:
import re
s = 'asdf=5;iwantthis123jasd'
x= re.search('iwantthis',s)
print(s[x.start():x.end()])
In R, how can I import the contents of a multiline text file (containing SQL) to a single string?
The sql.txt file looks like this:
SELECT TOP 100
setpoint,
tph
FROM rates
I need to import that text file into an R string such that it looks like this:
> sqlString
[1] "SELECT TOP 100 setpoint, tph FROM rates"
That's so that I can feed it to the RODBC like this
> library(RODBC)
> myconn<-odbcConnect("RPM")
> results<-sqlQuery(myconn,sqlString)
I've tried the readLines command as follows but it doesn't give the string format that RODBC needs.
> filecon<-file("sql.txt","r")
> sqlString<-readLines(filecon, warn=FALSE)
> sqlString
[1] "SELECT TOP 100 " "\t[Reclaim Setpoint Mean (tph)] as setpoint, "
[3] "\t[Reclaim Rate Mean (tph)] as tphmean " "FROM [Dampier_RC1P].[dbo].[Rates]"
>
The versatile paste() command can do that with argument collapse="":
lines <- readLines("/tmp/sql.txt")
lines
[1] "SELECT TOP 100 " " setpoint, " " tph " "FROM rates"
sqlcmd <- paste(lines, collapse="")
sqlcmd
[1] "SELECT TOP 100 setpoint, tph FROM rates"
Below is an R function that reads in a multiline SQL query (from a text file) and converts it into a single-line string. The function removes formatting and whole-line comments.
To use it, run the code to define the functions, and your single-line string will be the result of running
ONELINEQ("querytextfile.sql","~/path/to/thefile").
How it works: Inline comments detail this; it reads each line of the query and deletes (replaces with nothing) whatever isn't needed to write out a single-line version of the query (as asked for in the question). The result is a list of lines, some of which are blank and get filtered out; the last step is to paste this (unlisted) list together and return the single line.
#
# This set of functions allows us to read in formatted, commented SQL queries
# Comments must be entire-line comments, not on same line as SQL code, and begun with "--"
# The parsing function, to be applied to each line:
LINECLEAN <- function(x) {
  x = gsub("\t+", "", x, perl=TRUE); # remove all tabs
  x = gsub("^\\s+", "", x, perl=TRUE); # remove leading whitespace
  x = gsub("\\s+$", "", x, perl=TRUE); # remove trailing whitespace
  x = gsub("[ ]+", " ", x, perl=TRUE); # collapse multiple spaces to a single space
  x = gsub("^[--]+.*$", "", x, perl=TRUE); # destroy any comments
  return(x)
}
# PRETTYQUERY is the filename of your formatted query in quotes, eg "myquery.sql"
# DIRPATH is the path to that file, eg "~/Documents/queries"
ONELINEQ <- function(PRETTYQUERY,DIRPATH) {
  A <- readLines(paste0(DIRPATH,"/",PRETTYQUERY)) # read in the query to a list of lines
  B <- lapply(A,LINECLEAN) # process each line
  C <- Filter(function(x) x != "",B) # remove blank and/or comment lines
  D <- paste(unlist(C),collapse=" ") # paste lines together into one-line string, spaces between.
  return(D)
}
# TODO: add eof newline automatically to remove warning
#############################################################################################
Here's the final version of what I'm using. Thanks Dirk.
fileconn<-file("sql.txt","r")
sqlString<-readLines(fileconn)
sqlString<-paste(sqlString,collapse="")
sqlString<-gsub("\t","", sqlString)
library(RODBC)
sqlconn<-odbcConnect("RPM")
results<-sqlQuery(sqlconn,sqlString)
library(qcc)
tph <- qcc(results$tphmean[1:50], type="xbar.one", ylim=c(4000,12000), std.dev=600)
close(fileconn)
close(sqlconn)
This is what I use:
# Set Filename
fileName <- 'Input File.txt'
doSub <- function(src, dest_var_name, src_pattern, dest_pattern) {
  assign(
    x = dest_var_name
    , value = gsub(
      pattern = src_pattern
      , replacement = dest_pattern
      , x = src
    )
    , envir = .GlobalEnv
  )
}
# Read File Contents
original_text <- readChar(fileName, file.info(fileName)$size)
# Convert to UNIX line ending for ease of use
doSub(src = original_text, dest_var_name = 'unix_text', src_pattern = '\r\n', dest_pattern = '\n')
# Remove Block Comments
doSub(src = unix_text, dest_var_name = 'wo_bc_text', src_pattern = '/\\*.*?\\*/', dest_pattern = '')
# Remove Line Comments
doSub(src = wo_bc_text, dest_var_name = 'wo_bc_lc_text', src_pattern = '--.*?\n', dest_pattern = '')
# Remove Line Endings to get Flat Text
doSub(src = wo_bc_lc_text, dest_var_name = 'flat_text', src_pattern = '\n', dest_pattern = ' ')
# Remove Contiguous Spaces
doSub(src = flat_text, dest_var_name = 'clean_flat_text', src_pattern = ' +', dest_pattern = ' ')
Try paste(sqlString, collapse=" ").
It's possible to use readChar() instead of readLines(). I had an ongoing issue with mixed commenting (-- or /* */) and this has always worked well for me.
sql <- readChar(path.to.file, file.size(path.to.file))
query <- sqlQuery(con, sql, stringsAsFactors = TRUE)
I use sql <- gsub("\n","",sql) and sql <- gsub("\t","",sql) together.