In R, how can I import the contents of a multiline text file (containing SQL) to a single string?
The sql.txt file looks like this:
SELECT TOP 100
setpoint,
tph
FROM rates
I need to import that text file into an R string such that it looks like this:
> sqlString
[1] "SELECT TOP 100 setpoint, tph FROM rates"
That's so that I can feed it to RODBC like this:
> library(RODBC)
> myconn<-odbcConnect("RPM")
> results<-sqlQuery(myconn,sqlString)
I've tried the readLines command as follows but it doesn't give the string format that RODBC needs.
> filecon<-file("sql.txt","r")
> sqlString<-readLines(filecon, warn=FALSE)
> sqlString
[1] "SELECT TOP 100 " "\t[Reclaim Setpoint Mean (tph)] as setpoint, "
[3] "\t[Reclaim Rate Mean (tph)] as tphmean " "FROM [Dampier_RC1P].[dbo].[Rates]"
>
The versatile paste() command can do that with argument collapse="":
lines <- readLines("/tmp/sql.txt")
lines
[1] "SELECT TOP 100 " " setpoint, " " tph " "FROM rates"
sqlcmd <- paste(lines, collapse="")
sqlcmd
[1] "SELECT TOP 100 setpoint, tph FROM rates"
Below is an R function that reads in a multiline SQL query (from a text file) and converts it into a single-line string. The function removes formatting and whole-line comments.
To use it, run the code to define the functions, and your single-line string will be the result of running
ONELINEQ("querytextfile.sql","~/path/to/thefile").
How it works: Inline comments detail this; it reads each line of the query and deletes (replaces with nothing) whatever isn't needed to write out a single-line version of the query (as asked for in the question). The result is a list of lines, some of which are blank and get filtered out; the last step is to paste this (unlisted) list together and return the single line.
#
# This set of functions allows us to read in formatted, commented SQL queries
# Comments must be entire-line comments, not on same line as SQL code, and begun with "--"
# The parsing function, to be applied to each line:
LINECLEAN <- function(x) {
  x = gsub("\t+", "", x, perl=TRUE); # remove all tabs
  x = gsub("^\\s+", "", x, perl=TRUE); # remove leading whitespace
  x = gsub("\\s+$", "", x, perl=TRUE); # remove trailing whitespace
  x = gsub("[ ]+", " ", x, perl=TRUE); # collapse multiple spaces to a single space
  x = gsub("^--.*$", "", x, perl=TRUE); # destroy whole-line comments (lines starting with "--")
  return(x)
}
# PRETTYQUERY is the filename of your formatted query in quotes, eg "myquery.sql"
# DIRPATH is the path to that file, eg "~/Documents/queries"
ONELINEQ <- function(PRETTYQUERY,DIRPATH) {
  A <- readLines(paste0(DIRPATH,"/",PRETTYQUERY)) # read in the query to a list of lines
  B <- lapply(A,LINECLEAN) # process each line
  C <- Filter(function(x) x != "",B) # remove blank and/or comment lines
  D <- paste(unlist(C),collapse=" ") # paste lines together into one-line string, spaces between.
  return(D)
}
# TODO: add eof newline automatically to remove warning
#############################################################################################
Here's the final version of what I'm using. Thanks Dirk.
fileconn<-file("sql.txt","r")
sqlString<-readLines(fileconn)
sqlString<-paste(sqlString,collapse="")
sqlString<-gsub("\t","", sqlString)
library(RODBC)
sqlconn<-odbcConnect("RPM")
results<-sqlQuery(sqlconn,sqlString)
library(qcc)
tph <- qcc(results$tphmean[1:50], type="xbar.one", ylim=c(4000,12000), std.dev=600)
close(fileconn)
close(sqlconn)
This is what I use:
# Set Filename
fileName <- 'Input File.txt'
doSub <- function(src, dest_var_name, src_pattern, dest_pattern) {
assign(
x = dest_var_name
, value = gsub(
pattern = src_pattern
, replacement = dest_pattern
, x = src
)
, envir = .GlobalEnv
)
}
# Read File Contents
original_text <- readChar(fileName, file.info(fileName)$size)
# Convert to UNIX line ending for ease of use
doSub(src = original_text, dest_var_name = 'unix_text', src_pattern = '\r\n', dest_pattern = '\n')
# Remove Block Comments
doSub(src = unix_text, dest_var_name = 'wo_bc_text', src_pattern = '/\\*.*?\\*/', dest_pattern = '')
# Remove Line Comments
doSub(src = wo_bc_text, dest_var_name = 'wo_bc_lc_text', src_pattern = '--.*?\n', dest_pattern = '')
# Remove Line Endings to get Flat Text
doSub(src = wo_bc_lc_text, dest_var_name = 'flat_text', src_pattern = '\n', dest_pattern = ' ')
# Remove Contiguous Spaces
doSub(src = flat_text, dest_var_name = 'clean_flat_text', src_pattern = ' +', dest_pattern = ' ')
Try paste(sqlString, collapse=" ").
It's possible to use readChar() instead of readLines(). I had an ongoing issue with mixed commenting (-- or /* */) and this has always worked well for me.
sql <- readChar(path.to.file, file.size(path.to.file))
query <- sqlQuery(con, sql, stringsAsFactors = TRUE)
I use sql <- gsub("\n","",sql) and sql <- gsub("\t","",sql) together.
I am wondering how to combine the following two functions into one. They remove an entire word from a text if the word contains "_" or "/", respectively.
I have tried the following, and the code fulfils its purpose. It is, however, cumbersome, and I am wondering how to simplify it.
text = "This is _a default/ text"
def filter_string1(string):
    a = []
    for i in string.split():
        if "_" not in i:
            a.append(i)
    return ' '.join(a)
def filter_string2(string):
    a = []
    for i in string.split():
        if "/" not in i:
            a.append(i)
    return ' '.join(a)
text_no_underscore = filter_string1(text)
text_no_underscore_no_slash = filter_string2(text_no_underscore)
print(text_no_underscore_no_slash)
The output is (as desired):
"This is text"
You can combine the if conditions.
text = "This is _a default/ text"
def filter(string):
    a = []
    for i in string.split():
        if "_" not in i and "/" not in i:
            a.append(i)
    return ' '.join(a)
print(filter(text))
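If more characters need to be filtered later, the combined condition can be generalized. The following is only a minimal sketch; the helper name `filter_words` and its `forbidden` parameter are made up for illustration and are not part of the original code.

```python
def filter_words(text, forbidden="_/"):
    """Drop every whitespace-separated word that contains a forbidden character."""
    kept = [
        word
        for word in text.split()
        # keep the word only if none of the forbidden characters occur in it
        if not any(ch in word for ch in forbidden)
    ]
    return " ".join(kept)


print(filter_words("This is _a default/ text"))
```

Passing a different `forbidden` string (for example `"_/#"`) extends the filter without adding another function.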
There is a function called re.sub in python's re module which will let you accomplish this quickly.
def remove_words(text):
    import re
    return re.sub(
        pattern=r'\s_[^/]*/',  # regular expression used to match the parts to remove
        repl='',  # replace matched parts with empty string
        string=text  # use `text` as input
    )
Explaining the regular expression \s_[^/]*/ (by deconstructing its parts):
\s_ match whitespace character followed by underscore
[^/]* match any character sequence not containing a forward slash (sequence may be length 0)
/ match the forward slash
Testing the function:
text = "This is _a default/ text"
text_no_underscore_no_slash = remove_words(text)
print('Result:', text_no_underscore_no_slash)
# Result: This is text
text = "This is _a longer/ _and also custom/ text"
text_no_underscore_no_slash = remove_words(text)
print('Result:', text_no_underscore_no_slash)
# Result: This is text
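By the way, your original code has a bug, I think. On a longer input it keeps the word "also", because it only drops words that themselves contain "_" or "/":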
text = "This is _a longer/ _and also custom/ text"
text_no_underscore = filter_string1(text)
text_no_underscore_no_slash = filter_string2(text_no_underscore)
print(text_no_underscore_no_slash == 'This is text')
# False
I am trying to generate a unique MAC ID from a given number. The number is between 1 and 5 digits long. I have laid out the MAC format table so that the digits are placed starting from the first position of the MAC.
local MacFormat ={[1] = "0A:BC:DE:FA:BC:DE",[2] = "00:BC:DE:FA:BC:DE",[3] = "00:0C:DE:FA:BC:DE",[4] = "00:00:DE:FA:BC:DE",[5] = "00:00:0E:FA:BC:DE"}
local idNumbers = {[1] = "1",[2]="12",[3]="123",[4]="1234",[5]="12345"}
for w in string.gfind(idNumbers[3], "(%d)") do
print(w)
str = string.gsub(MacFormat[3],"0",tonumber(w))
end
print(str)
---output 33:3C:DE:FA:BC:DE
--- Desired Output 12:3C:DE:FA:BC:DE
I have tried multiple Patterns with *, +, ., but none is working.
for w in string.gfind(idNumbers[3], "(%d)") do
print(w)
str = string.gsub(MacFormat[3],"0",tonumber(w))
end
print(str)
Your loop body is equivalent to
str = string.gsub("00:0C:DE:FA:BC:DE", "0",1)
str = string.gsub("00:0C:DE:FA:BC:DE", "0", 2)
str = string.gsub("00:0C:DE:FA:BC:DE", "0", 3)
So str is "33:3C:DE:FA:BC:DE"
MacFormat[3] is never altered and the result of gsub is overwritten in each line.
You can build the pattern and replacement dynamically:
local MacFormat ={[1] = "0A:BC:DE:FA:BC:DE",[2] = "00:BC:DE:FA:BC:DE",[3] = "00:0C:DE:FA:BC:DE",[4] = "00:00:DE:FA:BC:DE",[5] = "00:00:0E:FA:BC:DE"}
local idNumbers = {[1] = "1",[2]="12",[3]="123",[4]="1234",[5]="12345"}
local p = "^" .. ("0"):rep(string.len(idNumbers[3])):gsub("(..)", "%1:")
local repl = idNumbers[3]:gsub("(..)", "%1:")
local str = MacFormat[3]:gsub(p, repl)
print(str)
-- => 12:3C:DE:FA:BC:DE
The pattern is "^" .. ("0"):rep(string.len(idNumbers[3])):gsub("(..)", "%1:"): ^ matches the start of the string, followed by a run of zeros of the same length as idNumbers[3] (see ("0"):rep(string.len(idNumbers[3]))), with a : inserted after each pair of zeros (:gsub("(..)", "%1:")).
The replacement is the idNumbers item with a colon inserted after every second character, built with idNumbers[3]:gsub("(..)", "%1:").
In this case, the pattern will be ^00:0 and the replacement will be 12:3.
From following input string
INPUT => --- text1 +++ text2 ## -71,0 +72,4 ## +t+o+o+l
OUTPUT => --- text1 +++ text2 ## -71,0 +72,4 ## tool
How can I remove + and - signs from string?
This code removes a + or - that appears immediately before or after a letter (lower or upper case). It may be possible with only a single regex but I couldn't figure it out!
import re
# `string` holds the input line; the replacement is a raw string so \g<1> is not treated as a string escape
line = re.sub(r"[\+-]([a-zA-Z])", r"\g<1>", string)
line = re.sub(r"([a-zA-Z])[\+-]", r"\g<1>", line)
You can do this:
import re
s = input()
s = re.sub(r"[\+-]([a-zA-Z])", r"\g<1>", s)
s = re.sub(r"([a-zA-Z])[\+-]", r"\g<1>", s)
print(s)
This removes the + and - signs that sit directly next to a letter; signs elsewhere (such as --- or +72) are left alone.
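The two substitutions can probably be folded into one pattern by using lookarounds instead of capture groups. This is a sketch rather than a drop-in replacement: the names `SIGN_NEXT_TO_LETTER` and `strip_signs` are invented here, and it has only been checked against the input from the question.

```python
import re

# A sign directly followed by a letter, or directly preceded by one.
# Lookarounds match without consuming the letter, so no group is needed.
SIGN_NEXT_TO_LETTER = re.compile(r"[+-](?=[a-zA-Z])|(?<=[a-zA-Z])[+-]")


def strip_signs(line):
    """Remove + and - characters that touch a letter; leave all others."""
    return SIGN_NEXT_TO_LETTER.sub("", line)


print(strip_signs("--- text1 +++ text2 ## -71,0 +72,4 ## +t+o+o+l"))
# --- text1 +++ text2 ## -71,0 +72,4 ## tool
```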
I'm trying to reverse each word in a string individually, so the words stay in order but each one is reversed: "hi my name is" should become "ih ym eman si". However, the whole string gets flipped instead.
r = 0
def readReverse(): #creates the function
    start = default_timer() #initiates a timer
    r = len(n.split()) #n is the users input
    if len(n) == 0:
        return n
    else:
        return n[0] + readReverse(n[::-1])
    duration = default_timer() - start
    print(str(r) + " with a runtime of " + str(duration))
print(readReverse(n))
First split the string into words, punctuation and whitespace with a regular expression like the one below. Then you can use a generator expression to reverse each word individually and finally join them together with str.join.
import re
text = "Hello, I'm a string!"
split_text = re.findall(r"[\w']+|[^\w]", text)
reversed_text = ''.join(word[::-1] for word in split_text)
print(reversed_text)
Output:
olleH, m'I a gnirts!
If you want to ignore the punctuation you can omit the regular expression and just split the string:
text = "Hello, I'm a string!"
reversed_text = ' '.join(word[::-1] for word in text.split())
However, the commas, exclamation marks, etc. will then be a part of the words.
,olleH m'I a !gnirts
Here's the recursive version:
def read_reverse(text):
    idx = text.find(' ')  # Find index of next space character.
    if idx == -1:  # No more spaces left.
        return text[::-1]
    else:  # Split off the first word and reverse it and recurse.
        return text[:idx][::-1] + ' ' + read_reverse(text[idx+1:])
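Another option, shown here only as a sketch, is to let re.sub call a function for every run of non-whitespace characters, which keeps the original spacing intact. The name `reverse_words` is made up for this example.

```python
import re


def reverse_words(text):
    """Reverse each whitespace-delimited word, leaving the whitespace as it is."""
    # \S+ matches one word; the lambda returns that word reversed
    return re.sub(r"\S+", lambda match: match.group(0)[::-1], text)


print(reverse_words("hi my name is"))
# ih ym eman si
```

As with the plain split approach, punctuation attached to a word is reversed along with it.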
I have a string stored in sqlite database and I've assigned it to a var, e.g. string
string = "First line and string. This should be another string in a new line"
I want to split this string into two separate strings: the dot (.) must be replaced with a newline character (\n).
At the moment I'm stuck and any help would be great!!
for row in db:nrows("SELECT * FROM contents WHERE section='accounts'") do
tabledata[int] = string.gsub(row.contentName, "%.", "\n")
int = int+1
end
I tried the other questions posted here on Stack Overflow, but with zero luck.
What about this solution:
s = "First line and string. This should be another string in a new line"
-- %. matches a literal dot; %s* skips the whitespace that follows it
a, b = s:match("([^.]*)%.%s*(.*)")
print(a)
print(b)
Are you looking to actually split the string into two different string objects? If so, maybe this can help. It's a function I wrote to add some functionality to the standard string library. You can use it as-is or rename it to whatever you like.
--[[
string.split (s, p)
====================================================================
Splits the string [s] into substrings wherever pattern [p] occurs.
Returns: a table of substrings or, if no match is made [nil].
--]]
string.split = function(s, p)
    local temp = {}
    local index = 0
    local last_index = string.len(s)
    while true do
        local i, e = string.find(s, p, index)
        if i and e then
            local next_index = e + 1
            local word_bound = i - 1
            table.insert(temp, string.sub(s, index, word_bound))
            index = next_index
        else
            if index > 0 and index <= last_index then
                table.insert(temp, string.sub(s, index, last_index))
            elseif index == 0 then
                temp = nil
            end
            break
        end
    end
    return temp
end
Using it is very simple; it returns a table of strings.
Lua 5.1.4 Copyright (C) 1994-2008 Lua.org, PUC-Rio
> s = "First line and string. This should be another string in a new line"
> t = string.split(s, "%.")
> print(table.concat(t, "\n"))
First line and string
This should be another string in a new line
> print(table.maxn(t))
2