I'm trying to create this table in Redshift from Python using psycopg2:
sql = "CREATE TABLE if not exists " + "<schema>.<tablename> " + \
"( vendorid varchar(4), pickup_datetime TIMESTAMP, " + \
"dropoff_datetime TIMESTAMP, store_and_fwd_flag varchar(1), " + \
"ratecode int, pickup_longitude float(4), pickup_latitude float(4)," + \
"dropoff_logitude float(4), dropoff_latitude float(4), " + \
"passenger_count int, trip_distance float(40), fare_amount float(4), " + \
"extra float(4), mta_tax float(4), tip_amount float(4), " + \
"tolls_amount float(4), ehail_fee float(4), improvement_surcharge float(4), " + \
"total_amount float(4), payment_type varchar(4), trip_type varchar(4)) " + \
"DISTSTYLE EVEN SORTKEY (passenger_count, pickup_datetime);"
The schema and table name are to be entered on the command line, so I need a variable to hold sys.argv[1]. How do I build the statement with that variable, or should I use psycopg2.sql instead?
If I've understood correctly, you want to run your Python script from the command line and pass it arguments that include the schema and table name of the Redshift table.
You can use the argparse library to parse the command line arguments into variables and then concatenate them into the sql string:
import argparse
...
# ---------------------------------------------------------------------------------#
# Parse arguments
# ---------------------------------------------------------------------------------#
parser = argparse.ArgumentParser()
parser.add_argument("schema")
parser.add_argument("table")
args = parser.parse_args()
schema_name = args.schema
table_name = args.table
...
sql = "CREATE TABLE if not exists " + schema_name + "." + table_name + " " +\
"( vendorid varchar(4), pickup_datetime TIMESTAMP, " + \
"dropoff_datetime TIMESTAMP, store_and_fwd_flag varchar(1), " + \
"ratecode int, pickup_longitude float(4), pickup_latitude float(4)," + \
"dropoff_logitude float(4), dropoff_latitude float(4), " + \
"passenger_count int, trip_distance float(40), fare_amount float(4), " + \
"extra float(4), mta_tax float(4), tip_amount float(4), " + \
"tolls_amount float(4), ehail_fee float(4), improvement_surcharge float(4), " + \
"total_amount float(4), payment_type varchar(4), trip_type varchar(4)) " + \
"DISTSTYLE EVEN SORTKEY (passenger_count, pickup_datetime);"
...
You can call your script on the command line like so:
python my_script.py myschema mytable
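Because the schema and table names come straight from the command line, plain concatenation lets arbitrary text into the statement. The sketch below is one possible way to guard against that using only the standard library: it rejects anything that is not a simple identifier before building the SQL. The regular expression, the helper names (identifier, build_parser, build_create_table) and the shortened column list are assumptions made for illustration. psycopg2's own sql module (sql.SQL with sql.Identifier) is the other option the question mentions.

```python
import argparse
import re

# Assumption: only plain, unquoted identifiers (letters, digits, underscore) are allowed.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifier(value):
    """argparse type callback that rejects anything but a simple SQL identifier."""
    if not IDENTIFIER.fullmatch(value):
        raise argparse.ArgumentTypeError(f"invalid SQL identifier: {value!r}")
    return value


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("schema", type=identifier)
    parser.add_argument("table", type=identifier)
    return parser


def build_create_table(schema_name, table_name):
    # Column list shortened for the example; use the full list from the question.
    return (
        f"CREATE TABLE IF NOT EXISTS {schema_name}.{table_name} "
        "(vendorid varchar(4), pickup_datetime TIMESTAMP, passenger_count int) "
        "DISTSTYLE EVEN SORTKEY (passenger_count, pickup_datetime);"
    )


# Passing an explicit list here stands in for the real command line (sys.argv[1:]).
args = build_parser().parse_args(["myschema", "mytable"])
sql = build_create_table(args.schema, args.table)
print(sql)
```

With this check in place, an argument such as "mytable; drop table x" makes argparse exit with an error instead of reaching the database.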
How do you get information (last modification timestamp, file size) about a file which is included in the installer? It's easy to reference a file on disk, by using its path. But how do you reference a file in the installer when it doesn't have a path?
When the installer initialises, I would like to check if any of the files to be installed are already on the disk. For those files that are already on the disk (same last modification timestamp and same file size), I would like to display a window for the user to be able to select which ones they want to overwrite.
The following preprocessor code generates a Pascal Script function that returns the timestamp string for a given relative file path:
#define SourcePath "C:\myappfiles"
function GetSourceFileDateTimeString(FileName: string): string;
begin
#define FileEntry(FileName, SourcePath) \
Local[0] = GetFileDateTimeString(SourcePath, "yyyy/mm/dd hh:nn:ss", "-", ":"),\
" if SameText(FileName, '" + FileName + "') then " + \
"Result := '" + Local[0] + "'" + NewLine + \
" else" + NewLine
#define ProcessFile(Root, Path, FindResult, FindHandle) \
FindResult \
? \
Local[0] = FindGetFileName(FindHandle), \
Local[1] = (Len(Path) > 0 ? Path + "\" : "") + Local[0], \
Local[2] = Root + "\" + Local[1], \
(Local[0] != "." && Local[0] != ".." \
? (DirExists(Local[2]) \
? ProcessFolder(Root, Local[1]) \
: FileEntry(Local[1], Local[2])) \
: "") + \
ProcessFile(Root, Path, FindNext(FindHandle), FindHandle) \
: \
""
#define ProcessFolder(Root, Path) \
Local[0] = FindFirst(Root + "\" + Path + "\*", faAnyFile), \
ProcessFile(Root, Path, Local[0], Local[0])
#emit ProcessFolder(SourcePath, "")
RaiseException(Format('Unexpected file "%s"', [FileName]));
end;
The generated script will be like:
function GetSourceFileDateTimeString(FileName: string): string;
begin
if SameText(FileName, 'sub1\file1.exe') then Result := '2022-02-16 18:18:11'
else
if SameText(FileName, 'sub2\file2.exe') then Result := '2022-02-16 18:18:11'
else
if SameText(FileName, 'file3.exe') then Result := '2022-02-19 09:50:14'
else
if SameText(FileName, 'file4.exe') then Result := '2022-02-19 09:50:14'
else
RaiseException(Format('Unexpected file "%s"', [FileName]));
end;
(See Inno Setup: How do I see the output (translation) of the Inno Setup Preprocessor?)
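To make the recursion in the preprocessor macros easier to follow, here is a rough Python analogue of the same idea: walk a source folder and emit one "if SameText(...)" branch per file. It is only an illustration of the logic and not something Inno Setup runs. The function name generate_if_chain and the temporary demo files are assumptions, and the file order may differ from what FindFirst/FindNext produce.

```python
import os
import tempfile
import time


def generate_if_chain(root):
    """Return the body of the Pascal function as text, one branch per file under root."""
    lines = []
    for folder, subdirs, files in os.walk(root):
        subdirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            full = os.path.join(folder, name)
            # Path relative to root, with backslashes as in the generated script.
            rel = os.path.relpath(full, root).replace(os.sep, "\\")
            stamp = time.strftime(
                "%Y-%m-%d %H:%M:%S", time.localtime(os.path.getmtime(full))
            )
            lines.append(f"  if SameText(FileName, '{rel}') then Result := '{stamp}'")
            lines.append("  else")
    lines.append("""    RaiseException(Format('Unexpected file "%s"', [FileName]));""")
    return "\n".join(lines)


# Demo on a throwaway folder with one file at the top level and one in a subfolder.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "sub1"))
    for rel_path in ("file3.exe", os.path.join("sub1", "file1.exe")):
        with open(os.path.join(demo_root, rel_path), "w") as handle:
            handle.write("demo")
    chain = generate_if_chain(demo_root)

print(chain)
```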
One of the PySpark arguments is a SQL query (a string with spaces).
I tried passing it as \"select * from table\" and as "select * from table".
But it is not treated as a single string: the shell expands the * in select * as a glob, which corrupts the SQL.
Example: the query above was converted to \"select' folder1 file1.zip from 'table\"
Driver Logs:
PYSPARK_ARGS=
+ '[' -n 'process --query \"select * from table\"' ']'
+ PYSPARK_ARGS='process --query \"select * from table\"'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' 3 == 2 ']'
+ '[' 3 == 3 ']'
++ python3 -V
+ pyv3='Python 3.7.3'
+ export PYTHON_VERSION=3.7.3
+ PYTHON_VERSION=3.7.3
+ export PYSPARK_PYTHON=python3
+ PYSPARK_PYTHON=python3
+ export PYSPARK_DRIVER_PYTHON=python3
+ PYSPARK_DRIVER_PYTHON=python3
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@" $PYSPARK_PRIMARY $PYSPARK_ARGS)
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=xx.xx.xx.xx --deploy-mode client --class org.apache.spark.deploy.PythonRunner file:/usr/local/bin/process_sql.py process --query '\"select' folder1 file1.zip from 'table\"'
Is there a way to safely pass a string argument that contains spaces and single or double quotes?
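The CMD line in the log expands $PYSPARK_ARGS without quotes, so the shell splits the value on whitespace and globs the * no matter how the query was quoted at submission. The sketch below first shows the difference between naive whitespace splitting and quote-aware tokenizing, then one possible workaround: encode the query so it contains no spaces, quotes or glob characters, and decode it inside the job. The helper names encode_arg and decode_arg are hypothetical, and whether this fits your submission path is an assumption.

```python
import base64
import shlex

query = "select * from table"

# What unquoted expansion does: split on whitespace (the shell then globs "*").
naive_tokens = query.split()

# What quote-aware tokenizing would give for a properly quoted command line.
quoted_tokens = shlex.split('process --query "select * from table"')


def encode_arg(text):
    """Encode text as URL-safe base64 so it survives word splitting and globbing."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def decode_arg(encoded):
    """Reverse encode_arg inside the PySpark job."""
    return base64.urlsafe_b64decode(encoded.encode("ascii")).decode("utf-8")


encoded_query = encode_arg(query)
decoded_query = decode_arg(encoded_query)

print(naive_tokens)
print(quoted_tokens)
print(encoded_query)
print(decoded_query)
```

The job would then receive something like --query followed by the encoded value and call decode_arg on it before running the SQL.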
I have an Inno Setup 6.1.2 setup script where the version main.sub.batch is formed like this:
#define AppVerText() \
GetVersionComponents('..\app\bin\Release\app.exe', \
Local[0], Local[1], Local[2], Local[3]), \
Str(Local[0]) + "." + Str(Local[1]) + "." + Str(Local[2])
Later in the setup part, I use it for the name of setup package:
[Setup]
OutputBaseFilename=app.{#AppVerText}.x64
The resulting filename will be app.1.0.2.x64.exe, which is fine. To make it perfect, I'd like to end up with the form app.1.00.002.x64.exe, with zero-padded components.
I did not find anything like PadLeft in the documentation. I also don't understand how to use my own Pascal function in this context. Can I define a function in the Code section for this?
A quick and dirty solution to pad the numbers in the Inno Setup preprocessor:
#define AppVerText() \
GetVersionComponents('..\app\bin\Release\app.exe', \
Local[0], Local[1], Local[2], Local[3]), \
Str(Local[0]) + "." + \
(Local[1] < 10 ? "0" : "") + Str(Local[1]) + "." + \
(Local[2] < 100 ? "0" : "") + (Local[2] < 10 ? "0" : "") + Str(Local[2])
If you want a generic function for the padding, use this:
#define PadStr(S, C, L) Len(S) < L ? C + PadStr(S, C, L - 1) : S
Use it like:
#define AppVerText() \
GetVersionComponents('MyProg.exe', \
Local[0], Local[1], Local[2], Local[3]), \
Str(Local[0]) + "." + PadStr(Str(Local[1]), "0", 2) + "." + \
PadStr(Str(Local[2]), "0", 3)
Pascal Script code (like this one) won't help here, as it runs at install time, while you need this at compile time. So the preprocessor is the only way.
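The recursion in PadStr can be easier to check in a general-purpose language. The following Python sketch mirrors it step by step: while the string is shorter than the target length, prepend the pad character and recurse with the length reduced by one. The names pad_str and format_version are made up for this illustration.

```python
def pad_str(s, c, length):
    """Mirror of: Len(S) < L ? C + PadStr(S, C, L - 1) : S"""
    if len(s) < length:
        # Each recursion step adds one pad character and lowers the target by one.
        return c + pad_str(s, c, length - 1)
    return s


def format_version(major, minor, batch):
    """Build main.sub.batch with the sub part padded to 2 and the batch part to 3 digits."""
    return str(major) + "." + pad_str(str(minor), "0", 2) + "." + pad_str(str(batch), "0", 3)


version = format_version(1, 0, 2)
print(version)
```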
Why does the following code raise an exception on the last line?
print(m)
print(b)
print(r)
print(p)
print(se)
print("m: " + m + "\n" + "b: " + b + "\n" + "r: " + r + "\n" + "p " + p + "\n" + "se " + se)
1516.13788561
-5731.63903831
0.858729519032
4.15127287882e-05
250.925294078
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-195-7e2d88a4b5a8> in <module>()
9 print(se)
10
---> 11 print("m: " + m + "\n" + "b: " + b + "\n" + "r: " + r + "\n" + "p " + p + "\n" + "se " + se)
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
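The traceback mentions a ufunc, which suggests the values are NumPy scalars: adding one to a string is dispatched to NumPy's add, which has no loop for string operands. Convert the variables to strings before concatenating them: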
print("m: " + str(m) + "\n" + "b: " + str(b) + "\n" + "r: " + str(r) + "\n" + "p " + str(p) + "\n" + "se " + str(se))
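An alternative that avoids the chain of str() calls is an f-string, which converts each value to text for you. This is a minimal sketch with plain Python floats standing in for the values from the question. The function name format_stats is made up for the example.

```python
def format_stats(m, b, r, p, se):
    """Return the five regression values as one multi-line string."""
    # Each {name} placeholder is converted to text automatically.
    return f"m: {m}\nb: {b}\nr: {r}\np: {p}\nse: {se}"


report = format_stats(1516.5, -5731.25, 0.75, 1e-05, 250.5)
print(report)
```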
I have a project where I build a dotmatrix module and then import it into another program that will print out my initials. I have it done to the point where it will print out my initials, but it won't print the "J" in the
initial = dotmatrix.dotJ("J")
It will just print the "*". I have defined in my module:
def dotJ(char):
"""Creates a capital J in 7 x 7 dots"""
dotJ = " * \n"
dotJ += " * \n"
dotJ += " * \n"
dotJ += " * \n"
dotJ += " * * \n"
dotJ += " * * \n"
dotJ += " *** \n"
return dotJ
Where I have the *, I want it to print whatever character is passed in the call:
initial = dotmatrix.dotJ("J")
A simple way would be to define your dotJ variable with a star and then replace the star with the character:
def dotJ(char):
"""Creates a capital J in 7 x 7 dots"""
dotJ = " * \n"
dotJ += " * \n"
dotJ += " * \n"
dotJ += " * \n"
dotJ += " * * \n"
dotJ += " * * \n"
dotJ += " *** \n"
dotJ = dotJ.replace("*",char)
return dotJ
You could also use a format string, but I suspect that would be harder to read.
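For comparison, a format-string version might look like the sketch below. The exact spacing of the rows is an assumption, since the original column layout is not recoverable from the post. The function name dot_j and the placeholder name c are also chosen just for this example.

```python
def dot_j(char):
    """Create a capital J from the given character (layout is illustrative)."""
    rows = [
        "    {c}  ",
        "    {c}  ",
        "    {c}  ",
        "    {c}  ",
        "{c}   {c}  ",
        "{c}   {c}  ",
        " {c}{c}{c}   ",
    ]
    # Join the rows, end with a newline, then fill every placeholder with char.
    return ("\n".join(rows) + "\n").format(c=char)


initial = dot_j("J")
print(initial)
```

Compared with replace, the placeholders make it explicit where the character goes, at the cost of rows that no longer look like the final picture.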