Defining a Python Apache Beam schema from dictionary rows - python-3.x

I would like to obtain row schemas in Apache Beam (Python) for use with SQL transforms. However, I ran into the issue explained below.
The schema is defined as follows:
import typing

import apache_beam as beam
from apache_beam import coders

class RowSchema(typing.NamedTuple):
    colA: str
    colB: typing.Optional[str]

coders.registry.register_coder(RowSchema, coders.RowCoder)
The following example infers the schema correctly:
with beam.Pipeline(options=pipeline_options) as p:
    pcol = (p
            | "Create" >> beam.Create(
                [
                    RowSchema(colA='a1', colB='b1'),
                    RowSchema(colA='a2', colB=None)])
              .with_output_types(RowSchema)
            | beam.Map(print)
            )
The following attempt, however, raises "ValueError: Type names and field names must be valid identifiers: 'run.<locals>.RowSchema'"
with beam.Pipeline(options=pipeline_options) as p:
    pcol = (p
            | "Create" >> beam.Create(
                [
                    {'colA': 'a1', 'colB': 'b1'},
                    {'colA': 'a2', 'colB': None}])
            | 'ToRow' >> beam.Map(
                lambda x: RowSchema(**x))
              .with_output_types(RowSchema)
            | beam.Map(print)
            )
Full stack trace:
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "home/src/main.py", line 326, in <module>
    run()
  File "home/src/main.py", line 267, in run
    | 'ToRow' >> beam.Map(lambda x: RowSchema(**x)).with_output_types(RowSchema)
  File "home/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 1661, in Map
    pardo = FlatMap(wrapper, *args, **kwargs)
  File "home/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 1606, in FlatMap
    pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
  File "home/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 1217, in __init__
    super().__init__(fn, *args, **kwargs)
  File "home/lib/python3.9/site-packages/apache_beam/transforms/ptransform.py", line 861, in __init__
    self.fn = pickler.loads(pickler.dumps(self.fn))
  File "home/lib/python3.9/site-packages/apache_beam/internal/pickler.py", line 51, in loads
    return desired_pickle_lib.loads(
  File "home/lib/python3.9/site-packages/apache_beam/internal/dill_pickler.py", line 289, in loads
    return dill.loads(s)
  File "home/lib/python3.9/site-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "home/lib/python3.9/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "home/lib/python3.9/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "home/lib/python3.9/site-packages/dill/_dill.py", line 788, in _create_namedtuple
    t = collections.namedtuple(name, fieldnames)
  File "/usr/lib/python3.9/collections/__init__.py", line 390, in namedtuple
    raise ValueError('Type names and field names must be valid '
ValueError: Type names and field names must be valid identifiers: 'run.<locals>.RowSchema'
The failed attempt works if I change the schema definition to:
RowSchema = typing.NamedTuple('RowSchema', [('colA', str), ('colB', typing.Optional[str])])
The failing snippet appears to be correctly structured according to some of the references below.
References:
Apache Beam infer schema using NamedTuple (Python)
https://beam.apache.org/documentation/programming-guide/#inferring-schemas
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_xlang_sql.py
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/sql_taxi.py
Tested on Python 3.9, Beam 2.37.0, and multiple runners including DirectRunner, DataflowRunner and PortableRunner.

Solved it by simply moving the schema definition outside the run function. As the stack trace shows, dill reconstructs the NamedTuple by calling collections.namedtuple(name, fieldnames); a class defined inside run gets the qualified name 'run.<locals>.RowSchema', which is not a valid identifier, so unpickling fails. At module level the name is just 'RowSchema' and the pickling round-trip succeeds.
class RowSchema(typing.NamedTuple):
    colA: str
    colB: typing.Optional[str]

coders.registry.register_coder(RowSchema, coders.RowCoder)

def run(argv=None, save_main_session=True):
    ...
    with beam.Pipeline(options=pipeline_options) as p:
        ...
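For reference, a minimal end-to-end sketch with the schema at module level. This is a sketch only: the pipeline options from the question are omitted, and the input dicts mirror the failing attempt.
import typing

import apache_beam as beam
from apache_beam import coders

class RowSchema(typing.NamedTuple):  # module level, so its qualified name is a valid identifier for dill
    colA: str
    colB: typing.Optional[str]

coders.registry.register_coder(RowSchema, coders.RowCoder)

def run():
    with beam.Pipeline() as p:  # add PipelineOptions here as needed
        _ = (p
             | 'Create' >> beam.Create([
                   {'colA': 'a1', 'colB': 'b1'},
                   {'colA': 'a2', 'colB': None}])
             | 'ToRow' >> beam.Map(lambda x: RowSchema(**x)).with_output_types(RowSchema)
             | beam.Map(print))

if __name__ == '__main__':
    run()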

Related

How do I search for a Django model by a primary key that doesn't match its type without throwing an error?

I'm using Django 3 and Python 3.7. I have a model (a MySQL 8-backed table) with integer primary keys, and code that looks up such models like so:
state = State.objects.get(pk=locality['state'])
The issue is that if locality['state'] contains an empty string, I get the error below:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/fields/__init__.py", line 1768, in get_prep_value
    return int(value)
ValueError: invalid literal for int() with base 10: ''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/davea/Documents/workspace/chicommons/maps/web/tests/test_serializers.py", line 132, in test_coop_create_with_incomplete_data
    assert not serializer.is_valid(), serializer.errors
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/rest_framework/serializers.py", line 234, in is_valid
    self._validated_data = self.run_validation(self.initial_data)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/rest_framework/serializers.py", line 433, in run_validation
    value = self.to_internal_value(data)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/rest_framework/serializers.py", line 490, in to_internal_value
    validated_value = field.run_validation(primitive_value)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/rest_framework/fields.py", line 565, in run_validation
    value = self.to_internal_value(data)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/rest_framework/relations.py", line 519, in to_internal_value
    return [
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/rest_framework/relations.py", line 520, in <listcomp>
    self.child_relation.to_internal_value(item)
  File "/Users/davea/Documents/workspace/chicommons/maps/web/directory/serializers.py", line 26, in to_internal_value
    state = State.objects.get(pk=locality['state'])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/query.py", line 404, in get
    clone = self._chain() if self.query.combinator else self.filter(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/query.py", line 904, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/query.py", line 923, in _filter_or_exclude
    clone.query.add_q(Q(*args, **kwargs))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/sql/query.py", line 1337, in add_q
    clause, _ = self._add_q(q_object, self.used_aliases)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/sql/query.py", line 1362, in _add_q
    child_clause, needed_inner = self.build_filter(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/sql/query.py", line 1298, in build_filter
    condition = self.build_lookup(lookups, col, value)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/sql/query.py", line 1155, in build_lookup
    lookup = lookup_class(lhs, rhs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/lookups.py", line 22, in __init__
    self.rhs = self.get_prep_lookup()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/lookups.py", line 72, in get_prep_lookup
    return self.lhs.output_field.get_prep_value(self.rhs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/django/db/models/fields/__init__.py", line 1770, in get_prep_value
    raise e.__class__(
ValueError: Field 'id' expected a number but got ''.
Is there a more "Django" way to search for an object without an error being thrown if the object doesn't exist? I could do this
state = None if str(type(locality['state'])) != "<class 'int'>" else State.objects.get(pk=locality['state'])
but this seems unnecessarily wordy and not how Django was intended to be used.
I would choose the "ask forgiveness, not permission" strategy:
try:
    state = State.objects.get(pk=int(locality['state']))
except ValueError:
    state = None
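If locality['state'] can also be None (an assumption beyond the question's empty-string case), note that int(None) raises TypeError rather than ValueError, so the same pattern can catch both:
try:
    state = State.objects.get(pk=int(locality['state']))
except (ValueError, TypeError):
    state = None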
You could use a logical AND to validate the dict value before using it to look up the data.
state = locality['state'] and State.objects.get(pk=locality['state'])
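One caveat with this approach: and returns its first operand when that operand is falsy, so an empty string leaves state set to '' rather than None:
locality = {'state': ''}
state = locality['state'] and State.objects.get(pk=locality['state'])
# State.objects.get is never called; state is '' here, not None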

Dask - trying to read hdfs data getting error ArrowIOError: HDFS file does not exist

I tried creating a DataFrame from a CSV file stored in HDFS. Connecting is successful, but when I try to get the output of the len function I get an error.
Code:
from dask_yarn import YarnCluster
from dask.distributed import Client, LocalCluster
import dask.dataframe as dd
import subprocess
import os
# GET HDFS CLASSPATH
classpath = subprocess.Popen(["/usr/hdp/current/hadoop-client/bin/hadoop", "classpath", "--glob"], stdout=subprocess.PIPE).communicate()[0]
os.environ["HADOOP_HOME"] = "/usr/hdp/current/hadoop-client"
os.environ["ARROW_LIBHDFS_DIR"] = "/usr/hdp/3.1.4.0-315/usr/lib/"
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java/"
os.environ["CLASSPATH"] = classpath.decode("utf-8")
# GET HDFS CLASSPATH
classpath = subprocess.Popen(["/usr/hdp/current/hadoop-client/bin/hadoop", "classpath", "--glob"], stdout=subprocess.PIPE).communicate()[0]
cluster = YarnCluster(environment='python:///opt/anaconda3/bin/python3', worker_vcores=32, worker_memory="128GiB", n_workers=10)
client = Client(cluster)
client
df = dd.read_csv('hdfs://masterha/data/batch/82.csv')
len(df)
Error:
>>> len(ddf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.7/site-packages/dask/dataframe/core.py", line 504, in __len__
    len, np.sum, token="len", meta=int, split_every=False
  File "/opt/anaconda3/lib/python3.7/site-packages/dask/base.py", line 165, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/dask/base.py", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 2539, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 1839, in gather
    asynchronous=asynchronous,
  File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 756, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/opt/anaconda3/lib/python3.7/site-packages/distributed/utils.py", line 333, in sync
    raise exc.with_traceback(tb)
  File "/opt/anaconda3/lib/python3.7/site-packages/distributed/utils.py", line 317, in f
    result[0] = yield future
  File "/opt/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 1695, in _gather
    raise exception.with_traceback(traceback)
  File "/opt/anaconda3/lib/python3.7/site-packages/dask/bytes/core.py", line 181, in read_block_from_file
    with copy.copy(lazy_file) as f:
  File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/core.py", line 88, in __enter__
    f = self.fs.open(self.path, mode=mode)
  File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/implementations/hdfs.py", line 116, in <lambda>
    return lambda *args, **kw: getattr(PyArrowHDFS, item)(self, *args, **kw)
  File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/spec.py", line 708, in open
    path, mode=mode, block_size=block_size, autocommit=ac, **kwargs
  File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/implementations/hdfs.py", line 116, in <lambda>
    return lambda *args, **kw: getattr(PyArrowHDFS, item)(self, *args, **kw)
  File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/implementations/hdfs.py", line 72, in _open
    return HDFSFile(self, path, mode, block_size, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/implementations/hdfs.py", line 171, in __init__
    self.fh = fs.pahdfs.open(path, mode, block_size, **kwargs)
  File "pyarrow/io-hdfs.pxi", line 431, in pyarrow.lib.HadoopFileSystem.open
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: HDFS file does not exist: /data/batch/82.csv
It looks like your file "/data/batch/82.csv" doesn't exist. You might want to verify that you have the right path.
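One way to check what the cluster actually sees is to list the directory through fsspec, which is what the traceback shows dask using. A sketch, assuming the masterha nameservice from the question's URL resolves the same way here:
import fsspec

# Connect to the same HDFS namenode that 'hdfs://masterha/...' resolved against
fs = fsspec.filesystem('hdfs', host='masterha')
print(fs.ls('/data/batch'))  # confirm that 82.csv is actually listed here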

snakemake.remote.EGA: NameError: name 'self' is not defined

I want to use the snakemake utility (5.6.0) with files stored on the EGA. First I wanted to try the code from the official documentation, so I wrote this:
import snakemake.remote.EGA as EGA

ega = EGA.RemoteProvider()

rule get_remote_file_ega:
    input:
        ega.remote("ega/dataset_id/foo.bam")
    output:
        "data/foo.bam"
    shell:
        "cp {input} {output}"
Before executing the script I created environment variables as necessary (EGA_USERNAME and EGA_PASSWORD).
Then I get the following error:
me:~/scripts$ snakemake -s test_ega.smk
Building DAG of jobs...
Traceback (most recent call last):
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/__init__.py", line 551, in snakemake
    export_cwl=export_cwl)
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/workflow.py", line 433, in execute
    dag.init()
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/dag.py", line 122, in init
    job = self.update([job], progress=progress)
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/dag.py", line 603, in update
    progress=progress)
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/dag.py", line 655, in update_
    missing_input = job.missing_input
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/jobs.py", line 396, in missing_input
    for f in self.input
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/jobs.py", line 397, in <genexpr>
    if not f.exists and not f in self.subworkflow_input)
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/io.py", line 208, in exists
    return self.exists_remote
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/io.py", line 119, in wrapper
    v = func(self, *args, **kwargs)
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/io.py", line 258, in exists_remote
    return self.remote_object.exists()
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/remote/EGA.py", line 173, in exists
    return self.parts.path in self.provider.get_files(self.parts.dataset)
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/remote/EGA.py", line 126, in get_files
    "data/metadata/datasets/{dataset}/files".format(dataset=dataset))
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/remote/EGA.py", line 96, in api_request
    headers["Authorization"] = "Bearer {}".format(self.token)
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/remote/EGA.py", line 77, in token
    self._login()
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/remote/EGA.py", line 45, in _login
    "client_id" : self._client_id(),
  File "/home/puissant/miniconda3/lib/python3.7/site-packages/snakemake/remote/EGA.py", line 151, in _client_id
    return self._credentials("EGA_CLIENT_ID")
NameError: name 'self' is not defined
The part of the code involved is here (EGA.py line 151):
1 @classmethod
2 def _client_id(cls):
3     return self._credentials("EGA_CLIENT_ID")
Could the error come from self being used instead of cls on line 3? After changing it to cls, the error moved to the next block, which is built the same way. My understanding of Python objects is limited, so I hope I'm not saying anything too absurd.
Have I forgotten any steps or misunderstood any of them?
You are correct: you should be using cls (which stands for 'class' here) rather than self. self is conventionally the name bound to instances of classes, i.e. objects, and it is simply not defined inside a classmethod. If self is used elsewhere in the function, those uses need to be switched to cls as well.
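A minimal illustration of why the name self is undefined inside a classmethod:
class Example:
    @classmethod
    def good(cls):
        # the first argument of a classmethod is the class itself, named cls
        return cls.__name__

    @classmethod
    def bad(cls):
        return self.__name__  # NameError: name 'self' is not defined

Example.good()  # returns 'Example'
Example.bad()   # raises NameError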

Using filepaths as global variables in Python

I have a file global_vars.py that contains file paths saved as variables:
from pandas import Timestamp
final_vol_path = 'datasets/final_vols.csv'
final_price_path = 'datasets/final_prices.csv'
final_start_date = Timestamp('2017-01-01')
with other variables written in a similar fashion. However, the functions that I'm using to read in the data throw a FileNotFoundError when attempting to do the following in file1.py:
import scripts.global_vars as gv
read_data(gv.final_vol_path, gv.final_price_path) # throws FileNotFoundError
read_data('datasets/final_vols.csv', 'datasets/final_prices.csv') # this passes
Additionally, I've checked the file paths, and have gotten the following:
gv.final_vol_path == 'datasets/final_vols.csv' # returns True
gv.final_price_path == 'datasets/final_prices.csv' # returns True
Moreover, the pandas Timestamp object is processed without any problems.
Is there any explanation for why the FileNotFoundError is being thrown when attempting to access the file path as a variable from global_vars.py, but is not thrown when the actual string is passed in?
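One quick debugging step is to print the repr of what read_data actually receives; a sketch reusing the names from the question:
import scripts.global_vars as gv

print(repr(gv.final_vol_path))
# a str prints as 'datasets/final_vols.csv'; a bytes object prints as
# b'datasets/final_vols.csv', matching the b seen in the traceback below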
EDIT: The overall directory structure is as follows:
working_dir
L file1.py
L scripts
    L global_vars.py
L datasets
    L final_vols.csv
    L final_prices.csv
EDIT 2: I added a try/except block to make sure the rest of the function doesn't break. I'm not sure whether that affected the traceback, but here's what I get:
Traceback (most recent call last):
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Ananth\Anaconda3\envs\analytics-cpu\Scripts\nose2.exe\__main__.py", line 9, in <module>
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\main.py", line 306, in discover
    return main(*args, **kwargs)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\main.py", line 100, in __init__
    super(PluggableTestProgram, self).__init__(**kw)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\unittest\main.py", line 93, in __init__
    self.parseArgs(argv)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\main.py", line 133, in parseArgs
    self.createTests()
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\main.py", line 258, in createTests
    self.testNames, self.module)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\loader.py", line 69, in loadTestsFromNames
    for name in event.names]
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\loader.py", line 69, in <listcomp>
    for name in event.names]
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\loader.py", line 84, in loadTestsFromName
    result = self.session.hooks.loadTestsFromName(event)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\events.py", line 224, in __call__
    result = getattr(plugin, self.method)(event)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\plugins\loader\testcases.py", line 56, in loadTestsFromName
    result = util.test_from_name(name, module)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\util.py", line 106, in test_from_name
    parent, obj = object_from_name(name, module)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\nose2\util.py", line 117, in object_from_name
    module = __import__('.'.join(parts_copy))
  File "C:\Users\Ananth\Desktop\Modules\PortfolioVARModule\tests\test_simulation.py", line 24, in <module>
    gv.test_start_date)
  File "C:\Users\Ananth\Desktop\Modules\PortfolioVARModule\scripts\prep_data.py", line 119, in read_data
    priceDF = pd.read_csv(pricepath).dropna()
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\pandas\io\parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\pandas\io\parsers.py", line 730, in __init__
    self._make_engine(self.engine)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\pandas\io\parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "c:\users\ananth\anaconda3\envs\analytics-cpu\lib\site-packages\pandas\io\parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas\parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4184)
  File "pandas\parser.pyx", line 667, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:8449)
FileNotFoundError: File b'datasets/corn_price.csv' does not exist
The problem is the letter b in front of your file's path: the path is a bytes object rather than a str, i.e. it was encoded to UTF-8 somewhere along the way.
Try:
read_data(str(gv.final_vol_path, 'utf-8'), str(gv.final_price_path, 'utf-8'))
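Equivalently, a bytes path can be decoded back to a str; a small sketch using the filename from the traceback:
path = b'datasets/corn_price.csv'  # a bytes path, as the b prefix in the traceback suggests
path_str = path.decode('utf-8')    # 'datasets/corn_price.csv', a plain str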

sqlalchemy insert - string argument without an encoding

The code below worked when using Python 2.7, but raises a StatementError when using Python 3.5. I haven't found a good explanation for this online yet.
Why doesn't sqlalchemy accept simple Python 3 string objects in this situation? Is there a better way to insert rows into a table?
from sqlalchemy import Table, MetaData, create_engine
import json

def add_site(site_id):
    engine = create_engine('mysql+pymysql://root:password@localhost/database_name',
                           encoding='utf8', convert_unicode=True)
    metadata = MetaData()
    conn = engine.connect()
    table_name = Table('table_name', metadata, autoload=True, autoload_with=engine)
    site_name = 'Buffalo, NY'
    p_profile = {"0": 300, "1": 500, "2": 100}
    conn.execute(table_name.insert().values(updated=True,
                                            site_name=site_name,
                                            site_id=site_id,
                                            p_profile=json.dumps(p_profile)))

add_site(121)
EDIT: The table was previously created with this function (Column, Integer, Sequence, Boolean, SMALLINT and BLOB imported from sqlalchemy):
def create_table():
    engine = create_engine('mysql+pymysql://root:password@localhost/database_name')
    metadata = MetaData()
    # Create table for updating sites.
    table_name = Table('table_name', metadata,
                       Column('id', Integer, Sequence('user_id_seq'), primary_key=True),
                       Column('updated', Boolean),
                       Column('site_name', BLOB),
                       Column('site_id', SMALLINT),
                       Column('p_profile', BLOB))
    metadata.create_all(engine)
EDIT: Full error:
>>> scd.add_site(121)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1073, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/default.py", line 610, in _init_compiled
    for key in compiled_params
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/default.py", line 610, in <genexpr>
    for key in compiled_params
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/sql/sqltypes.py", line 834, in process
    return DBAPIBinary(value)
  File "/usr/local/lib/python3.5/dist-packages/pymysql/__init__.py", line 79, in Binary
    return bytes(x)
TypeError: string argument without an encoding

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user1/Desktop/server_algorithm/database_tools.py", line 194, in add_site
    failed_acks=json.dumps(p_profile)))
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1078, in _execute_context
    None, None)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/util/compat.py", line 185, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1073, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/default.py", line 610, in _init_compiled
    for key in compiled_params
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/default.py", line 610, in <genexpr>
    for key in compiled_params
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/sql/sqltypes.py", line 834, in process
    return DBAPIBinary(value)
  File "/usr/local/lib/python3.5/dist-packages/pymysql/__init__.py", line 79, in Binary
    return bytes(x)
sqlalchemy.exc.StatementError: (builtins.TypeError) string argument without an encoding [SQL: 'INSERT INTO table_name (updated, site_name, site_id, p_profile) VALUES (%(updated)s, %(site_name)s, %(site_id)s, %(p_profile)s)']
As univerio mentioned, the solution was to encode the string as follows:
conn.execute(table_name.insert().values(updated=True,
                                        site_name=site_name,
                                        site_id=site_id,
                                        p_profile=bytes(json.dumps(p_profile), 'utf8')))
BLOBs require binary data, so we need bytes in Python 3 and str in Python 2, since Python 2 strings are sequences of bytes.
If we want to use Python 3 str, we need to use TEXT instead of BLOB.
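A hedged sketch of that TEXT-based alternative, reusing the column layout from the question:
from sqlalchemy import (Table, MetaData, Column, Integer, Sequence,
                        Boolean, SMALLINT, TEXT)

metadata = MetaData()
table_name = Table('table_name', metadata,
                   Column('id', Integer, Sequence('user_id_seq'), primary_key=True),
                   Column('updated', Boolean),
                   Column('site_name', TEXT),   # TEXT columns accept Python 3 str directly
                   Column('site_id', SMALLINT),
                   Column('p_profile', TEXT))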
You simply need to convert your string to a byte string, e.g.:
site_name=str.encode(site_name),
site_id=site_id,
p_profile=json.dumps(p_profile)))
or
site_name = b'Buffalo, NY'
