Create a spatial index for the shapefiles of a schema in PostGIS using pyQGIS

I create a spatial index for each shapefile I have and then import them into a schema in PostGIS, but after the import the spatial index is missing. How do I create the spatial indexes again in the schema?
layers_fimport = QgsProject.instance().mapLayers().values()
for a in layers_fimport:
    a.setCrs(QgsCoordinateReferenceSystem(2056))
    a.dataProvider().createSpatialIndex()
for layer in layers_fimport:
    mytable = layer.name()
    con_string = "dbname='xxxx' host='xxxxx' port='xxx' user='xxxx' password='xxxxx' key=id type=POLYLINE table='" + project_name + "'." + mytable + " (geom)"
    err = QgsVectorLayerExporter.exportLayer(layer, con_string, 'postgres', QgsCoordinateReferenceSystem(2056), False)
(Screenshot: the imported tables are missing their spatial index.)

After some research I found a way to run SQL from the QGIS Python console using psycopg2, and with that I created the spatial indexes.
import psycopg2

# Create a spatial index for the tables in the schema
# -----------------------------------------------------------------------
connection = psycopg2.connect(dbname="xxxx",
                              user="xxxxx",
                              password="xxxxxx",
                              host="xxxxxxxx",
                              )
cursor = connection.cursor()
# -- No capital letters allowed (schema + tables)
# --------------------------------------------------------------
# USING GIST makes it a true spatial index on the geometry column.
cursor.execute("CREATE INDEX sidx_l_abluft_geom ON test.l_abluft USING GIST (geom);")
connection.commit()
print("Query successful")

Related

Get sqlalchemy table Model and Field objects from strings?

Very simple: I am trying to run a query in Python 3 SQLAlchemy to delete some records, given string names of the table and field to query against.
1. How do you get the table object from a string?
2. Given 1, how do you run a query via the ORM with just a string of the field name?
I would assume all ORMs have an internal array or a method like get that takes the name.
json_config = [
    {"table": "tableA",
     "field": "modified_on",
     "expires": 30},
    {"table": "tableB",
     "field": "event_on",
     "expires": 30}
]
for table_conf_item in self.json_config:
    table_name = table_conf_item["table"]
    field_name = table_conf_item["field"]
    expire_after = table_conf_item["expires"]
    table_obj = self.orm_session.TABLES[table_name]
    field_obj = self.orm_session.TABLES[table_name].FIELDS[field_name]
    result = self.orm_session.delete(table_obj).where(field_obj < expire_after)
    self.orm_session.commit()
    print(f"{table_name}: removed {result.row_count} objects")
Given the table's name, you can use reflection to get a Table object. Using SQLAlchemy's core layer, this is reasonably straightforward:
import sqlalchemy as sa

engine = sa.create_engine(...)
metadata = sa.MetaData()
tbl = sa.Table(name_of_table, metadata, autoload_with=engine)
If you want to work with multiple tables, it may be more efficient to store them in a MetaData instance for later access:
metadata = sa.MetaData()
metadata.reflect(engine, only=list_of_table_names)
tbl = metadata.tables[name_of_table]
Once you have a Table object you can reference columns by name like this: tbl.c[name_of_field].
Full example:
import sqlalchemy as sa

# Setup
engine = sa.create_engine('sqlite://', echo=True, future=True)
tbl = sa.Table(
    't',
    sa.MetaData(),
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('foo', sa.Integer),
)
tbl.create(engine)
with engine.begin() as conn:
    vals = [42, 43, 42, 43, 56, 87, 89]
    conn.execute(tbl.insert(), [{'foo': v} for v in vals])
del tbl

# Reflect the table.
metadata = sa.MetaData()
metadata.reflect(engine, only=['t'])
tbl = metadata.tables['t']

# Define some statements.
q1 = sa.select(tbl).where(tbl.c['foo'] == 42)
q2 = sa.select(tbl.c['id'], tbl.c['foo']).where(tbl.c['foo'] == 43)
q3 = sa.delete(tbl).where(tbl.c['foo'] != 42)

# Execute the statements.
with engine.connect() as conn:
    rows = conn.execute(q1)
    for row in rows:
        print(row)
    print()
    rows = conn.execute(q2)
    for row in rows:
        print(row)
    print()

with engine.begin() as conn:
    conn.execute(q3)

with engine.connect() as conn:
    rows = conn.execute(q1)
    for row in rows:
        print(row)
    print()
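Tying this back to the config-driven loop in the question, here is a possible sketch using the core layer. It assumes engine is already created, and it treats the comparison of the column against the "expires" value as purely illustrative, since in practice you would probably convert it to a cutoff date first.

import sqlalchemy as sa

json_config = [
    {"table": "tableA", "field": "modified_on", "expires": 30},
    {"table": "tableB", "field": "event_on", "expires": 30},
]

# Reflect only the tables named in the config.
metadata = sa.MetaData()
metadata.reflect(engine, only=[item["table"] for item in json_config])

with engine.begin() as conn:
    for item in json_config:
        tbl = metadata.tables[item["table"]]
        # Look up the column by its string name and build the delete.
        stmt = sa.delete(tbl).where(tbl.c[item["field"]] < item["expires"])
        result = conn.execute(stmt)
        print(f"{item['table']}: removed {result.rowcount} objects")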
Doing the same through the ORM layer is more complicated, as table and column names must be mapped to ORM entity classes (models) and their attributes. This replicates the previous example for a simple mapping (it assumes the same initial data as above).
import sqlalchemy as sa
from sqlalchemy import orm

Base = orm.declarative_base()

class Thing(Base):
    __tablename__ = 't'
    id = sa.Column(sa.Integer, primary_key=True)
    thing_foo = sa.Column('foo', sa.Integer)

engine = sa.create_engine(...)
Base.metadata.create_all(engine)
Session = orm.sessionmaker(engine, future=True)

tablename = 't'
columnname = 'foo'

with Session.begin() as s:
    # Get the mappers for the Base class.
    mappers = Base.registry.mappers
    # Get the mapper for our table.
    mapper = next(m for m in mappers if m.entity.__tablename__ == tablename)
    # Get the entity class (Thing).
    entity = mapper.entity
    # Get the column from the Table.
    table_column = mapper.selectable.c[columnname]
    # Get the mapper property that corresponds to the column
    # (the entity attribute may have a different name to the
    # column in the database).
    mapper_property = mapper.get_property_by_column(table_column)
    # Get the queryable entity attribute (Thing.thing_foo).
    attr = mapper.all_orm_descriptors[mapper_property.key]

    q = sa.select(entity).where(attr != 42)
    entities = s.scalars(q)
    for entity in entities:
        s.delete(entity)

with Session() as s:
    for thing in s.scalars(sa.select(Thing)):
        print(thing.id, thing.thing_foo)

Problem with one-to-many relationship sqlite3

I created a one-to-many relationship between two tables, and according to the sqlite3 documentation I should not be able to insert a value into the child table if the referenced column value does not exist in the parent table.
import sqlite3

class Database:
    def __init__(self, database_name):
        self.database_name = database_name

    def create_table(self, table_name, *columns):
        columns = ", ".join(columns)
        conn = sqlite3.connect(self.database_name)
        cursor = conn.cursor()
        _SQL = f"CREATE TABLE IF NOT EXISTS {table_name}({columns})"
        cursor.execute(_SQL)
        conn.commit()
        cursor.close()
        conn.close()

    def insert_values(self, table_name, values, *columns):
        dynamic_values = ('?, ' * len(columns))[0:-2]
        columns = ", ".join(columns)
        conn = sqlite3.connect(self.database_name)
        cursor = conn.cursor()
        _SQL = f"INSERT INTO {table_name}({columns}) VALUES ({dynamic_values})"
        cursor.execute(_SQL, values)
        conn.commit()
        cursor.close()
        conn.close()

    def view_values(self, table_name, *columns):
        columns = ", ".join(columns)
        conn = sqlite3.connect(self.database_name)
        cursor = conn.cursor()
        _SQL = f"SELECT {columns} FROM {table_name}"
        cursor.execute(_SQL)
        the_data = cursor.fetchall()
        cursor.close()
        conn.close()
        return the_data

data = Database("games.db")

# data.create_table("supplier_groups", "group_id integer PRIMARY KEY", "group_name text NOT NULL")

data.insert_values("supplier_groups", ("Domestic", ), "group_name")

# data.create_table("suppliers ", "supplier_id INTEGER PRIMARY KEY",
#                   "supplier_name TEXT NOT NULL",
#                   "group_id INTEGER NOT NULL, "
#                   "FOREIGN KEY (group_id) REFERENCES supplier_groups (group_id)")

data.insert_values("suppliers", ('ABC Inc.', 9), "supplier_name", "group_id")
As you can see on this line: data.insert_values("supplier_groups", ("Domestic", ), "group_name") - I'm inserting a value into the supplier_groups table.
Then right here: data.insert_values("suppliers", ('ABC Inc.', 9), "supplier_name", "group_id") - I'm inserting a value into the suppliers table with a group_id that does not exist in the supplier_groups table. Python executes it successfully and adds the row to the database, but when attempting to execute the same command in the SQLite browser I get this error:
Execution finished with errors. Result: FOREIGN KEY constraint failed - which is what Python should also have done instead of adding the row to the database.
So, could anyone explain to me what's going on here? Am I misunderstanding something? Help would be appreciated.
From Section 2. Enabling Foreign Key Support in the sqlite doc:
Assuming the library is compiled with foreign key constraints enabled, it must still be enabled by the application at runtime, using the PRAGMA foreign_keys command. For example:
sqlite> PRAGMA foreign_keys = ON;
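In Python's sqlite3 module that pragma has to be issued on every new connection, before running the insert. A minimal sketch of how the insert_values method from the question could enable it (the other methods stay as they are):

import sqlite3

class Database:
    def __init__(self, database_name):
        self.database_name = database_name

    def insert_values(self, table_name, values, *columns):
        dynamic_values = ('?, ' * len(columns))[0:-2]
        columns = ", ".join(columns)
        conn = sqlite3.connect(self.database_name)
        # Enforce foreign key constraints for this connection.
        conn.execute("PRAGMA foreign_keys = ON")
        cursor = conn.cursor()
        _SQL = f"INSERT INTO {table_name}({columns}) VALUES ({dynamic_values})"
        cursor.execute(_SQL, values)  # now raises sqlite3.IntegrityError for a bad group_id
        conn.commit()
        cursor.close()
        conn.close()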

Extract large data from PostgreSQL using Python (preferably in dataframe format)

I have imported many large CSV files into tables in my PostgreSQL database, and I know how to connect to the database with this code:
import psycopg2

try:
    connection = psycopg2.connect(user="xxx",
                                  password="xxx",
                                  host="xxx",
                                  port="xxx",
                                  database="xxx")
    cursor = connection.cursor()
    # Print PostgreSQL connection properties
    print(connection.get_dsn_parameters(), "\n")
    # Print PostgreSQL version
    cursor.execute("SELECT version();")
    record = cursor.fetchone()
    print("You are connected to - ", record, "\n")
except (Exception, psycopg2.Error) as error:
    print("Error while connecting to PostgreSQL", error)
finally:
    # Closing database connection.
    if connection:
        cursor.close()
        connection.close()
        print("PostgreSQL connection is closed")
But I struggle to extract the data from there. Is it possible to transform these tables into dataframe format, since I will be doing some ML analysis on them?
I'm new to PostgreSQL, please help me with this issue.
There are a few ways to do it.
A very simple way would be to iterate through the cursor with fetchall()
cursor.execute(query)
rows = cursor.fetchall()
data = []
for row in rows:
    data.append({'field1': row[0], 'field2': row[1]})
If you are using a pandas DataFrame, you could do:
import pandas as pd
rows = pd.DataFrame(rows, columns=['field1', 'field2'])
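If the end goal is a DataFrame anyway, another option is to let pandas run the query against the connection directly. A minimal sketch, assuming the connection parameters from the question and an illustrative table name my_table (newer pandas versions prefer a SQLAlchemy engine here and will warn about a raw DBAPI connection):

import pandas as pd
import psycopg2

connection = psycopg2.connect(user="xxx", password="xxx",
                              host="xxx", port="xxx", database="xxx")

# pandas runs the query and builds the DataFrame, including column names.
df = pd.read_sql("SELECT * FROM my_table", connection)

# For very large tables, read in chunks instead of all at once:
# for chunk in pd.read_sql("SELECT * FROM my_table", connection, chunksize=100000):
#     process(chunk)

connection.close()
print(df.head())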

Unable to create table in Amazon redshift using Psycopg2

I am trying to make a simple script in Python which will fetch data from an endpoint, convert it into a dataframe, write it to an Amazon Redshift cluster, and then automate the script using a cron job on AWS. I am using psycopg2 for connecting to the Redshift cluster and the script executes the commands pretty well (creates the table in Redshift and writes the data as well). But when I try to see the table from a SQL client, the table doesn't show up.
from pandas.io.json import json_normalize
import json
import pandas as pd
import requests
import psycopg2

mm_get = requests.get('endpoint link')
mm_json = mm_get.json()
data_1 = json_normalize(data=mm_json['data'],
                        record_path=['courses', 'modules'],
                        record_prefix='courses.modules.',
                        meta=[['courses', 'id'],
                              ['courses', 'title'],
                              'activated',
                              'createdAt',
                              'email',
                              'employeeId',
                              'firstName',
                              'group',
                              'id',
                              'lastName',
                              'phone',
                              'teams'
                              ]
                        )
data_2 = json_normalize(data=mm_json['data'],
                        record_path='lessons',
                        record_prefix='lessons.',
                        meta='id',
                        meta_prefix='user.'
                        )
data_3 = data_1.merge(
    data_2,
    how='outer',
    left_on=['courses.modules.id', 'id'],
    right_on=['lessons.moduleId', 'user.id']
)
cols = data_3.columns
cols = cols.tolist()
cols = pd.DataFrame(cols)
re_cols = pd.DataFrame(cols.loc[:, 0].str.replace('.', '_').tolist(), index=cols.index)
data_3.teams = data_3.teams.astype(str)
data_3.teams = data_3.teams.str.replace('[', '')
data_3.teams = data_3.teams.str.replace(']', '')
data_3.teams = data_3.teams.str.replace("'", "")
con = psycopg2.connect(dbname='name',
                       host='hostname',
                       port='xxxx', user='username', password='password')
cur = con.cursor()
cur.execute('create table testing_learn.test (courses_modules_completionDate DATE, courses_modules_id int, courses_modules_status TEXT, courses_modules_title TEXT, courses_id int, courses_title TEXT, activated bool, createdAt TIMESTAMP, email TEXT, employeeId TEXT, firstName TEXT, group_name TEXT, id TEXT, lastname TEXT, phone int8, teams TEXT, lessons_courseId int, lessons_date DATE, lessons_id int, lessons_lessonNumber int, lessons_moduleId int, lessons_score TEXT, lessons_title TEXT, user_id int);')
cur.close()
data_mat = data_3.as_matrix()
str_mat = b','.join(cur.mogrify('(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)', x) for x in tuple(map(tuple, data_mat)))
cur = con.cursor()
cur.execute('insert into testing_learn.test VALUES ' + str_mat.decode('utf-8'))
I am able to see the data when I query the same table from Python using psycopg2, but the same table doesn't show up in the SQL client. It would be of great help if anyone could point out what I am doing wrong here. Thanks in advance.
According to the Psycopg2 2.7.5 official documentation, the main entry points of Psycopg2 include:
The class connection encapsulates a database session. It allows you to:
create new cursor instances using the cursor() method to execute database commands and queries,
terminate transactions using the methods commit() or rollback().
Therefore, you need to call con.commit() every time after you call cur.execute() to make the changes to the database persistent. Otherwise your table won't show up in the database.
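For instance, here is a minimal sketch of the pattern; the connection parameters are the placeholders from the question, and the table definition and row values are shortened, hypothetical examples rather than the full script:

import psycopg2

con = psycopg2.connect(dbname='name', host='hostname',
                       port='xxxx', user='username', password='password')
cur = con.cursor()

# Hypothetical, shortened DDL just to show where the commits go.
cur.execute('create table testing_learn.test (id int, title text);')
con.commit()  # without this, the table only exists inside the open transaction

cur.execute("insert into testing_learn.test values (1, 'example');")
con.commit()  # persist the inserted rows as well

# Alternatively, set con.autocommit = True right after connecting
# so every statement is committed implicitly.

cur.close()
con.close()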

Get BigQuery table schema using google.cloud

I can, for example, get BigQuery data into local Python with:
import os
from google.cloud import bigquery
project_id = "example-project"
dataset_id = "example_dataset"
table_id = "table_id"
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
bq = bigquery.Client()
query = "SELECT * FROM {}.{} LIMIT 5".format(dataset_id, table_id)
resp = bq.run_sync_query(query)
resp.run()
data_list = resp.rows
The result:
print(data_list)
>>> [('BEDD', '1',), ('A75', '1',), ('CE3F', '1',), ('0D8C', '1',), ('3E9C', '1',)]
How do I then go and get the schema for this table? Such that, for example
headings = ('heading1', 'heading2')
# or
schema_dict = {'fields': [{'name': 'heading1', 'type': 'STRING'}, {'name': 'heading2', 'type': 'STRING'}]}
You can use the schema attribute of your resp variable.
After running the query you can retrieve it:
schema = resp.schema
schema will be a list containing the definition for each column in your query.
As an example, let's say this is your query:
query = "select '1' as fv, STRUCT<i INT64, j INT64> (1, 2) t from `dataset.table` limit 1"
The schema will be a list containing 2 entries:
[<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6e50>,
<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6b10>]
For each object in schema, you have the attributes field_type, fields, mode and name, so if you run:
schema[0].field_type, schema[0].mode, schema[0].name
The result is "STRING", "NULLABLE", "fv".
As the second column is a record, if you run:
schema[1].field_type, schema[1].mode, schema[1].name, schema[1].fields
The result is:
"RECORD", "NULLABLE", "t", [google schema 1, google schema 2]
Where google schema 1 contains the definition for the inner fields within the record.
As far as I know, there's no way of getting a dictionary like the one you showed in your question, which means you'll have to loop over the entries in schema and build it yourself. It should be simple though. I'm not sure this works exactly as written since I haven't fully tested it, but it might give you an idea of how to do it:
def extract_schema(schema_resp):
    l = []
    for schema_obj in schema_resp:
        r = {}
        r['name'] = schema_obj.name
        r['type'] = schema_obj.field_type
        r['mode'] = schema_obj.mode
        if schema_obj.fields:
            r['fields'] = extract_schema(schema_obj.fields)
        l.append(r)
    return l
So you'd just have to run schema = extract_schema(resp.schema) and (hopefully) you'll be good to go.
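As a side note, if what you need is the schema of the table itself rather than of an arbitrary query result, newer versions of google-cloud-bigquery let you fetch it directly from the table. A minimal sketch, reusing the project, dataset and table names from the question:

from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Fetch the table metadata; no query is run and no rows are read.
table = client.get_table("example-project.example_dataset.table_id")

# table.schema is a list of SchemaField objects, just like resp.schema above.
for field in table.schema:
    print(field.name, field.field_type, field.mode)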
