I'm following the CS50 course and am having a problem with application.py. I was getting the following warnings in the Cloud 9 code editor (Ace):
instance of SQLAlchemy has no column member
instance of SQLAlchemy has no integer member
instance of SQLAlchemy has no text member
instance of scoped_session has no add member
instance of scoped_session has no commit member
Class Registrants has no query member
I created a file .pylintrc in the home directory and added the following two lines:
This got rid of most of the errors but I'm left with:
instance of scoped_session has no add member
instance of scoped_session has no commit member
Here's the code that's causing the problem:
from flask import Flask, render_template, redirect, request, url_for
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)

# Flask-SQLAlchemy
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///froshims3.db"
app.config["SQLALCHEMY_ECHO"] = True
db = SQLAlchemy(app)

class Registrant(db.Model):
    __tablename__ = "registrants"
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text)
    dorm = db.Column(db.Text)

    def __init__(self, name, dorm):
        self.name = name
        self.dorm = dorm

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/register", methods=["POST"])
def register():
    if request.form["name"] == "" or request.form["dorm"] == "":
        return render_template("failure.html")
    registrant = Registrant(request.form["name"], request.form["dorm"])
    # the two calls that trigger the remaining scoped_session warnings
    db.session.add(registrant)
    db.session.commit()
    return render_template("success.html")

@app.route("/registrants")
def registrants():
    rows = Registrant.query.all()
    return render_template("registrants.html", registrants=rows)

@app.route("/unregister", methods=["GET", "POST"])
def unregister():
    if request.method == "GET":
        rows = Registrant.query.all()
        return render_template("unregister.html", registrants=rows)
    elif request.method == "POST":
        if request.form["id"]:
            Registrant.query.filter(Registrant.id == request.form["id"]).delete()
            db.session.commit()
        return redirect(url_for("registrants"))
I needed to add the following to the .pylintrc file:
Apparently Python creates some classes at run time, and pylint isn't able to pick that information up.
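To illustrate why a static checker can miss members like these, here is a toy sketch. It is not SQLAlchemy's actual implementation; `Session`, `ScopedProxy`, and `forward` are made-up names for this example. The proxy class body defines no `add` or `commit`; they are attached after the class is created, so a tool that only reads the class definition has nothing to find.

```python
class Session:
    """Stand-in for a database session with ordinary methods."""

    def __init__(self):
        self.pending = []
        self.committed = []

    def add(self, obj):
        self.pending.append(obj)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()


class ScopedProxy:
    """Toy proxy: note that no add/commit is written in this class body."""

    def __init__(self, factory):
        self._factory = factory
        self._current = None

    def _get(self):
        # Create the real session lazily on first use
        if self._current is None:
            self._current = self._factory()
        return self._current


def forward(name):
    """Build a method that forwards the call to the underlying session."""
    def method(self, *args, **kwargs):
        return getattr(self._get(), name)(*args, **kwargs)
    method.__name__ = name
    return method


# The forwarding methods are attached at run time, after the class exists
for _name in ("add", "commit"):
    setattr(ScopedProxy, _name, forward(_name))

session = ScopedProxy(Session)
session.add("alice")
session.commit()
print(session._get().committed)  # prints ['alice']
```

The calls work fine at run time, which is consistent with the warnings being false positives rather than real bugs.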
Not really happy with this answer as it ignores the problem rather than fixes it. If anyone has a better solution please let me know. The Staff at CS50 is looking into this but no other solution yet.
Using VSCode I had to add the following item in the Python › Linting: Pylint Args settings.
I have an old question looking for a fresh answer. I've tried the recipes presented in the similar but somewhat aged question "Start a flask application in a seperate thread", and some other similar solutions found in other posts.
The long and short of it is, I need to start a flask application in a 'background' thread, such that a wxPython GUI can run in the foreground. The solutions presented here seem to no longer have the desired effect. The flask app starts and the GUI never runs.
My suspicion is, the existing answers are out of date. That said, I'm open to the possibility that I've mangled something else that's hosing it up, please have a peek and advise accordingly.
Thanks for your eyeballs and brain cycles :)
My code follows.
#!/usr/bin/env python
""" integrator.py (the app) """
import wx
from pubsub import pub
from flask import Flask
from flask_graphql import GraphQLView
from models import db_session
from schema import schema
from models import engine, db_session, Base, Idiom
flaskapp = Flask(__name__)
flaskapp.debug = True
def shutdown_session(exception=None):
class IntegratorTarget(wx.TextDropTarget):
    def __init__(self, object):
        self.object = object

    def OnDropText(self, x, y, data):
        pub.sendMessage('default', arg1=data)
        return True

class IntegratorFrame(wx.Frame):
    def __init__(self, parent, title):
        super(IntegratorFrame, self).__init__(parent, title=title, size=wx.DisplaySize())
        self.panel = wx.Panel(self)
        box = wx.BoxSizer(wx.HORIZONTAL)
        dropTarget = IntegratorTarget(self.panel)
        pub.subscribe(self.catcher, 'default')

    def catcher(self, arg1):
        data = arg1

ex = wx.App()
-- eof --
""" models.py """
from sqlalchemy import *
from sqlalchemy.orm import (scoped_session, sessionmaker, relationship, backref)
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine('sqlite:///.praxis/lexicon/unbound.db3', convert_unicode=True)
db_session = scoped_session(sessionmaker(autocommit=False,
Base = declarative_base()
# We will need this for querying
Base.query = db_session.query_property()
class Idiom(Base):
    __tablename__ = "idiomae"
    id = Column(Integer, primary_key=True)
    src = Column(String)  # the text of the drag/paste operation
    taxonomy = Column(String)  # the type of resource referenced in the drag/paste operation
    localblob = Column(String)  # local path to media referenced in 'src'
    timestamp = Column(DateTime)  # date and time of capture
-- eof --
""" schema.py """
import graphene
from graphene import relay
from graphene_sqlalchemy import SQLAlchemyObjectType, SQLAlchemyConnectionField
from models import db_session, Idiom as IdiomaticModel
class Idiom(SQLAlchemyObjectType):
    class Meta:
        model = IdiomaticModel
        interfaces = (relay.Node, )

class Query(graphene.ObjectType):
    node = relay.Node.Field()
    # Allows sorting over multiple columns, by default over the primary key
    all_idioms = SQLAlchemyConnectionField(Idiom.connection)
    # Disable sorting over this field
    # all_departments = SQLAlchemyConnectionField(Department.connection, sort=None)

schema = graphene.Schema(query=Query)
I see where you tell Flask to be multi-threaded, but I don't see where you're starting up the Flask app in a thread. I expected to see something like
import threading
from flask import Flask

app = Flask(__name__)
# any extra configuration

def webserver():
    app.run()

web_thread = threading.Thread(target=webserver)
web_thread.start()
# ... continue on with the main thread
I have a working example you can crib from here. Note the need to use appropriate locking of any data structures shared between the primary thread and the threads running Flask.
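As a rough sketch of the locking point, using only the standard library: `SharedState` and `background_worker` are made-up names, and the worker function merely stands in for a request handler running in a server thread. Every access to the shared list goes through one lock, and the main thread stays free to do other work.

```python
import threading


class SharedState:
    """Data touched by both the main thread and background threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._messages = []

    def append(self, message):
        with self._lock:
            self._messages.append(message)

    def snapshot(self):
        # Return a copy so callers never iterate the live list
        with self._lock:
            return list(self._messages)


state = SharedState()


def background_worker(worker_id, count):
    # Stand-in for a request handler running in a server thread
    for i in range(count):
        state.append((worker_id, i))


threads = [
    threading.Thread(target=background_worker, args=(n, 100), daemon=True)
    for n in range(4)
]
for t in threads:
    t.start()

# The main thread could run a GUI loop here while the workers run

for t in threads:
    t.join()

print(len(state.snapshot()))  # prints 400
```

Marking the threads as daemon threads means they will not keep the process alive once the main thread (the GUI, in your case) exits.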
I'm using Scrapy to grab domains and their creation dates using the whois module. I then add them to a MySQL database using SQLAlchemy, but I get the error below when adding the creation date to the database, because its type is <class 'datetime.datetime'>:
sqlalchemy.orm.exc.UnmappedInstanceError: Class 'datetime.datetime' is not mapped
I tried to convert the date into a string but then I get another error.
class SaveDomainsPipeline(object):

    def __init__(self):
        engine = db_connect()
        self.Session = sessionmaker(bind=engine)

    def process_item(self, item, spider):
        session = self.Session()
        domain = Domains(**item)
        domain_item = item['domain']
        domain_whois = whois.query(domain_item)
        creation_date = domain_whois.creation_date
        session.add_all([domain, creation_date])

class Domains(Base):
    __tablename__ = "domains"
    id = Column(Integer, primary_key=True)
    date_added = Column(DateTime(timezone=True), server_default=func.now())
    domain = Column('domain', Text())
    creation_date = Column('creation_date', DateTime(timezone=True))
    #creation_date = Column('creation_date', Text()) -- I also tried this
I made a rookie mistake in my original code.
Having created an instance of the Domains class, I had to set the column values on that instance, which I had originally missed. Passing the datetime itself to the session fails because only mapped instances can be added. The working code is below.
class SaveDomainsPipeline(object):

    def __init__(self):
        engine = db_connect()
        self.Session = sessionmaker(bind=engine)

    def process_item(self, item, spider):
        session = self.Session()
        domains = Domains()  # create an instance of the Domains class
        domains.domain = item['domain']  # add the item "domain" from Items to the DB
        domain_whois = whois.query(domains.domain)
        domains.creation_date = domain_whois.creation_date  # add the creation date to the DB
        # save the instance, which saves both the domain item and the creation date
        session.add(domains)
        session.commit()
        return item
I am unable to write to my local database from a Scrapy pipeline. I have already installed mysql-connector-python 8.0.19 and am able to write data to the database from within the same project, but outside of a Scrapy pipeline. Can someone please help? I can't figure out why it isn't working.
When I try to send data via the Scrapy pipeline I get the following error:
[twisted] CRITICAL: Unhandled error in Deferred:
File "C:\Users\Viking\PycharmProjects\Indigo_Scrp\IndgoScrp\IndgoScrp\pipelines.py", line 7, in <module>
from mysql.connector import (connection)
ModuleNotFoundError: No module named 'mysql'
Here is my code for the pipeline :
from mysql.connector import (connection)
from mysql.connector import errorcode

class IndgoscrpPipeline(object):

    def __init__(self):

    def create_connection(self):
        self.conn = connection.MySQLConnection(
        self.curr = self.conn.cursor()

    def open_spider(self, spider):
        print("spider open")

    def process_item(self, item, spider):
        print("Saving item into db ...")
        return item

    def close_spider(self, spider):

    def mysql_connect(self):
        try:
            return self.curr.connect(**self.conf)
        except self.curr.Error as err:
            if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
                print("Something is wrong with your user name or password")
            elif err.errno == errorcode.ER_BAD_DB_ERROR:
                print("Database does not exist")

    def create_table(self):
        self.curr.execute(""" DROP TABLE IF EXISTS indigo""")
        self.curr.execute(""" Create table indigo(
            Product_Name text,
            Product_Author text,
            Product_Price text,
            Product_Image text

    def process_item(self, item, spider):

    def store_db(self, item):
        self.curr.execute("""Insert Into indigo values (%s,%s,%s,%s)""",
        return item
Here is my code from my spider
import scrapy
from ..items import IndScrItem

class IndgoSpider(scrapy.Spider):
    name = 'Indgo'
    start_urls = ['https://www.chapters.indigo.ca/en-ca/books/?link-usage=Header%3A%20books&mc=Book&lu=Main']

    def parse(self, response):
        items = IndScrItem()
        # no trailing commas after getall(): a trailing comma would wrap each list in a one-element tuple
        Product_Name = response.css('.product-list__product-title-link--grid::text').getall()
        Product_Author = response.css('.product-list__contributor::text').getall()
        Product_Price = response.css('.product-list__price--orange::text').getall()
        Product_Image = response.css('.product-image--lazy::attr(src)').getall()
        items['Product_Name'] = Product_Name
        items['Product_Author'] = Product_Author
        items['Product_Price'] = Product_Price
        items['Product_Image'] = Product_Image
        yield items
This is the line in the settings file that I use to enable pipelines:
'IndgoScrp.pipelines.IndgoscrpPipeline': 100,
I found that the issue was tied to having previously pip-installed the wrong version of mysql-connector. Even though I had installed the correct one through my IDE (PyCharm), Python was confused about which one to use. After uninstalling both and reinstalling mysql-connector-python, the pipeline was able to run.
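For anyone hitting the same thing, a quick check along these lines may help show which interpreter is running and where it would load a module from. It uses only the standard library; `locate_module` is a made-up helper name, not part of Scrapy or mysql-connector.

```python
import importlib.util
import sys


def locate_module(name):
    """Return where the current interpreter would load `name` from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "mysql" for "mysql.connector") is missing
        return None
    if spec is None:
        return None
    return spec.origin


print("interpreter:", sys.executable)
print("mysql.connector:", locate_module("mysql.connector"))
```

If the interpreter path printed from inside the Scrapy project differs from the one the IDE installs packages into, that would explain a ModuleNotFoundError. Installing with `python -m pip` using that same interpreter keeps the two in sync.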
Below is my directory structure:
|___ /sub_directory
|__ xyz.py
Below is my xyz.py code:
from flask import Flask, request, redirect, url_for, send_from_directory, jsonify, render_template
import mysql.connector
from mysql.connector import Error

app = Flask(__name__)

try:
    connection = mysql.connector.connect(host='', database='test', user='root', password='')
    if connection.is_connected():
        db_Info = connection.get_server_info()
        cursor = connection.cursor()
        cursor.execute("select id,name from skill_category;")
        record = cursor.fetchall()
        out = [item for t in record for item in t]
except Error as e:
    print("Error while connecting to MySQL", e)

@app.route('/', methods=['GET'])
def dropdown():
    val = record
    return render_template('Rankcv.html', val=val)

@app.route('/get-subskills', methods=['POST'])
def get_subskills():
    skills = request.form['select_skills']
    # pass the form value as a query parameter instead of concatenating it into the SQL
    cursor.execute("SELECT skill_items.name FROM skill_items WHERE skill_items.category_id = %s;", (skills,))
    record = cursor.fetchall()
    out = [item for t in record for item in t]
    return jsonify(something)

if __name__ == "__main__":
    app.run()
Now I need to use the values of the variables out and skills in abc.py.
I tried importing xyz directly and retrieving the values through the function name (get_subskills), but it didn't work. Can someone please explain how to solve this?
Import the abc function into xyz.
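A minimal sketch of that idea, written as one file so it runs on its own. `process_subskills` is a hypothetical function standing in for whatever abc.py does, and `flatten` is a made-up helper. The point is that `skills` and `out` only exist while the request is being handled, so xyz hands them to the other module's function as arguments rather than abc trying to read them from xyz. Also note that the name abc collides with Python's standard-library abc module, so a different file name may be safer.

```python
# Would live in abc.py (hypothetical contents)
def process_subskills(skills, out):
    """Receive the values as arguments instead of importing them from xyz."""
    return {"category": skills, "subskills": list(out)}


# Would live in xyz.py, after importing process_subskills from the other module
def flatten(record):
    """Flatten rows such as [('a',), ('b',)] into ['a', 'b']."""
    return [item for row in record for item in row]


def get_subskills(skills, record):
    out = flatten(record)
    # skills and out exist only during this call, so pass them on here
    return process_subskills(skills, out)


result = get_subskills("3", [("python",), ("sql",)])
print(result)  # prints {'category': '3', 'subskills': ['python', 'sql']}
```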
I've written a few spiders that pull similar data from different sources. I've also written a pipeline that allows this data to be put in a database. I want to be able to use the same code for multiple spiders to output to different tables, named dynamically from the spider name.
Here is the pipeline.py code:
class DbPipeline(object):

    def __init__(self):
        """
        Initialises database connection and sessionmaker.
        Creates table if it doesn't exist.
        """
        engine = db_connect()
        self.Session = sessionmaker(bind=engine)

    def process_item(self, item, spider):
        """Saves scraped products in database"""
        exists = self.check_item_exists(item)
        if not exists:
            session = self.Session()
            product = Products(**item)
        return item

    def check_item_exists(self, item):
        session = self.Session()
        product = Products(**item)
        result = session.query(Products).filter(Products.title == item['title']).first()
        return result is not None
And here is the model.py file:
DeclarativeBase = declarative_base()
def create_output_table(engine):
def db_connect():
    """
    Connects to database from settings defined in settings.py
    Returns an sqlalchemy engine instance
    """
    return create_engine(URL(**settings.DATABASE))

class Products(DeclarativeBase):
    """Sqlalchemy table model"""
    __tablename__ = "name"
    id = Column(Integer, primary_key=True)
    title = Column('title', String(200))
    price = Column('price', String(10), nullable=True)
    url = Column('url', String(200), nullable=True)
What I'm trying to do is make the __tablename__ variable the same as the spider name. I can easily get the name in the process_item function, since it is passed a spider object: I can read spider.name and assign it to a class variable. However, that function runs after the table is created/defined. How can I get the spider name outside of the process_item function in the pipelines.py file?
Edit: I've tried the solutions listed in How to access scrapy settings from item Pipeline however access to the 'settings' doesn't give me access to the attributes assigned to the current spider running. I need to dynamically get the name of the spider based on what spider is running the pipelines. Thanks
It's pretty easy to get the current spider name in your create_output_table:
class DbPipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.spider.name)

    def __init__(self, spider_name):
        """
        Initializes database connection and sessionmaker.
        Creates deals table.
        """
        engine = db_connect()
        create_output_table(engine, spider_name)
and (in models.py):
def create_output_table(engine, spider_name):
    # now you have your spider_name
The problem here is that Scrapy processes your models.py file before your pipelines.py, so the model class (and its __tablename__) is defined before any spider name is available. You need to find a way to generate your SQLAlchemy model later. You can use this thread as a starting point: Dynamically setting __tablename__ for sharding in SQLAlchemy?
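To show the "create the table once the spider is known" idea in a form that runs without any extra packages, here is a rough sketch. It uses the standard-library sqlite3 module instead of SQLAlchemy, so it is not a drop-in replacement for the model above. `PerSpiderTablePipeline` is a made-up name, and the `SimpleNamespace` object only stands in for a real spider. It relies on the same open_spider / process_item / close_spider hooks that pipelines already use.

```python
import re
import sqlite3
from types import SimpleNamespace


class PerSpiderTablePipeline:
    """Sketch: one table per spider, created when the spider opens."""

    def __init__(self, database=":memory:"):
        self.database = database
        self.conn = None
        self.table = None

    def open_spider(self, spider):
        # Table names cannot be bound as SQL parameters, so validate before formatting
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", spider.name):
            raise ValueError(f"unsafe table name: {spider.name!r}")
        self.table = spider.name
        self.conn = sqlite3.connect(self.database)
        self.conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{self.table}" '
            "(id INTEGER PRIMARY KEY, title TEXT, price TEXT, url TEXT)"
        )

    def process_item(self, item, spider):
        # Skip items whose title is already stored in this spider's table
        exists = self.conn.execute(
            f'SELECT 1 FROM "{self.table}" WHERE title = ?', (item["title"],)
        ).fetchone()
        if exists is None:
            self.conn.execute(
                f'INSERT INTO "{self.table}" (title, price, url) VALUES (?, ?, ?)',
                (item["title"], item.get("price"), item.get("url")),
            )
            self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()


# Stand-in for a running spider; Scrapy would pass the real one
spider = SimpleNamespace(name="books")
pipeline = PerSpiderTablePipeline()
pipeline.open_spider(spider)
book = {"title": "Dune", "price": "9.99", "url": "http://example.com/dune"}
pipeline.process_item(book, spider)
pipeline.process_item(book, spider)  # duplicate title, not inserted again
rows = pipeline.conn.execute('SELECT title FROM "books"').fetchall()
print(rows)  # prints [('Dune',)]
```

With SQLAlchemy the same timing applies: the model class would have to be built after the spider name is known, along the lines of the linked thread.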