scrapy using python3 logging module issue - python-3.x

I use Scrapy 1.1.0 and have 5 spiders in the "spiders" folder.
In every spider, I try to use the Python 3 logging module. The code is structured like this:
import logging
import scrapy
# other imports
class ExampleSpider(scrapy.Spider):
    name = 'special'
    def __init__(self):
        # other initializations
        # set up the log
        self.log = logging.getLogger('special')
        self.log.setLevel(logging.DEBUG)
        logFormatter = logging.Formatter('%(asctime)s %(levelname)s: %(message)s')
        # file handler
        fileHandler = logging.FileHandler(LOG_PATH)  # LOG_PATH is defined elsewhere
        fileHandler.setLevel(logging.DEBUG)
        fileHandler.setFormatter(logFormatter)
        self.log.addHandler(fileHandler)
    # other functions
Every spider has the same structure. When I run these spiders and check the log files, they exist, but their size is always 0 bytes.
My other question: when I run one spider, it always generates two or more log files. For example, running a single spider generates both a.log and b.log.
Any answers would be appreciated.

You can set the log file via the LOG_FILE setting in settings.py or via the command-line argument --logfile FILE, e.g. scrapy crawl myspider --logfile myspider.log, as described in the official docs.
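A minimal sketch of the settings.py route, assuming a default Scrapy project layout. LOG_FILE and LOG_LEVEL are Scrapy setting names; the file name is only an illustration.

```python
# settings.py (fragment)
# Send Scrapy's log output to this file instead of the console.
LOG_FILE = "myspider.log"  # illustrative file name
# Minimum severity to record; DEBUG is also Scrapy's default.
LOG_LEVEL = "DEBUG"
```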

Related

Logging from .py script works, but logging does not work from compiled executable

My python script produces a log file with lines of logging when I run it from Spyder, but when I run it outside of Spyder (e.g. in the command window, or as an executable), the log file is produced but remains empty.
I confirmed that I'm using full file paths to specify the log file.
I understand that using basicConfig doesn't always work as expected, so I followed the answer provided here. A simplified version of my script is shown below:
import logging
import os
# Example inputs
workdir = r'c:\Users\xx\work'  # raw string, so that \U is not parsed as a Unicode escape
log_file = 'log_2.txt'
# Initialize logging functionality
fileh = logging.FileHandler(os.path.join(workdir, log_file), 'w')
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fileh.setFormatter(formatter)
fileh.setLevel(logging.DEBUG)
log = logging.getLogger()  # root logger
for hdlr in log.handlers[:]:  # remove all old handlers
    log.removeHandler(hdlr)
log.addHandler(fileh)  # set the new handler
# Rest of code including log messages (removed rest of script here, only leaving logging-related text)
logging.info('Here is a log message.')
logging.debug('Here is another log message.')
# At the end of the script
logging.shutdown()
Most related questions are when someone can't find the log file that's being written. However, in this case, the log file is created but not written to.
Can someone advise on how this code needs to be modified to also work outside of Spyder?
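One likely cause, offered as a hedged guess rather than a confirmed diagnosis: the script sets a level on the handler but never on the root logger, which defaults to WARNING, so INFO and DEBUG records are dropped before any handler sees them. An IDE session may already have lowered the root level, which would explain the difference. A minimal sketch, using a temporary directory in place of the real workdir (the helper name configure_root_file_log is made up for illustration):

```python
import logging
import os
import tempfile


def configure_root_file_log(path, level=logging.DEBUG):
    """Replace the root logger's handlers with one file handler and set the level."""
    root = logging.getLogger()
    for old in root.handlers[:]:
        root.removeHandler(old)
    handler = logging.FileHandler(path, "w")
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    handler.setLevel(level)
    root.addHandler(handler)
    # The logger filters records before its handlers do; without this
    # call the root logger stays at its default level, WARNING.
    root.setLevel(level)
    return handler


# Assumption: a temporary directory stands in for the real work directory.
log_path = os.path.join(tempfile.mkdtemp(), "log_2.txt")
file_handler = configure_root_file_log(log_path)
logging.info("Here is a log message.")
logging.debug("Here is another log message.")
file_handler.flush()
```

The only change relative to the question's setup is the root.setLevel(level) call; the handler's own level then acts as a second, per-destination filter.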

Python logging use a single logger for entire project where name is defined by arguments from cmd

I am trying to set up a single logger for an entire Python 3.9 project with multiple files. I want to define the logger only in main.py, using command-line arguments to set the log file name.
import logging
import sys
import processing_chain
logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s', level='INFO')
logger = logging.getLogger(__name__)
def main():
    file_name = sys.argv[1]
    lookback_minutes = int(sys.argv[2])
    file_handler = logging.FileHandler(f'log/{file_name}.log')
    logger.addHandler(file_handler)
    logger.info(f'Running processing chain for: {file_name}')
    processing_chain.run(lookback_minutes)
The processing_chain file:
import logging
def run(lookback_minutes):
    logging.info(lookback_minutes)
This works for main: the info statement is written to the log file. However, I do not understand how to make it available in the files that main calls. How do I bring the file handler into the processing_chain file? From what I could gather elsewhere on Stack Overflow, I should just import logging and then use logging.info (or any other level) and it should follow. But that does not log to the file, only to the console.
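A hedged sketch of one common arrangement. Handlers attached to the root logger receive records from every logger in the process through propagation, whereas the file handler in the question is attached only to main's own named logger, so logging.info(...) in another module never reaches it. The names setup_logging, chain_logger and run are illustrative, and the two "files" are shown in one block for brevity.

```python
import logging
import os
import tempfile


def setup_logging(log_path):
    """Attach a console handler and a file handler to the root logger."""
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(formatter)
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    root.addHandler(file_handler)
    root.addHandler(console_handler)
    return file_handler


# Stand-in for processing_chain.py: it only asks for a logger by name
# and never touches handlers.
chain_logger = logging.getLogger("processing_chain")


def run(lookback_minutes):
    chain_logger.info("lookback_minutes=%s", lookback_minutes)


# Stand-in for main.py. Assumption: a temporary path replaces
# log/<file_name>.log so the sketch runs anywhere.
log_path = os.path.join(tempfile.mkdtemp(), "example.log")
handler = setup_logging(log_path)
logging.getLogger("main").info("Running processing chain for: %s", "example")
run(15)
handler.flush()
```

With this layout no module needs to import the handler; each one calls logging.getLogger(__name__) and the root configuration done once in main applies to all of them.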

Pythonw, pyw, & arg, /B won't make python's http-server run in background

I've tried every method in the title to run it in the background, but here are the issues I ran into:
pythonw and pyw: the server doesn't work; going to localhost:8000 fails with ERR_EMPTY_RESPONSE.
& and START /B: the script doesn't start in the background and instead prints the server log to the console.
So now I'm short on ideas for how to run this script in the background.
When using pythonw, it should help to explicitly redirect stdout and stderr to a file; this behavior may be related to the problem described here (although that seems to be specific to Python 2.7). Discarding the output by redirecting it to os.devnull seems to work as well.
The following script is a minimal working server example that runs under pythonw for me (using Python 3.7.9):
import http.server
import os
import sys
if __name__ == "__main__":
    sys.stdout = sys.stderr = open(os.devnull, "w")
    httpd = http.server.HTTPServer(("localhost", 8000), http.server.SimpleHTTPRequestHandler)
    httpd.serve_forever()
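As a variation on the redirect above, and only a sketch: Python's documentation notes that sys.stdout and sys.stderr can be None under pythonw, and the standard request handler writes its access log to sys.stderr. Overriding log_message to route those lines through logging avoids touching sys.stderr at all. The class name LoggingRequestHandler, the helper make_server, the logger name "httpd" and the file name server.log are made up here, and this has not been verified under pythonw specifically.

```python
import http.server
import logging

access_log = logging.getLogger("httpd")  # hypothetical logger name


class LoggingRequestHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files like SimpleHTTPRequestHandler, but log via `logging`."""

    def log_message(self, format, *args):
        # The base class writes this line to sys.stderr; send it to the
        # logger instead.
        access_log.info("%s - %s", self.address_string(), format % args)


def make_server(port=8000, log_file="server.log"):
    """Build (but do not start) a localhost server that logs to log_file."""
    handler = logging.FileHandler(log_file)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    access_log.addHandler(handler)
    access_log.setLevel(logging.INFO)
    return http.server.HTTPServer(("localhost", port), LoggingRequestHandler)
```

A script would then call make_server().serve_forever(); the access lines that previously went to the console should end up in server.log.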

JupyterLab 3: how to get the list of running servers

Since JupyterLab 3.x jupyter-server is used instead of the classic notebook server, and the following code does not list servers served with jupyter_server:
from notebook import notebookapp
notebookapp.list_running_servers()
None
What still works for the file/notebook name is:
from time import sleep
from IPython.display import display, Javascript
import subprocess
import os
import uuid
def get_notebook_path_and_save():
    magic = str(uuid.uuid1()).replace('-', '')
    print(magic)
    # saves it (ctrl+S)
    # display(Javascript('IPython.notebook.save_checkpoint();')) # Javascript Error: IPython is not defined
    nb_name = None
    while nb_name is None:
        try:
            sleep(0.1)
            nb_name = subprocess.check_output(f'grep -l {magic} *.ipynb', shell=True).decode().strip()
        except subprocess.CalledProcessError:
            # grep exits non-zero until a saved notebook contains the magic string
            pass
    return os.path.join(os.getcwd(), nb_name)
But it is neither Pythonic nor fast.
How can I get the currently running server instances, and from them, for example, the current notebook file?
Migration to jupyter_server should be as easy as changing notebook to jupyter_server, notebookapp to serverapp and changing the appropriate configuration files - the server-related codebase is largely unchanged. In the case of listing servers simply use:
from jupyter_server import serverapp
serverapp.list_running_servers()

Python, Flask print to console and log file simultaneously

I'm using python 3.7.3, with flask version 1.0.2.
When running my app.py file without the following imports:
import logging
logging.basicConfig(filename='api.log',level=logging.DEBUG)
Flask will display relevant debug information to console, such as POST/GET requests and which IP they came from.
As soon as DEBUG logging is enabled, I no longer receive this output. I have tried running my application in debug mode:
app.run(host='0.0.0.0', port=80, debug=True)
But this produces the same results. Is there a way to have both console output, and python logging enabled? This might sound like a silly request, but I would like to use the console for demonstration purposes, while having the log file present for troubleshooting.
Found a solution:
import logging
from flask import Flask
app = Flask(__name__)
logger = logging.getLogger('werkzeug')  # grabs underlying WSGI logger
handler = logging.FileHandler('test.log')  # creates handler for the log file
logger.addHandler(handler)  # adds handler to the werkzeug WSGI logger
@app.route("/")
def index():
    logger.info("Here's some info")
    return "Hello World"
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
Other examples:
# logs to console and log file
logger.info("Some text for console and log file")
# logs the exception to console and log file
try:
    ...  # request handling (not shown in the original)
except Exception as ue:
    logger.error("Unexpected Error: malformed JSON in POST request, check key/value pair at: ")
    logger.error(ue)
Source:
https://docstrings.wordpress.com/2014/04/19/flask-access-log-write-requests-to-file/
If link is broken:
You may be confused because adding a handler to Flask’s app.logger doesn’t catch the output you see in the console like:
127.0.0.1 - - [19/Apr/2014 18:51:26] "GET / HTTP/1.1" 200 -
This is because app.logger is for Flask and that output comes from the underlying WSGI module, Werkzeug.
To access Werkzeug’s logger we must call logging.getLogger() and give it the name Werkzeug uses. This allows us to log requests to an access log using the following:
logger = logging.getLogger('werkzeug')
handler = logging.FileHandler('access.log')
logger.addHandler(handler)
# Also add the handler to Flask's logger for cases
# where Werkzeug isn't used as the underlying WSGI server.
# This wasn't required in my case, but can be uncommented as needed
# app.logger.addHandler(handler)
You can of course add your own formatting and other handlers.
Flask has a built-in logger that can be accessed using app.logger. It is just an instance of the standard library logging.Logger class which means that you are able to use it as you normally would the basic logger. The documentation for it is here.
To get the built-in logger to write to a file, you have to add a logging.FileHandler to the logger. Setting debug=True in app.run starts the development server but does not change the log level to debug, so you need to set the log level to logging.DEBUG manually.
Example:
import logging
from flask import Flask
app = Flask(__name__)
handler = logging.FileHandler("test.log")  # Create the file logger
app.logger.addHandler(handler)  # Add it to the built-in logger
app.logger.setLevel(logging.DEBUG)  # Set the log level to debug
@app.route("/")
def index():
    app.logger.error("Something has gone very wrong")
    app.logger.warning("You've been warned")
    app.logger.info("Here's some info")
    app.logger.debug("Meaningless debug information")
    return "Hello World"
app.run(host="127.0.0.1", port=8080)
If you then look at the log file, it should have all 4 lines printed out in it and the console will also have the lines.
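Independent of Flask, the underlying pattern can be sketched with the standard library alone. basicConfig(filename=...) installs only a file handler on the root logger, which is consistent with the console output disappearing in the question; attaching both a StreamHandler and a FileHandler to a logger sends each record to both places. The logger name "demo", the helper name and the temporary file path below are illustrative only.

```python
import logging
import os
import sys
import tempfile


def add_console_and_file_handlers(logger, log_path, level=logging.DEBUG):
    """Send every record on `logger` to both stderr and `log_path`."""
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    console = logging.StreamHandler(sys.stderr)
    file_handler = logging.FileHandler(log_path)
    for handler in (console, file_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    logger.setLevel(level)
    return console, file_handler


# Assumption: a temporary file stands in for api.log in the project directory.
demo_log_path = os.path.join(tempfile.mkdtemp(), "api.log")
demo_logger = logging.getLogger("demo")
console, file_handler = add_console_and_file_handlers(demo_logger, demo_log_path)
demo_logger.info("Here's some info")
file_handler.flush()
```

Passing logging.getLogger('werkzeug') or app.logger instead of the demo logger should apply the same pair of handlers to the request log or the application log, though console lines may then appear twice if a console handler is already attached.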
