Changing IOPub data rate in Jupyterlab - python-3.x

I will preface this by saying that I am very new to Python, PostgreSQL, and programming in general. I am currently working on querying data from a PostgreSQL server and storing the data in a Python program running on JupyterLab 0.32.1. Up until this point, I have had no problems querying this data, but now I am receiving an error.
import psycopg2 as p

ryandata = p.connect(dbname="agent_rating")
rcurr = ryandata.cursor()
rcurr.execute("SELECT ordlog_id FROM eta")
data = rcurr.fetchall()

mylist = []
for i in range(len(data)):
    orderid = data[i]
    mylist.append(orderid)

print(mylist)
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
--NotebookApp.iopub_data_rate_limit.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
Can anyone help fix this?
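For reference, the config variable named in the message can be raised when launching the server; the value below is just an arbitrarily large illustration, not a recommended setting:
jupyter lab --NotebookApp.iopub_data_rate_limit=1.0e10
It can also be set permanently in jupyter_notebook_config.py:
c.NotebookApp.iopub_data_rate_limit = 10000000000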

Related

cx_Oracle returning zero rows

I am new to connecting to an Oracle DB (version 19.3) through Python 3.6. I am not getting any rows back. Please tell me what I am missing here.
I think the connection is set up correctly because it isn't showing any connection error or invalid-password error. I tried fetchall(), fetchmany(), rowcount, etc.; everything returns zero. I tried printing the cursor object itself, which works. I ran the query in my DB. I have Oracle DB 19.3 installed locally and use Oracle SQL Developer for running SQL. Is there anything else I need to install?
import cx_Oracle

conn = cx_Oracle.connect("username", "password", "localhost/orcl")
cur = conn.cursor()
cur.execute('select * from emp')
for line in cur:
    print(line)
cur.close()
conn.close()
After running
delete from emp;
your Python program will display zero rows, as you report.
You may wish to INSERT one or more rows before running it.
You may also find the interface offered by SQLAlchemy more convenient; it offers app portability across Oracle and other DB backends.
Cf. https://developer.oracle.com/dsl/prez-python-queries.html
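For illustration, here is a minimal sketch of the SQLAlchemy route mentioned above, assuming the same placeholder credentials and local "orcl" service as in the question:
from sqlalchemy import create_engine, text

# "username", "password", "localhost" and "orcl" are the placeholders from the
# question, not working credentials
engine = create_engine("oracle+cx_oracle://username:password@localhost/?service_name=orcl")
with engine.connect() as conn:
    for row in conn.execute(text("select * from emp")):
        print(row)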

How to load a Pandas DataFrame to a Redshift server using an ODBC driver?

I am relatively new to Python programming. I have searched previously answered questions related to this thoroughly but couldn't find a good solution.
Problem: I intend to use the ODBC driver installed on my system to connect to a Redshift database. All the connection details (server name, host, port, username, and password) are configured in the DSN. I was able to connect to the database and read a table using the following code:
import pyodbc
import pandas as pd
conn = pyodbc.connect('DSN=AWSDW')
Query = """select *
from <table_name>
limit 10"""
df2 = pd.read_sql(Query,conn)
But the problem is that I can't load this dataframe into Redshift. Below is the code that I am trying to run:
import sqlalchemy

engine = sqlalchemy.create_engine('postgresql+pyodbc://AWSDW')
df2.to_sql('Abhi_Testing_Python_2',
           engine,
           schema='sandbox',
           index=False,
           if_exists='replace')
I know something needs to change in the connection string passed to create_engine, but I just don't know what.
I am open to using some other method as long as I don't have to hard code my username and password in the code.
I found that you can't use the postgresql dialect with the pyodbc driver.
https://www.codepowered.com/manuals/SQLAlchemy-0.6.9-doc/html/core/engines.html
So I ended up not using the Amazon driver which I had installed, and used psycopg2 instead.
connection_string = 'postgresql+psycopg2://' + username + ':' + password + '@' + HOST + ':' + str(PORT) + '/' + DATABASE
engine = create_engine(connection_string)
This works. The only drawback was that I had to hard-code the HOST name in my code.
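As a sketch of one way to avoid hard-coding credentials and the host entirely (the environment-variable names here are illustrative, not part of the original answer):
import os
from sqlalchemy import create_engine

# read every piece of the connection string from environment variables
connection_string = (
    'postgresql+psycopg2://'
    + os.environ['REDSHIFT_USER'] + ':' + os.environ['REDSHIFT_PASSWORD']
    + '@' + os.environ['REDSHIFT_HOST'] + ':' + os.environ.get('REDSHIFT_PORT', '5439')
    + '/' + os.environ['REDSHIFT_DB']
)
engine = create_engine(connection_string)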

Google Colab: "Unable to connect to the runtime" after uploading Pytorch model from local

I am using a simple (not necessarily efficient) method for Pytorch model saving.
import torch
from google.colab import files
torch.save(model, filename) # save a trained model on the VM
files.download(filename) # download the model to local
best_model = files.upload() # select the model just downloaded
best_model[filename] # access the model
Colab disconnects during execution of the last line, and hitting the RECONNECT tab always shows ALLOCATING -> CONNECTING (which fails, with an "unable to connect to the runtime" message in the bottom left corner) -> RECONNECT. At the same time, executing any cell gives the error message "Failed to execute cell, Could not send execute message to runtime: [object CloseEvent]".
I know it is related to the last line, because I can successfully connect with my other Google accounts, which haven't executed that line.
Why does this happen? It seems the Google accounts which have executed the last line can no longer connect to the runtime.
Edit:
One night later, I can reconnect with that Google account after the session expired. I just attempted the approach in the comment and found that merely calling files.upload() on the Pytorch model leads to the problem. Once the upload completes, Colab disconnects.
Try disabling your ad-blocker. Worked for me
(I wrote this answer before reading your update. I think it may still help.)
files.upload() is just for uploading files. We have no reason to expect it to return a pytorch type/model.
When you call a = files.upload(), a is a dictionary mapping filename to a big bytes array:
{'my_image.png': b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR....' }
type(a['my_image.png'])  # bytes
Just like when you do open('my_image.png', 'rb').read()
So, I think the next line, best_model[filename], tries to print the whole huge bytes array, which breaks Colab.
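As a sketch of how the uploaded bytes could actually be turned back into a model (assuming filename is the same variable used in the question's snippet):
import io
import torch
from google.colab import files

uploaded = files.upload()                                # dict: filename -> raw bytes
# wrap the bytes in a file-like object so torch.load can deserialize them
best_model = torch.load(io.BytesIO(uploaded[filename]))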

Python multiprocessing: how can I make my script faster?

Python 3.6
I am writing a script to automate checking that all the links on a website work.
I have a version of it, but it runs slowly because the Python interpreter only runs one request at a time.
I imported selenium to pull the links down into a list. I started with a list of 41000 links; after removing duplicates I am down to 7300 links in my list. I am using the requests module just to check the response code. I know multiprocessing is the answer, but I see a bunch of different methods. Which is best for my needs?
The only thing I need to keep in mind is that I can't run too many threads at once, so I don't send the request load on our web server sky high.
Here is the function that checks the links with the python requests module that I am trying to speed up:
import requests  # logf is assumed to be a log file handle opened elsewhere in the script

def check_links(y):
    for ii in y:
        try:
            r = requests.get(ii.get_attribute('href'))
            rc = r.status_code
            print(ii.get_attribute('href'), ' ', rc)
        except Exception as e:
            logf.write(str(e))
        finally:
            pass
If you just need to apply the same function to all the items in a list, you just need to use a process pool and map over your inputs. Here is a simple example:
from multiprocessing import pool

def square(x):
    return {x: x**2}

p = pool.Pool()
results = p.imap_unordered(square, range(10))
for r in results:
    print(r)
In the example I use imap_unordered, but also look at map and imap. You should choose the one that best matches your needs.
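A sketch of how the same pattern might apply to the link-checking case, with a capped pool size so the web server is not flooded; urls is assumed to be a plain list of href strings collected beforehand, since selenium elements cannot be passed to worker processes:
from multiprocessing import pool
import requests

def check_link(url):
    try:
        return url, requests.get(url, timeout=10).status_code
    except Exception as e:
        return url, str(e)

if __name__ == '__main__':
    urls = ['https://example.com/']        # placeholder list of links
    with pool.Pool(processes=8) as p:      # 8 workers caps the concurrent requests
        for url, result in p.imap_unordered(check_link, urls):
            print(url, ' ', result)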

pyLDAvis: unable to view the graph

I am trying to visually depict my topics in Python using pyLDAvis. However, I am unable to view the graph. Do we have to view the graph in the browser, or will it pop up on execution? Below is my code:
import pyLDAvis
import pyLDAvis.gensim as gensimvis
print('Pyldavis ....')
vis_data = gensimvis.prepare(ldamodel, doc_term_matrix, dictionary)
pyLDAvis.display(vis_data)
The program stays in execution mode after running the above commands. Where should I view my graph? Or where will it be stored? Is it integrated only with the IPython notebook? Kindly guide me through this.
P.S. My Python version is 3.5.
This does not work:
pyLDAvis.display(vis_data)
This will work for you:
pyLDAvis.show(vis_data)
I'm facing the same problem now.
EDIT:
My script looks as follows:
first part:
import pyLDAvis
import pyLDAvis.sklearn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
print('start script')
tf_vectorizer = CountVectorizer(strip_accents='unicode', stop_words='english', lowercase=True, token_pattern=r'\b[a-zA-Z]{3,}\b', max_df=0.5, min_df=10)
dtm_tf = tf_vectorizer.fit_transform(docs_raw)
lda_tf = LatentDirichletAllocation(n_topics=20, learning_method='online')
print('fit')
lda_tf.fit(dtm_tf)
second part:
print('prepare')
vis_data = pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer)
print('display')
pyLDAvis.display(vis_data)
The problem is in the line "vis_data = (...)". If I run the script, it prints 'prepare' and keeps running after that without printing anything else (so it never reaches the line print('display')).
The funny thing is, when I run the whole script it gets stuck on that line, but when I run the first part, go to my console and execute just the single line "vis_data = pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer)", it executes in a couple of seconds.
As for the graph, I saved it as html ("simple") and use the html file to view the graph.
I ran into the same problem (I use PyCharm as my IDE). The problem is that pyLDAvis is developed for IPython (see the docs, https://media.readthedocs.org/pdf/pyldavis/latest/pyldavis.pdf, page 3).
My fix/workaround:
make a dict of lda_tf, dtm_tf, tf_vectorizer (e.g., pyLDAviz_dict)
dump the dict to a file (e.g., mydata_pyLDAviz.pkl)
read the pkl file into the notebook (I did get some deprecation info from pyLDAvis, but that had no effect on the end result)
play around with pyLDAvis in the notebook
if you're happy with the view, dump it into HTML
The cause is (most likely) that pyLDAvis expects continuous user interaction (including a user-initiated "exit"). However, I'd rather dump data from a smart IDE and read that into Jupyter than develop/code in a Jupyter notebook. That's pretty much like going back to before-emacs times.
From experience this approach works quite nicely for other plotting routines.
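A minimal sketch of the dump/reload workflow described above, using the example file name from the answer (the dict keys are illustrative, and lda_tf, dtm_tf and tf_vectorizer are the objects fitted in the first part of the script):
import pickle
import pyLDAvis
import pyLDAvis.sklearn

# in the IDE: dump the fitted pieces to a pickle file
with open('mydata_pyLDAviz.pkl', 'wb') as f:
    pickle.dump({'lda_tf': lda_tf, 'dtm_tf': dtm_tf, 'tf_vectorizer': tf_vectorizer}, f)

# in the notebook: load the pieces back, prepare the visualization, and save it
# as a standalone HTML file that opens in any browser
with open('mydata_pyLDAviz.pkl', 'rb') as f:
    d = pickle.load(f)
vis_data = pyLDAvis.sklearn.prepare(d['lda_tf'], d['dtm_tf'], d['tf_vectorizer'])
pyLDAvis.save_html(vis_data, 'my_topics.html')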
If you received the module error for pyLDAvis.gensim, then try this import instead:
import pyLDAvis.gensim_models
You get the error because of a newer version of pyLDAvis, in which the gensim module was renamed to gensim_models.
