gt table in Jupyter? - jupyter-lab

I'm attempting to produce reasonable visual tables in R using JupyterLab without success.
Specifically, I'd like this code chunk, which works beautifully in RStudio, to work in Jupyter:
library(gt)
library(gtExtras)
head(mtcars) %>%
gt() %>%
gt_theme_538()
In Jupyter this produces a list of 17 table elements, not a formatted table.

Related

Pandas DataFrame indexing problem from 40,000 to 49,999

I have a strange problem with my code (At least it is strange for me!).
I have a Pandas DataFrame called "A". One of its columns is named "asin". I want to extract all rows that match a specific value, so I wrote this simple code:
df2 = A[A['asin']=='B0000UYZG0']
And it works as expected, except for rows 40,000 to 49,999!
It doesn't work on that range of rows at all.
Refer to the picture: df2 = A[A['asin']=='0077614992'] (the value at row 50,000) works, but df2 = A[A['asin']=='B0006O0WW6'] (the value at row 49,999) does not!
I have not tried all 10,000 rows, but I tested some at random and none of them returned a result.
I have grown accustomed to fixing bugs like this one. It usually happens because of a different dtype, or because the string you see displayed isn't actually the string itself. It seems your issue is mostly the second case.
So let's first strip any whitespace from your string column.
df2['asin'] = df2.asin.str.strip()
# I am assuming that df2 is the DataFrame object that is not working for you
After that, try rerunning your filter:
df2[df2['asin'].eq('0077614992')]
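To see why stray whitespace would produce an empty result, here is a minimal sketch. The sample rows are made up for illustration (they are not the asker's data), and it assumes the hidden characters are plain leading or trailing spaces:

```python
import pandas as pd

# Hypothetical sample: two of the three values carry invisible padding.
A = pd.DataFrame({"asin": ["0077614992", "B0006O0WW6 ", " B0000UYZG0"]})

# repr() exposes the padding that the normal DataFrame display hides.
print([repr(v) for v in A["asin"]])

# The exact-match filter misses the padded value.
before = A[A["asin"] == "B0006O0WW6"]

# After stripping, the same filter finds the row.
A["asin"] = A["asin"].str.strip()
after = A[A["asin"].eq("B0006O0WW6")]

print(len(before), len(after))
```

If the row count goes from zero to one after stripping, whitespace was the cause. If it stays at zero, the dtype of the column is the next thing to check.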

Pandas read_csv - dealing with columns that have ID numbers with consecutive '$' and '#' symbols, along with letters and digits

I'm trying to read a csv file with a column of data that has a scrambled ID number that includes the occasional consecutive $$ along with #, numbers, and letters.
SCRAMBLE_ID
AL9LLL677
AL9$AM657
$L9$$4440
#L9$306A1
etc.
I tried the following:
df = pd.read_csv('MASTER~1.CSV',
dtype = {'SCRAMBLE_ID': str})
which rendered the third entry as L9$4440 (the L9 appears in a serif font, italicized, and the first and second $ vanish).
Faced with an entire column of ID numbers configured in this manner, what is the best way of dealing with such data? I can imagine:
PRIOR TO pd.read_csv: replacing the offending symbols with substitutes that don't create this problem (and what would those be), OR,
is there a way of preserving the IDs as is but making them into a data type that ignores these symbols while keeping them present?
Thank you. I've attached a screenshot of the .csv side by side with resulting df (Jupyter notebook) below.
[Screenshot: csv column to pandas df with $$]
I cannot replicate this using the same values as you in a mock CSV file.
Are you sure that the formatting based on the $ symbol is not occurring in wherever you are rendering your dataframe values? Have you checked to see if the data in the dataframe is what you expect or are you just rendering it externally?
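One way to check this is to look at the raw values instead of the rendered table. The sketch below rebuilds the CSV in memory from the four IDs in the question (the in-memory file is a stand-in for MASTER~1.CSV). The last line assumes the notebook's MathJax rendering is what interprets the `$` pairs, which the question does not confirm:

```python
import io

import pandas as pd

# Stand-in for the real file, using the IDs quoted in the question.
csv_text = "SCRAMBLE_ID\nAL9LLL677\nAL9$AM657\n$L9$$4440\n#L9$306A1\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"SCRAMBLE_ID": str})

# repr() shows the stored strings exactly, bypassing any HTML rendering.
for value in df["SCRAMBLE_ID"]:
    print(repr(value))

# Assumption: the notebook front end treats $...$ as math.
# This pandas option asks the HTML table output not to be processed by MathJax.
pd.set_option("display.html.use_mathjax", False)
```

If the printed values still contain every `$` and `#`, the data itself is intact and only the display needs adjusting.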

Pyspark/jupyter notebook display issue with database

I am trying to use PySpark with Jupyter Notebook. But when I want to see (a part of) the dataframe, the output looks like this:
[screenshot not preserved] (some columns are not even shown).
I would like to have a display like this:
[screenshot not preserved]
Any idea how to do it?
Your dataframe is semicolon separated.
Pass that as a separator
df = spark.read.csv(path,sep=';')

Python sort data and load to excel

I am working on a task in which I scan 1000+ candidate emails and assign points based on their relevance to the requirement. I want to export this data to Excel in sorted order: the profile with the most points (the most relevant profile) should be at the top (order by points desc). I have Python 3.3.5 on Windows 7 32-bit.
I searched and understood that I might need the pandas module to store the data in a dataframe, sort it on my column, and load it into an Excel file. I then tried installing pandas using
pip install pandas
in both cmd and cmd (run as administrator), but it gives an error:
Command "python setup.py egg_info" failed with error code 1 in c:\users\sanket~1
\appdata\local\temp\pip-build-ihqwe4\pandas\
Can someone please help me with this sorting issue and suggest how I can resolve the pandas installation error? Or is there another way to sort the data?
I'm not sure why you need pandas to do this. I would just store them as a sorted list of (score,candidate) tuples and then just write as a CSV:
tuples = ... # read your dataset as a list of (score, candidate) tuples
tuples.sort(reverse=True) # sorts by the first element of each tuple (score), in descending order
with open('output.csv', 'w') as f:
    for e in tuples:
        f.write("%s,%s\n" % e) # one row per line; the newline was missing
The resulting CSV file can be opened in Excel
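As a variation on the same idea, here is a sketch that uses the standard-library `csv` module, which also handles quoting if a candidate field contains a comma. The sample rows, the column names, and the helper names `sorted_rows` and `write_csv` are made up for illustration:

```python
import csv
import io


def sorted_rows(rows):
    """Return rows ordered by points, highest first."""
    return sorted(rows, key=lambda row: row[0], reverse=True)


def write_csv(rows, fileobj):
    """Write a header plus the sorted (points, candidate) rows."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["points", "candidate"])
    writer.writerows(sorted_rows(rows))


# Hypothetical sample data standing in for the scanned emails.
sample = [
    (42, "alice@example.com"),
    (87, "bob@example.com"),
    (65, "carol@example.com"),
]

# An in-memory buffer keeps the sketch self-contained;
# for a real file, pass open("output.csv", "w", newline="") instead.
buf = io.StringIO()
write_csv(sample, buf)
print(buf.getvalue())
```

This needs no third-party packages, so it sidesteps the pandas installation problem entirely.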

Table columns not perfectly aligning right even when using format specifications in python

I want to produce a table using python 3.6 and using string formatting to align the columns to the right. However, I can't get it to properly align to the right no matter how high I adjust the width. Is there something I should be adding to the code?
Here's the code
x = "{0:>5} {1:>20} {2:>35}"
for i in range(1, 11):
    print(x.format(i, i*(10**i), i*(10**(2*i))))
Here's the intended vs actual output (image of output)
