Display full Pandas dataframe in Jupyter without index - python-3.x

I have a pandas dataframe that I would like to pretty-print in full (it's ~90 rows) in a Jupyter notebook. I'd also like to display it without the index column, if possible. How can I do that?

For pretty-printing without an index, I think the right approach is to call the display method for HTML (which is what Jupyter does under the hood):
from IPython.display import HTML
HTML(df.to_html(index=False))
(Credit to Display pandas dataframe without index)
As others have suggested, you can use pd.set_option("display.max_rows", None) to lift the row-count limit.

In pandas you can use this to lift both the row and the column limits:
pd.set_option("display.max_rows", None, "display.max_columns", None)
To print without the index, additionally use:
print(df.to_string(index=False))
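A minimal sketch of both suggestions, assuming a made-up 90-row frame in place of the question's data. DataFrame.to_string and DataFrame.to_html both default to max_rows=None, so they emit every row regardless of the display options; the options only govern how the plain df repr is shown. pd.option_context is used here instead of pd.set_option so the limits are lifted only temporarily.

```python
import pandas as pd

# Made-up stand-in for the ~90-row frame in the question
df = pd.DataFrame({"asset": [f"A{i}" for i in range(90)],
                   "value": [i * 0.5 for i in range(90)]})

# Plain text without the index; to_string shows all rows by default
text = df.to_string(index=False)
print(text)

# HTML without the index, suitable for IPython.display.HTML(...) in a notebook
html = df.to_html(index=False)

# Lift the repr limits only inside this block; the previous settings return afterwards
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    full_repr = repr(df)
```

In a notebook, calling display(df) inside the same with-block would render the full frame (with its index) without changing the global settings.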

Related

Countif function in Python looping in every cell

Hey everyone, I am used to working in Excel, but I recently got a dataset of about 500k rows that needs to be worked on in the same worksheet, and I have huge capacity issues, so I was advised to try to move every function to a Python environment. This Excel function, "=IF(COUNTIF($J$2:J3,J3)>1,0,1)" (J is the Asset ID column), goes through each cell and returns 0 if the value was already encountered in the cells above, and 1 if it is unique.
How would that be possible in a Python environment if I load my table as a DataFrame?
Thanks in advance
You can use pandas to achieve this very easily:
import pandas as pd
df = pd.read_excel('your_file.xlsx')  # use the reader that matches your file type
df['Unique'] = (~df['Asset_Id'].duplicated()).astype(int)  # 1 where the ID has not appeared in an earlier row, 0 otherwise; 'Asset_Id' stands for your column name
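A small runnable check of the corrected line, using a hypothetical Asset_Id column with made-up values. The groupby(...).cumcount() line is an extra cross-check (not part of the original answer): it numbers each ID's occurrences row by row, which mirrors the expanding COUNTIF range more literally.

```python
import pandas as pd

# Hypothetical 'Asset_Id' column with made-up IDs
df = pd.DataFrame({"Asset_Id": ["A1", "B7", "A1", "C3", "B7", "A1"]})

# duplicated() (keep="first") is True for every repeat of an ID already seen above,
# so negating it marks first occurrences; astype(int) turns True/False into 1/0
df["Unique"] = (~df["Asset_Id"].duplicated()).astype(int)

# Cross-check: cumcount() numbers each ID's occurrences 0, 1, 2, ... in row order,
# so 0 means "not seen in any earlier row"
df["Unique_check"] = (df.groupby("Asset_Id").cumcount() == 0).astype(int)

print(df["Unique"].tolist())  # [1, 1, 0, 1, 0, 0]
```

Both versions work on whole columns at once rather than looping over cells, which is what makes them practical at 500k rows.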

Pyspark/jupyter notebook display issue with database

I am trying to use PySpark in a Jupyter notebook, but when I want to see (a part of) the DataFrame,
...(some columns are not even shown).
I would like to have a proper tabular display.
Any idea how to do it?
Your file is semicolon-separated. Pass that as the separator:
df = spark.read.csv(path, sep=';')
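The same separator problem can be reproduced without a Spark session. The sketch below swaps in pandas.read_csv (not PySpark) on a made-up semicolon-delimited sample: with the default comma separator each line lands in a single column, which is why the columns look merged or missing, while passing sep=";" splits them as sep=';' does for spark.read.csv.

```python
import io

import pandas as pd

# Made-up semicolon-separated sample standing in for the file at `path`
raw = "name;city;age\nAna;Lisbon;31\nBen;Oslo;45\n"

# Default sep="," finds no commas, so each whole line becomes one field
wrong = pd.read_csv(io.StringIO(raw))

# Telling the reader the real delimiter recovers the three columns
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)
```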

How to align Pandas DataFrame column number text in Jinja

I rendered a Pandas DataFrame to a webpage through Jinja but noticed the number column is left-aligned.
I tried applying the code below to the particular columns to align them right:
df = df.style.set_properties(subset=["col1", "col2"], **{'text-align': 'right'})
When I loaded the webpage, it gave an error on the browser page. Funnily enough, it works perfectly when tried in Jupyter Notebook.
TypeError: 'Styler' object is not subscriptable
What I want is for the number column to be right-aligned.
Does anyone have a better solution?
I couldn't get a Pandas or Jinja solution that worked. However, I stumbled on this, and it solved the whole issue.
It was a CSS trick: I simply had to identify the specific column and apply the code below in my Style.css file.
tbody>tr>:nth-child(5){
text-align:right;
}
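Here, 5 is the column's position. :nth-child() counts from 1, and if the DataFrame's index is rendered, its cell counts as the first child.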
Credit to Charles Riebeling
I believe this will be of help to someone.

DataFrame entries got rounded off when converted to txt

This is what the dataframe looks like before exporting
After that it becomes
Rounding down is not what I want here; I want the text in the .txt file to look like what is shown in the console. How can I fix this? Are there any simple solutions?
Did you try writing directly from your Pandas dataframe instead of going through Numpy?
Try df.to_csv('output.txt', sep='\t', float_format='%g')
For more details see pandas.DataFrame.to_csv
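A quick way to see what float_format does, using a made-up frame and in-memory buffers instead of output.txt. Without float_format, to_csv writes floats at full precision, while '%g' keeps six significant digits, so '%g' can itself shorten long values; pick whichever matches the precision you need.

```python
import io

import pandas as pd

# Made-up frame: one float column with a long value, one int column
df = pd.DataFrame({"a": [1.23456789, 2.5], "b": [10, 20]})

# No float_format: floats are written with all their digits
buf_default = io.StringIO()
df.to_csv(buf_default, sep="\t", index=False)

# '%g' applies only to float columns and keeps six significant digits
buf_g = io.StringIO()
df.to_csv(buf_g, sep="\t", index=False, float_format="%g")

print(buf_default.getvalue())
print(buf_g.getvalue())
```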

pandas method .isin() automatically converts datatypes in element comparisons?

Wondering if someone can help me here. When I take a regular Python list containing strings and check whether a pandas Series (pla.id) has a value that matches a value in that list, it works.
That's great, but I wonder how it's able to compare strings to ints. Is there documentation somewhere stating that it converts the values under the hood before comparing them?
I wasn't able to find anything on the .isin() page of the pandas documentation.
Also interesting: when I try pandas boolean indexing instead, it fails due to a type comparison.
So my two questions:
Does the pandas.Series.isin(some_list_of_strings) method automatically convert the values in the Series (which are ints) to strings before comparing them?
If so, why doesn't pandas indexing, i.e. df[df.series == 'some value'], do the same thing? What is the thinking behind this? To accomplish the same thing, I would have to use df[df.series.astype(str) == ''] or df[df.series.astype(str).isin(some_list_of_strings)] to access the values in the df that match.
After some digging, I think this might be due to the pandas object dtype, but I don't understand why this works. It also doesn't explain why the example below works, since it is an int dtype.
Thanks in advance!
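This doesn't settle why isin() behaved that way in the asker's pandas version, but making the comparison explicit on one side or the other removes any dependence on implicit conversion. A sketch with a made-up id column standing in for pla.id:

```python
import pandas as pd

# Made-up int64 column standing in for pla.id
df = pd.DataFrame({"id": [101, 202, 303]})
wanted = ["101", "303"]  # list of strings, as in the question

# Option 1: cast the column to str so both sides are strings
mask_str = df["id"].astype(str).isin(wanted)

# Option 2: cast the list to int so the column keeps its numeric dtype
mask_int = df["id"].isin([int(v) for v in wanted])

print(df[mask_str])
```

Option 2 avoids building a temporary string copy of the column, which may matter on large frames.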
