How to align Pandas DataFrame column number text in Jinja - python-3.x

I rendered a Pandas DataFrame to a webpage through Jinja but noticed that the number column is left-aligned.
I tried applying the code below to the particular columns to right-align them, then loaded the webpage:
df = df.style.set_properties(subset=["col1", "col2"], **{'text-align': 'right'})
It gives an error on the browser page, yet it works perfectly when tried in a Jupyter Notebook:
TypeError: 'Styler' object is not subscriptable
What I want is for the number column to be right-aligned.
Does anyone have a better solution?

I couldn't get a Pandas or Jinja solution that worked. However, I stumbled on a CSS trick that solved the whole issue.
I simply had to identify the specific column and apply the code below in my Style.css file:
tbody>tr>:nth-child(5){
    text-align:right;
}
The '5' is the column number.
Credit to Charles Riebeling
I believe this will be of help to someone.
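For anyone who still wants to keep the alignment on the pandas side rather than in CSS, here is a minimal sketch of one possible approach (not the fix above, and untested in the asker's exact setup). It assumes a recent pandas (Styler.to_html() needs pandas >= 1.3) and a plain jinja2 template; the key point is that a Styler is not subscriptable, so it has to be converted to an HTML string before being handed to the template:
import pandas as pd
from jinja2 import Template

# hypothetical data standing in for the real DataFrame
df = pd.DataFrame({"name": ["a", "b"], "col1": [1, 22], "col2": [333, 4]})

# build the Styler and convert it to an HTML string first;
# a Styler cannot be used like a DataFrame inside the template
styled = df.style.set_properties(subset=["col1", "col2"], **{'text-align': 'right'})
html_table = styled.to_html()  # Styler.to_html() needs pandas >= 1.3

# render the table into a (deliberately minimal) page template
page = Template("<html><body>{{ table }}</body></html>").render(table=html_table)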

Related

unsupported operand type(s) for /: 'str' and 'str' when using .pct_change()

My code is very simple. I imported a .csv file with daily prices of 10 different indices; the resulting DataFrame looks like this.
When using the code
TVOL_Port.pct_change()
I get the error message from the title. Obviously the prices aren't floats but strings, yet other functions like .describe(), .cov(), .corr() and so on work fine.
I used the same numbers in another Jupyter notebook, where it worked fine as well, but there I only used one column instead of the 10 in this example.
If I am right and all the prices are strings, how can I convert them into floats? I am a beginner in programming. Many thanks for your help.
The problem was solved by setting the first column, 'date', as the index.
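A minimal sketch of what that fix could look like, assuming the data comes from a CSV whose first column is 'date' (the file name below is a placeholder); with the date as the index, any remaining string columns can be coerced to numbers before calling .pct_change():
import pandas as pd

# placeholder file name; the first column is assumed to be 'date'
TVOL_Port = pd.read_csv("prices.csv", index_col="date", parse_dates=True)

# coerce any price columns that were read as strings into floats
TVOL_Port = TVOL_Port.apply(pd.to_numeric, errors="coerce")

returns = TVOL_Port.pct_change()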

Display full Pandas dataframe in Jupyter without index

I have a pandas dataframe that I would like to pretty-print in full (it's ~90 rows) in a Jupyter notebook. I'd also like to display it without the index column, if possible. How can I do that?
For pretty-printing without an index, I think the right approach is to call the display method for HTML (which is what Jupyter does under the hood):
from IPython.display import HTML
HTML(df.to_html(index=False))
(Credit to Display pandas dataframe without index)
As others have suggested, you can adjust the display.max_rows option for the row-count limitation.
In pandas you can use this:
pd.set_option("display.max_rows", None, "display.max_columns", None)
To print without the index, additionally use:
df.to_string(index=False)
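Putting the two suggestions together, a small sketch (the DataFrame here is a stand-in for the real ~90-row frame) that shows the whole frame without the index in a notebook:
import pandas as pd
from IPython.display import HTML, display

df = pd.DataFrame({"a": range(90), "b": range(90)})  # stand-in data

# lift the default row/column truncation for plain printing
pd.set_option("display.max_rows", None, "display.max_columns", None)

# render as HTML without the index (what Jupyter does under the hood)
display(HTML(df.to_html(index=False)))

# or, for a plain-text version without the index
print(df.to_string(index=False))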

Fails to display certain columns' data in Matplotlib

Given a dataframe as follows:
date,unit_value,unit_value_cumulative,daily_growth_rate
2019/1/29,1.0139,1.0139,0.22
2019/1/30,1.0057,1.0057,-0.81
2019/1/31,1.0122,1.0122,0.65
2019/2/1,1.0286,1.0286,1.62
2019/2/11,1.0446,1.0446,1.56
2019/2/12,1.0511,1.0511,0.62
2019/2/13,1.0757,1.0757,2.34
2019/2/14,1.0763,1.0763,0.06
2019/2/15,1.0554,1.0554,-1.94
2019/2/18,1.0949,1.0949,3.74
2019/2/19,1.0958,1.0958,0.08
I have used the code below to plot them, but as you can see from the output image, one column doesn't show up on the plot.
df.plot(x='date', y=['unit_value', 'unit_value_cumulative', 'daily_growth_rate'], kind="line")
Output: (plot image not shown)
To plot unit_value only, I use: df.plot(x='date', y=['unit_value'], kind="line")
Out: (plot image not shown)
Could anyone help figure out why it doesn't work when I plot the three columns on the same plot? Thanks.
I just reproduced your results and it actually does work fine. In your case the values of the columns "unit_value" and "unit_value_cumulative" are identical, which is why you only see the one drawn in front.
Aside from that, your current data looks like you made a mistake when calculating the cumulative values.
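One way to make the overlap visible (a sketch, not part of the original answer) is to give each series its own line style, so two identical curves can still be told apart:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")  # placeholder for the CSV shown above

# distinct line styles reveal that unit_value and unit_value_cumulative
# lie exactly on top of each other
df.plot(
    x="date",
    y=["unit_value", "unit_value_cumulative", "daily_growth_rate"],
    style=["-", "--", ":"],
    kind="line",
)
plt.show()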

Spark - Have I read from csv correctly?

I read a csv file into Spark using:
df = spark.read.format(file_type).options(header='true', quote='\"',
ignoreLeadingWhiteSpace='true',inferSchema='true').load(file_location)
When I tried it with sample csv data from another source and did display(df), it showed a neatly displayed header row followed by data.
When I try it on my main data, which has 40 columns and millions of rows, it simply displays the first 20 column headers and no data rows.
Is this normal behaviour or is it reading the file wrong?
Update:
I shall mark the question as answered, as the tips below are useful. However, my results from doing:
df.show(5, truncate=False)
currently shows:
+----------------------------------------------------------------------+
|��"periodID","DAXDate","Country Name","Year","TransactionDate","QTR","Customer Number","Customer Name","Customer City","Document Type Code","Order Number","Product Code","Product Description","Selling UOM","Sub Franchise Code","Sub Franchise Description","Product Major Code","Product Major Description","Product Minor Code","Product Minor Description","Invoice Number","Invoice DateTime","Class Of Trade ID","Class Of Trade","Region","AmountCurrencyType","Extended Cost","Gross Trade Sales","Net Trade Sales","Total(Ext Std Cost)","AdjustmentType","ExcludeComment","CurrencyCode","fxRate","Quantity","FileName","RecordCount","Product Category","Direct","ProfitCenter","ProfitCenterRegion","ProfitCenterCountry"|
+----------------------------------------------------------------------+
I shall have to go back to basics and preview the csv in a text editor to find out what the correct format is for this file and figure out what's going wrong. Note, I had to update my code to the following to deal with the pipe delimiter:
df = spark.read.format(file_type).options(header='true', quote='\"', delimiter='|',ignoreLeadingWhiteSpace='true',inferSchema='true').load(file_location)
Yes, this is normal behaviour. The dataframe function show() displays 20 rows by default. You can set a different value for that (but keep in mind that it doesn't make sense to print all the rows of your file) and also stop it from truncating. For example:
df.show(100, truncate=False)
It is normal behaviour. You can view the content of your data in different ways (a short usage sketch follows this list):
show(): shows the first 20 rows in a formatted way. You can pass as an argument the number of rows you want to display (providing a value much higher than your data has is fine). Columns are truncated too by default; specify truncate=False to show them in full (as #cronoik correctly said in his answer).
head(): the same as show(), but it returns the data in a "row" format. It does not provide a nicely formatted table; it is useful for a quick, complete look at your data, for example head(1) to show only the first row.
describe().show(): shows a summary that gives you insight into the data, for example the count of elements and the min/max/avg value of each column.
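A short sketch of those calls in use, assuming the df from the question:
df.show(5, truncate=False)   # first 5 rows, columns shown in full
print(df.head(1))            # a list containing the first Row, handy for a quick peek
df.describe().show()         # count / mean / stddev / min / max per column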
It is normal for Spark dataframes to display a limited number of rows and columns. Your reading of the data should not be a problem. However, to confirm that you have read the csv correctly, you can check the number of rows and columns in the df, using
len(df.columns)
or
df.columns
For number of rows
df.count()
In case you need to see the content in detail you can use the option stated by cronoik.
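Pulling the thread together, here is a hedged sketch of how the read could be retried and sanity-checked. The delimiter and encoding options are assumptions based on the pipe-separated header and the leading �� (which looks like a byte-order mark being mis-decoded), and the file path is a placeholder:
# assumes an existing SparkSession named `spark`
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("quote", '"')
    .option("delimiter", "|")            # the header shown above is pipe-separated
    .option("encoding", "UTF-8")         # adjust if the leading bytes turn out to be a UTF-16 BOM
    .option("ignoreLeadingWhiteSpace", "true")
    .option("inferSchema", "true")
    .load("/path/to/main_data.csv")      # placeholder path
)

# sanity checks: column count, a few column names, and the row count
print(len(df.columns), df.columns[:5])
print(df.count())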

I need to change the value of a specific column of a dataframe using a condition while importing multiple Excel files

import pandas as pd
batch=pd.read_excel('batch.xlsx')
stock_report=pd.read_excel('Stock_Report.xlsx')
Result_stock=pd.merge(stock_report,batch[['Batch','Cost price']], on='Batch').fillna(0)
Result_stock2=pd.merge(Result_stock,batch[['Item number',' Batch MRP']], on='Item number').fillna(0)
Result_stock2['Total']=Result_stock2['Posted quantity']*Result_stock2['Cost price']
I need to change the value of the 'Total' column in Result_stock2 by multiplying two column values together wherever 'Total' is 0.
You need to learn some formatting. Please format your code so we can read it.
If I understood what you mean and your script is working fine so far, you should simply add:
Result_stock2.loc[Result_stock2['Total']==0,'Total']=(****OPERATION YOU NEED****)
For example, the 'OPERATION' could be:
Result_stock2.loc[Result_stock2['Total']==0,'Posted quantity']*(Result_stock2.loc[Result_stock2['Total']==0,'Cost price']-5)
It's not beautiful code, but it will do what you need.
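For a concrete version of that pattern, here is a small sketch. The question doesn't say which two columns should be multiplied when 'Total' is 0, so 'Posted quantity' and ' Batch MRP' below are placeholders taken from the column names in the question:
# Result_stock2 is assumed to be the merged DataFrame built above

# recompute 'Total' only for the rows where it is currently 0
mask = Result_stock2['Total'] == 0
Result_stock2.loc[mask, 'Total'] = (
    Result_stock2.loc[mask, 'Posted quantity']   # placeholder column
    * Result_stock2.loc[mask, ' Batch MRP']      # placeholder column (name as merged above)
)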
