DataFrame entries get rounded off when converted to txt

This is what the DataFrame looks like before exporting:
After that it becomes:
Rounding down is not what I want here; I want the text in the .txt file to look like what is shown in the console. How can I fix this? Are there any simple solutions?

Did you try writing directly from your pandas DataFrame instead of going through NumPy?
Try DF.to_csv('output.txt', sep='\t', float_format='%g')
For more details see pandas.DataFrame.to_csv
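As a minimal sketch of the suggestion above, the snippet below writes a small DataFrame as tab-separated text with float_format. The column names and values are made up for illustration, and the output goes to an in-memory buffer instead of output.txt so the result can be inspected. Note that '%g' keeps six significant digits by default, so a wider format such as '%.10g' may be needed if the console shows more digits than that.

```python
import io

import pandas as pd

# Hypothetical data: the original question's DataFrame is not shown.
df = pd.DataFrame({"x": [0.000012345, 1234.5678], "y": [1.5, 2.25]})

# '%g' drops trailing zeros and switches to scientific notation for
# very small or very large values, but keeps only 6 significant digits.
buf = io.StringIO()
df.to_csv(buf, sep="\t", float_format="%g", index=False)
text_g = buf.getvalue()

# '%.10g' keeps up to 10 significant digits instead.
buf = io.StringIO()
df.to_csv(buf, sep="\t", float_format="%.10g", index=False)
text_g10 = buf.getvalue()

print(text_g)    # second data row reads "1234.57	2.25"
print(text_g10)  # second data row reads "1234.5678	2.25"
```

To write to a real file, pass a path such as 'output.txt' as the first argument instead of the buffer.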

Related

Tried to convert dataframe to series in Jupyter

I loaded the following CSV into a DataFrame:
I ran this loop over it:
The result:
When I print s outside the loop, it shows only the volume column and not the others:

This is the expected behavior: on every iteration you assign a new Series to s instead of appending to it. At the end of the for loop, s holds only the Series for the last column, volume.
You can take a look at this page to learn more about appending a Series to an existing Series.
In short, you should replace
s = pd.Series(df2[column])
with
s = s.append(pd.Series(df2[column]))
although I'm not sure why you would want to do that! According to the documentation, you can also reset the index while appending by running the following instead:
s = s.append(pd.Series(df2[column]), ignore_index=True)
Note that Series.append was removed in pandas 2.0; on current versions, use pd.concat([s, df2[column]], ignore_index=True) instead.
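A minimal sketch of the same idea using pd.concat, which also works on pandas versions where Series.append is no longer available. The DataFrame contents and the open and close column names are assumptions for illustration; only the volume column is mentioned in the question.

```python
import pandas as pd

# Hypothetical stand-in for the CSV from the question.
df2 = pd.DataFrame(
    {"open": [1.0, 2.0], "close": [1.5, 2.5], "volume": [100, 200]}
)

# Collect every column and concatenate once, instead of overwriting
# s on each loop iteration. ignore_index=True renumbers the result 0..n-1.
s = pd.concat([df2[column] for column in df2.columns], ignore_index=True)

print(s)
```

Building the list first and calling pd.concat once avoids repeatedly copying the growing Series inside the loop.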

Display full Pandas dataframe in Jupyter without index

I have a pandas dataframe that I would like to pretty-print in full (it's ~90 rows) in a Jupyter notebook. I'd also like to display it without the index column, if possible. How can I do that?

For pretty-printing without an index, I think the right approach is to call the HTML display method (which is what Jupyter does under the hood):
from IPython.display import HTML
HTML(df.to_html(index=False))
(Credit to Display pandas dataframe without index)
As others have suggested, you can lift the row count limit with pd.set_option("display.max_rows", None).

In pandas you can use this to show all rows and columns:
pd.set_option("display.max_rows", None, "display.max_columns", None)
To print without the index, additionally use:
print(df.to_string(index=False))
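The options from both answers can be tried outside a notebook with a sketch like the one below. The 90-row DataFrame is invented to match the size mentioned in the question, and pd.option_context is used here (an assumption, not something the answers mention) so the display settings are restored afterwards instead of being changed globally.

```python
import pandas as pd

# Hypothetical 90-row frame, roughly the size described in the question.
df = pd.DataFrame({"a": range(90), "b": range(90)})

# Temporarily lift the row/column limits; repr() then shows every row.
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    full_repr = repr(df)

# Plain-text and HTML renderings without the index column.
text_no_index = df.to_string(index=False)
html_no_index = df.to_html(index=False)

print(text_no_index)
```

In a Jupyter cell, wrapping html_no_index in IPython.display.HTML renders it as a table.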

Pyspark/jupyter notebook display issue with database

I am trying to use PySpark in a Jupyter notebook. But when I want to see (a part of) the DataFrame,
... (some columns are not even shown).
I would like to have a display like this instead.
Any idea how to do it?

Your file is semicolon-separated.
Pass the semicolon as the separator when reading it:
df = spark.read.csv(path, sep=';')

Can't correctly read .csv file

When importing a .csv, I saved the result as a pandas DataFrame as follows:
csv_dataframe= pd.DataFrame(pd.read_csv(r'filepath.csv', delimiter=';', encoding='iso-8859-1', decimal=',', low_memory=False))
However, when I call a specific column that has numbers and letters, it ignores some of the characters or adds others. For example, in column 'A', there are elements similar to this:
'ABC123456789'
'123456789'
'1234567'
and when I call:
csv_dataframe['A']
The results are:
'ABC123456789'
'1234567342'
'3456475'
So, some of the values are correct but, in others, it changes the values, adding or removing elements. In some cases it even alters their length.
Is there some way for a .csv file to change how other programs read it? That is, can the .csv file contain an option that masks values and isn't noticeable when opening it? Or did I make a mistake when calling the file or the functions?
Thank you very much.

Try removing pd.DataFrame(); pd.read_csv already returns a DataFrame.
This should work:
csv_dataframe = pd.read_csv(r'filepath.csv', delimiter=';', encoding='iso-8859-1', decimal=',', low_memory=False)
It might fix your issue; if it doesn't, I'm willing to bet the problem is in the CSV itself.
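A small self-contained check of the point above, assuming a made-up semicolon-separated file held in memory. Column A reuses the sample values from the question; column B is hypothetical and only there to show the decimal=',' handling. The encoding argument is left out because the data is already a Python string.

```python
import io

import pandas as pd

# Hypothetical file contents; a real file would be read from its path.
raw = "A;B\nABC123456789;1,5\n123456789;2,25\n1234567;3,0\n"

csv_dataframe = pd.read_csv(
    io.StringIO(raw), delimiter=";", decimal=",", low_memory=False
)

# read_csv already returns a DataFrame, so no pd.DataFrame() wrapper is needed.
print(type(csv_dataframe))
print(csv_dataframe["A"].tolist())
```

If the values printed here match the file but the real data does not, that points at the contents of the CSV rather than at the read_csv call.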

I need to conditionally change the values of a specific DataFrame column after importing multiple Excel files

import pandas as pd
batch=pd.read_excel('batch.xlsx')
stock_report=pd.read_excel('Stock_Report.xlsx')
Result_stock=pd.merge(stock_report,batch[['Batch','Cost price']], on='Batch').fillna(0)
Result_stock2=pd.merge(Result_stock,batch[['Item number',' Batch MRP']], on='Item number').fillna(0)
Result_stock2['Total']=Result_stock2['Posted quantity']*Result_stock2['Cost price']
I need to change the value of the Total column in Result_stock2 wherever it is 0, replacing it with the product of two other columns' values.

You need to learn some formatting. Please format your code so we can read it.

If I understood what you mean and your script is working fine so far, you should simply add:
Result_stock2.loc[Result_stock2['Total']==0,'Total']=(****OPERATION YOU NEED****)
For example, the 'OPERATION' could be:
Result_stock2.loc[Result_stock2['Total']==0,'Posted quantity']*(Result_stock2.loc[Result_stock2['Total']==0,'Cost price']-5)
It's not beautiful code, but it will do what you need.
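The conditional assignment can be sketched end to end as follows. The data is invented, and using Batch MRP as the fallback price for zero totals is an assumption made only for illustration; the question does not say which two columns should be multiplied.

```python
import pandas as pd

# Hypothetical merged result; real data would come from the Excel files.
Result_stock2 = pd.DataFrame(
    {
        "Posted quantity": [2, 3, 4],
        "Cost price": [10.0, 0.0, 5.0],
        "Batch MRP": [12.0, 8.0, 6.0],
    }
)
Result_stock2["Total"] = Result_stock2["Posted quantity"] * Result_stock2["Cost price"]

# Build the boolean mask once and reuse it on both sides of the assignment.
zero_total = Result_stock2["Total"] == 0
Result_stock2.loc[zero_total, "Total"] = (
    Result_stock2.loc[zero_total, "Posted quantity"]
    * Result_stock2.loc[zero_total, "Batch MRP"]  # assumed fallback column
)

print(Result_stock2)
```

Storing the mask in a variable keeps the condition in one place, so the left and right sides cannot drift apart.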
