Display 2 decimal places, and use comma as separator in pandas? - python-3.x

Is there any way to replace the dot in a float with a comma and keep a precision of 2 decimal places?
Example 1 : 105 ---> 105,00
Example 2 : 99.2 ---> 99,20
I used a lambda function: df['abc'] = df['abc'].apply(lambda x: f"{x:.2f}".replace('.', ',')). But then I get an invalid format in Excel.
I'm updating a specific sheet in Excel, so I'm using:
wb = load_workbook(filename)
ws = wb["FULL"]
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

Let us try
out = (s//1).astype(int).astype(str)+','+(s%1*100).astype(int).astype(str).str.zfill(2)
0 105,00
1 99,20
dtype: object
Input data
s=pd.Series([105,99.2])

s = pd.Series([105, 99.22]).apply(lambda x: f"{x:.2f}".replace('.', ','))
.apply takes a function and calls it on each element of the Series.
The f-string f"{x:.2f}" turns the float into a string with 2 decimal places, using '.' as the separator.
After that, .replace('.', ',') replaces the '.' with ','.
You can swap pd.Series([105, 99.22]) for the relevant column of your dataframe.
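As a minimal sketch of the formatting step described above, using plain Python so it can be checked without pandas. The helper name to_comma_decimal is made up for illustration; the same function could be passed to .apply on a Series. Note that the result is a string, which is why Excel later treats the cell as text.

```python
def to_comma_decimal(value: float) -> str:
    """Format a number with 2 decimal places and a comma as decimal separator."""
    # f"{value:.2f}" rounds to 2 decimals and yields e.g. "99.20";
    # replace() then swaps the decimal point for a comma.
    return f"{value:.2f}".replace(".", ",")


values = [105, 99.2, 99.22]
formatted = [to_comma_decimal(v) for v in values]
print(formatted)  # ['105,00', '99,20', '99,22']
```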

I think you're mixing up two things here. In Excel you can set the number format, i.e. the format in which numbers are displayed (the icon with +/-0).
But that is not part of the cell's value: the cell stays numeric either way. Your approach changes only the cell value, not its formatting. In your question you save the value as a string, so Excel reads it as a string.
That said: don't format the value; upgrade your pandas (if you haven't done so already) and try something along these lines: https://stackoverflow.com/a/51072652/11610186
To elaborate, try replacing your for loop with:
i = 1
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)
    # replace D with the letter of the column you want to see formatted:
    ws[f'D{i}'].number_format = '#,##0.00'
    i += 1

Well, I found another way to set the float format directly in Excel, using this code:
for col_cell in ws['S':'CP']:
    for i in col_cell:
        i.number_format = '0.00'
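Putting the answers together, here is a hedged end-to-end sketch: the values stay numeric and only the cell's number format is set. It assumes pandas and openpyxl are installed; the sample data, the in-memory Workbook() (standing in for load_workbook(filename)) and the choice of column B are illustrative assumptions, not taken from the question. Whether Excel shows a comma or a dot for the format '0.00' depends on the locale Excel runs under.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Hypothetical sample data; "abc" is the numeric column to format.
df = pd.DataFrame({"name": ["a", "b"], "abc": [105.0, 99.2]})

wb = Workbook()      # stand-in for load_workbook(filename)
ws = wb.active
ws.title = "FULL"

# Write header and data rows; the numbers are written as numbers, not strings.
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Column B holds "abc" in this sketch; row 1 is the header, so start at row 2.
for (cell,) in ws.iter_rows(min_row=2, min_col=2, max_col=2):
    cell.number_format = "0.00"
```

Because only number_format changes, formulas such as SUM keep working on the column after the file is saved with wb.save(...).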

Related

Splitting the data of one excel column into two columns using python

I have a problem splitting the content of one Excel column, which contains numbers and letters, into two columns: the numbers in one column and the letters in the other.
As you can see in the first photo, there is no space between the numbers and the letters, but the good thing is that the letters are always "ms". I need a method to split them as in the second photo.
Before
After
I tried to use replace, but it did not work: it did not split them.
Is there any other method?
You can use the extract method. Here is an example:
df = pd.DataFrame({'time': ['34ms', '239ms', '126ms']})
df[['time', 'unit']] = df['time'].str.extract(r'(\d+)(\D+)')
# convert time column into integer
df['time'] = df['time'].astype(int)
print(df)
# output:
# time unit
# 0 34 ms
# 1 239 ms
# 2 126 ms
It is pretty simple.
You need to use pandas.Series.str.split (see the pandas documentation for its syntax).
The code should be:
import pandas as pd
data_before = {'data' : ['34ms','56ms','2435ms']}
df = pd.DataFrame(data_before)
result = df['data'].str.split(pat=r'(\d+)', expand=True)
result = result.loc[:, [1, 2]]
result.rename(columns={1: 'number', 2: 'string'}, inplace=True)
print(result)

Extract substrings from irregular text in Excel cell

I am trying to solve this problem.
Suppose I have text like this in a single column in Excel:
#22-atr$$1 AM**01-May-2015&&
$21-atr#10-Jan-2007*6 PM&
&&56-atr#11 PM$$8-Jan-2016*
**4 PM#68-atr#21-Mar-2022&&
and I want to write functions to split it into separate columns as follows
Can someone help me do that, please?
So far, the only thing I have managed is extracting the month using =MID(A1,FIND("-",A1)+1,3)
One option for formulae would be using new functions, currently available in the BETA-channel for insiders:
Formula in B1:
=LET(A,TEXTSPLIT(A1,{"#","$","&","*","#"},,1),B,SORTBY(A,IFERROR(MATCH(RIGHT(A),{"r","M"},0),3)),C,HSTACK(TAKE(B,,2),TEXTSPLIT(TEXT(--INDEX(B,3),"YYYY-Mmm-D"),"-")),IFERROR(--C,C))
The idea is to:
Use LET() throughout to store variables;
TEXTSPLIT() the value in column A using all available delimiters into columns and skip empty values in the resulting array;
Then SORTBY() the resulting three elements by their rightmost character using MATCH(). The IFERROR() will catch the date-string;
We can then HSTACK() the 1st and 2nd columns with the result of splitting the 3rd element, after formatting it as YYYY-MMM-D first;
Finally, we try to convert the resulting array to numbers with a double unary. Where that fails, we keep the original content from the previous variable.
Notes:
I formatted column C to hold time-value in AM/PM.
I changed the text to hold Dutch month-names to have Excel recognize the dates for demonstration purposes. Should work the same with English names.
For fun, a UDF using regular expressions:
Public Function GetPart(inp As String, prt As Long) As Variant
    Dim Pat As String
    Select Case prt
        Case 0
            Pat = "(\d+-atr)"
        Case 1
            Pat = "(\d+\s*[AP]M)"
        Case 2
            Pat = "-(\d{4})"
        Case 3
            Pat = "-(\w+)-"
        Case 4
            Pat = "(\d+)-\w+-"
        Case Else
            Pat = ""
    End Select
    With CreateObject("vbscript.regexp")
        .Pattern = ".*" & Pat & ".*"
        GetPart = .Replace(inp, "$1")
    End With
End Function
Invoke through =GetPart(A1,0): the text comes first and the part number second, matching the function signature. Choices are 0-4, in the order of your column headers.
You can achieve what you want by applying a few simple transformations:
Replace the #, $, * and & with a common character that is guaranteed not to appear in the data sections (e.g. #)
Replace every run of 2 or more # characters with a single #
Trim the # from the start and end of the string
Split the string into an array using # as the split character (vba.split)
Use For Each to loop over the array
In the loop, have a set of three tests:
Test 1 tests the string for the occurrence of "-atr"
Test 2 tests the string for the occurrence of "-XXX-", where XXX is a three-letter month. You then split the date at the - to give an array with Day/Month/Year
Test 3 tests if the string has ' AM' or ' PM'
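The steps above can be sketched in Python (rather than the VBA the answer has in mind) to show the logic end to end. This is only an illustration: the function name parse_cell and the dictionary keys are invented here, and the date test assumes the day-month-year layout seen in the sample rows.

```python
import re

# Day-Mon-Year, e.g. "01-May-2015" (layout assumed from the sample rows).
DATE_PATTERN = re.compile(r"^(\d{1,2})-([A-Za-z]{3})-(\d{4})$")


def parse_cell(text: str) -> dict:
    """Split one irregular cell into its code, time and date parts."""
    # Steps 1-2: turn every run of delimiters into a single '#'.
    normalized = re.sub(r"[#$*&]+", "#", text)
    # Step 3: trim the '#' from both ends.
    normalized = normalized.strip("#")

    result = {}
    # Steps 4-5: split on '#' and classify each piece.
    for part in normalized.split("#"):
        date_match = DATE_PATTERN.match(part)
        if part.endswith("-atr"):                 # Test 1
            result["code"] = part
        elif date_match:                          # Test 2
            result["day"], result["month"], result["year"] = date_match.groups()
        elif part.endswith((" AM", " PM")):       # Test 3
            result["time"] = part
    return result


print(parse_cell("#22-atr$$1 AM**01-May-2015&&"))
# {'code': '22-atr', 'time': '1 AM', 'day': '01', 'month': 'May', 'year': '2015'}
```

Because each piece is classified by its content, the order in which the parts appear in the cell does not matter.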

How to change dtype & apply mathematical calculations in np.where?

I have dataframe like this
df = pd.DataFrame()
df['yy'] = [2012,2011,2010]
df['mm'] = ['10','','8']
yy mm
0 2012 10
1 2011
2 2010 8
I want to multiply the values in column 'mm' by 2. However, all values in the column are strings.
I tried it with np.where as follows:
df['X'] = np.where(df['mm']!='',df['mm'].astype(int) * 2,'')
However, it's not working and gives the following error:
ValueError: invalid literal for int() with base 10: ''.
It's clear from the error that the condition in np.where does not act as a filter here: df['mm'].astype(int) is applied to all values and therefore fails for the empty string ''.
Can anyone please suggest another way to achieve this? I don't want to use a for loop, as my actual df is too big and a for loop would take a lot of time.
Thanks in advance.
It's better if you replace the empty string with NaN first:
df['mm'] = df.mm.replace({'': np.nan}).fillna(0).astype(int) * 2
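The np.where attempt fails because both value arguments are computed in full before the condition is applied, so astype(int) still sees the empty string. As an alternative sketch (not the answerer's code), pd.to_numeric with errors="coerce" converts unparsable entries to NaN, which keeps the missing value missing instead of turning it into 0. Writing to a new column X and using the nullable "Int64" dtype are choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"yy": [2012, 2011, 2010], "mm": ["10", "", "8"]})

# Empty strings cannot be parsed, so errors="coerce" turns them into NaN.
mm_numeric = pd.to_numeric(df["mm"], errors="coerce")

# NaN propagates through the multiplication; "Int64" is pandas' nullable
# integer dtype, so the column stays integer while row 1 is <NA>.
df["X"] = (mm_numeric * 2).astype("Int64")

print(df)  # X holds 20, <NA>, 16
```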

Looping through a pandas dataframe

My variable noExperience1 is a dataframe
I am trying to go through this loop:
num = 0
for row in noExperience1:
if noExperience1[row+1] - noExperience1[row] > num:
num = noExperience1[row+1] - noExperience1[row]
print(num)
My goal is to find the biggest difference in y values from one x value to the next. But I get the error that the line of my if statement needs to be a string and not an integer. How do I fix this so I can have a number?
We can't directly access a row of a dataframe by indexing it with a number; we need to use loc or iloc. Here is a solution to the problem you described:
noExperience1 = pd.read_csv("../input/data.csv")  # reading CSV file
num = 0
for row in range(1, len(noExperience1)):  # iterating over all rows of the DF
    # int() here assumes each row holds a single numeric value
    if int(noExperience1.loc[row] - noExperience1.loc[row-1]) > num:
        num = int(noExperience1.loc[row] - noExperience1.loc[row-1])
print(num)
Note:
1. Column slicing: DataFrame[ColName] gives you all entries of the specified column.
2. Row slicing: DataFrame.loc[RowLabel] gives you the complete row with the specified index label. With the default index, the labels are row numbers starting at 0.
Hope this helps.
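For comparison, a loop-free sketch of the same idea: Series.diff() computes the difference between each value and the previous one, and max() picks the largest. The column names x and y and the sample values are assumptions, since the question does not show the dataframe.

```python
import pandas as pd

# Hypothetical data standing in for noExperience1.
no_experience = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 15, 12, 30]})

# diff() yields NaN, 5, -3, 18; max() skips the NaN.
largest_increase = no_experience["y"].diff().max()

print(largest_increase)  # 18.0
```

Unlike the loop, which starts from num = 0, this returns a negative number when every step goes down.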

cut important parts of string in a column

I have a column called Dateiname which contains a string. My goal is to get only the string Gruen, Gelb or Orange from the column and create a new column that shows, for each row, which of Gruen, Gelb or Orange it contains.
I tried this code:
result['Y'] = result.Dateiname.str[-10:-4]
As these words are not equally long, I get 4_ or 1_ or just _, depending on whether it is Gruen or Gelb that I want to slice out. Is there any way to get the parts Gruen, Gelb or Orange from the column Dateiname and save them into the column Y?
The goal would be this:
Use str.extract:
result['Y'] = result.Dateiname.str[-10:-4].str.extract('(Gruen|Gelb|Orange)')
Another solution is to split by _ or . and take the second value from the end by indexing:
result.Dateiname.str.split(r'_|\.').str[-2]
Or, if you want to check the whole string:
result['Y'] = result.Dateiname.str.extract('(Gruen|Gelb|Orange)')
If your data always follows the same format, with the required word followed by .csv, then use str.extract with a regex:
For example:
result = pd.DataFrame({'Dateiname':['asdfjaskld_3242_34.fsdf_450_Violet.csv',
'asdfjaskld_3242_34.fsdf_450_Green.csv',
'asdfjaskld_3242_34.fsdf_450_Indigo.csv',
'asdfjaskld_3242_34.fsdf_450_Red.csv']})
result['Y'] = result.Dateiname.str.extract(r'([a-zA-Z]+)\.csv')
print(result)
Dateiname Y
0 asdfjaskld_3242_34.fsdf_450_Violet.csv Violet
1 asdfjaskld_3242_34.fsdf_450_Green.csv Green
2 asdfjaskld_3242_34.fsdf_450_Indigo.csv Indigo
3 asdfjaskld_3242_34.fsdf_450_Red.csv Red
You can use:
result['Y'] = result['Dateiname'].str.split('_').str[-1].str[:-4]
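A small runnable sketch of the str.extract approach applied to the three colour words from the question. The file names are invented, since the question's actual data is not shown; expand=False is used so that the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical file names; only the colour words come from the question.
result = pd.DataFrame({
    "Dateiname": [
        "messung_01_Gruen.csv",
        "messung_14_Gelb.csv",
        "messung_7_Orange.csv",
    ]
})

# The capture group returns whichever of the three words occurs in the name;
# rows without a match would get NaN.
result["Y"] = result["Dateiname"].str.extract(r"(Gruen|Gelb|Orange)", expand=False)

print(result["Y"].tolist())  # ['Gruen', 'Gelb', 'Orange']
```

This avoids the fixed-width slice, so the differing word lengths no longer matter.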
