How to convert Excel negative value to Pandas negative value - python-3.x

I am a beginner in Python pandas. I am working on a data set named fortune_company, shown below.
In this data set, the Profits_In_Million column contains some negative values, which Excel indicates with red text and parentheses.
In pandas, however, they are displayed as in the screenshot below.
I tried to convert the data type of the Profits_In_Million column using the code below:
import pandas as pd
fortune.Profits_In_Million = fortune.Profits_In_Million.str.replace("$","").str.replace(",","").str.replace(")","").str.replace("(","-").str.strip()
fortune.Profits_In_Million.astype("float")
But I am getting the error below. Could someone please help me with this? How can I convert this string column to float?
ValueError: could not convert string to float: '-'

Assuming you have no control over the cell format in Excel, the converters kwarg of read_excel can be used:
converters : dict, default None
Dict of functions for converting values in certain columns. Keys can
either be integers or column labels, values are functions that take
one input argument, the Excel cell content, and return the transformed
content.
From read_excel's docs.
def negative_converter(x):
    # a somewhat naive implementation
    if '(' in x:
        x = '-' + x.strip('()')
    return x
df = pd.read_excel('test.xlsx', converters={'Profits_In_Million': negative_converter})
print(df)
# Profits_In_Million
# 0 $1000
# 1 -$1000
Note, however, that the values in this column are still strings and not numbers (int/float). You can quite easily implement the conversion in negative_converter (remove the dollar sign, and most probably the comma as well), for example:
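def negative_converter(x):
    # a somewhat naive implementation
    # drop the dollar sign and any thousands separators
    x = x.replace('$', '').replace(',', '')
    if '(' in x:
        # parentheses mark a negative value
        x = '-' + x.strip('()')
    return float(x)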
df = pd.read_excel('test.xlsx', converters={'Profits_In_Million': negative_converter})
print(df)
# Profits_In_Million
# 0 1000.0
# 1 -1000.0
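If the data has already been loaded as strings, the same cleanup can be sketched on the column itself. This is only an illustration under assumptions: the helper name accounting_to_float and the sample values are made up, and a cell holding just '-' (the value named in the ValueError above) is treated as missing rather than as zero.

```python
import pandas as pd

def accounting_to_float(series: pd.Series) -> pd.Series:
    """Convert strings such as '$1,000' or '($2,500.5)' to floats (hypothetical helper)."""
    text = series.astype(str).str.strip()
    # A value wrapped in parentheses is negative in accounting notation.
    is_negative = text.str.startswith("(") & text.str.endswith(")")
    # Remove the dollar sign, thousands separators and parentheses.
    digits = text.str.replace(r"[$,()]", "", regex=True)
    # Anything that is still not a number (for example a lone '-') becomes NaN.
    values = pd.to_numeric(digits, errors="coerce")
    return values.where(~is_negative, -values)

# Made-up sample data, not taken from the original workbook.
profits = pd.Series(["$1,000", "($2,500.5)", "-", "$30"])
result = accounting_to_float(profits)
print(result)
```

Passing regex=True (or regex=False for literal text) explicitly avoids surprises with characters such as "$", "(" and ")", which have a special meaning in regular expressions.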

Related

Replace items like A2 as AA in the dataframe

I have a list of items like "A2BCO6" and "ABC2O6". I want to rewrite them as A2BCO6 --> AABCO6 and ABC2O6 --> ABCCO6. There are many more items than presented here.
My dataframe is like:
listAB:
Finctional_Group
0 Ba2NbFeO6
1 Ba2ScIrO6
3 MnPb2WO6
I created two lists, one with the substrings to find and one with their replacements, and tried to replace them in the following way:
B = ["Ba2", "Pb2"]
C = ["BaBa", "PbPb"]
for i,j in range(len(B)), range(len(C)):
    listAB["Finctional_Group"]= listAB["Finctional_Group"].str.strip().str.replace(B[i], C[j])
But it does not produce the correct output. The output looks like this:
listAB:
Finctional_Group
0 PbPbNbFeO6
1 PbPbScIrO6
3 MnPb2WO6
Please suggest the necessary correction in the code.
Many thanks in advance.
For simplicity, I used the chemparse package, which seems to suit your needs.
As always, we import the required packages, in this case chemparse and pandas.
import chemparse
import pandas as pd
then we create a pandas.DataFrame object like in your example with your example data.
df = pd.DataFrame(
columns=["Finctional_Group"], data=["Ba2NbFeO6", "Ba2ScIrO6", "MnPb2WO6"]
)
Our parser function will use chemparse.parse_formula, which returns a dict mapping each element to its count in a molecular formula.
def parse_molecule(molecule: str) -> str:
    # initializing empty string
    molecule_in_string = ""
    # iterating over all key & values in dict
    for key, value in chemparse.parse_formula(molecule).items():
        # repeating the element symbol as many times as it occurs
        molecule_in_string += key * int(value)
    return molecule_in_string
molecule_in_string contains the molecule formula without numbers now. We just need to map this function to all elements in our dataframe column. For that we can do
df = df.applymap(parse_molecule)
print(df)
which returns:
0 BaBaNbFeOOOOOO
1 BaBaScIrOOOOOO
2 MnPbPbWOOOOOO
dtype: object
Source code for chemparse: https://gitlab.com/gmboyer/chemparse
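For the narrower replacement asked about in the question (only "Ba2" and "Pb2" expanded, "O6" left alone), the original loop can be repaired without any extra package. The loop `for i,j in range(len(B)), range(len(C))` iterates over a tuple of two ranges and unpacks each range into i=0, j=1, so it replaces B[0] with C[1] twice, which matches the wrong output shown. A minimal sketch using zip, with the sample data from the question:

```python
import pandas as pd

listAB = pd.DataFrame(
    {"Finctional_Group": ["Ba2NbFeO6", "Ba2ScIrO6", "MnPb2WO6"]}
)

B = ["Ba2", "Pb2"]    # substrings to find
C = ["BaBa", "PbPb"]  # their replacements, in the same order

col = listAB["Finctional_Group"].str.strip()
# zip pairs each pattern with its own replacement: ("Ba2", "BaBa"), ("Pb2", "PbPb")
for old, new in zip(B, C):
    col = col.str.replace(old, new, regex=False)
listAB["Finctional_Group"] = col

print(listAB)
```

With regex=False the patterns are treated as literal text, so the replacements stay predictable as the lists grow.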

Using pandas manipulate number format

Just out of curiosity: I have a name list with phone numbers in a CSV file, and I want to change these phone numbers from ########### (11 digits) to the format ###-####-####, adding two hyphens, one after the 3rd digit and one after the 7th.
Is this possible?
If it's a DataFrame, you can use apply with a format string:
df
num
0 09187543839
1 08745763412
df.num = df.num.apply(lambda x : "{}-{}-{}".format(x[:3],x[3:7],x[7:]))
df
num
0 091-8754-3839
1 087-4576-3412
Yes, it is possible. Below is a code-snippet that accomplishes what you want:
phone = str(55512354567)
print(f'{phone[:3]}-{phone[3:7]}-{phone[7:]}')
You can adapt the above idea to your Pandas dataframe as shown below:
# Sample data
data_df = pd.DataFrame([[55512345678], [55587654321]], columns=['phone'])
# Create a string column
data_df['phone_str'] = data_df['phone'].map(lambda x: str(x))
# Convert the column values to the right format
data_df['phone_str'] = data_df['phone_str'].map(lambda x: f'{x[:3]}-{x[3:7]}-{x[7:]}')
I may not be using pandas but this could potentially work...
n = 3
n1 = 7
# named "phone" rather than "str" so the built-in str type is not shadowed
phone = "12345678901"
l, m, r = phone[:n], phone[n:n1], phone[n1:]
final = l + "-" + m + "-" + r
print(final)
Output:
123-4567-8901
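The same slicing can also be done on a whole column at once with the pandas .str accessor, without apply or map. This is a sketch using the sample numbers from the first answer; the file name in the comment is hypothetical. It assumes the numbers are stored as strings, which matters because a number read as an integer loses its leading zero.

```python
import pandas as pd

# When reading from a file, forcing the column to str keeps leading zeros, e.g.
# df = pd.read_csv("names.csv", dtype={"num": str})  # "names.csv" is a made-up name
df = pd.DataFrame({"num": ["09187543839", "08745763412"]})

# Slice every string in the column and join the three parts with hyphens.
num = df["num"]
df["formatted"] = num.str[:3] + "-" + num.str[3:7] + "-" + num.str[7:]

print(df)
```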

pandas style tag give "ValueError: style is not supported for non-unique indices"

I would like to color the negative numbers in my DataFrame red.
But when I try to achieve this with the following code
def color_negative_red(val):
    """
    Takes a scalar and returns a string with
    the css property `'color: red'` for negative
    strings, black otherwise.
    """
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color

s = df05.style.applymap(color_negative_red)
print(s)
I get the following error: "ValueError: style is not supported for non-unique indices."
Where should I look to get the right output?
I believe you need a unique default index, which you can get with DataFrame.reset_index and drop=True:
s = df05.reset_index(drop=True).style.applymap(color_negative_red)
This can be caused by one of two problems: either your index has duplicate labels (check with df.index.is_unique) or your columns have duplicate names (check with df.columns.is_unique).
If your index has duplicates, do:
df = df.reset_index(drop=True)
If your columns have duplicates, do:
df.columns = range(len(df.columns))
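Both checks can be combined into one small helper that only touches the labels when they are actually duplicated. This is a sketch: with_unique_labels is a hypothetical name and the sample frame is made up to trigger both cases.

```python
import pandas as pd

def with_unique_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose index and columns are unique (hypothetical helper)."""
    out = df.copy()
    if not out.index.is_unique:
        # Replace the duplicated index with a default 0..n-1 index.
        out = out.reset_index(drop=True)
    if not out.columns.is_unique:
        # Replace duplicated column names with positional labels.
        out.columns = range(len(out.columns))
    return out

# Made-up frame with a duplicated index label and a duplicated column name.
df05 = pd.DataFrame([[1, -2], [-3, 4]], index=["a", "a"], columns=["x", "x"])
fixed = with_unique_labels(df05)
print(fixed.index.is_unique, fixed.columns.is_unique)
```

The result can then be styled as in the question, for example with fixed.style.applymap(color_negative_red). Note that the original labels are discarded, so this only suits cases where they are not needed in the styled output.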

Pandas dataframe column float inside string (i.e. "float") to int

I'm trying to clean some data in a pandas df and I want the 'volume' column to go from a float to an int.
EDIT: The main issue was that the dtype of the column I thought was a float was actually str. So it first needed to be converted to float before being converted to int.
I deleted the two other solutions I was considering, and left the one I used. The top one is the one with the errors, and the bottom one is the solution.
import pandas as pd
import numpy as np

#Call the df
t_df = pd.DataFrame(client.get_info())
#isolate only the 'symbol' column in t_df
tickers = t_df.loc[:, ['symbol']]

def tick_data(tickers):
    for i in tickers:
        tick_df = pd.DataFrame(client.get_ticker())
        tick = tick_df.loc[:, ['symbol', 'volume']]
        tick.iloc[:,['volume']].astype(int)
        if tick['volume'].dtype != np.number:
            print('yes')
        else:
            print('no')
    return tick
Below is the revised code:
import pandas as pd

#Call the df
def ticker():
    t_df = pd.DataFrame(client.get_info())
    #isolate only the 'symbol' column in t_df
    tickers = t_df.loc[:, ['symbol']]
    for i in tickers:
        #pulls out market data for each symbol
        tickers = pd.DataFrame(client.get_ticker())
        #isolates the symbol and volume
        tickers = tickers.loc[:, ['symbol', 'volume']]
        #floats volume
        tickers['volume'] = tickers.loc[:, ['volume']].astype(float)
        #volume to int
        tickers['volume'] = tickers.loc[:, ['volume']].astype(int)
        #deletes all symbols > 20,000 in volume, returns only symbol
        tickers = tickers.loc[tickers['volume'] >= 20000, 'symbol']
    return tickers
You have a few issues here.
In your first example, iloc only accepts integer locations for the rows and columns in the DataFrame, which is generating your error. I.e.
tick.iloc[:,['volume']].astype(int)
doesn't work. If you want label-based indexing, use .loc:
tick.loc[:,['volume']].astype(int)
Alternately, use bracket-based indexing, which allows you to take a whole column directly without using slice syntax (:) on the rows:
tick['volume'].astype(int)
Next, astype(int) returns a new object; it does not modify the column in place. So what you want is
tick['volume'] = tick['volume'].astype(int)
As for checking whether the dtype is numeric: comparing against np.number with == (or !=) is not what you want, and neither is `is`, which is True only for np.number itself and not for a subclass such as np.int64. Use np.issubdtype or pd.api.types.is_numeric_dtype instead, i.e.:
if np.issubdtype(tick['volume'].dtype, np.number):
or:
if pd.api.types.is_numeric_dtype(tick['volume'].dtype):
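Putting the pieces together, a column of numeric strings can be converted with pd.to_numeric and then cast to int. This is a sketch on made-up data, since client.get_ticker() from the question is not available here; the symbols and volumes are assumptions for illustration, and dropping the rows that fail to parse is one possible choice, not the only one.

```python
import pandas as pd

# Made-up stand-in for the ticker data; "volume" arrives as strings.
tick = pd.DataFrame(
    {"symbol": ["AAA", "BBB", "CCC"], "volume": ["25000.0", "150.5", "n/a"]}
)

# Parse the strings; values that are not numbers become NaN instead of raising.
volume = pd.to_numeric(tick["volume"], errors="coerce")

# Drop rows that could not be parsed, then truncate the floats to integers.
tick = tick.assign(volume=volume).dropna(subset=["volume"])
tick["volume"] = tick["volume"].astype(int)

# The dtype check recommended above.
is_numeric = pd.api.types.is_numeric_dtype(tick["volume"])

# Keep only the symbols with a volume of at least 20,000.
high_volume = tick.loc[tick["volume"] >= 20000, "symbol"].tolist()
print(is_numeric, high_volume)
```

Note that astype(int) truncates toward zero, so 150.5 becomes 150; round first if that matters.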

Converting an array of strings containing range of integer values into an array of floats

I have a CSV file with a column of temperature ranges, with values like "20-25" stored as strings. I need to convert each value to the midpoint of its range as a float, e.g. 22.5.
This needs to be done for the entire column of such values, not a single value. I want to know how this can be done in Python, as I am very new to it.
Note that the sample data (shown in a screenshot in the original post) also contains NaN values.
As suggested in the comments, split each string using "-" as the separator.
Second, convert the result to a float array. Finally, take the average using NumPy.
import numpy as np
temp_input = ["20-25", "36-40", "10-11", "23-24"]
# split and convert to float
# [t.split("-") for t in temp_input] is a list comprehension
tmp = np.array([t.split("-") for t in temp_input], dtype=np.float32)
# average the tmp array
temp_output = np.average(tmp, axis=1)
And here's a one-liner:
temp_output = [np.average(np.array(t.split('-'), dtype=np.float32)) for t in temp_input]
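The NumPy versions above would fail on the NaN values mentioned in the question, because NaN is a float and has no split method. A pandas-based sketch that keeps missing entries as NaN is shown below. The sample values are made up, and it assumes every range has the form "low-high" with non-negative bounds (a value such as "-5-3" would be split incorrectly).

```python
import numpy as np
import pandas as pd

# Made-up sample column containing one missing value.
temps = pd.Series(["20-25", "36-40", np.nan, "23-24"])

# Split each string on "-" into separate columns; NaN rows stay NaN.
parts = temps.str.split("-", expand=True)

# Convert every column to numbers; anything unparsable becomes NaN.
bounds = parts.apply(pd.to_numeric, errors="coerce")

# The row-wise mean of the two bounds is the midpoint of the range.
midpoint = bounds.mean(axis=1)
print(midpoint)
```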

Resources