Using apply function to convert values in columns [duplicate] - python-3.x

This question already has answers here:
Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe
(6 answers)
Closed 2 years ago.
I am trying to convert the values of a column in a DataFrame. The column is named size, has object dtype, and holds values such as 11.1K or 51.6M, i.e. each value ends in K or M. I want to write an apply function that converts a value to 11.1 if it ends in K and to 516000 if it ends in M. Any help?
I am writing this in Python 3.

There are lots of ways to do this. A simple one is to use replace followed by pd.eval:
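import pandas as pd

df = pd.DataFrame({'A': ['56.1M', '11.1K']})
print(df)
       A
0  56.1M
1  11.1K
# Turn '56.1M' into the expression '56.1*10000' (and 'K' into '*1'), then evaluate each string
df['B'] = df['A'].replace({'M': '*10000', 'K': '*1'}, regex=True).map(pd.eval)
print(df)
       A         B
0  56.1M  561000.0
1  11.1K      11.1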
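Since the question asks specifically for an apply function, here is a minimal sketch of one. It assumes the suffix is the last non-space character and reuses the factors from the answer above ('K' -> 1, 'M' -> 10000); the MULTIPLIERS dict and the convert_size name are illustrative, not from the original. Swap in 1_000 and 1_000_000 if you want plain numbers, as in the linked duplicate (2.90K -> 2900).

```python
import pandas as pd

# Hypothetical suffix table; the factors copy the answer above
# ('K' -> 1, 'M' -> 10000). Use 1_000 / 1_000_000 for plain numbers.
MULTIPLIERS = {"K": 1, "M": 10_000}


def convert_size(value):
    """Convert a string such as '11.1K', '11.1 K' or '56.1M' to a float."""
    text = str(value).strip()
    suffix = text[-1:].upper()
    if suffix in MULTIPLIERS:
        # float() tolerates the space left behind in values like '11.1 K'
        return float(text[:-1]) * MULTIPLIERS[suffix]
    # No recognised suffix: assume the string is already a plain number
    return float(text)


df = pd.DataFrame({"size": ["56.1M", "11.1K", "11.1 K"]})
# Use df["size"], not df.size: DataFrame.size is the number of cells
df["size_num"] = df["size"].apply(convert_size)
print(df)
```

A plain function plus apply is slower than the vectorised replace/pd.eval version on large frames, but it is easier to extend with more suffixes or validation.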

Related

dropna() not working for axis = 1 with the given threshold [duplicate]

This question already has answers here:
thresh in dropna for DataFrame in pandas in python
(3 answers)
Closed 2 years ago.
For the given dataset
I performed dropna on axis=1 with thresh=2:
df.dropna(thresh=2, axis=1)
The output was:
This does not seem correct. I expect the columns with index 1 and 2 to be dropped, since each of those columns has 2 or more NaN values.
The code works perfectly fine with axis=0.
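thresh is the minimum number of non-NA values a column (with axis=1) must have in order to be kept; it is not the number of NaNs that triggers a drop. Try df.dropna(thresh=6, axis=1) for the same dataframe.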
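To see what thresh actually counts, here is a small sketch on a made-up frame (the column names and values are hypothetical, since the original dataset isn't shown). It also shows two ways to get the behaviour the question expected, i.e. dropping every column with at least n NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame: column 'a' has 0 NaNs, 'b' has 2, 'c' has 3
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1, np.nan, np.nan, 4],
    "c": [np.nan, np.nan, np.nan, 4],
})

# thresh=2 keeps columns with at least 2 non-NA values: 'a' (4) and 'b' (2)
kept = df.dropna(thresh=2, axis=1)

# To drop columns with n or more NaNs, a column needs at least
# len(df) - n + 1 non-NA values to survive
n = 2
by_thresh = df.dropna(thresh=len(df) - n + 1, axis=1)

# Equivalent boolean-mask version: keep columns with fewer than n NaNs
by_mask = df.loc[:, df.isna().sum() < n]

print(list(kept.columns), list(by_thresh.columns), list(by_mask.columns))
```

The mask version is often easier to read because it states the NaN count directly instead of converting it into a non-NA threshold.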

How to extract the entire column from a df based on a string of the column name? [duplicate]

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 3 years ago.
I have two DataFrames:
Sample of df1: s12
BacksGas_Flow_sccm ContextID StepID Time_Elapsed
46.6796875 7289972 12 25.443
46.6796875 7289972 12 26.443
Sample of df2: step12
ContextID BacksGas_Flow_sccm StepID Time_Elapsed
7289973 46.6796875 12 26.388
7289973 46.6796875 12 27.388
Since BacksGas_Flow_sccm is at a different position in each DataFrame, I would like to know how I can extract the column using df.columns.str.contains('Flow').
I tried doing:
s12.columns[s12.columns.str.contains('Flow')]
but it just gives the following output:
Index(['BacksGas_Flow_sccm'], dtype='object')
I would like the entire column to be extracted. How can this be done?
You are close: use DataFrame.loc with : to select all rows, and the boolean mask to filter the columns:
s12.loc[:, s12.columns.str.contains('Flow')]
Another option is to select by column names:
cols = s12.columns[s12.columns.str.contains('Flow')]
s12[cols]
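Here is a self-contained sketch of the suggestions, rebuilt from the question's two samples (two rows each). DataFrame.filter(like=...) is included as another way to pick columns whose label contains a substring; flow_columns is a hypothetical helper name.

```python
import pandas as pd

# Small stand-ins for s12 and step12; the Flow column sits at a
# different position in each frame
s12 = pd.DataFrame({
    "BacksGas_Flow_sccm": [46.6796875, 46.6796875],
    "ContextID": [7289972, 7289972],
    "StepID": [12, 12],
    "Time_Elapsed": [25.443, 26.443],
})
step12 = pd.DataFrame({
    "ContextID": [7289973, 7289973],
    "BacksGas_Flow_sccm": [46.6796875, 46.6796875],
    "StepID": [12, 12],
    "Time_Elapsed": [26.388, 27.388],
})


def flow_columns(df, pattern="Flow"):
    # Boolean mask over the column labels; ':' keeps every row
    return df.loc[:, df.columns.str.contains(pattern)]


s12_flow = flow_columns(s12)
step12_flow = flow_columns(step12)

# Substring shortcut that selects the same columns
s12_filtered = s12.filter(like="Flow")

# If exactly one column matches, iloc[:, 0] turns the result into a Series
flow_series = s12_flow.iloc[:, 0]
print(s12_flow)
print(step12_flow)
```

Note that str.contains treats the pattern as a regular expression by default, which is harmless for "Flow" but matters for patterns with characters such as "." or "(".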

Pandas, DataFrame unique values from few columns [duplicate]

This question already has an answer here:
Get total values_count from a dataframe with Python Pandas
(1 answer)
Closed 4 years ago.
I am trying to count the unique values across several columns. My DataFrame looks like this:
Name Name.1 Name.2 Name.3
x z c y
y p q x
q p a y
The output should look like this:
x 2
z 1
c 1
y 3
q 2
p 2
a 1
I tried groupby and value_counts but couldn't get the correct output. Any ideas? Thanks, all!
It seems you want to count values regardless of their row or column position. In that case, flatten the DataFrame and just use Counter:
import numpy as np
from collections import Counter
# Flatten all cells into one 1-D array, then count every value
arr = np.array(df)
count = Counter(arr.reshape(arr.size))
Another (pandas-based) approach is to apply Series.value_counts to every column and then sum the counts across the columns (axis=1):
df2 = df.apply(pd.Series.value_counts)
print(df2.sum(axis=1).astype(int))
a 1
c 1
p 2
q 2
x 2
y 3
z 1
dtype: int32
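A further option, sketched here as an alternative rather than either answer's method, is to flatten the values with to_numpy().ravel() and call value_counts once. It gives integer counts directly, without the NaN-filled intermediate frame that df.apply(pd.Series.value_counts) produces. The frame below rebuilds the question's sample.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({
    "Name":   ["x", "y", "q"],
    "Name.1": ["z", "p", "p"],
    "Name.2": ["c", "q", "a"],
    "Name.3": ["y", "x", "y"],
})

# Flatten every cell into one Series, then count each distinct value
counts = pd.Series(df.to_numpy().ravel()).value_counts()
print(counts.sort_index())
```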

NA values on Dataframe [duplicate]

This question already has answers here:
How to drop rows of Pandas DataFrame whose value in a certain column is NaN
(15 answers)
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
(17 answers)
Closed 4 years ago.
I'm working with Python and DataFrames.
I have a list of movies from which I create a subset, and I'm trying to plot it, or at least compute the mean. However, a lot of information is missing, and the frame just has "NA" instead of data.
I use NumPy to ignore those values, but I still get an error saying the input type is not supported by isfinite:
import numpy as np
import pandas as pd
data = pd.read_csv(r"C:\Users\Bubbles\Documents\CS241\week 13\movies.csv")
Action = data[(data.Action == 1)]
Action = Action[np.isfinite(Action.budget)]
print(Action.budget.mean())
The budget column can only contain "NA" or integers.
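A likely cause of that isfinite error is that budget was read as strings (object dtype), for example when some missing entries use a marker that read_csv doesn't treat as NA by default. The sketch below assumes that situation: it uses a made-up CSV in place of movies.csv, coerces budget to numbers with pd.to_numeric(errors="coerce"), and relies on mean() skipping NaN.

```python
import io

import pandas as pd

# Made-up stand-in for movies.csv. "unknown" is not in read_csv's default
# NA list, so the budget column is read as strings (object dtype), which
# is the situation where np.isfinite raises a TypeError.
csv_text = """title,Action,budget
A,1,100
B,1,NA
C,0,300
D,1,200
E,1,unknown
"""
data = pd.read_csv(io.StringIO(csv_text))

action = data[data["Action"] == 1]
# Anything that can't be parsed as a number becomes NaN
budget = pd.to_numeric(action["budget"], errors="coerce")
# mean() skips NaN by default (skipna=True)
print(budget.mean())  # 150.0
```

If you know the extra missing-value markers in advance, passing them to read_csv via na_values (e.g. na_values=["unknown"]) gives a numeric column straight away, and budget.dropna() then removes the missing rows if you need them gone rather than skipped.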

Indexing Pandas Dataframe [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 4 years ago.
I have 2 pandas dataframes with names and scores.
The first DataFrame has this form:
df_score_1
A B C D
A 0 1 2 0
B 1 0 0 2
C 2 0 0 3
D 0 2 3 0
where
df_score_1.index
Index(['A', 'B', 'C', 'D'],dtype='object')
The second DataFrame is read from a text file with three columns. It does not list the zeros, only the positive (non-zero) scores:
df_score_2
A B 1
A C 1
A D 2
B C 5
B D 1
The goal is to transform df_score_2 into the form of df_score_1 using pandas. df_score_1 comes from the networkx call nx.to_pandas_dataframe(G).
I've tried a MultiIndex, but the index doesn't come out in the form I want. Is there an option for reading the text file, or a function to transform the DataFrame afterwards?
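Are you trying to merge the DataFrames, or do you just want them to have the same index? If you need the same index, use this:
# set_index with a plain list of labels would look for columns with those names,
# so assign the index directly (both frames must have the same number of rows)
df2.index = df1.index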
crosstab and reindex are the best solutions I've found so far:
df = pd.crosstab(df[0], df[1], df[2], aggfunc=sum)
idx = df.columns.union(df.index)
df = df.reindex(index=idx, columns = idx)
The output is an adjacency matrix, but it has NaN wherever there is no score, and it is not mirrored across the diagonal.
Here's a link to a similar question
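Building on the crosstab idea, here is a minimal sketch that also fills the gaps with 0 and mirrors each edge so the result is symmetric like df_score_1. It assumes df_score_2 was read without a header (columns 0, 1, 2) and that the scores are undirected; the src/dst/score names are introduced only for readability, and pivot_table is used in place of crosstab.

```python
import pandas as pd

# df_score_2 as in the question; e.g. read with a hypothetical path via
# pd.read_csv(path, sep=r"\s+", header=None)
df_score_2 = pd.DataFrame(
    [["A", "B", 1], ["A", "C", 1], ["A", "D", 2], ["B", "C", 5], ["B", "D", 1]]
)
edges = df_score_2.copy()
edges.columns = ["src", "dst", "score"]

# Mirror every edge so both (u, v) and (v, u) are present
swapped = edges.rename(columns={"src": "dst", "dst": "src"})
both = pd.concat([edges, swapped], ignore_index=True)

# One row and one column per node, in the same order on both axes
nodes = sorted(set(both["src"]) | set(both["dst"]))
adj = (
    both.pivot_table(index="src", columns="dst", values="score",
                     aggfunc="sum", fill_value=0)
    .reindex(index=nodes, columns=nodes, fill_value=0)
)
# Drop the 'src'/'dst' axis names so it prints like df_score_1
adj.index.name = None
adj.columns.name = None
print(adj)
```

If the scores are directed, skip the mirroring step and keep only the pivot_table and reindex calls.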
I think you need:
df_score_2.set_index(df_score_1.index,inplace=True)
