This question already has answers here:
Replacing blank values (white space) with NaN in pandas
(13 answers)
Closed 1 year ago.
isnull() does not detect empty strings
conn = connect(host='localhost',port=3306,user='root',password='root',database='spiderdata',charset='utf8')
df = pd.read_sql('select * from beikedata_community1',con=conn)
df
df.subway.isnull()
**I want to use isnull() to find missing values, but it does not treat empty strings as missing. What can I do? Thanks very much!**
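You can use df = df.replace(r'^\s*$', np.nan, regex=True) (after import numpy as np).
This replaces empty and whitespace-only cells with a real NaN, which isnull() then detects. Replacing with the string 'NaN' would not help, because isnull() does not treat that string as missing.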
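As a minimal sketch of that replacement: the sample values below are made up for illustration and stand in for the subway column read from the database.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the "subway" column
df = pd.DataFrame({"subway": ["Line 1", "", "   ", None]})

# Before cleaning, only the real None counts as missing
before = df["subway"].isnull()

# ^\s*$ matches strings that are empty or contain only whitespace
cleaned = df.replace(r"^\s*$", np.nan, regex=True)
after = cleaned["subway"].isnull()

print(before.tolist())
print(after.tolist())
```

If only one column needs cleaning, the same call can be applied to that column alone, for example df["subway"].replace(...).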
This question already has answers here:
Why doesn't [01-12] range work as expected?
(7 answers)
RegEx - Match Numbers of Variable Length
(4 answers)
Closed 4 months ago.
I need to extract the number that sits between text (e.g. FL-number-$) from the strings in the File names column and write it to the Check column. Example:
2022-06-09-FR-Echecks.pdf > Return ''
2022-06-09-FR-FL-3-$797.pdf > Return 3
2022-06-09-FR-TX-20-$35149.91.pdf > Return 20
This is the code I used:
dt_test['File_names_page'] = dt_test['File names'].str.extract('\-([0-99])-\$')
It only returns single-digit numbers.
So how can I extract the whole number (all digits) in my case?
Thanks for your attention!
Your regex pattern is slightly off. Just use \d+ to match any integer number:
dt_test["File_names_page"] = dt_test["File names"].str.extract(r'-(\d+)-\$')
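A character class cannot express a numeric range such as 0-99: [0-99] means "one character that is either in 0-9 or is 9", so it matches a single digit. Use \d{1,2} for one or two digits: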
dt_test['File_names_page'] = dt_test['File names'].str.extract(r'-(\d{1,2})-\$')
Or for any number of digits (at least 1) \d+:
dt_test['File_names_page'] = dt_test['File names'].str.extract(r'-(\d+)-\$')
NB: the hyphen (-) doesn't need escaping outside a character class.
Example:
    File names  File_names_page
0  ABC-12-$456               12
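A small runnable sketch of the \d+ pattern, using the three file names from the question. The expand=False argument is an addition here so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

dt_test = pd.DataFrame({
    "File names": [
        "2022-06-09-FR-Echecks.pdf",
        "2022-06-09-FR-FL-3-$797.pdf",
        "2022-06-09-FR-TX-20-$35149.91.pdf",
    ]
})

# Capture one or more digits that sit between "-" and "-$"
dt_test["File_names_page"] = dt_test["File names"].str.extract(
    r"-(\d+)-\$", expand=False
)

print(dt_test)
```

Rows without a match come back as NaN rather than an empty string, and the matched numbers are strings. Chaining .fillna('') on the result would give the empty string the question asks for.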
This question already has answers here:
How to extract the n-th maximum/minimum value in a column of a DataFrame in pandas?
(3 answers)
Closed 3 years ago.
I have a data frame with a DateTime column. I can get the minimum value by using:
df['Date'].min()
How can I get the second, third... smallest values?
Use nlargest or nsmallest.
For the second largest value (nsmallest works the same way for the second smallest):
series.nlargest(2).iloc[-1]
Make sure your dates are in datetime first:
df['Sampled_Date'] = pd.to_datetime(df['Sampled_Date'])
Then drop the duplicates, take the nsmallest(2), and take the last value of that:
df['Sampled_Date'].drop_duplicates().nsmallest(2).iloc[-1]
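The same idea can be generalised to the n-th smallest value. This is only a sketch: nth_smallest is a hypothetical helper name and the sample dates are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; note the duplicated earliest date
df = pd.DataFrame(
    {"Sampled_Date": ["2019-03-01", "2019-01-15", "2019-01-15", "2019-02-10"]}
)
df["Sampled_Date"] = pd.to_datetime(df["Sampled_Date"])


def nth_smallest(series, n):
    """Return the n-th smallest distinct value of a Series (n starts at 1)."""
    # drop_duplicates keeps repeated dates from occupying several ranks
    return series.drop_duplicates().nsmallest(n).iloc[-1]


second = nth_smallest(df["Sampled_Date"], 2)
third = nth_smallest(df["Sampled_Date"], 3)
print(second, third)
```

If n exceeds the number of distinct values, nsmallest returns fewer rows and the helper falls back to the largest value, so a length check may be worth adding in real code.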
This question already has answers here:
Add Leading Zeros to Strings in Pandas Dataframe
(7 answers)
Closed 3 years ago.
I have a data frame like this:
Date
9/02/2019
12/08/2019
8/06/2019
I want to add a leading 0 to dates that start with a single digit. I want it to look like this:
Date
09/02/2019
12/08/2019
08/06/2019
I am using this RegEx and the string manipulation. The string manipulation by itself works, but when I try to work it out with the RegEx, it doesn't yield anything.
if row['Date'] == r'[\d]{1}/[\d]{1,2}/[\d]{4}':
    row['blah'] = '0' + row['Date']
    print(row['blah'])
else:
    pass
Use pd.to_datetime and convert it to string using Series.dt.strftime:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['Date'] = df['Date'].dt.strftime('%d/%m/%Y')
print(df)
         Date
0  09/02/2019
1  12/08/2019
2  08/06/2019
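The check in the question never fires because == compares the date with the literal text of the pattern instead of matching it as a regex. The sketch below runs the datetime approach on the question's data and, as an alternative that is not part of the answer, a regex substitution with Series.str.replace. The column names via_datetime and via_regex are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["9/02/2019", "12/08/2019", "8/06/2019"]})

# Approach from the answer: parse, then format with a zero-padded day and month
parsed = pd.to_datetime(df["Date"], dayfirst=True)
df["via_datetime"] = parsed.dt.strftime("%d/%m/%Y")

# Regex alternative: prefix a 0 when a single digit precedes the first "/"
df["via_regex"] = df["Date"].str.replace(r"^(\d)/", r"0\1/", regex=True)

print(df)
```

The regex version only pads the first field, while the datetime version also validates the dates and pads the month.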
This question already has answers here:
filter out "empty array" values in Pandas DataFrame
(3 answers)
Closed 4 years ago.
I have a df that looks like this
list
[]
[8,8]
[0,9]
[]
This column has a dtype of object. How do I remove the [] from the column and replace with np.nan?
I have tried a string replace, but the brackets are not replaced:
df.list = df.list.replace('"[]"','')
Use astype(bool). An empty list is falsy, so this keeps only the rows whose list is non-empty:
df = df[df['list'].astype(bool)]
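The filter above drops the rows with empty lists. To replace them with np.nan instead, as the question asks, the same boolean mask can be used for assignment. This is a sketch built on the sample column from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"list": [[], [8, 8], [0, 9], []]})

# True where the list has at least one element, False for []
non_empty = df["list"].astype(bool)

# Variant 1 (from the answer): keep only rows with a non-empty list
filtered = df[non_empty]

# Variant 2: keep every row but overwrite the empty lists with NaN
df.loc[~non_empty, "list"] = np.nan

print(filtered)
print(df)
```

Bracket access (df["list"]) is used here instead of df.list only to keep the column lookup explicit.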
This question already has answers here:
Get date column from numpy loadtxt()
(2 answers)
Closed 4 years ago.
I have an array of dates in the format 'yyyy-mm-dd' and another array of integers, each corresponding to a value in the date array. But when I tried to plot the graph using:
matplotlib.pyplot.plot(dates, values, label='Price')
It gives the error:
ValueError: could not convert string to float: '2017-07-26'
How do I fix this error?
Your dates are strings; convert them to datetime objects first.
import datetime
x = [datetime.datetime.strptime(date, "%Y-%m-%d") for date in dates]
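A self-contained sketch of the conversion, with invented sample dates and values. The plotting call from the question is left as a comment so the snippet runs without matplotlib installed.

```python
import datetime

# Hypothetical sample data in the 'yyyy-mm-dd' format from the question
dates = ["2017-07-26", "2017-07-27", "2017-07-28"]
values = [10, 12, 11]

# strptime raises ValueError if a string does not match the format
x = [datetime.datetime.strptime(date, "%Y-%m-%d") for date in dates]

print(x[0])

# With matplotlib available, the datetime objects can be plotted directly:
# matplotlib.pyplot.plot(x, values, label='Price')
```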