How do you add single quotes to words in a Series in Pandas?

I have a series that I am trying to add single quotes to.
The following code returns just one single quote.
import pandas as pd
Test03 = pd.Series(['BEWO, SLSD, VWTR,'])
Test03.str.replace(' ', "'")
Result
0 BEWO,'SLSD,'VWTR,
dtype: object
I am trying to match the following format so I can then compare them.
Good = pd.Series(['BEWO', 'SLSD', 'VWTR'])
Thank you greatly in advance for your help.
Spencer
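The quotes in `Good` are only Python's string-literal delimiters. They are not stored in the data, so inserting quote characters will not make the two Series comparable. A minimal sketch, assuming the goal is one word per row: split on commas, strip the surrounding whitespace, and drop the empty piece left by the trailing comma. `Series.explode` needs pandas 0.25 or later.

```python
import pandas as pd

Test03 = pd.Series(['BEWO, SLSD, VWTR,'])
Good = pd.Series(['BEWO', 'SLSD', 'VWTR'])

# Split each string on commas, then explode() puts one piece per row
parts = Test03.str.split(',').explode().str.strip()

# The trailing comma leaves an empty string at the end; drop it and renumber
parts = parts[parts != ''].reset_index(drop=True)

print(parts.tolist() == Good.tolist())  # True
```

Comparing with `.tolist()` checks only the values, which avoids false mismatches caused by differing dtypes between the two Series.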

Related

Convert an unknown data item to string in Python

I have certain data that need to be converted to strings. Example:
[ABCGHDEF-12345, ABCDKJEF-123235,...]
The example above is not a constant or a string literal in my code; it is taken from an Excel sheet (up to 30+ items per row). I want to convert these items to strings. Since the data is not defined in the code, converting each item explicitly doesn't work. Is there a way to do this iteratively without manually placing double or single quotes around each element?
What I want finally:
["ABCGHDEF-12345", "ABCDKJEF-123235",...]
To convert the string to a list of strings, you can try:
s = "[ABCGHDEF-12345, ABCDKJEF-123235]"
s = s.strip("[]").split(", ")
print(s)
Prints:
['ABCGHDEF-12345', 'ABCDKJEF-123235']
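The split above assumes exactly one space after each comma. A slightly more forgiving variant, sketched here with a made-up sample string (the `XYZ-1` item is hypothetical), strips each item and skips empty ones, so irregular spacing or a trailing comma does not leave stray whitespace or empty strings:

```python
s = "[ABCGHDEF-12345,ABCDKJEF-123235 ,  XYZ-1,]"

# Remove the brackets, split on commas, trim whitespace, skip empty pieces
items = [item.strip() for item in s.strip("[]").split(",") if item.strip()]

print(items)  # ['ABCGHDEF-12345', 'ABCDKJEF-123235', 'XYZ-1']
```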

Pandas Series - Trouble Dropping First Value

I'm having trouble dropping the first NaN value in my Series.
I took the difference of the normal Series to the shifted Series of 1 period. This is how I calculated it:
x[c] = x[c] - x[c].shift(periods=1)
When I try to drop the first value using these methods:
x[c].drop(labels=[0])
x[c].dropna()
x[c].iloc[1:]
none of them works when I reassign the result:
# these are not used all together, but separately
x[c] = x[c].dropna()
x[c] = x[c].drop(labels=['1981-09-29'])
x[c] = x[c][1:]
print(x[c])
Date
1981-09-29 NaN
1981-09-30 -0.006682
1981-10-01 -0.014575
1981-10-02 -0.004963
1981-10-05 -0.004963
However, when I call the drop or dropna function in a print statement, it works!
print(x[c].dropna())
Date
1981-09-30 -0.006682
1981-10-01 -0.014575
1981-10-02 -0.004963
1981-10-05 -0.004963
1981-10-06 -0.005514
It doesn't matter which method; I just want to get rid of the first element in my Series.
Please help.
The DataFrame holds several Series (columns), so reassigning a shorter Series to just one column still gives a NaN. Every column shares the DataFrame's index: assigning a Series to a column aligns it on that index and fills the dropped label with NaN. Therefore I need to call dropna on the DataFrame after the calculation is performed.
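Try this? Note that dropna(inplace=True) returns None, so its result must not be assigned back to x[c]. Drop the row from the whole DataFrame instead:
x = x.dropna(subset=[c])
print(x[c])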
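If you know it is the first element you don't want, the simplest way is to use series.iloc[1:] or dataframe.iloc[1:, :]. A Series is one-dimensional, so it takes only one indexer. Reassign the whole DataFrame (x = x.iloc[1:]) rather than a single column, or the NaN will come back.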
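A small reproduction of the behaviour discussed above, using hypothetical prices in place of the asker's data: assigning the shortened column back realigns it on the DataFrame's index, while dropping the row from the DataFrame itself actually removes it.

```python
import pandas as pd

# Hypothetical data standing in for the asker's DataFrame `x`
idx = pd.to_datetime(['1981-09-29', '1981-09-30', '1981-10-01'])
x = pd.DataFrame({'price': [1.00, 0.99, 0.97]}, index=idx)
c = 'price'

# Difference with the previous row; the first row becomes NaN
x[c] = x[c] - x[c].shift(periods=1)

# Assigning a shorter Series to a column aligns it on x.index,
# so the dropped label reappears as NaN
x[c] = x[c].dropna()
print(x[c].isna().sum())  # 1

# Dropping from the DataFrame removes the whole row
x = x.dropna(subset=[c])
print(len(x))  # 2
```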

replacing a special character in a pandas dataframe

I have a dataset that uses '?' instead of NaN for missing values. I could go through each column using replace, but the problem is that I have 22 columns. I am trying to write a loop to do it efficiently, but I am getting it wrong. Here is what I am doing:
for col in adult.columns:
    if adult[col]=='?':
        adult[col]=adult[col].str.replace('?', 'NaN')
The plan is to get NaN values and then either fill them with fillna or drop them with dropna. The second problem is that not all the columns are categorical, so using the .str accessor is also wrong. How can I easily deal with this situation?
If you're reading the data from a .csv or .xlsx file you can use the na_values parameter:
adult = pd.read_csv('path/to/file.csv', na_values=['?'])
Otherwise, do what @MasonCaiby said and use adult.replace('?', float('nan'))
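Both suggestions can be sketched on a tiny in-memory file; the column names and rows below are hypothetical. `skipinitialspace=True` is included on the assumption that the file has a space after each comma, as in `50, ?`, in which case the raw value would be `" ?"` and would not match `na_values=['?']` on its own. `DataFrame.replace` works on every column at once and needs no `.str` accessor, so numeric columns are simply left alone.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text with a space after each comma
csv_text = "age,workclass\n39, State-gov\n50, ?\n"

# Option 1: treat '?' as missing while reading
from_csv = pd.read_csv(io.StringIO(csv_text), na_values=['?'], skipinitialspace=True)

# Option 2: replace '?' in an existing DataFrame, across all columns
adult = pd.DataFrame({'age': [39, 50], 'workclass': ['State-gov', '?']})
adult = adult.replace('?', np.nan)

# Then either drop or fill the missing values
dropped = adult.dropna()
filled = adult.fillna({'workclass': 'Unknown'})
```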

Python Pandas concatenate a Series of strings into one string

In python pandas, there is a Series/dataframe column of str values to combine into one long string:
df = pd.DataFrame({'text' : pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])})
Goal: 'Hello world !'
Thus far methods such as df['text'].apply(lambda x: ' '.join(x)) are only returning the Series.
What is the best way to get to the goal concatenated string?
You can join a string on the series directly:
In [3]:
' '.join(df['text'])
Out[3]:
'Hello world !'
Apart from join, you could also use pandas string method .str.cat
In [171]: df.text.str.cat(sep=' ')
Out[171]: 'Hello world !'
However, ' '.join() is generally faster.
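One practical difference between the two, sketched below with a hypothetical Series containing a missing value: `.str.cat` omits missing values by default (or substitutes `na_rep`), whereas `' '.join` fails because NaN is a float rather than a string.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])})
print(' '.join(df['text']))         # Hello world !
print(df['text'].str.cat(sep=' '))  # Hello world !

# Hypothetical Series with a missing value
s = pd.Series(['Hello', np.nan, '!'])
skipped = s.str.cat(sep=' ')                # missing value omitted
replaced = s.str.cat(sep=' ', na_rep='?')   # missing value shown as '?'
print(skipped)   # Hello !
print(replaced)  # Hello ? !

try:
    ' '.join(s)
except TypeError as exc:
    # str.join only accepts str items
    print('join failed:', exc)
```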
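Your code is "returning the series" because Series.apply calls the function on each element separately, so ' '.join('Hello') joins the characters of a single word. Apply ' '.join to the DataFrame along the right axis instead. Try this: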
df.apply(' '.join, axis=0)
text Hello world !
dtype: object
Specifying axis=0 passes each column to the function, so all the values in a column are joined into a single string. The return type is a Series whose index labels are the column names and whose values are the corresponding joined strings. This is particularly useful if you want to combine several columns at once.
Generally I find that it is confusing to understand which axis you need when using apply, so if it doesn't work the way you think it should, always try applying along the other axis too.
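To make the axis difference concrete, here is a sketch on a hypothetical two-column frame: with `axis=0` the function receives each column, and with `axis=1` it receives each row.

```python
import pandas as pd

# Hypothetical two-column frame
df = pd.DataFrame({'first': ['Hello', 'Good'], 'second': ['world', 'night']})

# axis=0: one joined string per column, indexed by column name
by_column = df.apply(' '.join, axis=0)
print(by_column)
# first     Hello Good
# second    world night

# axis=1: one joined string per row, indexed by row label
by_row = df.apply(' '.join, axis=1)
print(by_row)
```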

using split() to split values in an entire column in a python dataframe

I am trying to clean a list of URLs that end in garbage, as shown:
/gradoffice/index.aspx(
/gradoffice/index.aspx-
/gradoffice/index.aspxjavascript$
/gradoffice/index.aspx~
I have a CSV file with over 190k records of different URLs. I loaded the CSV into a pandas DataFrame and took the entire column of URLs by using the statement
str = df['csuristem']
This clearly gave me all the values in the column. But when I use the following code, it prints only about 40k records, starting somewhere in the middle. I don't know where I am going wrong: the program runs without errors but shows only part of the results. Any help would be much appreciated.
import pandas
table = pandas.read_csv("SS3.csv", dtype=object)
df = pandas.DataFrame(table)
str = df['csuristem']  # note: this name shadows the built-in str
for s in str:
    s = s.split(".")[0]
    print(s)
I am looking to get an output like this
/gradoffice/index.
/gradoffice/index.
/gradoffice/index.
/gradoffice/index.
Thank you,
Santhosh.
Call .str.split on the column, then use .str[0] to take the first part of each split string:
In [6]:
df['csuristem'].str.split('.').str[0]
Out[6]:
0 /gradoffice/index
1 /gradoffice/index
2 /gradoffice/index
3 /gradoffice/index
Name: csuristem, dtype: object
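A self-contained sketch of the same vectorized approach, using the sample rows from the question in place of `SS3.csv`. Passing `n=1` stops splitting after the first `.`, and the result is stored in a new column (the names `cleaned` and `cleaned_with_dot` are hypothetical) rather than printed row by row. The second column appends a `.` on the assumption that the trailing dot in the desired output is intended.

```python
import pandas as pd

# Sample rows from the question; the real data would come from SS3.csv
df = pd.DataFrame({'csuristem': [
    '/gradoffice/index.aspx(',
    '/gradoffice/index.aspx-',
    '/gradoffice/index.aspxjavascript$',
    '/gradoffice/index.aspx~',
]})

# Split each cell at the first '.' only and keep the part before it
df['cleaned'] = df['csuristem'].str.split('.', n=1).str[0]

# Assumption: the trailing '.' in the desired output is wanted
df['cleaned_with_dot'] = df['cleaned'] + '.'

print(df['cleaned_with_dot'])
```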
