Getting the match value of re.search python [duplicate] - string

This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 3 years ago.
I'm pulling some data from the web using Python in a Jupyter notebook. I have downloaded and parsed the data and created the data frame. I need to extract a number from a string that I have in the data frame. I am using this regex to do it:
for note in df["person_notes"]:
    print(re.search(r'\d+', note))
and the outcome is the following:
<_sre.SRE_Match object; span=(53, 55), match='89'>
How can I get just the matched number? For this line it would be 89. I tried converting the whole line with str() and then using replace(), but the span=(number, number) part differs from line to line. Thank you in advance!

You can use the start() and end() methods of the returned match object to get the position of the match within the string:
for note in df["person_notes"]:
    match = re.search(r'\d+', note)
    if match:
        print(note[match.start():match.end()])
    else:
        pass  # no match found ...
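A match object also exposes the matched text directly through group(), which avoids slicing the original string. A minimal sketch, assuming the notes are plain strings (the sample list below is made up for illustration and stands in for df["person_notes"]):

```python
import re

# Hypothetical sample data standing in for df["person_notes"].
notes = ["Patient is 89 years old", "no digits here", "room 12, bed 3"]

numbers = []
for note in notes:
    match = re.search(r"\d+", note)
    if match:
        # group() (same as group(0)) returns the text of the whole match.
        numbers.append(match.group())
    else:
        # re.search returns None when nothing matches.
        numbers.append(None)

print(numbers)
```

Since re.search stops at the first match, only the first run of digits in each note is returned; the result is a string, so wrap it in int() if a number is needed.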

Related

Write a function that returns the count of the unique answers to all of the questions in a dataset [duplicate]

This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
How to test if a string contains one of the substrings in a list, in pandas?
(4 answers)
Closed 1 year ago.
For example, after filtering the entire dataset to only questions containing the word "King", we could then find all of the unique answers to those questions.
I filtered by using the following code:
def lower1(x):
    x.lower()
filter_dataset = lambda x:all(x) in jeopardy.Question.apply(lower1)
print(filter_dataset(['King','England']))
The above code is printing True instead of printing the rows of jeopardy['Question'] with the keywords 'King' and 'England'.
That is the first problem.
Now I want to count the unique answers to the jeopardy['Question']
Here is the sample data frame
Now I want to create a function that does the count of the unique answers.
I wrote the following code:
def unique_counts():
    print(jeopardy['Answer'].unique().value_counts())
unique_counts()
Which is giving me the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'value_counts'
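Use Series.str.contains. Joining the keywords with '|' builds a regex alternation, so this keeps the rows whose question contains either keyword: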
jeopardy[jeopardy['Question'].str.contains('|'.join(['King','England']))]
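If every keyword must be present, and the unique answers then need to be counted, something like the following may work. It is only a sketch on a made-up three-row frame: the function name filter_questions and the sample rows are assumptions, and only the Question and Answer column names come from the question. The AttributeError in the question arises because unique() returns a NumPy array, whereas value_counts() and nunique() are Series methods.

```python
import pandas as pd

# Made-up rows; only the column names come from the question.
jeopardy = pd.DataFrame({
    "Question": [
        "This King of England signed the Magna Carta",
        "The king of the jungle",
        "Henry VIII was a king of England",
    ],
    "Answer": ["King John", "Lion", "Henry VIII"],
})

def filter_questions(df, words):
    """Keep rows whose Question contains every word, ignoring case."""
    mask = pd.Series(True, index=df.index)
    for word in words:
        # regex=False treats the word as a literal substring.
        mask &= df["Question"].str.contains(word, case=False, regex=False)
    return df[mask]

filtered = filter_questions(jeopardy, ["King", "England"])

# value_counts() is a Series method: call it on the column, not on unique().
answer_counts = filtered["Answer"].value_counts()
n_unique = filtered["Answer"].nunique()
print(answer_counts)
print(n_unique)
```

Use nunique() when only the number of distinct answers matters, and value_counts() when the frequency of each answer is wanted.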

Add multiple value to a key in Python dictionary [duplicate]

This question already has answers here:
list to dictionary conversion with multiple values per key?
(7 answers)
Closed 2 years ago.
I am trying to add multiple values to a single key (if it already exists) from a file in Python. I tried the code below but I am getting this error: AttributeError: 'str' object has no attribute 'append'
file=open("Allwords",'r')
list=sorted(list(set([words.strip() for words in file])))
def sequence(word):
    return "".join(sorted(word))
dict= {}
for word in list:
    if sequence(word) in dict:
        dict[sequence(word)].append(word)
    else:
        dict[sequence(word)]=word
Thanks in advance for your help!
You should insert the first element by putting it into a list, so that you can append subsequent items to it later on. You can do it as follows -
file=open("Allwords",'r')
list=sorted(list(set([words.strip() for words in file])))
def sequence(word):
    return "".join(sorted(word))
dict= {}
for word in list:
    if sequence(word) in dict:
        dict[sequence(word)].append(word)
    else:
        new_lst = [word] # Inserting the first element as a list, so we can later append to it
        dict[sequence(word)]=new_lst
Now you will be able to append to it properly. In your case, the value you were inserting was a plain string, which has no append method. This version works because the first value stored under each key is a list, and later words can be appended to that list.
Hope this helps !
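An alternative worth considering is collections.defaultdict, which creates the empty list on first access so no if/else is needed. A minimal sketch, assuming a small in-memory word list (the words below are made up and stand in for the contents of the Allwords file); the variable names also avoid shadowing the built-in list and dict:

```python
from collections import defaultdict

# Made-up words standing in for the lines of the "Allwords" file.
words = sorted({"listen", "silent", "enlist", "google", "gogole"})

def sequence(word):
    """Return the letters of word in sorted order, used as the grouping key."""
    return "".join(sorted(word))

groups = defaultdict(list)
for word in words:
    # Missing keys start as an empty list, so append always works.
    groups[sequence(word)].append(word)

print(dict(groups))
```

With a plain dict, groups.setdefault(sequence(word), []).append(word) has the same effect.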

Looping thru JSON using Python? [duplicate]

This question already has answers here:
How can I parse (read) and use JSON?
(5 answers)
Closed 2 years ago.
How do you filter records in a JSON array using Python?
Here's my Python code:
Sample Data Source: https://s3-eu-west-1.amazonaws.com/dwh-test-resources/recipes.json
In "ingredients", I need to filter all records that contain bread. Any record whose "ingredients" string contains bread should match, regardless of whether the word is upper case, lower case, plural or singular.
My Python version is 3.
Let's say:
data = [{...json you have}]
Now we will check:
res = []
for i in data:
    if 'bread' in i["ingredients"].lower():
        res.append(i)
or simply:
res = [i for i in data if 'bread' in i["ingredients"].lower()]
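Put together with the parsing step, the filter might look like the sketch below. It assumes each record stores its ingredients as a single string; the three sample records are made up rather than taken from the linked file. Because this is a substring test on lower-cased text, it matches "Bread" and "breads" but would also match words such as "breadcrumbs".

```python
import json

# Made-up records; assumes "ingredients" is one string per record.
raw = """[
  {"name": "Toast", "ingredients": "2 slices Bread, butter"},
  {"name": "Salad", "ingredients": "lettuce, tomato"},
  {"name": "Stuffing", "ingredients": "stale BREADS cubed, sage"}
]"""

data = json.loads(raw)

# Lower-case the text once per record, then test for the substring.
res = [rec for rec in data if "bread" in rec["ingredients"].lower()]
names = [rec["name"] for rec in res]
print(names)
```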

Removing the whitespaces from a particular column in a dataframe [duplicate]

This question already has answers here:
Pandas - Strip white space
(6 answers)
Closed 3 years ago.
I have the following dataframe and I'd like to remove all the whitespace characters and make it lowercase:
df = pd.DataFrame({"col1":[1,2,3,4], "col2":["A","B ", "Cc","D"]})
I tried to do that via df[["col2"]].apply(lambda x: x.strip().lower()) but it raises an error:
AttributeError: ("'Series' object has no attribute 'strip'", 'occurred at index col2')
You need two calls through the .str accessor, assigning the result back to the column:
df["col2"] = df["col2"].str.strip().str.lower()
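Note that str.strip() only removes leading and trailing whitespace. If whitespace inside the values must go as well, a regular-expression replacement is one option. A small sketch on a frame like the one in the question, where the last value (" D d") has been changed here purely to show the difference:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["A", "B ", "Cc", " D d"]})

# Trim the ends only, then lower-case.
trimmed = df["col2"].str.strip().str.lower()

# Remove every whitespace character, including inner ones, then lower-case.
no_space = df["col2"].str.replace(r"\s+", "", regex=True).str.lower()

print(trimmed.tolist())
print(no_space.tolist())
```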

How do I parse one character from Python Pandas String? [duplicate]

This question already has answers here:
Pandas: get second character of the string, from every row
(2 answers)
Closed 4 years ago.
I have a data frame and want to parse the 9th character into a second column. I'm missing the syntax somewhere though.
# develop the data
df = pd.DataFrame(columns = ["vin"], data = ['LHJLC79U58B001633','SZC84294845693987','LFGTCKPA665700387','L8YTCKPV49Y010001',
                                             'LJ4TCBPV27Y010217','LFGTCKPM481006270','LFGTCKPM581004253','LTBPN8J00DC003107',
                                             '1A9LPEER3FC596536','1A9LREAR5FC596814','1A9LKEER2GC596611','1A9L0EAH9C596099',
                                             '22A000018'])
df['manufacturer'] = ['A','A','A','A','B','B','B','B','B','C','C','D','D']
def check_digit(df):
    df['check_digit'] = df['vin'][8]
    print(df['check_digit'])
For some reason, this puts the 8th row VIN in every line.
In your code, this line:
df['check_digit'] = df['vin'][8]
selects only the single element at index 8 of the 'vin' column and assigns that one VIN to every row. To go row by row, index each VIN individually:
for i in range(len(df['vin'])):
    df.loc[i, 'check_digit'] = df['vin'][i][8]
As a rule of thumb, whenever you are stuck, simply check the type of the variable returned. It solves a lot of small problems.
EDIT: As pointed out by @Georgy in the comments, a loop is not Pythonic; a more efficient solution is:
df['check_digit'] = df['vin'].str[8]
The .str accessor does the trick: it applies the indexing to each string in the column.
The correct way is:
def check_digit(df):
    df['check_digit'] = df['vin'].str[8]
    print(df)
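One detail worth knowing about .str[8]: for values shorter than nine characters it yields a missing value instead of raising IndexError. A quick sketch on three VINs from the question plus one deliberately short string ('ABC', added here for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"vin": ["LHJLC79U58B001633", "SZC84294845693987", "22A000018", "ABC"]}
)

# .str[8] indexes into each string element-wise (position 8 = ninth character).
df["check_digit"] = df["vin"].str[8]

print(df)
```

Rows with a missing check digit can then be located with df["check_digit"].isna().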
