This question already has answers here:
How can I parse (read) and use JSON?
(5 answers)
Closed 2 years ago.
How do you filter records in a JSON array using Python?
Here's my Python code:
Sample Data Source: https://s3-eu-west-1.amazonaws.com/dwh-test-resources/recipes.json
In the "ingredients" field I need to filter all records that contain bread. Every string under "ingredients" that contains bread should match, regardless of whether the word is upper case, lower case, plural or singular.
My Python version is 3.
Let's say:
data = [{...json you have}]
Now we will check:
res = []
for i in data:
    if 'bread' in i["ingredients"].lower():
        res.append(i)
or simply:
res = [i for i in data if 'bread' in i["ingredients"].lower()]
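As a minimal sketch of the check above, assuming each record stores "ingredients" as a single string: the three records and the helper name contains_bread are made up for illustration and are not taken from the linked file.

```python
# Hypothetical sample records; the real data would come from json.load().
data = [
    {"name": "Toast", "ingredients": "2 slices Bread\nbutter"},
    {"name": "Salad", "ingredients": "lettuce\ntomato"},
    {"name": "Crumbed fish", "ingredients": "fish\nBREADCRUMBS"},
]

def contains_bread(record):
    # .get() with a default avoids a KeyError when "ingredients" is missing.
    return "bread" in record.get("ingredients", "").lower()

res = [record for record in data if contains_bread(record)]
print([record["name"] for record in res])
```

Because this is a plain substring test, it also matches "breads" and longer words such as "breadcrumbs"; whether that is wanted depends on the data.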
This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
How to test if a string contains one of the substrings in a list, in pandas?
(4 answers)
Closed 1 year ago.
For example, after filtering the entire dataset to only questions containing the word "King", we could then find all of the unique answers to those questions.
I filtered by using the following code:
`def lower1(x):
    x.lower()
filter_dataset = lambda x:all(x) in jeopardy.Question.apply(lower1)
print(filter_dataset(['King','England']))`
The above code is printing True instead of printing the rows of jeopardy['Question'] with the keywords 'King' and 'England'.
That is the first problem.
Now I want to count the unique answers to the jeopardy['Question']
Here is the sample data frame
Now I want to create a function that does the count of the unique answers.
I wrote the following code:
`def unique_counts():
    print(jeopardy['Answer'].unique().value_counts())
unique_counts()`
Which is giving me the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'value_counts'
Use Series.str.contains:
jeopardy[jeopardy['Question'].str.contains('|'.join(['King','England']))]
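Note that a pattern joined with '|' matches questions containing either keyword, and the match is case-sensitive. If every keyword is required, one possible sketch is below. The small data frame is a hypothetical stand-in for the real jeopardy data, and the filter_dataset signature is an assumption. It also shows the count: value_counts() is a Series method, which is why calling it on the NumPy array returned by unique() raises the AttributeError.

```python
import pandas as pd

# Hypothetical stand-in for the jeopardy data frame.
jeopardy = pd.DataFrame({
    "Question": [
        "This king of England signed the Magna Carta",
        "The King of Rock and Roll",
        "Capital of England",
        "A KING of england nicknamed Lionheart",
    ],
    "Answer": ["John", "Elvis Presley", "London", "Richard I"],
})

def filter_dataset(df, words):
    # Start with every row selected, then require each keyword in turn.
    mask = pd.Series(True, index=df.index)
    for word in words:
        mask = mask & df["Question"].str.contains(word, case=False, regex=False)
    return df[mask]

matches = filter_dataset(jeopardy, ["King", "England"])
print(matches)

# Call value_counts() on the column itself, not on the result of unique().
answer_counts = matches["Answer"].value_counts()
print(answer_counts)
```

If only the number of distinct answers is needed, matches["Answer"].nunique() returns it directly.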
This question already has answers here:
How to extract the n-th elements from a list of tuples
(8 answers)
Closed 3 years ago.
I have this list:
[('5.333333333333333', 'n04'), ('5.0', 'n01'), ('3.9936507936507932', 'n03'), ('2.4206349206349205', 'n05'), ('1.9629629629629628', 'n02')]
and I would like to have the list like this:
[n04, n01, n03, n02, n04]
How can I do it? I have spent too many hours on this problem.
Help please!
You can use a list comprehension to iterate over the list, pick out the values you are interested in, and put them in a new list:
my_list = [('5.333333333333333', 'n04'), ('5.0', 'n01'), ('3.9936507936507932', 'n03'), ('2.4206349206349205', 'n05'), ('1.9629629629629628', 'n02')]
my_new = [item[1] for item in my_list]
print(my_new)
OUTPUT
['n04', 'n01', 'n03', 'n05', 'n02']
Try:
x,y=zip(*[('5.333333333333333', 'n04'), ('5.0', 'n01'), ('3.9936507936507932', 'n03'), ('2.4206349206349205', 'n05'), ('1.9629629629629628', 'n02')])
y=list(y)
print(y)
Outputs:
['n04', 'n01', 'n03', 'n05', 'n02']
This question already has answers here:
Is there a built in function for string natural sort?
(23 answers)
Closed 3 years ago.
I have a list of strings that I am trying to sort numerically. It looks like this:
List=['Core_0_0.txt', 'Core_0_1.txt','Core_0_2.txt',...'Core_1_0.txt','Core_2_3.txt', ]
but when I sort it with sorted(List), it doesn't sort the list properly.
It's very important that I keep the values as strings, and they must be ordered by the number, i.e. 0_1, 0_2, 0_3, ..., 31_1. They all have the form Core_X_X.txt. How would I do this?
If you can assume all your entries will look like *_N1_N2.txt, you can use the str.split method along with a sorting key function to sort your list properly. It might look something like this:
sorted_list = sorted(List, key = lambda s: (int(s.split("_")[1]), int(s.split("_")[2].split(".")[0])))
Essentially, this internally creates a tuple (N1, N2) for each file named *_N1_N2.txt and sorts on the N1 value first. If there's a tie, it falls back to the N2 value.
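To make the tuple key concrete, here is a small sketch with the lambda written out as a named function. The name core_key and the four file names are hypothetical; it assumes every entry follows the Core_N1_N2.txt pattern.

```python
def core_key(name):
    # "Core_10_2.txt" splits into ["Core", "10", "2.txt"]
    parts = name.split("_")
    n1 = int(parts[1])
    n2 = int(parts[2].split(".")[0])
    return (n1, n2)

# Hypothetical file names following the Core_N1_N2.txt pattern.
files = ["Core_10_0.txt", "Core_2_3.txt", "Core_0_10.txt", "Core_0_2.txt"]

plain_sorted = sorted(files)                 # character-by-character string order
numeric_sorted = sorted(files, key=core_key)  # order by (N1, N2) as integers

print(plain_sorted)
print(numeric_sorted)
```

The plain sort puts Core_0_10.txt before Core_0_2.txt because it compares characters, while the keyed sort compares the integers 10 and 2. The list elements themselves stay strings.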
Your question is a possible duplicate of another question, whose answer I am posting here again for you.
You just need to change 'alist' to your 'List'.
import re
def atoi(text):
    return int(text) if text.isdigit() else text
def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    return [ atoi(c) for c in re.split(r'(\d+)', text) ]
alist=[
    "something1",
    "something12",
    "something17",
    "something2",
    "something25",
    "something29"]
alist.sort(key=natural_keys)
print(alist)
yields
['something1', 'something2', 'something12', 'something17', 'something25', 'something29']
This question already has answers here:
Pandas: get second character of the string, from every row
(2 answers)
Closed 4 years ago.
I have a data frame and want to parse the 9th character into a second column. I'm missing the syntax somewhere though.
import pandas as pd
# develop the data
df = pd.DataFrame(columns = ["vin"], data = ['LHJLC79U58B001633','SZC84294845693987','LFGTCKPA665700387','L8YTCKPV49Y010001',
                                             'LJ4TCBPV27Y010217','LFGTCKPM481006270','LFGTCKPM581004253','LTBPN8J00DC003107',
                                             '1A9LPEER3FC596536','1A9LREAR5FC596814','1A9LKEER2GC596611','1A9L0EAH9C596099',
                                             '22A000018'])
df['manufacturer'] = ['A','A','A','A','B','B','B','B','B','C','C','D','D']
def check_digit(df):
    df['check_digit'] = df['vin'][8]
    print(df['check_digit'])
For some reason, this puts the 8th row VIN in every line.
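In your code, this line:
df['check_digit'] = df['vin'][8]
selects only the element at index 8 of the 'vin' column (a single VIN string) and assigns that one value to every row. Try this instead:
for i in range(len(df['vin'])):
    # write the character at index 8 of row i's VIN into row i only
    df.loc[i, 'check_digit'] = df['vin'][i][8]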
As a rule of thumb, whenever you are stuck, simply check the type of the variable returned. It solves a lot of small problems.
EDIT: As pointed out by @Georgy in the comments, using a loop isn't Pythonic, and a more efficient way of solving this would be:
df['check_digit'] = df['vin'].str[8]
The .str does the trick. For future reference on that, I think you would find this helpful.
The correct way is:
def check_digit(df):
    df['check_digit'] = df['vin'].str[8]
    print(df)
This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 3 years ago.
I'm pulling some data from the web using Python in a Jupyter notebook. I have pulled down the data, parsed it, and created the data frame. I need to extract a number from a string that I have in the data frame. I am using this regex to do it:
for note in df["person_notes"]:
    print(re.search(r'\d+', note))
and the outcome is the following:
<_sre.SRE_Match object; span=(53, 55), match='89'>
How can I get just the matched number? In this line it would be 89. I tried converting the whole line with str() and then using replace(), but not all lines have the same span=(number, number). Thank you in advance!
You can use the start() and end() methods on the returned match objects to get the correct positions within the string:
for note in df["person_notes"]:
    match = re.search(r'\d+', note)
    if match:
        print(note[match.start():match.end()])
    else:
        # no match found ...
        pass
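A shorter alternative, not part of the answer above, is the match object's group() method, which returns the matched text directly. The sketch below uses a hypothetical list of notes in place of df["person_notes"] and converts each match to int, which may or may not be what you want.

```python
import re

# Hypothetical notes standing in for df["person_notes"].
notes = ["Patient is 89 years old", "no age recorded", "visited 3 times in 2019"]

numbers = []
for note in notes:
    match = re.search(r"\d+", note)
    # re.search returns None when nothing matches, so check before calling group().
    numbers.append(int(match.group()) if match else None)

print(numbers)
```

Note that re.search only returns the first run of digits in each string, so "2019" in the last note is ignored; re.findall would return all of them.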