import pandas as pd
import yaml as y

Movies = pd.read_csv('tmdb_5000_movies.csv', encoding="ISO-8859-1")
company = pd.DataFrame(Movies[['original_title', 'production_companies']])
for idn in range(10000):
    for index in range(len(company['original_title'])):
        akm = y.load(company.loc[index, 'production_companies'])
        for i in range(len(akm)):
            if akm[i]['id'] == idn:
                if str(idn) not in keyword.columns:
                    keyword[str(idn)] = " "
                    keyword.loc[index, str(idn)] = 1
                elif str(idn) in keyword.columns:
                    keyword.loc[index, str(idn)] = 1
            # check if akm == idn
        # akm length
keyword = keyword.fillna(0)
My data:
[{"id": 416, "name": "miami"},
{"id": 529, "name": "ku klux klan"},
{"id": 701, "name": "cuba"},
{"id": 1568, "name": "undercover"},
{"id": 1666, "name": "mexican standoff"},
{"id": 1941, "name": "ecstasy"},
{"id": 7963, "name": "guant\u00e1namo"},
{"id": 10089, "name": "slaughter"},
{"id": 10950, "name": "shootout"},
{"id": 12371, "name": "gunfight"},
{"id": 12648, "name": "bromance"},
{"id": 13142, "name": "gangster"},
{"id": 14819, "name": "violence"},
{"id": 14967, "name": "foot chase"},
{"id": 15271, "name": "interrogation"},
{"id": 15483, "name": "car chase"},
{"id": 18026, "name": "drug lord"},
{"id": 18067, "name": "exploding house"},
{"id": 155799, "name": "narcotics cop"},
{"id": 156117, "name": "illegal drugs"},
{"id": 156805, "name": "dea agent"},
{"id": 167316, "name": "buddy cop"},
{"id": 179093, "name": "criminal underworld"},
{"id": 219404, "name": "action hero"},
{"id": 226380, "name": "haitian gang"},
{"id": 226381, "name": "minefield"}]
Error message (copied from the comments below):
ParserError: while parsing a flow mapping
  in "<unicode string>", line 1, column 2:
    {""name"": ""Dune Entertainment""
     ^
expected ',' or '}', but got '<scalar>'
  in "<unicode string>", line 1, column 5:
    {""name"": ""Dune Entertainment""
        ^
I am trying to filter a list of dictionaries, keeping only those whose value for a specific property appears in a given list of values. Any suggestions?
list_of_persons = [
{"id": 2, "name": "name_2", "age": 23},
{"id": 3, "name": "name_3", "age": 43},
{"id": 4, "name": "name_4", "age": 35},
{"id": 5, "name": "name_5", "age": 59}
]
ids_search_list = [2, 4]
I'd like to get the following list
result_list = [
{"id": 2, "name": "name_2", "age": 23},
{"id": 4, "name": "name_4", "age": 35}
]
Looping may be the simplest solution, but there should be a better way in Python.
You can do it like this:
list_of_persons = [
{"id": 2, "name": "name_2", "age": 23},
{"id": 3, "name": "name_3", "age": 43},
{"id": 4, "name": "name_4", "age": 35},
{"id": 5, "name": "name_5", "age": 59}
]
ids_search_list = [2, 4]
result = []
for person in list_of_persons:
    if person["id"] in ids_search_list:
        result.append(person)
print(result)
You can use a list comprehension:
result_list = [person for person in list_of_persons if person["id"] in ids_search_list]
If you want some reading material about it: https://realpython.com/list-comprehension-python/
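A possible refinement of the comprehension above, as a minimal sketch: when the list of ids to search for is long, converting it to a set first makes each membership test a hash lookup instead of a scan. The name ids_search_set is introduced here for illustration only; the data is the same as in the question.

```python
list_of_persons = [
    {"id": 2, "name": "name_2", "age": 23},
    {"id": 3, "name": "name_3", "age": 43},
    {"id": 4, "name": "name_4", "age": 35},
    {"id": 5, "name": "name_5", "age": 59},
]
ids_search_list = [2, 4]

# Build the set once; "in" on a set does not scan every element.
ids_search_set = set(ids_search_list)

# Keep only the persons whose id is in the set (original order is preserved).
result_list = [person for person in list_of_persons if person["id"] in ids_search_set]
print(result_list)
```

For a handful of ids the plain list works just as well; the set only starts to matter as both lists grow.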
I want to assert values between two JSON files.
Here is the code I tried. It works fine, but I have more than 300 values to test.
Instead of writing 300 lines, is there a better way to do it with a loop?
file1.json content is:
[
{
"Name": "Pierre",
"Address": 1,
"City": "Paris",
"Country": "FRA",
"Code": "2020-01-01T00:00:00",
"Position": " 7000,00 $ "
},
{
"Name": "Pierre",
"Address": 2,
"City": "Paris",
"Country": "USA",
"Code": "2020-01-01T00:00:00",
"Position": " 9000,00 $ "
},
{
"Name": "Pierre",
"Address": 3,
"City": "Paris",
"Country": "GER",
"Code": "2020-01-01T00:00:00",
"Position": " 2000,00 $ "
}
]
file2.json content is:
{"value": {
"data": {"number1": [
[
{
"Name": "Pierre",
"Address": 1,
"City": "Paris",
"Country": "FRA",
"Code": "2020-01-01T00:00:00",
"Position": " 7000,00 $ "
},
{
"Name": "Paul",
"Address": 2,
"City": "Paris",
"Country": "USA",
"Code": "2020-01-01T00:00:00",
"Position": " 9000,00 $ "
},
{
"Name": "Pierre",
"Address": 3,
"City": "Paris",
"Country": "GER",
"Code": "2020-01-01T00:00:00",
"Position": " 2000,00 $ "
},
{
"Name": "Luc",
"Address": 6,
"City": "Pekin",
"Country": "CHN",
"Code": "2020-01-01T00:00:00",
"Position": " 800,00 $ "
}
]
]
}
}
}
I want to assert each value of file1 against file2.
For example:
JsonSlurper jsonSlurper1 = new JsonSlurper()
File file1Actual = new File('c:/temp/file1.json')
def actualJson = jsonSlurper1.parse(file1Actual)
JsonSlurper jsonSlurper2 = new JsonSlurper()
File file2Expected = new File('c:/temp/file2.json')
def expectedJson = jsonSlurper2.parse(file2Expected)
assert actualJson.value.data.number1.Name[0] == expectedJson.Name[0]
assert actualJson.value.data.number1.Name[1] == expectedJson.Name[1]
assert actualJson.value.data.number1.Name[2] == expectedJson.Name[2]
Thank you for any suggestions
Assuming you are actually coding in Groovy (not Java) as your code example suggests, something like the following should do it:
/* ... read in the Json files as outlined in the question ... */
// size() needs parentheses: on a list, '.size' is treated as a property lookup on each element
actualJson.value.data.number1.size().times {
    // Just to demonstrate that 'it' takes on values 0, 1, 2, 3, 4 etc.
    println "assert number ${it}"
    assert actualJson.value.data.number1.Name[it] == expectedJson.Name[it]
}
However, this is a very 'groovyesque' way of doing it. If you want something a tad closer to what a Java developer would write, you could do:
/* read in Jsons, then: */
// Let's assign the list to a var for better readability
def actualList = actualJson.value.data.number1
// Bonus points for asserting that both lists are of equal size
// assert actualList.size() == expectedJson.size()
// If this does not hold, compare only up to the shorter length
def smaller = Math.min(actualList.size(), expectedJson.size())
// a classic for-loop.
for (int i = 0; i < smaller; i++) {
    println "assert number ${i}"
    assert actualList[i].Name == expectedJson[i].Name
}
I'm scraping a JavaScript-loaded website using requests. To do so, I open the browser's developer tools, go to the Network tab, and look at the XHR calls to find out where and how the site requests its data. The process is as follows:
Go to https://www.888sport.es/futbol/#/event/1006276426 in Chrome. Once the page has loaded, you can click on many items, each with a unique ID. Clicking one opens a pop-up window with information. The XHR call mentioned above gives a direct link to that information:
import requests
url='https://eu-offering.kambicdn.org/offering/v2018/888es/betoffer/outcome.json?lang=es_ES&market=ES&client_id=2&channel_id=1&ncid=1586874367958&id=2740660278'
#ncid is the date in timestamp format, and id is the unique id of the node clicked
response=requests.get(url=url,headers=headers)
The problem is that this isn't user-friendly and requires Python. If I open this last URL in the Chrome driver, I get the information, but as plain text that I can't interact with. Is there any way to derive a workable link from the request, so that entering it manually in a Chrome driver loads that pop-up window directly, as on the regular website?
Call .json() on the response so that you receive a dict parsed from the JSON, which you can then access by key.
import requests
import json


def main(url):
    r = requests.get(url).json()
    print(r.keys())
    hview = json.dumps(r, indent=4)
    print(hview)  # pretty-printed view of the response


main("https://eu-offering.kambicdn.org/offering/v2018/888es/betoffer/outcome.json?lang=es_ES&market=ES&client_id=2&channel_id=1&ncid=1586874367958&id=2740660278")
Output:
dict_keys(['betOffers', 'events', 'prePacks'])
{
"betOffers": [
{
"id": 2210856430,
"closed": "2020-04-17T14:30:00Z",
"criterion": {
"id": 1001159858,
"label": "Final del partido",
"englishLabel": "Full Time",
"order": [],
"occurrenceType": "GOALS",
"lifetime": "FULL_TIME"
},
"betOfferType": {
"id": 2,
"name": "Partido",
"englishName": "Match"
},
"eventId": 1006276426,
"outcomes": [
{
"id": 2740660278,
"label": "1",
"englishLabel": "1",
"odds": 1150,
"participant": "FC Lokomotiv Gomel",
"type": "OT_ONE",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"participantId": 1003789012,
"oddsFractional": "1/7",
"oddsAmerican": "-670",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2740660284,
"label": "X",
"englishLabel": "X",
"odds": 6750,
"type": "OT_CROSS",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"oddsFractional": "23/4",
"oddsAmerican": "575",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2740660286,
"label": "2",
"englishLabel": "2",
"odds": 11000,
"participant": "Khimik Svetlogorsk",
"type": "OT_TWO",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"participantId": 1001024009,
"oddsFractional": "10/1",
"oddsAmerican": "1000",
"status": "OPEN",
"cashOutStatus": "ENABLED"
}
],
"tags": [
"OFFERED_PREMATCH",
"MAIN"
],
"cashOutStatus": "ENABLED"
}
],
"events": [
{
"id": 1006276426,
"name": "FC Lokomotiv Gomel - Khimik Svetlogorsk",
"nameDelimiter": "-",
"englishName": "FC Lokomotiv Gomel - Khimik Svetlogorsk",
"homeName": "FC Lokomotiv Gomel",
"awayName": "Khimik Svetlogorsk",
"start": "2020-04-17T14:30:00Z",
"group": "1\u00aa Divisi\u00f3n",
"groupId": 2000053499,
"path": [
{
"id": 1000093190,
"name": "F\u00fatbol",
"englishName": "Football",
"termKey": "football"
},
{
"id": 2000051379,
"name": "Bielorrusa",
"englishName": "Belarus",
"termKey": "belarus"
},
{
"id": 2000053499,
"name": "1\u00aa Divisi\u00f3n",
"englishName": "1st Division",
"termKey": "1st_division"
}
],
"nonLiveBoCount": 6,
"sport": "FOOTBALL",
"tags": [
"MATCH"
],
"state": "NOT_STARTED",
"groupSortOrder": 1999999000000000000
}
],
"prePacks": []
}
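Once the response is a dict, individual values can be pulled out by key. The following is only a sketch: response_json is a hand-trimmed copy of the output above (so it runs without a network call), and outcome_rows is a hypothetical helper name, not part of requests or of the site's API.

```python
# Trimmed copy of the response shown above; in practice this would be
# the dict returned by requests.get(url).json().
response_json = {
    "betOffers": [
        {
            "id": 2210856430,
            "outcomes": [
                {"id": 2740660278, "label": "1", "odds": 1150,
                 "participant": "FC Lokomotiv Gomel"},
                {"id": 2740660284, "label": "X", "odds": 6750},
                {"id": 2740660286, "label": "2", "odds": 11000,
                 "participant": "Khimik Svetlogorsk"},
            ],
        }
    ],
    "events": [
        {"id": 1006276426, "name": "FC Lokomotiv Gomel - Khimik Svetlogorsk"}
    ],
    "prePacks": [],
}


def outcome_rows(data):
    """Collect (label, participant, odds) for every outcome of every bet offer."""
    rows = []
    for offer in data["betOffers"]:
        for outcome in offer["outcomes"]:
            # The draw outcome ("X") has no participant, so use .get() there.
            rows.append((outcome["label"], outcome.get("participant"), outcome["odds"]))
    return rows


rows = outcome_rows(response_json)
for row in rows:
    print(row)
```

Using .get() for optional keys avoids a KeyError on outcomes that omit them, as the draw outcome does in the output above.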
I have managed to parse a JSON document and read a single key from it.
Now I want to put every version value into a list. How do I do that from a Map?
Map convertedJSONMap = new JsonSlurperClassic().parseText(data)
//If you have the nodes then fetch the first one only
if (convertedJSONMap."items") {
    println "Version : " + convertedJSONMap."items"[0]."version"
}
So what I need is some kind of foreach loop that goes through the Map, gets the version of each item, and puts it in a list. How?
Groovy has Collection.collect(closure) that can be used to transform a list of values of one type to a list of new values. Consider the following example:
import groovy.json.JsonSlurper
def json = '''{
"items": [
{"id": "ID-001", "version": "1.23", "name": "Something"},
{"id": "ID-002", "version": "1.14.0", "name": "Foo Bar"},
{"id": "ID-003", "version": "2.11", "name": "Something else"},
{"id": "ID-004", "version": "8.0", "name": "ABC"},
{"id": "ID-005", "version": "2.32", "name": "Empty"},
{"id": "ID-006", "version": "4.11.2.3", "name": "Null"}
]
}'''
def convertedJSONMap = new JsonSlurper().parseText(json)
def list = convertedJSONMap.items.collect { it.version }
println list.inspect()
Output:
['1.23', '1.14.0', '2.11', '8.0', '2.32', '4.11.2.3']
Groovy also provides the spread operator *., which can simplify this example to something like this:
import groovy.json.JsonSlurper
def json = '''{
"items": [
{"id": "ID-001", "version": "1.23", "name": "Something"},
{"id": "ID-002", "version": "1.14.0", "name": "Foo Bar"},
{"id": "ID-003", "version": "2.11", "name": "Something else"},
{"id": "ID-004", "version": "8.0", "name": "ABC"},
{"id": "ID-005", "version": "2.32", "name": "Empty"},
{"id": "ID-006", "version": "4.11.2.3", "name": "Null"}
]
}'''
def convertedJSONMap = new JsonSlurper().parseText(json)
def list = convertedJSONMap.items*.version
println list.inspect()
Or even this (you can replace *.version with just .version):
import groovy.json.JsonSlurper
def json = '''{
"items": [
{"id": "ID-001", "version": "1.23", "name": "Something"},
{"id": "ID-002", "version": "1.14.0", "name": "Foo Bar"},
{"id": "ID-003", "version": "2.11", "name": "Something else"},
{"id": "ID-004", "version": "8.0", "name": "ABC"},
{"id": "ID-005", "version": "2.32", "name": "Empty"},
{"id": "ID-006", "version": "4.11.2.3", "name": "Null"}
]
}'''
def convertedJSONMap = new JsonSlurper().parseText(json)
def list = convertedJSONMap.items.version
println list.inspect()
All examples produce the same output.
I use Cyrillic characters in my IPython notebooks. It works fine when I work in ML Studio.
But when I download notebooks and open them (for example on http://try.jupyter.org ), I see strange characters.
Bad notebook (created on Azure ML Studio):
{"nbformat_minor": 0, "cells": [{"source": "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082", "cell_type": "markdown", "metadata": {"collapsed": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.11", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}
$ file bad.ipynb
bad.ipynb: ASCII text, with very long lines, with no line terminators
"Good" version, created on http://try.jupyter.org:
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"тест"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
$ file good.ipynb
good.ipynb: UTF-8 Unicode text
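I ran some experiments and found that your text was encoded as UTF-8 and each resulting byte was then written out as a separate character (for example \u00d1\u0082 instead of \u0442). In your case it is simple to get the real content back. See the code below: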
Python 3.x
a = '{"nbformat_minor": 0, "cells": [{"source": "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082", "cell_type": "markdown", "metadata": {"collapsed": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.11", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}'
result = a.encode('latin-1').decode("utf-8")
Python 2.x
a = '{"nbformat_minor": 0, "cells": [{"source": "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082", "cell_type": "markdown", "metadata": {"collapsed": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.11", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}'
result = a.decode('unicode-escape').encode("latin-1")
This piece of code may not work in other cases: encode('latin-1') only handles code points 0-255, so it raises UnicodeEncodeError if the text contains any character outside that range. Hence, I am still looking for a more robust approach for this kind of thing.
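One way to make the Python 3 variant a little more tolerant, as a rough sketch rather than a general fix: parse the notebook with json.loads first, then apply the same latin-1/utf-8 round trip to each string value and leave any string alone when the round trip fails. The function name fix_mojibake is made up for this example, and the sample is a shortened version of the bad notebook from the question.

```python
import json


def fix_mojibake(value):
    """Recursively undo 'UTF-8 bytes read as Latin-1' damage in parsed JSON."""
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not damaged in this particular way; keep the string unchanged.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {key: fix_mojibake(item) for key, item in value.items()}
    return value


# Raw string so that json.loads, not Python, interprets the \u escapes.
bad = r'{"cells": [{"source": "\u00d1\u0082\u00d0\u00b5\u00d1\u0081\u00d1\u0082", "cell_type": "markdown"}], "nbformat": 4}'

notebook = fix_mojibake(json.loads(bad))
repaired_source = notebook["cells"][0]["source"]
print(repaired_source)

# ensure_ascii=False writes the Cyrillic text as real UTF-8 characters.
repaired_json = json.dumps(notebook, ensure_ascii=False)
```

This still cannot tell a damaged string from a legitimate one that happens to survive the round trip, so it should be treated as a repair for this specific kind of file, not as a universal decoder.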