I am having an issue with Form Recognizer not behaving the way I have seen documented. Here is the dilemma:
I have an invoice that, when run through https://{endpoint}/formrecognizer/v2.0/layout/analyze,
has its table recognized, and the proper JSON is generated with the "tables" node. Here is an example of part of it:
{
"rows": 8,
"columns": 8,
"cells": [
{
"rowIndex": 0,
"columnIndex": 4,
"columnSpan": 3,
"text": "% 123 F STREET Deer Park TX 71536",
"boundingBox": [
3.11,
2.0733
],
"elements": [
"#/readResults/0/lines/20/words/0",
"#/readResults/0/lines/20/words/1"
]
}
When I train a model with NO labels file via https://{endpoint}/formrecognizer/v2.0/custom/models, it does not generate a "tables" node at all; instead it generates (tokens). Here is an example of the same region as above, without "tables":
{
"key": {
"text": "__Tokens__12",
"boundingBox": null,
"elements": null
},
"value": {
"text": "123 F STREET",
"boundingBox": [
5.3778,
2.0625,
6.8056,
2.0625,
6.8056,
2.2014,
5.3778,
2.2014
],
"elements": null
},
"confidence": 1.0
}
I am not sure exactly where this is not behaving as intended, but any insight would be appreciated!
If you train a model WITH labeling files and then call FR Analyze(), the FR service will call the Layout service, which returns tables in the "pageResults" section.
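To make the "pageResults" point concrete, here is a minimal sketch of reading table cells out of a v2.0 analyze result. The field names follow the sample JSON above; the `result` dict is assumed to be the already-fetched response body, and the exact nesting under "analyzeResult" is an assumption based on the v2.0 response shape.

```python
def extract_tables(result):
    """Collect each table in "pageResults" as a rows x columns grid of cell text."""
    grids = []
    for page in result.get("analyzeResult", {}).get("pageResults", []):
        for table in page.get("tables", []):
            # Rebuild the grid from the flat cell list using row/column indices
            grid = [[""] * table["columns"] for _ in range(table["rows"])]
            for cell in table["cells"]:
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            grids.append(grid)
    return grids
```

Cells with a "columnSpan" (like the sample above) land in their starting column; handling spans explicitly is left out for brevity.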
I'm using MongoDB in my project, and I'll eventually import about 20,000 products into the database. So I tried to write a script to convert the data from the spreadsheet to JSON and then upload it to MongoDB, but many fields were missing.
I'm trying to figure out how to lay out the spreadsheet so it contains nested data, but the only resource I could find is this package:
https://www.npmjs.com/package/spread-sheet-to-nested-json
But it has one problem: it always produces "title" and "children" keys, not the actual names of the fields.
This is my product json:
[
{
"sku": "ADX112",
"name": {
"en": "Multi-Mat Gallery Frames",
"ar": "لوحة بإطار"
},
"brand": "Dummy brand",
"description": {
"en": "Metal frame in a Black powder-coated finish. Tempered glass. 2 removable, acid-free paper mats included with each frame. Can be hung vertically and horizontally. D-rings included. 5x7 and 8x10 frames include easel backs. Sold individually. Made in China.",
"ar": "إطار اسود. صنع في الصين."
},
"tags": [
"art",
"frame",
"لوحة",
"إطار"
],
"colors": [
"#000000"
],
"dimensions": [
"5x7",
"8x10"
],
"units_in_stock": {
"5x7": 5,
"8x10": 7
},
"thumbnail": "https://via.placeholder.com/150",
"images": [
"https://via.placeholder.com/150",
"https://via.placeholder.com/150"
],
"unit_size": {
"en": [
"individual",
"set of 3"
],
"ar": [
"فردي",
"مجموعة من 3"
]
},
"unit_price": 2000,
"discount": 19,
"category_id": "631f3ca65b2310473b978ab5",
"subCategories_ids": [
"631f3ca65b2310473b978ab5",
"631f3ca65b2310473b978ab5"
],
"featured": false
}
]
How can I lay out a spreadsheet so it can serve as a template for future imports?
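One common layout, sketched below as an assumption rather than a feature of any particular import tool: give each column a dot-notation header ("name.en", "name.ar", "units_in_stock.5x7", ...) and fold each row into a nested dict before inserting it. The header convention and helper name here are hypothetical.

```python
def row_to_nested(headers, row):
    """Fold one spreadsheet row into a nested dict using dot-notation headers."""
    product = {}
    for header, value in zip(headers, row):
        parts = header.split(".")
        node = product
        # Walk/create intermediate dicts for all but the last path segment
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return product
```

For example, headers `["sku", "name.en", "name.ar"]` with row `["ADX112", "Multi-Mat Gallery Frames", "لوحة بإطار"]` yield `{"sku": "ADX112", "name": {"en": ..., "ar": ...}}`, matching the product JSON above. List-valued fields like "tags" would need an extra convention (e.g. a delimiter within one cell, or numbered headers like "tags.0").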
I have a huge text file with blockchain data that I'd like to parse so that I can get the info from the fields I need. I tried to convert it to JSON, but it says it is invalid. After some thought, I realised that is not the best way anyway, since I only want 2 or 3 fields. Can someone help me find the best way of extracting data from the file? There's an example below. I only want txid, size, and hash.
{
"txid": "254d5cc8d2b1889a2cb45f7e3dca8ed53a3fcfa32e8b9eac5f68c4f09e7af7bd",
"hash": "a8e125eb6d7ab883177d8ab228a3d09c1733d1ca49b7b2dff4b057eeb80ff9be",
"version": 2,
"size": 171,
"vsize": 144,
"weight": 576,
"locktime": 0,
"vin": [
{
"coinbase": "02ee170101",
"sequence": 4294967295
}
],
"vout": [
{
"value": 12.00000000,
"n": 0,
"scriptPubKey": {
"asm": "OP_HASH160 cd5b833dd43bc60b8c28c4065af670f283a203ff OP_EQUAL",
"hex": "a914cd5b833dd43bc60b8c28c4065af670f283a203ff87",
"reqSigs": 1,
"type": "scripthash",
"addresses": [
"2NBy4928yJakYBFQuXxXBwXjsLCRWgzyiGm"
]
}
},
{
"value": 5.00000000,
"n": 1,
"scriptPubKey": {
"asm": "OP_HASH160 cd5b833dd43bc60b8c28c4065af670f283a203ff OP_EQUAL",
"hex": "a914cd5b833dd43bc60b8c28c4065af670f283a203ff87",
"reqSigs": 1,
"type": "scripthash",
"addresses": [
"2NBy4928yJakYBFQuXxXBwXjsLCRWgzyiGm"
]
}
}
],
"hex":
"020000000001010000000000000000000000000000000000000000000000000000000000000000
ffffffff0502ee170101ffffffff02000000000000000017a914cd5b833dd43bc60b8c28c4065af670f283a
203ff870000000000000000266a24aa21a9ede2f61c3f71d1defd3fa999dfa36953755c69068979996
2b48bebd836974e8cf9012000000000000000000000000000000000000000000000000000000000
0000000000000000",
"blockhash": "0f84abb78891a4b9e8bc9637ec5fb8b4962c7fe46092fae99e9d69373bf7812a",
"confirmations": 1,
"time": 1590830080,
"blocktime": 1590830080
}
Thank you
@andrewjames is correct. If you have no control over the JSON file, you can address the error by simply removing the newline characters:
parsed = json.loads(jsonText.replace("\n", ""))
Then you can access the fields you want like a normal dictionary:
print(parsed['txid'])
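Putting that together, a small sketch that keeps only the wanted fields (the raw text with embedded newlines is assumed to come from your file; the helper name is hypothetical):

```python
import json

def extract_fields(raw_text, fields=("txid", "size", "hash")):
    """Strip embedded newlines so the blob parses, then keep only `fields`."""
    parsed = json.loads(raw_text.replace("\n", ""))
    return {key: parsed[key] for key in fields}
```

Note that the replace also repairs the long "hex" value, which is the part made invalid by raw newlines inside a JSON string.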
I have an Azure LUIS instance for NLP and
tried to extract alphanumeric values using a regex expression. It worked well, but the output came back in lowercase.
For example:
CASE 1
My Input: " run job for AE0002" RegExCode = [a-zA-Z]{2}\d+
Output:
{
"query": " run job for AE0002",
"topScoringIntent": {
"intent": "Run Job",
"score": 0.7897274
},
"intents": [
{
"intent": "Run Job",
"score": 0.7897274
},
{
"intent": "None",
"score": 0.00434472738
}
],
"entities": [
{
"entity": "ae0002",
"type": "Alpha Number",
"startIndex": 15,
"endIndex": 20
}
]
}
I need to maintain the case of the input.
CASE 2
My Input : "Extract only abreaviations like HP and IBM" RegExCode = [A-Z]{2,}
Output :
{
"query": "extract only abreaviations like hp and ibm", // Query accepted by LUIS test window
"query": "extract only abreaviations like HP and IBM", // Query accepted as an endpoint url
"prediction": {
"normalizedQuery": "extract only abreaviations like hp and ibm",
"topIntent": "None",
"intents": {
"None": {
"score": 0.09844558
}
},
"entities": {
"Abbre": [
"extract",
"only",
"abreaviations",
"like",
"hp",
"and",
"ibm"
],
"$instance": {
"Abbre": [
{
"type": "Abbre",
"text": "extract",
"startIndex": 0,
"length": 7,
"modelTypeId": 8,
"modelType": "Regex Entity Extractor",
"recognitionSources": [
"model"
]
},
{
"type": "Abbre",
"text": "only",
"startIndex": 8,
"length": 4,
"modelTypeId": 8,
"modelType": "Regex Entity Extractor",
"recognitionSources": [
"model"
]
},....
{
"type": "Abbre",
"text": "ibm",
"startIndex": 39,
"length": 3,
"modelTypeId": 8,
"modelType": "Regex Entity Extractor",
"recognitionSources": [
"model"
]
}
]
}
}
}
}
This makes me doubt whether the entire training is happening in lowercase. What shocked me was that all the words that were initially trained to their respective entities were retrained as Abbre.
Any input would be of great help :)
Thank you
For Case 1, do you need to preserve the case in order to query the job on your system? As long as the job identifier always has uppercase characters you can just use toUpperCase(), e.g. var jobName = step._info.options.entities.Alpha_Number.toUpperCase() (not sure about the underscore in Alpha Number, I've never had an entity with spaces before).
For Case 2, this is a shortcoming of the LUIS application. You can force case sensitivity in the regex with (?-i) (e.g. /(?-i)[A-Z]{2,}/g). However, LUIS appears to convert everything to lowercase first, so you'll never get any matches with that statement (which is better than matching every word, but that isn't saying much!). I don't know of any way to make LUIS recognize entities in the way you are requesting.
You could create a list entity with all of the abbreviations you are expecting, but depending on the inputs you are expecting, that could be too much to maintain. Plus abbreviations that are also words would be picked up as false positives (e.g. CAT and cat). You could also write a function to do it for you outside of LUIS, basically building your own manual entity detection. There could be some additional solutions based on exactly what you are trying to do after you identify the abbreviations.
You can simply use the character indexes provided in the output to get the value from the input string, exactly as it was provided.
{
"query": " run job for AE0002",
...
"entities": [
{
"entity": "ae0002",
"type": "Alpha Number",
"startIndex": 15,
"endIndex": 20
}
]
}
Once you get this reply, use a substring method on your query with startIndex and endIndex (or endIndex - startIndex if your method wants a length rather than an end index) to recover the value you are looking for.
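A minimal sketch of that slicing, assuming the response has already been parsed into a dict. Based on the sample (indices 15..20 covering the six characters of "AE0002"), endIndex appears to be inclusive, which is an assumption here:

```python
def original_entity_text(response):
    """Recover an entity with its original casing by slicing the raw query."""
    query = response["query"]
    entity = response["entities"][0]
    # endIndex is treated as inclusive, hence the +1 in the slice
    return query[entity["startIndex"]:entity["endIndex"] + 1]
```

This sidesteps the lowercasing entirely, since the raw "query" string preserves the user's input.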
I am trying to extract user reviews linked to a business from the Google Places API.
I am using the requests library :
import requests
r = requests.get('https://maps.googleapis.com/maps/api/place/details/json?placeid=ChIJlc_6_jM4DW0RQUUtaQj2_lk&key=AIzaSyBuS4meH_HW3FO1cpUaCm6jbqzRCWe7mjc')
json_data_dic = r.json()
print(json_data_dic)
So I get the JSON object converted to a Python object, which I can parse to extract the user ratings & reviews.
I get a "lump" of text back (see below). As someone new to Python/coding, all I see is a tangle of nested dictionaries and lists. In situations where there is nesting, how do I refer to items such as 'rating' or the 'text' of a review?
Any guidance would be appreciated.
{
"html_attributions": [],
"result": {
"reviews": [
{
"relative_time_description": "in the last week",
"profile_photo_url": "//lh6.googleusercontent.com/-06-3qCU8jEg/AAAAAAAAAAI/AAAAAAAAACc/a1z-ga9rOhs/photo.jpg",
"rating": 1,
"time": 1481050128,
"author_name": "Tass Wilson",
"text": "Worse company I've had the pleasure of dealing with. They don't follow through on what they say, their technical support team knows jack all. They are unreliable and lie. The only good part about their entire team is the sales team and thats to get you in the door, signed up and committed to a 12 month contract so they can then screw you over many times without taking you out to dinner first\n\nI would literally rather go back to using smoke signals and frikkin carrier pigeons then use Orcon again as an internet provider",
"aspects": [
{
"rating": 0,
"type": "overall"
}
],
"language": "en",
"author_url": "https://www.google.com/maps/contrib/116566965301711692941/reviews"
},
{
"relative_time_description": "3 weeks ago",
"rating": 1,
"time": 1479258972,
"author_name": "Anne-Marie Petersen",
"text": "I have experienced nothing but pain with them - from the start to (almost the end) in fact so I could skip the 5 days without internet I have had my other provider set up my fibre and just cut the loss of the extra month as the final bad money I will ever have to pay them. I called them to ask why I hadn't been offered an upgrade to fibre - I was told I wasn't eligible for fibre due to bad credit rating. I flipped out. Namely because I have a good credit score - it's something I check regularly. So I said well how do you know - they gave me the number of the company they use so I could call them. I hung up, called the number - it was the YELLOW PAGES number. I call back, I get given the same number (same person answered) I am seeing RED by this point, so I say just give me the name of the company. I find the number myself - I then call them only to be told they don't even work with Orcon. Then the guy offers to do a quick scan of the system to see if my name is in then. Doesn't even appear. Round and round the mulberry bush - I called another company and finally have had my fibre installed and everything ago. I still have no idea how to use the extra remote they've given me but the internet is fabulous. Oh - and I also got sick of every time something was wrong it was always MY fault even though I knew they would go offline and fix something. I used to WORK at a telco company let me tell you I get the system. Finally I have to send the modem back but I've already been advised to take it into their head office and take a photo of myself handing it in because they have on numerous occasions said to people that they've never received the modem even though they had.... why the hell are they still even a company????\n\nUPDATE>> got sent two cancellation notices - neither of which were for accounts that I had power over apparently. Have taken to twitter to have a public forum so they can't get me on not returning my modem.",
"aspects": [
{
"rating": 0,
"type": "overall"
}
],
"language": "en",
"author_url": "https://www.google.com/maps/contrib/113533966859873380749/reviews"
},
{
"relative_time_description": "4 months ago",
"profile_photo_url": "//lh4.googleusercontent.com/-lAEJbEHaIoE/AAAAAAAAAAI/AAAAAAAAEFo/IATRvjK2Oak/photo.jpg",
"rating": 1,
"time": 1469481312,
"author_name": "Keith Rankine",
"text": "Everything works well until you try to cancel your account. Do not be fooled into thinking you cannot give them notice to cancel within your contract term. They will try everything in their power to squeeze an extra month from you. \nI had a 12 month contract and informed them of my wish to cancel on the anniversary on that sign up date. All their emails were carefully worded to imply that this could not be done. If read carefully, and you argue, they will agree to it.",
"aspects": [
{
"rating": 0,
"type": "overall"
}
],
"language": "en",
"author_url": "https://www.google.com/maps/contrib/115729563512218272075/reviews"
},
{
"relative_time_description": "a month ago",
"rating": 1,
"time": 1476082876,
"author_name": "Wayne Furlong",
"text": "Completely useless. Dishonest, lazy and downright incompetent. Corporate bullies. I'm so much happier with Bigpipe.",
"aspects": [
{
"rating": 0,
"type": "overall"
}
],
"language": "en",
"author_url": "https://www.google.com/maps/contrib/113527219809275504416/reviews"
},
{
"relative_time_description": "3 months ago",
"rating": 1,
"time": 1471292986,
"author_name": "Shaun b",
"text": "Recently upgraded to \"unlimited\" Fibre with Orcon. Most mornings (5-9) we have a limited wired or wireless connection. Too often (as is the case this morning) we have no internet (so while at home we have to use phone data). This is on Wellington's cbd area. Their customer service is such that a reply could take upwards of 4 weeks. I intend to change provider.",
"aspects": [
{
"rating": 0,
"type": "overall"
}
],
"language": "en",
"author_url": "https://www.google.com/maps/contrib/101110905108291593535/reviews"
}
],
"utc_offset": 780,
"adr_address": "<span class=\"street-address\">1 The Strand</span>, <span class=\"extended-address\">Takapuna</span>, <span class=\"locality\">Auckland</span> <span class=\"postal-code\">0622</span>, <span class=\"country-name\">New Zealand</span>",
"photos": [
{
"html_attributions": [
"Orcon"
],
"height": 877,
"photo_reference": "CoQBdwAAAO0RRplNcUkeQUxtJLTNk3uAOTadHfKZQ8g2NMa6XLRmGX2oKdUHItfnKZP0CG2WwIj198PwzfDRJpZIw4M1wSENCEOD9mFjITSwWTMjHkw1PzHb9teT6vuuROxcCdH-fwCYp0tkeBc75R8RHb2drPbTk-NN_5q88jkJTfNwdZQDEhB-25Az9550mGd00B-zK-LRGhQpTusm33tZBFXA1952txiuAUsgQA",
"width": 878
}
],
"id": "a7c161a7081101d8897c2dd2fb41fa94b812b050",
"scope": "GOOGLE",
"vicinity": "1 The Strand, Takapuna, Auckland",
"international_phone_number": "+64 800 131 415",
"url": "https://maps.google.com/?cid=6484891029444838721",
"types": [
"point_of_interest",
"establishment"
],
"name": "Orcon",
"rating": 1.8,
"geometry": {
"viewport": {
"northeast": {
"lat": -36.7889585,
"lng": 174.77310355
},
"southwest": {
"lat": -36.7890697,
"lng": 174.77302975
}
},
"location": {
"lat": -36.78901820000001,
"lng": 174.7730694
}
},
"place_id": "ChIJlc_6_jM4DW0RQUUtaQj2_lk",
"formatted_address": "1 The Strand, Takapuna, Auckland 0622, New Zealand",
"reference": "CmRRAAAAD0loSIVYDAuRKLbv5Cp6ZM_jxKHbzJ7EOrDLakY1PAlmq5YDTJ82A4qzWje0ILFv3lsEdaUpCtkuVHuOxXW6so5yqxDSfkgEnXbzd84jtfItuxis7Izu-y87vwkD7JO4EhBZB6aIdHSchBT6_USM5B5VGhTTRgmnDKndDt6amWnPkXw-57-Pww",
"icon": "https://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png",
"website": "https://www.orcon.net.nz/",
"formatted_phone_number": "0800 131 415",
"address_components": [
{
"short_name": "1",
"types": [
"street_number"
],
"long_name": "1"
},
{
"short_name": "The Strand",
"types": [
"route"
],
"long_name": "The Strand"
},
{
"short_name": "Takapuna",
"types": [
"sublocality_level_1",
"sublocality",
"political"
],
"long_name": "Takapuna"
},
{
"short_name": "Auckland",
"types": [
"locality",
"political"
],
"long_name": "Auckland"
},
{
"short_name": "Auckland",
"types": [
"administrative_area_level_1",
"political"
],
"long_name": "Auckland"
},
{
"short_name": "NZ",
"types": [
"country",
"political"
],
"long_name": "New Zealand"
},
{
"short_name": "0622",
"types": [
"postal_code"
],
"long_name": "0622"
}
]
},
"status": "OK"
}
json_data_dic.get("result").get("reviews") or json_data_dic['result']['reviews'] gives you the list of reviews
json_data_dic.get("result").get("reviews")[0].get("text") returns the text of the first review
If you need to get each review:
for review in json_data_dic.get("result").get("reviews"):
    print(review.get("text"))
In general, use .get(KEY) or [KEY] to access a dictionary item by its key, and use
[INDEX] to access an item in a list by its index (starting from 0).
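Putting those access patterns together, a small sketch that walks the nesting once (the function name is hypothetical; the keys match the response above):

```python
def summarize_reviews(data):
    """Pair each review's rating with its author, walking result -> reviews."""
    summaries = []
    for review in data.get("result", {}).get("reviews", []):
        summaries.append((review.get("rating"), review.get("author_name")))
    return summaries
```

Called on the parsed response above, this would give pairs like `(1, "Tass Wilson")` without you having to index each level by hand.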
Hi, I'm on a project and want to use Flickr for my image gallery. I'm using the photosets.* methods, but whenever I make a request I don't get images, only info.
JSON result:
{
"photoset": {
"id": "77846574839405047",
"primary": "88575847594",
"owner": "998850450#N03",
"ownername": "mr.barde",
"photo": [
{
"id": "16852316982",
"secret": "857fur848c",
"server": "8568",
"farm": 9,
"title": "wallpaper-lenovo-blue-pc-brand",
"isprimary": "1",
"ispublic": 1,
"isfriend": 0,
"isfamily": 0
},
{
"id": "16665875068",
"secret": "857fur848c",
"server": "7619",
"farm": 8,
"title": "white_horses-1280x720",
"isprimary": "0",
"ispublic": 1,
"isfriend": 0,
"isfamily": 0
}
],
"page": 1,
"per_page": "2",
"perpage": "2",
"pages": 3,
"total": "6",
"title": "My First Album"
},
"stat": "ok"
}
I would like to have actual image URLs returned. How can I do this?
Thanks to the comment by @CBroe,
I found this in the Flickr API doc.
You can construct the source URL to a photo once you know its ID, server ID, farm ID and secret, as returned by many API methods.
https://farm{farm-id}.staticflickr.com/{server-id}/{id}_{secret}.jpg
or
https://farm{farm-id}.staticflickr.com/{server-id}/{id}_{secret}_[mstzb].jpg
or
https://farm{farm-id}.staticflickr.com/{server-id}/{id}_{o-secret}_o.(jpg|gif|png)
The final result would then look something like this.
https://farm1.staticflickr.com/2/1418878_1e92283336_m.jpg
Reference: https://www.flickr.com/services/api/misc.urls.html
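As a sketch, the URL pattern above can be applied to each entry of the photoset response from the question ("m" is one of the documented size suffixes; the helper name is hypothetical):

```python
def photo_urls(photoset_response, size="m"):
    """Build static-image URLs from the farm/server/id/secret of each photo."""
    urls = []
    for photo in photoset_response["photoset"]["photo"]:
        urls.append(
            "https://farm{farm}.staticflickr.com/{server}/{id}_{secret}_{size}.jpg".format(
                farm=photo["farm"],
                server=photo["server"],
                id=photo["id"],
                secret=photo["secret"],
                size=size,
            )
        )
    return urls
```

For the first photo in the sample response, this yields https://farm9.staticflickr.com/8568/16852316982_857fur848c_m.jpg.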