I am trying to understand skillsets in Azure Cognitive Search. I want to build an OCR-powered search and I am trying to understand how it works.
For example, the documentation says that the OCR skill produces this response:
{
"text": "Hello World. -John",
"layoutText":
{
"language" : "en",
"text" : "Hello World. -John",
"lines" : [
{
"boundingBox":
[ {"x":10, "y":10}, {"x":50, "y":10}, {"x":50, "y":30},{"x":10, "y":30}],
"text":"Hello World."
},
{
"boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
"text":"-John"
}
],
"words": [
{
"boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
"text":"Hello"
},
{
"boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
"text":"World."
},
{
"boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
"text":"-John"
}
]
}
}
But then, in the same documentation, we see that only the text field from the OCR skill is used, and a newcomer, contentOffset, appears.
Custom skillset definition:
{
"description": "Extract text from images and merge with content text to produce merged_text",
"skills":
[
{
"description": "Extract text (plain and structured) from image.",
"#odata.type": "#Microsoft.Skills.Vision.OcrSkill",
"context": "/document/normalized_images/*",
"defaultLanguageCode": "en",
"detectOrientation": true,
"inputs": [
{
"name": "image",
"source": "/document/normalized_images/*"
}
],
"outputs": [
{
"name": "text"
}
]
},
{
"#odata.type": "#Microsoft.Skills.Text.MergeSkill",
"description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
"context": "/document",
"insertPreTag": " ",
"insertPostTag": " ",
"inputs": [
{
"name":"text",
"source": "/document/content"
},
{
"name": "itemsToInsert",
"source": "/document/normalized_images/*/text"
},
{
"name":"offsets",
"source": "/document/normalized_images/*/contentOffset"
}
],
"outputs": [
{
"name": "mergedText",
"targetName" : "merged_text"
}
]
}
]
}
and the input should look like this:
{
"values": [
{
"recordId": "1",
"data":
{
"text": "The brown fox jumps over the dog",
"itemsToInsert": ["quick", "lazy"],
"offsets": [3, 28]
}
}
]
}
So where does the array of offsets (contentOffset in the skill definition) come from, given that neither the OcrSkill response nor the Computer Vision Read API returns it?
contentOffset is not produced by the OCR skill itself; it is added by the indexer during document cracking. When image extraction is enabled, each embedded image is emitted under /document/normalized_images/* together with a contentOffset that records where in the extracted content that image was found, so the OCR text can later be merged back in at the right spot.
You get an array of offsets because a single input document can contain multiple images. See the Read API REST documentation for the JSON shape of the OCR output itself.
The raw Read response from Azure Computer Vision looks like this:
{
"status": "succeeded",
"createdDateTime": "2021-04-08T21:56:17.6819115+00:00",
"lastUpdatedDateTime": "2021-04-08T21:56:18.4161316+00:00",
"analyzeResult": {
"version": "3.2",
"readResults": [
{
"page": 1,
"angle": 0,
"width": 338,
"height": 479,
"unit": "pixel",
"lines": [
{
"boundingBox": [
25,
14
],
"text": "NOTHING",
"appearance": {
"style": {
"name": "other",
"confidence": 0.971
}
},
"words": [
{
"boundingBox": [
27,
15
],
"text": "NOTHING",
"confidence": 0.994
}
]
}
]
}
]
}
}
Copied from here
I want to create a custom skill in Azure Cognitive Search that does not use the built-in Vision skill but instead my own Azure Function, which uses the Computer Vision client in code.
The problem is that, to pass input to Text.MergeSkill:
{
"#odata.type": "#Microsoft.Skills.Text.MergeSkill",
"description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
"context": "/document",
"insertPreTag": " ",
"insertPostTag": " ",
"inputs": [
{
"name":"text",
"source": "/document/content"
},
{
"name": "itemsToInsert",
"source": "/document/normalized_images/*/text"
},
{
"name":"offsets",
"source": "/document/normalized_images/*/contentOffset"
}
],
"outputs": [
{
"name": "mergedText",
"targetName" : "merged_text"
}
]
}
I need to convert the Read output into the shape that the built-in OcrSkill returns, so that my custom skill's response looks like this:
{
"text": "Hello World. -John",
"layoutText":
{
"language" : "en",
"text" : "Hello World.",
"lines" : [
{
"boundingBox":
[ {"x":10, "y":10}, {"x":50, "y":10}, {"x":50, "y":30},{"x":10, "y":30}],
"text":"Hello World."
}
],
"words": [
{
"boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
"text":"Hello"
},
{
"boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
"text":"World."
}
]
}
}
And I copied it from here.
My question is: how do I convert the boundingBox parameter from the Computer Vision Read endpoint into a form that Text.MergeSkill accepts? Do we really need to do that, or can we pass the Read response to Text.MergeSkill differently?
The built-in OcrSkill calls the Cognitive Services Computer Vision Read API for certain languages, and it handles merging the text for you via the 'text' output. If at all possible, I would strongly suggest you use this skill instead of writing a custom one.
If you must write a custom skill and merge the output text yourself, note that per the MergeSkill documentation the 'text' and 'offsets' inputs are optional. That means you should be able to pass the text from the individual Read API output objects directly to the MergeSkill via the 'itemsToInsert' input, if all you need is a way to merge those outputs into one large text. Your skillset would then look something like this (not tested), assuming you are still using the built-in Azure Search image extraction and your custom skill outputs the exact Read API payload you shared above.
{
"skills": [
{
"#odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"description": "Custom skill that calls Cognitive Services Computer Vision Read API",
"uri": "<your custom skill uri>",
"batchSize": 1,
"context": "/document/normalized_images/*",
"inputs": [
{
"name": "image",
"source": "/document/normalized_images/*"
}
],
"outputs": [
{
"name": "readAPIOutput"
}
]
},
{
"#odata.type": "#Microsoft.Skills.Text.MergeSkill",
"description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
"context": "/document",
"insertPreTag": "",
"insertPostTag": "\n",
"inputs": [
{
"name": "itemsToInsert",
"source": "/document/normalized_images/*/readAPIOutput/analyzeResult/readResults/*/lines/*/text"
}
],
"outputs": [
{
"name": "mergedText",
"targetName": "merged_text"
}
]
}
]
}
However, if you need to guarantee that the text appears in the correct order based on the bounding boxes, you will likely need to write a custom solution that calculates the positions and recombines the text yourself. Hence the suggestion to use the built-in OcrSkill if at all possible.
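For illustration only, here is a rough TypeScript sketch of that kind of position-based merge (the helper mergeLinesByPosition is made up; it assumes the 8-number boundingBox array that Read 3.x returns for each line, i.e. four corner points, and a simple top-to-bottom, left-to-right reading order):

// Hypothetical helper, not part of any SDK: re-order Read API lines by position
// and join their text. Assumes boundingBox = [x1, y1, x2, y2, x3, y3, x4, y4].
interface ReadLine {
  boundingBox: number[];
  text: string;
}

function mergeLinesByPosition(lines: ReadLine[], rowTolerancePx = 10): string {
  const sorted = [...lines].sort((a, b) => {
    const [ax, ay] = [a.boundingBox[0], a.boundingBox[1]]; // top-left corner of a
    const [bx, by] = [b.boundingBox[0], b.boundingBox[1]]; // top-left corner of b
    // Lines whose top edges are within the tolerance are treated as the same row
    // and ordered left to right; otherwise order top to bottom.
    return Math.abs(ay - by) > rowTolerancePx ? ay - by : ax - bx;
  });
  return sorted.map(line => line.text).join("\n");
}

This is only a heuristic; rotated pages or multi-column layouts would need something smarter, which is another reason to prefer the built-in OcrSkill.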
I am building an application in Node.js using the Twitter API. The way the Twitter API is set up, you get a JSON response such as:
result = {
"data": [
{
"id": "1410481286947483648",
"created_at": "2021-07-01T06:11:58.000Z",
"text": "RT #weatherdak: Absolutely mind-blowing wildfire behavior in British Columbia.\n\nIncredible & massive storm-producing pyrocumulonimbus plumeā¦"
},
{
"attachments": {
"media_keys": [
"7_1408149248726667266"
]
},
"id": "1408168068237434883",
"created_at": "2021-06-24T21:00:04.000Z",
"text": "RT #Brophyst: Tornado in Brno, Czech Republic :O "
},
{
"attachments": {
"media_keys": [
"16_1406870837026770949"
]
},
"id": "1406870843972632579",
"created_at": "2021-06-21T07:05:22.000Z",
"text": "Working on a new storm chasing application with incoming thunder rumbling. "
},
{
"id": "1405081317985947648",
"created_at": "2021-06-16T08:34:26.000Z",
"text": "#thefunkygirl #TornadoGreg #cgphotography Slender tornadoes are (almost) the best!"
}
],
"includes": {
"media": [
{
"media_key": "7_1408149248726667266",
"preview_image_url": "https://pbs.twimg.com/ext_tw_video_thumb/1408149248726667266/pu/img/qNKbx0MtaSGG0Sv0.jpg",
"type": "video"
},
{
"media_key": "16_1406870837026770949",
"preview_image_url": "https://pbs.twimg.com/tweet_video_thumb/E4Y1koXWUAU1Ls5.jpg",
"type": "animated_gif"
}
]
},
"meta": {
"oldest_id": "1405081317985947648",
"newest_id": "1410481286947483648",
"result_count": 10,
"next_token": "7140dibdnow9c7btw3z0ry2hiovfdas2sur0h2hu1m6ns"
}
}
How do I create a new JSON object where, instead of keeping the attachments/media_keys subsection on each tweet plus the separate includes subsection, each tweet gets the matching preview_image_url directly?
For example, so that:
{
"attachments": {
"media_keys": [
"16_1406870837026770949"
]
},
"id": "1406870843972632579",
"created_at": "2021-06-21T07:05:22.000Z",
"text": "Working on a new storm chasing application with incoming thunder rumbling."
},
becomes:
{
"preview_image_url": "https://pbs.twimg.com/tweet_video_thumb/E4Y1koXWUAU1Ls5.jpg",
"type": "animated_gif"
"id": "1406870843972632579",
"created_at": "2021-06-21T07:05:22.000Z",
"text": "Working on a new storm chasing application with incoming thunder rumbling."
},
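One way to do this, sketched in TypeScript and assuming the response shape shown above (the helper name mergeMedia is made up), is to index includes.media by media_key and then rebuild each tweet:

// Rough sketch, not tested against the live API: replace each tweet's
// attachments.media_keys with the matching media entry from includes.media.
interface Media {
  media_key: string;
  preview_image_url?: string;
  type: string;
}

interface Tweet {
  id: string;
  created_at: string;
  text: string;
  attachments?: { media_keys: string[] };
}

interface TwitterResponse {
  data: Tweet[];
  includes?: { media: Media[] };
}

function mergeMedia(result: TwitterResponse) {
  // Index the media objects by key for O(1) lookups.
  const mediaByKey = new Map<string, Media>();
  for (const m of result.includes?.media ?? []) {
    mediaByKey.set(m.media_key, m);
  }

  return result.data.map(({ attachments, ...tweet }) => {
    // Take the first media key that has a matching entry in includes.media.
    const media = (attachments?.media_keys ?? [])
      .map(key => mediaByKey.get(key))
      .find(m => m !== undefined);

    return media
      ? { preview_image_url: media.preview_image_url, type: media.type, ...tweet }
      : tweet;
  });
}

Tweets without attachments are returned unchanged, and the includes section is simply dropped from the result.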
I am displaying some data using an Adaptive Card in the Bot Framework Emulator, and I want to add a comment section to it. I tried this:
{
"type": "Action.ShowCard",
"title": "Comment",
"card": {
"type": "AdaptiveCard",
"body": [
{
"type": "Input.Text",
"id": "test",
"isMultiline": true,
"placeholder": "Enter your comment",
}
],
"actions": [
{
"type": "Action.Submit",
"id":"submit",
"title": "OK",
"data":
{
"test":""
}
}
],
"$schema": "http://adaptivecards.io/schemas/adaptive-card.json"
}
}
but I am not getting any output when I click the OK action. How can we do it?
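Assuming the bot is built with the Node.js botbuilder SDK (this is a sketch, not your exact bot), the submitted inputs arrive as a message activity whose value property holds them keyed by their id (here test), while activity.text is typically empty:

import { ActivityHandler, TurnContext } from "botbuilder";

class CommentBot extends ActivityHandler {
  constructor() {
    super();
    this.onMessage(async (context: TurnContext, next) => {
      // Adaptive Card Action.Submit payloads show up in activity.value, not activity.text.
      const submitted = context.activity.value as { test?: string } | undefined;
      if (submitted?.test) {
        await context.sendActivity(`You commented: ${submitted.test}`);
      }
      await next();
    });
  }
}

So the card can stay roughly as it is (once the JSON is valid); the output only becomes visible when the bot reads activity.value and responds with something.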
Say I have a product collection like this:
{
"_id": "5a74784a8145fa1368905373",
"name": "This is my first product",
"description": "This is the description of my first product",
"category": "34/73/80",
"condition": "New",
"images": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstproduct_image1.jpg"
},
...
],
"attributes": [
{
"name": "Material",
"value": "Synthetic"
},
...
],
"variation": {
"attributes": [
{
"name": "Color",
"values": ["Black", "White"]
},
{
"name": "Size",
"values": ["S", "M", "L"]
}
]
}
}
and a variation collection like this:
{
"_id": "5a748766f5eef50e10bc98a8",
"name": "color:black,size:s",
"productID": "5a74784a8145fa1368905373",
"condition": "New",
"price": 1000,
"sale": null,
"image": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstvariation_image1.jpg"
}
],
"attributes": [
{
"name": "Color",
"value": "Black"
},
{
"name": "Size",
"value": "S"
}
]
}
I want to keep the documents separate, but for easy browsing, searching, and faceted search implementation I want to fetch all the data in a single query, without doing the join in my application code.
I know it's achievable using a third collection called summary that might look like this:
{
"_id": "5a74875fa1368905373",
"name": "This is my first product",
"category": "34/73/80",
"condition": "New",
"price": 1000,
"sale": null,
"description": "This is the description of my first product",
"images": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstproduct_image1.jpg"
},
...
],
"attributes": [
{
"name": "Material",
"value": "Synthetic"
},
...
],
"variations": [
{
"condition": "New",
"price": 1000,
"sale": null,
"image": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstvariation_image.jpg"
}
],
"attributes": [
"color=black",
"size=s"
]
},
...
]
}
The problem is, I don't know how to keep the summary collection in sync with the product and variation collections. I know it can be done using mongo-connector, but I'm not sure how to implement it.
Please help me; I'm still a beginner programmer.
You don't actually need to maintain a summary collection; it is redundant to store the product and variation summary in another collection.
Instead, you can use an aggregation pipeline with $lookup to left-outer-join product and variation on productID.
Aggregation pipeline:
db.products.aggregate(
[
{
$lookup : {
from : "variation",
localField : "_id",
foreignField : "productID",
as : "variations"
}
}
]
).pretty()
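If you want to run the same pipeline from application code, a minimal sketch with the official Node.js mongodb driver could look like this (the connection string, the database name "shop", and error handling are assumptions for illustration):

import { MongoClient } from "mongodb";

async function getProductsWithVariations(uri: string) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const db = client.db("shop");

    // Same $lookup as above: attach every matching variation to its product.
    return await db
      .collection("products")
      .aggregate([
        {
          $lookup: {
            from: "variation",
            localField: "_id",
            foreignField: "productID",
            as: "variations",
          },
        },
      ])
      .toArray();
  } finally {
    await client.close();
  }
}

Note that $lookup only matches when _id and productID have the same BSON type, which they do in the sample documents above (both strings).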
I am using Azure Media Services to store my assets, such as videos. Now I want to trim a video to its first minute: if a video is 5 minutes long, I want to keep only the first 1 minute. I tried the following preset:
{
"Version": 1.0,
"Sources": [
{
"StartTime": "00:00:04",
"Duration": "00:00:16"
}
],
"Codecs": [
{
"KeyFrameInterval": "00:00:02",
"SceneChangeDetection": true,
"H264Layers": [
{
"Profile": "Auto",
"Level": "auto",
"Bitrate": 4500,
"MaxBitrate": 4500,
"BufferWindow": "00:00:05",
"Width": 1280,
"Height": 720,
"BFrames": 3,
"ReferenceFrames": 3,
"AdaptiveBFrame": true,
"Type": "H264Layer",
"FrameRate": "0/1"
}
],
"Type": "H264Video"
},
{
"Profile": "AACLC",
"Channels": 2,
"SamplingRate": 48000,
"Bitrate": 128,
"Type": "AACAudio"
}
],
"Outputs": [
{
"FileName": "{Basename}_{Width}x{Height}_{VideoBitrate}.mp4",
"Format": {
"Type": "MP4Format"
}
}
]
}
My question is: is there any way to trim a video without specifying video codecs? I just want to trim the video, not re-encode it, for example using this preset:
{
"Version": "1.0",
"Sources": [
{
"StartTime": "00:00:00",
"Duration": "00:01:00"
}
],
"Outputs": [
{
"FileName": "$filename$.mp4",
"Format": {
"Type": "MP4Format"
}
}
]
}
I presume you want an output MP4 for downloading/delivering offline.
If the following conditions are satisfied:
Source is an MP4 file, or it uses video/audio codecs that are compatible with the MP4 file format (e.g. H.264 video, AAC audio), and
The source is encoded with closed GOPs,
then you should be able to use the following preset JSON, which tells the encoder to copy the input video and audio:
{
"Version": "1.0",
"Sources": [
{
"StartTime": "00:00:00",
"Duration": "00:01:00"
}
],
"Outputs": [
{
"FileName": "$filename$.mp4",
"Format": {
"Type": "MP4Format"
}
}
],
"Codecs": [
{
"Type": "CopyVideo"
},
{
"Type": "CopyAudio"
}
]
}