I am trying to extract table data by training models on different table layout structures, but once training is complete I am unable to fetch the table data when I analyze a new file with a different layout. Is this a limitation of Azure Cognitive Services?
Was the table extracted automatically (it appears in the pageResults section of the JSON output, or has a small table icon near it in the UX)? If you are labeling tables and training on them, is your training data all documents of the same format and layout? If not, you should create a model per document type (same format and layout) and then compose all the models together into a single model.
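If you do end up with one model per layout, the compose step can be scripted. A minimal sketch, assuming the azure-ai-formrecognizer Python package and model IDs from earlier (labeled) training runs; the endpoint, key, and model IDs are placeholders:
# pip install azure-ai-formrecognizer
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormTrainingClient
# Placeholders -- substitute your own resource endpoint and key
client = FormTrainingClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)
# One custom model per document layout, trained beforehand with labels
model_ids = ["<layout-a-model-id>", "<layout-b-model-id>"]
# Compose them into a single model; at analysis time the service
# routes each document to the best-matching submodel
poller = client.begin_create_composed_model(model_ids, model_name="composed-tables")
print(poller.result().model_id)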
This is something you can try out with the Konfuzio SDK. To get you started:
# pip install konfuzio_sdk
# then initialize the project in the working directory: konfuzio_sdk init
from konfuzio_sdk.api import get_results_from_segmentation

result = get_results_from_segmentation(doc_id=1111, project_id=111)
# result contains the detected elements per page;
# keep only the elements labeled as tables on the first page
tables_first_page = [r for r in result[0] if r['label'] == 'table']
Create a free account here and upload your training data
https://github.com/konfuzio-ai/document-ai-python-sdk/issues/24
I haven't worked much with ADF, but I am trying to connect to a REST API and write the data to an Azure SQL DB. I have already created a pipeline that copies the JSON retrieved from the REST API to Blob storage.
When I create a data flow and use the blob as the source, I get a nested table in the data preview tab. Allow schema drift is selected and the JSON setting is set to 'document of arrays'.
All the data is in subarrays under the tickets array. Is there a way to select only the tickets array? If this is possible then I should be able to easily flatten the rest.
[Screenshots in the original post: the top-level JSON, the tickets sub-array, and the data preview.]
You can use the Flatten transformation to unroll the tickets array. It is currently showing as drifted in your data preview, so you'll want to first make it part of your metadata. You can do that either through Import Projection on the source projection tab, or use the "Map Drifted" button on your data preview panel.
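If you want to prototype the unroll outside ADF first, the same flattening can be sketched in a few lines of Python with pandas (this assumes the blob holds a single JSON document with a top-level tickets array; the file name is illustrative):
# pip install pandas
import json
import pandas as pd

# Load the JSON document the copy pipeline wrote to Blob storage
with open("tickets.json") as f:
    doc = json.load(f)

# json_normalize unrolls the nested "tickets" array into one row per
# ticket, flattening nested objects into dotted column names
df = pd.json_normalize(doc, record_path="tickets")
print(df.head())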
While trying to build an ADF pipeline that generates datasets within Data Factory, I ran into an interesting issue. Or maybe I misunderstand some components completely, in which case I'd happily be educated.
I basically read some metadata from a SQL Database table, which determines which source system, schema and tables I should pull new data from. The metadata is stored in a bunch of variables, which then feed a Web activity request that attempts to generate a new dataset as per the MS documentation. Yes, I'm trying to use Azure Data Factory to generate Azure Data Factory components.
The URL to create the dataset and the JSON body for the request are both generated using @concat and a number of the variables. The resulting dataset is a very straightforward file that does not contain references to the columns, just the table schema and table name. I generated these manually before, and that all seems to work brilliantly: I basically have a dataset connected to the source system, referencing the table from the metadata.
The code runs, but the resulting dataset is published directly, as opposed to being added to my working branch. While this should not be a big issue once I manage to test everything properly, ideally the object would be created in my working branch (using Azure DevOps, thus as a local file).
My next thought was to set up a linked service to my local PC and simply write the same contents as above there. My challenge is that I am essentially creating a file out of nothing. I am trying to use a Copy Data activity, and added an empty placeholder file to act as a source.
I configured the sink's copy behavior with dynamic content and attempted to add the JSON contents there. This gets the file created, but it's unfortunately empty. I also attempted to add a new column to the source with the same JSON as its contents.
However, since the file to be used as a sink doesn't exist, a mapping error occurs. Apart from that, I don't want a column header to be written; just the dynamically created contents.
I'm not sure how to continue with this. I feel I'm very close to achieving my goal, but cannot seem to take this final hurdle.
Any hints or suggestions would be very welcome.
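For reference, the dataset-creation call described above can also be reproduced outside the pipeline as a plain REST request. A rough sketch using the requests and azure-identity packages (the subscription, resource group, factory, dataset, and linked service names are all placeholders):
# pip install azure-identity requests
import requests
from azure.identity import DefaultAzureCredential

# Placeholders -- fill in your own identifiers
sub, rg, factory, ds_name = "<sub-id>", "<resource-group>", "<factory>", "<dataset>"

url = (
    f"https://management.azure.com/subscriptions/{sub}"
    f"/resourceGroups/{rg}/providers/Microsoft.DataFactory"
    f"/factories/{factory}/datasets/{ds_name}?api-version=2018-06-01"
)

# Dataset body: just the linked service, schema and table, no column list
body = {
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "<linked-service-name>",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"schema": "<schema>", "table": "<table>"},
    }
}

token = DefaultAzureCredential().get_token("https://management.azure.com/.default")
resp = requests.put(url, json=body, headers={"Authorization": f"Bearer {token.token}"})
resp.raise_for_status()
This also explains the publish behaviour you're seeing: the management REST API writes to the live (published) factory and bypasses the Git repository entirely, so to land the object in your working branch you would instead have to commit the dataset JSON file to the DevOps repo (e.g. via the Azure DevOps Git REST API).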
I am new to ADF.
While trying to use a Copy activity to move data from an API call's output to blob JSON, I am unable to use the Lookup output. I am trying to map the fields explicitly in the mapping using @item().SiteID, but the JSON output contains only the input fields (not the derived fields). Can someone tell me how to achieve this?
Can I use a Copy activity inside a ForEach activity (@activity('LookupAvailableChannelListForExport').output.value) to pass the Lookup output value (@item().siteID) in the mapping between source and sink?
As far as I know, the output of the Lookup activity can't be used as source data in a Copy activity, not even in the mapping between source and sink. Actually, the Lookup activity is intended for the following usage, according to the official documentation:
Dynamically determine which objects to operate on in a subsequent activity, instead of hard coding the object name. Some object examples are files and tables.
I think the example in the documentation is a good illustration: the output of the Lookup activity is configured as the dynamic table name of the SQL DB source dataset, not as the data in the source.
Back to your requirement: I think you could configure the source dataset as the root folder if the files are stored in the same directory with the same schema, and keep the option to read all files selected so that the data in every file is picked up.
If you want to transform the source data in some way, the Copy activity can't cover that, but the Data Flow activity can: you could use a Derived Column transformation, for example to reshape the JSON structure.
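To make the limitation concrete: what the Copy activity mapping can't do is the enrichment step sketched below, stamping the per-iteration Lookup value onto every record before it reaches the sink. In plain Python terms (field names are borrowed from the question, everything else is illustrative):
import json

# What @item() holds in one ForEach iteration of the Lookup output
item = {"siteID": 42}

# Records returned by the API call for this iteration
records = [{"ticket": 1}, {"ticket": 2}]

# The "derived column" step: add the lookup value to each record
# before writing the JSON to the blob sink
enriched = [{**r, "SiteID": item["siteID"]} for r in records]

with open("out.json", "w") as f:
    json.dump(enriched, f)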
What is the difference between
'Run Metadata Wizard' --> Select Data Source
and
'Create Query Subject' --> 'Data Source', then picking the data source from the wizard?
Are they the same thing?
They are identical if you pick the options as suggested above.
In the Run Metadata Wizard you have options to import metadata from other Cognos tools or from 3rd party applications.
In Create Query Subject, you can create three kinds of query subjects:
a Data Source query subject, which runs directly against the data source;
a Model query subject, which runs against other query subjects in the model;
and a Stored Procedure query subject, which maps to a database stored procedure that returns a result set.
Using the Metadata Wizard you can also create multiple query subjects (e.g. import multiple tables) from a data source in a single operation.
I have defined my own projection with a query that returns a set of content items of a known content type. I would like to pick out certain content parts of these content items and display them in the list. Using the shape tracing tool I have found the view template where I can write my custom layout:
/Views/Parts.ProjectionPart.cshtml
but from the Model variable in the template I cannot get the data I want, because it is too far above the content part data.
A good example of what I want: let's say I want to render the product catalog as defined in this tutorial:
http://skywalkersoftwaredevelopment.net/blog/writing-an-orchard-webshop-module-from-scratch-part-5
but I only want to render a list consisting of these items:
the name of the owner who created the product
the name of the product
the publish date of the product
and I need to render them in one place, i.e., not separately in their own part views.
Have you tried adding a layout in the projector module? There is a properties mode option that lets you select which fields/data to show. If the data you want is not there, you should be able to implement an IPropertyProvider. There are examples of this in the Projections module code.