I have a huge text file with blockchain data that I'd like to parse so that I can get the info from the fields I need. I tried to convert it to JSON, but the parser says it is invalid. After giving it some thought, I've realised that may not be the best approach anyway, since I only want 2 or 3 fields. Can someone help me find the best way of extracting data from the file? There's an example below. I would only want txid, size, and hash.
{
"txid": "254d5cc8d2b1889a2cb45f7e3dca8ed53a3fcfa32e8b9eac5f68c4f09e7af7bd",
"hash": "a8e125eb6d7ab883177d8ab228a3d09c1733d1ca49b7b2dff4b057eeb80ff9be",
"version": 2,
"size": 171,
"vsize": 144,
"weight": 576,
"locktime": 0,
"vin": [
{
"coinbase": "02ee170101",
"sequence": 4294967295
}
],
"vout": [
{
"value": 12.00000000,
"n": 0,
"scriptPubKey": {
"asm": "OP_HASH160 cd5b833dd43bc60b8c28c4065af670f283a203ff OP_EQUAL",
"hex": "a914cd5b833dd43bc60b8c28c4065af670f283a203ff87",
"reqSigs": 1,
"type": "scripthash",
"addresses": [
"2NBy4928yJakYBFQuXxXBwXjsLCRWgzyiGm"
]
}
},
{
"value": 5.00000000,
"n": 1,
"scriptPubKey": {
"asm": "OP_HASH160 cd5b833dd43bc60b8c28c4065af670f283a203ff OP_EQUAL",
"hex": "a914cd5b833dd43bc60b8c28c4065af670f283a203ff87",
"reqSigs": 1,
"type": "scripthash",
"addresses": [
"2NBy4928yJakYBFQuXxXBwXjsLCRWgzyiGm"
]
}
}
],
"hex":
"020000000001010000000000000000000000000000000000000000000000000000000000000000
ffffffff0502ee170101ffffffff02000000000000000017a914cd5b833dd43bc60b8c28c4065af670f283a
203ff870000000000000000266a24aa21a9ede2f61c3f71d1defd3fa999dfa36953755c69068979996
2b48bebd836974e8cf9012000000000000000000000000000000000000000000000000000000000
0000000000000000",
"blockhash": "0f84abb78891a4b9e8bc9637ec5fb8b4962c7fe46092fae99e9d69373bf7812a",
"confirmations": 1,
"time": 1590830080,
"blocktime": 1590830080
}
Thank you
@andrewjames is correct. If you have no control over the JSON file, you can address the error by just removing the newline characters:
parsed = json.loads(jsonText.replace("\n", ""))
Then you can access the fields you want like a normal dictionary:
print(parsed['txid'])
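Putting the two steps together, here is a minimal sketch that keeps only txid, size, and hash. It assumes the text holds a single transaction object like the example in the question; the `extract_fields` name and the `WANTED` tuple are illustrative, not part of any library.

```python
import json

# Fields to keep; adjust as needed (illustrative name).
WANTED = ("txid", "size", "hash")


def extract_fields(json_text, wanted=WANTED):
    """Parse text whose string values may be broken across lines."""
    # Raw newlines inside a string literal are what make json.loads fail,
    # so strip them (and carriage returns) before parsing.
    cleaned = json_text.replace("\r", "").replace("\n", "")
    parsed = json.loads(cleaned)
    return {key: parsed[key] for key in wanted}


# Shortened stand-in for the transaction in the question,
# including a "hex" value split over two lines.
sample = '''{
"txid": "254d",
"hash": "a8e1",
"size": 171,
"hex":
"0200
ffff",
"confirmations": 1
}'''

print(extract_fields(sample))
```

If the real file contains many objects, this reads everything into memory at once; splitting it into individual objects first would depend on how they are delimited in the file.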
I am having an issue with FormRecognizer not behaving the way I have seen it should. Here is the dilemma:
I have an invoice. When it is run through https://{endpoint}/formrecognizer/v2.0/layout/analyze
the service recognizes the table in the invoice and generates the proper JSON with the "tables" node. Here is an example of part of it:
{
"rows": 8,
"columns": 8,
"cells": [
{
"rowIndex": 0,
"columnIndex": 4,
"columnSpan": 3,
"text": "% 123 F STREET Deer Park TX 71536",
"boundingBox": [
3.11,
2.0733
],
"elements": [
"#/readResults/0/lines/20/words/0",
"#/readResults/0/lines/20/words/1"
]
}
When I train a model with NO labels file (https://{endpoint}/formrecognizer/v2.0/custom/models), it does not generate an empty "tables" node, but it generates "__Tokens__" key-value pairs instead. Here is an example of the same content as above, without "tables":
{
"key": {
"text": "__Tokens__12",
"boundingBox": null,
"elements": null
},
"value": {
"text": "123 F STREET",
"boundingBox": [
5.3778,
2.0625,
6.8056,
2.0625,
6.8056,
2.2014,
5.3778,
2.2014
],
"elements": null
},
"confidence": 1.0
}
I am not sure exactly where this is not behaving as intended, but any insight would be appreciated!
If you train a model WITH labeling files, then call FR Analyze(), the FR service will call the Layout service, which returns tables in "pageResults" section.
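As a rough sketch of reading those tables back out, the function below walks the "pageResults" section of an analyze response and rebuilds each table as a grid of cell text. The response shape used here (an "analyzeResult" object holding "pageResults", each with a "tables" list shaped like the Layout example in the question) is an assumption based on that example, and `tables_from_result` is a hypothetical helper name.

```python
def tables_from_result(response):
    """Return each table as a list of rows of cell text."""
    grids = []
    pages = response.get("analyzeResult", {}).get("pageResults", [])
    for page in pages:
        # Tolerate pages where "tables" is missing or null.
        for table in page.get("tables") or []:
            grid = [[""] * table["columns"] for _ in range(table["rows"])]
            for cell in table["cells"]:
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            grids.append(grid)
    return grids


# Hand-made response fragment in the assumed shape.
response = {
    "analyzeResult": {
        "pageResults": [
            {
                "page": 1,
                "tables": [
                    {
                        "rows": 1,
                        "columns": 2,
                        "cells": [
                            {"rowIndex": 0, "columnIndex": 0, "text": "Item"},
                            {"rowIndex": 0, "columnIndex": 1, "text": "Price"},
                        ],
                    }
                ],
            }
        ]
    }
}

print(tables_from_result(response))
```

Cells that span several columns are written only at their starting column in this sketch.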
I feel like the dsbulk docs are really lacking when it comes to loading JSON files into Cassandra.
Here is part of the JSON file that I'm trying to load:
[
{
"tags": [
"r"
],
"owner": {
"reputation": 23,
"user_id": 12235281,
"user_type": "registered",
"profile_image": "https://www.gravatar.com/avatar/60e28f52215bff12adb9758fc2cf86dd?s=128&d=identicon&r=PG&f=1",
"display_name": "Me28",
"link": "https://stackoverflow.com/users/12235281/me28"
},
"is_answered": false,
"view_count": 3,
"answer_count": 0,
"score": 0,
"last_activity_date": 1589053659,
"creation_date": 1589053659,
"question_id": 61702762,
"link": "https://stackoverflow.com/questions/61702762/merge-dataframes-in-r-with-different-size-and-condition",
"title": "Merge dataframes in R with different size and condition"
},
{
"tags": [
"python",
"location",
"pyautogui"
],
"owner": {
"reputation": 1,
"user_id": 13507535,
"user_type": "registered",
"profile_image": "https://lh3.googleusercontent.com/a-/AOh14GgtdM9KrbH3X5Z33RCtz6xm_TJUSQS_S31deNYUcA=k-s128",
"display_name": "lowhatex",
"link": "https://stackoverflow.com/users/13507535/lowhatex"
},
"is_answered": false,
"view_count": 2,
"answer_count": 0,
"score": 0,
"last_activity_date": 1589053657,
"creation_date": 1589053657,
"question_id": 61702761,
"link": "https://stackoverflow.com/questions/61702761/want-to-get-a-grip-of-this-pyautogui-command",
"title": "Want to get a grip of this pyautogui command"
}
]
The way I have been trying to load this is the following:
dsbulk load -url ./data_so1.json -k stackoverflow_t -t staging_t -h '182.14.0.1' -header false -u username -p password
This is the closest I get, and it pushes the values into Cassandra row by row like this:
data
-------------------------------------------------------------------------------------------------------------------------------
"title": "'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine giving exception on client"
"profile_image": "https://www.gravatar.com/avatar/05085ede54486bdaebefcf8363e081e2?s=128&d=identicon&r=PG&f=1",
"view_count": 422,
"question_id": 61702768,
"user_id": 12235281,
This just takes the rows as they are (including the commas). I've tried the -m option for mapping but didn't really get anywhere with it.
What would be the right way to get these values to their own respective columns?
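The output above suggests the file was read line by line as text rather than as JSON. One possible workaround, offered as a sketch rather than a confirmed fix, is to rewrite the array as one flat JSON object per line and then load that file with dsbulk's JSON connector (selected with -c json) instead of the default CSV one. The helper names and the `owner_` column prefix below are hypothetical, and the target table is assumed to have one column per flattened key.

```python
import json


def flatten(record, parent="owner"):
    """Copy a record, lifting the nested `parent` object into prefixed keys."""
    flat = {k: v for k, v in record.items() if k != parent}
    for key, value in record.get(parent, {}).items():
        flat[f"{parent}_{key}"] = value
    return flat


def to_json_lines(array_text):
    """Turn a JSON array of objects into one JSON object per line."""
    records = json.loads(array_text)
    return "\n".join(json.dumps(flatten(r)) for r in records)


# Trimmed version of the data in the question.
sample = '''[
  {"tags": ["r"], "owner": {"user_id": 12235281, "display_name": "Me28"},
   "question_id": 61702762, "score": 0},
  {"tags": ["python"], "owner": {"user_id": 13507535, "display_name": "lowhatex"},
   "question_id": 61702761, "score": 0}
]'''

print(to_json_lines(sample))
```

Whether the nested "owner" object should be flattened like this or stored as a user-defined type depends on the table schema, which the question does not show.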
I have previously been reading the text from a table in Excel using the following URL:
https://graph.microsoft.com/v1.0/me/drive/root:/my-folder%5Cmy-workbook.xlsx:/workbook/worksheets('MyWorksheet')/tables('MyTable')/range/text
Now that is giving me a 200 response with this content:
{
"@odata.context": "https://graph.microsoft.com/v1.0/$metadata#microsoft.graph.Json"
}
I can access the range using the same URL but without the /text segment. For example, requests to the following URL
https://graph.microsoft.com/v1.0/me/drive/root:/my-folder%5Cmy-workbook.xlsx:/workbook/worksheets('MyWorksheet')/tables('MyTable')/range
result in:
{
"@odata.context": "https://graph.microsoft.com/v1.0/$metadata#workbookRange",
"@odata.type": "#microsoft.graph.workbookRange",
"@odata.id": "/users('xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx')/drive/root/workbook/worksheets(%27%7Bxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx%7D%27)/tables(%2719%27)/range()",
"address": "MyWorksheet!C6:BB18",
"addressLocal": "MyWorksheet!C6:BB18",
"cellCount": 676,
"columnCount": 52,
"columnHidden": false,
"columnIndex": 2,
"formulas": [ ... ],
"formulasLocal": [ ... ],
"formulasR1C1": [ ... ],
"hidden": false,
"numberFormat": [ ... ],
"rowCount": 13,
"rowHidden": false,
"rowIndex": 5,
"text": [
[
"Text",
"From",
"The",
"Range",
...,
]
]
"values": [ ... ],
"valueTypes": [ ....]
}
The text property is present and contains the expected data.
The MS Graph documentation includes a text property on the range resource.
As stated above, I believe this was working earlier, so I'm assuming this is due to some bug / change / limitation in MS Graph.
Can anyone advise how to read the text from that table range directly (and/or why I am getting that response)?
Text is a property of microsoft.graph.workbookRange, but not an endpoint. I'm not sure how/if /range/text could have worked in the past, but I wouldn't have expected it to.
As for returning just the text, you can use the $select query parameter:
/me/drive/root:/{path}:/workbook/worksheets('{id}')/tables('{id}')/range?$select=text
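Here is a small sketch of building that request URL in Python. The function name and its arguments are illustrative, and sending the request (with an access token) is left out.

```python
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def range_text_url(path, worksheet, table):
    """Build the table-range URL that asks only for the `text` property."""
    # Percent-encode the drive path but keep "/" as a separator.
    encoded_path = quote(path, safe="/")
    return (
        f"{GRAPH_ROOT}/me/drive/root:/{encoded_path}:/workbook"
        f"/worksheets('{worksheet}')/tables('{table}')/range?$select=text"
    )


print(range_text_url("my-folder/my-workbook.xlsx", "MyWorksheet", "MyTable"))
```

The worksheet and table names are inserted as given, so names containing quotes or other special characters would need their own escaping.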
[updated 17:15 on 28/09]
I'm manipulating json data of type:
[
{
"id": 1,
"title": "Sun",
"seeAlso": [
{
"id": 2,
"title": "Rain"
},
{
"id": 3,
"title": "Cloud"
}
]
},
{
"id": 2,
"title": "Rain",
"seeAlso": [
{
"id": 3,
"title": "Cloud"
}
]
},
{
"id": 3,
"title": "Cloud",
"seeAlso": [
{
"id": 1,
"title": "Sun"
}
]
},
];
After inclusion in the database, a node.js search using
db.documents.query(
q.where(
q.collection('test films'),
q.value('title','Sun')
).withOptions({categories: 'none'})
)
.result( function(results) {
console.log(JSON.stringify(results, null,2));
});
will return both the film titled 'Sun' and the films which have a seeAlso/title property (forgive the xpath syntax) = 'Sun'.
I need to find 1/ films with title = 'Sun' 2/ films with seeAlso/title = 'Sun'.
I tried a container query using q.scope() with no success; I can't find how to scope the root object node (first case), and for the second case,
q.where(q.scope(q.property('seeAlso'), q.value('title','Sun')))
returns as first result an item which matches all text inside the root object node
{
"index": 1,
"uri": "/1.json",
"path": "fn:doc(\"/1.json\")",
"score": 137216,
"confidence": 0.6202662,
"fitness": 0.6701325,
"href": "/v1/documents?uri=%2F1.json&database=Documents",
"mimetype": "application/json",
"format": "json",
"matches": [
{
"path": "fn:doc(\"/1.json\")/object-node()",
"match-text": [
"Sun Rain Cloud"
]
}
]
},
which seems crazy.
Any ideas on how to do such searches on denormalized JSON data?
Laurent:
XPaths on JSON are supported by MarkLogic.
In particular, you might consider setting up a path range index to match /title at the root:
http://docs.marklogic.com/guide/admin/range_index#id_54948
Scoped property matching requires either filtering or indexed positions to be accurate. An alternative is to set up another path range index on /seeAlso/title.
For the match issue it would be useful to know the MarkLogic version and to see the entire query.
Hoping that helps,
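To make the distinction between the two paths concrete, here is a plain-Python illustration over the sample data from the question. It does not use the MarkLogic API; it only shows what a match on /title versus /seeAlso/title is expected to select, and the function names are made up for this example.

```python
films = [
    {"id": 1, "title": "Sun",
     "seeAlso": [{"id": 2, "title": "Rain"}, {"id": 3, "title": "Cloud"}]},
    {"id": 2, "title": "Rain", "seeAlso": [{"id": 3, "title": "Cloud"}]},
    {"id": 3, "title": "Cloud", "seeAlso": [{"id": 1, "title": "Sun"}]},
]


def by_root_title(docs, title):
    """Ids of documents whose own /title equals `title`."""
    return [d["id"] for d in docs if d.get("title") == title]


def by_see_also_title(docs, title):
    """Ids of documents with at least one /seeAlso/title equal to `title`."""
    return [
        d["id"] for d in docs
        if any(ref.get("title") == title for ref in d.get("seeAlso", []))
    ]


print(by_root_title(films, "Sun"))      # [1]
print(by_see_also_title(films, "Sun"))  # [3]
```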
Hi, I'm working on a project and want to use Flickr for my image gallery. I'm using the photosets.* methods, but whenever I make a request I don't get images, I only get info.
JSON result:
{
"photoset": {
"id": "77846574839405047",
"primary": "88575847594",
"owner": "998850450@N03",
"ownername": "mr.barde",
"photo": [
{
"id": "16852316982",
"secret": "857fur848c",
"server": "8568",
"farm": 9,
"title": "wallpaper-lenovo-blue-pc-brand",
"isprimary": "1",
"ispublic": 1,
"isfriend": 0,
"isfamily": 0
},
{
"id": "16665875068",
"secret": "857fur848c",
"server": "7619",
"farm": 8,
"title": "white_horses-1280x720",
"isprimary": "0",
"ispublic": 1,
"isfriend": 0,
"isfamily": 0
}
],
"page": 1,
"per_page": "2",
"perpage": "2",
"pages": 3,
"total": "6",
"title": "My First Album"
},
"stat": "ok"
}
I would like to have the actual image URLs returned. How can I do this?
Thanks to the comment by @CBroe,
I found this in the Flickr API doc.
You can construct the source URL to a photo once you know its ID, server ID, farm ID and secret, as returned by many API methods.
https://farm{farm-id}.staticflickr.com/{server-id}/{id}_{secret}.jpg
or
https://farm{farm-id}.staticflickr.com/{server-id}/{id}_{secret}_[mstzb].jpg
or
https://farm{farm-id}.staticflickr.com/{server-id}/{id}_{o-secret}_o.(jpg|gif|png)
The final result would then look something like this.
https://farm1.staticflickr.com/2/1418878_1e92283336_m.jpg
Reference: https://www.flickr.com/services/api/misc.urls.html
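Applied to the response in the question, the first URL pattern can be sketched as follows. The `photo_url` helper is a made-up name, the optional size letter corresponds to the [mstzb] suffix in the second pattern, and the original-size pattern (which needs the o-secret) is not covered.

```python
def photo_url(photo, size=None):
    """Build a static image URL from one entry of photoset["photo"]."""
    # size is one of the letters from the [mstzb] pattern, or None.
    suffix = f"_{size}" if size else ""
    return (
        f"https://farm{photo['farm']}.staticflickr.com/"
        f"{photo['server']}/{photo['id']}_{photo['secret']}{suffix}.jpg"
    )


# Trimmed copy of the response from the question.
response = {
    "photoset": {
        "photo": [
            {"id": "16852316982", "secret": "857fur848c", "server": "8568", "farm": 9},
            {"id": "16665875068", "secret": "857fur848c", "server": "7619", "farm": 8},
        ]
    },
    "stat": "ok",
}

for photo in response["photoset"]["photo"]:
    print(photo_url(photo, size="m"))
```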