Concatenate the rows based on number (Google Refine, Excel/Google Spreadsheet)

I have a large number of rows in a CSV file, which look like:
name a,1
name b,1
name c,1
name d,2
name e,2
I need to concatenate the rows based on the number. The result should be:
name a|name b|name c
name d|name e
How can I do it in Google Refine or in Excel/Google Spreadsheet?
I have been thinking about it but have not found a solution.
Thanks a lot!

Here is a proposal with OpenRefine. The only GREL formula I used is:
row.record.cells['myColumn'].value.join('|')
And here is the JSON, assuming that your first column is named "myColumn" and the second "number":
[
{
"op": "core/column-addition",
"description": "Create column test at index 2 based on column number using expression grel:value",
"engineConfig": {
"mode": "row-based",
"facets": [
{
"omitError": false,
"expression": "isBlank(value)",
"selectBlank": false,
"selection": [
{
"v": {
"v": false,
"l": "false"
}
}
],
"selectError": false,
"invert": false,
"name": "ee",
"omitBlank": false,
"type": "list",
"columnName": "ee"
}
]
},
"newColumnName": "test",
"columnInsertIndex": 2,
"baseColumnName": "number",
"expression": "grel:value",
"onError": "set-to-blank"
},
{
"op": "core/column-move",
"description": "Move column test to position 0",
"columnName": "test",
"index": 0
},
{
"op": "core/blank-down",
"description": "Blank down cells in column test",
"engineConfig": {
"mode": "row-based",
"facets": [
{
"omitError": false,
"expression": "isBlank(value)",
"selectBlank": false,
"selection": [
{
"v": {
"v": false,
"l": "false"
}
}
],
"selectError": false,
"invert": false,
"name": "ee",
"omitBlank": false,
"type": "list",
"columnName": "ee"
}
]
},
"columnName": "test"
},
{
"op": "core/column-addition",
"description": "Create column concatenation at index 2 based on column myColumn using expression grel:row.record.cells['myColumn'].value.join('|')",
"engineConfig": {
"mode": "row-based",
"facets": [
{
"omitError": false,
"expression": "isBlank(value)",
"selectBlank": false,
"selection": [
{
"v": {
"v": false,
"l": "false"
}
}
],
"selectError": false,
"invert": false,
"name": "ee",
"omitBlank": false,
"type": "list",
"columnName": "ee"
}
]
},
"newColumnName": "concatenation",
"columnInsertIndex": 2,
"baseColumnName": "myColumn",
"expression": "grel:row.record.cells['myColumn'].value.join('|')",
"onError": "set-to-blank"
}
]

If you can use Python, this manipulation is fairly easy. In the code below, the name and group are read from "input.csv", and the grouped names (along with the group) are written to "output.csv". A defaultdict is used to create empty lists to store the group members.
import collections
import csv

# Map each group number to the list of names that share it.
grouped = collections.defaultdict(list)
with open('input.csv', newline='') as fp:
    reader = csv.reader(fp)
    for row in reader:
        name, group = row
        grouped[group].append(name)

# Write one '|'-delimited line per group: the group number, then its names.
with open('output.csv', 'w', newline='') as fp:
    writer = csv.writer(fp, delimiter='|')
    for key in sorted(grouped.keys()):
        writer.writerow([key] + grouped[key])
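The script above writes the group number as the first field of each line, whereas the question asks for the names only. A minimal sketch of that variant follows, working on an in-memory string so it can be tried without any files. The function name `concatenate_by_number` and the `separator` parameter are illustrative choices, not part of the original answer, and the sketch assumes every non-blank row has exactly two fields.

```python
import csv
import io
from collections import defaultdict


def concatenate_by_number(csv_text, separator="|"):
    """Join the names that share a number, keeping first-seen group order."""
    grouped = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        name, number = row  # assumes exactly two fields per row
        grouped[number].append(name)
    # Dicts keep insertion order (Python 3.7+), so groups appear in file order.
    return [separator.join(names) for names in grouped.values()]


sample = "name a,1\nname b,1\nname c,1\nname d,2\nname e,2\n"
lines = concatenate_by_number(sample)
print("\n".join(lines))
```

Keeping groups in first-seen order avoids sorting the numbers as strings, where "10" would sort before "2".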

Related

Parse a json data out from a tag

The text below is part of a string I parsed from HTML (pre). Since I cannot place <> tags in this box, I have replaced the beginning and end tags with (pre) and (/pre).
(pre)(b)Below are the details for server SomeServerName from NetSystem.(Data Length - 1)(/b)
[
{
"askId": "Value1",
"billingCode": "99999999",
"clusterId": null,
"createdBy": "Mike",
"createdFromLegacy": null,
"createdOn": "2021-08-06T17:54:28.220Z",
"description": "Windows 2019",
"environment": "devops",
"hostId": null,
"id": "acd16582-b009-4667-aa95-5977603772sa",
"infrastructure": {
"apiId": "App2019_SA_1_v8-w2019_mike3_cc8f7e02-d426-423d-addb-b29bc7e163be",
"capacityId": "ODI",
"catalogManagementGroup": "Sales Marketing"
},
"legacyId": "XL143036181",
"location": {
"availabilityZone": "ny",
"code": "mx31",
"description": "uhg : mn053",
"region": "west",
"vendor": "apple"
},
"maintenance": {
"group": "3",
"status": "steady_state",
"window": {
"days": "Sunday",
"endTime": "06:00:00.000Z",
"startTime": "02:00:00.000Z"
}
},
"name": "SomeServer",
"network": {
"fqdn": "SomeServer.dom.tes.contoso.com",
"ipAddress": "xx.xx.xx.xx"
},
"os": {
"description": "Microsoft windows 2019",
"type": "windows",
"vendor": "Microsoft",
"version": "2019"
},
"owner": {
"id": "000111111",
"msid": "jtest"
},
"provision": {
"id": "ba424e42-a925-49a5-a4b7-5dcf41b69d4e",
"requestingApi": "mars Hub",
"system": "vRealize"
},
"specs": {
"cpuCount": 4,
"description": "Virtual Hardware",
"ram": 64384,
"serialNumber": null
},
"status": "ACTIVE",
"support": {
"group": "Support group"
},
"tags": {
"appTag": "minitab"
},
"updatedBy": "snir_agent",
"updatedOn": "2021-08-06T17:54:31.525Z"
}
](/pre)
As you can see, this is almost JSON data, but I cannot parse it as such because of the (b) (/b) tag inside my (pre) (/pre) tag. How can I strip out this (b) tag and its content so that I am left with only the JSON data and can select values more easily with JSON functions?

If your JSON always has [] brackets you can extract the content inside it and then parse it:
Python example:
import re
import json
text = '<b>asd</b>[{"a": "b", "c": "d"}] pre' # your content
json_content = re.findall(r'\[[^\]]*\]', text)[0] # '[{"a": "b", "c": "d"}]'
result = json.loads(json_content) # [{'a': 'b', 'c': 'd'}]

You can do this either by using re as indicated here or using split:
cleaned = data.split("(b)")[0] + data.split("(/b)")[1]
The line above concatenates the content before (b) and after (/b), removing the b tag and its content.
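Putting the split idea together with the JSON parsing step might look like the sketch below. It is only an illustration: the function name `extract_json_after_tag` is hypothetical, the sample string is a shortened, made-up stand-in for the real page content, and it assumes the payload is a single JSON array with no stray square brackets outside it.

```python
import json


def extract_json_after_tag(data, open_tag="(b)", close_tag="(/b)"):
    """Drop everything from open_tag through close_tag, then parse the JSON array."""
    before, _, rest = data.partition(open_tag)
    _, _, after = rest.partition(close_tag)
    cleaned = before + after
    # Slice from the first '[' to the last ']' so nested arrays survive intact.
    start = cleaned.index("[")
    end = cleaned.rindex("]") + 1
    return json.loads(cleaned[start:end])


# Made-up, shortened sample in the same shape as the question's text.
sample = (
    '(pre)(b)Below are the details for server X.(Data Length - 1)(/b)\n'
    '[{"name": "SomeServer", "tags": {"appTag": "minitab"}, "ids": [1, 2]}](/pre)'
)
servers = extract_json_after_tag(sample)
print(servers[0]["name"])
```

Slicing between the first `[` and the last `]` is a deliberate difference from the regular expression approach, which stops at the first closing bracket and would therefore cut off a payload containing nested arrays.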

Append Coordinates of each Element in a JSON

This is partially a follow-up to this question: Filtering Arrays in NodeJS without knowing where the value's location is
I got the JSON output, but now I'm required to append the coordinates of each element to its respective element in the JSON.
I tried the page.$(input[type='']) selector, substituting a variable for type for each key and using that key's value as the value. The issue is that I only got one kind of output: the first element with type text. It returned null when the element type was anything but text (element, for example), and of course it didn't cycle through all elements with type text (I do know why). I then tried page.$$(input[type='']), but I couldn't figure out how to use elementHandle on each object. Lastly, I can't figure out how to append the coordinates back to each element without losing the original hierarchy.
For reference, here's a sample of the output JSON:
[
{
"type": "element",
"tagName": "form",
"attributes": [
{
"key": "action",
"value": "/action_page.php"
},
{
"key": "target",
"value": "_blank"
}
],
"children": [
{
"type": "text",
"content": "\nFirst name:"
},
{
"type": "element",
"tagName": "input",
"attributes": [
{
"key": "type",
"value": "text"
},
{
"key": "name",
"value": "firstname"
},
{
"key": "value",
"value": "John"
}
],
"children": []
},
{
"type": "text",
"content": "\nLast name:"
},
{
"type": "element",
"tagName": "input",
"attributes": [
{
"key": "type",
"value": "text"
},
{
"key": "name",
"value": "lastname"
},
{
"key": "value",
"value": "Doe"
}
],
"children": []
},
{
"type": "element",
"tagName": "input",
"attributes": [
{
"key": "type",
"value": "submit"
},
{
"key": "value",
"value": "Submit"
}
],
"children": []
},
{
"type": "text",
"content": "\n"
}
]
}
]
Please note that this output isn't uniform and will be used on multiple pages with different layouts, so the answer needs to be as adaptable as possible.
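The "without losing the original hierarchy" part can be sketched independently of the browser: walk the tree recursively and attach a new key to each element node in place on a copy. The sketch below is in Python for illustration only, not the Node.js code the question uses. `attach_coordinates`, the `coordinates` key, and the `lookup` callable are all hypothetical names; `lookup` stands in for whatever call actually measures an element on the page, and the positions used here are made up.

```python
import copy


def attach_coordinates(nodes, lookup):
    """Return a deep copy of the tree with a 'coordinates' key on each element node.

    `lookup` is a hypothetical callable that takes a node and returns its
    coordinates (or None); in practice it would query the browser.
    """
    result = copy.deepcopy(nodes)

    def visit(node):
        # Text nodes are left untouched; only element nodes get coordinates.
        if node.get("type") == "element":
            node["coordinates"] = lookup(node)
            for child in node.get("children", []):
                visit(child)

    for node in result:
        visit(node)
    return result


# Made-up coordinates keyed by the "name" attribute, standing in for a browser query.
fake_positions = {"firstname": {"x": 10, "y": 20}}


def fake_lookup(node):
    attrs = {a["key"]: a["value"] for a in node.get("attributes", [])}
    return fake_positions.get(attrs.get("name"))


tree = [{
    "type": "element", "tagName": "form", "attributes": [],
    "children": [
        {"type": "text", "content": "\nFirst name:"},
        {"type": "element", "tagName": "input",
         "attributes": [{"key": "type", "value": "text"},
                        {"key": "name", "value": "firstname"}],
         "children": []},
    ],
}]
annotated = attach_coordinates(tree, fake_lookup)
print(annotated[0]["children"][1]["coordinates"])
```

Because the walk never rebuilds or flattens the lists, the nesting of `children` is the same in the result as in the input, whatever the page layout.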

How to index complex types into Edm.ComplexType with Azure Cognitive Search

I am indexing data into an Azure Search Index that is produced by a custom skill. This custom skill produces complex data which I want to preserve into the Azure Search Index.
Source data comes from blob storage, and I am constrained to using the REST API unless I have a very solid argument for using the .NET SDK.
Current code
The following is a brief rundown of what I currently have. I cannot change the index's field or the format of data produced by the endpoint used by the custom skill.
Complex data
The following is an example of complex data produced by the custom skill (in the correct value/recordId/etc. format):
{
"field1": 0.135412,
"field2": 0.123513,
"field3": 0.243655
}
Custom skill
Here is the custom skill which creates said data:
{
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"uri": "https://myfunction.azurewebsites.com/api",
"httpHeaders": {},
"httpMethod": "POST",
"timeout": "PT3M50S",
"batchSize": 1,
"degreeOfParallelism": 5,
"name": "MySkill",
"context": "/document/mycomplex",
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "field1",
"targetName": "field1"
},
{
"name": "field2",
"targetName": "field2"
},
{
"name": "field3",
"targetName": "field3"
}
]
}
I have attempted several variations, notably using the ShaperSkill with each field as an input and the output "targetName" as "mycomplex" (with the appropriate context).
Indexer
Here is the indexer's output field mapping for the skill:
{
"sourceFieldName": "/document/mycomplex",
"targetFieldName": "mycomplex"
}
I have tried several variations such as "sourceFieldName": "/document/mycomplex/*".
Search index
And this is the targeted index field:
{
"name": "mycomplex",
"type": "Edm.ComplexType",
"fields": [
{
"name": "field1",
"type": "Edm.Double",
"retrievable": true,
"filterable": true,
"sortable": true,
"facetable": false,
"searchable": false
},
{
"name": "field2",
"type": "Edm.Double",
"retrievable": true,
"filterable": true,
"sortable": true,
"facetable": false,
"searchable": false
},
{
"name": "field3",
"type": "Edm.Double",
"retrievable": true,
"filterable": true,
"sortable": true,
"facetable": false,
"searchable": false
}
]
}
Result
My result is usually similar to: "Could not map output field 'mycomplex' to search index. Check your indexer's 'outputFieldMappings' property."

This may be a mistake with the context of your skill. Instead of setting the context to /document/mycomplex, can you try setting it to /document? You can then add a ShaperSkill, with its context also set to /document and its output field named mycomplex, to generate the expected complex type shape.
Example skills:
"skills":
[
{
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"uri": "https://myfunction.azurewebsites.com/api",
"httpHeaders": {},
"httpMethod": "POST",
"timeout": "PT3M50S",
"batchSize": 1,
"degreeOfParallelism": 5,
"name": "MySkill",
"context": "/document",
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "field1",
"targetName": "field1"
},
{
"name": "field2",
"targetName": "field2"
},
{
"name": "field3",
"targetName": "field3"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"context": "/document",
"inputs": [
{
"name": "field1",
"source": "/document/field1"
},
{
"name": "field2",
"source": "/document/field2"
},
{
"name": "field3",
"source": "/document/field3"
}
],
"outputs": [
{
"name": "output",
"targetName": "mycomplex"
}
]
}
]
Please refer to the documentation on the Shaper skill for specifics.

ArangoDB SORT in document in document per int

I want to sort a result, but ArangoDB ignores my AQL query.
My query, as an example:
FOR d IN system_menu
SORT d.Lvl DESC
SORT d.Submenu[*].Lvl DESC
RETURN d
d.Lvl is an INT value. How can I sort an array of documents inside a document?
My document:
{
"System": {},
"Controller": "reports",
"Show": true,
"Icon": "mdi-newspaper",
"Lvl": 3,
"Title": {
"DEde": "Berichte",
"Universal": "Reports"
},
"Submenu": [
{
"Title": {
"DEde": "Tätigkeitsberichte",
"Universal": "Activity reports"
},
"Controller": "activity-reports",
"Tabmenu": "",
"Filter": "",
"Lvl": 2,
"Show": true,
"Hrule": false
},
{
"Title": {
"DEde": "Behördenbericht",
"Universal": "Authority reports"
},
"Controller": "request-data-subject",
"Tabmenu": "",
"Filter": "",
"Lvl": 1,
"Show": true,
"Hrule": false
},
{
"Title": {
"DEde": "Auskunftsersuchen",
"Universal": "Request from a data subject"
},
"Controller": "request-data-subject",
"Tabmenu": "",
"Filter": "",
"Lvl": 3,
"Show": true,
"Hrule": false
}
]
}
The sorting does not work! How do I sort all my documents by INT from 1...100?

To sort an inner array, you can use an inner loop, e.g.
FOR d IN system_menu
  SORT d.Lvl DESC
  LET submenus = (
    FOR s IN d.Submenu
      SORT s.Lvl DESC
      RETURN s
  )
  RETURN MERGE(d, { Submenu: submenus })
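For readers more at home in Python, the sketch below mirrors what the query above computes: the outer documents sorted by Lvl descending, and each document returned with its Submenu array sorted the same way. It runs on made-up in-memory dictionaries rather than against ArangoDB, and the function name `sort_menu` is purely illustrative.

```python
def sort_menu(documents):
    """Sort documents by Lvl descending and each Submenu by Lvl descending."""
    ordered = sorted(documents, key=lambda d: d["Lvl"], reverse=True)
    # Build new dicts so the input documents are left untouched, much like MERGE.
    return [
        {**d, "Submenu": sorted(d.get("Submenu", []),
                                key=lambda s: s["Lvl"], reverse=True)}
        for d in ordered
    ]


# Made-up documents reduced to the fields that matter for sorting.
menu = [
    {"Controller": "settings", "Lvl": 1, "Submenu": []},
    {"Controller": "reports", "Lvl": 3, "Submenu": [
        {"Controller": "activity-reports", "Lvl": 2},
        {"Controller": "authority-reports", "Lvl": 1},
        {"Controller": "request-data-subject", "Lvl": 3},
    ]},
]
result = sort_menu(menu)
print([d["Controller"] for d in result])
print([s["Lvl"] for s in result[0]["Submenu"]])
```

The key point in both versions is that the inner array is sorted by its own loop and then written back over the original field, since a SORT on the outer loop only orders the outer documents.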

Need help cleaning up iterating through nested JSON

I'm iterating through JSON content and returning the sensor name plus the temperature. There are two value keys under capability, and I'm having trouble coming up with simple logic to ignore the second one. I'm fairly new to Python but feel like this is a simple adjustment. I just can't seem to find a good example of how to ignore the second value it's pulling.
DATA:
"remoteSensors": [
{
"id": "rs:100",
"name": "Guest Bedroom",
"type": "ecobee3_remote_sensor",
"code": "TPCM",
"inUse": false,
"capability": [
{
"id": "1",
"type": "temperature",
"value": "690"
},
{
"id": "2",
"type": "occupancy",
"value": "false"
}
]
},
{
"id": "rs:101",
"name": "Mudd Room",
"type": "ecobee3_remote_sensor",
"code": "X9YF",
"inUse": false,
"capability": [
{
"id": "1",
"type": "temperature",
"value": "572"
},
{
"id": "2",
"type": "occupancy",
"value": "false"
}
]
},
{
"id": "rs:102",
"name": "Master Bedroom",
"type": "ecobee3_remote_sensor",
"code": "YDNZ",
"inUse": false,
"capability": [
{
"id": "1",
"type": "temperature",
"value": "694"
},
{
"id": "2",
"type": "occupancy",
"value": "true"
}
]
},
{
"id": "ei:0",
"name": "Main Floor",
"type": "thermostat",
"inUse": true,
"capability": [
{
"id": "1",
"type": "temperature",
"value": "725"
},
{
"id": "2",
"type": "humidity",
"value": "37"
},
{
"id": "3",
"type": "occupancy",
"value": "false"
}
]
}
]
CODE:
import requests
import json

url = 'https://api.ecobee.com/1/thermostat'
header = {'Content-Type': 'application/json;charset=UTF-8',
          'Authorization': 'Bearer ZkEf7ONibogGpMQibem3SlhXhEOS99zK'}
params = {'json': ('{"selection":{"selectionType":"registered",'
                   '"includeRuntime":"true",'
                   '"includeSensors":"true",'
                   '"includeProgram":"true",'
                   '"includeEquipmentStatus":"true",'
                   '"includeEvents":"true",'
                   '"includeWeather":"true",'
                   '"includeSettings":"true"}}')}
request = requests.get(url, headers=header, params=params)
#print(request)
thermostats = request.json()['thermostatList']
remote_sensors = thermostats[0]['remoteSensors']
for eachsensor in thermostats[0]['remoteSensors']:
    for temp in eachsensor['capability']:
        name = eachsensor.get('name')
        temp = temp.get('value')
        print(name, temp)
Actual Results:
Guest Bedroom 690
Guest Bedroom false
Mudd Room 579
Mudd Room false
Master Bedroom 698
Master Bedroom false
Main Floor 731
Main Floor false
Expected Results:
Guest Bedroom 690
Mudd Room 579
Master Bedroom 698
Main Floor 731
Thanks for the help, @Naveen!
Fixed:
for eachsensor in thermostats[0]['remoteSensors']:
    for temp in eachsensor['capability']:
        name = eachsensor.get('name')
        tempr = temp.get('value')
        id = temp.get('id')
        if id == '1':
            print(name, tempr, id)
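The fix above relies on the temperature capability always having id '1'. An alternative sketch, assuming the "type" field shown in the sample data is reliable, filters on the capability type instead. The function name `sensor_temperatures` is illustrative, and the data below is a trimmed copy of the sample from the question rather than a live API response.

```python
def sensor_temperatures(remote_sensors):
    """Return (name, value) pairs for each sensor's 'temperature' capability."""
    readings = []
    for sensor in remote_sensors:
        for capability in sensor.get("capability", []):
            if capability.get("type") == "temperature":
                readings.append((sensor.get("name"), capability.get("value")))
                break  # at most one temperature reading per sensor
    return readings


# Trimmed copy of the sample data from the question.
remote_sensors = [
    {"name": "Guest Bedroom", "capability": [
        {"id": "1", "type": "temperature", "value": "690"},
        {"id": "2", "type": "occupancy", "value": "false"}]},
    {"name": "Main Floor", "capability": [
        {"id": "1", "type": "temperature", "value": "725"},
        {"id": "2", "type": "humidity", "value": "37"},
        {"id": "3", "type": "occupancy", "value": "false"}]},
]
for name, value in sensor_temperatures(remote_sensors):
    print(name, value)
```

Matching on the type keeps working if a sensor lists its capabilities in a different order, and it avoids shadowing the built-in `id` function.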
