Extracting an xml value in Groovy

I have this code
String dbresponse = '''
<rows>
<row>
<file_data>One</file_data>
<time_inserted>2019-01-30T10:29:20.543</time_inserted>
</row>
<row>
<file_data>two</file_data>
<time_inserted>2019-01-30T10:29:20.547</time_inserted>
</row>
<row>
<file_data>three</file_data>
<time_inserted>2019-01-30T10:29:20.550</time_inserted>
</row>
<row>
<file_data>four</file_data>
<time_inserted>2019-01-30T10:29:20.550</time_inserted>
</row>
<row>
<file_data>five</file_data>
<time_inserted>2019-01-30T10:29:20.553</time_inserted>
</row>
</rows>
'''
def response = new XmlSlurper().parseText(dbresponse)
def data = response.rows.row[1].file_data
print data
I have two questions:
1] With the above code why am I not getting the response of: two ?
2] How do I iterate through the entire xml doc to get this response:
one
two
three
four
five
Thanks

1] With the above code why am I not getting the response of: two ?
As per the official Groovy documentation, it should be:
def rows = new XmlSlurper().parseText(dbresponse)
println(rows.row[1].file_data)
The first line is "parsing the XML and returning the root node as a GPathResult". In your case, the root node is rows, so the parsed object already is <rows> and has no child named rows to step into.
2] How do I iterate through the entire xml doc to get this response: one two three four five
println("Iterating using each() method")
rows.row.file_data.each { row ->
println(row)
}
println("Iterating using Groovy for loop")
for (fileData in rows.row.file_data) {
println(fileData)
}
println("Getting a list of necessary elements using Groovy Spread operator")
def fileDataList = rows.row*.file_data
println(fileDataList)
Output:
Iterating using each() method
One
two
three
four
five
Iterating using Groovy for loop
One
two
three
four
five
Getting a list of necessary elements using Groovy Spread operator
[One, two, three, four, five]

Here is how it works:
def rows = new XmlSlurper().parseText(dbresponse)
print (rows.row[1])
print (rows.row[1].file_data)
The identifier rows is a handle on the object returned when parsing dbresponse (<rows> in this case). I named it rows to match the root element, which is a common convention with XmlSlurper; any other name would work too.
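The same root-node behaviour can be illustrated outside Groovy. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree (an assumption made purely for illustration; it is not part of the original answers). The sample data is a shortened copy of the XML from the question.

```python
import xml.etree.ElementTree as ET

DB_RESPONSE = """
<rows>
  <row><file_data>One</file_data><time_inserted>2019-01-30T10:29:20.543</time_inserted></row>
  <row><file_data>two</file_data><time_inserted>2019-01-30T10:29:20.547</time_inserted></row>
  <row><file_data>three</file_data><time_inserted>2019-01-30T10:29:20.550</time_inserted></row>
</rows>
"""

# fromstring() returns the root element, which here is <rows> itself.
root = ET.fromstring(DB_RESPONSE.strip())
print(root.tag)

# Looking for a child called "rows" under the root finds nothing,
# which is analogous to response.rows.row[1] coming back empty.
wrong = root.find("rows")

# Indexing the <row> children directly from the root works.
second = root.findall("row")[1].findtext("file_data")
print(second)
```

As in the Groovy version, the key point is that the parsed object already stands for the outermost element, so navigation starts at its children.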

You are almost there; the fix is trivial.
The script you posted tries to extract the data of only a single row, which is why the remaining data is not shown.
Here is a script that gets all the data:
def response = new XmlSlurper().parseText(dbresponse)
def data = response.'**'.findAll{it.name() =='row'}*.file_data*.text()
println data
You can quickly try it online Demo
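For comparison, a depth-first search that collects every file_data value can be sketched in Python with the standard library (an illustration only, not the Groovy API; the helper name file_data_values is hypothetical).

```python
import xml.etree.ElementTree as ET


def file_data_values(xml_text):
    """Return the text of <file_data> for every <row>, in document order."""
    root = ET.fromstring(xml_text.strip())
    # iter("row") walks the whole tree depth-first and yields each <row>,
    # which plays roughly the role of '**' plus findAll in the Groovy script.
    return [row.findtext("file_data") for row in root.iter("row")]


sample = (
    "<rows>"
    "<row><file_data>One</file_data></row>"
    "<row><file_data>two</file_data></row>"
    "</rows>"
)
print(file_data_values(sample))
```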

Related

Concatenate long strings from multiple records into one string

I have a situation where I need to concatenate long strings from multiple records in an Oracle database into a single string. These long strings are portions of a larger XML string, and my ultimate goal is to be able to convert this XML into something resembling query results and pull out specific values.
The data would look something like this, with the MSG_LINE_TEXT field being VARCHAR2(4000). So if the total message is less than 4000 characters, then there'd only be one record. In theory, there could be an infinite number of records for each message, although the highest I've seen so far is 14 records, which means I need to be able to handle strings that are at least 56000 characters long.
MESSAGE_ID MSG_LINE_NUMBER MSG_LINE_TEXT
---------- --------------- --------------------------------
17415414 1 Some XML snippet here
17415414 2 Some XML snippet here
17415414 3 Some XML snippet here
17415414 4 Some XML snippet here
The total XML for one MESSAGE_ID might look something like this. There could be many App_Advice_Error tags, although this specific example only contains one.
<tXML>
<Header>
<Source>MANH_prod_wmsweb</Source>
<Action_Type />
<Sequence_Number />
<Company_ID>1</Company_ID>
<Msg_Locale />
<Version />
<Internal_Reference_ID>17415414</Internal_Reference_ID>
<Internal_Date_Time_Stamp>2021-02-09 13:45:22</Internal_Date_Time_Stamp>
<External_Reference_ID />
<External_Date_Time_Stamp />
<User_ID>ESBUSER</User_ID>
<Message_Type>RESPONSE</Message_Type>
</Header>
<Response>
<Persistent_State>0</Persistent_State>
<Error_Type>2</Error_Type>
<Resp_Code>501</Resp_Code>
<Response_Details>
<Application_Advice>
<Shipper_ID />
<Imported_Object_Type>ASN</Imported_Object_Type>
<Response_Type>Error</Response_Type>
<Transaction_Date>2/9/21 13:45</Transaction_Date>
<Application_Ackg_Code>TE</Application_Ackg_Code>
<Business_Unit></Business_Unit>
<Tran_Set_Identifier_Code></Tran_Set_Identifier_Code>
<Transaction_Purpose_Code>11</Transaction_Purpose_Code>
<Imported_Message_Id></Imported_Message_Id>
<Imported_Object_Id>Reference Number Here</Imported_Object_Id>
<Additional_References>
<Additional_Reference_Info>
<Reference_Type>BusinessPartner</Reference_Type>
<Reference_ID></Reference_ID>
</Additional_Reference_Info>
</Additional_References>
<App_Advice_Errors>
<App_Advice_Error>
<App_Error_Text>Some error text here</App_Error_Text>
<Error_Message_Tokens>
<Error_Message_Token>Object that errored out</Error_Message_Token>
</Error_Message_Tokens>
<App_Err_Cond_Code>6100234</App_Err_Cond_Code>
</App_Advice_Error>
</App_Advice_Errors>
<Imported_Data></Imported_Data>
</Application_Advice>
</Response_Details>
</Response>
</tXML>
The values that I'm most interested in pulling out are the App_Err_Cond_Code, Error_Message_Token, and App_Error_Text tags. I had tried using something like this:
extractvalue(xmltype(msg_line_text), '//XPath of Tag')
This works beautifully for stuff where the entire XML is less than 4000 characters, i.e. the entire XML is stored in a single record. The problem comes when there are multiple records, because each individual snippet of XML isn't a valid XML string on its own, and so XMLTYPE throws an error, hence the reason I'm trying to concatenate them all into a single string, which I can then use with the above method.
I've tried a variety of ways to do this - LISTAGG, XMLAGG, SYS_CONNECT_BY_PATH, as well as writing a custom function something like this:
with
function get_messages(pTranLogID number) return string
is
xml varchar2;
begin
xml := '';
for msg in (
select r.msg_line_text
from tran_log_response_message r, tran_log t
where
t.message_id = r.message_id
and t.tran_log_id = pTranLogID
order by r.msg_line_number
)
loop
xml := xml || msg.msg_line_text;
end loop;
return 'test';
end;
select
tran_log_id, get_messages(tran_log_id)
from
tran_log
where
tran_log_id = '20633610';
/
The problem is that every one of these methods complained that the string was too long. Does anyone have any other ideas? Or maybe a better approach to this problem?
Thanks.
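To make the goal concrete, here is a minimal client-side sketch in Python. It is not an Oracle solution: it assumes the rows have already been fetched as (MSG_LINE_NUMBER, MSG_LINE_TEXT) pairs, and the function names and sample chunks are hypothetical. It shows that the chunks only become parseable once joined in line-number order.

```python
import xml.etree.ElementTree as ET


def assemble_message(rows):
    """Join (line_number, text) chunks in line-number order into one string."""
    return "".join(text for _, text in sorted(rows, key=lambda r: r[0]))


def extract_errors(xml_text):
    """Return (code, token, text) for each App_Advice_Error element."""
    root = ET.fromstring(xml_text)
    return [
        (
            err.findtext("App_Err_Cond_Code"),
            err.findtext("Error_Message_Tokens/Error_Message_Token"),
            err.findtext("App_Error_Text"),
        )
        for err in root.iter("App_Advice_Error")
    ]


# Hypothetical chunks: the split falls mid-tag, so neither parses on its own.
chunks = [
    (2, "r_Text><Error_Message_Tokens><Error_Message_Token>Object that errored out"
        "</Error_Message_Token></Error_Message_Tokens>"
        "<App_Err_Cond_Code>6100234</App_Err_Cond_Code></App_Advice_Error></tXML>"),
    (1, "<tXML><App_Advice_Error><App_Error_Text>Some error text here</App_Erro"),
]
print(extract_errors(assemble_message(chunks)))
```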

How to parse only the second span tag in an HTML document using python bs4

I want to parse only one span tag in my HTML document. There are three sibling span tags without any class or id. I am targeting only the second one, using BeautifulSoup 4.
Given the following html document:
<div class="adress">
<span>35456 street</span>
<span>city, state</span>
<span>zipcode</span>
</div>
I tried:
for spn in soup.findAll('span'):
    data = spn[1].text
but it didn't work. The expected result is the text of the second span stored in a variable:
data = "city, state"
Also, how do I get both the first and second span concatenated in one variable?

You are trying to slice an individual span (a Tag instance). Get rid of the for loop and slice the findAll response instead, i.e.
>>> soup.findAll('span')[1]
<span>city, state</span>
You can get the first and second tags together using:
>>> soup.findAll('span')[:2]
[<span>35456 street</span>, <span>city, state</span>]
or, as a string:
>>> "".join([str(tag) for tag in soup.findAll('span')[:2]])
'<span>35456 street</span><span>city, state</span>'
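The idea of indexing the list of matches, rather than a single tag, can be sketched without third-party packages. This example assumes the snippet is well-formed enough for the standard library's ElementTree; real-world HTML usually needs an HTML parser such as BeautifulSoup, as in the answer above.

```python
import xml.etree.ElementTree as ET

HTML = """
<div class="adress">
<span>35456 street</span>
<span>city, state</span>
<span>zipcode</span>
</div>
"""

div = ET.fromstring(HTML.strip())
spans = div.findall("span")  # list of all three <span> elements

# Index the list of matches, not an individual element.
second = spans[1].text

# Slice the list to take the first two spans and join their text.
first_two = " ".join(s.text for s in spans[:2])

print(second)
print(first_two)
```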

Another option:
data = soup.select_one('div > span:nth-of-type(2)').get_text(strip=True)
print(data)
Output:
city, state

Why does d3.select() return array of array?

I recently started using d3.js to write some scripts to manipulate SVGs. Most of the time I refer to the d3 documentation and find the solution. However, I cannot understand why the d3.select function returns an array of arrays. For example, say I have an SVG element: if I do d3.select("svg"), it returns [[svg]], so I have to do d3.select("svg")[0]. The documentation says:
One nuance is that selections are grouped: rather than a one-dimensional array, each
selection is an array of arrays of elements. This preserves the
hierarchical structure of subselections
Then says we can ignore it most of the time.
Why does it return an array of arrays?
What does
This preserves the hierarchical structure of subselections
mean?
Thanks in advance.

You shouldn't need to know or care how the object d3.select returns is structured internally. All you need to know is which methods are accessible in that object, which is what the documentation describes.
Say you have this document:
<div>
<span>1</span>
<span>2</span>
</div>
<div>
<span>3</span>
<span>4</span>
</div>
If you select all <div> elements with d3.selectAll
var div = d3.selectAll("div");
then div is a d3 selection object of size 2, with one entry for each <div> element in the document.
But if you now generate a subselection from this selection object
var span = div.selectAll("span");
a search is made for matching elements within each element in the div selection, and the structure is preserved: the span selection contains as many groups as the div selection has elements, and each group consists of the elements found within the corresponding <div>.
So in this case, span will contain two groups (first <div> and second <div>), each of which will contain two elements (1 and 2 in the first, 3 and 4 in the second).
As for select, it is the same as selectAll except it stops after finding one match; its return is structured exactly the same way, however.
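The grouping described above can be modelled with plain nested lists. The following Python sketch is a toy model only, not d3's implementation: the select_all helper and the dictionary-based document are hypothetical, and the helper looks only at direct children rather than all descendants.

```python
def select_all(groups, tag):
    """For every element in every group, collect its children named `tag`.

    One new group is produced per parent element, so the result keeps
    the hierarchy of the selection it was derived from.
    """
    return [
        [child for child in parent["children"] if child["tag"] == tag]
        for group in groups
        for parent in group
    ]


def span(text):
    """Build a leaf <span> node for the toy document."""
    return {"tag": "span", "text": text, "children": []}


document = {"tag": "body", "text": "", "children": [
    {"tag": "div", "text": "", "children": [span("1"), span("2")]},
    {"tag": "div", "text": "", "children": [span("3"), span("4")]},
]}

divs = select_all([[document]], "div")  # one group holding two divs
spans = select_all(divs, "span")        # two groups, one per div

print([[s["text"] for s in group] for group in spans])
```

A flat list would lose the information about which <div> each <span> came from; the nested form keeps it.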

Access list element using get()

I'm trying to use get() to access a list element in R, but am getting an error.
example.list <- list()
example.list$attribute <- c("test")
get("example.list") # Works just fine
get("example.list$attribute") # breaks
## Error in get("example.list$attribute") :
## object 'example.list$attribute' not found
Any tips? I am looping over a vector of strings which identify the list names, and this would be really useful.

Here's the incantation that you are probably looking for:
get("attribute", example.list)
# [1] "test"
Or perhaps, for your situation, this:
get("attribute", eval(as.symbol("example.list")))
# [1] "test"
# Applied to your situation, as I understand it...
example.list2 <- example.list
listNames <- c("example.list", "example.list2")
sapply(listNames, function(X) get("attribute", eval(as.symbol(X))))
# example.list example.list2
# "test" "test"

Why not simply:
example.list <- list(attribute="test")
listName <- "example.list"
get(listName)$attribute
# or, if both the list name and the element name are given as arguments:
elementName <- "attribute"
get(listName)[[elementName]]

If your strings contain more than just object names, e.g. operators like here, you can evaluate them as expressions as follows:
> string <- "example.list$attribute"
> eval(parse(text = string))
[1] "test"
If your strings are all of the type "object$attribute", you could also parse them into object/attribute, so you can still get the object, then extract the attribute with [[:
> parsed <- unlist(strsplit(string, "\\$"))
> get(parsed[1])[[parsed[2]]]
[1] "test"

flodel's answer worked for my application, so I'm gonna post what I built on it, even though this is pretty uninspired. You can access each list element with a for loop, like so:
#============== List with five elements of non-uniform length ================#
example.list=
list(letters[1:5], letters[6:10], letters[11:15], letters[16:20], letters[21:26])
#===============================================================================#
#====== for loop that names and concatenates each consecutive element ========#
derp=c(); for(i in 1:length(example.list))
{derp=append(derp,eval(parse(text=example.list[i])))}
derp #Not a particularly useful application here, but it proves the point.
I'm using code like this for a function that calls certain sets of columns from a data frame by the column names. The user enters a list with elements that each represent different sets of column names (each set is a group of items belonging to one measure), and the big data frame containing all those columns. The for loop applies each consecutive list element as the set of column names for an internal function* applied only to the currently named set of columns of the big data frame. It then populates one column per loop of a matrix with the output for the subset of the big data frame that corresponds to the names in the element of the list corresponding to that loop's number. After the for loop, the function ends by outputting that matrix it produced.
Not sure if you're looking to do something similar with your list elements, but I'm happy I picked up this trick. Thanks to everyone for the ideas!
"Second example" / tangential info regarding application in graded response model factor scoring:
Here's the function I described above, just in case anyone wants to calculate graded response model factor scores* in large batches... Each column of the output matrix corresponds to an element of the list (i.e., a latent trait with ordinal indicator items specified by column name in the list element), and the rows correspond to the rows of the data frame used as input. Each row should presumably contain mutually dependent observations, as from a given individual, to whom the factor scores in the same row of the output matrix belong. Also, I feel I should add that if all the items in a given list element use the exact same Likert scale rating options, the graded response model may be less appropriate for factor scoring than a rating scale model (cf. http://www.rasch.org/rmt/rmt143k.htm).
'grmscores'=function(ColumnNameList,DataFrame) {require(ltm) #(Rizopoulos,2006)
x = matrix ( NA , nrow = nrow ( DataFrame ), ncol = length ( ColumnNameList ))
for(i in 1:length(ColumnNameList)) #flodel's magic featured below!#
{x[,i]=factor.scores(grm(DataFrame[, eval(parse(text= ColumnNameList[i]))]),
resp.patterns=DataFrame[,eval(parse(text= ColumnNameList[i]))])$score.dat$z1}; x}
Reference
*Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses, Journal of Statistical Software, 17(5), 1-25. URL: http://www.jstatsoft.org/v17/i05/

XSLT equation using SP list fields

I have a SP Dataview that I have converted to XSLT, so that I could add a header displaying a percentage (Complete). Before I converted the dvwp to xslt, I added two count headers- one on Complete, and another on LastName. They worked wonderfully- showing me the # of records and the # of records with a value in the complete field. However, when I converted the dv to xslt I realized that I lost my headers :(
So, I am adding them back in using xslt. Currently the XPath code for the equation that I have is <xsl:value-of select="count($Rows) div count($Rows)" />.
How do I get the total # of Yes values that are in my Complete field?
UPDATE1:
Found this http://www.endusersharepoint.com/STP/viewtopic.php?f=14&t=534 and tried it, however causes the following error- Failed setting processor stylesheet: 0x80004005: Argument 1 must return a node-set. -->count(/dsQueryResponse/Rows/Row='Y')<--
UPDATE2:
Complete is the name of a field within my XSLT dataset. The return type is either Y or blank. For grins I tried <xsl:value-of select="count(/xpath/to/parent/element[@Complete eq 'Y']) div count($Rows)" />; however, I received the following error: Failed setting processor stylesheet: 0x80004005: Expected token ']' found 'NAME'.count((/xpath/to/parent/element[@Complete -->eq <--'Y']) div count($Rows) I am starting to think that there may be a problem with 'eq'.... Referencing my XML operators...
UPDATE3:
<xsl:value-of select="count(/xpath/to/parent/element[@Complete = 'Y']) div count($Rows)" />
Okay, so it still says 0, but I think the reason it's not showing the correct answer is that it is expecting to show an integer, and obviously the value returned from the equation is going to be a decimal... I have been fiddling with the equation in XPath; here's what I've tried:
count(/xpath/to/parent/element[@Complete = 'Y']) div count($Rows)*100
(count(/xpath/to/parent/element[@Complete = 'Y']) div count($Rows))*100
100(count(/xpath/to/parent/element[@Complete = 'Y']) div count($Rows))
UPDATE4:
I now know that my previous thought (that the correct number was not showing because it was a float) is incorrect, as all numbers in XPath and XSLT 1.0 are floating-point. Reference
UPDATE5:
Upon further investigation, I have found that the problem lies with the count(/xpath/to/parent/element[@Complete = 'Y']) part of my equation, as this is returning 0 instead of a value. [I know I have at least 3 'Y' values in my Complete column.]
UPDATE6:
<records*>
<record*>
<last_name></last_name>
<first_name></first_name>
<mi></mi>
<office_symbol></office_symbol>
<geo_location></geo_location>
<complete></complete>
<date_complete></date_complete>
<date_expires></date_expires>
<email></email>
<supervisor></supervisor>
</record*>
</records*>
*I don't know what these nodes are called, as my data comes from a database and not an XML file; I just made up record/records.
UPDATE7
Going back to my original question. I am still trying to find out the XPath equation to display the number of parents (record in the XML i posted above) where the complete node = Y.
UPDATE8
OK. So I have edited and tested using http://www.w3schools.com/xsl/tryxslt.asp?xmlfile=cdcatalog&xsltfile=tryxsl_value-of. The working XSLT to count the number of Complete = Y is <xsl:value-of select="count(catalog/cd [complete = 'Y'])" />, so then I put EXACTLY what works on W3Schools into my SP Dataview and I get nothing... just an empty space. Why doesn't the code work in my SPDV?

If your "Complete" field is an element:
<xsl:value-of select="count(/xpath/to/complete/field/element[string(.) eq 'Yes'])"/>
If your "Complete" field is an attribute of an element:
<xsl:value-of select="count(/xpath/to/parent/element[@complete eq 'Yes'])"/>
Without knowing the structure of your XML I can't provide the specific XPath required; the predicate "[]" is what selects only the "Yes" values. Note that eq is an XPath 2.0 operator; an XSLT 1.0 processor needs = instead.
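The effect of such a predicate can be tried out quickly outside SharePoint. Below is a minimal sketch in Python using the limited XPath support in the standard library's ElementTree. The records/record/complete names follow the made-up structure from the question, and the four sample records are invented for illustration.

```python
import xml.etree.ElementTree as ET

XML = """
<records>
  <record><last_name>A</last_name><complete>Y</complete></record>
  <record><last_name>B</last_name><complete></complete></record>
  <record><last_name>C</last_name><complete>Y</complete></record>
  <record><last_name>D</last_name><complete>Y</complete></record>
</records>
"""

root = ET.fromstring(XML.strip())

# All <record> children of the root.
total = len(root.findall("record"))

# The predicate keeps only records whose <complete> child has the text 'Y'.
done = len(root.findall("record[complete='Y']"))

percent = 100 * done / total
print(done, total, percent)
```

The predicate is evaluated relative to each record, which is why the path leading up to it has to match the real structure of the data source for the count to be non-zero.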
