ArangoDB AQL: Find Gaps In Sequential Data - arangodb

I've been given data to build an application that has sequential data in the form of part numbers of products: "000000", "000001", "000002", "000010", "000011" .... The previous application was an old MS Access database that didn't have any gap filling features in the part number generator, hence the gap between "000002" and "000010" (Yes, they are also strings, but I can work with that...).
We could continue to increment based on the last value and ignore the gaps, however, in an attempt to use all numbers available to us with our naming scheme, we'd like to be able to fill the gaps. Our naming scheme describes the "product family" with the first two digits such that: [00]0000 would be a different family from [02]0000.
I can find the starting and ending values using something like:
let query = `
LET first = (
MIN(
FOR part in part_search
SEARCH STARTS_WITH(part.PartNumber, #family)
RETURN part.PartNumber
)
)
LET last = (
MAX(
FOR part in part_search
SEARCH STARTS_WITH(part.PartNumber, #family)
RETURN part.PartNumber
)
)
RETURN { first, last }
`
The above example returns: {first: "000000", last: "000915"}
Using ArangoDB and AQL, how could I go about finding these gaps? I've found some SQL examples but I feel the features of AQL are a bit more limiting.
Thanks in advance!

To start with, I think your best bet for getting min/max values is using aggregates:
FOR part in part_search
SEARCH STARTS_WITH(part.PartNumber, #family)
COLLECT x = 1
AGGREGATE first = MIN(part.PartNumber), last = MAX(part.PartNumber)
RETURN {
first: first,
last: last
}
But that won't really help when trying to find gaps. And you're right - SQL has several logical constructs that could help (like using variables and cursor iteration), but even that would be a pattern I would discourage.
The better path might be to do a "brute force" approach - compare a table containing your existing numbers with a table of all numbers, using a native method like JOIN to find the difference. Here's how you might do that in AQL:
LET allNumbers = 0..9999
LET existingParts = (
FOR part in part_search
SEARCH STARTS_WITH(part.PartNumber, #family)
LET childId = RIGHT(part.PartNumber, 4)
RETURN TO_NUMBER(childId)
)
RETURN MINUS(allNumbers, existingParts)
The x..y construct creates a sequence (an array of numbers), which we use as the full set of possible numbers. Then, we want to return only the "non-family" part of the ID (I'm calling it "child"), which needs to be numeric to compare with the previous set. Then, we use MINUS to remove elements of existingParts from the allNumbers list.
One thing to note, that query would return only the "child" portion of the part number, so you would have to join it back to the family number later. Alternatively, you could also skip string-splitting, and get "fancy" with your list creation:
LET allNumbers = TO_NUMBER(CONCAT(#family, '0000'))..TO_NUMBER(CONCAT(#family, '9999'))
LET existingParts = (
FOR part in part_search
SEARCH STARTS_WITH(part.PartNumber, #family)
RETURN TO_NUMBER(part.PartNumber)
)
RETURN MINUS(allNumbers, existingParts)

Related

Python: (partial) matching elements of a list to DataFrame columns, returning entry of a different column

I am a beginner in python and have encountered the following problem: I have a long list of strings (I took 3 now for the example):
ENSEMBL_IDs = ['ENSG00000040608',
'ENSG00000070371',
'ENSG00000070413']
which are partial matches of the data in column 0 of my DataFrame genes_df (first 3 entries shown):
genes_list = (['ENSG00000040608.28', 'RTN4R'],
['ENSG00000070371.91', 'CLTCL1'],
['ENSG00000070413.17', 'DGCR2'])
genes_df = pd.DataFrame(genes_list)
The task I want to perform is conceptually not that difficult: I want to compare each element of ENSEMBL_IDs to genes_df.iloc[:,0] (which are partial matches: each element of ENSEMBL_IDs is contained within column 0 of genes_df, as outlined above). If the element of EMSEMBL_IDs matches the element in genes_df.iloc[:,0] (which it does, apart from the extra numbers after the period ".XX" ), I want to return the "corresponding" value that is stored in the first column of the genes_df Dataframe: the actual gene name, 'RTN4R' as an example.
I want to store these in a list. So, in the end, I would be left with a list like follows:
`genenames = ['RTN4R', 'CLTCL1', 'DGCR2']`
Some info that might be helpful: all of the entries in ENSEMBL_IDs are unique, and all of them are for sure contained in column 0 of genes_df.
I think I am looking for something along the lines of:
`genenames = []
for i in ENSEMBL_IDs:
if i in genes_df.iloc[:,0]:
genenames.append(# corresponding value in genes_df.iloc[:,1])`
I am sorry if the question has been asked before; I kept looking and was not able to find a solution that was applicable to my problem.
Thank you for your help!
Thanks also for the edit, English is not my first language, so the improvements were insightful.
You can get rid of the part after the dot (with str.extract or str.replace) before matching the values with isin:
m = genes_df[0].str.extract('([^.]+)', expand=False).isin(ENSEMBL_IDs)
# or
m = genes_df[0].str.replace('\..*$', '', regex=True).isin(ENSEMBL_IDs)
out = genes_df.loc[m, 1].tolist()
Or use a regex with str.match:
pattern = '|'.join(ENSEMBL_IDs)
m = genes_df[0].str.match(pattern)
out = genes_df.loc[m, 1].tolist()
Output: ['RTN4R', 'CLTCL1', 'DGCR2']

Remove a map item in terraform

If I use the following yamldecode, I get the following output:
output "product" {
value = distinct(yamldecode(file("resource/lab/abc.yaml"))["account_list"]["resource_tags"][*]["TAG:product"])
}
Output:
+ product = [
+ "fargate",
+ "CRM",
]
I want fargate to be removed from my output and the expected output is this:
+ product = [
+ "CRM"
]
Please let me know how I can do this.
output "product" {
value = compact([for x in distinct(yamldecode(file("resource/lab/abc.yaml"))["account_list"]["resource_tags"][*]["TAG:product"]) : x == "fargate" ? "" : x])
}
I get this output:
test = [
"enter",
]
compact function solved the problem.
The Terraform language is based on functional principles rather than imperative principles and so it does not support directly modifying an existing data structure. Instead, the best way to think about goals like this is how to define a new value which differs from your existing value in some particular way.
For collection and structural types, the most common way to derive a new value with different elements is to use a for expression, which describes a rule for creating a new collection based on the elements of an existing collection. For each element in the source collection we can decide whether to use it in the result at all, and then if we do decide to use it we can also describe how to calculate a new element value based on the input element value.
In your case you seem to want to create a new list with fewer elements than an existing list. (Your question title mentions maps, but the question text shows a tuple which is behaving as a list.)
To produce a new collection with fewer elements we use the optional if clause of a for expression. For example:
[for x in var.example : x if x != "fargate"]
In the above expression, x refers to each element of var.example in turn. Terraform first evaluates the if clause, and expects it to return true if the element should be used. If so, it will then evaluate the expression immediately after the colon, which in this case is just x and so the result element is identical to the input element.
Your example expression also includes some unrelated work to decode a YAML string and then traverse to the list inside it. That expression replaces var.example in the above expression, as follows:
output "products" {
value = toset([
for x in yamldecode(file("${path.module}/resource/lab/abc.yaml"))["account_list"]["resource_tags"][*]["TAG:product"]: x
if x != "fargate"
])
}
I made some other small changes to the above compared to your example:
I named the output value "products" instead of "product", because it's returning a collection and it's Terraform language idiom to use plural names for collection values.
I wrapped the expression in toset instead of distinct, because from your description it seems like this is an unordered collection of unique strings rather than an ordered sequence of strings.
I added path.module to the front of the file path so that this will look for the file in the current module directory. If your output value is currently in a root module then this doesn't really matter because the module directory will always be the current working directory, but it's good practice to include this so that it's clear that the file belongs to the module.
Therefore returning a set is more appropriate than returning a list because it communicates to the caller of the module that they may not rely on the order of these items, and therefore allows the order to potentially change in future without it being a breaking change to your module.

Filter the list generated by a Gremlin traversal and Groovy

I'm doing the following traversal:
g.V().has('Transfer','eventName','Airdrop').as('t1').
outE('sent_to').
inV().dedup().as('a2').
inE('sent_from').
outV().as('t2').
where('t1',eq('t2')).by('address').
outE('sent_to').
inV().as('a3').
select('a3','a2').
by('accountId').toList().groupBy { it.a3 }.collectEntries { [(it.key): [a2 : it.value.a2]]};
So as you can see I'm basically doing a traversal and at the end I'm using groovy with collectEntries to aggregate the results like I need them, which is aggregated by a3 in this case. The results look like this:
==>0xfe43502662ce2adf86d9d49f25a27d65c70a709d={a2=[0x99feb505a8ed9976cf19e757a9536117e6cdc5ba, 0x22019ad32ea3adabae68003bdefd099d7e5e3886]}
(This is GOOD, because the number of values in a2 is at least 2)
==>0x129e0131ea3cc16fe5252d7280bd1258f629f20f={a2=[0xf7958fad496d15cf9fd9e54c0012504f4fdb96ff]}
(This is NOT GOOD, I want to return in my list only those combinations where there are at lest 2 values for a2)
I have tried using filters and an additional where step in the traversal itself but I haven't been able to do it. I'm not sure if this is something I should skip using Groovy in my last line. Any help or orientation would be very much appreciated
I don't think you need to drop into Groovy to get the answer you want. It would be preferable to do this all in Gremlin especially since you intend to filter results which could yield some performance benefit. Gremlin has it's own group() step as well as methods for filtering the resulting Map:
g.V().has('Transfer','eventName','Airdrop').as('t1').
out('sent_to').
dedup().as('a2').
in('sent_from').as('t2').
where('t1',eq('t2')).by('address').
out('sent_to').inV().as('a3').
select('a3','a2').
by('accountId').
group().
by('a3').
by('a2').
unfold().
where(select(values).limit(local,2).count(local).is(gte(2)))
The idea is to build your Map with group() then deconstruct it to entries with unfold(). You the filter each entry with where() by selecting the values of the entry, which is a List of "a2" then counting the items locally in that List. I use limit(local,2) to avoid unnecessary iteration beyond 2 since the filter is gte(2).
The easiest way to do this is with findAll { }.
.groupBy { it.a3 }
.findAll { it.value.a2.size() > 1 }
.collectEntries { [(it.key): [a2: it.value.a2]] }
if some a2 are null, then value.a2 also evaluates to null and filters the results without the need for explicit nullchecks

Calling a vector from a string

I am attempting to write an algorithm that selects a specific reference standard (vector) as a function of temperature. The temperature values are stored in a structure ( procspectra(i).temperature ). My reference standards are stored in another structure ( standards.interp.zeroed.ClOxxx ) where xxx are numbers such as 200, 210, 220, etc. I have built the rounding construct and paste it below.
for i = 1:length(procspectra);
if mod(-procspectra(i).temperature,10) > mod(procspectra(i).temperature,10);
%if mod(-) > mod(+) round down, else round up
tempvector(i) = procspectra(i).temperature - mod(procspectra(i).temperature,10);
else
tempvector(i) = procspectra(i).temperature + mod(-procspectra(i).temperature,10);
end
clostd = strcat('standards.interp.zeroed.ClO',num2str(tempvector(i)));
end
This construct works well. Now, I have built a string which is identical to the name of the vector I want to invoke, but I'm uncertain how to actually call the vector given that this is encoded as a string. Ideally I want to do something within the for-loop like:
parameters(i).standards.ClOstandard = clostd
where I actually am assigning that parameter structure to be the same as the vector I have saved in the standards structure I have previously generated (and not just a string)
Could anyone help out?
Don't construct clostd like that (containing the full variable name), make it contain only the last field name instead:
clostd = ['ClO' num2str(tempvector(i))];
parameters(i).standards.ClOstandard = standards.interp.zeroed.(clostd);
This is the syntax of accessing a structure's field dynamically, using a string. So the following three are equivalent:
struc.Cl0123
struc.('Cl0123')
fieldn='Cl0123'; struc.(fieldn)

How to remove percent character from a string in Cognos?

I have a string field with mostly numeric values like 13.4, but some have 13.4%. I am trying to use the following expression to remove the % symbols and retain just the numeric values to convert the field to integer.
Here is what I have so far in the expression definition of Cognos 8 Report Studio:
IF(POSITION('%' IN [FIELD1]) = NULL) THEN
/*** this captures rows with valid data **/
([FIELD1])
ELSE
/** trying to remove the % sign from rows with data like this 13.4% **/
(SUBSTRING([FIELD1]), 1, POSITION('%' IN [FIELD1])))
Any hints/help is much appreciated.
An easy way to do this is to use the trim() function. The following will remove any trailing % characters:
TRIM(trailing '%',[FIELD1])
The approach you are using is feasable. However, the syntax you are using is not compatible with the version of the ReportStudio that I'm familiar with. Below you will find an updated expression which works for me.
IF ( POSITION( '%'; [FIELD1]) = 0) THEN
( [FIELD1] )
ELSE
( SUBSTRING( [FIELD1]; 1; POSITION( '%'; [FIELD1]) - 1 ) )
Since character positions in strings are 1-based in Cognos it's important to substract 1 from the position returned by POSITION(). Otherwise you would only cut off characters after the percent sign.
Another note: what you are doing here is data cleansing. It's usually more advantageous to push these chores down to a lower level of the data retrieval chain, e.g. the Data Warehouse or at least the Framework Manager model, so that at the reporting level you can use this field as numeric field directly.

Resources