I have a host that executes commands via subprocess and returns the output as a list of parameters. The problem is that this output cannot easily be converted into a dictionary, whether via YAML or JSON. After the list is received, a regexp is used to match the valuable information and perform grouping. I want to end up with a single dictionary in which repeating keys are placed into nested dictionaries.
Here is the code and the example of output list:
from re import compile, match
# Output can differ from request to request; the "keys" from
# list_of_values can duplicate or appear more than two times. The values
# mapped to the keys can differ too.
list_of_values = [
"paramId: '11'", "valueId*: '11'",
"elementId: '010_541'", 'mappingType: Both',
"startRng: ''", "finishRng: ''",
'DbType: sql', "activeSt: 'false'",
'profile: TestPr1', "specificHost: ''",
'hostGroup: tstGroup10', 'balance: all',
"paramId: '194'", "valueId*: '194'",
"elementId: '010_541'", 'mappingType: Both',
"startRng: '1020304050'", "finishRng: '1020304050'",
'DbType: sql', "activeSt: 'true'",
'profile: TestPr1', "specificHost: ''",
'hostGroup: tstGroup10', 'balance: all']
re_compile_valueId = compile(
    r"valueId\*:\s.(?P<valueId>\d{1,5})"
    r"|elementId:\s.(?P<elementId>\d{3}_\d{2,3})"
    r"|startRng:\s.(?P<startRng>\d{1,10})"
    r"|finishRng:\s.(?P<finishRng>\d{1,10})"
    r"|DbType:\s(?P<DbType>nosql|sql)"
    r"|activeSt:\s.(?P<activeSt>true|false)"
    r"|profile:\s(?P<profile>[A-Za-z0-9]+)"
    r"|hostGroup:\s(?P<hostGroup>[A-Za-z0-9]+)"
    r"|balance:\s(?P<balance>none|all|priority group)"
)
iterator_loop = 0
uniq_dict = dict()
next_dict = dict()
for element in list_of_values:
    match_result = match(re_compile_valueId, element)
    if match_result:
        temp_dict = match_result.groupdict()
        for key, value in temp_dict.items():
            if value:
                if key == 'valueId':
                    uniq_dict['valueId' + str(iterator_loop)] = ''
                    iterator_loop += 1
                    next_dict.update({key: value})
                else:
                    next_dict.update({key: value})
                    uniq_dict['valueId' + str(iterator_loop - 1)] = next_dict
print(uniq_dict)
This code responds with:
{
'valueId0':
{
'valueId': '194',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'true',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '1020304050',
'finishRng': '1020304050'
},
'valueId1':
{
'valueId': '194',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'true',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '1020304050',
'finishRng': '1020304050'
}
}
And I was expecting something like:
{
'valueId0':
{
'valueId': '11',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'false',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '',
'finishRng': ''
},
'valueId1':
{
'valueId': '194',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'true',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '1020304050',
'finishRng': '1020304050'
}
}
I've also got another piece of code below, which runs and assigns the values as expected. But its structure defeats the purpose of looping over the results, because every key in the resulting dictionary has its own order number appended. The example is below; list_of_values and re_compile_valueId are the same as in the previous example.
iterator_loop = 1  # assumed starting value (not shown in the original); the printed keys start at 1
uniq_dict = dict()
for element in list_of_values:
    match_result = match(re_compile_valueId, element)
    if match_result:
        temp_dict = match_result.groupdict()
        for key, value in temp_dict.items():
            if value:
                if key == 'balance':
                    key = key + str(iterator_loop)
                    uniq_dict.update({key: value})
                    iterator_loop += 1
                else:
                    key = key + str(iterator_loop)
                    uniq_dict.update({key: value})
print(uniq_dict)
The output will look like:
{
'valueId1': '11', 'elementId1': '010_541',
'DbType1': 'sql', 'activeSt1': 'false',
'profile1': 'TestPr1', 'hostGroup1': 'tstGroup10',
'balance1': 'all', 'valueId2': '194',
'elementId2': '010_541', 'startRng2': '1020304050',
'finishRng2': '1020304050', 'DbType2': 'sql',
'activeSt2': 'true', 'profile2': 'TestPr1',
'hostGroup2': 'tstGroup10', 'balance2': 'all'
}
Would appreciate any help! Thanks!
It turned out that some documentation reading was needed :D
The fix is to apply copy() to next_dict in the else branch. Thanks to the thread:
Why does updating one dictionary object affect other?
Many thanks to the answer's author @thefourtheye (https://stackoverflow.com/users/1903116/thefourtheye).
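For reference, here is a minimal sketch of the aliasing behaviour behind the bug, using made-up values: assigning a dictionary stores a reference to it, so every outer key ends up pointing at the same inner object until copy() is used.
shared = dict()
outer = dict()
shared.update({'valueId': '11'})
outer['valueId0'] = shared           # stores a reference to shared, not a snapshot
shared.update({'valueId': '194'})    # this also changes what outer['valueId0'] shows
outer['valueId1'] = shared
print(outer)                         # {'valueId0': {'valueId': '194'}, 'valueId1': {'valueId': '194'}}
outer['valueId1'] = shared.copy()    # copy() stores an independent snapshot instead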
The final code:
for element in list_of_values:
    match_result = match(re_compile_valueId, element)
    if match_result:
        temp_dict = match_result.groupdict()
        for key, value in temp_dict.items():
            if value:
                if key == 'valueId':
                    uniq_dict['valueId' + str(iterator_loop)] = ''
                    iterator_loop += 1
                    next_dict.update({key: value})
                else:
                    next_dict.update({key: value})
                    uniq_dict['valueId' + str(iterator_loop - 1)] = next_dict.copy()
Thanks for the involvement to everyone.
I have been successfully retrieving data from the following linked open dataset: http://linkedpolitics.ops.few.vu.nl/web/html/home.html
for the 5th, 6th and 7th parliamentary terms of the EP, which I then clean in Stata.
However, the coding seems to differ for the 8th term, because I get far fewer speeches when I use the lpv:translatedText property that I have used before. I can't help but think that a lot more should come up in the timeframe I am specifying than what the SPARQL endpoint returns. Can anyone help me figure out what I am doing wrong?
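As a first sanity check, it may help to count how many speeches in that date range actually carry lpv:translatedText at all, compared with the original-language text. The query below is only a diagnostic sketch; it assumes the dataset also exposes lpv:spokenText and that the usual lpv, dcterms and xsd prefixes are configured on the endpoint:
SELECT ?p (COUNT(DISTINCT ?speech) AS ?speeches)
WHERE {
  ?speech ?p ?text.
  ?speech dcterms:date ?date.
  FILTER ( ?p IN (lpv:translatedText, lpv:spokenText) )
  FILTER ( ?date > "2014-07-01"^^xsd:date )
}
GROUP BY ?p
If lpv:translatedText returns far fewer speeches than lpv:spokenText for that range, the drop comes from missing English translations in the data rather than from the rest of the query.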
Here is the code I used for National parties (here with the dates for anything after the 7th term):
SELECT DISTINCT ?name ?countryname ?birth ?gender ?partyname ?start ?end ?date ?speechnr ?parlterm ?dictionary
WHERE {
?speech lpv:translatedText ?text.
?speech dcterms:date ?date.
?speech lpv:docno ?speechnr.
?speech lpv:speaker ?speaker.
?speaker lpv:name ?name.
?speaker lpv:dateOfBirth ?birth.
?speaker lpv:gender ?gender.
?speaker lpv:politicalFunction ?function.
?function lpv:institution ?party.
?party rdf:type lpv:NationalParty.
?party rdfs:label ?partyname.
?function lpv:beginning ?start.
?function lpv:end ?end.
?speaker lpv:countryOfRepresentation ?country.
?country rdfs:label ?countryname.
BIND("8" as ?parlterm)
BIND("representation" as ?dictionary)
FILTER ( ?date > "2014-07-01"^^xsd:date )
FILTER(langMatches(lang(?text), "en"))
FILTER(CONTAINS(?text, 'female representation') || CONTAINS(?text, 'women’s representation') || CONTAINS(?text, 'equal representation') || CONTAINS(?text, 'gender representation') || CONTAINS(?text, 'women in science') || CONTAINS(?text, 'women in business') || CONTAINS(?text, 'women’s leadership'))
} ORDER BY ?date ?speechnr
and here is the code I used for the FEMM committee (again anything after 7th parliamentary term):
SELECT DISTINCT ?name ?countryname ?birth ?gender ?start_com ?end_com ?date ?speechnr ?parlterm ?dictionary ?FEMM
WHERE {
?speech lpv:translatedText ?text.
?speech dcterms:date ?date.
?speech lpv:docno ?speechnr.
?speech lpv:speaker ?speaker.
?speaker lpv:name ?name.
?speaker lpv:dateOfBirth ?birth.
?speaker lpv:gender ?gender.
?speaker lpv:politicalFunction ?function.
?function lpv:institution ?institution.
?institution rdfs:label ?committee.
FILTER CONTAINS (?committee, "Committee on Women's Rights and Gender Equality")
BIND("Yes" as ?FEMM).
?function lpv:beginning ?start_com.
?function lpv:end ?end_com.
?speaker lpv:countryOfRepresentation ?country.
?country rdfs:label ?countryname.
BIND("8" as ?parlterm)
BIND("representation" as ?dictionary)
FILTER ( ?date > "2014-07-01"^^xsd:date )
FILTER(langMatches(lang(?text), "en"))
FILTER(CONTAINS(?text, 'female representation') || CONTAINS(?text, 'women’s representation') || CONTAINS(?text, 'equal representation') || CONTAINS(?text, 'gender representation') || CONTAINS(?text, 'women in science') || CONTAINS(?text, 'women in business') || CONTAINS(?text, 'women’s leadership'))
} ORDER BY ?date ?speechnr
Thank you.
I'm trying to convert HQL to Spark.
I have the following query (Works in Hue with Hive editor):
select reflect('java.util.UUID', 'randomUUID') as id,
tt.employee,
cast( from_unixtime(unix_timestamp (date_format(current_date(),'dd/MM/yyyy HH:mm:ss'), 'dd/MM/yyyy HH:mm:ss')) as timestamp) as insert_date,
collect_set(tt.employee_detail) as employee_details,
collect_set( tt.emp_indication ) as employees_indications,
named_struct ('employee_info', collect_set(tt.emp_info),
'employee_mod_info', collect_set(tt.emp_mod_info),
'employee_comments', collect_set(tt.emp_comment) )
as emp_mod_details,
from (
select views_ctr.employee,
if ( views_ctr.employee_details.so is not null, views_ctr.employee_details, null ) employee_detail,
if ( views_ctr.employee_info.so is not null, views_ctr.employee_info, null ) emp_info,
if ( views_ctr.employee_comments.so is not null, views_ctr.employee_comments, null ) emp_comment,
if ( views_ctr.employee_mod_info.so is not null, views_ctr.employee_mod_info, null ) emp_mod_info,
if ( views_ctr.emp_indications.so is not null, views_ctr.emp_indications, null ) employees_indication,
from
( select * from views_sta where emp_partition=0 and employee is not null ) views_ctr
) tt
group by employee
distribute by employee
First, I tried to write it with spark.sql as follows:
sparkSession.sql("select reflect('java.util.UUID', 'randomUUID') as id, tt.employee, cast( from_unixtime(unix_timestamp (date_format(current_date(),'dd/MM/yyyy HH:mm:ss'), 'dd/MM/yyyy HH:mm:ss')) as timestamp) as insert_date, collect_set(tt.employee_detail) as employee_details, collect_set( tt.emp_indication ) as employees_indications, named_struct ('employee_info', collect_set(tt.emp_info), 'employee_mod_info', collect_set(tt.emp_mod_info), 'employee_comments', collect_set(tt.emp_comment) ) as emp_mod_details, from ( select views_ctr.employee, if ( views_ctr.employee_details.so is not null, views_ctr.employee_details, null ) employee_detail, if ( views_ctr.employee_info.so is not null, views_ctr.employee_info, null ) emp_info, if ( views_ctr.employee_comments.so is not null, views_ctr.employee_comments, null ) emp_comment, if ( views_ctr.employee_mod_info.so is not null, views_ctr.employee_mod_info, null ) emp_mod_info, if ( views_ctr.emp_indications.so is not null, views_ctr.emp_indications, null ) employees_indication, from ( select * from views_sta where emp_partition=0 and employee is not null ) views_ctr ) tt group by employee distribute by employee")
But I got the following exception:
Exception in thread "main" org.apache.spark.SparkException: Job
aborted due to stage failure: Task not serializable:
java.io.NotSerializableException:
org.apache.spark.unsafe.types.UTF8String$IntWrapper
- object not serializable (class: org.apache.spark.unsafe.types.UTF8String$IntWrapper, value:
org.apache.spark.unsafe.types.UTF8String$IntWrapper@30cfd641)
If I run the query without the collect_set function it works. Could it be failing because of the struct column types in my table?
How can I write my HQL query in Spark / fix my exception?
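One way to narrow the problem down is to materialise the inner subquery as a temporary view and run the outer aggregation separately, so each part can be tested in isolation. This is only a debugging sketch, not a fix; it assumes views_sta is already registered as a table or view, and it keeps just one of the if(...) columns for brevity:
// Run the inner query on its own and register it as a view named "tt".
val inner = sparkSession.sql(
  """select views_ctr.employee,
    |       if (views_ctr.employee_details.so is not null, views_ctr.employee_details, null) as employee_detail
    |from (select * from views_sta where emp_partition = 0 and employee is not null) views_ctr""".stripMargin)
inner.createOrReplaceTempView("tt")

// Now run only the aggregation; if this step alone reproduces the
// NotSerializableException, the problem is in collect_set over the struct columns.
val aggregated = sparkSession.sql(
  "select tt.employee, collect_set(tt.employee_detail) as employee_details from tt group by tt.employee")
aggregated.show()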
I have the following graph:
Vertices and edges have been added like this:
def graph=ConfiguredGraphFactory.open('Baptiste');def g = graph.traversal();
graph.addVertex(label, 'Group', 'text', 'BNP Paribas');
graph.addVertex(label, 'Group', 'text', 'BNP PARIBAS');
graph.addVertex(label, 'Company', 'text', 'JP Morgan Chase');
graph.addVertex(label, 'Location', 'text', 'France');
graph.addVertex(label, 'Location', 'text', 'United States');
graph.addVertex(label, 'Location', 'text', 'Europe');
def v1 = g.V().has('text', 'JP Morgan Chase').next();def v2 = g.V().has('text', 'BNP Paribas').next();v1.addEdge('partOf',v2);
def v1 = g.V().has('text', 'JP Morgan Chase').next();def v2 = g.V().has('text', 'United States').next();v1.addEdge('doesBusinessIn',v2);
def v1 = g.V().has('text', 'BNP Paribas').next();def v2 = g.V().has('text', 'United States').next();v1.addEdge('doesBusinessIn',v2);
def v1 = g.V().has('text', 'BNP Paribas').next();def v2 = g.V().has('text', 'France').next();v1.addEdge('partOf',v2);
def v1 = g.V().has('text', 'BNP PARIBAS').next();def v2 = g.V().has('text', 'Europe').next();v1.addEdge('partOf',v2);
And I need a query that returns every possible path, given specific vertex labels, edge labels and a number of hops.
Let's say I need paths with a maximum of 2 hops and every label in this example. I tried this query:
def graph=ConfiguredGraphFactory.open('TestGraph');
def g = graph.traversal();
g.V().has(label, within('Location', 'Company', 'Group'))
.repeat(bothE().has(label, within('doesBusinessIn', 'partOf')).bothV().has(label, within('Location', 'Company', 'Group')).simplePath())
.emit().times(2).path();
This query returns 20 paths (it is supposed to return 10 paths), because it returns each path in both possible directions. Is there a way to specify that I need only one direction? I tried adding dedup() to my query, but it returns 7 paths instead of 10, so that doesn't work either.
Also, whenever I try to find paths with 4 hops, it doesn't return the "cyclic" paths such as France -> BNP Paribas -> United States -> JP Morgan Chase -> BNP Paribas. Any idea what to add to my query to allow returning those kinds of paths?
EDIT:
Thanks for your solution @DanielKuppitz. It seems to be exactly what I'm looking for.
I use JanusGraph built on top of Apache TinkerPop:
I tried the first query:
g.V().hasLabel('Location', 'Company', 'Group').
repeat(bothE('doesBusinessIn', 'partOf').otherV().simplePath()).
emit().times(2).
path().
dedup().
by(unfold().order().by(id).fold())
And it threw the following error:
Error: org.janusgraph.graphdb.relations.RelationIdentifier cannot be cast to java.lang.Comparable
So I moved the dedup() step into the repeat loop like so:
g.V().hasLabel('Location', 'Company', 'Group').
repeat(bothE('doesBusinessIn', 'partOf').otherV().simplePath().dedup().by(unfold().order().by(id).fold())).
emit().times(2).
path()
And it only returned 6 paths:
[
[
"JP Morgan Chase",
"doesBusinessIn",
"United States"
],
[
"JP Morgan Chase",
"partOf",
"BNP Paribas"
],
[
"JP Morgan Chase",
"partOf",
"BNP Paribas",
"partOf",
"France"
],
[
"Europe",
"partOf",
"BNP PARIBAS"
],
[
"BNP PARIBAS",
"partOf",
"Europe"
],
[
"United States",
"doesBusinessIn",
"JP Morgan Chase"
]
]
I'm not sure what's going on here... Any ideas?
Is there a way to specify that I need only 1 direction?
You kinda need a bidirected traversal, so you'll have to filter duplicated paths in the end ("duplicated" in this case means that 2 paths contain the same elements). In order to do that you can dedup() paths by a deterministic order of elements; the easiest way to do it is to order the elements by their id.
g.V().hasLabel('Location', 'Company', 'Group').
repeat(bothE('doesBusinessIn', 'partOf').otherV().simplePath()).
emit().times(2).
path().
dedup().
by(unfold().order().by(id).fold())
Any idea what to add in my query to allow returning those kinds of paths (cyclic)?
Your query explicitly prevents cyclic paths through the simplePath() step, so it's not quite clear in which scenarios you want to allow them. I assume that you're okay with a cyclic path if the cycle is created by only the first and last element in the path. In this case, the query would look more like this:
g.V().hasLabel('Location', 'Company', 'Group').as('a').
repeat(bothE('doesBusinessIn', 'partOf').otherV()).
emit().
until(loops().is(4).or().cyclicPath()).
filter(simplePath().or().where(eq('a'))).
path().
dedup().
by(unfold().order().by(id).fold())
Below is the output of the 2 queries (ignore the extra map() step, it's just there to improve the output's readability).
gremlin> g.V().hasLabel('Location', 'Company', 'Group').
......1> repeat(bothE('doesBusinessIn', 'partOf').otherV().simplePath()).
......2> emit().times(2).
......3> path().
......4> dedup().
......5> by(unfold().order().by(id).fold()).
......6> map(unfold().coalesce(values('text'), label()).fold())
==>[BNP Paribas,doesBusinessIn,United States]
==>[BNP Paribas,partOf,France]
==>[BNP Paribas,partOf,JP Morgan Chase]
==>[BNP Paribas,doesBusinessIn,United States,doesBusinessIn,JP Morgan Chase]
==>[BNP Paribas,partOf,JP Morgan Chase,doesBusinessIn,United States]
==>[BNP PARIBAS,partOf,Europe]
==>[JP Morgan Chase,doesBusinessIn,United States]
==>[JP Morgan Chase,partOf,BNP Paribas,doesBusinessIn,United States]
==>[JP Morgan Chase,partOf,BNP Paribas,partOf,France]
==>[France,partOf,BNP Paribas,doesBusinessIn,United States]
gremlin> g.V().hasLabel('Location', 'Company', 'Group').as('a').
......1> repeat(bothE('doesBusinessIn', 'partOf').otherV()).
......2> emit().
......3> until(loops().is(4).or().cyclicPath()).
......4> filter(simplePath().or().where(eq('a'))).
......5> path().
......6> dedup().
......7> by(unfold().order().by(id).fold()).
......8> map(unfold().coalesce(values('text'), label()).fold())
==>[BNP Paribas,doesBusinessIn,United States]
==>[BNP Paribas,partOf,France]
==>[BNP Paribas,partOf,JP Morgan Chase]
==>[BNP Paribas,doesBusinessIn,United States,doesBusinessIn,JP Morgan Chase]
==>[BNP Paribas,doesBusinessIn,United States,doesBusinessIn,BNP Paribas]
==>[BNP Paribas,partOf,France,partOf,BNP Paribas]
==>[BNP Paribas,partOf,JP Morgan Chase,doesBusinessIn,United States]
==>[BNP Paribas,partOf,JP Morgan Chase,partOf,BNP Paribas]
==>[BNP Paribas,doesBusinessIn,United States,doesBusinessIn,JP Morgan Chase,partOf,BNP Paribas]
==>[BNP PARIBAS,partOf,Europe]
==>[BNP PARIBAS,partOf,Europe,partOf,BNP PARIBAS]
==>[JP Morgan Chase,doesBusinessIn,United States]
==>[JP Morgan Chase,doesBusinessIn,United States,doesBusinessIn,JP Morgan Chase]
==>[JP Morgan Chase,partOf,BNP Paribas,doesBusinessIn,United States]
==>[JP Morgan Chase,partOf,BNP Paribas,partOf,France]
==>[JP Morgan Chase,partOf,BNP Paribas,partOf,JP Morgan Chase]
==>[JP Morgan Chase,doesBusinessIn,United States,doesBusinessIn,BNP Paribas,partOf,France]
==>[JP Morgan Chase,doesBusinessIn,United States,doesBusinessIn,BNP Paribas,partOf,JP Morgan Chase]
==>[France,partOf,BNP Paribas,doesBusinessIn,United States]
==>[France,partOf,BNP Paribas,partOf,France]
==>[France,partOf,BNP Paribas,partOf,JP Morgan Chase,doesBusinessIn,United States]
==>[United States,doesBusinessIn,JP Morgan Chase,doesBusinessIn,United States]
==>[United States,doesBusinessIn,BNP Paribas,doesBusinessIn,United States]
==>[United States,doesBusinessIn,JP Morgan Chase,partOf,BNP Paribas,doesBusinessIn,United States]
==>[Europe,partOf,BNP PARIBAS,partOf,Europe]
UPDATE (based on latest comments)
Since JanusGraph has non-comparable edge identifiers, you'll need a unique comparable property on all edges. This can be as simple as a random UUID.
This is how I updated your sample graph:
g.addV('Group').property('text', 'BNP Paribas').as('a').
addV('Group').property('text', 'BNP PARIBAS').as('b').
addV('Company').property('text', 'JP Morgan Chase').as('c').
addV('Location').property('text', 'France').as('d').
addV('Location').property('text', 'United States').as('e').
addV('Location').property('text', 'Europe').as('f').
addE('partOf').from('c').to('a').
property('uuid', UUID.randomUUID().toString()).
addE('doesBusinessIn').from('c').to('e').
property('uuid', UUID.randomUUID().toString()).
addE('doesBusinessIn').from('a').to('e').
property('uuid', UUID.randomUUID().toString()).
addE('partOf').from('a').to('d').
property('uuid', UUID.randomUUID().toString()).
addE('partOf').from('b').to('f').
property('uuid', UUID.randomUUID().toString()).
iterate()
Now that we have properties that can uniquely identify an edge, we also need unique properties (of the same data type) on all vertices. Luckily the existing text properties seem to be good enough for that (otherwise it would be the same story as with the edges - just add a random UUID). The updated queries now look like this:
g.V().hasLabel('Location', 'Company', 'Group').
repeat(bothE('doesBusinessIn', 'partOf').otherV().simplePath()).
emit().times(2).
path().
dedup().
by(unfold().values('text','uuid').order().fold())
g.V().hasLabel('Location', 'Company', 'Group').as('a').
repeat(bothE('doesBusinessIn', 'partOf').otherV()).
emit().
until(loops().is(4).or().cyclicPath()).
filter(simplePath().or().where(eq('a'))).
path().
dedup().
by(unfold().values('text','uuid').order().fold())
The results are, of course, the same as above.
I have the following Cassandra table structure:
CREATE TABLE ringostat.hits (
hitId uuid,
clientId VARCHAR,
session MAP<VARCHAR, TEXT>,
traffic MAP<VARCHAR, TEXT>,
PRIMARY KEY (hitId, clientId)
);
INSERT INTO ringostat.hits (hitId, clientId, session, traffic)
VALUES(550e8400-e29b-41d4-a716-446655440000, 'clientId', {'id': '1', 'number': '1', 'startTime': '1460023732', 'endTime': '1460023762'}, {'referralPath': '/example_path_for_example', 'campaign': '(not set)', 'source': 'www.google.com', 'medium': 'referal', 'keyword': '(not set)', 'adContent': '(not set)', 'campaignId': '', 'gclid': '', 'yclid': ''});
INSERT INTO ringostat.hits (hitId, clientId, session, traffic)
VALUES(650e8400-e29b-41d4-a716-446655440000, 'clientId', {'id': '1', 'number': '1', 'startTime': '1460023732', 'endTime': '1460023762'}, {'referralPath': '/example_path_for_example', 'campaign': '(not set)', 'source': 'www.google.com', 'medium': 'cpc', 'keyword': '(not set)', 'adContent': '(not set)', 'campaignId': '', 'gclid': '', 'yclid': ''});
INSERT INTO ringostat.hits (hitId, clientId, session, traffic)
VALUES(750e8400-e29b-41d4-a716-446655440000, 'clientId', {'id': '1', 'number': '1', 'startTime': '1460023732', 'endTime': '1460023762'}, {'referralPath': '/example_path_for_example', 'campaign': '(not set)', 'source': 'www.google.com', 'medium': 'referal', 'keyword': '(not set)', 'adContent': '(not set)', 'campaignId': '', 'gclid': '', 'yclid': ''});
I want to select all rows where source='www.google.com' AND medium='referal'.
SELECT * FROM hits WHERE traffic['source'] = 'www.google.com' AND traffic['medium'] = 'referal' ALLOW FILTERING;
Without adding ALLOW FILTERING I get the error: No supported secondary index found for the non primary key columns restrictions.
That's why I see the following options:
Create index on traffic column.
Create materialized view.
Create another table and set INDEX for traffic column.
Which is the best option? Also, I have many fields of MAP type on which I will need to filter. What issues could arise if I add an INDEX on every such field?
Thank You.
From When to use an index:
Do not use an index in these situations:
On high-cardinality columns because you then query a huge volume of records for a small number of results. [...] Conversely, creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense.
In tables that use a counter column
On a frequently updated or deleted column.
To look for a row in a large partition unless narrowly queried.
If your planned usage meets one or more of these criteria, it is probably better to use a materialized view.
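For the specific map query in the question, the index option would most likely take the form of an ENTRIES index on the traffic column. The sketch below is only illustrative (the index name is made up), and the caveats quoted above about frequently updated or low-cardinality columns still apply:
-- Index the key/value pairs of the traffic map (CQL ENTRIES index).
CREATE INDEX traffic_entries_idx ON ringostat.hits (ENTRIES(traffic));

-- A single indexed map-entry restriction no longer needs ALLOW FILTERING:
SELECT * FROM ringostat.hits WHERE traffic['source'] = 'www.google.com';

-- A second restriction still needs ALLOW FILTERING, but it only filters
-- within the rows already narrowed down by the index:
SELECT * FROM ringostat.hits
WHERE traffic['source'] = 'www.google.com' AND traffic['medium'] = 'referal'
ALLOW FILTERING;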