Use spark sql optimizer to optimize two ranges join

I am using the Spark SQL optimizer to optimize a join of two ranges. The rule replaces the join with the intersection of the two ranges, so that no join is needed:
test("SparkTest") {
  object RangeIntersectRule extends Rule[LogicalPlan] {
    override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
      case Join(Project(_, Range(start1, end1, _, _, _, _)), Project(_, Range(start2, end2, _, _, _, _)), _, _) => {
        val start = start1 max start2
        val end = end1 min end2
        if (start1 > end2 || end1 < start2) Range(0, 0, 1, Some(1), false) else Range(start, end, 1, Some(1), false)
      }
    }
  }
  val spark = SparkSession.builder().master("local").appName("SparkTest").enableHiveSupport().getOrCreate()
  spark.experimental.extraOptimizations = Seq(RangeIntersectRule)
  spark.range(10, 40).toDF("x").createOrReplaceTempView("t1")
  spark.range(20, 50).toDF("y").createOrReplaceTempView("t2")
  val df = spark.sql("select t1.x from t1 join t2 on t1.x = t2.y")
  df.explain(true)
  df.show(truncate = false)
}
But when I run it, an exception is thrown. Could someone help me find where the problem is? Thanks.
The optimized logical plan and physical plan are:
== Optimized Logical Plan ==
Project [x#2L]
+- !Project [id#0L AS x#2L]
   +- Range (20, 40, step=1, splits=Some(1))
== Physical Plan ==
Project [x#2L]
+- !Project [id#0L AS x#2L]
   +- Range (20, 40, step=1, splits=1)
The exception is:
Caused by: java.lang.RuntimeException: Couldn't find id#0L in [id#14L]
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:106)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:100)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:59)
    ... 47 more

object RangeIntersectRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    case Join(Range(start1, end1, 1, Some(1), output1, false), Range(start2, end2, 1, Some(1), output2, false), Inner, _) => {
      val start = start1 max start2
      val end = end1 min end2
      if (start1 > end2 || end1 < start2) Project(output1, Range(0, 0, 1, Some(1), output1, false))
      else Project(output1, Range(start, end, 1, Some(1), output1, false))
    }
  }
}
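The arithmetic behind the rewrite can be checked outside Spark. The following is a minimal Python sketch, not Catalyst code: `intersect_ranges` is a hypothetical helper, and it assumes step 1 and half-open ranges (end exclusive), which is how `spark.range(start, end)` behaves. Treating the result as empty whenever `start >= end` is an assumption of this sketch; it is slightly stricter than the `start1 > end2 || end1 < start2` test used in the rule.

```python
def intersect_ranges(start1, end1, start2, end2):
    """Intersect two half-open integer ranges [start, end) with step 1.

    Returns (start, end) of the intersection, or (0, 0) when it is empty.
    """
    start = max(start1, start2)
    end = min(end1, end2)
    if start >= end:
        # No overlap: represent the empty range as (0, 0), like Range(0, 0, ...)
        return (0, 0)
    return (start, end)


# The two ranges from the question: range(10, 40) joined with range(20, 50)
result = intersect_ranges(10, 40, 20, 50)
print(result)  # (20, 40)
```

The result matches the `Range (20, 40, ...)` node shown in the optimized plan above.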

Related

SQLAlchemy join onto an inline table

I have an inline defined table:
select(
    Values(column('key', String), column('value', String), column('ordering', Integer), name='subq')
    .data([(e.name, e.value, i) for i, e in enumerate(DurationType)])
)
This produces this SQL:
select key, value, ordering
from (values ("key1", "name1", 1), ("key2", "name2", 2) ...)
Which is a fine table.
Now I need to join this subquery to another one. The other query is:
self.session.query(self.model)
    .filter(self.model.duration != None)
    .with_entities(
        duration_bucket := case((self.model.duration < 60, DurationType.LESS_THAN_1.name),
            (and_(60 <= self.model.duration, self.model.duration < 60 * 5),
             DurationType.FROM_1_TO_5.name),
            (and_(60 * 5 <= self.model.duration,
                  self.model.duration < 60 * 10), DurationType.FROM_5_TO_10.name),
            (and_(60 * 10 <= self.model.duration,
                  self.model.duration < 60 * 20), DurationType.FROM_10_TO_20.name),
            (and_(60 * 20 <= self.model.duration,
                  self.model.duration < 60 * 30), DurationType.FROM_20_TO_30.name),
            (60 * 30 <= self.model.duration, DurationType.MORE_THAN_30.name)
        ).label('id'))
    .group_by(duration_bucket)
    .having(count() > 0)
Apart from the case clause, this is just a select over an aggregation, which leaves me with a result of a single column, "id".
But for the life of me I can't figure out how to join it. My initial attempt looked like this:
inline_enum_table = select(
    Values(column('key', String), column('value', String), column('ordering', Integer), name="suqb")
    .data([(e.name, e.value, i) for i, e in enumerate(DurationType)])).subquery()
inline_enum_table = self.session.query(inline_enum_table)
return (
    self.session.query(self.model)
    .filter(self.model.duration != None)
    .with_entities(
        duration_bucket := case((self.model.duration < 60, DurationType.LESS_THAN_1.name),
            (and_(60 <= self.model.duration, self.model.duration < 60 * 5),
             DurationType.FROM_1_TO_5.name),
            (and_(60 * 5 <= self.model.duration,
                  self.model.duration < 60 * 10), DurationType.FROM_5_TO_10.name),
            (and_(60 * 10 <= self.model.duration,
                  self.model.duration < 60 * 20), DurationType.FROM_10_TO_20.name),
            (and_(60 * 20 <= self.model.duration,
                  self.model.duration < 60 * 30), DurationType.FROM_20_TO_30.name),
            (60 * 30 <= self.model.duration, DurationType.MORE_THAN_30.name)
        ).label('id'))
    .group_by(duration_bucket)
    .having(count() > 0)
    .join(duration_enum := inline_enum_table.label('qwe'), duration_enum.key == duration_bucket)
)
This particular attempt results in sqlalchemy.exc.ArgumentError: Expected mapped entity or selectable/table as join target
I've had many more tries with all sorts of errors.

I got the following to work. select().subquery() is what helps SQLAlchemy work with such objects as part of a FROM clause.
from sqlalchemy import Column, column, create_engine, Integer, join, select, String, values
from sqlalchemy.orm import declarative_base, Session
engine = create_engine("postgresql://scott:tiger@192.168.0.199/test")
Base = declarative_base()
class Thing(Base):
    __tablename__ = "thing"
    id = Column(Integer, primary_key=True, autoincrement=False)
    key = Column(String(50))
    def __repr__(self):
        return f"Thing(id={repr(self.id)}, key={repr(self.key)})"
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
with Session(engine) as sess, sess.begin():
    sess.add_all(
        [
            Thing(id=1, key="key1"),
            Thing(id=2, key="key2"),
        ]
    )
inline_enum_table = select(
    values(
        column("key", String),
        column("value", String),
        column("ordering", Integer),
        name="inline_enums",
        literal_binds=True,
    ).data(
        [
            ("key1", "name1", 1),
            ("key2", "name2", 2),
        ]
    )
).subquery()
with Session(engine) as sess:
    query = select(Thing, inline_enum_table.c.value).select_from(
        join(Thing, inline_enum_table, Thing.key == inline_enum_table.c.key)
    )
    print(query)
    """
    SELECT thing.id, thing.key, anon_1.value
    FROM thing JOIN (SELECT inline_enums.key AS key, inline_enums.value AS value, inline_enums.ordering AS ordering
    FROM (VALUES ('key1', 'name1', 1), ('key2', 'name2', 2)) AS inline_enums (key, value, ordering)) AS anon_1 ON thing.key = anon_1.key
    """
    results = sess.execute(query).all()
    print(results)
    # [(Thing(id=1, key='key1'), 'name1'), (Thing(id=2, key='key2'), 'name2')]
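To see the shape of the generated SQL without a PostgreSQL server, the same kind of join can be tried with the standard-library `sqlite3` module. This is only a sketch under assumptions: it swaps in raw SQL on an in-memory SQLite database instead of SQLAlchemy, it names the inline table's columns through a CTE rather than the `AS inline_enums (key, value, ordering)` alias used above, and the column names `thing_key`, `enum_key` and `enum_value` are made up for the illustration.

```python
import sqlite3

# In-memory database standing in for the real one (assumption of this sketch)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (id INTEGER PRIMARY KEY, thing_key TEXT)")
conn.executemany(
    "INSERT INTO thing (id, thing_key) VALUES (?, ?)",
    [(1, "key1"), (2, "key2")],
)

# The inline VALUES table gets its column names from the CTE definition,
# then is joined like any other table.
rows = conn.execute(
    """
    WITH inline_enums (enum_key, enum_value, ordering) AS (
        VALUES ('key1', 'name1', 1), ('key2', 'name2', 2)
    )
    SELECT thing.id, thing.thing_key, inline_enums.enum_value
    FROM thing
    JOIN inline_enums ON thing.thing_key = inline_enums.enum_key
    ORDER BY inline_enums.ordering
    """
).fetchall()
print(rows)  # [(1, 'key1', 'name1'), (2, 'key2', 'name2')]
conn.close()
```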

Error in setting parameter value in pyomo

I have the following optimization model with two sets that act as indexes:
import pyomo.environ as pyo
# Define the model
model = pyo.AbstractModel()
# Define the set of VPP users
model.VPP_users = pyo.Set()
# Define the set of timesteps
model.timesteps = pyo.Set()
# Define the inputs of the model
model.DR_signal = pyo.Param(within = pyo.NonNegativeReals)
model.power_contract = pyo.Param(model.VPP_users, model.timesteps, within = pyo.NonNegativeReals)
model.HVAC_flex_available = pyo.Param(model.VPP_users, model.timesteps, within = pyo.NonNegativeReals)
model.DHW_flex_available = pyo.Param(model.VPP_users, model.timesteps, within = pyo.NonNegativeReals)
# Define the decision variables of the model
model.HVAC_flex = pyo.Var(model.VPP_users, model.timesteps, within = pyo.NonNegativeReals)
model.DHW_flex = pyo.Var(model.VPP_users, model.timesteps, within = pyo.NonNegativeReals)
# Define the constraints of the model
def DRSignalRule(model, i, t):
    return model.HVAC_flex[i, t] + model.DHW_flex[i, t] <= model.DR_signal
model.cons1 = pyo.Constraint(model.VPP_users, model.timesteps, rule = DRSignalRule)
def PowerContractedRule(model, i, t):
    return model.HVAC_flex[i, t] + model.DHW_flex[i, t] <= model.power_contract[i, t]
model.cons2 = pyo.Constraint(model.VPP_users, model.timesteps, rule = PowerContractedRule)
def HVACFlexRule(model, i, t):
    return model.HVAC_flex[i, t] <= model.HVAC_flex_available[i, t]
model.cons3 = pyo.Constraint(model.VPP_users, model.timesteps, rule = HVACFlexRule)
def DHWFlexRule(model, i, t):
    return model.DHW_flex[i, t] <= model.DHW_flex_available[i, t]
model.cons4 = pyo.Constraint(model.VPP_users, model.timesteps, rule = DHWFlexRule)
# Define the objective function
def ObjRule(model):
    return sum(model.HVAC_flex[i, t] + model.DHW_flex[i, t] for i in model.VPP_users for t in model.timesteps)
model.obj = pyo.Objective(rule = ObjRule, sense = pyo.maximize)
The data to solve my problem have the following form:
data = {None: {
    'VPP_users': {None: [1,2]},
    'timesteps': {None: [1,2]},
    'DR_signal': {None: 100},
    'power_contract': {(1, 1): 50, (1, 2): 50, (2, 1): 50, (2, 2): 50},
    'HVAC_flex_available': {(1, 1): 10, (1, 2): 10, (2, 1): 10, (2, 2): 10},
    'DHW_flex_available': {(1, 1): 40, (1, 2): 35, (2, 1): 40, (2, 2): 35},
}}
Finally, I solve the problem as follows:
instance = model.create_instance(data)
opt = pyo.SolverFactory('glpk')
opt.solve(instance)
However, I am getting the following error:
Failed to set value for param=power_contract, index=1, value=50.
source error message="Index '1' is not valid for indexed component 'power_contract'"
Any idea what I am doing wrong and how I can fix it?

ReportLab Python alignment issue

Style3 = TableStyle([('VALIGN',(0,0),(-1,-1),'TOP'),
                     ('ALIGN',(0,0),(-1,-1),'RIGHT'),
                     ('LEFTPADDING',(0,0),(-1,-1), 130),
                     ('RIGHTPADDING',(0,0),(-1,-1), 0),
                     ('TOPPADDING',(0,0),(-1,-1), 0),
                     ])
I want the response and categories labels to appear at the start of the line, but they show up at the end of the line.
Is there any special styling I need?
def getToolsTables(data,tname):
    doc = SimpleDocTemplate("somefilename.pdf",pagesize=A2, topMargin=25, bottomMargin=0)
    main_header = []
    h_arr1 = []
    h_arr1.append(tname)
    main_header.append(h_arr1)
    mainHeader = Table(main_header, colWidths='*')
    finalTable = []
    main_table_header_Style = TableStyle([
        ('BACKGROUND', (0, 0), (-1, 0), '#D3D3D3'),
        ('TEXTCOLOR',(0,0),(-1,-1),colors.black),
        ('ALIGN',(0,0),(-1,-1),'LEFT'),
        ('FONTSIZE', (0,0), (-1,-1), 12),
        ('FONTNAME', (0,0), (-1,-1),
            'Courier-Bold'
        ),
        ('TOPPADDING',(0,0),(-1,-1), 5),
        ('BOTTOMPADDING',(0,0),(-1,-1), 7),
        ('LINEBELOW',(0,0),(-1,0),1,colors.black)
    ])
    tools_table_header_Style = TableStyle([
        ('BACKGROUND', (0, 0), (-1, 0), 'lightblue'),
        ('TEXTCOLOR',(0,0),(-1,-1),colors.black),
        ('ALIGN',(0,0),(-1,-1),'LEFT'),
        ('FONTSIZE', (0,0), (-1,-1), 11),
        ('FONTNAME', (0,0), (-1,-1),
            'Courier-Bold'),
    ])
    tools_table_header_Style1 = TableStyle([
        ('BACKGROUND', (0, 0), (-1, 0), 'lightgreen'),
        ('TEXTCOLOR',(0,0),(-1,-1),colors.black),
        ('ALIGN',(0,0),(-1,-1),'CENTER'),
        ('FONTSIZE', (0,0), (-1,-1), 11),
        ('FONTNAME', (0,0), (-1,-1),
            'Courier-Bold'),
    ])
    Style2 = TableStyle([
        ('ALIGN',(0,0),(-1,-1),'LEFT'),
        ('LEFTPADDING',(0,0),(-1,-1), 0),
        ('RIGHTPADDING',(0,0),(-1,-1), 0),
        ('BOTTOMPADDING',(0,0),(-1,-1), -10),])
    Style3 = TableStyle([('VALIGN',(0,0),(-1,-1),'TOP'),
                         ('ALIGN',(0,0),(-1,-1),'RIGHT'),
                         ('LEFTPADDING',(0,0),(-1,-1), 130),
                         ('RIGHTPADDING',(0,0),(-1,-1), 0),
                         ('TOPPADDING',(0,0),(-1,-1), 0),
                         ])
    mainHeader.setStyle(main_table_header_Style) # adding style to table main header
    finalTable.append(mainHeader)
    # Create Tools Array
    tools_header = []
    tools_body = []
    # print(',,,,,,,,,,,,,,,,,,,,,,,,',data)
    if type(data) == dict:
        all_table = []
        for key, value in data.items() :
            temp = []
            temp_table = []
            temp.append(key)
            tool_table_header = Table([temp],colWidths='*')
            tool_table_header.setStyle(tools_table_header_Style)
            temp_table.append(tool_table_header)
            if key != 'Builtwith':
                t_h = []
                t_b = []
                for k,v in value.items():
                    t_h.append(k)
                    t_b.append(v)
                t_body = []
                # import pdb; pdb.set_trace()
                for index, item in enumerate(t_h):
                    if item != 'status':
                        arr1 = []
                        arr2 = []
                        if type(t_b[index]) is list:
                            temp_txt = ''
                            for txt in t_b[index]:
                                temp_txt += txt + ', '
                            arr1.append(item + ':')
                            text = t_b[index]
                            wraped_text = "\n".join(wrap(str(temp_txt[:-3]), 60)) # 60 is line width
                            arr1.append(wraped_text)
                        else:
                            arr1.append(item + ':')
                            text = t_b[index]
                            wraped_text = "\n".join(wrap(str(text), 60)) # 60 is line width
                            arr1.append(wraped_text)
                        arr2.append(arr1)
                        n_table =Table(arr2,[200,370])
                        t_body.append(n_table)
                tool_header = Table([temp_table], colWidths='*')
                tool_body = Table([[t_body]],[200,370])
                finalTable.append(Table([[[tool_header,tool_body]]], colWidths='*'))
            else:
                for key,val in value.items():
                    temp1 = []
                    temp_table1 = []
                    temp1.append(key)
                    tool_table_header = Table([temp1],colWidths='*')
                    tool_table_header.setStyle(tools_table_header_Style1)
                    temp_table1.append(tool_table_header)
                    print('kkkkkkkk')
                    t_h = []
                    t_b = []
                    for k,v in val.items():
                        t_h.append(k)
                        t_b.append(v)
                    t_body = []
                    for index, item in enumerate(t_h):
                        if item != 'status':
                            arr1 = []
                            arr2 = []
                            if type(t_b[index]) is list:
                                temp_txt = ''
                                for txt in t_b[index]:
                                    temp_txt += txt + ', '
                                arr1.append(item + ':')
                                text = t_b[index]
                                wraped_text = "\n".join(wrap(str(temp_txt[:-3]), 60)) # 60 is line width
                                arr1.append(wraped_text)
                            else:
                                arr1.append(item + ':')
                                text = t_b[index]
                                wraped_text = "\n".join(wrap(str(text), 80)) # 60 is line width
                                arr1.append(wraped_text)
                            arr2.append(arr1)
                            n_table =Table(arr2,[200,370])
                            t_body.append(n_table)
                    tool_header = Table([temp_table1], colWidths='*')
                    tool_header.setStyle(Style3)
                    tool_body = Table([[t_body]],[200,370])
                    table = Table([[[temp_table,tool_header,tool_body]]],colWidths='*')
                    finalTable.append(table)
    else:
        # finalTable.append(Table([['Tools Scan in progress...']], colWidths='*'))
        finalTable.append(Table([[data]], colWidths='*'))
    return finalTable

Recursively return if the number of lowercase letters in a string is even

I'm trying to write a recursive function that returns True if the number of lowercase letters in a string is even. This is what I have so far:
def is_number_of_lowercase_even(s,low,high):
    if (low==high):
        return False
    if low<high:
        left = s[low].islower()
        return left and not is_number_of_lowercase_even(s,low+1,high)
I have to stick to the function definition above. Not sure what I'm doing wrong.

Something like this? Divide the range in half. If both halves are even or both halves are odd then the total is even.
def f(s, low, high):
    if low == high:
        return not s[low].islower()
    mid = low + (high - low) // 2
    left = f(s, low, mid)
    right = f(s, mid + 1, high)
    return (left and right) or (not left and not right)
strs = [
    "Abc",
    "ABc",
    "asdf",
    "aSdF",
    "ASdf",
    "AsDF"
]
for s in strs:
    print(s, f(s, 0, len(s) - 1))
"""
('Abc', True)
('ABc', False)
('asdf', True)
('aSdF', True)
('ASdf', True)
('AsDF', False)
"""

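A linear variant that keeps the function name and signature from the question is also possible. This is a sketch, assuming `high` is an inclusive index (the function is called with `len(s) - 1`, as in the answer above) and that an empty range counts as even because it contains zero lowercase letters. Each lowercase letter flips the parity of the rest of the string, which is what the `!=` comparison expresses.

```python
def is_number_of_lowercase_even(s, low, high):
    # Empty range: zero lowercase letters, and zero is even.
    if low > high:
        return True
    rest_is_even = is_number_of_lowercase_even(s, low + 1, high)
    # A lowercase letter at s[low] flips the parity of the remaining range;
    # any other character leaves it unchanged.
    return rest_is_even != s[low].islower()


for s in ["Abc", "ABc", "asdf", "aSdF", "ASdf", "AsDF"]:
    print(s, is_number_of_lowercase_even(s, 0, len(s) - 1))
```

The attempt in the question combines the two values with `left and not ...`, which returns False whenever `s[low]` is not lowercase, regardless of the rest of the string.
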
Points in a rectangle All

I am trying to return a boolean True if every point in a list of points falls within a rectangle.
I've tried to run the code below in JupyterLab, but I keep getting the following error:
TypeError: '>=' not supported between instances of 'tuple' and 'int'
def allIn(firstCorner=(0,0), secondCorner=(0,0), pointList=[]):
    fc1,sc1=firstCorner[0],firstCorner[1]
    fc2,sc2=secondCorner[0],secondCorner[1]
    fc,sc=pointList[0],pointList[1]
    if (fc >= fc1 and fc <= fc2 and sc >= sc1 and sc <= sc2) :
        return True
    elif(fc >= fc2 and fc <= fc1 and sc >= sc2 and sc <= sc1):
        return True
    else:
        return False
print(allIn((0,0), (5,5), [(1,1), (0,0), (5,5)]))
I expect allIn((0,0), (5,5), [(1,1), (0,0), (5,5)]) to return True, but allIn((0,0), (5,5), [(1,1), (0,0), (5,6)]) should return False,
and an empty list of points, allIn((0,0), (5,5), []), should return False.

Your pointList is a list of tuples. You have set
fc,sc=pointList[0],pointList[1]
So fc and sc are tuples. When you do
if (fc >= fc1 and fc <= fc2 and sc >= sc1 and sc <= sc2) :
You are comparing fc (a tuple) to fc1 (an int), which will throw a TypeError. To make the correct comparison look at pointList[0][0], pointList[0][1], pointList[1][0], etc.

Look at individual points inside your list of points.
def allIn(firstCorner=(0,0), secondCorner=(0,0), pointList=[]):
    fc1,sc1=firstCorner[0],firstCorner[1]
    fc2,sc2=secondCorner[0],secondCorner[1]
    inside = False
    for point in pointList:
        fc,sc=point[0],point[1]
        if (fc >= fc1 and fc <= fc2 and sc >= sc1 and sc <= sc2) :
            inside = True
        elif(fc >= fc2 and fc <= fc1 and sc >= sc2 and sc <= sc1):
            inside = True
        else:
            return False
    return inside
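The same check can be written more compactly by sorting the corner coordinates first. This is a sketch rather than a drop-in replacement: `all_in` is a hypothetical name, and unlike the loop above it accepts any pair of opposite corners (for example upper-left and lower-right), which goes beyond the two orderings the original code handles. An empty list still returns False, as the question requires.

```python
def all_in(first_corner, second_corner, point_list):
    """Return True if point_list is non-empty and every point lies inside
    (or on the border of) the rectangle spanned by the two corners."""
    if not point_list:
        return False
    # Sort each coordinate pair so the order of the corners does not matter
    x_lo, x_hi = sorted((first_corner[0], second_corner[0]))
    y_lo, y_hi = sorted((first_corner[1], second_corner[1]))
    return all(x_lo <= x <= x_hi and y_lo <= y <= y_hi for x, y in point_list)


print(all_in((0, 0), (5, 5), [(1, 1), (0, 0), (5, 5)]))  # True
print(all_in((0, 0), (5, 5), [(1, 1), (0, 0), (5, 6)]))  # False
print(all_in((0, 0), (5, 5), []))                        # False
```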
