getCompiledSelect gives only the initial table in CodeIgniter 4

Hello guys, getCompiledSelect() in CodeIgniter 4 returns only
SELECT * FROM tablename, without any of the joins and wheres:
if (!empty($requestArr)) {
    $this->data['listings'] = $listingModel
        ->join('countries', 'listings.country_id=countries.country_id', 'LEFT')
        ->join('states', 'listings.state_id=states.state_id', 'LEFT')
        ->join('cities', 'listings.city_id=cities.city_id', 'LEFT')
        ->join('purposes', 'listings.purpose_id=purposes.purpose_id', 'LEFT')
        ->join('sub_purposes', 'listings.sub_purpose_id=sub_purposes.sub_purpose_id', 'LEFT')
        ->join('types', 'listings.type_id=types.type_id', 'LEFT')
        ->join('sub_types', 'listings.sub_type_id=sub_types.sub_type_id', 'LEFT')
        ->orLike('area', "%" . $requestArr['area'] . "%")
        ->orWhere('listings.city_id', $requestArr['city_id'])
        ->orWhere('listings.state_id', $requestArr['city_id'])
        ->where('land_area', $requestArr['pSize'])
        ->where('land_unit', $requestArr['area_unit'])
        ->where('bedrooms', $requestArr['bedrooms'])
        ->where('listings.purpose_id', $requestArr['purpose_id'])
        ->where('listings.type_id', $requestArr['type_id'])
        ->orderBy('listings.created_at', 'DESC')
        ->paginate(6);
    echo $listingModel->getCompiledSelect();
    echo json_encode($this->data['listings']);
}
I want to print the whole generated query, but it only gives the result shown in the screenshot below.

Related

EOF in multi-line string error in PySpark

I was running the following query in PySpark; the SQL query runs fine on Hive:
spark.sql(f"""
create table DEFAULT.TMP_TABLE as
select b.customer_id, prob
from
bi_prod.TMP_score_1st_day a,
(select customer_id, prob from BI_prod.Tbl_brand1_scoring where insert_date = 20230101
union all
select customer_id, prob from BI_prod.Tbl_brand2_scoring where insert_date = 20230101)  b
where a.customer_id = b.customer_id
""")
This produces the following error
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))
I need to fix this error but can't figure out why it is occurring.
I recommend rewriting the code in a more Pythonic way.
from pyspark.sql.functions import col

# a: the base table (the original query does not filter it by date)
df1 = spark.table('bi_prod.TMP_score_1st_day').select('customer_id')

# b: union of the two scoring tables, filtered on insert_date
df2 = (
    spark.table('bi_prod.Tbl_brand1_scoring')
    .unionByName(spark.table('bi_prod.Tbl_brand2_scoring'))
    .filter(col('insert_date') == 20230101)
    .select('customer_id', 'prob')
)

df = df1.join(df2, 'customer_id')
df.show(1, vertical=True)
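If you also need the create table part of the original statement, a minimal sketch, assuming you are allowed to write to the DEFAULT database (saveAsTable fails on an existing table unless you set a write mode):
# Table name taken from the original query; pick the mode that fits your pipeline.
df.write.mode('overwrite').saveAsTable('DEFAULT.TMP_TABLE')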
Let me know how this works for you and if you still get the same error.

Update a column in PySpark while doing multiple inner joins?

I have a SQL query which I am trying to convert into PySpark. In the SQL query, we join three tables and update a column where there's a match. The SQL query looks like this:
UPDATE [DEPARTMENT_DATA]
INNER JOIN ([COLLEGE_DATA]
INNER JOIN [STUDENT_TABLE]
ON COLLEGE_DATA.UNIQUEID = STUDENT_TABLE.PROFESSIONALID)
ON DEPARTMENT_DATA.PUBLICID = COLLEGE_DATA.COLLEGEID
SET STUDENT_TABLE.PRIVACY = "PRIVATE"
The logic I have tried:
df_STUDENT_TABLE = (
    df_STUDENT_TABLE.alias('a')
    .join(
        df_COLLEGE_DATA.alias('b'),
        on=F.col('a.PROFESSIONALID') == F.col('b.UNIQUEID'),
        how='left',
    )
    .join(
        df_DEPARTMENT_DATA.alias('c'),
        on=F.col('b.COLLEGEID') == F.col('c.PUBLICID'),
        how='left',
    )
    .select(
        *[F.col(f'a.{c}') for c in df_STUDENT_TABLE.columns],
        F.when(
            F.col('b.UNIQUEID').isNotNull() & F.col('c.PUBLICID').isNotNull(),
            F.lit('PRIVATE')
        ).alias('PRIVACY')
    )
)
This code adds a new column "PRIVACY", but it contains null values after running.
I have taken some sample data, and when I apply the join conditions, the following is the result I get (the requirement is that the matching record's privacy needs to be set to PRIVATE):
%sql
select student.*, college.*, department.*
from department
inner join college on department.public_id = college.college_id
inner join student on college.unique_id = student.professional_id
When I used your code (same logic), I got the same output, i.e., an additional column added to the dataframe with the required values, while the actual privacy column still has nulls.
from pyspark.sql.functions import col, when, lit

df_s = (df_s.alias('a')
        .join(df_c.alias('b'), col('a.professional_id') == col('b.unique_id'), 'left')
        .join(df_d.alias('c'), col('b.college_id') == col('c.public_id'), 'left')
        .select(*[col(f'a.{c}') for c in df_s.columns],
                when(col('b.unique_id').isNotNull() & col('c.public_id').isNotNull(), 'PRIVATE')
                .otherwise(col('a.privacy')).alias('req_value')))
df_s.show()
Since req_value is the column with the required values, and these values need to be reflected in privacy, you can use the following code directly.
final = (df_s.withColumn('privacy', col('req_value'))
         .select([column for column in df_s.columns if column != 'req_value']))
final.show()
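A slightly shorter equivalent, if you prefer it, is to overwrite the column and drop the helper in one chain (this variant is my own suggestion, not from the answer above):
# Overwrite privacy from req_value, then discard the helper column.
final = df_s.withColumn('privacy', col('req_value')).drop('req_value')
final.show()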
UPDATE:
You can also use the following code where I have updated the column using withColumn instead of select.
df_s = (df_s.alias('a')
        .join(df_c.alias('b'), col('a.professional_id') == col('b.unique_id'), 'left')
        .join(df_d.alias('c'), col('b.college_id') == col('c.public_id'), 'left')
        .withColumn('privacy',
                    when(col('b.unique_id').isNotNull() & col('c.public_id').isNotNull(), 'PRIVATE')
                    .otherwise(col('privacy')))
        .select(*df_s.columns))
# or you can use this as well, without using alias:
# df_s = (df_s.join(df_c, df_s['professional_id'] == df_c['unique_id'], 'left')
#         .join(df_d, df_c['college_id'] == df_d['public_id'], 'left')
#         .withColumn('privacy', when(df_c['unique_id'].isNotNull() & df_d['public_id'].isNotNull(), 'PRIVATE')
#                     .otherwise(df_s['privacy']))
#         .select(*df_s.columns))
df_s.show()
After the joins, you can use nvl2. It checks whether the join with the last dataframe (df_dept) was successful; if yes, it returns "PRIVATE", otherwise the value from df_stud.PRIVACY.
Inputs:
from pyspark.sql import functions as F
df_stud = spark.createDataFrame([(1, 'x'), (2, 'STAY')], ['PROFESSIONALID', 'PRIVACY'])
df_college = spark.createDataFrame([(1, 1)], ['COLLEGEID', 'UNIQUEID'])
df_dept = spark.createDataFrame([(1,)], ['PUBLICID'])
df_stud.show()
# +--------------+-------+
# |PROFESSIONALID|PRIVACY|
# +--------------+-------+
# | 1| x|
# | 2| STAY|
# +--------------+-------+
Script:
df = (df_stud.alias('s')
      .join(df_college.alias('c'), F.col('s.PROFESSIONALID') == F.col('c.UNIQUEID'), 'left')
      .join(df_dept.alias('d'), F.col('c.COLLEGEID') == F.col('d.PUBLICID'), 'left')
      .select(
          *[f's.`{c}`' for c in df_stud.columns if c != 'PRIVACY'],
          F.expr("nvl2(d.PUBLICID, 'PRIVATE', s.PRIVACY) PRIVACY")
      )
)
df.show()
# +--------------+-------+
# |PROFESSIONALID|PRIVACY|
# +--------------+-------+
# | 1|PRIVATE|
# | 2| STAY|
# +--------------+-------+
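If you would rather stay in the DataFrame API than use F.expr, nvl2(x, a, b) is just a null test on x; here is a sketch of the equivalent when/otherwise column (privacy_col is a name I made up for illustration):
# Equivalent of nvl2(d.PUBLICID, 'PRIVATE', s.PRIVACY)
privacy_col = (
    F.when(F.col('d.PUBLICID').isNotNull(), F.lit('PRIVATE'))
    .otherwise(F.col('s.PRIVACY'))
    .alias('PRIVACY')
)
You can drop privacy_col into the select above in place of the F.expr line.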

Using AVG in Spark with window function

I have the following SQL Query:
SELECT st.Value,
       st.Id,
       ntile(2) OVER (PARTITION BY st.Id, st.VarId ORDER BY st.Sls),
       AVG(st.Value) OVER (PARTITION BY st.Id, st.VarId ORDER BY st.Sls, st.Date)
FROM table tb
INNER JOIN staging st ON st.Id = tb.Id
I've tried to adapt this to Spark/PySpark using window functions; my code is below:
from pyspark.sql import functions as f
from pyspark.sql.window import Window

windowSpec_1 = Window.partitionBy("staging.Id", "staging.VarId").orderBy("staging.Sls")
windowSpec_2 = Window.partitionBy("staging.Id", "staging.VarId").orderBy("staging.Sls", "staging.Date")

df = table.join(
    staging,
    on=f.col("staging.Id") == f.col("table.Id"),
    how='inner'
).select(
    f.col("staging.Value"),
    f.ntile(2).over(windowSpec_1),
    f.avg("staging.Value").over(windowSpec_2)
)
However, I'm getting the following error:
pyspark.sql.utils.AnalysisException: Can't extract value from Value#42928: need struct type but got decimal(16,6)
How can I solve this problem? Is it necessary to group the data?
Maybe you forgot to assign an alias to staging:
df = table.join(
    staging.alias("staging"),
    on=f.col("staging.Id") == f.col("table.Id"),
    how='inner'
).select(
    f.col("staging.Value"),
    f.ntile(2).over(windowSpec_1),
    f.avg("staging.Value").over(windowSpec_2)
)
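For reference, a minimal self-contained sketch of the aliased join with both window functions; the sample data and values are made up for the example:
from pyspark.sql import SparkSession, functions as f
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Toy stand-ins for the real tables
table = spark.createDataFrame([(1,), (2,)], ['Id'])
staging = spark.createDataFrame(
    [(1, 'v1', 10, 1.0, '2023-01-01'), (1, 'v1', 20, 2.0, '2023-01-02')],
    ['Id', 'VarId', 'Sls', 'Value', 'Date'])

windowSpec_1 = Window.partitionBy('staging.Id', 'staging.VarId').orderBy('staging.Sls')
windowSpec_2 = Window.partitionBy('staging.Id', 'staging.VarId').orderBy('staging.Sls', 'staging.Date')

df = (table.alias('table')
      .join(staging.alias('staging'), f.col('staging.Id') == f.col('table.Id'), 'inner')
      .select(f.col('staging.Value'),
              f.ntile(2).over(windowSpec_1).alias('ntile'),
              f.avg('staging.Value').over(windowSpec_2).alias('running_avg')))
df.show()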

Redshift: SQL Error [XX000]: ERROR: invalid value for "YYYY" in source string

I have edited this legacy script a bit, trying not to break a working script. It contains multiple CTEs, and the last statement selects the aggregated and calculated variables created within the CTEs. The CTEs run fine from a to c; however, adding the final statement breaks the code with: SQL Error [XX000]: ERROR: invalid value for "YYYY" in source string. I would appreciate suggestions or changes.
with a as (
select
to_date(df.launch_date, 'YYYYMMDD', FALSE) as n_date,
df.new_name, df.link, df.uid
from (
select
SPLIT_PART(descriptive_field,'_',4) as launch_date,
descriptive_field as new_name,
embedded_link as link,
autocreated_id as uid
from database.schema.table1
union
select
SPLIT_PART(ndf_descriptive_field,'_',4) as launch_date,
ndf_descriptive_field as new_name,
embedded_link as link,
autocreated_id as uid
from database.schema.table2
) as df
where n_date ~ '.*2022.*'
),
b as (
select link, uid,
case
when link ILIKE '%-type1%' THEN title_name||'-type1'
when link....
else new_name
end as new_name
from a
),
c as (
select
new_name,
case
when link ilike ....
when (link ilike .... )
end as link_grouping,
count(distinct(uid)) as dist_usr_cnt
from b
group by 1,2
)
select
c.new_name,
a.n_date,
c.link_grouping,
c.dist_usr_cnt
from c
join a on c.new_name = a.new_name
where a.n_date >= current_date-7
and a.n_date <= current_date-3
order by 2,1,3
;

Need to join multiple tables in PySpark

The query I'm using:
df = (df1.alias('a')
      .join(df2, a.id == df2.id, how='inner')
      .select('a.*').alias('b')
      .join(df3, b.id == df3.id, how='inner'))
error: name 'b' is not defined.
.alias('b') does not create a Python identifier named b; it sets an internal name on the returned dataframe. Your a.id is likely not the thing you expect it to be either, but something defined previously.
I can't remember a nice way to access the newly created DF by name right in the expression. I'd go with an intermediate identifier:
df_joined = df1.join(df2, df1.id == df2.id, how='inner')
result_df = df_joined.join(df3, df_joined.id == df3.id, how='inner')
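That said, aliases do let you refer to each side by name inside one expression through F.col; a minimal sketch, assuming every join key is called id:
from pyspark.sql import functions as F

result_df = (df1.alias('a')
             .join(df2.alias('b'), F.col('a.id') == F.col('b.id'), 'inner')
             .join(df3.alias('c'), F.col('a.id') == F.col('c.id'), 'inner')
             .select('a.*'))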
