Joins with 8 tables in YugabyteDB - yugabytedb

[Question posted by a user on YugabyteDB Community Slack]
I have a dataset and a query with 8 joins on tables that have between 10 and 17,000 rows; the query returns about 4,000 rows. My problem is that it is very slow. There's no obvious way to rewrite the query more optimally: it only does equi-joins on indexed columns. I tried giving it hints, but they are not used. For comparison, I loaded the same data set into PostgreSQL and the query returns in < 1 second. The query plans of YugabyteDB and PostgreSQL for this query are completely different. I used a single-instance PostgreSQL versus a YugabyteDB cluster with 3 masters and 3 tservers on the latest version; all VMs have 1 CPU and 3 GB of memory. I realize the CPU and memory are on the low side, but they are not fully used; I would run it on bigger instances if that were the bottleneck. Can you give some tips? Where do I look to optimise this query? What's your experience with queries that do many joins?
Query plan below:
explain (costs off, analyze, verbose) SELECT
a.ptflio_clstr_mstr_key,
a.ptflio_clstr_nm,
a.reference_id,
pku.kpc_usr_nm,
a.chng_usr,
clstrhist.chng_dttm,
a.ptflio_clstr_desc,
b.ptflio_bld_blck_grp_mstr_key,
b.ptflio_bld_blck_grp_nm,
f.prd_offr_mstr_key,
f.prd_offr_nm,
d.atmc_prd_offr_type_ind,
offrhist.chng_dttm
FROM product_store.pm_ptflio_clstr a
LEFT JOIN product_store.pm_ptflio_clstr_hist clstrhist ON a.ptflio_clstr_mstr_key = clstrhist.ptflio_clstr_mstr_key AND clstrhist.row_seq=1
LEFT JOIN product_store.pm_kpc_usr pku on a.kpc_usr_id = pku.kpc_usr_id
JOIN product_store.pm_ptflio_bld_blck_grp b ON a.ptflio_clstr_mstr_key = b.ptflio_clstr_mstr_key
JOIN product_store.pm_ptflio_bld_blck c ON b.ptflio_bld_blck_grp_mstr_key = c.ptflio_bld_blck_grp_mstr_key
LEFT JOIN product_store.pm_atmc_prd_offr d ON c.ptflio_bld_blck_mstr_key = d.ptflio_bld_blck_mstr_key
LEFT JOIN product_store.pm_prd_offr f ON f.prd_offr_mstr_key = d.atmc_prd_offr_mstr_key
LEFT JOIN product_store.pm_prd_offr_hist offrhist ON f.prd_offr_mstr_key = offrhist.prd_offr_mstr_key AND offrhist.row_seq = 1
WHERE a.ptflio_clstr_mstr_key='b3000fbe-65d5-4c68-9758-0954c7f9a0f1';
Nested Loop Left Join (actual time=12.719..10427.074 rows=3835 loops=1)
Output: a.ptflio_clstr_mstr_key, a.ptflio_clstr_nm, a.reference_id, pku.kpc_usr_nm, a.chng_usr, clstrhist.chng_dttm, a.ptflio_clstr_desc, b.ptflio_bld_blck_grp_mstr_key, b.ptflio_bld_blck_grp_nm, f.prd_offr_mstr_key, f.prd_offr_nm, d.atmc_prd_offr_type_ind, offrhist.chng_dttm
Inner Unique: true
-> Nested Loop Left Join (actual time=12.338..8896.474 rows=3835 loops=1)
Output: a.ptflio_clstr_mstr_key, a.ptflio_clstr_nm, a.reference_id, a.chng_usr, a.ptflio_clstr_desc, clstrhist.chng_dttm, pku.kpc_usr_nm, b.ptflio_bld_blck_grp_mstr_key, b.ptflio_bld_blck_grp_nm, d.atmc_prd_offr_type_ind, f.prd_offr_mstr_key, f.prd_offr_nm
Inner Unique: true
-> Nested Loop (actual time=11.903..7197.184 rows=3835 loops=1)
Output: a.ptflio_clstr_mstr_key, a.ptflio_clstr_nm, a.reference_id, a.chng_usr, a.ptflio_clstr_desc, clstrhist.chng_dttm, pku.kpc_usr_nm, b.ptflio_bld_blck_grp_mstr_key, b.ptflio_bld_blck_grp_nm, d.atmc_prd_offr_type_ind, d.atmc_prd_offr_mstr_key
-> Nested Loop Left Join (actual time=1.755..1.759 rows=1 loops=1)
Output: a.ptflio_clstr_mstr_key, a.ptflio_clstr_nm, a.reference_id, a.chng_usr, a.ptflio_clstr_desc, clstrhist.chng_dttm, pku.kpc_usr_nm
Inner Unique: true
-> Nested Loop Left Join (actual time=1.298..1.301 rows=1 loops=1)
Output: a.ptflio_clstr_mstr_key, a.ptflio_clstr_nm, a.reference_id, a.chng_usr, a.ptflio_clstr_desc, a.kpc_usr_id, clstrhist.chng_dttm
Inner Unique: true
Join Filter: (a.ptflio_clstr_mstr_key = clstrhist.ptflio_clstr_mstr_key)
-> Index Scan using xpkportfolio_cluster on product_store.pm_ptflio_clstr a (actual time=0.764..0.767 rows=1 loops=1)
Output: a.ptflio_clstr_mstr_key, a.row_seq, a.ptflio_clstr_nm, a.ptflio_clstr_desc, a.mstr_stat_cd, a.chng_usr, a.chng_dttm, a.chng_rmrk, a.reference_id, a.kpc_usr_id
Index Cond: (a.ptflio_clstr_mstr_key = 'b3000fbe-65d5-4c68-9758-0954c7f9a0f1'::uuid)
-> Index Scan using pm_ptflio_clstr_hist_pkey on product_store.pm_ptflio_clstr_hist clstrhist (actual time=0.529..0.529 rows=0 loops=1)
Output: clstrhist.ptflio_clstr_mstr_key, clstrhist.row_seq, clstrhist.ptflio_clstr_nm, clstrhist.ptflio_clstr_desc, clstrhist.mstr_stat_cd, clstrhist.chng_usr, clstrhist.chng_dttm, clstrhist.chng_rmrk, clstrhist.reference_id, clstrhist.kpc_usr_id
Index Cond: ((clstrhist.ptflio_clstr_mstr_key = 'b3000fbe-65d5-4c68-9758-0954c7f9a0f1'::uuid) AND (clstrhist.row_seq = 1))
-> Index Scan using xpkkpc_user on product_store.pm_kpc_usr pku (actual time=0.453..0.453 rows=0 loops=1)
Output: pku.kpc_usr_id, pku.ruisnaam, pku.kpc_usr_nm, pku.kpc_usr_act_ind, pku.chng_usr, pku.chng_dttm
Index Cond: (a.kpc_usr_id = pku.kpc_usr_id)
-> Nested Loop (actual time=10.131..7193.091 rows=3835 loops=1)
Output: b.ptflio_bld_blck_grp_mstr_key, b.ptflio_bld_blck_grp_nm, b.ptflio_clstr_mstr_key, d.atmc_prd_offr_type_ind, d.atmc_prd_offr_mstr_key
Inner Unique: true
-> Hash Right Join (actual time=9.673..73.707 rows=16878 loops=1)
Output: c.ptflio_bld_blck_grp_mstr_key, d.atmc_prd_offr_type_ind, d.atmc_prd_offr_mstr_key
Inner Unique: true
Hash Cond: (d.ptflio_bld_blck_mstr_key = c.ptflio_bld_blck_mstr_key)
-> Seq Scan on product_store.pm_atmc_prd_offr d (actual time=8.124..45.709 rows=16865 loops=1)
Output: d.atmc_prd_offr_mstr_key, d.row_seq, d.ptflio_bld_blck_mstr_key, d.lcm_phase_cd, d.lcm_phase_start_dttm, d.lcm_phase_end_dttm, d.atmc_prd_offr_type_ind, d.chng_usr, d.chng_dttm, d.lcm_phase_desc, d.lcm_phase_alert_dttm, d.lcm_phase_approved_by, d.lcm_phase_master_key
-> Hash (actual time=1.534..1.534 rows=43 loops=1)
Output: c.ptflio_bld_blck_grp_mstr_key, c.ptflio_bld_blck_mstr_key
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Seq Scan on product_store.pm_ptflio_bld_blck c (actual time=0.531..1.523 rows=43 loops=1)
Output: c.ptflio_bld_blck_grp_mstr_key, c.ptflio_bld_blck_mstr_key
-> Index Scan using xpkportfolio_building_block_gr on product_store.pm_ptflio_bld_blck_grp b (actual time=0.403..0.403 rows=0 loops=16878)
Output: b.ptflio_bld_blck_grp_mstr_key, b.row_seq, b.ptflio_bld_blck_grp_nm, b.ptflio_bld_blck_grp_desc, b.ptflio_clstr_mstr_key, b.mstr_stat_cd, b.chng_usr, b.chng_dttm, b.chng_rmrk, b.reference_id, b.kpc_usr_id
Index Cond: (b.ptflio_bld_blck_grp_mstr_key = c.ptflio_bld_blck_grp_mstr_key)
Filter: (b.ptflio_clstr_mstr_key = 'b3000fbe-65d5-4c68-9758-0954c7f9a0f1'::uuid)
Rows Removed by Filter: 1
-> Index Scan using xpkproduct_offering on product_store.pm_prd_offr f (actual time=0.419..0.419 rows=1 loops=3835)
Output: f.prd_offr_mstr_key, f.row_seq, f.prd_offr_nm, f.prop_mod_mstr_key, f.clstr_dsct_prd_offr_grp_cd, f.trgt_ptflio_ind, f.price_brd_nmbr, f.price_brd_stat_cd, f.prd_offr_lnup, f.pm_prd_offr_type_cd, f.comm_prd_id, f.chng_usr, f.chng_dttm, f.reference_id, f.version, f.kpc_usr_id, f.generation
Index Cond: (f.prd_offr_mstr_key = d.atmc_prd_offr_mstr_key)
-> Index Scan using pm_prd_offr_hist_pkey on product_store.pm_prd_offr_hist offrhist (actual time=0.375..0.375 rows=0 loops=3835)
Output: offrhist.prd_offr_mstr_key, offrhist.row_seq, offrhist.prd_offr_nm, offrhist.prop_mod_mstr_key, offrhist.clstr_dsct_prd_offr_grp_cd, offrhist.trgt_ptflio_ind, offrhist.price_brd_nmbr, offrhist.price_brd_stat_cd, offrhist.prd_offr_lnup, offrhist.pm_prd_offr_type_cd, offrhist.comm_prd_id, offrhist.chng_usr, offrhist.chng_dttm, offrhist.reference_id, offrhist.version, offrhist.kpc_usr_id, offrhist.generation
Index Cond: ((f.prd_offr_mstr_key = offrhist.prd_offr_mstr_key) AND (offrhist.row_seq = 1))
Planning Time: 0.777 ms
Execution Time: 10655.916 ms

The general approach to query tuning is called "query tuning by eliminating throwaway" (http://docplayer.net/20177036-Query-tuning-by-eliminating-throwaway.html), after a paper written by Martin Berg from Denmark. The paper targets the Oracle database, but the general methodology holds for all databases that use query plans with row sources. On top of that, there are some specifics for distributed databases like YugabyteDB, where row sources like nested loops can cause higher overhead than in a monolithic database: in a monolith everything is local, while in a distributed database each lookup may involve a network call.
The pg_hint_plan extension required for making hints work is not enabled in YugabyteDB by default, but it is installed with the YugabyteDB server software. To enable it, run create extension pg_hint_plan; in one YugabyteDB database, which enables it for the cluster.
Of the ~10 seconds, ~6 seconds sit in this join:
JOIN product_store.pm_ptflio_bld_blck c ON b.ptflio_bld_blck_grp_mstr_key = c.ptflio_bld_blck_grp_mstr_key
using the index xpkportfolio_building_block_gr.
The plan reads product_store.pm_ptflio_bld_blck_grp b 16878 times (about 0.4 ms per lookup), and on average each lookup returns no rows once the filter is applied. It is probably better to start with that table (I guess the predicate b.ptflio_clstr_mstr_key = 'b3000fbe-65d5-4c68-9758-0954c7f9a0f1'::uuid is highly selective):
/*+ Leading( (b c) ) */ may be a good start, but may not be possible because of the outer join.
So maybe /*+ Leading( (c b) ) HashJoin(c b) */.
It depends on your data. The goal is to start with the most selective table, then join to the tables that reduce the number of rows, then the others.
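To see where the time goes, it helps to multiply each inner index scan's per-loop time by its loop count, because EXPLAIN ANALYZE reports "actual time" as an average per loop. A minimal Python sketch using figures copied from the plan above (the inner_scans list and its layout are illustrative, not part of any tool):

```python
# (node label, average ms per loop, loops), copied from the plan above
inner_scans = [
    ("pm_ptflio_bld_blck_grp b", 0.403, 16878),
    ("pm_prd_offr f", 0.419, 3835),
    ("pm_prd_offr_hist offrhist", 0.375, 3835),
]

# Approximate total time per node: per-loop average * number of loops
totals = {label: per_loop_ms * loops for label, per_loop_ms, loops in inner_scans}

# Print the nodes from most to least expensive
for label, total_ms in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {total_ms / 1000:.1f} s")
```

Under this estimate the three repeated index lookups account for most of the 10.6-second execution time, with the lookups on b being by far the largest share. That is why changing the join order around b is the first thing to try.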

Related

Very slow database query migrating from django 1.8 to django 3.2

I migrated a project from Django 1.8 to Django 3.2. I'm using the same database, but the function I use to build a report is four times slower (8 seconds on Django 1.8 and 30 to 40 seconds on Django 3.2). The database is MySQL 5.7 (I also tried 8.0.32, but nothing changed).
This is the query:
qs = (
    PartitaDanno.objects
    .filter(incarico__cliente_intestatario_fattura_id=8006)
    .select_related('incarico', 'anagrafica_assctp', 'incarico__naturaincarico')
    .prefetch_related('rate')
    .order_by('-incarico__anno', '-incarico__numero', 'pk')
)
PartitaDanno is a table with 16,000 rows, and the model has 215 fields (I know, I didn't write it). The query returns just 1,700 rows, a very small result.
The unusual thing is that the same happens even with a simple query on this model, like
qs = PartitaDanno.objects.filter(incarico__cliente_intestatario_fattura_id=8006)
It takes 20 seconds to iterate through the result of this basic query. I don't understand why.
The raw SQL query is the same in both versions of Django. This is the raw query for the first queryset:
SELECT `sinistri_partitadanno`.`id`, `sinistri_partitadanno`.`created`, `sinistri_partitadanno`.`modified`, `sinistri_partitadanno`.`incarico_id`, `sinistri_partitadanno`.`tipo`, `sinistri_partitadanno`.`tipologia_controparte`, `sinistri_partitadanno`.`fase`, `sinistri_partitadanno`.`competenza_giurisdizionale`, `sinistri_partitadanno`.`soggetto`, `sinistri_partitadanno`.`anagrafica_assctp_id`, `sinistri_partitadanno`.`luogo`, `sinistri_partitadanno`.`compagnia_id`, `sinistri_partitadanno`.`riferimento_compagnia`, `sinistri_partitadanno`.`tipo_di_danno`, `sinistri_partitadanno`.`tipo_lesioni`, `sinistri_partitadanno`.`tipologia_vittima`, `sinistri_partitadanno`.`modello_veicolo`, `sinistri_partitadanno`.`targa_veicolo`, `sinistri_partitadanno`.`riserva_iniziale`, `sinistri_partitadanno`.`riserva_parziale`, `sinistri_partitadanno`.`data_richiesta_danni_ctp`, `sinistri_partitadanno`.`reiezione_cautelativa`, `sinistri_partitadanno`.`data_reiezione_cautelativa`, `sinistri_partitadanno`.`importo_richiesto_da_ctp`, `sinistri_partitadanno`.`data_incarico_nostro_perito`, `sinistri_partitadanno`.`tipologia_perizia`, `sinistri_partitadanno`.`nominativo_perito`, `sinistri_partitadanno`.`data_perizia_negativa`, `sinistri_partitadanno`.`motivazioni_perizia_negativa`, `sinistri_partitadanno`.`data_interlocutoria_inviata_dal_perito`, `sinistri_partitadanno`.`importo_stimato_dal_perito`, `sinistri_partitadanno`.`data_ricezione_perizia_dal_nostro_perito`, `sinistri_partitadanno`.`riparazione_antieconomica`, `sinistri_partitadanno`.`data_incarico_medico_legale`, `sinistri_partitadanno`.`data_ricezione_perizia_dal_nostro_medico_legale`, `sinistri_partitadanno`.`data_invio_offerta_tramite_transazione_quietanza`, `sinistri_partitadanno`.`data_reiezione_indennizzo_risarcimento`, `sinistri_partitadanno`.`pagamento_parziale`, `sinistri_partitadanno`.`capitale`, `sinistri_partitadanno`.`data_pagamento_capitale`, `sinistri_partitadanno`.`anagrafica_patrocinatore_id`, 
`sinistri_partitadanno`.`tipo_patrocinatore`, `sinistri_partitadanno`.`cf_piva_patrocinatore`, `sinistri_partitadanno`.`numero_rif_patrocinatore`, `sinistri_partitadanno`.`spese_legali`, `sinistri_partitadanno`.`data_pagamento_spese_legali`, `sinistri_partitadanno`.`integrazione_capitale`, `sinistri_partitadanno`.`data_pag_integraz_capitale`, `sinistri_partitadanno`.`integrazione_spese_legali`, `sinistri_partitadanno`.`data_pag_integraz_spese_legali`, `sinistri_partitadanno`.`note_generali`, `sinistri_partitadanno`.`note_pagamenti`, `sinistri_partitadanno`.`cessione_del_credito`, `sinistri_partitadanno`.`veicolo_sostitutivo`, `sinistri_partitadanno`.`importo_noleggio_veicolo_sostitutivo`, `sinistri_partitadanno`.`altre_spese_accessorie_richieste`, `sinistri_partitadanno`.`importo_altre_spese_accessorie_richieste`, `sinistri_partitadanno`.`importo_onorari_richiesti`, `sinistri_partitadanno`.`liquidazione_diretta_senza_perizia`, `sinistri_partitadanno`.`offerta_lesioni_a_stralcio`, `sinistri_partitadanno`.`richiesto_nulla_osta_giudice_tutelare`, `sinistri_partitadanno`.`data_ricezione_nulla_osta_giudice_tutelare`, `sinistri_partitadanno`.`note_fase_precontenziosa`, `sinistri_partitadanno`.`certificato_di_chiusura_malattia`, `sinistri_partitadanno`.`data_ricezione_cert_chiusura_malattia`, `sinistri_partitadanno`.`richiesta_accesso_agli_atti`, `sinistri_partitadanno`.`data_ricezione_richiesta_accesso_agli_atti`, `sinistri_partitadanno`.`data_risposta_accesso_agli_atti`, `sinistri_partitadanno`.`note_accesso_agli_atti`, `sinistri_partitadanno`.`atto_di_citazione`, `sinistri_partitadanno`.`data_notifica_atto_di_citazione`, `sinistri_partitadanno`.`valore_causa`, `sinistri_partitadanno`.`valore_giudiziale_riservato`, `sinistri_partitadanno`.`mediazione`, `sinistri_partitadanno`.`data_notifica_mediazione`, `sinistri_partitadanno`.`data_risposta_mediazione`, `sinistri_partitadanno`.`note_su_adr`, `sinistri_partitadanno`.`domiciliatario_sled_id`, 
`sinistri_partitadanno`.`autorita_evocata_id`, `sinistri_partitadanno`.`nome_del_giudice`, `sinistri_partitadanno`.`sezione_tribunale`, `sinistri_partitadanno`.`ruolo_generale`, `sinistri_partitadanno`.`prima_udienza_in_atto_introduttivo`, `sinistri_partitadanno`.`adempimento_udienza`, `sinistri_partitadanno`.`rinvio_all_udienza_del`, `sinistri_partitadanno`.`scadenza_adempimento`, `sinistri_partitadanno`.`tipologia_adempimento`, `sinistri_partitadanno`.`numero_sentenza`, `sinistri_partitadanno`.`data_ricezione_sentenza`, `sinistri_partitadanno`.`pubblicazione_sentenza`, `sinistri_partitadanno`.`data_prima_udienza`, `sinistri_partitadanno`.`note`, `sinistri_partitadanno`.`adempimento_svolto`, `sinistri_partitadanno`.`reclamo_ivass`, `sinistri_partitadanno`.`data_ricezione_reclamo_ivass`, `sinistri_partitadanno`.`studio_legale_incaricato`, `sinistri_partitadanno`.`dettaglio_riserva_rivalsa`, `sinistri_partitadanno`.`dettaglio_esborsi_pagati_sled`, `sinistri_partitadanno`.`descrizione_esborsi_pagati_sled`, `sinistri_partitadanno`.`dettaglio_importi_recuperati_da_sled`, `sinistri_partitadanno`.`ricevuta_richiesta_danni_ctp`, `sinistri_partitadanno`.`data_ricezione_richiesta_danni_ctp`, `sinistri_partitadanno`.`termini_legali_di_risposta_ctp`, `sinistri_partitadanno`.`motivi_reiezione_cautelativa`, `sinistri_partitadanno`.`data_integraz_elementi_ctp`, `sinistri_partitadanno`.`data_ricezione_transazione_quietanza_firmata`, `sinistri_partitadanno`.`data_ricezione_rifiuto_offerta`, `sinistri_partitadanno`.`motivi_reiezione_indennizzo`, `sinistri_partitadanno`.`clausole_particolari`, `sinistri_partitadanno`.`incarico_gestione_relitto`, `sinistri_partitadanno`.`destinazione_relitto`, `sinistri_partitadanno`.`data_ricezione_incarico`, `sinistri_partitadanno`.`valore_commerciale`, `sinistri_partitadanno`.`spese_diverse_relitto`, `sinistri_partitadanno`.`tipologia_spesa_relitto`, `sinistri_partitadanno`.`importo_vendita`, `sinistri_partitadanno`.`data_bonifico`, 
`sinistri_partitadanno`.`acquirente`, `sinistri_partitadanno`.`data_voltura_demolizione`, `sinistri_partitadanno`.`note_gestione_relitti`, `sinistri_partitadanno`.`data_ricezione_citazione`, `sinistri_partitadanno`.`data_richiesta_pagamento_bo`, `sinistri_partitadanno`.`stato_chiusura`, `sinistri_partitadanno`.`tipo_chiusura_con_pagamento`, `sinistri_partitadanno`.`tipo_chiusura_senza_seguito`, `sinistri_partitadanno`.`tipo_chiusura_con_recupero`, `sinistri_partitadanno`.`data_apertura`, `sinistri_partitadanno`.`data_chiusura`, `sinistri_partitadanno`.`data_riapertura`, `sinistri_partitadanno`.`anagrafica_cessionario_id`, `sinistri_partitadanno`.`minore`, `sinistri_partitadanno`.`anag_esercente_potesta_id`, `sinistri_partitadanno`.`anagrafica_erede_id`, `sinistri_partitadanno`.`investigazione_antifrode`, `sinistri_partitadanno`.`data_appuntamento_perito`, `sinistri_partitadanno`.`data_ricezione_doc_completi`, `sinistri_partitadanno`.`data_incarico_agenzia_pa`, `sinistri_partitadanno`.`data_ricezione_doc_da_agenzia_pa`, `sinistri_partitadanno`.`veicolo_in_deposito`, `sinistri_partitadanno`.`veicolo_sotto_sequestro`, `sinistri_partitadanno`.`data_dissequestro`, `sinistri_partitadanno`.`data_invio_offerta_vendita`, `sinistri_partitadanno`.`tipologia_perizia_medica`, `sinistri_partitadanno`.`nominativo_medico`, `sinistri_partitadanno`.`data_appuntamento`, `sinistri_partitadanno`.`data_interlocutoria_medica`, `sinistri_partitadanno`.`data_perizia_medica_negativa`, `sinistri_partitadanno`.`motivazioni_perizia_medica_negativa`, `sinistri_partitadanno`.`tassa_di_registro`, `sinistri_partitadanno`.`data_pagamento_tassa_di_registro`, `sinistri_partitadanno`.`negoziazione_assistita`, `sinistri_partitadanno`.`causa_penale`, `sinistri_partitadanno`.`data_prossima_udienza`, `sinistri_partitadanno`.`note_causa_penale`, `sinistri_partitadanno`.`decreto_ingiuntivo`, `sinistri_partitadanno`.`tipologia_rito`, `sinistri_partitadanno`.`fase_procedura`, 
`sinistri_partitadanno`.`conclusione`, `sinistri_partitadanno`.`numero_di_polizza`, `sinistri_partitadanno`.`conducente`, `sinistri_partitadanno`.`anagrafica_conducente_id`, `sinistri_partitadanno`.`estratti_autentici`, `sinistri_partitadanno`.`nome_notaio`, `sinistri_partitadanno`.`data_autentica`, `sinistri_partitadanno`.`data_invio_messa_in_mora`, `sinistri_partitadanno`.`data_ricezione_messa_in_mora`, `sinistri_partitadanno`.`data_notifica_titolo_giudiziale`, `sinistri_partitadanno`.`atto_di_precetto`, `sinistri_partitadanno`.`data_notifica_precetto`, `sinistri_partitadanno`.`somme_incassate`, `sinistri_partitadanno`.`data_incasso`, `sinistri_partitadanno`.`n_decreto_ingiuntivo`, `sinistri_partitadanno`.`r_g_decreto_ingiuntivo`, `sinistri_partitadanno`.`data_emissione_decreto_ingiuntivo`, `sinistri_partitadanno`.`data_notifica_decreto_ingiuntivo`, `sinistri_partitadanno`.`capitale_liquidato_decreto_ingiuntivo`, `sinistri_partitadanno`.`onorari_liquidati_decreto_ingiuntivo`, `sinistri_partitadanno`.`data_apposizione_formula_esecutiva`, `sinistri_partitadanno`.`importo_atto_di_precetto`, `sinistri_partitadanno`.`data_notifica_precetto_rinnovazione`, `sinistri_partitadanno`.`data_notifica_pignoramento_mobiliare`, `sinistri_partitadanno`.`rge_pignoramento_mobiliare`, `sinistri_partitadanno`.`data_notifica_pignoramento_presso_terzi`, `sinistri_partitadanno`.`iscrizione_ruolo_pignoramento_presso_terzi`, `sinistri_partitadanno`.`rge_pignoramento_presso_terzi`, `sinistri_partitadanno`.`data_notifica_pignoramento_immobiliare`, `sinistri_partitadanno`.`iscrizione_ruolo_pignoramento_immobiliare`, `sinistri_partitadanno`.`rge_pignoramento_immobiliare`, `sinistri_partitadanno`.`data_trascrizione_pignoramento_immobiliare`, `sinistri_partitadanno`.`anagrafica_amministratore_id`, `sinistri_partitadanno`.`progressivo_uci_ctp`, `sinistri_partitadanno`.`codice_servizio`, `sinistri_partitadanno`.`lotto_affidato`, `sinistri_partitadanno`.`importo_affidato`, 
`sinistri_partitadanno`.`capitale_azionato`, `sinistri_partitadanno`.`calc_capitale_azionato`, `sinistri_partitadanno`.`note_recupero_crediti`, `sinistri_partitadanno`.`note_status`, `sinistri_partitadanno`.`note_gestione`, `sinistri_incarico`.`id`, `sinistri_incarico`.`created`, `sinistri_incarico`.`modified`, `sinistri_incarico`.`ufficio_sled`, `sinistri_incarico`.`natura_incarico`, `sinistri_incarico`.`naturaincarico_id`, `sinistri_incarico`.`progetto_id`, `sinistri_incarico`.`status`, `sinistri_incarico`.`data_chiusura_incarico`, `sinistri_incarico`.`numero`, `sinistri_incarico`.`anno`, `sinistri_incarico`.`numero_incarico_sled`, `sinistri_incarico`.`avvocato_incaricato_id`, `sinistri_incarico`.`data_affidamento_avvocato`, `sinistri_incarico`.`cliente_intestatario_fattura_id`, `sinistri_incarico`.`fronter_cliente`, `sinistri_incarico`.`intermediario_id`, `sinistri_incarico`.`numero_incarico_cliente`, `sinistri_incarico`.`locatario_id`, `sinistri_incarico`.`numero_di_polizza`, `sinistri_incarico`.`codice_evento`, `sinistri_incarico`.`tipologia_sinistro`, `sinistri_incarico`.`riserva_totale`, `sinistri_incarico`.`onorari_ed_anticipazioni_sled`, `sinistri_incarico`.`data_fattura_onorari_ed_anticipazioni_sled`, `sinistri_incarico`.`numero_fattura_sled`, `sinistri_incarico`.`data_incasso_onorari_ed_anticipazioni_sled`, `sinistri_incarico`.`totale_pagato_come_risarcimento`, `sinistri_incarico`.`totale_onorari_ed_anticipazioni_pagati_a_sled`, `sinistri_incarico`.`totale_importi_recuperati_da_sled`, `sinistri_incarico`.`costo_totale_del_sinistro`, `sinistri_incarico`.`data_ricezione_fondi`, `sinistri_incarico`.`data_del_sinistro`, `sinistri_incarico`.`ora_del_sinistro`, `sinistri_incarico`.`luogo_del_sinistro`, `sinistri_incarico`.`stato`, `sinistri_incarico`.`data_ricezione_incarico_sled`, `sinistri_incarico`.`data_apertura_incarico_sled`, `sinistri_incarico`.`data_riapertura_incarico_sled`, `sinistri_incarico`.`targa_veicolo_assicurato`, 
`sinistri_incarico`.`assicurato`, `sinistri_incarico`.`proprietario`, `sinistri_incarico`.`nome_conducente`, `sinistri_incarico`.`cognome_conducente`, `sinistri_incarico`.`modello_veicolo`, `sinistri_incarico`.`blackbox`, `sinistri_incarico`.`data_installazione_blackbox`, `sinistri_incarico`.`cai_2f`, `sinistri_incarico`.`data_ricezione_cai_2f`, `sinistri_incarico`.`autorita_intervenuta`, `sinistri_incarico`.`verbale_autorita`, `sinistri_incarico`.`data_ricezione_verbale_autorita`, `sinistri_incarico`.`testimonianza`, `sinistri_incarico`.`nome_testimone`, `sinistri_incarico`.`testimone_id`, `sinistri_incarico`.`data_ricezione_testimonianza`, `sinistri_incarico`.`denuncia_di_sinistro`, `sinistri_incarico`.`data_ricezione_denuncia_sx`, `sinistri_incarico`.`franchigia_si_no`, `sinistri_incarico`.`importo_franchigia`, `sinistri_incarico`.`data_richiesta_copertura`, `sinistri_incarico`.`esito_copertura`, `sinistri_incarico`.`data_inizio_copertura`, `sinistri_incarico`.`data_fine_copertura`, `sinistri_incarico`.`note_sulla_copertura`, `sinistri_incarico`.`garanzia`, `sinistri_incarico`.`pagamento_totale_del_sinistro`, `sinistri_incarico`.`sinistro_con_lesioni_personali_gravi_o_gravissime`, `sinistri_incarico`.`sinistro_mortale`, `sinistri_incarico`.`liq_cliente_incaricato`, `sinistri_incarico`.`scoperto`, `sinistri_incarico`.`limite_di_indennizzo`, `sinistri_incarico`.`esclusioni_di_polizza`, `sinistri_incarico`.`note_aggiuntive`, `sinistri_incarico`.`importo_da_recuperare`, `sinistri_incarico`.`calc_importo_da_recuperare`, `sinistri_incarico`.`importo_prescritto`, `sinistri_incarico`.`fermo_tecnico`, `sinistri_incarico`.`tutela_legale`, `sinistri_incarico`.`compagnia_tutela_legale_id`, `sinistri_incarico`.`data_effetto_polizza`, `sinistri_incarico`.`data_scadenza_polizza`, `sinistri_incarico`.`data_pagamento_premio`, `sinistri_incarico`.`codice_agenzia`, `sinistri_incarico`.`onorari`, `sinistri_incarico`.`pratica_procurata_da_id`, `sinistri_incarico`.`rif_uci`, 
`sinistri_incarico`.`liquidatore_uci`, `sinistri_incarico`.`rif_consap`, `sinistri_incarico`.`branding`, `sinistri_naturaincarico`.`id`, `sinistri_naturaincarico`.`label`, `sinistri_naturaincarico`.`codice_fatturazione`, `sinistri_naturaincarico`.`tipo_workflow`, T5.`id`, T5.`created`, T5.`modified`, T5.`tipo`, T5.`denominazione_giuridica`, T5.`titolo`, T5.`cognome_ragione_sociale`, T5.`nome`, T5.`rappr_sinistro_in_italia`, T5.`indirizzo`, T5.`citta`, T5.`provincia`, T5.`cap`, T5.`telefono`, T5.`fax`, T5.`cellulare`, T5.`email`, T5.`pec`, T5.`codice_fiscale`, T5.`partita_iva`, T5.`professione`, T5.`luogo_di_nascita`, T5.`data_di_nascita`, T5.`sesso`, T5.`codice_iban`, T5.`note`, T5.`blacklist`, T5.`tempi_pagamento`, T5.`codice_fatturazione`, T5.`naturacosto_id`, T5.`nazione`, T5.`codice_uci`, T5.`importata_da_xml` FROM `sinistri_partitadanno` INNER JOIN `sinistri_incarico` ON (`sinistri_partitadanno`.`incarico_id` = `sinistri_incarico`.`id`) LEFT OUTER JOIN `sinistri_naturaincarico` ON (`sinistri_incarico`.`naturaincarico_id` = `sinistri_naturaincarico`.`id`) LEFT OUTER JOIN `sinistri_anagrafica` T5 ON (`sinistri_partitadanno`.`anagrafica_assctp_id` = T5.`id`) WHERE `sinistri_incarico`.`cliente_intestatario_fattura_id` = 8006 ORDER BY `sinistri_incarico`.`anno` DESC, `sinistri_incarico`.`numero` DESC, `sinistri_partitadanno`.`id` ASC
This is the result of queryset.explain():
-> Sort row IDs: sinistri_incarico.anno DESC, sinistri_incarico.numero DESC, sinistri_partitadanno.id (actual time=163.955..172.010 rows=1761 loops=1)
-> Table scan on <temporary> (cost=2518.91..2544.83 rows=1874) (actual time=153.040..163.135 rows=1761 loops=1)
-> Temporary table (cost=2518.90..2518.90 rows=1874) (actual time=153.007..153.007 rows=1761 loops=1)
-> Nested loop left join (cost=2331.46 rows=1874) (actual time=0.247..62.163 rows=1761 loops=1)
-> Nested loop inner join (cost=1675.43 rows=1874) (actual time=0.224..52.547 rows=1761 loops=1)
-> Nested loop left join (cost=1019.40 rows=1677) (actual time=0.089..17.068 rows=1677 loops=1)
-> Index lookup on sinistri_incarico using b842d1d4b6d5fa98d8dc06d2c92e02c5 (cliente_intestatario_fattura_id=8006) (cost=432.45 rows=1677) (actual time=0.081..15.586 rows=1677 loops=1)
-> Single-row index lookup on sinistri_naturaincarico using PRIMARY (id=sinistri_incarico.naturaincarico_id) (cost=0.25 rows=1) (actual time=0.001..0.001 rows=1 loops=1677)
-> Index lookup on sinistri_partitadanno using sinistri_pa_incarico_id_398e55f0a3d8c2c0_fk_sinistri_incarico_id (incarico_id=sinistri_incarico.id) (cost=0.28 rows=1) (actual time=0.018..0.021 rows=1 loops=1677)
-> Single-row index lookup on T5 using PRIMARY (id=sinistri_partitadanno.anagrafica_assctp_id) (cost=0.25 rows=1) (actual time=0.005..0.005 rows=1 loops=1761)
Of course, it slows down when I start iterating over the queryset with a for loop. I don't use iterator() on the complex query because I'm using prefetch_related.
--- First Edit ---
This is the result for the raw query in phpMyAdmin. It's really fast (3.5 seconds to run the query and build the table with results).
Thanks for your help!

Index Not use in basic query

Having the table block:
CREATE TABLE IF NOT EXISTS "block" (
"hash" char(66) CONSTRAINT block_pk PRIMARY KEY,
"size" text,
"miner" text ,
"nonce" text,
"number" text,
"number_int" integer not null,
"gasused" text ,
"mixhash" text ,
"gaslimit" text ,
"extradata" text ,
"logsbloom" text,
"stateroot" char(66) ,
"timestamp" text ,
"difficulty" text ,
"parenthash" char(66) ,
"sha3uncles" char(66) ,
"receiptsroot" char(66),
"totaldifficulty" text ,
"transactionsroot" char(66)
);
CREATE INDEX number_int_index ON block (number_int);
The table has about 3M rows. When I run a simple query, the results are:
EXPLAIN ANALYZE select number_int from block where number_int > 1999999 and number_int < 2999999 order by number_int desc limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Limit (cost=110.00..110.00 rows=1 width=4) (actual time=16154.891..16154.894 rows=1 loops=1)
-> Sort (cost=110.00..112.50 rows=1000 width=4) (actual time=16154.890..16154.890 rows=1 loops=1)
Sort Key: number_int DESC
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on block (cost=0.00..105.00 rows=1000 width=4) (actual time=172.766..16126.135 rows=190186 loops=1)
Remote Filter: ((number_int > 1999999) AND (number_int < 2999999))
Planning Time: 19.961 ms
Execution Time: 16155.382 ms
Peak Memory Usage: 1113 kB
(9 rows)
Any advice?
Regards
I tried something I found here on Stack Overflow, with the same result:
select number_int from block where number_int > 1999999 and number_int < 2999999 order by number_int+0 desc limit 1;
Hi, the problem was related to Yugabyte; there was no issue with the index or with anything else related to Postgres. I ended up migrating to a self-managed database, but at least Yugabyte is fully compatible with Postgres, because I migrated with pg_dump without any problem. It is worth it when you are starting out if you don't want to manage the database server.

Databricks AnalysisException: Column 'l' does not exist

I have a very strange occurrence with my code.
I keep getting the error
AnalysisException: Column 'homepage_url' does not exist
However, when I do a select with cross joins, the column does actually exist.
Can someone take a look at my cross joins and let me know if that is where the problem is?
SELECT DISTINCT
account.xpd_relationshipstatus AS CRM_xpd_relationshipstatus
,REPLACE(owneridname,'Data.Import #','') AS MontaguOwner
,account.ts_montaguoffice AS Montagu_Office
,CAST(account.ts_reminderdatesetto AS DATE) AS CRM_ts_reminderdatesetto
,CAST(account.ts_lastdatestatuschanged AS DATE) AS YearofCRMtslastdatestatuschanged
,organizations.name AS nameCB
,organizations.homepage_url
,iff(e like 'www.%', e, 'www.' + e) AS website
,left(category_list,charindex(',',category_list +',' )-1) AS category_CB
-- ,case when charindex(',',category_list,0) > 0 then left(category_list,charindex(',',category_list)-1) else category_list end as category_CB
,organizations.category_groups_list AS category_groups_CB
FROM basecrmcbreport.account
LEFT OUTER JOIN basecrmcbreport.CRM2CBURL_Lookup
ON account.Id = CRM2CBURL_Lookup.Key
LEFT OUTER JOIN basecrmcbreport.organizations
ON CRM2CBURL_Lookup.CB_URL_KEY = organizations.cb_url
cross Join (values (charindex('://', homepage_url))) a(a)
cross Join (values (iff(a = 0, 1, a + 3))) b(b)
cross Join (values (charindex('/', homepage_url, b))) c(c)
cross Join (values (iff(c = 0, length(homepage_url) + 1, c))) d(d)
cross Join (values (substring(homepage_url, b, d - b))) e(e)
Without the cross Joins
A cross join (or any join) recognizes the column when you join against a SELECT, but not when you join against a table-valued function, because joins operate on tables only.
To use table-valued functions, one must use cross apply or outer apply, but these are not supported in Databricks SQL.
The following is the demo data I am using:
I tried using an inner join on a table-valued function with the following query and got the same error:
select d1.*,a from demo1 inner join (values(if(d1.team = 'OG',2,1))) a;
Instead, when joining against a SELECT query, the join works, because that is how joins function:
select d1.*,a.no_of_wins from demo1 d1 inner join (select id,case team when 'OG' then 2 when 'TS' then 1 end as no_of_wins from demo1) a on d1.id=a.id;
So the remedy for this problem is to replace every table-valued function you are joining on with a SELECT statement.
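For reference, the chain of cross joins in the question derives the host part of homepage_url one step at a time (a through e), then prefixes 'www.' when it is missing. The same steps written as a plain Python function can be handy for checking expected values while rewriting the query. This is only a sketch: charindex and website_from_url are illustrative helper names, and 1-based positions are assumed so the arithmetic mirrors the SQL.

```python
def charindex(needle, haystack, start=1):
    """1-based position of needle in haystack at or after start; 0 if absent."""
    return haystack.find(needle, start - 1) + 1


def website_from_url(url):
    """Mirror the a..e steps of the cross-join chain from the question."""
    a = charindex("://", url)              # position of the scheme separator
    b = 1 if a == 0 else a + 3             # first character of the host
    c = charindex("/", url, b)             # first slash after the host
    d = len(url) + 1 if c == 0 else c      # one past the end of the host
    e = url[b - 1 : d - 1]                 # substring(url, b, d - b)
    return e if e.startswith("www.") else "www." + e


print(website_from_url("https://example.com/about"))
```

Each intermediate value depends only on homepage_url and the previous steps, so the whole chain can be expressed as nested expressions inside a single SELECT or subquery instead of joined VALUES clauses.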

Postgresql Query is taking over 30s, Failing on Heroku

I have two tables (tb_accounts and tb_similar_accounts) in my PostgreSQL database.
Each record in tb_accounts has a unique id field.
And in tb_similar_accounts table, we store the relationships between accounts. We have around 10 million records in this table.
In one of my API endpoints I receive around 1,000 to 4,000 accounts, and I want to get all the links between them from tb_similar_accounts:
const ids_str = `${ids.map((id) => `('${id}')`).join(`,`)}`;
const pgQuery = `
SELECT account_1 as source, account_2 as target, strength as weight FROM tb_similar_accounts
WHERE account_1 = ANY (VALUES ${ids_str}) AND account_2 = ANY (VALUES ${ids_str})
ORDER BY strength DESC
LIMIT 10000
`;
const { rows: visEdges } = await pgClient.query(pgQuery);
This is what I'm doing here, but it's taking too much time. It takes over 30 seconds so it's failing on the Heroku server.
Limit (cost=296527.60..297597.60 rows=10000 width=30) (actual time=25075.977..25078.602 rows=10000 loops=1)
Buffers: shared hit=19 read=121634
I/O Timings: read=22104.502
-> Gather Merge (cost=296527.60..297967.40 rows=13456 width=30) (actual time=25075.975..25080.332 rows=10000 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=110 read=244343
I/O Timings: read=44196.556
-> Sort (cost=295527.60..295534.33 rows=13456 width=30) (actual time=25070.720..25071.022 rows=5546 loops=2)
Sort Key: tb_similar_accounts.strength DESC
Sort Method: top-N heapsort Memory: 1550kB
Worker 0: Sort Method: top-N heapsort Memory: 1550kB
Buffers: shared hit=110 read=244343
I/O Timings: read=44196.556
-> Hash Semi Join (cost=5.74..295343.04 rows=13456 width=30) (actual time=1040.173..25060.553 rows=38449 loops=2)
Hash Cond: (tb_similar_accounts.account_1 = "*VALUES*".column1)
Buffers: shared hit=63 read=244343
I/O Timings: read=44196.556
-> Hash Semi Join (cost=2.87..295096.26 rows=381936 width=30) (actual time=2.197..25039.864 rows=80874 loops=2)
Hash Cond: (tb_similar_accounts.account_2 = "*VALUES*_1".column1)
Buffers: shared hit=33 read=244343
I/O Timings: read=44196.556
-> Parallel Seq Scan on tb_similar_accounts (cost=0.00..286491.44 rows=14038480 width=30) (actual time=0.032..23394.824 rows=11932708 loops=2)
Buffers: shared hit=33 read=244343
I/O Timings: read=44196.556
-> Hash (cost=1.44..1.44 rows=410 width=32) (actual time=0.241..0.242 rows=410 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 26kB
-> Values Scan on "*VALUES*_1" (cost=0.00..1.44 rows=410 width=32) (actual time=0.001..0.153 rows=410 loops=2)
-> Hash (cost=1.44..1.44 rows=410 width=32) (actual time=0.198..0.198 rows=410 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 26kB
-> Values Scan on "*VALUES*" (cost=0.00..1.44 rows=410 width=32) (actual time=0.002..0.113 rows=410 loops=2)
Planning Time: 3.522 ms
Execution Time: 25081.725 ms
This is for 410 accounts and takes around 25 seconds.
Is there any way to improve this query? (I'm using Node.js and pg module.)
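Check that your tables are indexed properly, i.e. build indexes on the columns used to match rows. The plan above shows a Parallel Seq Scan over all of tb_similar_accounts (about 12 million rows per worker), which accounts for almost all of the 25 seconds, so indexes on account_1 and account_2 are the place to start.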
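A minimal sketch of that indexing advice, run against an in-memory SQLite database through Python so that it is self-contained. The table mirrors the question's column names, but the rows, the index name `idx_similar_pair`, and the choice of a composite index on `(account_1, account_2)` are assumptions. On PostgreSQL the equivalent would be a `CREATE INDEX` on the same columns, followed by `EXPLAIN (ANALYZE, BUFFERS)` to check whether the sequential scan is gone. The ids are passed as bound parameters rather than interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_similar_accounts (account_1 TEXT, account_2 TEXT, strength REAL)"
)
conn.executemany(
    "INSERT INTO tb_similar_accounts VALUES (?, ?, ?)",
    [("a", "b", 0.9), ("a", "c", 0.4), ("b", "c", 0.7), ("x", "y", 0.8), ("c", "z", 0.6)],
)

# Assumed index: lets the engine look rows up by account instead of scanning all of them.
conn.execute(
    "CREATE INDEX idx_similar_pair ON tb_similar_accounts (account_1, account_2)"
)

ids = ["a", "b", "c"]
marks = ",".join("?" * len(ids))  # one placeholder per id; values are bound separately
sql = f"""
SELECT account_1 AS source, account_2 AS target, strength AS weight
FROM tb_similar_accounts
WHERE account_1 IN ({marks}) AND account_2 IN ({marks})
ORDER BY strength DESC
LIMIT 10000
"""

edges = conn.execute(sql, ids + ids).fetchall()
# The fourth column of EXPLAIN QUERY PLAN output is the human-readable step description.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ids + ids)]
print(edges)
print(plan)
```

Whether the real planner picks the index depends on the data distribution and the size of the id list, so the plan should be rechecked on the actual database.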

How to make subquery fast

For an author overview we are looking for a query that shows all the authors together with their best book. The problem with this query is that it is slow: there are only about 1,500 authors, yet generating the overview currently takes 20 seconds.
The main problem seems to be generating the average rating of all the books per person.
The following query is still rather fast:
select
person.id as pers_id,
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate) as year,
thriller.id as thrill_id,
count(user_rating.id) as nr,
AVG(user_rating.rating) as avgrating
from
thriller
inner join
thriller_form
on thriller_form.thriller_id = thriller.id
inner join
thriller_person
on thriller_person.thriller_id = thriller.id
and thriller_person.person_type_id = 1
inner join
person
on person.id = thriller_person.person_id
left outer join
user_rating
on user_rating.thriller_id = thriller.id
and user_rating.rating_type_id = 1
where thriller.id in
(select top 1 B.id from thriller as B
inner join thriller_person as C on B.id=C.thriller_id
and person.id=C.person_id)
group by
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate),
thriller.id,
person.id
order by
person.lastname
However, if we make the subquery a little more complex by selecting the book by its average rating, it takes a full 20 seconds to generate a result set.
The query would then be as follows:
select
person.id as pers_id,
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate) as year,
thriller.id as thrill_id,
count(user_rating.id) as nr,
AVG(user_rating.rating) as avgrating
from
thriller
inner join
thriller_form
on thriller_form.thriller_id = thriller.id
inner join
thriller_person
on thriller_person.thriller_id = thriller.id
and thriller_person.person_type_id = 1
inner join
person
on person.id = thriller_person.person_id
left outer join
user_rating
on user_rating.thriller_id = thriller.id
and user_rating.rating_type_id = 1
where thriller.id in
(select top 1 B.id from thriller as B
inner join thriller_person as C on B.id=C.thriller_id
and person.id=C.person_id
inner join user_rating as D on B.id=D.thriller_id
group by B.id
order by AVG(D.rating))
group by
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate),
thriller.id,
person.id
order by
person.lastname
Anyone got a good suggestion to speed up this query?
Calculating an average requires a table scan since you've got to sum the values and then divide by the number of (relevant) rows. This in turn means that you're doing a lot of rescanning; that's slow. Can you calculate the averages once and store them? That would let your query use those pre-computed values. (Yes, it denormalizes the data, but denormalizing for performance is often necessary; there's a trade-off between performance and minimal data.)
It might be appropriate to use a temporary table as the store of the averages.
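A minimal sketch of the precompute-then-join idea, again on an in-memory SQLite database through Python so that it is self-contained. The table and column names follow the question, but the sample rows and the `avg_rating` temporary table are made up, the `person_type_id` and `rating_type_id` filters are left out for brevity, and "best book" is assumed to mean the highest average rating (hence `ORDER BY ... DESC`). On SQL Server the same shape would use `SELECT ... INTO #avg_rating` and `TOP 1` instead of `CREATE TEMP TABLE` and `LIMIT 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, lastname TEXT);
CREATE TABLE thriller (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE thriller_person (thriller_id INTEGER, person_id INTEGER);
CREATE TABLE user_rating (id INTEGER PRIMARY KEY, thriller_id INTEGER, rating REAL);

INSERT INTO person VALUES (1, 'Adams'), (2, 'Brown');
INSERT INTO thriller VALUES (10, 'First'), (11, 'Second'), (12, 'Third');
INSERT INTO thriller_person VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO user_rating (thriller_id, rating) VALUES
    (10, 6), (10, 8), (11, 9), (11, 9), (12, 5);

-- Step 1: aggregate the ratings once, instead of once per author.
CREATE TEMP TABLE avg_rating AS
SELECT thriller_id, COUNT(*) AS nr, AVG(rating) AS avgrating
FROM user_rating
GROUP BY thriller_id;
""")

# Step 2: the per-author subquery now reads the small precomputed table.
best = conn.execute("""
SELECT p.lastname, t.title, ar.nr, ar.avgrating
FROM person AS p
JOIN thriller_person AS tp ON tp.person_id = p.id
JOIN thriller AS t ON t.id = tp.thriller_id
JOIN avg_rating AS ar ON ar.thriller_id = t.id
WHERE t.id = (SELECT tp2.thriller_id
              FROM thriller_person AS tp2
              JOIN avg_rating AS ar2 ON ar2.thriller_id = tp2.thriller_id
              WHERE tp2.person_id = p.id
              ORDER BY ar2.avgrating DESC
              LIMIT 1)
ORDER BY p.lastname
""").fetchall()
print(best)
```

The trade-off is the one described above: the stored averages must be refreshed when ratings change, in exchange for not re-aggregating `user_rating` for every author.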

Resources