PostgreSQL query is taking over 30s, failing on Heroku - node.js

I have two tables (tb_accounts and tb_similar_accounts) in my PostgreSQL database.
Each record in tb_accounts has a unique id field.
The tb_similar_accounts table stores the relationships between accounts; it has around 10 million records.
One of my API endpoints receives around 1,000~4,000 account ids, and I want to fetch all the links between those accounts from tb_similar_accounts:
const ids_str = `${ids.map((id) => `('${id}')`).join(`,`)}`;
const pgQuery = `
SELECT account_1 as source, account_2 as target, strength as weight FROM tb_similar_accounts
WHERE account_1 = ANY (VALUES ${ids_str}) AND account_2 = ANY (VALUES ${ids_str})
ORDER BY strength DESC
LIMIT 10000
`;
const { rows: visEdges } = await pgClient.query(pgQuery);
This is what I'm doing, but it takes too long: over 30 seconds, so the request fails on the Heroku server. Here is the EXPLAIN ANALYZE output:
Limit (cost=296527.60..297597.60 rows=10000 width=30) (actual time=25075.977..25078.602 rows=10000 loops=1)
Buffers: shared hit=19 read=121634
I/O Timings: read=22104.502
-> Gather Merge (cost=296527.60..297967.40 rows=13456 width=30) (actual time=25075.975..25080.332 rows=10000 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=110 read=244343
I/O Timings: read=44196.556
-> Sort (cost=295527.60..295534.33 rows=13456 width=30) (actual time=25070.720..25071.022 rows=5546 loops=2)
Sort Key: tb_similar_accounts.strength DESC
Sort Method: top-N heapsort Memory: 1550kB
Worker 0: Sort Method: top-N heapsort Memory: 1550kB
Buffers: shared hit=110 read=244343
I/O Timings: read=44196.556
-> Hash Semi Join (cost=5.74..295343.04 rows=13456 width=30) (actual time=1040.173..25060.553 rows=38449 loops=2)
Hash Cond: (tb_similar_accounts.account_1 = "*VALUES*".column1)
Buffers: shared hit=63 read=244343
I/O Timings: read=44196.556
-> Hash Semi Join (cost=2.87..295096.26 rows=381936 width=30) (actual time=2.197..25039.864 rows=80874 loops=2)
Hash Cond: (tb_similar_accounts.account_2 = "*VALUES*_1".column1)
Buffers: shared hit=33 read=244343
I/O Timings: read=44196.556
-> Parallel Seq Scan on tb_similar_accounts (cost=0.00..286491.44 rows=14038480 width=30) (actual time=0.032..23394.824 rows=11932708 loops=2)
Buffers: shared hit=33 read=244343
I/O Timings: read=44196.556
-> Hash (cost=1.44..1.44 rows=410 width=32) (actual time=0.241..0.242 rows=410 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 26kB
-> Values Scan on "*VALUES*_1" (cost=0.00..1.44 rows=410 width=32) (actual time=0.001..0.153 rows=410 loops=2)
-> Hash (cost=1.44..1.44 rows=410 width=32) (actual time=0.198..0.198 rows=410 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 26kB
-> Values Scan on "*VALUES*" (cost=0.00..1.44 rows=410 width=32) (actual time=0.002..0.113 rows=410 loops=2)
Planning Time: 3.522 ms
Execution Time: 25081.725 ms
This is for 410 accounts and takes around 25 seconds.
Is there any way to improve this query? (I'm using Node.js and the pg module.)
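One change worth trying regardless of indexing (a sketch, not a guaranteed fix; `buildEdgeQuery` is a hypothetical helper name): pass the ids as a single array parameter and compare with `= ANY ($1)` instead of interpolating thousands of quoted literals into the SQL text. pg sends a JavaScript array as a Postgres array, the query text stays constant across calls, and the SQL-injection risk of string concatenation disappears.

```javascript
// Build a query config object ({ text, values }) that pg's client.query
// accepts directly. The ids array is passed once and reused for both
// account_1 and account_2 via the same $1 parameter.
function buildEdgeQuery(ids) {
  return {
    text: `
      SELECT account_1 AS source, account_2 AS target, strength AS weight
      FROM tb_similar_accounts
      WHERE account_1 = ANY ($1) AND account_2 = ANY ($1)
      ORDER BY strength DESC
      LIMIT 10000`,
    values: [ids],
  };
}

// Usage: const { rows: visEdges } = await pgClient.query(buildEdgeQuery(ids));
```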

Check whether both your tables are indexed properly, i.e. build indexes on the columns used in the WHERE clause and in joins.
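For the question above, that means indexes led by the filtered columns (a sketch; the index names are arbitrary, and which indexes actually help depends on the plan and data distribution):

```sql
-- Lets Postgres find the rows for a given account_1 without the parallel
-- seq scan seen in the plan; including account_2 and strength can make
-- the scan index-only.
CREATE INDEX idx_similar_accounts_a1
    ON tb_similar_accounts (account_1, account_2, strength);

-- A second index led by account_2, in case the planner filters on it first.
CREATE INDEX idx_similar_accounts_a2
    ON tb_similar_accounts (account_2);
```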

Related

Very slow database query migrating from django 1.8 to django 3.2

I migrated a project from Django 1.8 to Django 3.2. I'm using the same database, but the function I use to build a report is four times slower (8 seconds on Django 1.8, 30-40 seconds on Django 3.2). The database is MySQL 5.7 (I also tried 8.0.32, but nothing changed).
This is the query:
qs = PartitaDanno.objects.filter(incarico__cliente_intestatario_fattura_id=8006
).select_related('incarico', 'anagrafica_assctp', 'incarico__naturaincarico'
).prefetch_related('rate'
).order_by('-incarico__anno', '-incarico__numero', 'pk')
PartitaDanno is a table with 16,000 rows, and the model has 215 fields (I know... I didn't write it). The result of this query is just 1,700 rows - a very small result set.
The unusual thing is that even a simple query on this model, like
qs = PartitaDanno.objects.filter(incarico__cliente_intestatario_fattura_id=8006)
takes 20 seconds to iterate through... I don't understand it.
The raw SQL query is the same in both Django versions. This is the raw query of the first queryset:
SELECT `sinistri_partitadanno`.`id`, `sinistri_partitadanno`.`created`, `sinistri_partitadanno`.`modified`, `sinistri_partitadanno`.`incarico_id`, `sinistri_partitadanno`.`tipo`, `sinistri_partitadanno`.`tipologia_controparte`, `sinistri_partitadanno`.`fase`, `sinistri_partitadanno`.`competenza_giurisdizionale`, `sinistri_partitadanno`.`soggetto`, `sinistri_partitadanno`.`anagrafica_assctp_id`, `sinistri_partitadanno`.`luogo`, `sinistri_partitadanno`.`compagnia_id`, `sinistri_partitadanno`.`riferimento_compagnia`, `sinistri_partitadanno`.`tipo_di_danno`, `sinistri_partitadanno`.`tipo_lesioni`, `sinistri_partitadanno`.`tipologia_vittima`, `sinistri_partitadanno`.`modello_veicolo`, `sinistri_partitadanno`.`targa_veicolo`, `sinistri_partitadanno`.`riserva_iniziale`, `sinistri_partitadanno`.`riserva_parziale`, `sinistri_partitadanno`.`data_richiesta_danni_ctp`, `sinistri_partitadanno`.`reiezione_cautelativa`, `sinistri_partitadanno`.`data_reiezione_cautelativa`, `sinistri_partitadanno`.`importo_richiesto_da_ctp`, `sinistri_partitadanno`.`data_incarico_nostro_perito`, `sinistri_partitadanno`.`tipologia_perizia`, `sinistri_partitadanno`.`nominativo_perito`, `sinistri_partitadanno`.`data_perizia_negativa`, `sinistri_partitadanno`.`motivazioni_perizia_negativa`, `sinistri_partitadanno`.`data_interlocutoria_inviata_dal_perito`, `sinistri_partitadanno`.`importo_stimato_dal_perito`, `sinistri_partitadanno`.`data_ricezione_perizia_dal_nostro_perito`, `sinistri_partitadanno`.`riparazione_antieconomica`, `sinistri_partitadanno`.`data_incarico_medico_legale`, `sinistri_partitadanno`.`data_ricezione_perizia_dal_nostro_medico_legale`, `sinistri_partitadanno`.`data_invio_offerta_tramite_transazione_quietanza`, `sinistri_partitadanno`.`data_reiezione_indennizzo_risarcimento`, `sinistri_partitadanno`.`pagamento_parziale`, `sinistri_partitadanno`.`capitale`, `sinistri_partitadanno`.`data_pagamento_capitale`, `sinistri_partitadanno`.`anagrafica_patrocinatore_id`, 
`sinistri_partitadanno`.`tipo_patrocinatore`, `sinistri_partitadanno`.`cf_piva_patrocinatore`, `sinistri_partitadanno`.`numero_rif_patrocinatore`, `sinistri_partitadanno`.`spese_legali`, `sinistri_partitadanno`.`data_pagamento_spese_legali`, `sinistri_partitadanno`.`integrazione_capitale`, `sinistri_partitadanno`.`data_pag_integraz_capitale`, `sinistri_partitadanno`.`integrazione_spese_legali`, `sinistri_partitadanno`.`data_pag_integraz_spese_legali`, `sinistri_partitadanno`.`note_generali`, `sinistri_partitadanno`.`note_pagamenti`, `sinistri_partitadanno`.`cessione_del_credito`, `sinistri_partitadanno`.`veicolo_sostitutivo`, `sinistri_partitadanno`.`importo_noleggio_veicolo_sostitutivo`, `sinistri_partitadanno`.`altre_spese_accessorie_richieste`, `sinistri_partitadanno`.`importo_altre_spese_accessorie_richieste`, `sinistri_partitadanno`.`importo_onorari_richiesti`, `sinistri_partitadanno`.`liquidazione_diretta_senza_perizia`, `sinistri_partitadanno`.`offerta_lesioni_a_stralcio`, `sinistri_partitadanno`.`richiesto_nulla_osta_giudice_tutelare`, `sinistri_partitadanno`.`data_ricezione_nulla_osta_giudice_tutelare`, `sinistri_partitadanno`.`note_fase_precontenziosa`, `sinistri_partitadanno`.`certificato_di_chiusura_malattia`, `sinistri_partitadanno`.`data_ricezione_cert_chiusura_malattia`, `sinistri_partitadanno`.`richiesta_accesso_agli_atti`, `sinistri_partitadanno`.`data_ricezione_richiesta_accesso_agli_atti`, `sinistri_partitadanno`.`data_risposta_accesso_agli_atti`, `sinistri_partitadanno`.`note_accesso_agli_atti`, `sinistri_partitadanno`.`atto_di_citazione`, `sinistri_partitadanno`.`data_notifica_atto_di_citazione`, `sinistri_partitadanno`.`valore_causa`, `sinistri_partitadanno`.`valore_giudiziale_riservato`, `sinistri_partitadanno`.`mediazione`, `sinistri_partitadanno`.`data_notifica_mediazione`, `sinistri_partitadanno`.`data_risposta_mediazione`, `sinistri_partitadanno`.`note_su_adr`, `sinistri_partitadanno`.`domiciliatario_sled_id`, 
`sinistri_partitadanno`.`autorita_evocata_id`, `sinistri_partitadanno`.`nome_del_giudice`, `sinistri_partitadanno`.`sezione_tribunale`, `sinistri_partitadanno`.`ruolo_generale`, `sinistri_partitadanno`.`prima_udienza_in_atto_introduttivo`, `sinistri_partitadanno`.`adempimento_udienza`, `sinistri_partitadanno`.`rinvio_all_udienza_del`, `sinistri_partitadanno`.`scadenza_adempimento`, `sinistri_partitadanno`.`tipologia_adempimento`, `sinistri_partitadanno`.`numero_sentenza`, `sinistri_partitadanno`.`data_ricezione_sentenza`, `sinistri_partitadanno`.`pubblicazione_sentenza`, `sinistri_partitadanno`.`data_prima_udienza`, `sinistri_partitadanno`.`note`, `sinistri_partitadanno`.`adempimento_svolto`, `sinistri_partitadanno`.`reclamo_ivass`, `sinistri_partitadanno`.`data_ricezione_reclamo_ivass`, `sinistri_partitadanno`.`studio_legale_incaricato`, `sinistri_partitadanno`.`dettaglio_riserva_rivalsa`, `sinistri_partitadanno`.`dettaglio_esborsi_pagati_sled`, `sinistri_partitadanno`.`descrizione_esborsi_pagati_sled`, `sinistri_partitadanno`.`dettaglio_importi_recuperati_da_sled`, `sinistri_partitadanno`.`ricevuta_richiesta_danni_ctp`, `sinistri_partitadanno`.`data_ricezione_richiesta_danni_ctp`, `sinistri_partitadanno`.`termini_legali_di_risposta_ctp`, `sinistri_partitadanno`.`motivi_reiezione_cautelativa`, `sinistri_partitadanno`.`data_integraz_elementi_ctp`, `sinistri_partitadanno`.`data_ricezione_transazione_quietanza_firmata`, `sinistri_partitadanno`.`data_ricezione_rifiuto_offerta`, `sinistri_partitadanno`.`motivi_reiezione_indennizzo`, `sinistri_partitadanno`.`clausole_particolari`, `sinistri_partitadanno`.`incarico_gestione_relitto`, `sinistri_partitadanno`.`destinazione_relitto`, `sinistri_partitadanno`.`data_ricezione_incarico`, `sinistri_partitadanno`.`valore_commerciale`, `sinistri_partitadanno`.`spese_diverse_relitto`, `sinistri_partitadanno`.`tipologia_spesa_relitto`, `sinistri_partitadanno`.`importo_vendita`, `sinistri_partitadanno`.`data_bonifico`, 
`sinistri_partitadanno`.`acquirente`, `sinistri_partitadanno`.`data_voltura_demolizione`, `sinistri_partitadanno`.`note_gestione_relitti`, `sinistri_partitadanno`.`data_ricezione_citazione`, `sinistri_partitadanno`.`data_richiesta_pagamento_bo`, `sinistri_partitadanno`.`stato_chiusura`, `sinistri_partitadanno`.`tipo_chiusura_con_pagamento`, `sinistri_partitadanno`.`tipo_chiusura_senza_seguito`, `sinistri_partitadanno`.`tipo_chiusura_con_recupero`, `sinistri_partitadanno`.`data_apertura`, `sinistri_partitadanno`.`data_chiusura`, `sinistri_partitadanno`.`data_riapertura`, `sinistri_partitadanno`.`anagrafica_cessionario_id`, `sinistri_partitadanno`.`minore`, `sinistri_partitadanno`.`anag_esercente_potesta_id`, `sinistri_partitadanno`.`anagrafica_erede_id`, `sinistri_partitadanno`.`investigazione_antifrode`, `sinistri_partitadanno`.`data_appuntamento_perito`, `sinistri_partitadanno`.`data_ricezione_doc_completi`, `sinistri_partitadanno`.`data_incarico_agenzia_pa`, `sinistri_partitadanno`.`data_ricezione_doc_da_agenzia_pa`, `sinistri_partitadanno`.`veicolo_in_deposito`, `sinistri_partitadanno`.`veicolo_sotto_sequestro`, `sinistri_partitadanno`.`data_dissequestro`, `sinistri_partitadanno`.`data_invio_offerta_vendita`, `sinistri_partitadanno`.`tipologia_perizia_medica`, `sinistri_partitadanno`.`nominativo_medico`, `sinistri_partitadanno`.`data_appuntamento`, `sinistri_partitadanno`.`data_interlocutoria_medica`, `sinistri_partitadanno`.`data_perizia_medica_negativa`, `sinistri_partitadanno`.`motivazioni_perizia_medica_negativa`, `sinistri_partitadanno`.`tassa_di_registro`, `sinistri_partitadanno`.`data_pagamento_tassa_di_registro`, `sinistri_partitadanno`.`negoziazione_assistita`, `sinistri_partitadanno`.`causa_penale`, `sinistri_partitadanno`.`data_prossima_udienza`, `sinistri_partitadanno`.`note_causa_penale`, `sinistri_partitadanno`.`decreto_ingiuntivo`, `sinistri_partitadanno`.`tipologia_rito`, `sinistri_partitadanno`.`fase_procedura`, 
`sinistri_partitadanno`.`conclusione`, `sinistri_partitadanno`.`numero_di_polizza`, `sinistri_partitadanno`.`conducente`, `sinistri_partitadanno`.`anagrafica_conducente_id`, `sinistri_partitadanno`.`estratti_autentici`, `sinistri_partitadanno`.`nome_notaio`, `sinistri_partitadanno`.`data_autentica`, `sinistri_partitadanno`.`data_invio_messa_in_mora`, `sinistri_partitadanno`.`data_ricezione_messa_in_mora`, `sinistri_partitadanno`.`data_notifica_titolo_giudiziale`, `sinistri_partitadanno`.`atto_di_precetto`, `sinistri_partitadanno`.`data_notifica_precetto`, `sinistri_partitadanno`.`somme_incassate`, `sinistri_partitadanno`.`data_incasso`, `sinistri_partitadanno`.`n_decreto_ingiuntivo`, `sinistri_partitadanno`.`r_g_decreto_ingiuntivo`, `sinistri_partitadanno`.`data_emissione_decreto_ingiuntivo`, `sinistri_partitadanno`.`data_notifica_decreto_ingiuntivo`, `sinistri_partitadanno`.`capitale_liquidato_decreto_ingiuntivo`, `sinistri_partitadanno`.`onorari_liquidati_decreto_ingiuntivo`, `sinistri_partitadanno`.`data_apposizione_formula_esecutiva`, `sinistri_partitadanno`.`importo_atto_di_precetto`, `sinistri_partitadanno`.`data_notifica_precetto_rinnovazione`, `sinistri_partitadanno`.`data_notifica_pignoramento_mobiliare`, `sinistri_partitadanno`.`rge_pignoramento_mobiliare`, `sinistri_partitadanno`.`data_notifica_pignoramento_presso_terzi`, `sinistri_partitadanno`.`iscrizione_ruolo_pignoramento_presso_terzi`, `sinistri_partitadanno`.`rge_pignoramento_presso_terzi`, `sinistri_partitadanno`.`data_notifica_pignoramento_immobiliare`, `sinistri_partitadanno`.`iscrizione_ruolo_pignoramento_immobiliare`, `sinistri_partitadanno`.`rge_pignoramento_immobiliare`, `sinistri_partitadanno`.`data_trascrizione_pignoramento_immobiliare`, `sinistri_partitadanno`.`anagrafica_amministratore_id`, `sinistri_partitadanno`.`progressivo_uci_ctp`, `sinistri_partitadanno`.`codice_servizio`, `sinistri_partitadanno`.`lotto_affidato`, `sinistri_partitadanno`.`importo_affidato`, 
`sinistri_partitadanno`.`capitale_azionato`, `sinistri_partitadanno`.`calc_capitale_azionato`, `sinistri_partitadanno`.`note_recupero_crediti`, `sinistri_partitadanno`.`note_status`, `sinistri_partitadanno`.`note_gestione`, `sinistri_incarico`.`id`, `sinistri_incarico`.`created`, `sinistri_incarico`.`modified`, `sinistri_incarico`.`ufficio_sled`, `sinistri_incarico`.`natura_incarico`, `sinistri_incarico`.`naturaincarico_id`, `sinistri_incarico`.`progetto_id`, `sinistri_incarico`.`status`, `sinistri_incarico`.`data_chiusura_incarico`, `sinistri_incarico`.`numero`, `sinistri_incarico`.`anno`, `sinistri_incarico`.`numero_incarico_sled`, `sinistri_incarico`.`avvocato_incaricato_id`, `sinistri_incarico`.`data_affidamento_avvocato`, `sinistri_incarico`.`cliente_intestatario_fattura_id`, `sinistri_incarico`.`fronter_cliente`, `sinistri_incarico`.`intermediario_id`, `sinistri_incarico`.`numero_incarico_cliente`, `sinistri_incarico`.`locatario_id`, `sinistri_incarico`.`numero_di_polizza`, `sinistri_incarico`.`codice_evento`, `sinistri_incarico`.`tipologia_sinistro`, `sinistri_incarico`.`riserva_totale`, `sinistri_incarico`.`onorari_ed_anticipazioni_sled`, `sinistri_incarico`.`data_fattura_onorari_ed_anticipazioni_sled`, `sinistri_incarico`.`numero_fattura_sled`, `sinistri_incarico`.`data_incasso_onorari_ed_anticipazioni_sled`, `sinistri_incarico`.`totale_pagato_come_risarcimento`, `sinistri_incarico`.`totale_onorari_ed_anticipazioni_pagati_a_sled`, `sinistri_incarico`.`totale_importi_recuperati_da_sled`, `sinistri_incarico`.`costo_totale_del_sinistro`, `sinistri_incarico`.`data_ricezione_fondi`, `sinistri_incarico`.`data_del_sinistro`, `sinistri_incarico`.`ora_del_sinistro`, `sinistri_incarico`.`luogo_del_sinistro`, `sinistri_incarico`.`stato`, `sinistri_incarico`.`data_ricezione_incarico_sled`, `sinistri_incarico`.`data_apertura_incarico_sled`, `sinistri_incarico`.`data_riapertura_incarico_sled`, `sinistri_incarico`.`targa_veicolo_assicurato`, 
`sinistri_incarico`.`assicurato`, `sinistri_incarico`.`proprietario`, `sinistri_incarico`.`nome_conducente`, `sinistri_incarico`.`cognome_conducente`, `sinistri_incarico`.`modello_veicolo`, `sinistri_incarico`.`blackbox`, `sinistri_incarico`.`data_installazione_blackbox`, `sinistri_incarico`.`cai_2f`, `sinistri_incarico`.`data_ricezione_cai_2f`, `sinistri_incarico`.`autorita_intervenuta`, `sinistri_incarico`.`verbale_autorita`, `sinistri_incarico`.`data_ricezione_verbale_autorita`, `sinistri_incarico`.`testimonianza`, `sinistri_incarico`.`nome_testimone`, `sinistri_incarico`.`testimone_id`, `sinistri_incarico`.`data_ricezione_testimonianza`, `sinistri_incarico`.`denuncia_di_sinistro`, `sinistri_incarico`.`data_ricezione_denuncia_sx`, `sinistri_incarico`.`franchigia_si_no`, `sinistri_incarico`.`importo_franchigia`, `sinistri_incarico`.`data_richiesta_copertura`, `sinistri_incarico`.`esito_copertura`, `sinistri_incarico`.`data_inizio_copertura`, `sinistri_incarico`.`data_fine_copertura`, `sinistri_incarico`.`note_sulla_copertura`, `sinistri_incarico`.`garanzia`, `sinistri_incarico`.`pagamento_totale_del_sinistro`, `sinistri_incarico`.`sinistro_con_lesioni_personali_gravi_o_gravissime`, `sinistri_incarico`.`sinistro_mortale`, `sinistri_incarico`.`liq_cliente_incaricato`, `sinistri_incarico`.`scoperto`, `sinistri_incarico`.`limite_di_indennizzo`, `sinistri_incarico`.`esclusioni_di_polizza`, `sinistri_incarico`.`note_aggiuntive`, `sinistri_incarico`.`importo_da_recuperare`, `sinistri_incarico`.`calc_importo_da_recuperare`, `sinistri_incarico`.`importo_prescritto`, `sinistri_incarico`.`fermo_tecnico`, `sinistri_incarico`.`tutela_legale`, `sinistri_incarico`.`compagnia_tutela_legale_id`, `sinistri_incarico`.`data_effetto_polizza`, `sinistri_incarico`.`data_scadenza_polizza`, `sinistri_incarico`.`data_pagamento_premio`, `sinistri_incarico`.`codice_agenzia`, `sinistri_incarico`.`onorari`, `sinistri_incarico`.`pratica_procurata_da_id`, `sinistri_incarico`.`rif_uci`, 
`sinistri_incarico`.`liquidatore_uci`, `sinistri_incarico`.`rif_consap`, `sinistri_incarico`.`branding`, `sinistri_naturaincarico`.`id`, `sinistri_naturaincarico`.`label`, `sinistri_naturaincarico`.`codice_fatturazione`, `sinistri_naturaincarico`.`tipo_workflow`, T5.`id`, T5.`created`, T5.`modified`, T5.`tipo`, T5.`denominazione_giuridica`, T5.`titolo`, T5.`cognome_ragione_sociale`, T5.`nome`, T5.`rappr_sinistro_in_italia`, T5.`indirizzo`, T5.`citta`, T5.`provincia`, T5.`cap`, T5.`telefono`, T5.`fax`, T5.`cellulare`, T5.`email`, T5.`pec`, T5.`codice_fiscale`, T5.`partita_iva`, T5.`professione`, T5.`luogo_di_nascita`, T5.`data_di_nascita`, T5.`sesso`, T5.`codice_iban`, T5.`note`, T5.`blacklist`, T5.`tempi_pagamento`, T5.`codice_fatturazione`, T5.`naturacosto_id`, T5.`nazione`, T5.`codice_uci`, T5.`importata_da_xml` FROM `sinistri_partitadanno` INNER JOIN `sinistri_incarico` ON (`sinistri_partitadanno`.`incarico_id` = `sinistri_incarico`.`id`) LEFT OUTER JOIN `sinistri_naturaincarico` ON (`sinistri_incarico`.`naturaincarico_id` = `sinistri_naturaincarico`.`id`) LEFT OUTER JOIN `sinistri_anagrafica` T5 ON (`sinistri_partitadanno`.`anagrafica_assctp_id` = T5.`id`) WHERE `sinistri_incarico`.`cliente_intestatario_fattura_id` = 8006 ORDER BY `sinistri_incarico`.`anno` DESC, `sinistri_incarico`.`numero` DESC, `sinistri_partitadanno`.`id` ASC
This is the result of queryset.explain():
-> Sort row IDs: sinistri_incarico.anno DESC, sinistri_incarico.numero DESC, sinistri_partitadanno.id (actual time=163.955..172.010 rows=1761 loops=1)
-> Table scan on <temporary> (cost=2518.91..2544.83 rows=1874) (actual time=153.040..163.135 rows=1761 loops=1)
-> Temporary table (cost=2518.90..2518.90 rows=1874) (actual time=153.007..153.007 rows=1761 loops=1)
-> Nested loop left join (cost=2331.46 rows=1874) (actual time=0.247..62.163 rows=1761 loops=1)
-> Nested loop inner join (cost=1675.43 rows=1874) (actual time=0.224..52.547 rows=1761 loops=1)
-> Nested loop left join (cost=1019.40 rows=1677) (actual time=0.089..17.068 rows=1677 loops=1)
-> Index lookup on sinistri_incarico using b842d1d4b6d5fa98d8dc06d2c92e02c5 (cliente_intestatario_fattura_id=8006) (cost=432.45 rows=1677) (actual time=0.081..15.586 rows=1677 loops=1)
-> Single-row index lookup on sinistri_naturaincarico using PRIMARY (id=sinistri_incarico.naturaincarico_id) (cost=0.25 rows=1) (actual time=0.001..0.001 rows=1 loops=1677)
-> Index lookup on sinistri_partitadanno using sinistri_pa_incarico_id_398e55f0a3d8c2c0_fk_sinistri_incarico_id (incarico_id=sinistri_incarico.id) (cost=0.28 rows=1) (actual time=0.018..0.021 rows=1 loops=1677)
-> Single-row index lookup on T5 using PRIMARY (id=sinistri_partitadanno.anagrafica_assctp_id) (cost=0.25 rows=1) (actual time=0.005..0.005 rows=1 loops=1761)
Of course, it slows down when I start iterating the queryset with a for loop. I don't use iterator() on the complex query because I'm using prefetch_related().
--- First Edit ---
This is the result of the raw query in phpMyAdmin. It's really fast (3.5 seconds to run the query and build the results table).
Thanks for your help!

Index not used in basic query

Given this table, block:
CREATE TABLE IF NOT EXISTS "block" (
"hash" char(66) CONSTRAINT block_pk PRIMARY KEY,
"size" text,
"miner" text ,
"nonce" text,
"number" text,
"number_int" integer not null,
"gasused" text ,
"mixhash" text ,
"gaslimit" text ,
"extradata" text ,
"logsbloom" text,
"stateroot" char(66) ,
"timestamp" text ,
"difficulty" text ,
"parenthash" char(66) ,
"sha3uncles" char(66) ,
"receiptsroot" char(66),
"totaldifficulty" text ,
"transactionsroot" char(66)
);
CREATE INDEX number_int_index ON block (number_int);
The table has about 3M rows. When I run a simple query, the results are:
EXPLAIN ANALYZE select number_int from block where number_int > 1999999 and number_int < 2999999 order by number_int desc limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Limit (cost=110.00..110.00 rows=1 width=4) (actual time=16154.891..16154.894 rows=1 loops=1)
-> Sort (cost=110.00..112.50 rows=1000 width=4) (actual time=16154.890..16154.890 rows=1 loops=1)
Sort Key: number_int DESC
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on block (cost=0.00..105.00 rows=1000 width=4) (actual time=172.766..16126.135 rows=190186 loops=1)
Remote Filter: ((number_int > 1999999) AND (number_int < 2999999))
Planning Time: 19.961 ms
Execution Time: 16155.382 ms
Peak Memory Usage: 1113 kB
(9 rows)
Any advice?
Regards
I tried something I found here on Stack Overflow, with the same result:
select number_int from block where number_int > 1999999 and number_int < 2999999 order by number_int+0 desc limit 1;
The problem was related to YugabyteDB; it was not an issue with the index or with anything else Postgres-related. I ended up migrating to a self-managed database. At least YugabyteDB is fully compatible with Postgres: I migrated with pg_dump without any problem. It is worth it when you are starting out and don't want to manage the database server.
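For readers hitting the same symptom (the Remote Filter line in the plan marks a YugabyteDB distributed scan): YugabyteDB shards indexes by hash on the first column by default, and a hash-sharded index cannot serve a range predicate like number_int > x AND number_int < y. A range-sharded index is one possible fix (a sketch; verify the syntax against your YugabyteDB version):

```sql
-- Specifying ASC or DESC makes the index range-sharded instead of the
-- default HASH, so the range predicate and ORDER BY ... DESC LIMIT 1
-- can both use it.
DROP INDEX number_int_index;
CREATE INDEX number_int_range_idx ON block (number_int DESC);
```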

Joins with 8 tables in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
I have a dataset and a query with 8 joins on tables that have between 10 and 17,000 rows; the query returns 4,000 rows. My problem is that it is very slow. There's no obvious way to rewrite the query more optimally: it only does equi-joins on indexed columns. I tried giving it hints, but they are not used. As a comparison, I loaded the same data set into PostgreSQL and the query returns in under 1 second. The query plans of YugabyteDB and Postgres for this are completely different. I used a single-instance Postgres vs 3 masters and 3 tservers (latest version); all VMs have 1 CPU and 3 GB memory. I realize the CPU/memory are on the low side, but they are not fully used; I would run it on bigger instances if that were the bottleneck. Can you give some tips? Where do I look to optimise this query? What's your experience with queries that do many joins?
Query plan below:
explain (costs off, analyze, verbose) SELECT
a.ptflio_clstr_mstr_key,
a.ptflio_clstr_nm,
a.reference_id,
pku.kpc_usr_nm,
a.chng_usr,
clstrhist.chng_dttm,
a.ptflio_clstr_desc,
b.ptflio_bld_blck_grp_mstr_key,
b.ptflio_bld_blck_grp_nm,
f.prd_offr_mstr_key,
f.prd_offr_nm,
d.atmc_prd_offr_type_ind,
offrhist.chng_dttm
FROM product_store.pm_ptflio_clstr a
LEFT JOIN product_store.pm_ptflio_clstr_hist clstrhist ON a.ptflio_clstr_mstr_key = clstrhist.ptflio_clstr_mstr_key AND clstrhist.row_seq=1
LEFT JOIN product_store.pm_kpc_usr pku on a.kpc_usr_id = pku.kpc_usr_id
JOIN product_store.pm_ptflio_bld_blck_grp b ON a.ptflio_clstr_mstr_key = b.ptflio_clstr_mstr_key
JOIN product_store.pm_ptflio_bld_blck c ON b.ptflio_bld_blck_grp_mstr_key = c.ptflio_bld_blck_grp_mstr_key
LEFT JOIN product_store.pm_atmc_prd_offr d ON c.ptflio_bld_blck_mstr_key = d.ptflio_bld_blck_mstr_key
LEFT JOIN product_store.pm_prd_offr f ON f.prd_offr_mstr_key = d.atmc_prd_offr_mstr_key
LEFT JOIN product_store.pm_prd_offr_hist offrhist ON f.prd_offr_mstr_key = offrhist.prd_offr_mstr_key AND offrhist.row_seq = 1
WHERE a.ptflio_clstr_mstr_key='b3000fbe-65d5-4c68-9758-0954c7f9a0f1';
Nested Loop Left Join (actual time=12.719..10427.074 rows=3835 loops=1)
Output: a.ptflio_clstr_mstr_key, a.ptflio_clstr_nm, a.reference_id, pku.kpc_usr_nm, a.chng_usr, clstrhist.chng_dttm, a.ptflio_clstr_desc, b.ptflio_bld_blck_grp_mstr_key, b.ptflio_bld_blck_grp_nm, f.prd_offr_mstr_key, f.prd_offr_nm, d.atmc_prd_offr_type_ind, offrhist.chng_dttm
Inner Unique: true
-> Nested Loop Left Join (actual time=12.338..8896.474 rows=3835 loops=1)
Output: a.ptflio_clstr_mstr_key, a.ptflio_clstr_nm, a.reference_id, a.chng_usr, a.ptflio_clstr_desc, clstrhist.chng_dttm, pku.kpc_usr_nm, b.ptflio_bld_blck_grp_mstr_key, b.ptflio_bld_blck_grp_nm, d.atmc_prd_offr_type_ind, f.prd_offr_mstr_key, f.prd_offr_nm
Inner Unique: true
-> Nested Loop (actual time=11.903..7197.184 rows=3835 loops=1)
Output: a.ptflio_clstr_mstr_key, a.ptflio_clstr_nm, a.reference_id, a.chng_usr, a.ptflio_clstr_desc, clstrhist.chng_dttm, pku.kpc_usr_nm, b.ptflio_bld_blck_grp_mstr_key, b.ptflio_bld_blck_grp_nm, d.atmc_prd_offr_type_ind, d.atmc_prd_offr_mstr_key
-> Nested Loop Left Join (actual time=1.755..1.759 rows=1 loops=1)
Output: a.ptflio_clstr_mstr_key, a.ptflio_clstr_nm, a.reference_id, a.chng_usr, a.ptflio_clstr_desc, clstrhist.chng_dttm, pku.kpc_usr_nm
Inner Unique: true
-> Nested Loop Left Join (actual time=1.298..1.301 rows=1 loops=1)
Output: a.ptflio_clstr_mstr_key, a.ptflio_clstr_nm, a.reference_id, a.chng_usr, a.ptflio_clstr_desc, a.kpc_usr_id, clstrhist.chng_dttm
Inner Unique: true
Join Filter: (a.ptflio_clstr_mstr_key = clstrhist.ptflio_clstr_mstr_key)
-> Index Scan using xpkportfolio_cluster on product_store.pm_ptflio_clstr a (actual time=0.764..0.767 rows=1 loops=1)
Output: a.ptflio_clstr_mstr_key, a.row_seq, a.ptflio_clstr_nm, a.ptflio_clstr_desc, a.mstr_stat_cd, a.chng_usr, a.chng_dttm, a.chng_rmrk, a.reference_id, a.kpc_usr_id
Index Cond: (a.ptflio_clstr_mstr_key = 'b3000fbe-65d5-4c68-9758-0954c7f9a0f1'::uuid)
-> Index Scan using pm_ptflio_clstr_hist_pkey on product_store.pm_ptflio_clstr_hist clstrhist (actual time=0.529..0.529 rows=0 loops=1)
Output: clstrhist.ptflio_clstr_mstr_key, clstrhist.row_seq, clstrhist.ptflio_clstr_nm, clstrhist.ptflio_clstr_desc, clstrhist.mstr_stat_cd, clstrhist.chng_usr, clstrhist.chng_dttm, clstrhist.chng_rmrk, clstrhist.reference_id, clstrhist.kpc_usr_id
Index Cond: ((clstrhist.ptflio_clstr_mstr_key = 'b3000fbe-65d5-4c68-9758-0954c7f9a0f1'::uuid) AND (clstrhist.row_seq = 1))
-> Index Scan using xpkkpc_user on product_store.pm_kpc_usr pku (actual time=0.453..0.453 rows=0 loops=1)
Output: pku.kpc_usr_id, pku.ruisnaam, pku.kpc_usr_nm, pku.kpc_usr_act_ind, pku.chng_usr, pku.chng_dttm
Index Cond: (a.kpc_usr_id = pku.kpc_usr_id)
-> Nested Loop (actual time=10.131..7193.091 rows=3835 loops=1)
Output: b.ptflio_bld_blck_grp_mstr_key, b.ptflio_bld_blck_grp_nm, b.ptflio_clstr_mstr_key, d.atmc_prd_offr_type_ind, d.atmc_prd_offr_mstr_key
Inner Unique: true
-> Hash Right Join (actual time=9.673..73.707 rows=16878 loops=1)
Output: c.ptflio_bld_blck_grp_mstr_key, d.atmc_prd_offr_type_ind, d.atmc_prd_offr_mstr_key
Inner Unique: true
Hash Cond: (d.ptflio_bld_blck_mstr_key = c.ptflio_bld_blck_mstr_key)
-> Seq Scan on product_store.pm_atmc_prd_offr d (actual time=8.124..45.709 rows=16865 loops=1)
Output: d.atmc_prd_offr_mstr_key, d.row_seq, d.ptflio_bld_blck_mstr_key, d.lcm_phase_cd, d.lcm_phase_start_dttm, d.lcm_phase_end_dttm, d.atmc_prd_offr_type_ind, d.chng_usr, d.chng_dttm, d.lcm_phase_desc, d.lcm_phase_alert_dttm, d.lcm_phase_approved_by, d.lcm_phase_master_key
-> Hash (actual time=1.534..1.534 rows=43 loops=1)
Output: c.ptflio_bld_blck_grp_mstr_key, c.ptflio_bld_blck_mstr_key
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Seq Scan on product_store.pm_ptflio_bld_blck c (actual time=0.531..1.523 rows=43 loops=1)
Output: c.ptflio_bld_blck_grp_mstr_key, c.ptflio_bld_blck_mstr_key
-> Index Scan using xpkportfolio_building_block_gr on product_store.pm_ptflio_bld_blck_grp b (actual time=0.403..0.403 rows=0 loops=16878)
Output: b.ptflio_bld_blck_grp_mstr_key, b.row_seq, b.ptflio_bld_blck_grp_nm, b.ptflio_bld_blck_grp_desc, b.ptflio_clstr_mstr_key, b.mstr_stat_cd, b.chng_usr, b.chng_dttm, b.chng_rmrk, b.reference_id, b.kpc_usr_id
Index Cond: (b.ptflio_bld_blck_grp_mstr_key = c.ptflio_bld_blck_grp_mstr_key)
Filter: (b.ptflio_clstr_mstr_key = 'b3000fbe-65d5-4c68-9758-0954c7f9a0f1'::uuid)
Rows Removed by Filter: 1
-> Index Scan using xpkproduct_offering on product_store.pm_prd_offr f (actual time=0.419..0.419 rows=1 loops=3835)
Output: f.prd_offr_mstr_key, f.row_seq, f.prd_offr_nm, f.prop_mod_mstr_key, f.clstr_dsct_prd_offr_grp_cd, f.trgt_ptflio_ind, f.price_brd_nmbr, f.price_brd_stat_cd, f.prd_offr_lnup, f.pm_prd_offr_type_cd, f.comm_prd_id, f.chng_usr, f.chng_dttm, f.reference_id, f.version, f.kpc_usr_id, f.generation
Index Cond: (f.prd_offr_mstr_key = d.atmc_prd_offr_mstr_key)
-> Index Scan using pm_prd_offr_hist_pkey on product_store.pm_prd_offr_hist offrhist (actual time=0.375..0.375 rows=0 loops=3835)
Output: offrhist.prd_offr_mstr_key, offrhist.row_seq, offrhist.prd_offr_nm, offrhist.prop_mod_mstr_key, offrhist.clstr_dsct_prd_offr_grp_cd, offrhist.trgt_ptflio_ind, offrhist.price_brd_nmbr, offrhist.price_brd_stat_cd, offrhist.prd_offr_lnup, offrhist.pm_prd_offr_type_cd, offrhist.comm_prd_id, offrhist.chng_usr, offrhist.chng_dttm, offrhist.reference_id, offrhist.version, offrhist.kpc_usr_id, offrhist.generation
Index Cond: ((f.prd_offr_mstr_key = offrhist.prd_offr_mstr_key) AND (offrhist.row_seq = 1))
Planning Time: 0.777 ms
Execution Time: 10655.916 ms
The general approach to query tuning is called "query tuning by eliminating throwaway" (http://docplayer.net/20177036-Query-tuning-by-eliminating-throwaway.html), the name of a paper written by Martin Berg from Denmark (for the Oracle database, but the general methodology holds for all databases that use query plans with rowsources). On top of that, there are some specifics for distributed databases like YugabyteDB, where rowsources like nested loops can cause higher overhead than in a monolithic database, because in a monolith everything is local.
The pg_hint_plan extension required for hints to work is not enabled in YugabyteDB by default, but it is installed in the YugabyteDB server software. To enable it, run create extension pg_hint_plan; in one YugabyteDB database; that enables it for the cluster.
Of the ~10 seconds, ~6 seconds is spent in this join:
JOIN product_store.pm_ptflio_bld_blck c ON b.ptflio_bld_blck_grp_mstr_key = c.ptflio_bld_blck_grp_mstr_key
using the index xpkportfolio_building_block_gr.
The plan reads product_store.pm_ptflio_bld_blck_grp b 16,878 times only to find no rows. It is probably better to start with it, since the predicate b.ptflio_clstr_mstr_key = 'b3000fbe-65d5-4c68-9758-0954c7f9a0f1'::uuid is presumably highly selective:
/*+ Leading( (b c) ) */ may be a good start, but may not be possible because of the outer join.
So maybe /*+ Leading( (c b) ) HashJoin(c b) */.
It depends on your data. The goal is to start with the most selective table, then join to the tables that reduce the number of rows the most, and then the others.
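To illustrate where such a hint goes (the full query is not shown above, so the SELECT below is a simplified stand-in built from the join condition quoted earlier): pg_hint_plan reads hints from a comment placed immediately before the statement.

```sql
/*+ Leading( (c b) ) HashJoin(c b) */
SELECT ...
FROM product_store.pm_ptflio_bld_blck c
JOIN product_store.pm_ptflio_bld_blck_grp b
  ON b.ptflio_bld_blck_grp_mstr_key = c.ptflio_bld_blck_grp_mstr_key
WHERE b.ptflio_clstr_mstr_key = 'b3000fbe-65d5-4c68-9758-0954c7f9a0f1'::uuid;
```

Leading( (c b) ) fixes the join order (c first), and HashJoin(c b) forces a hash join for that pair instead of the nested loop seen in the plan.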

Postgres and 1000 multiple calls

I have a PostgreSQL 11.7 server, which is used 100% for local development only.
Hardware: 16-core CPU, 112 GB memory, 3 TB M.2 SSD. (It is running Ubuntu 18.04, but I get about the same speed on my Windows 10 laptop when I run the exact same query locally on it.)
The DB contains ~1,500 tables (all with the same structure).
Every call to the DB is custom and specific - so nothing to cache here.
From Node.js I execute a lot of simultaneous calls (via await Promise.all(all 1000 promises)) and afterwards perform a lot of different calculations.
Currently my stats look like this (max_connections set to the default of 100):
1 call ~ 100 ms
1,000 calls ~ 15,000 ms (15 ms/call)
I have tried changing various PostgreSQL settings, for example raising max_connections to 1,000, but nothing really seems to improve performance (and yes, I do remember to restart the PostgreSQL service every time I make a change).
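For reference, one way the setting change above can be made (editing postgresql.conf directly also works; the setting only takes effect after a restart):

```sql
-- Raise the connection limit; ALTER SYSTEM writes this to
-- postgresql.auto.conf rather than changing the session.
ALTER SYSTEM SET max_connections = 1000;
-- max_connections is a postmaster-level parameter, so the server
-- must be restarted afterwards for it to take effect.
```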
How can I make the execution of the 1,000 simultaneous calls as fast as possible? Should I consider copying all the needed data to an in-memory database like Redis instead?
The DB table looks like this:
CREATE TABLE public.my_table1 (
id int8 NOT NULL GENERATED ALWAYS AS IDENTITY,
tradeid int8 NOT NULL,
matchdate timestamptz NULL,
price float8 NOT NULL,
"size" float8 NOT NULL,
issell bool NOT NULL,
CONSTRAINT my_table1_pkey PRIMARY KEY (id)
);
CREATE INDEX my_table1_matchdate_idx ON public.my_table1 USING btree (matchdate);
CREATE UNIQUE INDEX my_table1_tradeid_idx ON public.my_table1 USING btree (tradeid);
The simple test query - fetch 30 mins of data between two time-stamps:
select * from my_table1 where '2020-01-01 00:00' <= matchdate AND matchdate < '2020-01-01 00:30'
total_size_incl_toast_and_indexes 21 GB total table size --> 143 bytes/row
live_rows_in_text_representation 13 GB total table size --> 89 bytes/row
My NodeJS code looks like this:
const startTime = new Date();
let allDBcalls = [];
let totalRawTrades = 0;
(async () => {
    for (let i = 0; i < 1000; i++) {
        allDBcalls.push(selectQuery.getTradesBetweenDates(tickers, new Date('2020-01-01 00:00'), new Date('2020-01-01 00:30')).then(function (rawTradesPerTicker) {
            totalRawTrades += rawTradesPerTicker["data"].length;
        }));
    }
    await Promise.all(allDBcalls);
    _wl.info(`Fetched ${totalRawTrades} raw-trades in ${new Date().getTime() - startTime} ms!!`);
})();
I just tried running EXPLAIN four times on the query:
EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM public.my_table1 where '2020-01-01 00:00' <= matchdate and matchdate < '2020-01-01 00:30';
Index Scan using my_table1_matchdate_idx on my_table1 (cost=0.57..179.09 rows=1852 width=41) (actual time=0.024..0.555 rows=3013 loops=1)
Index Cond: (('2020-01-01 00:00:00+04'::timestamp with time zone <= matchdate) AND (matchdate < '2020-01-01 00:30:00+04'::timestamp with time zone))
Buffers: shared hit=41
Planning Time: 0.096 ms
Execution Time: 0.634 ms
Index Scan using my_table1_matchdate_idx on my_table1 (cost=0.57..179.09 rows=1852 width=41) (actual time=0.018..0.305 rows=3013 loops=1)
Index Cond: (('2020-01-01 00:00:00+04'::timestamp with time zone <= matchdate) AND (matchdate < '2020-01-01 00:30:00+04'::timestamp with time zone))
Buffers: shared hit=41
Planning Time: 0.170 ms
Execution Time: 0.374 ms
Index Scan using my_table1_matchdate_idx on my_table1 (cost=0.57..179.09 rows=1852 width=41) (actual time=0.020..0.351 rows=3013 loops=1)
Index Cond: (('2020-01-01 00:00:00+04'::timestamp with time zone <= matchdate) AND (matchdate < '2020-01-01 00:30:00+04'::timestamp with time zone))
Buffers: shared hit=41
Planning Time: 0.097 ms
Execution Time: 0.428 ms
Index Scan using my_table1_matchdate_idx on my_table1 (cost=0.57..179.09 rows=1852 width=41) (actual time=0.016..0.482 rows=3013 loops=1)
Index Cond: (('2020-01-01 00:00:00+04'::timestamp with time zone <= matchdate) AND (matchdate < '2020-01-01 00:30:00+04'::timestamp with time zone))
Buffers: shared hit=41
Planning Time: 0.077 ms
Execution Time: 0.586 ms

Getting total number of key-value pairs in RocksDB

Is it possible to efficiently get the number of key-value pairs stored in a RocksDB key-value store?
I have looked through the wiki, and haven't seen anything discussing this topic thus far. Is such an operation even possible?
In code, you can call db->GetProperty("rocksdb.estimate-num-keys", &num) to obtain the estimated number of keys stored in a RocksDB instance.
Another option is to use the sst_dump tool with the --show_properties argument to get the number of entries, although the result is reported per SST file. For example, the following command will show the properties of each SST file under the specified RocksDB directory:
sst_dump --file=/tmp/rocksdbtest-691931916/dbbench --show_properties --command=none
And here's the sample output:
Process /tmp/rocksdbtest-691931916/dbbench/000005.sst
Sst file format: block-based
Table Properties:
------------------------------
# data blocks: 845
# entries: 27857
raw key size: 668568
raw average key size: 24.000000
raw value size: 2785700
raw average value size: 100.000000
data block size: 3381885
index block size: 28473
filter block size: 0
(estimated) table size: 3410358
filter policy name: N/A
# deleted keys: 0
Process /tmp/rocksdbtest-691931916/dbbench/000008.sst
Sst file format: block-based
Table Properties:
------------------------------
# data blocks: 845
# entries: 27880
raw key size: 669120
...
Combined with some shell commands, you can get the total number of entries:
sst_dump --file=/tmp/rocksdbtest-691931916/dbbench --show_properties --command=none | grep entries | cut -c 14- | awk '{x+=$0}END{print "total number of entries: " x}'
And this will generate the following output:
total number of entries: 111507
There is no way to get the count exactly. But RocksDB 3.4, which was released recently, exposes a way to get an estimated count of keys; you can try it.
https://github.com/facebook/rocksdb/releases

Resources