Advice on which PostgreSQL indexes to use to search for text? - text

I have some tables in a PostgreSQL schema. I need to search for text in several tables at the same time: sometimes the same phrase across different tables, and sometimes different phrases in different tables. In the end I need to join these tables and return the id from the main table. Which solution is best? P.S. The tables will be updated frequently.
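For reference, one common approach in PostgreSQL is a full-text index per searchable column. A minimal sketch, assuming hypothetical tables articles and comments that each have a text column body (GIN indexes are the usual choice for full-text search; GiST is sometimes preferred for heavily updated tables because it is cheaper to maintain):

-- All table and column names below are invented for illustration.
CREATE INDEX idx_articles_body_fts
    ON articles USING GIN (to_tsvector('english', body));
CREATE INDEX idx_comments_body_fts
    ON comments USING GIN (to_tsvector('english', body));

-- Search both tables and return the id from the main table.
SELECT a.id
FROM articles a
JOIN comments c ON c.article_id = a.id
WHERE to_tsvector('english', a.body) @@ phraseto_tsquery('english', 'some phrase')
   OR to_tsvector('english', c.body) @@ plainto_tsquery('english', 'another phrase');

The expression inside each index has to match the expression used in the query (including the 'english' configuration) for the planner to use the index.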

Related

How to list all tables and corresponding columns from a Databricks database?

I am trying to get a list of tables and columns in a database so I can find which tables have a particular column. The best I could find is to use separate queries, one to show all tables and then one to show all columns in a single table, e.g. SHOW TABLES FROM database_name and SHOW COLUMNS FROM databasename.tablename. That is not ideal when you have many tables to go through. Is there any solution out there at all?
Unfortunately, there is no way to fetch all metadata in one call. You can only do show databases, show tables in ..., describe table ..., and so on. There is also spark.catalog.listTables, etc., but these could be slower than the corresponding SQL queries.
I answered a related question yesterday - you can find the code there.
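As a rough illustration of the per-table commands mentioned above (the database and table names here are made up), the metadata still has to be collected one object at a time:

SHOW DATABASES;
SHOW TABLES IN sales_db;            -- list tables in one database
SHOW COLUMNS IN sales_db.orders;    -- list columns of one table
DESCRIBE TABLE sales_db.orders;     -- column names, types and comments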

Why is Cassandra called an unstructured database?

Why is Cassandra called unstructured even though a table/column family has to be defined with columns and their data types?
For a defined table with some fixed columns, we can choose to fill some columns in one particular row and leave them unfilled in another row. But the same thing can be done in an RDBMS, where we can leave some columns out of the insert statement as long as the omitted columns allow NULL.
MongoDB stores data in JSON documents, where we can store different keys every time we insert a new document; we don't need to define anything up front. But for Cassandra we need to alter our table to accommodate new columns being added.
Even though some articles cover this, it is still not clear to me. Can someone pinpoint the reason?
Basically it is not about "how it works", it is about how the files are stored. This is why Cassandra has no structure for the files: you can have the same records in different folders.

Limit data coming into Spotfire by a different data table

I have Table A prompted on Year/Month and Table B. Table B also has a Year/Month column. Table A is the default data table (gets pulled in first). I have set up a relationship between Table A and B on the common Year/Month column.
The goal is to get Table B to only pull through data where the Year/Month matches the Year/Month on Table A (what the user entered). The purpose is to keep the user from entering the Year/Month multiple times.
The issue is that Table B contains almost 35 million records, and I do not want Spotfire to pull across all 35 million of them. What is currently happening is that Spotfire pulls all those records, and then, by setting the filtering to include Filtered Rows Only on Table B, I limit what is seen in the visualization to under 200,000 rows. I would much rather just pull across those 200,000 rows to start with.
The question: Is there a way to force Spotfire to filter the data table (Table B) by another data table (Table A) as it pulls the data table (Table B) across, thus only pulling a small number of records into memory?
I'm writing this on the basis that most people use information links to get data into Spotfire, especially for large data sets where the data is not embedded in the analysis. With that being said, I prefer to handle as much, if not all, of the joining / filtering / massaging at the data source rather than in the Spotfire application. Here are my views on the best practices and why.
Tables / Views vs Procedures as Information Links
Most people are familiar with the Table / View structure and get data into Spotfire in one of two ways:
1. Create all joins / links in Information Designer based on data relations defined by the author, selecting individual tables from the available data sources.
2. Create a view (or similar object) at the data source where all joining / data relations are done, thus giving Spotfire a single flat file of data.
Personally, option 2 is much easier IF you have access to the data source, since the data source is designed to handle this type of work; Spotfire just makes it available, but with limited functionality (complex queries, IntelliSense, etc. aren't available, and there is no native IDE). What's even better, IMHO, is stored procedures, and here is why.
In options 1 and 2 above, if you want to add a column you have to change the view / source code at the data source, or individually add a column in the information designer. This creates dwarfed objects and clutters up your library. For example, when you create an information link there is a folder with all the elements associated with it. If you want to add columns later, you'll have another folder for any columns added, and this gets confusing and hard to manage.
If you create a procedure at the data source to return the data you need, and later want to add some columns, you only have to change this at the data source, i.e., change the procedure. Everything else will be inherited by Spotfire... all you have to do is click the "reload data" button in Spotfire. You don't have to change anything in the information designer. Additionally, you can easily add new parameters, set default parameter properties or prompt the user, making this a very efficient method of data retrieval. This is perfect when the data source is an OLTP and not a data-mart/data-warehouse (i.e. the data isn't already aggregated / cleansed), but it can be powerful in data warehouse environments as well.
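As a rough sketch of that procedure approach (T-SQL-style syntax; the procedure, table, column and parameter names are invented for illustration), adding a column or a parameter later only means changing this one object at the source and reloading in Spotfire:

CREATE PROCEDURE dbo.GetMonthlyDetail
    @YearMonth varchar(6)          -- e.g. '202401'; can be mapped to a Spotfire prompt
AS
BEGIN
    SELECT d.YearMonth,
           d.CustomerId,
           d.Amount                -- add new columns here; Spotfire inherits them on reload
    FROM dbo.DetailTable AS d
    WHERE d.YearMonth = @YearMonth;
END;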
Ditch the GUI, Edit the SQL
I find managing conditions, parameters, join paths, etc. in the GUI a bit annoying, but that's me. Instead, when possible, I prefer to click "Edit SQL" next to each of the elements in my information link and alter the SQL there. This also allows the database guys to work in an environment that is more familiar to them.
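Applied to the original question, the SQL behind Table B's information link could push the Year/Month restriction down to the source, so only the matching rows ever reach Spotfire. A hedged sketch with made-up table and column names, assuming both tables live in the same source (how the prompted value is passed in depends on your setup):

-- Only rows whose Year/Month also exists in Table A come across the wire.
SELECT b.*
FROM table_b AS b
WHERE b.year_month IN (SELECT a.year_month FROM table_a AS a);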

PowerPivot Relationships Many to Many

The objective I am trying to achieve is to have two slicers in PowerPivot, ClientID and CSQName. When a ClientID is selected, only the CSQNames that are related to that ClientID should show up, and vice versa.
Relationship diagram link: https://goo.gl/photos/PnCZrnsXXTx3oFGh8
I am having a problem linking a many-to-many relationship in PowerPivot. A brief background on the application I am trying to build...
I am trying to combine a SQL database (IDM) and an Informix SQL database (Cisco Call Data). The IDM database includes the Client Data and TBAS Open Case Data; each client has a specific ClientID. The Cisco database includes Call Detail Info and CSQNames (queue names). A many-to-many relationship exists: for example, a ClientID can have multiple CSQNames (ClientID 3 has CSQ names "A" and "B"), and a CSQName can have multiple ClientIDs (CSQName "Z" includes ClientIDs "99", "98" and "97"). Therefore I created a join (bridge) table called "Clients_CSQ" to model the many-to-many relationship.
I am trying to use this bridge table for both "TBAS Open Cases" and "Call Detail". When I use this table for my filters, PowerPivot states that no relationships exist. Are there any solutions? If this does not make sense, please let me know and I will try to be more specific. I have read many posts but am unable to grasp how to make a DAX many-to-many relationship work with the CALCULATE function. If someone can shed some light on the issue I am having, it would be greatly appreciated. Thank you.
This really depends upon the data you are looking to report on.
When you add two slicers to a PowerPivot table, the available selections in each slicer will be affected by the selection in the other slicer IF and ONLY IF all of the fields in the Values section of the Pivot Table are reliant on the entries in both of the slicer fields.
In your case, it is possible to make this work (as an example) by creating 3 measures:
[Call Total]=SUM('TBAS Open Cases'[Case duration])
[Number of Calls]=COUNTA('Call Detail'[appname])
[Calls by Duration]=SUMX('Clients_CSQ',DIVIDE([Call Total],[Number of Calls]))
Place the last of these 3 measures in a pivot table with the slicers set to use 'Clients_IDM'[ic_client_id] and 'CSQ Name'[csqname] and "Hey Presto!"
The first two measures are straightforward enough. The third one cycles through each entry in the only table that these two slicer fields have in common (Clients_CSQ) and performs a calculation using the data from your fact tables. I have no idea if the [Calls by Duration] measure that I've come up with makes any sense for your data set, but hopefully the example will help you reach the solution you want. Again, depending on what data you want to show, it doesn't really matter if this measure returns junk; the important thing is that it pulls your two data sets together.
Remember that as soon as you add any raw field from either of the fact tables to this 'unifying pivot table', the inter-relationship between the slicers will break. !!!BUT!!! there is nothing to stop you from linking the csqname slicer to another pivot on the same sheet which contains fields from your Call Detail table and likewise linking the ic_client_id slicer to a pivot that contains TBAS Open Cases data. In fact, the 'unifying pivot table' could be on a different sheet from your slicers, so you only see the two sets of data that you are interested in.
And ignore that warning about no relationships existing!

PIVOT in HSQLDB

I want to insert 1 million records into a client-side HSQLDB database, and I want to perform a pivoting operation on that one million records on the client side to analyze business growth in various ways.
Does HSQLDB have this feature? If so, kindly help me.
In general, pivoting operations are implemented with aggregate functions together with CASE WHEN and GROUP BY.
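For example, a generic sketch with made-up table and column names, pivoting monthly amounts into one column per quarter:

-- Rows of (sale_year, sale_month, amount) pivoted into quarterly columns.
SELECT sale_year,
       SUM(CASE WHEN sale_month BETWEEN 1 AND 3   THEN amount ELSE 0 END) AS q1,
       SUM(CASE WHEN sale_month BETWEEN 4 AND 6   THEN amount ELSE 0 END) AS q2,
       SUM(CASE WHEN sale_month BETWEEN 7 AND 9   THEN amount ELSE 0 END) AS q3,
       SUM(CASE WHEN sale_month BETWEEN 10 AND 12 THEN amount ELSE 0 END) AS q4
FROM sales
GROUP BY sale_year;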
HSQLDB has all the features of PostgreSQL and MySQL that are used for this kind of pivoting. You can look at questions with the PIVOT tag on Stack Overflow for examples.
I will add the PIVOT tag to this question.
