What is the difference between a 'connection' and a 'query' as Excel defines it in their data model? For example, if I load a csv file from local, it stores that as a query (it shows a 'select *' as the query against the file). What then would be considered a connection and what's the difference between the two? The only thing I can think of is that a connection would not return data without specifying the table/query to use-- for example, a database connection where it has multiple tables, and possibly, connecting to another Excel file if it had more than one tab.
Reference: https://support.microsoft.com/en-us/office/create-edit-and-manage-connections-to-external-data-89d44137-f18d-49cf-953d-d22a2eea2d46
Every query is also a connection, but not all connections are queries.
Connections have been in Excel for a long time, typically associated with commands that access a data source like SQL Server, etc. The page you linked to has more links at the bottom; you may want to read up there.
The term "query" is now typically associated with Power Query, where the data connection to the data source is made via the Power Query engine, and then further refined in the query editor.
So, each query (Power Query) also has a connection, but you can only edit Power Queries in the Power Query editor, whereas legacy connections can be edited in the properties dialog of the connection.
Edit: Let's put it this way: The connection is just that. It connects your workbook to a data source. Like a highway connecting two cities.
A query is the request for actual data that you spell out, calling from your workbook (via the connection) into the data source. The data source then sends the data back (via the connection). The mechanics of asking for, receiving, and manipulating the received data (e.g. cleaning it up, storing it in the workbook) are what the query does, but it can't do this without the connection. The query is the actual traffic on the highway.
Before Power Query, you could also connect to SQL Server and return data. The query details are visible in a tab in the connections dialog, so connection and query were used synonymously. These legacy data tools are now hidden by default and must be activated in the Excel Advanced options.
With Power Query, the brand name influences the use of the terminology. The term "query" now more often than not means Power Query, whereas some people may use "connection" (which is always a part of any query) for old-style, legacy data connections (which also contain queries).
However, when you use Power Query, each of these queries will use connections. These are established when you first create the query. Your workbook may have a number of connections to different data sources. The credentials for each data source are stored with the connections (on your computer), not in the Power Query. This is like your toll fee for the highway. By storing the credentials with the connection, you establish the permission to use the connection and it doesn't matter how many people you bring back in your bus.
You can even use the same connection (to a specific SQL Server) for several different queries. When you create the first query to the SQL Server, you are prompted for credentials for that new connection (your toll for the highway). When you create another query to the same SQL Server, the connection already exists and you are not prompted for your credentials.
You can drive your bus along that same highway several times and pick up people from different suburbs of the city that the highway connects you to.
Your highway toll fee is only valid for a limited time. You can take as many trips as you want, but it will expire after some time. (This happens with SharePoint credentials after 90 days, after which you have to provide your credentials again. I don't know about SQL Server, though.)
When you send a workbook with a query to SQL Server to someone else, they need to provide their own credentials in order to use the connection. Your toll fee does not cover their bus.
I'm going to stop now before this turns into a children's book.
Hope it helps.
In addition, a connection is a dynamic link and can be set to refresh:
in the background
when the file is opened
every X minutes
or
when the queries are refreshed.
A query, however, is a more static link and needs to be refreshed manually to load the latest data.
Related
I am re-designing a project I built a year ago when I was just starting to learn how to code. I used the MEAN stack back then and want to convert it to a PERN stack now. My AWS knowledge has also grown a bit and I'd like to expand on these new skills.
The application receives real-time data from an API, which I clean up to write to a database as well as broadcast to connected clients.
To better conceptualize this question I will refer to the following items:
api-m1: this receives the incoming data and passes it through my schema; I then send it to my socket-server.
socket-server: handles the WSS connection to the application's front-end clients. It also writes the data it gets from the scraper and api-m1 to a Postgres database. I would like to turn this into clusters eventually, as I am using Node.js, and will incorporate Redis; then I will run it behind an ALB using sticky sessions etc. across multiple EC2 instances.
RDS: the Postgres table which socket-server writes the incoming scraper and api-m1 data to. RDS is used to fetch the most recent data stored, along with user profile config data. NOTE: the main RDS data table will have at most 120-150 UID records with 6-7 columns.
From a database perspective, what would be the quickest way to write my data to RDS, assuming that during peak times we have 20-40 records/s from api-m1 plus another 20-40 records/s from the scraper? At the end of each day I tear down the database using a Lambda function and start again (the data is only temporary and does not need to be kept for any prolonged period of time).
1. Should I INSERT each record using a SERIAL id, then from the frontend fetch the most recent rows based on the UID?
2.a Should I UPDATE each UID, so I'd have a fixed N rows of data which I just search and update? (I can see this bottlenecking with my Postgres client.)
2.b Still use UPDATE but do BATCHED updates? (What issues will I run into if I make multiple clusters, i.e. will I run into concurrency problems where table record XYZ has an older value overwrite a more recent value because I'm using BATCH UPDATE with Node clusters? See the batching sketch below.)
My concern is that UPDATEs are slower than INSERTs, and I want to make this as fast as possible. This section of the application isn't CPU heavy, and the rt-data isn't that intensive.
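For concreteness, a rough illustration of what option 2.b could look like with node-postgres is below. The table name live_data and its columns are made up, not taken from the question; the WHERE guard on the upsert is one way to keep a stale batch from overwriting a newer value when several cluster workers flush at the same time.

```js
// Sketch of option 2.b: buffer incoming records and flush them as one
// multi-row INSERT ... ON CONFLICT upsert. Table/column names are hypothetical.
const { Pool } = require("pg");
const pool = new Pool(); // reads PG* environment variables

const buffer = new Map(); // uid -> latest record (newer values replace older ones)

function enqueue(record) {
  buffer.set(record.uid, record);
}

async function flush() {
  if (buffer.size === 0) return;
  const rows = [...buffer.values()];
  buffer.clear();

  const values = [];
  const placeholders = rows.map((r, i) => {
    values.push(r.uid, r.price, r.updatedAt);
    const o = i * 3;
    return `($${o + 1}, $${o + 2}, $${o + 3})`;
  });

  await pool.query(
    `INSERT INTO live_data (uid, price, updated_at)
     VALUES ${placeholders.join(", ")}
     ON CONFLICT (uid) DO UPDATE
       SET price = EXCLUDED.price, updated_at = EXCLUDED.updated_at
       WHERE EXCLUDED.updated_at > live_data.updated_at`, // skip stale batches
    values
  );
}

// Flush every 250 ms; tune the interval against your peak rate.
setInterval(() => flush().catch(console.error), 250);
```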
To make my comments an answer:
You don't seem to need SQL semantics for anything here, so I'd just toss RDS and use e.g. Redis (or DynamoDB, I guess) for that data store.
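As a rough illustration of that suggestion, a Redis-based store for this workload might look something like the sketch below (node-redis v4; the key layout and field names are assumptions, not anything from the question):

```js
// Minimal sketch of keeping the "latest value per UID" in Redis instead of RDS.
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Called for every record coming in from api-m1 or the scraper.
export async function storeRecord(record) {
  const key = `uid:${record.uid}`;
  // One hash per UID holds the latest 6-7 fields; a newer write simply overwrites.
  await redis.hSet(key, {
    price: String(record.price),
    updatedAt: record.updatedAt,
  });
  // Expire after a day instead of tearing the store down with a Lambda.
  await redis.expire(key, 60 * 60 * 24);
}

// The frontend fetch is a single hash read per UID.
export async function getRecord(uid) {
  return redis.hGetAll(`uid:${uid}`);
}
```

With only 120-150 UIDs the whole working set fits in memory, and the per-key expiry can stand in for the nightly teardown if that suits.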
I'm looking to get some opinions on what the best approach is for the following scenario:
Our product requires connections to our users' Postgres databases via our Node Express server. They provide their credentials once, we store them in an encrypted form in our internal operations DB, and we can reference them when access is needed. A user can perform an action in our app UI like creating a table, deleting a table, etc., and view table sizes, min/max values of a column, and so on.
These actions come to our server as authenticated API calls, and we can query their databases via Sequelize as needed and return the results to the frontend.
My question is: when there are N users with N databases on different SQL instances, each of which needs to be connected to when an API call queries it, what is the best approach to maintaining those connections?
Should we create a new Sequelize connection instance each time an API is called, run the query, return the response, and close the connection? Or should we create a new Sequelize connection instance for a DB when an API is called, keep the instance for a certain amount of time, close the connection if it was inactive during that time, and recreate the instance next time?
If there are better and more efficient ways of doing this, I would love to hear about it. Thanks.
Currently, I've tried creating a new Sequelize instance at the beginning of each API request, running the query, and then closing the connection. It works OK, but that's just locally with 2 DBs, so I can't tell what production would be like.
Edit: Anatoly suggested a connection pool; in that case, what are the things that need to be considered for the config?
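One possible middle ground between the two options is to cache one Sequelize instance per customer database and let its pool, plus an idle-eviction sweep, handle lifetimes. The sketch below is only an outline; the credential field names and timings are assumptions. For the pool config, the main things to weigh are max (so N tenants times max stays well under each customer database's max_connections), min: 0 so idle pools drain, the idle and acquire timeouts, and remembering to close() evicted instances.

```js
// Sketch of a per-tenant Sequelize cache with pooling and idle eviction.
const { Sequelize } = require("sequelize");

const instances = new Map(); // tenantId -> { sequelize, lastUsed }
const IDLE_EVICT_MS = 15 * 60 * 1000;

function getSequelizeFor(tenantId, creds) {
  let entry = instances.get(tenantId);
  if (!entry) {
    const sequelize = new Sequelize(creds.database, creds.username, creds.password, {
      host: creds.host,
      port: creds.port,
      dialect: "postgres",
      logging: false,
      pool: {
        max: 5,         // cap connections per customer DB
        min: 0,         // let the pool drain completely when idle
        acquire: 30000, // ms to wait for a free connection before erroring
        idle: 10000,    // ms a connection may sit unused before being released
      },
    });
    entry = { sequelize, lastUsed: Date.now() };
    instances.set(tenantId, entry);
  }
  entry.lastUsed = Date.now();
  return entry.sequelize;
}

// Periodically close instances nobody has used for a while.
setInterval(async () => {
  for (const [tenantId, entry] of instances) {
    if (Date.now() - entry.lastUsed > IDLE_EVICT_MS) {
      instances.delete(tenantId);
      await entry.sequelize.close();
    }
  }
}, 60 * 1000);
```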
We are using CouchDB to sync data between a medical record system and clients, and want to include a subset of the data on the client side.
Each client needs access to data identified by a set of patient ids. We would prefer to avoid including the patient id set in the request query during replication, as this would consume a lot of bandwidth (this is in a low-connectivity setting in Kenya). Is there a way to store this data somewhere on the server side (say, the user context object?) such that the replication filter function could have access to it?
This requires a little context, so bear with me.
Suppose you're building a chat app atop CouchDB that functions like IRC (or Slack). There's a server and some clients. But in this case, the server and the clients each have a CouchDB database and they're all bidirectionally replicating to each other -- the clients to the server, and the server to the other clients (hub-and-spoke style). Clients send messages by writing to their local instance, which then replicates to the server and out to the other clients.
Is there any way (validation function?) to prevent hostile clients from inserting a billion records and replicating those changes up to the server and other clients? Or is it a rule that you just can't give untrusted clients write access to a CouchDB instance that replicates anywhere else?
Related:
couchdb validation based on content from existing documents
Can I query views from a couchdb update or validate_doc_update function?
Can local documents be disabled in CouchDB?
For a rather simple defense against flooding, I am using the following workflow:
All public write access is only allowed through update functions
Every document insert/update gets a unique hash generated for it, consisting of the req.peer field (for the IP address) and an ISO timestamp with the final part cut off. For example, I may have 2017-11-24T14:14 as the unique key string, which ensures that a new key is generated every minute.
Calculate the hash for every write request and ensure it is unique, and you can be certain that a given IP is only allowed to write once every minute.
This technique works OK for small floods coming from a given set of IPs. For a more coordinated attack, a variation (or even something else completely) might be needed.
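For reference, a minimal sketch of what such an update function could look like is below. The payload handling and response text are illustrative; the key point is that the document _id is derived from req.peer plus the minute-truncated timestamp, so CouchDB itself rejects a second write from the same IP in the same minute with a document update conflict.

```js
// Update function in a design document; all public writes go through here.
function (doc, req) {
  // e.g. "192.0.2.7-2017-11-24T14:14" -- one possible key per IP per minute
  var key = req.peer + "-" + new Date().toISOString().slice(0, 16);

  var newDoc = JSON.parse(req.body); // the client's payload (shape is up to you)
  newDoc._id = key;

  // If a document with this _id already exists, CouchDB refuses the save
  // with a document update conflict, so the same IP can only write once per minute.
  return [newDoc, "created"];
}
```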
How do you extend the address space into your SQL database with bidirectional mirroring, so that any value change on either the variable end or the database end is immediately reflected on the opposite end?
So if I have a table in the database whose values can be changed from outside (for example, data could be added, deleted, or updated), how would my node-opcua server be notified?
In OPC UA, any server that is created follows a service-oriented architecture (SOA), meaning the server will process a request only when some service is requested.
In your case, you can achieve this by subscribing for data changes and monitoring the node which exposes your database table to the client. Subscribing for data changes is only possible when that node is exposed to the client.
Once a node is subscribed for data changes, there are two values which the server needs from the client:
Sampling interval: how frequently the server should refresh the data from the source.
Publishing interval: how frequently the client is going to ask the server for notifications.
Let's say, for example, the sampling interval is 100 milliseconds and the publishing interval is 1 minute. That means the server has to collect samples from the source (in your case it could be the database) every 100 milliseconds, but the client will request all those collected samples every 1 minute.
In this way you will be able to achieve updating the server with the changed values for the table in the database.
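As a rough client-side sketch of that setup with node-opcua (the endpoint URL and node id are placeholders, and the exact API can differ between node-opcua versions):

```js
// Subscribe for data changes on the node that exposes the database table.
const {
  OPCUAClient,
  AttributeIds,
  TimestampsToReturn,
} = require("node-opcua");

(async () => {
  const client = OPCUAClient.create({ endpointMustExist: false });
  await client.connect("opc.tcp://localhost:4840/UA/MyServer"); // placeholder endpoint
  const session = await client.createSession();

  // Publishing interval: the client asks for queued notifications every 60 s.
  const subscription = await session.createSubscription2({
    requestedPublishingInterval: 60 * 1000,
    requestedMaxKeepAliveCount: 10,
    requestedLifetimeCount: 100,
    publishingEnabled: true,
  });

  // Sampling interval: the server samples the underlying source every 100 ms.
  const monitoredItem = await subscription.monitor(
    { nodeId: "ns=1;s=MyDatabaseTable", attributeId: AttributeIds.Value }, // placeholder node id
    { samplingInterval: 100, discardOldest: true, queueSize: 600 },
    TimestampsToReturn.Both
  );

  monitoredItem.on("changed", (dataValue) => {
    console.log("table value changed:", dataValue.value.toString());
  });
})();
```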
If the SDK supports multithreading, then there is another way to achieve what is mentioned in the question.
In the server application, let the data source (i.e. database) object run in its own thread.
Create a callback into the server application layer and initialise the data source object with this callback.
When data changes in the database, trigger a call from the database to the data source thread, and if it is the required data and the server needs to be informed, call the callback function that was initialised earlier.
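As one possible shape for this in Node.js, the sketch below uses Postgres LISTEN/NOTIFY as the "data source" that reacts to database changes, and a callback that pushes the new value into the exposed node-opcua variable via setValueFromSource. The channel name, payload shape, and data type are assumptions, and a trigger on the table would have to issue the NOTIFY.

```js
// Push database changes into a node-opcua variable via a callback.
const { Client } = require("pg");
const { DataType, Variant, StatusCodes } = require("node-opcua");

async function bindTableToVariable(uaVariable) {
  // Callback into the server application layer: update the exposed node.
  const onTableChanged = (newValue) => {
    uaVariable.setValueFromSource(
      new Variant({ dataType: DataType.Double, value: newValue }),
      StatusCodes.Good
    );
  };

  // "Data source object" listening for changes coming from the database.
  const pg = new Client(); // reads PG* environment variables
  await pg.connect();
  await pg.query("LISTEN table_changed"); // a trigger on the table must NOTIFY this channel

  pg.on("notification", (msg) => {
    const payload = JSON.parse(msg.payload); // e.g. { "value": 42.5 }
    onTableChanged(payload.value);
  });
}
```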
I hope this answered your question.