How can I set up pagination in Excel Power Query?

I am importing financial data as JSON from the web into Excel, but the source uses pagination (50 results per page), so I need to implement pagination in order to import all the results.
The data source is JSON:
https://localbitcoins.com//sell-bitcoins-online/VES/.json?page=1
or https://localbitcoins.com//sell-bitcoins-online/VES/.json?page=2
?page=1, ?page=2, ?page=3
I use the following code to implement pagination, but receive an error:
= (page as number) as table =>
let
Source = Json.Document(Web.Contents("https://localbitcoins.com//sell-bitcoins-online/VES/.json?page=" & Number.ToText(page) )),
Data1 = Source{1}[Data],
RemoveBottom = Table.RemoveLastN(Data1,3)
in
RemoveBottom
When I invoke the function with a parameter (1 for page 1) to test it, I get the following error, and I can't work out why:
An error occurred in the ‘GetData’ query. Expression.Error: We cannot convert a value of type Record to type List.
Details:
Value=Record
Type=Type
For the record, I try to handle the paging using List.Generate:
= List.Generate( ()=>
    [Result = try GetData(1) otherwise null, Page = 1],
    each [Result] <> null,
    each [Result = try GetData([Page]+1) otherwise null, Page = [Page]+1],
    each [Result])
What is the default way to implement pagination using Power Query in MS Excel?

I realise you asked this nearly a month ago and may have since found an answer, but will respond anyway in case it helps someone else.
The line Data1 = Source{1}[Data] doesn't make sense to me: I think Source will be a record, and you can't use the {1} positional lookup syntax on records. That would explain the "cannot convert a value of type Record to type List" error.
The code below returns 7 pages for me. You may want to check if it's getting all the pages you need/expect.
let
    getPageOfData = (pageNumber as number) =>
        let
            options = [
                Query = [page = Number.ToText(pageNumber)]
            ],
            url = "https://localbitcoins.com/sell-bitcoins-online/VES/.json",
            response = Web.Contents(url, options),
            deserialised = Json.Document(response)
        in
            deserialised,
    responses = List.Generate(
        () => [page = 1, response = getPageOfData(page), lastPage = null],
        each [lastPage] = null or [page] <= [lastPage],
        each [
            page = [page] + 1,
            response = getPageOfData(page),
            lastPage = if [lastPage] = null then if Record.HasFields(response[pagination], "next") then null else page else [lastPage]
        ],
        each [response]
    )
in
    responses
In List.Generate, my selector only picks the [response] field to keep things simple. You could drill deeper into the data either within selector itself (e.g. each [response][data][ad_list]) or create a new step/expression and use List.Transform to do so.
After a certain amount of drilling down and transforming, you should end up with tabular data, though its exact shape depends on what you need the data to look like (and which columns you're interested in).
By the way, I used getPageOfData in the query above, but this particular API was including the URL for the next page in its responses. So pages 2 and thereafter could have just requested the URL in the response (rather than calling getPageOfData).
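The stopping rule used above (keep requesting pages until a response no longer has a "next" entry under pagination) can be sketched outside Power Query too. The Python below is only an illustration: fetch_page and FAKE_PAGES are hypothetical stand-ins for the real HTTP call, and the response shape (data.ad_list, pagination.next) is assumed from the field names mentioned in this answer.

```python
from typing import Callable

# Hypothetical stand-in for the API: three pages, each pointing at the next one
# through pagination["next"], except the last.
FAKE_PAGES = {
    1: {"data": {"ad_list": ["a", "b"]}, "pagination": {"next": "?page=2"}},
    2: {"data": {"ad_list": ["c", "d"]}, "pagination": {"next": "?page=3"}},
    3: {"data": {"ad_list": ["e"]}, "pagination": {}},
}


def fetch_page(page_number: int) -> dict:
    """Pretend to request one page; a real version would call the JSON endpoint."""
    return FAKE_PAGES[page_number]


def fetch_all_pages(fetch: Callable[[int], dict], max_pages: int = 1000) -> list:
    """Request page 1, 2, 3, ... until a response has no pagination["next"]."""
    responses = []
    for page in range(1, max_pages + 1):
        response = fetch(page)
        responses.append(response)
        if "next" not in response.get("pagination", {}):
            break  # no "next" entry, so this was the last page
    return responses


responses = fetch_all_pages(fetch_page)
# Flatten the per-page lists, like drilling into [response][data][ad_list].
ads = [ad for r in responses for ad in r["data"]["ad_list"]]
print(len(responses), ads)
```

The max_pages cap is a safety net I added so a misbehaving API cannot cause an endless loop; it is not part of the original query.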

Related

Using variables in oData URL in Excel Power Query

I'm trying to build a dynamic OData URL. The query below works, but it applies the filter after downloading the whole data set. I need query folding, so that the filter is applied in the OData query itself.
Can someone point me in the right direction?
let
// Get Parameters
Params = Excel.CurrentWorkbook(){[Name="tParams"]}[Content],
ItemNoValue = Params{0}[Value],
// Get D365 Data
Source = OData.Feed("https://xxxxxxxx&$filter=ItemNumber eq '11111111' &$select=dataAreaId,ItemNumber &$top=100", null, [Implementation="2.0"]),
#"Filter by Item Number" = Table.SelectRows(Source, each ([ItemNumber] = ItemNoValue))
in
#"Filter by Item Number"
What's also very strange is that if I construct the URL in an Excel cell, it STILL returns 30MB of data before displaying one row:
let
// Get Parameters
UrlParam = Excel.CurrentWorkbook(){[Name="URL"]}[Content],
UrlValue = UrlParam[Column1]{0},
Source = OData.Feed(UrlValue, null, [Implementation="2.0"])
in
Source

How to filter large array based on "in-between" value in a sub-array? (Node.js)

I have a large database of items that have somewhat fluid statuses. I need to get an array of those items based on what each item's status was on a given date.
Here's an excerpt from an example record:
{"status":[
{"date":{"$date":"2019-06-14T06:17:41.625Z"},"statusCode":200},
{"date":{"$date":"2019-11-04T02:02:58.020Z"},"statusCode":404},
{"date":{"$date":"2020-08-07T01:11:16.184Z"},"statusCode":200},
{"date":{"$date":"2020-08-07T03:54:09.703Z"},"statusCode":404}
]}
Using this example, the status on 2020-01-13 would be 404 (as it would be also on 2020-01-12 or any other givenDate until the status changed back to 200).
So how would I filter my big array to this record (and others like it) to only items with status 404 as of 2020-01-13? (And I would do the same for 200.)
Note that I can't simply filter for objects with date < givenDate && statusCode == 200 because that would ignore if the status changed after those records. (The above example would return for either 200 or 404 since both records exist before givenDate.)
My only idea at the moment is that I could first filter the status array to anything before givenDate, and then compare based on the last record (since this filtered array's last record would then always be before givenDate). But this seems more complicated than necessary.
Processing time isn't important to me on this because I'm trying to make some one-time corrections to past statistics.
Thanks in advance!
A bit verbose, but I think this should do what you want.
var feedHistory = {"status":[
  {"date":{"$date":"2019-06-14T06:17:41.625Z"},"statusCode":200},
  {"date":{"$date":"2019-11-04T02:02:58.020Z"},"statusCode":404},
  {"date":{"$date":"2020-08-07T01:11:16.184Z"},"statusCode":200},
  {"date":{"$date":"2020-08-07T03:54:09.703Z"},"statusCode":404}
]};
const filterByStatus = (feedHistory,statusDate) => {
  let foundRecord = false;
  feedHistory.forEach((record) => {
    let recordDate = new Date(Date.parse(record.date['$date']));
    if (recordDate < statusDate && (!foundRecord || foundRecord.parsedDate < recordDate)) {
      record.parsedDate = recordDate;
      foundRecord = record;
    }
  });
  return foundRecord;
};
var statusDate = new Date('2019-06-15');
var statusOnDate = filterByStatus(feedHistory.status,statusDate);
console.log(`On ${statusDate} the status was ${statusOnDate.statusCode}`);
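The snippet above finds the status of a single record on a given date. To filter a whole array of items by that status, the same idea can be wrapped in a filter. The Python below is a rough sketch under assumptions: the id field and the second item are made up for illustration, and only the status array of item-1 comes from the question.

```python
from datetime import datetime, timezone


def parse(ts: str) -> datetime:
    """Parse timestamps such as '2019-06-14T06:17:41.625Z' as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def status_on(item: dict, given: datetime):
    """Return the statusCode in effect at `given`, or None if nothing precedes it."""
    earlier = [s for s in item["status"] if parse(s["date"]["$date"]) < given]
    if not earlier:
        return None
    # The latest entry before `given` is the status that was in effect.
    latest = max(earlier, key=lambda s: parse(s["date"]["$date"]))
    return latest["statusCode"]


def filter_by_status(items: list, given: datetime, code: int) -> list:
    """Keep only the items whose status at `given` equals `code`."""
    return [item for item in items if status_on(item, given) == code]


items = [
    {"id": "item-1", "status": [  # status history from the question; id is hypothetical
        {"date": {"$date": "2019-06-14T06:17:41.625Z"}, "statusCode": 200},
        {"date": {"$date": "2019-11-04T02:02:58.020Z"}, "statusCode": 404},
        {"date": {"$date": "2020-08-07T01:11:16.184Z"}, "statusCode": 200},
        {"date": {"$date": "2020-08-07T03:54:09.703Z"}, "statusCode": 404},
    ]},
    {"id": "item-2", "status": [  # made-up second item that stays at 200
        {"date": {"$date": "2019-01-01T00:00:00.000Z"}, "statusCode": 200},
    ]},
]

given_date = datetime(2020, 1, 13, tzinfo=timezone.utc)
not_found = filter_by_status(items, given_date, 404)
ok = filter_by_status(items, given_date, 200)
print([i["id"] for i in not_found], [i["id"] for i in ok])
```

Items with no status entry before the given date return None from status_on and therefore match neither 200 nor 404.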

firebase Starting point was already set

I use firebase admin and realtime database on node.js
The data lives under qr/, where each child node has a batch property.
To get the data where batch is batch-7, I was doing:
let batch = "batch-7";
let ref = admin.database().ref('qr/');
ref.orderByChild("batch").equalTo(batch).on('value', (snapshot) =>
{
res.json(Object.assign({}, snapshot.val()));
ref.off();
});
All was OK!
But now I need to add pagination, i.e. return 10 elements at a time depending on the page.
I use this code:
let page = req.query.page;// num page
let batch = req.params.batch;// batch name
let ref = admin.database().ref('qr/');
ref.orderByChild("batch").startAt(+page*10).limitToFirst(10).equalTo(batch)
.on('value', (snapshot) =>
{
res.json(Object.assign({}, snapshot.val()));
ref.off();
});
But I get an error:
Query.equalTo: Starting point was already set (by another call to startAt or equalTo)
How do I get N items, starting at position M, where batch equals my batch?
You can only call one startAt (and/or endAt) OR equalTo. Calling both is not possible, nor does it make a lot of sense.
You seem to have a general misunderstanding of how startAt works though, as you're passing in an offset. Firebase queries are not offset based, but work purely on the value, often also referred to as an anchor node.
So when you want to get the data for a second page, and you order by batch, you need to pass in the value of batch for the anchor node: the first item that you want to be returned. This anchor node is typically the last item of the previous page, since you don't know the first item of the next page yet. And for this anchor node, you need to know the value of the property you order on (batch) and usually also its key (if/when there may be multiple nodes with the same value for batch).
It also means that you usually request one item more than you need, which is the anchor node.
So when you request the first page, you should track the key/batch of the last node:
var lastKey, lastValue;
ref.orderByChild("batch").equalTo(batch).limitToFirst(10).on('value', (snapshot) => {
  snapshot.forEach((child) => {
    lastKey = child.key;
    lastValue = child.child('batch').val();
  });
});
Then when you need the second page, you do a query like this:
ref.orderByChild("batch").startAt(lastValue, lastKey).endAt(lastValue+"\uf8ff").limitToFirst(11).on('value', (snapshot) => {
  snapshot.forEach((child) => {
    lastKey = child.key;
    lastValue = child.child('batch').val();
  });
});
There's one more trick above here: I use startAt instead of equalTo, so that we can get pagination working. But it then uses endAt to ensure we still end at the correct item, by appending "\uf8ff" (a very high code point in the Unicode range) to form the last batch value to return.
I'd also highly recommend checking out some of the previous questions on pagination with the Firebase Realtime Database.
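To make the anchor idea concrete, here is a small in-memory simulation in Python. It is not Firebase code: query() is a hypothetical helper that only mimics ordering by (batch, key), starting at an anchor key and limiting the result, and it matches batch by plain equality rather than with the endAt("\uf8ff") trick. The keys k000 to k024 are invented sample data.

```python
def query(rows, batch, start_key=None, limit=10):
    """Mimic an ordered query: sort by (batch, key), start at the anchor, take `limit`."""
    ordered = sorted(rows, key=lambda r: (r["batch"], r["key"]))
    matching = [
        r for r in ordered
        if r["batch"] == batch and (start_key is None or r["key"] >= start_key)
    ]
    return matching[:limit]


def get_page(rows, batch, last_key=None, page_size=10):
    """Return the page that follows `last_key` (or the first page if it is None)."""
    if last_key is None:
        return query(rows, batch, limit=page_size)
    # Ask for one extra item: the anchor itself comes back first and is dropped.
    items = query(rows, batch, start_key=last_key, limit=page_size + 1)
    return items[1:]


# Invented sample data: 25 nodes in batch-7 and 5 nodes in batch-8.
rows = [{"key": f"k{i:03d}", "batch": "batch-7"} for i in range(25)]
rows += [{"key": f"x{i:03d}", "batch": "batch-8"} for i in range(5)]

pages = []
last_key = None
while True:
    page = get_page(rows, "batch-7", last_key)
    if not page:
        break
    pages.append(page)
    last_key = page[-1]["key"]  # remember the anchor for the next request

print([len(p) for p in pages])
```

Note that no offset is ever computed: each request only needs the key of the last item already seen, which is why this style of paging stays stable even if earlier items are added or removed.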

WHERE IN with Azure DocumentDB (CosmosDB) .Net SDK

Having seen this question I'm not sure why the following code for my DocDb instance isn't working:
var userApps = _docs.CreateDocumentQuery(UriFactory.CreateDocumentCollectionUri(Constants.Databases.Applications.ID, Constants.Databases.Applications.Collections.USER_APPS),
        new SqlQuerySpec(@"SELECT r.appId FROM ROOT r WHERE r.userId = @userId", (@"@userId", userId).ToSqlParameters()))
    .ToList()
    .Select(s => (string)s.appId);
var query = _docs.CreateDocumentQuery<Document>(UriFactory.CreateDocumentCollectionUri(Constants.Databases.Applications.ID, Constants.Databases.Applications.Collections.APP_DEFINITIONS),
        new SqlQuerySpec(@"SELECT r.id, r.appName FROM ROOT r WHERE r.appId IN (@userApps)", (@"@userApps", userApps.ToArray()).ToSqlParameters()),
        new FeedOptions { EnableCrossPartitionQuery = true })
    .AsDocumentQuery();
When I execute this, though I know the data should be returning me back a result set, it comes back empty every time.
Troubleshooting so far
Variants of .Select()
Return strings that I string.Join in to a comma-separated list.
Eg:
var userApps = string.Join(@",", _docs.CreateDocumentQuery(UriFactory.CreateDocumentCollectionUri(Constants.Databases.Applications.ID, Constants.Databases.Applications.Collections.USER_APPS),
        new SqlQuerySpec(@"SELECT r.appId FROM ROOT r WHERE r.userId = @userId", (@"@userId", userId).ToSqlParameters()))
    .ToList()
    .Select(s => $@"'{s.appId}'"));
Don't encapsulate IN parameter
Removing the () around the parameter spec in the query thinking maybe the SqlParameter spec was doing the array specification?
Eg: @"SELECT r.id, r.appName FROM ROOT r WHERE r.appId IN @userApps")
Ends up throwing "Syntax error, incorrect syntax near '@userApps'."
Validate via Azure Portal queries
Ran the (expected) SQL that this code should be running.
I get back my expected results without issue (ie: I know there is a result set for these queries as-written).
Debugger output for Query 1
AppIds are coming back from query 1.
Unsatisfactory workaround
Change query 2 to not be parameterized. Rather, inject the comma-separated list of IDs from query 1 in to it:
var userApps = string.Join(@",", _docs.CreateDocumentQuery(UriFactory.CreateDocumentCollectionUri(Constants.Databases.Applications.ID, Constants.Databases.Applications.Collections.USER_APPS),
        new SqlQuerySpec(@"SELECT r.appId FROM ROOT r WHERE r.userId = @userId", (@"@userId", userId).ToSqlParameters()))
    .ToList()
    .Select(s => $@"'{s.appId}'"));
var query = _docs.CreateDocumentQuery<Document>(UriFactory.CreateDocumentCollectionUri(Constants.Databases.Applications.ID, Constants.Databases.Applications.Collections.APP_DEFINITIONS),
        new SqlQuerySpec($@"SELECT r.id, r.appName FROM ROOT r WHERE r.appId IN ({userApps})"),
        new FeedOptions { EnableCrossPartitionQuery = true })
    .AsDocumentQuery();
Works perfectly but I'm not going to accept it as an answer to this problem as it goes against a couple decades-worth of SQL best practices and, frankly, shouldn't be a solution.
Here's my ToSqlParameters() extension method in case it's the culprit (this works everywhere else I've used it, though. Maybe something special is needed for arrays?):
public static SqlParameterCollection ToSqlParameters(this (string, object) parmDef) => new SqlParameterCollection(new[] { new SqlParameter(parmDef.Item1, parmDef.Item2) });
Thanks!
If you use a parameterized IN list, then it will be considered as a single value when the parameter is expanded.
For instance, this query:
SELECT * FROM r WHERE r.item IN (@items)
where @items is defined as ['val1', 'val2', 'val3'], will be interpreted as:
SELECT * FROM r WHERE r.item IN (['val1', 'val2', 'val3'])
which basically means that you're comparing r.item to a single value that is an array of three elements (i.e. equivalent to r.item = ['val1', 'val2', 'val3']).
To compare to multiple items, you need to use a parameter for each value. Something like this:
SELECT * FROM r WHERE r.item IN (@val1, @val2, @val3)
A more convenient way to write this query is to use ARRAY_CONTAINS instead and pass the array of items as a single parameter. So the above query will be written like this:
SELECT * FROM r WHERE ARRAY_CONTAINS(@items, r.item)
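Both options can be generated programmatically. The Python sketch below only builds the query text and a parameter list; it does not call any SDK. The helper names, the @p0/@p1 naming scheme and the name/value dictionaries are illustrative assumptions, to be adapted to whatever parameter type your client library expects.

```python
def build_in_query(field: str, values: list) -> tuple:
    """Build an IN query with one parameter per value (values stay parameterized)."""
    if not values:
        raise ValueError("an IN list needs at least one value")
    names = [f"@p{i}" for i in range(len(values))]
    query = f"SELECT * FROM r WHERE r.{field} IN ({', '.join(names)})"
    params = [{"name": n, "value": v} for n, v in zip(names, values)]
    return query, params


def build_array_contains_query(field: str, values: list) -> tuple:
    """Build the ARRAY_CONTAINS form, passing the whole list as a single parameter."""
    query = f"SELECT * FROM r WHERE ARRAY_CONTAINS(@items, r.{field})"
    params = [{"name": "@items", "value": list(values)}]
    return query, params


in_query, in_params = build_in_query("item", ["val1", "val2", "val3"])
ac_query, ac_params = build_array_contains_query("item", ["val1", "val2", "val3"])
print(in_query)
print(ac_query)
```

Only the parameter names are interpolated into the query text; the values themselves never are, which keeps the benefit of parameterization that the string.Join workaround gives up.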

Apache CMIS: Paging query result

Recently I've started using Apache CMIS and read the official documentation and examples. I haven't noticed anything about paging query results.
There is an example showing how to list folder items, setting maxItemsPerPage using operationContext, but it seems that operationContext is used with the getChildren method:
int maxItemsPerPage = 5;
int skipCount = 10;
CmisObject object = session.getObject(session.createObjectId(folderId));
Folder folder = (Folder) object;
OperationContext operationContext = session.createOperationContext();
operationContext.setMaxItemsPerPage(maxItemsPerPage);
ItemIterable<CmisObject> children = folder.getChildren(operationContext);
ItemIterable<CmisObject> page = children.skipTo(skipCount).getPage();
This is OK when it comes to listing a folder. But my case is about getting results from a custom search query. The basic approach is:
String myType = "my:documentType";
ObjectType type = session.getTypeDefinition(myType);
PropertyDefinition<?> objectIdPropDef = type.getPropertyDefinitions().get(PropertyIds.OBJECT_ID);
String objectIdQueryName = objectIdPropDef.getQueryName();
String queryString = "SELECT " + objectIdQueryName + " FROM " + type.getQueryName();
ItemIterable<QueryResult> results = session.query(queryString, false);
for (QueryResult qResult : results) {
String objectId = qResult.getPropertyValueByQueryName(objectIdQueryName);
Document doc = (Document) session.getObject(session.createObjectId(objectId));
}
This approach will retrieve all documents in a queryResult, but I would like to include startIndex and limit. The idea would be to type something like this:
ItemIterable<QueryResult> results = session.query(queryString, false).skipTo(startIndex).getPage(limit);
I'm not sure about this part: getPage(limit). Is this the right approach for paging? I would also like to retrieve the total number of items, so I know how to set the maximum number of items in the grid where they will be shown. There is a method for this, but the docs say something odd: that the repository sometimes can't know the total number of items. This is that method:
results.getTotalNumItems();
I have tried something like:
SELECT COUNT(*)...
but that didn't do the trick :)
Please, could you give me some advice how to do a proper paging from a query result?
Thanks in advance.
Query returns the same ItemIterable that getChildren returns, so you can page a result set returned by a query just like you can page a result set returned by getChildren.
Suppose you have a result page that shows 20 items on the page. Consider this snippet which I am running in the Groovy Console in the OpenCMIS Workbench against a folder with 149 files named testN.txt:
int PAGE_NUM = 1
int PAGE_SIZE = 20
String queryString = "SELECT cmis:name FROM cmis:document where cmis:name like 'test%.txt'"
ItemIterable<QueryResult> results = session.query(queryString, false, operationContext).skipTo(PAGE_NUM * PAGE_SIZE).getPage(PAGE_SIZE)
println "Total items:" + results.getTotalNumItems()
for (QueryResult result : results) {
println result.getPropertyValueByQueryName("cmis:name")
}
println results.getHasMoreItems()
When you run it with PAGE_NUM = 1, you'll get 20 results and the last println statement will print true. Also note that the first println will print 149, the total number of documents that match the search query, but as you point out, not all servers know how to return that.
If you re-run this with PAGE_NUM = 7, you'll get 9 results and the last println returns false because you are at the end of the list.
If you want to see a working search page that leverages OpenCMIS and plain servlets and JSP pages, take a look at the SearchServlet class in The Blend, a sample web app that comes with the book CMIS & Apache Chemistry in Action.
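The arithmetic behind skipTo(PAGE_NUM * PAGE_SIZE).getPage(PAGE_SIZE) can be checked with a plain list. The Python below is only a model of the skip and page-size logic, not OpenCMIS code; get_page is a hypothetical helper and the 149 file names mirror the test folder described above.

```python
def get_page(items: list, page_num: int, page_size: int) -> tuple:
    """Return (page, has_more, total) for a zero-based page number."""
    skip = page_num * page_size          # same offset as skipTo(PAGE_NUM * PAGE_SIZE)
    page = items[skip:skip + page_size]  # same window as getPage(PAGE_SIZE)
    has_more = skip + len(page) < len(items)
    return page, has_more, len(items)


# 149 files named testN.txt, as in the example folder.
files = [f"test{n}.txt" for n in range(149)]

page1, more1, total = get_page(files, 1, 20)
page7, more7, _ = get_page(files, 7, 20)
print(len(page1), more1, total)
print(len(page7), more7)
```

Page numbers are zero-based here, so PAGE_NUM = 1 is the second screen of results and PAGE_NUM = 7 is the last, partial one.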
