Is there a faster way to create a view in CouchDB?
My data looks something like this:
{"docs":[{
"c_custkey": 1,
"c_name": "Customer#000000001",
"c_address": "IVhzIApeRb",
"c_city": "MOROCCO 0",
"c_nation": "MOROCCO",
"c_region": "AFRICA",
"lineorder": [{
"lo_orderkey": 164711,
"lo_linenumber": 1,
"lo_custkey": 1,
"lo_partkey": 82527,
"lo_suppkey": 1848,
"lo_quantity": 34,
"lo_extendedprice": 5132368,
"lo_revenue": 2816872,
"orderdate": [{
"d_datekey": 19920426,
"d_date": "April 26, 1992",
"d_dayofweek": "Monday",
"d_month": "April",
"d_year": 1992,
"d_yearmonthnum": 199204
}],
"part": [{
"p_partkey": 82527,
"p_name": "steel tomato",
"p_mfgr": "MFGR#4",
"p_category": "MFGR#45",
"p_brand1": "MFGR#452"
}],
"supplier": [{
"s_city": "MOZAMBIQU8",
"s_nation": "MOZAMBIQUE",
"s_region": "AFRICA"
}]
}, {
"lo_orderkey": 164711,
"lo_linenumber": 2,
"lo_custkey": 1,
"lo_partkey": 26184,
"lo_suppkey": 1046,
"lo_orderdate": 19920426,
"lo_quantity": 15,
"lo_extendedprice": 1665270,
"orderdate": [{
"d_datekey": 19920426,
"d_date": "April 26, 1992",
"d_dayofweek": "Monday",
"d_month": "April",
"d_year": 1992,
"d_yearmonthnum": 199204
}],
"part": [{
"p_partkey": 26184,
"p_name": "chartreuse green",
"p_mfgr": "MFGR#2",
"p_category": "MFGR#23",
"p_brand1": "MFGR#2329"
}],
"supplier": [{
"s_suppkey": 1046,
"s_city": "SAUDI ARA2",
"s_nation": "SAUDI ARABIA",
"s_region": "MIDDLE EAST"
}]
},...
I'm creating the view in Futon as follows, but building it takes 30 minutes:
Map function:
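function(doc)
{
  var c_city = doc.c_city;
  var c_nation = doc.c_nation;
  if (c_nation == "UNITED STATES") {
    for (var i = 0; i < doc.lineorder.length; i++) {
      var lineorder = doc.lineorder[i];
      // Reset per line order so values from the previous one never leak through
      var s_city = null, s_nation = null, d_year = null;
      for (var j = 0; j < lineorder.supplier.length; j++) {
        s_city = lineorder.supplier[j].s_city;
        s_nation = lineorder.supplier[j].s_nation;
      }
      if (s_nation == "UNITED STATES") {
        for (var k = 0; k < lineorder.orderdate.length; k++) {
          d_year = lineorder.orderdate[k].d_year;
        }
        if (d_year >= 1992 && d_year <= 1997) {
          emit({d_year: d_year, c_city: c_city, s_city: s_city}, lineorder.lo_revenue);
        }
      }
    }
  }
}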
Reduce Function: "_sum"
My database has 2 GB of this kind of data.
This is obviously not your real view (lineorder vs. lineorders and there's no lo_revenue), so I won't waste time finessing it. Instead let me say that for a 2GB data set with who knows how many lineorders iterations per document, 30 minutes is not at all surprising.
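As an aside that is not part of the answer above: emitting an array key such as [d_year, c_city, s_city] instead of an object lets the _sum reduce be queried at different granularities with the group_level parameter (group_level=1 sums per year, 2 per year and customer city, and so on). The sketch below assumes every lineorder carries supplier and orderdate arrays as in the sample data, and it emits one row per supplier/date pair, which matches the question's logic when each array has a single element. The emit stub and the "const map =" wrapper exist only so the function can be exercised in Node; in the design document you would store just the function(doc) {...} part. It is not expected to shorten the initial index build much.

```javascript
// Local stand-in for CouchDB's emit(); CouchDB provides emit itself, so only
// the map function below belongs in the design document.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: emits [year, customer city, supplier city] -> revenue for
// US customers and US suppliers with order years 1992-1997. Written in ES5
// style (var, indexed loops) to suit older query-server JavaScript engines.
const map = function (doc) {
  if (doc.c_nation !== "UNITED STATES" || !Array.isArray(doc.lineorder)) {
    return;
  }
  for (var i = 0; i < doc.lineorder.length; i++) {
    var lo = doc.lineorder[i];
    var suppliers = lo.supplier || [];
    var dates = lo.orderdate || [];
    for (var j = 0; j < suppliers.length; j++) {
      if (suppliers[j].s_nation !== "UNITED STATES") {
        continue;
      }
      for (var k = 0; k < dates.length; k++) {
        var year = dates[k].d_year;
        if (year >= 1992 && year <= 1997) {
          // _sum only accepts numbers, so default a missing lo_revenue to 0
          emit([year, doc.c_city, suppliers[j].s_city], lo.lo_revenue || 0);
        }
      }
    }
  }
};
```

The second sample lineorder has no lo_revenue field, which is why the sketch substitutes 0: the built-in _sum reduce fails on non-numeric values.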
I'm new to JSONPath and want to write a JSONPath expression that retrieves a property value only if a certain condition is met. The value I'm after is not part of an array, but I've managed to make filtering work in the following JSONPath tool: https://www.site24x7.com/tools/json-path-evaluator.html
Given the following JSON, I only want to extract the value of column2.dimValue if column2.attributeId equals B0:
{
"batchId": 279,
"companyId": "40",
"period": 202208,
"taxCode": "1",
"taxSystem": "",
"transactionDate": "2022-08-05T00:00:00.000",
"transactionNumber": 222006089,
"transactionType": "IF",
"year": 2022,
"accountingInformation": {
"account": "4010",
"column1": {
"attributeId": "H9",
"dimValue": "76"
},
"column2": {
"attributeId": "B0",
"dimValue": "2170103"
},
"column3": {
"attributeId": "",
"dimValue": ""
},
"column4": {
"attributeId": "BF",
"dimValue": "217010330"
},
"column5": {
"attributeId": "10",
"dimValue": "3101"
},
"column6": {
"attributeId": "06",
"dimValue": ""
},
"column7": {
"attributeId": "19",
"dimValue": "K"
}
},
"categories": {
"cat1": "H9",
"cat2": "B0",
"cat3": "",
"cat4": "BF",
"cat5": "10",
"cat6": "06",
"cat7": "19",
"dim1": "76",
"dim2": "2170103",
"dim3": "",
"dim4": "217010330",
"dim5": "3101",
"dim6": "",
"dim7": "K"
},
"amounts": {
"amount": 48.24,
"amount3": 0.0,
"amount4": 0.0,
"currencyAmount": 48.24,
"currencyCode": "NOK",
"debitCreditFlag": 1
},
"invoice": {
"customerOrSupplierId": "58118",
"description": "",
"externalArchiveReference": "",
"externalReference": "2170103",
"invoiceNumber": "220238522",
"ledgerType": "P"
},
"additionalInformation": {
"number": 0,
"orderLineNumber": 0,
"orderNumber": 0,
"sequenceNumber": 1,
"status": "",
"value": 0.0,
"valueDate": "2022-08-05T00:00:00.000"
},
"lastUpdated": {
"updatedAt": "2022-09-05T10:59:11.633",
"updatedBy": "HELVES"
}
}
I've used this JSONPath expression:
$['accountingInformation']['column2'][?(@.attributeId=='B0')].dimValue
This gives the following result:
[
"2170103"
]
I'm using this result in an Azure Data Factory mapping, and it doesn't seem to work because the result is an array.
Can anyone help me with the syntax so that it returns only the actual value? Is that even possible?
I reproduced this; below is the approach.
The sample JSON file is used as the source of a Lookup activity.
An If Condition activity is added to check whether column2's attributeId is 'B0'. The expression is:
@equals(activity('Lookup1').output.value[0].accountingInformation.column2.attributeId, 'B0')
In the True branch of the If Condition activity, a Set Variable activity is added. A new variable of type String is created and set with the following expression:
@activity('Lookup1').output.value[0].accountingInformation.column2.dimValue
Next, a Copy activity is added after the If Condition activity. A dummy dataset is used as the source, and + New is clicked under Additional columns:
Name: col1
Value: @variables('v2')
In Mapping, Import schemas is clicked, and every column except the additional column added in the source is deleted.
The pipeline is debugged, and the data is copied to the sink without error.
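JSONPath filter expressions such as [?(...)] always return a list of matching nodes, even when exactly one node qualifies, which is why the evaluator wraps "2170103" in an array. The pipeline above sidesteps this by testing the condition first and then reading a single property. The same "test, then read a scalar" logic in plain JavaScript is sketched below; dimValueIfAttribute is a hypothetical helper name, not an ADF or JSONPath function.

```javascript
// Returns accountingInformation[column].dimValue as a plain string when that
// column's attributeId matches, otherwise undefined (mirrors the If Condition
// + Set Variable steps rather than a JSONPath filter).
function dimValueIfAttribute(record, column, attributeId) {
  const info = record && record.accountingInformation;
  const col = info && info[column];
  if (col && col.attributeId === attributeId) {
    return col.dimValue;
  }
  return undefined;
}
```

For the sample record, dimValueIfAttribute(record, "column2", "B0") yields the string "2170103" rather than a one-element array.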
I am trying to calculate heating/cooling degree days using the (Tbase - Ta) formula, where Tbase is usually 65F and Ta = (high_temp + low_temp)/2.
For example:
high_temp = 96.5F, low_temp = 65.21F, then
mean = (high_temp + low_temp)/2
result = mean - 65
65F is the base temperature.
If result is positive, it is the cooling degree days (CDD); otherwise 65 - mean is the heating degree days (HDD).
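The daily-mean formula described above can be sketched as follows. degreeDaysFromMean is a hypothetical name, and base defaults to 65F as in the question. With this method only one of HDD and CDD can be non-zero on a given day.

```javascript
// Daily-mean method: average the day's high and low, then compare with the base.
function degreeDaysFromMean(highTemp, lowTemp, base = 65) {
  const mean = (highTemp + lowTemp) / 2;
  return {
    mean: mean,
    hdd: Math.max(0, base - mean), // degrees below the base
    cdd: Math.max(0, mean - base), // degrees above the base
  };
}
```

For the Dark Sky day below (temperatureMax 62.44, temperatureMin 49.42), degreeDaysFromMean(62.44, 49.42) gives an HDD of about 9.07 and a CDD of 0; for the 96.5F/65.21F example it gives a CDD of about 15.855.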
I get weather data from two APIs:
- Weatherbit
- Dark Sky
Weatherbit provides both CDD and HDD data, but for Dark Sky we need to calculate them using the formula above (Tbase - Ta).
My problem is that the two APIs show different results. For example, the Dark Sky JSON response for one day:
{
"latitude": 47.552758,
"longitude": -122.150589,
"timezone": "America/Los_Angeles",
"daily": {
"data": [
{
"time": 1560927600,
"summary": "Light rain in the morning and overnight.",
"icon": "rain",
"sunriseTime": 1560946325,
"sunsetTime": 1561003835,
"moonPhase": 0.59,
"precipIntensity": 0.0057,
"precipIntensityMax": 0.0506,
"precipIntensityMaxTime": 1561010400,
"precipProbability": 0.62,
"precipType": "rain",
"temperatureHigh": 62.44,
"temperatureHighTime": 1560981600,
"temperatureLow": 48,
"temperatureLowTime": 1561028400,
"apparentTemperatureHigh": 62.44,
"apparentTemperatureHighTime": 1560981600,
"apparentTemperatureLow": 46.48,
"apparentTemperatureLowTime": 1561028400,
"dewPoint": 46.61,
"humidity": 0.75,
"pressure": 1021.81,
"windSpeed": 5.05,
"windGust": 8.36,
"windGustTime": 1560988800,
"windBearing": 149,
"cloudCover": 0.95,
"uvIndex": 4,
"uvIndexTime": 1560978000,
"visibility": 4.147,
"ozone": 380.8,
"temperatureMin": 49.42,
"temperatureMinTime": 1561010400,
"temperatureMax": 62.44,
"temperatureMaxTime": 1560981600,
"apparentTemperatureMin": 47.5,
"apparentTemperatureMinTime": 1561014000,
"apparentTemperatureMax": 62.44,
"apparentTemperatureMaxTime": 1560981600
}
]
},
"offset": -7
}
Python calculation:
response = result.get("daily").get("data")[0]
low_temp = response.get("temperatureMin")
hi_temp = response.get("temperatureMax")
mean = (hi_temp + low_temp)/2
# 65 is the base temperature
print(65-mean)
Here the mean is 55.93 (up to floating-point rounding), so
65 - mean = 9.07
HDD is 9.07, so CDD is 0.
The Weatherbit JSON response for the same date is:
{
"threshold_units": "F",
"timezone": "America/Los_Angeles",
"threshold_value": 65,
"state_code": "WA",
"country_code": "US",
"city_name": "Newcastle",
"data": [
{
"rh": 68,
"wind_spd": 5.6,
"timestamp_utc": null,
"t_ghi": 8568.9,
"max_wind_spd": 11.4,
"cdd": 0.4,
"dewpt": 46.9,
"snow": 0,
"hdd": 6.7,
"timestamp_local": null,
"precip": 0.154,
"t_dni": 11290.6,
"temp_wetbulb": 53.1,
"t_dhi": 1413.9,
"date": "2019-06-20",
"temp": 58.6,
"sun_hours": 7.6,
"clouds": 58,
"wind_dir": 186
}
],
"end_date": "2019-06-21",
"station_id": "727934-94248",
"count": 1,
"start_date": "2019-06-20",
"city_id": 5804676
}
Here HDD is 6.7 and CDD is 0.4.
Can you explain how they get this result?
You need to use hourly data to calculate the HDD and CDD, and then average them to get the daily value.
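A sketch of the hourly approach this answer describes: compute HDD and CDD for every hourly temperature, then average them over the day. It assumes you already have the day's hourly temperatures in F (fetching them is not shown), degreeDaysFromHourly is a hypothetical name, and it does not claim to reproduce Weatherbit's exact figures, whose procedure is described in the linked post. Because some hours can sit above the base and others below it, a day can end up with both a non-zero HDD and a non-zero CDD, as in the Weatherbit response (6.7 and 0.4).

```javascript
// Degree-hour method: per-hour HDD/CDD against the base, averaged over the day.
function degreeDaysFromHourly(hourlyTemps, base = 65) {
  if (hourlyTemps.length === 0) {
    throw new Error("need at least one hourly temperature");
  }
  let hddSum = 0;
  let cddSum = 0;
  for (const t of hourlyTemps) {
    hddSum += Math.max(0, base - t); // this hour's heating degrees
    cddSum += Math.max(0, t - base); // this hour's cooling degrees
  }
  return {
    hdd: hddSum / hourlyTemps.length,
    cdd: cddSum / hourlyTemps.length,
  };
}
```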
More details here: https://www.weatherbit.io/blog/post/heating-and-cooling-degree-days-weather-api-release
I am working with the Foursquare API using Node.js and MongoDB on the back end. I have all the user information and check-in history stored in a collection, so the documents look similar to this:
{
  _id: ...,
  foursquareId: ...,
  personalInfo: {},
  checkins: [
    {
      id: ...,
      createdAt: 123456789, // seconds since epoch
      venue: {},
      ...
    },
    {
      id: ...,
      createdAt: 123456789, // seconds since epoch
      venue: {},
      ...
    },
    ...
  ]
}
For this question I am only interested in the checkins array. I need to return the number of check-ins by month and year, but I am not sure what the best approach is. I think the result would be something like this (though I am not totally convinced):
{
'2016': {
'January': 43,
'February': 38,
'March': 40,
'April': 48,
'May': 50,
'June': 41,
'July': 39,
'August': 38,
'September': 30,
'October': 29,
'November': 38,
'December': 41
},
'2017': {
'January': 55,
'February': 20
}
}
I am not concerned with the format in which I receive the information on the front end. I want to know if it is possible to do this in MongoDB, because I couldn't find a way in the documentation or in any other example here. Otherwise I might need to do it on the front end, which is not a good idea, since this array could have around 7k entries or more.
Using the aggregation framework should get you what you want.
db.collectionName.aggregate([
  // One document per check-in
  {$unwind: '$checkins'},
  {
    $project: {
      id: 1,
      'checkins.createdAt': 1,
      // createdAt is in seconds; Date arithmetic works in milliseconds
      newDate: {
        $add: [new Date(0), {
          $multiply: ["$checkins.createdAt", 1000]
        }]
      }
    }
  },
  {$project: {
    year: {$year: "$newDate"},
    month: {$month: "$newDate"}
  }},
  // Count check-ins per (year, month)
  {$group: {_id: {year: "$year", month: "$month"}, count: {$sum: 1}}},
  // Collect the monthly counts into one document per year
  {$group: {_id: {year: "$_id.year"}, monthTotals: {$push: {month: "$_id.month", count: "$count"}}}}
])
This produces documents like the following:
{
  "_id": {
    "year": NumberInt(2016)
  },
  "monthTotals": [
    {"month": NumberInt(1), "count": NumberInt(2)},
    {"month": NumberInt(2), "count": NumberInt(3)}
  ]
}
The second step (the first $project stage) may need to be adjusted depending on how your seconds-since-epoch value is stored, but this should get you generally what you need.
There's not a way to get the data exactly as you've outlined without some post processing of the results, but it should be simple enough to modify the result.
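The post-processing mentioned here can be a small function in the Node.js back end. This sketch assumes the aggregation output has exactly the shape shown above; toYearMonthMap and MONTH_NAMES are hypothetical names. It sorts each year's monthTotals because $push inside $group does not guarantee any particular order.

```javascript
const MONTH_NAMES = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Turns [{ _id: { year: 2016 }, monthTotals: [{ month: 1, count: 43 }, ...] }, ...]
// into { "2016": { January: 43, ... }, ... }.
function toYearMonthMap(aggregateDocs) {
  const result = {};
  for (const doc of aggregateDocs) {
    const months = {};
    // Copy before sorting so the input array is left untouched
    const totals = [...doc.monthTotals].sort((a, b) => a.month - b.month);
    for (const { month, count } of totals) {
      months[MONTH_NAMES[month - 1]] = count; // $month is 1-based
    }
    result[String(doc._id.year)] = months;
  }
  return result;
}
```

With the official Node.js driver, the input would typically come from collection.aggregate(pipeline).toArray().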
I have a schema with the following fields:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var MySchema = new Schema({
  roomId: { type: String, required: true, ref: 'Room' },
  timestamp: Date,
  currentTemp: Number,
  outsideTemp: Number,
  desiredTemp: Number
});
Now I want to aggregate the documents in this collection and return a response like this:
{
timestamp: [1,2,3,4,5,6,7,8,9,10..,24],
currentTemp: [34,45,.....],
outsideTemp: [14,45,.....],
desiredTemp: [34,45,.....]
}
I have a match condition and I am able to select documents with it. Now I want to aggregate the selected documents and produce a result like the one above.
The timestamp value is bounded by a lower bound and an upper bound. Both bounds are passed from the client, and the grouping interval should depend on the difference between them. If the difference is:
- 1 day : group by hour
- 3 days : group by 6 hrs
- 7 days : group by 12 hours
- < 30 days : group by day
- > 30 days : group by week
currentTemp is the average of the currentTemp values of all documents in that time period; likewise for outsideTemp and desiredTemp.
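The bucket rules above can be sketched in plain JavaScript, applied in memory to the documents returned after the match step. This is not an aggregation pipeline, and several details are assumptions: the function names are hypothetical, the range is treated as half-open [lower, upper), cases between the listed ones (e.g. 2 days, exactly 30 days) are assigned to the nearest listed rule as commented, empty buckets are skipped, and each bucket is labelled with its start time rather than 1..24.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Choose the bucket width from the requested range (boundaries are assumptions).
function bucketWidthMs(lower, upper) {
  const span = upper.getTime() - lower.getTime();
  if (span <= DAY_MS) return HOUR_MS;          // up to 1 day: hourly
  if (span <= 3 * DAY_MS) return 6 * HOUR_MS;  // up to 3 days: 6 hours
  if (span <= 7 * DAY_MS) return 12 * HOUR_MS; // up to 7 days: 12 hours
  if (span < 30 * DAY_MS) return DAY_MS;       // under 30 days: daily
  return 7 * DAY_MS;                           // otherwise: weekly
}

// Average the readings in each bucket and return parallel arrays.
function aggregateReadings(docs, lower, upper) {
  const start = lower.getTime();
  const end = upper.getTime();
  const width = bucketWidthMs(lower, upper);
  const fields = ["currentTemp", "outsideTemp", "desiredTemp"];
  const buckets = new Map(); // bucket index -> { n, sums }

  for (const doc of docs) {
    const t = new Date(doc.timestamp).getTime();
    if (t < start || t >= end) continue; // outside [lower, upper)
    const index = Math.floor((t - start) / width);
    if (!buckets.has(index)) {
      buckets.set(index, { n: 0, sums: { currentTemp: 0, outsideTemp: 0, desiredTemp: 0 } });
    }
    const bucket = buckets.get(index);
    bucket.n += 1;
    for (const f of fields) bucket.sums[f] += doc[f];
  }

  const result = { timestamp: [], currentTemp: [], outsideTemp: [], desiredTemp: [] };
  const indexes = [...buckets.keys()].sort((a, b) => a - b);
  for (const index of indexes) {
    const bucket = buckets.get(index);
    result.timestamp.push(new Date(start + index * width)); // bucket start time
    for (const f of fields) result[f].push(bucket.sums[f] / bucket.n);
  }
  return result;
}
```

Doing the same grouping inside MongoDB would avoid transferring every matched document, at the cost of a more involved pipeline.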
How can I do this ?
EDIT:
Sample data set
{
"_id": "5656bfd94d6f6304ff140000",
"roomId": "5656bc124d6f6335051b0000",
"timestamp": "2015-11-15T08:59:20Z",
"currentTemp": 10.55,
"outsideTemp": 43.83,
"desiredTemp": 21.32
}, {
"_id": "5656bfd94d6f6304ff150000",
"roomId": "5656bc124d6f633505200000",
"timestamp": "2015-06-01T06:33:49Z",
"currentTemp": 32.47,
"outsideTemp": 49.65,
"desiredTemp": 20.99
}, {
"_id": "5656bfd94d6f6304ff160000",
"roomId": "5656bc124d6f633505250000",
"timestamp": "2014-12-31T23:47:54Z",
"currentTemp": 35.69,
"outsideTemp": 29.91,
"desiredTemp": 20.15
}, {
"_id": "5656bfd94d6f6304ff170000",
"roomId": "5656bc124d6f6335051e0000",
"year": 2015,
"month": 3,
"day": 12,
"hour": 21,
"minute": 13,
"second": 38,
"inDST": true,
"timestamp": "2015-11-14T07:56:42Z",
"currentTemp": 27.65,
"outsideTemp": 41.4,
"desiredTemp": 24.68
}
We're based in the EU. When we sell our digital products to private persons or companies without a VAT number, we have to charge them VAT (Value Added Tax). This is what I'm trying:
import stripe
stripe.api_key = 'sk_test_xxx'
stripe.api_version = '2015-10-16'
product = stripe.Product.create(
id='product',
name='Product',
shippable=False
)
sku = stripe.SKU.create(
product='product',
price=100,
currency='eur',
inventory={'type': 'infinite'}
)
customer = stripe.Customer.create(
email='customer@example.org',
description="Customer"
)
order = stripe.Order.create(
customer=customer.id,
currency='eur',
items=[
{
'type': 'sku',
'quantity': 5,
'parent': sku.id,
'amount': 500
},
{
'type': 'tax',
'description': "20% VAT",
'amount': 100
}
]
)
The Order creation call gives me:
stripe.error.InvalidRequestError: Request req_xxx: Items of type tax are not supported at order creation.
When I replace the last order creation call with one that omits the tax item:
order = stripe.Order.create(
customer=customer.id,
currency='eur',
items=[
{
'type': 'sku',
'quantity': 5,
'parent': sku.id,
'amount': 500
}
]
)
I'm getting back these order['items']:
[
{
"amount": 500,
"currency": "eur",
"description": "Product",
"object": "order_item",
"parent": "sku_xxx",
"quantity": 5,
"type": "sku"
},
{
"amount": 0,
"currency": "eur",
"description": "Taxes (included)",
"object": "order_item",
"parent": null,
"quantity": null,
"type": "tax"
},
{
"amount": 0,
"currency": "eur",
"description": "Free shipping",
"object": "order_item",
"parent": "ship_free-shipping",
"quantity": null,
"type": "shipping"
}
]
However, an order does not allow updating the items field after the order has been created.
What's the correct and semantic way to add VAT to the order items?
I contacted Stripe support and this should now be possible in a private beta. You can ask Stripe to join the taxes beta.
After joining, you can access documentation here: https://stripe.com/docs/relay#shipping-and-taxes and here: https://stripe.com/docs/relay/dynamic-shipping-taxes#order-creation-event.
There will be an option in your Stripe dashboard (Relay settings) to specify a "dynamic" taxes webhook to which Stripe sends an Order; your server should then respond with an order item containing a tax entry. The webhook is hit immediately after an Order is created.