Hi, I have the collections and relations below.
broadcast (24,518 docs)
videoGroup (5,699 docs)
episode (124,893 docs)
videoClip (485,878 docs)
character (55,541 docs)
These collections are related to each other as follows:
broadcast has many videoGroup (m:n), so I created the broadcastToVideoGroup edge collection
videoGroup has many episode (1:n), so I created the videoGroupToEpisode edge collection
episode has many videoClip (1:n), so I created the episodeToClip edge collection
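For reference, here is a minimal arangosh sketch of what one of these edge collections and a single edge document could look like (the document keys are made up for illustration, not taken from my data):

// Hypothetical example: create the m:n edge collection and link one
// broadcast document to one videoGroup document via _from/_to.
db._createEdgeCollection("broadcastToVideoGroup");
db.broadcastToVideoGroup.save({
  _from: "broadcast/12345",   // made-up broadcast key
  _to:   "videoGroup/67890"   // made-up videoGroup key
});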
I ran the query below to fetch the fully joined result:
FOR b IN broadcast
  FILTER b.reg_title > NULL
  RETURN MERGE(b, {videoGroup: (
    FOR s IN OUTBOUND b._id broadcastToVideoGroup
      RETURN MERGE(s, {episodes: (
        FOR e IN OUTBOUND s._id videoGroupToEpisode
          RETURN MERGE(e, {clips: (
            FOR c IN OUTBOUND e._id episodeToClip
              RETURN c
          )})
      )})
  )})
The explain output is below:
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 24518 - FOR b IN broadcast /* full collection scan */
7 SubqueryNode 24518 - LET #2 = ... /* subquery */
3 SingletonNode 1 * ROOT
4 CalculationNode 1 - LET #13 = b.`_id` /* attribute expression */ /* collections used: b : broadcast */
5 TraversalNode 5 - FOR c /* vertex */ IN 1..1 /* min..maxPathDepth */ OUTBOUND #13 /* startnode */ broadcastToCharacter
6 ReturnNode 5 - RETURN c
24 SubqueryNode 24518 - LET #11 = ... /* subquery */
8 SingletonNode 1 * ROOT
9 CalculationNode 1 - LET #17 = b.`_id` /* attribute expression */ /* collections used: b : broadcast */
10 TraversalNode 1 - FOR s /* vertex */ IN 1..1 /* min..maxPathDepth */ OUTBOUND #17 /* startnode */ broadcastToVideoGroup
21 SubqueryNode 1 - LET #9 = ... /* subquery */
11 SingletonNode 1 * ROOT
12 CalculationNode 1 - LET #21 = s.`_id` /* attribute expression */
13 TraversalNode 25 - FOR e /* vertex */ IN 1..1 /* min..maxPathDepth */ OUTBOUND #21 /* startnode */ videoGroupToEpisode
18 SubqueryNode 25 - LET #7 = ... /* subquery */
14 SingletonNode 1 * ROOT
15 CalculationNode 1 - LET #25 = e.`_id` /* attribute expression */
16 TraversalNode 8 - FOR c /* vertex */ IN 1..1 /* min..maxPathDepth */ OUTBOUND #25 /* startnode */ episodeToClip
17 ReturnNode 8 - RETURN c
19 CalculationNode 25 - LET #29 = MERGE(e, { "clips" : #7 }) /* simple expression */
20 ReturnNode 25 - RETURN #29
22 CalculationNode 1 - LET #31 = MERGE(s, { "episodes" : #9 }) /* simple expression */
23 ReturnNode 1 - RETURN #31
25 CalculationNode 24518 - LET #33 = MERGE(b, { "character" : #2, "videoGroup" : #11 }) /* simple expression */ /* collections used: b : broadcast */
26 ReturnNode 24518 - RETURN #33
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
5 edge broadcastToCharacter false false 19.42 % [ `_from` ] base OUTBOUND
10 edge broadcastToVideoGroup false false 90.89 % [ `_from` ] base OUTBOUND
13 edge videoGroupToEpisode false false 3.99 % [ `_from` ] base OUTBOUND
16 edge episodeToClip false false 11.55 % [ `_from` ] base OUTBOUND
In the execution plan, I wonder why the estimate at id 13 (TraversalNode) is 25 rather than 1.
Shouldn't the estimate be 1 for an ArangoDB edge collection lookup?
Related
I have a query that runs well in a single-instance setup. However, when I tried to run it on a sharded cluster, the performance dropped (4x longer execution time).
The query plan shows that practically all processing is done on the Coordinator node, not on the DbServer.
How can I push the query to be executed on the DbServer?
To give a bit of context: I have a collection of ~120k multi-level JSON documents with nested arrays (it will grow to several million). The query needs to unnest these arrays before getting to the proper node.
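To make the nesting concrete, here is a hypothetical document shape, inferred purely from the attribute paths used in the query below (real documents contain more fields and values):

// Hypothetical shape only; attribute names come from the query paths below.
const exampleDoc = {
  report: {
    study:  { uid: "..." },
    person: { id: "..." },
    container: {
      concept: { simpleCodedValue: "A" },
      children: {
        container: [                      // arrayLevel1Elem iterates over this
          {
            concept: { codedValue: "B" },
            children: {
              container: {
                children: {
                  num: [                  // arrayLevel2Elem iterates over this
                    {
                      concept: { simpleCodedValue: "C", meaning: "..." },
                      value: "...",
                      children: {
                        code: [           // arrayLevel3Elem iterates over this
                          {
                            concept: { simpleCodedValue: "X" },
                            value:   { simpleCodedValue: "Y" }
                          }
                        ]
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
};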
AQL Query:
for doc IN doccollection
  for arrayLevel1Elem in doc.report.container.children.container
    for arrayLevel2Elem in arrayLevel1Elem.children.container.children.num
      for arrayLevel3Elem in arrayLevel2Elem.children.code
        filter doc.report.container.concept.simpleCodedValue == 'A'
        filter arrayLevel1Elem.concept.codedValue == "B"
        filter arrayLevel2Elem.concept.simpleCodedValue == "C"
        filter arrayLevel3Elem.concept.simpleCodedValue == 'X'
        filter arrayLevel3Elem.value.simpleCodedValue == 'Y'
        collect studyUid = doc.report.study.uid, personId = doc.report.person.id, metricName = arrayLevel2Elem.concept.meaning, value = to_number(arrayLevel2Elem.value)
        return {studyUid, personId, metricName, value}
Query Plan:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
2 EnumerateCollectionNode DBS 121027 - FOR doc IN doccollection /* full collection scan, projections: `report`, 2 shard(s) */ FILTER (doc.`report`.`container`.`concept`.`simpleCodedValue` == "A") /* early pruning */
3 CalculationNode DBS 121027 - LET #8 = doc.`report`.`container`.`children`.`container` /* attribute expression */ /* collections used: doc : doccollection */
19 CalculationNode DBS 121027 - LET #24 = doc.`report`.`study`.`uid` /* attribute expression */ /* collections used: doc : doccollection */
20 CalculationNode DBS 121027 - LET #26 = doc.`report`.`person`.`id` /* attribute expression */ /* collections used: doc : doccollection */
29 RemoteNode COOR 121027 - REMOTE
30 GatherNode COOR 121027 - GATHER /* parallel, unsorted */
4 EnumerateListNode COOR 12102700 - FOR arrayLevel1Elem IN #8 /* list iteration */
11 CalculationNode COOR 12102700 - LET #16 = (arrayLevel1Elem.`concept`.`codedValue` == "B") /* simple expression */
12 FilterNode COOR 12102700 - FILTER #16
5 CalculationNode COOR 12102700 - LET #10 = arrayLevel1Elem.`children`.`container`.`children`.`num` /* attribute expression */
6 EnumerateListNode COOR 1210270000 - FOR arrayLevel2Elem IN #10 /* list iteration */
13 CalculationNode COOR 1210270000 - LET #18 = (arrayLevel2Elem.`concept`.`simpleCodedValue` == "C") /* simple expression */
14 FilterNode COOR 1210270000 - FILTER #18
7 CalculationNode COOR 1210270000 - LET #12 = arrayLevel2Elem.`children`.`code` /* attribute expression */
21 CalculationNode COOR 1210270000 - LET #28 = arrayLevel2Elem.`concept`.`meaning` /* attribute expression */
22 CalculationNode COOR 1210270000 - LET #30 = TO_NUMBER(arrayLevel2Elem.`value`) /* simple expression */
8 EnumerateListNode COOR 121027000000 - FOR arrayLevel3Elem IN #12 /* list iteration */
15 CalculationNode COOR 121027000000 - LET #20 = ((arrayLevel3Elem.`concept`.`simpleCodedValue` == "X") && (arrayLevel3Elem.`value`.`simpleCodedValue` == "Y")) /* simple expression */
16 FilterNode COOR 121027000000 - FILTER #20
23 CollectNode COOR 96821600000 - COLLECT studyUid = #24, personId = #26, metricName = #28, value = #30 /* hash */
26 SortNode COOR 96821600000 - SORT studyUid ASC, personId ASC, metricName ASC, value ASC /* sorting strategy: standard */
24 CalculationNode COOR 96821600000 - LET #32 = { "studyUid" : studyUid, "personId" : personId, "metricName" : metricName, "value" : value } /* simple expression */
25 ReturnNode COOR 96821600000 - RETURN #32
Thanks a lot for any hint.
Queries are not actually executed at the DB server - the coordinators handle query compilation and execution, only really asking the DB server(s) for data.
This means memory load for query execution happens on the coordinators (good!) but that the coordinator has to transport (sometimes LARGE amounts of) data across the network. This is THE BIGGEST downside to moving to a cluster - and not one that is easily solved.
I walked this same road in the beginning and found ways to optimize some of my queries, but in the end, it was easier to go with a "one-shard" cluster or an "active-failover" setup.
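If you do end up going the "one-shard" route, the gist (as far as I recall; treat this as an assumption, and note it may require a recent version and/or the Enterprise edition) is to create the database with the single-sharding option, so that all of its collections live on one DB server and joins can run there:

// Hypothetical arangosh sketch of creating a OneShard database.
db._createDatabase("myOneShardDb", { sharding: "single" });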
It's tricky to make architecture suggestions because each use case can be so different, but there are some general AQL guidelines I follow:
Stacking all the FOR statements before the FILTER statements is not recommended (see the next point). Try this version to see if it runs any faster (and try indexing report.container.concept.simpleCodedValue):
FOR doc IN doccollection
  FILTER doc.report.container.concept.simpleCodedValue == 'A'
  FOR arrayLevel1Elem IN doc.report.container.children.container
    FILTER arrayLevel1Elem.concept.codedValue == 'B'
    FOR arrayLevel2Elem IN arrayLevel1Elem.children.container.children.num
      FILTER arrayLevel2Elem.concept.simpleCodedValue == 'C'
      FOR arrayLevel3Elem IN arrayLevel2Elem.children.code
        FILTER arrayLevel3Elem.concept.simpleCodedValue == 'X'
        FILTER arrayLevel3Elem.value.simpleCodedValue == 'Y'
        COLLECT
          studyUid = doc.report.study.uid,
          personId = doc.report.person.id,
          metricName = arrayLevel2Elem.concept.meaning,
          value = to_number(arrayLevel2Elem.value)
        RETURN { studyUid, personId, metricName, value }
The FOR doc IN doccollection pattern will pull the ENTIRE document from the DB server for each item in doccollection. Best practice is to either limit the number of documents you retrieve (best done with an index-backed search) and/or return only a few attributes. Don't be afraid of using LET - in-memory on the coordinator can be faster than in-memory on the DB. This example does both - it filters and returns a smaller set of data:
LET filteredDocs = (
  FOR doc IN doccollection
    FILTER doc.report.container.concept.simpleCodedValue == 'A'
    RETURN {
      study_id: doc.report.study.uid,
      person_id: doc.report.person.id,
      arrayLevel1: doc.report.container.children.container
    }
)
FOR doc IN filteredDocs
  FOR arrayLevel1Elem IN doc.arrayLevel1
    FILTER arrayLevel1Elem.concept.codedValue == 'B'
    ...
Below are a few details.
Query 1: using graph traversal (execution plan attached as well).
Here I am using an edge collection between CollectionA and CollectionB.
Query string:
for u in CollectionA
  filter u.FilterA == #opId and u.FilterB >= #startTimeInLong and u.FilterB <= #endTimeInLong
  for v in 1..1 OUTBOUND u CollectionALinksCollectionB
    filter v.FilterC == null
    return v
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
9 IndexNode 45088 - FOR u IN CollectionA /* skiplist index scan */
5 TraversalNode 1 - FOR v /* vertex */ IN 1..1 /* min..maxPathDepth */ OUTBOUND u /* startnode */ CollectionALinksCollectionB
6 CalculationNode 1 - LET #6 = (v.`ReceivedRating` == null) /* simple expression */
7 FilterNode 1 - FILTER #6
8 ReturnNode 1 - RETURN v
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
9 skiplist CollectionA false false 100.00 % [ `FilterA`, `FilterB` ] ((u.`FilterA` == "8277") && (u.`FilterB` >= 1526947200000) && (u.`FilterB` <= 1541030400000))
5 edge CollectionALinksCollectionB false false 100.00 % [ `_from` ] base OUTBOUND
Traversals on graphs:
Id Depth Vertex collections Edge collections Options Filter conditions
5 1..1 CollectionALinksCollectionB uniqueVertices: none, uniqueEdges: path
Optimization rules applied:
Id RuleName
1 use-indexes
2 remove-filter-covered-by-index
3 remove-unnecessary-calculations-2
Query 2:
Query string:
for u in CollectionA
  filter u.FilterA == #opId and u.FilterB >= #startTimeInLong and u.FilterB <= #endTimeInLong
  for v in CollectionB
    filter v._key == u._key and v.FilterC == null
    return v
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
8 CalculationNode 1 - LET #6 = CollectionB /* all collection documents */ /* v8 expression */
11 IndexNode 45088 - FOR u IN CollectionA /* skiplist index scan */
10 IndexNode 45088 - FOR v IN CollectionB /* primary index scan, scan only */
12 CalculationNode 45088 - LET #4 = (CollectionB /* all collection documents */.`FilterC` == null) /* v8 expression */
7 FilterNode 45088 - FILTER #4
9 ReturnNode 45088 - RETURN #6
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
11 skiplist CollectionA false false 100.00 % [ `FilterA`, `FilterB` ] ((u.`FilterA` == "8277") && (u.`FilterB` >= 1526947200000) && (u.`FilterB` <= 1541030400000))
10 primary CollectionB true false 100.00 % [ `_key` ] (CollectionB.`_key` == u.`_key`)
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 use-indexes
3 remove-filter-covered-by-index
4 remove-unnecessary-calculations-2
Why does Query 1 perform better than Query 2? The results are almost identical for a smaller dataset, but Query 1 performs better with larger data.
Can someone explain to me in detail how graph traversal helps here?
I'm writing a keyboard-events parser for Linux, using node.js. It's working somewhat okay, but sometimes it seems like node is skipping a few bytes. I'm using a ReadStream to get the data, handle it, process it, and eventually output it when a separator character is encountered (in my case, \n).
Here is the part of my class that handles the read data:
// This method is called through this callback:
// this.readStream = fs.createReadStream(this.path);
// this.readStream.on("data", function(a) { self.parse_data(self, a); });
EventParser.prototype.parse_data = function(self, data)
{
    /*
     * Data format:
     * {
     *   0x00 : struct timeval time { long sec (4), long usec (4) } (8 bytes)
     *   0x08 : __u16 type  (2 bytes)
     *   0x0A : __u16 code  (2 bytes)
     *   0x0C : __s32 value (4 bytes)
     * } = (16 bytes)
     */
    var dataBuffer = new Buffer(data);
    var slicedBuffer = dataBuffer.slice(0, 16);
    dataBuffer = dataBuffer.slice(16, dataBuffer.length);

    while (dataBuffer.length > 0 && slicedBuffer.length == 16)
    {
        var type = GetDataType(slicedBuffer),
            code = GetDataCode(slicedBuffer),
            value = GetDataValue(slicedBuffer);

        if (type == CST.EV.KEY)
        { // Key was pressed: KEY event type
            if (code == 42 && value == 1) { self.shift_pressed = true; }
            if (code == 42 && value == 0) { self.shift_pressed = false; }

            console.log(type + "\t" + code + "\t" + value + "\t(" + GetKey(self.shift_pressed, code) + ")");
            // GetKey uses a static array to get the actual character
            // based on whether the shift key is held or not

            if (value == 1)
                self.handle_processed_data(GetKey(self.shift_pressed, code));
            // handle_processed_data adds characters together, and outputs the string when encountering a
            // separator character (in this case, '\n')
        }

        // Take a new slice, and loop.
        slicedBuffer = dataBuffer.slice(0, 16);
        dataBuffer = dataBuffer.slice(16, dataBuffer.length);
    }
}
// My system is in little endian!
function GetDataType(dataBuffer) { return dataBuffer.readUInt16LE(8); }
function GetDataCode(dataBuffer) { return dataBuffer.readUInt16LE(10); }
function GetDataValue(dataBuffer) { return dataBuffer.readInt32LE(12); }
I'm basically filling up the data structure explained at the top using a Buffer. The interesting part is the console.log near the end, which will print everything interesting (related to the KEY event) that passes in our callback! Here is the result of such log, complete with the expected result, and the actual result:
EventParserConstructor: Listening to /dev/input/event19
/* Expected result: CODE-128 */
/* Note that value 42 is the SHIFT key */
1 42 1 ()
1 46 1 (C)
1 42 0 ()
1 46 0 (c)
1 42 1 ()
1 24 1 (O)
1 42 0 ()
1 24 0 (o)
1 42 1 ()
1 32 1 (D)
1 42 0 ()
1 32 0 (d)
1 42 1 ()
1 18 1 (E)
1 42 0 ()
1 18 0 (e)
1 12 0 (-)
1 2 0 (1)
1 3 1 (2)
1 3 0 (2)
1 9 1 (8)
1 9 0 (8)
1 28 1 (
)
[EventParser_Handler]/event_parser.handle_processed_data: CODE28
/* Actual result: CODE28 */
/* The '-' and '1' events can be seen in the logs, but only */
/* as key RELEASED (value: 0), not key PRESSED */
We can clearly see the - and 1 character events passing by, but only as key releases (value: 0), not key presses. The weirdest thing is that most of the time, the events are correctly translated. But 10% of the time, this happens.
Is ReadStream eating up some bytes, occasionally? If yes, what alternative should I be using?
Thanks in advance!
Well, it turns out that my loop was rotten.
I was assuming that the data would only come in chunks of 16 bytes... which obviously isn't always the case. So sometimes, packets of <16 bytes were left over and lost between two 'data' event callbacks.
I fixed this by adding an excessBuffer field to my class and using it to fill my initial slicedBuffer when receiving data.
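For reference, a minimal sketch of what that fix could look like; the excessBuffer handling below is a reconstruction of the description above, not my exact code:

// In the constructor: this.excessBuffer = new Buffer(0);
EventParser.prototype.parse_data = function(self, data)
{
    // Prepend whatever was left over from the previous 'data' callback.
    var dataBuffer = Buffer.concat([self.excessBuffer, data]);

    // Consume only complete 16-byte events.
    while (dataBuffer.length >= 16)
    {
        var slicedBuffer = dataBuffer.slice(0, 16);
        dataBuffer = dataBuffer.slice(16);
        // ... decode type/code/value from slicedBuffer as before ...
    }

    // Keep the incomplete tail (0 to 15 bytes) for the next callback.
    self.excessBuffer = dataBuffer;
};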
I have a collection which holds more than 15 million documents. Out of those 15 million documents, I update 20k records every hour. But the update query takes a long time to finish (around 30 minutes).
Document:
{ "inst" : "instance1", "dt": "2015-12-12T00:00:000Z", "count": 10}
I have an array which holds 20k instances to be updated.
My query looks like this:
FOR h IN hourly
  FILTER h.dt == DATE_ISO8601(1450116000000)
  FOR i IN instArr
    FILTER i.inst == h.inst
    UPDATE h WITH {"inst": i.inst, "dt": i.dt, "count": i.count} IN hourly
Is there an optimized way of doing this? I have a hash index on inst and a skiplist index on dt.
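For reference, the indexes mentioned above would be created in arangosh roughly along these lines (an illustration, not the exact commands I ran):

// Hash index for equality lookups on inst, skiplist for range/equality filters on dt.
db.hourly.ensureHashIndex("inst");
db.hourly.ensureSkiplist("dt");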
Update
I could not paste 20k instances into the query manually, so the following is the execution plan for just 2 instances:
FOR r IN hourly
  FILTER r.dt == DATE_ISO8601(1450116000000)
  FOR i IN [{"inst": "0e649fa22bcc5200d7c40f3505da153b", "dt": "2015-12-14T18:00:00.000Z"}, {}]
    FILTER i.inst == r.inst
    UPDATE r WITH {"inst": i.inst, "dt": i.dt, "max": i.max, "min": i.min, "sum": i.sum, "avg": i.avg, "samples": i.samples} IN hourly OPTIONS { ignoreErrors: true }
    RETURN NEW.inst
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
5 CalculationNode 1 - LET #6 = [ { "inst" : "0e649fa22bcc5200d7c40f3505da153b", "dt" : "2015-12-14T18:00:00.000Z" }, { } ] /* json expression */ /* const assignment */
13 IndexRangeNode 103067 - FOR r IN hourly /* skiplist index scan */
6 EnumerateListNode 206134 - FOR i IN #6 /* list iteration */
7 CalculationNode 206134 - LET #8 = i.`inst` == r.`inst` /* simple expression */ /* collections used: r : hourly */
8 FilterNode 206134 - FILTER #8
9 CalculationNode 206134 - LET #10 = { "inst" : i.`inst`, "dt" : i.`dt`, "max" : i.`max`, "min" : i.`min`, "sum" : i.`sum`, "avg" : i.`avg`, "samples" : i.`samples` } /* simple expression */
10 UpdateNode 206134 - UPDATE r WITH #10 IN hourly
11 CalculationNode 206134 - LET #12 = $NEW.`inst` /* attribute expression */
12 ReturnNode 206134 - RETURN #12
Indexes used:
Id Type Collection Unique Sparse Selectivity Est. Fields Ranges
13 skiplist hourly false false n/a `dt` [ `dt` == "2015-12-14T18:00:00.000Z" ]
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 move-filters-up
3 move-calculations-up-2
4 move-filters-up-2
5 remove-data-modification-out-variables
6 use-index-range
7 remove-filter-covered-by-index
Write query options:
Option Value
ignoreErrors true
waitForSync false
nullMeansRemove false
mergeObjects true
ignoreDocumentNotFound false
readCompleteInput true
I assume the selection part (not the update part) will be the bottleneck in this query.
The query seems problematic because for each document matching the first filter (h.dt == DATE_ISO8601(...)), there will be an iteration over the 20,000 values in the instArr array. If instArr values are unique, then only one value from it will match. Additionally, no index will be used for the inner loop, as the index selection has happened in the outer loop already.
Instead of looping over all values in instArr, it would be better to turn the accompanying == comparison into an IN comparison. That would already work if instArr were an array of instance names, but it seems to be an array of instance objects (consisting of at least the attributes inst and count). In order to use the instance names in an IN comparison, it would be better to have a dedicated array of instance names, and a translation table for the count and dt values.
The following is an example of generating these with JavaScript:
var instArr = [ ], trans = { };
for (var i = 0; i < 20000; ++i) {
  var instance = "instance" + i;
  var count = Math.floor(Math.random() * 10);
  var dt = (new Date(Date.now() - Math.floor(Math.random() * 10000))).toISOString();

  instArr.push(instance);
  trans[instance] = [ count, dt ];
}
instArr would then look like this:
[ "instance0", "instance1", "instance2", ... ]
and trans:
{
  "instance0" : [ 4, "2015-12-16T21:24:45.106Z" ],
  "instance1" : [ 0, "2015-12-16T21:24:39.881Z" ],
  "instance2" : [ 2, "2015-12-16T21:25:47.915Z" ],
  ...
}
These data can then be injected into the query using bind variables (named like the variables above):
FOR h IN hourly
  FILTER h.dt == DATE_ISO8601(1450116000000)
  FILTER h.inst IN @instArr
  RETURN @trans[h.inst]
Note that ArangoDB 2.5 does not yet support the @trans[h.inst] syntax. In that version, you will need to write:
LET trans = @trans
FOR h IN hourly
  FILTER h.dt == DATE_ISO8601(1450116000000)
  FILTER h.inst IN @instArr
  RETURN trans[h.inst]
Additionally, 2.5 has a problem with longer IN lists. IN-list performance decreases quadratically with the length of the IN list. So in this version, it will make sense to limit the length of instArr to at most 2,000 values. That may require issuing multiple queries with smaller IN lists instead of just one with a big IN list.
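If you have to stay on 2.5, a minimal arangosh sketch of that batching could look like the following (the chunk size and variable names are illustrative; instArr and trans are the arrays generated above):

// Issue one query per chunk of at most 2,000 instance names.
var batchSize = 2000;
for (var start = 0; start < instArr.length; start += batchSize) {
  var chunk = instArr.slice(start, start + batchSize);
  db._query(
    "LET trans = @trans " +
    "FOR h IN hourly " +
    "  FILTER h.dt == DATE_ISO8601(1450116000000) " +
    "  FILTER h.inst IN @instArr " +
    "  RETURN trans[h.inst]",
    { instArr: chunk, trans: trans }
  );
}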
The better alternative would be to use ArangoDB 2.6, 2.7 or 2.8, which do not have that problem, and thus do not require the workaround. Apart from that, you can get away with the slightly shorter version of the query in the newer ArangoDB versions.
Also note that in all of the above examples I used a RETURN ... instead of the UPDATE statement from the original query. This is because all my tests revealed that the selection part of the query is the major problem, at least with the data I had generated.
A final note on the original version of the UPDATE: updating each document's inst value with i.inst seems redundant, because i.inst == h.inst, so the value won't change.
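For completeness, once the selection part is fast, the update itself could reuse the same bind parameters and drop the redundant inst assignment. This is a sketch for 2.6+ (where the @trans[h.inst] syntax works), with the [count, dt] order matching the trans arrays above:

// Hypothetical arangosh sketch of the final update step.
db._query(
  "FOR h IN hourly " +
  "  FILTER h.dt == DATE_ISO8601(1450116000000) " +
  "  FILTER h.inst IN @instArr " +
  "  UPDATE h WITH { count: @trans[h.inst][0], dt: @trans[h.inst][1] } IN hourly",
  { instArr: instArr, trans: trans }
);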
I am trying to implement the CORDIC method in Rust, using this C implementation as an example; however, I am having rounding-error issues when generating the table. Here is my code and the results.
fn generate_table () {
    let pi: f32 = 3.1415926536897932384626;
    let k1: f32 = 0.6072529350088812561694; // 1/k
    let num_bits: uint = 32;
    let num_elms: uint = num_bits;
    let mul: f32 = 1073741824.000000; // 1<<(bits-2)

    println!("Cordic sin in rust");
    println!("num bits {}", num_bits);
    println!("mul {}", mul);
    println!("pi is {}", pi);
    println!("k1 is {}", k1);

    let shift: f32 = 2.0;
    for ii in range(0, num_bits) {
        let ipow: f32 = 1f32/shift.powi(ii as i32);
        let cur: f32 = ipow.atan() * mul;
        //println!("table values {:.10f}", cur);
        println!("table values 0x{}", std::f32::to_str_hex(cur));
    }
}

fn main() {
    generate_table();
}
which gives me the following table; notice the first and last values to see the biggest errors:
table values 0x3243f6c0
table values 0x1dac6700
table values 0xfadbb00
table values 0x7f56ea8
table values 0x3feab78
table values 0x1ffd55c
table values 0xfffaab
table values 0x7fff55.8
table values 0x3fffea.c
table values 0x1ffffd.6
table values 0xfffff.b
table values 0x7ffff.f8
table values 0x40000
table values 0x20000
table values 0x10000
table values 0x8000
table values 0x4000
table values 0x2000
table values 0x1000
table values 0x800
table values 0x400
table values 0x200
table values 0x100
table values 0x80
table values 0x40
table values 0x20
table values 0x10
table values 0x8
table values 0x4
table values 0x2
table values 0x1
table values 0x0.8
Why am I getting these (rounding?) errors, and how do I fix them?
The quick answer: How certain are you that you are feeding identical inputs into both computations? In particular, the C implementation you quote says:
int mul = (1<<(bits-2));
while you have hard-coded:
let mul: f32 = 1073741824.000000; // 1<<(bits-2)
Note 1: You have changed the type of mul from an int to an f32.
Note 2: In the output I get when I run your program, I see this:
mul 1073741844
Notably, this is different from the hard-coded constant you wrote above; it is off by 20.
My usual way to debug a problem like this (and what I actually did in this case, before I noticed the problem above) is to instrument both the C and Rust versions of the code with printouts of each intermediate expression, in order to identify where things start to differ and thereby narrow down which operation is introducing the "error."
In this case, it involved modifying the C code and the Rust code in parallel to print out a table of not just the i (or ii in the Rust version) and the output c, but also every intermediate result.
Here is the code for each of those, along with the output tables I got in the end. (But then it was only analyzing those tables that I realized that the two mul values differed!)
C code:
#include <stdio.h>
#include <math.h>
#define PI 3.1415926536897932384626
#define K1 0.6072529350088812561694
int main(int argc, char **argv)
{
    int i;
    int bits = 32; // number of bits
    int mul = (1<<(bits-2));
    int n = bits; // number of elements.
    int c;

    printf("Cordic sin in C\n");
    printf("num bits %d\n", bits);
    printf("mul %d\n", mul);
    printf("pi is %g\n", PI);
    printf("k1 is %g\n", K1);

    float shift = 2.0;
    printf("computing c = atan(pow(2, -i)) * mul\n");
    printf(" i \t c p a c2\n");
    for(i=0;i<n;i++)
    {
        c = (atan(pow(2, -i)) * mul);
        int neg_i = -i;
        double p = pow(2, neg_i);
        double a = atan(p);
        int c2 = a * mul;
        printf("% 8d \t 0x%08X % 12g % 12g 0x%08X\n", i, c, p, a, c2);
    }
}
Rust code:
fn generate_table () {
    let pi: f32 = 3.1415926536897932384626;
    let k1: f32 = 0.6072529350088812561694; // 1/k
    let num_bits: uint = 32;
    let num_elms: uint = num_bits;
    let mul: f32 = 1073741824.000000; // 1<<(bits-2)

    println!("Cordic sin in rust");
    println!("num bits {}", num_bits);
    println!("mul {}", mul);
    println!("1 << (bits - 2): {}", (1i << (num_bits-2)) as f32);
    println!("pi is {}", pi);
    println!("k1 is {}", k1);

    let shift: f32 = 2.0;
    println!("computing c = (1f32/shift.powi(ii as i32)).atan() * mul");
    println!(" i \t c p a c2\n");
    for ii in range(0, num_bits) {
        let ipow: f32 = 1f32/shift.powi(ii as i32);
        let cur: f32 = ipow.atan() * mul;
        let a = ipow.atan();
        let c2 = a * mul;
        //println!("table values {:.10f}", cur);
        // println!("table values 0x{}", std::f32::to_str_hex(cur));
        println!("{:8u} \t 0x{:8s} {:12f} {:12f} 0x{:8s}",
                 ii,
                 std::f32::to_str_hex(cur),
                 ipow,
                 a,
                 std::f32::to_str_hex(c2),
                 );
    }
}

fn main() {
    generate_table();
}
Tables generated:
% gcc gentable2.c && ./a.out
Cordic sin in C
num bits 32
mul 1073741824
pi is 3.14159
k1 is 0.607253
computing c = atan(pow(2, -i)) * mul
i c p a c2
0 0x3243F6A8 1 0.785398 0x3243F6A8
1 0x1DAC6705 0.5 0.463648 0x1DAC6705
2 0x0FADBAFC 0.25 0.244979 0x0FADBAFC
3 0x07F56EA6 0.125 0.124355 0x07F56EA6
4 0x03FEAB76 0.0625 0.0624188 0x03FEAB76
5 0x01FFD55B 0.03125 0.0312398 0x01FFD55B
6 0x00FFFAAA 0.015625 0.0156237 0x00FFFAAA
7 0x007FFF55 0.0078125 0.00781234 0x007FFF55
8 0x003FFFEA 0.00390625 0.00390623 0x003FFFEA
9 0x001FFFFD 0.00195312 0.00195312 0x001FFFFD
10 0x000FFFFF 0.000976562 0.000976562 0x000FFFFF
11 0x0007FFFF 0.000488281 0.000488281 0x0007FFFF
12 0x0003FFFF 0.000244141 0.000244141 0x0003FFFF
13 0x0001FFFF 0.00012207 0.00012207 0x0001FFFF
14 0x0000FFFF 6.10352e-05 6.10352e-05 0x0000FFFF
15 0x00007FFF 3.05176e-05 3.05176e-05 0x00007FFF
16 0x00003FFF 1.52588e-05 1.52588e-05 0x00003FFF
17 0x00001FFF 7.62939e-06 7.62939e-06 0x00001FFF
18 0x00000FFF 3.8147e-06 3.8147e-06 0x00000FFF
19 0x000007FF 1.90735e-06 1.90735e-06 0x000007FF
20 0x000003FF 9.53674e-07 9.53674e-07 0x000003FF
21 0x000001FF 4.76837e-07 4.76837e-07 0x000001FF
22 0x000000FF 2.38419e-07 2.38419e-07 0x000000FF
23 0x0000007F 1.19209e-07 1.19209e-07 0x0000007F
24 0x0000003F 5.96046e-08 5.96046e-08 0x0000003F
25 0x0000001F 2.98023e-08 2.98023e-08 0x0000001F
26 0x0000000F 1.49012e-08 1.49012e-08 0x0000000F
27 0x00000008 7.45058e-09 7.45058e-09 0x00000008
28 0x00000004 3.72529e-09 3.72529e-09 0x00000004
29 0x00000002 1.86265e-09 1.86265e-09 0x00000002
30 0x00000001 9.31323e-10 9.31323e-10 0x00000001
31 0x00000000 4.65661e-10 4.65661e-10 0x00000000
% rustc gentable.rs && ./gentable
gentable.rs:5:9: 5:17 warning: unused variable: `num_elms`, #[warn(unused_variables)] on by default
gentable.rs:5 let num_elms: uint = num_bits;
^~~~~~~~
Cordic sin in rust
num bits 32
mul 1073741844
1 << (bits - 2): 1073741844
pi is 3.141593
k1 is 0.607253
computing c = (1f32/shift.powi(ii as i32)).atan() * mul
i c p a c2
0 0x3243f6c0 1 0.785398 0x3243f6c0
1 0x1dac6700 0.5 0.463648 0x1dac6700
2 0xfadbb00 0.25 0.244979 0xfadbb00
3 0x7f56ea8 0.125 0.124355 0x7f56ea8
4 0x3feab78 0.0625 0.062419 0x3feab78
5 0x1ffd55c 0.03125 0.03124 0x1ffd55c
6 0xfffaab 0.015625 0.015624 0xfffaab
7 0x7fff55.8 0.007813 0.007812 0x7fff55.8
8 0x3fffea.c 0.003906 0.003906 0x3fffea.c
9 0x1ffffd.6 0.001953 0.001953 0x1ffffd.6
10 0xfffff.b 0.000977 0.000977 0xfffff.b
11 0x7ffff.f8 0.000488 0.000488 0x7ffff.f8
12 0x40000 0.000244 0.000244 0x40000
13 0x20000 0.000122 0.000122 0x20000
14 0x10000 0.000061 0.000061 0x10000
15 0x8000 0.000031 0.000031 0x8000
16 0x4000 0.000015 0.000015 0x4000
17 0x2000 0.000008 0.000008 0x2000
18 0x1000 0.000004 0.000004 0x1000
19 0x800 0.000002 0.000002 0x800
20 0x400 0.000001 0.000001 0x400
21 0x200 0 0 0x200
22 0x100 0 0 0x100
23 0x80 0 0 0x80
24 0x40 0 0 0x40
25 0x20 0 0 0x20
26 0x10 0 0 0x10
27 0x8 0 0 0x8
28 0x4 0 0 0x4
29 0x2 0 0 0x2
30 0x1 0 0 0x1
31 0x0.8 0 0 0x0.8
%