Point H/V capped at value 32768 pre-scaling - adobe-pdf-library

The Point H/V properties from a given Path in a PDF seem capped at a value of 32768 before scaling by a matrix transform. I'm trying to read the Point information for certain PDFs where the library seems to be limiting the Point data incorrectly. When I transform the point using the transform matrix for the associated element, the matrix appears to be applied to the capped value rather than the true underlying value. A given point might have a true H or V value greater than 32768, with a scaling value of something like 0.006.
Is there a way to access a Point with a H or V value greater than 32768 before it is scaled? Or even getting the correct scaled value would be okay.
I've seen this behavior in version 15.0.4PlusP4k and other 15.0.4.x versions.

Yes, there are two interfaces for accessing the points in a path. PDEPathGetData returns a list of ASFixed values, and hence is limited to the ASFixed range (as you found). PDEPathGetDataEx (PDEPath path, ASReal *Data, int DataLen) will return the same array of points, but they will not be limited to the ASFixed range.
Also, I should point out that Datalogics Support is always available to answer these types of questions from customers, both online and by phone.

Related

Data Structure Question: Is there a link between the size of a list in a chaining implementation of hash maps and its load factor?

For example, if I have n keys and m slots in the hash map, the average size of a linked list starting from a slot would be n/m. Am I correct in thinking this? Again, I'm talking about an average. Thanks in advance!
I'm trying to learn data structures.
As you say, the average size of a single list is generally going to be the table's load factor; but this assumes that the "Simple Uniform Hashing Assumption" (SUHA) holds for your hash table (more specifically, for its hash function(s) and expected input keys). Simply put, we assume that the hash function distributes elements to buckets uniformly, as well as independently of one another.
To expand a little, and in different words:
We assume that if we choose a new item randomly (imagine sampling an item from the probability distribution that characterizes our inputs), then there is an equal chance that the item we end up with will be mapped to any of the m buckets. (A chance of 1/m.)
Furthermore, we assume that this probability is unaffected by the presence (or absence) of any other elements in any of the buckets.
This is helpful because from it we can conclude that the probability of an item landing in a given bucket is always 1/m, regardless of any other circumstances; it directly follows that the expected (average) length of a single bucket's list will be n/m (we insert n elements into the table, and each one lands in this given list with probability 1/m).
To see why this matters, we might imagine a case in which it doesn't hold: for instance, if we're facing some kind of "attack" and our inputs are engineered to all hash into the same bucket, or even just to do so with high probability. In this case SUHA no longer holds, and clearly neither does the link you've asked about between the length of a list and the load factor.
This is part of the reason it is important to choose a good hash function for your use case: without one, the assumption may not hold, which could have a harmful effect on your lookup times.
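To illustrate (a quick Python simulation, not a proof; the key count and table size here are arbitrary): with a reasonably uniform hash, individual chain lengths cluster around the load factor n/m, while a degenerate hash produces one huge chain.
import random
from collections import Counter

n, m = 100_000, 1_024   # n keys, m buckets; load factor n/m is ~97.7

# Reasonably uniform: random 64-bit keys hashed into m buckets.
good = Counter(hash(random.getrandbits(64)) % m for _ in range(n))
# Degenerate "hash": every key lands in bucket 0.
bad = Counter(0 for _ in range(n))

print(n / m)               # 97.65625, the load factor
print(max(good.values()))  # longest chain stays close to the load factor
print(max(bad.values()))   # 100000 -- a single chain holds every key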

Python: How is a ~ used to exclude data?

In the below code I know that it is returning all records that are outside of the buffer, but I'm confused as to the mechanics of how that is happening.
I see that there is a "~" (aka a bitwise NOT) being used. From some googling, my understanding of ~ is that it returns the inverse of each bit in the input it is passed, e.g. if a bit is a 0 it returns a 1. Is this correct? If not, could someone please ELI5?
Could someone please explain the actual mechanics of how the below code is returning records that are outside of the "my_union" buffer?
NOTE: hospitals and collisions are just GeoDataFrames.
coverage = gpd.GeoDataFrame(geometry=hospitals.geometry).buffer(10000)
my_union = coverage.geometry.unary_union
outside_range = collisions.loc[~collisions["geometry"].apply(lambda x: my_union.contains(x))]
~ does indeed perform a bitwise NOT in Python. But here it is used to perform a logical NOT on each element of a list (or rather, a pandas Series) of booleans.
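For example, with a small Series:
import pandas as pd

s = pd.Series([True, False, True])
print(~s)  # elementwise logical NOT: False, True, False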
Let's assume the collisions GeoDataFrame contains points, but it will work similarly for other types of geometries.
Let me further change the code a bit:
coverage = gpd.GeoDataFrame(geometry=hospitals.geometry).buffer(10000)
my_union = coverage.geometry.unary_union
within_my_union = collisions["geometry"].apply(lambda x: my_union.contains(x))
outside_range = collisions.loc[~within_my_union]
Then:
my_union is a single (Multi)Polygon.
my_union.contains(x) returns a boolean indicating whether the point x is within the my_union MultiPolygon.
collisions["geometry"] is a pandas Series containing the points.
collisions["geometry"].apply(lambda x: my_union.contains(x)) will run my_union.contains(x) on each of these points.
This will result in another pandas Series containing booleans, indicating whether each point is within my_union.
~ then negates these booleans, so that the Series now indicates whether each point is not within my_union.
collisions.loc[~within_my_union] then selects all the rows of collisions where the entry in ~within_my_union is True, i.e. all the points that don't lie within my_union.
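Putting it all together with some toy data (the coordinates below are hypothetical, just to make the example self-contained and runnable):
import geopandas as gpd
from shapely.geometry import Point

# Two hospitals 50 km apart, and three collisions at varying distances.
hospitals = gpd.GeoDataFrame(geometry=[Point(0, 0), Point(50_000, 0)])
collisions = gpd.GeoDataFrame(geometry=[Point(1_000, 0), Point(20_000, 0), Point(50_500, 0)])

coverage = gpd.GeoDataFrame(geometry=hospitals.geometry).buffer(10_000)
my_union = coverage.geometry.unary_union

within_my_union = collisions["geometry"].apply(lambda x: my_union.contains(x))
print(within_my_union.tolist())  # [True, False, True]

outside_range = collisions.loc[~within_my_union]
print(outside_range)             # only the collision 20 km from both hospitals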
I'm not sure exactly what you mean by "the actual mechanics", and it's hard to know for sure without seeing input and output, but I had a go at explaining it, in case it helps:
Any row of the collisions dataframe whose geometry is contained in my_union will be excluded from the newly created outside_range dataframe.

STEP (ISO 10303-21) Unset Attributes

I've been building a parser for STEP-formatted data (specifically the ISO 10303-21 standard), but I've run into a roadblock revolving around a single character - '$'.
A quick Google search reveals that in STEP this character denotes an 'unset' value. I interpreted this as an uninitialized value, but I don't know exactly what I should do with it.
For example, take the following definitions:
#111=AXIS2_PLACEMENT_3D('Circle Axis2P3D',#109,#110,$) ;
#109=CARTESIAN_POINT('Axis2P3D Location',(104.14,0.,0.)) ;
#110=DIRECTION('Axis2P3D Direction',(1.,-0.,0.)) ;
I cannot see how this would even work: the reference direction is uninitialized, so an x-axis cannot be derived, meaning that anything using this Axis2Placement would also have undefined data.
Normally when I reach this point, I would just define some sort of default data for the given data type (vertices (0,0,0), directions (1,0,0), etc.); however, this doesn't seem to work here, because there's a chance that my default direction would conflict with the supplied data.
I have Googled what to do in this scenario, only to come up with nothing.
I also have a PDF for a very similar STEP standard (ISO 10303-42), but it too doesn't mention any sort of default data, or provide any more insight into how this works.
So, to explicitly state my question: what do I do with uninitialized data in STEP (ISO 10303-21) data?
You need to be able to represent 'unset' as a distinct value. It doesn't mean the same thing as an uninitialized value or a default value. For example, you might represent AXIS2_PLACEMENT_3D as an object whose data members are pointers to CARTESIAN_POINT and DIRECTION objects, and the $ means that the corresponding pointer will be null in your representation.
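A minimal sketch of that idea in Python (the class and helper names here are hypothetical, not from any particular STEP library):
from dataclasses import dataclass
from typing import Optional

@dataclass
class Axis2Placement3D:
    name: str
    location: str                 # reference to a CARTESIAN_POINT, e.g. '#109'
    axis: Optional[str]           # may be unset ($) -> None
    ref_direction: Optional[str]  # may be unset ($) -> None, as in #111

def parse_attribute(token):
    # Map STEP's unset marker '$' to None instead of inventing a default.
    return None if token == "$" else token

placement = Axis2Placement3D("Circle Axis2P3D",
                             parse_attribute("#109"),
                             parse_attribute("#110"),
                             parse_attribute("$"))
print(placement.ref_direction)  # None -- distinct from any real direction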
Dealing with null values will depend on the context. It may be some kind of error if the data is really necessary. Or it may be that the data isn't necessary, such as if you don't need the axis to be oriented, and a point and direction are sufficient to represent the data.
In this case, #111 is a local coordinate system with the following 4 attributes:
a character string name;
a pointer to the origin (#109);
a pointer to the axis (#110);
a pointer to the second axis (the reference direction).
If #111 is the coordinate system of a circle (which I guess from its 'name' value), the axis is the normal of the circle's plane, while the reference direction points to the start of the circle (the position of the circle's zero t parameter). Since a circle is a closed curve, you can locate this zero-t position in an arbitrary place; it has no influence on the circle's geometric shape, so the reference direction in this case is not mandatory and is omitted. It is your choice how you handle this situation.
When the $ sign is used, the value is not set. Specifically, if an entity has a series of optional attributes and you want to specify a value for, let's say, the 3rd optional attribute without specifying values for the 1st and 2nd, you can use the $ sign for those two (for instance, in a made-up entity #200=SOME_ENTITY('name',$,$,0.5); the two $ signs skip the first two optional attributes).

Minimal and maximal magnitude in Fortran

I'm trying to rewrite the minpack Fortran 77 library in Java (for my own needs), and I came across this in the minpack.f source code:
integer mcheps(4)
integer minmag(4)
integer maxmag(4)
double precision dmach(3)
equivalence (dmach(1),mcheps(1))
equivalence (dmach(2),minmag(1))
equivalence (dmach(3),maxmag(1))
...
data dmach(1) /2.22044604926d-16/
data dmach(2) /2.22507385852d-308/
data dmach(3) /1.79769313485d+308/
dpmpar = dmach(i)
return
What are the minmag and maxmag functions, and why do dmach(2) and dmach(3) have these values?
There is an explanation in comments:
c dpmpar(1) = b**(1 - t), the machine precision,
c dpmpar(2) = b**(emin - 1), the smallest magnitude,
c dpmpar(3) = b**emax*(1 - b**(-t)), the largest magnitude.
What are the smallest and largest magnitude? There must be a way to compute these values at runtime; hard-coding machine constants in source code is bad style.
EDIT:
I suppose that static fields Double.MIN_VALUE and Double.MAX_VALUE are those values I looked for.
minmag and maxmag (and mcheps too) are not functions; they are declared to be rank-1 integer arrays with 4 elements each. Likewise, dmach is a rank-1, 3-element array of double precision values. It is very likely, but not certain, that each integer value occupies 4 bytes and each d-p value 8 bytes. Bear this in mind as the answer progresses.
So an expression such as mcheps(1) is not a function call but a reference to the 1st element of an array.
equivalence is an old FORTRAN feature, now deprecated both by language standards and by software engineering practices. A statement such as
equivalence (dmach(1),mcheps(1))
states that the first element of dmach is located, in memory, at the same address as the first element of mcheps. By implication, this also means that the 24 bytes of dmach occupy the same addresses as the 16 bytes of mcheps, plus another 8 bytes. I'll leave you to draw a picture of what is going on. Note that it is conceivable that the code originally (and perhaps still) uses 8-byte integers, so that the elements of the equivalenced arrays match 1:1.
Note that equivalence gives, essentially, more than one name, and more than one interpretation, to the same memory locations. mcheps(1) is the name of an integer stored in 4 bytes of memory which form part of the storage for dmach(1). Equivalencing used to be used to implement all sorts of 'clever' tricks back in the days when every byte was precious.
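If it helps to see the flavour of this outside Fortran, here is a rough Python analogue using struct (purely an illustration of reinterpreting the same bytes two ways; it is not how the original code works):
import struct

d = 2.22044604926e-16  # the dmach(1) value from the source above
# View the double's 8 bytes as two 4-byte integers, as EQUIVALENCE would.
lo, hi = struct.unpack("<ii", struct.pack("<d", d))
print(lo, hi)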
Then the data statements assign values to the elements of dmach. To me those values look to be just what the comment tells us they are.
EDIT: The comment indicates that those magnitudes are the smallest (normalized) and largest representable double precision numbers on the platform for which the code was last compiled. I think that in Java they are probably called doubles. I don't know Java, so I don't know what facilities it has for returning the values of the largest and smallest doubles; if you don't know this either, hit the 'net or ask another SO question -- to which you'll probably get responses along the lines of "search the net".
Most of this you should be able to ignore entirely. As you write, a better approach would be to find out those values at run-time by enquiry using intrinsic functions. Fortran 90 (and later) has such functions; I imagine Java has too, but that's your domain, not mine.
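For what it's worth, here is how the three constants can be queried at runtime in Python; in Java the analogous values are Math.ulp(1.0), Double.MIN_NORMAL (note: not Double.MIN_VALUE, which is the smallest subnormal) and Double.MAX_VALUE.
import sys

print(sys.float_info.epsilon)  # 2.220446049250313e-16,   dpmpar(1): machine precision
print(sys.float_info.min)      # 2.2250738585072014e-308, dpmpar(2): smallest normalized magnitude
print(sys.float_info.max)      # 1.7976931348623157e+308, dpmpar(3): largest magnitude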

How is integer overflow exploitable?

Does anyone have a detailed explanation of how integer overflows can be exploited? I have been reading a lot about the concept, and I understand what an integer overflow is, and I understand buffer overflows, but I don't understand how one could modify memory reliably, or in a way that modifies application flow, by making an integer larger than its defined storage...
It is definitely exploitable, but it depends on the situation, of course.
Old versions of SSH had an integer overflow which could be exploited remotely. The exploit caused the ssh daemon to create a hashtable of size zero and overwrite memory when it tried to store some values in there.
More details on the ssh integer overflow: http://www.kb.cert.org/vuls/id/945216
More details on integer overflow: http://projects.webappsec.org/w/page/13246946/Integer%20Overflows
I used APL/370 in the late 60s on an IBM 360/40. APL is a language in which essentially everything is a multidimensional array, and there are amazing operators for manipulating arrays, including reshaping from N dimensions to M dimensions, etc.
Unsurprisingly, an array of N dimensions had index bounds of 1..k, with a different positive k for each axis, and k was legally always less than 2^31 (positive values in a 32-bit signed machine word). Now, an array of N dimensions has a location assigned in memory. Attempts to access an array slot using an index too large for an axis are checked against the array upper bound by APL. And of course this applied to an array of N dimensions where N == 1.
APL didn't check whether you did something incredibly stupid with the RHO (array reshape) operator. APL only allowed a maximum of 64 dimensions. So, you could make an array of 1 to 64 dimensions, and APL would do it if the axis sizes were all less than 2^31. Or, you could try to make an array of 65 dimensions. In this case, APL goofed, and surprisingly gave back a 64-dimension array, but failed to check the axis sizes. (This is in effect where the "integer overflow" occurred.) This meant you could create an array with axis sizes of 2^31 or more... but being interpreted as signed integers, they were treated as negative numbers.
The right RHO operator incantation applied to such an array could reduce the dimensionality to 1, with an upper bound of, get this, "-1". Call this array a "wormhole" (you'll see why in a moment). Such a wormhole array has a place in memory, just like any other array. But all array accesses are checked against the upper bound... and that bound check turned out to be done by APL with an unsigned compare. So, you can access WORMHOLE[1], WORMHOLE[2], ... WORMHOLE[2^32-2] without objection. In effect, you can access the entire machine's memory.
APL also had an array assignment operation, in which you could fill an array with a value.
WORMHOLE[]<-0 thus zeroed all of memory.
I only did this once, as it erased the memory containing my APL workspace, the APL interpreter, and, obviously, the critical part of APL that enabled timesharing (in those days it wasn't protected from users)... the terminal room went from its normal state of mechanically very noisy (we had 2741 Selectric APL terminals) to dead silent in about 2 seconds.
Through the glass into the computer room I could see the operator look up, startled, at the lights on the 370 as they all went out. Lots of running around ensued.
While it was funny at the time, I kept my mouth shut.
With some care, one could obviously have tampered with the OS in arbitrary ways.
It depends on how the variable is used. If you never make any security decisions based on arithmetic involving attacker-influenced integers (where an adversary could provoke an overflow), then I can't think of how you would get in trouble (but this kind of stuff can be subtle).
Then again, I have seen plenty of code like this that doesn't validate user input (although this example is contrived):
int pricePerWidgetInCents = 3199;
int numberOfWidgetsToBuy = int.Parse(/* some user input string */);
int totalCostOfWidgetsSoldInCents = pricePerWidgetInCents * numberOfWidgetsToBuy; // KA-BOOM!
// potentially much later
int orderSubtotal = whatever + totalCostOfWidgetsSoldInCents;
Everything is hunky-dory until the day you sell 671,299 widgets for -$21,474,817.95. Boss might be upset.
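To make the wraparound concrete, here is a small Python sketch (Python ints don't overflow, so the hypothetical to_int32 helper below emulates a 32-bit two's-complement int like the ones in the snippet above):
def to_int32(x):
    # Wrap an arbitrary integer to 32-bit two's-complement.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

price_per_widget_in_cents = 3199
number_of_widgets_to_buy = 671_299
total = to_int32(price_per_widget_in_cents * number_of_widgets_to_buy)
print(total)  # -2147481795, i.e. -$21,474,817.95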
A common case would be code that guards against buffer overflow by asking for the number of inputs that will be provided, and then trying to enforce that limit. Consider a situation where I claim to be providing 2^30+10 integers. The receiving system allocates a buffer of 4*(2^30+10) bytes, which wraps around to just 40 bytes in 32-bit arithmetic (!). Since the memory allocation succeeded, I'm allowed to continue. The input buffer check won't stop me when I send my 11th input, since 11 < 2^30+10. Yet I will overflow the actually allocated buffer.
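The same arithmetic shows the size computation wrapping (again a Python sketch, assuming a 32-bit unsigned size type):
count = 2**30 + 10
size = (4 * count) & 0xFFFFFFFF  # unsigned 32-bit multiply wraps around
print(size)  # 40 -- the allocation succeeds but is far too small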
I just wanted to sum up everything I have found out about my original question.
The reason things were confusing to me was that I know how buffer overflows work and can understand how you can easily exploit them. An integer overflow is a different case: you can't exploit the integer overflow itself to inject arbitrary code or force a change in the flow of an application.
However, it is possible to overflow an integer which is used, for example, to index an array, and thereby access arbitrary parts of memory. From there, it could be possible to use that mis-indexed array to overwrite memory and alter the execution of the application to suit your malicious intent.
Hope this helps.
