I'm currently working on a program which is supposed to find exploits for vulnerabilities in web-applications by looking at the "Document Object Model" (DOM) of the application.
One approach for narrowing down the number of possible DB-entries follows the strategy of further filtering the entries by comparing the word-count of the DOM and the database entry.
I already have two dicts (actually DataFrames, but shown as dicts here for readability), each containing the top 10 words in descending order of their number of occurrences in the text.
word_count_dom = {"Peter": 10, "is": 6, "eating": 2, ...}
word_count_db = {"eating": 6, "is": 6, "Peter": 1, "breakfast": 1, ...}
Now I would like to calculate some kind of value that represents how similar the two dicts are while accounting for the number of occurrences.
Currently I'm using:
len([word for word in word_count_dom if word in word_count_db])
>>> 3
but this does not account for the number of occurrences at all.
Looking at the example, I would like the program to give more weight to the "is" match, because of its generally better "ranking position to number of occurrences" value.
Just an idea:
Compute, for each dict, the relative frequency of each entry (e.g. among all the top counts "Peter" occurs 20% of the time). Do this for each word occurring in either dict. And then use something like:
https://en.wikipedia.org/wiki/Bhattacharyya_distance
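A minimal sketch of that idea, assuming the counts are plain dicts as in the question. The function names are made up for illustration, and a word missing from one dict is treated as having frequency 0. The Bhattacharyya coefficient is 1 for identical distributions and 0 for disjoint ones; the distance is its negative logarithm.

```python
import math

def bhattacharyya_coefficient(counts_a, counts_b):
    """Overlap of two word-count dicts after normalising each to relative frequencies."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    words = set(counts_a) | set(counts_b)  # every word occurring in either dict
    return sum(
        math.sqrt((counts_a.get(w, 0) / total_a) * (counts_b.get(w, 0) / total_b))
        for w in words
    )

def bhattacharyya_distance(counts_a, counts_b):
    """Negative log of the coefficient; infinite when the dicts share no words."""
    bc = min(bhattacharyya_coefficient(counts_a, counts_b), 1.0)  # guard against rounding
    return math.inf if bc == 0 else -math.log(bc)

word_count_dom = {"Peter": 10, "is": 6, "eating": 2}
word_count_db = {"eating": 6, "is": 6, "Peter": 1, "breakfast": 1}

print(bhattacharyya_coefficient(word_count_dom, word_count_db))
print(bhattacharyya_distance(word_count_dom, word_count_db))
```

With these example dicts the "is" term contributes the most to the coefficient, since it is frequent in both, which is the behaviour the question asks for.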
Recently I discovered the LARGE and SMALL worksheet functions, which one can use to determine the first, second, third, ... largest or smallest value in an array.
At least, that's what I thought:
Looking at the array [1, 3, 5, 7, 9] (in one column or row), LARGE(...;2) gives 7 as expected, but:
Looking at the array [1, 1, 5, 9, 9], I expect LARGE(...;2) to give 5 but instead I get 9.
Now this makes sense: it seems that LARGE(...;2) takes the largest entry in the array (the value 9 in the second-to-last position), removes it, and returns the largest entry of the reduced array (which still contains another 9), but this is not what one might expect intuitively.
In order to get 5 from [1, 1, 5, 9, 9], I would need something like:
=LARGE_OF_UNIQUE_VALUES_OF(...;2)
I didn't find this in the LARGE documentation.
Does anybody know an easy way to achieve this?
If you have the new Dynamic Array formulas:
=LARGE(UNIQUE(...),2)
If not, use AGGREGATE:
=AGGREGATE(14,7,A1:A5/(MATCH(A1:A5,A1:A5,0)=ROW(A1:A5)),2)
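For readers who want to check the logic outside the spreadsheet, here is a small Python sketch of what both formulas are meant to compute: drop the duplicates first, then take the k-th largest of what remains. The function name large_unique is made up for illustration and is not an Excel or Python built-in.

```python
def large_unique(values, k):
    """Return the k-th largest distinct value, like LARGE(UNIQUE(values), k)."""
    distinct = sorted(set(values), reverse=True)  # duplicates removed, largest first
    if not 1 <= k <= len(distinct):
        raise ValueError("k is out of range for the number of distinct values")
    return distinct[k - 1]

data = [1, 1, 5, 9, 9]
print(large_unique(data, 2))            # 5
print(sorted(data, reverse=True)[1])    # 9, which is what plain LARGE(...;2) returns
```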
This is a bit of a hack.
=LARGE(IF(YOUR_DATA=LARGE(YOUR_DATA,1),SMALL(YOUR_DATA,1)-1,YOUR_DATA),1)
The idea is to (a) take any value in your data that is equal to the largest element and set it to less than the smallest element, then (b) find the (new) largest element. It's OK if you want the 2nd largest, but extending to 3rd largest etc. gets progressively uglier.
Hope that helps
I have a program that creates a .csv file, and there is one column in the file that is giving me trouble. I have a running count of words for the file (totalWords). Here is my code that is creating the problem column:
list.append(("No. of Words", totalWords, "numeric", "total"))
However, rather than listing the individual values when the rows in the column are created, it is adding values. It should be placing a value for the word count in each line, but it is adding the values together. For example, the first line has two words, and the first row in the column has "2" as its value, so it is correct. The second line in the file has 8 words, and the second row in the column has "10" as its value, so it is adding the two together, and so on. I assume this has something to do with appending, but I am at a loss for how to go about fixing this.
Thank you for any help!
I think you need to look at what a list is. It's a mutable object, meaning its contents can change in place without the name being reassigned. Check out this example:
l = [1,2,3]
l
>>> [1, 2, 3]
l.append(4) # no assignment made
l
>>> [1, 2, 3, 4]
l = [1, 2, 3] # new assignment
l
>>> [1, 2, 3]
l.pop() # no assignment made
>>> 3
l
>>> [1, 2]
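Since the loop that builds the column is not shown in the question, the following is only a guess at the cause: if totalWords is incremented for every line and never reset, each appended row carries the running total instead of that line's own count. A minimal sketch with made-up sample lines and hypothetical names (rows, line_words) that keeps the two values separate:

```python
lines = ["Hello world", "This line has exactly eight words in it"]

rows = []          # avoid naming this `list`, which would shadow the built-in
total_words = 0    # running total for the whole file

for line in lines:
    line_words = len(line.split())   # word count for this line only
    total_words += line_words        # keep the running total separately
    rows.append(("No. of Words", line_words, "numeric", "total"))

print([row[1] for row in rows])  # [2, 8]
print(total_words)               # 10
```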
I'm a bit confused between subarray, subsequence & subset.
If I have {1,2,3,4}
then
a subsequence can be {1,2,4} or {2,4} etc. So basically I can omit some elements but keep the order.
A subarray would be (say, a subarray of size 3)
{1,2,3}
{2,3,4}
Then what would be the subset?
I'm a bit confused between these 3.
Consider an array:
{1,2,3,4}
Subarray: contiguous sequence in an array i.e.
{1,2},{1,2,3}
Subsequence: Need not be contiguous, but maintains order i.e.
{1,2,4}
Subset: Same as subsequence, except that order does not matter and the empty set is included i.e.
{1,3},{}
Given an array/sequence of size n, the number of possible
Subarrays = n*(n+1)/2 (non-empty subarrays)
Subsequences = (2^n) - 1 (non-empty subsequences)
Subsets = 2^n
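These counts can be checked by brute force. A small sketch, assuming distinct elements and using helper names made up for illustration; itertools.combinations keeps the input order, so it yields exactly the order-preserving selections.

```python
from itertools import combinations

def subarrays(seq):
    """All non-empty contiguous slices of seq."""
    n = len(seq)
    return [seq[i:j] for i in range(n) for j in range(i + 1, n + 1)]

def subsequences(seq):
    """All non-empty selections that keep the original order."""
    return [list(c) for r in range(1, len(seq) + 1) for c in combinations(seq, r)]

def subsets(items):
    """All subsets, including the empty set, as frozensets (order is irrelevant)."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

arr = [1, 2, 3, 4]
n = len(arr)
print(len(subarrays(arr)), n * (n + 1) // 2)   # 10 10
print(len(subsequences(arr)), 2 ** n - 1)      # 15 15
print(len(subsets(arr)), 2 ** n)               # 16 16
```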
In my opinion, if the given pattern is an array, the so-called subarray means a contiguous subsequence.
For example, given {1, 2, 3, 4}, a subarray can be
{1, 2, 3}
{2, 3, 4}
etc.
When the given pattern is a sequence, a subsequence contains elements whose subscripts are increasing in the original sequence.
For example, again with {1, 2, 3, 4}, a subsequence can be
{1, 3}
{1,4}
etc.
When the given pattern is a set, a subset contains any possible combination of elements of the original set.
For example, with {1, 2, 3, 4}, a subset can be
{1}
{2}
{3}
{4}
{1, 2}
{1, 3}
{1, 4}
{2, 3}
etc.
Consider these two properties in a collection (array, sequence, set, etc.) of elements: Order and Continuity.
Order means that you cannot switch the indices or locations of two or more elements (for a collection with a single element, order is irrelevant).
Continuity means that each element's neighbors must remain next to it or be absent (null).
A subarray has Order and Continuity.
A subsequence has Order but not Continuity.
A subset has neither Order nor Continuity.
A collection with Continuity but not Order does not exist (to my knowledge).
In the context of an array, a subsequence need not be contiguous but needs to maintain the order, whereas a subarray is contiguous and inherently maintains the order.
If you have {1,2,3,4}, then {1,3,4} is a valid subsequence but it's not a subarray.
A subset has no order and need not be contiguous, so {1,3,2} is a valid subset but not a subsequence or subarray.
{1,2} is a valid subarray, subset and subsequence.
All subarrays are subsequences and all subsequences are subsets.
But sometimes subset, subarray and subsequence are used interchangeably, and the word "contiguous" is prefixed to make it clearer.
As I understand it, suppose we have a list, say [3,5,7,8,9]. Here:
a subset doesn’t need to maintain order and has non-contiguous behavior. For example, [9,3] is a subset
a subsequence maintains order and has non-contiguous behavior. For example, [5,8,9] is a subsequence
a subarray maintains order and has contiguous behavior. For example, [8,9] is a subarray
subarray: some continuous elements in the array
subset: some elements in the collection
subsequence: in most cases, some elements in the array maintaining relative order (not necessarily contiguous)
A Simple and Straightforward Explanation:
Subarray: It always should be in contiguous form.
For example, let's take an array int arr=[10,20,30,40,50];
-->Now let's see its various combinations:
subarr=[10,20] //true
subarr=[10,30] //false, because it's not in contiguous form
subarr=[40,50] //true
Subsequence: It doesn't need to be in contiguous form, but it must keep the same order.
For example, let's take an array int arr=[10,20,30,40,50];
-->Now let's see its various combinations:
subseq=[10,20]; //true
subseq=[10,30]; //true
subseq=[30,20]; //false, because order isn't maintained
Subset: It can be any possible combination of elements.
For example, let's take an array int arr=[10,20,30,40,50];
-->Now let's see its various combinations:
subset={10,20}; //true
subset={10,30}; //true
subset={30,20}; //true
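The true/false cases above can be reproduced with three small checks. This is a sketch with function names chosen for illustration, treating the subset case as a plain set comparison (so duplicates are ignored).

```python
def is_subarray(candidate, arr):
    """True if candidate appears in arr as one contiguous run in the same order."""
    k = len(candidate)
    return any(arr[i:i + k] == candidate for i in range(len(arr) - k + 1))

def is_subsequence(candidate, arr):
    """True if candidate can be obtained by deleting elements of arr without reordering."""
    remaining = iter(arr)
    # `x in remaining` consumes the iterator up to the match, so order is enforced
    return all(x in remaining for x in candidate)

def is_subset(candidate, arr):
    """True if every element of candidate occurs in arr, in any order."""
    return set(candidate) <= set(arr)

arr = [10, 20, 30, 40, 50]
print(is_subarray([10, 20], arr), is_subarray([10, 30], arr), is_subarray([40, 50], arr))
# True False True
print(is_subsequence([10, 20], arr), is_subsequence([10, 30], arr), is_subsequence([30, 20], arr))
# True True False
print(is_subset([10, 20], arr), is_subset([10, 30], arr), is_subset([30, 20], arr))
# True True True
```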
The following are examples for an array:
Array : 1,2,3,4,5,6,7,8,9
Sub Array : 2,3,4,5,6 >> Contiguous elements in order
Sub Sequence : 2,4,7,8 >> Elements in order, skipping zero or more elements
Subset : 9,5,2,1 >> Elements, skipping zero or more elements, not necessarily in order
Suppose we have an array [3,4,6,7,9]
A subarray is a contiguous and ordered part of that array
Examples are [3,4,6],[7,9],[4]
A subsequence need not be contiguous, but its elements should be in order
Examples are [3,4,9],[3,7],[6]
A subset needs neither to be contiguous nor to be in order
Examples are [9,4,7],[3,4],[6]
A subarray is a contiguous part of an array and maintains a relative ordering of elements. For an array/string of size n, there are n*(n+1)/2 non-empty subarrays/substrings.
A subsequence maintains a relative ordering of elements but may or may not be a contiguous part of an array. For a sequence of size n, we can have 2^n-1 non-empty sub-sequences in total.
A subset does not maintain a relative ordering of elements, nor does it need to be a contiguous part of an array. For a set of size n, we can have (2^n) sub-sets in total.
Let us understand it with an example.
Consider an array:
array = [1,2,3,4]
Subarray : [1,2],[1,2,3] — is continuous and maintains relative order of elements
Subsequence: [1,2,4] — is not continuous but maintains relative order of elements
Subset: [1,3,2] — is not continuous and does not maintain the relative order of elements
Some interesting observations:
Every Subarray is a Subsequence.
Every Subsequence is a Subset.