Astropy get table length - python-3.x

How can I get the length (i.e. number of rows) of an astropy Table? From the documentation, there are several ways of having the table length printed out, such as t.info(). However, I can't use this information in a script.
How do I assign the length of a table to a variable?

In Python the len() built-in function typically gives the length/size of some collection-like object. For example the length of a 1-D array is given like:
>>> a = [1, 2, 3]
>>> len(a)
3
For a table you could ask what the "size" of a table means--the number of rows? The number of columns? The total number of items in the table? But it sounds like you want the number of rows. In Python, this will almost always be given by len() on table-like objects as well (arguably anything that does otherwise is a mistake). You can consider this by analogy to how you might construct a table-like data structure with simple Python lists, by nesting them:
>>> t = [
... [1, 2, 3],
... [4, 5, 6],
... [7, 8, 9]
... ]
Here each "row" is represented by a single list nested in the outer list, so len(t) gives the number of rows. In fact this is just a convention and can be broken if need be. For example, you could also treat the above t as a list of columns for some column-oriented data.
But in Python we typically assume 2-dimensional arrays to be row-oriented unless otherwise stated. To remember this, note that the syntax for a nested list as written above looks row-oriented.
The logic extends to Numpy arrays and other more complicated data structures built on them such as Astropy's Table or Pandas DataFrames.
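As a minimal sketch of the convention described above, the row count of a nested-list table can be stored in a variable with len(). The names rows, n_rows and n_cols are illustrative only. Assuming astropy is installed, the same call on a Table (n_rows = len(table)) should give its number of rows.

```python
# A small row-oriented "table" built from nested lists (illustrative data).
rows = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

# len() on the outer list counts rows; len() on one row counts columns.
n_rows = len(rows)
n_cols = len(rows[0])

print(n_rows, n_cols)

# With astropy installed, the analogous call would be:
#     from astropy.table import Table
#     table = Table(rows=rows, names=("a", "b", "c"))
#     n_rows = len(table)
```

Because len() returns a plain integer, n_rows can be used directly in a loop bound or a comparison later in the script.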

Related

Comparing the word-counts of two files, accounting for the number of occurrences

I'm currently working on a program which is supposed to find exploits for vulnerabilities in web-applications by looking at the "Document Object Model" (DOM) of the application.
One approach for narrowing down the number of possible DB-entries follows the strategy of further filtering the entries by comparing the word-count of the DOM and the database entry.
I already have two dicts (actually DataFrames, but shown as dicts here for better presentation), each containing the top 10 words in descending order of their number of occurrences in the text.
word_count_dom = {"Peter": 10, "is": 6, "eating": 2, ...}
word_count_db = {"eating": 6, "is": 6, "Peter": 1, "breakfast": 1, ...}
Now I would like to calculate some kind of value that represents how similar the two dicts are while accounting for the number of occurrences.
Currently I'm using:
len([word for word in word_count_dom if word in word_count_db])
>>> 3
but this does not account for the number of occurrences at all.
Looking at the example, I would like the program to give more weight to the "is" match, because of its generally better "Ranking-Position to Number of Occurrences" value.
Just an idea:
Compute for each dict the relative probability of each entry (e.g. among all the top counts "Peter" occurs 20% of the time). Do this for each word occurring in either dict. Then use something like:
https://en.wikipedia.org/wiki/Bhattacharyya_distance
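The idea above can be sketched roughly as follows. This is only one possible reading of the suggestion: each dict is normalized to relative frequencies, the Bhattacharyya coefficient is summed over the union of words, and the distance is its negative logarithm. The function name bhattacharyya_distance is made up for this example, and the sample dicts are the ones from the question without the elided entries.

```python
import math


def bhattacharyya_distance(counts_a, counts_b):
    """Distance between two word-count dicts; 0 means identical distributions."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both dicts need at least one non-zero count")

    # Bhattacharyya coefficient: sum of sqrt(p(w) * q(w)) over all words.
    # A word missing from one dict has probability 0 there.
    coefficient = sum(
        math.sqrt((counts_a.get(word, 0) / total_a) * (counts_b.get(word, 0) / total_b))
        for word in set(counts_a) | set(counts_b)
    )

    # No shared words at all: the distributions do not overlap.
    if coefficient == 0:
        return math.inf
    return -math.log(coefficient)


word_count_dom = {"Peter": 10, "is": 6, "eating": 2}
word_count_db = {"eating": 6, "is": 6, "Peter": 1, "breakfast": 1}

print(bhattacharyya_distance(word_count_dom, word_count_db))
```

Smaller values mean more similar distributions, so candidate DB entries could be ranked by ascending distance.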

Calculate dot product between two dictionaries millions of times

I have one large dictionary d1 = {'string1': number1, ..., 'string5 000 000': number5000000}, which does not change, and many small dictionaries d_i = {'str1': num1, ..., 'str50': num50} (i = 2, 3, ..., a few million). I want to compute a dot product between d1 and each d_i, i.e. for every key in d_i that also exists in d1, I would like their numbers multiplied and then added to the sum.
The problem is that the first dictionary is extremely big and there are millions of small dictionaries.
How do I do that fast? Can I use some big data techniques for that?
You can put your data into a pandas DataFrame and then compute the dot product between Series in the DataFrame. It may be faster, but in your case I would measure how much time the plain Python implementation and the pandas implementation each take.
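As a baseline to measure against, the plain Python version could look like the sketch below. It iterates over the small dictionary only, so each dot product costs about 50 hash lookups regardless of how large d1 is. The function name sparse_dot and the sample data are illustrative.

```python
def sparse_dot(small, big):
    """Sum of small[k] * big[k] over the keys of `small` that also exist in `big`."""
    return sum(value * big[key] for key, value in small.items() if key in big)


# Tiny stand-ins for the real dictionaries.
d1 = {"a": 2.0, "b": 3.0, "c": 5.0}
d_2 = {"a": 1.0, "z": 4.0, "c": 2.0}

print(sparse_dot(d_2, d1))  # 1*2 + 2*5 = 12.0
```

Timing this on a sample of the real data (for example with the timeit module) gives a number to compare any pandas-based approach against.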

How to get the second largest value in a column

Recently I discovered the LARGE and SMALL worksheet functions, which one can use for determining the first, second, third, ... largest or smallest value in an array.
At least, that's what I thought:
When having a look at the array [1, 3, 5, 7, 9] (in one column or row), the LARGE(...;2) gives 7 as expected, but:
When having a look at the array [1, 1, 5, 9, 9], I expect LARGE(...;2) to give 5 but instead I get 9.
Now this makes sense: it seems that LARGE(...;2) takes the largest entry in the array (the value 9 in the second-to-last position), removes it, and returns the largest entry of the reduced array (which still contains another 9). But this is not what one might expect intuitively.
In order to get 5 from [1, 1, 5, 9, 9], I would need something like:
=LARGE_OF_UNIQUE_VALUES_OF(...;2)
I didn't find this in LARGE documentation.
Does anybody know an easy way to achieve this?
If you have the new Dynamic Array formulas:
=LARGE(UNIQUE(...),2)
If not use AGGREGATE:
=AGGREGATE(14,7,A1:A5/(MATCH(A1:A5,A1:A5,0)=ROW(A1:A5)),2)
This is a bit of a hack.
=LARGE(IF(YOUR_DATA=LARGE(YOUR_DATA,1),SMALL(YOUR_DATA,1)-1,YOUR_DATA),1)
The idea is to (a) take any value in your data that is equal to the largest element and set it to less than the smallest element, then (b) find the (new) largest element. It's OK if you want the 2nd largest, but extending to 3rd largest etc. gets progressively uglier.
Hope that helps
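For comparison outside the spreadsheet, the logic behind LARGE(UNIQUE(...),2) can be sketched in Python: remove duplicates first, then rank. The function name nth_largest_unique is made up for this illustration.

```python
def nth_largest_unique(values, n):
    """Return the n-th largest distinct value (n=1 is the maximum)."""
    distinct = sorted(set(values), reverse=True)
    if n < 1 or n > len(distinct):
        raise ValueError("n is out of range for the number of distinct values")
    return distinct[n - 1]


data = [1, 1, 5, 9, 9]

# Ranking with duplicates kept behaves like LARGE(...;2) and gives 9.
print(sorted(data, reverse=True)[1])

# Ranking after removing duplicates gives 5.
print(nth_largest_unique(data, 2))
```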

CSV append, adding values

I have a program that creates a .csv file, and there is one column in the file that is giving me trouble. I have a running count of words for the file (totalWords). Here is my code that is creating the problem column:
list.append(("No. of Words", totalWords, "numeric", "total"))
However, rather than listing the individual values when the rows in the column are created, it is adding values. It should be placing a value for the word count in each line, but it is adding the values together. For example, the first line has two words, and the first row in the column has "2" as its value, so it is correct. The second line in the file has 8 words, and the second row in the column has "10" as its value, so it is adding the two together, and so on. I assume this has something to do with appending, but I am at a loss for how to go about fixing this.
Thank you for any help!
I think you need to look at what a list is. It's a mutable object, meaning that methods such as append() and pop() change it in place, without you having to reassign it. Check out this example:
>>> l = [1, 2, 3]
>>> l
[1, 2, 3]
>>> l.append(4)  # no assignment made
>>> l
[1, 2, 3, 4]
>>> l = [1, 2, 3]  # new assignment
>>> l
[1, 2, 3]
>>> l.pop()  # no assignment made
3
>>> l
[1, 2]
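The loop that updates totalWords is not shown in the question, so the following is only a guess at the cause: if totalWords is a running total across the file, appending it on every line produces cumulative values (2, 10, ...). A minimal sketch of one fix, under that assumption, is to count each line separately and append that per-line value. The function name words_per_line and the sample lines are hypothetical.

```python
def words_per_line(lines):
    """Build one CSV row per line, each holding that line's own word count."""
    rows = []
    for line in lines:
        # Count the words of this line only; nothing is carried over.
        line_words = len(line.split())
        rows.append(("No. of Words", line_words, "numeric", "total"))
    return rows


sample = ["two words", "this second line has eight words in total"]
for row in words_per_line(sample):
    print(row)
```

If a file-wide total is also needed, it can be kept in a second variable that is summed separately and never appended per line.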

Difference between subarray, subset & subsequence

I'm a bit confused between subarray, subsequence & subset
if I have {1,2,3,4}
then
subsequence can be {1,2,4} OR {2,4} etc. So basically I can omit some elements but keep the order.
subarray would be (say, subarrays of size 3)
{1,2,3}
{2,3,4}
Then what would be the subset?
I'm a bit confused between these 3.
Consider an array:
{1,2,3,4}
Subarray: contiguous sequence in an array i.e.
{1,2},{1,2,3}
Subsequence: need not be contiguous, but maintains order, e.g.
{1,2,4}
Subset: like a subsequence, except that order does not matter and the empty set is included, e.g.
{1,3},{}
Given an array/sequence of size n, the number of possible
Subarrays = n*(n+1)/2 (non-empty subarrays)
Subsequences = (2^n) - 1 (non-empty subsequences)
Subsets = 2^n (including the empty set)
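These counts can be checked by brute force for a small input. The sketch below assumes the elements are distinct (with repeated values, some generated results would coincide); the helper names are made up for this example.

```python
from itertools import combinations


def subarrays(seq):
    """All non-empty contiguous slices of seq."""
    n = len(seq)
    return [seq[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def subsequences(seq):
    """All non-empty selections of elements, kept in original order."""
    return [list(c) for r in range(1, len(seq) + 1) for c in combinations(seq, r)]


def subsets(items):
    """All subsets, including the empty set; order is irrelevant."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


arr = [1, 2, 3, 4]
n = len(arr)
print(len(subarrays(arr)), n * (n + 1) // 2)   # 10 10
print(len(subsequences(arr)), 2 ** n - 1)      # 15 15
print(len(subsets(arr)), 2 ** n)               # 16 16
```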
In my opinion, if the given structure is an array, a so-called subarray means a contiguous subsequence.
For example, if given {1, 2, 3, 4}, subarray can be
{1, 2, 3}
{2, 3, 4}
etc.
If the given structure is a sequence, a subsequence contains elements whose indices are increasing in the original sequence.
For example, also {1, 2, 3, 4}, subsequence can be
{1, 3}
{1,4}
etc.
If the given structure is a set, a subset contains any possible combination of elements of the original set.
For example, {1, 2, 3, 4}, subset can be
{1}
{2}
{3}
{4}
{1, 2}
{1, 3}
{1, 4}
{2, 3}
etc.
Consider these two properties of a collection (array, sequence, set, etc.) of elements: Order and Continuity.
Order means you cannot switch the indices or locations of two or more elements (order is irrelevant for a collection with a single element).
Continuity means each element must keep its original neighbors, or have none.
A subarray has Order and Continuity.
A subsequence has Order but not Continuity.
A subset has neither Order nor Continuity.
A collection with Continuity but not Order does not exist (to my knowledge).
In the context of an array, a subsequence need not be contiguous but needs to maintain the order. A subarray is contiguous and inherently maintains the order.
If you have {1,2,3,4}, then {1,3,4} is a valid subsequence but it is not a subarray.
A subset needs neither order nor contiguity. So {1,3,2} is a valid subset but not a subsequence or subarray.
{1,2} is a valid subarray, subset and subsequence.
All subarrays are subsequences and all subsequences are subsets.
But sometimes subset, subarray and subsequence are used interchangeably, and the word contiguous is prefixed to make the meaning clearer.
Per my understanding, suppose we have a list, say [3,5,7,8,9]. Here:
a subset doesn't need to maintain order and need not be contiguous. For example, [9,3] is a subset
a subsequence maintains order but need not be contiguous. For example, [5,8,9] is a subsequence
a subarray maintains order and is contiguous. For example, [8,9] is a subarray
subarray: some continuous elements in the array
subset: some elements in the collection
subsequence: in most cases, some elements in the array maintaining relative order (not necessarily continuous)
A Simple and Straightforward Explanation:
Subarray: it must always be in contiguous form.
For example, let's take an array int arr=[10,20,30,40,50];
-->Now let's see its various combinations:
subarr=[10,20] //true
subarr=[10,30] //false, because it's not in contiguous form
subarr=[40,50] //true
Subsequence: it doesn't need to be in contiguous form, but must keep the same order.
For example, let's take an array int arr=[10,20,30,40,50];
-->Now let's see its various combinations:
subseq=[10,20]; //true
subseq=[10,30]; //true
subseq=[30,20]; //false, because order isn't maintained
Subset: it means any possible combination.
For example, let's take an array int arr=[10,20,30,40,50];
-->Now let's see its various combinations:
subset={10,20}; //true
subset={10,30}; //true
subset={30,20}; //true
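The true/false cases above can be reproduced with three small checks. This is a sketch with made-up helper names (is_subarray, is_subsequence, is_subset), not a standard library API.

```python
def is_subarray(candidate, arr):
    """True if candidate appears in arr as one contiguous run."""
    k = len(candidate)
    return any(arr[i:i + k] == candidate for i in range(len(arr) - k + 1))


def is_subsequence(candidate, arr):
    """True if candidate's elements appear in arr in the same relative order."""
    remaining = iter(arr)
    # `x in remaining` advances the iterator, so matches must occur left to right.
    return all(x in remaining for x in candidate)


def is_subset(candidate, arr):
    """True if every element of candidate occurs in arr; order is ignored."""
    return set(candidate) <= set(arr)


arr = [10, 20, 30, 40, 50]
print(is_subarray([10, 20], arr), is_subarray([10, 30], arr))        # True False
print(is_subsequence([10, 30], arr), is_subsequence([30, 20], arr))  # True False
print(is_subset([30, 20], arr))                                      # True
```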
The following are examples for an array:
Array : 1,2,3,4,5,6,7,8,9
Sub Array : 2,3,4,5,6 >> Contiguous elements in order
Sub Sequence : 2,4,7,8 >> Elements in order, skipping zero or more elements
Subset : 9,5,2,1 >> Elements chosen by skipping zero or more elements, not necessarily in order
Suppose an array [3,4,6,7,9]
A subarray is a continuous and ordered part of that array
Examples are [3,4,6],[7,9],[6]
A subsequence does not need to be continuous but its elements should be in order
Examples are [3,4,9],[3,7],[6]
A subset needs neither to be continuous nor to be in order
Examples are [9,4,7],[3,4],[6]
A subarray is a contiguous part of an array and maintains a relative ordering of elements. For an array/string of size n, there are n*(n+1)/2 non-empty subarrays/substrings.
A subsequence maintains a relative ordering of elements but may or may not be a contiguous part of an array. For a sequence of size n, we can have 2^n-1 non-empty sub-sequences in total.
A subset does not maintain a relative ordering of elements and need not be a contiguous part of an array. For a set of size n, we can have 2^n subsets in total (including the empty set).
Let us understand it with an example.
Consider an array:
array = [1,2,3,4]
Subarray : [1,2],[1,2,3] — is continuous and maintains relative order of elements
Subsequence: [1,2,4] — is not continuous but maintains relative order of elements
Subset: [1,3,2] — is not continuous and does not maintain the relative order of elements
Some interesting observations:
Every Subarray is a Subsequence.
Every Subsequence is a Subset.
