CSV append, adding values - python-3.x

I have a program that creates a .csv file, and there is one column in the file that is giving me trouble. I have a running count of words for the file (totalWords). Here is my code that is creating the problem column:
list.append(("No. of Words", totalWords, "numeric", "total"))
However, rather than listing the individual values when the rows in the column are created, it is adding values. It should be placing a value for the word count in each line, but it is adding the values together. For example, the first line has two words, and the first row in the column has "2" as its value, so it is correct. The second line in the file has 8 words, and the second row in the column has "10" as its value, so it is adding the two together, and so on. I assume this has something to do with appending, but I am at a loss for how to go about fixing this.
Thank you for any help!

I think you need to look at what a list is. It's a mutable object, meaning that methods such as append() and pop() change it in place, without any reassignment. Check out this example:
>>> l = [1, 2, 3]
>>> l
[1, 2, 3]
>>> l.append(4)  # no assignment made
>>> l
[1, 2, 3, 4]
>>> l = [1, 2, 3]  # new assignment
>>> l
[1, 2, 3]
>>> l.pop()  # no assignment made
3
>>> l
[1, 2]

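For the word-count column itself, the symptom described in the question (2, then 10) is what a running total produces. A minimal sketch, assuming the rows are built in a loop over the lines of the file; the names lines, line_words, rows_running and rows_per_line are made up for illustration and are not from the original program:

```python
# Hypothetical input: two lines with 2 and 8 words, as in the question.
lines = ["hello world", "this line has exactly eight words in it"]

rows_running = []   # what the question's code appears to produce
rows_per_line = []  # what the question says it should produce
total_words = 0     # running count, like totalWords in the question

for line in lines:
    line_words = len(line.split())  # word count for this line only
    total_words += line_words       # keeps growing across lines

    # Appending the running total gives 2, then 10.
    rows_running.append(("No. of Words", total_words, "numeric", "total"))
    # Appending the per-line count gives 2, then 8.
    rows_per_line.append(("No. of Words", line_words, "numeric", "total"))

print([row[1] for row in rows_running])   # [2, 10]
print([row[1] for row in rows_per_line])  # [2, 8]
```

If the real program only keeps totalWords, one option is to compute a separate per-line count before adding it to the total and append that value instead.
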
Related

Merge 2D list if duplicate for multiple value

[[20, 2, 2, 5], [20, 2, 2, 5], [40, 2, 2, 2, 5]]
I know a dictionary can be used in some cases, but how would I do it here?
If two inner lists are completely identical, I want to merge them into one list and add their first elements together.
So the solution will be:
[[40,2,2,5],[40,2,2,2,5]]
One way is to add each row to a dictionary as a key (converted to a tuple first, since lists are not hashable). The dictionary will remove all the duplicates.
Another way is to add all the rows, again as tuples, to a set and then convert it back to a list. The duplicates will be gone, because a set removes duplicates.
A third way is to create an empty list and append each element only if it is not already there: if element not in result: result.append(element).
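The suggestions above remove duplicates but do not add the first elements together. A minimal sketch of the dictionary idea that also does the addition, assuming every row is non-empty and that "similar" means the whole row is identical; merge_duplicate_rows is a made-up helper name:

```python
from collections import Counter


def merge_duplicate_rows(rows):
    """Merge identical rows and add up their first elements."""
    # Lists are not hashable, so each row is converted to a tuple.
    # Counter is a dict subclass and keeps first-seen order (Python 3.7+).
    counts = Counter(tuple(row) for row in rows)
    # Identical rows share the same first element, so adding it up over
    # n copies is the same as multiplying it by n.
    return [[row[0] * n, *row[1:]] for row, n in counts.items()]


rows = [[20, 2, 2, 5], [20, 2, 2, 5], [40, 2, 2, 2, 5]]
merged = merge_duplicate_rows(rows)
print(merged)  # [[40, 2, 2, 5], [40, 2, 2, 2, 5]]
```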

How to get the second largest value in a column

Recently I discovered the LARGE and SMALL worksheet functions, which can be used to determine the first, second, third, ... largest or smallest value in an array.
At least, that's what I thought:
When having a look at the array [1, 3, 5, 7, 9] (in one column or row), the LARGE(...;2) gives 7 as expected, but:
When having a look at the array [1, 1, 5, 9, 9], I expect LARGE(...;2) to give 5 but instead I get 9.
Now this makes sense: it seems that LARGE(...;2) takes the largest entry in the array (the value 9 in the second-to-last position), deletes it, and gives the largest entry of the reduced array (which still contains another 9), but this is not what one might expect intuitively.
In order to get 5 from [1, 1, 5, 9, 9], I would need something like:
=LARGE_OF_UNIQUE_VALUES_OF(...;2)
I didn't find this in LARGE documentation.
Does anybody know an easy way to achieve this?
If you have the new Dynamic Array formulas:
=LARGE(UNIQUE(...),2)
If not, use AGGREGATE:
=AGGREGATE(14,7,A1:A5/(MATCH(A1:A5,A1:A5)=ROW(A1:A5)),2)
This is a bit of a hack.
=LARGE(IF(YOUR_DATA=LARGE(YOUR_DATA,1),SMALL(YOUR_DATA,1)-1,YOUR_DATA),1)
The idea is to (a) take any value in your data that is equal to the largest element and set it to less than the smallest element, then (b) find the (new) largest element. It's OK if you want the 2nd largest, but extending to 3rd largest etc. gets progressively uglier.
Hope that helps
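For comparison, the same "unique first, then n-th largest" idea can be sketched outside the worksheet. This is a Python illustration only, not an Excel feature; nth_largest_distinct is a made-up name:

```python
def nth_largest_distinct(values, n):
    """Return the n-th largest value after removing duplicates."""
    distinct = sorted(set(values), reverse=True)
    if n < 1 or n > len(distinct):
        raise ValueError("n is out of range for the distinct values")
    return distinct[n - 1]


data = [1, 1, 5, 9, 9]

# Counting duplicates, the way LARGE(...;2) behaves in the question.
second_with_duplicates = sorted(data, reverse=True)[1]
# Ignoring duplicates, like LARGE(UNIQUE(...),2).
second_distinct = nth_largest_distinct(data, 2)

print(second_with_duplicates)  # 9
print(second_distinct)         # 5
```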

Interview question about "largest range" makes no sense

Here's the question. I'm actually dumbfounded. I don't even get the question. What are they on about?
What even is a largest range? What do they mean by largest? What's a range? They say a range is a collection of numbers that come right after each other in the set of real integers. Okay, so 1, 2, 3, 4, stuff like that, right? But then they say the numbers need not be ordered or even adjacent.... but then they're not coming right after each other!! They are contradicting their own previous statement. Now I have no idea what a range is.
Their example doesn't help either. Why is [0, 15, 5, 2, 4, 10, 7] the largest range in that vector?
What is going on?
It's not very clear in the question, but I'm pretty sure the interviewer means a "range" is a set of consecutive numbers (n, n+1).
The range [0,7] is actually [0,1,2,3,4,5,6,7] since all of those appear in the full set.
The actual order doesn't matter.
In the example you were given in the interview, which you list in your question as well, the input array is: [1, 11, 3, 0, 15, 5, 2, 4, 10, 7, 12, 6]. The reason that the "largest range" is identified as [0, 7] is because all the numbers between 0 and 7 are included in that array.
No other range in the input array is longer than 0 to 7. For instance, the input array also contains the range [10, 12], but that range has a length of 3, which is smaller than the length of the [0, 7] range, 8.
In this case, a range is understood as a continuous list of integers, and the largest range is the one that contains the most integers.
It means: find the largest continuous range of numbers.
For example, in the array [0,1,2,5,6,7,8,9,10]
there are 2 continuous lists,
[0,1,2] and [5,6,7,8,9,10]. The larger range is the second one, so the output must be [5,10],
i.e. the smallest and largest values of the largest range.
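The interpretation given in these answers can be sketched in Python as follows. This is one common set-based approach, not necessarily the solution the interviewer had in mind; largest_range is a made-up name, and with ties it returns whichever longest range it meets first:

```python
def largest_range(numbers):
    """Return [start, end] of the longest run of consecutive integers."""
    present = set(numbers)
    best = None
    for n in present:
        # Only start counting at the beginning of a run.
        if n - 1 in present:
            continue
        end = n
        while end + 1 in present:
            end += 1
        if best is None or end - n > best[1] - best[0]:
            best = [n, end]
    return best  # None for an empty input


print(largest_range([1, 11, 3, 0, 15, 5, 2, 4, 10, 7, 12, 6]))  # [0, 7]
print(largest_range([0, 1, 2, 5, 6, 7, 8, 9, 10]))              # [5, 10]
```

Each number is visited a bounded number of times, because the inner loop only runs from the start of a run, so the order of the input does not matter.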

Astropy get table length

How can I get the length (i.e. number of rows) of an astropy Table? From the documentation, there are several ways of having the table length printed out, such as t.info(). However, I can't use this information in a script.
How do I assign the length of a table to a variable?
In Python the len() built-in function typically gives the length/size of some collection-like object. For example, the length of a list (a simple 1-D sequence) is given like:
>>> a = [1, 2, 3]
>>> len(a)
3
For a table you could ask what the "size" of a table means--the number of rows? The number of columns? The total number of items in the table? But it sounds like you want the number of rows. In Python, this will almost always be given by len() on table-like objects as well (arguably anything that does otherwise is a mistake). You can consider this by analogy to how you might construct a table-like data structure with simple Python lists, by nesting them:
>>> t = [
... [1, 2, 3],
... [4, 5, 6],
... [7, 8, 9]
... ]
Here each "row" is represented by a single list nested in an outer list, so len(t) gives the number of rows. In fact this is just a convention and can be broken if need be. For example, you could also treat the above t as a list of columns for some column-oriented data.
But in Python we typically assume 2-dimensional arrays to be row-oriented unless otherwise stated. To remember this, note that the syntax for a nested list as I wrote it above looks row-oriented.
The logic extends to Numpy arrays and other more complicated data structures built on them such as Astropy's Table or Pandas DataFrames.
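A minimal sketch of the nested-list analogy, using plain Python lists rather than Astropy so it runs without extra packages; the variable names are made up for illustration:

```python
# Row-oriented table: each inner list is one row.
t = [
    [1, 2, 3],
    [4, 5, 6],
]

n_rows = len(t)     # length of the outer list = number of rows
n_cols = len(t[0])  # length of one row = number of columns

# The same data viewed as a list of columns instead.
columns = [list(col) for col in zip(*t)]

print(n_rows, n_cols)  # 2 3
print(len(columns))    # 3
```

With Astropy installed, the same built-in applies to a Table: n = len(t) assigns the number of rows to a variable that a script can use.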

Removing brackets from a DataFrame column when exporting to CSV

I have a column with values like this:
columnA
[12,4352,545]
[123123,5436,665]
[234,646,5747]
And when I write the DataFrame containing this column to a CSV, I want to remove the brackets around each array in the column. I've tried str.replace and str.strip, but the braces are never removed. I've also tried converting them all to tuples and then removing the parentheses instead, to no avail.
If your values are lists rather than strings, try:
df['colA'].astype(str).str.strip('[|]')
MVCE:
df = pd.DataFrame({'colA':[[1,2],[3,4]]})
df
Output:
colA
0 [1, 2]
1 [3, 4]
Convert list to string and strip characters.
df['colA'].astype(str).str.strip('[|]')
Output:
0 1, 2
1 3, 4
Name: colA, dtype: object
I'd recommend a different delimiter than comma. You can use whatever you want though.
ScottBoston's Setup
df = pd.DataFrame({'colA':[[1,2],[3,4]]})
applymap
# The Delimiter ▼
df.assign(colA=df.colA.map(lambda x: '|'.join(map(str, x))))
colA
0 1|2
1 3|4
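The advice about choosing a delimiter other than a comma can be illustrated with the standard csv module. This is a pandas-free sketch, so it only approximates what a DataFrame export writes; to_csv_text is a made-up helper:

```python
import csv
import io

rows = [[1, 2], [3, 4]]


def to_csv_text(cells):
    """Write a one-column CSV with header colA and return it as text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["colA"])
    for cell in cells:
        writer.writerow([cell])
    return buffer.getvalue()


# Brackets stripped, comma kept: the cell contains the CSV delimiter,
# so the writer has to wrap it in quotes.
comma_cells = [str(row).strip("[]") for row in rows]
# Joined with '|': no quoting is needed.
pipe_cells = ["|".join(map(str, row)) for row in rows]

comma_csv = to_csv_text(comma_cells)
pipe_csv = to_csv_text(pipe_cells)
print(comma_csv)
print(pipe_csv)
```

The comma version is still valid CSV, but every value ends up quoted, which is one reason to prefer a separator such as '|' inside the cell.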
