How to get descriptive statistics of all columns in python [duplicate] - python-3.x

This question already has answers here:
How do I expand the output display to see more columns of a Pandas DataFrame?
(22 answers)
Closed 3 years ago.
I have a dataset with 200000 rows and 201 columns. I want to have descriptive statistics of all the variables.
I tried:
train.describe()
But this is only giving the output for the first and last 8 variables. Is there any method I can use to get the statistics for all of the columns?

Probably some of your columns were of a type other than numeric. Try train.apply(pd.to_numeric) and then train.describe().
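If the columns are already numeric and the problem is only that pandas truncates the printed output, a minimal sketch of widening the display (the display options and include='all' are assumptions about what you want to see, not part of the original answer):
import pandas as pd

# `train` stands in here for the 200000 x 201 DataFrame from the question
train = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Show every column of the describe() output instead of truncating with "..."
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

# include='all' also summarizes non-numeric columns (count, unique, top, freq)
print(train.describe(include='all'))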

Related

Date to Weeknum in Access/Excel/VBA [duplicate]

This question already has answers here:
MS Access query, how to use SQL to group single dates into weeks
(2 answers)
VBA Convert date to week number
(8 answers)
Closed 2 years ago.
I have an Access file that I imported into Excel as a PivotTable. I want to be able to sort the data in a slicer by week number. I have grouped the dates and set days to 7, but that only gives me ranges like 2020-01-01 - 2020-01-07.
Should I convert to week numbers already in Access? If so, how do I do that?
Please explain it all, including where to paste the code and how to implement it in Access.
Thank you.

How to calculate the number of rows of a dataframe efficiently? [duplicate]

This question already has answers here:
Count on Spark Dataframe is extremely slow
(2 answers)
Getting the count of records in a data frame quickly
(2 answers)
Closed 3 years ago.
I have a very large pyspark dataframe and I would like to calculate the number of rows, but the count() method is too slow. Is there any other, faster method?
If you don't mind getting an approximate count, you could try sampling the dataset first and then scaling by your sampling factor:
>>> df = spark.range(10)
>>> df.sample(0.5).count()
4
In this case, you would scale the count() result by 2 (i.e. 1/0.5). Obviously, there is a sampling error with this approach.
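A minimal sketch of that scaling, assuming `df` stands in for the large DataFrame (the fraction and seed are illustrative):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10_000_000)                  # stand-in for the large DataFrame

fraction = 0.1                                # count only ~10% of the rows...
sampled_count = df.sample(fraction=fraction, seed=42).count()
approx_total = int(sampled_count / fraction)  # ...then scale back up by 1/fraction
print(approx_total)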

Pandas: I want to multiply two columns with 19 million rows, but system runs out of memory (Memory Error) [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
I want to multiply two columns with 19 million rows and add it to a new column.
So for example, I have a column col_X and a column col_Y, each with 19 million records. col_X holds values of type float and col_Y values of type numpy.float64. I want to multiply them and store the result in a new column New_col. The code I am using for the multiplication is:
df['New_col']=df['col_X']*df['col_Y']
This worked well when I was working with 10 million records. But now with 19 million, I am facing the following error:
Memory Error: (lambda x: op(x, rvalues)) MemoryError)
I am thinking of multiplying these two columns in two parts (i.e. multiply the first 10 million records, then the remaining 9 million, and then concatenate the two series and add the result as a new column), but I don't know how to implement this. Is there any other solution?
I am new to Python and would really appreciate your help.
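No answer is quoted here, but a minimal sketch of the chunked approach described in the question, assuming df holds col_X and col_Y (the chunk size and synthetic data are illustrative):
import numpy as np
import pandas as pd

# Synthetic stand-in; replace with the real 19-million-row DataFrame
df = pd.DataFrame({'col_X': np.random.rand(1_000_000),
                   'col_Y': np.random.rand(1_000_000)})

chunk_size = 100_000                          # e.g. a few million for the real data
result = np.empty(len(df), dtype='float64')

# Multiply one slice of rows at a time instead of both full columns at once,
# so only one chunk-sized temporary array exists at any moment
for start in range(0, len(df), chunk_size):
    end = min(start + chunk_size, len(df))
    result[start:end] = df['col_X'].values[start:end] * df['col_Y'].values[start:end]

df['New_col'] = result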

Using median instead of mean as aggregation function in Spark [duplicate]

This question already has answers here:
How to find median and quantiles using Spark
(8 answers)
Closed 5 years ago.
Say I have a dataframe that contains cars, their brand and their price. I would like to replace the avg below with the median (or another percentile):
df.groupby('carBrand').agg(F.avg('carPrice').alias('avgPrice'))
However, it seems that there is no built-in aggregation function in Spark for computing this.
You can try the approxQuantile function (see http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions)
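A sketch of both options, assuming a recent Spark version where the SQL function percentile_approx is available (the example data mirrors the column names from the question):
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("audi", 20000.0), ("audi", 30000.0), ("bmw", 25000.0)],
    ["carBrand", "carPrice"])

# Per-group approximate median (0.5 = 50th percentile) via a SQL expression
df.groupby('carBrand').agg(
    F.expr('percentile_approx(carPrice, 0.5)').alias('medianPrice')).show()

# approxQuantile gives quantiles for the whole DataFrame (not per group);
# the last argument is the allowed relative error
overall_median = df.approxQuantile('carPrice', [0.5], 0.01)[0]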

Make columns fill the full width of QTableWidget [duplicate]

This question already has answers here:
Pyqt: How to maximize the column width in a tableview?
(2 answers)
Closed 7 years ago.
I have an instance of QTableWidget with some columns, a few rows, and some data in it.
Now I am searching for a way to make the columns fill the full width of the tablewidget. Also, I want all the columns to have the same width.
Is there a way to achieve this?
This code worked fine for me:
self.tablewidget_preview.horizontalHeader().setStretchLastSection(True)
self.tablewidget_preview.horizontalHeader().setSectionResizeMode(QHeaderView.Stretch)
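For a self-contained illustration, a minimal sketch assuming PyQt5 (the table and its size here are placeholders, not the asker's widget):
import sys
from PyQt5.QtWidgets import QApplication, QTableWidget, QHeaderView

app = QApplication(sys.argv)

table = QTableWidget(3, 4)  # 3 rows, 4 columns of placeholder cells
# Stretch every column equally so together they fill the widget's full width
table.horizontalHeader().setSectionResizeMode(QHeaderView.Stretch)

table.show()
sys.exit(app.exec_())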
