I wonder if there is a formula in Google Docs Spreadsheet which could identify and display (for example in column D) the most frequent (key)words in a spreadsheet?
Let's say that I have a column (Column B) full of tweets (see example image) and I would like to find the top keywords in column B and display them in column D. Is there a way to do that?
Thank you!
To return the top 10 individual words in column B, with their frequency, try:
=ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ";B3:B);" ")&{"";""});"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 10 label Col1 'Word', count(Col2) 'Frequency'";0))
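For anyone who wants to sanity-check the result outside the spreadsheet, here is a minimal Python sketch of the same idea: split every cell on spaces, count each word, and keep the most frequent ones. The function name `top_words` and the sample tweets are made up for illustration; like the formula, it is case-sensitive and does not strip punctuation.

```python
from collections import Counter

def top_words(cells, limit=10):
    """Return the `limit` most frequent space-separated words across all cells."""
    counts = Counter()
    for cell in cells:
        # Split on a single space and drop empty pieces left by repeated spaces.
        counts.update(word for word in cell.split(" ") if word)
    # most_common() orders by descending count, like "order by count(Col2) desc".
    return counts.most_common(limit)

# Hypothetical sample data standing in for column B.
tweets = ["good morning world", "good night", "morning coffee is good"]
print(top_words(tweets, limit=2))  # [('good', 3), ('morning', 2)]
```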
I have a database table with a set of category names in column A; think of them as category A, B, C, D, etc. In column B I have numbers for each category. For a specific category, say category A, the related numbers are not unique and may occur with different frequencies.
Now I have another table, a summary one, and I want a formula that counts the unique numbers for each category in the database table and returns the count next to the category name.
I use Excel 2010.
For some reasons I do not want to use a pivot table or macros.
The data is as below:
A 10
A 10
A 20
A 15
B 25
B 25
B 25
B 30
B empty(blank)
The desired result should look like this:
A 3
B 2
Thanks for the help.
So an example of sumif() and countif().
But I am not sure what you really want as there are 4 categories of A, and 4 of B - unless some numbers must be ignored.
However, see:
Note, if the columns A & B only have the relevant information you can define the range as A:A, but if the range has to be limited then you need to use A1:A17.
Edit: try:
The best way of doing these is to use Frequency:
=SUM(--(FREQUENCY(IF(A$2:A$20=D2,B$2:B$20),B$2:B$20)>0))
entered as an array formula using Ctrl+Shift+Enter.
Please search for 'Count unique with a condition' for more information.
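To make the intent of the FREQUENCY formula concrete, here is a small Python sketch of "count unique with a condition": keep only the rows for one category, ignore blanks, and count the distinct numbers. The function name is hypothetical, and the rows are the sample data from the question with the blank cell represented as `None`.

```python
def count_unique_for_category(rows, category):
    """Count distinct non-blank values among rows whose category matches."""
    distinct = {
        value
        for cat, value in rows
        if cat == category and value is not None  # skip other categories and blanks
    }
    return len(distinct)

# Sample data from the question; None stands for the blank cell.
rows = [
    ("A", 10), ("A", 10), ("A", 20), ("A", 15),
    ("B", 25), ("B", 25), ("B", 25), ("B", 30), ("B", None),
]
print(count_unique_for_category(rows, "A"))  # 3
print(count_unique_for_category(rows, "B"))  # 2
```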
I have 2 Excel files. One is a workfile in which I work; the other is the output of a database. See pic 1 for my database output (simplified).
What we see here:
The purchase order number in column A
The row in the database in column B
The status of the row in the database in column C
The classification in column D, where W means a product we want to measure and P means delivery costs, administration costs, etc. (which we don't want to measure)
The number of items ordered and the number of items delivered in column E
The company name and product info in column F
Now, what I want, is something like this:
I want this table to be filled automatically based on the database output. It works for column B, but I'm stuck on columns C, D and E.
What I want from you!
I need help with columns C, D and E.
Number of rows: it needs to count only the rows with W in column D. So for item 4410027708 it has to be 2 (only 2 rows with W) and for item 4410027709 it should be 1.
Items ordered: it needs to add up all the values that are directly to the right of the W in column D. So, for 4410027708, it needs to add up 3 and 5. It must ignore all the rows with P!
Items to be delivered: You may already guess this, but it needs to add up all the values in column E that are on the same row as column C with To be delivered, but only for the W rows (not the P versions). So, for item 4410027708 this should be
I suggest this is easy if column A can be filled down first (including for the last entry). Then, assuming the database output sheet is called Sheet1, enter in:
C2: =COUNTIFS(Sheet1!A:A,A2,Sheet1!D:D,"W")
D2: =SUMIFS(Sheet1!E:E,Sheet1!A:A,A2,Sheet1!D:D,"W")
E2: =SUMIFS(Sheet1!E:E,Sheet1!A:A,A2,Sheet1!D:D,"W",Sheet1!C:C,"To be delivered")
copied down to suit.
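As a cross-check of what the three cells are meant to compute, here is a rough Python sketch that applies the same conditions to a list of rows, restricting every figure to the W rows as the question asks. The field names (`po`, `status`, `cls`, `qty`) and the sample rows are invented for illustration; only the order number 4410027708 with W quantities 3 and 5 comes from the question.

```python
def summarize(rows, po):
    """Count and sum the W-classified rows for one purchase order."""
    # Equivalent of the shared conditions: column A = po and column D = "W".
    w_rows = [r for r in rows if r["po"] == po and r["cls"] == "W"]
    return {
        "rows": len(w_rows),                       # COUNTIFS
        "ordered": sum(r["qty"] for r in w_rows),  # SUMIFS on column E
        "to_deliver": sum(                         # SUMIFS with the extra status condition
            r["qty"] for r in w_rows if r["status"] == "To be delivered"
        ),
    }

# Hypothetical database output, with column A already filled down.
rows = [
    {"po": "4410027708", "status": "To be delivered", "cls": "W", "qty": 3},
    {"po": "4410027708", "status": "Delivered", "cls": "W", "qty": 5},
    {"po": "4410027708", "status": "To be delivered", "cls": "P", "qty": 1},
    {"po": "4410027709", "status": "Delivered", "cls": "W", "qty": 2},
]
print(summarize(rows, "4410027708"))
```

The P row is ignored in all three figures, which is the behaviour the question describes.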
I'm wondering if anybody knows how to sort 3 columns of data in a cross table, by the highest value for the 1st column. For example:
column 1 = project # identifier
column 2 = project customer name
column 3 = project customer location
value = project £ value
I need to show all 3 columns, but the first line should be the highest-value project, with the rest descending below it. I understand that I could concatenate the 3 columns, but visually I am trying to keep each column separate.
Any ideas very gratefully appreciated!
Thanks.
In a cross table you can just click on the column header to sort it. The columns sort independently of each other though, so you can't sort by Column A, then B, then C like you can in Excel and maintain dependence.
Hoping someone can help me here. :)
I have two columns of data in Worksheet 1:
COLUMN A = NAME (EG. TOM)
COLUMN C = TYPE OF QUERY (FAX, TEL, EMAIL, MAIL)
I would like to have in Worksheet 2:
COLUMN A = NAME (EG TOM)
COLUMN B = A COUNT OF HOW MANY FAXES TOM HAS
COLUMN C = A COUNT OF HOW MANY TELEPHONES TOM HAS
COLUMN D = A COUNT OF HOW MANY EMAILS TOM HAS
COLUMN E = A COUNT OF HOW MANY MAILS TOM HAS
If anyone can help me that would be great.
Thanks guys
You can use a pivot table. In sheet 1, click into the data table, then click Insert > Pivot table.
Drag the Name field to the rows. Drag the query type field to the columns.
Drag the Name field again, this time to the Values area, where it will turn into a count.
Now you see a count of query types for each name in a matrix.
Use COUNTIFS instead if you really want to use a formula. A pivot table would be the best way to go, though.
For example, for column B, row 1 on Sheet 2:
=COUNTIFS(Sheet1!A:A, A1, Sheet1!C:C, "FAX")
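If it helps to see the logic outside Excel, both the pivot table and COUNTIFS boil down to counting (name, query type) pairs. Below is a minimal Python sketch of that; the records and the extra name ANN are made-up sample data, not from the question.

```python
from collections import Counter

# Hypothetical rows from Worksheet 1: (column A name, column C query type).
records = [
    ("TOM", "FAX"),
    ("TOM", "TEL"),
    ("TOM", "FAX"),
    ("ANN", "EMAIL"),
]

# One pass over the data counts every (name, type) pair, like the pivot table.
counts = Counter(records)

# Each lookup mirrors one COUNTIFS cell; missing pairs count as 0.
for name in ("TOM", "ANN"):
    row = [counts[(name, qtype)] for qtype in ("FAX", "TEL", "EMAIL", "MAIL")]
    print(name, row)
```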
I have a spreadsheet with four columns:
question_id  user_id  unique_question_ids  # of unique_user_ids
X            11       X                    ? (=2)
X            12       Y                    ? (=3)
X            12
X            12
Y            13
Y            14
Y            15
The first two columns are questions and their corresponding users and include repeats of both.
The objective is this: I want to count the number of unique users for each unique question.
I started with first finding the unique_questions which I found using the UNIQUE function. This listed what questions are unique in the unique_question_ids column (i.e. X, Y)
Now I want to count the number of unique users that each unique question has. The other problem is that I do not know where X and Y start, although they are still sorted in order (the real spreadsheet is very large).
How would I go about doing this? I am thinking I could use COUNTIFS, but it doesn't count only unique values. I was also thinking of using a function that would return the range where X or Y is located in the question_id column, and then counting the unique values in the next column (i.e. user_id). But I cannot find a function that returns the cell range of a value in a column. I am also doing this in Google Spreadsheets.
Any thoughts or ideas would be appreciated, thanks
Assuming that your data starts at cell A1, you can use this formula in cell C1:
=ARRAYFORMULA(QUERY(UNIQUE(A2:B8),"SELECT Col1, COUNT(Col2) GROUP BY Col1 LABEL Col1 'unique_question', COUNT(Col2) 'unique_users'",-1))
It's basically an SQL-like query over the unique rows returned by UNIQUE(A2:B8), counting the values in the second column for each value in the first column.
Google Spreadsheet sample
Some explanation:
After passing through UNIQUE(), the table data looks like this:
question_id user_id
X 11
X 12
Y 13
Y 14
Y 15
The query (written in the SQL-like query language that Google Spreadsheets' QUERY function uses) reads like this:
SELECT                          -- From the data,
    Col1,                       -- select column 1 (unique question_id)
    COUNT(Col2)                 -- select the count of column 2 (unique user_id)
GROUP BY
    Col1                        -- group by the first column *
LABEL
    Col1 'unique_question',     -- label the first column as 'unique_question'
    COUNT(Col2) 'unique_users'  -- label the second column as 'unique_users'
* When you apply an aggregate function such as COUNT() to a column, you have to use GROUP BY on the other selected columns to decide what should happen to them.
For instance, if you use this on the above data:
SELECT
COUNT(Col2)
You will end up with 5 (this is one row), because it is counting all the rows in the table data. If you try:
SELECT
Col1,
COUNT(Col2)
You will end up with 5 rows for Col1 and 1 row for the function result, which is not allowed. So you need GROUP BY to say that all identical values in Col1 should occupy one row, giving one row for X and one for Y; the count then adapts to this grouping by counting the X rows separately from the Y rows.
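The two steps (deduplicate the pairs, then group and count) can also be sketched in a few lines of Python, which may make the grouping easier to follow. This is only an illustration of the logic, not what QUERY does internally; the function name is hypothetical and the pairs are the sample data from the question.

```python
from collections import defaultdict

def unique_users_per_question(pairs):
    """Count distinct user_ids for each question_id."""
    distinct_pairs = set(pairs)          # step 1: like UNIQUE(A2:B8)
    counts = defaultdict(int)
    for question, _user in distinct_pairs:
        counts[question] += 1            # step 2: like COUNT(Col2) ... GROUP BY Col1
    return dict(sorted(counts.items()))  # one entry per question, in sorted order

# Sample data from the question: (question_id, user_id).
pairs = [("X", 11), ("X", 12), ("X", 12), ("X", 12),
         ("Y", 13), ("Y", 14), ("Y", 15)]
print(unique_users_per_question(pairs))  # {'X': 2, 'Y': 3}
```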