Create single list/column from multiple tables in Excel - excel-formula

I want to combine the values in two tables into a single list - what function does this?

Courtesy chris-neilsen:
=FILTERXML("<t><s>"&TEXTJOIN("</s><s>",TRUE,B2:C3)&"</s><s>"&TEXTJOIN("</s><s>",TRUE,E2:F3)&"</s></t>","//s[not(preceding::*=.)]")
Note: this addresses the original (pre-revision) question, which stipulated that only unique values should be output. To reproduce all values, including duplicates, the formula becomes:
=FILTERXML("<t><s>"&TEXTJOIN("</s><s>",TRUE,B2:C3)&"</s><s>"&TEXTJOIN("</s><s>",TRUE,E2:F3)&"</s></t>","//s")
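For anyone who wants to check the logic outside Excel, here is a minimal Python sketch of the same idea: read each table row by row, skip empty cells (as TEXTJOIN does with its second argument set to TRUE), and optionally keep only the first occurrence of each value. The function name and sample data are made up for illustration and are not part of the original answer.

```python
from itertools import chain


def combine_tables(*tables, unique=False):
    """Flatten several 2-D tables (lists of rows) into one list, skipping blanks."""
    values = [
        cell
        for cell in chain.from_iterable(chain.from_iterable(tables))
        if cell not in (None, "")
    ]
    if unique:
        # dict.fromkeys keeps the first occurrence of each value, in order
        values = list(dict.fromkeys(values))
    return values


# Hypothetical sample data standing in for B2:C3 and E2:F3
table1 = [["a", "b"], ["c", "a"]]
table2 = [["b", "d"], ["", "e"]]

all_values = combine_tables(table1, table2)
unique_values = combine_tables(table1, table2, unique=True)
print(all_values)
print(unique_values)
```

The `unique=True` branch plays the role of the `[not(preceding::*=.)]` predicate in the XPath, and leaving it off corresponds to the plain `//s` version.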

Related

In Excel, How to Exclude a list of Values in COUNTIF

This is giving me a huge number
=SUM(COUNTIF(A3:A777,{"<>*United Kingdom";"<>*France";"<>*United states";"<>*Germany";"<>*Switzerland";;"<>Estonia";"?"}))
Also, is there a more efficient way to do this, in case I want to change or add values to the list in the future?
Best to switch to an MMULT construction when dealing with multiple 'not-equal-to' conditions:
=SUM(N(MMULT(N(ISNUMBER(SEARCH(TRANSPOSE(CountryList),A3:A777))),SEQUENCE(COUNTA(CountryList),,,0))=0))
where CountryList is a vertical range which comprises the list of values to exclude.
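To make it clearer what the MMULT construction computes, here is a rough Python equivalent: for each cell, check whether any excluded term occurs in it (case-insensitively, like SEARCH), and count the cells where none does. The function name and sample values are assumptions for illustration only, and SEARCH's wildcard handling is not reproduced.

```python
def count_excluding(cells, excluded):
    """Count cells that contain none of the excluded terms (case-insensitive)."""
    needles = [term.lower() for term in excluded]
    return sum(
        1
        for cell in cells
        if not any(needle in str(cell).lower() for needle in needles)
    )


# Hypothetical stand-ins for CountryList and A3:A777
country_list = ["United Kingdom", "France", "Germany"]
cells = ["Paris, France", "Tallinn, Estonia", "united kingdom", "Spain", ""]

result = count_excluding(cells, country_list)
print(result)
```

Note that the empty cell is counted here, which matches the formula: SEARCH finds nothing in a blank cell, so its row sums to 0 and passes the =0 test.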

How can I filter all values in a CSV spreadsheet that don't match one of hundreds of values from another spreadsheet?

I'm using Google Sheets and Google Colab together and trying to clean up data I've downloaded as a CSV file. I want to filter out all rows that don't match one of 100+ possible group names, which I've grabbed from another spreadsheet and currently have stored in an array. I think there are one or two other filters I'll want to apply, but those only have four or five possible values each.
I succeeded using Pandas isin()
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html
Specifically I used something like the example here:
titanic[titanic["Pclass"].isin([2, 3])]
As I understand it, isin() returns booleans telling you whether each element is in the array. Used as above, it acts as a filter, keeping only the rows whose values match the items in the array.
https://pandas.pydata.org/docs/reference/api/pandas.Series.isin.html
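Applied to the group-name case from the question, it might look roughly like this. The column name "group" and all the values are invented for the example; in practice the list would hold the 100+ names taken from the other spreadsheet.

```python
import pandas as pd

# Hypothetical data: the column name "group" and the values are assumptions
df = pd.DataFrame({
    "group": ["alpha", "beta", "gamma", "delta"],
    "score": [1, 2, 3, 4],
})
allowed_groups = ["alpha", "gamma"]

mask = df["group"].isin(allowed_groups)  # boolean Series, one value per row
kept = df[mask]                          # rows whose group is in the list
dropped = df[~mask]                      # the complement, if you need it

print(kept)
```

Further filters with only a few possible values can be combined with the first mask using `&`, with each condition wrapped in parentheses.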

Generate a multicolumn table using docxtpl

I have a series of data (in 2-dimensional list 'CombinedTable') I need to use to populate a table in an MS Word template. The table has 7 columns so I attempted the following using docxtpl module:
context = {
    'tpl_modules1': CombinedTable[0],
    'tpl_modules2': CombinedTable[2],
    'tpl_modules3': CombinedTable[4],
    'tpl_modules4': CombinedTable[6],
    'tpl_modules5': CombinedTable[8],
    'tpl_modules6': CombinedTable[10],
    'tpl_modules7': CombinedTable[12],
}
tpl.render(context)
tpl.save(FilePath + FileName)
Not the most elegant solution, I know, but I am just trying to get this working. Unfortunately, using this code with the following template results in the tpl_modules7 data being written into all columns, rather than just the 7th.
Does anyone have advice for how to resolve this? I attempted to create a for loop through the columns as well as rows but was unsuccessful in writing anything to the doc (was saved as a blank & empty doc).
The CombinedTable variable is a list of 12 lists (one for each column in template, although only 7 contain data). Each of these 12 lists contains another list with cell data whose length is equal to the number of rows to be written to the table in that column. This means that the number of rows that are written to varies for each column.
EDIT: Looking more closely at the docs, they state that I cannot use %tr multiple times in the same row. I assume I will then have to loop through %tc and %tr (which I tried and couldn't get working). Any advice on how to implement this, especially on the Word document side? Thanks!
I was able to resolve this satisfactorily for my requirements, although my solution may not suit everyone. I simply set up 7 separate tables in the document, in place of one table with 7 columns, and adjusted margins/borders to get the dimensions I required. Each of the 7 tables used the same docxtpl syntax as in the image in my question, with the small buffer columns between them replaced by columns in the Word document.
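A possible alternative to the seven-table workaround, offered only as a sketch: since only one %tr loop is allowed per row, the column lists can be transposed into rows in Python first, padding the shorter columns with empty strings. The template would then need a single row loop, something like {%tr for row in rows %} with cells such as {{ row[0] }} and a closing {%tr endfor %}; that tag layout is my assumption about how the template would be written and is untested here. The helper name and sample data are hypothetical, and I'm assuming each column is a flat list of cell values.

```python
from itertools import zip_longest


def columns_to_rows(columns, fill=""):
    """Turn columns of unequal length into equal-width rows, padding with fill."""
    return [list(row) for row in zip_longest(*columns, fillvalue=fill)]


# Hypothetical stand-in for the populated columns taken from CombinedTable
columns = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]
rows = columns_to_rows(columns)

# A context like this could then drive a single row loop in the template
context = {"rows": rows}
print(rows)
```

Padding with empty strings keeps every row the same width, so columns with fewer entries simply leave blank cells at the bottom of the table.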

Separating values that are combined in one string

I would like to solve this either in Excel or in SPSS:
I have categorical data (each number representing a medical diagnosis) that are combined into single cells. In other words, a row (patient) has multiple diagnoses. However, I would like to know the frequencies of each diagnosis. What is the best way to go about this? (See picture for reference)
For SPSS:
First just creating some sample data to demonstrate on:
data list free/e_cerv_dis_state (a20).
begin data
"{1/2/3/6}" "{1/2/4}" "{2/4/5}" "{1/5/6}" "{4}" "{4/5/6}" "{1/2/3/4/5/6}"
end data.
Now the following code will create a separate variable for each possible diagnosis, and will put a 1 in it if the diagnosis exists in the original variable.
do repeat vr=diag1 to diag9/vl=1 to 9.
compute vr=char.index(e_cerv_dis_state, string(vl, f1) ) > 0.
end repeat.
freq diag1 to diag6.
Note this will only work for up to 9 diagnoses. If you have more than that the solution will have to be adapted to multiple digits.
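If the data can be exported, the same count can be sketched in Python by splitting each cell on "/" instead of searching for single characters, which also copes with codes of more than one digit. This is just an illustration using the sample values above; the function name is made up.

```python
from collections import Counter


def diagnosis_frequencies(cells):
    """Count how many cells (patients) mention each diagnosis code."""
    counts = Counter()
    for cell in cells:
        codes = cell.strip().strip("{}").split("/")
        # a set, so a code repeated within one cell is counted once per patient
        counts.update({code.strip() for code in codes if code.strip()})
    return counts


cells = ["{1/2/3/6}", "{1/2/4}", "{2/4/5}", "{1/5/6}", "{4}", "{4/5/6}", "{1/2/3/4/5/6}"]
freq = diagnosis_frequencies(cells)

for code in sorted(freq, key=int):
    print(code, freq[code])
```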
Assuming that the number of columns is fairly regular, I would suggest using Text to Columns and then using COUNTIF on the resulting cells to count the value you want. However, there is a more robust and reproducible solution that involves SQL. You can download the free SQL Server Express edition here: https://www.microsoft.com/en-gb/sql-server/sql-server-downloads
Then you can import your table of data, here's how to do that: How to import an Excel file into SQL Server?
Then you could use the more friendly SQL database to get the answers you want. For example, you could use a SELECT statement like this (your_table is a placeholder for whatever name the imported table has):
SELECT COUNT(e_cerv_dis_state)
FROM your_table
WHERE e_cerv_dis_state = '6'
It would also be possible to use a CASE WHEN statement to add-in the names of the diagnoses.

Combine two data ranges into one range (Google Drive Excel)

Hi there, I am looking to combine two data ranges/arrays into one in order to feed them into the Excel FREQUENCY function.
Example:
First data range - B5:F50
Second data range - J5:N50
Bins data range - I5:I16
Function definition - FREQUENCY(data_array; bins_array)
Basically I am lazy and I don't want to reshuffle my Excel script to spit out both datasets side by side so that I can reference them using something like a B5:K50 range. Is there any way I can combine both datasets into data_array using some kind of formula? Maybe ending up with something along the lines of =FREQUENCY((B5:F50,J5:N50); I5:I16)?
BTW: Either of
=FREQUENCY(B5:F50; I5:I16)
=FREQUENCY(J5:N50; I5:I16)
work just fine on their own for me.
Update
Actual formula definition FREQUENCY(data, classes)
2013 MS Excel (unrelated)
In MS Excel, the FREQUENCY function accepts a "union" as the first argument, i.e. a list of references separated by commas and enclosed in parentheses, e.g.
=FREQUENCY((B5:F50,J5:N50),I5:I16)
Note: the "bins array" can also be a union if required
In "Google sheets" I don't think the same thing is possible - there may be a clever workaround, but I'm not aware of it
The Using Arrays page has some details that worked for me:
https://support.google.com/docs/answer/6208276?hl=en
It says:
"You can join multiple ranges into one continuous range using this same punctuation, which works the same way as VMERGE. For example, to combine values from A1-A10 with the values from D1-D10, you can use the following formula to create a range in a continuous column: ={A1:A10; D1:D10}"
I have two named ranges, so I was able to use {namedRange1;namedRange2} and it gave me one continuous range.
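To sanity-check the result of the combined range, here is a small Python sketch of what FREQUENCY does once the two ranges are joined: every numeric cell goes into the first bin whose upper bound is greater than or equal to it, and one extra slot at the end catches values above the highest bin. The function name and sample numbers are made up, and the sketch assumes the bins are meant to be read in ascending order.

```python
from bisect import bisect_left
from itertools import chain


def frequency(data_ranges, bins):
    """Count values per bin across several 2-D ranges, FREQUENCY style."""
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)  # last slot: values above the highest bin
    for cell in chain.from_iterable(chain.from_iterable(data_ranges)):
        # blanks and text are skipped; only real numbers are counted
        if isinstance(cell, (int, float)) and not isinstance(cell, bool):
            counts[bisect_left(edges, cell)] += 1
    return counts


# Hypothetical stand-ins for B5:F50, J5:N50 and the bins in I5:I16
range1 = [[1, 5], [10, None]]
range2 = [[3, 12], [7, "n/a"]]
bins = [4, 8]

counts = frequency([range1, range2], bins)
print(counts)
```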
