How to align multiline values in an AsciiDoc table?

I would like to dynamically generate a table with AsciiDoc, which could look like this:
--------------------------------------
|Text | Parameter | Value1 | Value2 |
--------------------------------------
|foo | param1 | val1 | val2 |
--------------------------------------
|bar | param2 | val3 | val4 |
| | param3 | value_ | val6 |
| | | multi_ | |
| | | 5 | |
| | param4 | val7 | val8 |
--------------------------------------
| baz | param5 | val9 | val10 |
--------------------------------------
That is, there might be multiple parameters for one text, and their
values might span multiple lines. I am looking for a way to align these
automatically. I have a program that gathers changing data, so I cannot
fix things manually.
What I currently do: I have frameless, gridless nested tables in the
Parameter, Value1 and Value2 columns. The problem is that they only align if no value spans multiple lines.
I also tried making Parameter, Value1 and Value2 one nested table, with a grid but no frame.
That works in terms of alignment, but it doesn't look very good because the inner grid lines do not touch the grid lines of the outer table. Adding a frame also looks dull, since it emphasizes multi-parameter entries.
What I really want is to add an extra line to the outer table (no table nesting) with no horizontal line in between whenever there is an extra parameter.
I cannot see how to do this with AsciiDoc. Is it possible at all? Any other suggestions on how to solve this?

It turns out this is rather easy with spans (see chapter 23.5):
.Multiline values aligned with spans
[cols=",,,",width="60%", options="header"]
|================
|Text | Parameter | Value1 | Value2
|foo | param1 | val1 | val2
.3+<.<|foo .3+<.<|bar | val3 | val4
| razzle bla fasel foo bar | dazzle
|bli | bla
|foo2 | param3 | val5 | val6
|================
Now all I need to do is tell my templating system (jinja2) how many rows to span, which is diligent but routine work.
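One way to sketch that computation outside the template, in plain Python (`asciidoc_rows` is a hypothetical helper, not part of AsciiDoc or jinja2): emit an AsciiDoc `.N+<.<` row-span prefix on the first cell of each group, so a Text entry spans all of its parameter rows.

```python
def asciidoc_rows(groups):
    """groups: list of (text, [(param, val1, val2), ...]) tuples."""
    lines = []
    for text, params in groups:
        span = len(params)
        for i, (param, v1, v2) in enumerate(params):
            if i == 0 and span > 1:
                # span the Text cell over all rows of this group
                lines.append(f".{span}+<.<|{text} |{param} |{v1} |{v2}")
            elif i == 0:
                lines.append(f"|{text} |{param} |{v1} |{v2}")
            else:
                # subsequent rows omit the spanned Text cell entirely
                lines.append(f"|{param} |{v1} |{v2}")
    return lines

rows = asciidoc_rows([("foo", [("param1", "val1", "val2")]),
                      ("bar", [("param2", "val3", "val4"),
                               ("param3", "val5", "val6")])])
```

The span count is simply the number of parameter rows in the group, which is exactly the value the template needs.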

If you're using Asciidoctor, there are many other options for tables, including putting each cell on its own line and using the table's metadata to specify how many columns it contains. This is the recommended way of doing tables in Asciidoctor. You can see this example and many others in the user's guide. To give an example here:
[cols="2*"]
|===
|Cell in column 1, row 1
|Cell in column 2, row 1
|Cell in column 1, row 2
|Cell in column 2, row 2
|===
Asciidoctor can be a drop-in replacement for the asciidoc command, though you will want to look at the differences between the two.

Related

Join on inequality in Power Query

I have been trying to answer this question
With the following data
+---------+---------+-----------+---------+
| Column1 | Column2 | Column3 | Column4 |
+---------+---------+-----------+---------+
| 1 | happy | 1-veggies | GHF |
| 1 | sad | 1-veggies | HGF |
| 2 | angry | 1-veggies | GHG |
| 2 | sad | 1-veggies | FGH |
| 3 | sad | 1-veggies | HGF |
| 4 | moody | 2-meat | FFF |
| 4 | sad | 2-meat | HGF |
| 5 | excited | 2-meat | HGF |
+---------+---------+-----------+---------+
OP was asking for a way of finding how many records there were which matched 'sad' and '1-veggies', and also had another record with the same value in column 1 and a code of GHF or FGH in column 4. The first two rows qualify, but the fourth row does not qualify because (if I understand correctly) it has the correct code, but in the same record as the one matching 'sad' and '1-veggies'. The count should be one.
I think the answer would have been fairly standard if this had been a SQL question - you would do a self-join with an equality on the first column and an inequality on the row number. In SQL it would look something like this:
create table Veggies
(
    num integer,
    emotion varchar(10),
    food varchar(10),
    code varchar(10),
    seq integer
);

insert into Veggies
values
    (1, 'happy',   '1-veggies', 'GHF', 1),
    (1, 'sad',     '1-veggies', 'HGF', 2),
    (2, 'angry',   '1-veggies', 'GHG', 3),
    (2, 'sad',     '1-veggies', 'FGH', 4),
    (3, 'sad',     '1-veggies', 'HGF', 5),
    (4, 'moody',   '2-meat',    'FFF', 6),
    (4, 'sad',     '2-meat',    'HGF', 7),
    (5, 'excited', '2-meat',    'HGF', 8);

with t1 (num, seq) as
(
    select num, seq
    from veggies
    where emotion = 'sad' and food = '1-veggies'
),
t2 (num, seq) as
(
    select num, seq
    from veggies
    where code = 'GHF' or code = 'FGH'
)
select *
from t1 inner join t2 on t1.num = t2.num and t1.seq <> t2.seq;
I thought it might be possible to do the same thing (join on the first column being equal but the row number unequal) in Power Query. I have worked through the steps of getting the two queries with row numbers, but I am stuck at that point: I don't see any way of expressing an inequality, and the documentation seems unhelpful. Does anyone have any inside knowledge on how to do this?
So although it looks as though you can't translate the SQL in the question directly into Power Query and replicate this in a single step
select *
from t1 inner join t2 on t1.num=t2.num and t1.seq<>t2.seq
you can split it into two steps, as suggested by @Ron Rosenfeld.
To recap, the initial steps, which hopefully were fairly straightforward, were:
Establish a connection to the data as Table 1
Add an index column
Duplicate the table and call it Table 2
Filter Table 1 by 'sad' and '1-veggies'
Filter Table 2 by 'GHF' or 'FGH'
Now join Table 2 to Table 1 using an inner join on Column 1, then exclude the rows that were already in Table 1 using a left anti join on the index column.
This leaves one row, as required.
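For comparison, the same two-step logic can be sketched in pandas (an analogy for the Power Query steps, not a replacement for them; the data and the added index column mirror the tables above):

```python
import pandas as pd

rows = [(1, "happy",   "1-veggies", "GHF"),
        (1, "sad",     "1-veggies", "HGF"),
        (2, "angry",   "1-veggies", "GHG"),
        (2, "sad",     "1-veggies", "FGH"),
        (3, "sad",     "1-veggies", "HGF"),
        (4, "moody",   "2-meat",    "FFF"),
        (4, "sad",     "2-meat",    "HGF"),
        (5, "excited", "2-meat",    "HGF")]
df = pd.DataFrame(rows, columns=["num", "emotion", "food", "code"])
df["seq"] = range(1, len(df) + 1)            # the added index column

t1 = df[(df.emotion == "sad") & (df.food == "1-veggies")]   # filter Table 1
t2 = df[df.code.isin(["GHF", "FGH"])]                       # filter Table 2

# inner join on num, then drop pairs that are the same underlying row
joined = t1.merge(t2, on="num", suffixes=("_1", "_2"))
result = joined[joined.seq_1 != joined.seq_2]
```

The inequality that Power Query cannot express in the join itself is simply applied as a filter after the equi-join, which is exactly what the left anti join step achieves.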

Efficiently update rows of a postgres table from another table in another database based on a condition in a common column

I have two pandas DataFrames:
df1 from database A with connection parameters {"host":"hostname_a","port": "5432", "dbname":"database_a", "user": "user_a", "password": "secret_a"}. The column key is the primary key.
df1:
| | key | create_date | update_date |
|---:|------:|:-------------|:--------------|
| 0 | 57247 | 1976-07-29 | 2018-01-21 |
| 1 | 57248 | | 2018-01-21 |
| 2 | 57249 | 1992-12-22 | 2016-01-31 |
| 3 | 57250 | | 2015-01-21 |
| 4 | 57251 | 1991-12-23 | 2015-01-21 |
| 5 | 57262 | | 2015-01-21 |
| 6 | 57263 | | 2014-01-21 |
df2 from database B with connection parameters {"host": "hostname_b","port": "5433", "dbname":"database_b", "user": "user_b", "password": "secret_b"}. The column id is the primary key (these values are originally the same as those in the key column of df1; it is only a renaming of df1's primary key column).
df2:
| | id | create_date | update_date | user |
|---:|------:|:-------------|:--------------|:------|
| 0 | 57247 | 1976-07-29 | 2018-01-21 | |
| 1 | 57248 | | 2018-01-21 | |
| 2 | 57249 | 1992-12-24 | 2020-10-11 | klm |
| 3 | 57250 | 2001-07-14 | 2019-21-11 | ptl |
| 4 | 57251 | 1991-12-23 | 2015-01-21 | |
| 5 | 57262 | | 2015-01-21 | |
| 6 | 57263 | | 2014-01-21 | |
Notice that row[2] and row[3] in df2 have more recent update_date values (2020-10-11 and 2019-21-11 respectively) than their counterparts in df1 (where id = key), because their create_date values have been modified (by the given users).
I would like to update the rows of df1 (in concrete terms, the create_date and update_date values) where update_date in df2 is more recent than its original value in df1 (for the same primary keys).
This is how I'm tackling this for the moment, using sqlalchemy and psycopg2 + the .to_sql() method of pandas' DataFrame:
import psycopg2
from sqlalchemy import create_engine

# creator must be a callable that returns a new DBAPI connection
engine = create_engine(
    "postgresql+psycopg2://",
    creator=lambda: psycopg2.connect(**database_parameters_dictionary),
)

df1.update(df2)  # 1) maybe there is something better to do here?

with engine.connect() as connection:
    df1.to_sql(
        name="database_table_name",
        con=connection,
        schema="public",
        if_exists="replace",  # 2) maybe there is also something better to do here?
        index=True,
    )
The problem I have is that, according to the documentation, the if_exists argument can only do three things:
if_exists{‘fail’, ‘replace’, ‘append’}, default ‘fail’
Therefore, to update these two rows, I have to:
1) use the .update() method on df1 with df2 as an argument, together with
2) replacing the whole table inside the .to_sql() call, which means drop + recreate.
As the tables are really large (more than 500'000 entries), I have the feeling that this involves a lot of unnecessary work!
How could I efficiently update only those two newly updated rows? Do I have to generate custom SQL queries that compare the dates for each row and only take the ones that have really changed? Here again, my intuition is that looping through all rows to compare the update dates will take a lot of time. What is the most efficient way to do that? (It would have been easier in pure SQL if the two tables were on the same host/database, but unfortunately that is not the case.)
Pandas can't do partial updates of a table, no. There is a longstanding open bug for supporting sub-whole-table-granularity updates in .to_sql(), but you can see from the discussion there that it's a very complex feature to support in the general case.
However, limiting it to just your situation, I think there's a reasonable approach you could take.
Instead of using df1.update(df2), put together an expression that yields only the changed records with their new values (I don't use pandas often so I don't know this offhand); then iterate over the resulting dataframe and build the UPDATE statements yourself (or with the SQLAlchemy expression layer, if you're using that). Then, use the connection to DB A to issue all the UPDATEs as one transaction. With an indexed PK, it should be as fast as this would ever be expected to be.
BTW, I don't think df1.update(df2) is exactly correct: from my reading, that would update all rows with any differing fields, not just those where update_date in df2 is newer. But it's a moot point if update_date in df2 is only ever more recent than in df1.
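A minimal pure-Python sketch of that approach, with the table and column names taken from the question and toy dicts standing in for the DataFrames (in real code you would derive these rows from df1/df2 and execute the statements through the engine in one transaction). Note that ISO-formatted date strings compare correctly as text:

```python
# Find rows of df2 that are newer than their df1 counterpart and build one
# parameterized UPDATE statement plus a parameter list for executemany().
df1_rows = {57249: {"create_date": "1992-12-22", "update_date": "2016-01-31"},
            57250: {"create_date": None,         "update_date": "2015-01-21"}}
df2_rows = {57249: {"create_date": "1992-12-24", "update_date": "2020-10-11"},
            57250: {"create_date": "2001-07-14", "update_date": "2019-11-21"}}

sql = "UPDATE database_table_name SET create_date = %s, update_date = %s WHERE key = %s"
params = [(new["create_date"], new["update_date"], key)
          for key, new in df2_rows.items()
          if new["update_date"] > df1_rows[key]["update_date"]]

# Hypothetical execution against database A, all in one transaction:
# with engine.begin() as conn:
#     conn.exec_driver_sql(sql, params)   # executemany-style
```

With an index on the primary key, each UPDATE is a point lookup, so even a few thousand changed rows should be fast compared to drop + recreate of a 500'000-row table.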

How do you match Columns based on Column headers in excel Power Query

I am quite new to Power Query, so forgive me if this is obvious. I have 3 files in a folder and have imported them all into Excel using "Get data from folder". (I have put my tables at the bottom of the question.)
When they are appended to each other, they match based on column position rather than the column name.
I was wondering if/how I could append these files using the column header rather than the column position. So, for example, all the "Test2" columns would end up in column 2.
Tables:
+-------+-------+-------+
| Test1 | Test2 | Test3 |
+-------+-------+-------+
| a | aa | aaa |
| b | bb | bbb |
| c | cc | ccc |
+-------+-------+-------+
+-------+-------+-------+
| Test1 | Test2 | Test3 |
+-------+-------+-------+
| d | dd | ddd |
| e | ee | eee |
| f | ff | fff |
+-------+-------+-------+
+-------+-------+-------+
| Test1 | Test3 | Test2 |
+-------+-------+-------+
| g | ggg | gg |
| h | hhh | hh |
| i | iii | ii |
+-------+-------+-------+
Edit, extra info for comments:
Once I clicked "Get data from folder", the files were listed. I then added a column with a custom formula (below) to expand the contents of each file:
= Excel.Workbook(File.Contents([Folder Path]&"\"&[Name]), null, true)
Note that I am trying to get it to work with 3 files, but then I will have to apply it to hundreds of files, so it would be nice if I didn't have to change each file.
Some blind advice, since you've not provided your full query. Instead of:
= Excel.Workbook(File.Contents([Folder Path]&"\"&[Name]), null, true)
try:
= Excel.Workbook([Content], true, true)
(The main change is passing true as the second parameter, useHeaders, instead of null, since that allows the first row to be used as headers. Using [Content] should be equivalent to File.Contents([Folder Path]&"\"&[Name]), so that part isn't a change, just a simplification.)
Also, ensure you're using Table.Combine to combine the tables, since that should take care of aligning the column names before appending the tables and get you the output you're expecting.
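For intuition, the align-by-name behaviour of Table.Combine can be sketched in a few lines of Python (`combine_by_header` is a hypothetical helper, with each table represented as a (headers, rows) pair):

```python
def combine_by_header(tables):
    """Reorder every table's columns to match the first table's headers
    before appending, like Table.Combine does by column name."""
    target = tables[0][0]                 # header order of the first table
    out = []
    for headers, rows in tables:
        idx = [headers.index(h) for h in target]   # map names to positions
        out.extend([row[i] for i in idx] for row in rows)
    return target, out

headers, rows = combine_by_header([
    (["Test1", "Test2", "Test3"], [("a", "aa", "aaa")]),
    (["Test1", "Test3", "Test2"], [("g", "ggg", "gg")]),  # swapped columns
])
```

The third file's swapped Test2/Test3 columns land in the right place because the lookup is by header name, not by position.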

Transposing and matching values

Is there any formula or built in tool in Excel to make such a thing?
I have table:
| A | B | C |
1 | nam1 | val1 | val2 |
2 | nam2 | val3 | val4 |
3 | nam3 | val5 | val6 |
I want it to look like this:
1 | val1 | nam1
2 | val2 | nam1
3 | val3 | nam2
4 | val4 | nam2
I want to assign names to the values, which should be in rows.
I guess getting someone to write formulae for you counts as easier! Assuming a layout as shown, I suggest two formulae, because one column repeats a column and the other lists a matrix (which I have extended with two further columns in view of your comment). Applying OFFSET as suggested by @Jeeped:
In row 1 (column G in the example) and copied down to suit:
=OFFSET($B$1,INT((ROW()-1)/4),MOD(ROW()-1,4))
In H1 and copied down to suit:
=OFFSET($A$1,INT((ROW()-1)/4),0)
The row numbers of the formulae are used to calculate the appropriate offsets for rows and columns relative to the reference cells.
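The same row/column arithmetic, sketched in Python for the question's original three-column layout (two value columns, so the divisor is 2 rather than the 4 used in the extended formulae above):

```python
names = ["nam1", "nam2", "nam3"]
values = [["val1", "val2"], ["val3", "val4"], ["val5", "val6"]]

# Output row r (0-based) pulls value (r // 2, r % 2) and name r // 2,
# mirroring =OFFSET($B$1,INT((ROW()-1)/4),MOD(ROW()-1,4)) with 4 -> 2.
pairs = [(values[r // 2][r % 2], names[r // 2]) for r in range(6)]
```

Integer division picks the source row, and the remainder picks the source column, which is exactly what INT and MOD do in the OFFSET formulae.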

Better way to refresh imported columns?

I have a table in Spotfire with a couple of columns imported from another table as a lookup. As an example, Col2 is used as the match key for importing ImportedCol:
+------+------+-------------+
| Col1 | Col2 | ImportedCol |
+------+------+-------------+
| 1 | A | Val1 |
| 2 | B | Val2 |
| 3 | A | Val1 |
| 4 | C | Val3 |
| 5 | B | Val2 |
| 6 | A | Val1 |
| 7 | D | Val4 |
+------+------+-------------+
However, the data in Col2 is subject to change. When that happens, I need ImportedCol to change with it; however, Spotfire seems to just keep the old imported data. Right now I've been deleting the imported column and re-adding it to refresh the link. Is there a way to dynamically import the data as the document loads, or with any refresh of the information links?
I have found that this happens sometimes, although I'm not exactly sure how to explain why. My workaround is to create "virtual" data tables based on your existing ones.
Consider your linked table as A and your embedded table as B. Start from a default state, that is, before importing any columns.
Add a new data table. The source for this table should be "From Current Analysis", using A. We will consider this one as C; it becomes your main data table, and C will update when any changes are made to A or B.
I found the issue.
It turns out that pivoting on data in the same table creates a circular reference, which overrides the embed/link setting on that table. My workaround was to make the pivot its own information link, then have the table join the original link and the new pivot one.
