Mathematica equivalent of Pandas groupby and sum - pandas-groupby

I imported a World Health Organization (WHO) csv file with Covid-19 cases per country from January 2020 into Mathematica. The file is a table with "Date Reported", "Country Code", "Country Name", "WHO Region", "New Cases", "Cumulative Cases", "New Deaths", "Cumulative Deaths" as columns (see image).
I want to group the countries by the same "WHO Region" and sum the "New Cases", "Cumulative Cases", "New Deaths" and "Cumulative Deaths" that have the same "Date Reported" (for example sum all the cumulative cases in Europe in each day). I can easily do that in Python using the Pandas library with the functions groupby() and sum(), like in the code below:
import pandas as pd
worldCases = pd.read_csv("WHO-COVID-19-global-data_02April2022.csv",usecols=[0,2,3,4,5,6,7])
regionCases = worldCases.groupby(['WHO_region','Date_reported']).sum()
How could I achieve the same goal in Mathematica?
The csv file can be downloaded from here:
https://dl.dropboxusercontent.com/s/80ytphej636239x/WHO-COVID-19-global-data_02April2022.csv?dl=0

Tabulating for each date with a basic method first
data = Import[
"https://dl.dropboxusercontent.com/s/80ytphej636239x/WHO-COVID-19-global-data_02April2022.csv"];
regions = Union[Rest[data[[All, 4]]]];
splitout = Cases[data, {_, _, _, #, __}] & /# regions;
dates = Union[Rest[data[[All, 1]]]];
eachdate[onedate_] := Module[{},
splitout = Cases[data,
{onedate, _, _, #, __}] & /# regions;
countup[onesplit_] := Module[{totals},
totals = Total[onesplit[[All, {5, 6, 7, 8}]]];
NumberForm[#, DigitBlock -> 3] & /# totals];
Labeled[
TableForm[countup /# splitout, TableAlignments -> Right,
TableHeadings -> {regions, data[[1, {5, 6, 7, 8}]]}],
onedate, {{Top, Left}}]]
(* Just the last three dates *)
eachdate /# Take[dates, -3]
-tables not shown-
Plotting over the dates
eachdate[onedate_] := Module[{},
splitout = Cases[data,
{onedate, _, _, #, __}] & /# regions;
totals = Total[#[[All, {5, 6, 7, 8}]]] & /# splitout;
MapAt[{onedate, #} &, totals, {All, All}]]
set = eachdate /# dates;
newCases = set[[All, All, 1]];
cumulativeCases = set[[All, All, 2]];
newDeaths = set[[All, All, 3]];
cumulativeDeaths = set[[All, All, 4]];
DateListPlot[Transpose[newCases],
PlotLegends -> regions, PlotRange -> All,
PlotLabel -> data[[1, 5]]]
DateListPlot[Transpose[cumulativeCases],
PlotLegends -> regions, PlotRange -> All,
PlotLabel -> data[[1, 6]]]
DateListPlot[Transpose[newDeaths],
PlotLegends -> regions, PlotRange -> All,
PlotLabel -> data[[1, 7]]]
DateListPlot[Transpose[cumulativeDeaths],
PlotLegends -> regions, PlotRange -> All,
PlotLabel -> data[[1, 8]]]
Equivalent tabulation procedure using a Dataset
headers = First[data];
dataset = Dataset[Association[
Thread[headers -> #]] & /# Rest[data]];
cols = Take[headers, -4];
datatable = dataset[
GroupBy["WHO_region"], GroupBy["Date_reported"],
Total, cols];
sampledate = Last[dates];
Labeled[TableForm[Outer[
datatable[#1][sampledate][#2] &, regions, cols] /.
x_Integer :> NumberForm[x, DigitBlock -> 3],
TableAlignments -> Right, TableHeadings -> {regions, cols}],
sampledate, {{Top, Left}}]
2022-03-30
New_cases Cumulative_cases New_deaths Cumulative_deaths
-------------------------------------------------------------------
AFRO 2,812 8,580,381 24 171,086
AMRO 105,798 150,709,372 1,785 2,700,130
EMRO 3,789 21,576,432 62 340,628
EURO 718,203 202,693,049 1,450 1,940,630
Other 0 764 0 13
SEARO 33,699 57,130,457 220 778,986
WPRO 527,512 46,071,142 826 211,262
And a sample chart
DateListPlot[Map[Function[region,
{#, datatable[region][#][cols[[1]]]} & /# dates], regions],
PlotLegends -> regions, PlotRange -> All, PlotLabel -> cols[[1]]]

Related

Individual point coloring in ListPlot, ErrorListPlot in Mathematica

I can get a colored ListLinePlot by doing something like
ListLinePlot[Range[420, 680, 20], ColorFunction -> "VisibleSpectrum", ColorFunctionScaling -> False]
However, as indicated by the help file ("ColorFunction requires at least one dataset to be Joined"), if I do the equivalent
ListPlot[Range[420, 680, 20], ColorFunction -> "VisibleSpectrum", ColorFunctionScaling -> False]
all my points are blue. Is there a nice way to get ColorFunction to work for ListPlot with Joined -> False?
That is, is there a nicer way to get something like
ListPlot[
List /# Transpose[{Range[(680 - 420)/20 + 1], Range[420, 680, 20]}],
PlotMarkers -> ({Graphics[{#, Disk[]}], 0.05} & /# ColorData["VisibleSpectrum"] /# Range[420, 680, 20])
]
?
(Also, does anyone have an explanation of why Mathematica requires Joined -> True in order to make use of ColorFunction?)
Edit: I'm also looking for a way to do a similar coloring with ErrorListPlot in the ErrorBarPlots package.
The problem is, that Joined->True draws a Line[] which can be given VertexColors for each containing point. I assume doing the same for the points when setting Joined->False leads to situations where it does not work. Nevertheless, Line[] and Point[] work pretty much the same in your case. So what is about
ListLinePlot[Range[420, 680, 20], ColorFunction -> "VisibleSpectrum",
ColorFunctionScaling -> False] /. Line[arg___] :> Point[arg]
And, by the way, if your using a ListLinePlot only, where the only Line[] directives arising are the one from your data, this should work even if you have more datasets and {x,y} coordinates
data = Transpose[Table[{{x, Sin[x]}, {x, Cos[x]}}, {x, 0, 2 Pi, 0.2}]];
ListLinePlot[data, ColorFunction -> Hue] /. Line[arg___] :> Point[arg]
You can use DiscretePlot:
data = Range[420, 680, 20];
DiscretePlot[data[[i]], {i, Length[data]},
ColorFunction -> "VisibleSpectrum", ColorFunctionScaling -> False,
Filling -> None]
If you're plotting a list of x,y points, it gets a little trickier:
data = Transpose[{Range[420, 680, 20], Range[400, 530, 10]}];
mapping = Apply[Rule, data, 2];
DiscretePlot[i/.mapping, {i, data[[;;,1]]},
ColorFunction -> "VisibleSpectrum", ColorFunctionScaling -> False,
Filling -> None]
It does seem rather odd that DiscretePlot will let you color the points differently whereas ListPlot won't. I'm sure it must have something to do with the implementation details, but I can't think of a reason why that would be the case.
I came across this problem in my work too. I assign a colour to each point in the following manner:
data = ...
ListPlot[data] /. Point[args___] :> Point[args, VertexColors -> {c1, c2, ...}]
where c1 is the colour for the first data point, and so on. The colour list may be programmatically generated, eg
ColorData["Rainbow"] /# (Range#Length#data / Length#data)
Here is the result.
The good points of this method are as follows.
It's straightforward: we have a list of pairs, then we create a corresponding list of colours.
Our original ListPlot code needs not be modified (eg, changed into ListLinePlot).

List name to String in Mathematica

Considering :
daList=Range[10]
What I need is the title to be the list name but trying :
ListPlot[daList, PlotLabel -> ToString[daList]]
does not seem to work.
EDIT
"daList" is the Title I want. SORRY for previous my previous lack of precision
EDIT
I yet can`t make any solution work but I think I isolated the problem. The list name is a plotting Function argument. I believe the simple version replicating my problem is like :
list = {1, 2, 3, 4};
naming[list_] := ToString[HoldForm[list]];
naming[list]
an below my "real" code :
sequenceCountPlot[conditionSet_] :=
ListPlot[sequenceCountALL[conditionSet],
plotOptions[
("DisplayNo looking outside filter" <> (ToString[HoldForm[conditionSet]])),
"Number of Display",
"Filter Radius in Cm",
prefCOLORS],
PlotRange -> {{0, 10}, {0, Max#(Max /# sequenceCountALL[conditionSet])}}, Joined -> True]
Where plotOptions is a function to customize some options (Titles and Colors) and feet the plot with other.
Please know that Even with Evaluate[plotOptions] the results remain the same.
Try this:
ToString[HoldForm#daList]
so eg
ListPlot[daList, PlotLabel -> ToString[HoldForm#daList]]
I believe that two the most convenient ways to do this are:
daList = Range[10];
ListPlot[daList, PlotLabel -> "daList"]
ListPlot[daList, PlotLabel -> HoldForm[daList]]
Other possibilities:
ListPlot[daList, PlotLabel -> MakeBoxes[daList]]
ListPlot[daList, PlotLabel -> SymbolName[Unevaluated#daList]]
ListPlot[daList, PlotLabel -> ToString[Unevaluated#daList]]
ListPlot[daList, PlotLabel -> ToString[HoldForm#daList]]
Another possibility:
ListPlot[daList, PlotLabel -> ToString[Unevaluated[daList]]]
may be
ListPlot[daList, PlotLabel -> StringJoin[Map[ToString, daList]]]

Converting string to illegal/unsupported expression in Mathematica.

I need to input a variable, say var, into Mathematica function Series[ ] like this: Series[A^2+B^2+C^2, var]. Series[ ] has the following syntax:
Series[f, {x, x_0, n}] generates a power series expansion for f about the point x=x_0 to order n.
Series[f, {x, x_0, n}, {y, y_0, m}, ...] successively finds series expansions with respect to x, then y, etc.
Because I am not always computing Series[ ] in one dimension (i.e., B and C are not always variables at each iteration), var must be properly formatted to fit the dimension demands. The caveat is that Mathematica likes lists, so any table degenerated will have a set of outer {}.
Suppose my previous code generates the following two sets of sets:
table[1]= {{A, 0, n}};
table[2]= {{A, 0, n}, {B, 0, m}}; .
My best idea is to use string manipulation (for i= 2):
string = ToString[table[i]]; .
str = StringReplacePart[string, {" ", " "}, {{1}, {StringLength[string], StringLength[string]}}]
The next step is to convert str to an expression like var and do Series[A^2 + B^2 + C^2, var] by doing var= ToExpression[str], but this returns the following error:
ToExpression::sntx: Invalid syntax in or before "{A, 0, n}, {B, 0, m}".
$Failed
Help convert str to expression propertly or suggest another way to handle this problem.
If I understood correctly, you have
table[2] = {{A, 0, n}, {B, 0, m}};
and are trying to obtain from that
Series[f[A,B],{A,0,n},{B,0,m}]
This may be done using Sequence, like so (I will use series instead of Series to keep it unevaluated so you can see what is happening):
series[f[A, B], Sequence ## table[2]]
(*
-> series[f[A,B],{A,0,n},{B,0,m}]
*)
So for instance
table[3] = {{A, 0, 2}, {B, 0, 2}};
Series[f[A, B], Sequence ## table[3]]
gives the right series expansion.
You can use First or Last or more generally, Part to get the List you want. For e.g.,
var = {{x, 0, 3}, {x, 0, 5}};
Series[1/(1 + x), var[[1]]]
Out[1]= 1 - x + x^2 - x^3 + O[x]^4
Series[1/(1 + x), var[[2]]]
Out[2]= 1 - x + x^2 - x^3 + x^4 - x^5 + O[x]^6
EDIT:
For multiple variables, you can use a SlotSequence (##) along with Apply (##) like so:
Series[Sin[u + w], ##] & ## {{u, 0, 3}, {w, 0, 3}}

Arrange & Display Parts of Grids in Mathematica

In order to find quickly the location of my data, I display a table with my variables names as well as informations about each.
Since I have a lot of columns (variable) I copy and paste blocks of it in a cell to have them all on 1 screen.
I would like to code this, such that I would input several range of Rows to be extracted and efficiently displayed some how on a Grid that would fit an area of the screen ? I have failed yet to display 2 grid nicely together.
In case I did not express my problem properly above, here is a simple example:
How can I move the blue part to the side of the pink one if the output of laList is what we have to deal with ?
co1 = Range[6];
co2 = Range[11, 16];
co3 = {"A", "B", "C", "D", "E", "F"};
laList = Join[{co1}, {co2}, {co3}] // Transpose;
laListGraph = Grid[laList,
Dividers -> All,
Alignment -> {Left, Center},
ItemSize -> Automatic,
ItemStyle -> Directive[FontSize -> 14, Black, Italic, Bold],
Spacings -> {2, 1},
Background -> {None, None, {
{{1, 3}, {1, 3}} -> LightRed,
{{4, 6}, {1, 3}} -> LightBlue
} } ]
EDIT:
On second thoughts, what I had earlier is not what you wanted... you want it displayed as columns, but with the second half of the rows split and displayed beside the first. The following code should do that. Let me know if this is what you had in mind...
(Grid[#1, Dividers -> All, Alignment -> {Left, Center},
ItemSize -> Automatic,
ItemStyle -> Directive[FontSize -> 14, Black, Italic, Bold],
Spacings -> {2, 1},
Background -> {None,
None, {{1, 3}, {1, 3}} -> #2}] &) ### {{laList[[;; 3, All]],
LightRed}, {laList[[4 ;;, All]], LightBlue}} // Row
As I understood your question, the coloring was there just to show us which portion you want to "move up". So:
showG[l_List] :=
Grid[Join[
l[[ ;; IntegerPart[Length#l/2]]],
l[[IntegerPart[Length#l/2] + 1 ;;]]
, 2], Frame -> All];
showG[laList]
Edit
Or more to #Mr.'s taste:
showG[l_List] :=
Grid[Join[l[[ ;; #]], l[[# + 1 ;;]], 2], Frame -> All] &# Floor[Length#l/2];
Here is how I would do it. Starting with your laList as defined in the question:
laList2 = ArrayFlatten # {Partition[laList, 3]};
Grid[laList2,
Dividers -> All,
Alignment -> {Left, Center},
ItemSize -> Automatic,
ItemStyle -> Directive[FontSize -> 14, Black, Italic, Bold],
Spacings -> {2, 1},
Background -> {None, None, {
{{1, 3}, {1, 3}} -> LightRed,
{{1, 3}, {4, 6}} -> LightBlue
} }
]
Please notice:
the value 3 within Partition will need to be adjusted according to your list.
the region specification for the LightBlue area was reversed.
A variation of yoda's code I like is:
subgrid= Grid[#1,
Dividers -> All,
Alignment -> {Left, Center},
ItemSize -> Automatic,
ItemStyle -> Directive[FontSize -> 14, Black, Italic, Bold],
Spacings -> {2, 1},
Background-> #2] &;
MapThread[subgrid, {Partition[laList, 3], {LightRed, LightBlue}}] //Row
Also, with this method you can partition a list that does not divide evenly:
MapThread[subgrid, {
Partition[laList, 4, 4, 1, {}],
{LightRed, LightBlue}
}] //Row

Graph[] cuts off vertex labels in Mathematica

Graph[] has a tendency to cut off vertex labels in Mathematica. I am looking for a robust workaround.
Example:
Graph[{1 -> 2, 2 -> 3, 3 -> 1}, VertexLabels -> "Name"]
My present workaround:
SetOptions[Graph, ImagePadding -> 12]
This is not robust because the value of ImagePadding needs to be manually adjusted depending on the label size.
Apparently using FullGraphics on the Graph object will fix the clipping for the purpose of display, at the expense of interactivity.
Per the comment below, Show[] works as well, and avoids modifying the graphics.
Here are two possible workarounds.
Enlarge the vertex size and place the labels within the vertex. Of course, this also depends on the length of the labels, but for shortish labels it works well, whereas your example above clips off any label of more than one character for vertex 1.
ex:
Table[Graph[{1 -> 2, 2 -> 3, 3 -> 1}, VertexSize -> 0.3,
VertexLabels -> Table[i ->
Placed["vertex" <> ToString[i], p], {i, 3}],
VertexShapeFunction -> "Square", PlotLabel -> p],
{p, {Left, Top, Right, Bottom, Center}}]
Use tooltips to store the labels instead of displaying them on the graphic. [Edit: Center probably looks the best, and then you can wrap labels by putting \n in your string if you need to, but again, depends on the label length.]
ex:
Graph[{1 -> 2, 2 -> 3, 3 -> 1}, VertexLabels -> Placed["Name", Tooltip]]
While this stops you from being able to see all the labels at the same time, you never have any clipping.

Resources