Obtaining physical address trace from GEM5

I've been trying to extract the physical addresses accessed by an application in order to analyze row hits.
To do so, I followed this page, with small variations due to version changes.
I modified CacheConfig.py as follows:
system.monitor2 = CommMonitor()
system.monitor2.trace = MemTraceProbe(trace_file = "CT_mon2.trc.gz")
system.monitor2.slave = system.l2.mem_side
system.membus.slave = system.monitor2.master
system.l2.cpu_side = system.tol2bus.master
And ran this command:
build/X86/gem5.opt --debug-flag=CommMonitor configs/example/se.py --caches --l2cache --l2_size=2MB --mem-type=DDR4_2400_16x4 -c tests/test-progs/mm/bin/x86/linux/mm --cpu-type=TimingSimpleCPU
The mm is a binary from a simple matrix multiplication:
// C program to multiply two square matrices.
#include <stdio.h>
#define N 4
// This function multiplies mat1[][] and mat2[][],
// and stores the result in res[][]
void multiply(int mat1[][N], int mat2[][N], int res[][N])
{
int i, j, k;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
res[i][j] = 0;
for (k = 0; k < N; k++)
res[i][j] += mat1[i][k]*mat2[k][j];
}
}
}
int main()
{
int mat1[N][N] = { {1, 1, 1, 1},
{2, 2, 2, 2},
{3, 3, 3, 3},
{4, 4, 4, 4}};
int mat2[N][N] = { {1, 1, 1, 1},
{2, 2, 2, 2},
{3, 3, 3, 3},
{4, 4, 4, 4}};
int res[N][N]; // To store result
int i, j;
multiply(mat1, mat2, res);
printf("Result matrix is \n");
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
printf("%d ", res[i][j]);
printf("\n");
}
return 0;
}
After decoding "CT_mon2.trc.gz", the memory trace looks like this:
5,u,15360,64,256,11500
6,u,183808,64,2,101000
5,u,18816,64,256,187000
6,u,183744,64,2,285000
5,u,18880,64,256,357000
6,u,171072,64,3,438000
6,u,171648,64,3,526000
6,u,172032,64,3,601000
6,u,174528,64,3,689000
5,u,18944,64,256,765000
The third field is the physical address.
What confuses me is the "u" field. In the decode script, anything that isn't a read (r) or a write (w) is marked as "u".
When debugging, the commands kept alternating between "UpgradeFailResp" and "ReadCleanReq".
I was expecting a trace of reads and writes, so I'm not sure what is happening here.
Can anyone tell me what I am missing?
A better way to obtain physical addresses would also be a huge help.
Thanks,
jwlee

5,u,15360,64,256,11500
You can find the meaning of these numbers in the packet decoding script in the util folder. For example, 5 is the master (port) ID, and "u" means the command was neither a read nor a write.
If you want to know the command itself, one option is to edit gem5/src/mem/comm_monitor.cc, where timing requests and responses are handled. For example:
DPRINTF(CommMonitor, "cmd: %s, cmdIndex: %d, addr: %lld masterId: %d \n", pkt->cmdString(),pkt->cmdToIndex(),pkt->getAddr(), pkt->masterId());
pkt->cmdString() gives the command name, or you can simply use pkt->print() to see the whole packet. See packet.cc for more information.
You need to rebuild gem5 every time you change anything in the src folder.
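As a small illustration of the field layout described above, the decoded lines can be post-processed with a few lines of Python. This is only a sketch: the thread itself identifies just the first three fields, so the names `size`, `flags` and `tick` are my assumption about the remaining columns, and `parse_trace_line` and `summarize` are hypothetical helpers, not part of gem5.

```python
from collections import Counter, namedtuple

# Column order taken from the decoded output in the question.
# The names of the last three fields are assumptions, not confirmed by the thread.
Packet = namedtuple("Packet", "master_id cmd addr size flags tick")

def parse_trace_line(line):
    """Split one decoded line such as '5,u,15360,64,256,11500' into fields."""
    master_id, cmd, addr, size, flags, tick = line.strip().split(",")
    return Packet(int(master_id), cmd, int(addr), int(size), int(flags), int(tick))

def summarize(lines):
    """Count packets per command letter: r (read), w (write), u (anything else)."""
    return Counter(parse_trace_line(line).cmd for line in lines if line.strip())

sample = [
    "5,u,15360,64,256,11500",
    "6,u,183808,64,2,101000",
    "5,u,18816,64,256,187000",
]
packets = [parse_trace_line(line) for line in sample]
print([hex(p.addr) for p in packets])  # physical addresses in hex
print(summarize(sample))               # how many r / w / u packets were seen
```

A tally like this makes it easy to see at a glance whether a given monitor placement produces only reads and writes or still carries other commands.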

The reason you see traffic other than reads and writes is the placement of the CommMonitor. In your system, the membus is likely the point of coherency, so you get all sorts of traffic generated by the L2 cache for cache coherency operations with other L2 caches (if they existed). If you move your CommMonitor below the point of coherency, e.g. between the membus and your memory controllers, you should see only read and write traffic.


(DP) Memoization - How to know if it starts from the top or bottom?

It hasn't been long since I started studying for algorithm coding tests, and I find it difficult to see a pattern in memoization.
Here are two problems.
Min Cost Climbing Stairs
You are given an integer array cost where cost[i] is the cost of ith step on a staircase. Once you pay the cost, you can either climb one or two steps.
You can either start from the step with index 0, or the step with index 1.
Return the minimum cost to reach the top of the floor.
Recurrence Relation Formula:
minimumCost(i) = min(cost[i - 1] + minimumCost(i - 1), cost[i - 2] + minimumCost(i - 2))
House Robber
You are a professional robber planning to rob houses along a street. Each house has a certain amount of money stashed, the only constraint stopping you from robbing each of them is that adjacent houses have security systems connected and it will automatically contact the police if two adjacent houses were broken into on the same night.
Given an integer array nums representing the amount of money of each house, return the maximum amount of money you can rob tonight without alerting the police.
Recurrence Relation Formula:
robFrom(i) = max(robFrom(i + 1), robFrom(i + 2) + nums[i])
So as you can see, the first problem is defined in terms of the previous subproblems, and the second in terms of the next ones.
Because of this, when I write the recursive function, the starting index is different.
Start from n
vector<int> memo; // memo[i] == -1 means "not computed yet"
int rec(int n, vector<int>& cost)
{
    if (memo[n] == -1)
    {
        if (n <= 1)
        {
            memo[n] = 0; // you may start on step 0 or step 1 for free
        }
        else
        {
            memo[n] = min(rec(n-1, cost) + cost[n-1], rec(n-2, cost) + cost[n-2]);
        }
    }
    return memo[n];
}
int minCostClimbingStairs(vector<int>& cost) {
    const int n = cost.size();
    memo.assign(n+1, -1);
    return rec(n, cost); // Start from n
}
Start from 0
vector<int> how_much; // how_much[i] == -1 means "not computed yet"
int getrob(int n, vector<int>& nums)
{
    if (n >= (int)nums.size())
    {
        return 0; // past the last house: nothing left to rob
    }
    if (how_much[n] == -1)
    {
        how_much[n] = max(getrob(n + 1, nums), getrob(n + 2, nums) + nums[n]);
    }
    return how_much[n];
}
int rob(vector<int>& nums) {
    how_much.assign(nums.size() + 2, -1);
    return getrob(0, nums); // Start from 0
}
How can I easily tell which one needs to start from 0 and which from n? Is there a pattern?
Or should I just solve a lot of problems and build up intuition?
Your question is valid, but the examples don't really illustrate it. Both of the problems you shared can be solved either way: 1. starting from the top and 2. starting from the bottom.
For example, here is a Min Cost Climbing Stairs solution that starts from 0:
int[] dp;
public int minCostClimbingStairs(int[] cost) {
int n = cost.length;
dp = new int[n];
for(int i=0; i<n; i++) {
dp[i] = -1;
}
rec(0, cost);
return Math.min(dp[0], dp[1]);
}
int rec(int in, int[] cost) {
if(in >= cost.length) {
return 0;
} else {
if(dp[in] == -1) {
dp[in] = cost[in] + Math.min(rec(in+1, cost), rec(in+2, cost));
}
return dp[in];
}
}
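The same holds for House Robber. As a minimal sketch (in Python for brevity; `rob_from_start` and `rob_from_end` are hypothetical names, and `functools.lru_cache` stands in for the hand-written memo arrays above), the recurrence can be anchored at either end of the street:

```python
from functools import lru_cache

def rob_from_start(nums):
    """rob_from(i) = best loot using houses i..n-1; the answer is rob_from(0)."""
    n = len(nums)

    @lru_cache(maxsize=None)
    def rob_from(i):
        if i >= n:
            return 0  # past the last house
        return max(rob_from(i + 1), rob_from(i + 2) + nums[i])

    return rob_from(0)

def rob_from_end(nums):
    """rob_upto(i) = best loot using houses 0..i; the answer is rob_upto(n-1)."""
    @lru_cache(maxsize=None)
    def rob_upto(i):
        if i < 0:
            return 0  # before the first house
        return max(rob_upto(i - 1), rob_upto(i - 2) + nums[i])

    return rob_upto(len(nums) - 1)

houses = [2, 7, 9, 3, 1]
print(rob_from_start(houses), rob_from_end(houses))
```

Both functions return the same value; the only difference is whether the subproblem is described as "the rest of the street" or "the street so far", which decides whether the top-level call starts at 0 or at n-1.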
However, there is a certain set of problems where the direction matters. Their structure is such that if you go through the states in the wrong order, updates made in the current pass corrupt results you still need:
Example: reaching a target sum from the numbers in an array, using each index at most once. Reaching 10 in {3, 4, 6, 5, 2}: {4, 6} is one answer, but {6, 2, 2} is not, because it uses index 4 (the value 2) twice.
This is easy if, for each number, you scan the sums from the top (M) down:
int m[M+10];
for(i=0; i<M+10; i++) m[i]=0;
m[0]=1;
for(i=0; i<n; i++)
    for(j=M; j>=a[i]; j--)
        m[j] |= m[j-a[i]];
If you scan the sums from the bottom up instead, you will end up using a[i] multiple times, because m[j-a[i]] may already have been set during the current pass. You can still do it bottom up if you figure out a way to keep the states apart, for example by using a queue that only holds the sums reached in previous iterations, or by storing in m[j] the iteration in which the sum was first reached and only building on sums reached in an earlier iteration. I think the same thing holds for DP in general.
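To make the effect of the scan order concrete, here is a small sketch in Python. The function name `reachable_sums` and its `descending` switch are made up for this illustration; the loop body mirrors the `m[j] |= m[j-a[i]]` update above.

```python
def reachable_sums(values, target, descending=True):
    """reach[s] is True if some subset of `values` adds up to s.

    With descending=True each value is used at most once. With
    descending=False the same value can be picked up again within
    its own pass, which is the mistake described above.
    """
    reach = [False] * (target + 1)
    reach[0] = True
    for v in values:
        if descending:
            sums = range(target, v - 1, -1)  # target, target-1, ..., v
        else:
            sums = range(v, target + 1)      # v, v+1, ..., target
        for j in sums:
            if reach[j - v]:
                reach[j] = True
    return reach

print(reachable_sums([3, 4, 6, 5, 2], 10)[10])       # 4 + 6 reaches 10
print(reachable_sums([3], 6)[6])                     # a single 3 cannot reach 6
print(reachable_sums([3], 6, descending=False)[6])   # ascending scan reuses the 3
```

With the array `[3]` and target 6, the descending scan correctly reports that 6 is unreachable, while the ascending scan marks 3 as reached and then immediately builds 6 on top of it within the same pass.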

Reducing execution time C++

Task
You are given the number of pages in a book. The book printer is lazy and will only print the page number on every other page. The first page number to be printed is 1. The task is to calculate how many times each digit is used in the printed page numbers. The goal is to print the counts for all digits, for a page count in the range 10^9 < pages < 10^12, in less than 5 seconds.
For example
The number of book pages is 20. The page numbers to be printed are then 1, 3, 5, 7, 9, 11, 13, 15, 17 and 19. The page number 1 contains only the digit 1 and should therefore only increment the tally for 1 by one. The number 13, however, contains 1 and 3, so the tallies for both 1 and 3 are incremented, and so forth.
Question
How do I make the program execute faster at larger numbers? I've been thinking about using threads but I'm unsure if it's beneficial or not.
#include <cstdio>
#include <iostream>
#include <string>
int main(int argc, char *argv[]) {
    long long sideNumber;
    long long numbers[10];
    if(argc > 1) {
        sideNumber = std::stoll(argv[1]);
    } else {
        std::printf("Please enter amount of pages.\n");
        return -1;
    }
    for(int i = 0; i < 10; i++) numbers[i] = 0;
    long long index = 1;
    while(index < sideNumber) {
        // tally every digit of the current page number
        long long current = index;
        while(current > 0) {
            numbers[current%10]++;
            current /= 10;
        }
        index += 2;
    }
    for(int i = 0; i < 10; i++) {
        std::printf("%i : %lld\n", i, numbers[i]); // numbers[i] is a long long
    }
    return 0;
}
This is trivially a maths problem, not a computing problem.
However, if this has really been set as a computing problem, then the answer is probably recursive.
Consider the page numbers 1-9. How many do they tally for digits 0-9?
Now consider the pages 11-19. Can you re-use the tally from the previous task to make a new tally? and again for 2x,3x,4x, etc.
Then, can you reuse the tally from 1-99 for 101-199?, etcetera.
Note that you need to think some more about how you deal with middle zeros.
Alternatively, you can use a pencil to get the same result in half the time it takes you to write the program.
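The tally-reuse idea can be pushed all the way to a direct count per decimal position. The following is a minimal sketch in Python, assuming (as the loop in the question does) that the printed pages are the odd numbers strictly below the page count. The names `odd_page_digit_counts` and `brute_force` are hypothetical; `brute_force` only mirrors the question's loop so the fast version can be checked against it on small inputs.

```python
def brute_force(pages):
    """Reference tally that mirrors the loop in the question (slow)."""
    counts = [0] * 10
    for page in range(1, pages, 2):
        while page > 0:
            counts[page % 10] += 1
            page //= 10
    return counts

def odd_page_digit_counts(pages):
    """Tally digits 0-9 over the odd numbers below `pages`, one position at a time."""
    n = pages - 1  # largest page number that could be printed
    counts = [0] * 10
    if n < 1:
        return counts
    # Units position: an odd number always ends in an odd digit.
    for d in (1, 3, 5, 7, 9):
        if n >= d:
            counts[d] += (n - d) // 10 + 1
    # Higher positions: split n into the part above, at and below the position.
    power = 10
    while power <= n:
        high, rest = divmod(n, power * 10)
        cur, low = divmod(rest, power)
        half = power // 2  # odd values among the `power` possible lower parts
        for d in range(10):
            if d == 0 and high == 0:
                continue  # that would be a leading zero, which is never printed
            total = (high if d > 0 else high - 1) * half  # complete blocks
            if d < cur:
                total += half            # the last block is complete as well
            elif d == cur:
                total += (low + 1) // 2  # odd lower parts in 0..low
            counts[d] += total
        power *= 10
    return counts

print(odd_page_digit_counts(20))  # [0, 7, 0, 2, 0, 2, 0, 2, 0, 2]
```

The loop runs once per decimal position, so the work grows with the number of digits in the page count rather than with the page count itself. The special case for the digit 0 is the "middle zeros" caveat mentioned above: a zero only counts when there is a non-zero part in front of it.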

How do I change a string in a for loop

I'm trying to make the variable "name" equal "slider1", but it gives me
the error
initialization with '{...}' expected for aggregate object
the code:
for (int TpNum = 1; TpNum < 2; TpNum++)
{
char name[8] = ("slider" + TpNum );
Enemy name(5, 5, 'r', name);
}
Arrays are aggregate types, and as such are initialized with an initializer list.
More importantly, "slider" + TpNum does not concatenate anything: it adds an integer to a char pointer. Your array and your Enemy object also cannot both be called name. You should use C++ std::strings, but here's how you can accomplish it with C strings.
for (int TpNum = 1; TpNum < 2; ++TpNum)
{
    char name[8]; // "slider" + one digit + terminating '\0'
    snprintf(name, sizeof name, "slider%d", TpNum);
    Enemy enemy(5, 5, 'r', name);
}
You'll need to include <stdio.h> to use snprintf.
EDIT
Also note that your loop will only execute once, so you could just say
Enemy name(5, 5, 'r', "slider1");

searching for dynamic programming solution

Problem :
There is a stack consisting of N bricks. You and your friend decide to play a game using this stack. In this game, the players alternately remove 1, 2 or 3 bricks from the top, and the numbers on the bricks removed by a player are added to that player's score. You have to play in such a way that you obtain the maximum possible score, given that your friend will also play optimally and that you make the first move.
Input Format
First line will contain an integer T i.e. number of test cases. There will be two lines corresponding to each test case, first line will contain a number N i.e. number of element in stack and next line will contain N numbers i.e. numbers written on bricks from top to bottom.
Output Format
For each test case, print a single line containing your maximum score.
I tried recursion, but it didn't work:
int recurse(int length, int sequence[5], int i) {
if(length - i < 3) {
int sum = 0;
for(i; i < length; i++) sum += sequence[i];
return sum;
} else {
int sum1 = 0;
int sum2 = 0;
int sum3 = 0;
sum1 += recurse(length, sequence, i+1);
sum2 += recurse(length, sequence, i+2);
sum3 += recurse(length, sequence, i+3);
return max(max(sum1,sum2),sum3);
}
}
int main() {
int sequence[] = {0, 0, 9, 1, 999};
int length = 5;
cout << recurse(length, sequence, 0);
return 0;
}
My approach to solving this problem was as follows:
Both players play optimally.
So, the solution is to be built in a manner that need not take the player into account. This is because both players are going to pick the best choice available to them for any given state of the stack of bricks.
The base cases:
Either player, when left with the last one/two/three bricks, will choose to remove all bricks.
For the sake of convenience, let's assume that the array is actually in reverse order (i.e. a[0] is the value of the bottom-most brick in the stack) (This can easily be incorporated by performing a reverse operation on the array.)
So, the base cases are:
# Base Cases
dp[0] = a[0]
dp[1] = a[0]+a[1]
dp[2] = a[0]+a[1]+a[2]
Building the final solution:
Now, in each iteration, a player has 3 choices.
pick brick (i), or,
pick brick (i and i-1) , or,
pick brick (i,i-1 and i-2)
If the player opted for choice 1, the following would result:
player secures a[i] points from the brick (i) (+a[i])
will not be able to procure the points on the bricks removed by the opponent. This value is stored in dp[i-1] (which the opponent will end up scoring by virtue of this choice made by the player).
will surely procure the points on the bricks not removed by the opponent. (+ Sum of all the bricks up until brick (i-1) not removed by opponent )
A prefix array to store the partial sums of points of bricks can be computed as follows:
# build prefix sum array
pre = [a[0]]
for i in range(1,n):
    pre.append(pre[-1]+a[i])
And, now, if player opted for choice 1, the score would be:
ans1 = a[i] + (pre[i-1] - dp[i-1])
Similarly, for choices 2 and 3. So, we get:
ans1 = a[i]+ (pre[i-1] - dp[i-1]) # if we pick only ith brick
ans2 = a[i]+a[i-1]+(pre[i-2] - dp[i-2]) # pick 2 bricks
ans3 = a[i]+a[i-1]+a[i-2]+(pre[i-3] - dp[i-3]) # pick 3 bricks
Now, each player wants to maximize this value. So, in each iteration, we pick the maximum among ans1, ans2 and ans3.
dp[i] = max(ans1, ans2, ans3)
Now, all we have to do is to iterate from 3 through to n-1 to get the required solution.
Here is the final snippet in python:
a = list(map(int, input().split()))
a.reverse() # so that a[0] is bottom brick of stack
n = len(a)
# build prefix sum array
pre = [a[0]]
for i in range(1, n):
    pre.append(pre[-1] + a[i])
# base cases: with at most three bricks left, the player to move takes them all
dp = [0] * n
for i in range(min(3, n)):
    dp[i] = pre[i]
for i in range(3, n):
    # We can pick brick i, (i,i-1) or (i,i-1,i-2)
    ans1 = a[i] + (pre[i-1] - dp[i-1]) # if we pick only ith brick
    ans2 = a[i] + a[i-1] + (pre[i-2] - dp[i-2]) # pick 2
    ans3 = a[i] + a[i-1] + a[i-2] + (pre[i-3] - dp[i-3]) # pick 3
    # both players maximise this value. Doesn't matter who is playing
    dp[i] = max(ans1, ans2, ans3)
print(dp[n-1])
At first sight your code seems wrong for a couple of reasons:
The player is not taken into account. You taking a brick and your friend taking a brick are not the same thing (you have to maximize your own score; the total of both scores is of course always the sum of the numbers on the bricks).
It looks like plain recursion with no memoization, and that approach will explode to exponential computing time (you're using the "brute force" approach, enumerating all possible games).
A dynamic programming approach is clearly possible because the best possible continuation of a game doesn't depend on how you reached a certain state. For the state of the game you'd need
Who's next to play (you or your friend)
How many bricks are left on the stack
With these two inputs you can compute how much you can collect from that point to the end of the game. There are two cases:
1. It's your turn
You need to try collecting 1, 2 or 3 bricks and call recursively on the next game state, where the opponent has to choose. Of the three cases you keep the highest result.
2. It's the opponent's turn
You need to simulate the collection of 1, 2 or 3 bricks and call recursively on the next game state, where you have to choose. Of the three cases you keep the lowest result (because the opponent is trying to maximize his/her result, not yours).
At the very beginning of the function you just check whether the same game state has been processed before, and before returning from a computation you store the result. Thanks to this lookup/memoization the search time is not exponential, but linear in the number of distinct game states (just 2*N, where N is the number of bricks).
In Python:
memory = {}
bricks = [0, 0, 9, 1, 999]
def maxResult(my_turn, index):
    key = (my_turn, index)
    if key in memory:
        return memory[key]
    if index == len(bricks):
        result = 0
    elif my_turn:
        # my move: keep the best total over taking 1, 2 or 3 bricks
        result = None
        s = 0
        for i in range(index, min(index+3, len(bricks))):
            s += bricks[i]
            x = s + maxResult(False, i+1)
            if result is None or x > result:
                result = x
    else:
        # opponent's move: assume the choice that leaves me the least
        result = None
        for i in range(index, min(index+3, len(bricks))):
            x = maxResult(True, i+1)
            if result is None or x < result:
                result = x
    memory[key] = result
    return result
print(maxResult(True, 0))
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;
public class Solution {
public static void main(String[] args){
Scanner sc=new Scanner(System.in);
int noTest=sc.nextInt();
for(int i=0; i<noTest; i++){
int noBrick=sc.nextInt();
ArrayList<Integer> arr=new ArrayList<Integer>();
for (int j=0; j<noBrick; j++){
arr.add(sc.nextInt());
}
long sum[]= new long[noBrick];
sum[noBrick-1]= arr.get(noBrick-1);
for (int j=noBrick-2; j>=0; j--){
sum[j]= sum[j+1]+ arr.get(j);
}
long[] max=new long[noBrick];
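// Base cases: with at most three bricks left, the player to move takes them all.
// No (int) cast here: the running totals are long and may not fit in an int.
if(noBrick>=1)
max[noBrick-1]=arr.get(noBrick-1);
if(noBrick>=2)
max[noBrick-2]=Math.max(arr.get(noBrick-2),max[noBrick-1]+arr.get(noBrick-2));
if(noBrick>=3)
max[noBrick-3]=Math.max(arr.get(noBrick-3),max[noBrick-2]+arr.get(noBrick-3));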
if(noBrick>=4){
for (int j=noBrick-4; j>=0; j--){
long opt1= arr.get(j)+sum[j+1]-max[j+1];
long opt2= arr.get(j)+arr.get(j+1)+sum[j+2]-max[j+2];
long opt3= arr.get(j)+arr.get(j+1)+arr.get(j+2)+sum[j+3]-max[j+3];
max[j]= (long)Math.max(opt1,Math.max(opt2,opt3));
}
}
long cost= max[0];
System.out.println(cost);
}
}
}
I tried this using Java, and it seems to work alright.
Here is a better solution, without recursion, that I found on the internet.
#include <iostream>
#include <fstream>
#include <algorithm>
#define MAXINDEX 10001
using namespace std;
long long maxResult(int a[MAXINDEX], int LENGTH){
long long prefixSum [MAXINDEX] = {0};
prefixSum[0] = a[0];
for(int i = 1; i < LENGTH; i++){
prefixSum[i] += prefixSum[i-1] + a[i];
}
long long dp[MAXINDEX] = {0};
dp[0] = a[0];
dp[1] = dp[0] + a[1];
dp[2] = dp[1] + a[2];
for(int k = 3; k < LENGTH; k++){
long long x = prefixSum[k-1] + a[k] - dp[k-1];
long long y = prefixSum[k-2] + a[k] + a[k-1] - dp[k-2];
long long z = prefixSum[k-3] + a[k] + a[k-1] + a[k-2] - dp[k-3];
dp[k] = max(x,max(y,z));
}
return dp[LENGTH-1];
}
int main(){
int cases;
int bricks[MAXINDEX];
ifstream fin("test.in");
fin >> cases;
for (int i = 0; i < cases; i++){
long n;
fin >> n;
for(int j = 0; j < n; j++) fin >> bricks[j];
reverse(bricks, bricks+n);
cout << maxResult(bricks, n)<< endl;
}
return 0;
}

MPI-IO deadlock using MPI_File_write_all

My MPI code deadlocks when I run this simple code on 512 processes on a cluster. I am far from the memory limit. If I increase the number of processes to 2048, which is far too many for this problem, the code runs again. The deadlock occurs in the line containing the MPI_File_write_all.
Any suggestions?
int count = imax*jmax*kmax;
// CREATE THE SUBARRAY
MPI_Datatype subarray;
int totsize [3] = {kmax, jtot, itot};
int subsize [3] = {kmax, jmax, imax};
int substart[3] = {0, mpicoordy*jmax, mpicoordx*imax};
MPI_Type_create_subarray(3, totsize, subsize, substart, MPI_ORDER_C, MPI_DOUBLE, &subarray);
MPI_Type_commit(&subarray);
// SET THE VALUE OF THE GRID EQUAL TO THE PROCESS ID FOR CHECKING
if(mpiid == 0) std::printf("Setting the value of the array\n");
for(int i=0; i<count; i++)
u[i] = (double)mpiid;
// WRITE THE FULL GRID USING MPI-IO
if(mpiid == 0) std::printf("Write the full array to disk\n");
char filename[] = "u.dump";
MPI_File fh;
if(MPI_File_open(commxy, filename, MPI_MODE_CREATE | MPI_MODE_WRONLY | MPI_MODE_EXCL, MPI_INFO_NULL, &fh))
return 1;
// select noncontiguous part of 3d array to store the selected data
MPI_Offset fileoff = 0; // the offset within the file (header size)
char name[] = "native";
if(MPI_File_set_view(fh, fileoff, MPI_DOUBLE, subarray, name, MPI_INFO_NULL))
return 1;
if(MPI_File_write_all(fh, u, count, MPI_DOUBLE, MPI_STATUS_IGNORE))
return 1;
if(MPI_File_close(&fh))
return 1;
Your code looks right upon quick inspection. I would suggest that you let your MPI-IO library help tell you what's wrong: instead of just returning on error, why don't you at least display the error? Here's some code that might help:
static void handle_error(int errcode, const char *str)
{
    char msg[MPI_MAX_ERROR_STRING];
    int resultlen;
    MPI_Error_string(errcode, msg, &resultlen);
    fprintf(stderr, "%s: %s\n", str, msg);
    MPI_Abort(MPI_COMM_WORLD, 1);
}
Is MPI_SUCCESS guaranteed to be 0? I'd rather see
errcode = MPI_File_routine();
if (errcode != MPI_SUCCESS) handle_error(errcode, "MPI_File_open(1)");
Put that in and if you are doing something tricky like setting a file view with offsets that are not monotonically non-decreasing, the error string might suggest what's wrong.
