I am trying to calculate row count from a large file based on presence of a certain character and would like to use StreamReader and ReadBlock - below is my code.
protected virtual long CalculateRowCount(FileStream inStream, int bufferSize)
{
long rowCount=0;
String line;
inStream.Position = 0;
TextReader reader = new StreamReader(inStream);
char[] block = new char[4096];
const int blockSize = 4096;
int indexer = 0;
int charsRead = 0;
long numberOfLines = 0;
int count = 1;
do
{
charsRead = reader.ReadBlock(block, indexer, block.Length * count);
indexer += blockSize ;
numberOfLines = numberOfLines + string.Join("", block).Split(new string[] { "&ENDE" }, StringSplitOptions.None).Length;
count ++;
} while (charsRead == block.Length);//charsRead !=0
reader.Close();
fileRowCount = rowCount;
return rowCount;
}
But I get error
Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
I am not sure what is wrong... Can you help. Thanks ahead!
For one, read the StreamReader.ReadBlock() documentation carefully (http://msdn.microsoft.com/en-us/library/system.io.streamreader.readblock.aspx) and compare it with what you're doing:
The 2nd argument (index) is the position inside block at which ReadBlock starts writing, so it has to stay within the block. You add blockSize to indexer after every read, so it is out of range from the second iteration on. Since it looks like you want to reuse the memory block, pass 0 here.
The 3rd argument (count) is the maximum number of characters (not bytes) to read into your block. index + count must not exceed block.Length, otherwise ReadBlock throws the ArgumentException you are seeing, so pass block.Length rather than block.Length * count.
ReadBlock() returns the number of characters actually read, but you advance indexer and process the whole block as if every call filled it completely. The last block usually won't be full, so only look at the first charsRead characters.
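The loop shape described above can be sketched as follows. This is Python rather than C#, so count_marker and its parameters are made-up names for illustration and not part of the StreamReader API. It reads into a fixed-size block, uses only what was actually read, and carries a short tail across block boundaries so a marker split over two blocks is still found. It counts occurrences of the marker, whereas Split(...).Length in the question counts the pieces around them.

```python
import io


def count_marker(stream, marker="&ENDE", block_size=4096):
    """Count occurrences of marker in a text stream, block_size characters at a time.

    Assumption: the marker cannot overlap with itself (true for "&ENDE").
    """
    keep = len(marker) - 1  # characters carried over from one block to the next
    tail = ""
    total = 0
    while True:
        chunk = stream.read(block_size)  # may return fewer characters than requested
        if not chunk:                    # an empty string means end of stream
            break
        window = tail + chunk            # previous tail plus the new characters
        total += window.count(marker)
        # The tail is shorter than the marker, so no match is counted twice.
        tail = window[-keep:] if keep else ""
    return total


sample = io.StringIO("row1&ENDErow2&ENDErow3&ENDE")
print(count_marker(sample, block_size=4))
```

The tiny block size in the usage line is only there to force markers to straddle block boundaries; a real file would use something like 4096.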
This is what I have of the function so far. This is only the beginning of the problem: it asks to generate random numbers and output them as a 10 by 5 group of numbers, and afterwards they are to be sorted by size, but I am just trying to get this first part down.
/* Populate the array with 50 randomly generated integer values
* in the range 1-50. */
void populateArray(int ar[], const int n) {
int n;
for (int i = 1; i <= length - 1; i++){
for (int i = 1; i <= ARRAY_SIZE; i++) {
i = rand() % 10 + 1;
ar[n]++;
}
}
}
First of all, we want to use std::array. It has some nice properties, one of which is that it doesn't decay to a pointer. Another is that it knows its size. In this case we are going to use templates to make populateArray a sufficiently generic algorithm.
template<std::size_t N>
void populateArray(std::array<int, N>& array) { ... }
Then, we would like to remove all "raw" for loops. std::generate_n in combination with some random generator seems a good option.
For the number generator we can use <random>. Specifically std::uniform_int_distribution. For that we need to get some generator up and running:
std::random_device device;
std::mt19937 generator(device());
std::uniform_int_distribution<> dist(1, N);
and use it in our std::generate_n algorithm:
std::generate_n(array.begin(), N, [&dist, &generator](){
return dist(generator);
});
My MPI code deadlocks when I run this simple code on 512 processes on a cluster. I am far from the memory limit. If I increase the number of processes to 2048, which is far too many for this problem, the code runs again. The deadlock occurs in the line containing the MPI_File_write_all.
Any suggestions?
int count = imax*jmax*kmax;
// CREATE THE SUBARRAY
MPI_Datatype subarray;
int totsize [3] = {kmax, jtot, itot};
int subsize [3] = {kmax, jmax, imax};
int substart[3] = {0, mpicoordy*jmax, mpicoordx*imax};
MPI_Type_create_subarray(3, totsize, subsize, substart, MPI_ORDER_C, MPI_DOUBLE, &subarray);
MPI_Type_commit(&subarray);
// SET THE VALUE OF THE GRID EQUAL TO THE PROCESS ID FOR CHECKING
if(mpiid == 0) std::printf("Setting the value of the array\n");
for(int i=0; i<count; i++)
u[i] = (double)mpiid;
// WRITE THE FULL GRID USING MPI-IO
if(mpiid == 0) std::printf("Write the full array to disk\n");
char filename[] = "u.dump";
MPI_File fh;
if(MPI_File_open(commxy, filename, MPI_MODE_CREATE | MPI_MODE_WRONLY | MPI_MODE_EXCL, MPI_INFO_NULL, &fh))
return 1;
// select noncontiguous part of 3d array to store the selected data
MPI_Offset fileoff = 0; // the offset within the file (header size)
char name[] = "native";
if(MPI_File_set_view(fh, fileoff, MPI_DOUBLE, subarray, name, MPI_INFO_NULL))
return 1;
if(MPI_File_write_all(fh, u, count, MPI_DOUBLE, MPI_STATUS_IGNORE))
return 1;
if(MPI_File_close(&fh))
return 1;
Your code looks right upon quick inspection. I would suggest that you let your MPI-IO library help tell you what's wrong: instead of returning from error, why don't you at least display the error? Here's some code that might help:
static void handle_error(int errcode, const char *str)
{
char msg[MPI_MAX_ERROR_STRING];
int resultlen;
MPI_Error_string(errcode, msg, &resultlen);
fprintf(stderr, "%s: %s\n", str, msg);
MPI_Abort(MPI_COMM_WORLD, 1);
}
Is MPI_SUCCESS guaranteed to be 0? I'd rather see
errcode = MPI_File_routine();
if (errcode != MPI_SUCCESS) handle_error(errcode, "MPI_File_open(1)");
Put that in and if you are doing something tricky like setting a file view with offsets that are not monotonically non-decreasing, the error string might suggest what's wrong.
I saw this in an interview question:
Given a sorting order string, you are asked to sort the input string based on the given sorting order string.
For example, if the sorting order string is dfbcae
and the input string is abcdeeabc,
the output should be dbbccaaee.
Any ideas on how to do this in an efficient way?
The Counting Sort option is pretty cool, and fast when the string to be sorted is long compared to the sort order string.
Create an array where each index corresponds to a letter in the alphabet; this is the count array.
For each letter in the sort target, increment the entry in the count array that corresponds to that letter.
For each letter in the sort order string,
add that letter to the end of the output string a number of times equal to its count in the count array.
Algorithmic complexity is O(n) where n is the length of the string to be sorted. As the Wikipedia article explains we're able to beat the lower bound on standard comparison based sorting because this isn't a comparison based sort.
Here's some pseudocode.
int[] countArray = new int[26]; // one counter per letter 'a'..'z', all starting at 0
foreach(char c in sortTarget)
{
countArray[c - 'a']++;
}
int head = 0;
foreach(char c in sortOrder)
{
while(countArray[c - 'a'] > 0)
{
sortTarget[head] = c;
head++;
countArray[c - 'a']--;
}
}
Note: this implementation requires that both strings contain only lowercase characters.
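For comparison, the same steps as a small runnable Python sketch. The function name counting_sort_by_order is made up for illustration, and like the pseudocode it assumes both strings contain only lowercase letters a to z. Letters of the target that do not appear in the sort order are dropped.

```python
def counting_sort_by_order(target, order):
    """Sort target according to the letter order given by order (lowercase a-z only)."""
    counts = [0] * 26                       # the count array, one slot per letter
    for ch in target:
        counts[ord(ch) - ord("a")] += 1     # tally each letter of the sort target

    pieces = []
    for ch in order:
        slot = ord(ch) - ord("a")
        pieces.append(ch * counts[slot])    # emit the letter as often as it was seen
        counts[slot] = 0                    # a letter repeated in order is not emitted twice
    return "".join(pieces)


print(counting_sort_by_order("abcdeeabc", "dfbcae"))
```

Building the result from a list of pieces avoids writing into the input in place, which Python strings do not allow.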
Here's a nice easy to understand algorithm that has decent algorithmic complexity.
For each character in the sort order string
scan string to be sorted, starting at first non-ordered character (you can keep track of this character with an index or pointer)
when you find an occurrence of the specified character, swap it with the first non-ordered character
increment the index for the first non-ordered character
This is O(n*m), where n is the length of the string to be sorted and m is the length of the sort order string. We're able to beat the lower bound on comparison based sorting because this algorithm doesn't really use comparisons. Like Counting Sort it relies on the fact that you have a predefined finite external ordering set.
Here's some pseudocode:
int head = 0;
foreach(char c in sortOrder)
{
for(int i = head; i < sortTarget.length; i++)
{
if(sortTarget[i] == c)
{
// swap i with head
char temp = sortTarget[head];
sortTarget[head] = sortTarget[i];
sortTarget[i] = temp;
head++;
}
}
}
In Python, you can just create an index and use it as the sort key:
order = 'dfbcae'
text = 'abcdeeabc'
# map each character to its position in the sort order
index = {ch: pos for pos, ch in enumerate(order)}
output = sorted(text, key=lambda ch: index[ch])
print('input=', text)
print('output=', ''.join(output))
gives this output:
input= abcdeeabc
output= dbbccaaee
Use binary search to find all the "split points" between different letters, then use the length of each segment directly. This only applies when equal letters already sit next to each other in the input; in that case it is asymptotically faster than naive counting, but harder to implement:
Use an array of size 26*2 to store the begin and end of each letter;
Inspect the middle element and see if it is different from the element to its left. If so, this is the begin for the middle element and the end for the element before it;
Throw away any segment whose begin and end hold the same letter, and recursively apply this algorithm to the rest.
Since there are at most 25 split points, you won't have to search more than 25 segments, and each search is O(log n). Since this is a constant times O(log n), the algorithm is O(log n).
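A minimal sketch of the split-point idea, under the loudly stated assumption that the input is already grouped according to the sort order (for unsorted input there are no split points to search for). The name run_lengths and the use of a rank lookup are illustrative choices, not part of the answer above.

```python
def run_lengths(grouped, order):
    """Return how often each letter of order occurs in grouped.

    Assumption: grouped is already arranged according to order, so each
    letter forms one contiguous run and its boundaries can be binary searched.
    """
    rank = {ch: pos for pos, ch in enumerate(order)}

    def first_index_with_rank_at_least(r):
        # Classic lower-bound binary search over the ranks of grouped.
        lo, hi = 0, len(grouped)
        while lo < hi:
            mid = (lo + hi) // 2
            if rank[grouped[mid]] < r:
                lo = mid + 1
            else:
                hi = mid
        return lo

    lengths = {}
    for ch in order:
        begin = first_index_with_rank_at_least(rank[ch])
        end = first_index_with_rank_at_least(rank[ch] + 1)
        lengths[ch] = end - begin           # length of this letter's run
    return lengths


print(run_lengths("dbbccaaee", "dfbcae"))
```

Each letter costs two binary searches, so the work is proportional to the length of the sort order times log n rather than to n.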
Of course, just using counting sort is easier to implement:
Use an array of size 26 to record the count of each letter;
Scan the input string;
Output the string in the given sorting order.
This is O(n), n being the length of the string.
Interview questions are generally about thought process and don't usually care too much about language features, but I couldn't resist posting a VB.Net 4.0 version anyway.
"Efficient" can mean two different things. The first is "what's the fastest way to make a computer execute a task" and the second is "what's the fastest that we can get a task done". They might sound the same but the first can mean micro-optimizations like int vs short, running timers to compare execution times and spending a week tweaking every millisecond out of an algorithm. The second definition is about how much human time would it take to create the code that does the task (hopefully in a reasonable amount of time). If code A runs 20 times faster than code B but code B took 1/20th of the time to write, depending on the granularity of the timer (1ms vs 20ms, 1 week vs 20 weeks), each version could be considered "efficient".
Dim input = "abcdeeabc"
Dim sort = "dfbcae"
Dim SortChars = sort.ToList()
Dim output = New String((From c In input.ToList() Select c Order By SortChars.IndexOf(c)).ToArray())
Trace.WriteLine(output)
Here is my solution to the question
import java.util.*;
import java.io.*;
class SortString
{
public static void main(String arg[])throws IOException
{
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
// System.out.println("Enter 1st String :");
// String s1=br.readLine();
// System.out.println("Enter 2nd String :");
// String s2=br.readLine();
String s1="tracctor";
String s2="car";
String com="";
String uncom="";
for(int i=0;i<s2.length();i++)
{
if(s1.contains(""+s2.charAt(i)))
{
com=com+s2.charAt(i);
}
}
System.out.println("Com :"+com);
for(int i=0;i<s1.length();i++)
if(!com.contains(""+s1.charAt(i)))
uncom=uncom+s1.charAt(i);
System.out.println("Uncom "+uncom);
System.out.println("Combined "+(com+uncom));
HashMap<String,Integer> h1=new HashMap<String,Integer>();
for(int i=0;i<s1.length();i++)
{
String m=""+s1.charAt(i);
if(h1.containsKey(m))
{
int val=(int)h1.get(m);
val=val+1;
h1.put(m,val);
}
else
{
h1.put(m,1);
}
}
StringBuilder x=new StringBuilder();
for(int i=0;i<com.length();i++)
{
if(h1.containsKey(""+com.charAt(i)))
{
int count=(int)h1.get(""+com.charAt(i));
while(count!=0)
{x.append(""+com.charAt(i));count--;}
}
}
x.append(uncom);
System.out.println("Sort "+x);
}
}
Here is my version, which is O(n) in time. Instead of unordered_map, I could have just used an int array of constant size, i.e. int char_count[26] (and done ++char_count[ch - 'a']), assuming the input strings contain only lowercase ASCII characters.
string SortOrder(const string& input, const string& sort_order) {
unordered_map<char, int> char_count;
for (auto ch : input) {
++char_count[ch];
}
string res = "";
for (auto ch : sort_order) {
unordered_map<char, int>::iterator it = char_count.find(ch);
if (it != char_count.end()) {
string s(it->second, it->first);
res += s;
}
}
return res;
}
private static String sort(String target, String reference) {
final Map<Character, Integer> referencesMap = new HashMap<Character, Integer>();
for (int i = 0; i < reference.length(); i++) {
char key = reference.charAt(i);
if (!referencesMap.containsKey(key)) {
referencesMap.put(key, i);
}
}
List<Character> chars = new ArrayList<Character>(target.length());
for (int i = 0; i < target.length(); i++) {
chars.add(target.charAt(i));
}
Collections.sort(chars, new Comparator<Character>() {
@Override
public int compare(Character o1, Character o2) {
return referencesMap.get(o1).compareTo(referencesMap.get(o2));
}
});
StringBuilder sb = new StringBuilder();
for (Character c : chars) {
sb.append(c);
}
return sb.toString();
}
In C# I would just use the IComparer Interface and leave it to Array.Sort
void Main()
{
// we define the IComparer class to define the sort order
var sortOrder = new SortOrder("dfbcae");
var testOrder = "abcdeeabc".ToCharArray();
// sort the array using Array.Sort
Array.Sort(testOrder, sortOrder);
// char[].ToString() would only print the type name, so build a string from the chars
Console.WriteLine(new string(testOrder));
}
public class SortOrder : IComparer
{
string sortOrder;
public SortOrder(string sortOrder)
{
this.sortOrder = sortOrder;
}
public int Compare(object obj1, object obj2)
{
var obj1Index = sortOrder.IndexOf((char)obj1);
var obj2Index = sortOrder.IndexOf((char)obj2);
if(obj1Index == -1 || obj2Index == -1)
{
throw new Exception("character not found");
}
if(obj1Index > obj2Index)
{
return 1;
}
else if (obj1Index == obj2Index)
{
return 0;
}
else
{
return -1;
}
}
}
public void DoSomething(byte[] array, byte[] array2, int start, int counter)
{
int length = array.Length;
int index = 0;
while (count >= needleLen)
{
index = Array.IndexOf(array, array2[0], start, count - length + 1);
int i = 0;
int p = 0;
for (i = 0, p = index; i < length; i++, p++)
{
if (array[p] != array2[i])
{
break;
}
}
Given that your for loop appears to be using a loop body dependent on ordering, it's most likely not a candidate for parallelization.
However, you aren't showing the "work" involved here, so it's difficult to tell what it's doing. Since the loop relies on both i and p, and it appears that they would vary independently, it's unlikely to be rewritten using a simple Parallel.For without reworking or rethinking your algorithm.
In order for a loop body to be a good candidate for parallelization, it typically needs to be order independent, with no iteration relying on the result of another. The fact that you're basing your loop on two independent variables suggests that these requirements are not met by this algorithm.
I am new to C++ programming. I have to call a function with the following arguments:
int Start (int argc, char **argv).
When I try to call the above function with the code below, I get run-time exceptions. Can someone help me resolve this problem?
char * filename=NULL;
char **Argument1=NULL;
int Argument=0;
int j = 0;
int k = 0;
int i=0;
int Arg()
{
filename = "Globuss -dc bird.jpg\0";
for(i=0;filename[i]!=NULL;i++)
{
if ((const char *)filename[i]!=" ")
{
Argument1[j][k++] = NULL; // Here I get An unhandled
// exception of type
//'System.NullReferenceException'
// occurred
j++;
k=0;
}
else
{
(const char )Argument1[j][k] = filename [j]; // Here I also i get exception
k++;
Argument++;
}
}
Argument ++;
return 0;
}
Start (Argument,Argument1);
Two things:
char **Argument1=NULL;
This is a pointer to a pointer. You need to allocate some memory for it before you use it, for example (the sizes here are only placeholders):
Argument1 = new char*[10];
for(i=0; i<10; ++i) Argument1[i] = new char[10];
Don't forget to delete[] in the same style.
You appear not to have allocated any memory for your arrays; you just have a NULL pointer.
char * filename=NULL;
char **Argument1=NULL;
int Argument=0;
int j = 0;
int k = 0;
int i=0;
int Arg()
{
filename = "Globuss -dc bird.jpg\0";
//I don't know why you have 2D here, you are going to need to allocate
//sizes for both parts of the 2D array
Argument1 = new char *[TotalFileNames];
for(int x = 0; x < TotalFileNames; x++)
Argument1[x] = new char[SIZE_OF_WHAT_YOU_NEED];
for(i=0;filename[i]!=NULL;i++)
{
if ((const char *)filename[i]!=" ")
{
Argument1[j][k++] = NULL; // Here I get An unhandled
// exception of type
//'System.NullReferenceException'
// occurred
j++;
k=0;
}
else
{
(const char )Argument1[j][k] = filename [j]; // Here I also i get exception
k++;
Argument++;
}
}
Argument ++;
return 0;
}
The first thing you have to do is to find the number of strings you will have. That's easily done with something like:
int len = strlen(filename);
int numwords = 1;
for(i = 0; i < len; i++) {
if(filename[i] == ' ') {
numwords++;
// eat up all consecutive spaces so the following ' ' are not counted again
// no need to check whether i exceeds len: the loop stops at the terminating '\0'
while(filename[i] == ' ') i++;
}
}
In the above code I assume there will be at least one word in the filename (i.e. it won't be an empty string).
Now you can allocate memory for Argument1.
Argument1 = new char *[numwords];
After that you have two options:
use strtok (http://www.cplusplus.com/reference/clibrary/cstring/strtok/)
implement your function to split a string
That can be done like this:
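int i, cur, last;
for(i = last = cur = 0; cur <= len; cur++) {   // <= len so the terminating '\0' ends the last word
    while(filename[last] == ' ') {             // last should never point at a ' '
        last++;
    }
    if(filename[cur] == ' ' || filename[cur] == '\0') {
        if(last < cur) {
            Argument1[i] = new char[cur-last+1];   // +1 for string termination '\0'
            strncpy(Argument1[i], &filename[last], cur-last);
            Argument1[i][cur-last] = '\0';         // strncpy does not add the terminator here
            i++;                                   // move on to the next word
            last = cur;
        }
    }
}
// i now holds the number of words that were stored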
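To make the index bookkeeping easier to follow, here is the last/cur scan sketched in Python, including the final word, which ends at the end of the string rather than at a space. split_words is a made-up name for illustration; in real Python code str.split() already does this.

```python
def split_words(text):
    """Split text on spaces with two indices: last marks a word start, cur scans ahead."""
    words = []
    last = 0
    for cur in range(len(text) + 1):        # one extra step so the end of the string closes a word
        while last < len(text) and text[last] == " ":
            last += 1                       # last should never rest on a space
        at_end = cur == len(text)
        if at_end or text[cur] == " ":
            if last < cur:                  # there is a non-empty word in text[last:cur]
                words.append(text[last:cur])
                last = cur
    return words


print(split_words("Globuss -dc bird.jpg"))
```

The length of the returned list plays the role of argc, and its items the role of the argv entries.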
The above code is not optimized; I just tried to make it as easy as possible to understand.
I also did not test it, but it should work. Assumptions I made:
the string is null terminated
there is at least 1 word in the string.
Also, whenever I'm referring to a string, I mean a char array :P
Some mistakes I noticed in your code:
In C/C++, " " is a pointer to a const char array which contains a space.
If you compare it with another " " you will compare the pointers, not the contents, and they may (and probably will) be different. To test a single character, compare it with ' ' (single quotes); to compare whole strings, use strcmp (http://www.cplusplus.com/reference/clibrary/cstring/strcmp/).
You should learn how to allocate memory dynamically. In C you can do it with malloc, in C++ with malloc and new (better use new instead of malloc).
Hope I helped!
PS: if there is an error in my code, tell me and I'll fix it.