Parallel For Each with a Thread.Sleep - multithreading

Basically, I want to access a lot of pages from the same URL, but I don't want to hit it with every request in the same second, and I want the whole run to finish within 30 minutes.
Does Thread.Sleep() inside Parallel.ForEach work as intended?
At the moment the run takes about 90 minutes to finish.
var pages = new List<string>( urls );
var sources = new BlockingCollection<string>();
string htmlCode = "";
Random randomN = new Random();
Parallel.ForEach(pages, x =>
{
x = x.Replace("//", "http://");
int sleep = randomN.Next(0, 1800000);
Thread.Sleep(sleep);
HttpWebResponse response = null;
CookieContainer cookie = new CookieContainer();
HttpWebRequest newRequest = (HttpWebRequest)WebRequest.Create(x);
newRequest.Timeout = 15000;
newRequest.Proxy = null;
newRequest.CookieContainer = cookie;
newRequest.UserAgent = "Mozilla / 5.0(Windows NT 5.1) AppleWebKit / 537.11(KHTML, like Gecko) Chrome / 23.0.1300.0 Iron / 23.0.1300.0 Safari / 537.11";
newRequest.ContentLength = 0;
response = (HttpWebResponse)newRequest.GetResponse();
htmlCode = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding(1252)).ReadToEnd();
sources.Add(htmlCode);
});
return sources;

It is not entirely clear what you are trying to achieve, but Thread.Sleep inside Parallel.ForEach is probably not the right option. Try going through the threads below; they might help you solve your issue.
[1]. Parallel.ForEach using Thread.Sleep equivalent
[2]. Thread.Sleep blocking parallel execution of tasks
[3]. Parallel ForEach wait 500 ms before spawning

You don't want to sleep inside your Parallel.ForEach body--that'll just tie up the worker threads in the .NET thread pool.
This is what Task.Delay is for. Just create a bunch of delayed tasks and then do a WaitAll on them.
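A minimal sketch of that idea, assuming the per-page work can be expressed as an asynchronous function. The names DelayedFetch, RunSpread and DelayThenRun are hypothetical and not part of any library; only Task.Delay and Task.WaitAll come from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DelayedFetch
{
    // Runs `work` once per item, each after its own random delay in [0, maxDelay).
    // No thread is blocked while the delays are pending.
    public static TResult[] RunSpread<T, TResult>(
        IEnumerable<T> items, TimeSpan maxDelay, Func<T, Task<TResult>> work)
    {
        // Random is not thread-safe, so it is only used here on the calling thread.
        var random = new Random();
        var tasks = new List<Task<TResult>>();
        foreach (var item in items)
        {
            var delay = TimeSpan.FromMilliseconds(
                random.NextDouble() * maxDelay.TotalMilliseconds);
            tasks.Add(DelayThenRun(item, delay, work));
        }

        // Blocks the caller until every delayed task has finished.
        Task.WaitAll(tasks.ToArray());
        return tasks.Select(t => t.Result).ToArray();
    }

    private static async Task<TResult> DelayThenRun<T, TResult>(
        T item, TimeSpan delay, Func<T, Task<TResult>> work)
    {
        // ConfigureAwait(false) avoids resuming on a captured UI context,
        // which would deadlock with the blocking WaitAll above.
        await Task.Delay(delay).ConfigureAwait(false);
        return await work(item).ConfigureAwait(false);
    }
}
```

For the question's scenario, maxDelay would be TimeSpan.FromMinutes(30) and work would wrap an asynchronous download such as WebRequest.GetResponseAsync. Results come back in the same order as the input items, because the task list preserves it.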

Related

How to add a timeout to an asynchronous FtpWebRequest

I have the following code that works fine for sending a file by FTP, but it blocks my UI.
FtpWebRequest request = (FtpWebRequest)WebRequest.Create(ftpUrl + filename);
request.UsePassive = false;
request.Method = WebRequestMethods.Ftp.UploadFile;
request.Credentials = new NetworkCredential(ftpUser, ftpPass);
request.Timeout = 10000; //10 second timeout
byte[] fileContents = File.ReadAllBytes(fullPath);
request.ContentLength = fileContents.Length;
//Stream requestStream = await request.GetRequestStreamAsync();
Stream requestStream = request.GetRequestStream();
requestStream.Write(fileContents, 0, fileContents.Length);
requestStream.Close();
I want to switch the Stream to the commented line so I call asynchronously and don't block my UI, and it works fine, except for the timeout, which according to the docs is for synchronous use only.
The question is how to make a timeout work on an async call?
According to the documentation for the FtpWebRequest.Timeout property, Timeout is the number of milliseconds that a synchronous request made with the GetResponse method waits for a response and that the GetRequestStream method waits for a stream. So there is no built-in API for applying it to asynchronous calls.
One way around this may be to put the FtpWebRequest code into a Task and try it that way.
// Start a new task (this launches a new thread)
Task.Factory.StartNew (() => {
// Do some work on a background thread, allowing the UI to remain responsive
DoSomething();
// When the background work is done, continue with this code block
}).ContinueWith (task => {
DoSomethingOnTheUIThread();
// the following forces the code in the ContinueWith block to be run on the
// calling thread, often the Main/UI thread.
}, TaskScheduler.FromCurrentSynchronizationContext ());
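Another way to get a timeout on the asynchronous call itself is to race it against Task.Delay and abort the request if the delay wins. The following is only a sketch under that assumption: AsyncTimeout and WithTimeout are hypothetical names, and the onTimeout callback is where something like request.Abort would go.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncTimeout
{
    // Completes with the operation's result, or throws TimeoutException
    // if the operation has not finished when the timeout elapses.
    public static async Task<T> WithTimeout<T>(
        Task<T> operation, TimeSpan timeout, Action onTimeout)
    {
        Task finished = await Task.WhenAny(operation, Task.Delay(timeout))
            .ConfigureAwait(false);

        if (finished == operation)
        {
            // Awaiting again propagates the result or the original exception.
            return await operation.ConfigureAwait(false);
        }

        // The delay won the race: let the caller cancel the underlying work.
        onTimeout();
        throw new TimeoutException(
            "The operation did not complete within " + timeout + ".");
    }
}
```

With the code from the question this could be used as `Stream requestStream = await AsyncTimeout.WithTimeout(request.GetRequestStreamAsync(), TimeSpan.FromSeconds(10), request.Abort);`. The UI stays responsive because nothing blocks while waiting.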

BlackBerry - App freezes when background thread executing

I have a BlackBerry app that sends data over a web service when a button has its state set to ON. When the button is ON, a timer is started that runs continuously in the background at fixed intervals. The method for HttpConnection is called as follows:
if(C0NNECTION_EXTENSION==null)
{
Dialog.alert("Check internet connection and try again");
return;
}
else
{
confirmation=PostMsgToServer(encryptedMsg);
}
The method PostMsgToServer is as follows:
public static String PostMsgToServer(String encryptedGpsMsg) {
//httpURL= "https://prerel.track24c4i.com/track24prerel/service/spi/post?access_id="+DeviceBoardPassword+"&IMEI="+IMEI+"&hex_data="+encryptedGpsMsg+"&device_type=3";
httpURL= "https://t24.track24c4i.com/track24c4i/service/spi/post?access_id="+DeviceBoardPassword+"&IMEI="+IMEI+"&hex_data="+encryptedGpsMsg+"&device_type=3";
//httpURL= "http://track24.unit1.overwatch/track24/service/spi/post?access_id="+DeviceBoardPassword+"&IMEI="+IMEI+"&hex_data="+encryptedGpsMsg+"&device_type=3";
try {
String C0NNECTION_EXTENSION = checkInternetConnection();
if(C0NNECTION_EXTENSION==null)
{
Dialog.alert("Check internet connection and try again");
return null;
}
else
{
httpURL=httpURL+C0NNECTION_EXTENSION+";ConnectionTimeout=120000";
//Dialog.alert(httpURL);
HttpConnection httpConn;
httpConn = (HttpConnection) Connector.open(httpURL);
httpConn.setRequestMethod(HttpConnection.POST);
DataOutputStream _outStream = new DataOutputStream(httpConn.openDataOutputStream());
byte[] request_body = httpURL.getBytes();
for (int i = 0; i < request_body.length; i++) {
_outStream.writeByte(request_body[i]);
}
DataInputStream _inputStream = new DataInputStream(
httpConn.openInputStream());
StringBuffer _responseMessage = new StringBuffer();
int ch;
while ((ch = _inputStream.read()) != -1) {
_responseMessage.append((char) ch);
}
String res = (_responseMessage.toString());
responce = res.trim();
httpConn.close();
}
}catch (Exception e) {
//Dialog.alert("Connection Time out");
}
return responce;
}
My question: the app freezes whenever the method is called, i.e. whenever the timer fires and sends data to the web service. Sometimes it freezes for a few seconds and sometimes for a considerable amount of time, so it appears to the user as if the handset has hung. Can this be solved? Kindly help!
You are running your networking operation on the Event Thread, i.e. the same thread that processes your application's UI interactions. Networking is a blocking operation, so effectively this is stopping your UI. Doing this on the Event Thread is not recommended and, to be honest, I'm surprised it is not causing your application to be terminated, as this is often what the OS will do if it thinks the application has blocked the Event Thread.
The way to solve this is to start your network processing on a separate thread. This is generally the easy part; the difficult parts are:
1) blocking the user from doing anything else while waiting for the response (assuming you need to do this)
2) updating the user interface with the results of your networking processing
I think the second of these issues is discussed in this thread:
adding-field-from-a-nonui-thread-throws-exception-in-blackberry
Since it appears you are trying to do this update at regular intervals in the background, I don't think the first is an issue. For completeness, you can search SO for answers, including this one:
blackberry-please-wait-screen-with-time-out
There is more information regarding the Event Thread here:
Event Thread

Download an undefined number of files with HttpWebRequest.BeginGetResponse

I have to write a small app which downloads a few thousand files. Some of these files contain references to other files that must be downloaded as part of the same process. The following code downloads the initial list of files, but I would like to download the other files as part of the same loop. What is happening here is that the loop completes before the first request comes back. Any idea how to achieve this?
var countdownLatch = new CountdownEvent(Urls.Count);
string url;
while (Urls.TryDequeue(out url))
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.BeginGetResponse(
new AsyncCallback(ar =>
{
using (HttpWebResponse response = (ar.AsyncState as HttpWebRequest).EndGetResponse(ar) as HttpWebResponse)
{
using (var sr = new StreamReader(response.GetResponseStream()))
{
string myFile = sr.ReadToEnd();
// TODO: Look for a reference to another file. If found, queue a new Url.
}
}
}), webRequest);
}
ce.Wait();
One solution which comes to mind is to keep track of the number of pending requests and only finish the loop once no requests are pending and the Url queue is empty:
int requestCounter = 0;
AutoResetEvent requestFinished = new AutoResetEvent(false);
while (true)
{
// Read the counter before checking the queue, so that a URL queued by a
// callback finishing in between is not missed.
bool requestsPending = Interlocked.CompareExchange(ref requestCounter, 0, 0) > 0;
string url;
if (Urls.TryDequeue(out url))
{
Interlocked.Increment(ref requestCounter);
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.BeginGetResponse(
new AsyncCallback(ar =>
{
try {
using (HttpWebResponse response = (ar.AsyncState as HttpWebRequest).EndGetResponse(ar) as HttpWebResponse)
{
using (var sr = new StreamReader(response.GetResponseStream()))
{
string myFile = sr.ReadToEnd();
// TODO: Look for a reference to another file. If found, queue a new Url.
}
}
}
finally {
// Decrement only after any new URLs have been queued, then wake the loop.
Interlocked.Decrement(ref requestCounter);
requestFinished.Set();
}
}), webRequest);
}
else if (requestsPending)
{
// no url but requests are still pending
requestFinished.WaitOne();
}
else
{
// queue is empty and nothing is in flight
break;
}
}
You are trying to write a web crawler. In order to write a good web crawler, you first need to define some parameters...
1) How many requests do you want to download simultaneously? In other words, how much throughput do you want? This will determine things like how many requests you want outstanding, what the thread pool size should be, etc.
2) You will need a queue of URLs. This queue is populated by each request that completes. You now need to decide on the growth strategy of the queue. For example, you cannot have an unbounded queue, because you can pump work items into the queue faster than you can download from the network.
Given this, you can design a system as follows:
Have at most N worker threads that actually download from the web. Each takes one item from the queue and downloads the data, then parses the data and populates your URL queue.
If there are more than M URLs in the queue, the queue blocks and does not allow any more URLs to be queued. Here you can do one of two things: either cause the thread that is enqueuing to block, or just discard the work item being enqueued. Once another work item completes on another thread and a URL is dequeued, the blocked thread will be able to enqueue successfully.
With a system like this, you can ensure that you will not run out of system resources while downloading the data.
Implementation:
Note that if you are using async, then you are using an extra I/O thread to do the download. This is fine, as long as you are mindful of this fact. You can do a pure async implementation, where you have N BeginGetResponse() calls outstanding, and for each one that completes, you start another one. This way you will always have N requests outstanding.
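The worker-and-bounded-queue design described above could be sketched roughly as follows. This is an illustration only: BoundedCrawler, Crawl and the fetchLinks callback (a stand-in for "download a URL and return the URLs it references") are hypothetical names. The sketch takes the "discard" option when the queue is full, because letting workers block on a queue that only they drain could leave all of them stuck.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedCrawler
{
    public static List<string> Crawl(
        IEnumerable<string> seeds, int workers, int maxQueued,
        Func<string, IEnumerable<string>> fetchLinks)
    {
        var queue = new BlockingCollection<string>(maxQueued); // at most M queued URLs
        var seen = new ConcurrentDictionary<string, bool>();   // avoids queuing a URL twice
        var visited = new ConcurrentQueue<string>();
        int pending = 1; // queued or in-progress URLs, plus one token held while seeding

        Action release = () =>
        {
            // When nothing is queued or in progress, let the workers exit.
            if (Interlocked.Decrement(ref pending) == 0) queue.CompleteAdding();
        };

        // N dedicated workers; LongRunning because they block while waiting for work.
        var tasks = Enumerable.Range(0, workers).Select(_ => Task.Factory.StartNew(() =>
        {
            foreach (var url in queue.GetConsumingEnumerable())
            {
                try
                {
                    visited.Enqueue(url);
                    foreach (var link in fetchLinks(url))
                    {
                        if (!seen.TryAdd(link, true)) continue;
                        Interlocked.Increment(ref pending);
                        // Queue full: discard the URL instead of blocking this worker.
                        if (!queue.TryAdd(link)) Interlocked.Decrement(ref pending);
                    }
                }
                catch (Exception)
                {
                    // Sketch only: a failed download is skipped; real code should log or retry.
                }
                finally { release(); }
            }
        }, TaskCreationOptions.LongRunning)).ToArray();

        foreach (var seed in seeds)
        {
            if (!seen.TryAdd(seed, true)) continue;
            Interlocked.Increment(ref pending);
            queue.Add(seed); // the seeding thread is not a worker, so it may block safely
        }
        release(); // drop the seeding token
        Task.WaitAll(tasks);
        return visited.ToList();
    }
}
```

The pending counter plays the same role as the request counter in the earlier answer: the crawl is finished only when the queue is empty and no download is still in progress. A discarded URL stays marked as seen, so this sketch never retries it.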

Task is ignoring Thread.Sleep

I'm trying to grasp the TPL.
Just for fun I tried to create some Tasks with a random sleep to see how they were processed. I was aiming for a fire-and-forget pattern.
static void Main(string[] args)
{
Console.WriteLine("Demonstrating a successful transaction");
Random d = new Random();
for (int i = 0; i < 10; i++)
{
var sleep = d.Next(100, 2000);
Action<int> succes = (int x) =>
{
Thread.Sleep(x);
Console.WriteLine("sleep={2}, Task={0}, Thread={1}: Begin successful transaction",
Task.CurrentId, Thread.CurrentThread.ManagedThreadId, x);
};
Task t1 = Task.Factory.StartNew(() => succes(sleep));
}
Console.ReadLine();
}
But I don't understand why it outputs all lines to the Console ignoring the Sleep(random)
Can someone explain that to me?
Important:
The TPL default TaskScheduler does not guarantee Thread per Task - one thread can be used for processing several tasks.
Calling Thread.Sleep might impact other tasks performance.
You can construct your task with the TaskCreationOptions.LongRunning hint; this way the TaskScheduler will assign a dedicated thread to the task and it will be safe to block on it.
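As a small illustration of the LongRunning hint, here is a sketch in which every task blocks in Thread.Sleep. BlockingJobs and RunBlockingJobs are hypothetical names; Task.Factory.StartNew and TaskCreationOptions.LongRunning are the real TPL members mentioned above.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingJobs
{
    // Starts one task per entry; each task sleeps for the given number of
    // milliseconds and then returns that number.
    public static int[] RunBlockingJobs(int[] sleepMs)
    {
        var tasks = sleepMs.Select(ms => Task.Factory.StartNew(() =>
        {
            // Blocking is acceptable here: LongRunning hints the scheduler
            // to keep this work off the shared thread-pool threads.
            Thread.Sleep(ms);
            return ms;
        }, TaskCreationOptions.LongRunning)).ToArray();

        // Wait for all tasks at once instead of waiting inside the loop,
        // so the sleeps overlap rather than run one after another.
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

LongRunning is only a hint to the scheduler, so this is best reserved for a small number of genuinely blocking tasks.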
Your code uses the value of i instead of the generated random number. It does not ignore the sleep but rather sleeps between 0 and 10ms each iteration.
Try:
Thread.Sleep(sleep);
The statement
Task t1 = Task.Factory.StartNew(() => succes(sleep));
creates the Task and starts it automatically, then the for loop moves on to the next iteration without waiting for the task to finish. So by the time the second task is created and executed, the first one may already be finished. In other words, you are not waiting for the tasks to end.
You should try:
Task t1 = Task.Factory.StartNew(() => succes(sleep));
t1.Wait();

Parallel Task advice

I am trying to use the parallel task library to kick off a number of tasks like this:
var workTasks = _schedules.Where(x => x.Task.Enabled);
_tasks = new Task[workTasks.Count()];
_cancellationTokenSource = new CancellationTokenSource();
_cancellationTokenSource.Token.ThrowIfCancellationRequested();
int i = 0;
foreach (var schedule in _schedules.Where(x => x.Task.Enabled))
{
_log.InfoFormat("Reading task information for task {0}", schedule.Task.Name);
if(!schedule.Task.Enabled)
{
_log.InfoFormat("task {0} disabled.", schedule.Task.Name);
i++;
continue;
}
schedule.Task.ServiceStarted = true;
_tasks[i] = Task.Factory.StartNew(() =>
schedule.Task.Run()
, _cancellationTokenSource.Token);
i++;
_log.InfoFormat("task {0} has been added to the worker threads and has been started.", schedule.Task.Name);
}
I want these tasks to sleep and then wake up every 5 minutes and do their stuff, at the moment I am using Thread.Sleep in the Schedule object whose Run method is the Action that is passed into StartNew as an argument like this:
_tasks[i] = Task.Factory.StartNew(() =>
schedule.Task.Run()
, _cancellationTokenSource.Token);
I read somewhere that Thread.Sleep is a bad solution for this. Can anyone recommend a better approach?
By my understanding, Thread.Sleep is bad generally, because it force-shifts everything out of memory even when that's not necessary. It won't be a big deal in most cases, but it could be a performance issue.
I'm in the habit of using this snippet instead:
new System.Threading.EventWaitHandle(false, EventResetMode.ManualReset).WaitOne(1000);
Fits on one line, and isn't overly complicated -- it creates an event handle that will never be set, and then waits for the full timeout period before continuing.
Anyway, if you're just trying to have something repeat every 5 minutes, a better approach would probably be to use a Timer. You could even make a class to neatly wrap everything if your repeated work methods are already factored out:
using System;
using System.Timers;
public class WorkRepeater
{
private readonly Timer m_Timer;
public WorkRepeater(Action workToRepeat, TimeSpan interval)
{
// TotalMilliseconds is the full interval; Milliseconds is only the 0-999 ms component.
m_Timer = new Timer(interval.TotalMilliseconds);
m_Timer.Elapsed += (o, ea) => workToRepeat();
}
public void Start()
{
m_Timer.Start();
}
public void Stop()
{
m_Timer.Stop();
}
}
Tasks are the bad solution here. A Task should be used for short-lived operations, such as asynchronous I/O. If you want to control the lifetime of the work, you should use a Thread and sleep as much as you like, because a Thread is dedicated to your work, whereas Tasks are rotated through the thread pool, which is shared.

Resources