WebRequest Threads Blocking at Two Requests

I need to test a Data Context and see how it behaves under multiple simultaneous requests. For that I made a simple console application that (in theory) would send these requests:
private static DateTime startTime = DateTime.Now.AddSeconds(5);
public static Random rand = new Random();

static void Main(string[] args)
{
    const byte testThreads = 10;
    ThreadStart[] threadStarts = new ThreadStart[testThreads];
    Thread[] threads = new Thread[testThreads];
    for (byte i = 0; i < testThreads; i++)
    {
        threadStarts[i] = new ThreadStart(ExecutePOST);
        threads[i] = new Thread(threadStarts[i]);
    }
    for (byte i = 0; i < testThreads; i++) { threads[i].Start(); }
    for (byte i = 0; i < testThreads; i++) { threads[i].Join(); }
}
The called function is:
private static void ExecutePOST()
{
    // Busy-wait so that all threads start sending at roughly the same moment
    while (DateTime.Now < startTime) { }
    Console.WriteLine("{0} STARTING TEST", DateTime.Now.Millisecond);
    WebRequest webRequest = WebRequest.Create(/*URL*/);
    webRequest.ContentType = "application/x-www-form-urlencoded";
    webRequest.Method = "POST";
    string name = string.Format("Test {0}", Program.rand.Next(1000));
    byte[] bytes = Encoding.ASCII.GetBytes(/*PARAMETERS*/);
    Stream output = null;
    try
    {
        webRequest.ContentLength = bytes.Length;
        output = webRequest.GetRequestStream();
        output.Write(bytes, 0, bytes.Length);
        Console.WriteLine("{0}:{1}", DateTime.Now.Millisecond, name);
    }
    catch (WebException ex)
    {
        // error handling omitted
    }
    finally
    {
        if (output != null)
        {
            output.Close();
        }
    }
}
The output I get is:
Can anyone please explain this behavior? Why is it stopping after two requests?
Thank you

Yes, this is because the number of simultaneous connections per host is limited to 2 by default (see ServicePointManager.DefaultConnectionLimit); the connections are pooled.
You're hogging the connection by writing data to the request stream, but then never getting the response. A simple:
using (webRequest.GetResponse()) {}
at the end of the method is likely to sort it out. That will finish the request and release the connection to be used by another request.
Also note that a using statement for the output stream would make your code simpler too.
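To make both suggestions concrete, here is a minimal sketch of the request with the request stream and the response each wrapped in a using statement. PostHelper, Post and AllowConnections are names made up for this example, and it assumes the classic WebRequest stack used in the question; raising the limit is optional and only matters if you really want more than two requests in flight to the same host.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the original code.
public static class PostHelper
{
    // Optionally raise the per-host connection limit (2 by default for client apps).
    public static void AllowConnections(int perHostLimit)
    {
        ServicePointManager.DefaultConnectionLimit = perHostLimit;
    }

    // Sends a form-encoded POST and returns the response body.
    public static string Post(string url, string formData)
    {
        byte[] body = Encoding.ASCII.GetBytes(formData);
        WebRequest request = WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;

        // Disposing the request stream finishes the upload.
        using (Stream output = request.GetRequestStream())
        {
            output.Write(body, 0, body.Length);
        }

        // Reading and disposing the response hands the pooled connection back,
        // so the next waiting request can use it.
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Each test thread would then call PostHelper.Post(url, parameters) instead of writing to the stream by hand.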


NAudio Mp3 decoding click and pops

I followed this NAudio demo, modified to play SHOUTcast streams.
In my full code I have to resample the incoming audio and stream it again over the network to a network player. Since I get many "clicks and pops", I went back to the demo code and found that these artifacts originate after the decoding block.
If I save the incoming stream in MP3 format, it sounds clean.
When I save the raw decoded data (with no processing other than the decoder), I get many audio artifacts.
I wonder whether I am making a mistake, even though my code is almost identical to the NAudio demo.
Here is the function from the example, modified by me to save the raw data. It is started on a new Thread.
private void StreamMP3(object state)
//Configuration config = ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.None);
//SettingsSection section = (SettingsSection)config.GetSection("system.net/settings");
this.fullyDownloaded = false;
string url = "http://icestreaming.rai.it/5.mp3";//(string)state;
webRequest = (HttpWebRequest)WebRequest.Create(url);
int metaInt = 0; // blocksize of mp3 data
int framesize = 0;
webRequest.Headers.Add("GET", "/ HTTP/1.0");
// needed to receive metadata informations
webRequest.Headers.Add("Icy-MetaData", "1");
webRequest.UserAgent = "WinampMPEG/5.09";
HttpWebResponse resp = null;
resp = (HttpWebResponse)webRequest.GetResponse();
catch (WebException e)
if (e.Status != WebExceptionStatus.RequestCanceled)
byte[] buffer = new byte[16384 * 4]; // needs to be big enough to hold a decompressed frame
// read blocksize to find metadata block
metaInt = Convert.ToInt32(resp.GetResponseHeader("icy-metaint"));
IMp3FrameDecompressor decompressor = null;
byteOut = createNewFile(destPath, "salva", "raw");
using (var responseStream = resp.GetResponseStream())
var readFullyStream = new ReadFullyStream(responseStream);
readFullyStream.metaInt = metaInt;
if (mybufferedWaveProvider != null && mybufferedWaveProvider.BufferLength - mybufferedWaveProvider.BufferedBytes < mybufferedWaveProvider.WaveFormat.AverageBytesPerSecond / 4)
Debug.WriteLine("Buffer getting full, taking a break");
Mp3Frame frame = null;
frame = Mp3Frame.LoadFromStream(readFullyStream, true);
if (metaInt > 0)
UpdateSongName("No Song Info in Stream...");
catch (EndOfStreamException)
this.fullyDownloaded = true;
// reached the end of the MP3 file / stream
catch (WebException)
// probably we have aborted download from the GUI thread
if (decompressor == null)
// don't think these details matter too much - just help ACM select the right codec
// however, the buffered provider doesn't know what sample rate it is working at
// until we have a frame
WaveFormat waveFormat = new Mp3WaveFormat(frame.SampleRate, frame.ChannelMode == ChannelMode.Mono ? 1 : 2, frame.FrameLength, frame.BitRate);
decompressor = new AcmMp3FrameDecompressor(waveFormat);
this.mybufferedWaveProvider = new BufferedWaveProvider(decompressor.OutputFormat);
this.mybufferedWaveProvider.BufferDuration = TimeSpan.FromSeconds(200); // allow us to get well ahead of ourselves
framesize = (decompressor.OutputFormat.Channels * decompressor.OutputFormat.SampleRate * (decompressor.OutputFormat.BitsPerSample / 8) * 20) / 1000;
//this.bufferedWaveProvider.BufferedDuration = 250;
int decompressed = decompressor.DecompressFrame(frame, buffer, 0);
//Debug.WriteLine(String.Format("Decompressed a frame {0}", decompressed));
mybufferedWaveProvider.AddSamples(buffer, 0, decompressed);
while (mybufferedWaveProvider.BufferedDuration.Milliseconds >= 20)
byte[] read = new byte[framesize];
mybufferedWaveProvider.Read(read, 0, framesize);
byteOut.Write(read, 0, framesize);
} while (playbackState != StreamingPlaybackState.Stopped);
// was doing this in a finally block, but for some reason
// we are hanging on response stream .Dispose so never get there
if (decompressor != null)
OK, I found the problem: I was including the SHOUTcast metadata in the data passed to Mp3Frame.
See the comment "HERE I COLLECT THE BYTES OF THE MP3 FRAME" to locate the correct point at which to collect the MP3 frame bytes without the streaming metadata.
The following code runs without audio artifacts:
private void SHOUTcastReceiverThread()
//-*- String server = "";
//String serverPath = "/";
//String destPath = "C:\\temp\\"; // destination path for saved songs
HttpWebRequest request = null; // web request
HttpWebResponse response = null; // web response
int metaInt = 0; // blocksize of mp3 data
int count = 0; // byte counter
int metadataLength = 0; // length of metadata header
string metadataHeader = ""; // metadata header that contains the actual songtitle
string oldMetadataHeader = null; // previous metadata header, to compare with new header and find next song
//CircularQueueStream framestream = new CircularQueueStream(2048);
QueueStream framestream = new QueueStream();
framestream.Position = 0;
bool bNewSong = false;
byte[] buffer = new byte[512]; // receive buffer
byte[] dec_buffer = new byte[decSIZE];
Mp3Frame frame;
IMp3FrameDecompressor decompressor = null;
Stream socketStream = null; // input stream on the web request
// create web request
request = (HttpWebRequest)WebRequest.Create(server);
// clear old request header and build own header to receive ICY-metadata
request.Headers.Add("GET", serverPath + " HTTP/1.0");
request.Headers.Add("Icy-MetaData", "1"); // needed to receive metadata informations
request.UserAgent = "WinampMPEG/5.09";
// execute request
response = (HttpWebResponse)request.GetResponse();
catch (Exception ex)
// read blocksize to find metadata header
metaInt = Convert.ToInt32(response.GetResponseHeader("icy-metaint"));
// open stream on response
socketStream = response.GetResponseStream();
var readFullyStream = new ReadFullyStream(socketStream);
frame = null;
// rip stream in an endless loop
if (IsBufferNearlyFull)
Debug.WriteLine("Buffer getting full, taking a break");
frame = null;
int bufLen = readFullyStream.Read(buffer, 0, buffer.Length);
if (framestream.CanRead && framestream.Length > 512)
frame = Mp3Frame.LoadFromStream(framestream);
frame = null;
catch (Exception ex)
frame = null;
if (bufLen < 0)
Debug.WriteLine("Buffer error 1: exit.");
// processing RAW data
for (int i = 0; i < bufLen; i++)
// if there is a header, the 'headerLength' would be set to a value != 0. Then we save the header to a string
if (metadataLength != 0)
metadataHeader += Convert.ToChar(buffer[i]);
if (metadataLength == 0) // all metadata informations were written to the 'metadataHeader' string
string fileName = "";
string fileNameRaw = "";
// if songtitle changes, create a new file
if (!metadataHeader.Equals(oldMetadataHeader))
// flush and close old byteOut stream
if (byteOut != null)
byteOut = null;
if (byteOutRaw != null)
byteOutRaw = null;
timeStart = timeEnd;
// extract songtitle from metadata header. Trim was needed, because some stations don't trim the songtitle
//fileName = Regex.Match(metadataHeader, "(StreamTitle=')(.*)(';StreamUrl)").Groups[2].Value.Trim();
fileName = Regex.Match(metadataHeader, "(StreamTitle=')(.*)(';)").Groups[2].Value.Trim();
// write new songtitle to console for information
if (fileName.Length == 0)
fileName = "shoutcast_test";
fileNameRaw = fileName + "_raw";
SongChanged(this, metadataHeader);
bNewSong = true;
// create new file with the songtitle from header and set a stream on this file
timeEnd = DateTime.Now;
if (bWrite_to_file)
byteOut = createNewFile(destPath, fileName, "mp3");
byteOutRaw = createNewFile(destPath, fileNameRaw, "raw");
timediff = timeEnd - timeStart;
// save new header to 'oldMetadataHeader' string, to compare if there's a new song starting
oldMetadataHeader = metadataHeader;
metadataHeader = "";
else // write mp3 data to file or extract metadata headerlength
if (count++ < metaInt) // write bytes to filestream
// HERE I COLLECT THE BYTES OF THE MP3 FRAME
framestream.Write(buffer, i, 1);
else // get headerlength from lengthbyte and multiply by 16 to get correct headerlength
metadataLength = Convert.ToInt32(buffer[i]) * 16;
count = 0;
if (bNewSong)
decompressor = createDecompressor(frame);
bNewSong = false;
if (frame != null && decompressor != null)
framedec(decompressor, frame);
// fine Processing dati RAW
}//Buffer is not full
} while (playbackState != StreamingPlaybackState.Stopped);
} //try
catch (Exception ex)
if (byteOut != null)
if (socketStream != null)
if (decompressor != null)
decompressor = null;
if (null != request)
if (null != framestream)
if (null != bufferedWaveProvider)
//if (null != bufferedWaveProviderOut)
// bufferedWaveProviderOut.ClearBuffer();
if (null != mono16bitFsinStream)
if (null != middleStream2)
if (null != resampler)
public class QueueStream : MemoryStream
{
    long ReadPosition = 0;
    long WritePosition = 0;

    public QueueStream() : base() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        Position = ReadPosition;
        var temp = base.Read(buffer, offset, count);
        ReadPosition = Position;
        return temp;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        Position = WritePosition;
        base.Write(buffer, offset, count);
        WritePosition = Position;
    }

    public void reSetPosition()
    {
        WritePosition = 0;
        ReadPosition = 0;
        Position = 0;
    }
}
private void framedec(IMp3FrameDecompressor decompressor, Mp3Frame frame)
{
    int Ndecoded_samples = 0;
    byte[] dec_buffer = new byte[decSIZE];
    Ndecoded_samples = decompressor.DecompressFrame(frame, dec_buffer, 0);
    bufferedWaveProvider.AddSamples(dec_buffer, 0, Ndecoded_samples);
    NBufferedSamples += Ndecoded_samples;
    if (Ndecoded_samples > decSIZE)
    {
        Debug.WriteLine(String.Format("Too many samples {0}", Ndecoded_samples));
    }
    if (byteOut != null)
    {
        byteOut.Write(frame.RawData, 0, frame.RawData.Length);
    }
    if (byteOutRaw != null) // as long as we don't have a songtitle, we don't open a new file and don't write any bytes
    {
        byteOutRaw.Write(dec_buffer, 0, Ndecoded_samples);
    }
    frame = null;
}
private IMp3FrameDecompressor createDecompressor(Mp3Frame frame)
{
    IMp3FrameDecompressor dec = null;
    if (frame != null)
    {
        // don't think these details matter too much - just help ACM select the right codec
        // however, the buffered provider doesn't know what sample rate it is working at
        // until we have a frame
        WaveFormat srcwaveFormat = new Mp3WaveFormat(frame.SampleRate, frame.ChannelMode == ChannelMode.Mono ? 1 : 2, frame.FrameLength, frame.BitRate);
        dec = new AcmMp3FrameDecompressor(srcwaveFormat);
        bufferedWaveProvider = new BufferedWaveProvider(dec.OutputFormat);// decompressor.OutputFormat
        bufferedWaveProvider.BufferDuration = TimeSpan.FromSeconds(400); // allow us to get well ahead of ourselves
        // ------------------------------------------------
        //Create an intermediate format with same sampling rate, 16 bit, mono
        middlewavformat = new WaveFormat(dec.OutputFormat.SampleRate, 16, 1);
        outwavFormat = new WaveFormat(Fs_out, 16, 1);
        // wave16ToFloat = new Wave16ToFloatProvider(provider); // I have tried with and without this converter.
        wpws = new WaveProviderToWaveStream(bufferedWaveProvider);
        //Check middlewavformat.Encoding == WaveFormatEncoding.Pcm;
        mono16bitFsinStream = new WaveFormatConversionStream(middlewavformat, wpws);
        middleStream2 = new BlockAlignReductionStream(mono16bitFsinStream);
        resampler = new MediaFoundationResampler(middleStream2, outwavFormat);
    }
    return dec;
}
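The essential part of the fix, keeping the interleaved metadata out of the bytes handed to the MP3 frame parser, can be shown in isolation. The sketch below is not the code from the answer: IcyDemux and Split are hypothetical names, and it assumes the usual SHOUTcast layout in which every icy-metaint audio bytes are followed by one length byte (metadata length divided by 16) and then that many metadata bytes. Decoding the metadata as ASCII is also an assumption; some stations send other encodings.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper illustrating ICY metadata removal.
public static class IcyDemux
{
    // Copies only the audio bytes of an ICY stream to audioOut and
    // returns the metadata blocks that were skipped along the way.
    public static List<string> Split(Stream icyStream, int metaInt, Stream audioOut)
    {
        if (metaInt <= 0) throw new ArgumentOutOfRangeException("metaInt");

        var metadataBlocks = new List<string>();
        var audio = new byte[metaInt];
        while (true)
        {
            // 1. Exactly metaInt bytes of MP3 data.
            int got = ReadFully(icyStream, audio, metaInt);
            audioOut.Write(audio, 0, got);
            if (got < metaInt) break;            // stream ended

            // 2. One length byte; the metadata block is 16 times that long.
            int lengthByte = icyStream.ReadByte();
            if (lengthByte < 0) break;           // stream ended
            int metaLength = lengthByte * 16;
            if (metaLength == 0) continue;       // no metadata this time

            // 3. The metadata itself, padded with zero bytes.
            var meta = new byte[metaLength];
            if (ReadFully(icyStream, meta, metaLength) < metaLength) break;
            metadataBlocks.Add(Encoding.ASCII.GetString(meta).TrimEnd('\0'));
        }
        return metadataBlocks;
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    private static int ReadFully(Stream source, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = source.Read(buffer, total, count - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Feeding audioOut (rather than the raw response stream) to Mp3Frame.LoadFromStream is what keeps metadata bytes from being decoded as audio.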

Recursive linkscraper c#

I've been struggling with this for a whole day now and I can't seem to figure it out.
I have a function that gives me a list of all links on a specific URL. That works fine.
However, I want to make this function recursive, so that it follows the links found by the first search, adds the links it finds there to the list, and continues until it has gone through all the pages on my website.
How can I make this recursive?
My code:
class Program
public static List<LinkItem> urls;
private static List<LinkItem> newUrls = new List<LinkItem>();
static void Main(string[] args)
WebClient w = new WebClient();
int count = 0;
urls = new List<LinkItem>();
newUrls = new List<LinkItem>();
urls.Add(new LinkItem{Href = "http://www.smartphoto.be", Text = ""});
while (urls.Count > 0)
foreach (var url in urls)
if (RemoteFileExists(url.Href))
string s = w.DownloadString(url.Href);
urls = newUrls.Select(x => new LinkItem{Href = x.Href, Text=""}).ToList();
count += newUrls.Count;
Console.Write("Found: " + count + " links.");
private static void ReturnLinks()
foreach (LinkItem i in urls)
private static bool RemoteFileExists(string url)
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
request.Method = "HEAD";
//Getting the Web Response.
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
//Returns TRUE if the Status code == 200
return (response.StatusCode == HttpStatusCode.OK);
return false;
The code behind LinkFinder.Find can be found here: http://www.dotnetperls.com/scraping-html
Does anyone know how I can make that function recursive, or whether I can make the ReturnLinks function recursive instead? I would prefer not to touch the LinkFinder.Find method, as it works perfectly for one link; I just need to be able to call it as many times as needed to expand my final URL list.
I assume you want to load each link and find the links within it, continuing until you run out of links?
Since the recursion depth could get very large, I would avoid recursion. This should work, I think:
WebClient w = new WebClient();
int count = 0;
urls = new List<string>();
newUrls = new List<LinkItem>();
while (urls.Count > 0)
foreach(var url in urls)
string s = w.DownloadString(url);
urls = newUrls.Select(x=>x.Href).ToList();
count += newUrls.Count;
Console.Write("Found: " + count + " links.");
static void Main()
{
    WebClient w = new WebClient();
    List<LinkItem> allUrls = FindAll(w.DownloadString("http://www.google.be"));
}
private static List<LinkItem> FindAll(string address)
{
    List<LinkItem> list = new List<LinkItem>();
    foreach (var url in LinkFinder.Find(address))
    {
        list.AddRange(FindAll(url.Address)); // or url.ToString() or whatever string represents the address
    }
    return list;
}
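Neither version above keeps track of pages that were already visited, so pages that link back to each other get fetched again, and the recursive version would never terminate on such a cycle. A minimal sketch of the non-recursive idea with a visited set follows. Crawler and Crawl are names made up for this example, and getLinks is a stand-in for "download the page and run LinkFinder.Find on it, returning the Href values"; passing it in as a delegate is an assumption made so the loop can be tried without a network.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating a queue-based crawl.
public static class Crawler
{
    // Visits pages breadth-first, starting at startUrl, and returns them
    // in the order visited. maxPages is a safety cap for large sites.
    public static List<string> Crawl(
        string startUrl,
        Func<string, IEnumerable<string>> getLinks,
        int maxPages)
    {
        var seen = new HashSet<string>();     // every URL ever queued
        var pending = new Queue<string>();    // URLs still to be fetched
        var visited = new List<string>();

        seen.Add(startUrl);
        pending.Enqueue(startUrl);

        while (pending.Count > 0 && visited.Count < maxPages)
        {
            string current = pending.Dequeue();
            visited.Add(current);

            foreach (string link in getLinks(current))
            {
                // HashSet.Add returns false for URLs seen before,
                // which is what breaks cycles between pages.
                if (seen.Add(link))
                {
                    pending.Enqueue(link);
                }
            }
        }
        return visited;
    }
}
```

In the real program, getLinks would also need to resolve relative links and skip links that leave the site; both are left out here.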

Multithreaded Server using TCP in Java

I'm trying to implement a simple TCP connection between Client/Server. I made the Server multithreaded so that it can take either multiple requests (such as finding the sum, max, min of a string of numbers provided by the user) from a single client or accept multiple connections from different clients. I'm running both of them on my machine, but the server doesn't seem to push out an answer. Not sure what I'm doing wrong here --
public final class CalClient {
static final int PORT_NUMBER = 6789;
public static void main (String arg[]) throws Exception
String serverName;
String strListOfNumbers = null;
int menuIndex;
boolean exit = false;
BufferedReader inFromUser = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Please enter host name...");
System.out.print("> ");
serverName = inFromUser.readLine();
Socket clientSocket = new Socket(serverName, PORT_NUMBER);
DataOutputStream outToServer = new DataOutputStream(clientSocket.getOutputStream());
BufferedReader inFromServer = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
//outToServer.writeBytes(serverName + '\n');
System.out.println("Enter 1 to enter the list of numbers");
System.out.println("Enter 2 to perform Summation");
System.out.println("Enter 3 to calculate Maximum");
System.out.println("Enter 4 to calculate Minimum");
System.out.println("Enter 5 to Exit");
while (!exit) {
menuIndex = Integer.parseInt(inFromUser.readLine());
if (menuIndex == 1) {
System.out.println("Please enter the numbers separated by commas.");
strListOfNumbers = inFromUser.readLine();
outToServer.writeBytes("List" + strListOfNumbers);
else if (menuIndex == 2) {
else if (menuIndex == 3) {
else if (menuIndex == 4) {
else if (menuIndex == 5) {
exit = true;
public final class CalServer
static final int PORT_NUMBER = 6789;
public static void main(String[] args) throws IOException
try {
ServerSocket welcomeSocket = new ServerSocket(PORT_NUMBER);
while (true) {
Socket connectionSocket = welcomeSocket.accept();
if (connectionSocket != null) {
CalRequest request = new CalRequest(connectionSocket);
Thread thread = new Thread(request);
} catch (IOException ioe) {
System.out.println("IOException on socket listen: " + ioe);
final class CalRequest implements Runnable
Socket socket;
BufferedReader inFromClient;
DataOutputStream outToClient;
TreeSet<Integer> numbers = new TreeSet<Integer>();
int sum = 0;
public CalRequest(Socket socket)
this.socket = socket;
public void run()
try {
inFromClient = new BufferedReader(new InputStreamReader(socket.getInputStream()));
outToClient = new DataOutputStream(socket.getOutputStream());
while(inFromClient.readLine()!= null) {
} catch (IOException e) {
public void processRequest(String string) throws IOException
String strAction = string.substring(0,3);
if (strAction.equals("LIS")) {
String strNumbers = string.substring(5);
String[] strNumberArr;
strNumberArr = strNumbers.split(",");
// convert each element of the string array to type Integer and add it to a treeSet container.
for (int i=0; i<strNumberArr.length; i++)
numbers.add(new Integer(Integer.parseInt(strNumberArr[i])));
else if (strAction.equals("SUM")) {
Iterator it = numbers.iterator();
int total = 0;
while (it.hasNext()) {
total += (Integer)(it.next());
else if (strAction.equals("MAX")) {
outToClient.writeBytes("The max is: " + Integer.toString(numbers.last()));
else if (strAction.equals("MIN")) {
outToClient.writeBytes("The min is: " + Integer.toString(numbers.first()));
Since you are using readLine(), I would guess that you actually need to send line terminators.
My experience with TCP socket communications uses ASCII data exclusively, and I believe my code reflects that. If that's the case for you as well, you may want to try this:
First, try instantiating your data streams like this:
socket = new Socket (Dest, Port);
toServer = new PrintWriter (socket.getOutputStream(), true);
fromServer = new BufferedReader (new InputStreamReader
(socket.getInputStream()), 8000);
The true at the end of the PrintWriter constructor tells it to auto-flush (lovely term) the buffer when you issue a println.
When you actually use the socket, use the following:
toServer.println (msg.trim());
resp = fromServer.readLine().trim();
I don't have to append the \n to the outgoing text myself, but this may be related to my specific situation (more on that below). The incoming data needs to have a \n at its end or readLine doesn't work. I assume there are ways you could read from the socket byte by byte, but also that the code would not be nearly so simple.
Unfortunately, the TCP server I'm communicating with is a C++ program, so the way we ensure the \n is present in the incoming data isn't going to work for you (and may not be needed in the outgoing data).
Finally, if it helps, I built my code based on this web example:
Edit: I found another code example that uses DataOutputStream... You may find it helpful, assuming you haven't already seen it.

How to get DateTime from the internet?

How can I get the current date and time from the internet or a server using C#? I am trying to get the time as follows:
public static DateTime GetNetworkTime(string ntpServer)
{
    IPAddress[] address = Dns.GetHostEntry(ntpServer).AddressList;
    if (address == null || address.Length == 0)
        throw new ArgumentException("Could not resolve ip address from '" + ntpServer + "'.", "ntpServer");
    IPEndPoint ep = new IPEndPoint(address[0], 123);
    return GetNetworkTime(ep);
}
I am passing the server IP address as ntpServer, but it does not work properly.
Here is a code sample that you can use to retrieve the time from the NIST Internet Time Service:
var client = new TcpClient("time.nist.gov", 13);
using (var streamReader = new StreamReader(client.GetStream()))
{
    var response = streamReader.ReadToEnd();
    // Skip the leading newline and the Modified Julian Date, keep "yy-MM-dd HH:mm:ss"
    var utcDateTimeString = response.Substring(7, 17);
    var localDateTime = DateTime.ParseExact(utcDateTimeString, "yy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal);
}
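The fixed offsets in Substring(7, 17) depend on the reply starting with exactly one newline and a five-digit day number. As a sketch only, the parsing can instead be done on whitespace-separated fields and checked without contacting a server. DaytimeParser and ParseUtc are hypothetical names; the sample reply in the comment is the one quoted in another answer in this thread, and treating the sixth field as the health flag follows that answer too.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for parsing a NIST daytime (port 13) reply.
public static class DaytimeParser
{
    // Parses a reply such as
    //   "\n55596 11-02-14 13:54:11 00 0 0 478.1 UTC(NIST) * \n"
    // and returns the time as UTC, or null if the reply is unusable.
    public static DateTime? ParseUtc(string response)
    {
        if (string.IsNullOrWhiteSpace(response)) return null;

        string[] tokens = response.Split(
            new[] { ' ', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length < 6) return null;

        // Field 6 is the server health flag; "0" means healthy.
        if (tokens[5] != "0") return null;

        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            tokens[1] + " " + tokens[2],
            "yy-MM-dd HH:mm:ss",
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out parsed);
        return ok ? parsed : (DateTime?)null;
    }
}
```

Call ToLocalTime() on the result if local time is needed; an empty reply (which NIST servers send when queried too often) simply yields null here instead of an exception.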
Here is some quick code that gets the time from an HTTP response header; it works without needing port 13:
public static DateTime GetNistTime()
{
    var myHttpWebRequest = (HttpWebRequest)WebRequest.Create("http://www.microsoft.com");
    var response = myHttpWebRequest.GetResponse();
    string todaysDates = response.Headers["date"];
    return DateTime.ParseExact(todaysDates,
        "ddd, dd MMM yyyy HH:mm:ss 'GMT'",
        CultureInfo.InvariantCulture.DateTimeFormat,
        DateTimeStyles.AssumeUniversal);
}
Things could go wrong. All of the implementations found above are prone to errors: sometimes they work, and sometimes they throw a WebException.
A better implementation:
using (var response =
//string todaysDates = response.Headers["date"];
return DateTime.ParseExact(response.Headers["date"],
"ddd, dd MMM yyyy HH:mm:ss 'GMT'",
catch (WebException)
return DateTime.Now; //In case something goes wrong.
If your web app depends on accurate date information, the service that provides it is critical. I used one of the code samples found here in my app and it really messed things up.
One more version of the same idea:
public static class InternetTime
public static DateTimeOffset? GetCurrentTime()
using (var client = new HttpClient())
var result = client.GetAsync("https://google.com",
return result.Headers.Date;
return null;
Here HttpCompletionOption.ResponseHeadersRead is used to prevent loading of the rest of the response, as we need only HTTP headers.
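Use InternetTime.GetCurrentTime().Value.ToLocalTime() to get the current local time. Note that .Value throws an InvalidOperationException when GetCurrentTime() returned null, so check HasValue first.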
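Several answers in this thread parse the HTTP Date header by hand with the same format string. That step can be separated from the network call and made non-throwing, as in the sketch below. HttpDate and ParseUtc are hypothetical names; the format string is the one used in the answers, and the sample header value is invented for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for parsing an HTTP "Date" response header.
public static class HttpDate
{
    // Parses a value such as "Mon, 14 Feb 2011 13:54:11 GMT" and returns
    // it as UTC, or null when the header is missing or malformed.
    public static DateTime? ParseUtc(string headerValue)
    {
        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            headerValue,
            "ddd, dd MMM yyyy HH:mm:ss 'GMT'",
            CultureInfo.InvariantCulture,
            // AssumeUniversal: the text is UTC; AdjustToUniversal: keep it UTC
            // instead of converting to the machine's local time zone.
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out parsed);
        return ok ? parsed : (DateTime?)null;
    }
}
```

Returning null instead of DateTime.Now on failure lets the caller decide whether falling back to the local clock is acceptable.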
Important: first check the available servers listed at NIST Internet Time Servers.
public static DateTime GetServerTime()
var result = DateTime.Now;
// Initialize the list of NIST time servers
// http://tf.nist.gov/tf-cgi/servers.cgi
string[] servers = new string[] {
Random rnd = new Random();
foreach (string server in servers.OrderBy(x => rnd.NextDouble()).Take(9))
// Connect to the server (at port 13) and get the response. Timeout max 1second
string serverResponse = string.Empty;
var tcpClient = new TcpClient();
if (tcpClient.ConnectAsync(server, 13).Wait(1000))
using (var reader = new StreamReader(tcpClient.GetStream()))
serverResponse = reader.ReadToEnd();
// If a response was received
if (!string.IsNullOrEmpty(serverResponse))
// Split the response string ("55596 11-02-14 13:54:11 00 0 0 478.1 UTC(NIST) *")
string[] tokens = serverResponse.Split(' ');
// Check the number of tokens
if (tokens.Length >= 6)
// Check the health status
string health = tokens[5];
if (health == "0")
// Get date and time parts from the server response
string[] dateParts = tokens[1].Split('-');
string[] timeParts = tokens[2].Split(':');
// Create a DateTime instance
DateTime utcDateTime = new DateTime(
Convert.ToInt32(dateParts[0]) + 2000,
Convert.ToInt32(dateParts[1]), Convert.ToInt32(dateParts[2]),
Convert.ToInt32(timeParts[0]), Convert.ToInt32(timeParts[1]),
// Convert received (UTC) DateTime value to the local timezone
result = utcDateTime.ToLocalTime();
return result;
// Response successfully received; exit the loop
// Ignore exception and try the next server
return result;
public static Nullable<DateTime> GetDateTime()
Nullable<DateTime> dateTime = null;
System.Net.HttpWebRequest request = (System.Net.HttpWebRequest)System.Net.WebRequest.Create("http://www.microsoft.com");
request.Method = "GET";
request.Accept = "text/html, application/xhtml+xml, */*";
request.UserAgent = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)";
request.ContentType = "application/x-www-form-urlencoded";
request.CachePolicy = new System.Net.Cache.RequestCachePolicy(System.Net.Cache.RequestCacheLevel.NoCacheNoStore);
System.Net.HttpWebResponse response = (System.Net.HttpWebResponse)request.GetResponse();
if (response.StatusCode == System.Net.HttpStatusCode.OK)
string todaysDates = response.Headers["date"];
dateTime = DateTime.ParseExact(todaysDates, "ddd, dd MMM yyyy HH:mm:ss 'GMT'",
System.Globalization.CultureInfo.InvariantCulture.DateTimeFormat, System.Globalization.DateTimeStyles.AssumeUniversal);
dateTime = null;
return dateTime;

How to most efficiently read a list of files as one stream and hash pieces from it?

I have a list of files, which need to be read, in chunks, into a byte[], which is then passed to a hashing function. The tricky part is this: if I reach the end of a file, I need to continue reading the next file until I fill the buffer, like so:
Reading 16 bits, as an example:
File 1: 00101010
File 2: 01101010111111111
would need to be read as 0010101001101010
The point is: these files can be as large as several gigabytes, and I don't want to completely load them into memory. Loading pieces into a buffer of, like, 30 MB would be perfectly fine.
I want to use threading, but would it be efficient to thread reading a file? I don't know if disk I/O is such a large bottleneck that this would be worth it. Would the hashing be sped up sufficiently if I only thread that part, and lock on the read of each chunk? It is important that the hashes get saved in the correct order.
The second thing I need to do is generate the MD5 sum of each file as well. Is there any way to do this more efficiently than doing it as a separate step?
(This question has some overlap with Is there a built-in way to handle multiple files as one stream?, but I thought this differed enough)
I am really stumped as to what approach to take, as I am fairly new to C# as well as to threading. I already tried the approaches listed below, but they do not suffice for me.
As I am new to C# I value every kind of input on any aspect of my code.
This piece of code was threaded, but does not 'append' the streams, and as such generates invalid hashes:
public void DoHashing()
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = numThreads;
options.CancellationToken = cancelToken.Token;
Parallel.ForEach(files, options, (string f, ParallelLoopState loopState) =>
using (BufferedStream fileStream = new BufferedStream(File.OpenRead(f), bufferSize))
// Get the MD5sum first:
using (MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider())
md5Sums[f] = BitConverter.ToString(md5.ComputeHash(fileStream)).Replace("-", "");
//setup for reading:
byte[] buffer = new byte[(int)pieceLength];
//I don't know if the buffer will f*ck up the filelenghth
long remaining = (new FileInfo(f)).Length;
int done = 0;
while (remaining > 0)
while (done < pieceLength)
//either try to read the piecelength, or the remaining length of the file.
int toRead = (int)Math.Min(pieceLength - done, remaining);
int read = fileStream.Read(buffer, done, toRead);
//if read == 0, EOF reached
if (read == 0)
remaining = 0;
done += read;
remaining -= read;
// Hash the piece
using (SHA1CryptoServiceProvider sha1 = new SHA1CryptoServiceProvider())
byte[] hash = sha1.ComputeHash(buffer);
done = 0;
buffer = new byte[(int)pieceLength];
This other piece of code isn't threaded (and doesn't calculate MD5):
void Hash()
//examples, these got handled by other methods
List<string> files = new List<string>();
long totalFileLength;
int pieceLength = Math.Pow(2,20);
foreach (string file in files)
totalFileLength += (new FileInfo(file)).Length;
//Reading the file:
long remaining = totalFileLength;
byte[] buffer = new byte[Math.min(remaining, pieceSize)];
int index = 0;
FileStream fin = File.OpenRead(files[index]);
int done = 0;
int offset = 0;
while (remaining > 0)
while (done < pieceLength)
int toRead = (int)Math.Min(pieceLength - offset, remaining);
int read = fin.Read(buffer, done, toRead);
//if read == 0, EOF reached
if (read == 0)
//if last file:
if (index > files.Count)
remaining = 0;
//get ready for next round:
offset = 0;
done += read;
offset += read;
remaining -= read;
//Doing the piece hash:
//reset for next piece:
done = 0;
byte[] buffer = new byte[Math.min(remaining, pieceSize)];
void HashPiece(byte[] piece)
using (SHA1CryptoServiceProvider sha1 = new SHA1CryptoServiceProvider())
//hashes is a List
Thank you very much for your time and effort.
I'm not looking for completely coded solutions, any pointer and idea where to go with this would be excellent.
Questions & remarks to yodaj007's answer:
Why if (currentChunk.Length >= Constants.CHUNK_SIZE_IN_BYTES)? Why not ==? If the chunk is larger than the chunk size, my SHA1 hash gets a different value.
currentChunk.Sources.Add(new ChunkSource()
{
    Filename = fi.FullName,
    StartPosition = 0,
    Length = (int)Math.Min(fi.Length, (long)needed)
});
This is a really interesting idea: postpone reading until you need it. Nice!
chunks.Add(currentChunk = new Chunk());
Why do this in the if (currentChunk != null) block and in the for (int i = 0; i < (fi.Length - offset) / Constants.CHUNK_SIZE_IN_BYTES; i++) block? Isn't the first a bit redundant?
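For comparison, the plain sequential version of the requirement (fill each piece across file boundaries, and compute each file's MD5 in the same pass instead of as a separate step) can be sketched as below. This is not the code from any answer: PieceHasher and HashPieces are hypothetical names, and the method takes already opened streams so it can be tried on in-memory data. Only one piece-sized buffer is held in memory at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hashes fixed-size pieces of several streams read back to back.
public static class PieceHasher
{
    // Returns the SHA-1 of every pieceLength-sized piece of the concatenated
    // streams (the last piece may be shorter) and appends one MD5 per stream
    // to fileMd5s, in stream order.
    public static List<byte[]> HashPieces(
        IEnumerable<Stream> files, int pieceLength, List<byte[]> fileMd5s)
    {
        var pieceHashes = new List<byte[]>();
        var piece = new byte[pieceLength];
        int filled = 0;   // bytes of the current piece read so far

        using (var sha1 = SHA1.Create())
        {
            foreach (Stream file in files)
            {
                using (var md5 = MD5.Create())
                {
                    int read;
                    // Keep filling the current piece; it carries over to the
                    // next file when this one ends before the piece is full.
                    while ((read = file.Read(piece, filled, pieceLength - filled)) > 0)
                    {
                        // Feed the same bytes into the per-file MD5.
                        md5.TransformBlock(piece, filled, read, null, 0);
                        filled += read;
                        if (filled == pieceLength)
                        {
                            pieceHashes.Add(sha1.ComputeHash(piece, 0, filled));
                            filled = 0;
                        }
                    }
                    md5.TransformFinalBlock(piece, 0, 0);
                    fileMd5s.Add(md5.Hash);
                }
            }
            // Hash whatever is left after the last file.
            if (filled > 0)
            {
                pieceHashes.Add(sha1.ComputeHash(piece, 0, filled));
            }
        }
        return pieceHashes;
    }
}
```

Because the pieces are produced strictly in order, the hashes come out in the right order without any locking; whether moving the SHA-1 step to worker threads pays off depends on how fast the disk is compared to the hashing.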
Here is my complete answer. I tested it on one of my anime folders. It processes 14 files totaling 3.64GiB in roughly 16 seconds. In my opinion, using any sort of parallelism is more trouble than it is worth here. You're being limited by disc I/O, so multithreading will only get you so far. My solution can be easily parallelized though.
It starts by reading "chunk" source information: source file, offset, and length. All of this is gathered very quickly. From here, you can process the "chunks" using threading however you wish. Code follows:
public static class Constants
{
    public const int CHUNK_SIZE_IN_BYTES = 32 * 1024 * 1024; // 32MiB
}

public class ChunkSource
{
    public string Filename { get; set; }
    public int StartPosition { get; set; }
    public int Length { get; set; }
}

public class Chunk
{
    private List<ChunkSource> _sources = new List<ChunkSource>();
    public IList<ChunkSource> Sources { get { return _sources; } }
    public byte[] Hash { get; set; }
    public int Length
    {
        get { return Sources.Select(s => s.Length).Sum(); }
    }
}
static class Program
static void Main()
DirectoryInfo di = new DirectoryInfo(@"C:\Stuff\Anime\Shikabane Hime Aka");
string[] filenames = di.GetFiles().Select(fi=> fi.FullName).OrderBy(n => n).ToArray();
var chunks = ChunkFiles(filenames);
private static List<Chunk> ChunkFiles(string[] filenames)
List<Chunk> chunks = new List<Chunk>();
Chunk currentChunk = null;
int offset = 0;
foreach (string filename in filenames)
FileInfo fi = new FileInfo(filename);
if (!fi.Exists)
throw new FileNotFoundException(filename);
Debug.WriteLine(String.Format("File: {0}", filename));
// First, start off by either starting a new chunk or
// by finishing a leftover chunk from a previous file.
if (currentChunk != null)
// We get here if the previous file had leftover bytes that left us with an incomplete chunk
int needed = Constants.CHUNK_SIZE_IN_BYTES - currentChunk.Length;
if (needed == 0)
throw new InvalidOperationException("Something went wonky, shouldn't be here");
offset = needed;
currentChunk.Sources.Add(new ChunkSource()
Filename = fi.FullName,
StartPosition = 0,
Length = (int)Math.Min(fi.Length, (long)needed)
if (currentChunk.Length >= Constants.CHUNK_SIZE_IN_BYTES)
chunks.Add(currentChunk = new Chunk());
offset = 0;
// Note: Using integer division here
for (int i = 0; i < (fi.Length - offset) / Constants.CHUNK_SIZE_IN_BYTES; i++)
chunks.Add(currentChunk = new Chunk());
currentChunk.Sources.Add(new ChunkSource()
Filename = fi.FullName,
StartPosition = i * Constants.CHUNK_SIZE_IN_BYTES + offset,
Length = Constants.CHUNK_SIZE_IN_BYTES
Debug.WriteLine(String.Format("Chunk source created: Offset = {0,10}, Length = {1,10}", currentChunk.Sources[0].StartPosition, currentChunk.Sources[0].Length));
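// Take the remainder as a long first, then cast; casting before the % would overflow for files over 2GiB
int leftover = (int)((fi.Length - offset) % Constants.CHUNK_SIZE_IN_BYTES);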
if (leftover > 0)
chunks.Add(currentChunk = new Chunk());
currentChunk.Sources.Add(new ChunkSource()
Filename = fi.FullName,
StartPosition = (int)(fi.Length - leftover),
Length = leftover
currentChunk = null;
return chunks;
private static void ComputeHashes(IList<Chunk> chunks)
if (chunks == null || chunks.Count == 0)
Dictionary<string, MemoryMappedFile> files = new Dictionary<string, MemoryMappedFile>();
foreach (var chunk in chunks)
MemoryMappedFile mms = null;
byte[] buffer = new byte[Constants.CHUNK_SIZE_IN_BYTES];
Stopwatch sw = Stopwatch.StartNew();
foreach (var source in chunk.Sources)
lock (files)
if (!files.TryGetValue(source.Filename, out mms))
Debug.WriteLine(String.Format("Opening {0}", source.Filename));
files.Add(source.Filename, mms = MemoryMappedFile.CreateFromFile(source.Filename, FileMode.Open));
var view = mms.CreateViewStream(source.StartPosition, source.Length);
view.Read(buffer, 0, source.Length);
Debug.WriteLine("Done reading sources in {0}ms", sw.Elapsed.TotalMilliseconds);
MD5 md5 = MD5.Create();
chunk.Hash = md5.ComputeHash(buffer);
Debug.WriteLine(String.Format("Computed hash: {0} in {1}ms", String.Join("-", chunk.Hash.Select(h=> h.ToString("X2")).ToArray()), sw.Elapsed.TotalMilliseconds));
foreach (var x in files.Values)
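One thing to watch in the loop above: every source is read into the buffer at offset 0, so a chunk built from two sources would have its first part overwritten by the second, and the hash always covers the full 32MiB buffer even when the chunk is shorter. A minimal sketch of one way around both, assuming hypothetical helper names (ChunkBuffer, ReadExactly, Hash) that are not part of the code above:

```csharp
using System.IO;
using System.Security.Cryptography;

public static class ChunkBuffer
{
    // Reads up to `count` bytes from `source` into `buffer`, starting at
    // `offset`. Stream.Read may return fewer bytes than requested, so loop
    // until the count is reached or the stream ends. Returns bytes read.
    public static int ReadExactly(Stream source, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = source.Read(buffer, offset + total, count - total);
            if (read == 0)
            {
                break; // end of stream
            }
            total += read;
        }
        return total;
    }

    // Hashes only the first `filled` bytes, so a short final chunk is not
    // padded with whatever else is in the buffer.
    public static byte[] Hash(byte[] buffer, int filled)
    {
        using (MD5 md5 = MD5.Create())
        {
            return md5.ComputeHash(buffer, 0, filled);
        }
    }
}
```

Inside the per-chunk loop this would be used as `filled += ChunkBuffer.ReadExactly(view, buffer, filled, source.Length);` for each source, with `filled` starting at 0, followed by `chunk.Hash = ChunkBuffer.Hash(buffer, filled);`.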
I don't guarantee everything is spotlessly free of bugs. But I did have fun working on it. Look at the output window in Visual Studio for the debug information. It looks like this:
File: C:\Stuff\Anime\Shikabane Hime Aka\Episode 02.mkv
Chunk source created: Offset = 26966010, Length = 33554432
Chunk source created: Offset = 60520442, Length = 33554432
Chunk source created: Offset = 94074874, Length = 33554432
Chunk source created: Offset = 127629306, Length = 33554432
Chunk source created: Offset = 161183738, Length = 33554432
Chunk source created: Offset = 194738170, Length = 33554432
Chunk source created: Offset = 228292602, Length = 33554432
Opening C:\Stuff\Anime\Shikabane Hime Aka\Episode 02.mkv
Done reading sources in 42.9362ms
The thread '' (0xc10) has exited with code 0 (0x0).
Computed hash: 3C-81-A5-2C-90-02-24-23-42-5B-19-A2-15-56-AB-3F in 94.2481ms
Done reading sources in 0.0053ms
Computed hash: 58-F0-6D-D5-88-D8-FF-B3-BE-B4-6A-DA-63-09-43-6B in 98.9263ms
Done reading sources in 29.4805ms
Computed hash: F7-19-8D-A8-FE-9C-07-6E-DB-D5-74-A6-E1-E7-A6-26 in 85.0061ms
Done reading sources in 28.4971ms
Computed hash: 49-F2-CB-75-89-9A-BC-FA-94-A7-DF-E0-DB-02-8A-99 in 84.2799ms
Done reading sources in 31.106ms
Computed hash: 29-7B-18-BD-ED-E9-0C-68-4B-47-C6-5F-D0-16-8A-44 in 84.1444ms
Done reading sources in 31.2204ms
Computed hash: F8-91-F1-90-CF-9C-37-4E-82-68-C2-44-0D-A7-6E-F8 in 84.2592ms
Done reading sources in 31.0031ms
Computed hash: 65-97-ED-95-07-31-BF-C8-3A-BA-2B-DA-03-37-FD-00 in 95.331ms
Done reading sources in 33.0072ms
Computed hash: 9B-F2-83-E6-A8-DF-FD-8D-6C-5C-9E-F4-20-0A-38-4B in 85.9561ms
Done reading sources in 31.6232ms
Computed hash: B6-7C-6B-95-69-BC-9C-B2-1A-07-B3-13-28-A8-10-BC in 84.1866ms
Here is the parallel version. It's basically the same really. Using parallelism = 3 cut the processing time down to 9 seconds.
private static void ComputeHashes(IList<Chunk> chunks)
if (chunks == null || chunks.Count == 0)
Dictionary<string, MemoryMappedFile> files = new Dictionary<string, MemoryMappedFile>();
Parallel.ForEach(chunks, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, (chunk, state, index) =>
MemoryMappedFile mms = null;
byte[] buffer = new byte[Constants.CHUNK_SIZE_IN_BYTES];
Stopwatch sw = Stopwatch.StartNew();
foreach (var source in chunk.Sources)
lock (files)
if (!files.TryGetValue(source.Filename, out mms))
Debug.WriteLine(String.Format("Opening {0}", source.Filename));
files.Add(source.Filename, mms = MemoryMappedFile.CreateFromFile(source.Filename, FileMode.Open));
var view = mms.CreateViewStream(source.StartPosition, source.Length);
view.Read(buffer, 0, source.Length);
Debug.WriteLine("Done reading sources in {0}ms", sw.Elapsed.TotalMilliseconds);
MD5 md5 = MD5.Create();
chunk.Hash = md5.ComputeHash(buffer);
Debug.WriteLine(String.Format("Computed hash: {0} in {1}ms", String.Join("-", chunk.Hash.Select(h => h.ToString("X2")).ToArray()), sw.Elapsed.TotalMilliseconds));
foreach (var x in files.Values)
I found a bug, or what I think is a bug. Need to set the read offset to 0 if we're starting a new file.
EDIT 2 based on feedback
This version computes the hashes on a separate thread. Throttling the I/O is necessary; without it I was running into OutOfMemoryException. It doesn't really perform much better, though. Beyond this, I'm not sure how it can be improved further, except perhaps by reusing the buffers.
public class QueueItem
public Chunk Chunk { get; set; }
public byte[] buffer { get; set; }
private static void ComputeHashes(IList<Chunk> chunks)
if (chunks == null || chunks.Count == 0)
Dictionary<string, MemoryMappedFile> files = new Dictionary<string, MemoryMappedFile>();
foreach (var filename in chunks.SelectMany(c => c.Sources).Select(c => c.Filename).Distinct())
files.Add(filename, MemoryMappedFile.CreateFromFile(filename, FileMode.Open));
AutoResetEvent monitor = new AutoResetEvent(false);
ConcurrentQueue<QueueItem> hashQueue = new ConcurrentQueue<QueueItem>();
CancellationToken token = new CancellationToken();
Task.Factory.StartNew(() =>
int processCount = 0;
QueueItem item = null;
while (!token.IsCancellationRequested)
if (hashQueue.TryDequeue(out item))
MD5 md5 = MD5.Create();
item.Chunk.Hash = md5.ComputeHash(item.buffer);
if (processCount++ > 1000)
processCount = 0;
}, token);
foreach (var chunk in chunks)
if (hashQueue.Count > 10000)
QueueItem item = new QueueItem()
buffer = new byte[Constants.CHUNK_SIZE_IN_BYTES],
Chunk = chunk
Stopwatch sw = Stopwatch.StartNew();
foreach (var source in chunk.Sources)
MemoryMappedFile mms = files[source.Filename];
var view = mms.CreateViewStream(source.StartPosition, source.Length);
view.Read(item.buffer, 0, source.Length);
foreach (var x in files.Values)
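The same throttling can also be expressed with a bounded BlockingCollection<T>: Add blocks the reader while the queue is full, and the hashing task ends on its own once the producer calls CompleteAdding. That matters because a token created with new CancellationToken() can never be cancelled, so the loop above needs some other shutdown signal. The following is only a sketch under those assumptions; ThrottledHasher, HashAll, and maxPending are made-up names, and each reader delegate stands in for "read one chunk into a buffer".

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class ThrottledHasher
{
    // Runs each reader on the calling thread and hashes the resulting
    // buffers on a background task. At most `maxPending` buffers wait
    // in the queue at any time.
    public static byte[][] HashAll(IList<Func<byte[]>> readers, int maxPending)
    {
        var results = new byte[readers.Count][];
        using (var queue = new BlockingCollection<KeyValuePair<int, byte[]>>(maxPending))
        {
            Task consumer = Task.Factory.StartNew(() =>
            {
                using (MD5 md5 = MD5.Create())
                {
                    // Ends once CompleteAdding was called and the queue is empty.
                    foreach (var item in queue.GetConsumingEnumerable())
                    {
                        results[item.Key] = md5.ComputeHash(item.Value);
                    }
                }
            }, TaskCreationOptions.LongRunning);

            try
            {
                for (int i = 0; i < readers.Count; i++)
                {
                    // Blocks while the queue already holds maxPending items.
                    queue.Add(new KeyValuePair<int, byte[]>(i, readers[i]()));
                }
            }
            finally
            {
                queue.CompleteAdding();
            }
            consumer.Wait();
        }
        return results;
    }
}
```

With 32MiB buffers, a small maxPending (say 2 to 4) keeps memory bounded while still letting reading and hashing overlap.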
I'm new to C# too, but I think what you are looking for is the System.IO.MemoryMappedFiles namespace, available since .NET 4.0.
With these API functions, the operating system itself takes care of managing the current file region in memory.
Instead of copying and pasting code here, I suggest reading this article: http://www.developer.com/net/article.php/3828586/Using-Memory-Mapped-Files-in-NET-40.htm
For MD5, use the System.Security.Cryptography.MD5CryptoServiceProvider class. It may be faster.
In your case, where you have to go over the "boundaries" of one file, just do it. Let the operating system handle how the memory-mapped files are represented in memory, and work as you would with "small" buffers.
In .NET 4 you now have System.IO.MemoryMappedFiles.
You can create a ViewAccessor with a chunk size that matches your hash function, and then just keep filling your hash function's buffer from the current ViewAccessor. When you run out of file, start chunking the next file, using the current hash chunk offset as your ViewAccessor offset.
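As a rough sketch of that idea, the hash can be fed piece by piece with HashAlgorithm.TransformBlock, so a chunk that spans two files never needs to be assembled in one buffer. This version uses view streams rather than a ViewAccessor for brevity, and SpanningHasher, HashPieces, and the (path, offset, length) tuples are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Security.Cryptography;

public static class SpanningHasher
{
    // Hashes the given (path, offset, length) pieces as if they were one
    // contiguous chunk, reading through memory-mapped views.
    public static byte[] HashPieces(IEnumerable<Tuple<string, long, int>> pieces)
    {
        byte[] block = new byte[64 * 1024];
        using (MD5 md5 = MD5.Create())
        {
            foreach (var piece in pieces)
            {
                using (var file = MemoryMappedFile.CreateFromFile(piece.Item1, FileMode.Open))
                using (var view = file.CreateViewStream(piece.Item2, piece.Item3))
                {
                    int remaining = piece.Item3;
                    while (remaining > 0)
                    {
                        int read = view.Read(block, 0, Math.Min(block.Length, remaining));
                        if (read == 0)
                        {
                            break; // view ended early
                        }
                        // Feed only the bytes actually read into the hash.
                        md5.TransformBlock(block, 0, read, null, 0);
                        remaining -= read;
                    }
                }
            }
            // Finish the hash with an empty final block.
            md5.TransformFinalBlock(block, 0, 0);
            return md5.Hash;
        }
    }
}
```

Because only the bytes that were actually read are hashed, the result for a short final chunk matches a hash over exactly those bytes, regardless of the buffer size.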
