Trying to download all URLs in HTML - c#-4.0

Can anybody help me with this code?
I am trying to download all the URLs in this HTML page: http://mises.org/books/ (they are all PDFs).
I understand the basic logic; I think I am just messing up the regular expression. This is what I have so far:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
namespace DownloadPdfs
{
    class Program
    {
        static void Main(string[] args)
        {
            StringBuilder sb = new StringBuilder();
            byte[] buf = new byte[8192];
            HttpWebRequest request = (HttpWebRequest)
                WebRequest.Create("http://mises.org/books/");
            HttpWebResponse response = (HttpWebResponse)
                request.GetResponse();
            Stream resStream = response.GetResponseStream();
            string tempString = null;
            int count = 0;
            do
            {
                count = resStream.Read(buf, 0, buf.Length);
                if (count != 0)
                {
                    tempString = Encoding.ASCII.GetString(buf, 0, count);
                    sb.Append(tempString);
                }
            }
            while (count > 0); // any more data to read?
            string html = sb.ToString();
            List<string> listoflinks = new List<string>();
            string input = html;
            Regex rx = new Regex(@"(?<="")[^""]+(?="")|[^\s""]\S*");
            for (Match match = rx.Match(input); match.Success; match = match.NextMatch())
            {
                listoflinks.Add(match.ToString());
            }
            foreach (var v in listoflinks)
            {
                using (WebClient Client = new WebClient())
                {
                    Client.DownloadFile(v, v);
                }
            }
        }
    }
}

Try the code below. The pattern matches the value of the href attribute of anchors whose target ends in .pdf.
// [^""]+ accepts any character except a quote, so URLs that contain dots (host names, file names) still match
Regex rx = new Regex(@"href=""(?<Url>[^""]+\.pdf)""", RegexOptions.IgnoreCase | RegexOptions.Multiline);
for (Match match = rx.Match(input); match.Success; match = match.NextMatch())
{
    var link = match.Groups["Url"].Value;
    listoflinks.Add(link);
}
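Relative hrefs such as "/books/x.pdf" still have to be turned into absolute URLs, and the question's `Client.DownloadFile(v, v)` passes the URL itself as the local path, which is not a valid file name. A minimal sketch of both steps, assuming double-quoted href attributes; `PdfLinkExtractor`, `ExtractPdfLinks` and `LocalFileName` are hypothetical names, not part of the answer:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: collects PDF links from a page and derives a local file name for each.
static class PdfLinkExtractor
{
    // href="....pdf" in double quotes, case-insensitive; the URL part may contain dots
    static readonly Regex HrefPdf = new Regex(@"href\s*=\s*""(?<Url>[^""]+\.pdf)""", RegexOptions.IgnoreCase);

    public static List<Uri> ExtractPdfLinks(string html, Uri pageUri)
    {
        var links = new List<Uri>();
        foreach (Match match in HrefPdf.Matches(html))
        {
            // new Uri(base, relative) resolves "/books/x.pdf" or "x.pdf" against the page address
            var absolute = new Uri(pageUri, match.Groups["Url"].Value);
            if (!links.Contains(absolute))
            {
                links.Add(absolute); // skip duplicates, keep page order
            }
        }
        return links;
    }

    // File name to save under, e.g. "http://host/books/x.pdf" -> "x.pdf"
    public static string LocalFileName(Uri link)
    {
        return Path.GetFileName(link.LocalPath);
    }
}
```

Each link can then be downloaded with `client.DownloadFile(link, PdfLinkExtractor.LocalFileName(link))`, which uses the `WebClient.DownloadFile(Uri, string)` overload.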

Use a library that parses HTML, such as HtmlAgilityPack.
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
public List<string> GetLinks(string html)
{
    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);
    // XPath: every <a> element that has an href attribute
    var linkNodes = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
    if (linkNodes == null)
    {
        // SelectNodes returns null, not an empty collection, when nothing matches
        return new List<string>();
    }
    var linkNodesWithLink = linkNodes.Where(x => x.Attributes.Contains("href")).ToList();
    var links = linkNodesWithLink.Select(x => x.Attributes["href"].Value)
        .Where(x => !string.IsNullOrWhiteSpace(x))
        .Select(x => x.Trim())
        .ToList();
    links = links.Distinct().ToList();
    return links;
}


Javascript GZIP and btoa and decompress with C#

I am developing an application where I compress large JSON data using pako.gzip and then use the btoa function to turn it into a base64 string in order to post the data to the server. In the JavaScript I wrote:
var data = JSON.stringify(JSONData);
var ZippedData = pako.gzip(data, { to: 'string' });
var base64String = btoa(ZippedData);
/* post to server*/
$http.post("URL?base64StringParam=" + base64String).then(function (response) {
    //do stuff
});
The problem is that I need to decompress the data again in the C# code after posting, in order to do further work on it. In the C# code I wrote:
byte[] data = Convert.FromBase64String(base64StringParam);
string decodedString = System.Text.ASCIIEncoding.ASCII.GetString(data);
Encoding enc = Encoding.Unicode;
MemoryStream stream = new MemoryStream(enc.GetBytes(decodedString));
GZipStream decompress = new GZipStream(stream, CompressionMode.Decompress);
string plainDef = "";
and I get the error here:
using (var sr = new StreamReader(decompress))
{
    plainDef = sr.ReadToEnd();
}
The error is: "Found invalid data while decoding."
Any help decompressing the data back in C# will be appreciated.
EDIT: to sum up what needs to be done:
JavaScript does the following:
Plain text >> to >> gzip bytes >> to >> base64 string
I need C# to do the reverse:
Base64 >> to >> unzip bytes >> to >> plain text
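The summary above also explains why the question's C# code fails: the bytes from `Convert.FromBase64String` are already the gzip bytes, but the code turns them into an ASCII string and back into UTF-16 bytes first. A gzip stream starts with the magic bytes 0x1F 0x8B (RFC 1952); 0x8B is outside the ASCII range and gets replaced by a fallback character, and `Encoding.Unicode` then emits two bytes per character, so `GZipStream` no longer receives a valid gzip stream. A small sketch of that round trip (`EncodingRoundTripDemo` is a hypothetical name used only for illustration):

```csharp
using System;
using System.Text;

// Illustration only: reproduces the question's byte -> ASCII string -> UTF-16 bytes detour.
static class EncodingRoundTripDemo
{
    public static byte[] RoundTripThroughText(byte[] data)
    {
        // ASCII covers only 0x00-0x7F; higher bytes are replaced by a fallback character ('?' by default)
        string text = Encoding.ASCII.GetString(data);
        // Encoding.Unicode is UTF-16LE: two bytes per character, so the length doubles
        return Encoding.Unicode.GetBytes(text);
    }
}
```

Passing the `Convert.FromBase64String` result straight into the `MemoryStream` avoids the detour entirely.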
Assuming the following JavaScript:
dataToCommitString = btoa(pako.gzip(dataToCommitString, { to: "string" }));
This is the correct C# code to compress/decompress with GZip (taken from https://stackoverflow.com/a/7343623/679334):
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace YourNamespace
{
    // Minimal interface so the class compiles on its own
    public interface ICompressor
    {
        byte[] Zip(string str);
        string Unzip(byte[] bytes);
    }
    public class GZipCompressor : ICompressor
    {
        // Equivalent of Stream.CopyTo (.NET 4.0+) for older frameworks
        private static void CopyTo(Stream src, Stream dest)
        {
            byte[] bytes = new byte[4096];
            int cnt;
            while ((cnt = src.Read(bytes, 0, bytes.Length)) != 0)
            {
                dest.Write(bytes, 0, cnt);
            }
        }
        public byte[] Zip(string str)
        {
            var bytes = Encoding.UTF8.GetBytes(str);
            using (var msi = new MemoryStream(bytes))
            using (var mso = new MemoryStream())
            {
                using (var gs = new GZipStream(mso, CompressionMode.Compress))
                {
                    //msi.CopyTo(gs);
                    CopyTo(msi, gs);
                }
                // The GZipStream is disposed first so that the gzip trailer gets written
                return mso.ToArray();
            }
        }
        public string Unzip(byte[] bytes)
        {
            using (var msi = new MemoryStream(bytes))
            using (var mso = new MemoryStream())
            {
                using (var gs = new GZipStream(msi, CompressionMode.Decompress))
                {
                    //gs.CopyTo(mso);
                    CopyTo(gs, mso);
                }
                return Encoding.UTF8.GetString(mso.ToArray());
            }
        }
    }
}
Calling it as follows:
value = _compressor.Unzip(Convert.FromBase64CharArray(value.ToCharArray(), 0, value.Length));
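On .NET 4.0 and later, the whole Base64 >> gzip bytes >> plain text pipeline can also be written with the built-in `Stream.CopyTo`. A minimal sketch, assuming pako received a JavaScript string (pako encodes string input as UTF-8 before compressing) and the client sent `btoa(pako.gzip(text, { to: "string" }))` as in the question; `Base64Gzip` is a hypothetical name, and its `Encode` method only stands in for the client to produce test input:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class Base64Gzip
{
    // Base64 -> gzip bytes -> UTF-8 text
    public static string Decode(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output); // reads until the end of the compressed data
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }

    // UTF-8 text -> gzip bytes -> Base64 (server-side stand-in for the JavaScript client)
    public static string Encode(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream writes the gzip trailer
            return Convert.ToBase64String(output.ToArray()); // ToArray works on a closed MemoryStream
        }
    }
}
```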
In the client, use:
let output = pako.gzip(JSON.stringify(obj));
and send the bytes with 'Content-Type': 'application/octet-stream'.
Then in C#:
[HttpPost]
[Route("ReceiveCtImage")]
public int ReceiveCtImage([FromBody] byte[] data)
{
    var json = Decompress(data);
    return 1;
}
public static string Decompress(byte[] data)
{
    // The last 4 bytes of a gzip stream (ISIZE) hold the uncompressed length;
    // BitConverter reads them in the machine's byte order (little-endian on x86/x64)
    byte[] lengthBuffer = new byte[4];
    Array.Copy(data, data.Length - 4, lengthBuffer, 0, 4);
    int uncompressedSize = BitConverter.ToInt32(lengthBuffer, 0);
    var buffer = new byte[uncompressedSize];
    using (var ms = new MemoryStream(data))
    {
        using (var gzip = new GZipStream(ms, CompressionMode.Decompress))
        {
            // A single Read call may return fewer bytes than requested, so keep reading
            int total = 0;
            while (total < uncompressedSize)
            {
                int read = gzip.Read(buffer, total, uncompressedSize - total);
                if (read == 0)
                {
                    break;
                }
                total += read;
            }
        }
    }
    string json = Encoding.UTF8.GetString(buffer);
    return json;
}
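The length trick above relies on the gzip trailer: per RFC 1952, the last 4 bytes (ISIZE) store the uncompressed size modulo 2^32 in little-endian order, which only describes the whole payload for a single-member gzip stream under 4 GiB. A sketch that decodes ISIZE explicitly as little-endian, independent of `BitConverter.IsLittleEndian`, together with a decompression routine that does not need the size at all (`GzipTrailer` is a hypothetical name):

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class GzipTrailer
{
    // ISIZE: last 4 bytes of a single-member gzip stream, little-endian, uncompressed length mod 2^32
    public static uint ReadIsize(byte[] gzipData)
    {
        // 10-byte header + 8-byte trailer is the absolute minimum size
        if (gzipData == null || gzipData.Length < 18)
            throw new ArgumentException("Too short to be a gzip stream.", "gzipData");
        int n = gzipData.Length;
        return (uint)(gzipData[n - 4]
                    | gzipData[n - 3] << 8
                    | gzipData[n - 2] << 16
                    | gzipData[n - 1] << 24);
    }

    // Reads until the GZipStream reports the end of data, so no size hint is required
    public static byte[] Decompress(byte[] gzipData)
    {
        using (var input = new GZipStream(new MemoryStream(gzipData), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }
}
```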

How to find object type from URL SharePoint online (o365)?

I found out how to determine the object type from a URL for on-premises SharePoint:
https://blogs.msdn.microsoft.com/sanjaynarang/2009/04/06/find-sharepoint-object-type-from-url/
But I couldn't find anything for SharePoint Online (CSOM).
Is it possible for SharePoint Online?
For most scenarios, such as:
folder URL, e.g. https://contoso.sharepoint.com//Documents/Forms/AllItems.aspx?RootFolder=%2FDocuments%2FArchive
list item URL, e.g. https://contoso.sharepoint.com/Lists/ShoppingCart/DispForm.aspx?ID=9
list/library URL, e.g. https://contoso.sharepoint.com/Lists/Announcements
page URL, e.g. https://contoso.sharepoint.com/Lists/Announcements/Newsletter.aspx
the following example demonstrates how to determine the client object type:
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Net;
using System.Web;
using Microsoft.SharePoint.Client;
namespace O365Console
{
    static class ClientObjectExtensions
    {
        public static ClientObject ResolveClientObjectFromUrl(string resourceUrl, ICredentials credentials)
        {
            ClientObject targetObject = null;
            var resourceUri = new Uri(resourceUrl);
            using (var rootCtx = new ClientContext(resourceUri.Scheme + Uri.SchemeDelimiter + resourceUri.Host))
            {
                rootCtx.Credentials = credentials;
                var webUrl = Web.WebUrlFromPageUrlDirect(rootCtx, resourceUri);
                using (var ctx = new ClientContext(webUrl.ToString()))
                {
                    ctx.Credentials = credentials;
                    var queryBag = System.Web.HttpUtility.ParseQueryString(resourceUri.Query);
                    if (queryBag["Id"] != null)
                    {
                        var listUrl = string.Join(string.Empty,
                            resourceUri.Segments.Take(resourceUri.Segments.Length - 1));
                        var list = ctx.Web.GetList(listUrl);
                        targetObject = TryRetrieve(() => list.GetItemById(Convert.ToInt32(queryBag["Id"])));
                    }
                    else if (queryBag["RootFolder"] != null)
                    {
                        var folderUrl = HttpUtility.UrlDecode(queryBag["RootFolder"]);
                        targetObject = TryRetrieve(() => ctx.Web.GetFolderByServerRelativeUrl(folderUrl));
                    }
                    else if (queryBag.Count > 0)
                    {
                        throw new Exception("Unsupported query string parameter found");
                    }
                    else
                    {
                        targetObject = TryRetrieve(() => ctx.Web.GetFileByServerRelativeUrl(resourceUri.AbsolutePath));
                        if (targetObject == null)
                        {
                            targetObject = TryRetrieve(() => ctx.Web.GetList(resourceUri.AbsolutePath), list => list.RootFolder);
                            if (targetObject == null || ((List)targetObject).RootFolder.ServerRelativeUrl != resourceUri.AbsolutePath)
                                targetObject = TryRetrieve(() => ctx.Web.GetFolderByServerRelativeUrl(resourceUri.AbsolutePath));
                        }
                    }
                }
            }
            return targetObject;
        }
        private static T TryRetrieve<T>(Func<T> loadMethod, params Expression<Func<T, object>>[] retrievals) where T : ClientObject
        {
            try
            {
                var targetObject = loadMethod();
                targetObject.Context.Load(targetObject, retrievals);
                targetObject.Context.ExecuteQuery();
                return targetObject;
            }
            catch
            {
            }
            return default(T);
        }
    }
}
Usage
var credentials = GetCredentials(userName, password);
var clientObj = ClientObjectExtensions.ResolveClientObjectFromUrl("https://contoso.sharepoint.com/Lists/Announcements", credentials);
Console.WriteLine(clientObj.GetType().Name);
where
static ICredentials GetCredentials(string userName, string password)
{
    var securePassword = new SecureString(); // SecureString lives in System.Security
    foreach (var c in password)
    {
        securePassword.AppendChar(c);
    }
    return new SharePointOnlineCredentials(userName, securePassword);
}
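The decision at the heart of `ResolveClientObjectFromUrl` is driven only by the query string: `ID` means a list item, `RootFolder` means a folder, any other parameter is rejected, and a URL without a query is probed as file, then list, then folder. That branching can be checked offline with a small sketch; `UrlTarget` and `SharePointUrlClassifier` are hypothetical names, and the hand-written query parsing stands in for `HttpUtility.ParseQueryString` so the sketch has no System.Web dependency:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical offline mirror of the query-string branching; no SharePoint calls are made.
enum UrlTarget { ListItem, Folder, FileListOrFolder, Unsupported }

static class SharePointUrlClassifier
{
    public static UrlTarget Classify(string url)
    {
        var query = ParseQuery(new Uri(url).Query);
        if (query.ContainsKey("Id")) return UrlTarget.ListItem;         // e.g. DispForm.aspx?ID=9
        if (query.ContainsKey("RootFolder")) return UrlTarget.Folder;   // e.g. AllItems.aspx?RootFolder=...
        if (query.Count > 0) return UrlTarget.Unsupported;              // the sample throws here
        return UrlTarget.FileListOrFolder;                              // probed as file, list, then folder
    }

    // Case-insensitive keys, like the NameValueCollection returned by HttpUtility.ParseQueryString
    static Dictionary<string, string> ParseQuery(string query)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var pair in query.TrimStart('?').Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            string key = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            result[Uri.UnescapeDataString(key)] = Uri.UnescapeDataString(value);
        }
        return result;
    }
}
```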

How to split a video into parts and then merge it with FileStream C#

I have some problems with splitting the file and I don't know what to do now. I have to split the movie and then merge the split parts back into one file, and I have to use FileStream. If you can help me, I will be really happy :)
using System;
using System.Collections.Generic;
using System.IO;
class Program
{
    static void Main()
    {
        string source = "../../movie.avi";
        string destination;
        int n = int.Parse(Console.ReadLine());
        for (int i = 0; i < n; i++)
        {
            destination = "Part-" + i + ".avi";
            Slice(source, destination, n);
        }
        List<int> files = new List<int>();
        //Assemble(, destination);
    }
    static void Slice(string sourceFile, string destinationDirectory, int parts)
    {
        using (var source = new FileStream(sourceFile, FileMode.Open))
        {
            for (int i = 0; i < parts; i++)
            {
                using (var destination = new FileStream(destinationDirectory, FileMode.CreateNew))
                {
                    double fileLength = source.Length;
                    byte[] buffer = new byte[4096];
                    while (true)
                    {
                        int readBytes = source.Read(buffer, 0, buffer.Length);
                        if (readBytes == 0)
                        {
                            break;
                        }
                        destination.Write(buffer, 0, readBytes);
                    }
                }
            }
        }
    }
    static void Assemble(List<string> files, string destinationDirectory)
    {
    }
}
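In the code above, `Slice` is called once per part, every call reuses a single destination name with `FileMode.CreateNew` (which throws if the file already exists), and the copy loop runs until the end of the source, so the first part receives the whole file. A sketch of split and merge that uses only `FileStream`, assuming parts of roughly equal size written as "<prefix>-0", "<prefix>-1", ...; `FileSplitter`, `Split` and `Assemble` are illustrative names. The individual parts are raw byte ranges, not playable .avi files; only the reassembled file is.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FileSplitter
{
    // Splits sourceFile into `parts` pieces and returns the part paths in order
    public static List<string> Split(string sourceFile, string partPrefix, int parts)
    {
        if (parts < 1) throw new ArgumentOutOfRangeException("parts");
        var partFiles = new List<string>();
        using (var source = new FileStream(sourceFile, FileMode.Open, FileAccess.Read))
        {
            long partSize = (source.Length + parts - 1) / parts; // round up so no bytes are left over
            var buffer = new byte[4096];
            for (int i = 0; i < parts; i++)
            {
                string partFile = partPrefix + "-" + i;
                using (var destination = new FileStream(partFile, FileMode.Create, FileAccess.Write))
                {
                    long remaining = partSize;
                    int read;
                    // Copy at most partSize bytes; the source position carries over to the next part
                    while (remaining > 0 &&
                           (read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining))) > 0)
                    {
                        destination.Write(buffer, 0, read);
                        remaining -= read;
                    }
                }
                partFiles.Add(partFile);
            }
        }
        return partFiles;
    }

    // Appends the parts, in order, to recreate the original file
    public static void Assemble(IList<string> partFiles, string destinationFile)
    {
        using (var destination = new FileStream(destinationFile, FileMode.Create, FileAccess.Write))
        {
            foreach (var partFile in partFiles)
            {
                using (var part = new FileStream(partFile, FileMode.Open, FileAccess.Read))
                {
                    part.CopyTo(destination);
                }
            }
        }
    }
}
```

For a 10,001-byte file and 3 parts, the part size is rounded up to 3,334 bytes, so the parts hold 3,334, 3,334 and 3,333 bytes.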

Multiple web requests from a text file

I have this:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace ConsoleApplication2
{
    public class Program
    {
        static void Main(string[] args)
        {
            //Consider making this configurable
            const string sourceFile = "testSolar.txt";
            const string pattern = "http://10.123.9.66:80";
            Regex re = new Regex("^(http|https)://");
            //var res = re.Match(str);
            var webClient = new WebClient();
            var times = new Dictionary<string, TimeSpan>();
            var stopwatch = new System.Diagnostics.Stopwatch();
            //Add header so if headers are tracked, it will show it is your application rather than something ambiguous
            webClient.Headers.Add(HttpRequestHeader.UserAgent, "Response-Tester-Client");
            var urlList = new List<string>();
            //Loop through the lines in the file to get the urls
            try
            {
                stopwatch.Start();
                using (var reader = new StreamReader(sourceFile))
                {
                    while (!reader.EndOfStream)
                    {
                        var urNewList = new List<string>();
                        var line = reader.ReadLine();
                        //line = line.Substring(line.IndexOf(pattern));
                        //line.Split("\t");
                        var columns = line.Split('\t');
                        if (columns[2] == "R")
                        {
                            var url = columns[4] + "?" + columns[5];
                            urlList.Add(url);
                        }
                    }
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("An error occurred while attempting to access the source file at {0}", sourceFile);
            }
            finally
            {
                //Stop, record and reset the stopwatch
                stopwatch.Stop();
                times.Add("FileReadTime", stopwatch.Elapsed);
                stopwatch.Reset();
            }
            //Try to connect to each url
            var counter = 1;
            foreach (var url in urlList)
            {
                try
                {
                    stopwatch.Start();
                    webClient.DownloadString(url);
                }
                catch (Exception e)
                {
                    Console.WriteLine("An error occurred while attempting to connect to {0}", url);
                }
                finally
                {
                    stopwatch.Stop();
                    //We use the counter for a friendlier url as the current ones are unwieldy
                    times.Add("Url " + counter, stopwatch.Elapsed);
                    counter++;
                    stopwatch.Reset();
                }
            }
            //Release the resources for the WebClient
            webClient.Dispose();
            //Write the response times
            foreach (var key in times.Keys)
            {
                Console.WriteLine("{0}: {1}", key, times[key].TotalSeconds);
            }
            Console.ReadKey();
        }
    }
}
And I have three lines in a text file, like this:
2014-08-25 14:20:43,949 DEV belkbyavlahok0jrvoutn2xd 21 R O http://10.123.9.66:80/solr_3.6/wiewaswie_live/select/ qt=standard_a2aperson&q=*:*&fq=(nosyn_name_last_exact:(qxqroelofqxq))&fq=(nosyn_name_patronym_exact:(qxqharmsqxq))&spellcheck.q=(qxqroelofqxq qxqharmsqxq)&fq={!tag%3Dalldoctypes}doc_type:1&fq=date_main:[0 TO 18133112]&facet.query={!ex%3Dalldoctypes}doc_type:3 AND (b_public:1)&facet.query={!ex%3Dalldoctypes}doc_type:2 AND (b_public:1)&spellcheck=true&spellcheck.count=-3&start=0&sort=name_last asc, score desc&omitHeader=true
2014-08-25 14:20:45,478 DEV z5gyjtcexs41vra4yegqejcf 0 R . http://10.123.9.66:80/solr_3.6/combi_live/select/ qt=standard_a2aperson&q=(((((nosyn_name_last_b_exact:(qxqkuilenburgqxq))))))&fq=(nosyn_name_last_exact:(qxqbroekqxq))&spellcheck.q=(qxqbroekqxq kuilenburg)&fq=(fk_collectiontype:6)&spellcheck=true&spellcheck.count=-3&start=10&sort=date_main asc, score desc&omitHeader=true
2014-08-25 14:20:45,930 DEV hmb0uqzc10s0ounwdhpwmflp 1 R O http://10.123.9.66:80/solr_3.6/combi_live/select/ qt=standard_a2aperson&q=(((((nosyn_name_last_b_exact:(qxqkruijsqxq))))))&fq=(nosyn_name_first_exact:(qxqw*qxq))&fq=(nosyn_name_last_exact:(qxqtreurenqxq))&spellcheck.q=(qxqw*qxq qxqtreurenqxq kruijs)&spellcheck=true&spellcheck.count=-3&start=0&sort=date_main asc, score desc&omitHeader=true
But if I have just one line, like this:
2014-08-25 14:20:45,930 DEV hmb0uqzc10s0ounwdhpwmflp 1 R O http://10.123.9.66:80/solr_3.6/combi_live/select/ qt=standard_a2aperson&q=(((((nosyn_name_last_b_exact:(qxqkruijsqxq))))))&fq=(nosyn_name_first_exact:(qxqw*qxq))&fq=(nosyn_name_last_exact:(qxqtreurenqxq))&spellcheck.q=(qxqw*qxq qxqtreurenqxq kruijs)&spellcheck=true&spellcheck.count=-3&start=0&sort=date_main asc, score desc&omitHeader=true
then it works, but with more lines I get:
The remote server returned an error: (404) Not Found.
Thank you for your help.
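In the code above, a single short or malformed line makes `columns[2]` or `columns[5]` throw, and the outer `catch` then stops reading the rest of the file. A sketch of per-line parsing that skips such lines instead, assuming the layout the question's code relies on (tab-separated, column 2 is "R", column 4 the URL, column 5 the query string); `SolrLogLine` and `TryGetRequestUrl` are hypothetical names. Printing each built URL before requesting it also shows exactly which request returns the 404.

```csharp
using System;

// Hypothetical helper; column positions follow the question's code, not a known log specification.
static class SolrLogLine
{
    public static bool TryGetRequestUrl(string line, out string url)
    {
        url = null;
        if (string.IsNullOrWhiteSpace(line)) return false;                 // blank or trailing lines
        string[] columns = line.Split('\t');
        if (columns.Length < 6 || columns[2].Trim() != "R") return false;  // malformed or not a request line
        url = columns[4].Trim() + "?" + columns[5].Trim();                 // Trim drops stray whitespace such as '\r'
        return true;
    }
}
```

Inside the `while (!reader.EndOfStream)` loop, `if (SolrLogLine.TryGetRequestUrl(line, out url)) urlList.Add(url);` would replace the direct indexing.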

C# create configuration file

How can I create a configuration file for this code, so that I can change the connection string easily?
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using MySql.Data.MySqlClient;
using System.Web;
using mshtml;
namespace tabcontrolweb
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void Form1_Load(object sender, EventArgs e)
        {
            string MyConString = "SERVER=192.168.0.78;" +
                "DATABASE=webboard;" +
                "UID=aimja;" +
                "PASSWORD=aimjawork;" +
                "charset=utf8;";
            MySqlConnection connection = new MySqlConnection(MyConString);
            MySqlCommand command = connection.CreateCommand();
            MySqlDataReader Reader;
            command.CommandText = "SELECT urlwebboard FROM `listweb` WHERE `urlwebboard` IS NOT NULL AND ( `webbordkind` = 'เว็บท้องถิ่น' ) and `nourl`= 'n' order by province, amphore limit 4 ";
            connection.Open();
            Reader = command.ExecuteReader();
            string[] urls = new string[4];
            string thisrow = "";
            string sumthisrow = "";
            while (Reader.Read())
            {
                thisrow = "";
                for (int i = 0; i < Reader.FieldCount; i++)
                {
                    thisrow += Reader.GetValue(i).ToString();
                    System.IO.File.AppendAllText(@"C:\file.txt", thisrow + " " + Environment.NewLine);
                    sumthisrow = Reader.GetValue(Reader.FieldCount - 1).ToString();
                }
                for (int m = 0; m < 4; m++)
                {
                    urls[m] = sumthisrow;
                    MessageBox.Show(urls[m]);
                }
                webBrowser1.Navigate(new Uri(urls[0]));
                webBrowser1.Dock = DockStyle.Fill;
                webBrowser2.Navigate(new Uri(urls[1]));
                webBrowser2.Dock = DockStyle.Fill;
                webBrowser3.Navigate(new Uri(urls[2]));
                webBrowser3.Dock = DockStyle.Fill;
                webBrowser4.Navigate(new Uri(urls[3]));
                webBrowser4.Dock = DockStyle.Fill;
            }
            connection.Close();
        }
        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            //if (webBrowser1.Document != null)
            //{
            //    IHTMLDocument2 document = webBrowser1.Document.DomDocument as IHTMLDocument2;
            //    if (document != null)
            //    {
            //        IHTMLSelectionObject currentSelection = document.selection;
            //        IHTMLTxtRange range = currentSelection.createRange() as IHTMLTxtRange;
            //        if (range != null)
            //        {
            //            const String search = "We";
            //            if (range.findText(search, search.Length, 2))
            //            {
            //                range.select();
            //            }
            //        }
            //    }
            //}
        }
    }
}
You can create an XML configuration file like this:
<db-config>
    <server>192.168.0.78</server>
    <database>webboard</database>
    <...>...</...>
</db-config>
Then use XmlTextReader to parse it.
Here is a basic example of how you can use it to parse XML files:
using System;
using System.Xml;
namespace ReadXMLfromFile
{
    /// <summary>
    /// Summary description for Class1.
    /// </summary>
    class Class1
    {
        static void Main(string[] args)
        {
            XmlTextReader reader = new XmlTextReader("books.xml");
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element: // The node is an element.
                        Console.Write("<" + reader.Name);
                        Console.WriteLine(">");
                        break;
                    case XmlNodeType.Text: //Display the text in each element.
                        Console.WriteLine(reader.Value);
                        break;
                    case XmlNodeType.EndElement: //Display the end of the element.
                        Console.Write("</" + reader.Name);
                        Console.WriteLine(">");
                        break;
                }
            }
            Console.ReadLine();
        }
    }
}
Hint: use ReadToFollowing() to get your values.
Once your XML DB config reader class is done, use it each time you need a new connection; you will then only need to change the DB config XML file to change your database connection settings.
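A sketch of such a reader using `XmlReader.ReadToFollowing`, building the same MySQL connection string that `Form1_Load` hard-codes. The `<uid>` and `<password>` element names are assumptions (the sample file above elides the remaining elements), and `DbConfig` is a hypothetical name. One pitfall is handled explicitly: `ReadToFollowing` first moves to the next node, so when elements follow each other without whitespace, the reader is already sitting on the wanted element after `ReadElementContentAsString` and a plain `ReadToFollowing` would skip it.

```csharp
using System;
using System.IO;
using System.Xml;

static class DbConfig
{
    // Elements are read forward-only, so they must appear in this order in the file
    public static string BuildConnectionString(TextReader configText)
    {
        using (XmlReader reader = XmlReader.Create(configText))
        {
            string server = ReadElement(reader, "server");
            string database = ReadElement(reader, "database");
            string uid = ReadElement(reader, "uid");            // assumed element name
            string password = ReadElement(reader, "password");  // assumed element name
            return "SERVER=" + server + ";DATABASE=" + database + ";UID=" + uid +
                   ";PASSWORD=" + password + ";charset=utf8;";
        }
    }

    static string ReadElement(XmlReader reader, string name)
    {
        // Use the current node if it already is the wanted element; otherwise move forward to it
        bool found = (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                     || reader.ReadToFollowing(name);
        if (!found)
            throw new InvalidDataException("Missing <" + name + "> element in db-config.");
        return reader.ReadElementContentAsString(); // also moves past the closing tag
    }
}
```

Usage: `new MySqlConnection(DbConfig.BuildConnectionString(new StreamReader("db-config.xml")))`, where the file name is only an example.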
Edit: there is an interesting article about storing database connection settings in .NET here.
Use the built-in System.Configuration.ConfigurationManager class that .NET provides!
See: http://msdn.microsoft.com/en-us/library/bb397750.aspx
