Azure Search: work around the $skip limit

I'm running a job to check whether all records from my database (around 610k) exist in Azure Search. However, there's a 100,000 limit on the $skip parameter. Is there a way to work around this limit?

You cannot page past 100K documents directly; however, you can use facets to work around this. For example, let's say you have a facetable field called Country, and no single Country value has more than 100K documents. You can page through all documents where Country eq 'Canada', then all documents where Country eq 'USA', and so on.
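A minimal sketch of that partitioning approach with the Azure.Search.Documents SDK. The Country field, its example values, and the service/index/key placeholders are assumptions for illustration, not part of the question:

using System;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

// Sketch: enumerate every document by splitting the index into filterable slices,
// each of which stays under the 100K paging window.
var searchClient = new SearchClient(new Uri("https://<service>.search.windows.net"),
                                    "<index>", new AzureKeyCredential("<admin-key>"));

// Distinct partition values, e.g. pulled from your own database.
string[] countries = { "Canada", "USA", "Mexico" };
const int pageSize = 1000;

foreach (string country in countries)
{
    for (int skip = 0; ; skip += pageSize)
    {
        var options = new SearchOptions
        {
            Filter = $"Country eq '{country.Replace("'", "''")}'",
            Size = pageSize,
            Skip = skip // never exceeds 100K as long as each slice is small enough
        };
        SearchResults<SearchDocument> page = searchClient.Search<SearchDocument>("*", options);

        int returned = 0;
        foreach (SearchResult<SearchDocument> result in page.GetResults())
        {
            returned++;
            // compare result.Document against your database record here
        }
        if (returned < pageSize) break; // last page of this slice
    }
}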

Just to clarify the other answers: you can't bypass the limit directly, but you can use a workaround.
Here's what you can do:
1) Add a unique field to the index (it needs to be sortable and filterable). Its contents can be a modification timestamp (if it's granular enough to be unique) or, for example, a running number. Alternatively, you can use an existing unique field.
2) Take the first 100,000 results from the index, ordered by your unique field.
3) Note the maximum value of your unique field in those results (when ordering ascending), i.e. the value of the last entry.
4) Take the next 100,000 results by ordering on the same unique field and adding a filter that only takes results where the unique field's value is greater than the previous maximum. This way the first 100,000 values are not returned again and you get the next 100,000.
5) Continue until you have all the results.
The downside is that you can't use any other custom ordering with the results unless you sort them after retrieval.
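A minimal sketch of that range/keyset pattern with the Azure.Search.Documents SDK, assuming a unique, sortable, filterable string field named recordId; the field name and the service/index/key placeholders are not from the question:

using System;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

// Sketch: walk the whole index without ever increasing $skip, by filtering on the
// last key seen in the previous page. "recordId" is an assumed unique field; swap in your own.
var searchClient = new SearchClient(new Uri("https://<service>.search.windows.net"),
                                    "<index>", new AzureKeyCredential("<admin-key>"));
const int pageSize = 1000;
string lastKey = null;

while (true)
{
    var options = new SearchOptions { Size = pageSize };
    options.OrderBy.Add("recordId asc");
    if (lastKey != null)
    {
        options.Filter = $"recordId gt '{lastKey.Replace("'", "''")}'";
    }

    SearchResults<SearchDocument> page = searchClient.Search<SearchDocument>("*", options);

    int returned = 0;
    foreach (SearchResult<SearchDocument> result in page.GetResults())
    {
        returned++;
        lastKey = (string)result.Document["recordId"]; // remember the last key of this page
        // check the document against your database here
    }

    if (returned < pageSize) break; // fewer than a full page means we're done
}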

I use metadata_storage_last_modified as the filter field; the following is my example.
offset    ->  skip      time
0         ->  0
100,000   ->  100,000   getLastTime
101,000   ->  0         useLastTime
200,000   ->  99,000    useLastTime
201,000   ->  100,000   useLastTime & getLastTime
202,000   ->  0         useLastTime
Because the skip limit is 100k, we can calculate skip by:
AzureSearchSkipLimit = 100k
AzureSearchTopLimit = 1k
skip = offset % (AzureSearchSkipLimit + AzureSearchTopLimit)
If the total search count is larger than AzureSearchSkipLimit, then apply:
orderby = "metadata_storage_last_modified desc"
When skip reaches AzureSearchSkipLimit, get the metadata_storage_last_modified time from the end of the data and use it as the filter for the next 100k of the search:
filter = metadata_storage_last_modified lt ${metadata_storage_last_modified}
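A C# sketch of that calculation; the constants, field name, orderby and filter come from the description above, while the client setup and total count are placeholders:

using System;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

// Sketch: walk the index in 1,000-document pages, resetting $skip with the modulo
// trick and refreshing the time filter each time skip reaches the 100K ceiling.
const int AzureSearchSkipLimit = 100_000;
const int AzureSearchTopLimit = 1_000;

var searchClient = new SearchClient(new Uri("https://<service>.search.windows.net"),
                                    "<index>", new AzureKeyCredential("<admin-key>"));
int totalCount = 610_000; // e.g. from a separate IncludeTotalCount query

DateTimeOffset? lastModifiedCutoff = null;
for (int offset = 0; offset < totalCount; offset += AzureSearchTopLimit)
{
    int skip = offset % (AzureSearchSkipLimit + AzureSearchTopLimit);

    var options = new SearchOptions { Size = AzureSearchTopLimit, Skip = skip };
    options.OrderBy.Add("metadata_storage_last_modified desc");
    if (lastModifiedCutoff != null)
    {
        // "useLastTime": only look at documents older than the previous window's cutoff
        options.Filter = $"metadata_storage_last_modified lt {lastModifiedCutoff.Value:o}";
    }

    SearchResults<SearchDocument> page = searchClient.Search<SearchDocument>("*", options);
    foreach (SearchResult<SearchDocument> result in page.GetResults())
    {
        // ... process result.Document ...
        if (skip == AzureSearchSkipLimit)
        {
            // "getLastTime": remember the oldest timestamp in this window for the next filter
            lastModifiedCutoff = (DateTimeOffset)result.Document["metadata_storage_last_modified"];
        }
    }
}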

I understand the API's 100K limit, and Microsoft's documentation says as a workaround that "you can work around this limitation by adding code to iterate over, and filter on, a facet with less than 100K documents per facet value."
I'm using the "Back up and restore an Azure Cognitive Search index" sample solution provided by Microsoft (https://github.com/Azure-Samples/azure-search-dotnet-samples).
But can someone tell me where or how to implement this iteration loop on a facet? The facetable field I'm trying to use is "tributekey", but I don't know where to place the code below. Any help would be greatly appreciated.
// This is a prototype tool that allows for extraction of data from a search index
// Since this tool is still under development, it should not be used for production usage
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Models;
using Microsoft.Extensions.Configuration;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
namespace AzureSearchBackupRestore
{
class Program
{
private static string SourceSearchServiceName;
private static string SourceAdminKey;
private static string SourceIndexName;
private static string TargetSearchServiceName;
private static string TargetAdminKey;
private static string TargetIndexName;
private static string BackupDirectory;
private static SearchIndexClient SourceIndexClient;
private static SearchClient SourceSearchClient;
private static SearchIndexClient TargetIndexClient;
private static SearchClient TargetSearchClient;
private static int MaxBatchSize = 500; // JSON files will contain this many documents / file and can be up to 1000
private static int ParallelizedJobs = 10; // Output content in parallel jobs
static void Main(string[] args)
{
//Get source and target search service info and index names from appsettings.json file
//Set up source and target search service clients
ConfigurationSetup();
//Backup the source index
Console.WriteLine("\nSTART INDEX BACKUP");
BackupIndexAndDocuments();
//Recreate and import content to target index
//Console.WriteLine("\nSTART INDEX RESTORE");
//DeleteIndex();
//CreateTargetIndex();
//ImportFromJSON();
//Console.WriteLine("\r\n Waiting 10 seconds for target to index content...");
//Console.WriteLine(" NOTE: For really large indexes it may take longer to index all content.\r\n");
//Thread.Sleep(10000);
//
//// Validate all content is in target index
//int sourceCount = GetCurrentDocCount(SourceSearchClient);
//int targetCount = GetCurrentDocCount(TargetSearchClient);
//Console.WriteLine("\nSAFEGUARD CHECK: Source and target index counts should match");
//Console.WriteLine(" Source index contains {0} docs", sourceCount);
//Console.WriteLine(" Target index contains {0} docs\r\n", targetCount);
//
//Console.WriteLine("Press any key to continue...");
//Console.ReadLine();
}
static void ConfigurationSetup()
{
IConfigurationBuilder builder = new ConfigurationBuilder().AddJsonFile("appsettings.json");
IConfigurationRoot configuration = builder.Build();
SourceSearchServiceName = configuration["SourceSearchServiceName"];
SourceAdminKey = configuration["SourceAdminKey"];
SourceIndexName = configuration["SourceIndexName"];
TargetSearchServiceName = configuration["TargetSearchServiceName"];
TargetAdminKey = configuration["TargetAdminKey"];
TargetIndexName = configuration["TargetIndexName"];
BackupDirectory = configuration["BackupDirectory"];
Console.WriteLine("CONFIGURATION:");
Console.WriteLine("\n Source service and index {0}, {1}", SourceSearchServiceName, SourceIndexName);
Console.WriteLine("\n Target service and index: {0}, {1}", TargetSearchServiceName, TargetIndexName);
Console.WriteLine("\n Backup directory: " + BackupDirectory);
SourceIndexClient = new SearchIndexClient(new Uri("https://" + SourceSearchServiceName + ".search.windows.net"), new AzureKeyCredential(SourceAdminKey));
SourceSearchClient = SourceIndexClient.GetSearchClient(SourceIndexName);
// TargetIndexClient = new SearchIndexClient(new Uri($"https://" + TargetSearchServiceName + ".search.windows.net"), new AzureKeyCredential(TargetAdminKey));
// TargetSearchClient = TargetIndexClient.GetSearchClient(TargetIndexName);
}
static void BackupIndexAndDocuments()
{
// Backup the index schema to the specified backup directory
Console.WriteLine("\n Backing up source index schema to {0}\r\n", BackupDirectory + "\\" + SourceIndexName + ".schema");
File.WriteAllText(BackupDirectory + "\\" + SourceIndexName + ".schema", GetIndexSchema());
// Extract the content to JSON files
int SourceDocCount = GetCurrentDocCount(SourceSearchClient);
WriteIndexDocuments(SourceDocCount); // Output content from index to json files
}
static void WriteIndexDocuments(int CurrentDocCount)
{
// Write document files in batches (per MaxBatchSize) in parallel
string IDFieldName = GetIDFieldName();
int FileCounter = 0;
for (int batch = 0; batch <= (CurrentDocCount / MaxBatchSize); batch += ParallelizedJobs)
{
List<Task> tasks = new List<Task>();
for (int job = 0; job < ParallelizedJobs; job++)
{
FileCounter++;
int fileCounter = FileCounter;
if ((fileCounter - 1) * MaxBatchSize < CurrentDocCount)
{
Console.WriteLine(" Backing up source documents to {0} - (batch size = {1})", BackupDirectory + "\\" + SourceIndexName + fileCounter + ".json", MaxBatchSize);
tasks.Add(Task.Factory.StartNew(() =>
ExportToJSON((fileCounter - 1) * MaxBatchSize, IDFieldName, BackupDirectory + "\\" + SourceIndexName + fileCounter + ".json")
));
}
}
Task.WaitAll(tasks.ToArray()); // Wait for all the tasks in this group to complete
}
return;
}
static void ExportToJSON(int Skip, string IDFieldName, string FileName)
{
// Extract all the documents from the selected index to JSON files in batches of 500 docs / file
string json = string.Empty;
try
{
SearchOptions options = new SearchOptions()
{
SearchMode = SearchMode.All,
Size = MaxBatchSize,
Skip = Skip,
// ,IncludeTotalCount = true
// ,Filter = Azure.Search.Documents.SearchFilter.Create('%24top=2&%24skip=0&%24orderby=tributeId%20asc')
//,Filter = String.Format("&search=*&%24top=2&%24skip=0&%24orderby=tributeId%20asc")
//,Filter = "%24top=2&%24skip=0&%24orderby=tributeId%20asc"
//,Filter = "tributeKey eq '5'"
};
SearchResults<SearchDocument> response = SourceSearchClient.Search<SearchDocument>("*", options);
foreach (var doc in response.GetResults())
{
json += JsonSerializer.Serialize(doc.Document) + ",";
json = json.Replace("\"Latitude\":", "\"type\": \"Point\", \"coordinates\": [");
json = json.Replace("\"Longitude\":", "");
json = json.Replace(",\"IsEmpty\":false,\"Z\":null,\"M\":null,\"CoordinateSystem\":{\"EpsgId\":4326,\"Id\":\"4326\",\"Name\":\"WGS84\"}", "]");
json += "\r\n";
}
// Output the formatted content to a file
json = json.Substring(0, json.Length - 3); // remove the trailing comma and CRLF
File.WriteAllText(FileName, "{\"value\": [");
File.AppendAllText(FileName, json);
File.AppendAllText(FileName, "]}");
Console.WriteLine(" Total documents: {0}", response.GetResults().Count().ToString());
json = string.Empty;
}
catch (Exception ex)
{
Console.WriteLine("Error: {0}", ex.Message.ToString());
}
}
static string GetIDFieldName()
{
// Find the id field of this index
string IDFieldName = string.Empty;
try
{
var schema = SourceIndexClient.GetIndex(SourceIndexName);
foreach (var field in schema.Value.Fields)
{
if (field.IsKey == true)
{
IDFieldName = Convert.ToString(field.Name);
break;
}
}
}
catch (Exception ex)
{
Console.WriteLine("Error: {0}", ex.Message.ToString());
}
return IDFieldName;
}
static string GetIndexSchema()
{
// Extract the schema for this index
// We use REST here because we can take the response as-is
Uri ServiceUri = new Uri("https://" + SourceSearchServiceName + ".search.windows.net");
HttpClient HttpClient = new HttpClient();
HttpClient.DefaultRequestHeaders.Add("api-key", SourceAdminKey);
string Schema = string.Empty;
try
{
Uri uri = new Uri(ServiceUri, "/indexes/" + SourceIndexName);
HttpResponseMessage response = AzureSearchHelper.SendSearchRequest(HttpClient, HttpMethod.Get, uri);
AzureSearchHelper.EnsureSuccessfulSearchResponse(response);
Schema = response.Content.ReadAsStringAsync().Result.ToString();
}
catch (Exception ex)
{
Console.WriteLine("Error: {0}", ex.Message.ToString());
}
return Schema;
}
private static bool DeleteIndex()
{
Console.WriteLine("\n Delete target index {0} in {1} search service, if it exists", TargetIndexName, TargetSearchServiceName);
// Delete the index if it exists
try
{
TargetIndexClient.DeleteIndex(TargetIndexName);
}
catch (Exception ex)
{
Console.WriteLine(" Error deleting index: {0}\r\n", ex.Message);
Console.WriteLine(" Did you remember to set your SearchServiceName and SearchServiceApiKey?\r\n");
return false;
}
return true;
}
static void CreateTargetIndex()
{
Console.WriteLine("\n Create target index {0} in {1} search service", TargetIndexName, TargetSearchServiceName);
// Use the schema file to create a copy of this index
// I like using REST here since I can just take the response as-is
string json = File.ReadAllText(BackupDirectory + "\\" + SourceIndexName + ".schema");
// Do some cleaning of this file to change index name, etc
json = "{" + json.Substring(json.IndexOf("\"name\""));
int indexOfIndexName = json.IndexOf("\"", json.IndexOf("name\"") + 5) + 1;
int indexOfEndOfIndexName = json.IndexOf("\"", indexOfIndexName);
json = json.Substring(0, indexOfIndexName) + TargetIndexName + json.Substring(indexOfEndOfIndexName);
Uri ServiceUri = new Uri("https://" + TargetSearchServiceName + ".search.windows.net");
HttpClient HttpClient = new HttpClient();
HttpClient.DefaultRequestHeaders.Add("api-key", TargetAdminKey);
try
{
Uri uri = new Uri(ServiceUri, "/indexes");
HttpResponseMessage response = AzureSearchHelper.SendSearchRequest(HttpClient, HttpMethod.Post, uri, json);
response.EnsureSuccessStatusCode();
}
catch (Exception ex)
{
Console.WriteLine(" Error: {0}", ex.Message.ToString());
}
}
static int GetCurrentDocCount(SearchClient searchClient)
{
// Get the current doc count of the specified index
try
{
SearchOptions options = new SearchOptions()
{
SearchMode = SearchMode.All,
IncludeTotalCount = true
};
SearchResults<Dictionary<string, object>> response = searchClient.Search<Dictionary<string, object>>("*", options);
return Convert.ToInt32(response.TotalCount);
}
catch (Exception ex)
{
Console.WriteLine(" Error: {0}", ex.Message.ToString());
}
return -1;
}
static void ImportFromJSON()
{
Console.WriteLine("\n Upload index documents from saved JSON files");
// Take JSON file and import this as-is to target index
Uri ServiceUri = new Uri("https://" + TargetSearchServiceName + ".search.windows.net");
HttpClient HttpClient = new HttpClient();
HttpClient.DefaultRequestHeaders.Add("api-key", TargetAdminKey);
try
{
foreach (string fileName in Directory.GetFiles(BackupDirectory, SourceIndexName + "*.json"))
{
Console.WriteLine(" -Uploading documents from file {0}", fileName);
string json = File.ReadAllText(fileName);
Uri uri = new Uri(ServiceUri, "/indexes/" + TargetIndexName + "/docs/index");
HttpResponseMessage response = AzureSearchHelper.SendSearchRequest(HttpClient, HttpMethod.Post, uri, json);
response.EnsureSuccessStatusCode();
}
}
catch (Exception ex)
{
Console.WriteLine(" Error: {0}", ex.Message.ToString());
}
}
}
}
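One possible way to apply the documented facet workaround to this sample, sketched as a method you could add to the Program class above. It assumes "tributekey" is a facetable, filterable string field, that no single value covers more than 100K documents, and it omits the latitude/longitude rewriting that ExportToJSON does; it is not part of the Microsoft sample:

static void WriteIndexDocumentsByFacet()
{
    // 1) Ask the index for the distinct facet values (up to an assumed 1000 of them).
    SearchOptions facetOptions = new SearchOptions { Size = 0 };
    facetOptions.Facets.Add("tributekey,count:1000");
    SearchResults<SearchDocument> facetResponse = SourceSearchClient.Search<SearchDocument>("*", facetOptions);

    int fileCounter = 0;

    // 2) For each facet value, page through just that slice with a filter,
    //    so no single query ever needs $skip beyond 100K.
    foreach (FacetResult facet in facetResponse.Facets["tributekey"])
    {
        string key = facet.AsValueFacetResult<string>().Value;
        string filter = "tributekey eq '" + key.Replace("'", "''") + "'";

        for (int skip = 0; ; skip += MaxBatchSize)
        {
            SearchOptions options = new SearchOptions
            {
                SearchMode = SearchMode.All,
                Size = MaxBatchSize,
                Skip = skip,
                Filter = filter
            };
            SearchResults<SearchDocument> response = SourceSearchClient.Search<SearchDocument>("*", options);

            var docs = response.GetResults().ToList();
            if (docs.Count == 0) break;

            fileCounter++;
            string fileName = BackupDirectory + "\\" + SourceIndexName + fileCounter + ".json";
            string json = string.Join(",\r\n", docs.Select(d => JsonSerializer.Serialize(d.Document)));
            File.WriteAllText(fileName, "{\"value\": [" + json + "]}");

            if (docs.Count < MaxBatchSize) break; // last page of this slice
        }
    }
}

Calling this from BackupIndexAndDocuments() in place of WriteIndexDocuments(SourceDocCount) keeps every query inside one facet slice.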

Related

Azure Cognitive Search, how to iterate over facets to bypass 100K limit

This is the same question and backup/restore sample code as shown above. Can someone tell me where or how to implement the iteration loop on a facet? The facetable field I'm trying to use is "tributekey", but I don't know where to place the code. Any help would be greatly appreciated.
I tried adding a filter option in the ExportToJSON method, but the request fails.

Generate Text File Upon Button Action and Save to local Drive using Acumatica

I am trying to generate a text file from an action button, with the contents of a data view created in a graph class, and save the file to my local drive, but I am unable to do it.
Please help me with the file generation. Thanks.
I am using Acumatica version 2019R2 (v 19.203.0042).
My code goes here:
public PXSelect<MayBankGIRO> Document; //this is my dataview
public PXAction<MayBankGiroFilter> createTextFile;
[PXUIField(DisplayName = "Create Text File")]
[PXButton()]
public virtual IEnumerable CreateTextFile(PXAdapter adapter)
{
string filepath = "C:\\Subhashish Dawn";
System.IO.StreamWriter sw = new System.IO.StreamWriter(filepath);
MayBankGIRO giroObject = this.Document.Current;
List<object> myListObject = new List<object> { };
FixedLengthFile flatFile = new FixedLengthFile();
foreach (MayBankGIRO dacRecord in this.Document.Select())
{
if (giroObject.ReordType == "00")
{
myListObject.Add(dacRecord.ReordType + "|" + dacRecord.CorporateID + "|" + dacRecord.ClientBatchID + "|");
}
else
{
myListObject.Add(dacRecord.ReordType + "|" + dacRecord.CorporateID + "|" + dacRecord.ClientBatchID + "|" + dacRecord.Country + "|");
string data = dacRecord.ReordType;
}
this.Document.Update(dacRecord);
}
flatFile.WriteToFile(myListObject, sw);
sw.Flush();
sw.FlushAsync();
string path = "DAWN" + ".txt";
PX.SM.FileInfo file = new PX.SM.FileInfo(Guid.NewGuid(), path, null, System.Text.Encoding.UTF8.GetBytes(**path**)); // what shall I substitute in place of **path**?
throw new PXRedirectToFileException(file, true);
}
Can anyone please specify what changes I have to make in the above code?
I utilize UploadFileMaintenance to do this. I'm not sure if this will meet your needs, but here is the core of my code that works for me.
byte[] labelBytes = Encoding.ASCII.GetBytes(myLabelData);
if(labelBytes.Length > 0)
{
string filename = "label-" + Guid.NewGuid().ToString() + ".txt";
PX.SM.FileInfo labelFileInfo = new FileInfo(filename, null, labelBytes);
UploadFileMaintenance upload = PXGraph.CreateInstance<UploadFileMaintenance>();
if (upload.SaveFile(labelFileInfo))
{
string targetUrl = PXRedirectToFileException.BuildUrl(labelFileInfo.UID);
throw new PXRedirectToUrlException(targetUrl, "Print Labels");
}
}
Assuming your Document object is the view you need and its Select() method returns all the data records you need inside your file, this should work:
// We need at least these, listing them for reference
using PX.Data;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
// This is your dataview
public PXSelect<MayBankGIRO> Document;
// Your action delegate
public PXAction<MayBankGiroFilter> createTextFile;
[PXUIField(DisplayName = "Create Text File")]
[PXButton()]
public virtual IEnumerable CreateTextFile(PXAdapter adapter)
{
// You can use this method to print debug information for your customizations
// Just remove when you are done testing
PXTrace.WriteInformation("Generating records");
// We will build the content as a string list first
List<string> myList = new List<string> { };
// If the value of 'ReordType' can change for each record, you don't need this
MayBankGIRO giroObject = this.Document.Current;
foreach (MayBankGIRO dacRecord in this.Document.Select())
{
// Does 'ReordType' change for each record?
// if it does you may need to use 'dacRecord.ReordType' in this if instead
if (giroObject.ReordType == "00")
{
// This only works if all these members are strings or can be cast to strings
myList.Add(dacRecord.ReordType + "|" + dacRecord.CorporateID + "|" + dacRecord.ClientBatchID + "|");
}
else
{
// This only works if all these members are strings or can be cast to strings
myList.Add(dacRecord.ReordType + "|" + dacRecord.CorporateID + "|" + dacRecord.ClientBatchID + "|" + dacRecord.Country + "|");
}
}
PXTrace.WriteInformation("Generating file");
// Set the name
string filename = "DAWN" + ".txt";
// Use our download method
Download(myList, filename);
// Download() ends by throwing a PXRedirectToFileException, but the delegate still needs a return statement to compile
return adapter.Get();
}
// We can define a static method to be able to reuse this later for other DACs
public static void Download(List<string> lines, string name)
{
var bytes = default(byte[]);
// Write all lines to stream
using (MemoryStream stream = new MemoryStream())
{
StreamWriter sw = new StreamWriter(stream);
foreach (string line in lines)
{
sw.WriteLine(line);
}
sw.Close();
stream.Position = 0;
bytes = stream.ToArray();
};
// Save content to file object
PX.SM.FileInfo textDoc = new PX.SM.FileInfo(name, null, bytes);
if (textDoc != null)
{
// Trigger file download
throw new PXRedirectToFileException(textDoc, true);
} else {
//TODO: You could raise an exception here also to notify the user
PXTrace.WriteInformation("Could not generate file");
}
}
Hi Markoan, your code helped me create the text file, but the content of the text file is getting repeated with the first record of the data view. My ReordType field has only three values: "00" for the first record, "01" for the 2nd to (n-1)th records, and "99" for the nth record.
Though I have made a few changes to your code:
// We need at least these, listing them for reference
using PX.Data;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
// This is your dataview
public PXSelect<MayBankGIRO> Document;
// Your action delegate
public PXAction<MayBankGiroFilter> createTextFile;
[PXUIField(DisplayName = "Create Text File")]
[PXButton()]
public virtual IEnumerable CreateTextFile(PXAdapter adapter)
{
// You can use this method to print debug information for your customizations
// Just remove when you are done testing
PXTrace.WriteInformation("Generating records");
// We will build the content as a string list first
List<string> myList = new List<string> { };
// If the value of 'ReordType' can change for each record, you don't need this
MayBankGIRO giroObject = this.Document.Current;
foreach (MayBankGIRO dacRecord in this.Document.Select())
{
// Does 'ReordType' change for each record?
// if it does you may need to use 'dacRecord.ReordType' in this if instead
myList.Add(dacRecord.ReordType + "|" + dacRecord.CustomerReferenceNumber + "|" + dacRecord.ClientBatchID + "|" + dacRecord.Country + "|");
}
PXTrace.WriteInformation("Generating file");
// Set the name
string filename = "DAWN" + ".txt";
// Use our download method
Download(myList, filename);
// As above, the delegate needs a return statement to compile even though Download() throws
return adapter.Get();
}
// We can define a static method to be able to reuse this later for other DACs
public static void Download(List<string> lines, string name)
{
var bytes = default(byte[]);
// Write all lines to stream
using (MemoryStream stream = new MemoryStream())
{
StreamWriter sw = new StreamWriter(stream);
foreach (string line in lines)
{
sw.WriteLine(line);
}
// sw.Close() here closed the underlying MemoryStream and threw "Cannot access a closed Stream",
// so flush the writer first, read the stream, and close afterwards
sw.Flush();
stream.Position = 0;
bytes = stream.ToArray();
sw.Close();
};
// Save content to file object
PX.SM.FileInfo textDoc = new PX.SM.FileInfo(name, null, bytes);
if (textDoc != null)
{
// Trigger file download
throw new PXRedirectToFileException(textDoc, true);
} else {
//TODO: You could raise an exception here also to notify the user
PXTrace.WriteInformation("Could not generate file");
}
}

Update a SavedQuery (View) from the SDK

I am trying to change all the Business Unit references I got after importing a solution to the ones in the Acceptance environment.
QueryExpression ViewQuery = new QueryExpression("savedquery");
String[] viewArrayFields = { "name", "fetchxml" };
ViewQuery.ColumnSet = new ColumnSet(viewArrayFields);
ViewQuery.PageInfo = new PagingInfo();
ViewQuery.PageInfo.Count = 5000;
ViewQuery.PageInfo.PageNumber = 1;
ViewQuery.PageInfo.ReturnTotalRecordCount = true;
EntityCollection retrievedViews = service.RetrieveMultiple(ViewQuery);
//iterate through the values and print the right one for the current user
int oldValues = 0;
int accValuesUpdated = 0;
int prodValuesUpdated = 0;
int total = 0;
foreach (var entity in retrievedViews.Entities)
{
total++;
if (!entity.Contains("fetchxml"))
{ }
else
{
string fetchXML = entity.Attributes["fetchxml"].ToString();
for (int i = 0; i < guidDictionnary.Count; i++)
{
var entry = guidDictionnary.ElementAt(i);
if (fetchXML.Contains(entry.Key.ToString().ToUpperInvariant()))
{
Console.WriteLine(entity.Attributes["name"].ToString());
oldValues++;
if (destinationEnv.Equals("acc"))
{
accValuesUpdated++;
Console.WriteLine();
Console.WriteLine("BEFORE:");
Console.WriteLine();
Console.WriteLine(entity.Attributes["fetchxml"].ToString());
string query = entity.Attributes["fetchxml"].ToString();
query = query.Replace(entry.Key.ToString().ToUpperInvariant(), entry.Value.AccGuid.ToString().ToUpperInvariant());
entity.Attributes["fetchxml"] = query;
Console.WriteLine();
Console.WriteLine("AFTER:");
Console.WriteLine();
Console.WriteLine(entity.Attributes["fetchxml"].ToString());
}
else
{
prodValuesUpdated++;
string query = entity.Attributes["fetchxml"].ToString();
query = query.Replace(entry.Key.ToString().ToUpperInvariant(), entry.Value.ProdGuid.ToString().ToUpperInvariant());
entity.Attributes["fetchxml"] = query;
}
service.Update(entity);
}
}
}
}
Console.WriteLine("{0} values to be updated. {1} shall be mapped to acceptance, {2} to prod. Total = {3} : {4}", oldValues, accValuesUpdated, prodValuesUpdated, total, retrievedViews.Entities.Count);
I see that the new value is corrected, but it does not get saved. I get no error while updating the record, and publishing the changes in CRM does not help.
Any hints?
According to your comments, it sounds like the value you're saving on the entity is the value you want it to be. I'm guessing your issue is with not publishing your change. If you don't publish it, it'll still give you the old value of the FetchXml, I believe.
Try calling this method:
PublishEntity(service, "savedquery");
private void PublishEntity(IOrganizationService service, string logicalName)
{
service.Execute(new PublishXmlRequest()
{
ParameterXml = "<importexportxml>"
+ " <entities>"
+ " <entity>" + logicalName + "</entity>"
+ " </entities>"
+ "</importexportxml>"
});
}

With OrmLite, is there a way to automatically update table schema when my POCO is modified?

Can OrmLite recognize differences between my POCO and my schema and automatically add (or remove) columns as necessary to force the schema to remain in sync with my POCO?
If this ability doesn't exist, is there a way for me to query the database for the table schema so that I can manually perform the syncing? I found this, but I'm using the version of OrmLite that installs with ServiceStack, and for the life of me I cannot find a namespace that has the TableInfo classes.
I created an extension method to automatically add missing columns to my tables. Been working great so far. Caveat: the code for getting the column names is SQL Server specific.
namespace System.Data
{
public static class IDbConnectionExtensions
{
private static List<string> GetColumnNames(IDbConnection db, string tableName)
{
var columns = new List<string>();
using (var cmd = db.CreateCommand())
{
cmd.CommandText = "exec sp_columns " + tableName;
var reader = cmd.ExecuteReader();
while (reader.Read())
{
var ordinal = reader.GetOrdinal("COLUMN_NAME");
columns.Add(reader.GetString(ordinal));
}
reader.Close();
}
return columns;
}
public static void AlterTable<T>(this IDbConnection db) where T : new()
{
var model = ModelDefinition<T>.Definition;
// just create the table if it doesn't already exist
if (db.TableExists(model.ModelName) == false)
{
db.CreateTable<T>(overwrite: false);
return;
}
// find each of the missing fields
var columns = GetColumnNames(db, model.ModelName);
var missing = ModelDefinition<T>.Definition.FieldDefinitions
.Where(field => columns.Contains(field.FieldName) == false)
.ToList();
// add a new column for each missing field
foreach (var field in missing)
{
var alterSql = string.Format("ALTER TABLE {0} ADD {1} {2}",
model.ModelName,
field.FieldName,
db.GetDialectProvider().GetColumnTypeDefinition(field.FieldType)
);
Console.WriteLine(alterSql);
db.ExecuteSql(alterSql);
}
}
}
}
No, there is no current support for auto-migration of RDBMS schemas vs POCOs in ServiceStack's OrmLite.
There are currently a few threads being discussed in OrmLite's issues that explore different ways to add this.
Here is a slightly modified version of cornelha's code that works with PostgreSQL. I removed this fragment:
//private static List<string> GetColumnNames(object poco)
//{
// var list = new List<string>();
// foreach (var prop in poco.GetType().GetProperties())
// {
// list.Add(prop.Name);
// }
// return list;
//}
and used the IOrmLiteDialectProvider.NamingStrategy.GetTableName and IOrmLiteDialectProvider.NamingStrategy.GetColumnName methods to convert table and column names from PascalCase to the snake_case notation OrmLite uses when creating tables in PostgreSQL.
public static class IDbConnectionExtensions
{
private static List<string> GetColumnNames(IDbConnection db, string tableName, IOrmLiteDialectProvider provider)
{
var columns = new List<string>();
using (var cmd = db.CreateCommand())
{
cmd.CommandText = getCommandText(tableName, provider);
var tbl = new DataTable();
tbl.Load(cmd.ExecuteReader());
for (int i = 0; i < tbl.Columns.Count; i++)
{
columns.Add(tbl.Columns[i].ColumnName);
}
}
return columns;
}
private static string getCommandText(string tableName, IOrmLiteDialectProvider provider)
{
if (provider == PostgreSqlDialect.Provider)
return string.Format("select * from {0} limit 1", tableName);
else return string.Format("select top 1 * from {0}", tableName);
}
public static void AlterTable<T>(this IDbConnection db, IOrmLiteDialectProvider provider) where T : new()
{
var model = ModelDefinition<T>.Definition;
var table = new T();
var namingStrategy = provider.NamingStrategy;
// just create the table if it doesn't already exist
var tableName = namingStrategy.GetTableName(model.ModelName);
if (db.TableExists(tableName) == false)
{
db.CreateTable<T>(overwrite: false);
return;
}
// find each of the missing fields
var columns = GetColumnNames(db, model.ModelName, provider);
var missing = ModelDefinition<T>.Definition.FieldDefinitions
.Where(field => columns.Contains(namingStrategy.GetColumnName(field.FieldName)) == false)
.ToList();
// add a new column for each missing field
foreach (var field in missing)
{
var columnName = namingStrategy.GetColumnName(field.FieldName);
var alterSql = string.Format("ALTER TABLE {0} ADD COLUMN {1} {2}",
tableName,
columnName,
db.GetDialectProvider().GetColumnTypeDefinition(field.FieldType)
);
Console.WriteLine(alterSql);
db.ExecuteSql(alterSql);
}
}
}
I implemented an UpdateTable function. The basic idea is:
Rename current table on database.
Let OrmLite create the new schema.
Copy the relevant data from the old table to the new.
Drop the old table.
Github Repo: https://github.com/peheje/Extending-NServiceKit.OrmLite
Condensed code:
public interface ISqlProvider
{
string RenameTableSql(string currentName, string newName);
string GetColumnNamesSql(string tableName);
string InsertIntoSql(string intoTableName, string fromTableName, string commaSeparatedColumns);
string DropTableSql(string tableName);
}
public static void UpdateTable<T>(IDbConnection connection, ISqlProvider sqlProvider) where T : new()
{
connection.CreateTableIfNotExists<T>();
var model = ModelDefinition<T>.Definition;
string tableName = model.Name;
string tableNameTmp = tableName + "Tmp";
string renameTableSql = sqlProvider.RenameTableSql(tableName, tableNameTmp);
connection.ExecuteNonQuery(renameTableSql);
connection.CreateTable<T>();
string getModelColumnsSql = sqlProvider.GetColumnNamesSql(tableName);
var modelColumns = connection.SqlList<string>(getModelColumnsSql);
string getDbColumnsSql = sqlProvider.GetColumnNamesSql(tableNameTmp);
var dbColumns = connection.SqlList<string>(getDbColumnsSql);
List<string> activeFields = dbColumns.Where(dbColumn => modelColumns.Contains(dbColumn)).ToList();
string activeFieldsCommaSep = ListToCommaSeparatedString(activeFields);
string insertIntoSql = sqlProvider.InsertIntoSql(tableName, tableNameTmp, activeFieldsCommaSep);
connection.ExecuteSql(insertIntoSql);
string dropTableSql = sqlProvider.DropTableSql(tableNameTmp);
//connection.ExecuteSql(dropTableSql); //maybe you want to clean up yourself, else uncomment
}
private static String ListToCommaSeparatedString(List<String> source)
{
var sb = new StringBuilder();
for (int i = 0; i < source.Count; i++)
{
sb.Append(source[i]);
if (i < source.Count - 1)
{
sb.Append(", ");
}
}
return sb.ToString();
}
}
MySql implementation:
public class MySqlProvider : ISqlProvider
{
public string RenameTableSql(string currentName, string newName)
{
return "RENAME TABLE `" + currentName + "` TO `" + newName + "`;";
}
public string GetColumnNamesSql(string tableName)
{
return "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '" + tableName + "';";
}
public string InsertIntoSql(string intoTableName, string fromTableName, string commaSeparatedColumns)
{
return "INSERT INTO `" + intoTableName + "` (" + commaSeparatedColumns + ") SELECT " + commaSeparatedColumns + " FROM `" + fromTableName + "`;";
}
public string DropTableSql(string tableName)
{
return "DROP TABLE `" + tableName + "`;";
}
}
Usage:
using (var db = dbFactory.OpenDbConnection())
{
DbUpdate.UpdateTable<SimpleData>(db, new MySqlProvider());
}
Haven't tested with FKs. Can't handle renaming properties.
I needed to implement something similar and found the post by Scott very helpful. I decided to make a small change which makes it much more provider-agnostic. Since I only use SQLite and MSSQL, I made the getCommandText method very simple, but it can be extended. I used a simple DataTable to get the columns. This solution works perfectly for my requirements.
public static class IDbConnectionExtensions
{
private static List<string> GetColumnNames(IDbConnection db, string tableName,IOrmLiteDialectProvider provider)
{
var columns = new List<string>();
using (var cmd = db.CreateCommand())
{
cmd.CommandText = getCommandText(tableName, provider);
var tbl = new DataTable();
tbl.Load(cmd.ExecuteReader());
for (int i = 0; i < tbl.Columns.Count; i++)
{
columns.Add(tbl.Columns[i].ColumnName);
}
}
return columns;
}
private static string getCommandText(string tableName, IOrmLiteDialectProvider provider)
{
if(provider == SqliteDialect.Provider)
return string.Format("select * from {0} limit 1", tableName);
else return string.Format("select top 1 * from {0}", tableName);
}
private static List<string> GetColumnNames(object poco)
{
var list = new List<string>();
foreach (var prop in poco.GetType().GetProperties())
{
list.Add(prop.Name);
}
return list;
}
public static void AlterTable<T>(this IDbConnection db, IOrmLiteDialectProvider provider) where T : new()
{
var model = ModelDefinition<T>.Definition;
var table = new T();
// just create the table if it doesn't already exist
if (db.TableExists(model.ModelName) == false)
{
db.CreateTable<T>(overwrite: false);
return;
}
// find each of the missing fields
var columns = GetColumnNames(db, model.ModelName,provider);
var missing = ModelDefinition<T>.Definition.FieldDefinitions
.Where(field => columns.Contains(field.FieldName) == false)
.ToList();
// add a new column for each missing field
foreach (var field in missing)
{
var alterSql = string.Format("ALTER TABLE {0} ADD {1} {2}",
model.ModelName,
field.FieldName,
db.GetDialectProvider().GetColumnTypeDefinition(field.FieldType)
);
Console.WriteLine(alterSql);
db.ExecuteSql(alterSql);
}
}
}
So I took user44's answer and modified the AlterTable method to make it a bit more efficient.
Instead of looping and running one SQL query per field/column, I merge them into a single statement with some simple text parsing (MySQL syntax!).
public static void AlterTable<T>(this IDbConnection db, IOrmLiteDialectProvider provider) where T : new()
{
var model = ModelDefinition<T>.Definition;
var table = new T();
var namingStrategy = provider.NamingStrategy;
// just create the table if it doesn't already exist
var tableName = namingStrategy.GetTableName(model.ModelName);
if (db.TableExists(tableName) == false)
{
db.CreateTable<T>(overwrite: false);
return;
}
// find each of the missing fields
var columns = GetColumnNames(db, model.ModelName, provider);
var missing = ModelDefinition<T>.Definition.FieldDefinitions
.Where(field => columns.Contains(namingStrategy.GetColumnName(field.FieldName)) == false)
.ToList();
string alterSql = "";
string addSql = "";
// add a new column for each missing field
foreach (var field in missing)
{
var alt = db.GetDialectProvider().ToAddColumnStatement(typeof(T), field); // Should be made more efficient, one query for all changes instead of many
int index = alt.IndexOf("ADD ");
alterSql = alt.Substring(0, index);
addSql += alt.Substring(alt.IndexOf("ADD COLUMN")).Replace(";", "") + ", ";
}
if (addSql.Length > 2)
addSql = addSql.Substring(0, addSql.Length - 2);
string fullSql = alterSql + addSql;
Console.WriteLine(fullSql);
db.ExecuteSql(fullSql);
}

Recursive linkscraper c#

I've been struggling with this for a whole day now and I can't seem to figure it out.
I have a function that gives me a list of all links on a specific URL, and that works fine.
However, I want to make this function recursive so that it searches the links found by the first search, adds them to the list, and continues until it has gone through all the pages on the website.
How can I make this recursive?
My code:
class Program
{
public static List<LinkItem> urls;
private static List<LinkItem> newUrls = new List<LinkItem>();
static void Main(string[] args)
{
WebClient w = new WebClient();
int count = 0;
urls = new List<LinkItem>();
newUrls = new List<LinkItem>();
urls.Add(new LinkItem{Href = "http://www.smartphoto.be", Text = ""});
while (urls.Count > 0)
{
foreach (var url in urls)
{
if (RemoteFileExists(url.Href))
{
string s = w.DownloadString(url.Href);
newUrls.AddRange(LinkFinder.Find(s));
}
}
urls = newUrls.Select(x => new LinkItem{Href = x.Href, Text=""}).ToList();
count += newUrls.Count;
newUrls.Clear();
ReturnLinks();
}
Console.WriteLine();
Console.Write("Found: " + count + " links.");
Console.ReadLine();
}
private static void ReturnLinks()
{
foreach (LinkItem i in urls)
{
Console.WriteLine(i.Href);
//ReturnLinks();
}
}
private static bool RemoteFileExists(string url)
{
try
{
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
request.Method = "HEAD";
//Getting the Web Response.
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
//Returns TRUE if the status code == 200
return (response.StatusCode == HttpStatusCode.OK);
}
catch
{
return false;
}
}
}
The code behind LinkFinder.Find can be found here: http://www.dotnetperls.com/scraping-html
Does anyone know how I can make that function recursive, or can I make the ReturnLinks function recursive? I'd prefer not to touch the LinkFinder.Find method, as it works perfectly for one link; I just need to be able to call it as many times as needed to expand my final URL list.
I assume you want to load each link, find the links within it, and continue until you run out of links?
Since the recursion depth could get very large, I would avoid recursion; this should work, I think.
WebClient w = new WebClient();
int count = 0;
urls = new List<string>();
newUrls = new List<LinkItem>();
urls.Add("http://www.google.be");
while (urls.Count > 0)
{
foreach(var url in urls)
{
string s = w.DownloadString(url);
newUrls.AddRange(LinkFinder.Find(s));
}
urls = newUrls.Select(x=>x.Href).ToList();
count += newUrls.Count;
newUrls.Clear();
ReturnLinks();
}
Console.WriteLine();
Console.Write("Found: " + count + " links.");
Console.ReadLine();
static void Main()
{
    WebClient w = new WebClient();
    // Download the start page and walk its links recursively
    List<LinkItem> allUrls = FindAll(w, w.DownloadString("http://www.google.be"));
}

private static List<LinkItem> FindAll(WebClient w, string html)
{
    List<LinkItem> list = new List<LinkItem>();
    foreach (LinkItem url in LinkFinder.Find(html))
    {
        list.Add(url);
        // Download each linked page and recurse into it; url.Href is the string that represents the address
        list.AddRange(FindAll(w, w.DownloadString(url.Href)));
    }
    return list;
}
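Note that pages on a site usually link back to each other, so either version above can loop forever or repeat work. Below is a minimal sketch of a depth-first crawl with a visited set, assuming the LinkItem/LinkFinder types from the question, that LinkItem.Href holds an absolute URL, and using the site root from the question as an example:

// Depth-first crawl that records each URL once and never revisits it.
private static readonly HashSet<string> visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

private static List<LinkItem> Crawl(WebClient w, string url)
{
    var found = new List<LinkItem>();
    if (!visited.Add(url)) // already seen this URL, stop here
        return found;

    string html;
    try
    {
        html = w.DownloadString(url);
    }
    catch (WebException)
    {
        return found; // unreachable page, skip it
    }

    foreach (LinkItem link in LinkFinder.Find(html))
    {
        found.Add(link);
        // Only follow links on the same site to keep the crawl bounded
        if (link.Href.StartsWith("http://www.smartphoto.be", StringComparison.OrdinalIgnoreCase))
        {
            found.AddRange(Crawl(w, link.Href));
        }
    }
    return found;
}

// Usage:
// List<LinkItem> all = Crawl(new WebClient(), "http://www.smartphoto.be");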
