Hi SO, I have created a class for fetching a user's tweets from Twitter by screen name. My problem is that I'm getting "rate limit exceeded" very frequently.
I created a table of screen names, in which I'm saving all the screen names, and
I created another table to store users' tweets.
Below is my code:
public List<TwitterProfileDetails> GetAllTweets(Func<SingleUserAuthorizer> AuthenticateCredentials,string screenname)
{
List<TwitterProfileDetails> lstofTweets = new List<TwitterProfileDetails>();
TwitterProfileDetails details = new TwitterProfileDetails();
var twitterCtx = new LinqToTwitter.TwitterContext(AuthenticateCredentials());
var helpResult =
(from help in twitterCtx.Help
where help.Type == HelpType.RateLimits &&
help.Resources == "search,users,socialgraph"
select help)
.SingleOrDefault();
foreach (var category in helpResult.RateLimits)
{
Console.WriteLine("\nCategory: {0}", category.Key);
foreach (var limit in category.Value)
{
Console.WriteLine(
"\n Resource: {0}\n Remaining: {1}\n Reset: {2}\n Limit: {3}",
limit.Resource, limit.Remaining, limit.Reset, limit.Limit);
}
}
var tweets = from t in twitterCtx.Status
where t.Type == StatusType.User && t.ScreenName == screenname && t.Count == 15
select t;
if (tweets != null)
{
foreach (var tweetStatus in tweets)
{
if (tweetStatus != null)
{
lstofTweets.Add(new TwitterProfileDetails { Name = tweetStatus.User.Name, ProfileImagePath = tweetStatus.User.ProfileImageUrl, Tweets = tweetStatus.Text, UserID = tweetStatus.User.Identifier.UserID, PostedDate = Convert.ToDateTime(tweetStatus.CreatedAt), ScreenName = screenname });
}
}
}
return lstofTweets;
}
I am using the above method as below:
foreach (var screenObj in screenName)
{
var getTweets = api.GetAllTweets(api.AuthenticateCredentials, screenObj.UserName);
foreach (var obj in getTweets)
{
using (var context = new DBContext())
{
tweets.Name = obj.Name;
tweets.ProfileImage = obj.ProfileImagePath;
tweets.PostedOn = obj.PostedDate;
tweets.Tweets = obj.Tweets;
tweets.CreatedOn = DateTime.Now;
tweets.ModifiedOn = DateTime.Now;
tweets.Status = EntityStatus.Active;
tweets.ScreenName = obj.ScreenName;
var exist = context.UserTweets.Any(user => user.Tweets.Equals(obj.Tweets));
if (!exist)
context.UserTweets.Add(tweets);
context.SaveChanges();
}
}
}
I see that you found the Help/RateLimits query. There are various approaches you can take, e.g., add a delay between queries, delay the next query if the limit has been exceeded, or catch the exception and delay until the next 15-minute window.
If you want to monitor interactively, you can watch the rate limit for each query. The TwitterContext instance you use for performing the query contains RateLimitXxx properties that populate after every query. You'll need to read those values after the query, which appears to be inside your GetAllTweets method. You have to expose those values to your loop somehow, via return object, out params, static field, or whatever logic you feel is necessary.
// the first time through, you have the whole rate limit for the 15 minute window
foreach (var screenObj in screenName)
{
var getTweets = api.GetAllTweets(api.AuthenticateCredentials, screenObj.UserName);
// your processing logic ...
// assuming you have the RateLimitXxx values in scope
if (rateLimitRemaining == 0)
Thread.Sleep(CalculateRemainingMilliseconds(RateLimitReset));
}
RateLimitRemaining is how many queries you can still make in the current 15-minute window, and RateLimitReset is the time, in UTC epoch seconds, at which the rate limit resets (when you can start querying again).
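As a rough illustration of the delay calculation, here is a minimal Python sketch. It assumes the reset value is a UTC epoch timestamp in seconds (as Twitter's x-rate-limit-reset header is); the function names `seconds_until_reset` and `wait_if_exhausted` are hypothetical and not part of LINQ to Twitter.

```python
import time

def seconds_until_reset(reset_epoch_seconds, now=None):
    """Return how long to wait until the rate-limit window resets (never negative)."""
    if now is None:
        now = time.time()
    return max(0.0, reset_epoch_seconds - now)

def wait_if_exhausted(remaining, reset_epoch_seconds, sleep=time.sleep, now=None):
    """Sleep until the reset time only when no calls remain; return the seconds slept."""
    if remaining > 0:
        return 0.0
    delay = seconds_until_reset(reset_epoch_seconds, now)
    sleep(delay)
    return delay
```

Calling `wait_if_exhausted(remaining, reset)` after each query keeps the loop running at full speed until the window is used up, and only then pauses.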
It would be helpful to review the Twitter docs on Rate Limiting.
For reference, here are a couple other questions that might provide more ideas:
Twitter rate limiting
Get all followers using LINQ to Twitter
I want to get all the records of a particular record type, but I get only 1000.
This is the code I used.
function getRecords() {
return nlapiSearchRecord('contact', null, null, null);
}
I need two pieces of code:
1) Get all the records at once
2) Get the records page-wise by passing pageIndex as an argument to getRecords [1st => 0-1000, 2nd => 1000-2000, ...]
function getRecords(pageIndex) {
.........
}
Thanks in advance
You can't get all the records at once. However, you can sort the results by internalid, remember the last internalid of the first search result, and use it as an additional filter in your next search.
var totalResults = [];
// sort ascending by internal id so the last row of a page has the highest id
var column = new nlobjSearchColumn('internalid').setSort();
var res = nlapiSearchRecord('contact', null, null, column) || [];
totalResults = totalResults.concat(res);
// a full page of 1000 rows means there may be more records to fetch
while (res.length === 1000)
{
var lastId = res[res.length - 1].getId();
res = nlapiSearchRecord('contact', null, ['internalidnumber', 'greaterthan', lastId], column) || [];
totalResults = totalResults.concat(res);
}
Beware: if the number of records is high, you may exceed the governance limits in terms of time and usage points.
If you remember the lastId, you can write logic in a RESTlet that takes the id as a parameter and uses it as an additional filter to return the next page.
You can write logic to get the nth page of results, but you might have to run the search uselessly n-1 times.
Also, I would suggest using nlapiCreateSearch().runSearch(), as it can return up to 4000 records.
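To make the last-id technique described above easier to follow outside NetSuite, here is a small Python sketch. `search_page` is a hypothetical stand-in for a single search call that returns at most `page_size` records sorted by id; it is not a SuiteScript API.

```python
def search_page(records, last_id=None, page_size=1000):
    """Stand-in for one search call: up to page_size records with id > last_id, sorted by id."""
    ordered = sorted(records, key=lambda r: r["id"])
    if last_id is not None:
        ordered = [r for r in ordered if r["id"] > last_id]
    return ordered[:page_size]

def fetch_all(records, page_size=1000):
    """Repeat the search, filtering on the last id seen, until a short page arrives."""
    results = []
    last_id = None
    while True:
        page = search_page(records, last_id, page_size)
        results.extend(page)
        if len(page) < page_size:
            return results
        last_id = page[-1]["id"]
```

Because each call only needs the last id from the previous page, the same idea also serves a "next page" endpoint without re-running earlier searches.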
Here is another way to get more than 1000 results on a search:
function getItems() {
var columns = ['internalid', 'itemid', 'salesdescription', 'baseprice', 'lastpurchaseprice', 'upccode', 'quantityonhand', 'vendorcode'];
var searchcolumns = [];
for(var col in columns) {
searchcolumns.push(new nlobjSearchColumn(columns[col]));
}
var search = nlapiCreateSearch('item', null, searchcolumns);
var results = search.runSearch();
var items = [], slice = [], i = 0;
do {
slice = results.getResults(i, i + 1000);
for (var itm in slice) {
var item = {};
for(var col in columns) { item[columns[col]] = slice[itm].getValue(columns[col]); } // convert nlobjSearchResult into simple js object
items.push(item);
i++;
}
} while (slice.length >= 1000);
return items;
}
I am offloading the search feature of a relational database to Azure Search. My Products table contains columns like serialNumber, partNumber, etc. (there can be multiple serialNumbers with the same partNumber).
I want to create a suggester that can autocomplete partNumbers. But in my scenario I am getting a lot of duplicates in the suggestions because the partNumber match was found in multiple entries.
How can I solve this problem?
The Suggest API suggests documents, not queries. If you repeat the partNumber information for each serialNumber in your index and then suggest based on partNumber, you will get a result for each matching document. You can see this more clearly by including the key field in the $select parameter. Azure Search will eliminate duplicates within the same document, but not across documents. You will have to do that on the client side, or build a secondary index of partNumbers just for suggestions.
See this forum thread for a more in-depth discussion.
Also, feel free to vote on this UserVoice item to help us prioritize improvements to Suggestions.
I'm facing this problem myself. My solution does not involve a new index (that would only get messy and cost us money).
My take on this is a while-loop that adds 'UserIdentity' (in your case, 'partNumber') to a filter and searches again until my take/top limit is met or no more suggestions exist:
public async Task<List<MachineSuggestionDTO>> SuggestMachineUser(string searchText, int take, string[] searchFields)
{
var indexClientMachine = _searchServiceClient.Indexes.GetClient(INDEX_MACHINE);
var suggestions = new List<MachineSuggestionDTO>();
var sp = new SuggestParameters
{
UseFuzzyMatching = true,
Top = 100 // Get maximum result for a chance to reduce search calls.
};
// Add searchfields if set
if (searchFields != null && searchFields.Count() != 0)
{
sp.SearchFields = searchFields;
}
// Loop until you have the desired amount of suggestions or, if fewer exist, all of them.
while (suggestions.Count < take)
{
if (!await DistinctSuggestMachineUser(searchText, take, searchFields, suggestions, indexClientMachine, sp))
{
// If no more suggestions is found, we break the while-loop
break;
}
}
// Since the list might be bigger than the take, we return a narrowed list
return suggestions.Take(take).ToList();
}
private async Task<bool> DistinctSuggestMachineUser(string searchText, int take, string[] searchFields, List<MachineSuggestionDTO> suggestions, ISearchIndexClient indexClientMachine, SuggestParameters sp)
{
var response = await indexClientMachine.Documents.SuggestAsync<MachineSearchDocument>(searchText, SUGGESTION_MACHINE, sp);
if(response.Results.Count > 0){
// Fix filter if search is triggered once more
if (!string.IsNullOrEmpty(sp.Filter))
{
sp.Filter += " and ";
}
foreach (var result in response.Results.DistinctBy(r => new { r.Document.UserIdentity, r.Document.UserName, r.Document.UserCode}).Take(take))
{
var d = result.Document;
suggestions.Add(new MachineSuggestionDTO { Id = d.UserIdentity, Namn = d.UserNamn, Hkod = d.UserHkod, Intnr = d.UserIntnr });
// Add found UserIdentity to filter
sp.Filter += $"UserIdentity ne '{d.UserIdentity}' and ";
}
// Remove end of filter if it is run once more
if (sp.Filter.EndsWith(" and "))
{
sp.Filter = sp.Filter.Substring(0, sp.Filter.LastIndexOf(" and ", StringComparison.Ordinal));
}
}
// Returns false if no more suggestions is found
return response.Results.Count > 0;
}
public async Task<List<string>> SuggestionsAsync(bool highlights, bool fuzzy, string term)
{
SuggestParameters sp = new SuggestParameters()
{
UseFuzzyMatching = fuzzy,
Top = 100
};
if (highlights)
{
sp.HighlightPreTag = "<em>";
sp.HighlightPostTag = "</em>";
}
var suggestResult = await searchConfig.IndexClient.Documents.SuggestAsync(term, "mysuggestion", sp);
// Convert the suggest query results to a list that can be displayed in the client.
return suggestResult.Results.Select(x => x.Text).Distinct().Take(10).ToList();
}
After getting top 100 and using distinct it works for me.
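The client-side step in that last answer (fetch a larger batch, drop duplicates, keep the first few) can be sketched in Python as follows. The function name `distinct_suggestions` is hypothetical, and the input is assumed to be the plain suggestion texts already returned by the service.

```python
def distinct_suggestions(texts, take=10):
    """Return the first `take` unique texts, keeping their original order."""
    if take <= 0:
        return []
    seen = set()
    unique = []
    for text in texts:
        if text in seen:
            continue  # duplicate coming from another document
        seen.add(text)
        unique.append(text)
        if len(unique) == take:
            break
    return unique
```

Note that if the batch contains fewer than `take` distinct values, fewer suggestions are returned even though more distinct matches may exist in the index.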
You can use the Autocomplete API for that, which does the grouping by default. However, it does not support returning more fields together with the result, such as the partNo plus a description. The partNo will be distinct, though.
I am wondering how can I achieve pagination using Cassandra.
Let us say that I have a blog. The blog lists max 10 posts per page. To access next posts a user must click on pagination menu to access page 2 (posts 11-20), page 3 (posts 21-30), etc.
Using SQL under MySQL, I could do the following:
SELECT * FROM posts LIMIT 20,10;
The first parameter of LIMIT is the offset from the beginning of the result set and the second is the number of rows to fetch. The example above skips the first 20 rows and returns the next 10 (rows 21-30).
How can I achieve the same effect in CQL?
I have found some solutions on Google, but all of them require to have "the last result from previous query". It works for having "next" button to paginate to another 10-results-set, but what if I want to jump from page 1 to page 5?
You don't need to use tokens, if you are using Cassandra 2.0+.
Cassandra 2.0 has auto paging.
Instead of using token function to create paging, it is now a built-in feature.
Now developers can iterate over the entire result set without having to care that its size is larger than memory. As the client code iterates over the results, extra rows are fetched while old ones are dropped.
Looking at this in Java, note that the SELECT statement returns all matching rows, while the number of rows retrieved per round trip is set by the fetch size (24 in the code below).
I’ve shown a simple statement here, but the same code can be written with a prepared statement coupled with a bound statement. Automatic paging can be disabled if it is not desired. It is also important to test various fetch size settings, since you will want to keep the memory footprint small, but not use so small a fetch size that too many round trips to the database are made. Check out this blog post to see how paging works server side.
Statement stmt = new SimpleStatement(
"SELECT * FROM raw_weather_data"
+ " WHERE wsid= '725474:99999'"
+ " AND year = 2005 AND month = 6");
stmt.setFetchSize(24);
ResultSet rs = session.execute(stmt);
// Iterating the ResultSet fetches further pages transparently as needed.
for (Row row : rs) {
System.out.println(row);
}
Try using the token function in CQL:
https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useToken.html
Another suggestion, if you are using DSE, solr supports deep paging:
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
Manual Paging
The driver exposes a PagingState object that represents where we were in the result set when the last page was fetched:
ResultSet resultSet = session.execute("your query");
// iterate the result set...
PagingState pagingState = resultSet.getExecutionInfo().getPagingState();
This object can be serialized to a String or a byte array:
String string = pagingState.toString();
byte[] bytes = pagingState.toBytes();
This serialized form can be saved in some form of persistent storage to be reused later. When that value is retrieved later, we can deserialize it and reinject it in a statement:
PagingState pagingState = PagingState.fromString(string);
Statement st = new SimpleStatement("your query");
st.setPagingState(pagingState);
ResultSet rs = session.execute(st);
Note that the paging state can only be reused with the exact same statement (same query string, same parameters). Also, it is an opaque value that is only meant to be collected, stored and re-used. If you try to modify its contents or reuse it with a different statement, the driver will raise an error.
Src: https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlshPaging.html
The PHP driver documentation describes this under "Use paging state token to get next result":
https://datastax.github.io/php-driver/features/result_paging/
We can use the "paging state token" to paginate at the application level.
So the PHP logic should look like this:
<?php
$limit = 10;
$offset = 20;
$cluster = Cassandra::cluster()->withContactPoints('127.0.0.1')->build();
$session = $cluster->connect("simplex");
$statement = new Cassandra\SimpleStatement("SELECT * FROM paging_entries Limit ".($limit+$offset));
$result = $session->execute($statement, new Cassandra\ExecutionOptions(array('page_size' => $offset)));
// Now $result has all rows till "$offset" which we can skip and jump to next page to fetch "$limit" rows.
while ($result->pagingStateToken()) {
$result = $session->execute($statement, new Cassandra\ExecutionOptions($options = array('page_size' => $limit,'paging_state_token' => $result->pagingStateToken())));
foreach ($result as $row) {
printf("key: '%s' value: %d\n", $row['key'], $row['value']);
}
}
?>
Although the count is available in CQL, so far I have not seen a good solution for the offset part...
So... one solution I have been contemplating was to create sets of pages using a background process.
In some table, I would create the blog page A as a set of references to page 1, 2, ... 10. Then another entry for blog page B pointing to pages 11 to 20, etc.
In other words, I would build my own index with a row key set to the page number. You may still make it somewhat flexible since you can let the user choose to see 10, 20 or 30 references per page. For example, when set to 30, you display sets 1, 2, and 3 as page A, sets 4, 5, 6 as page B, etc.
And if you have a backend process to handle all of that, you can update your lists as new pages are added and old pages are deleted from the blog. The process should be really fast (like 1 min. for 1,000,000 rows if even that slow...) and then you can find the pages to display in your list pretty much instantaneously. (Obviously, if you are to have thousands of users each posting hundreds of pages... that number can grow quickly.)
Where it becomes more complicated is if you want to offer a complex WHERE clause. By default a blog shows you a list of all the posts from the newest to the oldest. You could also offer lists of posts with the tag Cassandra. Maybe you want to reverse the order, etc. That makes it difficult unless you have some advanced way to create your index(es). On my end I have a C-like language which peeks and pokes at the values in a row to (a) select them and, if selected, (b) sort them. In other words, on my end I can already have WHERE clauses as complex as what you'd have in SQL. However, I do not yet break up my lists into pages. Next step, I suppose...
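The precomputed page index described in this answer could look roughly like the Python sketch below. It is only an in-memory illustration: `build_page_index` and `get_page` are hypothetical names, and in practice the index would live in a table keyed by set number and be maintained by the background process.

```python
def build_page_index(post_ids, set_size=10):
    """Split an ordered list of post ids into numbered sets (1-based)."""
    return {
        number: post_ids[start:start + set_size]
        for number, start in enumerate(range(0, len(post_ids), set_size), start=1)
    }

def get_page(index, page, sets_per_page=1):
    """Return the post ids for a page made of `sets_per_page` consecutive sets."""
    first_set = (page - 1) * sets_per_page + 1
    ids = []
    for number in range(first_set, first_set + sets_per_page):
        ids.extend(index.get(number, []))  # a missing set means we are past the end
    return ids
```

With sets of 10, `get_page(index, 5)` jumps straight to posts 41-50, and `get_page(index, 2, sets_per_page=3)` serves the 30-per-page view from sets 4, 5 and 6.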
Pagination using the cassandra-node driver for Node.js (Koa.js, Marko.js)
Problem
Because there is no skip functionality, we need a workaround. Below is an implementation of manual paging for a Node app, in case it gives anyone an idea. It offers:
code for a simple users list
navigation between next and previous page states
easy replication
There are two solutions; I state both here but only give the code for Solution 1 below.
Solution 1: Maintain page states for the next and previous records (in a stack or whatever data structure fits best)
Solution 2: Loop through all records with a limit, save all possible page states in a variable, and generate pages relative to their page states
Using this commented code in model, we can get all states for pages
//for the next flow
//if (result.nextPage) {
// Retrieve the following pages:
// the same row handler from above will be used
// result.nextPage();
//}
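For Solution 2, the idea of walking the result set once and remembering every page state can be sketched in Python as below. `fetch_page` is a hypothetical stand-in for a paged query: here the "state" is just a stringified offset, whereas a real driver returns an opaque token that must be passed back unchanged.

```python
def fetch_page(rows, page_size, page_state=None):
    """Stand-in for a paged query: returns (rows_of_page, next_page_state or None)."""
    start = 0 if page_state is None else int(page_state)
    end = start + page_size
    next_state = str(end) if end < len(rows) else None
    return rows[start:end], next_state

def collect_page_states(rows, page_size):
    """Walk every page once and record the state needed to open each page."""
    states = [None]  # page 1 needs no state
    state = None
    while True:
        _, state = fetch_page(rows, page_size, state)
        if state is None:
            return states
        states.append(state)
```

Once the list is built, `states[n - 1]` opens page n directly, which is what makes jumping from page 1 to page 5 possible; the cost is one full pass over the data up front.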
Router Functions
var userModel = require('/models/users');
public.get('/users', users);
public.post('/users', filterUsers);
var users = function* () {//get request
var data = {};
var pageState = { "next": "", "previous": "" };
try {
var userCount = yield userModel.Count();//count all users with basic count query
var currentPage = 1;
var pager = yield generatePaging(currentPage, userCount, pagingMaxLimit);
var userList = yield userModel.List(pager.limit);//List expects the fetch size, not the whole pager
data.pageNumber = currentPage;
data.TotalPages = pager.TotalPages;
console.log('--------------what now--------------');
data.pageState_next = userList.pageStates.next;
data.pageState_previous = userList.pageStates.previous;
console.log("next ", data.pageState_next);
console.log("previous ", data.pageState_previous);
data.previousStates = null;
data.isPrevious = false;
if ((userCount / pagingMaxLimit) > 1) {
data.isNext = true;
}
data.userList = userList;
data.totalRecords = userCount;
console.log('--------------------userList--------------------', data.userList);
//pass to html template
}
catch (e) {
console.log("err ", e);
log.info("userList error : ", e);
}
this.body = this.stream('./views/userList.marko', data);
this.type = 'text/html';
};
//post filter and get list
var filterUsers = function* () {
console.log("<------------------Form Post Started----------------->");
var data = {};
var totalCount;
data.isPrevious = true;
data.isNext = true;
var form = this.request.body;
console.log("----------------formdata--------------------", form);
var currentPage = parseInt(form.hdpagenumber);//page number hidden in html
console.log("-------before current page------", currentPage);
var pageState = null;
try {
var statesArray = [];
if (form.hdallpageStates && form.hdallpageStates !== '') {
statesArray = form.hdallpageStates.split(',');
}
console.log(statesArray);
//develop stack to track paging states
if (form.hdpagestateRequest === 'next') {
console.log('--------------------------next---------------------');
currentPage = currentPage + 1;
statesArray.push(form.hdpageState_next);
pageState = form.hdpageState_next;
}
else if (form.hdpagestateRequest === 'previous') {
console.log('--------------------------pre---------------------');
currentPage = currentPage - 1;
var p_st = statesArray.length - 2;//second last index
console.log('this index of array to be removed ', p_st);
pageState = statesArray[p_st];
statesArray.splice(p_st, 1);
//pageState = statesArray.pop();
}
else if (form.hdispaging === 'false') {
currentPage = 1;
pageState = null;
statesArray = [];
}
data.previousStates = statesArray;
console.log("paging true");
totalCount = yield userModel.Count();
var pager = yield generatePaging(form.hdpagenumber, totalCount, pagingMaxLimit);
data.pageNumber = currentPage;
data.TotalPages = pager.TotalPages;
//filter function - not yet constructed
var searchUsers = yield userModel.searchList(pager, pageState);
data.usersList = searchUsers;
if (searchUsers.pageStates) {
data.pageStates = searchUsers.pageStates;
data.next = searchUsers.nextPage;
data.pageState_next = searchUsers.pageStates.next;
data.pageState_previous = searchUsers.pageStates.previous;
//show previous and next buttons accordingly
if (currentPage == 1 && pager.TotalPages > 1) {
data.isPrevious = false;
data.isNext = true;
}
else if (currentPage == 1 && pager.TotalPages <= 1) {
data.isPrevious = false;
data.isNext = false;
}
else if (currentPage >= pager.TotalPages) {
data.isPrevious = true;
data.isNext = false;
}
else {
data.isPrevious = true;
data.isNext = true;
}
}
else {
data.isPrevious = false;
data.isNext = false;
}
console.log("response ", searchUsers);
data.totalRecords = totalCount;
//pass to html template
}
catch (e) {
console.log("err ", e);
log.info("user list error : ", e);
}
console.log("<------------------Form Post Ended----------------->");
this.body = this.stream('./views/userList.marko', data);
this.type = 'text/html';
};
//Paging function
var generatePaging = function* (currentpage, count, pageSizeTemp) {
var paging = new Object();
var pagesize = pageSizeTemp;
var totalPages = 0;
var pageNo = currentpage == null ? null : currentpage;
var skip = pageNo == null ? 0 : parseInt(pageNo - 1) * pagesize;
var pageNumber = pageNo != null ? pageNo : 1;
totalPages = pagesize == null ? 0 : Math.ceil(count / pagesize);
paging.skip = skip;
paging.limit = pagesize;
paging.pageNumber = pageNumber;
paging.TotalPages = totalPages;
return paging;
};
Model Functions
var clientdb = require('../utils/cassandradb')();
var Users = function (options) {
//this.init();
_.assign(this, options);
};
Users.List = function* (limit) {//first time
var myresult; var res = [];
res.pageStates = { "next": "", "previous": "" };
const options = { prepare: true, fetchSize: limit };
console.log('----------did i appeared first?-----------');
yield new Promise(function (resolve, reject) {
clientdb.eachRow('SELECT * FROM users_lookup_history', [], options, function (n, row) {
console.log('----paging----rows');
res.push(row);
}, function (err, result) {
if (err) {
console.log("error ", err);
}
else {
res.pageStates.next = result.pageState;
res.nextPage = result.nextPage;//next page function
}
resolve(result);
});
}).catch(function (e) {
console.log("error ", e);
}); //promise ends
console.log('page state ', res.pageStates);
return res;
};
Users.searchList = function* (pager, pageState) {//paging filtering
console.log("|------------Query Started-------------|");
console.log("pageState if any ", pageState);
var res = [], myresult;
res.pageStates = { "next": "" };
var query = "SELECT * FROM users_lookup_history ";
var params = [];
console.log('current pageState ', pageState);
const options = { pageState: pageState, prepare: true, fetchSize: pager.limit };
console.log('----------------did i appeared first?------------------');
yield new Promise(function (resolve, reject) {
clientdb.eachRow(query, [], options, function (n, row) {
console.log('----Users paging----rows');
res.push(row);
}, function (err, result) {
if (err) {
console.log("error ", err);
}
else {
res.pageStates.next = result.pageState;
res.nextPage = result.nextPage;
}
//for the next flow
//if (result.nextPage) {
// Retrieve the following pages:
// the same row handler from above will be used
// result.nextPage();
//}
resolve(result);
});
}).catch(function (e) {
console.log("error ", e);
log.info('something');
}); //promise ends
console.log('page state ', pageState);
console.log("|------------Query Ended-------------|");
return res;
};
Html side
<div class="box-footer clearfix">
<ul class="pagination pagination-sm no-margin pull-left">
<if test="data.isPrevious == true">
<li><a class='submitform_previous' href="">Previous</a></li>
</if>
<if test="data.isNext == true">
<li><a class="submitform_next" href="">Next</a></li>
</if>
</ul>
<ul class="pagination pagination-sm no-margin pull-right">
<li>Total Records : $data.totalRecords</li>
<li> | Total Pages : $data.TotalPages</li>
<li> | Current Page : $data.pageNumber</li>
</ul>
</div>
I am not very experienced with Node.js and Cassandra, so this solution can surely be improved. Solution 1 is working example code to start with the paging idea. Cheers
a detailed blog.
Our use case was similar: pull everything from a Cassandra table (Cassandra does this smartly by fetching ~5000 rows in one go and returning a cursor), do heavy personalized processing on each row, and keep going. Once our iteration gets close to 5000, it fetches the next chunk of 5000 rows internally and adds it to the result cursor. It does this so smoothly that we don’t even notice the magic happening behind the scenes.
But it became a bottleneck for us. Iterating over a chunk took some time, and before we reached the end of the chunk, Cassandra decided the connection was not being used and closed it automatically with a timeout. So we implemented it with the page state.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import SimpleStatement

# connection with cassandra
cluster = Cluster(["127.0.0.1"], auth_provider=PlainTextAuthProvider(username="pankaj", password="pankaj"))
session = cluster.connect()
# setting keyspace
session.set_keyspace("my_keyspace")
# number of rows fetched per page
fetch_size = 100


# fetches a fresh chunk starting at the given paging state
def fetch_a_fresh_chunk(paging_state=None):
    query = "SELECT * FROM my_cute_cassandra_table;"
    statement = SimpleStatement(query, fetch_size=fetch_size)
    return session.execute(statement, paging_state=paging_state)


next_page_available = True
paging_state = None
while next_page_available:
    # fetches a new chunk with the given paging state
    results = fetch_a_fresh_chunk(paging_state)
    paging_state = results.paging_state
    data_count = 0
    for result in results:
        # process payload here.....
        # payload processed
        data_count += 1
        # once we reach the fetch size, stop before the driver fetches the next chunk internally
        if data_count == fetch_size:
            break
    # a paging state of None means there are no more pages
    next_page_available = paging_state is not None
I have a web app running on Azure in shared web site mode. A simple method in which I add items to a list and sort the list, with about 300 items, takes 0.3s on my machine and 10s after deployment (on the Azure machine).
Does anybody have any idea why Azure is so slow?
Is there any configuration I got wrong? I use the default one, except that I replaced FREE mode with SHARED mode because I thought this would help, but it seems it does not.
UPDATE:
public ActionResult GetPosts(String selectedStreams, int implicitSelectedVisualiserId, int userId)
{
DateTime begin = DateTime.UtcNow;
List<SearchQuery> selectedSearchQueries = searchQueryRepository.GetSearchQueriesOfStreamsIds(selectedStreams == String.Empty ? new List<int>() : selectedStreams.Split(',').Select(n => int.Parse(n)).ToList());
var implicitSelectedVisualiser = VisualiserModel.ToVisualiserModel(visualiserRepository.GetVisualiser(implicitSelectedVisualiserId));
var twitterSearchQueryOfImplicitSelectedVisualiser = searchQueryRepository.GetSearchQuery(implicitSelectedVisualiser.Stream.Name, Service.Twitter, userId);
var instagramSearchQueryOfImplicitSelectedVisualiser = searchQueryRepository.GetSearchQuery(implicitSelectedVisualiser.Stream.Name, Service.Instagram, userId);
var facebookSearchQueryOfImplicitSelectedVisualiser = searchQueryRepository.GetSearchQuery(implicitSelectedVisualiser.Stream.Name, Service.Facebook, userId);
var manualSearchQueryOfImplicitSelectedVisualiser = searchQueryRepository.GetSearchQuery(implicitSelectedVisualiser.Stream.Name, Service.Manual, userId);
List<SearchResultModel> approvedSearchResults = new List<SearchResultModel>();
if (twitterSearchQueryOfImplicitSelectedVisualiser != null || instagramSearchQueryOfImplicitSelectedVisualiser != null || facebookSearchQueryOfImplicitSelectedVisualiser != null
|| manualSearchQueryOfImplicitSelectedVisualiser != null)
{
// Define search text to be displayed during slideshow;
SearchModel searchModel = new SearchModel();
// Set slideshow settings from implicit selected visualiser.
ViewBag.CurrentVisualiser = implicitSelectedVisualiser;
// Load search results from selected visualisers.
foreach (SearchQuery searchQuery in selectedSearchQueries)
{
approvedSearchResults.AddRange(
SearchResultModel.ToSearchResultModel(
searchResultRepository.GetSearchResults
(searchQuery.Id,
implicitSelectedVisualiser.Language)));
// Add defined query too.
searchModel.SearchValue += " " + searchQuery.Query;
}
// Add defined query for implicit selected visualiser.
if (twitterSearchQueryOfImplicitSelectedVisualiser != null)
searchModel.SearchValue += " " + twitterSearchQueryOfImplicitSelectedVisualiser.Query;
if (instagramSearchQueryOfImplicitSelectedVisualiser != null)
searchModel.SearchValue += " " + instagramSearchQueryOfImplicitSelectedVisualiser.Query;
if (facebookSearchQueryOfImplicitSelectedVisualiser != null)
searchModel.SearchValue += " " + facebookSearchQueryOfImplicitSelectedVisualiser.Query;
ViewBag.Search = searchModel;
// Also add search results from implicit selected visualiser
if (twitterSearchQueryOfImplicitSelectedVisualiser != null)
approvedSearchResults.AddRange(SearchResultModel.ToSearchResultModel(searchResultRepository.GetSearchResults(twitterSearchQueryOfImplicitSelectedVisualiser.Id, implicitSelectedVisualiser.Language)));
if (instagramSearchQueryOfImplicitSelectedVisualiser != null)
approvedSearchResults.AddRange(SearchResultModel.ToSearchResultModel(searchResultRepository.GetSearchResults(instagramSearchQueryOfImplicitSelectedVisualiser.Id, implicitSelectedVisualiser.Language)));
if (facebookSearchQueryOfImplicitSelectedVisualiser != null)
approvedSearchResults.AddRange(SearchResultModel.ToSearchResultModel(searchResultRepository.GetSearchResults(facebookSearchQueryOfImplicitSelectedVisualiser.Id, implicitSelectedVisualiser.Language)));
if (manualSearchQueryOfImplicitSelectedVisualiser != null)
approvedSearchResults.AddRange(SearchResultModel.ToSearchResultModel(searchResultRepository.GetSearchResults(manualSearchQueryOfImplicitSelectedVisualiser.Id, implicitSelectedVisualiser.Language)));
// if user selected to show only posts from specific number of last days.
var approvedSearchResultsFilteredByDays = new List<SearchResultModel>();
if (implicitSelectedVisualiser.ShowPostsFromLastXDays != 0)
{
foreach (SearchResultModel searchResult in approvedSearchResults)
{
var postCreatedTimeWithDays = searchResult.PostCreatedTime.AddDays(implicitSelectedVisualiser.ShowPostsFromLastXDays + 1);
if (postCreatedTimeWithDays >= DateTime.Now)
approvedSearchResultsFilteredByDays.Add(searchResult);
}
}
else
{
approvedSearchResultsFilteredByDays = approvedSearchResults;
}
// Order search results (posts to be displayed by created datetime).
var approvedSearchResultsOrdered = new List<SearchResultModel>();
if (implicitSelectedVisualiser.PostsSortOrder == PostsSortOrder.CREATED_DATE_ASC)
{
approvedSearchResultsOrdered = approvedSearchResultsFilteredByDays.OrderBy(s => s.PostCreatedTime).ToList();
}
else if (implicitSelectedVisualiser.PostsSortOrder == PostsSortOrder.CREATED_DATE_DESC)
{
approvedSearchResultsOrdered = approvedSearchResultsFilteredByDays.OrderByDescending(s => s.PostCreatedTime).ToList();
}
else if (implicitSelectedVisualiser.PostsSortOrder == PostsSortOrder.RANDOM)
{
var rnd = new Random();
approvedSearchResultsOrdered = approvedSearchResultsFilteredByDays.OrderBy(x => rnd.Next()).ToList();
}
// Load background images;
var visualiserImages = visualiserImageRepository.GetImages(implicitSelectedVisualiser.Id);
//foreach (SearchResultModel searchResultModel in approvedSearchResultsOrdered)
//{
// searchResultModel.BackgroundImagePath = TwitterUtils.GetRandomImageBackgroundForDisplay(visualiserImages);
//}
ViewBag.BackgroundImagePath = TwitterUtils.GetRandomImageBackgroundForDisplay(visualiserImages);
approvedSearchResults = approvedSearchResultsOrdered;
}
DateTime end = DateTime.UtcNow;
Elmah.ErrorSignal.FromCurrentContext().Raise(new Exception(String.Format("User {0}: Preparing {1} posts for visualiser took {2} seconds", MySession.Current.LoggedInUserName, approvedSearchResults.Count(), (end - begin).TotalMilliseconds / 1000)));
return PartialView("_DisplayPostsNew", approvedSearchResults);
}
This isn't actually surprising. The servers used in Windows Azure are currently mostly 1.6 GHz machines. The larger the machine size you choose, the more cores you get, but they all run at the same speed. That is likely a much slower CPU than the one in your development machine.
When you move to Shared mode on Windows Azure Web Sites you are still in a multi-tenant environment, so you could be seeing some noisy neighbors here. The difference between Free and Shared is that many of the Free quotas are removed because you are paying. When you move to Standard, you are assigned a Virtual Machine dedicated to your web sites (up to 100 of them), so that is the best-case scenario: you are the only one using the resources at that point.
There was a thread about this on the MSDN forums a while back: http://social.msdn.microsoft.com/Forums/windowsazure/en-US/0d0a3a88-eac4-4b9e-8b10-4a547cbf653b/performance-of-azure-servers-slow-cpus?forum=windowsazuredevelopment
They have started offering different hardware configurations with more memory for Virtual Machines, Cloud Services and so on, but I'm not sure the CPUs have changed. It's hard to find the CPU stated on WindowsAzure.com anymore, but the pricing calculator for Web Sites references 1.6 GHz machines when you move the slider to Standard.
Actually I found the issue.
Locally, I tested with a few hundred records in my DB, while the Azure DB has over 70,000 records in that table, which affects the performance of the algorithm.
The mistake I made in the code above: I filtered the records by date AFTER loading all of them from the DB. By filtering directly in the LINQ query instead, I cut the time from 10 s to 0.3 s on Azure too.
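For anyone hitting the same thing, here is a minimal sketch of the difference between the two shapes. It is not the code from my project: the Post class, the AllPosts helper and the PostFilterSketch class are made up for illustration, and the in-memory AsQueryable source only stands in for a real repository query (where a provider such as Entity Framework would turn the Where into SQL so that only matching rows leave the database).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; only the timestamp matters for this sketch.
public class Post
{
    public int Id { get; set; }
    public DateTime PostCreatedTime { get; set; }
}

public static class PostFilterSketch
{
    // Hypothetical stand-in for a repository query: post i was created i days before 'now'.
    public static IQueryable<Post> AllPosts(int count, DateTime now)
    {
        return Enumerable.Range(0, count)
            .Select(i => new Post { Id = i, PostCreatedTime = now.AddDays(-i) })
            .AsQueryable();
    }

    // Slow shape: ToList() materializes every row first, then the filter runs in memory.
    public static List<Post> FilterAfterLoading(IQueryable<Post> posts, int days, DateTime now)
    {
        DateTime cutoff = now.AddDays(-(days + 1));
        List<Post> everything = posts.ToList();
        return everything.Where(p => p.PostCreatedTime >= cutoff).ToList();
    }

    // Faster shape: the filter is part of the query, so only matching rows are materialized.
    public static List<Post> FilterInQuery(IQueryable<Post> posts, int days, DateTime now)
    {
        DateTime cutoff = now.AddDays(-(days + 1));
        return posts.Where(p => p.PostCreatedTime >= cutoff).ToList();
    }
}
```

Both methods return the same posts; the difference is only how many rows get loaded before the date filter is applied. The cutoff mirrors the AddDays(ShowPostsFromLastXDays + 1) check from the question.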
I'd like to fetch the top n rows from my Azure Table with a simple TableQuery. But with the code below, all rows are fetched regardless of the limit I set with Take.
What am I doing wrong?
int entryLimit = 5;
var table = GetFromHelperFunc();
TableQuery<MyEntity> query = new TableQuery<MyEntity>()
.Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "MyPK"))
.Take(entryLimit);
List<FeedEntry> entryList = new List<FeedEntry>();
TableQuerySegment<FeedEntry> currentSegment = null;
while (currentSegment == null || currentSegment.ContinuationToken != null)
{
currentSegment = table.ExecuteQuerySegmented(query, this.EntryResolver, currentSegment != null ? currentSegment.ContinuationToken : null);
entryList.AddRange(currentSegment.Results);
}
Trace.WriteLine(entryList.Count); // <-- Why does this exceed my limit?
The Take method on the storage SDK doesn't work like it would in LINQ. Imagine you do something like this:
TableQuery<TableEntity> query = new TableQuery<TableEntity>()
.Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "temp"))
.Take(5);
var result = table.ExecuteQuery(query);
When you start iterating over result you'll initially get only 5 items. But underneath, if you keep iterating over the result, the SDK will keep querying the table (and proceed to the next 'page' of 5 items).
If I have 5000 items in my table, this code will output all 5000 items (and underneath the SDK will do 1000 requests and fetch 5 items per request):
TableQuery<TableEntity> query = new TableQuery<TableEntity>()
.Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "temp"))
.Take(5);
var result = table.ExecuteQuery(query);
foreach (var item in result)
{
Trace.WriteLine(item.RowKey);
}
The following code will fetch exactly 5 items in 1 request and stop there:
TableQuery<TableEntity> query = new TableQuery<TableEntity>()
.Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "temp"))
.Take(5);
var result = table.ExecuteQuery(query);
int index = 0;
foreach (var item in result)
{
Console.WriteLine(item.RowKey);
index++;
if (index == 5)
break;
}
What the Take() method actually does is set the page size, or "take count" (the TakeCount property on TableQuery). It's still up to you to stop iterating in time if you only want 5 records.
In your example, you should modify the while loop to stop when reaching the TakeCount (which you set by calling Take):
while (entryList.Count < query.TakeCount && (currentSegment == null || currentSegment.ContinuationToken != null))
{
currentSegment = table.ExecuteQuerySegmented(query, this.EntryResolver, currentSegment != null ? currentSegment.ContinuationToken : null);
entryList.AddRange(currentSegment.Results);
}
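To make the paging behaviour easier to see without a storage account, here is a small self-contained simulation. It does not use the storage SDK: Segment, FetchSegment, FetchAll and FetchUpTo are hypothetical names, and the integer continuation token is only a stand-in for the real one. The sketch assumes, as described above, that each request returns at most one page of rows plus a token for the next page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one page of results plus a continuation token.
public class Segment
{
    public List<int> Results = new List<int>();
    public int? ContinuationToken; // null means there are no more pages
}

public static class TakeSketch
{
    // Simulates one request against a table with 'totalRows' rows and a page size of 'pageSize'.
    public static Segment FetchSegment(int totalRows, int pageSize, int? token)
    {
        int start = token ?? 0;
        int count = Math.Min(pageSize, totalRows - start);
        int next = start + count;
        return new Segment
        {
            Results = Enumerable.Range(start, count).ToList(),
            ContinuationToken = next < totalRows ? (int?)next : null
        };
    }

    // Follows continuation tokens until the table is exhausted (the shape of the loop in the question).
    public static List<int> FetchAll(int totalRows, int pageSize, out int requests)
    {
        return FetchUpTo(totalRows, pageSize, int.MaxValue, out requests);
    }

    // Same loop, but it stops as soon as 'limit' rows have been collected.
    public static List<int> FetchUpTo(int totalRows, int pageSize, int limit, out int requests)
    {
        var rows = new List<int>();
        int? token = null;
        bool first = true;
        requests = 0;
        while (rows.Count < limit && (first || token != null))
        {
            Segment segment = FetchSegment(totalRows, pageSize, token);
            requests++;
            rows.AddRange(segment.Results);
            token = segment.ContinuationToken;
            first = false;
        }
        // Trim in case the last page pushed the list past the limit.
        return rows.Count > limit ? rows.Take(limit).ToList() : rows;
    }
}
```

With 5000 rows and a page size of 5, FetchAll makes 1000 simulated requests, while FetchUpTo with a limit of 5 stops after the first one. The final trim is there because a page could push the list past the limit when the limit is not a multiple of the page size.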
AFAIK Storage Client Library 2.0 had a bug in its Take implementation. It was fixed in version 2.0.4.
See the last comments at http://blogs.msdn.com/b/windowsazurestorage/archive/2012/11/06/windows-azure-storage-client-library-2-0-tables-deep-dive.aspx
[EDIT]
The original MSDN post is no longer available, but it is still present on the Web Archive:
http://web.archive.org/web/20200722170914/https://learn.microsoft.com/en-us/archive/blogs/windowsazurestorage/windows-azure-storage-client-library-2-0-tables-deep-dive