I am developing a simple social networking application and want to use Azure Table Storage because it is cheap and is supposed to scale very well. Ive tried to follow the guidance in this article to design the tables and queries (https://msdn.microsoft.com/en-us/library/azure/hh508997.aspx). However, when I load the table with millions of microblog records, the query takes more than 5 minutes. Here are my tables and the query:
Users Table
PartitionKey - GUID
RowKey Email address
FirstName
LastName
ThumbnailImageUrl
Friends Table
PartitionKey Email address of user1
RowKey Email address of user2
Timestamp
Microblogs Table
PartitionKey - DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks
RowKey Email address of user
Timestamp
Text Text of message
FirstName First Name of user
LastName Last Name of user
ThumbnailImageUrl Thumbnail of user
Code snippet with the queries for the microblog:
public Social GetMicroblogs(string userEmail)
{
List<string> lstFriends = null;
lstFriends = context.CreateQuery<Friends>(Friends)
.Where(Friends => (Friends.PartitionKey.Equals(userEmail)))
.AsTableServiceQuery(context).Execute()
.Select(Friends => Friends.RowKey)
.ToList();
lstFriends.Add(userEmail);
if (lstFriends!= null)
{
var microblogTableQuery = new TableQuery();
var microblogCondition = string.Empty;
foreach (var Friends in lstFriends)
microblogCondition += "RowKey eq '" + Friends + "' or ";
microblogCondition = microblogCondition.Remove(microblogCondition.Length - 3);
var microblogDynamicTableEntities = microblogTable.ExecuteQuery(microblogTableQuery.Where(microblogCondition))
.Skip((pageIndex - 1)*pageSize).Take(pageSize)
.ToList();
}
}
Logic
Each time two users become friends, the software will add a record for each user in a Friends table. For example, if user1 connects with user2, then two records are added to the table, as follows: Record 1 would have PK = user1 and RK = user2, Record 2 would
have PK = user2 and RK = user1.
I chose DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks as the partition key in the Microblogs table so that the query results are automatically sorted in chronological order with the newest microblogs first. I do not need to use entity group transactions, and since I will have a large number of microblogs I assumed that having small partitions will enable the queries to be load balanced across multiple servers.
In my query above, I first get the list of friends and then use that as the filter for the query on the microblog table.
Question
My question is, why is this query so slow? Based on the article referenced above, I expected it to achieve the fastest results?
- Edited by Xethnic Friday, May 22, 2015 10:33 PM