My question is about the FILE SHARE content source. We have a big project coming up where we will have tons of data to crawl from a file share, but that is in the long run, not now. Currently we have a file share of 25 GB, and after crawling it I can see 13,000 items crawled. How do I get the size of these items in SharePoint? As we increase the content in the file share, the search size will also increase. How do I calculate the search size, and which search database should I think about in terms of storage and planning? Please explain. Do I need to be concerned about the index size alone? If so, how do I know the exact index size based on the number of items, and which database is responsible for storing search data: is it the crawl database or some other search component database?
Hi,
The Search service application in SharePoint 2013 uses four databases, shown in the following list. The table that follows the list shows the relevant limits for planning.
- Search Administration: The Search Administration database hosts the Search service application configuration and the access control list (ACL) for the crawl component.
- Analytics Reporting: The Analytics Reporting database stores the results for usage analysis reports and extracts information from the Link database when needed.
- Crawl: The Crawl database stores the state of the crawled data and the crawl history.
- Link: The Link database stores the information that is extracted by the content processing component, together with click-through information.
The following information might help you with planning and estimation of the Search service application and its limits.
For more information: https://technet.microsoft.com/en-us/library/jj219738.aspx
| Limit | Maximum value |
| --- | --- |
| Search service applications | 20 per farm |
| Crawl databases | 5 per Search service application |
| Crawl components | 2 per Search service application |
| Index components | 60 per Search service application |
| Index partitions | 20 per Search service application |
| Index replicas | 3 per index partition |
| Indexed items | 100 million per Search service application; 10 million per index partition |
| Crawl log entries | 100 million per Search service application |
| Property databases | 10 per Search service application; 128 total |
| Link databases | 2 per Search service application |
| Query processing components | 1 per server computer |
| Content processing components | 1 per server computer |
| Scope rules | 100 scope rules per scope; 600 total per Search service application |
| Scopes | 200 site scopes and 200 shared scopes per Search service application |
| Display groups | 25 per site |
| Alerts | 100,000 per Search service application |
| Content sources | 50 per Search service application |
| Start addresses | 100 per content source |
| Concurrent crawls | 20 per Search service application |
| Crawled properties | 500,000 per Search service application |
| Crawl impact rules | No limit |
| Crawl rules | No limit |
| Managed properties | 50,000 per Search service application |
| Values per managed property | 100 |
| Indexed managed property size | 512 KB per searchable/queryable managed property |
| Managed property mappings | 100 per managed property |
| Retrievable managed property size | 16 KB per managed property |
| Sortable and refinable managed property size | 16 KB per managed property |
| URL removals | 100 removals per operation |
| Authoritative pages | 1 top-level page and minimal second- and third-level pages per Search service application |
| Keywords | 200 per site collection |
| Metadata properties recognized | 10,000 per item crawled |
| Analytics processing components | 6 per Search service application |
| Analytics reporting databases | 4 per Search service application |
| Maximum eDiscovery KeywordQuery text length | 16 KB |
| Maximum KeywordQuery text length | 4 KB |
| Maximum length of eDiscovery KeywordQuery text at Search service application level | 20 KB |
| Maximum length of KeywordQuery text at Search service application level | 20 KB |
| Maximum size of documents pulled down by crawler | 64 MB (3 MB for Excel documents) |
| Navigable results from search | 100,000 per query request per Search service application |
| Number of entries in a custom entity extraction dictionary | 1 million |
| Number of entries in a custom search dictionary | 5,000 terms per tenant |
| Number of entries in a thesaurus | 1 million |
| Ranking models | 1,000 per tenant |
| Results removal | No limit |
| Term size | 300 characters |
| Unique terms in the index | 2^31 (more than 2 billion terms) |
| Unique contexts used for ranking | 15 unique contexts per rank model |
| User-defined full-text indexes | 10 |
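As a quick sanity check during planning, the key limits above can be encoded and compared against a planned topology. Here is a minimal sketch in Python (the limit values come from the table above; the planned-topology numbers are hypothetical examples, and in practice you would read the real values from your farm with PowerShell):

```python
# Selected SharePoint 2013 search limits, copied from the table above.
LIMITS = {
    "crawl_databases": 5,               # per Search service application
    "crawl_components": 2,              # per Search service application
    "index_partitions": 20,             # per Search service application
    "index_replicas": 3,                # per index partition
    "indexed_items": 100_000_000,       # per Search service application
    "items_per_partition": 10_000_000,  # per index partition
}

def check_topology(planned):
    """Return the names of any limits the planned topology exceeds."""
    return [name for name, limit in LIMITS.items()
            if planned.get(name, 0) > limit]

# Hypothetical planned topology for illustration only.
planned = {
    "crawl_databases": 1,
    "crawl_components": 2,
    "index_partitions": 1,
    "index_replicas": 2,
    "indexed_items": 13_000,       # current item count from the crawl
    "items_per_partition": 13_000,
}

print(check_topology(planned))  # an empty list means no limits are exceeded
```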
Hope this helps :)
I am working with terabytes of data in SharePoint that need to be crawled. Make sure your search architecture has multiple crawl and query servers for high availability (HA).
Just 25-100 GB of data will not create too large an index, so don't worry about that.
Yes, your crawl database can be expected to occupy roughly 0.046 x (size of the data). So in your case you should be looking at about 1-1.5 GB.
It depends on the type of data. Image files will not have their content crawled at all, while document pages will be crawled.
If you have data in lists, that will take the most space here, so assume 1 to 5% of the corpus size depending on the type of data.
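To put numbers on the rules of thumb above, here is a minimal sketch. Note that the 0.046 crawl-database coefficient and the 1-5% index range are estimates from this thread, not official figures, so treat the results as rough planning numbers only:

```python
def estimate_search_storage(corpus_gb, crawl_db_factor=0.046,
                            index_factor_low=0.01, index_factor_high=0.05):
    """Rough storage estimate based on the rules of thumb in this thread.

    corpus_gb        -- size of the crawled content, in GB
    crawl_db_factor  -- crawl DB at about 4.6% of corpus size (per this answer)
    index_factor_*   -- index at about 1-5% of corpus size, depending on data type
    """
    return {
        "crawl_db_gb": corpus_gb * crawl_db_factor,
        "index_gb_low": corpus_gb * index_factor_low,
        "index_gb_high": corpus_gb * index_factor_high,
    }

# For the 25 GB file share in the question:
est = estimate_search_storage(25)
print(est)  # crawl DB about 1.15 GB, index roughly 0.25-1.25 GB
```

Re-run the estimate as the file share grows; the model is linear in the corpus size, so doubling the content roughly doubles both the crawl database and the index.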