Not able to crawl all items from External content type

Hello All,

"All the records in my external content source are not getting crawled, only 1/3 rd of the data are getting crawled."

Steps:

  1. I created "External content type" using sharepoint designer which connects to a SQL Server database.
  2. Have written a SQL View joining 2 tables, which return 9,00,000 rows when executed using the SQL Server management studio.
  3. I used the default "Business Data Connectivity Service" and "Search Service Application" and made sure the necessary permissions are set.
  4. Created a External Content source for the search service application and selected the "Business Data Connectivity Service" -> "Crawl selected External datasource" -> <my external datasource i created in sharepoint designer>

Issue

  1. When i ran the full crawl for the first time it crawled "3,49,923" records in 01 hour and 07 seconds. And returned 1 error "Error crawling LOB Contents.(Error caused by exception: System.InvalidOperationException. There is an error in XML document...)
  2. Crawl Error

  3. Later i removed the below item from the index and started a full recrawl, this time it crawled "3,49,924" records 1 record extra from my previous crawl and no errors were returned.

Please let me know what could be the issue. It doesn't look like the permission related issues as i am able to crawl the 1/3rd of my total data. Also i am able to search the crawled data. I also set the throtteling limit for the "Business data catalog" to -maximum 10000000 -default 1000000 which is less than the data it has to crawl.

October 5th, 2013 11:41pm

there are a lot of reasons for this error, can you tell us more about the file returned from DB (ID 888), size? type?

one common reason is time out! have you tried to increase the timeout of the connection time (from Farm Search Administration). and increase the 60, 60 to 120, 120..

Central Administration || Application Managment || Manage Service application || < yourSaerch Server Application Name> || Farm Search Administration.


Free Windows Admin Tool Kit Click here and download it now
October 6th, 2013 12:01am

Thanks for your response. The default timeout of 60, 60 is set. I will increase the timeout and then do a full crawl and will update on how the crawl went.

Thanks again..

October 6th, 2013 12:12am

Check xml of schema definition is there is error?
Free Windows Admin Tool Kit Click here and download it now
October 6th, 2013 8:04am

Hi Jamil,

As mentioned by you i increased the connection timeout from 60, 60 to 180, 180, but still the crawl went for only 1 hour and 7 minutes crawling the same number of records 3,49, 923 records.

I queried the SQL View which i used in the BCS, it returned around 899000 rows in 1 hour and 11 minutes.

Do i need to further increase the timeout to a bigger value?

Thanks,

Sriram

October 6th, 2013 4:32pm

For Huge number of records in BCS, You should divide it into chunks. 

For more details check Url:

http://sharepointserver-2007.blogspot.com/2012/12/searching-external-systems-using.html

Tarek Yehia
blog:http://sharepointserver-2007.blogspot.com

Free Windows Admin Tool Kit Click here and download it now
October 6th, 2013 9:06pm

Thank you for your suggestion. Will check the approach from the link you shared and will update here.

I also looked into some of the other links which also mention about crawling large external data, but havent tried yet.

SharePoint 2013 Search - How to Crawl Large External Data using BCS/WCF Service

http://social.technet.microsoft.com/Forums/lync/en-US/1ac6aca7-cd02-4727-84fc-e89151c62e9a/bcs-sharepoint-2013-throttle-settings

Thanks Again,

October 7th, 2013 8:30pm

Hi ,

I started the change suggested in the link shared by you, but got stuck at a point,

  1. The field which i set as identifier in BCS earlier is not having unique values. Total rows returned by the sql view is 899000, but the unique values present in the column that is set as identifier is 3,49,923, which is equal to the number of rows crawled. - Is this the reason why it didnt crawl all records?
  2. The table that is used in the sql view has composite key, - Is it possible to have multiple identifier in BCS as well?
  3. Is it possible to make BCS to ignore the identifier? i mean creating BCS without an identifier column?

Please let me know your suggestions on this.

Thanks,

Free Windows Admin Tool Kit Click here and download it now
October 11th, 2013 7:59am

Are there any hints in the ULS logs as to what it is doing when it gets to the error?  Maybe a corrupt record?  Can you cut it into 1/3s and see if each 1/3 can be crawled independently to rule out bad data?
December 2nd, 2013 12:47am

Hi ,

I started the change suggested in the link shared by you, but got stuck at a point,

  1. The field which i set as identifier in BCS earlier is not having unique values. Total rows returned by the sql view is 899000, but the unique values present in the column that is set as identifier is 3,49,923, which is equal to the number of rows crawled. - Is this the reason why it didnt crawl all records?
  2. The table that is used in the sql view has composite key, - Is it possible to have multiple identifier in BCS as well?
  3. Is it possible to make BCS to ignore the identifier? i mean creating BCS without an identifier column?

Please let me know your suggestions on this.

T

Free Windows Admin Tool Kit Click here and download it now
December 2nd, 2013 1:25am

Hi,

I am also facing the same issue what Sriram faced. when I execute the query from SQL Management studio it returns 350,000 records, but the crawler crawls 297,000 records. I have Unique ID no duplicate Id's.

There is no error's in the crawl log, and ran Full crawl quiet number of times without luck.

I appreciate your help in this.

Thanks,

Srikanth

February 3rd, 2015 9:06am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics