SharePoint does not call iFilter on ASCII files

Hi,

I'm developing an iFilter for a custom file format on a client's system running SharePoint 2013.

The file extension is .NEZ - it's ASCII - just a bunch of x,y,z points followed by an optional description.

During a crawl, SharePoint seems to look at the contents of the file, see that it's ASCII, and then index the file itself without calling my iFilter.  This is a problem because the files can be a few megabytes, and only about 2 KB of that is text that actually needs to be indexed (which is the real purpose of an iFilter).
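
To make the goal concrete, here is a rough Python sketch of the text the filter is supposed to hand to the indexer. It is only an illustration, not the iFilter itself: it assumes each line is comma-separated as x,y,z with the optional description as a fourth field, and the function name and sample points are made up for the example.

```python
def extract_descriptions(lines):
    """Yield the optional description from each 'x,y,z[,description]' line.

    Assumption: fields are comma-separated and the description, when
    present, is everything after the third comma.
    """
    for line in lines:
        # Split into at most four fields so commas inside the description survive.
        fields = line.strip().split(",", 3)
        if len(fields) < 4:
            continue  # coordinates only, nothing worth indexing
        description = fields[3].strip()
        if description:
            yield description


# Hypothetical sample data, not taken from a real .NEZ file.
sample = [
    "100.0,200.0,10.5,Iron pin found",
    "101.2,203.4,10.7",
    "105.9,210.1,11.0,Edge of pavement, north side",
]
indexable_text = "\n".join(extract_descriptions(sample))
```

Only the descriptions end up in `indexable_text`; the coordinates, which make up most of the file, are dropped.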

It's not like I'm trying to override the built-in handling of a format like DOCX or PDF - this is a specific file extension for a single customer.

I've tried Set-SPEnterpriseSearchFileFormatState (which disables parsing of a file format by a built-in format handler), but .NEZ is not a format that SharePoint should be parsing in the first place, unless it's simply detecting that the content is ASCII.

Is there a way to stop SharePoint from doing this?  Or am I missing something?

Thanks,

Carl Ransdell

August 19th, 2013 10:26am

Interesting.  Are you sure the iFilter has been installed correctly so that SharePoint can find it and make the call?  It would need to be installed on all the content processing component servers.
December 1st, 2013 10:42pm

Yes, thanks for posting Chris - I need to follow up with this thread.

I went through Microsoft technical support because I was able to demonstrate this issue with an iFilter and two files with the same extension - one binary and the other ASCII.  SharePoint 2013 would call my iFilter to index the binary file.  However, it would index the ASCII file on its own, without calling the iFilter.

Here's Microsoft's answer:

"Our escalation had a discussion with the SharePoint product group and shared the understanding on this issue:

 The reason why the ASCII (text format) file is not indexed with the custom IFilter is because the FAST Content Processing service uses Automatic Format Detection to process the known formats, such as TEXT files. The following MSDN article has the note:

http://technet.microsoft.com/en-us/library/jj219577.aspx, which says: 'You can extend the initial collection of file formats that SharePoint 2013 can parse by adding third-party filter-based format handlers, known as IFilters. You cannot override a built-in format handler by installing a third-party iFilter.'

While it's probably technically possible to modify FormatRegistry.xml to remove txt files from the list of built-in handlers, that would leave the system in an unsupported state."

So - SharePoint 2013 does this by design - you cannot change it unless you want to leave SharePoint in an "unsupported state".  SharePoint 2010 did not do this - apparently, folding the FAST technology into SharePoint 2013 created this issue.

Carl

December 2nd, 2013 9:04am

Hi,

You could go with the content enrichment web service instead. It saves you from writing iFilter code as well ;) Configure the service to receive the binary contents and output the data in whatever format you like, and set a trigger rule so that it only processes .nez files.

Thanks,

Mikael Svenson
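
The processing step such a service would perform can be sketched roughly as follows. This is plain Python showing only the logic, not the actual SharePoint web service contract: the function name, the `NezDescriptions` output property, and the x,y,z,description line layout are all assumptions made for the example.

```python
import os


def enrich(filename, raw_bytes):
    """Return output properties for a .nez file, or None if the trigger does not match.

    Hypothetical stand-in for the enrichment callout: takes the raw file
    contents and returns only the text that should be indexed.
    """
    # Trigger rule: only handle .nez files (case-insensitive).
    if os.path.splitext(filename)[1].lower() != ".nez":
        return None

    # The files are ASCII; replace any stray bytes instead of failing.
    text = raw_bytes.decode("ascii", errors="replace")

    descriptions = []
    for line in text.splitlines():
        # Assumed layout: x,y,z followed by an optional description.
        fields = line.split(",", 3)
        if len(fields) == 4 and fields[3].strip():
            descriptions.append(fields[3].strip())

    # "NezDescriptions" is a made-up property name for illustration.
    return {"NezDescriptions": " ".join(descriptions)}
```

Files with any other extension fall through untouched, which mirrors the trigger rule restricting the callout to .nez items.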
December 2nd, 2013 2:41pm

