SharePoint Coder

Configure Acrobat PDF IFilter in SharePoint

SharePoint is highly extensible and customizable. Take its search functionality as an example. By default, only Office documents (stored in a document library, for example) are indexed by the Indexing Service, so only those can be found through SharePoint search.

In the real world, of course, many more document types are in use; a lot of companies rely on PDF documents, for example. So I get quite a lot of questions from people asking whether PDF documents can be indexed too. The good news is that the Indexing Service can be extended by using the IFilter interface:

The IFilter interface scans documents for text and properties (also called attributes). It extracts chunks of text from these documents, filtering out embedded formatting and retaining information about the position of the text. It also extracts chunks of values, which are properties of an entire document or of well-defined parts of a document. IFilter provides the foundation for building higher-level applications such as document indexers and application-independent viewers.

The steps below configure the PDF iFilter in MOSS 2007.

  1. Install Adobe PDF iFilter 9 for 64-bit platforms (HERE).
  2. Verify that PDF has been added to the registry.
    1. Run Regedit by browsing to c:\Windows\system32\regedt32.exe and double-clicking it.
    2. Within left-side tree, browse to:
      \\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Applications\{site GUID}\Gather\Portal_Content\Extensions\ExtensionList
    3. If the PDF extension is present, skip to Step 3. If the PDF extension is not present, continue with sub-step 4.
    4. Right click on right-side Extension List pane and choose New > String Value
    5. Give the new registry value a name (e.g. “38”)
    6. Double click the new registry value. For “Value data”, enter “pdf”

    Note: This can also be achieved via the SharePoint Server Search Administration page by adding ‘pdf’ to the list of File Types in Search Administration->File Types. This automatically adds the ‘pdf’ file type entry described in Step 2 above.

  3. Verify that PDF has the correct settings in a second registry location.
    1. While still in Regedit, within the left-side tree, browse to:
      \\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\Filters\.pdf
    2. Verify the following values. If values are not as shown, edit them.
      1. REG_SZ Default = value not set
      2. REG_SZ Extension = pdf
      3. REG_DWORD FileTypeBucket = 1
      4. REG_SZ MimeTypes = application/pdf
  4. Verify that PDF has the correct settings in a third registry location.
    1. While still in Regedit, within the left-side tree, browse to:
      \\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
    2. Verify the following values. If values are not as shown, edit them.
      1. Default = {E8978DA6-047F-4E3D-9C78-CDBE46041603}
  5. Verify that pdf.gif is present at the following location:
    C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\IMAGES
    (To get the PDF icon, see the Adobe site for the latest version HERE)
  6. Add an entry in docicon.xml for the pdf icon:
    C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\XML
    <Mapping Key="pdf" Value="pdf.gif"/>

    Note: Steps 5 and 6 are done so that the PDF icon is shown when SharePoint displays the search results.

  7. Restart the SharePoint search service (osearch) as well as IIS.
    1. Open a Command Prompt (Start > All Programs > Accessories > Command Prompt).
    2. Type the following at the prompt: “net stop osearch”. Wait for success message.
    3. Type the following at the prompt: “net start osearch”. Wait for success message.
    4. Type the following at the prompt: “iisreset”. Wait for success message.
  8. Microsoft Office SharePoint Server can now index PDF files. Also, the PDF icon should show in the File Types list.
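
If you would rather not click through Regedit, the checks from Steps 3 and 4 can be scripted. The following is only a minimal sketch in Python, assuming Python is installed on the SharePoint server; the names EXPECTED, find_mismatches and read_values are my own for illustration and are not part of SharePoint or the Adobe iFilter. It only reads the registry and reports differences; it does not change anything. The ExtensionList check from Step 2 is left out because its path contains the {site GUID} placeholder.

```python
"""Sketch: compare the PDF iFilter registry values from Steps 3 and 4."""
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

SEARCH = r"SOFTWARE\Microsoft\Office Server\12.0\Search"

# Expected values copied from Steps 3 and 4. The empty name "" stands for
# the key's (Default) value. The string comparison is case-sensitive.
EXPECTED = {
    SEARCH + r"\Setup\Filters\.pdf": {
        "Extension": "pdf",
        "FileTypeBucket": 1,
        "MimeTypes": "application/pdf",
    },
    SEARCH + r"\Setup\ContentIndexCommon\Filters\Extension\.pdf": {
        "": "{E8978DA6-047F-4E3D-9C78-CDBE46041603}",
    },
}


def find_mismatches(expected, actual):
    """Return (name, expected, actual) for each value that differs or is missing."""
    return [
        (name, want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    ]


def read_values(subkey):
    """Read all values under HKEY_LOCAL_MACHINE\\<subkey> into a dict (Windows only)."""
    values = {}
    # KEY_WOW64_64KEY asks for the 64-bit registry view even from 32-bit Python.
    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey, 0, access) as key:
            value_count = winreg.QueryInfoKey(key)[1]
            for index in range(value_count):
                name, data, _value_type = winreg.EnumValue(key, index)
                values[name] = data
    except FileNotFoundError:
        pass  # key does not exist: every expected value is reported as missing
    return values


if __name__ == "__main__" and winreg is not None:
    for subkey, expected in EXPECTED.items():
        for name, want, got in find_mismatches(expected, read_values(subkey)):
            print(f"{subkey}: {name or '(Default)'} is {got!r}, expected {want!r}")
```

If the script prints nothing, the values in both locations match the ones listed above. Any line it does print names the key and value to correct by hand in Regedit.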
