My SharePoint Blog

Blogs On SharePoint Technologies





    © Copyright 2008

    SharePoint Search with the PDF iFilter: scanning existing PDFs on your site

    Tricky getting this to work on an existing site. Here goes...

    The trouble comes when you have an existing site that already contains many PDFs: the indexer will NOT scan the documents that already exist. Here is the workaround, done in Central Administration:

    1. Operations > Services On Server > Stop "Windows SharePoint Services Search". Click OK to delete the index.
    2. Go to your SQL db and delete the WSS search database (you may need to recycle the SharePoint app pool in IIS to release it from SharePoint).
    3. Go back to Operations > Services On Server and start "Windows SharePoint Services Search". As soon as the "Operation In Progress" page is done, move on to the next step.
    4. If you have already installed the PDF filter go to step 5 otherwise do the following:
      1. Download and then install the Adobe PDF IFilter from the following Adobe Web site: (http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611) Microsoft provides third-party contact information to help you find technical support. This contact information may change without notice. Microsoft does not guarantee the accuracy of this third-party contact information.
      2. Add the following registry entry, and then set the registry entry value to pdf: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\GUID\Gather\Search\Extensions\ExtensionList\38 (GUID is a placeholder for the GUID-named subkey under Applications on your server). To do this, follow these steps:
        1. Click Start, click Run, type regedit, and then click OK.
        2. Locate and then click the following registry subkey: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\GUID\Gather\Search\Extensions\ExtensionList
        3. On the Edit menu, point to New, and then click String Value.
        4. Type 38, and then press ENTER.
        5. Right-click the registry entry that you created, and then click Modify.
        6. In the Value data box, type pdf, and then click OK.
      3. Verify that the following two registry subkeys are present and that they contain the appropriate values. Note: these registry subkeys and the values that they contain are created when you install the Adobe PDF IFilter on the server.
        1. HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
           The above registry subkey must contain the following registry entry:
          1. Name: Default, Type: REG_MULTI_SZ, Data: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
        2. HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\Filters\.pdf
           The above registry subkey must contain the following registry entries:
          1. Name: Default, Type: REG_SZ, Data: (value not set)
          2. Name: Extension, Type: REG_SZ, Data: pdf
          3. Name: FileTypeBucket, Type: REG_DWORD, Data: 0x00000001 (1)
          4. Name: MimeTypes, Type: REG_SZ, Data: application/pdf
    5. You need to recreate the "38" registry entry (it is deleted when the index is deleted; this is a WSS 3.0 search bug):
      1. Add the following registry entry, and then set the registry entry value to pdf: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\GUID\Gather\Search\Extensions\ExtensionList\38 To do this, follow these steps:
        1. Click Start, click Run, type regedit, and then click OK.
        2. Locate and then click the following registry subkey: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\GUID\Gather\Search\Extensions\ExtensionList
        3. On the Edit menu, point to New, and then click String Value.
        4. Type 38, and then press ENTER.
        5. Right-click the registry entry that you created, and then click Modify.
        6. In the Value data box, type pdf, and then click OK.
    6. Run "net stop spsearch" and then "net start spsearch" from the command line.
    7. Go to Application Management > Content Databases > Select your content db and set your search server.
    8. You are done.
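Steps 5 and 6 above can be sketched as a small command generator. This is only a rough sketch, not a tested deployment script: it prints the `reg add` and `net` command lines rather than running them, the function name `build_pdf_reindex_commands` is made up for illustration, and the `guid` argument stays a placeholder because the real subkey name has to be read from regedit on your own server.

```python
# Sketch only: builds the command lines for steps 5 and 6; it does not run them.
EXTENSION_LIST_KEY = (
    r"HKLM\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0"
    r"\Search\Applications\{guid}\Gather\Search\Extensions\ExtensionList"
)


def build_pdf_reindex_commands(guid, value_name="38", extension="pdf"):
    """Return the commands that recreate the ExtensionList entry and restart search.

    guid is the name of the Applications subkey on your server. It is a
    required argument because no concrete value is known in advance.
    """
    key = EXTENSION_LIST_KEY.format(guid=guid)
    return [
        # reg add <key> /v <name> /t <type> /d <data> /f (/f overwrites without prompting)
        f'reg add "{key}" /v {value_name} /t REG_SZ /d {extension} /f',
        "net stop spsearch",
        "net start spsearch",
    ]


if __name__ == "__main__":
    # "GUID" is left as a placeholder; replace it with the subkey name from regedit.
    for command in build_pdf_reindex_commands("GUID"):
        print(command)
```

The printed lines could be pasted into a .cmd file once the placeholder has been replaced. Stopping the search service from Central Administration and deleting the search database (steps 1 to 3) are not covered here.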

     

    Summary: The problem is that when you stop the service, the ExtensionList registry entries are dropped. Once the service is restarted, the list is reset to the default iFilters and your custom iFilter entries (PDF, etc.) are gone, so your site is not scanned for those file types. If there is a way to consolidate all or most of these steps into a command line script, please post. :)
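To make the summary concrete, here is a minimal sketch of the check it implies: given a snapshot of the ExtensionList values, report which custom extensions have gone missing. The helper `missing_extensions` and both sample snapshots are hypothetical. The real default list on a server is longer and is not reproduced here.

```python
def missing_extensions(extension_list, required=("pdf",)):
    """Return the required extensions that are absent from an ExtensionList snapshot.

    extension_list maps registry value names (e.g. "38") to extensions (e.g. "pdf").
    """
    present = {ext.strip().lower() for ext in extension_list.values()}
    return [ext for ext in required if ext.lower() not in present]


# Hypothetical snapshots, invented for illustration only.
before_restart = {"1": "doc", "2": "txt", "38": "pdf"}
after_restart = {"1": "doc", "2": "txt"}

print(missing_extensions(before_restart))  # []
print(missing_extensions(after_restart))   # ['pdf']
```

An empty result means the custom entry survived. Anything else means the entry has to be recreated before the next crawl.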

     

    Here is another option I found on the internet...

    I have seen it mentioned that if you already have PDF files in a document library and then add the Adobe iFilter you have to re-add the documents to the library before they will be indexed. Fortunately this is not true. SQL Server has some system stored procedures to manage indexing. Use SQL Query Analyser to run the following command (after installing your iFilter):

    USE Name_of_your_WSS_content_db
    EXEC sp_fulltext_catalog 'ix_STS_servername_xxxxxx', 'rebuild'

    You will find the correct string for 'ix_STS_servername_xxxxxx' by using SQL Server Enterprise Manager: expand the WSS content database and click on Full-Text Catalogs. No restart of any services was required for me to be able to search on PDF contents.
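If you have several content databases, the statement above can be generated rather than typed by hand. The following is a small sketch under that assumption. The helper name `build_catalog_rebuild_sql` is invented for illustration, and the database and catalog names are the same placeholders used above, so substitute your own before running the output in Query Analyser.

```python
def build_catalog_rebuild_sql(content_db, catalog_name):
    """Return the T-SQL batch that rebuilds one full-text catalog."""
    # Bracket-quote the database name and double any single quotes in the catalog name.
    db = content_db.replace("]", "]]")
    catalog = catalog_name.replace("'", "''")
    return f"USE [{db}];\nEXEC sp_fulltext_catalog '{catalog}', 'rebuild';"


# Placeholder names; look up the real catalog name in Enterprise Manager.
print(build_catalog_rebuild_sql("Name_of_your_WSS_content_db", "ix_STS_servername_xxxxxx"))
```

The function only builds text, so nothing touches the database until you run the printed batch yourself.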

    Categories: SharePoint
    Posted by Kevin on Monday, January 08, 2007 11:01 AM