ISIS 2.0 Launch Event: Great Success!

Last evening, Cogniva’s President Yves Marleau and the entire team welcomed clients, colleagues and friends to help launch our innovative, unified IM tool: ISIS 2.0.

The new ISIS 2.0 introduces:
- Desktop Classifier
- Enhanced SharePoint Scalability, Performance & Functionality
- Tagging Assistant
- Email Integration
- And more…

The ISIS 2.0 presentation was followed by great food, drinks and conversation: an all-around success!

A sincere thank you to everyone who participated – we are very grateful for all the support.

Exciting times ahead!

The Cogniva Team

CISRI, Cogniva & uOttawa ESIS Discussion at ARMA NCR 2013 FALL IM DAYS

Join us November 19th & 20th as ARMA NCR hosts the 2013 Fall IM Days Conference: the annual hub for the Information Management community in the National Capital.

Cogniva, CISRI & uOttawa ESIS will be joining forces to present a session entitled “Business Process Analysis & Automatic Classification of Email of Business Value” on November 20th at 9:30 am at the Brookstreet Hotel in Kanata, Ottawa.

Join Yves Marleau, Craig Eby, Inges Alberts & André Vellino as they discuss the challenges associated with the automated identification of email of business value in a governmental context. The management of email is an essential stepping-stone on the path toward good governance and long-term business performance for the Canadian Government – come and see how your organization can benefit!

Read more: IM Fall Days Agenda

@CognivaNews @CognivaResearch

Automated Taxonomy Discovery

Here at CISRI we are excited to be engaged in a research project on automated concept and taxonomy discovery. This research is related to an IRAP grant that Cogniva Information Solutions received. The goal of the project is to simplify the creation of metadata taxonomies, and of the relations between their concepts, using text analytics approaches. We are using a variety of open-source tools and research methodologies. We will be posting updates and technical details as the research progresses, so stay tuned!

How to Set up Solr and ManifoldCF on an Ubuntu Based Computer

This blog post is intended to provide some guidance on how to set up a computer to run Apache Solr (http://lucene.apache.org/solr/) and Apache ManifoldCF (http://manifoldcf.apache.org/). Solr is a search server that wraps Apache Lucene. It provides a web UI and a variety of features such as document text extraction (via Apache Tika). ManifoldCF is a utility for scheduling jobs and providing repository connectors. We have used it to import documents from both a Windows (CIFS) file share and MS SharePoint 2010 into Solr.

This guide was written while installing and configuring Solr and ManifoldCF on a VirtualBox virtual machine running Linux Mint 15 (MATE) x64 (http://www.linuxmint.com/). I chose Linux Mint because it is a “hot” GNU/Linux distribution these days (http://distrowatch.com/dwres.php?resource=major). These instructions can also be used to install and configure Solr and ManifoldCF on Ubuntu. Just be aware that the standard text editor on MATE is pluma, while on GNOME it is gedit, so anywhere you see ‘pluma’ below, substitute ‘gedit’ on Ubuntu. These instructions should also work on Debian (again substituting ‘gedit’ for ‘pluma’), but I have not verified this.

The development of this guide was a joint effort of Chris Salter and myself.

Install Solr

  • Download Solr 4.3.1
  • Decompress and move the solr-4.3.1 directory to /usr/share/solr
  • To do this via the terminal
    • cd ~/Downloads
    • wget http://archive.apache.org/dist/lucene/solr/4.3.1/solr-4.3.1.tgz
    • tar -xzvf solr-4.3.1.tgz
    • sudo cp -R solr-4.3.1 /usr/share/solr
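  • Test Solr
    • Open a terminal
      • cd /usr/share/solr/example
      • sudo java -jar start.jar
    • Open another terminal
      • cd /usr/share/solr/example/exampledocs
      • java -jar post.jar *.xml
    • Confirm that you see something like Figure 1.
    • Open a browser and go to http://localhost:8983/solr/
    • Confirm that you see something like Figure 2.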

Figure 1: Solr test

 

Figure 2: Solr WebUI
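
As a complement to the visual check, the same test can be scripted. The sketch below is an illustration rather than part of the original procedure: it assumes Solr is listening at the example's default address (http://localhost:8983/solr) with the default core name collection1, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base="http://localhost:8983/solr", core="collection1"):
    """Build a match-all query URL that asks only for the hit count.

    The base address and core name are assumptions taken from the
    Solr example configuration; adjust them for your installation.
    """
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base}/{core}/select?{params}"


def parse_num_found(body):
    """Extract the total number of matching documents from a JSON response."""
    return json.loads(body)["response"]["numFound"]


def fetch_num_found(url=None, timeout=10):
    """Query a running Solr instance and return its document count."""
    with urlopen(url or select_url(), timeout=timeout) as resp:
        return parse_num_found(resp.read().decode("utf-8"))
```

With Solr running and the example documents posted, calling fetch_num_found() should return a non-zero count. The same call can be repeated after a crawl to see whether new documents have arrived in the index.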

Install ManifoldCF

  • Download ManifoldCF 1.3
  • Decompress and move the apache-manifoldcf-1.3 directory to /usr/share/manifoldcf
  • To do this via the terminal
    • cd ~/Downloads
    • wget http://apache.mirror.rafal.ca/manifoldcf/apache-manifoldcf-1.3-bin.tar.gz
    • tar -xzvf apache-manifoldcf-1.3-bin.tar.gz
    • sudo cp -R apache-manifoldcf-1.3 /usr/share/manifoldcf
  • Test
    • Open a terminal
      • cd /usr/share/manifoldcf/example
      • sudo java -jar start.jar
    • Open a browser and go to http://localhost:8345/mcf-crawler-ui/
    • Confirm that you see something like Figure 3.

 

Figure 3: ManifoldCF WebUI

Configure Solr

  • Recommended: Ignore Tika Errors
    • Edit /usr/share/solr/example/solr/collection1/conf/solrconfig.xml
    • Add the line <bool name="ignoreTikaException">true</bool> to the list <lst name="defaults"> under <requestHandler name="/update/extract" … >
    • After the changes the relevant part of the file should look like figure 4.  The text highlighting was added for clarity.
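
If you prefer to script this edit, the following is a minimal sketch and not the procedure we actually used. It works on the file's text rather than re-serializing the XML, so that the comments in solrconfig.xml are left alone. The function name and the fixed indentation are assumptions made for illustration.

```python
import re

BOOL_LINE = '<bool name="ignoreTikaException">true</bool>'


def add_ignore_tika(config_text, indent="      "):
    """Return config_text with ignoreTikaException added to the defaults
    of the /update/extract handler. Text that already contains the
    setting is returned unchanged."""
    if 'name="ignoreTikaException"' in config_text:
        return config_text
    handler = re.search(r'<requestHandler\s[^>]*name="/update/extract"', config_text)
    if handler is None:
        raise ValueError("no /update/extract request handler found")
    # Only look for the defaults list inside this handler element.
    end = config_text.find("</requestHandler>", handler.end())
    if end == -1:
        end = len(config_text)
    defaults = re.compile(r'<lst\s+name="defaults"\s*>').search(
        config_text, handler.end(), end
    )
    if defaults is None:
        raise ValueError("the /update/extract handler has no defaults list")
    pos = defaults.end()
    return config_text[:pos] + "\n" + indent + BOOL_LINE + config_text[pos:]
```

To apply it, read /usr/share/solr/example/solr/collection1/conf/solrconfig.xml into a string, pass it through add_ignore_tika, and write the result back. The file lives under /usr/share, so this will probably need root privileges. Keep a backup copy first, and restart Solr afterwards.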

 

Configure ManifoldCF

Connect ManifoldCF to Solr

  • Start Solr if it is not already running
  • Open the ManifoldCF web UI in a browser
  • Click “List Output Connections”
  • Click “Add a new output connection”
  • Name = Solr
  • Description = Connect to Solr
  • Click “Type” tab
  • Connection type = Solr
  • Click “Continue” button
  • Recommended: Click “Documents” tab
    • Maximum document length = 10240000 (i.e., 10MB)
  • Click Save
  • Confirm that Connection Status = Connection working
    • (see figure 4).

 

Figure 4: Solr Output Connection

Add Windows File Share Support

  • Stop ManifoldCF
  • Download http://jcifs.samba.org/src/jcifs-1.3.17.jar
  • Move the file jcifs-1.3.17.jar to /usr/share/manifoldcf/connector-lib-proprietary
  • Edit /usr/share/manifoldcf/connectors.xml
  • Uncomment <repositoryconnector name="Windows shares"/>
    • See Figure 5.
  • Save
  • Start ManifoldCF

 

Figure 5: connectors.xml
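
The uncommenting step can also be done with a few lines of Python. This is a hedged sketch: the exact way the entry is commented out in your copy of connectors.xml may differ, so the pattern below accepts two plausible comment styles, and the function name is hypothetical. Check the result against Figure 5 before restarting ManifoldCF.

```python
import re

# Matches the "Windows shares" entry whether the whole element is wrapped
# in a comment (<!-- <repositoryconnector .../> -->) or the comment markers
# stand in for the angle brackets (<!--repositoryconnector .../-->).
_COMMENTED = re.compile(
    r'<!--\s*<?\s*(repositoryconnector\s+name="Windows shares"[^<>]*?)/?>?\s*-->'
)


def uncomment_windows_shares(xml_text):
    """Return xml_text with the commented-out Windows shares connector
    turned back into a live element. Other text is left untouched."""
    return _COMMENTED.sub(r"<\1/>", xml_text, count=1)
```

If the entry is already active, or is commented in some other way, the text comes back unchanged, so compare the input and output to confirm that the edit took effect.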

Create a New Authority Connection for the Windows File Share

  • Click “List Authority Connections”
  • Click “Add a new connection”
  • Name = Active Directory
  • Description = optional
  • Click “Type” tab
  • Connection type = Active Directory
  • Click “Continue” button
  • Click “Domain Controller” tab
  • Domain controller name = your-domain-controller-name
  • Domain suffix = your-domain-name
  • Administrative user name = user-account-with-adequate-permissions
  • Administrative password = password
  • Click “Add to End” button
  • Click “Save” button

Set up File Share Repository Connection

  • Click “List Repository Connections”
  • Click “Add new connection”
  • Name = file-share-name
  • Description = optional
  • Click “Type” tab
  • Connection type = Windows shares
  • Authority = Active Directory (this is the name selected when creating the authority)
  • Click “Continue” button
  • Click “Server” tab
  • Server = server-name
  • Authentication domain (optional) = domain-name
  • User name = user-account-with-adequate-permissions
  • Password = account-password
  • Use SIDS for security = Yes (check)
  • Click “Save” button

Set up SharePoint

Set up SharePoint Repository Connection

  • Click “List Repository Connections”
  • Click “Add new connection”
  • Name = SharePoint
  • Description = SharePoint
  • Click “Type” tab
  • Connection type = SharePoint
  • Authority = Windows File Share Permissions
  • Click “Continue” button
  • Click “Server” tab
  • Server SharePoint version = SharePoint Services 4.0 (2010)
  • Server Protocol = https
  • Server Name = your-server-name (e.g., intranet.my-domain.com)
  • Server Port = your-sharepoint-port (e.g., 4443)
  • Site path = path-to-site (e.g., “/sites/my-main-site”)
  • User name = account-with-read-permissions
  • Password = account-password
  • Click “Browse” button
  • Navigate to and select your certificate file (e.g., my-domain.com.cer)
  • Click “Add” button
  • Click “Save” button

Set up File Share Crawl Job

  • Confirm that Connection Status = Connection working
  • Click “List all Jobs”
  • Click “Add a new job”
  • Name = Crawl FileShare
  • Click “Connection” tab
  • Output connection = Solr
  • Repository connection = FileShare
  • Start method = Don’t…
  • Click “Continue” button
  • Click “Scheduling” tab
  • Schedule type = Scan … once
  • Recrawl interval (if continuous) = <blank>
  • Reseed interval (if continuous) = <blank>
  • Click “Paths” tab
  • Select name-of-share (e.g., cognivashare)
    • See Figure 6
  • Click “Add” button
  • Set Filters: See Figure 7.
    • Set 1. Include directory(s) matching *
    • Set 2. Include indexable file(s) matching *
    • Set 3. Exclude un-indexable file(s) matching *
  • Click “Security” tab
  • File security = Enabled
  • Share security = Disabled
  • Recommended: Click “Content Length” tab
    • Maximum document length = 10240000
  • Click “Save” button

Figure 6: Select Share

 

Figure 7: Path Filters

Run the File Share Crawl Job

  • Click “Status and Job Management”
  • Click “Start”
  • Confirm that the numbers under Documents, Active, and Processed are non-zero and increasing.
  • To view the processing in more detail, click “Result Histogram”
  • Connection = FileShare
  • Click “Continue” button
  • Confirm that there is a list of file reading activities.
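
The job's progress can also be checked without the web UI. The sketch below assumes that the ManifoldCF API service is deployed at the example's default address and that it returns a "jobstatus" entry with "job_id", "status" and "documents_processed" fields, as I understand the ManifoldCF API documentation to describe. Treat the URL, the field names and the helper names as assumptions to verify against your version.

```python
import json
from urllib.request import urlopen

# Assumed default address of the API service in the ManifoldCF example.
API_URL = "http://localhost:8345/mcf-api-service/json/jobstatuses"


def summarize_job_statuses(body):
    """Turn a jobstatuses JSON response into (job_id, status, processed) tuples."""
    statuses = json.loads(body).get("jobstatus", [])
    # Tolerate a single job being returned as one object instead of a list.
    if isinstance(statuses, dict):
        statuses = [statuses]
    return [
        (s.get("job_id"), s.get("status"), int(s.get("documents_processed", 0)))
        for s in statuses
    ]


def fetch_job_statuses(url=API_URL, timeout=10):
    """Ask a running ManifoldCF instance for the status of all jobs."""
    with urlopen(url, timeout=timeout) as resp:
        return summarize_job_statuses(resp.read().decode("utf-8"))
```

Calling fetch_job_statuses() a few times while the crawl is running should show the processed count growing, which mirrors the check made on the status page.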

CISRI now on Twitter

I’ve been on the road a bit over the last month and realized that it was interfering with my blog updating, so I’ve gone ‘higher tech’: I’m now signed up on the CISRI Twitter feed, and I’ll be tweeting my i-sci thoughts as they pop up. You can find me under CognivaResearch on Twitter. I think this will be a lot of fun, and I hope it will generate some return discussion. If you see me tweeting about a topic of interest, let me know and I’ll be happy to expand upon it in a blog entry.

The IRCSI Launch Was a Success!

Hello,

We would like to thank you for attending, and for the interest you showed, at Tuesday evening’s launch of the Cogniva Information Science Research Institute (Institut de recherche Cogniva en Sciences de l’information, IRCSI). We can certainly say that the event was a resounding success, given the number of professionals who attended and the quality of the discussions that took place. It was a great pleasure to welcome and meet all of you.

We strongly encourage you to visit the website regularly for news about our research projects and to follow the announcements the institute will make over the coming months.

We have several research projects planned and, with your support and collaboration, the IRCSI will be well positioned to make a significant contribution to the community of information science professionals for years to come.

Your partner in research, collaboration and innovation,

The IRCSI team

CISRI launch: success!

Hello all,

We would like to express sincere thanks for your interest and attendance Tuesday evening at the official launch of the Cogniva Information Science Research Institute. It was a huge success. It was fantastic to see the large number of fellow professionals who came out in support of the event, and we really enjoyed the opportunity to meet and greet many of the attendees.

If you have further or future interest in the Cogniva Information Science Research Institute, we invite you to contact us today.

We have many exciting and innovative projects planned, and with your continued support and collaboration, CISRI is well-positioned to make significant and ongoing contributions to the Information Management community for many years to come.

Yours in research, collaboration and innovation,

The CISRI team

CISRI launch preparations are accelerating!

CISRI launch preparations have been accelerating — we’re less than a month away from the launch and everything is falling into place. Having the website up is great — it’s nice to have a spot where people can go and look into our projects. On the research side, we’ve got a number of things already on the go, but I’ll hold off discussing them until after launch.

Nice IS Theory Research Review Site: The ‘Theories Used in IS Research’ Wiki

As someone who works in an interdisciplinary area, I find that I’m a bit of a hoarder of review articles, and now, judging by my favourites list, research review web sites. While writing an upcoming blog entry on certain aspects of organizational theory (to appear… soonish…), I came across this very nice information science theory wiki (http://www.fsc.yorku.ca/york/istheory/wiki/index.php/Main_Page), with concise summaries and really quite amazing reference lists for a vast list of IS theories. What a great website. And, hey, since it’s a wiki, while perusing the research reviews, you could always contribute something to it yourself, as well.