Scanner

Once you have repos in the system, you will need to periodically pull in new commits and recalculate statistics so that the web interface shows updated information. This may take some time for large repos, because every commit and author becomes a row in the database and additional statistics are calculated from the imported data.

Scanner Fundamentals

Scanning is invoked with a shell command, not through the SourceOptics UI:


python manage.py scan

This should normally be run under ssh-agent so that the SSH keys and passphrases used for some git checkouts are available:

ssh-agent python manage.py scan

To keep the scan from stalling, the SSH fingerprint of the remote server (for SSH checkouts) must already be in the known hosts file of the user running the scanner command. An easy way to do this is to manually git clone one of the repos as that user and answer "yes" at the known host prompt.
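As a rough way to check this ahead of a scan, the sketch below asks OpenSSH whether a host already has an entry in the current user's known hosts file. It is not part of SourceOptics: the function names and the example hostname are hypothetical, and it assumes the ssh-keygen tool is installed and that its -F lookup option behaves as in stock OpenSSH.

```python
import shutil
import subprocess


def known_hosts_lookup_command(hostname):
    """Build the OpenSSH command that searches known_hosts for a hostname."""
    return ["ssh-keygen", "-F", hostname]


def host_is_known(hostname):
    """Return True if ssh-keygen finds an entry for hostname in known_hosts."""
    if shutil.which("ssh-keygen") is None:
        raise RuntimeError("ssh-keygen not found on PATH")
    result = subprocess.run(
        known_hosts_lookup_command(hostname),
        capture_output=True,
        text=True,
    )
    # Treat a non-zero exit or empty output as "no entry found".
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    # "git.example.com" is a placeholder, not a real server.
    print(host_is_known("git.example.com"))
```

Run it as the same user that runs the scanner, since known hosts are stored per user.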

To scan just a single organization (or organizations matching a substring), you can run as follows:

ssh-agent python manage.py scan -o csc201

You can also scan a specific repository:

ssh-agent python manage.py scan -o repo_name

Details

You can also disable organizations or repositories in Django admin, to prevent them from being scanned. Disabled repos will still appear in the UI. The ability to hide organizations may come in a later release.

The scanner is governed by a setting in settings.py, PULL_THRESHOLD, which ensures that a repository is not scanned more often than once every X minutes. If you run the scanner frequently (such as every 5 minutes), setting PULL_THRESHOLD to 30 minutes or more may be appropriate.
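The effect of the threshold can be pictured with a minimal sketch. This is not the scanner's actual code: the function name is_due_for_scan, the last_pulled argument, and the example value of 30 are assumptions for illustration. Only the PULL_THRESHOLD name and its unit of minutes come from the description above.

```python
from datetime import datetime, timedelta, timezone

# Example value only; the real setting lives in settings.py (in minutes).
PULL_THRESHOLD = 30


def is_due_for_scan(last_pulled, now=None, threshold_minutes=PULL_THRESHOLD):
    """Return True if a repo was never pulled or the threshold has elapsed."""
    if last_pulled is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_pulled >= timedelta(minutes=threshold_minutes)


if __name__ == "__main__":
    five_minutes_ago = datetime.now(timezone.utc) - timedelta(minutes=5)
    # A repo pulled 5 minutes ago is skipped under a 30 minute threshold.
    print(is_due_for_scan(five_minutes_ago))
```

Under this model, running the scanner every 5 minutes with a 30 minute threshold means most runs skip most repos, and each repo is pulled at most about twice an hour.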

The scanner contains (or will contain) a flock() call that prevents concurrent runs on the same machine.
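For readers unfamiliar with flock(), the following sketch shows the general technique using Python's standard fcntl module on a Unix-like system. It is an illustration under stated assumptions, not the scanner's implementation: the lock file path and the try_acquire_lock name are hypothetical.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file location, for illustration only.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "example_scanner.lock")


def try_acquire_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    lock = try_acquire_lock()
    if lock is None:
        print("another run holds the lock; exiting")
    else:
        try:
            print("lock acquired; scanning would happen here")
        finally:
            lock.close()  # closing the file releases the lock
```

Because the lock lives in the local kernel, it only guards against overlapping runs on one machine.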

If parallelism is desired, process other organizations on a different machine. Support for parallel scans on the same machine may be added in the future.

If the scanner goes interactive (stops to prompt for input), check three things: that the repositories use ssh:// URLs in their Django admin configuration, that the process is wrapped with ssh-agent, and that the passphrases for any locked keys (SSH keys with passphrases) are stored on the credential objects.
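The first of these checks is easy to script. The sketch below uses only the standard library to test whether a URL uses the ssh:// scheme. The function name and the example URLs are hypothetical, and it does not read anything from the SourceOptics database.

```python
from urllib.parse import urlparse


def uses_ssh_scheme(repo_url):
    """Return True only for URLs written with an explicit ssh:// scheme."""
    return urlparse(repo_url).scheme == "ssh"


if __name__ == "__main__":
    # Placeholder URLs for illustration.
    for url in (
        "ssh://git@git.example.com/org/repo.git",
        "https://git.example.com/org/repo.git",
        "git@git.example.com:org/repo.git",
    ):
        print(url, uses_ssh_scheme(url))
```

Note that the scp-style form (git@host:path) has no URL scheme, so this check does not count it as an ssh:// URL.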

You should manually add the host keys for any hostnames or IP addresses used for SSH clones to ~/.ssh/known_hosts, or the scanner will not be able to check out the repository.