Technical Overview


The Genius Scan SDK

The Genius Scan SDK lets application developers add a scanning module built on the same technology that powers the Genius Scan app.

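The SDK provides core document processing routines (document detection, perspective correction, document type detection and enhancement), PDF generation, UI components for capturing and cropping documents, and text recognition (OCR).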

The SDK includes a licensing system and needs to be initialized with a key to work. Without it, all the methods included in the SDK will fail.


This document is a very high-level overview of the SDK and its main routines. Should you want more technical details, please refer to the API documentation located in the /doc folder of the SDK.


A license key is needed to initialize the SDK. The key contains the Application ID / Bundle ID of your application and an expiration date. Please contact us if you don’t have a license key yet. If the key is invalid or your license has expired, the initialization fails and the SDK will not work. It is good practice to check whether the initialization succeeded and, if it did not, to fall back gracefully (e.g. disable the scanning feature or prompt the user to update the application).


if (![GSK initWithLicenseKey:@"<YOUR LICENSE KEY>"]) {
   // The license is expired or invalid
}


On Android, the best place to initialize the SDK is in the main Activity of your application:

try {
   GeniusScanLibrary.init(getApplicationContext(), "<YOUR LICENSE KEY>");
} catch (RuntimeException e) {
   // The license is expired or invalid
}
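The advice to check whether initialization succeeded can be sketched as a small gate that records the outcome so the rest of the app can hide the scanning feature. This is a minimal sketch, not part of the SDK: the class ScanFeatureGate and its methods are hypothetical, and the Runnable stands in for the call to GeniusScanLibrary.init(...).

```java
/** Hypothetical helper (not part of the SDK): remembers whether SDK initialization succeeded. */
final class ScanFeatureGate {
    private boolean scanningEnabled;

    /**
     * Runs the given initializer, which stands in for GeniusScanLibrary.init(...),
     * and records whether it completed without throwing.
     */
    boolean initialize(Runnable initializer) {
        try {
            initializer.run();
            scanningEnabled = true;
        } catch (RuntimeException e) {
            // Invalid or expired license: keep the app usable, just disable scanning.
            scanningEnabled = false;
        }
        return scanningEnabled;
    }

    boolean isScanningEnabled() {
        return scanningEnabled;
    }
}
```

The UI can then consult isScanningEnabled() before showing any entry point to the scanner.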

Core document processing

At the core of the SDK are the image processing methods that perform all the transformations on the images. There are four main routines, listed below. They are generally used one after another, with the output of each routine feeding the next.

Document detection

Document detection (also called edge detection) takes in an image and returns a quadrangle representing the four corners of the detected document.


GSKQuadrangle *quadrangle;
quadrangle = [GSK detectQuadrangleFromImage:imageOutOfCamera options:0];


Quadrangle quadrangle = GeniusScanLibrary.detectFrame(imageToAnalyzePath);
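Because the detected quadrangle is usually passed straight to the next step, an application may want to sanity-check it first. The following is a sketch of one such check (convexity), under stated assumptions: the class QuadrangleCheck is hypothetical, and the corners are assumed to be available as eight numbers x0, y0, ..., x3, y3 listed in order around the shape. The SDK's own Quadrangle accessors are not shown in this overview.

```java
/** Hypothetical helper (not part of the SDK): basic sanity check on four corners. */
final class QuadrangleCheck {
    /**
     * Returns true when the four corners form a convex quadrangle.
     * corners holds x0,y0,x1,y1,x2,y2,x3,y3, listed in order around the shape.
     */
    static boolean isConvex(double[] corners) {
        int sign = 0;
        for (int i = 0; i < 4; i++) {
            int p = 2 * i, q = 2 * ((i + 1) % 4), r = 2 * ((i + 2) % 4);
            // Cross product of edge p->q with edge q->r: its sign gives the turn direction.
            double cross = (corners[q] - corners[p]) * (corners[r + 1] - corners[q + 1])
                         - (corners[q + 1] - corners[p + 1]) * (corners[r] - corners[q]);
            if (cross == 0) {
                return false; // three collinear corners: degenerate shape
            }
            int turn = cross > 0 ? 1 : -1;
            if (sign == 0) {
                sign = turn;
            } else if (turn != sign) {
                return false; // turn direction changed: concave or self-intersecting
            }
        }
        return true;
    }
}
```

A quadrangle that fails this check could, for example, be replaced by the full image frame before the user is asked to adjust it.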

Document perspective correction

Also referred to as image warping, the perspective correction takes in the original image and a quadrangle (typically the one returned by the document detection) and returns a new image in which the area inside the quadrangle has been straightened into a rectangle.


UIImage *warpedImage = [GSK warpImage:imageOutOfCamera withQuadrangle:quadrangle];


GeniusScanLibrary.warpImage(imageToWarpPath, warpedImagePath, quadrangle);
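To illustrate what a perspective correction computes, here is a sketch of the standard projective mapping from the unit square to an arbitrary quadrangle. It is not the SDK's implementation, which this overview does not describe. The class PerspectiveSketch is hypothetical, and the corner order (top-left, top-right, bottom-right, bottom-left) is an assumption.

```java
/** Hypothetical illustration (not the SDK's code): projective map from unit square to a quadrangle. */
final class PerspectiveSketch {
    /**
     * Returns {a,b,c,d,e,f,g,h} such that a point (u,v) of the unit square maps to
     *   x = (a*u + b*v + c) / (g*u + h*v + 1)
     *   y = (d*u + e*v + f) / (g*u + h*v + 1)
     * Corners are assumed to be ordered top-left, top-right, bottom-right, bottom-left.
     */
    static double[] squareToQuad(double[] x, double[] y) {
        double dx1 = x[1] - x[2], dx2 = x[3] - x[2], sx = x[0] - x[1] + x[2] - x[3];
        double dy1 = y[1] - y[2], dy2 = y[3] - y[2], sy = y[0] - y[1] + y[2] - y[3];
        double det = dx1 * dy2 - dx2 * dy1;
        if (det == 0) {
            throw new IllegalArgumentException("degenerate quadrangle");
        }
        // g and h are the projective terms; both are zero when the quadrangle is a parallelogram.
        double g = (sx * dy2 - dx2 * sy) / det;
        double h = (dx1 * sy - sx * dy1) / det;
        return new double[] {
            x[1] - x[0] + g * x[1], x[3] - x[0] + h * x[3], x[0],
            y[1] - y[0] + g * y[1], y[3] - y[0] + h * y[3], y[0],
            g, h
        };
    }

    /** Applies the mapping to (u,v) and returns {x, y} in source image coordinates. */
    static double[] map(double[] m, double u, double v) {
        double w = m[6] * u + m[7] * v + 1.0;
        return new double[] {
            (m[0] * u + m[1] * v + m[2]) / w,
            (m[3] * u + m[4] * v + m[5]) / w
        };
    }
}
```

A warp built on this mapping would visit every pixel of the output rectangle, convert its position to (u,v) in the range 0 to 1, and sample the source image at the mapped coordinates.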

Document type detection

The document type detection estimates the best filter that you can apply to the given image. Typically, you apply this routine to the warped image.


GSKPostProcessingType type = [GSK bestPostProcessingForImage:warpedImage];


ImageType type = GeniusScanLibrary.detectImageType(warpedImagePath);

Document enhancement

Document enhancement applies a filter (i.e. a set of image processing routines) to the given image. The image output by this method is generally considered the final document.


UIImage *enhancedImage = [GSK enhanceImage:warpedImage withPostProcessing:type];


GeniusScanLibrary.enhanceImage(warpedImagePath, enhancedImagePath, type);
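The four routines above are meant to be chained. The following sketch shows the order of the calls and which file each step reads and writes. The interface ScanRoutines and the class ScanPipeline are hypothetical stand-ins introduced for illustration; in a real application each method would delegate to the corresponding GeniusScanLibrary call shown above.

```java
/** Hypothetical stand-in for the four SDK routines; not the SDK's actual interface. */
interface ScanRoutines<Q, T> {
    Q detect(String imagePath);
    void warp(String inputPath, String outputPath, Q quadrangle);
    T detectType(String imagePath);
    void enhance(String inputPath, String outputPath, T type);
}

/** Chains the four routines, feeding the output of each step into the next. */
final class ScanPipeline {
    /** Runs detect, warp, detectType and enhance in order and returns the final image path. */
    static <Q, T> String process(ScanRoutines<Q, T> routines, String originalPath,
                                 String warpedPath, String enhancedPath) {
        Q quadrangle = routines.detect(originalPath);
        routines.warp(originalPath, warpedPath, quadrangle);
        // Type detection runs on the warped image, as recommended above.
        T type = routines.detectType(warpedPath);
        routines.enhance(warpedPath, enhancedPath, type);
        return enhancedPath;
    }
}
```

Keeping the original, warped and enhanced images in separate files lets the user go back and adjust the quadrangle without taking a new picture.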

PDF generation

The PDF generation module provides a small set of objects for generating a PDF file.

PDF Page

An object wrapping the information to create a PDF page.


GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath inchesSize:[[GSKPDFSize alloc] initWithWidth:8.27 height:11.69] /* size in inches for an A4 sheet */];


PDFPage page = new PDFPage(imageFilePath, new PDFSize(8.27f, 11.69f));
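The A4 dimensions used above (8.27 by 11.69 inches) come from converting 210 by 297 mm. The sketch below shows that arithmetic, along with the conversion to PDF points (1/72 inch, the default PDF user-space unit). The class PageSizes is hypothetical, and whether the SDK converts to points in this way internally is not stated in this overview.

```java
/** Hypothetical helper (not part of the SDK): page size unit conversions. */
final class PageSizes {
    static final double MM_PER_INCH = 25.4;
    static final double POINTS_PER_INCH = 72.0; // default PDF user-space unit is 1/72 inch

    static double mmToInches(double mm) {
        return mm / MM_PER_INCH;
    }

    static long inchesToPoints(double inches) {
        return Math.round(inches * POINTS_PER_INCH);
    }

    /** Rounds to two decimals, matching the precision used in the examples above. */
    static double round2(double value) {
        return Math.round(value * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // A4 is 210 x 297 mm.
        System.out.println(round2(mmToInches(210)) + " x " + round2(mmToInches(297)) + " in");
        System.out.println(inchesToPoints(mmToInches(210)) + " x " + inchesToPoints(mmToInches(297)) + " pt");
    }
}
```

The same helper gives the values for other paper formats, for instance US Letter at 8.5 by 11 inches.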

PDF Document

A PDF document representing a collection of PDF pages.


GSKPDFPage *page1 = 
GSKPDFPage *page2 = 

GSKPDFDocument *document = [[GSKPDFDocument alloc] initWithTitle:title password:nil keywords:nil pages:@[ page1, page2 ]];


PDFPage page1 = 
PDFPage page2 = 

PDFDocument document = new PDFDocument(title, password, keywords, Arrays.asList(page1, page2));

PDF Generator

The generator takes in a PDF document and offers the ability to write the PDF file.


GSKPDFGenerator *generator = [GSKPDFGenerator createWithDocument:document];
[generator generatePDF:outputFilePath];


PDFGenerator generator = PDFGenerator.createWithDocument(document, null, null);

UI components

Capture Screen

The “capture” view displays a camera preview and takes care of setting up the entire camera stack for you. The view comes free of buttons or any other UI element so that you can design it as you want. Adding the controls (for example a shutter button) is up to your implementation.


On iOS, you subclass GSKCameraViewController, and you can customize it as desired. You can refer to the CameraViewController class in the GSSDKDemo for an example.


On Android, you will need to include the ScanFragment into an Activity that implements the ScanFragment.CameraCallbackProvider. You can refer to the ScanActivity class in the CustomDemo application for an example.

To enable or disable live document detection, call scanFragment.setRealTimeDetectionEnabled. To implement an automatic trigger, or to otherwise react to document detection events, register a listener with scanFragment.setBorderDetectorListener and implement its callbacks.
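One possible way to build an automatic trigger on top of detection events is to fire only once the detected quadrangle has stayed still for a few consecutive frames. The sketch below shows that logic in isolation. The class AutoTrigger, its thresholds and its onFrame method are hypothetical: the actual callbacks of the border detector listener are documented in the API documentation and would need to be adapted to feed this class.

```java
/** Hypothetical helper (not part of the SDK): fires once the detected quadrangle is stable. */
final class AutoTrigger {
    private final int requiredStableFrames;
    private final double maxCornerShift;
    private double[] previous; // x0,y0,...,x3,y3 of the last detected quadrangle, or null
    private int stableFrames;

    /**
     * @param requiredStableFrames number of consecutive frames that must match their predecessor
     * @param maxCornerShift       largest corner movement, in pixels, still considered stable
     */
    AutoTrigger(int requiredStableFrames, double maxCornerShift) {
        this.requiredStableFrames = requiredStableFrames;
        this.maxCornerShift = maxCornerShift;
    }

    /**
     * Call once per preview frame with the eight corner coordinates,
     * or null when no document was detected. Returns true when a capture should fire.
     */
    boolean onFrame(double[] corners) {
        if (corners == null || previous == null || !isClose(previous, corners)) {
            stableFrames = 0;
        } else {
            stableFrames++;
        }
        previous = corners;
        if (stableFrames >= requiredStableFrames) {
            // Reset so that the next capture needs a fresh run of stable frames.
            stableFrames = 0;
            previous = null;
            return true;
        }
        return false;
    }

    private boolean isClose(double[] a, double[] b) {
        for (int i = 0; i < 8; i += 2) {
            if (Math.hypot(a[i] - b[i], a[i + 1] - b[i + 1]) > maxCornerShift) {
                return false;
            }
        }
        return true;
    }
}
```

Requiring several stable frames avoids capturing while the user is still moving the device or while the detection is flickering between candidates.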

Edit Frame Screen

The edit frame screen lets the user adjust the auto-detected edges of a document.


On iOS, you subclass GSKEditFrameViewController and you can customize it as desired. You can refer to the EditFrameViewController class in the GSSDKDemo for an example.


On Android, you subclass the BorderDetectionImageView and include it into the layout of an Activity. You can refer to the BorderDetectionActivity class in the CustomDemo application for an example.

Text Recognition (OCR)

The OCR module extracts text and its layout from scanned images. The extraction outputs the text in two formats: raw text, and XML containing both the text and its layout (also called hOCR). This information can then be used to generate a PDF document whose text is searchable and selectable.

Extract text from images

Text Recognition relies on the Tesseract library and needs training data files to be present on the mobile device. Training data files are specific to each language in which text needs to be recognized and are available on this page. They need to be stored in a directory called tessdata on the device.
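Since recognition depends on these files being present, it can help to verify them before starting OCR. The sketch below relies on Tesseract's convention of naming training files after the language code with a .traineddata extension (for example eng.traineddata). The class TessdataCheck is hypothetical, and it assumes you pass the tessdata directory itself.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the SDK): checks that training data files are present. */
final class TessdataCheck {
    /** Returns the language codes whose "<code>.traineddata" file is absent from tessdataDir. */
    static List<String> missingLanguages(File tessdataDir, List<String> languageCodes) {
        List<String> missing = new ArrayList<>();
        for (String code : languageCodes) {
            File trainedData = new File(tessdataDir, code + ".traineddata");
            if (!trainedData.isFile()) {
                missing.add(code);
            }
        }
        return missing;
    }
}
```

If the returned list is not empty, the application can copy or download the missing files before creating the OCR configuration.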


GSKOCRConfiguration *ocrConfiguration = [GSKOCRConfiguration new];
ocrConfiguration.trainedDataPath = 
ocrConfiguration.languageCodes = @[@"eng"];

GSKOCRResult *result = [GSKOCR recognizeTextForImageAtPath:filePath ocrConfiguration:ocrConfiguration onProgress:&progress error:&error];

GSKTextLayout* textLayout = result.textLayout;


OcrConfiguration ocrConfiguration = new OcrConfiguration(Arrays.asList("eng"), tessdataDirectory, false);

OcrResult result = ocrProcessor.processImage(image, ocrConfiguration, progressListener);

String xmlTextLayout = result.textLayout;
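If you need the position of individual words, the XML text layout can be parsed directly. The sketch below assumes the layout follows common hOCR conventions, where each word is a span of class ocrx_word whose title attribute holds a "bbox left top right bottom" property. The class HocrWords is hypothetical, and the sample markup in the test is handwritten rather than actual SDK output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper (not part of the SDK): extracts words and bounding boxes from hOCR. */
final class HocrWords {
    /** One recognized word with its bounding box in image pixels. */
    static final class Word {
        final String text;
        final int left, top, right, bottom;

        Word(String text, int left, int top, int right, int bottom) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    /** Parses well-formed hOCR markup and returns its words in document order. */
    static List<Word> parse(String hocr) {
        List<Word> words = new ArrayList<>();
        try {
            NodeList spans = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(hocr)))
                    .getElementsByTagName("span");
            for (int i = 0; i < spans.getLength(); i++) {
                Element span = (Element) spans.item(i);
                if (!"ocrx_word".equals(span.getAttribute("class"))) {
                    continue; // skip lines, paragraphs and other containers
                }
                // The title looks like "bbox 36 92 96 116; x_wconf 95".
                for (String property : span.getAttribute("title").split(";")) {
                    String[] parts = property.trim().split("\\s+");
                    if (parts.length == 5 && parts[0].equals("bbox")) {
                        words.add(new Word(span.getTextContent().trim(),
                                Integer.parseInt(parts[1]), Integer.parseInt(parts[2]),
                                Integer.parseInt(parts[3]), Integer.parseInt(parts[4])));
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse hOCR", e);
        }
        return words;
    }
}
```

Real hOCR output may include a DOCTYPE or HTML entities that a strict XML parser handles differently, so a lenient HTML parser may be a better fit in production.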

Generate PDF document with text

When generating a PDF, you can provide a text layout for each page of the document. The text of the resulting PDF is then searchable and selectable.


GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath inchesSize:A4Size textLayout:textLayout];
// Generate PDF


PDFPage page = new PDFPage(image.getAbsolutePath(), A4_SIZE, xmlTextLayout);
// Generate PDF

© 2010 The Grizzly Labs, Inc. All rights reserved.