The Genius Scan SDK lets application developers add a scanning module built on the same technology as the Genius Scan app.
The Core and OCR modules of the SDK let you add a fully customizable scan flow to any native app.
The Core module provides:
This guide assumes that you have followed the Getting Started guide and added GSKCore.xcframework (and, optionally, GSKOCR.xcframework) to your app.
We’ve split the document processing operations into two classes: GSKDocumentDetector, which mainly handles real-time document detection on the camera preview; and GSKDocumentProcessor, which applies image processing algorithms such as perspective correction and filters to a single scan.
Edge detection takes an image and returns a quadrangle representing the four corners of the detected document.
GSKDocumentDetector *documentDetector = [GSKDocumentDetector new];
NSError *error = nil;
GSKQuadrangleDetectionResult *result = [documentDetector detectQuadrangleFromImage:self.image options:GSKDetectQuadrangleOptionsNone error:&error];
if (!result) {
    NSLog(@"Error while detecting document frame: %@", error);
    return;
}
GSKQuadrangle *quadrangle = result.quadrangle;
The GSKDocumentProcessor class takes an image and a list of operations to apply to it. It returns a result object containing the processed image and the enhancements that were applied.
The SDK can apply the following operations to an image:
UIImage *originalImage = …
// If previous code already applied the detection, we have a quadrangle:
GSKQuadrangle *quadrangle = …
GSKPerspectiveCorrectionConfiguration *perspectiveCorrectionConfiguration = [GSKPerspectiveCorrectionConfiguration perspectiveCorrectionConfigurationWithQuadrangle:quadrangle];
GSKCurvatureCorrectionConfiguration *curvatureCorrectionConfiguration = [GSKCurvatureCorrectionConfiguration curvatureCorrectionConfigurationWithCurvatureCorrection:YES];
GSKEnhancementConfiguration *enhancementConfiguration = [GSKEnhancementConfiguration enhancementConfigurationWithFilter:GSKFilterBlackAndWhite];
NSError *error = nil;
GSKProcessingResult *result = [[GSKDocumentProcessor new] processImage:originalImage
    perspectiveCorrectionConfiguration:perspectiveCorrectionConfiguration
    curvatureCorrectionConfiguration:curvatureCorrectionConfiguration
    enhancementConfiguration:enhancementConfiguration
    rotationConfiguration:nil
    outputConfiguration:[GSKOutputConfiguration defaultConfiguration]
    error:&error];
if (!result) {
    NSLog(@"Error while processing scan: %@", error);
    return;
}
UIImage *processedImage = [UIImage imageWithContentsOfFile:result.processedImagePath];
The PDF generation module provides a few objects to generate a PDF file.
GSKPDFPage wraps the information needed to create a PDF page.
GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath inchesSize:[[GSKPDFSize alloc] initWithWidth:8.27 height:11.69] /* size in inches for an A4 sheet */];
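The inch dimensions passed to GSKPDFSize can be derived from a metric paper size, since one inch is exactly 25.4 mm. The helper below is a minimal sketch of that arithmetic using only Foundation; GSInchesFromMillimeters is a hypothetical name chosen for illustration and is not part of the SDK.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the SDK: converts millimeters to inches.
// One inch is exactly 25.4 mm.
static double GSInchesFromMillimeters(double millimeters) {
    return millimeters / 25.4;
}

// Example use: ISO 216 defines A4 as 210 mm x 297 mm.
static void GSLogA4SizeInInches(void) {
    double width = GSInchesFromMillimeters(210.0);
    double height = GSInchesFromMillimeters(297.0);
    NSLog(@"A4 is %.2f x %.2f inches", width, height);
}
```

Rounded to two decimals, the results are 8.27 and 11.69, the A4 values used in the page example above. The same conversion applies to any other metric paper size.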
GSKPDFDocument represents a PDF document as a collection of PDF pages.
GSKPDFPage *page1 = …
GSKPDFPage *page2 = …
GSKPDFDocument *document = [[GSKPDFDocument alloc] initWithTitle:title password:nil keywords:nil pages:@[ page1, page2 ]];
GSKPDFGenerator takes a PDF document and writes the PDF file.
GSKPDFGenerator *generator = [GSKPDFGenerator createWithDocument:document];
[generator generatePDF:outputFilePath];
The “capture” view displays a camera preview and sets up the entire camera stack, including automatic capture, for you. The view comes without buttons or any other UI elements so that you can design the screen as you want; you need to add these controls in your implementation.
Subclass GSKCameraViewController and customize it as desired. Refer to the CameraViewController class in the GSSDKDemo for an example.
Note that GSKCameraViewController sets up a cameraView that takes up the entire screen. To add toolbars or other buttons, take control of the layout in your subclass’s viewDidLoad by disabling the translation of the cameraView’s autoresizing mask into constraints: cameraView.translatesAutoresizingMaskIntoConstraints = false.
The edit frame screen lets the user adjust the auto-detected edges of a document.
Subclass GSKEditFrameViewController and customize it as desired. Refer to the EditFrameViewController class in the GSSDKDemo for an example.
The OCR module provides a way to extract text and its layout from scanned images. The extraction outputs the text in two different formats: raw text and XML containing both the text and its layout (also called hOCR). It’s then possible to generate a PDF document using this information to make it searchable and selectable.
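To give a sense of what the layout information looks like: in the hOCR format, each recognized element carries a title attribute with a bbox property listing the left, top, right and bottom pixel coordinates of its bounding box. The sketch below parses such a title string with Foundation only. GSParseHOCRBoundingBox is a hypothetical helper, not part of the SDK, and the sample markup in the comment is illustrative rather than actual SDK output.

```objc
#import <Foundation/Foundation.h>

// Illustrative hOCR element (not actual SDK output):
//   <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>

// Hypothetical helper, not part of the SDK: extracts the four bbox values
// (left, top, right, bottom) from an hOCR title attribute.
// Simplification: assumes the first occurrence of "bbox" in the string is
// the bounding box property.
static BOOL GSParseHOCRBoundingBox(NSString *title, NSInteger box[4]) {
    NSScanner *scanner = [NSScanner scannerWithString:title];
    // Skip anything that precedes the "bbox" keyword, then consume the keyword.
    [scanner scanUpToString:@"bbox" intoString:NULL];
    if (![scanner scanString:@"bbox" intoString:NULL]) {
        return NO;
    }
    // Read the four whitespace-separated integers that follow.
    for (int i = 0; i < 4; i++) {
        if (![scanner scanInteger:&box[i]]) {
            return NO;
        }
    }
    return YES;
}
```

For the sample title above, the helper fills the array with 36, 92, 96 and 116 and returns YES; it returns NO when the string has no bbox property.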
Text recognition relies on the Tesseract library and needs training data files on the mobile device. Training data files are specific to each language in which text needs to be recognized and are available on this page. You have to place them in a directory named tessdata on the device.
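Tesseract names each training data file after its language code, for example eng.traineddata for English, so a quick check before running OCR can catch a missing file early. The sketch below uses only Foundation. GSMissingTrainedDataLanguages is a hypothetical helper, not part of the SDK, and it assumes that the path you pass points at the tessdata directory itself.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the SDK: returns the language codes whose
// "<code>.traineddata" file is missing from the given tessdata directory.
// Assumption: tessdataPath is the tessdata directory itself, not its parent.
static NSArray<NSString *> *GSMissingTrainedDataLanguages(NSString *tessdataPath,
                                                          NSArray<NSString *> *languageCodes) {
    NSFileManager *fileManager = [NSFileManager defaultManager];
    NSMutableArray<NSString *> *missing = [NSMutableArray array];
    for (NSString *code in languageCodes) {
        NSString *fileName = [code stringByAppendingPathExtension:@"traineddata"];
        NSString *filePath = [tessdataPath stringByAppendingPathComponent:fileName];
        BOOL isDirectory = NO;
        BOOL exists = [fileManager fileExistsAtPath:filePath isDirectory:&isDirectory];
        // A directory with the expected name does not count as a training file.
        if (!exists || isDirectory) {
            [missing addObject:code];
        }
    }
    return [missing copy];
}
```

An empty result means every requested language has its training file in place; otherwise the returned codes tell you which files still need to be copied.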
GSKOCRConfiguration *ocrConfiguration = [GSKOCRConfiguration new];
ocrConfiguration.trainedDataPath = …
ocrConfiguration.languageCodes = @[@"eng"];
GSKOCRResult *result = [GSKOCR recognizeTextForImageAtPath:filePath ocrConfiguration:ocrConfiguration onProgress:&progress error:&error];
GSKTextLayout *textLayout = result.textLayout;
When generating a PDF, you can provide a text layout for each page of the document.
GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath inchesSize:A4Size textLayout:textLayout];
// Generate PDF
By default, PDF generation uses a standard font that supports English and Western European characters. If you perform text recognition for another language, you need to specify a font that supports that language’s characters when generating the PDF document.
© 2024 The Grizzly Labs. All rights reserved.