Methodology for Performing Machine Learning on Database Data in a SQL Statement Disclosure Number: IPCOM000252457D
Publication Date: 2018-Jan-13
Document File: 3 page(s) / 135K

Publishing Venue

The Prior Art Database

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.


Methodology for Performing Machine Learning on Database Data in a SQL Statement

Currently, most machine learning services (e.g., "create model", "predict") that use data in a database system are provided as a Representational State Transfer (REST) service or as an application invocation outside the database system.

Machine learning for an enterprise system is one example: it requires a separate application server, and its services are exposed as REST application programming interfaces (APIs).

The machine learning database (MLDB) is an open-source "database" for machine learning; however, it is not really a database with tables and columns. In fact, its "dataset" is created from a CSV file or inserted from other sources via a REST call.

When the data used to create a model, or the data used as input for prediction, already resides in a database, why send it over the network at all? The transfer (which may include a user ID and password) is subject to potential security problems, and its performance depends on network conditions.

In addition, most of the time the next step after model creation or prediction is another database action. For example, data from a database is used as input for prediction, and based on the prediction result, the data is then inserted into another table. In this case, the application needs to send another (REST) call to the database. This complicates the application logic and degrades performance.

The novel solution is a methodology for performing machine learning on database data in a structured query language (SQL) statement. The approach is to implement two relational database user-defined functions (UDFs) that perform machine learning using the data inside the database system: one for model creation and one for prediction/scoring.
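As a sketch of the contract these two UDF entry points might expose — a JSON/value input, a model file location as the handoff between them — the class below is illustrative only: the method names, the file-based model store, and the trivial mean-based "model" are assumptions standing in for a real ML pipeline, not the disclosed implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class MlUdfSketch {

    // "create model" UDF entry point: consumes training values, writes a
    // model file, and returns its location (mirroring the disclosed output
    // contract). The "model" here is just the mean of the inputs.
    public static String createModel(List<Double> trainingValues) throws IOException {
        double sum = 0.0;
        for (double v : trainingValues) sum += v;
        double mean = sum / trainingValues.size();
        Path modelFile = Files.createTempFile("model-", ".txt");
        Files.write(modelFile, Double.toString(mean).getBytes());
        return modelFile.toString();
    }

    // "predict" UDF entry point: loads the model written above and scores a
    // single input row (here: absolute distance from the learned mean).
    public static double predict(String modelLocation, double feature) throws IOException {
        String stored = new String(Files.readAllBytes(Paths.get(modelLocation))).trim();
        double mean = Double.parseDouble(stored);
        return Math.abs(feature - mean);
    }
}
```

Once such functions are registered in the DBMS, a single statement along the lines of `INSERT INTO results SELECT id, predict(model_loc, col1) FROM t1` could score rows and store the output without the data ever leaving the database.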

The process is as follows:

1. Implement two Java* applications:
   A. Model creation program: prepare the input to the model, create the pipeline stages and the model, and write the model out.
      i. Inputs are in JavaScript* Object Notation (JSON) format, which provides flexibility in the number of parameters. The user can optionally specify a SQL statement to generate the input; in other words, the input can be the result of a table JOIN or other database operations.
      ii. Output is the file location of the created model. Development can further enhance this to write the model to the DBMS.
   B. Predict program: prepare the model input from user input, and then call the transform function.
2. Install the executables from #1 into the DBMS.
3. Create UDF functions referencing the executables from #2.
4. When the UDFs are executed, the system pulls in the required ML libraries as
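Step 1.A.i leaves the parameter set open-ended because the input arrives as a JSON string. A minimal sketch of pulling an optional SQL statement out of such a parameter block, using only the JDK: the key name `inputSql` and the regex-based parsing are illustrative assumptions — a real implementation would use a proper JSON library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamSketch {

    // Extracts the value of a top-level string field from a flat JSON object,
    // e.g. {"inputSql": "SELECT a, b FROM t1"}. Deliberately dependency-free;
    // it does not handle escaped quotes or nested objects.
    public static String stringField(String json, String key) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }
}
```

If an `inputSql` field is present, the model creation program would run that statement to materialize its training input (so a JOIN result can feed model creation); if it is absent, the program would fall back to whatever default input the other parameters specify.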


The disclosed "create model" and "predict" Java UDFs leverage Machine Learnin...