RESEARCH INTERESTS

  • Parallel Computing & Parallel Algorithms
  • High Performance Computing
  • Data Mining
  • Machine Learning
  • Expert Systems
  • Artificial Intelligence
  • Image Processing

PUBLICATIONS

  1. Lan Vu, Gita Alaghband, "A Self-Adaptive Method for Frequent Pattern Mining using a CPU-GPU Hybrid Model," in the Proceedings of the 2015 High Performance Computing Symposium, ACM, April 2015.
  2. Lan Vu, Gita Alaghband, "A Load Balancing Parallel Method for Frequent Pattern Mining on Multi-core Cluster," in the Proceedings of the 2015 High Performance Computing Symposium, ACM, April 2015.
  3. Lan Vu, "High Performance Methods for Frequent Pattern Mining," PhD Dissertation, University of Colorado at Denver, ProQuest, Dec. 2014.
  4. Lan Vu, Gita Alaghband, "Novel Parallel Method for Association Rule Mining on Multi-core Shared Memory Systems," in Journal of Parallel Computing, July 2014.
  5. Lan Vu, Gita Alaghband, "Efficient Algorithms for Mining Frequent Patterns from Sparse and Dense Databases," in Journal of Intelligent Systems, Aug 2014.
  6. Lan Vu, Hari Sivarman, Rishi Bidarkar, "GPU Virtualization for High Performance General Purpose Computing on the ESX Hypervisor," in the Proceedings of the 2014 High Performance Computing Symposium, Tampa, FL, April 2014.
  7. Lan Vu, Gita Alaghband, "An Efficient Approach for Mining Association Rules from Sparse and Dense Databases," in the Proceedings of the 2014 International Conference on Information and Knowledge Management, IEEE, Jan 2014.
  8. Lan Vu, Gita Alaghband, "Novel Parallel Method for Mining Frequent Patterns on Multi-core Shared Memory Systems," in the Proceedings of the 2nd International Workshop on Data-Intensive Scalable Computing Systems, ACM, pp. 49-54, Nov 2013.
  9. Lan Vu, Gita Alaghband, "Mining Frequent Patterns Based on Data Characteristics," in the Proceedings of the 2012 International Conference on Information and Knowledge Engineering, pp. 369-375, July 2012.
  10. Lan Vu, Gita Alaghband, "High Performance Frequent Pattern Mining on Multi-core Cluster," in the Proceedings of the 2012 Int. Conference on Collaboration Technologies and Systems, IEEE, pp. 630-633, May 2012.
  11. Lan Vu, Gita Alaghband, "A Fast Algorithm Combining FP-Tree and TID-list for Frequent Pattern Mining," in the Proceedings of the 2011 Int. Conf. on Information and Knowledge Engineering, pp. 472-477, July 2011.
  12. Lan Vu, Thanh Le, "Offline Handwriting Digit Recognition for Mark List Scanning System," Technical Report Book, UEH, 2006.
  13. Thanh Le, Lan Vu, Viet Nguyen, "Research and Implementation of Mark List Scanning System," Technical Report Book, UEH, 2006.
  14. Huong Vu, Thanh Le, Lan Vu, "Objective Testing System," Technical Report Book, MOET, 2004.

PATENT

  • One filed patent application.

RESEARCH POSTER

  • Lan Vu, Gita Alaghband, "Mining Frequent Patterns on Multi-core Shared Memory Systems," in the Proceedings of the 2013 GHC, Minneapolis, MN, 2013.
  • Lan Vu, Hari Sivarman, Rishi Bidarkar, "vmCUDA for High Performance GPGPU on ESX," VMware, Palo Alto, CA, 2013.
  • Lan Vu, Gita Alaghband, "High Performance Data Mining Techniques for Gene Function Prediction," in the Proceedings of the 2012 Graduate Cohort Workshop, Bellevue, WA, 2012.
  • Lan Vu, Gita Alaghband, "Novel Methods for Mining Frequent Patterns and their Applications," in the Proceedings of the 15th Research And Creative Activities Symposium, UCD, Denver, CO, 2012.
  • Lan Vu, Gita Alaghband, "High Performance Human Gene Prediction Using Gene Ontology," in the Proceedings of the 13th Research And Creative Activities Symposium, UCD, Denver, CO, 2010.

ACADEMIC & INDUSTRIAL RESEARCH PROJECTS

 

 

1. High Performance Techniques for Mining Frequent Patterns and Association Rules

This project designs and implements a high performance framework for frequent pattern mining and association rule mining, including new mining methods for a variety of parallel architectures. At the core of this research is a novel dynamic approach that runs efficiently on both sparse and dense databases and outperforms its sequential counterparts. The approach switches dynamically between mining strategies suited to either sparse or dense databases: at run time, the algorithm determines the density of the remaining database and switches to the appropriate strategy.
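
As an illustration only (not the project's actual code), the run-time switch can be sketched as follows. The density measure (fraction of filled cells in the transaction-by-item matrix), the 0.5 threshold, and the names `density` and `choose_strategy` are all assumptions made for this example.

```python
from typing import List, Set


def density(transactions: List[Set[str]]) -> float:
    """Fraction of filled cells in the transaction-by-item matrix (assumed measure)."""
    items = set().union(*transactions)
    if not items:
        return 0.0
    filled = sum(len(t) for t in transactions)
    return filled / (len(transactions) * len(items))


def choose_strategy(transactions: List[Set[str]], threshold: float = 0.5) -> str:
    """Pick a mining strategy label; the threshold value is hypothetical."""
    return "dense" if density(transactions) >= threshold else "sparse"


sparse_db = [{"a"}, {"b"}, {"c"}, {"d"}]
dense_db = [{"a", "b"}, {"a", "b"}, {"a"}]
print(choose_strategy(sparse_db), choose_strategy(dense_db))
```

In a real miner the check would be repeated on each projected (remaining) database rather than once on the full input.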

 

 

2. GPGPU Virtualization on the ESX Hypervisor

Virtualization technologies are increasingly applied to HPC to reduce administration costs and improve system utilization. However, virtualizing the GPU to support general purpose computing presents many challenges because of the complexity of this device. On VMware's ESX hypervisor, DirectPath I/O can give virtual machines (VMs) high performance access to physical GPUs. However, this technology does not allow multiplexing to share GPUs among VMs and is not compatible with vMotion, VMware's technology for transparently migrating VMs among hosts inside clusters. In this research, we address these issues by implementing a solution that uses "remote API execution" and takes advantage of DirectPath I/O to enable general purpose GPU computing on ESX. This solution, named vmCUDA, allows CUDA applications running concurrently in multiple VMs on ESX to share GPU(s). Our solution requires neither recompilation nor any editing of the source code of CUDA applications.

 

 

3. Cloud Storage Systems

Cloud computing has recently emerged as a hot topic and attracts considerable interest from a large community. Cloud storage is an essential part of cloud computing infrastructure and an important service offered by cloud providers. This project studies the technologies, benefits, and challenges of deploying cloud storage systems efficiently. In this research, I also proposed several suggestions for the development of cloud storage and of cloud computing more broadly. (Research Paper is available upon request)

 

 

4. Human Gene Function Prediction using Gene Ontology

In this research, I proposed a new method for human gene function prediction that discovers association rules from gene annotation data and the concept hierarchy of the Gene Ontology (GO). The method efficiently finds relations between GO terms, which are later used in the reasoning process of a rule-based expert system to predict unknown human gene functions. The research also presented the results of applying a parallel algorithm on a multi-core machine to speed up the mining tasks, which are usually slow and memory-intensive. The experimental results showed that the proposed prediction method can be used to find unknown gene functions. The implementation uses C++, MySQL and Drools. (Research Paper is available upon request)

 

 

 

5. Parallel & Distributed Association Rule Mining on Multi-core Cluster

Association rule mining (ARM) is an important data mining technique for discovering patterns and rules among items in a large database of variable-length transactions. The goal of ARM is to identify groups of items that most often occur together. It is widely used in market-basket transaction analysis, graph mining applications such as substructure discovery in chemical compounds, pattern finding in web browsing, word occurrence analysis in text documents, and so on. Although ARM is simple to state, it is a computationally and I/O intensive task. In this project, I researched the design and implementation of efficient parallel and distributed ARM algorithms. The experiments were conducted on a dual-core PC and a multi-core cluster at the Parallel Distributed Systems Lab. (Research Paper is available upon request)
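
To make the problem statement concrete, here is a minimal brute-force sketch of support counting and rule confidence. It is a sequential toy, not the parallel or distributed algorithms developed in this project; the function names and the sample transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Count every itemset in every transaction; keep those seen at least min_support times."""
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


def confidence(counts, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from support counts."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    return counts[both] / counts[tuple(sorted(antecedent))]


baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
freq = frequent_itemsets(baskets, min_support=2)
print(freq)
print(confidence(freq, ["butter"], ["bread"]))
```

Enumerating all subsets is exponential in transaction length, which is exactly why practical ARM needs the pruning and parallelization this project studies.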

 

 

 

6. Gene selection and classification of Microarray gene expression data

In microarray classification, the number of features is always much greater than the number of samples. Therefore, gene selection plays an important role in classification. In addition, standard statistical methodologies for classification usually do not work well on this type of data, so existing methodologies must be modified or new ones developed for the analysis of microarray data. In this research, I studied a gene selection technique based on the t-statistic and two classification algorithms for microarray gene expression data: support vector machines (SVM) and diagonal linear discriminant analysis (DLDA). The experimental results indicate that this approach improves classification performance compared with linear discriminant analysis (LDA). (Research Paper is available upon request)
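
A minimal sketch of t-statistic gene ranking is shown below. The text does not say which t-statistic variant was used, so Welch's two-sample form is an assumption here, as are the function names and the toy expression values.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Welch's two-sample t-statistic (assumed variant; needs non-zero variance)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expr, labels, top_k):
    """expr maps gene name -> expression values per sample; labels are 0/1 per sample."""
    scores = {}
    for gene, values in expr.items():
        g0 = [v for v, y in zip(values, labels) if y == 0]
        g1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(t_statistic(g0, g1))
    # Genes with the largest |t| separate the two classes best.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


expr = {
    "g1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # clearly differs between classes
    "g2": [2.0, 3.0, 2.5, 2.4, 2.9, 2.1],  # roughly the same in both classes
}
print(rank_genes(expr, [0, 0, 0, 1, 1, 1], top_k=1))
```

The selected genes would then be passed to a classifier such as SVM or DLDA.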

 

 

 

7. Performance evaluation of classification algorithms using ROC

Classification is one of the central tasks in data mining, and a growing number of classification algorithms have been proposed, which makes it difficult to choose a suitable one. In this research, I compared four popular algorithms for learning binary classification models. I combined an experimental comparison with an examination of ROC curves to assess the algorithms' relative performance. I also studied the performance characteristics of these algorithms on data sets of different types and sizes. Performance was measured by the area under the ROC curve, which is considered a better measure than accuracy of a classification method's ability. The research aimed to verify several hypotheses that serve as guidelines for data mining professionals selecting classification methods for their system design tasks. Finally, I proposed a process for choosing the most promising classification methods from those available. (Research Paper is available upon request)
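
For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as one half). The sketch below computes it that way; the function name and the sample scores are illustrative and not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # perfect ranking
print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # one positive ranked below a negative
```

The pairwise loop is quadratic; a rank-based formulation would be preferable for large data sets.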

 

 

 

8. Applying Parallel Algorithms in Handwriting Character Recognition Problem

This project introduced the handwriting recognition problem and the need for parallel methods to speed it up. It also presented the parallel propagation/back-propagation algorithms used in neural networks. The implementation uses Visual C++ with the OpenMP extension, and performance testing was done on PCs with Intel Core 2 Duo processors. Presentation file (pdf)

 

 

 

9. GPGPU Research and CUDA Programming

In this project, our group researched the use of graphics processing units for general-purpose computing. The project has two main parts: traditional GPGPU computing and modern GPGPU computing. Our research covers the approach, GPU architectures, applications, computing model, recent research and programming methods. In the first part (my partner's part), we explored the traditional method using shading languages on pipeline-architecture GPUs (e.g. NVIDIA 7 series GPUs and older). In the second part (my part), we researched the modern method using CUDA on unified-architecture GPUs (e.g. NVIDIA 8 series GPUs and later). The project includes a demonstration of Cg and CUDA programs and a performance comparison between a multi-core CPU and a GPU. The implementation used Visual C++, Cg and CUDA. Testing was done on NVIDIA GeForce 7600 GT and GeForce 9400 GT graphics cards. Presentation file (pdf)

 

 

 

10. Research on Performance of Secure Web Services

In this project, our group implemented several web services using the client/server model and then applied different security methods to them, including X.509 KBI, Kerberos and Security Token Service. Security was applied at the message level (application layer). The performance of each security method was measured and evaluated on a LAN. The implementation uses Microsoft WSE 3.0, X.509 certificates, WCF, IIS, C#, Microsoft Windows XP (client) and Windows Server 2008 (server), the MMC Certificate Store and several other Windows tools for environment configuration. (Research Paper is available upon request). Presentation file (pdf)

 

 

 

11. Fuzzy rule-based Car Evaluation System

This project researched the modeling and construction of a fuzzy rule-based expert system that evaluates how suitable a car is for purchase. The system reasons over a set of built-in fuzzy rules to calculate an overall rating of a car, giving users helpful information for their purchase decision. The rule set includes rules for the age, engine capacity, cost, efficiency (in miles per gallon), top speed and condition of a car. The implementation uses Visual C#.
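
The general idea of fuzzy rule evaluation can be sketched as below. Everything specific in it is hypothetical: the triangular membership shapes, the two rules, the output ratings (90 and 30), and the weighted-average defuzzification are illustrative choices, not the system's actual rule base (which was written in Visual C#).

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def rate_car(age_years, mpg):
    """Combine two made-up fuzzy rules into one rating, or None if no rule fires."""
    young = triangular(age_years, -1, 0, 8)
    old = triangular(age_years, 4, 15, 26)
    efficient = triangular(mpg, 20, 40, 60)
    # Rule 1: IF car is young AND efficient THEN rating is high (90).
    r1 = min(young, efficient)
    # Rule 2: IF car is old THEN rating is low (30).
    r2 = old
    if r1 + r2 == 0:
        return None
    # Weighted average of rule outputs by firing strength.
    return (r1 * 90 + r2 * 30) / (r1 + r2)


print(rate_car(0, 40), rate_car(15, 40), rate_car(6, 40))
```

A fuller system would add membership functions and rules for each remaining attribute (engine capacity, cost, top speed, condition) in the same pattern.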

 

 

12. Handwriting Character Recognition Software using Neural Networks

In this project, I implemented the neural network training module for a handwriting character recognition application. Several artificial neural network models (single-layer, hybrid and multi-layer networks) were tested to choose the one that works best on the NIST database, a standard database for the character recognition problem. The module was then integrated into the recognition system of another research project (see Research Project). The implementation uses Visual C++ and the NIST database for training and testing.

 

 

13. FX Composer

This project is a study of FX Composer, a tool developed by NVIDIA. FX Composer is a very good tool for creating shading libraries used in graphics applications such as 3D games or movies. This visualization tool allows users to create their shading libraries without knowledge of shading languages. It also provides a powerful programming environment for graphics applications. I used FX Composer to create a set of shading libraries and then combined them with 3D objects as part of a simulation program that I developed on the Microsoft XNA framework, a game development tool for Xbox 360. The implementation uses FX Composer from NVIDIA and Microsoft XNA. Presentation file (pdf)

 

 

 

14. Rabin Cryptosystem

This project is a study of the Rabin cryptosystem: its concept, mathematical model, security features, research trends, applications and comparison with the RSA cryptosystem. Rabin's scheme has an advantage in computation time, being several hundred times faster than RSA. Breaking it has also been proved to be as hard as factoring integers, a guarantee that is not known for RSA. However, because of limitations such as ambiguous decryption (each ciphertext is a quadratic residue with several square roots), the Rabin cryptosystem is less popular than other schemes. A demonstration of Rabin encryption/decryption was implemented using Visual C#. (Research Paper is available upon request). Presentation file (pdf)
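
A toy sketch of textbook Rabin encryption and decryption follows; it is written in Python for brevity and is not the Visual C# demonstration mentioned above. It assumes both primes satisfy p ≡ q ≡ 3 (mod 4), uses tiny primes chosen only for readability, and relies on `pow(x, -1, m)` for modular inverses, which needs Python 3.8 or later. Real use would require large primes and a padding or redundancy scheme to pick the right root.

```python
def encrypt(m, n):
    """Rabin encryption: square the message modulo n = p * q."""
    return pow(m, 2, n)


def decrypt(c, p, q):
    """Return all four square roots of c modulo p * q (requires p, q = 3 mod 4)."""
    n = p * q
    # Square roots modulo each prime; valid because p and q are 3 mod 4.
    mp = pow(c, (p + 1) // 4, p)
    mq = pow(c, (q + 1) // 4, q)
    # Chinese remainder theorem coefficients.
    yp = pow(p, -1, q)  # inverse of p modulo q
    yq = pow(q, -1, p)  # inverse of q modulo p
    r1 = (mp * q * yq + mq * p * yp) % n
    r2 = (mp * q * yq - mq * p * yp) % n
    return sorted({r1, n - r1, r2, n - r2})


p, q = 7, 11            # toy primes, both congruent to 3 mod 4
c = encrypt(20, p * q)  # 20 * 20 mod 77
print(c, decrypt(c, p, q))
```

The four candidates returned by `decrypt` illustrate the ambiguity: the receiver cannot tell which root was the original message without extra redundancy.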

 

 

15. RSS in Web 2.0 Economic System

This project is a study of RSS: its uses, methods for creating and consuming it, and its role in the Web 2.0 economic system. In general, RSS is a popular method for publishing frequently updated works - such as blog entries, news headlines, audio, and video - in a standardized format. Applying RSS well can bring large economic benefits to both publishers and readers. As a result, a number of IT companies specialize in collecting and redistributing RSS news. My project also included a demonstration of using RSS with existing tools, an implementation of a web-based RSS reader and an implementation of a desktop RSS reader. The implementation uses Visual C# and ASP.NET. Presentation file (pdf)

 

 

16. Objective Testing System (2000 - Aug 2004)

We conducted research to develop computer-based testing software suitable for running computer-based midterm and final exams at many universities in Vietnam. Our software system supports most test formats, including paper-based, computer-based and internet-based tests. Compared to existing testing software, our system is more cost-effective, more secure and faster, and it scales transparently to any type of computer lab. It can use all available computer labs and avoids the need to equip expensive ones. The system consists of six software modules: uEditor, uTest, uServer, uScanner, uMarkScanner and uStatistics. It is currently used at many universities and higher education institutes in Vietnam, and the free version was downloaded one million times from the website of the Vietnam Ministry of Education and Training between 2004 and 2006.

 

 

17. Offline Handwriting Digit Recognition for Mark List Scanner System (Jan 2004 - Jun 2006)

This project aimed to solve the difficulties of applying and integrating handwriting digit recognition technology into our Mark List Scanning System (MLSS), the software that automatically reads grade lists from specially designed sheets and stores them in the student information system of the University of Economics HCMC. MLSS can save hundreds of hours of manual data entry. Integrating handwriting digit recognition improves the accuracy of the processing results and enables automatic error detection. I built a recognition engine using an artificial neural network and trained it using our self-generated dataset and the NIST database. The research applied artificial intelligence and image processing techniques such as artificial neural networks for recognition, Huffman transform, denoising, skeletonization, etc. The recognition module can be integrated into other types of handwriting character recognition applications.

 

 

18. Research and Implementation of Mark List Scanning System (Apr 2003 - Jun 2006)

Our project researched an effective solution for building the Mark List Scanning System used at the University of Economics HCMC. This software system automatically reads thousands of grade lists and stores them in the student information system, a time- and cost-effective replacement for manual data entry. In this project, we had to solve the challenges of using regular scanners and grade sheets of varying quality produced by regular printers. Low-cost regular scanners and printers usually yield poor-quality input data, which decreases the accuracy of the information retrieved from the scans. Hence, we studied innovative image processing and measurement methods that produce highly accurate information from poor-quality input data. Applying our solution can save up to 90% of the cost of investing in expensive specialized scanning machines. The software module can be customized for any kind of sheet-reading application.

 

 

19. Econometrics Problem Solver

In this project, I implemented software that works as a learning support tool for students who are studying linear programming. Linear programming is a methodology for finding optimal solutions to problems with many variables and constraints. It helps solve many complicated problems in fields such as manufacturing and business. Users specify a set of input elements and their constraints, and the software provides step-by-step solutions with detailed explanations for different linear programming problems. The implementation used Visual C++.

 

 

20. Rule-based Fashion Consulting System

Fashion Consulting System is a rule-based expert system that recommends the most suitable products to customers of an online store. The recommendations are based on many elements such as the event, the customer's preferences, the fit with current fashion trends, style, etc. The system works by reasoning over the sales database, the product database, expert knowledge in the fashion domain and information provided by the customer. The implementation used Visual C#, ASP.NET and MS Access.

 

 

 

21. OLAP and Data Warehouse

This is a study of OLAP and data warehouses. In brief, OLAP is a very efficient tool used in decision support systems. It is built on top of data warehouse systems to create a multidimensional query engine that gives users multidimensional querying with very fast response times. Managers or researchers can use OLAP to support their decisions. In this project, I implemented demo OLAP tools using MS SQL Server and MS Access.

 

 

22. Minimum Vertex Cover

A study of the minimum vertex cover problem, one of the famous NP-complete problems. Because of its complexity, it requires approximation methods such as Significant Efforts and Branch and Bound. We implemented several solving methods and compared them to identify the efficient ones. The implementation was in Java. Presentation file (pdf)
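
As one example of an approximation method for this problem, the classic matching-based heuristic below returns a cover at most twice the optimal size. It is shown in Python for brevity and is not necessarily one of the methods compared in the project (which was implemented in Java); the function name and sample graphs are illustrative.

```python
def approx_vertex_cover(edges):
    """Greedy 2-approximation: take both endpoints of each still-uncovered edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


star = [(0, 1), (0, 2), (0, 3)]   # optimal cover is {0}
path = [(1, 2), (2, 3), (3, 4)]   # optimal cover is {2, 3}
print(approx_vertex_cover(star), approx_vertex_cover(path))
```

The selected edges form a maximal matching, and any cover must contain at least one endpoint of each matched edge, which gives the factor-of-two bound.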

 

23. Designing and Implementation of Testing System

For this project, I researched and implemented a network-based application system for computer-based testing. The software consists of three modules: test server, test client and test editor. The system design satisfies the requirements of a typical computer-based test system, such as security (no data loss on power outage), fairness and efficiency. I used the client/server model and the TCP/IP protocol for data communication so that the machines in the system can work together without sharing a file system over the network, which improves security. Part of this system design was then applied in another research project (see Research Project).

 

 

24. Puzzle Game

This is one of my first computer science projects, in which I wrote a puzzle game in Assembly. The complexity of this project came from handling interrupts and processing signals from the keyboard and mouse when players interact with the game. My game also had to process the game rules and the graphical user interface. The game is small, but implementing it gave me a deep understanding of how computers work.