Insight Program Application

www.linkedin.com/in/ahrimhan https://ahrimhan.github.io/cv

I hold an L2 visa and a work permit (Employment Authorization Document). I applied for a green card (EB2-NIW) in May 2018. I am available to start work at any time.

1. Ph.D. Research

My main research area is software engineering, with a focus on software maintenance and empirical studies. I have worked on evaluating and improving the maintainability of software designs using statistical, data mining, and automated software analysis techniques.

In my research on change-proneness prediction, I proposed a new behavioral dependency metric that captures aspects of a program's dynamic behavior. Used in conjunction with existing structural metrics, this metric yields a more accurate change-proneness prediction model. (https://ahrimhan.github.io/portfolio/research_project1/)

In my research on refactoring candidate identification for large-scale software, I proposed several new methods for finding cost-effective refactorings that efficiently improve maintainability, using dynamic information, multiple refactorings, and a two-step approach. (https://ahrimhan.github.io/portfolio/research_project2/) For scalability, I also proposed a fast refactoring candidate assessment metric that evaluates candidates quickly using matrix computation. (https://ahrimhan.github.io/portfolio/research_project3/)

More information about my research and published papers can be found at the following links: research and publications. I also have a personal webpage that introduces me: homepage.

2. Skills and Tools

Because I majored in computer science and software engineering, I have learned and used various languages, including Java, Python, SQL, Markdown, LaTeX, C++, and Fortran (courses). For data analysis, I used R and SPSS in the experiments for my research. My graduate colleagues and I implemented the following programs for research purposes.

  • Delta search (written in Python)
    Prototype of the two-phased refactoring identification approach: search space reduction based on the Delta Table
  • Mass refactoring (written in Python)
    Implementation for choosing multiple refactoring candidates with the Delta Table
  • Rank distance (written in Python)
    Implementation for comparing the rank distance between static-based and dynamic-based refactoring approaches (for experimental purposes)
  • Java Code Quality Analysis Tool (written in Java)
    Java source code analysis and metric measurement tool
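
To illustrate the idea behind the Rank distance program listed above, here is a minimal sketch of one common way to compare two rankings, the Spearman footrule distance. The actual distance measure used in the research is not stated here, so the choice of measure, the function name, and the candidate names are all assumptions made for illustration only.

```python
def footrule_distance(ranking_a, ranking_b):
    """Sum of absolute position differences for items found in both rankings.

    Assumption: this is the Spearman footrule, chosen only as an example;
    it is not necessarily the measure used in the original program.
    """
    position_a = {item: index for index, item in enumerate(ranking_a)}
    position_b = {item: index for index, item in enumerate(ranking_b)}
    shared_items = position_a.keys() & position_b.keys()
    return sum(abs(position_a[item] - position_b[item]) for item in shared_items)


# Hypothetical refactoring candidates, ordered from best to worst by each approach.
static_ranking = ["move_m1", "move_m2", "move_m3", "move_m4"]
dynamic_ranking = ["move_m3", "move_m1", "move_m2", "move_m4"]

distance = footrule_distance(static_ranking, dynamic_ranking)
print(distance)  # 4
```

A distance of 0 means the two approaches rank the candidates identically; larger values mean the static-based and dynamic-based rankings disagree more.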

I submitted refactorings identified by our Python program, Mass refactoring, to the active open source project JGit (version 4.7.1). The program identifies refactorings that move methods to inner classes to reduce program dependencies. Of the refactorings it identified, we selected two to submit. Here is the link to the refactoring suggestions for JGit: suggestions

I learn new languages and tools quickly. I plan to study machine learning and deep learning techniques, which will help me become an advanced data scientist.

3. Side Project / Ongoing Project (from Kaggle, not academic)

My main research area is software engineering, and my research aims to improve the maintainability of software designs. Software engineering provides systematic methods, automated tools, techniques, and processes that assist developers in software development. Data science and software engineering are similar in that both are data-driven activities. Thus, all of my research projects are closely related to data science. The project links below explain in detail the motivation, goal, method, and experiments of each.

  • Improvement of change-proneness prediction
    (https://ahrimhan.github.io/portfolio/research_project1/)
  • Efficient refactoring candidate identification
    (https://ahrimhan.github.io/portfolio/research_project2/)
  • Fast refactoring candidate assessment metric
    (https://ahrimhan.github.io/portfolio/research_project3/)

In addition to the research projects, I analyzed the factors affecting airline ticket prices and used data mining techniques to find patterns, with the goal of helping customers buy tickets. I collected the fares and the factors affecting them (e.g., season, week, time, and the number of stops) directly from Priceline and Kayak. I also collected external factors (e.g., oil prices) that can affect ticket prices. Based on a prediction of whether the ticket price will rise or fall tomorrow, I can help customers decide whether to buy a ticket now (Buy) or wait (Wait). Here is the project link.

  • Supporting airline ticket purchase: Buy now or wait?
    (https://ahrimhan.github.io/portfolio/airlineticketPrediction/)
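
As an illustration only, the Buy/Wait decision described above can be sketched as a labeling rule over a series of daily fares: a day is labeled Buy when the next day's fare is higher, and Wait otherwise. The function name, the sample fares, and the choice to treat an unchanged fare as Wait are assumptions for this sketch and are not taken from the project itself.

```python
def label_buy_or_wait(daily_fares):
    """Label each day (except the last) as "Buy" or "Wait".

    "Buy" means the fare rises the next day, so buying now is cheaper.
    Assumption: an unchanged or lower fare the next day is labeled "Wait".
    """
    labels = []
    for today, tomorrow in zip(daily_fares, daily_fares[1:]):
        labels.append("Buy" if tomorrow > today else "Wait")
    return labels


# Hypothetical fares in dollars for five consecutive days (not real collected data).
example_fares = [320.0, 335.0, 310.0, 310.0, 342.0]

labels = label_buy_or_wait(example_fares)
print(labels)  # ['Buy', 'Wait', 'Wait', 'Buy']
```

Labels of this kind could serve as the target that a prediction model learns from factors such as season, number of stops, and oil prices.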

4. Coursework

In a Data Mining course in Spring 2008, I analyzed Global Competitiveness Index (GCI) data from 43 countries over six years, from 2002 to 2007, to find insights about advanced countries. I focused on Asian countries and built a model that suggests a direction for developing countries to become advanced countries. Based on the analysis, I suggest that developing countries strengthen their infrastructure, business sophistication, and innovation capabilities. Here is the project link.

  • Model for Asia Countries using Global Competitiveness Index (GCI) Data
    (https://ahrimhan.github.io/portfolio/asiaModel/)

5. Why Data Scientist?

I want to become a data scientist at a high-tech software company. First, a career as a data scientist lets me continue to use my research background. Second, I would like to analyze large-scale data sets to solve real-world problems. Third, I am excited by the prospect of working at a leading tech company. I learn new technologies quickly and enjoy working in a dynamically changing environment.

I have a special interest in analyzing online consumer review data to find business insights. Rich data from consumer reviews (e.g., reviews on Yelp or Amazon) or reactions (e.g., likes on Facebook) is a valuable source of business insights and patterns. I am also interested in software defect prediction, fake news detection, and challenges based on Kaggle datasets.

I explain my motivation for and interest in becoming a data scientist, and summarize my projects related to data science, at this link: data science.

6. Data Engineering

Yes, I am also interested in applying for the Data Engineering Fellows Program. I live in Irvine and would like to work in Orange County, but I can move to the Silicon Valley or Seattle area next summer.

I understand the importance of data collection, storage, and processing because I have had difficulty obtaining high-quality data. The larger the amount of data, the more important it becomes to collect and process it intelligently. Just as a chef can cook more delicious food with better ingredients, a data scientist who can process data directly can produce better analyses, so I want to acquire data engineering skills in order to become an advanced data scientist.

Here are my technical skills that are relevant to data engineering.

  • Statistical analysis and modeling
  • SQL-based technologies
  • Python, C++, Java
  • R and SPSS

Because I have a background in computer science and software engineering, I can quickly learn new languages and technologies (e.g., MapReduce). I have used data modeling tools (e.g., Enterprise Architect) in my research.