ETC

3. Side Project / Ongoing Project (from Kaggle, not academic)

Flight ticket prediction (ran it together with my older brother, Fri) - Wisconsin breast cancer, Yelp reviews

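Changing fields, and changing countries too

I am currently facing three big challenges in my career: the first is moving from academia to industry, the second is changing my field from software engineer to data scientist, and the last is changing the country where I live and work by immigrating.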

Data Science Insight cover letter

I have a special interest in finding business insights by leveraging big data intelligence.

focus on software such as developers’ productivity

transition to a focus on people (users): business insights, social behavior

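Ever since I was young, I have liked analyzing things and finding meaning in them.

Even while studying as a computer science engineer, I was very interested in people, so I did my research in the field of software engineering. I am now facing the big challenge of transitioning from academia to industry, and I want to build my career and grow as a data scientist, a job that I am very interested in and believe I can do well.

The qualities that can make me a good data scientist are as follows.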

  1. Strong research capability, with experience in prediction research using statistical techniques
    (strong research background)

  2. Ability to find insights

  3. Quick to learn new technologies and to adapt to a new environment (learn fast and grow)

I am eager to have an opportunity to work at Western Digital because of its brilliant colleagues and supportive working environment.

  • I can work well as an insight provider or a modeling specialist. I have used statistics, data mining, and automated software analysis techniques.

With all these considerations, even though I do not have direct experience as a data scientist, I believe I have the qualifications and skills to become a good data scientist.

======

Data scientists work with data and are becoming mainstream in software companies. The demand for analyzing large-scale data is rapidly increasing in the software industry.

Are you free to move to other cities?

I currently want to work in Orange County, but I will be available to move to the Silicon Valley or Seattle area next summer.

Project in insight, yelp

  • Data: reviews from customers, Facebook ‘likes’, Yelp reviews, Amazon reviews, buying patterns
  • Transition from knowing software to knowing people (users)
  • Help to make decisions: in software, on making better (high-quality) software; in business, on selling more products and earning more money

Yelp Reviews (final)

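  1. What data can I get?
  2. What analysis will I do?
    • Which factors influence a business becoming popular? Which make it happen quickly?
    • Which reviews do users distrust? Which do they filter out?
    • Suspicious signs: nothing but praise, a very high star rating despite only a few reviews, mostly short reviews (likely cases where the business offered food in exchange for reviews)
    • Which reviews do users trust? Ones with high click counts? Well-organized menu pictures, many food photos, or sharp food photos? -> Reviews need something like Amazon's "Amazon's Choice" label, etc. (Is an elite member simply someone who writes a lot? A definition is needed for users who have earned a lot of trust.)

  • In the change-proneness prediction project, I proposed a more accurate predictive model using metrics that capture the behavioral aspects of the program in conjunction with existing structural metrics.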
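The suspicious-review signs noted in the Yelp list above (nothing but praise, few reviews with a very high rating, mostly short reviews) could be checked with a small rule-based pass before any modeling. The following is only a minimal sketch: the `Review` type, the function name, and every threshold are hypothetical choices for illustration, and "nothing but praise" is approximated here as "every review has 5 stars".

```python
from dataclasses import dataclass


@dataclass
class Review:
    stars: int  # star rating, 1 to 5
    text: str   # review body


def suspicion_signals(reviews, min_reviews=10, high_avg=4.5, short_words=20):
    """Return the names of the suspicious signs that fire for one business.

    All thresholds are illustrative assumptions, not tuned values.
    """
    if not reviews:
        return []
    n = len(reviews)
    avg_stars = sum(r.stars for r in reviews) / n
    # Share of reviews whose text has fewer than `short_words` words.
    short_share = sum(len(r.text.split()) < short_words for r in reviews) / n

    signals = []
    if all(r.stars == 5 for r in reviews):
        signals.append("all_praise")
    if n < min_reviews and avg_stars >= high_avg:
        signals.append("few_reviews_high_rating")
    if short_share > 0.5:
        signals.append("mostly_short_reviews")
    return signals
```

For example, `suspicion_signals([Review(5, "Great food!"), Review(5, "Loved it")])` fires all three signs, while a business with many long, mixed-rating reviews fires none. Which signs actually predict untrustworthy reviews would still have to be validated against the data.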

Research

My previous research was on predicting change-prone parts of software by building a predictive model. I proposed new behavioral dependency metrics that capture aspects of a program's dynamic behavior; used in conjunction with existing structural metrics, they help build a more accurate change-proneness prediction model.

  • In the refactoring identification project, to support developers in applying refactorings, I proposed a method that suggests a list of refactoring candidates expected to maximize the values of the maintainability metrics.

My recent research focused on defining a cost-effective software refactoring process by suggesting refactoring opportunities that maximize the improvement in software design quality (e.g., maintainability). I proposed several new methods:

  1. To extract candidates in classes where real changes have occurred, I built a refactoring candidate identification model that uses dynamic profiling to measure the most frequently used functions across several scenarios of user behavior.
  2. To find a sequence of cost-effective refactorings, I proposed a method for selecting multiple refactorings that have no dependencies on each other and can be applied simultaneously.
  3. To reduce the search space of candidates to be examined, I suggested a two-phase approach that first chooses the candidates more likely to improve maintainability, using a prediction model built from structural dependencies, change history, and textual information.
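The idea of selecting multiple refactorings that have no dependencies on each other can be illustrated with a simple greedy pass. This is a sketch under stated assumptions, not the method from the research itself: the candidate names, the gain values, and the representation of dependencies as conflicting pairs are all hypothetical, and a greedy choice does not guarantee the best possible set.

```python
def select_independent(gains, conflicts):
    """Greedily pick refactorings that do not depend on each other.

    gains: dict mapping a candidate name to its estimated maintainability gain.
    conflicts: iterable of (a, b) pairs that cannot be applied together.
    """
    blocked = {}
    for a, b in conflicts:
        blocked.setdefault(a, set()).add(b)
        blocked.setdefault(b, set()).add(a)

    chosen = []
    # Highest estimated gain first; ties are broken by name for determinism.
    for name in sorted(gains, key=lambda c: (-gains[c], c)):
        if blocked.get(name, set()).isdisjoint(chosen):
            chosen.append(name)
    return chosen


# Hypothetical candidates and estimated gains, for illustration only.
gains = {"move_method_A": 0.30, "extract_class_B": 0.25, "move_method_C": 0.10}
conflicts = [("move_method_A", "extract_class_B")]
selected = select_independent(gains, conflicts)
```

Here `selected` keeps `move_method_A` and `move_method_C` and drops `extract_class_B`, because it conflicts with the higher-gain candidate already chosen.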

scipy, numpy; search-based prediction model built from structural dependencies, change history, and textual information.

  • Search-based breast cancer prediction

Research projects related to Data Science


---
title: "Research projects related to Data Science"
#excerpt: "short project description"
collection: portfolio
---

Finding similar software processes using Case-Based Reasoning (2007)

Description

Method

Results

#testing prioritization

#bug prediction

#outlier detection

#developers allocation



Bug prediction

Kaggle data set

Fraud news detection

I am a research professor in the Next-generation Game Research Center at the Computer Science Department at Korea University. My main research area has been Software Engineering, and I have been working on assessing and improving software design quality using statistics, data mining, and automated software analysis techniques.

From now on, I would like to work in industry, solving real-world problems and performing more practical research. I would like the chance to analyze large-scale data sets and to build my expertise as a data scientist. Building on my research experience in predicting software design defects and finding cost-effective refactoring candidates, I want to expand my work to finding insights, such as predicting user behavior patterns or estimating software failure rates, by analyzing various kinds of data sets (e.g., behavioral traces of user interactions with online systems).