
aries_project.ipynb

Spaceship Titanic

Predict which passengers were transported to another dimension when the Spaceship Titanic collided.

Files and data fields

<https://www.kaggle.com/competitions/spaceship-titanic>

- train.csv - Personal records for about two-thirds (~8700) of the passengers, to be used as training data.  
  - `PassengerId` : A unique Id for each passenger. Each Id takes the form `gggg_pp` where `gggg` indicates a group the passenger is travelling with and `pp` is their number within the group. People in a group are often family members, but not always.
  - `HomePlanet` : The planet the passenger departed from, typically their planet of permanent residence.
  - `CryoSleep` : Indicates whether the passenger elected to be put into suspended animation for the duration of the voyage. Passengers in cryosleep are confined to their cabins.
  - `Cabin` : The cabin number where the passenger is staying. Takes the form deck/num/side, where side can be either P for Port or S for Starboard.
  - `Destination` : The planet the passenger will be debarking to.
  - `Age` : The age of the passenger.
  - `VIP` : Whether the passenger has paid for special VIP service during the voyage.
  - `RoomService, FoodCourt, ShoppingMall, Spa, VRDeck` : Amount the passenger has billed at each of the Spaceship Titanic's many luxury amenities.
  - `Name` : The first and last names of the passenger.
  - `Transported` : Whether the passenger was transported to another dimension. This is the target, the column you are trying to predict.

- test.csv - Personal records for the remaining one-third (~4300) of the passengers, to be used as test data. Your task is to predict the value of Transported for the passengers in this set.

- sample_submission.csv - A submission file in the correct format.  
  - `PassengerId` : Id for each passenger in the test set.  
  - `Transported` : The target. For each passenger, predict either True or False.
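The `PassengerId` (`gggg_pp`) and `Cabin` (`deck/num/side`) formats described above each pack several values into one string, so they can be split into separate features. Below is a minimal sketch in plain Python. It assumes a missing `Cabin` arrives as `None`, an empty string, or a float NaN (which is what pandas produces). The names `split_passenger_id`, `split_cabin`, and `group_sizes` are hypothetical and are not taken from the notebook.

```python
from collections import Counter
from typing import NamedTuple, Optional


class PassengerKey(NamedTuple):
    group: str   # gggg: the group the passenger is travelling with
    number: int  # pp: the passenger's number within that group


class CabinParts(NamedTuple):
    deck: str
    num: int
    side: str    # "P" = Port, "S" = Starboard


def split_passenger_id(passenger_id: str) -> PassengerKey:
    """Split a PassengerId of the form gggg_pp into group and in-group number."""
    group, number = passenger_id.split("_")
    return PassengerKey(group, int(number))


def split_cabin(cabin: object) -> Optional[CabinParts]:
    """Split a Cabin of the form deck/num/side; return None when it is missing."""
    if not isinstance(cabin, str) or not cabin:  # None, "" or float NaN
        return None
    deck, num, side = cabin.split("/")
    if side not in ("P", "S"):
        raise ValueError(f"unexpected cabin side: {side!r}")
    return CabinParts(deck, int(num), side)


def group_sizes(passenger_ids: list[str]) -> dict[str, int]:
    """Count how many passengers share each gggg group prefix."""
    return dict(Counter(split_passenger_id(pid).group for pid in passenger_ids))


print(split_passenger_id("0003_02"))  # PassengerKey(group='0003', number=2)
print(split_cabin("B/0/P"))           # CabinParts(deck='B', num=0, side='P')
```

Group size is a plausible feature because people in a group are often family members. In a pandas notebook, the vectorized equivalent of `split_cabin` is `df["Cabin"].str.split("/", expand=True)`.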

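Feature Engineering procedure

  1. Check for missing values
  2. Preprocess the data
  3. Correlation analysis → decide the missing-value strategy
  4. Handle missing values
  5. Build the final modeling dataset
  6. Apply Random Forest and Logistic Regression models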
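Steps 1 and 4 can be sketched without third-party libraries. The imputation rules here are an illustrative assumption and not the strategy the notebook derived in step 3. Because passengers in cryosleep are confined to their cabins, a missing amenity bill for a `CryoSleep` passenger is filled with 0. Other missing bills and `Age` get the column median, and a missing `HomePlanet` or `Destination` gets the most frequent value. Each row is assumed to be a dict in which `None` marks a missing value. The names `count_missing` and `impute` are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Any

Row = dict[str, Any]
SPEND_COLS = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]


def count_missing(rows: list[Row], columns: list[str]) -> dict[str, int]:
    """Step 1: number of missing (None) values per column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


def impute(rows: list[Row]) -> list[Row]:
    """Step 4 with an illustrative (assumed) strategy; returns new rows."""
    ages = [r["Age"] for r in rows if r.get("Age") is not None]
    age_median = median(ages) if ages else None

    spend_medians = {}
    for c in SPEND_COLS:
        vals = [r[c] for r in rows if r.get(c) is not None]
        spend_medians[c] = median(vals) if vals else 0.0

    modes = {}
    for c in ("HomePlanet", "Destination"):
        vals = [r[c] for r in rows if r.get(c) is not None]
        modes[c] = Counter(vals).most_common(1)[0][0] if vals else None

    filled = []
    for row in rows:
        r = dict(row)  # copy so the caller's rows are left untouched
        for c in SPEND_COLS:
            if r.get(c) is None:
                # Assumption: cryosleepers cannot use amenities, so they spent 0.
                # Passengers with CryoSleep False or missing get the median.
                r[c] = 0.0 if r.get("CryoSleep") is True else spend_medians[c]
        if r.get("Age") is None:
            r["Age"] = age_median
        for c, mode in modes.items():
            if r.get(c) is None:
                r[c] = mode
        filled.append(r)
    return filled
```

With pandas, step 1 is usually `df.isnull().sum()` and the fills in step 4 map onto `DataFrame.fillna`. The correlation analysis in step 3 is what should decide which of these rules is actually justified.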

Kaggle submission results
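A submission must follow the `sample_submission.csv` layout: a `PassengerId` column and a `Transported` column that holds True or False. Below is a minimal sketch using the standard `csv` module. `write_submission` is a hypothetical helper. The predictions are assumed to come from whichever model (Random Forest or Logistic Regression) is being submitted, and the two IDs in the usage line are illustrative only.

```python
import csv
import io
from typing import Iterable, TextIO


def write_submission(passenger_ids: list[str], predictions: Iterable[bool], out: TextIO) -> None:
    """Write rows in the sample_submission.csv layout: PassengerId,Transported."""
    preds = [bool(p) for p in predictions]  # accepts bools, 0/1, or numpy booleans
    if len(preds) != len(passenger_ids):
        raise ValueError("need exactly one prediction per test passenger")
    if len(set(passenger_ids)) != len(passenger_ids):
        raise ValueError("duplicate PassengerId in submission")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Transported"])
    for pid, pred in zip(passenger_ids, preds):
        writer.writerow([pid, pred])  # csv writes a bool as "True" / "False"


# Writes to an in-memory buffer here; in the notebook, pass
# open("submission.csv", "w", newline="") as `out` instead.
buf = io.StringIO()
write_submission(["0013_01", "0018_01"], [True, False], buf)
print(buf.getvalue())
```

The length and uniqueness checks catch the most common cause of a rejected upload: a prediction array that no longer lines up with the test set's `PassengerId` column.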