Pandas - Transforming a DataFrame (feat. apply())

IT_달토끼 2022. 11. 6. 14:22

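When preprocessing data, you sometimes need to transform the records in a DataFrame or Series.

apply() is the method for this: it calls a given function on every record in a column in one go.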
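As a rough illustration of what "DataFrame or Series" means here, the sketch below calls apply() on a single column (a Series), on a whole DataFrame column by column, and on a DataFrame row by row. The frame `df` and its columns `a` and `b` are made-up names for this example only and are not part of the iris walkthrough.

```python
import pandas as pd

# Hypothetical toy data, used only to show the call shapes.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# Series.apply: the function receives one value at a time.
doubled = df["a"].apply(lambda x: x * 2)

# DataFrame.apply (default axis=0): the function receives each column as a Series.
col_max = df.apply(lambda col: col.max())

# DataFrame.apply with axis=1: the function receives each row as a Series.
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(doubled.tolist())
print(col_max.to_dict())
print(row_sum.tolist())
```

The rest of this post only uses the first form, apply() on a single column.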

Let's practice with scikit-learn's built-in iris dataset.

First, load the iris data as shown below.

from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
iris_data = iris.data
iris_label = iris.target

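Next, convert iris_data to a DataFrame as follows.

# load_iris() orders the features as sepal length, sepal width, petal length, petal width
iris_data = pd.DataFrame(iris_data, columns=['sepal length', 'sepal width', 'petal length', 'petal width'])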
iris_data

Printing it shows a DataFrame with 150 rows and 4 columns.

Now let's use apply() to convert the petal length data from float64 to int64.

The petal length data loaded earlier is float64, as the check below confirms.

iris_data['petal length'].dtype

Call apply() as shown below.

You can pass a user-defined function as the argument to apply(), but a lambda function is a convenient alternative.

iris_data['petal length'] = iris_data['petal length'].apply(lambda x: int(x))
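To make the "user-defined function or lambda" choice concrete, here is a minimal sketch that does the same conversion both ways. The frame `sample` and the helper `to_int` are hypothetical names introduced for illustration, and the values are a small hand-made stand-in for the petal length column rather than the full dataset.

```python
import pandas as pd

# Hand-made stand-in for the petal length column (illustrative values).
sample = pd.DataFrame({"petal length": [1.4, 4.7, 6.0]})

def to_int(value):
    """Convert a single float to int by dropping the fractional part."""
    return int(value)

# Passing a named function: note there are no parentheses after to_int.
with_function = sample["petal length"].apply(to_int)

# Passing an equivalent lambda.
with_lambda = sample["petal length"].apply(lambda x: int(x))

print(with_function.tolist())
print(with_function.equals(with_lambda))
```

Both calls produce the same Series, so the choice is mostly about readability: a named function is easier to reuse and test, while a lambda keeps a one-off conversion on a single line.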

Now check the data type of the petal length column again, and it should be int64.

iris_data['petal length'].dtype

To take a closer look, print iris_data again.

iris_data

You can see that the petal length data has changed.

The built-in int() function discards the fractional part when converting a float to an int (it truncates rather than rounds), so the 1.4 in row 0 became 1.
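Because int() truncates instead of rounding, values such as 1.9 also become 1. The sketch below, using a made-up Series named `values`, compares the apply() approach from this post with two alternatives that are not part of the original walkthrough: astype(), which performs the same truncation without calling a Python function per element, and round() followed by astype() for cases where rounding to the nearest integer is what you actually want.

```python
import pandas as pd

# Hypothetical values chosen to show truncation on both sides of zero.
values = pd.Series([1.4, 1.9, -1.9])

# int() drops the fractional part, i.e. truncates toward zero.
truncated = values.apply(lambda x: int(x))

# astype() truncates the same way but works on the whole column at once.
via_astype = values.astype("int64")

# Round to the nearest integer first if truncation is not the intent.
rounded = values.round().astype("int64")

print(truncated.tolist())
print(via_astype.tolist())
print(rounded.tolist())
```

For a plain type conversion like this one, astype() is usually the simpler choice; apply() becomes more useful when the per-value logic cannot be expressed as a built-in pandas operation.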
