Chapter 7

2023. 4. 17. 18:17 · Python, Jupyter 🐍/[python] Python Data Analysis


	col1	col2	col3	col4	col5
row1	1	2	3	<NA>	5
row2	6	<NA>	8	<NA>	10
row3	11	12	13	14	15
row4	<NA>	<NA>	<NA>	<NA>	<NA>

	col1	col2	col3	col4	col5
row1	1	2	3	<NA>	5
row2	6	<NA>	8	<NA>	10
row3	11	12	13	14	15

	col1	col2	col3	col4	col5
row1	1	2	3	<NA>	5
row2	6	<NA>	8	<NA>	10
row3	11	12	13	14	15
row4	<NA>	<NA>	<NA>	<NA>	<NA>

	col1	col2	col3	col4	col5
row1	1	2	3	<NA>	5
row2	6	<NA>	8	<NA>	10
row3	11	12	13	14	15

	col1	col2	col3	col4	col5
row1	1	2	3	<NA>	5
row3	11	12	13	14	15
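The dropna outputs above can be reproduced with a minimal sketch like the following. It assumes the 4×5 frame shown (rows row1 to row4, nullable `Int64` columns, which is why missing entries print as `<NA>`); the notebook's own code is not in the post, so which call produced each table is an inference.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Rebuild the frame shown above; converting to nullable "Int64" prints missing values as <NA>
df = pd.DataFrame(
    [[1, 2, 3, nan, 5],
     [6, nan, 8, nan, 10],
     [11, 12, 13, 14, 15],
     [nan, nan, nan, nan, nan]],
    index=["row1", "row2", "row3", "row4"],
    columns=["col1", "col2", "col3", "col4", "col5"],
).astype("Int64")

any_na = df.dropna()                 # default how="any": drop rows with any missing value
all_na = df.dropna(how="all")        # drop only rows where every value is missing
thresh4 = df.dropna(thresh=4)        # keep rows with at least 4 non-missing values
by_col2 = df.dropna(subset=["col2"]) # only look for missing values in col2
cols = df.dropna(how="all").dropna(axis=1)  # axis=1 drops columns instead of rows

# inplace=True modifies df itself instead of returning a new frame
df.dropna(how="all", inplace=True)
```

Reassigning the result (`df = df.dropna(how="all")`) is usually easier to follow than `inplace=True`.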

	col1	col2	col3	col4	col5
row1	6.0	2.0	NaN	4.0	15.0
row2	6.0	7.0	NaN	9.0	15.0
row3	11.0	17.0	NaN	14.0	15.0
row4	NaN	17.0	NaN	NaN	20.0
row5	NaN	22.0	NaN	NaN	25.0

	col1	col2	col3	col4
row1	False	False	True	False
row2	True	False	False	False
row3	False	False	False	True

	0	1	2
0	1.059722	-1.037365	-1.077267
1	-0.964502	-1.123697	-0.019287
2	-0.509794	-1.116728	0.725433
3	-0.202545	-1.504613	-0.720672
4	-1.000958	-0.020754	0.561334
5	-1.469984	0.189896	1.294239
6	-0.798884	0.573638	-0.516727

	0	1	2
0	1.059722	0.000000	0.000000
1	-0.964502	0.000000	0.000000
2	-0.509794	0.000000	0.725433
3	-0.202545	0.000000	-0.720672
4	-1.000958	-0.020754	0.561334
5	-1.469984	0.189896	1.294239
6	-0.798884	0.573638	-0.516727

	0	1	2
0	1.059722	0.500000	0.000000
1	-0.964502	0.500000	0.000000
2	-0.509794	0.500000	0.725433
3	-0.202545	0.500000	-0.720672
4	-1.000958	-0.020754	0.561334
5	-1.469984	0.189896	1.294239
6	-0.798884	0.573638	-0.516727
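The two tables above look like `fillna(0)` followed by a per-column `fillna` with a dict (0.5 for column 1, 0 for column 2). A sketch under that assumption; the random values and seed are illustrative, and the missing positions (`df.iloc[:4, 1]`, `df.iloc[:2, 2]`) are inferred from where the zeros and 0.5s appear.

```python
import numpy as np
import pandas as pd

# A 7x3 random frame like the one above (the seed is an assumption, so the numbers differ)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((7, 3)))
df.iloc[:4, 1] = np.nan   # first four values of column 1 missing
df.iloc[:2, 2] = np.nan   # first two values of column 2 missing

filled_zero = df.fillna(0)                # every missing value becomes 0
filled_dict = df.fillna({1: 0.5, 2: 0})   # a dict fills each column with its own value
```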

	0	1	2
0	1.123367	0.651640	-0.080896
1	1.135335	2.878529	0.209940
2	-0.359406	-2.514877	-1.076355
3	0.609690	0.764702	1.757571
4	-0.597993	-0.178420	-1.306245
5	-1.356092	-0.181653	1.192103

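Argument	Description
value	Scalar value or dict-like object used to fill missing values
method	Fill method ('ffill' or 'bfill'); default None. Deprecated since pandas 2.1 in favor of ffill()/bfill()
axis	Axis along which to fill; default axis=0
inplace	Modify the calling object instead of returning a copy; default False
limit	For forward or backward filling, the maximum number of consecutive missing values to fill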
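To illustrate the `method` and `limit` arguments, a small sketch with a hypothetical 6×3 frame whose trailing values are missing. It uses `DataFrame.ffill`, which recent pandas versions recommend over `fillna(method="ffill")`; the gap positions and seed are assumptions chosen for the demo.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x3 frame with trailing gaps (positions chosen for illustration)
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((6, 3)))
df.iloc[2:, 1] = np.nan
df.iloc[4:, 2] = np.nan

forward = df.ffill()            # propagate the last valid value downward
limited = df.ffill(limit=2)     # fill at most 2 consecutive missing values per gap
means = df.fillna(df.mean())    # value can also be a Series: fill each column with its mean
```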

	food	ounces
0	bacon	4.0
1	pulled pork	3.0
2	bacon	12.0
3	Pastrami	6.0
4	corned beef	7.5
5	Bacon	8.0
6	pastrami	3.0
7	honey ham	5.0
8	nova lox	6.0
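The food table above is the input for the 7.2.2 mapping example. A sketch that adds an `animal` column with `str.lower` and `map`; the `meat_to_animal` dictionary is a hypothetical lookup table, not data from the post.

```python
import pandas as pd

# The food/ounces data shown above
data = pd.DataFrame({
    "food": ["bacon", "pulled pork", "bacon", "Pastrami", "corned beef",
             "Bacon", "pastrami", "honey ham", "nova lox"],
    "ounces": [4, 3, 12, 6, 7.5, 8, 3, 5, 6],
})

# Hypothetical lookup table: which animal each meat comes from
meat_to_animal = {
    "bacon": "pig", "pulled pork": "pig", "pastrami": "cow",
    "corned beef": "cow", "honey ham": "pig", "nova lox": "salmon",
}

# Normalize case first, because the dict keys are lowercase ("Bacon" would not match)
lowercased = data["food"].str.lower()
data["animal"] = lowercased.map(meat_to_animal)

# Equivalent one-step version that passes a function to map
data["animal2"] = data["food"].map(lambda x: meat_to_animal[x.lower()])
```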

	0	1	2	3
0	1.773577	-0.552240	-0.068250	0.337443
1	1.104798	1.613470	-0.343073	0.241429
2	-0.216391	-0.509969	-0.880070	-1.280800
3	3.379592	-1.192191	1.467162	1.830935
4	3.387019	-0.708173	-0.395083	-1.232405
...	...	...	...	...
995	0.716992	0.852429	1.127615	0.184839
996	0.483992	0.995459	-1.882597	0.818562
997	0.568487	0.350285	-0.019806	0.003262
998	-0.140211	0.297317	-0.049651	0.368414
999	0.797491	0.317061	0.277754	-1.230214

	0	1	2	3
count	1000.000000	1000.000000	1000.000000	1000.000000
mean	0.021956	-0.021349	-0.030139	-0.053767
std	1.015655	0.932248	0.970404	0.995527
min	-4.098433	-2.656883	-3.040000	-3.346149
25%	-0.638619	-0.679098	-0.694012	-0.737299
50%	-0.012248	-0.032211	-0.047610	-0.059516
75%	0.720834	0.647759	0.599406	0.603536
max	3.387019	2.807338	3.517849	2.887271

	0	1	2	3
0	1.0	-1.0	-1.0	1.0
1	1.0	1.0	-1.0	1.0
2	-1.0	-1.0	-1.0	-1.0
3	1.0	-1.0	1.0	1.0
4	1.0	-1.0	-1.0	-1.0
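The summary above (max 3.387, min -4.098) and the ±1 table from `np.sign` fit the 7.2.6 outlier workflow. A minimal sketch with a freshly generated 1000×4 normal frame (seed assumed, so the numbers differ from the post) that finds values beyond ±3 and caps them:

```python
import numpy as np
import pandas as pd

# 1000x4 standard-normal frame like the one summarized above (seed is an assumption)
rng = np.random.default_rng(12345)
data = pd.DataFrame(rng.standard_normal((1000, 4)))

col = data[2]
col_outliers = col[np.abs(col) > 3]                        # values in one column beyond ±3
rows_with_outlier = data[(np.abs(data) > 3).any(axis=1)]   # rows with any |value| > 3

# Cap values to [-3, 3]: np.sign gives -1.0, 0.0 or 1.0 for each entry
data[np.abs(data) > 3] = np.sign(data) * 3
```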

	movie_id	title	genres
0	1	Toy Story (1995)	Animation\|Children's\|Comedy
1	2	Jumanji (1995)	Adventure\|Children's\|Fantasy
2	3	Grumpier Old Men (1995)	Comedy\|Romance
3	4	Waiting to Exhale (1995)	Comedy\|Drama
4	5	Father of the Bride Part II (1995)	Comedy
5	6	Heat (1995)	Action\|Crime\|Thriller
6	7	Sabrina (1995)	Comedy\|Romance
7	8	Tom and Huck (1995)	Adventure\|Children's
8	9	Sudden Death (1995)	Action
9	10	GoldenEye (1995)	Action\|Adventure\|Thriller

	Animation	Children's	Comedy	Adventure	Fantasy	Romance	Drama	Action	Crime	Thriller	Horror	Sci-Fi	Documentary	War	Musical	Mystery	Film-Noir	Western
0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
2	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
3	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
4	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
3878	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
3879	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
3880	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
3881	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
3882	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
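The all-zero table above is the starting matrix for building genre indicators from the `|`-separated `genres` column. A sketch on the first three movies shown, filling that matrix with `get_indexer`; `str.get_dummies` is included as a shorter pandas alternative. The `Genre_` prefix is an illustrative choice.

```python
import numpy as np
import pandas as pd

# First three rows of the movies table above
movies = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "genres": ["Animation|Children's|Comedy",
               "Adventure|Children's|Fantasy",
               "Comedy|Romance"],
})

# Distinct genre names in order of first appearance
all_genres = []
for g in movies["genres"]:
    all_genres.extend(g.split("|"))
genres = list(dict.fromkeys(all_genres))

# Start from an all-zero matrix, then set a 1 for each genre a movie has
dummies = pd.DataFrame(np.zeros((len(movies), len(genres))), columns=genres)
for i, gen in enumerate(movies["genres"]):
    indices = dummies.columns.get_indexer(gen.split("|"))
    dummies.iloc[i, indices] = 1

movies_windic = movies.join(dummies.add_prefix("Genre_"))

# pandas can also do this in one call (columns come out sorted)
alt = movies["genres"].str.get_dummies(sep="|")
```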

	(0.0, 0.2]	(0.2, 0.4]	(0.4, 0.6]	(0.6, 0.8]	(0.8, 1.0]
0	0	0	0	0	1
1	0	1	0	0	0
2	1	0	0	0	0
3	0	1	0	0	0
4	0	0	1	0	0
5	0	0	1	0	0
6	0	0	0	0	1
7	0	0	0	1	0
8	0	0	0	1	0
9	0	0	0	1	0
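A one-hot table over intervals like the one above comes from combining `pd.cut` with `pd.get_dummies`. A sketch assuming 10 uniform random values and the five bins shown in the column headers (the seed is an assumption); the second part shows the `prefix` argument from 7.2.8 on a hypothetical `key` column.

```python
import numpy as np
import pandas as pd

np.random.seed(12345)          # seed is an assumption
values = np.random.rand(10)    # 10 uniform values in [0, 1)
bins = [0, 0.2, 0.4, 0.6, 0.8, 1]

# One indicator column per interval; each row is marked in the bin its value falls into
binned = pd.get_dummies(pd.cut(values, bins))

# prefix labels the dummy columns of a categorical column
df = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "b"]})
key_dummies = pd.get_dummies(df["key"], prefix="key")
```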

	year	quarter	realgdp	realcons	realinv	realgovt	realdpi	cpi	m1	tbilrate	unemp	pop	infl	realint
0	1959.0	1.0	2710.349	1707.4	286.898	470.045	1886.9	28.98	139.7	2.82	5.8	177.146	0.00	0.00
1	1959.0	2.0	2778.801	1733.7	310.859	481.301	1919.7	29.15	141.7	3.08	5.1	177.830	2.34	0.74
2	1959.0	3.0	2775.488	1751.8	289.226	491.260	1916.4	29.35	140.5	3.82	5.3	178.657	2.74	1.09
3	1959.0	4.0	2785.204	1753.7	299.356	484.052	1931.3	29.37	140.0	4.33	5.6	179.386	0.27	4.06
4	1960.0	1.0	2847.699	1770.5	331.722	462.199	1955.5	29.54	139.6	3.50	5.2	180.007	2.31	1.19

	date	item	value
0	1959-03-31 23:59:59.999999999	realgdp	2710.349
1	1959-03-31 23:59:59.999999999	infl	0.000
2	1959-03-31 23:59:59.999999999	unemp	5.800
3	1959-06-30 23:59:59.999999999	realgdp	2778.801
4	1959-06-30 23:59:59.999999999	infl	2.340
5	1959-06-30 23:59:59.999999999	unemp	5.100
6	1959-09-30 23:59:59.999999999	realgdp	2775.488
7	1959-09-30 23:59:59.999999999	infl	2.740
8	1959-09-30 23:59:59.999999999	unemp	5.300
9	1959-12-31 23:59:59.999999999	realgdp	2785.204

item	infl	realgdp	unemp
date
1959-03-31 23:59:59.999999999	0.00	2710.349	5.8
1959-06-30 23:59:59.999999999	2.34	2778.801	5.1
1959-09-30 23:59:59.999999999	2.74	2775.488	5.3
1959-12-31 23:59:59.999999999	0.27	2785.204	5.6
1960-03-31 23:59:59.999999999	2.31	2847.699	5.2
...	...	...	...
2008-09-30 23:59:59.999999999	-3.16	13324.600	6.0
2008-12-31 23:59:59.999999999	-8.79	13141.920	6.9
2009-03-31 23:59:59.999999999	0.94	12925.410	8.1
2009-06-30 23:59:59.999999999	3.37	12901.504	9.2
2009-09-30 23:59:59.999999999	3.56	12990.341	9.6

	value			value2
item	infl	realgdp	unemp	infl	realgdp	unemp
date
1959-03-31 23:59:59.999999999	0.00	2710.349	5.8	-1.929406	0.559769	-1.536088
1959-06-30 23:59:59.999999999	2.34	2778.801	5.1	-1.606365	-0.325483	-0.033743
1959-09-30 23:59:59.999999999	2.74	2775.488	5.3	0.062062	-0.367464	0.057192
1959-12-31 23:59:59.999999999	0.27	2785.204	5.6	0.540428	-1.495289	1.040933
1960-03-31 23:59:59.999999999	2.31	2847.699	5.2	-0.032334	0.111673	-0.270219
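The macroeconomic tables above show long-format data (`date`, `item`, `value`) being reshaped to one column per item, and then, with a second value column, into hierarchical columns. A sketch using the first six long-format rows shown, with dates simplified to midnight timestamps; `value2` is random filler, as in the last table.

```python
import numpy as np
import pandas as pd

# Long-format rows copied from the first lines of the table above (dates simplified)
ldata = pd.DataFrame({
    "date": pd.to_datetime(["1959-03-31"] * 3 + ["1959-06-30"] * 3),
    "item": ["realgdp", "infl", "unemp"] * 2,
    "value": [2710.349, 0.000, 5.800, 2778.801, 2.340, 5.100],
})

# One row per date, one column per item
pivoted = ldata.pivot(index="date", columns="item", values="value")

# Without values=, every remaining column is pivoted, giving hierarchical columns
ldata["value2"] = np.random.default_rng(0).standard_normal(len(ldata))
pivoted_all = ldata.pivot(index="date", columns="item")
```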


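Chapter 7: Data Cleaning and Preparation

7.1 Handling Missing Data

isnull

dropna

Dropping missing values along an axis

Setting the drop criterion with how

Using thresh

Choosing labels with the subset argument

Modifying the original with the inplace argument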

fillna

Usage depending on the type of value

Using the method argument

Using the limit argument

isnull (isna)

notnull (notna)

7.1.1 Filtering Out Missing Data

dropna

notnull
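`isna`/`notna` (aliases `isnull`/`notnull`) return boolean masks, and filtering with `notna` gives the same result as `dropna` on a Series. A small sketch with illustrative values:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 3.5, np.nan, 7])

mask = data.isna()            # True where a value is missing; isnull is an alias
kept = data[data.notna()]     # boolean filtering; same result as data.dropna()
```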

7.1.2 Filling In Missing Data

fillna

ffill

7.2 Data Transformation

7.2.1 Removing Duplicates

duplicated

drop_duplicates
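A sketch of `duplicated` and `drop_duplicates` on a small illustrative frame; by default a row counts as a duplicate only if all its columns match an earlier row, and `keep="last"` flips which occurrence survives.

```python
import pandas as pd

data = pd.DataFrame({"k1": ["one", "two"] * 3 + ["two"],
                     "k2": [1, 1, 2, 3, 3, 4, 4]})

dup_mask = data.duplicated()           # True where a row repeats an earlier row
deduped = data.drop_duplicates()       # drop those repeated rows
by_k1 = data.drop_duplicates(["k1"])   # compare only column k1
keep_last = data.drop_duplicates(["k1", "k2"], keep="last")  # keep the last occurrence
```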

7.2.2 Transforming Data Using a Function or Mapping

str.lower

map

7.2.3 Replacing Values

replace
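`replace` is handy for sentinel values such as -999. A sketch with an illustrative Series showing the scalar, list, list-to-list and dict forms:

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, -999.0, 2.0, -999.0, -1000.0, 3.0])

one = data.replace(-999, np.nan)                    # one sentinel -> NaN
many = data.replace([-999, -1000], np.nan)          # several values -> one replacement
paired = data.replace([-999, -1000], [np.nan, 0])   # list-to-list, matched by position
as_dict = data.replace({-999: np.nan, -1000: 0})    # same mapping written as a dict
```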

7.2.4 Renaming Axis Indexes

index.map

rename

inplace=True
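A sketch of renaming axis labels on an illustrative frame: assigning the result of `index.map` changes the frame directly, while `rename` returns a copy unless `inplace=True` is passed.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=["Ohio", "Colorado", "New York"],
                    columns=["one", "two", "three", "four"])

# index.map returns a new Index; assigning it replaces the frame's index
data.index = data.index.map(lambda x: x[:4].upper())

# rename returns a transformed copy and leaves data untouched
titled = data.rename(index=str.title, columns=str.upper)

# A dict renames only the labels it mentions
partial = data.rename(index={"OHIO": "INDIANA"}, columns={"three": "peekaboo"})

# inplace=True modifies data itself
data.rename(index={"OHIO": "INDIANA"}, inplace=True)
```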

7.2.5 Discretization and Binning

cut

codes

categories

pd.value_counts

precision

qcut
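A sketch of binning with `cut` and `qcut` on illustrative ages and random data. The outline mentions `pd.value_counts`; that top-level function was deprecated in pandas 2.1, so this uses `Series.value_counts` instead.

```python
import numpy as np
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

cats = pd.cut(ages, bins)    # Categorical; intervals are open on the left, e.g. (18, 25]
codes = cats.codes           # integer bin number for each age
labels = cats.categories     # IntervalIndex of the four bins
counts = pd.Series(cats).value_counts(sort=False)   # ages per bin, in bin order

right_open = pd.cut(ages, bins, right=False)         # [18, 25) instead of (18, 25]
named = pd.cut(ages, bins, labels=["Youth", "YoungAdult", "MiddleAged", "Senior"])

# Equal-width bins from a bin count; precision limits the decimals shown in bin edges
rng = np.random.default_rng(0)
equal_width = pd.cut(rng.random(20), 4, precision=2)

# qcut bins by sample quantiles, so each of the 4 bins gets the same number of points
quartiles = pd.qcut(rng.standard_normal(1000), 4)
```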

NumPy's np.random.randint vs. rand/randn

np.random.randint

np.random.rand(m,n)

np.random.randn(m,n)
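A quick comparison of the three NumPy random functions named in this heading, plus the equivalent `Generator` methods from `np.random.default_rng`; shapes and the seed are arbitrary.

```python
import numpy as np

np.random.seed(0)  # seed only so the run is repeatable

ints = np.random.randint(0, 10, size=(3, 4))  # integers in [0, 10); high is exclusive
uniform = np.random.rand(3, 4)                # floats uniform on [0, 1); shape as separate args
normal = np.random.randn(3, 4)                # standard normal (mean 0, std 1)

# The newer Generator API offers the same three kinds of draws
rng = np.random.default_rng(0)
ints2 = rng.integers(0, 10, size=(3, 4))
uniform2 = rng.random((3, 4))
normal2 = rng.standard_normal((3, 4))
```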

7.2.6 Detecting and Filtering Outliers

np.sign

7.2.7 Permutation and Random Sampling

np.random.permutation

take

sample

replace=True

Drawing samples (sample)

Using n and replace

Using frac
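A sketch of permutation and sampling on an illustrative 5×4 frame: `permutation` plus `take` shuffles rows, and `sample` draws a subset by count (`n`) or fraction (`frac`), optionally with replacement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5 * 4).reshape((5, 4)))

sampler = np.random.permutation(5)   # a random ordering of 0..4
shuffled = df.take(sampler)          # reorder rows by position (same as df.iloc[sampler])

subset = df.sample(n=3)              # 3 distinct rows, without replacement
frac_part = df.sample(frac=0.4)      # 40% of the rows: 2 of 5
choices = pd.Series([5, 7, -1, 6, 4])
draws = choices.sample(n=10, replace=True)  # with replacement, n may exceed the length
```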

7.2.8 Computing Indicator/Dummy Variables

get_dummies

prefix

7.3 String Manipulation

7.3.1 String Object Methods

count

endswith

startswith

join

index

find

rfind

replace

strip, rstrip, lstrip

split

lower

upper

casefold

ljust, rjust
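The built-in `str` methods listed above, applied to a short illustrative string; each result is stored in a variable so it can be inspected.

```python
val = "a,b,  guido"

pieces = [x.strip() for x in val.split(",")]  # split on commas, strip surrounding spaces
joined = "::".join(pieces)                     # join the pieces with a delimiter

comma_count = val.count(",")       # number of non-overlapping occurrences
first_comma = val.index(",")       # raises ValueError when not found
missing = val.find(":")            # returns -1 instead of raising
last_comma = val.rfind(",")        # position of the last occurrence
no_commas = val.replace(",", "")   # replace every occurrence
ends = val.endswith("guido")
starts = val.startswith("a")

upper = val.upper()
lower = "GUIDO".lower()
folded = "Straße".casefold()       # casefold is more aggressive than lower for Unicode
left = "ab".ljust(5, "*")          # pad on the right to width 5
right = "ab".rjust(5, "*")         # pad on the left to width 5
stripped = ("  x  ".lstrip(), "  x  ".rstrip())
```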

7.3.2 Regular Expressions
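A short sketch with Python's `re` module: splitting on variable whitespace, compiling a pattern for reuse, and a simple email pattern with `findall`, `search` and `sub`. The email pattern is illustrative and deliberately loose, not a validator.

```python
import re

text = "foo    bar\t baz  \tqux"
parts = re.split(r"\s+", text)      # split on runs of whitespace

regex = re.compile(r"\s+")          # compile once, reuse many times
same = regex.split(text)

emails = """Dave dave@google.com
Steve steve@gmail.com
Rob rob@gmail.com"""
pattern = r"([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})"
email_re = re.compile(pattern, flags=re.IGNORECASE)

found = email_re.findall(emails)    # list of (user, domain, suffix) tuples
m = email_re.search(emails)         # first match only
redacted = email_re.sub("REDACTED", emails)
```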

7.3.3 Vectorized String Functions in pandas
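The `Series.str` accessor applies string and regex operations element-wise and skips missing values instead of raising. A sketch with illustrative email data and the same loose pattern style as a regex example:

```python
import re

import numpy as np
import pandas as pd

data = pd.Series({"Dave": "dave@google.com", "Steve": "steve@gmail.com",
                  "Rob": "rob@gmail.com", "Wes": np.nan})

has_gmail = data.str.contains("gmail")   # the missing entry does not raise
pattern = r"([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})"
parts = data.str.extract(pattern, flags=re.IGNORECASE)  # one column per capture group
first5 = data.str[:5]                    # vectorized slicing
```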
