
Reference
- DataManim (https://www.datamanim.com/dataset/99_pandas/pandasMain.html#)
- <파이썬 한권으로 끝내기>, 데싸라면▪빨간색 물고기▪자투리코드, 시대고시기획 시대교육

DataSet

Restaurant data: justmarkham/DAT8
DataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/chipo.csv'

Question

✔ Load the data.

import pandas as pd
DataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/chipo.csv'
df = pd.read_csv(DataUrl)
type(df)

✔ Extract the rows where the quantity column equals 3 and print the first 5 rows.

In [ ]:

df.loc[df['quantity']==3].head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price
409	178	3	Chicken Bowl	[[Fresh Tomato Salsa (Mild), Tomatillo-Green C...	$32.94
445	193	3	Bowl	[Braised Carnitas, Pinto Beans, [Sour Cream, C...	$22.20
689	284	3	Canned Soft Drink	[Diet Coke]	$3.75
818	338	3	Bottled Water	NaN	$3.27
850	350	3	Canned Soft Drink	[Sprite]	$3.75

✔ Extract the rows where the quantity column equals 3, reset the index so it starts from 0, and print the first 5 rows.

In [ ]:

df.loc[df['quantity']==3].head().reset_index(drop=True)

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price
0	178	3	Chicken Bowl	[[Fresh Tomato Salsa (Mild), Tomatillo-Green C...	$32.94
1	193	3	Bowl	[Braised Carnitas, Pinto Beans, [Sour Cream, C...	$22.20
2	284	3	Canned Soft Drink	[Diet Coke]	$3.75
3	338	3	Bottled Water	NaN	$3.27
4	350	3	Canned Soft Drink	[Sprite]	$3.75

+ reset_index() : the existing index is moved into a regular column and a new default index (0, 1, 2, ...) is used

+ reset_index(drop=True) : discards the existing index instead of adding it as a column
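The difference between the two calls can be sketched on a small stand-in frame. The `toy` DataFrame and its index labels below are made up for illustration and are not part of the chipo data.

```python
import pandas as pd

# Hypothetical toy frame with a non-default index, like the filtered rows above.
toy = pd.DataFrame({"quantity": [3, 3, 3]}, index=[409, 445, 689])

kept = toy.reset_index()              # old index becomes a column named "index"
dropped = toy.reset_index(drop=True)  # old index is discarded

print(kept.columns.tolist())   # ['index', 'quantity']
print(dropped.index.tolist())  # [0, 1, 2]
```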

✔ Define a new DataFrame made up of the two columns quantity and item_price.

In [ ]:

Ans = df[['quantity','item_price']]
Ans.head()

Out[ ]:

	quantity	item_price
0	1	$2.39
1	1	$3.39
2	1	$3.39
3	1	$2.39
4	2	$16.98

+ Passing a list of column names inside the square brackets [] returns a new DataFrame made up of those columns
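A minimal sketch of how the bracket contents change the result type, using a made-up `toy` frame (the values are illustrative only): a single label returns a Series, while a list of labels returns a DataFrame even when the list has one element.

```python
import pandas as pd

# Hypothetical toy frame; column names mirror the chipo data.
toy = pd.DataFrame({
    "quantity": [1, 2],
    "item_name": ["Izze", "Chicken Bowl"],
    "item_price": ["$3.39", "$16.98"],
})

as_series = toy["quantity"]                 # single label -> Series
as_frame = toy[["quantity"]]                # one-element list -> DataFrame
two_cols = toy[["quantity", "item_price"]]  # list of labels -> DataFrame

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(two_cols.columns.tolist())  # ['quantity', 'item_price']
```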

✔ Remove the dollar sign from the item_price column, convert the values to float, and store them in a new_price column.

In [ ]:

df['new_price'] = df['item_price'].str[1:].astype('float')
df['new_price'].head()

# OR

df['new_price'] = df['item_price'].str.replace('$', '', regex=False).astype('float')  # regex=False: treat '$' as a literal character
df['new_price'].head()

Out[ ]:

0     2.39
1     3.39
2     3.39
3     2.39
4    16.98
Name: new_price, dtype: float64

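+ df['item_price'].str[n:] : for each string in the column, drops the first n characters and keeps the rest (str[1:] removes the leading '$')

+ str.replace(a, b) : replaces a with b in each string. It only works on string data, so a non-string column can first be converted with df['item_price'].astype(str)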
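As a rough check that both approaches give the same numbers, here is a sketch on a few made-up price strings. The `prices` and `numbers` Series are hypothetical. Because `$` is also a regex anchor, `regex=False` is passed so that it is matched literally.

```python
import pandas as pd

# Hypothetical sample values in the same format as item_price.
prices = pd.Series(["$2.39", "$16.98"])

by_slice = prices.str[1:].astype(float)  # drop the first character
by_replace = prices.str.replace("$", "", regex=False).astype(float)

print(by_slice.tolist())            # [2.39, 16.98]
print(by_replace.equals(by_slice))  # True

# The .str accessor needs strings, so convert numeric data first.
numbers = pd.Series([239, 1698])
print(numbers.astype(str).str[:1].tolist())  # ['2', '1']
```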

✔ Extract the rows where new_price is 5 or less and find how many there are.

In [ ]:

len(df.loc[df.new_price <= 5])

Out[ ]:

✔ Extract the rows where item_name is Chicken Salad Bowl and reset the index.

In [ ]:

Ans = df.loc[df.item_name == 'Chicken Salad Bowl'].reset_index(drop=True)
Ans.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
0	20	1	Chicken Salad Bowl	[Fresh Tomato Salsa, [Fajita Vegetables, Pinto...	$8.75	8.75
1	60	2	Chicken Salad Bowl	[Tomatillo Green Chili Salsa, [Sour Cream, Che...	$22.50	22.50
2	94	2	Chicken Salad Bowl	[Fresh Tomato Salsa, [Fajita Vegetables, Pinto...	$22.50	22.50
3	111	1	Chicken Salad Bowl	[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...	$8.75	8.75
4	137	2	Chicken Salad Bowl	[Fresh Tomato Salsa, Fajita Vegetables]	$17.50	17.50

✔ Extract the rows where new_price is 9 or less and item_name is Chicken Salad Bowl.

In [ ]:

Ans = df.loc[(df.item_name=="Chicken Salad Bowl") & (df.new_price <= 9)]
Ans.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
44	20	1	Chicken Salad Bowl	[Fresh Tomato Salsa, [Fajita Vegetables, Pinto...	$8.75	8.75
256	111	1	Chicken Salad Bowl	[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...	$8.75	8.75
526	220	1	Chicken Salad Bowl	[Roasted Chili Corn Salsa, [Black Beans, Sour ...	$8.75	8.75
528	221	1	Chicken Salad Bowl	[Tomatillo Green Chili Salsa, [Fajita Vegetabl...	$8.75	8.75
529	221	1	Chicken Salad Bowl	[Tomatillo Green Chili Salsa, [Fajita Vegetabl...	$8.75	8.75

✔ Sort df in ascending order of new_price and reset the index.

In [ ]:

Ans = df.sort_values('new_price').reset_index(drop=True)
Ans.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
0	471	1	Bottled Water	NaN	$1.09	1.09
1	338	1	Canned Soda	[Coca Cola]	$1.09	1.09
2	1575	1	Canned Soda	[Dr. Pepper]	$1.09	1.09
3	47	1	Canned Soda	[Dr. Pepper]	$1.09	1.09
4	1014	1	Canned Soda	[Coca Cola]	$1.09	1.09

+ sort_values() : sorts the data in ascending order by default
+ sort_values('new_price', ascending=False) : pass ascending=False to sort in descending order instead
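A short sketch of both sort directions on a made-up `toy` frame (the names and prices are illustrative only):

```python
import pandas as pd

# Hypothetical toy frame with unsorted prices.
toy = pd.DataFrame({
    "item_name": ["Chicken Salad Bowl", "Canned Soda", "Chips"],
    "new_price": [8.75, 1.09, 44.25],
})

asc = toy.sort_values("new_price").reset_index(drop=True)
desc = toy.sort_values("new_price", ascending=False).reset_index(drop=True)

print(asc["new_price"].tolist())   # [1.09, 8.75, 44.25]
print(desc["new_price"].tolist())  # [44.25, 8.75, 1.09]
```

Without `reset_index(drop=True)` the rows would keep their original labels (1, 0, 2 for the ascending case), which is why the exercises chain the two calls.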

✔ Print the rows of df whose item_name contains Chips.

In [ ]:

Ans = df.loc[df.item_name.str.contains('Chips')]
Ans.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
0	1	1	Chips and Fresh Tomato Salsa	NaN	$2.39	2.39
3	1	1	Chips and Tomatillo-Green Chili Salsa	NaN	$2.39	2.39
6	3	1	Side of Chips	NaN	$1.69	1.69
10	5	1	Chips and Guacamole	NaN	$4.45	4.45
14	7	1	Chips and Guacamole	NaN	$4.45	4.45

✔ Print a DataFrame containing only the columns of df at even positions (0, 2, 4, ... counting from 0).

In [ ]:

Ans = df.iloc[:,::2]
Ans.head()

Out[ ]:

	order_id	item_name	item_price
0	1	Chips and Fresh Tomato Salsa	$2.39
1	1	Izze	$3.39
2	1	Nantucket Nectar	$3.39
3	1	Chips and Tomatillo-Green Chili Salsa	$2.39
4	2	Chicken Bowl	$16.98

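+ df.iloc[:,::2]: the first colon (:) selects all rows. The ::2 after the comma is a step slice over the column positions, where the trailing 2 is the step, so every second column is taken starting from position 0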
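The step slice can be sketched on a made-up five-column frame. The column names `c0` to `c4` are hypothetical and chosen so the positions are easy to read.

```python
import pandas as pd

# Hypothetical one-row frame; column names encode their positions.
toy = pd.DataFrame([[10, 11, 12, 13, 14]], columns=["c0", "c1", "c2", "c3", "c4"])

even_pos = toy.iloc[:, ::2]   # start at 0, step 2 -> positions 0, 2, 4
odd_pos = toy.iloc[:, 1::2]   # start at 1, step 2 -> positions 1, 3

print(even_pos.columns.tolist())  # ['c0', 'c2', 'c4']
print(odd_pos.columns.tolist())   # ['c1', 'c3']
```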

✔ Sort df in descending order of new_price and reset the index.

In [ ]:

Ans = df.sort_values('new_price', ascending=False).reset_index(drop=True)
Ans.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
0	1443	15	Chips and Fresh Tomato Salsa	NaN	$44.25	44.25
1	1398	3	Carnitas Bowl	[Roasted Chili Corn Salsa, [Fajita Vegetables,...	$35.25	35.25
2	511	4	Chicken Burrito	[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...	$35.00	35.00
3	1443	4	Chicken Burrito	[Fresh Tomato Salsa, [Rice, Black Beans, Chees...	$35.00	35.00
4	1443	3	Veggie Burrito	[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...	$33.75	33.75

✔ Select the rows of df whose item_name is Steak Salad or Bowl.

In [ ]:

df.loc[(df.item_name=="Steak Salad")|(df.item_name=="Bowl")]

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
445	193	3	Bowl	[Braised Carnitas, Pinto Beans, [Sour Cream, C...	$22.20	22.20
664	276	1	Steak Salad	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$8.99	8.99
673	279	1	Bowl	[Adobo-Marinated and Grilled Steak, [Sour Crea...	$7.40	7.40
752	311	1	Steak Salad	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$8.99	8.99
893	369	1	Steak Salad	[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...	$8.99	8.99
3502	1406	1	Steak Salad	[[Lettuce, Fajita Veggies]]	$8.69	8.69

✔ Build a DataFrame of the rows whose item_name is Steak Salad or Bowl, then remove rows that are duplicated on item_name, keeping only the first occurrence.

In [ ]:

df.loc[(df.item_name=="Steak Salad") | (df.item_name=="Bowl")].drop_duplicates('item_name')

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
445	193	3	Bowl	[Braised Carnitas, Pinto Beans, [Sour Cream, C...	$22.20	22.20
664	276	1	Steak Salad	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$8.99	8.99

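+ drop_duplicates() : by default compares all columns to find duplicated rows and removes them. The first occurrence is kept and later duplicates are dropped. Passing a column name, as in drop_duplicates('item_name'), compares only that column.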
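A sketch of the `keep` and subset behaviour on a made-up `toy` frame (the rows are illustrative, loosely modelled on the output above):

```python
import pandas as pd

# Hypothetical toy frame with repeated item names but distinct prices.
toy = pd.DataFrame({
    "item_name": ["Bowl", "Steak Salad", "Bowl", "Steak Salad"],
    "new_price": [22.20, 8.99, 7.40, 8.69],
})

first = toy.drop_duplicates("item_name")              # keep='first' is the default
last = toy.drop_duplicates("item_name", keep="last")  # keep the last occurrence
all_cols = toy.drop_duplicates()                      # compares every column

print(first.index.tolist())  # [0, 1]
print(last.index.tolist())   # [2, 3]
print(len(all_cols))         # 4
```

With no column given, nothing is removed here because no two rows match on every column.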

✔ Build a DataFrame of the rows whose item_name is Steak Salad or Bowl, then remove rows that are duplicated on item_name, keeping only the last occurrence.

In [ ]:

df.loc[(df.item_name=="Steak Salad") | (df.item_name=="Bowl")].drop_duplicates('item_name', keep='last')

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
673	279	1	Bowl	[Adobo-Marinated and Grilled Steak, [Sour Crea...	$7.40	7.40
3502	1406	1	Steak Salad	[[Lettuce, Fajita Veggies]]	$8.69	8.69

✔ Select the rows of df whose new_price is greater than or equal to the mean of new_price.

In [ ]:

Ans = df.loc[df.new_price >= df.new_price.mean()]
Ans.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98	16.98
5	3	1	Chicken Bowl	[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...	$10.98	10.98
7	4	1	Steak Burrito	[Tomatillo Red Chili Salsa, [Fajita Vegetables...	$11.75	11.75
8	4	1	Steak Soft Tacos	[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...	$9.25	9.25
9	5	1	Steak Burrito	[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...	$9.25	9.25

✔ In df, change the item_name value Izze to Fizzy Lizzy.

In [ ]:

df.loc[df.item_name=='Izze', 'item_name'] = "Fizzy Lizzy"
df.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
0	1	1	Chips and Fresh Tomato Salsa	NaN	$2.39	2.39
1	1	1	Fizzy Lizzy	[Clementine]	$3.39	3.39
2	1	1	Nantucket Nectar	[Apple]	$3.39	3.39
3	1	1	Chips and Tomatillo-Green Chili Salsa	NaN	$2.39	2.39
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98	16.98

+ df.loc[df.item_name=='Izze', 'item_name']

.loc[df.item_name=='Izze', column] : selects the rows whose "item_name" value is "Izze"

.loc[row, 'item_name']: picks the 'item_name' column within the selected rows
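The row-mask plus column-label assignment can be sketched on a small made-up frame (`toy` and its values are hypothetical). Doing the selection and the assignment in one `.loc` call writes to the original frame, whereas chained indexing such as `toy[mask]['item_name'] = ...` may write to a temporary copy.

```python
import pandas as pd

# Hypothetical toy frame with the value to be renamed.
toy = pd.DataFrame({
    "item_name": ["Izze", "Nantucket Nectar", "Izze"],
    "quantity": [1, 1, 2],
})

mask = toy["item_name"] == "Izze"           # boolean row selector
toy.loc[mask, "item_name"] = "Fizzy Lizzy"  # assign only in that column

print(toy["item_name"].tolist())
# ['Fizzy Lizzy', 'Nantucket Nectar', 'Fizzy Lizzy']
```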

✔ Count the rows of df whose choice_description is NaN.

In [ ]:

df.choice_description.isnull().sum()

Out[ ]:

+ isnull() : returns True where an element is a missing value and False otherwise. Because True counts as 1, sum() gives the number of missing values
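A minimal sketch of counting missing values through the boolean mask, on a made-up Series (`desc` is a hypothetical stand-in for choice_description):

```python
import numpy as np
import pandas as pd

# Hypothetical values; both NaN and None count as missing.
desc = pd.Series(["[Clementine]", np.nan, None])

mask = desc.isnull()    # True where the value is missing
print(mask.tolist())    # [False, True, True]
print(int(mask.sum()))  # 2
```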

✔ Replace the NaN values in choice_description with NoData (using loc).

In [ ]:

df.loc[df.choice_description.isnull(), 'choice_description'] = 'NoData'
df.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
0	1	1	Chips and Fresh Tomato Salsa	NoData	$2.39	2.39
1	1	1	Fizzy Lizzy	[Clementine]	$3.39	3.39
2	1	1	Nantucket Nectar	[Apple]	$3.39	3.39
3	1	1	Chips and Tomatillo-Green Chili Salsa	NoData	$2.39	2.39
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98	16.98

✔ Select the rows of df whose choice_description contains Black.

In [ ]:

Ans = df[df.choice_description.str.contains('Black')]
Ans.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98	16.98
7	4	1	Steak Burrito	[Tomatillo Red Chili Salsa, [Fajita Vegetables...	$11.75	11.75
9	5	1	Steak Burrito	[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...	$9.25	9.25
11	6	1	Chicken Crispy Tacos	[Roasted Chili Corn Salsa, [Fajita Vegetables,...	$8.75	8.75
12	6	1	Chicken Soft Tacos	[Roasted Chili Corn Salsa, [Rice, Black Beans,...	$8.75	8.75

✔ Print the number of rows of df whose choice_description does not contain Vegetables.

In [ ]:

len(df.loc[~df.choice_description.str.contains('Vegetables')])

Out[ ]:

✔ Extract all rows of df whose item_name starts with N.

In [ ]:

Ans = df[df.item_name.str.startswith('N')]
Ans.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
2	1	1	Nantucket Nectar	[Apple]	$3.39	3.39
22	11	1	Nantucket Nectar	[Pomegranate Cherry]	$3.39	3.39
105	46	1	Nantucket Nectar	[Pineapple Orange Banana]	$3.39	3.39
173	77	1	Nantucket Nectar	[Apple]	$3.39	3.39
205	91	1	Nantucket Nectar	[Peach Orange]	$3.39	3.39

✔ Select the rows of df whose item_name is 15 or more characters long.

In [ ]:

Ans = df[df.item_name.str.len() >= 15]  # str.len() counts characters; >= matches "15 or more"
Ans.head()

Out[ ]:

	order_id	quantity	item_name	choice_description	item_price	new_price
0	1	1	Chips and Fresh Tomato Salsa	NoData	$2.39	2.39
2	1	1	Nantucket Nectar	[Apple]	$3.39	3.39
3	1	1	Chips and Tomatillo-Green Chili Salsa	NoData	$2.39	2.39
8	4	1	Steak Soft Tacos	[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...	$9.25	9.25
10	5	1	Chips and Guacamole	NoData	$4.45	4.45

✔ Build the DataFrame of rows whose new_price is one of the values in lst and print its row count. lst = [1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98]

In [ ]:

lst =[1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98]
Ans = df.loc[df.new_price.isin(lst)]
display(Ans.head())
print(len(Ans))

	order_id	quantity	item_name	choice_description	item_price	new_price
0	1	1	Chips and Fresh Tomato Salsa	NoData	$2.39	2.39
1	1	1	Fizzy Lizzy	[Clementine]	$3.39	3.39
2	1	1	Nantucket Nectar	[Apple]	$3.39	3.39
3	1	1	Chips and Tomatillo-Green Chili Salsa	NoData	$2.39	2.39
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98	16.98

+ .isin(lst) : returns True for each row whose value is in the given list (lst) and False otherwise
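A sketch of the membership mask and its negation on made-up prices (`prices` and `wanted` are hypothetical):

```python
import pandas as pd

# Hypothetical price values and a list of values to match.
prices = pd.Series([2.39, 8.75, 16.98, 1.09])
wanted = [2.39, 16.98]

mask = prices.isin(wanted)      # element-wise membership test
print(mask.tolist())            # [True, False, True, False]
print(prices[mask].tolist())    # [2.39, 16.98]
print(prices[~mask].tolist())   # [8.75, 1.09]  (~ negates the mask)
```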


[Python Data Handling] Pandas Practice Tutorial - 02 Filtering & Sorting