[data]Pandas基础1

条件筛选

595. 大的国家

World 表：

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| name        | varchar |
| continent   | varchar |
| area        | int     |
| population  | int     |
| gdp         | bigint  |
+-------------+---------+
name 是该表的主键（具有唯一值的列）。
这张表的每一行提供：国家名称、所属大陆、面积、人口和 GDP 值。

如果一个国家满足下述两个条件之一，则认为该国是大国：

面积至少为 300 万平方公里（即，3000000 km<sup>2</sup>），或者
人口至少为 2500 万（即 25000000）

编写解决方案找出大国的国家名称、人口和面积。

按 任意顺序 返回结果表。

Solo

import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    # 使用query方法过滤面积大于300万或人口大于2500万的国家
    big_countries = world.query('area >= 3000000 | population >= 25000000')
    # 只返回name, population和area三列
    return big_countries[['name', 'population', 'area']]

1757. 可回收且低脂的产品

表：Products

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| low_fats    | enum    |
| recyclable  | enum    |
+-------------+---------+
product_id 是该表的主键（具有唯一值的列）。
low_fats 是枚举类型，取值为以下两种 ('Y', 'N')，其中 'Y' 表示该产品是低脂产品，'N' 表示不是低脂产品。
recyclable 是枚举类型，取值为以下两种 ('Y', 'N')，其中 'Y' 表示该产品可回收，而 'N' 表示不可回收。

编写解决方案找出既是低脂又是可回收的产品编号。

返回结果 无顺序要求 。

示例 1：

输入：
Products 表：
+-------------+----------+------------+
| product_id  | low_fats | recyclable |
+-------------+----------+------------+
| 0           | Y        | N          |
| 1           | Y        | Y          |
| 2           | N        | Y          |
| 3           | Y        | Y          |
| 4           | N        | N          |
+-------------+----------+------------+
输出：
+-------------+
| product_id  |
+-------------+
| 1           |
| 3           |
+-------------+
解释：
只有产品 id 为 1 和 3 的产品，既是低脂又是可回收的产品。

Solo

def find_products(products: pd.DataFrame) -> pd.DataFrame:
    result = products[(products['low_fats'] == 'Y') & (products['recyclable'] == 'Y')] #找出兩個條件都符合的產品
    return result[['product_id']] #回傳產品ID

183. 从不订购的客户

Customers 表：

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
+-------------+---------+
在 SQL 中，id 是该表的主键。
该表的每一行都表示客户的 ID 和名称。

Orders 表：

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| customerId  | int  |
+-------------+------+
在 SQL 中，id 是该表的主键。
customerId 是 Customers 表中 ID 的外键( Pandas 中的连接键)。
该表的每一行都表示订单的 ID 和订购该订单的客户的 ID。

找出所有从不点任何东西的顾客。

以 任意顺序 返回结果表。

Solo

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
  # 获取Orders表中所有客户id
  ordered_ids = set(orders['customerId'])
  # 过滤客户id不在ordered_ids中的行
  result = customers[~customers['id'].isin(ordered_ids)]
  # 只保留name列
  result = result[['name']]
  # 重置索引 
  result = result.reset_index(drop=True)
  # 添加表头
  result.columns = ['Customers']
  return result

1148. 文章浏览 I

Views 表：

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+
此表可能会存在重复行。（换句话说，在 SQL 中这个表没有主键）
此表的每一行都表示某人在某天浏览了某位作者的某篇文章。
请注意，同一人的 author_id 和 viewer_id 是相同的。

请查询出所有浏览过自己文章的作者

结果按照 id 升序排列。

示例 1：

输入：
Views 表：
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+

输出：
+------+
| id   |
+------+
| 4    |
| 7    |
+------+

Solo

import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:
    """
    在views表中过滤出author_id等于viewer_id的行
    在过滤后的结果中取出author_id列
    对author_id列去重
    """
  result = views[views['author_id'] == views['viewer_id']]['author_id'].unique()
  
  result = pd.DataFrame(result, columns=['id']).sort_values('id')

  return result

Twitter Facebook LinkedIn

YAE-Combo

[data]Pandas基础1

条件筛选

595. 大的国家

1757. 可回收且低脂的产品

183. 从不订购的客户

1148. 文章浏览 I

分享

笺評 (issue)

相关随笔

▶ CUMCM State Prize Travelogues

▶ [OS]Linux 游侠终端

▶ [Web]从Gin到gofiber

▶ [BigData]Hive学习