[data]Pandas基础2

字符串处理

1683. 无效的推文

表：Tweets

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| tweet_id       | int     |
| content        | varchar |
+----------------+---------+
在 SQL 中，tweet_id 是这个表的主键。
这个表包含某社交媒体 App 中所有的推文。

查询所有无效推文的编号（ID）。当推文内容中的字符数严格大于 15 时，该推文是无效的。

以任意顺序返回结果表。

查询结果格式如下所示：

示例 1：

输入：
Tweets 表：
+----------+----------------------------------+
| tweet_id | content                          |
+----------+----------------------------------+
| 1        | Vote for Biden                   |
| 2        | Let us make America great again! |
+----------+----------------------------------+

输出：
+----------+
| tweet_id |
+----------+
| 2        |
+----------+

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    return tweets.loc[tweets['content'].str.len() > 15, ['tweet_id']]

1873. 计算特殊奖金

表: Employees

+-------------+---------+
| 列名        | 类型     |
+-------------+---------+
| employee_id | int     |
| name        | varchar |
| salary      | int     |
+-------------+---------+
employee_id 是这个表的主键(具有唯一值的列)。
此表的每一行给出了雇员id ，名字和薪水。

编写解决方案，计算每个雇员的奖金。如果一个雇员的 id 是奇数并且他的名字不是以 'M' 开头，那么他的奖金是他工资的 100% ，否则奖金为 0 。

返回的结果按照 employee_id 排序。

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    """
    首先计算一个默认为0的bonus列
    使用loc根据条件更新满足的行的bonus为salary
    最后返回所需列,并排序
    """
    employees['bonus'] = 0
    employees.loc[(employees['employee_id'] % 2 == 1) & (~employees['name'].str.startswith('M')), 'bonus'] = employees['salary']
    return employees[['employee_id', 'bonus']].sort_values('employee_id')

1527. 患某种疾病的患者

患者信息表： Patients

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| patient_id   | int     |
| patient_name | varchar |
| conditions   | varchar |
+--------------+---------+
在 SQL 中，patient_id （患者 ID）是该表的主键。
'conditions' （疾病）包含 0 个或以上的疾病代码，以空格分隔。
这个表包含医院中患者的信息。

查询患有 I 类糖尿病的患者 ID （patient_id）、患者姓名（patient_name）以及其患有的所有疾病代码（conditions）。I 类糖尿病的代码总是包含前缀 DIAB1 。

按 任意顺序 返回结果表。

import re
def find_patients(patients):
  pattern = r'\bDIAB1\w*\b' 
  # 过滤conditions列包含模式的行
  result = patients[patients['conditions'].str.contains(pattern)]
  # 返回指定的三列
  return result[['patient_id', 'patient_name', 'conditions']]

Twitter Facebook LinkedIn

YAE-Combo

[data]Pandas基础2

字符串处理

1683. 无效的推文

1873. 计算特殊奖金

1527. 患某种疾病的患者

分享

笺評 (issue)

相关随笔

▶ CUMCM State Prize Travelogues

▶ [OS]Linux 游侠终端

▶ [Web]从Gin到gofiber

▶ [BigData]Hive学习