
Contents

    • Downloading the data
    • Changing the default save location: TRANSFORMERS_CACHE
    • Saving locally & loading from local disk
      • Saving
      • Loading
    • Reading `.arrow` data


Downloading the data

1. Download with Python code

from datasets import load_dataset
imdb = load_dataset("imdb")
# Some datasets take a name argument such as "full" or "mini": full downloads all the data, mini downloads a small subset
# dataset = load_dataset(model_name, name="full")

imdb
'''
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})
'''

By default the data is saved under the ~/.cache/huggingface folder.

The cached data is laid out as follows:

$ cd datasets/imdb/
$ tree
.
└── plain_text
    └── 0.0.0
        ├── e6281661ce1c48d982bc483cf8a173c1bbeb5d31
        │   ├── dataset_info.json
        │   ├── imdb-test.arrow
        │   ├── imdb-train.arrow
        │   └── imdb-unsupervised.arrow
        ├── e6281661ce1c48d982bc483cf8a173c1bbeb5d31.incomplete_info.lock
        └── e6281661ce1c48d982bc483cf8a173c1bbeb5d31_builder.lock

3 directories, 6 files
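
The cache nests the Arrow files several directories deep, so a small helper can save some clicking around. The following is a minimal sketch using only the standard library; the function name `list_arrow_files` and the throwaway demo directory are illustrative assumptions, not part of the datasets library.

```python
import tempfile
from pathlib import Path


def list_arrow_files(cache_root):
    """Return every .arrow file below cache_root, sorted by path."""
    root = Path(cache_root).expanduser()
    return sorted(p for p in root.rglob("*.arrow") if p.is_file())


# Demo on a throwaway directory that mimics the layout shown above.
# "hash" stands in for the real fingerprint directory name.
demo_root = Path(tempfile.mkdtemp()) / "datasets"
split_dir = demo_root / "imdb" / "plain_text" / "0.0.0" / "hash"
split_dir.mkdir(parents=True)
for name in ("imdb-train.arrow", "imdb-test.arrow", "dataset_info.json"):
    (split_dir / name).write_bytes(b"")

found = list_arrow_files(demo_root)
for path in found:
    print(path.relative_to(demo_root))

# For a real cache, the call would be:
# list_arrow_files("~/.cache/huggingface/datasets")
```

Only the `.arrow` files are returned; the JSON metadata and lock files are skipped.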

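2. Download with the huggingface-cli command
This download is also saved under the ~/.cache/huggingface folder.

huggingface-cli download --repo-type dataset imdb

3. git
Dataset repositories on the Hugging Face Hub are git repositories, so the data can also be fetched with git clone.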


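Changing the default save location: TRANSFORMERS_CACHE

Add an environment variable:

export TRANSFORMERS_CACHE='path/'

Or set it in code, before importing the Hugging Face libraries:

import os
os.environ['TRANSFORMERS_CACHE'] = 'path/'

Note that TRANSFORMERS_CACHE controls where the transformers library caches models. The datasets library reads HF_DATASETS_CACHE instead, and HF_HOME moves the whole ~/.cache/huggingface tree.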

Saving locally & loading from local disk

Saving

from datasets import load_from_disk

save_path = '/Users/xx/Downloads/imdb'
imdb.save_to_disk(save_path)
'''
Saving the dataset (1/1 shards): 100%|█| 25000/25000 [00:00<00:00, 97903.42 exam
Saving the dataset (1/1 shards): 100%|█| 25000/25000 [00:00<00:00, 251032.07 exa
Saving the dataset (1/1 shards): 100%|█| 50000/50000 [00:00<00:00, 88591.53 exam
'''

# Load the saved copy back to check it
imdb2 = load_from_disk(save_path)
imdb2
'''
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})
'''

The saved data is laid out as follows:

$ cd imdb/
$ tree
.
├── dataset_dict.json
├── test
│   ├── data-00000-of-00001.arrow
│   ├── dataset_info.json
│   └── state.json
├── train
│   ├── data-00000-of-00001.arrow
│   ├── dataset_info.json
│   └── state.json
└── unsupervised
    ├── data-00000-of-00001.arrow
    ├── dataset_info.json
    └── state.json

3 directories, 10 files

Loading

from datasets import load_dataset, load_from_disk

# Load only the test split
save_path1 = '/Users/xx/Downloads/imdb/test'
imdb3 = load_from_disk(save_path1)
imdb3
'''
Dataset({
    features: ['text', 'label'],
    num_rows: 25000
})
'''

imdb4 = load_dataset('imdb')  # loads the data in `.cache` by default

# Pointing load_dataset at a save_to_disk directory does not restore the dataset
imdb4 = load_dataset(path='/Users/xx/Downloads/imdb')
'''
Generating train split: 1 examples [00:00, 69.32 examples/s]
Generating test split: 1 examples [00:00, 277.31 examples/s]
'''
imdb4
'''
DatasetDict({
    train: Dataset({
        features: ['_data_files', '_fingerprint', '_format_columns', '_format_kwargs', '_format_type', '_output_all_columns', '_split'],
        num_rows: 1
    })
    test: Dataset({
        features: ['_data_files', '_fingerprint', '_format_columns', '_format_kwargs', '_format_type', '_output_all_columns', '_split'],
        num_rows: 1
    })
})
'''
# The features above are the keys of state.json: the metadata file was parsed, not the Arrow data

# Load a single file - fails
save_path2 = '/Users/xx/Downloads/imdb/test/data-00000-of-00001.arrow'
imdb4 = load_from_disk(save_path2)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/Downloads/imdb/test/data-00000-of-00001.arrow is neither a `Dataset` directory nor a `DatasetDict` directory.
'''

load_from_disk cannot load from .cache/huggingface/datasets

load_from_disk reads only directories written by save_to_disk. The cache uses a different layout, so every level of the cache path fails with the same error.

from datasets import load_from_disk

path = '/Users/xx/.cache/huggingface/datasets/imdb'
imdb2 = load_from_disk(path)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb is neither a `Dataset` directory nor a `DatasetDict` directory.
'''

path1 = '/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/imdb-test.arrow'
imdb2 = load_from_disk(path1)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/imdb-test.arrow is neither a `Dataset` directory nor a `DatasetDict` directory.
'''

path1 = '/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/'
imdb2 = load_from_disk(path1)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/ is neither a `Dataset` directory nor a `DatasetDict` directory.
'''

path1 = '/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/'
imdb2 = load_from_disk(path1)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/ is neither a `Dataset` directory nor a `DatasetDict` directory.
'''

path1 = '/Users/xx/.cache/huggingface/datasets/imdb/plain_text/'
imdb2 = load_from_disk(path1)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/ is neither a `Dataset` directory nor a `DatasetDict` directory.
'''
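
Comparing the two directory listings shown earlier suggests why these calls fail: the save_to_disk output contains dataset_dict.json at the top and dataset_info.json plus state.json in each split directory, while the cache directory has no state.json or dataset_dict.json. The sketch below checks for those marker files with the standard library only. The function name `saved_dataset_kind` is hypothetical, and the check mirrors the layouts shown in this post rather than claiming to reproduce the library's internal logic.

```python
import os
import tempfile


def saved_dataset_kind(path):
    """Classify a directory by the marker files of the save_to_disk layout."""
    if os.path.isfile(os.path.join(path, "dataset_dict.json")):
        return "DatasetDict"
    has_info = os.path.isfile(os.path.join(path, "dataset_info.json"))
    has_state = os.path.isfile(os.path.join(path, "state.json"))
    if has_info and has_state:
        return "Dataset"
    return None  # not a save_to_disk directory


# Demo: empty marker files in a throwaway directory tree.
root = tempfile.mkdtemp()
saved = os.path.join(root, "imdb")
saved_test = os.path.join(saved, "test")
cached = os.path.join(root, "cache", "imdb", "plain_text")
os.makedirs(saved_test)
os.makedirs(cached)
for marker in (
    os.path.join(saved, "dataset_dict.json"),
    os.path.join(saved_test, "dataset_info.json"),
    os.path.join(saved_test, "state.json"),
    os.path.join(cached, "dataset_info.json"),  # the cache has no state.json
):
    open(marker, "w").close()

print(saved_dataset_kind(saved))       # DatasetDict
print(saved_dataset_kind(saved_test))  # Dataset
print(saved_dataset_kind(cached))      # None
```

Running such a check before calling load_from_disk gives a clearer hint than the FileNotFoundError about which kind of directory is being passed.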

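Reading .arrow data

Double-clicking a .arrow file does not open it for viewing; the following code shows its contents.

import pyarrow

def read_arrow_to_df_julia_ok(path):
    # Read an Arrow IPC stream file into a pandas DataFrame
    with open(path, "rb") as f:
        r = pyarrow.ipc.RecordBatchStreamReader(f)
        df = r.read_pandas()
    return df

# Either file works; the second assignment overrides the first
path = '/Users/xx/Downloads/imdb/test/data-00000-of-00001.arrow'
path = '/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/imdb-test.arrow'
table = read_arrow_to_df_julia_ok(path)
# Print the data
print('Printed data:\n', table)

Result

Printed data:
                                                    text  label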
0      I love sci-fi and am willing to put up with a ...      0
1      Worth the entertainment value of a rental, esp...      0
2      its a totally average film with a few semi-alr...      0
3      STAR RATING: ***** Saturday Night **** Friday ...      0
4      First off let me say, If you haven't enjoyed a...      0
...                                                  ...    ...
24995  Just got around to seeing Monster Man yesterda...      1
24996  I got this as part of a competition prize. I w...      1
24997  I got Monster Man in a box set of three films ...      1
24998  Five minutes in, i started to feel how naff th...      1
24999  I caught this movie on the Sci-Fi channel rece...      1

