Dataset API
FastGPT OpenAPI Dataset API
Figures: How to Get Dataset ID (datasetId); How to Get Collection ID (collection_id)
Create Training Order
New Example
curl --location --request POST 'http://localhost:3000/api/support/wallet/usage/createTrainingUsage' \
--header 'Authorization: Bearer {{apikey}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"datasetId": "Dataset ID",
"name": "Optional, custom order name, e.g.: Document Training-fastgpt.docx"
}'
data in the response is the billId. Pass it when adding data to the dataset to aggregate the resulting usage into a single bill.
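Below is a minimal Python sketch (standard library only) that calls this endpoint and reads the billId out of the code/statusText/message/data envelope used by every response in this document. The helper names build_request, unwrap and create_training_usage, the BASE_URL value, and the rule of raising on any code other than 200 are all assumptions for illustration. They are not part of the FastGPT API.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:3000"  # assumption: same host as the curl examples


def build_request(path: str, api_key: str, body: dict, method: str = "POST") -> urllib.request.Request:
    """Build (without sending) a JSON request authenticated with a Bearer key."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method=method,
    )


def unwrap(envelope: dict) -> Any:
    """Return envelope["data"]; any code other than 200 is treated as an error (assumed policy)."""
    if envelope.get("code") != 200:
        raise RuntimeError(f"API error {envelope.get('code')}: {envelope.get('message')}")
    return envelope.get("data")


def create_training_usage(api_key: str, dataset_id: str, name: Optional[str] = None) -> str:
    """POST createTrainingUsage and return the billId (this performs a real network call)."""
    body = {"datasetId": dataset_id}
    if name:
        body["name"] = name
    req = build_request("/api/support/wallet/usage/createTrainingUsage", api_key, body)
    with urllib.request.urlopen(req) as resp:
        return unwrap(json.load(resp))
```

The request building is kept separate from the network call so that the payload and headers can be inspected without a running server.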
{
"code": 200,
"statusText": "",
"message": "",
"data": "65112ab717c32018f4156361"
}
Dataset
Create a Dataset
curl --location --request POST 'http://localhost:3000/api/core/dataset/create' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"parentId": null,
"type": "dataset",
"name":"Test",
"intro":"Introduction",
"avatar": "",
"vectorModel": "text-embedding-ada-002",
"agentModel": "gpt-3.5-turbo-16k",
"vlmModel": "gpt-4.1"
}'
- parentId - Parent ID, used to build a directory structure. Usually null or omitted.
- type - dataset or folder: a regular dataset or a folder. Defaults to a regular dataset if omitted.
- name - Dataset name (required)
- intro - Description (optional)
- avatar - Avatar URL (optional)
- vectorModel - Vector model (recommended to leave empty, use system default)
- agentModel - Text processing model (recommended to leave empty, use system default)
- vlmModel - Image understanding model (recommended to leave empty, use system default)
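This sketch assembles the request body for creating a dataset from the parameters listed above. The function name build_create_dataset_body is hypothetical. The code also assumes that leaving a model field out of the body is the same as "leaving it empty" to get the system default.

```python
from typing import Optional

MODEL_FIELDS = ("vectorModel", "agentModel", "vlmModel")


def build_create_dataset_body(name: str, intro: str = "", avatar: str = "",
                              parent_id: Optional[str] = None, kind: str = "dataset",
                              **models: str) -> dict:
    """Assemble the JSON body for /api/core/dataset/create (hypothetical helper)."""
    if not name:
        raise ValueError("name is required")
    if kind not in ("dataset", "folder"):
        raise ValueError("type must be 'dataset' or 'folder'")
    body = {"parentId": parent_id, "type": kind, "name": name, "intro": intro, "avatar": avatar}
    for field in MODEL_FIELDS:
        # Only send a model when one is chosen explicitly; otherwise leave it out
        # so that (by assumption) the system default applies.
        if models.get(field):
            body[field] = models[field]
    return body
```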
{
"code": 200,
"statusText": "",
"message": "",
"data": "65abc9bd9d1448617cba5e6c"
}
Get Dataset List
curl --location --request POST 'http://localhost:3000/api/core/dataset/list?parentId=' \
--header 'Authorization: Bearer xxxx' \
--header 'Content-Type: application/json' \
--data-raw '{
"parentId":""
}'
- parentId - Parent ID. Pass an empty string or null to get the datasets in the root directory
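The list response is one practical way to look up a datasetId by name. The sketch below filters the data array of that response. The helper find_dataset_ids is hypothetical, and because it does not assume dataset names are unique, it returns every match.

```python
def find_dataset_ids(entries: list, name: str, kind: str = "dataset") -> list:
    """Return the _id of each entry in the list response's data whose name and type match.

    Nothing here assumes names are unique, so every match is returned.
    """
    return [e["_id"] for e in entries if e.get("name") == name and e.get("type") == kind]
```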
{
"code": 200,
"statusText": "",
"message": "",
"data": [
{
"_id": "65abc9bd9d1448617cba5e6c",
"parentId": null,
"avatar": "",
"name": "Test",
"intro": "",
"type": "dataset",
"permission": "private",
"canWrite": true,
"isOwner": true,
"vectorModel": {
"model": "text-embedding-ada-002",
"name": "Embedding-2",
"charsPointsPrice": 0,
"defaultToken": 512,
"maxToken": 8000,
"weight": 100
}
}
]
}
Get Dataset Details
curl --location --request GET 'http://localhost:3000/api/core/dataset/detail?id=6593e137231a2be9c5603ba7' \
--header 'Authorization: Bearer {{authorization}}'
- id: Dataset ID
{
"code": 200,
"statusText": "",
"message": "",
"data": {
"_id": "6593e137231a2be9c5603ba7",
"parentId": null,
"teamId": "65422be6aa44b7da77729ec8",
"tmbId": "65422be6aa44b7da77729ec9",
"type": "dataset",
"status": "active",
"avatar": "/icon/logo.svg",
"name": "FastGPT test",
"vectorModel": {
"model": "text-embedding-ada-002",
"name": "Embedding-2",
"charsPointsPrice": 0,
"defaultToken": 512,
"maxToken": 8000,
"weight": 100
},
"agentModel": {
"model": "gpt-3.5-turbo-16k",
"name": "FastAI-16k",
"maxContext": 16000,
"maxResponse": 16000,
"charsPointsPrice": 0
},
"intro": "",
"permission": "private",
"updateTime": "2024-01-02T10:11:03.084Z",
"canWrite": true,
"isOwner": true
}
}
Delete a Dataset
curl --location --request DELETE 'http://localhost:3000/api/core/dataset/delete?id=65abc8729d1448617cba5df6' \
--header 'Authorization: Bearer {{authorization}}'
- id: Dataset ID
{
"code": 200,
"statusText": "",
"message": "",
"data": null
}
Collection
Common Creation Parameters (Must Read)
Request
| Parameter | Description | Required |
|---|---|---|
| datasetId | Dataset ID | ✅ |
| parentId | Parent ID. Defaults to the root directory if not provided | |
| trainingType | Data processing method. chunk: split by text length; qa: Q&A extraction | ✅ |
| indexPrefixTitle | Whether to auto-generate title index | |
| customPdfParse | Whether to enable enhanced PDF parsing. Default false: disabled; true: enabled | |
| autoIndexes | Whether to auto-generate indexes (commercial version only) | |
| imageIndex | Whether to auto-generate image indexes (commercial version only) | |
| chunkSettingMode | Chunk parameter mode. auto: system default; custom: manual specification | |
| chunkSplitMode | Chunk split mode. size: split by length; char: split by character. Ineffective when chunkSettingMode=auto. | |
| chunkSize | Chunk size, default 1500. Ineffective when chunkSettingMode=auto. | |
| indexSize | Index size, default 512, must be less than index model max token. Ineffective when chunkSettingMode=auto. | |
| chunkSplitter | Custom highest-priority separator. Chunks are not split further unless they exceed the maximum context for file processing. Ineffective when chunkSettingMode=auto. | |
| qaPrompt | QA split prompt | |
| tags | Collection tags (string array) | |
| createTime | File creation time (Date / String) | |
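Here is a client-side sketch of checking the common parameters before sending them. It is based only on the table above. The function name normalize_creation_params is hypothetical, and so is the policy of dropping the chunk fields in auto mode, where the table says they have no effect. Only the two trainingType values documented here are accepted.

```python
CHUNK_FIELDS = ("chunkSplitMode", "chunkSize", "indexSize", "chunkSplitter")
TRAINING_TYPES = ("chunk", "qa")


def normalize_creation_params(params: dict) -> dict:
    """Return a checked copy of the common creation parameters (hypothetical helper)."""
    out = dict(params)
    if not out.get("datasetId"):
        raise ValueError("datasetId is required")
    if out.get("trainingType") not in TRAINING_TYPES:
        raise ValueError(f"trainingType must be one of {TRAINING_TYPES}")
    mode = out.get("chunkSettingMode")
    if mode == "auto":
        # Documented as ineffective in auto mode, so drop them instead of sending dead values.
        for field in CHUNK_FIELDS:
            out.pop(field, None)
    elif mode == "custom":
        # Fill in the documented defaults when the caller did not set them.
        out.setdefault("chunkSize", 1500)
        out.setdefault("indexSize", 512)
    elif mode is not None:
        raise ValueError("chunkSettingMode must be 'auto' or 'custom'")
    return out
```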
Response
- collectionId - New collection ID
- insertLen - Number of inserted chunks
Create an Empty Collection
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/create' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"datasetId":"6593e137231a2be9c5603ba7",
"parentId": null,
"name":"Test",
"type":"virtual",
"metadata":{
"test":111
}
}'
- datasetId: Dataset ID (required)
- parentId: Parent ID. Defaults to the root directory if not provided
- name: Collection name (required)
- type:
  - folder: Folder
  - virtual: Virtual collection (manually entered data)
- metadata: Metadata (not currently used)
data is the collection ID.
{
"code": 200,
"statusText": "",
"message": "",
"data": "65abcd009d1448617cba5ee1"
}
Create a Text Collection
Pass in text to create a collection. The text will be split accordingly.
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/create/text' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"text":"xxxxxxxx",
"datasetId":"6593e137231a2be9c5603ba7",
"parentId": null,
"name":"Test training",
"trainingType": "qa",
"chunkSettingMode": "auto",
"qaPrompt":"",
"metadata":{}
}'
- text: Original text
- datasetId: Dataset ID (required)
- parentId: Parent ID. Defaults to root directory if not provided
- name: Collection name (required)
- metadata: Metadata (not currently used)
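The creation endpoints return a results object alongside collectionId. The sketch below condenses that object and flags whether anything was rejected. The helper summarize_results is hypothetical, and it only counts the overToken, repeat and error arrays without interpreting their entries.

```python
def summarize_results(data: dict) -> dict:
    """Condense the data object returned by the collection-creation endpoints."""
    results = data.get("results") or {}
    summary = {
        "collectionId": data.get("collectionId"),
        "inserted": results.get("insertLen", 0),
        # The remaining fields are arrays; only their lengths are counted here.
        "overToken": len(results.get("overToken") or []),
        "repeat": len(results.get("repeat") or []),
        "error": len(results.get("error") or []),
    }
    summary["ok"] = summary["overToken"] == summary["repeat"] == summary["error"] == 0
    return summary
```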
data contains the new collectionId and the insertion results.
{
"code": 200,
"statusText": "",
"message": "",
"data": {
"collectionId": "65abcfab9d1448617cba5f0d",
"results": {
"insertLen": 5, // Number of chunks the text was split into
"overToken": [],
"repeat": [],
"error": []
}
}
}
Create a Link Collection
Pass in a web link to create a collection. Content will be fetched from the webpage first, then split.
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/create/link' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"link":"https://doc.fastgpt.io/docs/course/quick-start/",
"datasetId":"6593e137231a2be9c5603ba7",
"parentId": null,
"trainingType": "chunk",
"chunkSettingMode": "auto",
"qaPrompt":"",
"metadata":{
"webPageSelector":".docs-content"
}
}'
- link: Web link
- datasetId: Dataset ID (required)
- parentId: Parent ID. Defaults to root directory if not provided
- metadata.webPageSelector: Web page selector to specify which element to use as text (optional)
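This sketch builds the link-collection body and adds metadata.webPageSelector only when a selector is given. The name build_link_collection_body is hypothetical. Two further choices are assumptions and are not documented constraints: the http(s) check, and fixing chunkSettingMode to "auto" as in the example.

```python
from typing import Optional
from urllib.parse import urlparse


def build_link_collection_body(dataset_id: str, link: str, selector: Optional[str] = None,
                               training_type: str = "chunk", parent_id: Optional[str] = None) -> dict:
    """Assemble the JSON body for /api/core/dataset/collection/create/link (hypothetical helper)."""
    if urlparse(link).scheme not in ("http", "https"):
        raise ValueError("link must be an http(s) URL")
    # webPageSelector is optional; without it, metadata is sent as an empty object.
    metadata = {"webPageSelector": selector} if selector else {}
    return {
        "link": link,
        "datasetId": dataset_id,
        "parentId": parent_id,
        "trainingType": training_type,
        "chunkSettingMode": "auto",
        "metadata": metadata,
    }
```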
data contains the new collectionId and the insertion results.
{
"code": 200,
"statusText": "",
"message": "",
"data": {
"collectionId": "65abd0ad9d1448617cba6031",
"results": {
"insertLen": 1,
"overToken": [],
"repeat": [],
"error": []
}
}
}
Create a File Collection
Pass in a file to create a collection. File content will be read and split. Currently supports: pdf, docx, md, txt, html, csv.
When uploading from code, encode Chinese (non-ASCII) filenames so they are not garbled.
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/create/localFile' \
--header 'Authorization: Bearer {{authorization}}' \
--form 'file=@"C:\\Users\\user\\Desktop\\fastgpt-test\\index.html"' \
--form 'data="{\"datasetId\":\"6593e137231a2be9c5603ba7\",\"parentId\":null,\"trainingType\":\"chunk\",\"chunkSize\":512,\"chunkSplitter\":\"\",\"qaPrompt\":\"\",\"metadata\":{}}"'
Upload with a POST multipart/form-data request containing two fields, file and data.
- file: File
- data: Dataset-related info (pass as serialized JSON). See "Common Creation Parameters" above
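Python's standard library has no multipart encoder, so this sketch builds the form-data body by hand. The body has a file part and a data part that holds serialized JSON. Percent-encoding the filename with urllib.parse.quote is one assumed way to satisfy the "encode Chinese filenames" note. Check how your server decodes it. The name build_multipart is hypothetical.

```python
import json
import uuid
from urllib.parse import quote


def build_multipart(filename: str, content: bytes, data: dict) -> tuple:
    """Return (body, content_type) for a form-data upload with `file` and `data` parts."""
    boundary = "fastgpt-" + uuid.uuid4().hex
    # Assumption: percent-encoding keeps non-ASCII filenames from being garbled.
    safe_name = quote(filename)
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{safe_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    data_part = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="data"\r\n\r\n'
        f"{json.dumps(data)}\r\n"  # data is sent as serialized JSON
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return file_head + content + data_part, f"multipart/form-data; boundary={boundary}"
```

To send the upload, pass the returned body as data to urllib.request.Request with method="POST", together with the Authorization header and the returned Content-Type.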
data contains the new collectionId and the insertion results.
{
"code": 200,
"statusText": "",
"message": "",
"data": {
"collectionId": "65abc044e4704bac793fbd81",
"results": {
"insertLen": 1,
"overToken": [],
"repeat": [],
"error": []
}
}
}
Create an API Collection
Pass in a file ID to create a collection. File content will be read and split. Currently supports: pdf, docx, md, txt, html, csv.
When uploading via code, note that Chinese filenames need to be encoded to avoid garbled text.
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/create/apiCollection' \
--header 'Authorization: Bearer fastgpt-xxx' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "A Quick Guide to Building a Discord Bot.pdf",
"apiFileId":"A Quick Guide to Building a Discord Bot.pdf",
"datasetId": "674e9e479c3503c385495027",
"parentId": null,
"trainingType": "chunk",
"chunkSize":512,
"chunkSplitter":"",
"qaPrompt":""
}'
The parameters are sent as a JSON request body.
- name: Collection name, recommended to use filename, required.
- apiFileId: File ID, required.
- datasetId: Dataset ID (required)
- parentId: Parent ID. Defaults to the root directory if not provided
- trainingType: Training mode (required)
- chunkSize: Length of each chunk (optional). chunk mode: 100 to 3000; qa mode: 4000 up to the model's max tokens (for 16k models, usually no more than 10000 is recommended)
- chunkSplitter: Custom highest-priority separator (optional)
- qaPrompt: QA split custom prompt (optional)
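The chunkSize ranges listed above can be checked before a request is sent. In this sketch the name check_chunk_size is hypothetical. The model_max_tokens default of 16000 is also only a placeholder and should be replaced with the real QA model's limit. The softer "no more than 10000" recommendation for 16k models is not enforced.

```python
def check_chunk_size(training_type: str, chunk_size: int, model_max_tokens: int = 16000) -> None:
    """Raise ValueError if chunk_size falls outside the documented range for the mode."""
    if training_type == "chunk":
        low, high = 100, 3000
    elif training_type == "qa":
        # Upper bound is the QA model's max tokens; 16000 is a placeholder assumption.
        low, high = 4000, model_max_tokens
    else:
        raise ValueError(f"unknown trainingType: {training_type}")
    if not low <= chunk_size <= high:
        raise ValueError(f"chunkSize {chunk_size} is outside [{low}, {high}] for {training_type} mode")
```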
data contains the new collectionId and the insertion results.
{
"code": 200,
"statusText": "",
"message": "",
"data": {
"collectionId": "65abc044e4704bac793fbd81",
"results": {
"insertLen": 1,
"overToken": [],
"repeat": [],
"error": []
}
}
}
Create an External File Collection (Commercial)
curl --location --request POST 'http://localhost:3000/api/proApi/core/dataset/collection/create/externalFileUrl' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"externalFileUrl":"https://image.xxxxx.com/fastgpt-dev/%E6%91%82.pdf",
"externalFileId":"1111",
"createTime": "2024-05-01T00:00:00.000Z",
"filename":"custom-filename.pdf",
"datasetId":"6642d105a5e9d2b00255b27b",
"parentId": null,
"tags": ["tag1","tag2"],
"trainingType": "chunk",
"chunkSize":512,
"chunkSplitter":"",
"qaPrompt":""
}'
| Parameter | Description | Required |
|---|---|---|
| externalFileUrl | File access URL (can be temporary) | ✅ |
| externalFileId | External file ID | |
| filename | Custom filename with extension | |
| createTime | File creation time (Date or ISO string) | |
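createTime accepts an ISO string. The sketch below formats an aware datetime in the same shape as the example (UTC with milliseconds and a trailing Z) and assembles the external-file body. The names to_iso_z and build_external_file_body are hypothetical. Hard-coding trainingType "chunk" and parentId null simply mirrors the example request.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def to_iso_z(dt: datetime) -> str:
    """Format an aware datetime as UTC ISO 8601 with milliseconds, e.g. 2024-05-01T00:00:00.000Z."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_external_file_body(dataset_id: str, url: str, external_file_id: Optional[str] = None,
                             filename: Optional[str] = None, created: Optional[datetime] = None,
                             tags: Iterable[str] = ()) -> dict:
    """Assemble the externalFileUrl request body (hypothetical helper)."""
    body = {"externalFileUrl": url, "datasetId": dataset_id, "parentId": None, "trainingType": "chunk"}
    if external_file_id:
        body["externalFileId"] = external_file_id
    if filename:
        body["filename"] = filename  # should include the file extension
    if created is not None:
        body["createTime"] = to_iso_z(created)
    tag_list = list(tags)
    if tag_list:
        body["tags"] = tag_list
    return body
```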
data contains the new collectionId and the insertion results.
{
"code": 200,
"statusText": "",
"message": "",
"data": {
"collectionId": "6646fcedfabd823cdc6de746",
"results": {
"insertLen": 1,
"overToken": [],
"repeat": [],
"error": []
}
}
}
Get Collection List
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/listV2' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"offset":0,
"pageSize": 10,
"datasetId":"6593e137231a2be9c5603ba7",
"parentId": null,
"searchText":""
}'
- offset: Offset
- pageSize: Items per page, max 30 (optional)
- datasetId: Dataset ID (required)
- parentId: Parent ID (optional)
- searchText: Fuzzy search text (optional)
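Because pageSize is capped at 30, reading a large collection list takes several requests. This sketch walks the offset until it has seen the number of items reported by total. Both iter_pages and the fetch callable it takes are hypothetical: fetch stands for whatever code POSTs the body and returns the unwrapped data object.

```python
from typing import Callable, Iterator

MAX_PAGE_SIZE = 30  # documented maximum for pageSize


def iter_pages(fetch: Callable[[dict], dict], body: dict, page_size: int = MAX_PAGE_SIZE) -> Iterator[dict]:
    """Yield every item from an offset/pageSize endpoint whose data is {"list": [...], "total": N}."""
    page_size = max(1, min(page_size, MAX_PAGE_SIZE))
    offset = 0
    while True:
        data = fetch({**body, "offset": offset, "pageSize": page_size})
        items = data.get("list", [])
        yield from items
        offset += len(items)
        # Stop on an empty page or once every item reported by total has been seen.
        if not items or offset >= data.get("total", 0):
            break
```

Get Collection Data List returns the same list/total shape, so the same loop should apply there.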
{
"code": 200,
"statusText": "",
"message": "",
"data": {
"list": [
{
"_id": "6593e137231a2be9c5603ba9",
"parentId": null,
"tmbId": "65422be6aa44b7da77729ec9",
"type": "virtual",
"name": "Manual entry",
"updateTime": "2099-01-01T00:00:00.000Z",
"dataAmount": 3,
"trainingAmount": 0,
"externalFileId": "1111",
"tags": ["11", "test"],
"forbid": false,
"trainingType": "chunk",
"permission": {
"value": 4294967295,
"isOwner": true,
"hasManagePer": true,
"hasWritePer": true,
"hasReadPer": true
}
},
{
"_id": "65abd0ad9d1448617cba6031",
"parentId": null,
"tmbId": "65422be6aa44b7da77729ec9",
"type": "link",
"name": "快速上手 | FastGPT",
"rawLink": "https://doc.fastgpt.io/docs/course/quick-start/",
"updateTime": "2024-01-20T13:54:53.031Z",
"dataAmount": 3,
"trainingAmount": 0,
"externalFileId": "222",
"tags": ["test"],
"forbid": false,
"trainingType": "chunk",
"permission": {
"value": 4294967295,
"isOwner": true,
"hasManagePer": true,
"hasWritePer": true,
"hasReadPer": true
}
}
],
"total": 93
}
}
Get Collection Details
curl --location --request GET 'http://localhost:3000/api/core/dataset/collection/detail?id=65abcfab9d1448617cba5f0d' \
--header 'Authorization: Bearer {{authorization}}'
- id: Collection ID
{
"code": 200,
"statusText": "",
"message": "",
"data": {
"_id": "65abcfab9d1448617cba5f0d",
"parentId": null,
"teamId": "65422be6aa44b7da77729ec8",
"tmbId": "65422be6aa44b7da77729ec9",
"datasetId": {
"_id": "6593e137231a2be9c5603ba7",
"parentId": null,
"teamId": "65422be6aa44b7da77729ec8",
"tmbId": "65422be6aa44b7da77729ec9",
"type": "dataset",
"status": "active",
"avatar": "/icon/logo.svg",
"name": "FastGPT test",
"vectorModel": "text-embedding-ada-002",
"agentModel": "gpt-3.5-turbo-16k",
"intro": "",
"permission": "private",
"updateTime": "2024-01-02T10:11:03.084Z"
},
"type": "virtual",
"name": "Test training",
"trainingType": "qa",
"chunkSize": 8000,
"chunkSplitter": "",
"qaPrompt": "11",
"rawTextLength": 40466,
"hashRawText": "47270840614c0cc122b29daaddc09c2a48f0ec6e77093611ab12b69cba7fee12",
"createTime": "2024-01-20T13:50:35.838Z",
"updateTime": "2024-01-20T13:50:35.838Z",
"canWrite": true,
"sourceName": "Test training"
}
}
Update Collection Info
Update Collection Info by Collection ID
curl --location --request PUT 'http://localhost:3000/api/core/dataset/collection/update' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"id":"65abcfab9d1448617cba5f0d",
"parentId": null,
"name": "Test2222",
"tags": ["tag1", "tag2"],
"forbid": false,
"createTime": "2024-01-01T00:00:00.000Z"
}'
Update Collection Info by External File ID: replace id with datasetId and externalFileId.
curl --location --request PUT 'http://localhost:3000/api/core/dataset/collection/update' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"datasetId":"6593e137231a2be9c5603ba7",
"externalFileId":"1111",
"parentId": null,
"name": "Test2222",
"tags": ["tag1", "tag2"],
"forbid": false,
"createTime": "2024-01-01T00:00:00.000Z"
}'
- id: Collection ID (or pass datasetId and externalFileId instead)
- parentId: Update parent ID (optional)
- name: Update collection name (optional)
- tags: Update collection tags (optional)
- forbid: Update collection disabled status (optional)
- createTime: Update collection creation time (optional)
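This sketch chooses between the two addressing styles above: by id, or by datasetId plus externalFileId. It also rejects fields that are not in the updatable list. The name build_collection_update_body is hypothetical. Sending parentId=None is kept on purpose, on the assumption that it moves the collection to the root directory.

```python
from typing import Optional

UPDATABLE = {"parentId", "name", "tags", "forbid", "createTime"}


def build_collection_update_body(collection_id: Optional[str] = None, dataset_id: Optional[str] = None,
                                 external_file_id: Optional[str] = None, **changes) -> dict:
    """Assemble the body for PUT /api/core/dataset/collection/update (hypothetical helper)."""
    unknown = set(changes) - UPDATABLE
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    if collection_id:
        target = {"id": collection_id}
    elif dataset_id and external_file_id:
        target = {"datasetId": dataset_id, "externalFileId": external_file_id}
    else:
        raise ValueError("pass collection_id, or both dataset_id and external_file_id")
    # Only explicitly passed fields are sent, so parentId=None survives (assumed: move to root).
    return {**target, **changes}
```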
{
"code": 200,
"statusText": "",
"message": "",
"data": null
}
Delete a Collection
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/delete' \
--header 'Authorization: Bearer fastgpt-' \
--header 'Content-Type: application/json' \
--data-raw '{
"collectionIds": ["65a8cdcb0d70d3de0bf08d0a"]
}'
- collectionIds: Collection ID list
{
"code": 200,
"statusText": "",
"message": "",
"data": null
}
Data
Data Structure
| Field | Type | Description | Required |
|---|---|---|---|
| teamId | String | Team ID | ✅ |
| tmbId | String | Member ID | ✅ |
| datasetId | String | Dataset ID | ✅ |
| collectionId | String | Collection ID | ✅ |
| q | String | Primary data | ✅ |
| a | String | Auxiliary data | ✖ |
| fullTextToken | String | Tokenized text for full-text search | ✖ |
| indexes | Index[] | Vector indexes | ✅ |
| updateTime | Date | Update time | ✅ |
| chunkIndex | Number | Chunk index | ✖ |
Index Structure
Maximum 5 custom indexes per data group
| Field | Type | Description | Required |
|---|---|---|---|
| type | String | Index type: default (default index), custom (custom index), summary (summary index), question (question index), image (image index) | |
| dataId | String | Associated vector ID. Pass this ID when updating data for incremental updates instead of full updates | |
| text | String | Text content | ✅ |
If type is not provided, the index is treated as a custom index. A default index is also created from q/a, unless a default index is provided explicitly.
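The sketch below builds an indexes array that follows these rules: at most 5 custom indexes, and an optional explicit default index that stops one from being generated from q/a. The helper build_indexes is hypothetical. Dropping blank texts is an extra choice of this sketch and is not documented behavior.

```python
from typing import Optional

MAX_CUSTOM_INDEXES = 5


def build_indexes(custom_texts: list, default_text: Optional[str] = None) -> list:
    """Build an indexes array following the index rules above (hypothetical helper)."""
    texts = [t for t in custom_texts if t.strip()]  # skip blank texts
    if len(texts) > MAX_CUSTOM_INDEXES:
        raise ValueError(f"at most {MAX_CUSTOM_INDEXES} custom indexes per data item")
    indexes = [{"type": "custom", "text": t} for t in texts]
    if default_text is not None:
        # An explicit default index means no extra one is generated from q/a.
        indexes.insert(0, {"type": "default", "text": default_text})
    return indexes
```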
Batch Add Data to Collection
Note: Maximum 200 data groups per push.
curl --location --request POST 'http://localhost:3000/api/core/dataset/data/pushData' \
--header 'Authorization: Bearer apikey' \
--header 'Content-Type: application/json' \
--data-raw '{
"collectionId": "64663f451ba1676dbdef0499",
"trainingType": "chunk",
"prompt": "Optional. QA split guide prompt, ignored in chunk mode",
"billId": "Optional. If set, the data in this request is aggregated into one bill; the value can be reused. See [Create Training Order] to obtain it.",
"data": [
{
"q": "Who are you?",
"a": "I'm FastGPT Assistant"
},
{
"q": "What can you do?",
"a": "I can do anything",
"indexes": [
{
"text":"Custom index 1"
},
{
"text":"Custom index 2"
}
]
}
]
}'
- collectionId: Collection ID (required)
- trainingType: Training mode (required)
- prompt: Custom QA split prompt. It must strictly follow the template below; passing it is not recommended (optional)
- billId: Bill ID from Create Training Order; data pushed with the same billId is aggregated into one bill (optional)
- data: The data items
  - q: Primary data (required)
  - a: Auxiliary data (optional)
  - indexes: Custom indexes (optional). Can be omitted or passed as an empty array. By default, an index is created from q and a.
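Because a single push accepts at most 200 data items, larger imports have to be split. This sketch yields one pushData body per batch and reuses the same billId so that all batches land in one bill. The helper build_push_batches is hypothetical, and sending each body is left to any HTTP client.

```python
from typing import Iterator, Optional

MAX_PER_PUSH = 200


def build_push_batches(collection_id: str, items: list, training_type: str = "chunk",
                       bill_id: Optional[str] = None, batch_size: int = MAX_PER_PUSH) -> Iterator[dict]:
    """Yield pushData request bodies of at most batch_size items each."""
    if not 1 <= batch_size <= MAX_PER_PUSH:
        raise ValueError(f"batch_size must be between 1 and {MAX_PER_PUSH}")
    for start in range(0, len(items), batch_size):
        body = {"collectionId": collection_id, "trainingType": training_type,
                "data": items[start:start + batch_size]}
        if bill_id:
            body["billId"] = bill_id  # same billId on every batch -> one aggregated bill
        yield body
```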
{
"code": 200,
"statusText": "",
"data": {
"insertLen": 1, // Number of data items inserted successfully
"overToken": [], // Items exceeding the token limit
"repeat": [], // Duplicate items
"error": [] // Items that failed for other reasons
}
}
QA split prompt template. The [theme] placeholder can be replaced with the topic of the data. Its default value is: "It may contain multiple topics."
I will give you a text; [theme]. Learn it and organize what you have learned. Requirements:
1. Propose up to 25 questions.
2. Provide answers to each question.
3. Answers should be detailed and complete, and can include plain text, links, code, tables, formulas, media links, and other markdown elements.
4. Return multiple questions and answers in format:
Q1: Question.
A1: Answer.
Q2:
A2:
……
My text: """{{text}}"""
Get Collection Data List
curl --location --request POST 'http://localhost:3000/api/core/dataset/data/v2/list' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"offset": 0,
"pageSize": 10,
"collectionId":"65abd4ac9d1448617cba6171",
"searchText":""
}'
- offset: Offset (optional)
- pageSize: Items per page, max 30 (optional)
- collectionId: Collection ID (required)
- searchText: Fuzzy search term (optional)
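After every page of data items has been fetched, the items can be joined back into text. This sketch orders the items by chunkIndex and joins q, plus a when present. Two things are assumptions: that chunkIndex reflects the original order, and that all pages have already been collected. The helper reassemble_text is hypothetical.

```python
def reassemble_text(items: list) -> str:
    """Join q (and a, when present) of every data item in chunkIndex order."""
    ordered = sorted(items, key=lambda d: d.get("chunkIndex", 0))
    return "\n".join(d["q"] + ("\n" + d["a"] if d.get("a") else "") for d in ordered)
```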
{
"code": 200,
"statusText": "",
"message": "",
"data": {
"list": [
{
"_id": "65abd4b29d1448617cba61db",
"datasetId": "65abc9bd9d1448617cba5e6c",
"collectionId": "65abd4ac9d1448617cba6171",
"q": "N o . 2 0 2 2 1 2中 国 信 息 通 信 研 究 院京东探索研究院2022年 9月人工智能生成内容(AIGC)白皮书(2022 年)版权声明本白皮书版权属于中国信息通信研究院和京东探索研究院,并受法律保护。转载、摘编或利用其它方式使用本白皮书文字或观点的,应注明“来源:中国信息通信研究院和京东探索研究院”。违反上述声明者,编者将追究其相关法律责任。前 言习近平总书记曾指出,“数字技术正以新理念、新业态、新模式全面融入人类经济、政治、文化、社会、生态文明建设各领域和全过程”。在当前数字世界和物理世界加速融合的大背景下,人工智能生成内容(Artificial Intelligence Generated Content,简称 AIGC)正在悄然引导着一场深刻的变革,重塑甚至颠覆数字内容的生产方式和消费模式,将极大地丰富人们的数字生活,是未来全面迈向数字文明新时代不可或缺的支撑力量。",
"a": "",
"chunkIndex": 0
},
{
"_id": "65abd4b39d1448617cba624d",
"datasetId": "65abc9bd9d1448617cba5e6c",
"collectionId": "65abd4ac9d1448617cba6171",
"q": "本白皮书重点从 AIGC 技术、应用和治理等维度进行了阐述。在技术层面,梳理提出了 AIGC 技术体系,既涵盖了对现实世界各种内容的数字化呈现和增强,也包括了基于人工智能的自主内容创作。在应用层面,重点分析了 AIGC 在传媒、电商、影视等行业和场景的应用情况,探讨了以虚拟数字人、写作机器人等为代表的新业态和新应用。在治理层面,从政策监管、技术能力、企业应用等视角,分析了AIGC 所暴露出的版权纠纷、虚假信息传播等各种问题。最后,从政府、行业、企业、社会等层面,给出了 AIGC 发展和治理建议。由于人工智能仍处于飞速发展阶段,我们对 AIGC 的认识还有待进一步深化,白皮书中存在不足之处,敬请大家批评指正。目 录一、 人工智能生成内容的发展历程与概念.............................................................. 1(一)AIGC 历史沿革 .......................................................................................... 1(二)AIGC 的概念与内涵 .................................................................................. 4二、人工智能生成内容的技术体系及其演进方向.................................................... 7(一)AIGC 技术升级步入深化阶段 .................................................................. 7(二)AIGC 大模型架构潜力凸显 .................................................................... 10(三)AIGC 技术演化出三大前沿能力 ............................................................ 18三、人工智能生成内容的应用场景.......................................................................... 26(一)AIGC+传媒:人机协同生产,",
"a": "",
"chunkIndex": 1
}
],
"total": 63
}
}
Get Single Data Details
curl --location --request GET 'http://localhost:3000/api/core/dataset/data/detail?id=65abd4b29d1448617cba61db' \
--header 'Authorization: Bearer {{authorization}}'
- id: Data ID
{
"code": 200,
"statusText": "",
"message": "",
"data": {
"id": "65abd4b29d1448617cba61db",
"q": "N o . 2 0 2 2 1 2中 国 信 息 通 信 研 究 院京东探索研究院2022年 9月人工智能生成内容(AIGC)白皮书(2022 年)版权声明本白皮书版权属于中国信息通信研究院和京东探索研究院,并受法律保护。转载、摘编或利用其它方式使用本白皮书文字或观点的,应注明“来源:中国信息通信研究院和京东探索研究院”。违反上述声明者,编者将追究其相关法律责任。前 言习近平总书记曾指出,“数字技术正以新理念、新业态、新模式全面融入人类经济、政治、文化、社会、生态文明建设各领域和全过程”。在当前数字世界和物理世界加速融合的大背景下,人工智能生成内容(Artificial Intelligence Generated Content,简称 AIGC)正在悄然引导着一场深刻的变革,重塑甚至颠覆数字内容的生产方式和消费模式,将极大地丰富人们的数字生活,是未来全面迈向数字文明新时代不可或缺的支撑力量。",
"a": "",
"chunkIndex": 0,
"indexes": [
{
"type": "default",
"dataId": "3720083",
"text": "N o . 2 0 2 2 1 2中 国 信 息 通 信 研 究 院京东探索研究院2022年 9月人工智能生成内容(AIGC)白皮书(2022 年)版权声明本白皮书版权属于中国信息通信研究院和京东探索研究院,并受法律保护。转载、摘编或利用其它方式使用本白皮书文字或观点的,应注明“来源:中国信息通信研究院和京东探索研究院”。违反上述声明者,编者将追究其相关法律责任。前 言习近平总书记曾指出,“数字技术正以新理念、新业态、新模式全面融入人类经济、政治、文化、社会、生态文明建设各领域和全过程”。在当前数字世界和物理世界加速融合的大背景下,人工智能生成内容(Artificial Intelligence Generated Content,简称 AIGC)正在悄然引导着一场深刻的变革,重塑甚至颠覆数字内容的生产方式和消费模式,将极大地丰富人们的数字生活,是未来全面迈向数字文明新时代不可或缺的支撑力量。",
"_id": "65abd4b29d1448617cba61dc"
}
],
"datasetId": "65abc9bd9d1448617cba5e6c",
"collectionId": "65abd4ac9d1448617cba6171",
"sourceName": "中文-AIGC白皮书2022.pdf",
"sourceId": "65abd4ac9d1448617cba6166",
"isOwner": true,
"canWrite": true
}
}
Update Single Data
curl --location --request PUT 'http://localhost:3000/api/core/dataset/data/update' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"dataId":"65abd4b29d1448617cba61db",
"q":"Test 111",
"a":"sss",
"indexes":[
{
"dataId": "xxxx",
"type": "default",
"text": "Default index"
},
{
"dataId": "xxx",
"type": "custom",
"text": "Old custom index 1"
},
{
"type":"custom",
"text":"New custom index"
}
]
}'
- dataId: Data ID
- q: Primary data (optional)
- a: Auxiliary data (optional)
- indexes: Custom indexes (optional). See Batch Add Data to Collection for the index types. If custom indexes exist when created,
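The index structure says that passing an index's dataId allows an incremental update instead of a full one. This sketch applies one possible policy to custom indexes only: reuse the dataId when a custom index keeps its exact text, and send brand-new texts without a dataId. The helper build_update_indexes is hypothetical. The matching-by-text rule is an assumption, and it does not touch default indexes.

```python
def build_update_indexes(existing: list, new_texts: list) -> list:
    """Reuse dataId for unchanged custom index texts; new texts are sent without one."""
    by_text = {idx["text"]: idx["dataId"] for idx in existing
               if idx.get("type") == "custom" and idx.get("dataId")}
    result = []
    for text in new_texts:
        entry = {"type": "custom", "text": text}
        if text in by_text:
            entry["dataId"] = by_text[text]  # existing vector: update incrementally
        result.append(entry)
    return result
```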
{
"code": 200,
"statusText": "",
"message": "",
"data": null
}
Delete Single Data
curl --location --request DELETE 'http://localhost:3000/api/core/dataset/data/delete?id=65abd4b39d1448617cba624d' \
--header 'Authorization: Bearer {{authorization}}'
- id: Data ID
{
"code": 200,
"statusText": "",
"message": "",
"data": "success"
}
Search Test
curl --location --request POST 'http://localhost:3000/api/core/dataset/searchTest' \
--header 'Authorization: Bearer fastgpt-xxxxx' \
--header 'Content-Type: application/json' \
--data-raw '{
"datasetId": "Dataset ID",
"text": "Who is the director",
"limit": 5000,
"similarity": 0,
"searchMode": "embedding",
"usingReRank": false,
"datasetSearchUsingExtensionQuery": true,
"datasetSearchExtensionModel": "gpt-5",
"datasetSearchExtensionBg": ""
}'
- datasetId - Dataset ID
- text - Text to test
- limit - Maximum total tokens of the returned results
- similarity - Minimum similarity (0~1, optional)
- searchMode - Search mode: embedding | fullTextRecall | mixedRecall
- usingReRank - Use rerank
- datasetSearchUsingExtensionQuery - Use query extension
- datasetSearchExtensionModel - Query extension model
- datasetSearchExtensionBg - Query extension background description
Returns the top-ranked results. limit caps the total tokens of the returned results and can be at most 20000.
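This sketch checks the core search-test parameters against the documented limits: the three search modes, similarity between 0 and 1, and limit up to 20000 tokens. It then filters the returned items by score on the client side. The names build_search_body and filter_by_score are hypothetical. The filter assumes score is a single number, as in the sample response, and it omits the query-extension fields for brevity.

```python
SEARCH_MODES = {"embedding", "fullTextRecall", "mixedRecall"}
MAX_LIMIT = 20000


def build_search_body(dataset_id: str, text: str, limit: int = 5000, similarity: float = 0.0,
                      search_mode: str = "embedding", using_rerank: bool = False) -> dict:
    """Assemble a searchTest body after checking the documented ranges (hypothetical helper)."""
    if search_mode not in SEARCH_MODES:
        raise ValueError(f"searchMode must be one of {sorted(SEARCH_MODES)}")
    if not 0 <= similarity <= 1:
        raise ValueError("similarity must be between 0 and 1")
    if not 0 < limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT} tokens")
    return {"datasetId": dataset_id, "text": text, "limit": limit, "similarity": similarity,
            "searchMode": search_mode, "usingReRank": using_rerank}


def filter_by_score(results: list, min_score: float) -> list:
    """Keep results whose numeric score is at least min_score, best first."""
    kept = [r for r in results if isinstance(r.get("score"), (int, float)) and r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)
```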
{
"code": 200,
"statusText": "",
"data": [
{
"id": "65599c54a5c814fb803363cb",
"q": "Who are you?",
"a": "I'm FastGPT Assistant",
"datasetId": "6554684f7f9ed18a39a4d15c",
"collectionId": "6556cd795e4b663e770bb66d",
"sourceName": "GBT 15104-2021 装饰单板贴面人造板.pdf",
"sourceId": "6556cd775e4b663e770bb65c",
"score": 0.8050316572189331
},
......
]
}

