Day 3

Words and phrases for Day 3

Word Example
intimidate He tries to intimidate his rivals.
reclaim He returned three years later to reclaim his title as director of advertising.
discern I can discern no difference between the two policies.
saturate Japan’s electronics industry began to saturate the world markets.
alleviate He couldn’t prevent her pain, only alleviate it.
postulate The chapter was then allowed to postulate the bishop of Bath.
aggravate Military action would only aggravate the situation.
repel Like poles repel and unlike poles attract.
mobilize He used the press to mobilize support for his party.
inaugurate We inaugurate our new plant tomorrow.
immerse Immerse the paper in water for twenty minutes.
  • get the better of
  • boast of
  • take the floor
  • last but one
  • in no case
  • center one’s attention on
  • round the clock
  • commit one’s idea to writing

Listening for Day 3

  • Hi there. I’ve come to see the flat. My name is Mark Atoms, we spoke on the phone on Wednesday?
  • Hi Mark, come on up. I’ll buzz you in. Green door on the second floor on the right side.
    Nice to meet you. I’ve spoken to all your references, and they all checked out OK. So let me show you around. The place actually belongs to my mother, but her health isn’t great, so we finally managed to persuade her to move in with us and rent this old place out.
  • It’s a great size, plenty of space, very versatile. I think it’s a winner for us.
  • Yes! All the appliances are brand new. There is a washing machine and a tumble dryer in the utility room next to the kitchen.
  • Lots of closet space too, which is fabulous! My wife has a ridiculous number of shoes. Now, the big question: what about the noise and the neighbors?
  • Well, all the neighbors are elderly, so no noisy kids, and the back of the house overlooks a clear and peaceful pond. So it’s perfect if tranquility is what you are looking for.
  • That’s good news! We’ve been living in a less-than-glamorous part of Aberdeen, constantly harassed day and night by noisy neighbors. Getting to work was a nightmare too, as we only have one car and my wife has to use it, since she works nights at a hospital.
  • Well, if you like the place, it’s yours as soon as I get the contract drawn up with the solicitor. The first month’s rent and a deposit are mandatory on signing the contract. Then we can work out when is the best day for you to pay rent each month.
  • We’d be incredibly happy to be your new tenants. Thank you so much. My wife will be thrilled to get out of the shabby place we are now in and start filling those wardrobes with all those shoes.

Reading for Day 3

The now extinct passenger pigeon has the dubious honor of being the last species anyone ever expected to disappear. At one point, there were more passenger pigeons than any other species of bird. Rough estimates of their population went as high as five billion and they accounted for around 40 percent of the total indigenous bird population of North America in the early 19th century.

Despite their huge population, passenger pigeons were vulnerable to human intrusion into their nesting territory. Their nests were shabby things, and two weeks after the eggs hatched, the parent pigeons would abandon their offspring, leaving them to take care of themselves. People discovered that these baby pigeons were really tasty, and the adult birds were also quite edible. First the Native Americans and then the transplanted Europeans came to consider the birds a great delicacy.

By the 1850s, commercial trapping of passenger pigeons was proceeding at an unprecedented pace. Hundreds of thousands of the birds were being harvested every day to be made into popular pigeon pies. In addition, large tracts of the pigeons’ nesting territory were being cleared away for planting crops and creating pasture land. As numerous as the passenger pigeons were, they were not an infinite resource. By the 1880s, it was noticed that the bird population had become seriously depleted. The last passenger pigeons killed in the wild were shot in 1899.

Eventually those billions and billions of birds shrank to a single remaining specimen, a passenger pigeon named Martha, who died on September 1, 1914, in captivity at the Cincinnati Zoo. In addition to being the end of an era, it was also the first time humans were able to exactly time the extinction of a species.

Deep Learning for Day 3

  • Create milvusDB
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

# Connect to a local Milvus instance.
connections.connect("default", host="localhost", port="19530", user='root', password='Milvus')
collection_name = 'qiye_gpt'
# Drop any existing collection so the schema below starts from a clean state.
if utility.has_collection(collection_name):
    utility.drop_collection(collection_name)
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    # dim=1024 matches the output size of bge_large_zh_v1.5.
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1024),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=8000)
]
schema = CollectionSchema(fields, collection_name)
collection = Collection(collection_name, schema)
# IVF_FLAT index with inner-product (IP) similarity on the embedding field.
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "IP",
    "params": {"nlist": 1024},
}
collection.create_index(field_name="embedding", index_params=index_params)
collection.load()
print("Successfully created collection:", collection_name)
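The index above uses the inner-product metric ("IP"), and the embedding script in these notes L2-normalizes every vector before it is stored. A minimal sketch of why that pairing is reasonable: for unit-length vectors the inner product equals the cosine similarity. The helper names and sample vectors below are illustrative assumptions, not part of the project.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return inner_product(a, b) / (norm_a * norm_b)


# Hypothetical 2-dimensional "embeddings" for illustration only.
a = [3.0, 4.0]
b = [4.0, 3.0]

ip_of_normalized = inner_product(l2_normalize(a), l2_normalize(b))
cosine = cosine_similarity(a, b)
print(ip_of_normalized, cosine)  # both are approximately 0.96
```

Because the two values agree, ranking by "IP" on normalized vectors gives the same order as ranking by cosine similarity.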
  • Extract embedding
import torch
from transformers import AutoTokenizer, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained('D:/qiyegpt/knowledge_augment/model/bge_large_zh_v1.5', cache_dir='./model')
model = AutoModel.from_pretrained('D:/qiyegpt/knowledge_augment/model/bge_large_zh_v1.5', cache_dir='./model').to(device)


def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum only the non-padding tokens, then divide by how many there are.
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


def get_embedding(sentences, is_insert, batch_size=10):
    if is_insert:
        # Insert path: `sentences` is a list; embed it batch by batch.
        res_embeddings = []
        for i in range(0, len(sentences), batch_size):
            batch = sentences[i:i + batch_size]
            encoded_input = tokenizer(batch, padding=True, truncation=True,
                                      max_length=512, return_tensors='pt').to(device)
            with torch.no_grad():
                model_output = model(**encoded_input)
            # Perform pooling. In this case, mean pooling.
            sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
            # L2-normalize so that inner product behaves like cosine similarity.
            sentence_embeddings_norm = sentence_embeddings / sentence_embeddings.norm(p=2, dim=-1, keepdim=True)
            sentence_embeddings_norm = sentence_embeddings_norm.cpu().detach().numpy().tolist()
            res_embeddings.extend(sentence_embeddings_norm)
        return res_embeddings
    else:
        # Query path: `sentences` is a single string; return one flat vector.
        encoded_input = tokenizer(sentences, padding=True, truncation=True,
                                  max_length=512, return_tensors='pt').to(device)
        with torch.no_grad():
            model_output = model(**encoded_input)
        # Perform pooling. In this case, mean pooling.
        sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
        sentence_embeddings_norm = sentence_embeddings / sentence_embeddings.norm(p=2, dim=-1, keepdim=True)
        sentence_embeddings_norm = sentence_embeddings_norm.squeeze(0)
        sentence_embeddings_norm = sentence_embeddings_norm.cpu().detach().numpy().tolist()
        return sentence_embeddings_norm
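To make the masking in mean_pooling concrete, here is a small dependency-free sketch of the same idea for a single sentence: padding positions (mask 0) are left out of both the sum and the count. The function name and the toy numbers are assumptions for illustration; the real code works on batched torch tensors.

```python
def masked_mean_pooling(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for k in range(dim):
                summed[k] += vec[k]
    # Same guard as torch.clamp(..., min=1e-9): avoid dividing by zero.
    denom = max(count, 1e-9)
    return [s / denom for s in summed]


# Two real tokens followed by one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = masked_mean_pooling(tokens, mask)
print(pooled)  # [2.0, 3.0]: the padding vector does not affect the mean
```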
  • Insert_db
from pymilvus import connections, Collection
# from pdf_load import extract_text_from_pdf
from extract_embedding import get_embedding
import json
import os

def get_all_files_in_folder(folder_path):
    # Recursively collect the path of every file under folder_path.
    file_paths = []

    for root, dirs, files in os.walk(folder_path):
        for file in files:
            file_paths.append(os.path.join(root, file))

    return file_paths

folder_path = "D:/qiyegpt/knowledge_augment/data"
file_paths = get_all_files_in_folder(folder_path)


connections.connect("default", host="localhost", port="19530", user='root', password='Milvus')
collection_name = 'qiye_gpt'
collection = Collection(collection_name)

# def InsertDb_pdf(filepath):
#     docs = extract_text_from_pdf(filepath)
#     #sentences = [i.page_content for i in docs]
#     embeddings = get_embedding(docs, is_insert=True, batch_size=10)
#     mr = collection.insert([embeddings,docs])
#     collection.flush()
#     print('Extracted features and stored them in the AI database', mr.succ_count)


# def Insert_json(filepath):
#     with open(filepath, "r", encoding="utf-8") as f:
#         data = json.load(f)
#     temp = []
#     for i in data:
#         if len(i["content"]) != 0:
#             temp.append(i["name"] + ":" + i["content"])
#         for j in i["subsections"]:
#             if j["name"] in i["name"]:
#                 temp.append(j["name"] + ":" + j["content"])
#             else:
#                 temp.append(i["name"] + j["name"] + ":" + j["content"])
#     embeddings = get_embedding(temp, is_insert=True, batch_size=10)
#     mr = collection.insert([embeddings, temp])
#     collection.flush()
#     print('Extracted features and stored them in the AI database', mr.succ_count)


def Insert_json(filepath):
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)

    name = []
    content = []
    for i in data:
        name.append(i["name"])
        # Truncate long entries to 1500 characters so they fit the text field.
        if len(i["content"]) > 1500:
            content.append((i["name"] + ":" + i["content"][:1500]))
        else:
            content.append(i["name"] + ":" + i["content"])
    # Only the entry names are embedded; the stored text is "name:content".
    embeddings = get_embedding(name, is_insert=True, batch_size=10)
    mr = collection.insert([embeddings, content])
    collection.flush()
    print('Extracted features and stored them in the AI database', mr.succ_count)


# InsertDb_pdf('中科大不完全入学指南.pdf')
for path in file_paths:
    Insert_json(path)

print('Finished reading')
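The row-building step inside Insert_json can be tried without Milvus or the embedding model. The sketch below isolates it as a pure function; build_rows and the sample records are hypothetical names and data used only for illustration. Slicing with [:max_chars] covers both branches of the original if/else, since a slice longer than the string simply returns the whole string.

```python
MAX_CONTENT_CHARS = 1500  # same cut-off as Insert_json


def build_rows(records, max_chars=MAX_CONTENT_CHARS):
    """Return (names, texts): names are embedded, texts are stored."""
    names, texts = [], []
    for rec in records:
        names.append(rec["name"])
        # A slice past the end of a short string returns it unchanged.
        texts.append(rec["name"] + ":" + rec["content"][:max_chars])
    return names, texts


# Hypothetical records in the same shape as the JSON files (name + content).
records = [
    {"name": "Library", "content": "Open 8:00 to 22:00."},
    {"name": "Dorms", "content": "x" * 2000},
]

names, texts = build_rows(records)
print(names)
print(len(texts[1]))  # len("Dorms:") + 1500 characters
```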
  • newapp
from transformers import AutoModel, AutoTokenizer
import streamlit as st
from streamlit_chat import message
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility
from search_db import GeneratePrompt
import torch

connections.connect("default", host="localhost", port="19530", user='root', password='Milvus')
collection_name = 'qiye_gpt'
collection = Collection(collection_name)

st.set_page_config(
    page_title="Enterprise GPT Demo",
    page_icon=":robot:",
    layout='wide'
)

def alter_format(string):
    # Replace each half-width (banjiao) character with the full-width
    # (quanjiao) character at the same index.
    quanjiao=[ '`', '”', '’', '“', '‘', '_', '-',
               '~', '=', '+', '|', '(', ')', '[', ']', '【', '】', '{', '}', '<', '>', ',', ';', '!', '^', '%',
               '#', '@', '$', '&', '?', '*']
    banjiao=['`', '"', "'", '"', "'", '_', '-',
             '~', '=', '+', '|', '(', ')', '[', ']', '[', ']', '{', '}', '<', '>', ',', ';', '!', '^', '%',
             '#', '@', '$', '&', '?', '*']
    for i in range(len(string)):
        for j in range(len(banjiao)):
            if string[i]==banjiao[j]:
                string = string.replace(string[i], quanjiao[j], 1)
                break
    return string

@st.cache_resource
def get_model():
    # Load ChatGLM2-6B once and cache it across Streamlit reruns.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(
        "D:/qiyegpt/knowledge_augment/model/chatglm2-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained(
        "D:/qiyegpt/knowledge_augment/model/chatglm2-6b", trust_remote_code=True) \
        .half().quantize(4).to(device)
    model = model.eval()
    return tokenizer, model


def new_click():
    # Reset the conversation state.
    st.session_state['show_history'] = []
    st.session_state['input_history'] = []
    st.session_state['past_key_values'] = None


tokenizer, model = get_model()

st.title("Enterprise-GPT")

max_length = st.sidebar.slider(
    'max_length', 0, 32768, 8192, step=1
)
# top_p = st.sidebar.slider(
#     'top_p', 0.0, 1.0, 0.8, step=0.01
# )
# top_k is the number of passages retrieved from Milvus, not a sampling
# parameter; the minimum is 1 because a search limit of 0 is not valid.
top_k = st.sidebar.slider(
    'top_k', 1, 5, 1, step=1
)
temperature = st.sidebar.slider(
    'temperature', 0.0, 1.0, 0.8, step=0.01
)

if 'show_history' not in st.session_state:
    st.session_state.show_history = []

if 'input_history' not in st.session_state:
    st.session_state.input_history = []

if 'past_key_values' not in st.session_state:
    st.session_state.past_key_values = None

button2 = st.button("Start a new conversation", key="delete")
if button2:
    new_click()


# Replay the conversation so far.
for i, (query, response) in enumerate(st.session_state.show_history):
    with st.chat_message(name="user", avatar="user"):
        st.markdown(query)
    with st.chat_message(name="assistant", avatar="assistant"):
        st.markdown(response)

with st.chat_message(name="user", avatar="user"):
    input_placeholder = st.empty()
with st.chat_message(name="assistant", avatar="assistant"):
    message_placeholder = st.empty()

prompt_text = st.text_area(label="User input",
                           height=100,
                           placeholder="Enter your question here")



button1 = st.button("Send", key="predict")

if button1:
    input_placeholder.markdown(prompt_text)
    input_history, show_history, past_key_values = st.session_state.input_history, st.session_state.show_history,\
        st.session_state.past_key_values
    # Only the first turn is augmented with retrieved context.
    if len(show_history) == 0:
        query = GeneratePrompt(prompt_text, top_k)
        print(query)
    else:
        query = prompt_text
        print(query)
    i = 0
    top_p = 0.8
    # for response, input_history, past_key_values in model.stream_chat(tokenizer, query, input_history,
    #                                                                  past_key_values=past_key_values,
    #                                                                  max_length=max_length, top_p=top_p,
    #                                                                  temperature=temperature,
    #                                                                  return_past_key_values=True):
    out = model.stream_chat(tokenizer, query, input_history,
                            past_key_values=past_key_values,
                            max_length=max_length, top_p=top_p,
                            temperature=temperature,
                            return_past_key_values=True)

    # Stream the partial response into the assistant placeholder.
    for response, input_history, past_key_values in out:
        response = alter_format(response)
        message_placeholder.markdown(response)
        i += 1
        # print(type(past_key_values))
        # print(past_key_values)
    st.session_state.input_history = input_history
    st.session_state.show_history = show_history + [(prompt_text, input_history[-1][1])]
    st.session_state.past_key_values = past_key_values
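The character replacement in alter_format could also be done in a single pass with str.translate. The sketch below is only an illustration of that approach: the four-entry mapping is an assumed subset chosen for the example, not the app's full table, and alter_format_translate is a hypothetical name.

```python
# Illustrative subset only: this mapping is an assumption, not the app's full table.
HALF_TO_FULL = {
    '"': '”',
    "'": '’',
    '[': '【',
    ']': '】',
}
TRANSLATION_TABLE = str.maketrans(HALF_TO_FULL)


def alter_format_translate(text: str) -> str:
    """Replace every mapped half-width character in a single pass."""
    return text.translate(TRANSLATION_TABLE)


print(alter_format_translate('say "hi" [now]'))  # say ”hi” 【now】
```

A dictionary allows only one target per source character, so wherever the two parallel lists repeat a half-width character, the intended target has to be chosen explicitly.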
  • Requirements
cpm_kernels==1.0.11
pandas==2.0.3
pymilvus==2.2.14
sentencepiece==0.1.99
streamlit==1.25.0
streamlit_chat==0.1.1
torch==2.0.1+cu117
transformers==4.31.0
  • Search Db
from extract_embedding import get_embedding
from pymilvus import connections, Collection

connections.connect("default", host="localhost", port="19530", user='root', password='Milvus')
collection_name = 'qiye_gpt'
collection = Collection(collection_name)

def SearchDb(query_str:str, topk=5):
    # Given a question, return the topk most similar answers from the AI database.
    collection.load()
    search_params = {"metric_type": "IP", "params": {"nprobe": 64}}
    embedding = get_embedding(query_str, is_insert=False)
    results = collection.search(
        data=[embedding],
        anns_field="embedding",
        param=search_params,
        limit=topk,
        output_fields=["text"]
    )
    res = []
    for hits in results:
        for hit in hits:
            score = round(hit.distance, 3)
            # Keep only hits whose inner-product score is at least 0.2.
            if score >= 0.2:
                res.append({
                    "score": score,
                    "text": hit.entity.get('text')
                })
    print("Successfully searched similar texts!")
    return res

def GeneratePrompt(query: str, topk):
    # Context-based prompt template; be sure to keep "{question}" and "{context}".
    related_docs = SearchDb(query, topk)
    if len(related_docs)>0:
        PROMPT_TEMPLATE = """Known information:
{context}

The items above are unrelated to one another. Answer the user's question based on the known information above. If the information above does not cover the question, refuse to answer. The question is: {question}"""
        # context = "\n".join([doc['text'] for doc in related_docs])
        # prompt = PROMPT_TEMPLATE.replace("{question}", query).replace("{context}", context)
        context = ""
        for i, doc in enumerate(related_docs):
            context += "\nInfo " + str(i + 1) + ": " + doc["text"]
        prompt = PROMPT_TEMPLATE.replace("{question}", query).replace("{context}", context)

    else:
        # Nothing relevant was retrieved: pass the question through unchanged.
        prompt = query
    print(prompt)
    return prompt
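The filtering and prompt-assembly logic can be exercised without a running Milvus server. The sketch below reproduces the two steps on hand-written hits: drop anything scoring below 0.2, then number the surviving passages into the template, falling back to the bare question when nothing survives. The function names, the short template, and the sample hits are assumptions for illustration only.

```python
SCORE_THRESHOLD = 0.2  # same cut-off as SearchDb


def filter_hits(hits, threshold=SCORE_THRESHOLD):
    """Keep only hits whose similarity score reaches the threshold."""
    return [h for h in hits if h["score"] >= threshold]


def build_prompt(question, docs, template):
    """Fill the template with numbered passages, or return the bare question."""
    if not docs:
        return question
    context = "".join(
        f"\nInfo {i}: {d['text']}" for i, d in enumerate(docs, start=1)
    )
    return template.replace("{question}", question).replace("{context}", context)


# Hypothetical template and search results.
template = "Known information:{context}\nQuestion: {question}"
hits = [
    {"score": 0.9, "text": "The library opens at 8:00."},
    {"score": 0.1, "text": "Unrelated passage."},
]

docs = filter_hits(hits)
prompt = build_prompt("When does the library open?", docs, template)
print(prompt)
```

Returning the unmodified question when no passage passes the threshold mirrors the else branch of GeneratePrompt.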

Day 4

Words and phrases for Day 4

Word Example
manipulate He manipulates people.
contend It is time, once again, to contend with racism.
refrain Mrs Hardie refrained from making any comment.
mediate My mom was the one who mediated between Zelda and her mom.
conform Conform to sth.
  • keep company with
  • compare… with…
  • compare… to…
  • by comparison
  • comply with
  • concentrate on
  • at the conclusion of
  • condemn sb. to
  • in that
  • now that
  • for all that
  • confess to
  • confide in
  • in confidence
  • confine to
  • be confronted with
  • consent to

Listening for Day 4

A new study has found a positive correlation between how much television children watch and their parents’ stress levels. Why? Because the more television children watch, the more they are exposed to advertising. The more advertisements they see, the more likely they are to insist on purchasing items when they go with their parents to stores. This could generate conflict if the parents refuse. All that, researchers say, can contribute to parents’ overall stress levels. What’s the solution? Perhaps the most obvious is curtailing screen time. Commercial content is there for a reason: to elicit purchasing behavior. So parents might want to shut off the TV. Researchers concede that this is easier said than done. So they suggest another option. Parents can change how they talk to their kids about purchases. The researchers suggest that parents seek input from their children on family purchasing decisions. They shouldn’t try to control all purchases. Instead, parents might tell their children things like “I will listen to your advice on certain products or brands.” This type of communication, the researchers assert, can lead to children making fewer purchasing demands. That means less parent stress. However, the protective effect of this kind of communication diminishes with greater exposure to television. This is because advertising aimed at children is especially persuasive. Advertisers use an assortment of techniques, such as bright colors, happy music and celebrity endorsement, to appeal to children. Plus, children don’t have the cognitive ability to fully understand advertising’s intent. That makes them particularly vulnerable to advertisements.

Reading for Day 4

  • sought-after
  • keep up
  • constraint
  • enrollment
  • bill
  • university admission
  • amended
  • admission purpose
  • well-rounded
  • correlate
  • mutual respect
  • resume

Linear Algebra for Day 4

Day 5

Words and phrases for Day 5

Word Example

Listening for Day 5

Reading for Day 5

Deep Learning for Day 5