Article Details

資訊電子學刊

Title: 以文本挖掘為基礎之企業風險評分模型的研究
Volume/Issue: 10:1
Parallel Title: THE STUDY OF CORPORATE RISK SCORING MODEL BASED ON TEXT MINING
Authors: 王立天, 顏秀珍, 王磊, 李御璽
Pages: 133-148
Keywords: Text Mining; Noise Filtering; Classification Model; Risk Words
Publication Date: 2022-07

Chinese Abstract

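A bank's main source of income is the interest it charges on loans to its customers, who fall into two groups: companies and individuals. Companies borrow very large amounts, so if a company goes bankrupt or its funding chain breaks and it cannot repay, the bank's losses can be severe. In the past, banks therefore relied on senior analysts to assess a company's condition and judge its future prospects to support lending decisions. In the big-data era, banks have built risk scoring systems from corporate data. These systems analyze historical data, such as a company's past operating performance, to judge its future prospects, helping the bank make lending decisions and avoid losses when a company runs into trouble and cannot repay. Existing methods for forecasting a company's trajectory mostly use its financial statements from the past few quarters or information the company itself publishes, and both data sources have drawbacks. Financial statements are not timely, and unexpected events can easily arise in the course of business; a scoring system built on financial statements cannot anticipate them. Information published by the company carries its own subjective slant, amplifying its strengths and downplaying its weaknesses, so it is not objective enough. This study uses news from the past two years as its data source. The news comes from major media outlets, forums, and magazines, and offers timeliness, a large volume of data, and relative objectivity. Using text mining, the unstructured news text is first segmented into words and converted into structured data, and the work then proceeds in two main stages: noise removal and risk scoring. In the noise removal stage, whether a news article is related to corporate risk serves as the target attribute. A weighting method computes the weight of each risk word in each article to build feature vectors, classification models are trained and used for prediction, and performance analysis identifies a good noise removal model, which is then used to filter the dataset. On the filtered dataset, whether the company is at risk serves as the target attribute; weights are computed again to build new feature vectors, classification models are trained and used for prediction, and a final performance analysis of risk scoring yields a corporate risk scoring model. Training on the noise-filtered dataset greatly improves the model's predictive ability, thereby improving the risk scoring model and helping banks avoid risk.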
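The abstract does not say which weighting method scores the risk words. As one common choice, the sketch below assumes plain TF-IDF restricted to a risk-word vocabulary; the function name `tfidf_vectors`, the word list `RISK_WORDS`, and the sample articles are hypothetical. Each segmented article becomes one feature vector with one weight per risk word, which a classification model can then consume.

```python
"""TF-IDF weights for risk words, one feature vector per news article.

Assumptions (not from the paper): the weighting scheme is plain TF-IDF,
and the risk-word vocabulary and sample documents below are hypothetical.
"""
import math


def tfidf_vectors(docs: list[list[str]], vocab: list[str]) -> list[list[float]]:
    """Return one vector per document with a TF-IDF weight for each vocab term.

    tf  = count of term in doc / number of tokens in doc
    idf = log(N / df), where df = number of docs containing the term
    """
    n_docs = len(docs)
    # Document frequency of each risk word across the whole corpus.
    df = {term: sum(1 for doc in docs if term in doc) for term in vocab}
    vectors = []
    for doc in docs:
        length = len(doc) or 1  # guard against empty documents
        row = []
        for term in vocab:
            tf = doc.count(term) / length
            idf = math.log(n_docs / df[term]) if df[term] else 0.0
            row.append(tf * idf)
        vectors.append(row)
    return vectors


RISK_WORDS = ["破產", "違約", "虧損"]  # hypothetical risk-word list

if __name__ == "__main__":
    # Hypothetical, already-segmented news articles.
    docs = [
        ["企業", "破產", "違約"],
        ["企業", "獲利", "成長"],
        ["企業", "虧損", "破產", "破產"],
    ]
    for vec in tfidf_vectors(docs, RISK_WORDS):
        print([round(w, 3) for w in vec])
```

A risk word that appears in every article gets an IDF of zero and contributes nothing to the vectors, while words concentrated in a few articles are weighted up.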

English Abstract

A bank's main source of income is the interest it charges on loans. Its customers are companies and individuals, and companies borrow very large amounts: if a company goes bankrupt or its funding chain breaks, the bank can suffer heavy losses. In the past, banks therefore engaged senior analysts to assess a company's condition and judge its future prospects to support lending decisions. In the big-data era, banks have used corporate data to build risk scoring systems, which analyze historical data such as a company's past operating performance to judge its future prospects, supporting lending decisions and helping the bank avoid losses when a company runs into trouble and cannot repay. Existing methods for forecasting a company's trajectory mostly rely on its financial statements from the past few quarters or on information the company publishes, and both data sources have drawbacks. Financial statements are not timely, and unexpected events can easily arise in the course of business, which a scoring system built on financial statements cannot anticipate. Information released by the company reflects its own subjective view, amplifying its strengths and downplaying its weaknesses, so it is not objective enough. This study uses news from the past two years as its data source. The news comes from major media outlets, forums, and magazines, and offers strong timeliness, a large volume of data, and greater objectivity. Using text mining, the unstructured news text is segmented into words and converted into structured data, after which noise removal and risk scoring are carried out.
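Chinese news text has no spaces between words, so segmentation must come before any weighting. The paper does not name its segmentation tool; the sketch below assumes a simple dictionary-based forward maximum matching (FMM) segmenter, with a hypothetical function `fmm_segment` and a hypothetical toy lexicon `LEXICON`. Counting the resulting tokens turns each article into structured term-frequency data.

```python
"""Dictionary-based forward maximum matching (FMM) word segmentation.

Hypothetical sketch: the paper does not name its segmentation tool; the
lexicon below is a toy example, not the authors' risk-word dictionary.
"""
from collections import Counter


def fmm_segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedily take the longest lexicon word starting at each position.

    Characters not covered by any lexicon word are emitted one at a time.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy lexicon (hypothetical) containing a few risk-related terms.
LEXICON = {"企業", "資金鏈", "斷裂", "破產", "貸款", "違約"}

if __name__ == "__main__":
    news = "企業資金鏈斷裂"
    tokens = fmm_segment(news, LEXICON)
    print(tokens)           # ['企業', '資金鏈', '斷裂']
    print(Counter(tokens))  # term counts: the "structured" form of the text
```

FMM is greedy and can mis-split ambiguous strings; statistical segmenters are a common alternative, but either way the output is a token list per article that downstream weighting can use.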
In the noise removal stage, whether a news article is related to corporate risk is the target attribute. A weighting method computes the weight of the risk words in each article to build feature vectors, classification models are trained and used for prediction, and performance analysis selects a good noise removal model, which is then used to filter the dataset. The filtered dataset takes whether the company is at risk as the target attribute; weights are computed again to build new feature vectors, classification models are trained and used for prediction, and a final performance analysis of risk scoring yields the corporate risk scoring model. Training on the noise-filtered dataset greatly improves the model's predictive ability, which improves the risk scoring model and helps banks avoid risk.
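The two stages can be sketched end to end under several assumptions that the abstract does not confirm: both stages use a binary multinomial Naive Bayes classifier over token lists (rather than the weighted feature vectors the abstract describes), the "performance analysis" is modeled as picking the smoothing value `alpha` with the best F1 score, and all class names, functions, and data are hypothetical. Stage 1 keeps only articles predicted to be risk-related; stage 2 is trained on that filtered set, and its class-1 probability serves as a risk score.

```python
"""Two-stage sketch: noise filtering, then company-risk scoring.

Assumptions (not from the paper): multinomial Naive Bayes in both stages,
model selection by F1 over a few smoothing values, toy hypothetical data.
"""
import math
from collections import Counter


class NaiveBayes:
    """Binary multinomial Naive Bayes over token lists (labels 0/1)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive (Laplace) smoothing, must be > 0

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.log_priors = {}
        for c in (0, 1):
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Smoothed prior so an empty class does not produce log(0).
            self.log_priors[c] = math.log((len(class_docs) + 1) / (len(docs) + 2))
            for d in class_docs:
                self.counts[c].update(d)
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict_proba(self, doc):
        """Probability of class 1 (risk-related in stage 1, risky in stage 2)."""
        v = max(len(self.vocab), 1)
        scores = {}
        for c in (0, 1):
            total = sum(self.counts[c].values())
            s = self.log_priors[c]
            for tok in doc:
                if tok in self.vocab:  # ignore words never seen in training
                    s += math.log((self.counts[c][tok] + self.alpha) / (total + self.alpha * v))
            scores[c] = s
        # Normalize the two log scores into a probability without overflow.
        m = max(scores.values())
        e0, e1 = math.exp(scores[0] - m), math.exp(scores[1] - m)
        return e1 / (e0 + e1)

    def predict(self, doc):
        return int(self.predict_proba(doc) >= 0.5)


def f1_score(y_true, y_pred):
    """F1 for class 1: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def select_model(train, valid, alphas=(0.1, 0.5, 1.0)):
    """Fit one candidate per alpha; keep the one with the best validation F1."""
    best, best_f1 = None, -1.0
    for a in alphas:
        model = NaiveBayes(a).fit(*train)
        score = f1_score(valid[1], [model.predict(d) for d in valid[0]])
        if score > best_f1:
            best, best_f1 = model, score
    return best, best_f1


def filter_noise(noise_model, docs, labels):
    """Keep only (doc, label) pairs the noise model marks as risk-related."""
    kept = [(d, y) for d, y in zip(docs, labels) if noise_model.predict(d) == 1]
    return [d for d, _ in kept], [y for _, y in kept]


if __name__ == "__main__":
    # Toy segmented news (hypothetical).
    docs = [["破產", "違約"], ["資金鏈", "斷裂"], ["違約", "風險", "降低"],
            ["新品", "發表"], ["球賽", "轉播"], ["獲利", "成長"]]
    related = [1, 1, 1, 0, 0, 0]  # stage-1 target: about corporate risk?
    risky = [1, 1, 0, 0, 0, 0]    # stage-2 target: is the company at risk?
    # Stage 1: pick a noise filter (validated on training data for brevity).
    noise_model, f1 = select_model((docs, related), (docs, related))
    kept_docs, kept_risky = filter_noise(noise_model, docs, risky)
    # Stage 2: train the risk model on the filtered articles only.
    risk_model = NaiveBayes(1.0).fit(kept_docs, kept_risky)
    print("noise-filter F1:", f1)
    print("risk score:", round(risk_model.predict_proba(["資金鏈", "斷裂"]), 3))
```

For brevity the demo validates on its own training data; a real evaluation would use a held-out split or cross-validation before trusting the F1 used for model selection.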
