The diagnostic conundrum in non-alcoholic fatty liver disease

Non-alcoholic fatty liver disease (NAFLD) has become the most common liver disorder worldwide. It encompasses a spectrum of disorders that range from simple steatosis to a progressive form, termed non-alcoholic steatohepatitis (NASH), that can lead to advanced fibrosis and eventually cirrhosis and hepatocellular carcinoma. On liver histology, NASH is characterized by the concomitant presence of significant fat accumulation and an inflammatory reaction with hepatocellular injury. To date, liver biopsy is still required to differentiate simple steatosis from NASH and to evaluate the degree of liver fibrosis. Unfortunately, this technique has well-known limitations, including invasiveness and cost. Moreover, it may be biased by sampling error and intra- or inter-observer variability. Furthermore, given the increasing prevalence of NAFLD worldwide, systematic screening by liver biopsy is not feasible. In recent years, different techniques have been developed and validated with the aim of non-invasively identifying NASH and assessing the degree of liver fibrosis. The non-invasive tests range from simple blood tests to composite scores and complex imaging techniques. Nevertheless, even if they could represent cost-effective strategies for diagnosing NASH, advanced fibrosis and cirrhosis, their accuracy and consequent usefulness remain open to discussion. With this aim, in this review the authors summarize the current state of non-invasive assessment of NAFLD. In particular, in addition to the well-established tests, the authors describe the future perspectives in this field, reporting the latest tests based on OMICS, the gut microbiome and micro-RNAs. Finally, the authors provide an accurate assessment of how these non-invasive tools perform in clinical practice depending on the clinical context, with the aim of giving clinicians a useful tool to try to resolve the diagnostic conundrum of NAFLD.


Introduction
In the last two decades, non-alcoholic fatty liver disease (NAFLD) has become the most common liver disorder worldwide, because of the reduced burden of viral hepatitis together with the increase in metabolic derangements related to the "western lifestyle" [1]. NAFLD encompasses a spectrum of liver disorders whose starting point is "simple steatosis" [or non-alcoholic fatty liver (NAFL)], defined as the presence of significant fat accumulation in the liver (> 5% of hepatocytes), developed in the absence of secondary causes (i.e. an "unsafe" quantity of alcohol consumption, medications or heritable conditions) [2]. The concomitant presence of an inflammatory reaction with hepatocellular injury defines the condition of non-alcoholic steatohepatitis (NASH), which is a progressive disease that can lead to advanced fibrosis (AF) and eventually cirrhosis and hepatocellular carcinoma (HCC) [3]. Alarmingly, the development of HCC in NASH patients has also been reported in the absence of cirrhosis [4].
It has been estimated that NAFLD has a prevalence of about 25% in the world adult population, reaching 75% in obese individuals and even more in patients with type 2 diabetes mellitus (T2DM). This estimate has been reported by several studies, based mostly on imaging techniques [ultrasound (US), computed tomography, and magnetic resonance imaging (MRI)/spectroscopy] [5,6].
Evaluating the prevalence of NASH in the general population is problematic, because its diagnosis ideally requires a liver biopsy which is, obviously, not routinely performed. However, in studies on biopsy-proven NAFLD patients, NASH histology has been demonstrated in about 20% of cases [7]. Furthermore, in case series of liver biopsies performed on healthy subjects (living donors for liver transplantation or volunteers), NASH was found in about 1.5-15% of subjects [8,9]. Therefore, on the basis of this evidence, a NASH prevalence of 3-6% in the general population can be indirectly estimated [2]. Nevertheless, it has to be pointed out that, if we analyze only specialized tertiary centers treating liver diseases, the estimates of the prevalence of NASH in NAFLD patients may be as high as 90.4% [10].
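The indirect estimate can be illustrated with simple arithmetic. The sketch below only multiplies the two figures quoted in the text (about 25% NAFLD prevalence in adults and about 20% NASH among biopsy-proven NAFLD); the function name is hypothetical, and the product is a rough check of the reported 3-6% range rather than the method used in the cited studies.

```python
def estimate_nash_prevalence(nafld_prevalence: float, nash_share_in_nafld: float) -> float:
    """Rough NASH prevalence in the general population (both inputs as fractions)."""
    return nafld_prevalence * nash_share_in_nafld


# Figures quoted in the text: ~25% NAFLD in adults, ~20% NASH among NAFLD patients.
nash_prevalence = estimate_nash_prevalence(0.25, 0.20)
print(f"Estimated NASH prevalence: {nash_prevalence:.1%}")
```

With these inputs the product falls inside the 3-6% range reported above.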
Even if the most common cause of death among NAFLD patients is cardiovascular disease (CVD), independently of the presence of other metabolic comorbidities, NAFLD itself is becoming a major cause of liver-related mortality [11]. Patients with either simple steatosis or NASH may develop progressive liver fibrosis, but only NASH patients show a higher risk of rapid progression to AF [12]. Because of the close association with T2DM and obesity, it has been estimated that the prevalence of NASH will increase, causing a significant clinical and economic impact and poorer patient-reported outcomes [13]. In fact, at the moment, NASH is the first indication for liver transplantation in women and it will probably soon become the leading indication in men as well, overtaking alcoholic liver disease [14]. Furthermore, in NASH liver transplant patients, an increasing trend in the prevalence of HCC has been shown, higher than that for any other etiology [15].
Although international guidelines on this issue encourage early identification of NAFLD patients, this disease has a completely asymptomatic course, especially in the initial stages. As a result, the diagnosis is often made incidentally and, unfortunately, often at an advanced stage. Routine radiological examinations, such as abdominal US, can easily detect this liver disorder but cannot satisfactorily discriminate NASH or the degree of liver fibrosis [16].
The gold standard for NAFL and NASH discrimination still remains liver biopsy, but this technique is expensive and invasive, and it may be biased by sampling error and intra-observer and inter-observer variability [17]. Furthermore, the utility of biopsy is still controversial as no NAFLD-specific treatments have yet been approved, and the only generally recommended interventions for this condition are lifestyle modifications, regardless of the presence of a NASH or a simple steatosis [18]. In a recent multi-country preference study, conducted among 121 physicians managing NASH patients, it is reported that in about 57% of cases they are reluctant to perform confirmatory liver biopsies due to these limited therapeutic options together with patient refusal, despite the fact that 81% of them reported performing liver biopsy [19].
In recent years, increasing attention has been given to "NAFLD biomarkers" and several non-invasive diagnostic methods have been developed, ranging from simple blood-based tests to composite scores and complex imaging techniques, with the aim of precisely, and non-invasively, identifying NAFLD and assessing liver fibrosis degrees [20]. The application of non-invasive tests for NAFLD is aimed at the discrimination of simple steatosis from steatohepatitis, given their different prognosis, and the evaluation of fibrosis, in order to guide the management decisions [1,21].
The objective of this review is to evaluate the available data on this topic and analyze the reliability of non-invasive tests for NAFLD, in order to offer a guide to the clinicians evaluating NAFLD patients to help them in untangling the NAFLD diagnostic conundrum.

The clinical problem
Who should we screen?
Identifying NAFLD or NASH may be a complex challenge, as these diseases almost always have an asymptomatic course [22]. The symptoms usually reported are completely non-specific, such as fatigue or vague abdominal pain and, if cirrhosis has not yet developed, a physical examination is typically unrevealing. Very often liver US is the imaging exam that incidentally identifies hepatic steatosis, most of the time in patients with normal liver function tests, but also, occasionally, in subjects who are already at an advanced stage [23].
NAFLD is strongly associated with features of metabolic syndrome (MetS) (obesity, dyslipidemia, T2DM, and hypertension), and vice-versa these conditions increase the risk of developing NAFLD [24]. In fact, NAFLD is often referred to as the "hepatic manifestation" of MetS, even though it has been demonstrated that NAFLD may precede the development of overt MetS or T2DM [6]. In a meta-analysis of 20 prospective studies (117,020 patients followed for a median time of 5 years), it has been shown that the presence of NAFLD, diagnosed by either liver enzymes or ultrasonography, is associated with a relative risk of incident T2DM or MetS ranging between 1.58-1.97 and 1.8-3.2 respectively [25]. Similarly, in a more recent meta-analysis of 19 observational studies (296,439 individuals), a higher risk of incident diabetes (hazard ratio 2.2) has been shown in patients with NAFLD compared to those without NAFLD [26]. However, the pathophysiologic relationship between NAFLD and insulin resistance is still not completely clear, as demonstrated by the fact that there are epidemiological studies showing that NAFLD is not invariably associated with MetS [27]. Moreover, several lines of evidence have shown an association between NAFLD and other extra-hepatic manifestations (endocrinopathies, osteoporosis, polycystic ovarian syndrome, psoriasis, and sleep apnea) [11]. Furthermore, NAFLD has been independently associated with fatal/non-fatal CVD and arrhythmic complications, including atrial fibrillation [28].
As evidence of the close correlation between NAFLD and MetS or T2DM, it has been shown that the coexistence of these last two conditions increases the risk of NASH and, consequently, worsens liver fibrosis [16]. For example, in a study on 118 biopsy-proven NAFLD patients, insulin resistance was associated with the whole pathological spectrum of NAFLD, notably including fibrosis, which dictates the long-term prognosis of NAFLD [29,30].
It has been described that 5-20% of patients with NAFLD are neither overweight nor obese, defining the so-called metabolically obese normal-weight individuals, who are at high risk of liver damage and cardiovascular events despite their normal weight [31]. In a recent retrospective study on 1,000 biopsy-proven NAFLD patients, it has been reported that subjects with a body mass index (BMI) > 25 kg/m² show no differences in histological disease severity compared to patients with a BMI < 25 kg/m² [32].
Therefore, it is difficult to define which patients deserve an in-depth screening for NAFLD. A recent cost-effectiveness analysis on the utility of screening for NASH among diabetic patients showed an improvement in liver-related outcomes, but a lack of cost-effectiveness due to the side effects of the selected treatment (pioglitazone) [33]. However, the study suggested that screening for NASH may become cost-effective as soon as specific medications with milder side effects are available [33]. Thus, the international guidelines do not recommend routine screening for NAFLD in high-risk populations (diabetes, obesity, et cetera), but only in patients with symptoms or signs attributable to liver disease, abnormal liver biochemistry or incidental detection of hepatic steatosis on imaging. These subjects should be evaluated as "suspected NAFLD" and assessed for liver fibrosis, metabolic risk factors or alternate causes of fatty liver (alcohol, medications, et cetera) [2,18]. Moreover, if a NAFLD diagnosis is made, systematic screening of family members of the diagnosed subjects is not currently recommended, despite evidence from studies on twin cohorts suggesting a familial clustering of NAFLD [34].
A new definition for NAFLD has recently been suggested to better highlight the close relationship with MetS, namely "metabolic associated fatty liver disease (MAFLD)", which also proposes new "positive" criteria for diagnosis (overweight/obesity, T2DM or evidence of metabolic dysregulation), regardless of alcohol consumption or other concomitant liver disease [35] (Figure 1). These new diagnostic criteria would allow straightforward recognition of the metabolic liver changes coexisting with other conditions, and would also clarify the most appropriate method of identifying NAFLD patients [35]. However, further clear recommendations from the international guidelines are necessary in order to establish the appropriate screening strategies following this "update". A proposed diagnostic and subsequent follow-up flowchart, which is based on the European Association for the Study of the Liver (EASL) and American Association for the Study of Liver Diseases (AASLD) guidelines and on clinical experience, is presented in Figure 2. Once NAFLD has been identified and other coexisting conditions of chronic liver disease (CLD) excluded, an assessment of the presence of NASH and the degree of liver fibrosis is required. As previously mentioned, the gold-standard method for diagnosing NASH is liver biopsy. Nevertheless, several non-invasive tests, based on imaging and biochemistry, have been proposed and evaluated to differentiate NASH from simple steatosis and to grade liver fibrosis.

Biomarkers for NASH diagnosis
Several biomarkers, ranging from clinical parameters (gender, age, BMI, diabetes) to liver enzymes (transaminases, bilirubin, ferritin), metabolic markers [insulin, homeostatic model assessment of insulin resistance (HOMA-IR)] and lipid markers [cholesterol, triglycerides (TG)], have been investigated for their accuracy in predicting NASH. Predictably, given the complex pathogenesis of this disorder (including inflammation, oxidative stress, apoptosis, lipid and glucose metabolism), a single biomarker is not able, per se, to satisfactorily discriminate between NAFL and NASH; therefore, more complex models and scores based on multiple variables have been developed.
Several studies have shown that high values of alanine transaminase (ALT) and aspartate transaminase (AST) are associated with a higher risk of NASH but, unfortunately, normal transaminase values have also been found in patients with NASH and/or advanced liver fibrosis [36]. In an Italian study on 458 biopsy-proven NAFLD patients, NASH was diagnosed in 59% of patients with normal ALT [37]. Moreover, it has been shown that NAFLD patients with normal aminotransferase levels have a prevalence of AF similar to that found in patients with elevated aminotransferases [38]. In fact, the conventional ALT cutoff level has shown only 72% sensitivity (Se) and 51% specificity (Sp) for the diagnosis of NASH [39]. Gamma-glutamyl transpeptidase (GGT) and alkaline phosphatase have also been proposed to diagnose NASH, but their use alone has not shown acceptable diagnostic reliability [40-42].
Therefore, in order to differentiate NASH from NAFL by means of routine serum biomarkers, different predictive models have been developed, in which multiple biomarkers are combined to improve the diagnostic accuracy [43,44]. The HAIR score, determined using 3 parameters (hypertension, ALT, insulin resistance), showed in its original study an Se of 80% and an Sp of 89%, with an area under the receiver operating characteristic curve (AUROC) of 0.9, for NASH [45]. Despite these promising initial results, it subsequently underperformed in other populations, for example in diabetic patients, reaching unsatisfactory AUROCs [46]. The FibroTest (FT) and Acti-Test (AT), composed of fibrosis indexes (alpha-2-macroglobulin, apolipoprotein A1, haptoglobin, total bilirubin and GGT) and necro-inflammatory indexes (FT plus ALT), were initially patented for the assessment of liver fibrosis, showing an AUROC of 0.88 [47]. Subsequently, in order to identify NASH, the "NashTest" (combining FT-AT plus weight, height, AST, glucose, TG, cholesterol adjusted for age and gender) was developed, with an estimated AUROC of 0.79 [48]. While NashTest and FT were also validated in populations of patients with severe obesity and hyperlipidemia, maintaining a reasonable reliability for the diagnosis of NASH [49,50], in patients with type 2 diabetes FT-AT and NashTest underperformed, showing a lower AUROC of about 0.7 [51].
Interleukin-6 (IL-6) has been investigated as a putative marker of NASH. IL-6 is an inflammatory cytokine that may increase in NASH (and in other inflammatory conditions) but it is also involved in anti-inflammatory activity and in metabolic or regenerative processes [52]. The use of IL-6 to identify NASH was evaluated alone and within a predictive model together with other biomarkers, showing a good AUROC ranging from 0.79 to 0.9 [53,54].

[Flowchart figure caption, opening truncated] ... undergo follow-up at 2-3 years. In order to also exclude F2 patients in general population settings, a FIB-4 cutoff < 1, able to rule out any fibrosis (F0 vs. F1-4), was recently proposed. Patients identified as being at intermediate (FIB-4 1.3-3.25, NFS -1.455 to 0.675) or high risk (FIB-4 > 3.25 or NFS > 0.675) of AF should be referred to a tertiary center for a "second-line evaluation" with more specific non-invasive tests. In patients at intermediate risk, patented serum biomarkers [FibroTest (FT), FibroMeter, Hepascore, ELF test] could be considered to discriminate low-risk patients with good diagnostic accuracy (NPV > 90%), when (and if) funds are available to cover their costs. TE is the most widely available and best validated technique. At values below 8 kPa, measured by M-probe, it allows excluding intermediate or high-risk patients with high accuracy (NPV 95-100%). The XL-probe may reduce the TE failure rate, especially in patients with a skin-liver capsule distance > 25 mm. Other techniques such as 2D-SWE, ARFI or MRE may be considered according to local availability (particularly in obese patients, with BMI > 35 kg/m²). In patients at intermediate (8-10 kPa) or high risk (> 10 kPa, PPV 47-70%) of AF, a further assessment by liver biopsy should be considered. Lifestyle modification and exercise should be suggested to all patients with NAFLD. Patients with AF or cirrhosis should also be screened for esophageal varices and HCC. In addition, eligibility for therapeutic trials should be taken into consideration. * addressed to rule out any fibrosis (F0 vs. F1-4); ** patented test. ALT: alanine transaminase; AST: aspartate transaminase; FIB-4: fibrosis 4 index; ELF: enhanced liver fibrosis; GGT: gamma-glutamyl transpeptidase; NFS: NAFLD fibrosis score; NPV: negative predictive value; TE: transient elastography; PPV: positive predictive value; 2D-SWE: two-dimensional shear wave elastography; ARFI: acoustic radiation force impulse; MRE: magnetic resonance elastography.
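The second-line step of the flowchart caption (transient elastography with the M-probe: below 8 kPa low risk, 8-10 kPa intermediate, above 10 kPa high) can be sketched as a small lookup. This is a minimal illustration of the thresholds quoted in the caption, not a validated clinical tool; the function name and the wording of the suggested next steps are hypothetical.

```python
def triage_by_te(lsm_kpa: float) -> str:
    """Risk band for advanced fibrosis from an M-probe liver stiffness value (kPa)."""
    if lsm_kpa < 0:
        raise ValueError("liver stiffness cannot be negative")
    if lsm_kpa < 8.0:
        return "low"
    if lsm_kpa <= 10.0:
        return "intermediate"
    return "high"


# Next steps paraphrased from the caption (labels are illustrative).
NEXT_STEP = {
    "low": "lifestyle modification and follow-up",
    "intermediate": "consider liver biopsy",
    "high": "consider liver biopsy; screen for varices and HCC if AF or cirrhosis is confirmed",
}

for value in (6.5, 9.0, 12.0):
    band = triage_by_te(value)
    print(f"{value} kPa: {band} risk, {NEXT_STEP[band]}")
```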
Elevated circulating levels of adiponectin were associated with elevated values of other inflammatory cytokines (IL-6, TNF-a) and with the presence of metabolic diseases (obesity, insulin resistance, dyslipidemia) [42]. Thus, adiponectin was evaluated in order to differentiate NASH and NAFL in a panel including also leptin and ghrelin, showing an AUROC of 0.79 [55].
Cytokeratin 18 (CK18) is a marker of cell death, the major hepatic intermediate filament protein cleaved by caspases during hepatocyte apoptosis, and its usefulness for NASH diagnosis has been studied for both the cleaved (CK18-M30) and the intact form (CK18-M65). CK18-M30 has been extensively studied and, in two meta-analyses, it showed an Se of 66-78% and an Sp of 82-87%, with a pooled AUROC of 0.82 for discriminating NASH [56-58]. The use of CK18 has also been evaluated within combinatorial models including clinical parameters (diabetes, gender, BMI) [59,60], routine blood markers [ALT, platelets (PLT), TG] [61], adipocytokines and the combination of the two isoforms of CK18 [62]. Recently, in a study on 345 biopsy-proven NAFLD patients, the combination of CK18-M30 and Golgi protein 73 (G-NASH model) showed good accuracy for predicting NASH in patients with persistently normal ALT, with an AUROC of 0.84 [63]. Despite the fact that CK18 has been widely validated, its testing is still not commercially available.
Mac-2 binding protein (Mac2bp) and fucosylated haptoglobin (Fuc-Hpt) (a glycoprotein which is undetectable in non-fibrotic liver) were also studied as potential markers for NASH in a prediction model that demonstrated an AUROC of 0.854 on a training cohort of 124 biopsy-proven NAFLD, and of 0.844 on a validation cohort of 382 patients [64].
Recently, a large meta-analysis of 122 studies on non-invasive tests for NASH vs. NAFL evaluated the pooled Se and Sp of every single marker proposed. None of these showed both Se and Sp > 80% [65]. In particular, the pooled Se and Sp were evaluated for ALT in 8 studies (63.5%; 74.4%), AST in 5 studies (76.9%; 61.9%), IL-6 in 3 studies (60.6%; 83.9%), adiponectin in 4 studies (72%; 73%), CK18-M30 in 15 studies (68.4%; 74.2%), CK18-M60 in 5 studies (73.2%; 73.7%), and Mac2bp in 2 studies (67%; 79%) [65]. Based on these findings, the authors concluded that none of these single markers or scoring systems can be recommended to differentiate NASH from NAFL [65]. Furthermore, in a recent editorial, it was pointed out that, more than the identification of NASH, the assessment of the degree of fibrosis has the greatest clinical relevance for the morbidity and mortality of NAFLD patients [66]. As a matter of fact, in a meta-analysis of 5 studies, with 1,495 NAFLD patients, the presence of fibrosis in simple steatosis was associated with a greater risk of all-cause and liver-related mortality compared to NASH without fibrosis [67], and liver fibrosis dictates the long-term course of NAFLD [68]. Moreover, recent data suggest that non-invasive serum biomarkers (such as NFS) can predict mortality and CVD risk [69,70]. Given that the NAFLD/NASH "pandemic" is fueling the upsurge in CVD [71], a growing number of patients with advanced liver fibrosis will be candidates for cardiovascular therapy in the near future [72]. In this picture, non-invasive diagnostic tools could be useful in the management of NAFLD patients for stratifying liver fibrosis as well as cardiovascular risk.
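The statement that no single marker reached both Se and Sp above 80% can be checked directly against the pooled values quoted above. The sketch below simply tabulates those figures (marker names as given in the text) and adds the Youden index (Se + Sp - 1) as one common summary; the variable and function names are hypothetical.

```python
# Pooled sensitivity and specificity (%) as quoted in the text from [65].
POOLED = {
    "ALT": (63.5, 74.4),
    "AST": (76.9, 61.9),
    "IL-6": (60.6, 83.9),
    "adiponectin": (72.0, 73.0),
    "CK18-M30": (68.4, 74.2),
    "CK18-M60": (73.2, 73.7),
    "Mac2bp": (67.0, 79.0),
}


def markers_above(threshold_pct: float) -> list:
    """Markers whose pooled Se and Sp both exceed the threshold."""
    return [name for name, (se, sp) in POOLED.items()
            if se > threshold_pct and sp > threshold_pct]


def youden_index(se_pct: float, sp_pct: float) -> float:
    """Youden index J = Se + Sp - 1, with inputs given in percent."""
    return (se_pct + sp_pct) / 100.0 - 1.0


print("Markers with Se and Sp > 80%:", markers_above(80.0))
for name, (se, sp) in POOLED.items():
    print(f"{name}: J = {youden_index(se, sp):.3f}")
```

IL-6 is the only marker with a pooled Sp above 80%, but its Se is the lowest of the set, so the filtered list is empty.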
In Table 1 we report all the tests aimed at NASH diagnosis together with their diagnostic accuracy.

Biochemical tests for evaluation of fibrosis
Several simple scoring systems based on standard biochemical and hematological parameters have been proposed for the discrimination of AF which, in NAFLD patients, is generally defined as a fibrosis stage ≥ 3 according to the Brunt or METAVIR scoring systems. The AST-to-platelet ratio index (APRI) and FIB-4, initially proposed for HCV patients, were subsequently validated in NAFLD patients as well, showing similar reliability [73]. In particular, FIB-4, which combines age, AST, ALT, and platelet count, is one of the best performing tests for NAFLD and showed an 80% PPV for AF with a cutoff score > 3.25 and a 90% NPV with a cutoff score < 1.3 [74]. The NAFLD fibrosis score (NFS) is a non-invasive test specific to fatty liver, calculated using age, BMI, diagnosis of impaired fasting glucose or diabetes, AST/ALT ratio, serum albumin level and PLT count, that showed a PPV of 90% in detecting liver fibrosis ≥ F3 with a cutoff score > 0.675 and an NPV of 93% in excluding AF with a cutoff score < -1.455 [75].
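As an illustration of how these two scores are computed and read against their cutoffs, a minimal sketch follows. The formulas and coefficients are not given in this review; they are taken here from the original publications of FIB-4 and NFS as commonly reported, and should be checked against those sources before any use. Function names, parameter names and the example patient values are hypothetical.

```python
from math import sqrt


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT)); platelets in 10^9/L."""
    return (age_years * ast_u_l) / (platelets_10e9_l * sqrt(alt_u_l))


def nfs(age_years: float, bmi_kg_m2: float, ifg_or_diabetes: bool,
        ast_u_l: float, alt_u_l: float, platelets_10e9_l: float,
        albumin_g_dl: float) -> float:
    """NAFLD fibrosis score, using the commonly reported coefficients (assumption)."""
    return (-1.675
            + 0.037 * age_years
            + 0.094 * bmi_kg_m2
            + 1.13 * (1 if ifg_or_diabetes else 0)
            + 0.99 * (ast_u_l / alt_u_l)
            - 0.013 * platelets_10e9_l
            - 0.66 * albumin_g_dl)


def risk_band(score: float, rule_out: float, rule_in: float) -> str:
    """Below rule_out: low risk of AF; above rule_in: high; otherwise intermediate."""
    if score < rule_out:
        return "low"
    if score > rule_in:
        return "high"
    return "intermediate"


# Hypothetical patient: 60 years, AST 40 U/L, ALT 36 U/L, platelets 200 x 10^9/L.
f = fib4(60, 40, 36, 200)
print(f"FIB-4 = {f:.2f} ->", risk_band(f, 1.3, 3.25))

# Hypothetical patient: 50 years, BMI 30, diabetic, AST 40, ALT 40, PLT 200, albumin 4.0 g/dL.
n = nfs(50, 30, True, 40, 40, 200, 4.0)
print(f"NFS = {n:.3f} ->", risk_band(n, -1.455, 0.675))
```

Both example patients fall between the two cutoffs, which is the intermediate band where the scores alone neither rule AF in nor out.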
In a recent meta-analysis of 64 studies, which included 13,046 NAFLD patients, the APRI, FIB-4, NFS, and BARD scores (the latter calculated using BMI, AST, and ALT) [76] were compared for their diagnostic performance in AF. NFS and FIB-4 demonstrated the highest accuracy for ruling out AF, with an NPV > 90% and an AUROC of 0.84 for both [77], showing clinical usefulness to exclude AF [78]. In diabetic patients too, FIB-4 and NFS have an acceptable clinical utility in excluding AF, using the standard cutoffs of < 1.3 and < -1.455, respectively [79]. Although NFS and FIB-4 proved accurate enough to be useful as first-line tools to identify patients with AF, about 30% of patients fell into the intermediate-risk category, in which further analyses are needed to clarify the diagnosis [80]. Moreover, in patients aged > 65 years, FIB-4 and NFS underperformed, showing unacceptable specificity. For this reason, in a recent study on 634 biopsy-proven NAFLD patients, new thresholds (FIB-4 > 2 and NFS > 0.12) were proposed, in order to lower the false positive rate while maintaining the same sensitivity [81]. Similarly, in young adults (< 35 years) NFS and FIB-4 showed a poor diagnostic performance, with AUROCs < 0.53; further investigations are therefore needed to define an appropriate cutoff for this category [82]. In addition, it has to be pointed out that the use of these tests in a primary care referral setting allows discriminating only patients with liver fibrosis ≥ F3. Therefore, it does not identify F2 patients, who have a lower but not negligible risk of mortality. For this reason, a cutoff of 1 has also been proposed for FIB-4, to rule out the presence of any degree of fibrosis (i.e. F0 vs. F1-4) with an AUROC of 0.843 [83].
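The age dependence described above can be sketched as an age-aware choice of cutoffs. The text gives only the values (2 for FIB-4 and 0.12 for NFS in patients older than 65); this sketch assumes they replace the lower, rule-out cutoffs (1.3 and -1.455) while the upper cutoffs stay unchanged, which is an interpretation and not something the text states. All names are hypothetical.

```python
def rule_out_cutoffs(age_years: float) -> dict:
    """Lower (rule-out) cutoffs; assumed to be raised for patients older than 65."""
    if age_years > 65:
        return {"FIB-4": 2.0, "NFS": 0.12}
    return {"FIB-4": 1.3, "NFS": -1.455}


def fib4_band(score: float, age_years: float) -> str:
    """Risk band for AF from FIB-4, with an age-adjusted lower cutoff (assumption)."""
    lower = rule_out_cutoffs(age_years)["FIB-4"]
    if score < lower:
        return "low"
    if score > 3.25:
        return "high"
    return "intermediate"


def unreliable_age(age_years: float) -> bool:
    """Flag young adults (< 35 years), in whom the text reports poor performance."""
    return age_years < 35


# The same FIB-4 value lands in different bands depending on age.
print(fib4_band(1.8, 70))  # older patient, raised rule-out cutoff
print(fib4_band(1.8, 50))  # standard cutoffs
```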
Recently the "HEPAmet" fibrosis scoring system has been proposed, composed of clinical variables and serum markers (age, female sex, diabetes, glucose, insulin, HOMA, AST, albumin, PLT). In its validation study on 2,452 NAFLD patients, it showed a better diagnostic performance for the diagnosis of AF compared to NFS and FIB-4 (AUROC 0.85 vs. 0.80), with an Sp of 97.2%, Se of 74%, NPV of 82% and PPV of 76.3% [84].
High diagnostic accuracy in AF identification was demonstrated by FibroMeter®, a patented panel (AST, ALT, ferritin, platelets, body weight, and age) that showed an AUROC of 0.94, with an Se of 78.9% and an Sp of 95% [85]. Recently, a sequential combination of non-invasive tests, combining FibroMeter and TE, namely FibroMeter VCTE, has been proposed in order to identify advanced liver fibrosis in patients inside the intermediate-risk grey zone between the cutoffs of NFS and FIB-4, showing a good performance with an AUROC of 0.86 [86].
Besides the tests based on indirect markers of fibrosis, other methods assess fibrogenic activity by measuring components of extracellular matrix turnover, the matrix that replaces hepatocytes when ongoing liver injury exceeds hepatic regeneration.
The FibroTest® is a commercially available panel, previously mentioned, and showed a good diagnostic performance for AF in NAFLD patients with an AUROC of 0.84 for F3-4 [87].
Moreover, a recent study on 1,079 NAFLD patients with a median follow-up of 6 years demonstrated that FT provides a good long-term prognostic value for survival without liver-related deaths, with an AUROC of 0.941. In this way, FT may be useful as a second-line analysis in order to rule out those subjects at low risk from further immediate evaluation [88,89].
Several tests include hyaluronic acid (HA) as a marker of fibrosis, because it is synthesized by stellate cells and metabolized by sinusoidal endothelial cells. "Hepascore", combining HA and alpha-2-macroglobulin with clinical variables (age, gender) and blood-based parameters (GGT, bilirubin, ALT, AST), showed a good performance in identifying AF in NAFLD patients, with an AUROC of 0.81 and an NPV of 92%, but a better diagnostic performance in chronic viral hepatitis [90].
The enhanced liver fibrosis (ELF) score is a commercial panel that evaluates, in addition to HA, tissue inhibitor of matrix metalloproteinase 1 (TIMP-1) and procollagen III amino-terminal peptide (PIIINP); the latter, even as a single biomarker, provided good reliability for the diagnosis of cirrhosis, with an NPV of 95% [91]. The ELF score demonstrated a good performance, with an AUROC of 0.9 using the threshold of 0.35. However, in a recent meta-analysis of 11 studies, it was found to have a limited specificity in excluding AF in NAFLD patients in the context of low disease prevalence (5-10%), suggesting a re-evaluation of its threshold values [92,93].
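The dependence of a test's usefulness on disease prevalence follows from Bayes' rule and can be made concrete with a short calculation. The sketch below uses hypothetical Se and Sp values (80% and 90%), not figures reported for ELF, and compares the low-prevalence range mentioned in the text (5-10%) with a higher, referral-like prevalence chosen for illustration.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for given Se, Sp and prevalence, all as fractions."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics, for illustration only.
SE, SP = 0.80, 0.90
for prevalence in (0.05, 0.10, 0.30):
    ppv, npv = predictive_values(SE, SP, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With fixed Se and Sp, the PPV drops sharply as prevalence falls while the NPV rises, so in low-prevalence settings a positive result is dominated by false positives unless specificity is very high.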
Recently, a new serum biomarker derived from collagen III synthesis, type III collagen formation (Pro-C3), was proposed in a study on 150 biopsy-proven NAFLD patients, showing an AUROC of 0.91 with an NPV of 97% for AF identification. Subsequently, it was included in a score incorporating clinical variables, the ADAPT (age, diabetes, and platelet), and studied in a cohort of 449 patients, confirming higher diagnostic accuracy (AUROC 0.87) compared to APRI, FIB-4 and NFS [94,95].
Similarly, in a cohort of 396 NAFLD patients, an algorithm using TIMP-1, HA and alpha-2-macroglobulin was recently validated, showing an AUROC of 0.86 for the identification of AF [96].
Type IV collagen-7S is another fibrosis marker studied for the detection of NASH and, recently, AF. It was evaluated as a single marker of liver fibrosis, showing a good diagnostic performance with AUROC of 0.827 and NPV of 0.84 [97], but also included in two predictive models, the NAFIC score (ferritin, fasting insulin, collagen IV) and CA-fibro index (collagen IV and AST) with AUROC of 0.824 and 0.845 respectively [98,99].
Finally, a recent study showed the diagnostic accuracy of collagen IV and Mac2bp in the detection of AF in NAFLD, reporting an AUROC of 0.83 for both, but further investigations are needed for their use in clinical practice [100].
Table 2 reports the tests aimed at the evaluation of fibrosis in NAFLD together with their diagnostic accuracy.

Imaging techniques
As mentioned above, in most cases liver US is the first exam that leads to the finding of hepatic steatosis. The typical US features of fatty liver are: posterior US beam attenuation, loss of echoes from the diaphragm, and loss of echoes from the walls of the portal vein [101]. Based on these features, steatosis can also be subjectively scored as mild, moderate, or severe, with a fair degree of accuracy in the detection of moderate-severe fatty liver compared to liver biopsy. This has been confirmed by a meta-analysis of 34 studies that showed a pooled sensitivity and specificity of 85% and 95%, respectively [102]. Unfortunately, in clinical practice, US shows a good sensitivity only in patients with a liver fat content above 12.5-25%. Therefore, patients with a lower, but still relevant, liver fat content may be missed [103]. Furthermore, its diagnostic accuracy is reduced in patients with obesity or coexistent CLD [104]. All the international guidelines consider liver US as the first-line imaging technique for the diagnosis of fatty liver in both clinical and epidemiological settings due to its safety, cost-effectiveness, and availability. Recent data suggest that US using semi-quantitative scores can detect steatosis as low as 10% [105]. A more accurate assessment of the amount of liver fat and the degree of liver fibrosis is obtained with US-based or magnetic resonance-based techniques. MRI is more accurate, but its availability is very limited due to its high costs. A US-based technique, the "controlled attenuation parameter" (CAP), might be more accurate than conventional US for detecting liver steatosis, and also has the advantage of simultaneously estimating liver fibrosis, being coupled with liver stiffness measurement (LSM) [106]. However, LSM with CAP may not be readily available outside specialized centers. No data are available on a direct comparison of CAP and semi-quantitative US scores in patients with NAFLD or other CLD.
The imaging techniques aimed at the evaluation of fibrosis in NAFLD together with their diagnostic accuracy are reported in Table 3.

MRI
Magnetic resonance is the most accurate non-invasive technique for the evaluation of hepatic steatosis, because it can directly quantify the amount of liver TG through magnetic resonance spectroscopy (MRS) or magnetic resonance imaging (MRI), by quantifying the proton density fat fraction (PDFF). MRS-PDFF, due to various technical limitations (need for expertise in protocol prescription, data collection, and spectral analysis) and the lack of spectroscopy software in routine scanners, is not commonly used, while MRI-PDFF is more routinely available [107]. PDFF is represented by the fraction of MRI-visible protons bound to fat, divided by all protons in the liver (bound to fat and water). It allows a quantitative and objective evaluation of the entire fat content in the liver and can detect as little as 3% of steatosis [108].
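The definition of PDFF given above (fat-bound protons divided by all fat-bound and water-bound protons) amounts to a simple ratio. The sketch below assumes that the fat and water signals have already been separated and corrected by the scanner software, which is where the real technical difficulty lies; the function name and example signal values are hypothetical.

```python
def pdff_percent(fat_signal: float, water_signal: float) -> float:
    """Proton density fat fraction (%) = fat / (fat + water) x 100."""
    total = fat_signal + water_signal
    if total <= 0:
        raise ValueError("total proton signal must be positive")
    return 100.0 * fat_signal / total


# Hypothetical voxel: 6 units of fat signal and 94 units of water signal.
print(f"PDFF = {pdff_percent(6.0, 94.0):.1f}%")
```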
The accuracy of MRI-PDFF was compared to liver histology in a multi-center study reporting a high AUROC (0.95) for the identification of liver steatosis [109]. Moreover, as reported by single-center studies, MRI-PDFF seems to be more sensitive than liver histology in the longitudinal assessment of changes in liver steatosis over time [110]. In a secondary analysis of a multi-center phase II trial on 113 patients assigned to obeticholic acid or placebo, steatosis was measured by MRI-PDFF paired with liver histology before and after 72 weeks of treatment. The analysis showed good concordance between a decline of > 30% in MRI-PDFF and a decline of > 2 points in the NAFLD activity score, defined as a significant improvement of liver steatosis [111]. MRI-PDFF is less susceptible to sampling error than liver biopsy, because it measures the total amount of TG in the whole liver, but it cannot give information on necroinflammation. In fact, even though some studies have attempted to evaluate the ability of MR-based or US-based techniques to discriminate simple steatosis from NASH, at the moment neither can reliably be used for this purpose [112].
In contrast, fibrosis may be detected non-invasively by other techniques, of which "stiffness" (or "elasticity") and its family of related parameters are the best validated in the liver. The collagen deposition associated with fibrosis confers rigidity on the parenchyma, which can be evaluated by assessing its stiffness. MRE determines liver stiffness through the analysis of microdisplacements ("shear waves") of the tissue, using a modified phase-contrast imaging sequence able to detect the propagation of the shear wave within the hepatic parenchyma [113]. The shear wave velocity is converted into an LSM that is expressed, as a final result, in meters per second or kPa. In a meta-analysis of 9 studies on a total of 232 biopsy-proven NAFLD patients, high accuracy of MRE in detecting AF and cirrhosis (AUROC 0.93 and 0.92, respectively) was reported, with an optimal cutoff for AF of 3.64 kPa [113]. In another recent meta-analysis of 5 studies (628 NAFLD patients), an AUROC of 0.96 for AF detection was estimated using the same cutoff value [77]. A further improvement in the diagnostic accuracy of MRE has been proposed through the use of 3D technology. In fact, in a head-to-head comparison with 2D-MRE, 3D-MRE showed an AUROC of 0.96 for AF detection [114]. However, 3D-MRE is a meticulous and time-consuming exam and has yet to be validated in multicenter studies. MRE has a low failure rate (1-2%), and its failure is associated with massive ascites, iron deposition, or high BMI [115]. The effect of BMI is still debated: in a recent study on 111 patients with a mean BMI of 40.3 kg/m², the intra-observer agreement was higher with MRE than with biopsy, and MRE provided an AUROC of 0.93 for the detection of AF [116].
A novel MR-based method is multiparametric MRI (Liver-MultiScan), which aims to measure liver steatosis and correlate it not only with fibrosis but also with inflammation, using T1 mapping for fibrosis and inflammation, T2 mapping for liver iron quantification, and MRS for liver fat quantification [117]. In a pilot proof-of-concept study on 71 patients, multiparametric MRI showed an AUROC of 0.83 for the detection of hepatocyte ballooning and lobular inflammation [118]. Other interesting data on these topics derive from a preclinical study on pigs and mice, in which the addition of the damping ratio to 2D-MRE and MRI-PDFF contributed to higher diagnostic accuracy for the detection of both inflammation and fibrosis at early stages, even before the development of histological alterations [119]. Nevertheless, these promising results still need further investigation.

US-based imaging technique TE
Unlike MRI, US-based elastography detects the velocity of the shear wave induced by US in the liver parenchyma in order to estimate liver stiffness as an indirect marker of fibrosis. The pioneering US-based technique is vibration-controlled TE, performed with a dedicated device (Fibroscan®) [120]. Initially validated in patients with viral hepatitis, TE has shown high diagnostic accuracy in evaluating liver fibrosis in NAFLD as well. A meta-analysis of TE studies performed until 2013 (1,047 NAFLD patients) reported an AUROC of 0.76-0.98 for the detection of Metavir F3 and an AUROC of 0.91-0.99 for F4, at cutoffs of 8-10.4 kPa and > 10.3 kPa, respectively [56]. Lower diagnostic accuracy was reported for the detection of F2 (AUROC 0.79-0.87). TE is limited by ascites or severe obesity, as the interposition of fluid or fat between the chest wall and the liver prevents the correct determination of the shear waves. The XL-probe yields a lower rate of failed or unreliable results and, as shown in a more recent meta-analysis of 19 studies (4 using the XL-probe), provides diagnostic accuracy in the detection of AF similar to that of the M-probe (AUROC 0.87 vs. 0.86) [77]. Nevertheless, it must be noted that liver stiffness may be overestimated by TE in cases of high liver inflammatory activity (transaminase flare), extrahepatic cholestasis, or congestive heart failure [121].
Above all, compared to other routinely available biomarkers of AF (FIB-4, APRI, NFS, BARD), TE shows the highest NPV, allowing AF to be confidently excluded at a cutoff of < 8 kPa in NAFLD patients (NPV 95-100%) [122]. Moreover, among the other tests, only the FibroMeter has not been shown to be less accurate than TE and, as previously mentioned, a non-invasive test based on the combination of these two tools, namely FibroMeter VCTE, has been proposed [86,123]. Recently, TE has been evaluated in combination with other serum biomarkers, improving the non-invasive detection of hepatic fibrosis in patients with NAFLD. The serial combination of TE, NFS, and FIB-4 in patients in the grey area of the first test, or in those with high median liver stiffness values (> 9.6 kPa) or low NFS or FIB-4 values (< -1.45 and < 1.3, respectively), increased diagnostic performance and reduced the area of diagnostic uncertainty compared to the use of these tests alone [122]. The combination of the ELF test and TE showed higher diagnostic accuracy for AF, with a specificity of 97.9% compared to 90.6% for the ELF test alone [124]. The FibroScan-AST (FAST) score is a novel model (composed of median liver stiffness, controlled attenuation parameter, and AST) that showed good diagnostic performance in identifying NASH with a NAFLD activity score > 4 and F > 2, with a PPV of 0.83 and 0.81 and an NPV of 0.85 and 0.71 in the derivation and validation cohorts, respectively [125].
Although some comparative studies of the diagnostic accuracy of MRE and TE did not show a statistically significant difference between the two techniques, a recent meta-analysis on 230 biopsy-proven NAFLD patients demonstrated a clear diagnostic superiority of MRE over TE in the detection of each stage of fibrosis [126].
CAP is a novel technique for the evaluation of hepatic steatosis. Using the TE M- or XL-probes, it estimates the amount of liver fat by measuring the degree of US attenuation exerted by hepatic fat, expressed in decibels per meter (dB/m) [127]. Although the initial study described good accuracy in the detection of steatosis (AUROC of 0.91, 0.95, and 0.89 for steatosis ≥ 11%, ≥ 33%, and ≥ 66%, respectively) [127], several subsequent studies reported lower AUROCs (0.79, 0.76, and 0.76), at the limits of statistical significance, suggesting a limitation in precisely discriminating adjacent degrees of steatosis [128,129]. A recent meta-analysis on 2,735 patients (537 with NAFLD) reported, for steatosis degrees of ≥ 11%, ≥ 33%, and ≥ 66%, AUROCs of 0.82, 0.86, and 0.88 at the proposed cutoff values of 248 dB/m, 268 dB/m, and 280 dB/m, respectively [130]. Like TE, CAP is influenced by BMI, but studies comparing the M- and XL-probes reported conflicting results about the cutoff values [131,132]. MRI-PDFF outperformed CAP for the diagnosis of all grades of steatosis (AUROC 0.99 vs. 0.85) [133].
At the moment, the proposed cutoff values for ≥ S2 steatosis range from 280 dB/m to 310 dB/m with an NPV of about 70%, therefore further studies are necessary before any firm conclusion can be drawn [112].
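Purely as an illustration of how the meta-analysis cutoffs cited above [130] partition a CAP reading, the thresholds can be applied as a simple lookup. This is a sketch under stated assumptions, not a validated grading rule: the function name is hypothetical, the choice to treat a value equal to a cutoff as positive is an assumption, and, as noted above, the optimal cutoffs are not yet settled.

```python
# Cutoffs in dB/m for steatosis >= 66%, >= 33% and >= 11%, as reported in [130].
# Treating "value >= cutoff" as positive is an assumption of this sketch.
CAP_CUTOFFS_DB_M = [
    (280.0, ">=66%"),
    (268.0, ">=33%"),
    (248.0, ">=11%"),
]


def cap_steatosis_category(cap_db_m: float) -> str:
    """Map a CAP value (dB/m) to the highest steatosis threshold it reaches."""
    if cap_db_m < 0:
        raise ValueError("CAP must be non-negative")
    for cutoff, label in CAP_CUTOFFS_DB_M:  # ordered from highest to lowest
        if cap_db_m >= cutoff:
            return label
    return "<11%"


if __name__ == "__main__":
    for value in (240.0, 255.0, 270.0, 300.0):  # hypothetical readings
        print(value, cap_steatosis_category(value))
```

The narrow spacing of the three cutoffs (32 dB/m in total) makes visible why adjacent degrees of steatosis are hard to discriminate with CAP.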

Acoustic radiation force imaging
Acoustic radiation force imaging (ARFI) elastography is a US-based technique that uses point shear wave elastography (pSWE). Compared to TE, pSWE has the advantage of being integrated into conventional US systems; it evaluates the velocity of shear waves induced by a single acoustic impulse in a small region of interest (ROI) [134]. In CLD, ARFI has demonstrated good diagnostic accuracy for the detection of AF (AUROC 0.84) and cirrhosis (AUROC 0.91), as reported in a systematic meta-analysis of 36 studies, but few studies were conducted on NAFLD patients [135]. In a review of 7 studies (723 NAFLD patients), a summary AUROC of 0.89 for the detection of significant fibrosis (4 > F ≥ 2) was reported, with a summary sensitivity and specificity of 80% and 85%, respectively [136]. However, at the moment, data on its diagnostic accuracy for AF and cirrhosis are not yet available and, therefore, pSWE is not included in the current NAFLD guidelines. In a recent study comparing MRE and ARFI in a cohort of 125 biopsy-proven NAFLD patients, high diagnostic accuracy of MRE for the diagnosis of any fibrosis was reported, especially in obese patients [137].

2D-SWE
2D-SWE is the newest US-based elastography technique. It evaluates the velocity of shear waves induced by multiple acoustic impulses in a larger ROI (2 cm x 2 cm), as a single image or in real time [134]. Like ARFI, it is integrated into conventional ultrasonography systems. To date, few studies have evaluated the diagnostic accuracy of 2D-SWE in NAFLD patients. A large meta-analysis on 1,340 patients included a subgroup of 156 NAFLD patients. In these patients, 2D-SWE showed an AUROC of 0.85 and 0.91 for the diagnosis of significant fibrosis and cirrhosis, respectively. In another subgroup of 91 NAFLD patients, a significantly better performance in the diagnosis of AF compared to TE (AUROC difference 12%; P = 0.003) was also reported [138]. In a comparison with TE and ARFI on 291 NAFLD patients, 2D-SWE performed better for significant fibrosis and showed similar or slightly better diagnostic accuracy for the diagnosis of AF and cirrhosis. The cutoffs reported with a sensitivity and specificity > 90% were 8.3-10.7 kPa for F3 and > 10.5 kPa for F4 [139].
Furthermore, in a recent comparison study on 62 biopsy-proven NAFLD subjects, 2D-SWE did not demonstrate lower diagnostic accuracy than MRE for detection of AF. Nevertheless, even if this technique gave promising results, it needs further validation in NAFLD settings [140].

Future directions
Several novel strategies to detect NASH and AF in NAFLD patients are based on the identification of molecules by OMICS approaches (genomic, metabolomic, proteomic, and lipidomic), providing a useful framework for designing and validating highly accurate predictive models. Circulating oxidized fatty acids and products of arachidonic acid metabolism associated with NASH were identified by a lipidomic approach [141]. An excellent NASH prediction model (AUROC 1.0) composed of a panel of plasma eicosanoids and other polyunsaturated fatty acid metabolites was identified in a proof-of-concept study [142]. Moreover, the "NASH ClinLipMet" score was recently proposed: a novel model based on lipids, metabolites, clinical markers, and PNPLA3 genotype, which identifies NASH patients with high accuracy (AUROC 0.86) [143,144].
As regards the proteomic approach, several protein-based biomarkers able to identify NASH and AF were found by means of mass spectrometry. In a biomarker discovery study on 69 NAFLD patients and 19 obese controls, among 1,700 serum proteins studied, 6 patterns of protein expression were identified that changed significantly between simple steatosis, NASH, and AF (F3/F4) and were able to differentiate these groups with high accuracy [145]. Using the proteomic approach, a highly multiplexed protein assay, namely SOMAscan, was developed on a 443-patient training set, showing excellent diagnostic accuracy (AUROC 0.932) in identifying steatosis in patients carrying the PNPLA3 rs738409 genotype [146]. A novel biomarker derived from the analysis of the urinary steroid metabolome was evaluated on 275 subjects (121 with biopsy-proven NAFLD, 48 with alcohol-related cirrhosis, and 106 controls) [147]. This gas chromatography-mass spectrometry approach demonstrated high accuracy not only in discriminating AF (AUROC 0.92), but also in distinguishing alcohol-related from NAFLD-related cirrhosis [147].
The metabolomics technique was used to differentiate metabolic subtypes of NAFLD. In a study on 90 patients (21 with NASH, 38 with simple steatosis, 31 controls), among 56 selected metabolites, pyroglutamate showed the highest accuracy in NASH identification (AUROC 0.88), also compared with tumor necrosis factor-alpha, IL-8, and adiponectin [148]. In a recent translational study, a NASH metabolic profile was characterized in methionine adenosyltransferase-1a knockout mice, which spontaneously develop NASH, and their serum metabolomes were compared with those of 535 biopsy-proven NAFLD subjects, identifying a specific metabolomic profile that could distinguish NASH from simple steatosis [149].
The OWliver is a commercial test for diagnosis of NASH based on metabolomics profile of 20 metabolites and validated in a cohort of 467 NAFLD patients, demonstrating good diagnostic accuracy in discrimination of NASH from NAFL, with an AUROC of 0.81 [150]. In order to identify the bile acid metabolome in NASH, a small study on 22 subjects (7 NASH) showed an increase in taurine and glycine-conjugated primary and secondary bile acids, hypothesizing a role of hydrophobic and cytotoxic secondary species of these bile acids in the pathogenesis of NASH [151].
In a study on 47 patients with biopsy-proven NASH and 13 healthy controls, changes in glycosylation were evaluated as a biomarker of liver damage. N-glycan profiling was performed, and the concentration of two glycans (NGA2F and NA2) was associated with the severity of NASH [152]. The logarithm of the ratio of NGA2F to NA2, namely the GlycoNashTest, showed good diagnostic accuracy for the detection of NASH and AF, with an AUROC of 0.74 and 0.87, respectively [152]. These results were confirmed in a subsequent validation study on 224 NAFLD patients and also in a pediatric cohort of 51 NAFLD patients [153,154]. These results demonstrated an increase in under-galactosylation of serum proteins during chronic liver inflammation, but further investigations are necessary before this evidence can be applied clinically.
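The log-ratio construction described above can be sketched in a few lines. The function name is hypothetical and the use of a base-10 logarithm is an assumption of this sketch, since the text does not state the base; no diagnostic cutoff is implied.

```python
import math


def glyco_nash_test(nga2f: float, na2: float) -> float:
    """Return log10(NGA2F / NA2) for two glycan abundances.

    Assumptions (not taken from the source): base-10 logarithm, and both
    abundances given as positive values in the same units.
    """
    if nga2f <= 0 or na2 <= 0:
        raise ValueError("glycan abundances must be positive")
    return math.log10(nga2f / na2)


if __name__ == "__main__":
    # Hypothetical abundances: a higher NGA2F relative to NA2 raises the score.
    print(glyco_nash_test(10.0, 1.0))
    print(glyco_nash_test(2.0, 2.0))
```

A ratio rather than an absolute concentration is used here so that the score reflects the relative shift toward under-galactosylated glycans, independently of the overall signal level.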
As far as genomics is concerned, two genetic variants, located in PNPLA3 and Transmembrane 6 superfamily member 2 (TM6SF2) hepatic stellate cells variant, were associated with an increased risk of NAFLD, but their accuracy in predicting disease is not higher than that of other non-invasive biomarkers [155,156]. Recently, the genetic variant rs641738 C>T located in the membrane bound O-acyltransferase domain-containing 7 gene (MBOAT7) was also associated with an increased risk of development and severity of NAFLD [157]. Despite the importance of these genetic variants in a complex disease like NAFLD, the effect of a single mutation is unlikely to be sufficient to be clinically meaningful. In a recent study on 4,277 patients (488 with NAFLD), the diagnostic accuracy of a single nucleotide polymorphism in the intronic region of interferon lambda 4, incorporated with other biomarkers of fibrosis (HOMA-IR, GGT, AST, ALT, PLT) in a model named FibroGENE DT, was evaluated for the assessment of fibrosis severity, finding an AUROC of 0.8 and 0.83 for the prediction of significant fibrosis and cirrhosis, respectively, with an NPV of 96% in excluding cirrhosis [158].
Another approach to the assessment of NAFLD is based on the metagenomic signature of the gut microbiome, through the evaluation of its composition. In a preliminary study on 86 biopsy-proven NAFLD patients, the whole gut-microbiome genome was sequenced from stool samples, identifying 37 bacterial species that were used to construct a Random Forest classifier model with robust diagnostic accuracy for AF (AUROC 0.936) [159]. Recently, in a study of gut-microbiome composition in 50 patients with different CLDs, evaluated by Fibroscan, Prevotella copri was identified as the strongest predictive microbe for AF in NAFLD patients (AUROC 0.82) [160]. This study reported that the microbial profile of advanced CLDs is characterized by an increase in the genus Prevotella and a decrease in Bacteroides, encouraging their use as non-invasive markers of liver fibrosis. Similarly, in a study on 87 children with biopsy-proven NAFLD compared to 37 children without NAFLD, an abundance of Prevotella copri was also found to be associated with severe fibrosis. The study also provided a predictive model, based on the measurement of ALT together with the quantification of genes encoding flagellar biosynthesis proteins, that identified F3/F4 patients with an AUROC of 0.87 [161].
Other emerging evidence on potential NAFLD biomarkers derives from studies on circulating extracellular vesicles (EVs: exosomes and ectosomes), which contain various molecules, such as proteins, microRNAs, and DNA, whose expression has been shown to be altered in CLDs. In a study that profiled blood EVs using flow cytometry, ectosomes were demonstrated to be increased in monocytes and natural killer T-cells and decreased in neutrophils and leuco-endothelial cells in NASH patients, thus showing that the quantification of immune cell microparticles could be a potential diagnostic strategy for differentiating NASH from simple steatosis [162]. miRNAs regulate post-transcriptional gene expression and have been demonstrated to contribute to NAFLD pathogenesis at various levels. miR-122 is a key regulator of hepatic fatty-acid metabolism, and its levels in circulating exosomes were found to be upregulated in NASH in comparison to simple steatosis, correlating with histological severity [163]. The miR-34a concentration was also found to be significantly higher in NAFLD and, in a study on 111 biopsy-proven NAFLD patients, showed higher diagnostic accuracy for diagnosing NASH (AUROC 0.81) than ALT, CK-18, FIB-4, and APRI [164]. A recent meta-analysis of 37 studies evaluated the diagnostic accuracy of serum levels of miR-122 and miR-34a, showing an AUROC of 0.82 and 0.78, respectively, for the discrimination between NAFL and NASH [165]. Recently, a combination of coding and non-coding RNA expression levels, derived from a whole transcriptome analysis, was evaluated as a potential biomarker of both the presence of NAFLD and the severity of fibrosis. Good diagnostic accuracy was found for the detection of AF when the expression of the coding RNA of transforming growth factor beta 2 (TGFB2) and of the non-coding RNA of TGFB2-overlapping transcript 1 was combined with FIB-4 (AUROC 0.891) or TE (AUROC 0.892) [166].
These approaches are promising, but their application in clinical practice is still limited by the lack of standardized protocols.

Conclusions
At the moment, in the field of NAFLD a wide range of non-invasive serum and imaging biomarkers have been developed with the aims of discriminating patients with NASH and evaluating the degree of steatosis and liver fibrosis.
Regarding NASH identification, neither the single markers nor the scoring systems have demonstrated acceptable sensitivity and specificity or sufficient validation. Thus, at present, non-invasive serum biomarkers cannot be widely recommended to differentiate NASH from simple steatosis. Moreover, imaging techniques have not demonstrated reliable diagnostic accuracy or sufficient validation for NASH discrimination either, although MR-based modalities (multiparametric MRI) are promising. Therefore, liver biopsy still remains the most complete exam for the assessment of the patient with NAFLD.
Nevertheless, non-invasive tests can be very useful, particularly for screening purposes, and their choice should be tailored to the clinical setting (primary health care or referral center) and clinical needs (screening, staging of fibrosis, and follow-up). NFS and FIB-4, which are simple and inexpensive, should be used as first-line tests in primary health-care settings to rule out patients without AF. Other patented tests such as FT, FibroMeter, Hepascore, or ELF, even if more specific and with a higher PPV for detecting AF, have the drawback of their cost. Imaging techniques, whether US-based (TE, 2D-SWE, and ARFI) or MR-based (MRE), are better suited to referral centers, with the aim of identifying the patients who require a final diagnosis by liver biopsy. MRI-PDFF seems to be the most accurate method for the detection and grading of steatosis, but its application, at present, seems best suited to the assessment and follow-up of patients included in clinical trials. In contrast, CAP could be used to identify steatosis in large unselected populations, but an optimal definition of its cutoffs has yet to be accepted by international guidelines. Even though growing evidence shows that serum markers and LSM may identify NAFLD patients at high risk of liver-related complications, accurate and widely applicable biomarkers that allow not only a precise stratification of fibrosis, but also the monitoring of disease progression and therapeutic response, are still to be found.